Random notes from mg

a blog by Marius Gedminas

Marius is a Python hacker. He works for Programmers of Vilnius, a small Python/Zope 3 startup. He has a personal home page at http://gedmin.as. His email is marius@gedmin.as. He does not like spam, but is not afraid of it.

Mon, 20 Feb 2012

Converting a gnarly SVN repository to GIT: FAIL.

eazysvn lives in a Subversion repository. I want to bring it (kicking and screaming) into the 21st century and put it on Github.

git-svn is unsuitable for the conversion, because in revision 50 I moved / to /trunk and added the traditional /tags and /branches. With git-svn I either get one third of the history that ignores everything before the layout switch, or I get directories named 'trunk', 'tags' and 'branches'.

Then I thought maybe hg would be smarter about the conversion, and then I could use hg-fast-export to convert hg to git. I enabled the hgsubversion extension:

$ sudo apt-get install hgsubversion
$ echo '[extensions]' >> ~/.hgrc
$ echo 'hgsubversion =' >> ~/.hgrc

(Blog posts like this are a good reason why hg ought to steal the 'git config --global foo.bar baz' syntax from git.)

Then I converted the svn repository to Mercurial:

$ hg clone svn+ssh://fridge/home/mg/svn/eazysvn eazysvn-hg

The output of hg log -p looked right-ish at first glance, except that the author information was nonsensical. To fix that:

$ rm -rf eazysvn-hg
$ echo 'mg = Marius Gedminas <marius@gedmin.as>' > AUTHORS
$ hg clone svn+ssh://fridge/home/mg/svn/eazysvn eazysvn-hg -A AUTHORS

Unfortunately, a closer look at hg log now shows that two thirds of the history is lost: hg ignored everything before the layout restructuring, despite printing the log messages of those revisions as it went about the conversion. *sigh*.

Dear lazyweb, surely svn layout reorganization can't be such a rare thing that no tools in existence support it? What should I try next?

P.S. I also tried Bazaar, to see what it would do:

$ bzr branch svn+ssh://fridge/home/mg/svn/eazysvn eazysvn-bzr
Repository with UUID 4fc293c4-4eed-0310-a01a-b4ad72f90fad at svn+ssh://fridge/home/mg/svn/eazysvn contains fewer revisions than cache. This either means that this repository contains an out of date mirror of another repository (harmless), or that the UUID is being used for two different Subversion repositories (potential repository corruption).
bzr: ERROR: exceptions.KeyError: 'missing revision paths for 78'

Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/bzrlib/commands.py", line 946, in exception_to_return_code
    return the_callable(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/bzrlib/commands.py", line 1150, in run_bzr
    ret = run(*run_argv)
  File "/usr/lib/python2.7/dist-packages/bzrlib/commands.py", line 699, in run_argv_aliases
    return self.run(**all_cmd_args)
  File "/usr/lib/python2.7/dist-packages/bzrlib/commands.py", line 721, in run
    return self._operation.run_simple(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/bzrlib/cleanup.py", line 135, in run_simple
    self.cleanups, self.func, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/bzrlib/cleanup.py", line 165, in _do_with_cleanups
    result = func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/bzrlib/builtins.py", line 1263, in run
    from_location)
  File "/usr/lib/python2.7/dist-packages/bzrlib/bzrdir.py", line 919, in open_tree_or_branch
    return bzrdir._get_tree_branch()
  File "/usr/lib/python2.7/dist-packages/bzrlib/controldir.py", line 410, in _get_tree_branch
    branch = self.open_branch(name=name)
  File "/usr/lib/python2.7/dist-packages/bzrlib/plugins/svn/remote.py", line 420, in open_branch
    branch_path = self._determine_relpath(name)
  File "/usr/lib/python2.7/dist-packages/bzrlib/plugins/svn/remote.py", line 369, in _determine_relpath
    layout = repos.get_layout()
  File "/usr/lib/python2.7/dist-packages/bzrlib/plugins/svn/repository.py", line 701, in get_layout
    return self.get_layout_source()[0]
  File "/usr/lib/python2.7/dist-packages/bzrlib/plugins/svn/repository.py", line 720, in get_layout_source
    self._find_guessed_layout(self.get_config())
  File "/usr/lib/python2.7/dist-packages/bzrlib/plugins/svn/repository.py", line 743, in _find_guessed_layout
    revnum, self._hinted_branch_path)
  File "/usr/lib/python2.7/dist-packages/bzrlib/plugins/svn/layout/guess.py", line 143, in repository_guess_layout
    return logwalker_guess_layout(repository._log, revnum, branch_path=branch_path)
  File "/usr/lib/python2.7/dist-packages/bzrlib/plugins/svn/layout/guess.py", line 149, in logwalker_guess_layout
    logwalker.iter_changes(None, revnum, max(0, revnum-GUESS_SAMPLE_SIZE)), revnum, branch_path)
  File "/usr/lib/python2.7/dist-packages/bzrlib/plugins/svn/layout/guess.py", line 104, in guess_layout_from_history
    for (revpaths, revnum, revprops) in changed_paths:
  File "/usr/lib/python2.7/dist-packages/bzrlib/plugins/svn/logwalker.py", line 60, in iter_all_changes
    revpaths = get_revision_paths(revnum)
  File "/usr/lib/python2.7/dist-packages/bzrlib/plugins/svn/logwalker.py", line 295, in get_revision_paths
    return self.cache.get_revision_paths(revnum)
  File "/usr/lib/python2.7/dist-packages/bzrlib/plugins/svn/cache/tdbcache.py", line 187, in get_revision_paths
    raise KeyError("missing revision paths for %d" % revnum)
KeyError: 'missing revision paths for 78'

You can report this problem to Bazaar's developers by running
    apport-bug /var/crash/bzr.1000.2012-02-19T22:12.crash
if a bug-reporting window does not automatically appear.

posted at 00:13 | 18 comments
posted by Marius Gedminas at Mon Feb 20 00:19:22 2012
If anybody wants to give it a try, the eazysvn repository is publicly available at http://mg.pov.lt/eazysvn/svn/

I only used the svn+ssh URL because I copied and pasted it out of 'svn info' output in my existing writable checkout.
posted by Marius Gedminas at Mon Feb 20 00:21:39 2012
svn2git is based on git-svn and fails in the same way as git-svn.
posted by Marius Gedminas at Mon Feb 20 00:36:06 2012
What you need to do is convert your SVN repository (with git-svn) both ways, into two different git repositories. Once you have that, you can fetch the commits from both into a single repository and modify commit A of the post-layout branch to have commit Z of the pre-layout branch as its parent.

I did something like that to convert a repository that had once been on CVS, and then later moved without history to SVN, to a Git repository that had the full history of both in a single branch.

It's been a while since I've done this, so I honestly don't remember the details, but I know that git filter-branch --parent-filter is the direction you want to look in.
posted by Joseph Spiros at Mon Feb 20 05:06:19 2012
Hello,

> Dear lazyweb, surely svn layout reorganization can't be such a rare thing that no tools in existence support it?

Layout reorganization really is that rare.

I'm working on subgit (http://subgit.com); you can try it. Unfortunately, it won't give you the whole repository history either. To the best of my knowledge, no tool imports such a history completely.

And there is a good explanation for that:
In 99.9% of cases, layout reorganization happens at the very start of the history, and users don't mind dropping 100 out of 50,000 revisions.
posted by Simon at Mon Feb 20 05:32:26 2012
(I thought I posted this already, but upon page refresh my comment disappeared. So, I'm guessing I pressed "Preview" and not "Submit". If you deleted my comment, I apologize for submitting it a second time.)

A solution to your problem is to use git-svn to convert your repository twice, to two separate repositories, once using the old layout, and once using the new one. Then, fetch the commits from both repositories into a new repository (or from one to the other), and reparent the first commit of the new-layout branch onto the last (usable) commit of the old-layout branch, creating an unbroken chain for the trunk. You'll then need to rewrite all of the tags and branches that were imported as well.

I did something similar a couple years ago to join old CVS repositories with newer SVN repositories into singular Git repositories for many of my projects (reparenting the first SVN commits onto the last CVS commits after importing both using git-cvs and git-svn respectively). Unfortunately, I don't remember the exact details/commands, but you should be on the right track if you look at git filter-branch --parent-filter for rewriting the parent of commits. You might also want to look at a newer concept called "grafts". I don't know anything about them, since I did this years ago, before they existed, but it sounds like a shortcut for accomplishing the same thing.
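
A minimal sketch of the reparenting step, assuming a reasonably recent git. The two throwaway repositories built here are stand-ins for the two git-svn conversions, and the names old, new, joined, old-layout and new-layout are made up for illustration. It uses git replace --graft (the current form of the "grafts" idea) rather than filter-branch --parent-filter:

```shell
#!/bin/sh
# Sketch only: old/ and new/ stand in for the two git-svn conversions.
set -e
work=$(mktemp -d)
cd "$work"

# Hypothetical helper: build a throwaway repository with two commits.
make_repo() {
    git init -q "$1"
    git -C "$1" config user.name "Example Author"
    git -C "$1" config user.email "author@example.com"
    echo "$1, first revision" > "$1/file.txt"
    git -C "$1" add file.txt
    git -C "$1" commit -q -m "$1: first commit"
    echo "$1, second revision" > "$1/file.txt"
    git -C "$1" commit -q -a -m "$1: second commit"
}
make_repo old    # history before the layout switch
make_repo new    # history after the layout switch

# Fetch both histories into a single repository.
git init -q joined
cd joined
git fetch -q ../old HEAD:refs/heads/old-layout
git fetch -q ../new HEAD:refs/heads/new-layout

# Reparent the first new-layout commit onto the last old-layout commit.
first_new=$(git rev-list --max-parents=0 new-layout)
last_old=$(git rev-parse old-layout)
git replace --graft "$first_new" "$last_old"

# The new-layout branch now sees both histories as one chain.
total=$(git rev-list --count new-layout)
echo "commits reachable from new-layout: $total"
```

The graft is stored under refs/replace/ and stays local to this repository. According to the git filter-branch documentation, running filter-branch over the branch makes such replacements permanent, which is what you would want before publishing.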
posted by Joseph Spiros at Mon Feb 20 09:28:42 2012
See http://techbase.kde.org/Projects/MoveToGit/UsingSvn2Git, that's what KDE used. It's the most extensive conversion tool I know of, and no, it does not use git-svn. It may also require some more in-depth knowledge of svn and git to create the correct conversion rules, but it does allow for projects moving around inside a subversion tree, and it also allows ignoring certain revisions on certain sub-trees.
posted by Andreas at Mon Feb 20 10:40:44 2012
ESR is looking for users with exactly this kind of pathological case for his "reposurgeon" [1] project. Have you tried it already?

[1] http://www.catb.org/~esr/reposurgeon/reposurgeon.html
posted by Jan Wicijowski at Mon Feb 20 11:26:10 2012

# set up the initial checkout
mkdir easysvn
cd easysvn
git svn init http://mg.pov.lt/eazysvn/svn/
git svn fetch -r 1:49
# rename the remote svn branch to be equivalent to trunk
find .git -name git-svn | xargs rename 's/git-svn/trunk/'
# clear out the remote fetch setting so we can re-fetch from the new location
git config --unset-all svn-remote.svn.fetch
# re-init to use trunk, branches, tags, then fetch the rest
git svn init -s http://mg.pov.lt/eazysvn/svn/
git svn fetch -r 50:78
# clean up the fact that master points at the old git-svn location, by overwriting it with trunk
git checkout -b trunk remotes/trunk
git branch -M master
posted by David Fraser at Mon Feb 20 12:41:13 2012
> (blog posts like this are a good reason why hg ought to steal the 'git config --global foo.bar baz' syntax from git).

hg already has that.

hg --config extensions.hgsubversion= clone svn+ssh://fridge/home/mg/svn/eazysvn eazysvn-hg
posted by Ghislain Hivon at Mon Feb 20 16:42:51 2012
Are we talking about the same svn2git? KDE has one which they coded for their mass-conversion, which must have encountered a lot of the gnarly cases. Here's an example showing how to write an svn2git conffile.
posted by Tobu at Mon Feb 20 18:52:13 2012
Thank you, everyone!

And especially thanks to Raffaele Salmaso, who sent me the eazysvn repository converted to Mercurial using some filemap magic and manual fixups.

I'll push it to GitHub once I figure out hg-fast-export and use rebase to clean up the empty git changesets that modified .hgtags in the intermediate repository.

@Jan: no, I hadn't heard about reposurgeon.

@David: amazing black magic, but a closer look shows me it gets the tags wrong.  Of course I can just discard those and retag manually, there aren't that many of them.

@Ghislain: that syntax sets the config option for one command invocation, not permanently, unlike git config.

@Tobu: no, I tried https://github.com/nirvdrum/svn2git

Again, thank you, everyone!
posted by Marius Gedminas at Mon Feb 20 20:15:13 2012
Another very interesting tool that was pointed out to me is svn-all-fast-export, also packaged in Debian/Ubuntu.
posted by Marius Gedminas at Wed Feb 22 09:58:29 2012
So, svn-all-fast-export works.  The user interface is atrocious (instead of error messages you get core dumps), though.  Here's a short howto:

1. scp the svn repository to your local machine (or clone it with svnsync, I suppose).  Let's call it 'profilehooks'.

2. Make sure it is named differently from your desired destination repository names or you'll get a core dump!  So 'mv profilehooks profilehooks.svn'.

3. Create an authors file in git-svn format (svnusername = Real Name <email@example.com>)

4. Create a rules file like this:

  create repository profilehooks
  end repository

  match /trunk/
  repository profilehooks
  branch master
  end match

  match /
  repository profilehooks
  branch master
  end match

5. Run 'svn-all-fast-export --identity-map=authors --rules=rules --stats profilehooks.svn'

You now have a bare git repo in ./profilehooks.  Enjoy.
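
Steps 3 to 5 can be strung together roughly like this. It is a sketch rather than a tested recipe: the author mapping is a placeholder, the files are written to the current directory, and the svn-all-fast-export call is skipped unless both the tool and a local profilehooks.svn directory are present.

```shell
#!/bin/sh
# Sketch only: writes the two input files, then converts if possible.
set -e

# Step 2 is assumed done: the svn repository copy lives here.
svn_repo=profilehooks.svn

# Step 3: authors file in git-svn format (placeholder identity).
cat > authors <<'EOF'
svnusername = Real Name <email@example.com>
EOF

# Step 4: rules file mapping both the new and the old layout to master.
cat > rules <<'EOF'
create repository profilehooks
end repository

match /trunk/
repository profilehooks
branch master
end match

match /
repository profilehooks
branch master
end match
EOF

# Step 5: run the conversion only when the tool and the repository exist.
if command -v svn-all-fast-export >/dev/null 2>&1 && [ -d "$svn_repo" ]; then
    svn-all-fast-export --identity-map=authors --rules=rules --stats "$svn_repo"
else
    echo "skipping conversion: svn-all-fast-export or $svn_repo not found"
fi
```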
posted by Marius Gedminas at Tue Jun 5 00:20:19 2012
Note: in the above I had a repository with a layout reorganization partway through, but without any actual branches or tags.  If you have some, write more rules following the examples in /usr/share/doc/svn-all-fast-export/samples/.
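
For a repository that does have branches and tags, the extra rules might look something like the sketch below. The regex captures, the refs/tags/ prefix and the max revision line are assumptions modelled on the svn2git sample rules, and reorg_rev is a made-up value, so check everything against the samples directory before relying on it.

```shell
#!/bin/sh
# Sketch only: writes a rules file that also maps branches and tags.
set -e

# Hypothetical value: the revision in which / was moved to /trunk.
reorg_rev=50

cat > rules-with-branches <<EOF
create repository profilehooks
end repository

match /trunk/
repository profilehooks
branch master
end match

# One git branch per directory under /branches.
match /branches/([^/]+)/
repository profilehooks
branch \1
end match

# svn tags are really branches; file them under refs/tags.
match /tags/([^/]+)/
repository profilehooks
branch refs/tags/\1
end match

# Catch-all for the old flat layout, limited to the early revisions.
match /
repository profilehooks
branch master
max revision $reorg_rev
end match
EOF

echo "wrote rules-with-branches"
```

Rules are tried in order of appearance, which is why the catch-all comes last.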

See also: http://gitorious.org/svn2git (the home of svn-all-fast-export, apparently) and http://techbase.kde.org/Projects/MoveToGit/UsingSvn2Git (mentioned in the man page of svn-all-fast-export).
posted by Marius Gedminas at Tue Jun 5 00:23:14 2012
And one more tool, not based on git-svn: http://subgit.com
posted by dear guest at Wed Jun 20 02:39:37 2012
Interesting that you want to convert the source for a subversion utility to git :)
posted by GM at Thu Jan 10 17:12:24 2013
The irony did not escape me. :-)
posted by Marius Gedminas at Thu Jan 10 22:19:01 2013
