I've received more feedback about last night's post on a gnarly svn to git migration than I expected. Thanks to that feedback (and mostly to Raffaele Salmaso, who did almost all the work and emailed me the result), eazysvn is now on GitHub.

The rest of this post will describe the conversion (and verification) in detail, because if I ever need to do this again, I do not want to start from scratch.

Part one: unexpected gift

Raffaele Salmaso did a heroic job and sent me a tarball with a mercurial repository, produced with "something like" this:

> hg clone --layout single $SVN repo-tmp
> hg convert --filemap filemap repo-tmp repo
> cd repo
> hg qinit
> hg qimport -r 0:tip
> hg qpop -a
> cd .hg/patches
> check patches for correctness
> fix tags (svn are different from mercurial ones)
> hg qfinish -a

The conversion had only two problems:

  1. it introduced new commits that modify a new file .hgtags
  2. it changed the contents of README.txt in changeset 53 (corresponding to svn revision 55, "Allow branch names to have prefixes.") and newer versions:
--- svn version
+++ hg version
-  *scheme://server/path/to/svn/repo*/trunk/*subdirs*
+  *scheme://server/path/to/svn/repo*/*subdirs*

I did not notice either problem at first.

Part two: conversion to git

Note: I wrote this up after I'd done everything up to and including part five. The commands and directory names here are not the actual ones I used, although I tried to be accurate. I've also skipped some false trails.

Converting hg to git was pretty easy:

$ mkdir -p /tmp/conv
$ cd /tmp/conv
$ tar xvjf eazysvn_20120220-133823.tar.bz2
creates /tmp/conv/eazysvn/
$ mkdir eazysvn-git
$ cd eazysvn-git
$ git init
$ hg-fast-export -r /tmp/conv/eazysvn
converts, leaves no working tree; git status shows all files as deleted
$ git checkout
restore working tree

I was a bit surprised by git status showing a bunch of deleted files at the end there. I suppose hg-fast-export expects to be run inside a bare repository, or maybe it expects the user to know enough git to understand what happened and do the git checkout if necessary.

Part three: cleanup

I wanted to drop the empty changesets that were introduced by Raffaele's conversion process for modifying .hgtags. Since the manual page for git filter-branch had an example for dropping all empty changesets, I used it directly.

$ git filter-branch --commit-filter 'git_commit_non_empty_tree "$@"'

This also dropped some changesets that were present in my Subversion repository -- those that manipulated svn properties, and the one that moved everything in svn root under /trunk. I won't miss those.

Note: if you try

$ git filter-branch --commit-filter='git_commit_non_empty_tree "$@"'

(i.e. '=' instead of a space after --commit-filter), you will get a completely baffling error message that doesn't even hint at what is wrong.

At this point I ran gitk --all to look around and discovered that git filter-branch left all the tags pointing to obsolete revisions. I created new tags manually with gitk, including some that were missing in my svn repository. Every release since 1.6.0 is now tagged (releases before that did not have source tarballs on PyPI, so I had no way to verify which checkin corresponded to which release). I also changed the tag naming scheme to be "v1.x.y" instead of just "1.x.y".
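The manual retagging might have been avoidable. The git filter-branch manual page documents a --tag-name-filter option, and passing cat as the filter rewrites each tag so that it points at the rewritten commit. A minimal sketch, assuming lightweight tags and that all refs should be rewritten; the function name drop_empty_keep_tags is made up for illustration, and this was not part of the actual conversion:

```shell
# Hypothetical helper: drop empty commits and move the tags along with them.
# Run it from inside the repository that should be rewritten.
drop_empty_keep_tags() {
    # Newer git versions print a warning and pause before running
    # filter-branch; this environment variable silences that.
    FILTER_BRANCH_SQUELCH_WARNING=1 \
    git filter-branch \
        --tag-name-filter cat \
        --commit-filter 'git_commit_non_empty_tree "$@"' \
        -- --all
}
```

A tag that sat on a dropped empty commit should end up on the nearest surviving ancestor, so it is still worth checking the result in gitk.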

Oh, and to get rid of the old history I did

$ git tag -d 1.9.0 1.11.0 1.12.0 1.12.1
$ git gc --prune
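
Deleting the old tags may not be enough on its own for git gc to discard the old commits: git filter-branch keeps backups of the original refs under refs/original/, and reflog entries can keep the old history reachable too. The sketch below follows the clean-up steps described in the git filter-branch manual page. The function name purge_rewritten_history is made up, and the commands are destructive, so try them only on a copy:

```shell
# Hypothetical helper: make pre-rewrite objects unreachable, then prune them.
purge_rewritten_history() {
    # Remove the backup refs that git filter-branch leaves behind.
    git for-each-ref --format='%(refname)' refs/original/ |
    while read -r ref; do
        git update-ref -d "$ref"
    done
    # Expire all reflog entries so they stop keeping old commits alive.
    git reflog expire --expire=now --all
    # Prune unreachable objects right away instead of after the grace period.
    git gc --quiet --prune=now
}
```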

Part four: verification

I downloaded all the available releases from PyPI:

$ mkdir -p /tmp/verify
$ cd /tmp/verify
$ for v in 1.6.0 1.6.1 1.7.0 1.8.0 1.9.0 1.10.0 1.11.0 1.12.0 1.12.1; do
>   wget http://pypi.python.org/packages/source/e/eazysvn/eazysvn-$v.tar.gz
>   tar xvzf eazysvn-$v.tar.gz
> done

Exported all of my git tags:

$ mkdir -p /tmp/verify/git
$ cd /tmp/conv/eazysvn-git
$ for v in 1.6.0 1.6.1 1.7.0 1.8.0 1.9.0 1.10.0 1.11.0 1.12.0 1.12.1; do
>   git archive --format=tar --prefix eazysvn-$v/ v$v | gzip \
>       > /tmp/verify/git/eazysvn-$v-git.tar.gz
> done
$ cd /tmp/verify/git
$ for v in 1.6.0 1.6.1 1.7.0 1.8.0 1.9.0 1.10.0 1.11.0 1.12.0 1.12.1; do
>   tar xvzf eazysvn-$v-git.tar.gz
> done

And compared them:

$ diff -ur /tmp/verify /tmp/verify/git

I expected to see only messages like "Only in dir1: setup.cfg", for things like 'eazysvn.egg-info' or 'PKG-INFO'. Unfortunately, this is where the actual differences I mentioned in part one showed up: in README.txt, for all trees starting with release 1.9.0.
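
To keep the expected packaging-only differences from drowning out real ones, diff can be told to skip them. A small sketch, assuming GNU diff and assuming that PKG-INFO, setup.cfg and *.egg-info are the only things an sdist adds; compare_trees is a made-up name, and the exclusion list is a guess that would hide real changes in a project that keeps its own setup.cfg in version control:

```shell
# Hypothetical helper: recursive diff of two trees that ignores files
# generated while building an sdist. Exit status is 0 when the trees match.
compare_trees() {
    diff -ur \
        -x PKG-INFO \
        -x setup.cfg \
        -x '*.egg-info' \
        "$1" "$2"
}
```

With this, any output at all is a difference worth looking at.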

Part five: rectification

I needed to rewrite history again, but I didn't want to use git filter-branch this time (I didn't want to manually tag all the releases again). I tried fast-export:

$ cd /tmp/conv/eazysvn-git
$ git fast-export --all > ../EXPORT.txt
$ vim ../EXPORT.txt
search and replace 'repo*/*subdirs*' with 'repo*/trunk/*subdirs*'
fix up the file size above each change (increment by 6, the length of 'trunk/')
$ mkdir /tmp/conv/eazysvn-git2
$ cd /tmp/conv/eazysvn-git2
$ git init
$ git fast-import < ../EXPORT.txt
succeeds, leaves no working tree; git status shows all files as deleted
$ git checkout
restore working tree
$ gitk --all

Everything looked about right.
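
The size fix-up in the editing step matters because every blob in a fast-export stream is preceded by a "data <count>" header giving its exact length in bytes, so each replaced occurrence changes that count by the difference in length between the two strings. A quick way to double-check the number, using a made-up helper name:

```shell
# Hypothetical helper: how many bytes one replacement adds (or removes).
byte_delta() {
    old_len=$(printf '%s' "$1" | wc -c)
    new_len=$(printf '%s' "$2" | wc -c)
    echo $(( new_len - old_len ))
}

byte_delta 'repo*/*subdirs*' 'repo*/trunk/*subdirs*'   # prints 6
```

A blob that contained the string more than once would need its count adjusted once per occurrence.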

Part six: final verification

But was it actually right?

$ cd /tmp/verify/
$ mkdir pypi
$ mv eazysvn-*/ pypi/
$ rm -rf git/*
$ cd /tmp/conv/eazysvn-git2
$ for v in 1.6.0 1.6.1 1.7.0 1.8.0 1.9.0 1.10.0 1.11.0 1.12.0 1.12.1; do
>   git archive --format=tar --prefix eazysvn-$v/ v$v \
>       | (cd /tmp/verify/git && tar -xf - )
> done
$ cd /tmp/verify
$ diff -ur pypi git

Yes!
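
The loops above spell out the version list by hand. A sketch of deriving it from the tags instead, assuming every release tag starts with "v" as described in part three; export_release_tags is a made-up name:

```shell
# Hypothetical helper: unpack every tag named v* from the repository in the
# current directory into <dest>/eazysvn-<version>/.
export_release_tags() {
    dest=$1
    mkdir -p "$dest"
    git tag -l 'v*' | while read -r tag; do
        version=${tag#v}    # strip the leading "v" from the tag name
        git archive --format=tar --prefix="eazysvn-$version/" "$tag" |
            (cd "$dest" && tar -xf -)
    done
}
```

Called as export_release_tags /tmp/verify/git from inside the converted repository, it would take the place of the for loop.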

Part seven: uploading to GitHub

It was all plain sailing from here:

$ cd /tmp/conv/eazysvn-git2
$ git remote add origin git@github.com:mgedmin/eazysvn.git
$ git push -u origin master
$ git push --tags

And then there were documentation updates (to point to GitHub instead of the old Subversion repository), Makefile updates (make release makes sure my sdist contains everything I have in my repository, because I've been bitten by setuptools magic before), etc.

I also released eazysvn 1.12.2 to PyPI to test my Makefile changes, and because there were unreleased changes that should've been released a long time ago.

So that's it. Only took me three hours from the point where I found a Mercurial repository in my inbox.