<?xml version="1.0" encoding="utf-8"?>
<!-- name="generator" content="pyblosxom/1.4.3 01/10/2008" -->
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" "http://my.netscape.com/publish/formats/rss-0.91.dtd">

<rss version="0.91">
<channel>
<title>Random notes from mg</title>
<link>http://mg.pov.lt/blog</link>
<description>a blog by Marius Gedminas</description>
<webMaster>marius@gedmin.as</webMaster>
<language>en</language>
  <item>
    <title>Converting a gnarly SVN repository to GIT: success!</title>
    <link>http://mg.pov.lt/blog/eazysvn-git-migration-success.html</link>
    <pubDate>Mon, 20 Feb 2012 23:17 +0300</pubDate>
    <description>
<p>I've received more feedback about last night's post on <a
href="http://mg.pov.lt/blog/eazysvn-git-migration.html">gnarly
svn to git migration</a> than I expected.  Thanks to that feedback
(and, mostly, Raffaele Salmaso for doing almost all the work and emailing the
result to me) <a href="https://github.com/mgedmin/eazysvn">eazysvn is now on
  GitHub</a>.</p>

<p>The rest of this post will describe the conversion (and verification)
in detail, because if I ever need to do this again, I do not want to start
from scratch.</p>



<h4>Part one: unexpected gift</h4>

<p>Raffaele Salmaso did a heroic job and sent me a tarball with a Mercurial
repository, produced with "something like" this:</p>

<blockquote><pre>
&gt; hg clone --layout single $SVN repo-tmp
&gt; hg convert --filemap filemap repo-tmp repo
&gt; cd repo
&gt; hg qinit
&gt; hg qimport -r 0:tip
&gt; hg qpop -a
&gt; cd .hg/patches
&gt; check patches for correctness
&gt; fix tags (svn are different from mercurial ones)
&gt; hg qfinish -a
</pre></blockquote>

<p>The conversion had only two problems:</p>

<ol>
  <li>it introduced new commits that modify a new file, <tt>.hgtags</tt></li>
  <li>it changed the contents of README.txt in changeset 53 (corresponding to
      svn revision 55, "Allow branch names to have prefixes.") and newer
      versions:
  </li>
</ol>

<blockquote><pre>
--- svn version
+++ hg version
-  *scheme://server/path/to/svn/repo*/trunk/*subdirs*
+  *scheme://server/path/to/svn/repo*/*subdirs*
</pre></blockquote>

<p>
I did not notice either problem at first.
</p>

<h4>Part two: conversion to git</h4>

<p>
Note: I wrote this up <em>after</em> I had done everything up to and including
part five.  The commands and directory names here are not exactly the ones I
used, although I tried to be accurate.  I've also left out some false trails.
</p>

<p>
Converting hg to git was pretty easy:
</p>

<blockquote><pre>
<span class="prompt">$</span> <span class="typing">mkdir -p /tmp/conv</span>
<span class="prompt">$</span> <span class="typing">cd /tmp/conv</span>
<span class="prompt">$</span> <span class="typing">tar xvjf eazysvn_20120220-133823.tar.bz2</span>
creates /tmp/conv/eazysvn/
<span class="prompt">$</span> <span class="typing">mkdir eazysvn-git</span>
<span class="prompt">$</span> <span class="typing">cd eazysvn-git</span>
<span class="prompt">$</span> <span class="typing">git init</span>
<span class="prompt">$</span> <span class="typing">hg-fast-export -r /tmp/conv/eazysvn</span>
converts, leaves no working tree; git status shows all files as deleted
<span class="prompt">$</span> <span class="typing">git checkout</span>
restore working tree
</pre></blockquote>

<p>I was a bit surprised by git status showing a bunch of deleted files at the
end there.  I suppose hg-fast-export expects to be run inside a bare
repository, or maybe it expects the user to know enough git to understand what
happened and do the <tt>git checkout</tt> if necessary.</p>
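<p>A likely explanation: hg-fast-export pipes its output into <tt>git fast-import</tt>, which writes commits and refs but never touches the index or the working tree.  In a freshly initialised, non-bare repository, HEAD then points at the imported branch while no index file exists yet, so <tt>git status</tt> reports every file as deleted.  The shell function below is a hypothetical sketch (its name is made up) that runs <tt>git checkout</tt> only in that state; with no index at all, git treats the checkout as an initial one and writes out every file from HEAD.</p>

<blockquote><pre>
```shell
# Hypothetical helper (the name is made up): restore the index and working
# tree after git fast-import / hg-fast-export filled a fresh, non-bare repo.
restore_worktree_after_import() {
    git_dir=$(git rev-parse --git-dir) || return 1
    if [ -e "$git_dir/index" ]; then
        # An index already exists, so this is not the post-import state.
        echo "index exists; leaving the working tree alone"
        return 0
    fi
    # With no index file at all, git checkout treats this as an initial
    # checkout and writes every file in HEAD to the index and working tree.
    git checkout
}
```
</pre></blockquote>

<p>The same situation comes up again after the <tt>git fast-import</tt> step in part five.</p>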

<h4>Part three: cleanup</h4>

<p>I wanted to drop the empty changesets that Raffaele's conversion process
had introduced to modify <tt>.hgtags</tt>.  The manual page for git
filter-branch has an example that drops all empty commits, so I used it
as is:</p>

<blockquote><pre>
<span class="prompt">$</span> <span class="typing">git filter-branch --commit-filter 'git_commit_non_empty_tree "$@"'</span>
</pre></blockquote>

<p>This also dropped some changesets that were present in my Subversion
repository: those that only changed svn properties, and the one that moved
everything in the svn root under /trunk.  I won't miss those.</p>
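<p>Since filter-branch rewrites history, it can be worth previewing which commits the filter will drop.  <tt>git_commit_non_empty_tree</tt> skips a commit when it has exactly one parent and the same tree as that parent, so the same test can be run read-only first.  A minimal sketch; the function name is made up:</p>

<blockquote><pre>
```shell
# Hypothetical helper: list the commits that
#   git filter-branch --commit-filter 'git_commit_non_empty_tree "$@"'
# would drop, i.e. single-parent commits whose tree equals the parent's tree.
# Read-only; takes an optional revision (default: HEAD).
list_empty_commits() {
    rev=${1:-HEAD}
    git rev-list --no-merges "$rev" | while read -r c; do
        # Root commits have no parent, so there is nothing to compare with.
        parent_tree=$(git rev-parse --verify -q "$c~1^{tree}") || continue
        if [ "$(git rev-parse "$c^{tree}")" = "$parent_tree" ]; then
            git log -1 --format='%h %s' "$c"
        fi
    done
}
```
</pre></blockquote>

<p>Running <tt>list_empty_commits</tt> in the converted repository prints the abbreviated hash and subject of each commit that would disappear.  Merge commits are never dropped by this filter, so they are not listed.</p>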

<p>Note: if you try</p>

<blockquote><pre>
<span class="prompt">$</span> <span class="typing">git filter-branch --commit-filter='git_commit_non_empty_tree "$@"'</span>
</pre></blockquote>

<p>(i.e. '<tt>=</tt>' instead of a space after <tt>--commit-filter</tt>), you will
get a completely baffling error message that doesn't even hint at what is
wrong.</p>

<p>At this point I ran <tt>gitk --all</tt> to look around and discovered that
git filter-branch left all the tags pointing to obsolete revisions.  I created
new tags manually with gitk, including some that were missing in my svn
repository.  Every release since 1.6.0 is now tagged (releases before that
did not have source tarballs on PyPI, so I had no way to verify which checkin
corresponded to which release).  I also changed the tag naming scheme to be
"v1.x.y" instead of just "1.x.y".</p>

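<p>An alternative to re-tagging by hand, not the procedure used in this post: git filter-branch rewrites only the refs named on its command line (HEAD by default), which is why the tags were left behind.  With <tt>--tag-name-filter</tt> it also writes a new tag for each tag that points into the rewritten history; the filter reads the old tag name on standard input and prints the new one, so <tt>sed</tt> can add the "v" prefix at the same time.  A sketch, assuming lightweight tags, a clean working tree and no earlier backup under <tt>refs/original/</tt>; it cannot create tags for releases that were never tagged in svn:</p>

<blockquote><pre>
```shell
# A sketch of an alternative to re-tagging by hand (not what this post did):
# drop empty commits from HEAD's history and, for every tag that points into
# that history, create a "v"-prefixed tag on the corresponding new commit.
drop_empty_commits_and_retag() {
    # Recent git versions pause to print a warning about filter-branch
    # unless this variable is set.
    FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch \
        --commit-filter 'git_commit_non_empty_tree "$@"' \
        --tag-name-filter 'sed "s/^/v/"'
}
```
</pre></blockquote>

<p>The original, unprefixed tags stay where they were, so they still have to be removed with <tt>git tag -d</tt>.  A tag that pointed at a dropped empty commit should land on the nearest commit that survived.</p>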
<p>Oh, and to get rid of the old history I did</p>

<blockquote><pre>
<span class="prompt">$</span> <span class="typing">git tag -d 1.9.0 1.11.0 1.12.0 1.12.1</span>
<span class="prompt">$</span> <span class="typing">git gc --prune</span>
</pre></blockquote>
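<p>Deleting the tags only removes references.  <tt>git gc</tt> keeps every commit that is still reachable from the backup refs filter-branch saves under <tt>refs/original/</tt> or from reflog entries, and <tt>--prune</tt> without a date keeps unreachable objects younger than two weeks.  The git-filter-branch manual page lists the steps for really shrinking a repository; here they are as a sketch of a shell function (the name is made up):</p>

<blockquote><pre>
```shell
# Sketch of the "shrink the repository" steps from the git-filter-branch
# manual page, wrapped in a hypothetical function.
drop_filter_branch_backups() {
    # Delete refs/original/*, the backups git filter-branch leaves behind.
    git for-each-ref --format='%(refname)' refs/original/ |
        while read -r ref; do
            git update-ref -d "$ref"
        done
    # Forget reflog entries that still point at the old commits.
    git reflog expire --expire=now --all
    # Remove every object that is no longer reachable, regardless of age.
    git gc --prune=now
}
```
</pre></blockquote>

<p>This is destructive: once the backups and reflogs are gone, the pre-rewrite history can no longer be recovered from this clone.</p>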

<h4>Part four: verification</h4>

<p>I downloaded all the available releases from PyPI:</p>

<blockquote><pre>
<span class="prompt">$</span> <span class="typing">mkdir -p /tmp/verify</span>
<span class="prompt">$</span> <span class="typing">cd /tmp/verify</span>
<span class="prompt">$</span> <span class="typing">for v in 1.6.0 1.6.1 1.7.0 1.8.0 1.9.0 1.10.0 1.11.0 1.12.0 1.12.1; do</span>
<span class="prompt">></span>   <span class="typing">wget http://pypi.python.org/packages/source/e/eazysvn/eazysvn-$v.tar.gz</span>
<span class="prompt">></span>   <span class="typing">tar xvzf eazysvn-$v.tar.gz</span>
<span class="prompt">></span> <span class="typing">done</span>
</pre></blockquote>

<p>Exported all of my git tags:</p>

<blockquote><pre>
<span class="prompt">$</span> <span class="typing">mkdir -p /tmp/verify/git</span>
<span class="prompt">$</span> <span class="typing">cd /tmp/conv/eazysvn-git</span>
<span class="prompt">$</span> <span class="typing">for v in 1.6.0 1.6.1 1.7.0 1.8.0 1.9.0 1.10.0 1.11.0 1.12.0 1.12.1; do</span>
<span class="prompt">></span>   <span class="typing">git archive --format=tar --prefix eazysvn-$v/ v$v | gzip \</span>
<span class="prompt">></span>       <span class="typing">> /tmp/verify/git/eazysvn-$v-git.tar.gz</span>
<span class="prompt">></span> <span class="typing">done</span>
<span class="prompt">$</span> <span class="typing">cd /tmp/verify/git</span>
<span class="prompt">$</span> <span class="typing">for v in 1.6.0 1.6.1 1.7.0 1.8.0 1.9.0 1.10.0 1.11.0 1.12.0 1.12.1; do</span>
<span class="prompt">></span>   <span class="typing">tar xvzf eazysvn-$v-git.tar.gz</span>
<span class="prompt">></span> <span class="typing">done</span>
</pre></blockquote>

<p>And compared them:</p>

<blockquote><pre>
<span class="prompt">$</span> <span class="typing">diff -ur /tmp/verify /tmp/verify/git</span>
</pre></blockquote>

<p>I expected the only output to be messages like "Only in dir1: setup.cfg",
for things like 'eazysvn.egg-info' or 'PKG-INFO' that exist only in the PyPI
tarballs.  Unfortunately, this is where the actual differences I mentioned in
part one showed up: README.txt differed in every tree from release 1.9.0
onwards.</p>
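<p>The export, unpack and diff steps above can be folded into one check per release.  A sketch; the function name is made up, it assumes each unpacked sdist sits in <tt>PYPI_DIR/eazysvn-VERSION</tt> and each tag is named <tt>vVERSION</tt>, and the excluded names (<tt>PKG-INFO</tt>, <tt>*.egg-info</tt>, <tt>setup.cfg</tt>) are a guess at which files only the sdists contain.  It relies on <tt>git -C</tt> and on the <tt>-x</tt> (exclude) option of GNU diff:</p>

<blockquote><pre>
```shell
# Hypothetical helper: compare one PyPI release with the matching git tag.
# Usage: compare_release REPO PYPI_DIR VERSION
compare_release() {
    repo=$1 pypi_dir=$2 v=$3
    tmp=$(mktemp -d) || return 1
    # Unpack the tagged tree under the same top-level directory as the sdist.
    git -C "$repo" archive --format=tar --prefix="eazysvn-$v/" "v$v" |
        tar -xf - -C "$tmp"
    # Ignore files that (we assume) only setuptools puts into the sdist.
    if diff -r -q -x PKG-INFO -x '*.egg-info' -x setup.cfg \
            "$pypi_dir/eazysvn-$v" "$tmp/eazysvn-$v" > /dev/null; then
        echo "$v OK"
    else
        echo "$v DIFFERS"
    fi
    rm -rf "$tmp"
}
```
</pre></blockquote>

<p>For example, <tt>for v in 1.6.0 1.6.1 1.7.0; do compare_release /tmp/conv/eazysvn-git /tmp/verify $v; done</tt> prints one line per release.  "DIFFERS" also shows up if a directory is missing, and <tt>diff -ur</tt> is still the way to see what actually changed.</p>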

<h4>Part five: rectification</h4>

<p>I needed to rewrite history again, but I didn't want to use git
filter-branch this time (I didn't want to manually tag all the releases
again).  I tried fast-export:</p>

<blockquote><pre>
<span class="prompt">$</span> <span class="typing">cd /tmp/conv/eazysvn-git</span>
<span class="prompt">$</span> <span class="typing">git fast-export --all > ../EXPORT.txt</span>
<span class="prompt">$</span> <span class="typing">vim ../EXPORT.txt</span>
search and replace 'repo*/*subdirs*' with 'repo*/trunk/*subdirs*'
fix up the byte count on the 'data' line above each changed blob (add 6, the length of 'trunk/', per replacement)
<span class="prompt">$</span> <span class="typing">mkdir /tmp/conv/eazysvn-git2</span>
<span class="prompt">$</span> <span class="typing">cd /tmp/conv/eazysvn-git2</span>
<span class="prompt">$</span> <span class="typing">git init</span>
<span class="prompt">$</span> <span class="typing">git fast-import &lt; ../EXPORT.txt</span>
succeeds, leaves no working tree; git status shows all files as deleted
<span class="prompt">$</span> <span class="typing">git checkout</span>
restore working tree
<span class="prompt">$</span> <span class="typing">gitk --all</span>
</pre></blockquote>

<p>Everything looked about right.</p>
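<p>The size fix-up is needed because <tt>git fast-import</tt> reads a blob as a <tt>data &lt;count&gt;</tt> line followed by exactly that many bytes.  Each inserted <tt>trunk/</tt> adds 6 bytes, so a blob that contained the old path twice would need 12.  A hypothetical helper to double-check the numbers for one blob (the name is made up):</p>

<blockquote><pre>
```shell
# Hypothetical helper: apply the README fix to one blob's content (stdin)
# and print "OLD_SIZE NEW_SIZE" in bytes, to check the number that goes on
# the "data" line in front of that blob in the fast-export stream.
blob_sizes_after_fix() {
    tmp=$(mktemp -d) || return 1
    cat > "$tmp/old"
    # The same substitution as the manual search and replace in vim.
    sed 's|repo\*/\*subdirs\*|repo*/trunk/*subdirs*|g' "$tmp/old" > "$tmp/new"
    old_size=$(wc -c "$tmp/old" | awk '{print $1}')
    new_size=$(wc -c "$tmp/new" | awk '{print $1}')
    echo "$old_size $new_size"
    rm -rf "$tmp"
}
```
</pre></blockquote>

<p>For example, <tt>git show v1.9.0:README.txt | blob_sizes_after_fix</tt>, run in the repository from part three, prints the current size of that README and the size it should have after the edit.</p>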


<h4>Part six: final verification</h4>

<p>But was it actually right?</p>

<blockquote><pre>
<span class="prompt">$</span> <span class="typing">cd /tmp/verify/</span>
<span class="prompt">$</span> <span class="typing">mkdir pypi</span>
<span class="prompt">$</span> <span class="typing">mv eazysvn-*/ pypi/</span>
<span class="prompt">$</span> <span class="typing">rm -rf git/*</span>
<span class="prompt">$</span> <span class="typing">cd /tmp/conv/eazysvn-git2</span>
<span class="prompt">$</span> <span class="typing">for v in 1.6.0 1.6.1 1.7.0 1.8.0 1.9.0 1.10.0 1.11.0 1.12.0 1.12.1; do</span>
<span class="prompt">></span>   <span class="typing">git archive --format=tar --prefix eazysvn-$v/ v$v \</span>
<span class="prompt">></span>       <span class="typing">| (cd /tmp/verify/git &amp;&amp; tar -xf -)</span>
<span class="prompt">></span> <span class="typing">done</span>
<span class="prompt">$</span> <span class="typing">cd /tmp/verify</span>
<span class="prompt">$</span> <span class="typing">diff -ur pypi git</span>
</pre></blockquote>

<p>Yes!</p>

<h4>Part seven: uploading to GitHub</h4>

<p>It was all plain sailing from here:</p>

<blockquote><pre>
<span class="prompt">$</span> <span class="typing">cd /tmp/conv/eazysvn-git2</span>
<span class="prompt">$</span> <span class="typing">git remote add origin git@github.com:mgedmin/eazysvn.git</span>
<span class="prompt">$</span> <span class="typing">git push -u origin master</span>
<span class="prompt">$</span> <span class="typing">git push --tags</span>
</pre></blockquote>

<p>And then there were documentation updates (to point to GitHub instead of the
old Subversion repository), Makefile updates (<tt>make release</tt> makes sure
my sdist contains everything I have in my repository, because I've been bitten
by setuptools magic before), etc.</p>
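<p>The Makefile rule itself is not shown here.  As a hypothetical sketch of that kind of check, the shell function below lists files that git tracks but that are missing from a source tarball, assuming the tarball has a single top-level directory, as <tt>python setup.py sdist</tt> produces:</p>

<blockquote><pre>
```shell
# Hypothetical sketch of a "does the sdist contain everything?" check.
# Usage: sdist_missing_files TARBALL   (run at the top of the working tree)
sdist_missing_files() {
    tmp=$(mktemp -d) || return 1
    # Files tracked by git, in byte order so comm(1) can compare them.
    git ls-files | LC_ALL=C sort > "$tmp/repo"
    # Files in the tarball, minus the leading "name-version/" directory.
    tar -tzf "$1" | sed 's|^[^/]*/||' | LC_ALL=C sort > "$tmp/sdist"
    # Print the names that appear only in the git list.
    LC_ALL=C comm -23 "$tmp/repo" "$tmp/sdist"
    rm -rf "$tmp"
}
```
</pre></blockquote>

<p>Run it from the top of the working tree, e.g. <tt>sdist_missing_files dist/eazysvn-1.12.2.tar.gz</tt>; no output means nothing is missing.</p>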

<p>I also released <a href="http://pypi.python.org/pypi/eazysvn">eazysvn
  1.12.2</a> to PyPI to test my Makefile changes, and because there were
unreleased changes that should've been released a long time ago.</p>

<p>So that's it.  Only took me three hours from the point where I found a
Mercurial repository in my inbox.</p>
</description>
  </item>
   </channel>
</rss>
