SVN to GIT repository migration

July 28th, 2011 at 03:18pm dark

Last week we at NetGroup decided that it was time to move our SVN repositories to GIT. Although I already use git-svn to interface with Subversion, a server-side move was long overdue.

Here follows a step-by-step account of the migration, loosely based on this article, which I found to be the most useful resource on the Web among those suggested by Google. I publish the code here to help you migrate as well 😉

Preparation

First of all, I listed all the active SVN authors in the repository. From an updated SVN working copy, I created an authors.4git file:

svn log --xml | grep author | sort -u | sed 's:.*>\(.*\)<.*:\1 = :' > authors.4git
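To see what this pipeline produces without touching a real repository, here is a minimal sketch that runs the same grep/sort/sed chain on a hand-written sample of svn log --xml output. The function name extract_authors and the sample usernames are made up for illustration. The grep pattern is tightened to '<author>', a deliberate deviation from the command above, so that log messages containing the word "author" are not picked up.

```shell
# Hypothetical helper: reads `svn log --xml` output on stdin and prints
# one "username = " line per distinct author, ready to be filled in.
extract_authors() {
    grep '<author>' | sort -u | sed 's:.*>\(.*\)<.*:\1 = :'
}

# Hand-written sample input (not taken from a real repository).
printf '%s\n' \
    '<logentry revision="2">' \
    '<author>bob</author>' \
    '<msg>second commit</msg>' \
    '</logentry>' \
    '<logentry revision="1">' \
    '<author>alice</author>' \
    '<msg>first commit</msg>' \
    '</logentry>' | extract_authors
```

On the sample above this prints one line for alice and one for bob, each ending in " = ".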

I filled the output file with actual username information, using the following syntax:

username = Name Surname <mail.address@example.com>
username2 = Name2 Surname2 <mail.address2@example.com>
(no author) = no author <(no author)>

The last line is important for those commits that lacked an author (from an even older CVS2SVN migration).
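Before cloning, it can be worth checking that no entry was left half-filled. The following is only a sketch: the helper name unfilled_authors and the regular expression are my assumptions about what a "complete" line looks like (username, equals sign, name, address in angle brackets), not something git-svn itself provides.

```shell
# Hypothetical helper: print the lines of an authors file that do not
# yet look like "username = Name Surname <address>".
unfilled_authors() {
    # grep -v exits non-zero when every line is complete; ignore that.
    grep -v -E '^.+ = .+ <.+>$' "$1" || true
}
```

Running unfilled_authors authors.4git should print nothing once every author has been filled in.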

Fetch and convert branches and tags

I started to fetch the SVN repo using git-svn:

git svn clone ssh://path/to/old/repo/ --authors-file=authors.4git -s svn2git_tmpdir
cd svn2git_tmpdir

Some sources suggest using --no-metadata when cloning, but I wanted to preserve SVN commit information in the GIT log: you’ll see later why.

Then I converted SVN tags to actual GIT tags:

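# Turn every git-svn remote branch under tags/ into a real GIT tag
for t in $(git branch -r | grep 'tags/' | sed 's_tags/__') ; do
	# tags/$t^ is the parent of the tip commit of the tags/$t remote branch
	git tag "$t" "tags/$t^"
	# the remote branch is no longer needed once the tag exists
	git branch -d -r "tags/$t"
done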
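Since this loop deletes refs, a dry run can be reassuring. Here is a minimal sketch that applies the same grep and sed to a list of remote branch names and only prints the commands that would be executed. The function name plan_tag_conversion is hypothetical, and it assumes tag names without whitespace.

```shell
# Hypothetical dry-run helper: reads `git branch -r` output on stdin and
# prints the tag/branch commands instead of running them.
plan_tag_conversion() {
    grep 'tags/' | sed 's_tags/__' | while read -r t; do
        echo "git tag $t tags/$t^"
        echo "git branch -d -r tags/$t"
    done
}
```

In the cloned repository it could be used as: git branch -r | plan_tag_conversion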

I removed most of the leftovers from git-svn:

git branch -d -r trunk
git config --remove-section svn-remote.svn
rm -rf .git/svn .git/{logs/,}refs/remotes/svn/

Finally, I converted the SVN branches, which git-svn had created as remote GIT branches, into actual local GIT branches:

git config remote.origin.url .
git config --add remote.origin.fetch +refs/remotes/*:refs/heads/*
git fetch
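The refspec used here maps every ref under refs/remotes/ to a ref with the same name under refs/heads/. As a rough illustration, the mapping can be previewed as a plain text transformation. The function name preview_branch_mapping and the branch names in the usage example are made up.

```shell
# Hypothetical preview helper: reads full ref names on stdin and prints
# where the refspec +refs/remotes/*:refs/heads/* would copy each of them.
# Refs outside refs/remotes/ are ignored.
preview_branch_mapping() {
    sed -n 's#^refs/remotes/\(.*\)$#refs/remotes/\1 -> refs/heads/\1#p'
}
```

It can be fed with: git for-each-ref --format='%(refname)' refs/remotes | preview_branch_mapping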

Additional tweaks: history and commit messages rewriting

Since the migration is a one-time effort, before completion I wanted to fix some longstanding issues in our repo.

The first one was the removal of some useless commits: a couple of times someone managed to commit loads of useless files, forcing someone else to revert that commit. So, I found the SHA identifier of the oldest of those commits and launched (note the tilde sign at the end):

git rebase -i ${OLDESTBADCOMMIT}~

GIT fired up my $EDITOR and I removed all the useless commits (both the adding and the reverting ones) by deleting the corresponding lines from the rebase todo list.
The history rewriting was successful. A couple of times I had to type:

git reset
git rebase --continue

because some commits became empty ones and I wanted to discard them. The git rebase had to be run on all the branches that included those useless commits (hint: use git branch --contains ${BADCOMMIT}).
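When the same commits have to be dropped on several branches, editing the todo list by hand gets tedious. The sketch below removes the "pick" lines of given commits from a todo file. The helper name drop_commits_from_todo is hypothetical, and it assumes that the SHA prefixes you pass are no longer than the abbreviated ones GIT writes in the todo list.

```shell
# Hypothetical helper: drop_commits_from_todo TODO_FILE SHA...
# Deletes the "pick <sha>" lines of the listed commits from a rebase
# todo file, leaving all the other lines untouched.
drop_commits_from_todo() {
    todo=$1
    shift
    for sha in "$@"; do
        # keep every line except the "pick" line of this commit
        grep -v "^pick $sha" "$todo" > "$todo.tmp" || true
        mv "$todo.tmp" "$todo"
    done
}
```

Wrapped in a small script, something like this could be plugged in through the GIT_SEQUENCE_EDITOR variable on GIT versions that support it, so that git rebase -i runs unattended.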

The second problem was preserving the SVN metadata inside the GIT commit logs. This was needed because some SVN commit messages referenced other SVN revision numbers, and I wanted to keep those references intact. In addition, the previous git rebase had mangled the GIT-committer property of the rewritten commits by setting me as the committer, although the author property was left intact (you can see the difference on an actual GIT repository using git log --format=fuller). Both issues can be solved with one command:

git filter-branch -d /tmp/tmpfs/gitrewrite \
	--msg-filter 'sed "/git-svn-id:/s;^.*/\([^/]*\)@\([0-9]*\).*$;Autoconverted from SVN (revision:\2, branch:\1);"' \
	--env-filter 'export GIT_COMMITTER_NAME="${GIT_AUTHOR_NAME}"; export GIT_COMMITTER_EMAIL="${GIT_AUTHOR_EMAIL}"; export GIT_COMMITTER_DATE="${GIT_AUTHOR_DATE}";' \
	-- --all

The -d switch ran the operation (which is said to be I/O intensive) on a tmpfs mount. You can skip it if the history is short enough, or if speed is not a concern.
The first filter edited the git-svn-id lines into a prettier format; the second one restored the GIT-committer properties that I had lost previously.
Finally, the trailing --all applies those filters to all the branches.
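Because filter-branch rewrites the whole history, it is worth trying the message filter on a sample commit message first. Here is a minimal sketch that wraps the same sed expression in a function and feeds it a hand-written message. The function name rewrite_svn_id, the repository URL and the UUID are invented for the example.

```shell
# Hypothetical wrapper around the sed expression used as --msg-filter:
# reads a commit message on stdin and rewrites its git-svn-id line.
rewrite_svn_id() {
    sed '/git-svn-id:/s;^.*/\([^/]*\)@\([0-9]*\).*$;Autoconverted from SVN (revision:\2, branch:\1);'
}

# Hand-written sample commit message (URL and UUID are made up).
printf '%s\n' \
    'Fix login bug' \
    '' \
    'git-svn-id: svn://example.com/repo/trunk@1234 00000000-0000-0000-0000-000000000000' \
    | rewrite_svn_id
```

Lines without git-svn-id pass through unchanged; the last line of the sample becomes "Autoconverted from SVN (revision:1234, branch:trunk)". The branch name is simply the last path component before the @ sign.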

Push to the new repo

Before pushing to the new repo, I let GIT collect the garbage by pruning unreferenced objects:

git gc --aggressive --prune=now

And, finally, I completed the migration by pushing everything to a bare, empty repository that I had already set up:

git push --all ssh://path/to/new/repo.git
git push --tags ssh://path/to/new/repo.git

To complete the operations, I removed the temporary directory and refetched everything, just to be sure:

cd ..
rm -rf svn2git_tmpdir
git clone ssh://path/to/new/repo.git local_repo_dir

And voilà, now you’re using GIT 🙂


1 Comment

  • 1. Jesús  |  September 15th, 2012 at 21:12:41

    Hi there! I just wanted to say that this post is awesome, it helped me a lot. Thanks for all the detailed explanations!

