Migrating Projects from SVN to Git
PLEASE NOTE: New projects should use gitlab.physics instead (please see the separate Article on that subject).
This Article assumes the reader is familiar with svn, and with using git on one's own system. Day-to-day use of git in general, and setting up and tweaking one's own gitlab.physics account and projects, are documented elsewhere. We also assume that the scm.physics project to be copied has no branches or tags, or that they're used only in the svn-approved manner. (If only the directory trunk in your project on scm.physics is non-empty, you aren't using branches or tags.) If in doubt, please contact itsupport (at) physics.
Please see also:
- Git for Beginners
- Gitlab Basics on gitlab.physics (I find this doesn't require you to log in)
- Migrating from svn on gitlab.physics
- Migrating from SVN to Git, preserving branches and tags (from which much of the following is shamelessly stolen)
- Git Projects with Multiple Contributors
- The collapsed sections below, with extremely preliminary suggestions for splitting metaprojects and for excluding files and directories.
Executive Summary
The basic difference between git and svn is that an svn checkout relies on an upstream svn server (eg scm.physics) for its history, and retains only a cached copy of the last checkout on your system; in contrast, every git clone is a complete single-project repository in its own right, with full history and all commit attributions, which can optionally use a git server (eg gitlab.physics) as a remote for collaborative and/or backup purposes.
The fundamental process of migrating a project is:
- Use git svn (the svn subcommand in the git suite) to clone the project from the svn server to a directory on your local system.
- Handle various differences between svn and git.
- Push the resulting git project from your local system to the git server.
Please note that this doesn't require privileged access to either server at any stage.
Creating a local git clone
The following will create a git clone in the directory base-directory/project-name. This must be entirely separate from any existing svn checkouts, unless you enjoy disentangling wet spaghetti.
Please resist the urge to prune the svn project first. The cloning process copies the entire svn history by default; removing large and/or redundant branches in the obvious manner only makes their tips inaccessible to svn, without removing the history. We'll get back to this in the collapsible section below about excluding svn branches.
- Set up a new empty project on gitlab.physics, ready to receive your files, as documented elsewhere. Resist the temptation to populate it.
- Ensure you and your collaborators have all committed any pending updates to the svn server. (It is possible to use git svn to help keep svn and git versions of a project in sync, but doing so is decidedly awkward, and has not been tested for compatibility with the suggestions in this Article.)
- Prepare a translation file to map your scm.physics identity, and those of any collaborators, to those for gitlab.physics, of the form:
yourname = Your Name <your.name@physics.ox.ac.uk>
collab1 = Collaborator One <first.collab@physics.ox.ac.uk>
collab2 = Collaborator Two <second.collab@physics.ox.ac.uk>
root = Your Name <your.name@physics.ox.ac.uk>
.... where you should replace yourname by your identifier on scm.physics, and collab1 and collab2 are likewise the scm identities of your collaborators. (The last line maps root, your secret collaborator in whose name gforge created your project, to yourself. It's your project, right?) Put this file, say authors-transform.txt, in base-directory.
This is the most tedious part of the exercise. Code to help you with this is believed to be in preparation, but in the meantime, saying the following may help to remind you of your collaborators' scm.physics identities:
cd svn-checkout-directory
svn log --xml | grep author | sort -u | \
    perl -pe 's/.*>(.*?)<.*/$1 = $1_streetname <$1_email>/' | \
    tee ../authors-transform.txt
.... (where I've had to use backslash-escaping to fold the second line, twice) will produce something of the form:
yourname = yourname_streetname <yourname_email>
collab1 = collab1_streetname <collab1_email>
collab2 = collab2_streetname <collab2_email>
root = root_streetname <root_email>
.... ready for you to manually modify to match reality, including assuming root's mantle. (Remember yourname etc are placeholders, not keywords.)
Please note:
- You can't miss anybody out of this list. Failure to mention any contributors will yield mysterious complaints later, when git svn finds it doesn't know to whom to attribute their commits.
- If you have commits from long-gone contributors, it's possible to give them dummy identities instead (eg old-contributor@example.org), or map them onto someone else (as we did with root above). It's up to you to decide whether (from git's point of view) to give your old contributors unusable aliases, or to pretend they didn't exist; either way, git svn (and gitlab thereafter) will happily go along with the changes. Your humble Author's instinct is to use originators' then-current names for attributing their contributions, even if they are no longer contributing members, to keep history straight.
- We are setting up identities in git at this point; giving people access to the repo in gitlab is a matter of setting up login identities on the gitlab server, which is an entirely different matter. Were it otherwise, Linus Torvalds would have a semi-infinite number of people who could directly log into his personal git repository, one for every contributor to the Linux kernel. (Apologies for belabouring this distinction, but it's not always immediately obvious, and has been known to bend the brain.)
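Since git svn stops with a complaint at the first svn author it can't find in the translation file, it can be worth checking the file for gaps before cloning. The following is only a minimal sketch, not part of the tested procedure: the function name missing_authors is hypothetical, and it assumes one "id = Name <email>" mapping per line, as in the example above.

```shell
#!/bin/sh
# missing_authors FILE: read svn author ids on stdin (one per line) and
# print every id that has no well-formed "id = Name <email>" line in FILE.
# No output means every author on stdin is covered.
# (missing_authors is a hypothetical helper name, not a git or svn command.)
missing_authors() {
    # Collect the ids on the left-hand side of each well-formed mapping.
    known=$(sed -n 's/^\([^ =][^ =]*\) = .* <.*>$/\1/p' "$1")
    while IFS= read -r id; do
        [ -n "$id" ] || continue
        # -x -F: the whole line must equal the id, with no regex surprises.
        printf '%s\n' "$known" | grep -q -x -F -- "$id" || printf '%s\n' "$id"
    done
}

# Possible usage, from inside an svn checkout:
#   svn log --xml | sed -n 's/.*<author>\(.*\)<\/author>.*/\1/p' | sort -u |
#       missing_authors ../authors-transform.txt
```

Any id it prints still needs a line in authors-transform.txt. It does not check that the names and addresses on the right-hand side are sensible; that part remains manual.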
- Use git svn to create a fresh clone of your existing project, complete with full multi-collaborator history, in a new subdirectory on your own system.
cd base-directory
git svn clone \
    -A authors-transform.txt \
    --username yourname \
    --stdlayout \
    --prefix="svn/" \
    https://scm.physics.ox.ac.uk/svn/yourproject \
    project-name \
    |& tee -a yourproject.clone.out
Please note:
- This assumes that your svn project has the standard trunk, tags and branches layout. Please see the collapsed section below for advice about nonstandard layouts before continuing, to understand what this means, and what to do about it if necessary.
- If your svn project has some unwontedly large files in it which should never have been committed, please see the collapsed sections below about excluding files and directories or (more drastically) excluding branches before continuing.
- The second line has again had to be folded; the trailing "tee into a file" pipework is part of it, and uses the |& syntax, which is accepted by both bash (version 4 or later) and tcsh (for completeness, bash also accepts the Bourne-shell standard form "2>&1", but tcsh is less forgiving).
- Wait a wee while: first git svn clone thinks for a bit, then it patiently checks out every svn commit on all branches and tags, cross-compares them to reconstruct the commit and branching history, lodges the corresponding sets of references in the git clone it's creating in the directory project-name, and tells you what it's doing in loving detail. This is agreed to be nerve-racking. For extra confidence, page through yourproject.clone.out afterwards, eg using less, to check quite what happened.
Warning: The above procedure has been seen (twice) to stress OS X enough for Terminal to lock up. If this happens, close Terminal and restart it. We can check at the end whether the clone was fully successful; if not, we can always start again, possibly instructing git svn clone to ignore huge binary blobs, as suggested below in the collapsible section on excluding files.
More information as we discover it.
Nonstandard layouts
Not every svn project has the standard trunk/branches/tags layout. I shall use as my example a metaproject, encompassing multiple related-but-independent subprojects, where each subproject appears alongside trunk in the svn filestructure, like this:
branches
proj1
proj1a
proj2
proj-libs
proj-other
tags
trunk
You can produce a single git project from this by omitting --stdlayout in the git svn clone incantation above. However, git is intended for single projects (eg the Linux kernel); metaprojects of the above form tend to be unwieldy, and may be too large for git or gitlab to cope with unaided. (On the one occasion I've tried that, macOS went to lunch for some hours after the clone command had nominally finished, as git decided to garbage-collect the result in the background.)
Happily, many such projects tend to have the "embarrassingly parallel" nature, where all that's in common is a set of libraries or build tools, or possibly not even that. If, after discussion with your svn collaborators (and with Central Physics), you instead agree to split up the metaproject, you can proceed by replacing the --stdlayout argument to git svn clone with:
--trunk=proj1 \
for the first subproject,
--trunk=proj1a \
for the second, and so on. (Don't forget to add the trailing backslash if you've had to fold the line, and don't forget to use a different filename for the logfile each time. As you'll be doing all this repeatedly, now's the time to brush up on your shell-scripting skills.) Upshot: you will end up with N independent git projects, which of course works best if the projects really are effectively freestanding. You'll then, for example, be able to check out proj2 and proj-libs side by side, and use the latter to support the former.
Excluding files and directories
One can exclude files from an svn project by setting the svn:ignore property. Sadly, this has to be done for each directory separately, and it's all too easy to forget to set it, then inadvertently add some inappropriate (and possibly massive) files or subdirectories to the project.
If you've fallen victim to this, you can prevent this propagating into the git repo you're about to produce by adding one or more options of the following form. Please note: the files will be transferred from the svn server, then discarded at the receiving end. (Please see the next collapsed section for a better way to exclude entire branches, which doesn't have this drawback, if that's more appropriate.)
--ignore-paths='sdf$' \
--ignore-paths='/tempdir/' \
.... including the trailing backslashes, if you've had to fold the line.
- The first excludes all files whose entire pathname ends in 'sdf'; the equivalent for excluding DLLs is left as an exercise. Don't forget the trailing dollar sign: this is a regular expression, not a shell wildcard.
- The second excludes a directory named tempdir anywhere in the project. You may in practice need to be more specific by adding the directory's parent, and perhaps grandparent. If in doubt, check the git repo afterwards, and if necessary adjust the regular expression and repeat.
- .... and don't forget to add a trailing backslash to each clause if you've had to fold the line.
While you're thinking about it, to make sure such accidents don't happen in your git repo, now's the time to make a note to add one or more corresponding lines to .gitignore (now using shell wildcarding syntax), which in this case would be:
*.sdf
tempdir/
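Because --ignore-paths takes regular expressions rather than shell wildcards, it can save a long re-clone to try the patterns on a list of pathnames first. What follows is a rough sketch only: preview_ignore is a hypothetical helper, the sample pathnames are invented, and grep -E is assumed to be a close enough stand-in for the Perl regular expressions that git svn uses (true for simple patterns like the two above, not necessarily for fancier ones).

```shell
#!/bin/sh
# preview_ignore REGEX...: read pathnames on stdin (one per line) and print
# those matching at least one of the given regular expressions, ie the
# paths one would expect the corresponding --ignore-paths options to skip.
# Assumption: grep -E approximates the Perl regexes used by git svn.
preview_ignore() {
    pattern=""
    for re in "$@"; do
        # Join the expressions as (re1)|(re2)|...
        pattern="${pattern:+$pattern|}($re)"
    done
    grep -E -- "$pattern"
}

# Example with made-up pathnames:
#   printf '%s\n' trunk/src/main.c trunk/data/run1.sdf \
#       trunk/tempdir/scratch.txt trunk/docs/sdf-format.txt |
#       preview_ignore 'sdf$' '/tempdir/'
# lists trunk/data/run1.sdf and trunk/tempdir/scratch.txt only.
```

A real list of pathnames could come from something like svn list -R on the project URL. The final arbiter is still the content of the git repo after cloning.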
Excluding svn branches
It is entirely possible you may wish to not copy certain large and/or redundant branches from svn. Attempts to exclude them by pathnames (as in the previous section) will work, but the undesired files will be transferred from the svn server anyway, then discarded at the receiving end. Upshot: git svn clone produces a bunch of empty commits, and takes just as long to do so as if they weren't empty.
The answer to this comes in two parts: transfer only the trunk, then add the desired set of branches to the git configuration, and fetch their contents.
- Replace the --stdlayout line in the git svn clone incantation by:
--trunk=trunk \
--tags=tags \
This omits all branches from the cloning, but copies over the tags (feel free to drop the second line if you're not bothered about those).
- Now do the cloning, as amended, then add the desired branches to the configuration, by something of the form:
cd project-name
git config --local --list
git config svn-remote.svn.branches \
    'branches/{b1,b2,b3}:refs/remotes/svn/*'
git config --local --list
.... where I've had to fold the third line at the backslash. Adjust the comma-separated list of branch names to taste.
- If you're satisfied with the result, then:
git svn fetch
.... to bring over the desired branches. This may take a wee while, but should be quicker than cloning the entire project.
- When the music stops (and with the above caveats):
cd project-name
.... and give the new git clone of your project a quick lookover: the directory contents should be the same as there would be in a fresh checkout of the trunk of your latest svn commit (apart from the respective sacred directories .svn/ and .git/). You can take the opportunity to clean it up a little, eg by adding or updating a README file in the base directory, and perhaps a .gitignore file as mentioned above (but see below about producing a .gitignore file from svn:ignore properties). We suggest, though, that you resist the temptation to do major surgery, at least until you've pushed your project to the git server.
To see what git thinks is there, say:
git branch -a
This should show you something of the form:
* master
  remotes/svn/my_first_branch
  remotes/svn/tags/release_1.0
  remotes/svn/trunk
In this, master is a local branch corresponding to trunk which has been checked out for you, and branches and tags are named as themselves; the distinction between svn branches and tags should be clear. Note, by the way, the beneficial effect of the --prefix="svn/" argument: we would otherwise see:
* master
  remotes/origin/my_first_branch
  remotes/origin/tags/release_1.0
  remotes/origin/trunk
.... which is likely to lead to confusion when the time comes for origin to refer to your git server.
On Tags and Branches
To convert a remote svn branch to a local git branch, use something of the form:
git branch my_first_branch refs/remotes/svn/my_first_branch
(Don't forget svn/trunk is already checked out as master.) It's possible to automate this if you've a huge list, but it's instructive to do it at least once by hand for practice, after which you can script the process. Beware of spaces which may have crept into branch names (these will appear as "%20", but you should use underscores instead). As with all chainsaws, watch your wrists.
Tags are a matter of personal taste or of house policy: a tag represents a snapshot in time, while a branch can (eg) accumulate production bug fixes. The main practical difference is that attempts to check out a tag yield a 'detached HEAD', and any work therein won't be saved without further effort.
If you wish to convert an svn tag to a git tag, use the following. Remember that the svn tag itself exists only in the svn remote information, and won't be saved to the git server later unless you convert it.
git tag -a -m"Converted svn tag" \
    release_1.0 \
    remotes/svn/tags/release_1.0
If you've been naughty and committed updates to the svn tag, or you wish to reserve that option under git, convert it to a git branch, thus:
git branch release/version_1.0 \
    remotes/svn/tags/release_1.0
(The suggested subdirectory-like branch naming will help segregate production-release branches from development ones, but means slightly more typing. It's your call.)
We suggest you use both tags and branches at the same time. The following creates a git branch from an svn tag, and tags the git branch by the svn tag's original name; both can then be pushed to the git server.
git branch release/version_1.0 \
    remotes/svn/tags/release_1.0
git tag -a -m"Converted svn tag" \
    release_1.0 \
    release/version_1.0
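For a project with a long list of branches and tags, the by-hand conversions in this section can be generated rather than typed. The following is a sketch under stated assumptions, not a tested part of the procedure: convert_commands is a hypothetical helper, it assumes the svn/ prefix chosen at clone time, and it only prints commands for review rather than running anything.

```shell
#!/bin/sh
# convert_commands: read short remote ref names on stdin, as printed by
#   git for-each-ref --format='%(refname:short)' refs/remotes/svn
# and print, for review, one git command per svn branch or tag.
# Assumptions: refs carry the svn/ prefix set by --prefix="svn/";
# "%20" in a name becomes an underscore; trunk is skipped because it is
# already checked out as master. Nothing is executed here.
convert_commands() {
    while IFS= read -r ref; do
        name=$(printf '%s' "${ref#svn/}" | sed 's/%20/_/g')
        case "$name" in
            trunk)
                ;;
            tags/*)
                printf 'git tag -a -m "Converted svn tag" %s refs/remotes/%s\n' \
                    "${name#tags/}" "$ref"
                ;;
            *)
                printf 'git branch %s refs/remotes/%s\n' "$name" "$ref"
                ;;
        esac
    done
}

# Possible usage, inside the new git clone:
#   git for-each-ref --format='%(refname:short)' refs/remotes/svn |
#       convert_commands
```

Read the generated commands before feeding them to a shell. Any svn tag you'd rather keep as a git branch (as suggested above) needs its line edited by hand, since the choice of branch name is a matter of house policy.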
Handling svn:ignore
Nontrivial svn projects will use the svn:ignore property to selectively ignore certain files, and git uses the text file .gitignore for the same purpose. However, git svn clone will by default ignore all svn properties other than svn:executable.
To view the svn:ignore property of the base directory, say (while sat in it):
git svn propget svn:ignore
.... which leads to the following exceedingly quick-and-dirty way of creating .gitignore for yourself:
git svn propget svn:ignore | tee -a .gitignore
You may well need to repeat this exercise in subdirectories. Don't forget .gitignore is an ordinary file, so you'll need to add it into git in the normal way, both initially and whenever it's modified, so that the next git commit saves it:
git add .gitignore
This also pertains only to the current branch. If you've more than one, you'll need to repeat the whole thing from the top for every branch you're interested in.
HOT NEWS: the command:
git svn show-ignore
.... in the root of the git repo will show you the svn:ignore properties in all directories (in a git-compatible form), and:
git svn create-ignore
.... will add a matching .gitignore into each directory. These need to be checked in as a git commit; and this needs to be repeated for each branch of interest in which it applies.
Pushing to the git server
At this point, you've got a local git clone corresponding to your svn repository. Now to populate the project on gitlab.physics which you created at the start of play:
- Tell your local git clone to use your new project space on gitlab.physics as a remote, associating the (traditional) name origin with it:
git remote add origin \
    git@gitlab.physics.ox.ac.uk:project-name.git
You can copy-paste the full URL from your project's homepage on gitlab.
- Now push the master branch of your project to the server:
git push -u origin master
The argument -u (in full: --set-upstream) tells git to record the branch master on the remote origin as the default upstream for the local branch master, for subsequent git push invocations on the same branch.
If you've got multiple branches and git tags, you can instead push them all at once (erm, twice):
git push --all origin
git push --tags origin
If you don't wish to push git tags to the server, drop the second command. The defaults for git are that tags are transferred when copying from the server (by git fetch etc), but are only sent to the server by explicit request. This can be made to make sense.
Quality Assurance
Paranoia Dept: If OS X has lunched the Terminal process under which you did git svn clone, or if (like your humble Author) you have an abiding distrust of magic inscrutable software, you'd be wise to compare and contrast the svn and git versions of your project, as they appear on the respective servers.
- Create two entirely separate and completely fresh copies somewhere else (eg base-directory/trial):
cd somewhere_else
svn co --username yourname \
    https://scm.physics.ox.ac.uk/svn/yourproject \
    checkout_svn_dir
git clone \
    git@gitlab.physics.ox.ac.uk:project-name.git \
    checkout_git_dir
- Do a recursive diff of the two copies, eg by:
diff -ur checkout_svn_dir/ checkout_git_dir/ | less
This will show you which files are present in one but not the other, and which files are present in both but differ in content.
- The true paranoid (you know who you are) would loop back to the top and repeat this, once for each active branch of interest.
If you can account for all the differences, congratulations: the git server's copy of your project matches the svn server's sufficiently closely. If you can't, that's either a bug or a lack of clarity on our part, or just possibly a misunderstanding on yours; in any case, please send full details to us at itsupport@physics, and we'll investigate.
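The plain recursive diff will also wade through the sacred directories .svn/ and .git/, which differ by construction and tend to bury the differences that matter. A small wrapper can leave them out; this is merely a sketch, with compare_checkouts a hypothetical name, relying on the -x (exclude) option that both GNU and BSD diff provide.

```shell
#!/bin/sh
# compare_checkouts SVN_DIR GIT_DIR: recursive unified diff of two working
# copies, skipping each system's own metadata directory so that only
# differences in project files are reported.
# Exit status is diff's own: 0 if the trees match, 1 if they differ.
compare_checkouts() {
    diff -ur -x .svn -x .git "$1" "$2"
}

# Possible usage:
#   compare_checkouts checkout_svn_dir/ checkout_git_dir/ | less
```

If the svn checkout turns out to contain the trunk, branches and tags directories rather than the project files themselves, point the first argument at checkout_svn_dir/trunk instead. Any .gitignore files added on the git side will, quite properly, show up as present in one copy only.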
Postlude
At this point, you'll have a single-user Project on gitlab.physics which may happen to have others' commits in it. For how to proceed, please see:
Once you (and any collaborators you may have) are happily using gitlab.physics, feel free to clone the project from gitlab.physics into another fresh directory on your system (to leave any remaining links to scm.physics behind), and work in that. Continued use of git svn (to help keep svn and git versions of a project in sync) is possible, but this Article is already too long.
Categories: Development | HOWTO | agile | git | project management | svn