SCM for Beginners (*OBSOLETE*)

PLEASE NOTE:

GForge is no more, and scm.physics is being replaced by a native git facility. New projects should instead use git locally, and gitlab.physics for sharing with others.

If you have any existing projects on scm.physics, run (don't walk) to:

.... while walking is still an option.

PLEASE NOTE: The remainder of this article may be cruelly truncated or disappear altogether without notice, as and when its useful portions have been merged into Git for beginners, the successor in interest to this Article.

Remainder of original article:

Using scm.physics as a source-code repository is not quite trivial, and in places is decidedly awkward. This Article is intended to give an overview of how to get the best out of your scm.physics project repository, and to help you avoid some of the more egregious landmines. If this gets too long for comfort, I may reduce it to a quick-start set of instructions, and devolve the bulk into the planned freestanding companion piece.

This Article concentrates on using Subversion at the command line. Contributions on GUI clients, and on git (notably on using git svn to use scm.physics as a "Subversion backend"), are welcomed: please contact us at the usual e-mail address, or add comments via the link at the foot of this Page.

Throughout, I will be using the mythical example project majoc-fu, with the "unique UNIX name" majoc_fu. I will pretend that I began developing it in the directory /Data/carter/src/majoc_fu on my local system; the eventual Subversion "working directory", within which I will be doing updates, will be under (or be) /Data/carter/svn/majoc_fu. There's also the equally mythical book project book-fu, which has a different and arguably superior local-directory layout. I shall also pretend that my login name on scm.physics is mjcarter, to ensure it's visibly distinct from other usernames I may use herein and elsewhere.

Please see also:

Preparation

  • Obtain a fresh repository, as detailed in Using scm for Source Control.

  • Select your editor of choice, by setting an environment variable:

    setenv EDITOR nedit (tcsh)
    export EDITOR=emacs (bash)

    The command appropriate for your shell can be used as-is at the command prompt, and can also conveniently be put in .tcshrc (or .bashrc, respectively) for use on later occasions, in the usual way.
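
If the same .bashrc is shared between several systems, a slightly more defensive variant may be worth having. What follows is only a sketch for bash-like shells: pick_editor is a hypothetical helper name of my own, and the list of candidate editors is purely illustrative.

```shell
# pick_editor: print the first of the named commands that exists on this system.
# (Hypothetical helper; the name and the candidate list are illustrative only.)
pick_editor() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Respect an EDITOR that is already set; otherwise choose one that is installed.
if [ -z "${EDITOR:-}" ]; then
    if chosen=$(pick_editor emacs nedit vi); then
        EDITOR=$chosen
        export EDITOR
    fi
fi
```

This leaves any existing setting alone, so a one-off "export EDITOR=..." at the prompt still wins.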

Populating a fresh repository

Scenario: your project has been approved. You have an empty repository on scm.physics, and a set of files on your local system.

CAVEAT: svn import considered harmful

Many of the quick-start guides out there suggest using "svn import", which will create and populate a repository in one go. But your repository has already been created for you by GForge, and your collaborator (or you, from another system) may already have waxed keen and committed some files into it. Using "svn import" bypasses most if not all of the safety mechanisms which Subversion provides to protect your files, both locally and in the repository, in this sort of situation. Besides, you'd still need to check out a working copy anyway before you can work on it locally.

The method shown below puts this last step first, after which you can work with said safety mechanisms, rather than trying to second-guess the state of the repository. It's closely related to the description of programming as "debugging an empty file".

DECISION: Subversion or git?

Which source-control mechanism you use locally is up to you. Of the two, git is the more flexible, but perhaps therefore more difficult to learn; while the linear view of history imposed by svn can be both a restriction and a valuable discipline. At the server end, however, the git interfaces for scm.physics are less well developed: in particular, access using http:// or https:// is not available at the time of writing.

We recommend that you leave your repository on scm.physics set to serving Subversion, and use http:// or https:// to interact with it. This leaves you and/or your collaborators with the option of using git locally, and interacting with scm.physics via "git svn" commands: keyphrases to look for (in the literature and online) include "Subversion backend".

The best and least confusing way I've found to populate an empty repository is to check it out into a fresh local directory, copy the files into place, add them into the set known to svn, then commit the result. Let's take these four steps one at a time, using the majoc-fu example. Please see also Different local layout below, which puts everything into a single "howto" list.

Check out empty repository

mkdir -p /Data/carter/svn
cd /Data/carter/svn
svn checkout \
   --username mjcarter \
   http://scm.physics.ox.ac.uk/svn/majoc_fu

.... where I've folded the checkout line at the backslashes to fit in this margin. This will create the directories:

/Data/carter/svn/majoc_fu
/Data/carter/svn/majoc_fu/trunk
/Data/carter/svn/majoc_fu/tags
/Data/carter/svn/majoc_fu/branches

.... on the local system, each of which contains a directory .svn whose contents are sacred to Subversion (translation: touch at your peril), but which should be otherwise empty. To check that that's so, and that my eager collaborator hasn't beaten me to the first commit, I would then do:

cd /Data/carter/svn/majoc_fu/trunk
ls -lFa
svn status -u

This should show an all-but-empty directory, plus:

Status against revision: 1

.... as GForge counts creating the empty repository as the first revision.
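
The "is it really still empty?" check can be scripted if you're so minded. This is a minimal sketch, assuming that "svn status -u" on an untouched checkout prints nothing but the "Status against revision" line shown above; is_fresh_checkout is a hypothetical helper name, and it reads the command's output on its standard input.

```shell
# is_fresh_checkout: succeed only if the "svn status -u" output on stdin
# consists of nothing but "Status against revision: 1".
# (Hypothetical helper; assumes the output format shown in this Article.)
is_fresh_checkout() {
    awk '
        /^Status against revision:/ { rev = $NF; next }   # remember the revision
        NF > 0                      { extra++ }            # any other line: not fresh
        END { exit !(rev == 1 && extra == 0) }
    '
}
```

Typical use would be:

    svn status -u | is_fresh_checkout && echo "Nobody has beaten me to it"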

Tip of the Day:

If you don't (yet) care about anything but the trunk, you don't need to check out anything but the trunk. Thus, at the initial checking-out stage, instead do this (to continue the example):

mkdir -p /Data/carter/svn
cd /Data/carter/svn
svn checkout \
   --username mjcarter \
   http://scm.physics.ox.ac.uk/svn/majoc_fu/trunk \
   majoc_fu
cd /Data/carter/svn/majoc_fu
ls -lFa
svn status -u

Note the difference in the checkout URL, and the extra argument to svn to name the desired destination directory. This will create:

/Data/carter/svn/majoc_fu

.... with the same contents (at this stage, only the special subdirectory .svn and contents) which would have been in /Data/carter/svn/majoc_fu/trunk. If at some later stage you wish to start working with branches, you're at liberty to check out one branch, or the full tree, somewhere else on your system, and work there instead (or as well), or to wax adventurous and play with "svn switch".

Copy files into place

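cd /Data/carter/svn/majoc_fu/trunk
cp -pr /Data/carter/src/majoc_fu/. .
svn status

(If you checked out only the trunk, as in the Tip above, the working directory is /Data/carter/svn/majoc_fu instead.) Note the trailing "/." on the source directory: it copies the contents of the directory, dot files included, rather than creating a nested majoc_fu subdirectory. The "svn status" should show all files and subdirectories, each with ? ("unknown") against it. Take the time to pick through them, looking for and deleting (copies of) files you don't wish to put under version control. (It's possible to persuade svn to ignore them instead, but let's keep things simple for now.)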
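
For a large tree, picking through the list by eye gets tedious. As a minimal sketch (list_unversioned is a hypothetical helper name, and it assumes that the path follows the status columns as in the usual "svn status" layout), the following prints only the paths flagged ?, ready to be piped through grep or less:

```shell
# list_unversioned: read "svn status" output on stdin and print the paths
# flagged "?" (unknown to Subversion), one per line.
# (Hypothetical helper; paths that begin with a space would be mangled.)
list_unversioned() {
    sed -n 's/^?[[:space:]]*//p'
}
```

For example, to spot editor backup files before they get added:

    svn status | list_unversioned | grep '~$'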

Add, and commit

Double-check that there are no files you wish to ignore, then say:

svn add *

This tells Subversion that we wish to track said files and directories. After this,

svn status

.... will show all files and subdirectories, this time recursively, each with an A ("added") against it. Check this, and the output of "ls -la", to see if there are any "dot files" which the wildcard above has missed; if so, use "svn add .dot-filename" to add each of them by hand.
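
The hunt for missed dot files can be done by the shell rather than by eye. A minimal sketch, under the assumption that only the top level of the working copy needs checking; list_dot_files is a hypothetical helper name:

```shell
# list_dot_files: print the "dot files" in the current directory which the
# shell wildcard "*" does not match, skipping Subversion's own .svn.
# (Hypothetical helper name; looks at the current directory only.)
list_dot_files() {
    for f in .[!.]* ..?*; do
        # Skip patterns which matched nothing, and the sacred .svn directory.
        if [ ! -e "$f" ] || [ "$f" = ".svn" ]; then
            continue
        fi
        printf '%s\n' "$f"
    done
}
```

Each name it prints is a candidate for "svn add" by hand, as described above.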

svn commit

This starts up the editor set by the environment variable $EDITOR, with some useful and interesting contents. Add a comment describing the commit in the blank line at the top, save the result, and exit from the editor.

The source should then wing its way into scm.physics. If the repository has been updated while we weren't looking, we'd instead get a complaint to that effect .... which is where the rest of this Article comes in.

Serving suggestions

Different local layout:

Scenario: You have a conditioned reflex that (eg) books are in one place and code is in another. Let's take populating an empty repository from soup to nuts, using the mythical book-fu project, with a different local layout (and ignoring branches) .... come to think, this makes more sense, and in particular doesn't involve tangling with the esoteric mysteries of symbolic links.

cd /Data/carter/books
mv book_fu book_fu.orig
svn checkout \\
   --username mjcarter \\
   http://scm.physics.ox.ac.uk/svn/book_fu/trunk \\
   book_fu
cd book_fu
svn status -u
cp -pr ../book_fu.orig/* .
(weed out anything you don't want to save ....)
svn status -u
svn add *
svn status -u
svn commit
svn status -u

Now let's tidy up after ourselves, nondestructively, and do some last paranoia checks before we move on:

cd ..
mkdir tarballs
tar cvzf tarballs/book_fu.orig.tar.gz book_fu.orig/
rm -rf book_fu.orig
ls -lFd book_fu* tarballs/book_fu*
cd book_fu
svn status -u

Variants where the tarball is put into Subversion and checked in are left as an exercise for the true paranoid.
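
For the merely cautious, the tidy-up can be wrapped so that the original is deleted only once the tarball has been read back successfully. This is a sketch rather than a recommendation: archive_then_remove is a hypothetical helper name, and it assumes it is run from the parent directory with a plain directory name such as book_fu.orig.

```shell
# archive_then_remove: tar up a directory, check that the tarball can be read
# back, and only then delete the original. (Hypothetical helper name.)
archive_then_remove() {
    src=$1                      # directory to archive, eg book_fu.orig
    dest=$2                     # directory to hold the tarball, eg tarballs
    tarball=$dest/$src.tar.gz

    mkdir -p "$dest" || return 1
    tar czf "$tarball" "$src" || return 1
    # Listing the archive makes tar and gzip read it from end to end.
    tar tzf "$tarball" >/dev/null || return 1
    rm -rf "$src"
}
```

In the book-fu example this would be invoked as:

    archive_then_remove book_fu.orig tarballs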

Ignoring files:

It's not often that people wish to add editor backup files such as chapter2.tex~ or chapter2.tex.bak to their repositories, as the repositories will be doing the backups instead. To persuade Subversion to ignore such files, edit the svn:ignore property of the directory, thus:

svn propedit svn:ignore .

This opens your editor of choice. Add the lines:

*~
*.bak

.... and save. Thereafter, they won't clutter up "svn status", and potentially hide more important pending changes.

This is an especial boon for those of us handling LaTeX source, for whom *.dvi, *.bbl and so forth would get underfoot as much as *.o would for someone coding in C. You can use "svn rm" to delete any such files which have got as far as the repository, or "svn revert" if they've been added but not yet committed, and you wish to still keep them around.

The equivalent with git is to add lines to the file .gitignore, one per filename or wildcard. This is an ordinary file, and you should use git add in the usual way to put it under version control.

Tip of the Day: Since I sometimes put directories under version control using git locally and svn remotely, I've got into the habit of using .gitignore as the definitive exclusions list. Since it's an ordinary file, this permits me to use it with (eg) the "-X" option of BSD or GNU tar, or (Advanced Players Only territory) the "-F" flag of "svn propset", often in Makefiles. Why type something more than once if you can automate it?

Caveat: ensure both .svn and .git are in .gitignore, to tell each version-control setup to ignore the other's sacred directories.
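
Keeping .gitignore as the single definitive list is easier if additions to it are idempotent. A minimal sketch, assuming you are sitting in the top directory of the working copy; add_ignore is a hypothetical helper name:

```shell
# add_ignore: append a pattern to .gitignore unless it is already listed.
# (Hypothetical helper name; operates on the current directory.)
add_ignore() {
    touch .gitignore
    # -x: match whole lines only; -F: treat the pattern as a fixed string.
    if ! grep -qxF -e "$1" .gitignore; then
        printf '%s\n' "$1" >> .gitignore
    fi
}
```

Typical use, followed by the "-F" form of "svn propset" mentioned above:

    add_ignore '*~'
    add_ignore '*.bak'
    add_ignore .svn
    add_ignore .git
    svn propset svn:ignore -F .gitignore .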

Handling dot files and exclusions:

Scenario: the wildcard "*" misses files, or matches too many. The first can happen if you need to include "dot-files" such as .gitignore, and can result in your commit being incomplete; the second if you've told Subversion to ignore certain files (see above), after which svn add * refuses to proceed. (It's also entirely possible for both to happen at the same time.)

In either case, replace:

svn add *

.... with:

svn add --force .

(Note the trailing dot.) This forces Subversion to traverse the entire tree from the named directory (".") leafwards, looking for things to add, but obeying any "ignore" directives in place. As always, check the results with:

svn status

The equivalent under git appears to be:

git add -A .

where the -A flag means "all": again, everything eligible from the current directory (".") leafwards, and not forbidden by any .gitignore files, is added. Since git uses the UNIX default of silence to signal success, saying

git status

afterwards is even more important. (For the intrepid explorer, and in case of accidents, "git reset" seems to be the functional equivalent of the usages of "svn revert" in these circumstances. Lesson of the Day: As with all chainsaws, watch your wrists.)

Files updated locally

Scenario: You have updated the files within your Subversion working directory and/or added new ones. This is mostly a superset of the process suggested for doing the initial population. (I'll assume from now on that you're sitting in your working-copy directory.)

svn status

This will show a ? against the names of new files, and M against any which have been modified. Note that this is local information only: add the "-u" option, pronounced "--show-updates" in full, to also fetch the latest revision information from the remote repository. (The reason for local-by-default is to permit you to work without needing constant network connectivity, eg when you're using your laptop on the beach. Handling ingress-of-sand and grumpy-spouse errors is beyond the scope of this Article.)

svn add file

Add file to the set of files known to Subversion. (This can use a wildcard. It doesn't usually matter if the wildcard matches files which are already known to svn: it mutters a bit, but only refuses to proceed if the wildcard matches something it's been told to ignore, like editor backup files. Please see also Handling dot files and exclusions above.) Thereafter,

svn status

will show an A against file, plus an M if it's modified further before the next commit.

svn diff | less

Show a "unified context difference" between what was last committed and the current working set, piped through less to paginate it: lines preceded by - and + are respectively old and new versions. Please note that any new files will be shown in their entirety (think "difference against the empty file"), and those which aren't known to Subversion won't show up here.

svn commit

This again puts you into the editor named in $EDITOR, ready to add a commit message. Do so, save the result, and exit: Subversion will then offer up the local modifications to the remote repository.
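
Before committing a large batch of changes, a head-count of what "svn status" reports can be a useful sanity check. A minimal sketch, keyed on the first status column only; summarise_status is a hypothetical helper name:

```shell
# summarise_status: read "svn status" output on stdin and print a count of
# entries for each first-column status code (?, A, M, ...).
# (Hypothetical helper name; lines with a blank first column are skipped.)
summarise_status() {
    awk '{ code = substr($0, 1, 1); if (code != " " && code != "") n[code]++ }
         END { for (c in n) print c, n[c] }' | LC_ALL=C sort
}
```

Run it as "svn status | summarise_status": any ? entries left at this stage are files which the commit will silently leave behind.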

Files updated remotely

Scenario: Your eager collaborator has committed some changes to your shared repository. This is where things can get (ahem) interesting if you've also modified your local working copy in the meantime.

Please note: This section is under construction, and in particular doesn't yet deal properly with conflicts or branches. I also admit that it's comparatively new territory for me.

No conflicts

Let's assume for the time being that your current working copy is an ancestor of your collaborator's, and hasn't been modified since you last successfully did a commit to the repository. Most of this section will deal with verifying that that's so.

svn status -u

Show the latest revision number at the server end. If there are after all any new files, these are listed, each with * against it; any out-of-date files will have both * and the revision number of your version thereof.

svn status -v | sort -n | tail

Show the ten latest locally-updated files, together with their revision numbers. It's a quick-and-dirty way to check the number of the revision you last committed.

svn log -r HEAD:BASE

Show the logs between the latest update at your end (BASE) and the server end (HEAD), youngest first; for oldest first, use BASE:HEAD. You'll probably find it's long enough that you need to pipe this through less.

svn diff -r HEAD | less

Show the differences between your local files and the latest revision at the server end, piped through less to paginate it. Take especial notice of deletions (with a - against every line: think "how to empty this file"), to minimise surprises later.

svn diff -c 42

Show the differences made by the specified single commit, at the server end. Annoyingly, this doesn't believe HEAD or BASE can be numbers, and doesn't permit ranges.

svn diff -r 42:HEAD --summarize

Give a summary (similar to "svn status") of the differences between the specified commits, which makes it easier to spot those pernicious deletions. Annoyingly, attempts to use BASE yield complaints; happily, HEAD is allowed. (This can be made to make sense.)

svn update

If the two of you agree that your collaborator's updates are the Way Ahead, this will finally fetch them into your local working copy. You can both heave a sigh of relief, and carry on working.
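
To see at a glance which files "svn update" would touch, the * markers can be filtered out of the "svn status -u" output. This is a sketch under an assumption: that the * marker sits somewhere within the first nine columns of each line, which is my reading of the output layout and may vary between Subversion versions. The name list_out_of_date is a hypothetical one.

```shell
# list_out_of_date: read "svn status -u" output on stdin and print the lines
# carrying the "*" out-of-date marker.
# (Hypothetical helper; assumes the marker is within the first nine columns.)
list_out_of_date() {
    awk '/^Status against revision:/ { next }
         index(substr($0, 1, 9), "*") > 0'
}
```

Run it as "svn status -u | list_out_of_date": no output means the server has nothing newer than your working copy.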

With conflicts

Scenario: you discover that you and your eager collaborator have both produced significant and significantly-different updates, your collaborator has beaten you to the commit, and both of you have been working on the trunk. (I have seen this happen with two collaborators sitting in the same room while they were discussing the project in question. No names, no pack drill.)

History has forked, but neither your repository nor your collaborator is yet aware that it's happened. This is not a technical problem, but one of communication. No technical solution will solve this unaided: you would at best put your collaborator in the position you're in now, which implicitly invites retaliation in kind.

The very first thing to do is for the two of you to set up a peace conference, to discuss what's happened, and why, and what the two of you should do about it. Possible courses of action include:

  • You agree that one or both of you will work in a separate branch for a while, and reconcile your code differences later. This is what branches are for, and will help simplify said reconciliation process.

  • You agree that one of you should give way to the other.

  • You agree to merge your respective workings, and which of you will do the merging.

I'll go into the details of the creation and resolving of branches, and the dark arts of "svn diff", "svn merge" et al, in the planned freestanding companion piece. In the meantime, please see:

Avoiding conflicts

Conflicts are inevitable, but you (and your eager collaborator) can reduce their impact.

  • Use "svn commit" and "svn update" early and often, and urge your collaborator to do likewise. This'll help by detecting potential conflicts and other divergences early, and by making it easier to "cherry-pick" afterwards (whether you're both working in the trunk or in separate branches).

  • If you know you're about to go off-net for a while (eg if you'll be coding on your laptop on the beach), then spawn-off a branch, and work within that for the duration. This'll make it easier to save your changes when you get back, then to reconcile any differences with whatever's happened on the trunk in the meantime.

  • Consider using git locally, even if you use svn at the repository end (see "git svn" for how to use scm.physics as a "Subversion backend"). This permits you to do partial commits of your work while you're still on the beach, and little luxuries such as being able to say:

    svn add .gitignore
    svn propset svn:ignore -F .gitignore .
    tar cvjf ../tarballs/foo-project.tar.bz2 -X .gitignore .

    .... which will keep your svn:ignore property visible and versioned, and keep your tarballs clutter-free.

Word Differences

Scenario: You are working on a file which is pure readable text (eg README.txt), or on a file in which readable text is embedded in a programming language (eg chapter2.tex, or comments in a C program). You, or your eager collaborator, have added a single word to a paragraph, then "refilled" the paragraph to better fit the margins. Upshot: a single-word change can yield a whole-body change of most of the paragraph.

This happens most often with human-readable text, but even the non-comment parts of full programming languages like C are not immune. I believe one thread in one of the Linux kernel development trees is specifically devoted to handling trivial layout changes, eg those pernicious differences in whitespace characters invisible to the human eye, but which are all too visible to diff.

This is where the traditional line-based diff lets us down, but word-based alternatives now exist. For example, wdiff displays a word-based difference between two named files, one of which may be the standard input (alias "-").

To check your own updates on a single file:

svn cat chapter2.tex | \
    wdiff - chapter2.tex | \
    less

To check your updates on all files, using the "post-processing" flag "-d" (much less typing, at that):

svn diff | wdiff -d | less

To check your collaborator's updates (HEAD is the latest commit to the server; BASE is the revision your current working set was last updated against, which is roughly your latest commit):

svn diff -r BASE:HEAD | wdiff -d | less

You may need to read the following results backwards: it's what would be required to transform your collaborator's last commit to your current working set.

svn diff -r HEAD | wdiff -d | less

Please note: wdiff is being rolled out to Centrally-maintained Astrophysics MacOS X systems as we speak, as part of the standard MacPorts build. I've yet to check whether it's installed by default on Centrally-maintained Linux systems, but it's certainly available under Ubuntu.

For those using git, the "--word-diff" argument is available as a built-in from at least version 1.7.2 (v1.7.10 or later is distributed as part of OS X from Lion onward; annoyingly, the O'Reilly book on git is largely predicated on version 1.6). Thus:

git diff --word-diff -- chapter2.tex | less
git diff --word-diff | less

.... are roughly equivalent to:

git diff -- chapter2.tex | wdiff -d | less
git diff | wdiff -d | less

Try each, as they're not bit-for-bit equivalent: either can make a right mess if there are substantive differences, but they tend to make different right messes, and the truth lies somewhere between them. (Advanced Player note: wdiff tends to think in blocks, while git diff --word-diff is closer to being line-based, at least with Ubuntu's defaults.)

It is doubtful whether the corresponding (third-party?) extension will be added to Subversion; hence the above manual equivalents.
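
Where neither wdiff nor a recent git is to hand, the underlying idea can be approximated with standard tools: put one word on each line, then let the ordinary line-based diff loose on the result. This is a crude sketch of the principle, not a replacement for wdiff; words_diff is a hypothetical helper name.

```shell
# words_diff: compare two text files word by word, ignoring line breaks.
# (Hypothetical helper; a crude stand-in for wdiff, not a replacement.)
words_diff() {
    old_words=$(mktemp) || return 1
    new_words=$(mktemp) || return 1
    # One word per line, so that refilling a paragraph changes nothing.
    tr -s '[:space:]' '\n' < "$1" > "$old_words"
    tr -s '[:space:]' '\n' < "$2" > "$new_words"
    status=0
    diff "$old_words" "$new_words" || status=$?
    rm -f "$old_words" "$new_words"
    return $status
}
```

A paragraph which has gained one word and then been refilled should show up as a single added line, rather than as a rewrite of the whole paragraph. As with diff itself, the exit status is nonzero when the files differ.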

Commits Mailing List

Scenario: you have collaborators on your project, and wish to be e-notified when they make commits, without having to go and look for yourself. The Good News is that GForge provides mechanisms for this .... the Bad News is that they've traditionally been decidedly nontrivial to set up.

This set of instructions is known to be sufficient. All assistance humbly accepted in determining quite what is now necessary.

When your repository is set up for you, you will have received e-mail telling you which URL to use for communicating with it, and the password for the related Commits e-mail list. For the project with the UNIX name majoc_fu, the commits list would be called majoc_fu-commits.

  • In your web browser, go to the project's homepage, using your scm.physics login and password.

  • Bookmark this, as it'll be your first point of contact.

  • Click on >> Admin on the left, near the top, then on Mailman Admin which will appear below it in response.

  • On the line containing (in my extended example) majoc_fu-commits, click on the link Administrate.

  • Enter your list-administrator password (from the e-mail), and click on Let me in...

  • You will then be at the general-administration page for the mailing list. Bookmark this too.

    The next time you use this second bookmark, you'll be prompted for the same password, so don't throw away the e-mail.

  • On this page, you should find your e-mail address pre-entered into the List Administrator field. Copy-paste it into the List Moderator field, to increase the number of things about the list that you'll be e-notified about.

  • At the foot of the same page, change the maximum size of e-mail messages from 40KBytes to 4095KBytes: large commits are exactly those you wish to be notified about.

  • Change the preferred hostname entry from scm-mail.physics.ox.ac.uk to physics.ox.ac.uk, to keep faith with Central Physics's mail setup.

  • Optionally click on the Yes button against get notices of subscribes and unsubscribes.

  • Click on Submit your changes at the foot of this page, to (ahem) submit your changes.

This sets up the list basics, but you are not yet a member.

  • Click on Membership Management at the top, then Mass Subscription.

  • Enter your real-world e-mail address in the first box.

    Repeat for anyone else who has commit rights on your repository.

  • Submit your changes.

At this point, you will be a member of this list under your outside-world e-mail address. However, any commits you make will be made under your identity on scm.physics, which, written out in full as an e-mail address, is different: if you skip the next step, messages about your commits will be held for moderation. You're at liberty to manually tend to pending moderator requests, but this gets old rapidly, and I for one am not that patient.

  • Click on Privacy options, then on Sender filters.

  • In the list of non-member addresses whose postings should be automatically accepted, enter your scm.physics identity as an e-mail address; similarly for anyone else who has commit rights. This will be of the form:

    mjcarter(at)scm.physics.ox.ac.uk

    .... translated in the obvious way, and for appropriate values of mjcarter.

  • Submit your changes.

  • Click on Logout to disengage.

  • Check that the "welcome-to" message has arrived in your inbox (keep it, but don't believe what it says about sending mail to it).

You are now subscribed to the list, but nothing is going into it.

  • In your web browser, go back to the project's homepage you bookmarked earlier.

  • In the navigation pane on the left, click on >> SVN, then on Admin below it.

  • Click in the tickbox for Send Commit Emails to Mailing List, then on the Submit button.

Now's the time to do a trivial commit to verify that you're e-informed that it's happened. Setting up Exchange rules (or local equivalent) to pre-route such messages into a Subversion-specific mailbox would be a Good Thing: the details thereof are outside the scope of this Article, but the fact that all commit messages will include a From: field of the form "scm_identity(at)scm.physics.ox.ac.uk" should be hint enough.
