Friday, March 09, 2007

Using Darcs (and EclipseDarcs)

Radek Grzanka has asked, on the EclipseDarcs mailing list, whether I could share some experiences of using Darcs in my programming team (and I'm happy to do so :-). I introduced Darcs to my team about a year and a half ago, when we started some projects that involved developers working in locations other than our office in Karlsruhe, Germany (including external developers located in the US), and that also involved people who, like me at the time, were travelling frequently.

For me, this is one important plus for Darcs: it gives you fine control over what goes into the repository while you are offline (on the train, on the plane, or in a hotel room without internet access). With source code control systems that require access to a server, you can only check in while connected, and therefore less frequently. As a result, you tend either to check in changes that are too coarse-grained, or to spend much more time at check-in figuring out which source changes belonged to which activity. With Darcs, on the other hand, you can code, record, code, record and so on, and the next time you go online, you can still push selectively.

The teams we work in are relatively small (up to six people, with some fluctuation), so I can't say much about how Darcs works with really big teams. But for us it has worked out really well. Darcs is pretty easy to learn and definitely fun to work with, and that was consistently the feedback I got from all developers. (It also has the occasional bug or sluggishness, especially on Windows, but in my experience that falls far short of the big-time sucking that CVS can be.)

Darcs is also helpful with branching, which we do a fair amount of. However, in this area one still needs some tools or techniques around it, mostly for figuring out the differences between branches. This is one of the areas where EclipseDarcs comes in very handy, namely its repository browsing perspective, which makes it convenient to inspect patches. (For exactly this reason, I have also started work on a repo diff view for the Darcs repository browsing perspective.) The tool most needed there is a viewer that shows patch dependencies. For instance, imagine you have one release repository and one development repository. The latter holds a bunch of patches that are not yet in the release repository. Someone has the job of 'releasing' some patches from the development repo to the release repo, so it is important for that person to see which patches depend on which others, in order to figure out the minimal set that must be pushed.
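To make the "what minimally must be pushed" question concrete, here is a minimal sketch of the underlying computation: the patches you want to release, plus every patch they transitively depend on that is not already in the release repo. It assumes the dependency information is already available as a plain dictionary; in practice Darcs itself knows these dependencies, and the function and patch names below are purely hypothetical illustrations, not part of Darcs or EclipseDarcs.

```python
from typing import Dict, Iterable, List, Set


def minimal_push_set(
    wanted: Iterable[str],
    depends_on: Dict[str, List[str]],
    already_released: Set[str],
) -> Set[str]:
    """Return the wanted patches plus all unreleased patches they transitively depend on.

    Hypothetical helper: 'depends_on' maps a patch name to the names of the
    patches it directly depends on; 'already_released' holds the patches that
    are already present in the release repo.
    """
    to_push: Set[str] = set()
    # Depth-first walk over the dependency graph, starting from the wanted patches.
    stack = [p for p in wanted if p not in already_released]
    while stack:
        patch = stack.pop()
        if patch in to_push:
            continue
        to_push.add(patch)
        for dep in depends_on.get(patch, []):
            # Patches already in the release repo need not be pushed again.
            if dep not in already_released and dep not in to_push:
                stack.append(dep)
    return to_push
```

For example, with `{"fix-login": ["refactor-auth"], "refactor-auth": ["add-util"], "add-util": [], "new-report": []}` as the dependency map and `add-util` already released, releasing `fix-login` requires pushing `fix-login` and `refactor-auth`, while `new-report` can go out on its own. A viewer for patch dependencies essentially makes this walk visible instead of leaving it to the person doing the release.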

We typically have one central repository on an intranet server that we call the integration repo, or development repo. Developers work in local repos on their machines and record changes there. All our work items, bug fixes and tasks are tracked in an issue tracker. So typically, when somebody has finished an implementation task, that task is associated with an issue tracker ID and recorded accordingly (with the issue tracker ID stated first thing in the patch comment). After that, the patch is usually either pushed to the integration repo, or sent to another developer who reviews it and then pushes it.

On the integration repo, we have automated processes that build, collect compiler warnings, run CheckStyle, run JUnit, run PDE JUnit and so on. That way, we get early feedback about integration problems (that is, problems that go beyond simple conflicts in the source text). This sort of setup is relatively common today (much more so than when we introduced it, some five years ago), and is often called 'continuous integration'. In my view, it is one of the most valuable practices in software development; however, it requires a certain insight and co-operation from the team. (As with many of these so-called 'agile' practices, the people in your team need to understand and value them; otherwise they won't work much better than any other methodology.) In our practice, we try to push and pull as often as possible (several times a day). This keeps the entire team up to date with recent changes to the code base, and reduces the chance of integration problems.
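The convention of stating the issue tracker ID first thing in the patch comment is easy to check mechanically, for instance in a script run before pushing to the integration repo. The sketch below assumes a hypothetical ID format (an upper-case project key, a dash and a number, such as `ED-42`); your issue tracker's format will likely differ, and the function names are illustrative only. Extracting the list of patch names from Darcs is left out on purpose.

```python
import re
from typing import List, Optional

# Assumed, hypothetical issue ID format: upper-case project key, dash, number (e.g. "ED-42").
ISSUE_ID = re.compile(r"^\s*([A-Z][A-Z0-9]*-\d+)\b")


def issue_id_of(patch_name: str) -> Optional[str]:
    """Return the issue tracker ID the patch name starts with, or None if there is none."""
    match = ISSUE_ID.match(patch_name)
    return match.group(1) if match else None


def patches_without_issue_id(patch_names: List[str]) -> List[str]:
    """List the patch names that do not follow the 'issue ID first' convention."""
    return [name for name in patch_names if issue_id_of(name) is None]
```

A reviewer (or a small pre-push script) could then refuse to push whenever `patches_without_issue_id(...)` returns a non-empty list. Keeping the check to a simple prefix match mirrors the convention itself: the ID comes first, so tools and people can find the related issue at a glance.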

In addition to the integration repo, we have branches that hold the released versions. In some cases, they are just subsets of the corresponding integration repo (the latter holds more patches because some functionality is not yet mature enough to be released to the public, but is fine to integrate with the rest of the code base). In other cases, they are real functionality branches (branches of the main stem that contain some different functionality, e.g. for deploying a customized version). Naturally, we try to keep the second variant rare, but in my view it is definitely more pleasant to do this with Darcs than with other systems I've worked with.

And of course we have also learned to love techniques such as those described on the Darcs wiki as spontaneous branches and preparation branches.

We use EclipseDarcs together with the Darcs command line (the latter for more complicated recordings that span multiple files, and for pulling and pushing). EclipseDarcs is helpful for inspecting repo contents (as I remarked above), for adding files and quickly recording single-file changes, and of course for showing which files in the repo have changed since the last recording. Well, and we all know that there is plenty of room for improvement :-) Of course, it would be nice to have all Darcs functionality integrated into Eclipse some day. We're only half-way there. But while that is a little inconvenient, I don't think it should be a reason not to use Darcs. For one thing, the Darcs command line interface is very convenient, and apart from that, I'm convinced that a professional programmer should be able to use a command line tool.

The source trees we work on are almost always Eclipse plug-in projects. This is a bit nasty when the contents of a repository must be imported into the workspace for the first time, especially because Eclipse detects the contents of the pristine tree, under _darcs/pristine/, and offers them as projects to import. One must be very careful not to select these; the best technique is to temporarily move the entire _darcs folder out of the repository, import all the projects at once, and then move the _darcs folder back. On the other hand, this setup naturally leads to a well-structured and fine-grained organization of the source tree, which helps when recording, and probably also helps Darcs performance. For our purposes, the performance is OK (but again, that depends heavily on the layout and size of your source trees).
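The "move _darcs out, import, move it back" trick is easy to get wrong by hand (forgetting to move the folder back leaves you with a repo Darcs no longer recognizes). Here is a minimal sketch that wraps it in a Python context manager, so the folder is restored even if something goes wrong in between. The helper name `darcs_folder_moved_aside` is hypothetical; the Eclipse import itself still happens manually while the block is active.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def darcs_folder_moved_aside(repo_root) -> Iterator[Path]:
    """Temporarily move <repo_root>/_darcs out of the repository, restoring it afterwards.

    Hypothetical helper: while the with-block runs, Eclipse's project import
    no longer sees the projects under _darcs/pristine/.
    """
    darcs_dir = Path(repo_root) / "_darcs"
    # Park the folder in a fresh temporary directory outside the repository.
    parking = Path(tempfile.mkdtemp(prefix="darcs-parked-"))
    parked = parking / "_darcs"
    shutil.move(str(darcs_dir), str(parked))
    try:
        yield parked
    finally:
        # Always put _darcs back, even if the import step raised an exception.
        shutil.move(str(parked), str(darcs_dir))
        parking.rmdir()
```

Usage could be as simple as wrapping a prompt such as `input("Import the projects in Eclipse now, then press Enter...")` in `with darcs_folder_moved_aside("/path/to/repo"):`, so that the _darcs folder only comes back once the import is done.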

Perhaps one or two more miscellaneous notes: one should checkpoint repos from time to time, since it really helps to reduce the time it takes to get the repo contents initially. (It is also definitely faster to use darcs get instead of darcs init && darcs pull --all, but that's probably old news :-)
