October 13, 2006
svn:externals: just say no

When we set up our Subversion repository, we did it wrong. And by we, I mean me. ;-)

Our application is modular - you can buy one bit, or several, depending on what you want. There's a core module, containing anything that all the other modules can't do without - authentication and authorisation, audit logs, that kind of stuff. Then we have a simple contact management module, a technical accounting module, and so on. (Mostly still to be written.)

Each is a separate web application - a separate WAR file. They communicate via SOAP. (I know, SOAP in this day and age. Still, it was what was required...) And each has its own Java package - uk.co.trisystems.morph.core, uk.co.trisystems.morph.cm, uk.co.trisystems.morph.ta, and so on. But they have a fair amount of code in common, which lives in uk.co.trisystems.morph.common. And that's where we (all right, all right, OK, I) went wrong.

It's not all my fault - I blame VSS. We'd been using it for ages, and in VSS, you do things like this using shared folders. This was our first major project to dump VSS in favour of Subversion, and the closest thing we could find to VSS's shared folders was Subversion's externals. Brain damaged by VSS as I was, that's what we used.

In hindsight, a mistake.

Subversion's externals are really intended not for code shared within your own project, but for genuinely external code - stuff you want to pull in from other projects and so on. In a way, Subversion treats externals as second-class citizens.

Updating your working copy is not a problem - the usual svn up works fine. But checking your changes back in, that's not so easy. If you've made changes to common code, svn commit leaves them behind. Merging, branching, making patches, just the same - externals are ignored. Fair enough, given what externals are intended for.

It's not so bad for me - bash helps out a lot here. To check all my changes in I do a:

svn st | awk '/^\s?[MAD]/ { print $NF } ' | xargs svn commit -m"Blah blah blah"

(Thanks to Andy for the awk lesson!)

Branching is still a bit of a pain - branching both common and the other apps, and munging the svn:externals property, is a fiddle, but I can live with it. But for my Windows-victim colleagues, it's much worse. Padawan Dan has taken to Cygwin like a duck to water, but for the rest of the team there's Tortoise.

Tortoise is fine for simple stuff, like updating and looking for conflicts. (Though its conflict resolution doohickey is horrid.) But whenever people try to use it for slightly more complex jobs such as branching... things just go wrong. And we can never work out what happened. I'm not blaming Tortoise as such - I'm sure that it works fine. But GUIs are just not good for this kind of stuff. You need to be able to work from examples, and with a GUI, it's just too easy to leave some checkbox or other unchecked - and you'll never be able to work out what you did wrong after the fact.

So, what should we have done? Well, I think I know, but this is all a bit suppositional. ;-) What I think we should have done is not to have bothered trying to keep one copy of our common code. Instead, we should have had a copy of the common code in each application, and merged any changes to that common code to the other applications.

Sound right?

Of course, for all this merging, you need good merging tools. But that's another post...

Posted to Software development by Simon Brunning at October 13, 2006 01:07 PM
Comments

I think that your final choice may not be the best one. You should only have a single copy of the common source, period. Now, if for some reason that I may not understand you can't use a single repo for the common source tree, then I'd love to hear more.

Posted by: Anthony Eden on October 13, 2006 01:45 PM

Developers should be comfortable with the command line version of their source control system.

I don't even waste my time helping someone who's having CVS problems and then opens up WinCVS (got to be cruel to be kind, I say).

Posted by: Darren on October 13, 2006 01:48 PM

About common code, we used to do that system where each project had its own copy (on a different branch too) and it didn't work well.
Merging in of fixes proved to be a nightmare, or didn't get done.

We now do it so that each project has the common code as a jar, and if you want to make changes to the common code, you check it out, make your change, commit it, make a new release, then each project can optionally pick up the new release or stick with what they already have.

Posted by: Darren on October 13, 2006 01:54 PM

See, I told you that awk was good. Glad to be of service.

You've probably already thought about this, but what about sticking everything in a common repository and just checking out the bits you want at build time?

It would probably require a little code in your build script, but then I presume you are using Python for that anyway ;-)

It would, however, make day to day development a lot simpler.

Posted by: Andy Todd on October 13, 2006 01:55 PM

Darren - I agree. I've written extensive instructions on how to do branching, merging, making and applying patches and so on in our internal wiki. If people choose to ignore them and do it their own way, they can sort their own problems out AFAIC.

Anthony - At the moment, we do have only one copy - and it's making life difficult.

I do realise that duplicating the common code in each app is not ideal either - there's the risk of someone forgetting to merge changes around to all the apps. But that's how Subversion seems to be set up to work, and a script to run nightly and alert us to mismatches between the common code in all our apps would be a snap to write in Python.
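
A minimal sketch of what such a nightly check might look like, assuming each app keeps its copy of the common code in its own directory of a checked-out working copy. The function names and the directory layout are hypothetical, not something from our actual setup; the sketch hashes every file in each copy and reports the paths whose contents differ, or which are missing from some copy.

```python
import hashlib
from pathlib import Path


def tree_digests(root):
    """Map each file's path (relative to root) to a SHA-1 of its contents."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        # Skip Subversion's admin directories and anything that is not a file.
        if ".svn" in rel.parts or not path.is_file():
            continue
        digests[rel.as_posix()] = hashlib.sha1(path.read_bytes()).hexdigest()
    return digests


def find_mismatches(copies):
    """Return {relative_path: {app_name: digest_or_None}} for files that differ.

    `copies` maps an app name to the directory holding its copy of the
    common code. A digest of None means the file is missing from that copy.
    """
    per_app = {app: tree_digests(root) for app, root in copies.items()}
    all_paths = set().union(*per_app.values())
    mismatches = {}
    for rel in sorted(all_paths):
        seen = {app: digests.get(rel) for app, digests in per_app.items()}
        if len(set(seen.values())) > 1:
            mismatches[rel] = seen
    return mismatches
```

Calling find_mismatches with something like {"core": "core/src/common", "cm": "cm/src/common"} (paths made up for illustration) returns an empty dict when the copies agree; anything else could be mailed to the team by whatever runs the script each night.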

But if anyone has any advice, or better ideas, I'd be very happy to hear them!

Posted by: Simon on October 13, 2006 01:57 PM

Andy & Darren - interesting ideas. Our build script is in Ant, but I'm sure we could work something out...

Posted by: Simon on October 13, 2006 01:58 PM

I think the idea of placing common code into a jar file sounds extremely promising. I would also like to add that I completely agree that developers *should* become familiar with the CLI of their toolkit. On the other hand, I do get a little tired of people who treat developers who favour the GUI versions as though they were a bunch of plebs. I know this wasn’t expressed that strongly in this discussion, but it does seem to pop up quite a lot.

Posted by: Dan on October 13, 2006 02:18 PM

What's wrong with being a pleb?

Posted by: Elp on October 13, 2006 02:32 PM

Dan,
I agree - people are entitled to use whatever tool they like.

OTOH, so am I. I'm perfectly willing to help people, too, explaining how I do stuff - with the tools of *my* choice. If they choose to use different tools, they'll have to help themselves.

And as you've found yourself, the command line tools are far more powerful once you have got the hang of them, it's easy to follow instructions via cut 'n' paste (without the possibility of forgetting some obscure check box), and you can see what you've done once you've done it.

Posted by: Simon on October 13, 2006 02:43 PM

I should emphasise that the common code as a jar thing works really well if you properly *version* them.
Our script generates jars (and separate source jars) with the version number in the filename.
It then makes it very easy indeed to see which project is using which version.
We did do it for a while without explicit versioning and it became very hard very quickly tracking which project used which version of the code.
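
To illustrate the point about version numbers in the filename, here is a rough sketch that reads the version back out of a jar's name and reports which project is using which release. The naming scheme (name-1.4.2.jar, with name-1.4.2-sources.jar for the source jar) and the function names are assumptions made for the example, not the actual scheme described above.

```python
import re
from pathlib import Path

# Assumed naming scheme: <name>-<dotted version>[-sources].jar
JAR_NAME = re.compile(
    r"^(?P<name>.+?)-(?P<version>\d+(?:\.\d+)*)(?P<sources>-sources)?\.jar$"
)


def parse_jar_name(filename):
    """Return (name, version_tuple, is_source_jar), or None if unversioned."""
    match = JAR_NAME.match(filename)
    if match is None:
        return None
    # A tuple of ints compares correctly: (1, 10, 0) > (1, 9, 0).
    version = tuple(int(part) for part in match.group("version").split("."))
    return match.group("name"), version, match.group("sources") is not None


def versions_in_use(lib_dirs, library):
    """Map each project to the version of `library` found in its lib directory."""
    in_use = {}
    for project, lib_dir in lib_dirs.items():
        for jar in Path(lib_dir).glob("*.jar"):
            parsed = parse_jar_name(jar.name)
            # Ignore other libraries, unversioned jars and source jars.
            if parsed and parsed[0] == library and not parsed[2]:
                in_use[project] = parsed[1]
    return in_use
```

An unversioned jar simply does not parse, which is roughly the tracking problem described above: without the number in the name there is nothing to report.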

Doing all this does slow down changes to the common code as you have to do a separate check out and release etc.
We like this and see it as a feature, you may find it too cumbersome.

Posted by: Darren on October 13, 2006 03:05 PM

My solution is somewhat different. We put all the projects (including the common code) into a single repository. The downside is that every working copy is much bigger... you have to store the code for all of the projects rather than just your project and the common code. An advantage is that you can (if desired) run automated tests of ALL the code, thus verifying that the change you just made to some common code didn't break one of the other apps.

However, this probably wouldn't work as well if we weren't already doing something else complicated, which is that instead of doing our development on the trunk and using branches for stability around a given release, we do our development on branches and merge them in when we want to do releases. That itself is a *big pain* but we needed to do it for other reasons (namely that our customers want us to work on 6-8 things at once and not decide until later what to release when).

Posted by: Michael Chermside on October 13, 2006 03:51 PM

Simon: My point was don't have multiple copies of the same code lying around. I think that other projects using the resulting versioned JAR is a good way to do it (as someone else already suggested). That's how I've done it in the past, and Darren offered a good followup comment on how to do it to minimize pain (via version numbers in the name).

Posted by: Anthony Eden on October 13, 2006 09:17 PM

I'd go with the separate jar and consider moving to Maven 2 for the build (solves the repository and dependency problems very elegantly).

Posted by: Jonas Olsson on October 26, 2006 11:00 AM

"You've probably already thought about this, but what about sticking everything in a common repository and just checking out the bits you want at build time?"

I am currently facing exactly the same issue (someone having decided to move from Clearcase to svn).
In my experience, "cherry-picking checkout" is worse, because if you make any modification to a cpco file and do an 'svn status' at the top of your working directory, svn will go "?" on all those cpco files.
If instead you use externals, an 'svn status' at the top will give you "U". So at least someone (developer or script) knows that something needs to be done.

I tell you, people used to hate the complexity of Clearcase and UCM, but I am starting to hate the apparent simplicity of svn.
At least with Clearcase I could have load rules.

Posted by: Charles on November 7, 2006 06:30 AM

We are in the process of switching from Clearcase to SVN, and we were about to go down the svn:externals path too...but I'm gathering that is not a good idea.

We have bought into the Maven world...but we have one question. We have per-developer branches of each of the libraries and products. So the problem is that when we need to refactor across multiple libraries, merging back into the trunk from all those libraries and keeping the versions in the Maven pom.xml of each library and product in step seems like a nightmare.

I noticed that the Apache Torque project does use svn:externals, but even they admit that it's got lots of gotchas. http://db.apache.org/torque/developer-info/subversion.html

Has anyone had experience in how to handle this?


Posted by: Ben on November 13, 2006 08:24 PM

So what about using the Twisted method of source control? Twisted is a huge project with a number of distinct subprojects in it and a reasonable amount of common code.

The basic idea with Twisted is you generate a ticket for every change, branch a copy for every change based on the ticket number, and work in your own branch. Before merging back again, merge your branch forward to the latest trunk, run the test suite, and if it all works merge your branch back to the trunk.

I've missed out a code review and QA stage there, but if you want the complete lowdown, you have:

http://divmod.org/trac/wiki/UltimateQualityDevelopmentSystem

The process is not perfect, and not always adhered to fully, but it does save a lot of faffing around. Branches are both cheap and atomic in Subversion, so it's not much of a problem. Divmod developed its own pythonpath mangler to aid this system of development, and I assume something similar could be done for Java.

Posted by: Moof on December 2, 2006 01:18 AM

We did the same as you are proposing (multiple copies of common code). After only 6 months, this is turning into a nightmare too (I now think we have 5 subtly different versions of the common code, each one with different, and potentially conflicting, enhancements).
I would recommend sticking with externs. BUT only reference tags - i.e. your extern reference is to a tagged release of the common code, which will never change. If you want to change some common code, you have to check it out separately (on a branch, or the trunk - whatever method you usually use), make your changes, test it, release it, tag it with a release number, then update your extern to reference the new tag.
This works very similarly to Darren's jar file suggestion (except you get source instead of a jar file, which ties in with the way you work at present).
You can actually make changes to the checked out common code on your working copy - you just can't check it in (assuming your tags folder is read-only to ordinary mortals, as it should be).
What you can do, however, is generate a patch file, check out the development version (trunk or branch), and apply the patch. Once you are happy that the change is OK, and you've checked it in and tagged it, you can go back to your original project, change the extern to reference the new tag, and update - Bob's your uncle.
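
The "only reference tags" rule is easy to check mechanically. Below is a rough sketch, assuming you already have the text of an svn:externals property (for example from svn propget) and that your repository uses the conventional tags/ directory. The function name and the example URLs are hypothetical, and the URL detection is deliberately simple: it only recognises full URLs and the ^/ repository-relative form.

```python
def untagged_externals(property_value):
    """Return the svn:externals lines whose URL does not point under tags/."""
    offenders = []
    for line in property_value.splitlines():
        line = line.strip()
        # Blank lines and '#' comment lines carry no external definition.
        if not line or line.startswith("#"):
            continue
        # The URL may come before or after the local directory, depending on
        # which externals syntax is in use, so look at every token.
        urls = [tok for tok in line.split() if "://" in tok or tok.startswith("^/")]
        if not urls or "/tags/" not in urls[0]:
            offenders.append(line)
    return offenders
```

Fed a definition such as "common http://svn.example.com/repos/common/tags/1.4.2" it returns an empty list; a line pointing at trunk or at a branch comes back as an offender, which could be used to fail a nightly check.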

Posted by: Nikki Locke on January 14, 2008 10:50 AM

We have a project with just four externals. That means for every branch we have to also branch each of the four externals and update the links (so, iow, branches either don't happen when they should, or they waste lots of time and are error prone). And externals are not exactly transparent to work with. People have already screwed up, like having an external in trunk point to a branch... so you change something in trunk and it changes somebody else's version. Yeah, you can make it 'read only', but still it's a big PITA.

The key is to have one 'official' place for each piece of common code to live, and keep that updated. Nothing should use this directly, but copy it in if they need it. It's some work to keep the different copies up to date, but on the other hand everything that is happening in the repo is transparent.

Posted by: 0xABADC0DA on May 13, 2008 11:10 PM