February 04, 2004

Good Tools Are Worth Paying For

by peterb

I definitely would not have survived the past couple of days at work without Perforce, the version control system of champions. I'm sure there are version control systems other than P4 that are just as good; it's just the particular flavor of crack that I'm addicted to. It just seems like the ubiquity of CVS, the de facto standard for open source projects, is yet another great example of Worse is Better.

Most open source projects you meet use CVS, which stands for "Concurrent Versions System." Everyone uses CVS for exactly two reasons:


  • It's free.
  • It's not quite as sucky as RCS, its free precursor.

You'll even find advocates on the web explaining why you should use CVS even though it's totally broken. These advocates conveniently overlook the fact that CVS sucks. Let's review, shall we?


  • CVS allegedly supports branches, but no one actually uses them, because they're too hard to use.
  • CVS is the only version control system ever developed that can somehow manage to generate context diffs that are impossible to apply in a meaningful way.
  • CVS doesn't do atomic checkins. This is a huge deal. So if you're changing, for example, a source file and a header file, it's entirely possible for the checkin of the header file to succeed and the checkin of the source file to fail. Congratulations! Your tool just broke the build. Also, without atomic checkins, you can't say "give me all the changes that Bill made at such-and-such a time" -- Bill has no way to indicate to CVS that his changes to the header file and the source file are interrelated.
  • No locking (see above).
  • No (meaningful) automated merging; if someone made a change to the repository while you had a file checked out, your likely path to success is to generate a context diff by hand and apply it by hand.
  • No meaningful way to rename files.
  • Doesn't scale well to large projects (and yes, I know people use it for large projects anyway. It still doesn't scale well.)
  • An inconsistent model leading to a clumsy CLI (I'm not complaining that the user interface is primarily via the command line interface; I'm complaining that the particulars of the command line interface aren't good).
  • Litters your source tree with near-infinite amounts of annoying metadata.
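To make the atomic-checkin complaint concrete, here is a minimal C sketch of an all-or-nothing submit against a toy in-memory depot. The names (`struct depot`, `struct file_edit`, `submit_changelist`) and the fixed-size arrays are hypothetical, not how CVS or Perforce actually store anything. The sketch only shows the rule: either every file in a change lands or none does, and one file that someone else has changed since you synced blocks the whole change.

```c
#include <stddef.h>
#include <string.h>

#define MAX_FILES 16

/* Hypothetical toy depot: each file has a name and a head revision. */
struct depot {
    const char *names[MAX_FILES];
    int head_rev[MAX_FILES];
    size_t nfiles;
    int last_change;   /* number of the last submitted changelist */
};

/* One file in a pending change: which file, and the revision
   the user started editing from. */
struct file_edit {
    const char *name;
    int base_rev;
};

static int find_file(const struct depot *d, const char *name)
{
    for (size_t i = 0; i < d->nfiles; i++)
        if (strcmp(d->names[i], name) == 0)
            return (int)i;
    return -1;
}

/* Submit all edits as one unit (assumes each file appears at most once).
   Returns the new changelist number, or -1 if any file is unknown or was
   changed by someone else since it was synced; on -1 the depot is untouched. */
int submit_changelist(struct depot *d, const struct file_edit *edits, size_t n)
{
    int idx[MAX_FILES];

    if (n > MAX_FILES)
        return -1;

    /* Pass 1: validate everything before changing anything. */
    for (size_t i = 0; i < n; i++) {
        idx[i] = find_file(d, edits[i].name);
        if (idx[i] < 0 || d->head_rev[idx[i]] != edits[i].base_rev)
            return -1;   /* stale or unknown file: reject the whole change */
    }

    /* Pass 2: every check passed, so apply every edit. */
    for (size_t i = 0; i < n; i++)
        d->head_rev[idx[i]]++;

    return ++d->last_change;
}
```

The two passes are the whole trick: nothing is written until every file has been checked, which is exactly the guarantee a loop of independent per-file checkins cannot give you.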

So when people call CVS "version control" you need to realize that that's a very optimistic description. It's really more of a hope.

There's just something about this topic that makes people willing to accept a level of badness that they wouldn't tolerate for a minute in, say, their OS, or their text editor, or their web browser. Do you think people would keep using Emacs, even though it is both free and Free, if, periodically, it just decided to corrupt the files you were editing? Yet when it comes to version control, people accept just that. For example, here's a CVS apologist I found on java.net:


"CVS expects you to trust merges. A lot of people don't, usually saying how they once spent hours resolviing [sic] a horrible merge. It doesn't occur to them that the problem was not source-control but bad process - two developers editing the same part of the same file, with at least one failing to synch with the server often enough."

Wow, imagine that -- people failed to realize that their version control system didn't actually control versions of the files they were working on. Those crazy kids with their loud rock and roll music and their hamburger sandwiches and their French fried potatoes!

Think about that claim for a minute. Implicit in it is the idea that any software process model where more than one developer is working on a given region of a file at a time is broken. I guess that's a workable theory, if you don't actually have deadlines, or competent developers. Perhaps the lack of deadline pressure is the whole story of why CVS is adequate for free software (as opposed to Free Software -- there are open source software products which are sold for money and therefore have some deadline pressure). But since CVS provides no locking or notification support, how do I know if someone else is working on the same files I'm working on, anyway? What's my process if I work in a company with 10 people? Do we all send out mail saying "I'm editing the fold_monkey() function in monkeybagel.c now"? What if I work in a company with 50 people? 100? 500?

The idea that process should adapt to the inadequacies of the tool is insane. When a tool is inadequate to good process, you should find a better tool. (In this particular case, the inadequacy is not "CVS doesn't do notification," but "CVS doesn't do atomic checkins or make complex merging workable.")

Here's another one that I just have to share and, let's be frank, mock:


"CVS is indeed the best version control system I've used."

Translation: CVS is indeed the only version control system I've used.

The best version control system I've used is Perforce.

So what is Perforce? Perforce is basically CVS if CVS wasn't designed by hamsters on crystal meth. Just about every problem you can identify with CVS is fixed in Perforce. There are command line tools that are disturbingly similar to CVS's, with the difference that they actually work. There's a Windows interface, if that's your thing. There's a (good) web interface. Hell, there's even a Half-Life interface to p4. Multiple users working on the same file works intuitively in p4 (the integration procedure is pure love). Multiple branches and/or multiple client views into the same or different branches are easy as pie. You can manage multiple changelists, all checkins are atomic, it's backed by a database, and reverting to earlier versions of the tree (or a branch, or a file) is simple.

The basic unit of a P4 checkin being a "changelist" has another consequence: you don't check in files to Perforce, you check in a changelist. So if I say "I want to see the specific fix that Bill checked in that fixes bug 14982," it's trivial, no matter how many files it touched. Try doing that in CVS.
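Without changelists, the best a tool can do with per-file history is guess which checkins belong together. The C sketch below shows one common kind of guess under illustrative assumptions (the `struct file_rev` layout, the `guess_changesets` name, and the time window are all hypothetical): per-file revisions with the same author and log message, committed close together in time, are treated as one logical change.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* One per-file revision, as a per-file history (CVS-style) records it. */
struct file_rev {
    const char *file;
    const char *author;
    const char *message;
    time_t when;
};

/* Assign a group number to each revision; revisions must be sorted
   oldest first. A revision joins the group of the most recent earlier
   revision with the same author and message, provided that revision was
   committed at most `window` seconds before it; otherwise it starts a
   new group. Returns the number of groups. */
size_t guess_changesets(const struct file_rev *revs, size_t n,
                        double window, size_t *group_of)
{
    size_t ngroups = 0;
    for (size_t i = 0; i < n; i++) {
        group_of[i] = ngroups;   /* default: start a new group */
        for (size_t j = i; j-- > 0; ) {
            /* Sorted by time, so once one is too old, all earlier ones are. */
            if (difftime(revs[i].when, revs[j].when) > window)
                break;
            if (strcmp(revs[i].author, revs[j].author) == 0 &&
                strcmp(revs[i].message, revs[j].message) == 0) {
                group_of[i] = group_of[j];
                break;
            }
        }
        if (group_of[i] == ngroups)
            ngroups++;
    }
    return ngroups;
}
```

It is only a heuristic: two unrelated checkins that happen to share a message and a time window get lumped together, and one change that straddles the window gets split in two. A changelist number makes the question exact.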

And, most importantly, if you're working on a project with more than three people, you won't want to go hang yourself every time you need to check out or check in a file. At least not because of the version control system.

Random links: This discussion thread over at Joel on Software discusses some of the differences in a slightly less antagonistic tone than I've used here. It's a good read. Ned Batchelder blogged a bit about this too. Dana Epp is looking for advice on which SCM system to use. I plan to tell him to use Perforce; drop by and counter my advice if you disagree.

Perforce is a commercial product, but it is free (as in beer) for two users or fewer, and they're pretty liberal about issuing evaluation licenses so you can see what you're getting yourself into. It may seem comparatively expensive at $750/seat (less as you buy more seats) once you are working on a big project, but at my company, at least, it has been worth every single penny. And they claim that they offer free licenses to Open Source projects. So check it out.

What version control do you use? Have you killed anyone yet?

Update: See my article about using Perforce on Mac OS X.

Posted by peterb at February 4, 2004 10:27 PM
Comments

I can also recommend subversion (http://subversion.tigris.org/) as an alternative to CVS. It has some of the problems CVS has (specifically, it litters up your source with .svn directories), but it solves pretty much all of the serious problems. Specific features are atomic commits, repository-wide version numbering, and a sane strategy for branches and tags. I think perforce is still superior in a number of ways, but subversion is the next best thing around, and a good choice if you can't get ahold of perforce for some reason.

The current major problem I see with subversion is that the branch merging doesn't work quite as well as it ought to yet. They do seem to be working on that part of the system, however, and I expect it'll shape up any time now.

Posted by John Prevost at February 5, 2004 08:43 AM

A few extra notes:

0. you WILL save 10x the per seat cost of a p4 license in just one week of needing a staging branch and not having one because CVS is too retarded to do staging branches well.

1. a branch or tag in cvs uses O(N) space, where N is the number of source files in your tree. this is stupid.

2. every distinct directory in a cvs repo is essentially a little self-contained tree. this encourages people to check in stuff stupidly and makes constructing atomic state transitions on the tree even harder. all checkins should be w.r.t. the root of the repository tree as you checked it out.

3. you have to hand edit meta-data on a regular basis whenever cvs loses its own asshole.

Posted by Pete at February 5, 2004 09:14 AM

check out:
http://better-scm.berlios.de/comparison/comparison.html

and yes, p4 is the best thing ever.

Posted by Dan Belov at February 5, 2004 08:36 PM

p4 is leaps and bounds better than CVS, but it has two fatal flaws:

- it is not free and open

- it is nearly impossible to extract crap from p4 control

These are fatal because it is possible to get p4 into a situation where the client can't talk to the server because of server-side state, and there is no way to repair it without full admin access and someone who understands p4 well enough to fix the problem. On the handful of projects I have worked on under p4 control, numerous hours of productivity were lost to cleaning up p4 messes.

Beyond that, the command line interface sucks, but the GUI interfaces compensate nicely.

Posted by foobar at February 13, 2004 09:31 PM

The changeset-oriented philosophy which peterb claims to be a major strength of P4 is also the basis of an interesting free tool, tla/arch. See
http://wiki.gnuarch.org/moin.cgi/FrontPage

(Note: It used not to be a serious contender for cross-platform development because it uses very long paths, which run afoul of Win32 and/or Cygwin limitations, but this is getting fixed in a variety of ways.)

Posted by Marc-Antoine Parent at February 16, 2004 12:46 PM

You said:
But since CVS provides no locking or notification support, how do I know if someone else is working on the same files I'm working on, anyway?

I say:
You're wrong, although this is a commonly held misconception. CVS provides both these capabilities and I've used them very effectively in the past. That being said, most projects don't use them because they aren't aware of them, or don't know how to set them up. Take a look at the "cvs edit" and "cvs watch" commands.

Posted by Justin Wojdacki at February 16, 2004 03:20 PM

I'm curious: Did you set up Perforce yourself?

The reason I ask is that I was very excited to try out Perforce after reading your article, especially since for my purposes, the free two-user set-up is ideal. What a great idea! If I as a lone hobbyist get hooked on the free version, then I will very likely buy licenses if I should have some success and become a bigger organization.

Unfortunately, the installation process broke my spirit. Perforce's installation instructions are far too abbreviated and generic. How does one name a port? Shouldn't they mention that a good place to create an environment variable is .bashrc or .bash_profile? Or that /usr/local/bin needs to be added to the PATH? The instructions are written for generic Unix, so I guess that OS X users who prefer GUI interfaces over command-lines need not apply. (As you may infer, I have enough Unix under my belt to be dangerous, but not successful.)

I can't believe they can't write a simple installer to ease the pain. When a software provider neglects to make the installation and set-up process easy, it makes me doubt how well the company supports the product down the road.

Unless someone can point me to more explicit directions for installing and starting up Perforce, I'll just have to stick with CVS as the SCM choice for my Xcode work.

Thanks,

Dave

Posted by David Trevas at February 17, 2004 05:43 AM

I'm surprised that most of the article and the comments do not mention a few other tools at all: Surround SCM from http://www.seapine.com and ClearCase from IBM. While Surround SCM works on the Mac, ClearCase does not, but I have been able to implement both successfully. Surround has the best CLI as well as GUI. ClearCase is powerful and can be used on the Mac via a shared connect. I have set up CC with Virtual PC on the Mac and it works like a charm.

When we compare tools and price is not a criterion, we should look at everything that is on the market; otherwise we have a one-sided comparison. As I understand it, the author of this article has used only Perforce and CVS, and for obvious reasons is happier with Perforce. Try Surround SCM and you will find it does a lot of what you want.

Another factor to consider when we talk about version control is a round-trip system: a bug tracking tool is as important as the version control. Only Seapine provides that capability on the Mac at present. And the beauty is that it is cross-platform.

Posted by macnixer at February 17, 2004 03:30 PM

Wow, some great feedback here. Let me see if I can address some of the issues raised:

foobar: Obviously, I don't consider "not free and open" to be a "fatal flaw." For those who are more concerned that their software be politically correct than that it actually work, I agree that Perforce might not be the right choice.

Dave: I haven't set up P4 myself on OS X, although I've done it on other platforms. I could believe that it's not super-integrated onto OS X. Tell you what -- I'll give it a shot this week and report back on how it goes. My instinct is that there's a level of assumption that the documentation makes which is reasonable -- for example, if I was writing documentation for how to install a unix-based program, I don't think I would explain how to adjust the user's path -- to me, that fits into background knowledge. But hey, you're the user, and if you found the documentation inadequate, I'm not going to tell you otherwise!

macnixer: the first paragraph of my article points out that there are probably lots of other great SCM systems besides Perforce. I talk about p4 because it's the one I know and love. Thanks for bringing Surround to my attention!

Posted by peterb at February 17, 2004 11:45 PM

Thanks, peterb, for looking into setting up Perforce in OS X. I look forward to seeing your report. Perhaps, then I could become a P4 evangelist, too! ;-)

Posted by David Trevas at February 18, 2004 11:04 AM

One thing I do love about CVS is the concurrent model.

I work in an organization separated by wide timezones and hugely different development schedules even for some of the developers in the US.

Using an exclusive access model for us just won't work. We'd lose days of work if we constantly had to be bugging each other about checking files back in.

You don't really talk about how P4 handles such concurrency. I'd be interested to hear more.

Another weakness of CVS you neglected to mention is its horrible support for binary files.

Wade

Posted by Wade Williams at February 18, 2004 11:34 PM

Wade,

Yes, I'm in the same situation -- we often have developers working on the same files, or even the same parts of files. It would kill our development if we had to serialize access to files. Basically, p4 lets any number of users edit any file at any time in their own private workspace. When they submit those files back to the repository, they may have to resolve their changes with changes that occurred while they had the file checked out. Perforce makes that resolve step as easy as it can be, IMHO.


The p4 workflow in more detail looks something like this:

-there's the repository, or depot. That's where the files are.
-there's a "client view", which is some space in the filesystem that maps a view of the repository at a given changenumber. Client views can be synced to any changelist number with ease. A user can (and will) have many client views into different branches, or to the same branch, of the depot.
-when you want to edit a file, you do "p4 edit <filename>". That makes the file writable and notifies anyone else who wants to edit the file. Note, you're not -locking- the file here -- other people can go right ahead and edit it (the version they have synced to the repository in their client view), they just get a nice little note saying "so-and-so is also editing this file."
-you go ahead and do your edits.
-you submit to the repository with "p4 submit".
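The "notify, don't lock" behavior of `p4 edit` described above can be modeled in a few lines. This is a toy in-memory C sketch with hypothetical names (`struct open_table`, `open_for_edit`); real Perforce keeps this state on its server. The point is that opening a file never fails because someone else has it open; you just find out who they are.

```c
#include <stddef.h>
#include <string.h>

#define MAX_OPENS 32

/* Hypothetical record: `user` has `file` open for edit. */
struct open_record {
    const char *file;
    const char *user;
};

struct open_table {
    struct open_record recs[MAX_OPENS];
    size_t n;
};

/* Record that `user` is editing `file`. Other editors never cause a
   refusal (no locking). Instead, counts the other users who already have
   the file open and stores up to `max_others` of their names in `others`,
   so the caller can print "so-and-so is also editing this file".
   Returns that count, or -1 if the table is full. */
int open_for_edit(struct open_table *t, const char *file, const char *user,
                  const char **others, size_t max_others)
{
    int count = 0;
    if (t->n == MAX_OPENS)
        return -1;
    for (size_t i = 0; i < t->n; i++) {
        if (strcmp(t->recs[i].file, file) == 0 &&
            strcmp(t->recs[i].user, user) != 0) {
            if ((size_t)count < max_others)
                others[count] = t->recs[i].user;
            count++;
        }
    }
    t->recs[t->n].file = file;
    t->recs[t->n].user = user;
    t->n++;
    return count;
}
```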

If no one has submitted a new revision in the meantime, the submit just happens. If someone has submitted a change in the meantime, p4 says "hey, this file has changed," and you type "p4 resolve" to resolve the changes. Resolving brings up a pretty nice tool that shows you exactly what lines changed in your submit and in the version in the depot, and asks you what to do.

If you haven't touched the same lines as the other submits, 98% of the time you can just say "p4 resolve -as" which is perforcese for "just go ahead and do the right thing." p4 merges the changes and then you can submit.
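The "you haven't touched the same lines" case boils down to a three-way merge rule. Here is a simplified C sketch (the `merge3` name is hypothetical) that assumes all three versions have the same number of lines and that edits only replace lines; real merge tools first align the files with a diff algorithm so that insertions and deletions work too. For each line it takes whichever side changed it, and flags a conflict only when both sides changed the same line differently.

```c
#include <stddef.h>
#include <string.h>

/* Merge `theirs` and `yours` against their common ancestor `base`, all
   with n lines. Writes the chosen line for each position into `out`.
   Where both sides changed a line differently, keeps yours and records
   the position in `conflicts`. Returns the number of conflicting lines
   (0 means the merged result can be accepted as-is). */
size_t merge3(const char **base, const char **theirs, const char **yours,
              size_t n, const char **out, size_t *conflicts)
{
    size_t nconf = 0;
    for (size_t i = 0; i < n; i++) {
        int they_changed = strcmp(theirs[i], base[i]) != 0;
        int you_changed = strcmp(yours[i], base[i]) != 0;

        if (!you_changed) {
            out[i] = theirs[i];   /* only they changed it, or nobody did */
        } else if (!they_changed || strcmp(theirs[i], yours[i]) == 0) {
            out[i] = yours[i];    /* only you changed it, or both made the same edit */
        } else {
            out[i] = yours[i];    /* both changed it differently: conflict */
            conflicts[nconf++] = i;
        }
    }
    return nconf;
}
```

A return value of 0 corresponds to the happy case above, where the tool merges and you submit; anything else is the resolve-by-hand case.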

If the changes you made conflict with the changes someone else made, p4 asks you to resolve them by hand -- you can throw away your changes, throw away the other guy's changes, or (the usual choice), edit the file. p4 then gives you an annotated version of the file to play with. A (fake) example might look like this:

    int some_function(int param)
    {
        int ix, jx;
    >>>> ORIGINAL VERSION file.c#8
        char str[10];
    ====
    >>>> THEIR VERSION file.c#9
        char name_string[10];
    ====
    >>>> YOUR VERSION file.c
        char *str;
        str = malloc(MAX_STRING_LEN);
        if (str == NULL) {
            return -1;
        }
    <<<<

        /* more stupid code */
    }

This is a lame made-up example, but you get the idea: you'd be able to see that the only change the other guy made that conflicts with yours was to change str to name_string, so you'd want to modify your changes to use that name as well (presuming that his change also renamed str to name_string elsewhere in the code; you wouldn't have to touch those cases if there were no conflicts, because p4 would autoresolve them for you, if you ask).

I don't know if that directly addresses your question -- if not, feel free to rephrase it and I'll try again.

Posted by peterb at February 18, 2004 11:58 PM

David:

I've put some notes on using Perforce on OS X here:

http://peterb.telerama.com/weblog/archives/000018.html

Posted by peterb at February 19, 2004 03:46 PM

Great article. Thanks.

We went from VSS to Perforce. It was a painful transition, but has been worth it. We currently have people from Japan, Ireland and both coasts of the US using our system. We have a huge base of cross-platform code that is distributed amongst several projects.

Perforce is un-simple. It is not wise to implement this system if you are unwilling to devote time and energy, as well as money to it. You definitely need a dedicated administrator, and this admin needs to do their homework. I administer our server. It makes me rather stressed when the developers are demanding answers right now. We use a Windows server. The server is a big commitment. Choose wisely. I am not a huge Windoze fan, but it has done well for us. There are many aspects to a server, such as which ones the IT Department will support.

If the lack of a one-click installer is enough to make you eschew this kind of product, then run away. Fast. Perforce has so many nooks and crannies that your blood will run cold.

Me, I'm a geek. I kinda like them all. I have developed some interesting automation that leverages them.

Hmm... From what I can see, CVS doesn't seem to be any simpler, but I don't use CVS. I use Perforce, so I won't pretend to compare. I only evaluated CVS briefly at the time we switched away from VSS, and chose Perforce.

One aspect of the free two-user demo is that it doesn't let you create multiple clients (what a rotten name; it should be "workspace view" or something). Multiple clients are the lifeblood of our system.

Perforce has second-to-none product support, and that is very important when it comes to a product this complex. I have had to use it numerous times. The people who answer my emails (often frantic emails) are extremely knowledgeable and polite. I've asked many stupid questions, and have yet to be slapped upside the head for it.

I think that they have given MacOS short shrift. They have a long list of OSes that they support, but the great majority are varieties of Eunuchs. We run Mac and Windows here. Windows is OK, but the Mac CW plugin is shaky. I find their HFS/POSIX confusion to be particularly yucchy, but they did address it with the AltRoots hack.

Perforce uses a server model. This makes it very, very fast, as the server and client exchange information about their contents, as opposed to VSS, which was file based, and absolutely agonizing over slow connections. The downside is that the server can get confused about what is on your client workspace. The server always knows best, even if it don't know diddly. Sometimes you need to do a force sync or a synchronize status just to convince the server that it suffers from a rectal/cranial inversion.

In any case, we're a pretty satisfied customer. One thing I definitely don't miss about VSS is the inevitable database corruption just at critical release time. Perforce has been pretty solid, with the exception of some line-ending shenanigans that they still need to fix.

Posted by Chris Marshall at February 20, 2004 09:44 AM

I have used Perforce in the past and thought it was OK, and better than CVS in many ways. However, there were annoying things that are still in the product; most of them are related to the lack of local metadata.

With Perforce, I have wasted many a day trying to do something that should be fairly simple. One big plus is that they have phone support. However, beware of the frequent response, "We don't support that, because it does not follow the Perforce philosophy." My philosophy is that I want to get my job done and not break the build; hopefully that follows the Perforce philosophy. Perforce's philosophy seems to mean no local metadata, which creates problems.

One huge problem with Perforce is its offline behavior. If you are ever on a laptop and need to make changes, you are in for a world of hurt when you go to sync up later. When you are on the road, you have to remove the write lock on the file yourself and, once you can talk to the server again, create a changelist; but the way they recommend ends up adding a bunch of files you don't want to the changelist. It seems kind of stupid, because all you are doing is telling the server that you are about to work on a file. This does force you to get the latest version, which is not a bad thing, but I would like to see better support for offline editing, especially in a commercial product. Part of the problem here is that there is no metadata in the local directory, so there is no easy way to indicate that a local revision has changed. The other reason it is a problem is that your workflow changes completely depending on whether or not you are connected to the network.

Subversion is better at working offline: you can edit files, and because it keeps a cached copy of the original in its metadata, you can even revert offline. You can also do diffs offline. Because it has local metadata, it should also be faster, since all it has to send when committing is the diffs.
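The reason local metadata buys you offline revert and diff is that the client keeps a pristine copy of each file as of the last sync, so "what did I change?" and "undo my change" never need the server. Here is a toy in-memory C sketch with hypothetical names (`struct wc_file`, `is_modified`, `revert_offline`); Subversion itself keeps its pristine copies on disk inside its `.svn` directories.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical working-copy entry: current text plus the pristine
   text cached at the last sync. */
struct wc_file {
    char *working;         /* what is on disk now (heap-allocated) */
    const char *pristine;  /* cached copy from the last sync */
};

/* Offline "status": modified if the working text differs from the cache. */
int is_modified(const struct wc_file *f)
{
    return strcmp(f->working, f->pristine) != 0;
}

/* Offline "revert": replace the working text with the cached copy.
   Returns 0 on success, or -1 if memory runs out (file left unchanged). */
int revert_offline(struct wc_file *f)
{
    size_t len = strlen(f->pristine) + 1;
    char *copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, f->pristine, len);
    free(f->working);
    f->working = copy;
    return 0;
}
```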

The lack of metadata also creates problems: if you accidentally switch workspaces onto the same set of files, Perforce can get very confused and lose data. This does not happen in CVS or Subversion, because the metadata is tied to the checkout.

Sometimes Perforce can get out of sync, and you must "force sync", which causes Perforce to check every single file in the subtree you select to see if it is newer than the local file; this is a slow operation. It does not happen very often, but it can create huge problems if even one file is out of sync: people complain that the build is broken, when the real problem is that the latest file did not get synced. Explaining when to use force sync is hard, especially to less savvy users, and once they have been burned by a bad sync, many users start using force sync every time. This ties up the server and takes a lot of time, especially if you have a lot of files.

Another problem, touched on above, is that Perforce write-locks all your local files until you check them out from the server, and many programs are unable to deal with locked files gracefully or are not integrated with Perforce. So you have to go to the command line or to the separate p4 GUI to open files for edit whenever the program you are using has no built-in Perforce integration. This is not a problem with CVS or Subversion.

Another major problem with Perforce is that it does not have any built-in way to do a reverse sync. That is, if I add new files in my local directory and forget to add them to Perforce, Perforce will not tell me, so I break the build. I don't know how many times I have checked in files, thinking that I would work on them when I got home, but forgot to check in one or two, so the whole evening was shot. CVS and Subversion both tell you if you forgot to add files, and you can specify, once again in metadata, which files to exclude when reverse syncing.
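The "reverse sync" being asked for is essentially a set difference: files in the local workspace that the depot doesn't know about, minus anything on an ignore list. Here is a minimal POSIX C sketch that assumes both file lists have already been collected (walking the directory and querying the server are not shown) and uses the standard `fnmatch` function for shell-style ignore patterns; `find_unadded` and `in_list` are hypothetical names.

```c
#include <fnmatch.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if `name` appears in `list`, else 0. */
static int in_list(const char *name, const char **list, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(name, list[i]) == 0)
            return 1;
    return 0;
}

/* Report local files that the depot does not know about and that match
   none of the ignore patterns (shell-style, e.g. "*.o"). Stores pointers
   to those names in `unadded` and returns how many there are. */
size_t find_unadded(const char **local, size_t nlocal,
                    const char **depot, size_t ndepot,
                    const char **ignore, size_t nignore,
                    const char **unadded)
{
    size_t count = 0;
    for (size_t i = 0; i < nlocal; i++) {
        int ignored = 0;
        if (in_list(local[i], depot, ndepot))
            continue;                      /* already under version control */
        for (size_t k = 0; k < nignore && !ignored; k++)
            ignored = fnmatch(ignore[k], local[i], 0) == 0;
        if (!ignored)
            unadded[count++] = local[i];   /* probably forgot to add this one */
    }
    return count;
}
```

With an ignore list like `*.o`, build products stay quiet and only the genuinely forgotten source files show up.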

Subversion fixes almost all of the problems with CVS. I would recommend it; I think it is the future of source code control. Its only problem is that it is still kind of new, so integration is not completely there yet. If you are the cautious type, wait another year, unless you are currently using CVS, in which case I would consider switching now.

This is not to say that Perforce could not fix the problems I mentioned, but they seem somewhat unwilling to do so: we asked for the improvements mentioned years ago, but have not seen any movement in that direction and got the philosophy response. The lack of local metadata, I think, actually hurts them.

Posted by possen at March 16, 2004 03:41 PM

I've dealt with a lot of SCM tools, though not Perforce so far. My personal favorite at the moment is subversion, for many of the reasons you state, but I've also used CVS many times because it was the best option at the time (this is less a statement about the goodness of CVS than a statement about the badness of the other available options... It's amazing just how much changeset-oriented tools with minimal CLI support, strict locking and per-file semantics can make actual development hellish).

However, I just have to say that your description of conflict resolution workflow looks pretty much exactly like my experience of CVS conflict resolution in many a real situation, so I'm sort of curious how it's supposedly better. *smile*

Branching and lack of changesets are definitely big weaknesses in CVS, though. (On the other hand, there've been times where the non-atomic nature of CVS commits has been really handy for one reason or another, so I'm sort of torn on that issue.)

Anyway, a fun read nonetheless.

Posted by Ysabel at May 9, 2005 01:17 PM

Stay away from Rational ClearCase and ClearQuest.

I find these tools difficult and extremely painful to use. Or maybe it's just the poor and incomplete integration of these tools at my current place of employment.

I often have to go back in the source tree created and locate the code that I wrote and recheck it into the current branch.

It's the developers' fault. Yeah, right.


Posted by Bryan Pauquette at February 7, 2006 01:37 PM

Please help support Tea Leaves by visiting our sponsors.