The lay of the land at milestone05

Posted by rick Tue, 29 Nov 2005 22:42:00 GMT

We’re under steam with development on the rewrite project, and are nearing our next milestone, “milestone05”, which is due on Thursday. (Now, you might think that this would be the 5th milestone, but in actuality it’s the 6th, since we had a “milestone02a” wedged unceremoniously between “milestone02” and “milestone03”. New team, new tools, new processes—sometimes you just have to adapt.)

Regardless, we’re on a two-week (really, twice-a-month) milestone pace, which means a fairly tight cycle between “stable” releases. We schedule a batch of features for each milestone, divvy up the tasks, and work through them. As with any development project, reality steps in: from time to time we split a feature into a piece we can finish within the milestone and a remainder that may be deferred to the next one (or, if we’re lucky, resolved in this one). On the target date we hit the milestone, launch a stable demo site, and move on to the next.

We’re running a continuous integration server for the project, with a mandate that no one should “break the build”. If you do break the build, you work hard to get it fixed ASAP. Breakage is becoming less and less frequent, partly because developers are getting in the habit of making small check-ins and running the test suite before committing. The test suite is also quickly becoming more comprehensive, and our environments are stabilizing, so what happens on the build server is likely to be what happens on a development system. If it breaks on the build server, it probably would have broken on your machine had you run the tests.

Now, from time to time we hit a situation where the build breaks but the tests pass on a developer’s system, or vice versa: I run the tests on my machine and they pass, I commit, the build server is happy, and then another developer updates their checkout and is hosed. :-/

The oddball cases have almost always been due to local changes not yet committed on a developer’s system, or to some weirdness from moving things around in a local (Subversion) checkout in ways that aren’t advised. Confusion about branches (which we rarely use anyway) has caused some headaches too. These problems are getting rarer, though, as we all settle into a normal work pattern that uses our tool chain efficiently.

We had one case where an actual problem in the framework caused tests to fail on Windows systems against one database flavor (Oracle), but nowhere else. As we understand it, this is related to Windows’ botched libc time implementation: basically, if the folklore is right, Microsoft took an early BSD libc implementation and then neglected to patch it when the BSD folks did. C’est la vie. You can read more gory details if you’d like (fwiw, I think this commit will actually fix the problem once we get around to testing it again, even though the changelog doesn’t report the fix).

With all this automated testing it sort of seems unnecessary to break things into twice-monthly milestones. After all, the application works now, and it will work (barring a few minor incidents that we try to minimize) after every commit. Why not just have a live system that users can look at, poke at, give feedback on, etc.? As Dave Astels noted in an email recently, Rails gives such a tight, zero-turnaround feedback loop that we could conceivably have daily milestones.

Well, in a sense we almost have that: we have a nightly build that gets refreshed around midnight. The refresh includes a fresh pull of our converted data (really, whatever the high-water mark is on our conversion project that night), an svn update, a dump of all old session data, and a restart of our lighttpd server. On any given day you can look at the nightly application and know that you’re no more than 24 hours behind the current build (actually closer to “no more than 8 hours”, since nobody really works on the system outside normal business hours). Users can hit the nightly site and see what the (working) application looks like today. And if you’re a developer, you can update your tree at any time, run “script/server”, and use the application right this second.
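The nightly refresh is essentially an ordered script. Here is a minimal Python sketch of that sequence, assuming each step can be run as a shell command. Only `svn update` is a real command here; the other script names (`load_converted_data.sh`, `clear_sessions.sh`, `restart_lighttpd.sh`) and the `run_nightly` helper are hypothetical placeholders. The sketch stops at the first step that exits non-zero, so a failure is reported by name instead of being buried in later output.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple

# Steps in the order the post lists them. Only "svn update" is a real
# command; the .sh scripts are hypothetical placeholders.
NIGHTLY_STEPS: List[Tuple[str, Sequence[str]]] = [
    ("load converted data", ["./load_converted_data.sh"]),  # hypothetical
    ("update working copy", ["svn", "update"]),
    ("clear old sessions", ["./clear_sessions.sh"]),  # hypothetical
    ("restart lighttpd", ["./restart_lighttpd.sh"]),  # hypothetical
]

# A runner takes a command and returns its exit status.
Runner = Callable[[Sequence[str]], int]


def subprocess_runner(cmd: Sequence[str]) -> int:
    """Execute the command for real and return its exit status."""
    return subprocess.call(list(cmd))


def run_nightly(
    steps: Sequence[Tuple[str, Sequence[str]]] = NIGHTLY_STEPS,
    runner: Runner = subprocess_runner,
) -> List[str]:
    """Run each step in order, stopping at the first non-zero exit status.

    Returns the names of the steps that completed successfully.
    """
    completed: List[str] = []
    for name, cmd in steps:
        status = runner(cmd)
        if status != 0:
            print(f"nightly build failed at step: {name} (exit {status})")
            break
        completed.append(name)
    return completed


if __name__ == "__main__":
    # Dry run: print each command instead of executing it.
    def echo(cmd: Sequence[str]) -> int:
        print("would run:", " ".join(cmd))
        return 0

    print(run_nightly(runner=echo))
```

Passing the runner in as a parameter lets the sequence be dry-run or tested without touching the real server; a cron job around midnight would call it with the real `subprocess` runner.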

However, we still stick to a milestone plan for a few reasons (at least that I can think of off the top of my head):

  1. It makes it easier to plan, estimate, and negotiate with users on when certain features can or might land in the system.
  2. A milestone gives us a stable point to look at, even if nightlies get a bit flaky or unstable.
  3. A milestone target lets us group related features together for deployment as a group.
  4. A somewhat stable target lets users identify bugs, or comment about feature implementations (etc.) without the target moving from under them. It’s frustrating to spend time thinking about one’s opinion of an application only for it to change by the time one gets a chance to tell someone. The nightlies will show progress, but the milestones give us something to talk about.

There are others—dinner forces them out of my head at the moment.

That said, let me close with a current “rake stats” snapshot of where we’re at. I’ll try to post these from time to time so that readers can get a low-granularity sense of what’s going on behind the scenes.

+----------------------+-------+-------+---------+---------+-----+-------+
| Name                 | Lines |   LOC | Classes | Methods | M/C | LOC/M |
+----------------------+-------+-------+---------+---------+-----+-------+
| Helpers              |   317 |   157 |       0 |      14 |   0 |     9 |
| Controllers          |   884 |   632 |       8 |      60 |   7 |     8 |
| APIs                 |     0 |     0 |       0 |       0 |   0 |     0 |
| Components           |     0 |     0 |       0 |       0 |   0 |     0 |
|   Functional tests   |  1470 |  1057 |      24 |     102 |   4 |     8 |
| Models               |  1261 |   668 |      25 |      54 |   2 |    10 |
|   Unit tests         |  2413 |  1823 |      17 |     209 |  12 |     6 |
| Libraries            |   545 |   255 |       4 |      25 |   6 |     8 |
+----------------------+-------+-------+---------+---------+-----+-------+
| Total                |  6890 |  4592 |      78 |     464 |   5 |     7 |
+----------------------+-------+-------+---------+---------+-----+-------+
  Code LOC: 1712     Test LOC: 2880     Code to Test Ratio: 1:1.7
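The summary line can be re-derived from the table as a quick sanity check. The Python sketch below copies the rows verbatim and treats every non-test row as "code". The per-row M/C and LOC/M columns are consistent with integer division, with 2 subtracted from LOC/M (presumably to discount each method's `def`/`end` lines); that formula is inferred from the numbers in this table, not taken from the rake task's source.

```python
from typing import List, NamedTuple, Tuple


class Row(NamedTuple):
    name: str
    lines: int
    loc: int
    classes: int
    methods: int
    is_test: bool


# Copied from the "rake stats" table above.
ROWS: List[Row] = [
    Row("Helpers", 317, 157, 0, 14, False),
    Row("Controllers", 884, 632, 8, 60, False),
    Row("APIs", 0, 0, 0, 0, False),
    Row("Components", 0, 0, 0, 0, False),
    Row("Functional tests", 1470, 1057, 24, 102, True),
    Row("Models", 1261, 668, 25, 54, False),
    Row("Unit tests", 2413, 1823, 17, 209, True),
    Row("Libraries", 545, 255, 4, 25, False),
]


def ratios(classes: int, methods: int, loc: int) -> Tuple[int, int]:
    """M/C and LOC/M as the table appears to compute them (inferred)."""
    m_over_c = methods // classes if classes else 0
    loc_over_m = loc // methods - 2 if methods else 0
    return m_over_c, loc_over_m


def summary(rows: List[Row]) -> Tuple[int, int, float]:
    """Return (code LOC, test LOC, test LOC per code LOC to one decimal)."""
    code_loc = sum(r.loc for r in rows if not r.is_test)
    test_loc = sum(r.loc for r in rows if r.is_test)
    return code_loc, test_loc, round(test_loc / code_loc, 1)


if __name__ == "__main__":
    code, test, ratio = summary(ROWS)
    # prints: Code LOC: 1712  Test LOC: 2880  Code to Test Ratio: 1:1.7
    print(f"Code LOC: {code}  Test LOC: {test}  Code to Test Ratio: 1:{ratio}")
```

Every row, and the Total row (78 classes, 464 methods, 4592 LOC giving 5 and 7), matches the printed table under this reading.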

update:

I forgot to include a screenshot from our current nightly build! Here’s a shot of the (AJAXified) in-place address editor for a staff member (me). Our model for people and organizations closely mirrors Martin Fowler’s Party and Accountability patterns from his Analysis Patterns text (worth the money at just about any price).

Here’s a shot (hopefully, we’ll all laugh at this one day). Yes, I know there’s not much there to see, but there’s a lot under the hood, and still not a lot of code. We’re happy with where we are so far. Enjoy.


Site upgrade

Posted by rick Sun, 20 Nov 2005 17:10:00 GMT

This site is now running on lighttpd & FastCGI, fronted by Apache running mod_proxy. I had been running a number of sites on this server using plain Apache/FastCGI, but I’ve found the stability of that combination to be far from ideal. The Apache/FastCGI linkage seems fairly fragile: sometimes the Rails controller doesn’t get a chance to fire properly, resulting in HTTP 500 errors; sometimes an FCGI process gets seemingly disconnected and goes into a tailspin, sucking CPU like crazy; and overall, Apache/FastCGI really isn’t (fast, that is).

Since we’re using Apache 1.3.33 on this FreeBSD server, however, I can’t have Apache pass the original hostname through to lighttpd (that’s an Apache 2 feature), at least not without digging up a patch I’ve yet to find and then risking a rebuild of Apache. So I’m running a separate lighttpd instance for each Rails site (there are 5 on this server), each on a different port. It’s less than ideal, but it seems to be working, and it’s fast.
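To make the port-per-site arrangement concrete, here is a small Python sketch that emits an Apache 1.3 configuration with one name-based `<VirtualHost>` per Rails site, each proxying to its own lighttpd port via mod_proxy's `ProxyPass`/`ProxyPassReverse`. The hostnames, the ports, and the `render_config` helper are made up for illustration. (Under Apache 2, `ProxyPreserveHost On` forwards the original `Host` header, which is what lets a single lighttpd process tell the sites apart.)

```python
from typing import Dict

# Hypothetical site -> lighttpd port mapping: one lighttpd instance per site,
# because Apache 1.3's mod_proxy does not forward the original Host header.
SITES: Dict[str, int] = {
    "blog.example.com": 8001,
    "wiki.example.com": 8002,
}


def vhost_block(hostname: str, port: int) -> str:
    """Return one VirtualHost block that proxies every request to one lighttpd port."""
    backend = f"http://127.0.0.1:{port}/"
    return "\n".join([
        "<VirtualHost *>",
        f"    ServerName {hostname}",
        f"    ProxyPass / {backend}",
        f"    ProxyPassReverse / {backend}",
        "</VirtualHost>",
    ])


def render_config(sites: Dict[str, int]) -> str:
    """Build the full proxy config: NameVirtualHost plus one block per site."""
    # Each lighttpd instance binds its own port, so ports must be distinct.
    if len(set(sites.values())) != len(sites):
        raise ValueError("each site needs its own lighttpd port")
    blocks = [vhost_block(host, port) for host, port in sorted(sites.items())]
    return "\n\n".join(["NameVirtualHost *"] + blocks)


if __name__ == "__main__":
    print(render_config(SITES))
```

Adding a sixth site means picking a free port, starting another lighttpd on it, and regenerating the config, which is exactly the per-site overhead the Apache 2 setup avoids.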

Our CenterNet milestone sites and nightly build sites are using a similar configuration—a subdomain per site. But there we’re fronting with Apache2, and hence only need 1 lighttpd process for the whole enchilada.

If anyone knows an easy way around this Apache 1.3 plus multiple-lighttpd-instances-on-different-ports setup, I’d be pleased to hear it for my personal sites.


Evaluation: moving from Java to Ruby on Rails

Posted by rick Wed, 16 Nov 2005 16:32:00 GMT

In September (or so) we went through the process of deciding whether to keep plowing ahead with the JBoss Java stack we were building on or to pursue an alternative technology. We prototyped part of our first component (of 6) in Ruby on Rails, and then did a test re-implementation of the full component in Ruby on Rails.

The productivity increase (and code footprint decrease) was basically staggering. We undertook a full analysis of the consequences of shifting our development from our Java stack to a Ruby on Rails platform. Ultimately we decided to shift from Java to Ruby on Rails.

The summary document of the issues (edited to protect the guilty :-) can be found on this site at Evaluation: moving from Java to Ruby on Rails for the CenterNet rewrite.
