Wednesday, September 5, 2007

Dealing with Maven bloat and complexity

I recently came across an article questioning whether Maven is too complex and bloated. The short answer is that it can be. If you don't take some steps up front to deal with Maven's quirks, the benefit it provides can quickly be outweighed by the administrative overhead and performance hit its project management scheme imposes.

At my job we have a fairly sophisticated Maven project. It involves custom code generation plugins, cross-platform C++ compiles and deployments, assemblies, you name it. The number of engineers working on this project is probably around 80, and the number of Maven projects for this system is probably around 50.

It's not the most beautiful system in the world, but at the end of the day a lot of Maven's benefits are realized. Your average developer can check out an individual component from src, run "mvn test", and be reasonably assured that things will compile, run, and execute unit tests, with minimal up-front configuration of the build environment.

This is no small feat. It didn't start off easy. Maven is complicated. It's large. Documentation could no doubt be better. But at the end of the day, we're in a better place than we would have been with ant or make. Some of the steps we've taken to make things easier follow.

  1. Create your own repository. If you want to be able to do repeatable builds, don't let your projects access any repositories outside your control. Poms get broken, sites go offline... this causes all sorts of chaos. We ended up with a single repository server, but multiple repositories. The salient ones are maven-releases, maven-snapshots, our-releases, and our-snapshots. We put the artifacts we needed into maven-releases and maven-snapshots manually. We added profiles that can be enabled to get to the public repositories, but these are never enabled on our automated build machines.
  2. Use inheritance correctly. It's really tempting to use pom inheritance to capture project structure, but that is what module tags are for. Pom inheritance exists to let you apply similar configuration to many projects easily (much like Java inheritance). We started out the wrong way, and it took us a long time to unwind the mess. We now have a base-java pom that sets up all the reports we want to run for Java, a base-model one to handle domain models, etc.
  3. Fight the urge to tightly couple your large project. Maven leads you to fine-grained componentization, which leads to a looser coupling of components in the build/release sense. There is a natural tendency to be uncomfortable with this ("what's really going into my final build?"), but fight the urge to make it one giant system that gets built from the ground up. Executing releases is a nightmare with one giant system.
  4. Use version ranges where possible. This makes dealing with #3 easier. The odds that a component needs one specific release of another component (especially if you're doing agile and releasing every 30 days) are pretty slim. Most just need the latest. Also get familiar with the dependency convergence report.
  5. Make sure people understand what Maven goals are necessary for doing work. If lots of people are complaining about site generation taking too long, you have a clue that people don't get this. Your average developer should be running test and install, very rarely site.
  6. Decide on a versioning scheme up front, and make sure you can execute on it. Nothing is more frustrating than realizing it takes two weeks to get all your poms revved to the next revision. The Maven release plugin has been fairly unreliable for us, but it leads you into a set of best practices that work even if you're taking the steps manually.
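
Items 1 and 4 above can be sketched in a single pom. This is only an illustration, not our actual configuration: the host repo.example.internal, the com.example coordinates, the public-repos profile id, and the [1.0,2.0) range are all made up. It also assumes that Maven's built-in "central" repository is redefined to point at the internal maven-releases repository, so that nothing is fetched from outside unless the profile is switched on. Plugin repositories would need the same treatment.

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example</groupId>
  <artifactId>webapp-model</artifactId>
  <version>1.0-SNAPSHOT</version>

  <repositories>
    <!-- Assumption: reusing the id "central" replaces Maven's default public repository -->
    <repository>
      <id>central</id>
      <url>http://repo.example.internal/maven-releases</url>
      <releases><enabled>true</enabled></releases>
      <snapshots><enabled>false</enabled></snapshots>
    </repository>
    <!-- Our own released artifacts (hypothetical URL) -->
    <repository>
      <id>our-releases</id>
      <url>http://repo.example.internal/our-releases</url>
      <releases><enabled>true</enabled></releases>
      <snapshots><enabled>false</enabled></snapshots>
    </repository>
    <!-- Our own snapshot artifacts (hypothetical URL) -->
    <repository>
      <id>our-snapshots</id>
      <url>http://repo.example.internal/our-snapshots</url>
      <releases><enabled>false</enabled></releases>
      <snapshots><enabled>true</enabled></snapshots>
    </repository>
  </repositories>

  <profiles>
    <!-- Opt-in access to the public repository; never activated on build machines -->
    <profile>
      <id>public-repos</id>
      <repositories>
        <repository>
          <id>public-central</id>
          <url>https://repo.maven.apache.org/maven2</url>
        </repository>
      </repositories>
    </profile>
  </profiles>

  <dependencies>
    <!-- Version range: any 1.x release of webapp-common, 1.0 inclusive up to 2.0 exclusive -->
    <dependency>
      <groupId>com.example</groupId>
      <artifactId>webapp-common</artifactId>
      <version>[1.0,2.0)</version>
    </dependency>
  </dependencies>
</project>
```

With a setup along these lines, a developer who really needs something from the public repository would run "mvn -Ppublic-repos install", while a plain "mvn install" stays inside the repositories you control.
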
After taking these steps, the nearly universal sentiment is that while Maven is indeed complicated, it's a step forward when compared to ant/make. At the end of the day, designing a build process for large systems is difficult and rarely gets the attention from the development org it deserves. Maven doesn't make it dirt simple, but it makes the overall management of the build system simpler.

5 comments:

Per Olesen said...

Hi Mc,

You have some interesting points, I will give you.

You write "... tempting to use pom inheritance to capture project structure..." and talk about some way you have figured out to use pom inheritance.

Sounds interesting. Can you elaborate more on that or point me in the direction of some official maven documentation that mentions the right/best way to use pom inheritance?

mccv said...

When we first started, we tried to use POM inheritance in a way that matched our project structure. Say I have a webapp that contains three components. My directory structure looks like

webapp
webapp-model
webapp-view
webapp-common

We first tried making webapp-model inherit from webapp. This is attractive in some ways, but now let's assume that I have ten different webapp projects with a similar structure (which is what we have right now). In that case all *-model projects will have similar aspects, whether it's the reports you want to run, various dependencies, etc.

Once you hit this, you want to be able to quickly apply changes to all the *-model projects. Creating a base pom that contains these sorts of definitions and inheriting from that has made things much easier. It means you have to start using module tags in the higher level webapp pom, but that's a small price to pay.
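
As a rough sketch (the com.example coordinates and version numbers are invented for illustration), a webapp-model pom that inherits from a shared base-model pom rather than from the webapp pom above it might look like this:

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>

  <!-- Configuration comes from the shared base pom, not from the webapp pom -->
  <parent>
    <groupId>com.example</groupId>
    <artifactId>base-model</artifactId>
    <version>1.0</version>
    <!-- Empty relativePath: resolve the parent from the repository, not from ../pom.xml -->
    <relativePath/>
  </parent>

  <!-- groupId is inherited from the parent; only the artifact's own identity is declared -->
  <artifactId>webapp-model</artifactId>
  <version>1.0-SNAPSHOT</version>
</project>
```

The webapp pom then does nothing but aggregate: it has pom packaging and lists the three component directories as module entries, so building from the top still builds everything.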

Does that answer the question? Feel free to ping me via email if you want more specifics.

Anonymous said...

Okay, thank you for taking the time to clarify. I think I understand you.

If I have a directory structure like this:

./pom.xml
  ./common/pom.xml
  ./service/pom.xml
  ./webapp/pom.xml

where common, service and webapp are modules and all three inherit ./pom.xml, then we are using pom inheritance like you like it!?

mccv said...

That's actually the opposite of what I'd recommend on a large project... Eventually you'll probably find that the various "common" projects have more in common with each other than they do with the base pom.xml in your directory hierarchy.

As an example, in our case each server component runs test cases differently than client components do. They need to get deployed to a server and tested there, so we create a base server-pom.xml, and have all the server projects inherit from that. If your test strategy changes, you can apply that to all your server components, rather than having to hunt through each one and make the manual change.

On the other hand, the way you have it described is a convenient way to share versioning/deployment info. At some point there is probably a tipping point that nudges you from one way to another... for us it was when we had 5-10 projects that had common subcomponents. Your mileage may vary...

Anonymous said...

Aah, okay. Now I see. Interesting. This seems to have its full potential unleashed on very large projects only. Which is also what you are describing.

Currently, I am more biased to trying to keep my builds small, so I won't get into this problem at all (I hope).

My hope is that we are able to split up very large systems into smaller, manageable, self-contained services. The overall system will then only exist as a consequence of the smaller service systems operating together (dare I say SOA).

We then have other complexity, such as ensuring that services are compatible with each other when they are built, versioned and deployed independently. Do you have any experience in coping with large system complexity in that way?