Friday, January 23, 2015

Artefact Repositories

Working from the same set of components helps to catch problems early

In my last posts I provided an overview of version control and the use of a continuous integration server. This post adds an artefact repository to the development infrastructure. An artefact repository is a versioned store for compiled code components (such as jar files) and helps teams use and manage both 3rd-party and in-house developed components.

Most of my recent experience has been in the Java world, using Apache Maven to manage builds, dependencies and various other tasks. I came across Maven several years ago and its dependency management feature quickly sold me. At the time I’d joined a team whose development documentation required me to locate a range of library and binary files scattered across the web. As I set up my development environment it became apparent that a number of the required components were no longer hosted at their original locations and I had to ask team members to send me their copies. This is maddening enough when the team is in the same location, but it completely collapses if you are remote - especially if you’re working in an open source community.

At first I wrestled with the idea of storing the files in our Subversion repository but it occurred to me that we’d still need some sort of system/script for bringing these files together and making sure the correct versions were being used. Likewise for setting up an FTP/HTTP accessible file store. After a fair bit of research I decided that Maven really gave me what I was after: the ability to manage in-house and 3rd-party components used by the build.

I’ve enjoyed using Maven but the XML configuration can get tedious, and lately I’ve been enjoying working with Gradle. Gradle’s dependency management supports both Maven and Ivy repositories, and Gradle has (improving) support for publishing to these repositories as well.
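As a taste of the declarative style (the coordinates below are purely illustrative), a Gradle build script declares a repository and its dependencies in a few lines:

```groovy
// build.gradle - resolve dependencies from the Central Repository
repositories {
    mavenCentral()
}

dependencies {
    // group:artifact:version coordinates, fetched and cached on first build
    compile 'org.slf4j:slf4j-api:1.7.7'
}
```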

Once you discover the benefits of an artefact repository in its “pull” model (where you’re primarily grabbing components out of it), you can extend the advantages by setting up a local repository for managing your own components. Beyond that, you can start distributing your components to central artefact repositories so that they can be easily utilised by other developers.

Note that Apache Maven is a build tool, not an artefact repository. By default, Maven uses the Central Repository to source dependencies and caches them in your home directory (~/.m2/repository/). This article encourages you to look at the benefits of running your own artefact repository.
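As a sketch of what this looks like in practice, Maven can be routed through your own repository manager by declaring a mirror in ~/.m2/settings.xml (the host name below is a placeholder for your own server):

```xml
<settings>
  <mirrors>
    <mirror>
      <id>local-nexus</id>
      <!-- Send requests for all remote repositories through the local manager -->
      <mirrorOf>*</mirrorOf>
      <url>http://nexus.example.com/content/groups/public/</url>
    </mirror>
  </mirrors>
</settings>
```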

What are the benefits?

Here are some of the key reasons to use an artefact repository:

They stop you needing to hunt out library files from the web, network storage or your colleague’s USB drive.
Setup documentation that includes “get a copy of CrazyLib.jar from …” is just asking for trouble. Instead, you define a dependency in your build configuration and the build tool downloads a copy from the artefact repository.
Harmonise the libraries and their versions across the team
Reduce the “it works on my laptop” and “d’oh, we’ve been working on different versions for the past 3 months” moments by defining the build configuration used across the team.
Real code re-use
To my mind the model of an artefact repository makes code reuse much more achievable than other approaches I’ve encountered. Structuring functionality into components helps improve system architecture and, when done right, means that a project can publish re-usable components based on specific functionality.
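To make the first benefit concrete: rather than a “download this jar” instruction, the dependency is declared in the build configuration. In a Maven pom.xml it looks like this (the CrazyLib coordinates are, of course, hypothetical):

```xml
<dependency>
  <groupId>com.example</groupId>
  <artifactId>crazylib</artifactId>
  <version>1.2.0</version>
</dependency>
```

The build tool resolves the component from the artefact repository and caches it locally - no hunting required.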

The role of the artefact repository

(Figure: development infrastructure example)
Proxy for other repositories
The Central Repository provides a huge array of components to the Maven community. By setting up a local proxy for this repository you can cut down on network traffic and help speed up builds.
Local store of 3rd-party modules
When you start using a 3rd party module (perhaps a jar or war) that isn’t available in another repository you can upload it to your local repo and make it available to the whole team.
Working with your continuous integration (CI) environment
The CI environment should be charged with publishing SNAPSHOTs to your repository as a post-build step. This means that other (sub-)teams will pull down the newest version of the component if they happen to be using it.
I’ve also found it extremely useful to configure a CI build that performs a clean-room build on a daily basis. This ensures that the CI system doesn’t use its cache of components and downloads everything configured as a dependency. This can be handy in discovering if “lingering” components are generating false positives in the build.
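A clean-room build can be approximated with the Maven dependency plugin - a sketch, assuming a standard Maven setup:

```shell
# Remove cached dependencies so the build has to re-download everything
mvn dependency:purge-local-repository

# -U forces Maven to re-check the remote repositories for updates
mvn -U clean verify
```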
Release hosting
When ready to release you switch from publishing SNAPSHOT (development) components to the production (release) version. Artefact repositories can (and usually do) host both, demarcating stable from non-stable components.
Repositories such as Sonatype Nexus can also provide interfaces used by package managers such as yum. This allows you to distribute your packages into the same repository as your code components.

Backing up your repository is really important as it will (over time) start to contain components that you may not be able to recover/rebuild in the case of a system failure.


My primary experience has been with Sonatype’s Nexus product - the OSS version specifically. There are a few other repository managers worth investigating, such as Apache Archiva and JFrog Artifactory, and a handy comparison of the various options is also available.

More than just code modules

At the core of a repository such as Nexus is a filesystem that tracks versions. This gives you a piece of infrastructure that can handle the various components and packages that aren’t well-suited to version control systems such as Git. Here are a few ways in which I’ve utilised an artefact repository:

Host virtual machine images
For this work I created a build environment in Rake that used Packer to generate virtual machine images and store them in Nexus. Team members and the build environment could then use a Vagrantfile to grab a copy of the image, ensuring that we had a stable platform for testing and creating demo systems. It gives you “platform defined as code”.
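As a sketch (the box name and URL are placeholders), the Vagrantfile simply pointed the box at the image hosted in Nexus:

```ruby
# Vagrantfile - fetch the Packer-built image from the artefact repository
Vagrant.configure("2") do |config|
  config.vm.box     = "team-base-box"
  config.vm.box_url = "http://nexus.example.com/content/repositories/vm-images/team-base-box-1.0.box"
end
```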
Deployment file store
Using tools such as Puppet has become increasingly prevalent, but I notice many people resort to using version control to store the large files needed during deployment. Ideally you might set up a local package repository, but converting an existing application distribution to (say) RPM can be time-consuming. Artefact repositories provide a middle ground: a system for hosting versioned files behind an HTTP-based access point.
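Once a file is uploaded to the repository, a deployment script can fetch a specific version over plain HTTP (the path below is a placeholder):

```shell
# Pull version 2.1 of a deployment archive from the artefact repository;
# -f fails on HTTP errors, -L follows redirects
curl -fL -o app-data.tar.gz \
  "http://nexus.example.com/content/repositories/releases/com/example/app-data/2.1/app-data-2.1.tar.gz"
```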
Production build proxy
I was recently involved in a project that utilised Python, and I noticed that the “release” onto the production server involved the deployment tool dragging a heap of libraries from the Python Package Index (PyPI). I may be a bit old-school, but the idea of a production system grabbing libraries from the public Internet makes me rather queasy. My suggestion was (at the very least) to set up a local proxy such as devpi to divert the egress into something a little more “controllable”. A number of artefact repositories provide services for a variety of package managers, providing a central repo for many uses.
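With devpi (or similar) in place, clients are pointed at the proxy rather than the public index - for example in ~/.pip/pip.conf (the host and port shown are devpi's defaults; adjust for your own server):

```ini
[global]
# route pip installs through the local proxy instead of the public PyPI
index-url = http://localhost:3141/root/pypi/+simple/
```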


Over these past 3 posts I’ve covered version control, continuous integration and now artefact repositories. I hope these introductions describe the utility of each piece of development infrastructure.

Thursday, January 15, 2015

Continuous Integration

“It works on my machine” really translates to “I don’t know why it works - I just clicked buttons”, and the offender should be forced to buy lunch for the team
Continuous Integration (CI) focusses on ensuring that a project’s code successfully builds whenever the code base (usually stored in a version control system) changes. The “continuous” aspect relates to the fact that the build is run every time code is checked in (committed). Given that some approaches see each team member commit code several times a day, the CI system may be quite busy. Having worked in a team where one member kept committing code that broke the build, the CI approach helped us determine where problems were coming from and saved us the time misspent assuming our own code was at fault (svn blame[1]).
In fact you could have a CI process that rolls back a commit that breaks the build.
The CI approach differs from approaches such as “nightly builds” in that it really is continuous, helping to make sure people only commit code that doesn’t break the build[2]. This constant feedback loop should trigger a fix immediately, whilst the code is “top of mind”. An associated work practice is “pull often, commit small, commit often” so that the team works in step and the CI process captures issues before they get too big[3].
This may sound a little complex but a CI process really has only 3 responsibilities:
  1. Trigger a build when new code is checked into version control[4]
  2. Run the build and its associated unit tests
  3. Report on any failures
Your CI process doesn’t have to be a huge bells-and-whistles affair. In the most basic case it may be a desktop PC that developers walk over to and manually start a build on. The use of a separate system for CI helps reduce (but not eliminate) the false positives that occur when a build succeeds on a developer’s machine because of the miscellaneous debris crawling around developer laptops (old versions, libraries on classpaths that aren’t included in the build config, etc.).
There are many CI systems around that help you get going with a more automated CI environment - Jenkins is a popular open source option and an easy place to start. If you’re after an online service, take a look at Travis - I’ve not used it but it gets good press.
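For a hosted service such as Travis, the configuration can be as small as a few lines in a .travis.yml at the root of the repository - a sketch for a Maven-based project:

```yaml
# .travis.yml - Travis triggers this build on every push
language: java
jdk:
  - oraclejdk7
# -B keeps Maven's output batch-friendly for the build log
script: mvn -B clean verify
```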

Automating your build

Automating your build is extremely useful in establishing your CI environment but, beyond this, it’s a good candidate for the “best practice” list. In a CI environment an (efficient) automated build is essential, as the build must be possible without manual intervention. Long build times may indicate that the build needs to be broken up into smaller components or that your tests are drifting away from unit tests towards integration testing.


In a coding approach where you use branching workflows you might have the CI system “watch” only certain branches. Furthermore, it could be useful to consider a Read Only Master Branch in which individuals/features/ideas/etc have their own branch but a merge to the master branch is tested before being accepted.

Component-based development

A CI server can be very useful in projects that have teams working on separate components. For example, Apache Maven-based projects can have their CI server deploy SNAPSHOT artefacts to an artefact server (such as Sonatype Nexus). This means that other developers with that dependency will have the new component version downloaded the next time they run a build.
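In Gradle a similar arrangement can be sketched with the maven-publish plugin (the group, version and repository URL below are placeholders):

```groovy
// build.gradle - publish the project jar to a snapshot repository
apply plugin: 'java'
apply plugin: 'maven-publish'

group   = 'com.example'
version = '1.0.0-SNAPSHOT'

publishing {
    publications {
        // publish the jar produced by the java plugin
        mavenJava(MavenPublication) {
            from components.java
        }
    }
    repositories {
        maven {
            url 'http://nexus.example.com/content/repositories/snapshots'
        }
    }
}
```

The CI server then runs the `publish` task as a post-build step.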

Steak knives

As you develop your CI infrastructure you can start exploring a number of toolsets that can further aid the development effort:
  • Software metric tools such as SonarQube can help you home in on areas of weakness in your code quality (e.g. duplication, dodgy coding practices or a lack of documentation)
    • These are best run less frequently (nightly) as they can be time consuming
  • Generate your documentation such as your javadoc or Maven site on a nightly basis
  • Create a “clean room” build that freshly downloads all dependencies before building - this really helps catch issues such as 3rd-party libraries that just disappear.
  • Deploy an instance of the built service into a virtual machine for user testing, interface testing (e.g. with Selenium) or integration tests[5]
    • Also best run less frequently - especially if you’re going to be soaking up a fair bit of system resources.
  • Get Chuck Norris in to make sure people know you’re serious!

Test drive

Most CI systems are quite easy to install and get running - even just on your laptop. I’d suggest that the best first step is to allocate 3–4 hours to install a CI system (try Jenkins), configure a job for your main codebase, run it and see how it goes. Then, add in Chuck Norris.

  1. This was used as an in-joke - see the SVN Book - but, seriously, the CI server is not a torture device that lets everyone insult a team member.  ↩
  2. In a regular environment I consider the build to be broken when it won’t compile or a unit test fails. Usually a less-frequent build would run other tests (integration, UI etc) but I usually label failures differently (e.g. “broke the deployment”)  ↩
  3. It can be useful to manually trigger the more comprehensive build and test suite once a feature is complete - why wait until tomorrow to see if it’s broken?  ↩
  4. Look at GitHub and BitBucket (Web)hooks for the push-model approach.  ↩
  5. Most of my integration testing experience has been manual or script-based. However, projects such as Citrus Framework look to provide a good basis for easily established integration tests.  ↩