This works great while the codebase is small, code flux is reasonable, and tests are fast. As a codebase grows over time, the effectiveness of such a system decreases. As more code is added, each clean run takes much longer and more changes get crammed into a single run. If something breaks, finding and backing out the bad change is a tedious and error-prone task for development teams.
Software development at Google is big and fast. The codebase receives 20+ code changes per minute and 50% of the files change every month! Each product is developed and released from ‘head’, relying on automated tests to verify the product's behavior. Release frequency varies from multiple times per day to once every few weeks, depending on the product team.
With such a huge, fast-moving codebase, it is possible for teams to get stuck spending a lot of time just keeping their build ‘green’. A continuous integration system should help by pinpointing the exact change at which a test started failing, rather than reporting a range of suspect changes or requiring a lengthy binary search for the offending change. To find the exact change that broke a test, we could run every test at every change, but that would be very expensive.
To solve this problem, we built a continuous integration system that uses dependency analysis to determine all the tests a change transitively affects and then runs only those tests for every change. The system is built on top of Google’s cloud computing infrastructure, which enables many builds to be executed concurrently, so the system can run affected tests as soon as a change is submitted.
Here is an example where our system can provide faster and more precise feedback than a traditional continuous build. In this scenario, there are two tests and three changes that affect these tests. The gmail_server_tests are broken by the second change; however, a typical continuous integration system can only tell that either change #2 or change #3 caused this test to fail. By using concurrent builds, we can launch tests without waiting for the current build/test cycle to finish. Dependency analysis limits the number of tests executed for each change, so that in this example, the total number of test executions is the same as before.
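The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the mapping of changes to affected tests, the second test name (gmail_client_tests), the points at which the traditional build runs, and every function name are hypothetical, chosen so that the numbers match the scenario described above.

```python
# Hypothetical data: the tests each change transitively affects.
AFFECTED = {
    1: {"gmail_client_tests", "gmail_server_tests"},
    2: {"gmail_server_tests"},
    3: {"gmail_client_tests"},
}
ALL_TESTS = {"gmail_client_tests", "gmail_server_tests"}

# Ground truth used to simulate test results: test -> change that broke it.
BREAKING_CHANGE = {"gmail_server_tests": 2}


def passes(test, change):
    """Simulated test result at a given change."""
    broke_at = BREAKING_CHANGE.get(test)
    return broke_at is None or change < broke_at


def batched_suspects(test, build_points):
    """Traditional build: every test runs, but only at some changes.

    Returns the range of changes that could have broken the test.
    """
    last_green = 0
    for change in build_points:
        if passes(test, change):
            last_green = change
        else:
            return list(range(last_green + 1, change + 1))
    return []


def per_change_culprit(test):
    """Per-change build: only affected tests run, but at every change."""
    for change in sorted(AFFECTED):
        if test in AFFECTED[change] and not passes(test, change):
            return change
    return None


if __name__ == "__main__":
    build_points = [1, 3]  # assumed: the traditional build skips change 2
    print(batched_suspects("gmail_server_tests", build_points))  # [2, 3]
    print(per_change_culprit("gmail_server_tests"))              # 2
    batched_runs = len(ALL_TESTS) * len(build_points)
    per_change_runs = sum(len(tests) for tests in AFFECTED.values())
    print(batched_runs, per_change_runs)                         # 4 4
```

With this made-up data, both strategies execute four test runs, but only the per-change strategy names change #2 directly.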
Let’s look deeper into how we perform the dependency analysis.
We maintain an in-memory graph of coarse-grained dependencies between tests and build rules across the entire codebase. This graph, several GB in memory, is kept up to date with each change that gets checked in. This allows us to transitively determine all tests that depend on the code modified in a given change and hence need to be re-run to know the current state of the build. Let’s walk through an example.
Consider two sample projects, each containing a different set of tests:
where the build dependency graph looks like this:
Case 1: Change in common library
As soon as this change is submitted, we start a breadth-first search to find all tests that depend on it.
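A breadth-first search of this kind might look like the following minimal Python sketch. It is not the production system: the target names, the `DEPS` and `TESTS` tables, and the function names are hypothetical stand-ins, and a real graph would be updated incrementally rather than inverted on every query.

```python
from collections import deque

# Hypothetical forward dependencies: target -> targets it depends on.
DEPS = {
    "gmail_client_tests": ["gmail_client"],
    "gmail_server_tests": ["gmail_server"],
    "gmail_client": ["common_lib"],
    "gmail_server": ["common_lib"],
    "other_tests": ["other_lib"],
    "other_lib": [],
    "common_lib": [],
}
TESTS = {"gmail_client_tests", "gmail_server_tests", "other_tests"}


def reverse_graph(deps):
    """Invert the graph: target -> targets that directly depend on it."""
    rdeps = {target: set() for target in deps}
    for target, needed in deps.items():
        for dep in needed:
            rdeps.setdefault(dep, set()).add(target)
    return rdeps


def affected_tests(changed, deps, tests):
    """Breadth-first search from the changed targets along reverse edges."""
    rdeps = reverse_graph(deps)
    seen = set(changed)
    queue = deque(changed)
    while queue:
        target = queue.popleft()
        for dependent in rdeps.get(target, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    # Only test targets need to be scheduled.
    return seen & tests


if __name__ == "__main__":
    # A change in the shared library reaches both gmail test targets.
    print(sorted(affected_tests(["common_lib"], DEPS, TESTS)))
    # A change in one server target reaches only that server's tests.
    print(sorted(affected_tests(["gmail_server"], DEPS, TESTS)))
```

The search walks reverse edges so that its cost scales with the number of targets downstream of the change, not with the size of the whole graph.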
The example above illustrates how we optimize the number of tests run per change without sacrificing the accuracy of end results for a project. Running fewer tests per change allows us to run all affected tests for every change that gets checked in, making it easier for a developer to detect and deal with an offending change.
The use of smart tools and cloud computing infrastructure in the continuous integration system makes it fast and reliable. While we are constantly working on improving this system, thousands of Google projects are already using it to launch and iterate quickly, and hence make faster user-visible progress.