Testing at the speed and scale of Google

Continuous integration systems play a crucial role in keeping software working while it is being developed. The basic steps most continuous integration systems follow are:

1. Get the latest copy of the code.
2. Run all tests.
3. Report results.
4. Repeat 1-3.

This works great while the codebase is small, code flux is reasonable and tests are fast. As a codebase grows over time, the effectiveness of such a system decreases. As more code is added, each clean run takes much longer and more changes gets crammed into a single run. If something breaks, finding and backing out the bad change is a tedious and error prone task for development teams.

Software development at Google is big and fast. The code base receives 20+ code changes per minute and 50% of the files change every month! Each product is developed and released from ‘head’ relying on automated tests verifying the product behavior. Release frequency varies from multiple times per day to once every few weeks, depending on the product team.

With such a huge, fast-moving codebase, it is possible for teams to get stuck spending a lot of time just keeping their build ‘green’. A continuous integration system should help by providing the exact change at which a test started failing, instead of a range of suspect changes or doing a lengthy binary-search for the offending change. To find the exact change that broke a test, we could run every test at every change, but that would be very expensive.

To solve this problem, we built a continuous integration system that uses dependency analysis to determine all the tests a change transitively affects and then runs only those tests for every change. The system is built on top of Google’s cloud computing infrastructure enabling many builds to be executed concurrently, allowing the system to run affected tests as soon as a change is submitted.

Here is an example where our system can provide faster and more precise feedback than a traditional continuous build. In this scenario, there are two tests and three changes that affect these tests. The gmail_server_tests are broken by the second change, however a typical continuous integration system will only be able to tell that either change #2 or change #3 caused this test to fail. By using concurrent builds, we can launch tests without waiting for the current build/test cycle to finish. Dependency analysis limits the number of tests executed for each change, so that in this example, the total number of test executions is the same as before.

Let’s look deeper into how we perform the dependency analysis.

We maintain an in-memory graph of coarse-grained dependencies between various tests and build rules across the entire codebase. This graph, several GBs in-memory, is kept up-to-date with each change that gets checked in. This allows us to transitively determine all tests that depend on the code modified in a given change and hence need to be re-run to know the current state of the build. Let’s walk through an example.

Consider two sample projects, each containing a different set of tests: 

where the build dependency graph looks like this:
We will see how two isolated code changes, at different depths of the dependency tree, are analyzed to determine affected tests, that is the minimal set of tests that needs to be run to ensure that both Gmail and Buzz projects are “green”.

Case1: Change in common library

For first scenario, consider a change that modifies files in common_collections_util.

As soon as this change is submitted, we start a breadth-first search to find all tests that depend on it.

Once all the direct dependencies are found, continue BFS to collect all transitive dependencies till we reach all the leaf nodes.

When done, we have all the tests that need to be run, and can calculate the projects that will need to update their overall status based on results from these tests.

Case2: Change in a dependent project:

When a change modifying files in youtube_client is submitted.

We perform the same analysis to conclude that only buzz_client_tests is affected and status of Buzz project needs to be updated:

The example above illustrates how we optimize the number of tests run per change without sacrificing the accuracy of end results for a project. A lesser number of tests run per change allows us to run all affected tests for every change that gets checked in, making it easier for a developer to detect and deal with an offending change.

Use of smart tools and cloud computing infrastructure in the continuous integration system makes it fast and reliable. While we are constantly working on making improvements to this system, thousands of Google projects are already using it to launch-and-iterate quickly and hence making faster user-visible progress.

– Pooja Gupta, Mark Ivey and John Penix

Source link