Remote Build Execution & Testcontainers at Paxos

Paxos underwent significant growth in 2023, increasing from 353,286 to 532,806 lines of Golang test code, growing its engineering headcount and launching industry-defining products. This growth exposed areas of strain across the company. In engineering, an emerging area of concern was the monorepo's continuous integration (CI) pipeline: its P90/P95/P99 runtimes and flake rates.

Monorepo woes

Developer productivity is a top priority at Paxos, and the performance and flakiness of the CI pipeline were clearly becoming a drag on velocity. With the uncached master pipeline's P99 spiking at ~40 minutes, its P90 at ~30 minutes and both trending upward, we set out to solve the problem.

Uncached build runtime percentiles prior to RBE & Testcontainers

Digging deeper into the P90/P95/P99 builds showed clear signs of resource contention, caused by the entire pipeline executing on a single EC2-hosted Jenkins instance. The main source of contention was our suite of medium-sized tests, sized medium because of their reliance on various Docker containers, all of which were spun up by a single docker-compose file in the build pipeline. We aimed to reduce contention by parallelizing work across more machines.

Bazel and BuildBuddy

We wanted something that could orchestrate parallel test execution across multiple isolated environments. Several tools and architectures claim to solve this problem, but we settled on Bazel, the open-source version of Google's internal build tool, with BuildBuddy as an enterprise feature provider. Compared to other solutions, this combination has several advantages:

  • Bazel handles dependency, build and other caching across machines. More importantly, the correctness of this caching is fundamental to Bazel's design, unlike other tools, such as Make or Gradle, where periodically running clean or rm commands is an accepted workaround.
  • Bazel and BuildBuddy manage the allocation of resources and distribution of work (e.g., test executions) across remote machines. Unlike CI systems that lock you into a custom API or topology you must work around to maximize parallelism, developers can stay blissfully ignorant of where their tests actually run.
  • Bazel and BuildBuddy scale horizontally. Our CI runs often use upwards of 1,000 executors in parallel, whereas other CI providers impose seemingly arbitrary limits on certain resource requests depending on the vendor's particular architecture. (A minimal configuration sketch follows this list.)
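To make this concrete, pointing Bazel at a remote execution service takes only a handful of flags. Below is a minimal .bazelrc sketch based on BuildBuddy's public setup documentation; the endpoints are BuildBuddy's, while the timeout and job count are illustrative values, not our production configuration.

    # .bazelrc: minimal remote build execution against BuildBuddy
    build --bes_results_url=https://app.buildbuddy.io/invocation/
    build --bes_backend=grpcs://remote.buildbuddy.io
    build --remote_cache=grpcs://remote.buildbuddy.io
    build --remote_executor=grpcs://remote.buildbuddy.io
    # Allow long-running medium tests to finish their remote actions.
    build --remote_timeout=3600
    # Schedule far more parallel actions than local cores would allow.
    build --jobs=200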

There are several alternatives to Bazel that offer most, if not all, of the same capabilities. We chose Bazel over all of them because it's the most mature, and because several former Google employees at Paxos have experience with it.

Improving test portability with Testcontainers

Sounds good, right? Unfortunately, there was a catch: our medium tests discussed earlier all depended on some of the containers spun up by that single docker-compose file. Spinning up every container in the compose file for every executor would be wasteful, configuring BuildBuddy to run docker-compose on each executor is non-trivial, and, most importantly, the compose file grew with every new container (or new version of an existing container) a developer wanted to depend on. This risked taking us back to the same world of resource contention we aimed to move away from!

We turned to Testcontainers, which solved the core problem by letting each test specify exactly which containers it depends on. As a bonus, we no longer had to maintain an ever-growing docker-compose file.

We started by ensuring all containers used in medium tests sat behind a facade, writing both a Testcontainers-based and a docker-compose-based implementation. By far our most commonly used container was postgres, so we started there and gradually migrated all other implementations to Testcontainers.
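As an illustration, a minimal Testcontainers-backed implementation of such a facade for postgres might look like the following sketch using testcontainers-go. The names (testinfra, PostgresHandle, StartPostgres) are hypothetical, not our production code.

    // A minimal sketch of a Testcontainers-backed postgres facade.
    // All names here are illustrative.
    package testinfra

    import (
        "context"
        "fmt"

        "github.com/testcontainers/testcontainers-go"
        "github.com/testcontainers/testcontainers-go/wait"
    )

    // PostgresHandle is the facade that both the docker-compose and
    // Testcontainers implementations satisfied during the migration.
    type PostgresHandle struct {
        DSN       string
        Terminate func(context.Context) error
    }

    // StartPostgres spins up a throwaway postgres container for one test.
    func StartPostgres(ctx context.Context) (*PostgresHandle, error) {
        req := testcontainers.ContainerRequest{
            Image:        "postgres:15",
            ExposedPorts: []string{"5432/tcp"},
            Env: map[string]string{
                "POSTGRES_USER":     "test",
                "POSTGRES_PASSWORD": "test",
                "POSTGRES_DB":       "test",
            },
            WaitingFor: wait.ForListeningPort("5432/tcp"),
        }
        c, err := testcontainers.GenericContainer(ctx, testcontainers.GenericContainerRequest{
            ContainerRequest: req,
            Started:          true,
        })
        if err != nil {
            return nil, err
        }
        host, err := c.Host(ctx)
        if err != nil {
            return nil, err
        }
        port, err := c.MappedPort(ctx, "5432")
        if err != nil {
            return nil, err
        }
        return &PostgresHandle{
            DSN: fmt.Sprintf("postgres://test:test@%s:%s/test?sslmode=disable",
                host, port.Port()),
            Terminate: func(ctx context.Context) error { return c.Terminate(ctx) },
        }, nil
    }

Each test that needs a database calls StartPostgres and defers Terminate, so a container's lifetime is scoped to exactly one test rather than to the whole pipeline.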

First, we ensured a Docker daemon was running on all remote executors that required it. Thankfully, BuildBuddy makes this easy with a few additional lines in the exec_properties declared in the BUILD.bazel, which we applied using a custom Bazel macro whenever the test size was set to medium:

    exec_properties = {
        "test.workload-isolation-type": "firecracker",
        "test.init-dockerd": "true",
    },
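A macro along these lines could look like the following sketch; the macro name (paxos_go_test) and file layout are hypothetical, simplified from what we actually run.

    # macros.bzl: enable dockerd on BuildBuddy executors for medium tests.
    load("@io_bazel_rules_go//go:def.bzl", "go_test")

    def paxos_go_test(name, size = "small", exec_properties = {}, **kwargs):
        if size == "medium":
            # Copy first so we never mutate the caller's dict.
            exec_properties = dict(exec_properties)
            exec_properties.update({
                "test.workload-isolation-type": "firecracker",
                "test.init-dockerd": "true",
            })
        go_test(
            name = name,
            size = size,
            exec_properties = exec_properties,
            **kwargs
        )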

Next, we needed to overcome Docker Hub rate limits. We quickly discovered we were exceeding the '100 pulls per 6 hours per IP address' limit for anonymous users. We considered raising the number of pulls we could make from Docker Hub by purchasing a Docker Core subscription, but there was a better option: create a pull-through cache using BuildBuddy's remote cache. For every image pulled in code by Testcontainers, there's a corresponding oci_pull rule provided by rules_oci. If a test needs an image, it declares a data dependency on a tarball of that image. At runtime, Bazel builds the tarball and passes its location to the test, which loads it into the Docker daemon.
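Wired together with rules_oci, that setup looks roughly like the sketch below. The repository name, tag, target names and environment variable are illustrative, and a real setup should pin images by digest rather than tag.

    # MODULE.bazel: fetch the image through Bazel (and thus the remote cache).
    oci = use_extension("@rules_oci//oci:extensions.bzl", "oci")
    oci.pull(
        name = "postgres_image",
        image = "index.docker.io/library/postgres",
        tag = "15",  # prefer pinning a sha256 digest in real use
    )
    use_repo(oci, "postgres_image")

    # BUILD.bazel: expose the image to the test as a tarball data dependency.
    load("@rules_oci//oci:defs.bzl", "oci_tarball")
    load("//tools:macros.bzl", "paxos_go_test")  # hypothetical macro from above

    oci_tarball(
        name = "postgres_tarball",
        image = "@postgres_image",
        repo_tags = ["postgres:15"],
    )

    paxos_go_test(
        name = "db_test",
        srcs = ["db_test.go"],
        size = "medium",
        data = [":postgres_tarball"],
        env = {"POSTGRES_IMAGE_TARBALL": "$(location :postgres_tarball)"},
    )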

With those two hurdles overcome, we migrated our whole test suite to use Testcontainers directly, achieving the primary goal of allowing our tests to execute on remote executors. Testcontainers alone gave immediate benefits to developers: the ability to easily add new images, far less costly pinning of image versions to match production, and running medium tests locally without a manual docker-compose up.
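On the runtime side, the loading step might look like the following Go sketch, which shells out to the Docker CLI; the function name and environment variable match the hypothetical BUILD file above.

    // loadImageTarball loads the Bazel-built image tarball into the local
    // Docker daemon so Testcontainers can start it without contacting
    // Docker Hub. A sketch: error handling and caching are simplified.
    package testinfra

    import (
        "fmt"
        "os"
        "os/exec"
    )

    func loadImageTarball() error {
        path := os.Getenv("POSTGRES_IMAGE_TARBALL")
        if path == "" {
            return fmt.Errorf("POSTGRES_IMAGE_TARBALL not set; running outside Bazel?")
        }
        // Equivalent to running `docker load -i <path>` on the executor.
        out, err := exec.Command("docker", "load", "-i", path).CombinedOutput()
        if err != nil {
            return fmt.Errorf("docker load failed: %v: %s", err, out)
        }
        return nil
    }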

Monorepo ROCKS!

So, was it worth it? Absolutely. We saw an immediate drop in our uncached build P99 runtimes from 30-50 minutes to a consistent 10-15 minutes.

Uncached build runtime percentiles, before and after RBE & Testcontainers

We also saw drastically fewer flake occurrences in our CI builds. This, alongside the runtime improvements and the developer benefits mentioned above, was a boon for developer productivity and satisfaction with CI tooling at Paxos.

Looking ahead

Our previous monolithic CI system buckled under the pressure of our rapidly growing engineering team. Bazel and BuildBuddy made it possible to scale with that growth, not only through parallel execution but also through ephemeral, isolated test environments with a common API. Bringing Bazel into our monorepo was a year-long initiative, but it was worth it: the most recent feedback from our internal Developer Experience Survey was exceedingly positive, validated by raw data on time saved.

Visit paxos.com/careers to learn more about our open roles.
