Supercharge Traditional CI/CD Pipelines and Cut Costs With EngFlow

At EngFlow, we've been dedicated to improving the speed, efficiency, and productivity of development processes for almost half a decade. As the market has evolved and FinOps has gained traction, we've expanded our focus beyond accelerating development to include optimizing costs and fostering sustainability in software engineering.

Let's look at how EngFlow can help you reach those goals by reducing infrastructure expenses and maximizing resource utilization.

Added Costs of Traditional CI Usage

In a traditional continuous integration (CI) setup, various jobs are executed based on different triggers, such as:

  • Every commit of a pull request
  • Commits to the main branch
  • Each release made
  • Scheduled periodic runs

This process involves pulling code onto an instance, executing a build process, and uploading test results or artifacts to a server. And the overall cost of maintaining these setups? Well, that depends on job duration, instance types, network traffic, storage, retention, and other factors. To estimate the cost of a CI installation, you can use the following formula:

Infrastructure costs from CI usage
Compute Costs = Cost for Instance Type * Average(Job Duration)
Output Costs = Average(Size of Outputs) * (Network Cost + (Storage Costs * Retention Costs))
Input Costs = Average(Size of Inputs) * Network Cost
Job Costs = Compute Costs + Input Costs + Output Costs
Note: Because many factors affect how CI drives your infrastructure costs, the results of this calculation are only estimates.
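To make the formula concrete, here is a minimal sketch in Python. Every price and size below is a made-up placeholder, and the retention term is interpreted as the number of months outputs are kept.

Estimating the cost of a single CI job
# Illustrative only: all numbers are placeholders, not real cloud pricing.
def estimate_job_cost(
    instance_cost_per_hour,     # price of the chosen instance type
    avg_job_duration_hours,     # Average(Job Duration)
    avg_input_gb,               # Average(Size of Inputs)
    avg_output_gb,              # Average(Size of Outputs)
    network_cost_per_gb,        # Network Cost
    storage_cost_per_gb_month,  # Storage Costs
    retention_months,           # how long outputs are retained
):
    compute_costs = instance_cost_per_hour * avg_job_duration_hours
    input_costs = avg_input_gb * network_cost_per_gb
    output_costs = avg_output_gb * (
        network_cost_per_gb + storage_cost_per_gb_month * retention_months
    )
    return compute_costs + input_costs + output_costs

# Example: a 20-minute job on a $0.40/hour instance, 2 GB of inputs,
# 1 GB of outputs retained for 3 months: roughly $0.47 per job.
print(estimate_job_cost(0.40, 20 / 60, 2.0, 1.0, 0.09, 0.023, 3))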

Traditional CI setups often struggle with auto-scaling, which can add to cloud expenses. Because good auto-scaling isn’t readily available, most companies work around it by configuring a persistent setup. Fully ephemeral setups, on the other hand, eliminate the need for auto-scaling altogether.

As a developer, you have control over factors like input and output size, build complexity, and build duration. However, optimizing these factors can be both time-consuming and expensive.

Common Optimization Techniques

Decreasing job duration is an effective way to reduce infrastructure costs, providing a tangible return on investment and improving developer productivity. There are several proven techniques for speeding up a pipeline, such as incremental builds, local caches, and skipping initial provisioning. Using persistent instances that remain active for longer than a single job is another common strategy, but these can accrue costs while sitting idle and raise security concerns.

Caching, supported by most build systems, is another widely used technique for improving speed. Bazel, for instance, offers multiple layers of caches, some of which benefit from a persistent setup. A well-designed CI infrastructure leverages all available caching.
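As a rough illustration rather than a drop-in configuration, the Bazel flags below enable several of those cache layers; the cache endpoint is a placeholder, and the right paths and values depend on your setup.

Enabling Bazel's cache layers in .bazelrc
# Local on-disk cache of action outputs that survives across builds on the same machine.
build --disk_cache=/var/cache/bazel/disk
# Cache downloaded external repositories so they are not re-fetched on every run.
build --repository_cache=/var/cache/bazel/repo
# Shared remote cache so CI jobs can reuse each other's results (placeholder endpoint).
build --remote_cache=grpcs://cache.example.com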

Reducing input/output (I/O) operations also helps minimize idle time, speeding up builds and avoiding additional costs.

Comparing Remote Execution to Traditional CI

Remote Execution gained prominence as a way to streamline automated software builds at scale. It uses a pool of shared machines to provide greater parallelism, caching, and resource efficiency than a single large instance. The idea itself is not new; similar solutions existed as early as 2002. Google was among the early adopters and helped standardize the approach by creating the Remote Execution API.

Traditional CI

As shown below, traditional CI builds often underutilize resources:

Traditional CI builds

Every build is different and depends on where changes are made. When a build system can effectively leverage caches, CI builds range from a handful of execution tasks to a heavy demand for CPU time, for instance when changes land very low in the stack. If you size your CI machine for the heaviest builds, you risk wasting resources on most builds just so the few worst-case builds run quickly. This is the classic CI infrastructure trade-off between performance and cost.

Consider the two extremes: if a build only modifies a README file in the repository, the machine should see a 100% cache hit rate and sit mostly idle while the job runs. But when something low in the stack changes, such as a toolchain (Java JDK, C++ compiler, etc.), everything must be built from scratch and a powerful machine is required.

EngFlow Remote Execution

EngFlow overcomes resource underutilization by leveraging shared resources, resulting in improved efficiency and reduced costs:

CI builds running on EngFlow

EngFlow allows you to reduce infrastructure costs while maintaining scalability and performance. CI setups normally use larger instances, run more complex builds, and leave a much larger share of resources unused. While you won’t be able to get rid of your CI nodes entirely, you can downsize them to the bare minimum. Even then, some local resources may sit idle, but in recent years Bazel has introduced dynamic execution, which uses available local resources while concurrently running actions via Remote Execution. And while there is a small increase in I/O costs, the benefits far outweigh it.
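As a rough sketch of what enabling dynamic execution looks like in Bazel (the executor endpoint below is a placeholder, and the exact flags can vary by Bazel version):

Enabling dynamic execution in .bazelrc
# Remote execution cluster to fan actions out to (placeholder endpoint).
build --remote_executor=grpcs://remote.example.com
# Dynamic execution: run each action locally and remotely, keep whichever finishes first.
build --internal_spawn_scheduler
build --spawn_strategy=dynamic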

Remote Execution vs. Caching

EngFlow offers both Remote Caching and Remote Execution. The choice between the two depends on factors such as the scale and complexity of your actions. For larger and more complex builds, Remote Execution is the preferred option, as it runs individual actions in a distributed remote cluster in addition to caching their outputs. For smaller and simpler builds, Remote Caching may be more suitable.
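In Bazel terms, the difference comes down to which endpoint the build points at. The sketch below uses named --config groups and placeholder endpoints to show the shape of the choice; it is not an EngFlow-specific configuration.

Remote Caching vs. Remote Execution in .bazelrc
# Option A: remote caching only, for smaller and simpler builds.
build:remote-cache --remote_cache=grpcs://cache.example.com

# Option B: remote execution, for larger and more complex builds.
# Actions run in the remote cluster and their outputs are cached there as well.
build:remote-exec --remote_executor=grpcs://remote.example.com
build:remote-exec --jobs=200  # allow many actions in flight across the cluster

Each invocation then opts into one of the two, for example: bazel test --config=remote-exec //...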

Key to Optimized Remote Execution

We briefly touched on how Remote Execution can increase I/O costs; while minor compared to compute costs, it’s not inconsequential. That’s why EngFlow chooses to run our own CI on a Remote Execution cluster: when a job starts, it needs fewer inputs and downloads fewer outputs over the wire, because all of that data lives within the same cluster.
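A related, hedged example: Bazel's "Build without the Bytes" mode keeps intermediate outputs in the remote cluster instead of downloading them to the CI node, which is one way to keep over-the-wire traffic down. Whether it fits depends on what your CI actually needs to fetch.

Limiting output downloads in .bazelrc
# Only download outputs that are needed locally; intermediate artifacts stay remote.
build --remote_download_minimal
# Alternative: also download the outputs of the top-level targets you asked to build.
# build --remote_download_toplevel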

That’s not the only benefit, though: we’ve been investing heavily in optimizing autoscaling and minimizing cloud compute costs. This allows us to set up highly customizable, scalable self-hosted workers for a semi-ephemeral CI: instances live longer than a single job, but they are removed when the CI demands less compute than is provisioned. Builds are faster, and we reduce costs by destroying unneeded instances. Approaching our CI setup this way reduced our costs by over 50%.

Our success with this approach led to the creation of CI Runners. To meet customers where they are, CI Runners currently supports GitHub Actions and BuildKite, with plans to support a wider range of systems in the future. Integrating seamlessly with multiple CIs is a hidden strength of CI Runners, as it dramatically reduces the effort and cost of future CI migrations.

Get Started with EngFlow

Your key to reducing infrastructure costs, optimizing your CI, and boosting productivity is here. Battle-tested and used by a number of industry giants, EngFlow is ready to transform your development workflow!

Why stand in the way of developer happiness? We’d love to learn about your build and CI setup. Let’s talk about how much we can save you, or get started with a free trial on the website.

You can also explore the invocation analyzer to audit your build times and see if they’re ready for Remote Execution. Let’s break records together and see what EngFlow can do for you!