This post is about the GoCD team’s journey in building elastic agents, our experience using them in anger, and the benefits we saw.
Why elastic agents
GoCD operates on a server-agent model where a server communicates with many agents. The server is responsible for orchestrating various stages, jobs and tasks on the agents. The agents are responsible for their execution in the deployment pipeline.
It often makes sense to run work in parallel to speed up build and test execution, get feedback quickly, and deploy rapidly.
But for some teams, parallelization requires an intense use of hardware resources for brief periods of time. The situation becomes particularly frustrating when different branches of code or environments require different kinds of hardware and software to run on.
In many cases, this leads to:
- excessive provisioning of build machines, leaving a lot of hardware under-utilized when builds and tests are not running.
- re-provisioning some of the hardware based on need, leading to a significant lead time depending on its availability.
- slower build times when sufficient hardware is not available.
We wanted to help reduce infrastructure costs and improve hardware utilization. We believed that allowing organizations to bring up GoCD agents on demand would achieve that goal. So we designed elastic agents to let teams use a combination of their own hardware and the cloud to run their builds.
Benefits we observed
We started with the elastic agent plugin for Docker, as it was the first one we built. Since then, we have observed the following benefits.
1. Save time, reduce effort
Elastic agents have saved us hours of otherwise lost developer time every week.
Before elastic agents, if we wanted to test something new that needed different hardware or software, we had to provision build machines and install the relevant software on these agents. If we then chose not to keep the change, we also had to decommission the machines. All of that was time-consuming and took the focus away from development.
Using elastic agents, we can apply the idea behind ImmutableServer to our build machines. We now build an operating system image with all the required software, at the versions we need. This allows us to move quickly and experiment without the worry of managing hardware.
2. Less pain and overhead
Every team member feels that our build and deployment process is much easier and less painful.
We can now quickly bring up Docker containers (with all the required software) and kick off builds within a given environment without affecting other environments. This avoids the pain we experienced before. Moving to a combination of on-premises hardware and Amazon ECS has helped the team move a lot faster. We no longer depend on hardware being procured and provisioned with the right tools and software versions before we can start using it.
3. Improve utilization of resources
We have never been able to optimize our resources the way we can now.
The real beauty of elastic agents is that they improved our infrastructure utilization. We can now use the same hardware to run and test GoCD on several operating systems, improving the overall utilization of our hardware. If someone needs more machines to run builds, we can terminate existing machines and bring up new ones. And if we still need more, we can overflow our builds onto AWS with very little reconfiguration.
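The overflow idea described above can be pictured as a simple placement policy: fill your own hardware first, then spill over to the cloud. The sketch below is purely illustrative and is not GoCD's scheduler or plugin API; the names `Pool` and `place_jobs`, the pool names, and the capacities are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """A hypothetical pool of agent capacity (not a GoCD concept)."""
    name: str
    capacity: int      # maximum number of agents this pool can host at once
    running: int = 0   # agents currently running in this pool

    def has_room(self) -> bool:
        return self.running < self.capacity


def place_jobs(jobs, pools):
    """Assign each job to the first pool with spare capacity.

    Pools are tried in the order given, so listing on-premises hardware
    first means the cloud is only used as overflow.
    """
    placement = {}
    for job in jobs:
        for pool in pools:
            if pool.has_room():
                pool.running += 1
                placement[job] = pool.name
                break
        else:
            # No pool has room: the job has to wait (assumed behaviour).
            placement[job] = None
    return placement


# Two on-premises slots, with AWS as the overflow pool (assumed numbers).
pools = [Pool("on-prem", capacity=2), Pool("aws", capacity=3)]
placement = place_jobs(["build", "unit", "integration", "smoke"], pools)
print(placement)
```

With these assumed capacities, the first two jobs land on the on-premises pool and the remaining two overflow to the cloud pool. Ordering the pools by preference keeps the policy easy to reason about.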
How can your team benefit from elastic agents?
Several open-source elastic agent plugins are currently available. We already have an Amazon Elastic Container Service (ECS) plugin, which a lot of our users are trying out. A guiding principle for GoCD is to keep it as agnostic as possible, so that users can choose the technology they want and provision agents there. That is why we put a lot of time and effort into our plugin API: you can plug in the cloud provider you already use or build your own plugin.
Depending on user demand and the popularity of cloud providers among our users, we may build support for agents running on Amazon EC2 instances, Azure and other cloud providers.