Replicated is a five-year-old infrastructure software company working to make it easy for businesses to install and operate third-party software. We don’t want you to have to send your data to a multi-tenant SaaS provider just to use their services. Our team is made up of twenty-two people distributed throughout the US. One thing that’s different about Replicated is that our developers don’t actually store or execute code on their laptops; all of our development happens on remote instances in the cloud.
Our product, KOTS, runs in Kubernetes and manages the lifecycle of third-party applications in the Kubernetes cluster. Building and validating the product requires a developer to have access to a cluster. But as we started to hire more and more engineers, it became ridiculous to ask everyone to run their own local Kubernetes cluster. We needed to both simplify and secure our setup so that every engineer could run their environment in the cloud, and we needed the result to feel seamless.
Previous Dev Environments with Docker for Mac
We started with each developer building their own local environments, using whatever tools they were comfortable with. Our first attempt to build a standard development environment that works for our engineering team was to use Docker for Mac and its built-in Kubernetes distribution. We would buy the best MacBook Pros available (16 GB, then 32 GB, now 64 GB), and everyone would have the entire stack running on their laptop.
This worked pretty well, except for a set of problems that our engineers kept hitting: battery life was terrible because of the constant CPU usage, Docker for Mac differed from “real Kubernetes” in some meaningful ways, and Docker for Mac’s built-in Kubernetes would regularly stop working, forcing the developer to uninstall and reinstall the entire stack. It was miserable.
We’d lose hours every week to engineers troubleshooting their local environments. When a front-end engineer (who wasn’t expected to be a Kubernetes expert) had issues, they’d need to pair with a backend engineer to get help, consuming not just one but two people’s valuable time.
We needed something better.
To The Cloud
Rather than running Docker locally, we now create an instance in Google Cloud for each developer. These instances have no public IP and are based on our machine image, which has all of our prerequisites installed. The image includes many tools, among them a Kubernetes distribution that’s completely local to the server. We run a Docker registry in each developer’s cluster as a cluster add-on. The cloud server has a magical tool called cloudflared running on it that replaces all of the network configuration and security work we would otherwise have had to do.
cloudflared powers Argo Tunnel. When it starts, cloudflared creates four secure HTTP/2 tunnels to two Cloudflare data centers. When a request comes in for a development machine, Cloudflare routes that request over one of those tunnels directly to the machine running that developer’s environment. For example, my hostname is “marc.repl.dev”. Whenever I connect to it, from anywhere on earth, Cloudflare makes sure that I reach my development environment securely. If I need to spin up a new development environment, there is no configuration to do: whichever machine is running cloudflared with the appropriate credentials will receive the traffic. This all works on any cloud and in any cloud region.
This configuration has several advantages over a traditional deployment. For one, the server does not have a public IP and we don’t need to have any ports open in the Google Load Balancer, including for SSH. The only way to connect to these servers is through the Argo Tunnel, secured by Cloudflare Access. Access provides a BeyondCorp-style method of authentication, which ensures that the environment can be reached from anywhere in the world without the use of a VPN.
BeyondCorp is an elaborate way of saying that all our authentication is managed in a single place. We can write a policy which defines which machines a user should have access to and trust it will be applied everywhere. This means that rather than managing SSH certificates, which are long-lived and hard to revoke, we can allow developers to log in with the same Google credentials we use everywhere else! Should, knock on wood, a developer leave, we can revoke those credentials instantly; no more worrying about what public keys they still might have lying around.
What happens on the developer’s machines?
Through Argo Tunnel and Access we now have the ability to connect to our new development instances, but that isn’t enough to allow our engineers to work. They need to be able to write and execute code on that remote machine in a seamless way. To solve that problem we turned to the Remote SSH extension for VS Code. In the words of the documentation for that project:
The Visual Studio Code Remote SSH extension allows you to open a remote folder on any remote machine, virtual machine, or container with a running SSH server and take full advantage of VS Code’s feature set. Once connected to a server, you can interact with files and folders anywhere on the remote filesystem.
With Remote SSH, VS Code seamlessly reads and writes files to the developer’s remote server. When a developer opens a project, it feels local and seamless, but everything is authenticated by Access and proxied through Argo over SSH. Our developers can travel anywhere in the world, and trust their development environment will be accessible and fast.
Locally, a developer has a .ssh/config file to define local ports to forward through the SSH connection to a port that’s only available on the remote server. For example, my .ssh/config file contains:
LocalForward 8080 127.0.0.1:30080
LocalForward 8005 127.0.0.1:30015
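For context, LocalForward lines like these normally sit inside a Host block. The following is a minimal sketch that prints such a block. The helper name dev_host_stanza is hypothetical, and the ProxyCommand line is an assumption about how the connection is routed through cloudflared; the stanza your own installation generates may look different.

```shell
#!/bin/sh
# Print an SSH config stanza for one remote development host.
# The ProxyCommand line is an illustrative assumption; the two
# LocalForward lines are the ones quoted in the article.
dev_host_stanza() {
  host="$1"
  cat <<EOF
Host $host
  ProxyCommand cloudflared access ssh --hostname %h
  LocalForward 8080 127.0.0.1:30080
  LocalForward 8005 127.0.0.1:30015
EOF
}

# Preview the stanza before appending it to ~/.ssh/config by hand.
dev_host_stanza marc.repl.dev
```

Printing the stanza rather than writing it directly keeps the sketch safe to run; once it looks right, redirect the output with >> ~/.ssh/config.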
To build and execute code, our developers open the embedded terminal in VS Code. This automatically connects them to the remote server. We use skaffold, a Kubernetes CLI for local development. A simple skaffold dev starts the stack on their remote machine, which feels local because it’s all happening inside VS Code. Once it’s started, the developer can view the results of their work by visiting http://localhost:8080 in their browser. The SSH config above forwards this traffic to port 30080 on the remote server. Port 30080 on the remote server is a NodePort configured in the local cluster that exposes the web server running there. All of our APIs and web servers have static NodePorts for local development environments.
Now, when a developer starts at Replicated, their first day (or even week) isn’t consumed by setting up the development environment; it now takes less than an hour. We have a Terraform script that makes it easy to replace any one of our developers’ machines in seconds.
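The same local-to-NodePort mapping can also be tried ad hoc from a regular terminal, without editing .ssh/config. This is a small sketch, not part of our setup: forward_args is a hypothetical helper that turns local:remote port pairs into ssh -L arguments, and dev.mysite.com stands in for your own hostname.

```shell
#!/bin/sh
# Build ssh -L arguments from "local:remote" port pairs.
# Each pair forwards a local port to the same-numbered NodePort
# mapping on the remote server's loopback interface.
forward_args() {
  args=""
  for pair in "$@"; do
    local_port="${pair%%:*}"
    remote_port="${pair##*:}"
    args="$args -L ${local_port}:127.0.0.1:${remote_port}"
  done
  # Drop the leading space left by the first iteration.
  echo "${args# }"
}

# Print the command instead of running it, so it can be reviewed first.
echo "ssh -N $(forward_args 8080:30080 8005:30015) dev.mysite.com"
# prints: ssh -N -L 8080:127.0.0.1:30080 -L 8005:127.0.0.1:30015 dev.mysite.com
```

With -N, ssh opens the forwards without starting a remote shell, which is handy for checking a NodePort from the browser outside VS Code.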
All developers at Replicated have now been using this environment for nine months. We haven’t eliminated the problems that occasionally come up where Kubernetes isn’t playing nicely, or Docker uses too much disk space. However, these problems do occur much less frequently than they did on Docker for Mac. We now have two new options that weren’t easily available when everyone ran their environment locally.
First, a backend engineer can just SSH through the Argo Tunnel into another developer’s server to troubleshoot and help. Every development environment has become a collaborative place. This is great when two engineers aren’t in the same room. Also, we’re less attached to our development environments: if my server isn’t working properly for unknown reasons, instead of troubleshooting it for hours, I can delete it and get a new clean one.
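To illustrate the delete-and-recreate idea, here is a rough sketch using the gcloud CLI. It is not our Terraform script. The instance name and zone are hypothetical, and it assumes your network already gives instances without a public IP outbound internet access (for example through Cloud NAT) so that cloudflared can reach Cloudflare. By default the script only prints the commands.

```shell
#!/bin/sh
# Dry-run sketch: prints commands by default; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical values; replace with your own.
INSTANCE="my-dev-universe"
ZONE="us-central1-a"

create_dev_instance() {
  # --no-address creates the VM without an external (public) IP.
  run gcloud compute instances create "$INSTANCE" --zone "$ZONE" --no-address
}

recreate_dev_instance() {
  # --quiet skips the interactive confirmation prompt on delete.
  run gcloud compute instances delete "$INSTANCE" --zone "$ZONE" --quiet
  create_dev_instance
}

recreate_dev_instance
```

In a real setup the create step would also point at your prepared machine image, so the new instance comes up with cloudflared and the other prerequisites already installed.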
Some additional benefits include:
- Developers can have multiple envs easily (to try out a new k8s version, for example)
- Battery life is awesome again on laptops
- We don’t need the biggest and most powerful laptops anymore (Hello Chromebooks and Tablets)
- Developers can choose their local OS and environment (macOS, Windows, Linux); any of them works as long as it supports SSH.
- Code does not live on a developer laptop; it doesn’t travel with them to coffee shops and other insecure places. This is great for security: a lost laptop no longer means the codebase is out there with it.
Beyond just telling you what we did, we’d like to show you how to replicate it for yourself! This assumes you have a domain which is already configured to use Cloudflare.
1. Create an instance to represent your development environment in the cloud of your choice.
gcloud compute instances create my-dev-universe
2. Configure your instance to run cloudflared when it starts up, and give it a helpful hostname like dev.mysite.com.
# assumes Cloudflare's apt repository has already been added to the instance
sudo apt-get install cloudflared
mkdir -p ~/.cloudflared
echo "hostname: dev.mysite.com" > ~/.cloudflared/config.yml
sudo cloudflared service install
3. Write an Access policy to allow only you to access your machine
4. Configure your local machine to SSH via Cloudflare:
# macOS with Homebrew; on Linux, install the cloudflared package instead
brew install cloudflare/cloudflare/cloudflared
cloudflared access ssh-config --hostname dev.mysite.com --short-lived-cert >> ~/.ssh/config
5. In VS Code select ‘Remote-SSH: Connect to Host…’ from the Command Palette and enter your SSH destination in the form user@hostname, using the hostname you configured above (dev.mysite.com). A browser window will open where you will be prompted to log in with the identity provider you configured with Cloudflare.
6. You’re done! If you select File > Open you will see the files on your remote machine. The embedded terminal will also execute code on that remote machine.
7. Once you’re ready to get a production-ready setup for your team, take a look at the instructions we share with our team.
There is no doubt that the world is becoming more Internet-connected, and that deployment environments are becoming more complex. It stands to reason that it’s only a matter of time before all software development happens through and in concert with the Internet.
While this setup might not be the best solution for every team, it has resulted in a dramatically better experience for Replicated and we hope it does for you as well.
How to get started
Replicated develops remotely with Cloudflare Access, a remote access gateway that helps you secure access to internal applications and infrastructure without a VPN.
Effective until September 1, 2020, Cloudflare is making Access and other Cloudflare for Teams products free to small businesses. We’re doing this to help ensure that small businesses that implement work from home policies in order to combat the spread of the Coronavirus (COVID-19) can ensure business continuity.
You can learn more and apply at cloudflare.com/smallbusiness now.