Tinder’s move to Kubernetes

Devesh Bhardwaj
3 min read · Dec 26, 2020
Image source: https://www.druva.com

Why

Almost three years ago, Tinder decided to move its platform to Kubernetes. Kubernetes afforded us an opportunity to drive Tinder Engineering toward containerization and low-touch operation through immutable deployment. Application build, deployment, and infrastructure would be defined as code.

We were also looking to address challenges of scale and stability. When scaling became critical, we often suffered through several minutes of waiting for new EC2 instances to come online. The idea of containers scheduling and serving traffic within seconds as opposed to minutes was appealing to us.

It wasn’t easy. During our migration in early 2019, we reached critical mass within our Kubernetes cluster and began encountering various challenges due to traffic volume, cluster size, and DNS. We solved interesting challenges to migrate 200 services and run a Kubernetes cluster at scale totaling 1,000 nodes, 15,000 pods, and 48,000 running containers.

How

Starting in January 2018, we worked our way through various stages of the migration effort. We began by containerizing all of our services and deploying them to a series of Kubernetes-hosted staging environments. In October, we started methodically moving all of our legacy services to Kubernetes. By March of the following year, we had finalized the migration, and the Tinder Platform now runs exclusively on Kubernetes.

Building Images for Kubernetes

There are more than 30 source code repositories for the microservices that are running in the Kubernetes cluster. The code in these repositories is written in different languages (e.g., Node.js, Java, Scala, Go) with multiple runtime environments for the same language.

The build system is designed to operate on a fully customizable “build context” for each microservice, which typically consists of a Dockerfile and a series of shell commands. While their contents vary from service to service, these build contexts all follow a standardized format, which allows a single build system to handle every microservice.
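
As an illustration, a standardized build context and the driver script that consumes it might look like the following sketch; the directory layout, script names, and registry URL are assumptions for illustration, not Tinder’s actual tooling.

```bash
#!/usr/bin/env bash
# Hypothetical standardized layout (illustrative names only):
#   services/<service>/Dockerfile   - how to assemble the runtime image
#   services/<service>/build.sh     - service-specific compile/test commands
#
# Because every repository follows the same layout, one driver script can
# build any microservice the same way.
set -euo pipefail

SERVICE="$1"                          # e.g. "recs-api" (hypothetical name)
CONTEXT="services/${SERVICE}"
TAG="$(git rev-parse --short HEAD)"   # tag images with the current commit

bash "${CONTEXT}/build.sh"            # run the service's own build steps
docker build -t "registry.example.com/${SERVICE}:${TAG}" "${CONTEXT}"
```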

Image source: Tinder Engineering tech blog

To achieve maximum consistency between runtime environments, the same build process is used during the development and testing phases. This posed a unique challenge: we needed a way to guarantee a consistent build environment across the platform. As a result, all build processes are executed inside a special “Builder” container.

Implementing the Builder container required a number of advanced Docker techniques. The Builder container inherits the local user ID and secrets (e.g., SSH key, AWS credentials) required to access Tinder’s private repositories. It mounts local directories containing the source code, giving it a natural place to store build artifacts. This approach improves performance because it eliminates copying built artifacts between the Builder container and the host machine; stored build artifacts are reused on the next build without further configuration.
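
Below is a minimal sketch of how such a Builder container might be launched; the image name, mount paths, and entrypoint are hypothetical, but the flags mirror the techniques described above (local user ID, mounted secrets, mounted source tree).

```bash
# Run the Builder as the local user so artifacts written to the mounted
# workspace are owned by the developer, not root. SSH keys and AWS credentials
# are mounted read-only for access to private repositories. HOME is set so
# tools inside the container find the mounted credentials.
# Image name, paths, and entrypoint are hypothetical.
docker run --rm \
  --user "$(id -u):$(id -g)" \
  -e HOME=/home/builder \
  -v "${HOME}/.ssh:/home/builder/.ssh:ro" \
  -v "${HOME}/.aws:/home/builder/.aws:ro" \
  -v "${PWD}:/workspace" \
  -w /workspace \
  builder-image:latest \
  ./build.sh
```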

For certain services, we needed to create another container within the Builder to match the compile-time environment with the run-time environment (e.g., installing the Node.js bcrypt library generates platform-specific binary artifacts). Compile-time requirements may differ among services, and the final Dockerfile is composed on the fly.
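
As a sketch of that on-the-fly composition (the base image, file names, and service are assumptions), the build script could generate a multi-stage Dockerfile that compiles native dependencies such as bcrypt inside the same base image the service will run on:

```bash
# Hypothetical: generate a Dockerfile so native modules (e.g. bcrypt) are
# compiled against the same base image the service runs on.
RUNTIME_IMAGE="node:12-alpine"   # assumed per-service runtime base image

cat > Dockerfile.generated <<EOF
# Compile stage: same base as the runtime, so binary artifacts match.
FROM ${RUNTIME_IMAGE} AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci

# Runtime stage: copy the prebuilt node_modules into a clean image.
FROM ${RUNTIME_IMAGE}
WORKDIR /app
COPY --from=build /app/node_modules ./node_modules
COPY . .
CMD ["node", "server.js"]
EOF

docker build -f Dockerfile.generated -t my-service:latest .
```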

The End Result

Through these learnings and additional research, we’ve developed a strong in-house infrastructure team with deep familiarity with designing, deploying, and operating large Kubernetes clusters. Tinder’s entire engineering organization now has the knowledge and experience to containerize and deploy their applications on Kubernetes.

On our legacy infrastructure, when additional scale was required, we often suffered through several minutes of waiting for new EC2 instances to come online. Containers now schedule and serve traffic within seconds as opposed to minutes. Scheduling multiple containers on a single EC2 instance also provides improved horizontal density. As a result, we project substantial cost savings on EC2 in 2019 compared to the previous year.
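
For example, declaring CPU and memory requests on each workload is what lets the Kubernetes scheduler bin-pack many containers onto a single EC2 instance; a minimal, hypothetical Deployment applied from the shell might look like this (the service name, image, and resource sizes are illustrative):

```bash
# Hypothetical Deployment: explicit resource requests let the scheduler pack
# several replicas onto one node, improving horizontal density.
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: recs-api                 # illustrative service name
spec:
  replicas: 3
  selector:
    matchLabels: { app: recs-api }
  template:
    metadata:
      labels: { app: recs-api }
    spec:
      containers:
        - name: recs-api
          image: registry.example.com/recs-api:latest
          resources:
            requests: { cpu: 250m, memory: 256Mi }
            limits:   { cpu: "1",  memory: 512Mi }
EOF
```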

It took nearly two years, but we finalized our migration in March 2019. The Tinder Platform runs exclusively on a Kubernetes cluster consisting of 200 services, 1,000 nodes, 15,000 pods, and 48,000 running containers. Infrastructure is no longer a task reserved for our operations teams. Instead, engineers throughout the organization share in this responsibility and have control over how their applications are built and deployed with everything as code.
