What even is Kubernetes?
Anybody who has been involved in any sort of DevOps work in the last couple of years has heard of Kubernetes. Currently one of the largest and most active open source projects, it is the clear winner in the space of container orchestration systems, and a whole ecosystem is evolving around it. Whenever people want to host containers in the cloud, Kubernetes is usually the technology of choice, be it on Google’s GKE or directly on Amazon EC2 or Azure Compute instances. There are ready-to-go installations on basically every cloud provider, and integration with the largest providers is also very strong from Kubernetes’ side. However, there may be situations where using cloud infrastructure is not an option. You may need to comply with company policies, require access to protected networks, or simply want to avoid cloud vendor lock-in. In such environments it can make sense to host your own Kubernetes installation. We went down this path, and we’ll share what to look out for if and when you want to venture in this direction.
Kubernetes (k8s for short) is built around the concept of resources and reconciliation loops. From a user’s perspective, you define a set of resources you want your cluster to provision by specifying a resource definition. This is usually done in YAML files and can look like the following:
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
That’s the definition of a pod, which is the smallest unit of workload resources in k8s. Using kubectl, the k8s command line tool, you tell your cluster that you want it to deploy the resource, and it will do its best to comply.
kubectl create -f res.yaml
In this case, a pod controller on one of your servers would be made aware that you want to host an nginx container. It would then attempt to start the container (under the restrictions of processor and memory quotas) until the container is up or has failed for some reason. This process of ‘reconciliation’ is central to k8s and can be applied to all kinds of resources: pods, deployments, network policies, persistent volumes – even entire k8s clusters can be treated in the same way from within another cluster (that’s how cluster federation is implemented).
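To make the idea of higher-level reconciled resources concrete, here is a sketch of a Deployment that asks the cluster to keep three nginx replicas running; the name and labels are hypothetical, but the deployment controller will continuously reconcile the actual pod count against this desired state:

```yaml
# Sketch of a Deployment (names/labels are illustrative placeholders)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 3            # desired state: three pods at all times
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
```

If a node dies and takes a replica with it, the controller notices the discrepancy and schedules a replacement pod elsewhere, without any user intervention.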
The reconciliation loop abstraction is extremely powerful (and surprisingly easy to extend) and allows the cluster architecture to differentiate cleanly between a control plane, where the resource state is recorded, and a worker plane, where the actual resources are provisioned. All interaction between these planes – but also between the user and the cluster – is handled by a powerful API. The control plane stores its state in etcd, a distributed key-value store. All functionality is exposed via HTTP APIs by master nodes, which can easily be load balanced to achieve fault tolerance and high availability. The worker nodes are simple servers running Docker and a kubelet service, which is configured with the master endpoints and credentials to connect to the API. Once a node comes up, it advertises itself to the masters, and its resources become available for scheduling within the cluster.
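As a toy illustration of the pattern (this is not k8s code, just a sketch of the idea): a reconciliation loop repeatedly compares desired state with observed state and emits only the actions needed to converge.

```python
# Toy sketch of a reconciliation loop. Both arguments map resource
# names to replica counts; the function returns the minimal actions
# needed to move the observed state toward the desired one.
def reconcile(desired, observed):
    actions = []
    for name, want in desired.items():
        have = observed.get(name, 0)
        if have < want:
            actions.append(("create", name, want - have))
        elif have > want:
            actions.append(("delete", name, have - want))
    # anything observed but no longer desired gets cleaned up
    for name, have in observed.items():
        if name not in desired:
            actions.append(("delete", name, have))
    return actions

# One pass of the loop: the cluster is missing two nginx replicas.
print(reconcile({"nginx": 3}, {"nginx": 1}))  # → [('create', 'nginx', 2)]
```

A real controller runs this comparison forever, re-reading observed state each pass, which is what makes the system self-healing.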
Things to consider when setting up your on-prem clusters
To run an on-prem k8s cluster in production you will need to go through four steps:
- Set up a fault tolerant etcd cluster
- Deploy a number of master nodes connecting to etcd
- Load balance the master nodes
- Provision the worker nodes
The k8s cluster will put a heavy write load on your etcd cluster, especially when you have a large number of worker nodes. K8s 1.9 officially supports up to 5,000 nodes, but scaling anywhere near that number requires splitting up and heavily optimizing the underlying etcd. Even for smaller deployments you should consider running etcd on write-optimized machines backed by SSDs.
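As a sketch, one member of a fault-tolerant three-node etcd cluster might be started with flags along these lines (the member names and 10.0.0.x addresses are placeholders):

```shell
# Hypothetical member 'etcd-a' of a three-node cluster
etcd --name etcd-a \
  --data-dir /var/lib/etcd \
  --listen-peer-urls https://10.0.0.1:2380 \
  --listen-client-urls https://10.0.0.1:2379 \
  --initial-advertise-peer-urls https://10.0.0.1:2380 \
  --advertise-client-urls https://10.0.0.1:2379 \
  --initial-cluster etcd-a=https://10.0.0.1:2380,etcd-b=https://10.0.0.2:2380,etcd-c=https://10.0.0.3:2380 \
  --initial-cluster-state new
```

Three members tolerate the loss of one; five tolerate two. An even member count buys no extra fault tolerance, so stick to odd cluster sizes.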
The master is essentially a Go binary that listens for HTTP traffic and stores all state in the etcd cluster. You can scale the number of master nodes arbitrarily, which is convenient given that a number of API interactions use long-running HTTP connections, and the number of connections can become a limiting factor.
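Because the masters are stateless HTTP servers, a plain TCP load balancer in front of them is enough. A hypothetical haproxy.cfg fragment (the master addresses are placeholders) could look like this:

```
# Sketch: TCP-balance the Kubernetes API across three masters
frontend k8s-api
    bind *:6443
    mode tcp
    default_backend k8s-masters

backend k8s-masters
    mode tcp
    balance roundrobin
    option tcp-check
    server master-1 10.0.1.1:6443 check
    server master-2 10.0.1.2:6443 check
    server master-3 10.0.1.3:6443 check
```

Workers and users then talk only to the balanced endpoint, so individual masters can be drained and replaced without downtime.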
Provisioning the worker nodes can be automated nicely, and there really aren’t restrictions on what kind of nodes you put into your cluster. There are even reports of clusters mixing Windows and Linux nodes, although I wouldn’t recommend going there just yet.
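If you bootstrap with kubeadm, for example, adding a worker boils down to pointing it at the load-balanced master endpoint (the address, token and hash below are placeholders):

```shell
# Hypothetical worker join against a balanced API endpoint
kubeadm join lb.example.com:6443 \
  --token abcdef.0123456789abcdef \
  --discovery-token-ca-cert-hash sha256:<hash>
```

After that, the node advertises itself to the masters and immediately becomes available for scheduling.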
What we learned so you don’t have to
Choose the right base operating system
We are using Container Linux as the operating system for master and worker nodes alike. We chose it because it offers tight integration with the k8s environment, a fast and fully automated patch cycle, and simple automated installation. However, it was a challenge to adapt to the fact that nodes may go down at any point. We’ve implemented a feature to allow for controlled update windows within Kubernetes, so we can better see what’s going on in our cluster.
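A first line of defense against surprise reboots is Container Linux’s own update configuration: with the etcd-lock reboot strategy, a node takes a shared lock in etcd before rebooting, so only a bounded number of machines restart at once. A sketch of the config file:

```
# /etc/coreos/update.conf (sketch)
GROUP=stable
REBOOT_STRATEGY=etcd-lock
```

This keeps automatic patching enabled while preventing a whole fleet from rebooting simultaneously.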
Beware the release cycle
Kubernetes is an incredibly active and fast-moving project. A new version is released every three months, and the changelogs often span multiple pages. We found it good practice to stay two versions behind the current release, to avoid bleeding-edge issues and benefit from the wisdom of the masses migrating to the new version.
Educate your users
k8s is an incredibly powerful system, making things possible that sometimes seem like magic (automatically scaling pods based on internal application metrics, anyone?). But it is only as good as the people maintaining and using it. Applications must be built in a certain way to fully harness the system’s capabilities, your deployment workflows are likely to change, and it may even have an impact on how you organize your Operations and Development departments. But the benefits you gain are well worth the pains!
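To illustrate the “magic”: an autoscaler keyed to an application metric could be sketched as below, using autoscaling/v2beta1 (the API version current around k8s 1.9); the deployment name and the queue_length metric are hypothetical and would come from your own metrics pipeline:

```yaml
# Sketch: scale a worker deployment on a custom per-pod metric
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: worker              # hypothetical deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metricName: queue_length        # hypothetical app metric
      targetAverageValue: "30"
```

Note that this only works once a custom metrics adapter exposes the application metric to the cluster, which is exactly the kind of application-side work the paragraph above is about.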
Written by Sebastian Pleschko
Head of Operations
Take a look into a developer’s life within REWE Group and see how REWE connects food retail with digitalization and software development in the video below: