‘Learning never stops’ is one of our fundamentals. So when three members of the SRE Team got the chance to attend the Container Days conference in Hamburg in September, they took it. Matt Bennett, DevOps Engineer, says it was a great opportunity to learn, interact and share with peers.
The three of us – Max, Maurits and I – were happy to attend the conference, as it’s great to see how other companies are doing things, and what tools exist to make our jobs easier. And although it was called ‘Container Days’, the overall theme was DevOps in general. So, some of our learnings weren’t just about containers or Kubernetes, but about how we can structure our processes and systems to improve overall. We’re already putting some of those learnings into practice.
The highlights soon became clear
Throughout the conference, a few topics kept coming up. Most of them had a dedicated talk, but they also came up repeatedly outside those talks.
1. eBPF is an exciting new technology
We heard a lot about it during the conference, and it seems like an interesting new field. ‘eBPF’ stands for extended Berkeley Packet Filter, but that doesn’t really capture what it’s all about. In a nutshell, eBPF is a technology that lets you run sandboxed programs within the Linux kernel in response to a given event. That can be very powerful, because so much of what happens on a Linux machine, from network traffic to system calls, goes through the kernel.
Some practical uses of eBPF include network monitoring/filtering and sidecar-less service mesh. It’s unlikely that funda will do anything with eBPF itself, but it’s still exciting to us. After all, it’s an innovation in the tools we use to run/monitor our platform.
2. A solution for deploying to local preview environments
One of our biggest pain points comes when we shift a feature/app from local development into Kubernetes. You need to do a lot of extra configuration at this stage. Take identity and access, for example: until their code runs in Kubernetes, Developers can authenticate with their own Azure accounts. But once in Kubernetes, we need to use pod identity. This is one of the many things that can slow Developers down when they’re sharing code to be tested.
One solution to this can be found in Okteto. It gives each Developer their own personal Kubernetes environment. They can deploy to that environment from their local computer as they write code, and test that the code works there. There are other tools that could help, but Okteto was demoed at the conference and it looked really nice. Check it out.
3. There’s a wave of new Kubernetes Security Awareness Tools
Kubernetes has clearly established itself as a container orchestration platform, which also makes it really interesting to malicious actors. Fortunately, it has reached a level of maturity at which people are really starting to think about security. That definitely showed at the conference.
In the vendor hall, we saw a bunch of booths advertising security awareness tools. For example, Falco can detect threats to our clusters, and NeuVector can even provide an entire security platform and react to threats in real time.
Since this space is already big and continues to grow, one of our key takeaways from the conference was that we need to assess all of the offerings and see how they can help us.
4. The benefits of using Kubernetes Operators
We can also address the complexity in our clusters by using Operators. At the conference, we heard an excellent talk called ‘Kubernetes made simpler: Using and developing Kubernetes Operators’. It’s a great introduction to Operators and how to start developing them.
The concept behind Operators is very promising. An Operator contains all of the logic for installing and maintaining a component in the cluster. It’s different from a Helm chart, because it does more than just install the component. It also runs maintenance activities like updating the component.
Our consulting company, Opster, built a great example of this, called ‘OpenSearch Operator’. To spin up a new cluster, all Developers need to do is create an OpenSearchCluster resource, and the Operator manages the rest.
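The idea of an Operator that keeps a component in line with what was declared can be illustrated with a minimal Python sketch. It is not how the OpenSearch Operator is implemented and it does not talk to a real cluster. The `DesiredState` and `ActualState` classes, their fields and the version numbers are all hypothetical, chosen only to show the compare-and-correct loop that Operators are generally built around.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DesiredState:
    """What a Developer declares in the resource (hypothetical fields)."""
    replicas: int
    version: str


@dataclass
class ActualState:
    """What is currently running; simulated in memory for this sketch."""
    replicas: int = 0
    version: str = ""


def reconcile(desired: DesiredState, actual: ActualState) -> list[str]:
    """Compare desired and actual state and apply one round of corrections.

    Returns a description of the actions taken. An empty list means the
    component already matches its declaration.
    """
    actions = []
    if actual.version != desired.version:
        actions.append(f"upgrade {actual.version or 'nothing'} -> {desired.version}")
        actual.version = desired.version
    if actual.replicas != desired.replicas:
        actions.append(f"scale {actual.replicas} -> {desired.replicas}")
        actual.replicas = desired.replicas
    return actions


if __name__ == "__main__":
    cluster = ActualState()
    spec = DesiredState(replicas=3, version="2.11")  # made-up version number
    print(reconcile(spec, cluster))  # first pass: installs and scales
    print(reconcile(spec, cluster))  # second pass: nothing left to do
    spec.version = "2.12"            # someone edits the declared resource
    print(reconcile(spec, cluster))  # the loop picks up the change and upgrades
```

The same function handles the first install and later updates, which is the practical difference from a one-off install step: the logic keeps running and corrects whatever no longer matches the declaration.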
Much like eBPF, funda probably won’t need its own custom Operators, but being aware of the technology and how it works is still valuable. We should assess where existing Operators could be useful to us.
5. There are 4 best practices to deal with Kubernetes misbehaving
I would recommend this talk to anyone who’s just starting out with Kubernetes. It’s a complicated system, and the applications we put on it make it even more complicated. The talk covers the reasons why it can be so confusing to debug applications in Kubernetes and offers the following four best practices to deal with it. The speaker provided great details and nice examples of each of the practices, so it’s well worth a watch.
- The biggest takeaway from this talk was Change Tracking. It’s true that we already do this to a certain extent, but the advice was to have it somewhere visible and easily findable. Right now, we have lots of sources for change tracking, but it’s very difficult to view everything as a whole. This is something SRE should be working on.
- The speaker also mentions Logging as a best practice, and funda definitely has room for improvement here. The advice in the presentation includes logging whenever an unexpected event happens. And while you’re writing the log message, you should think about the Developer who will read it later. Just because you understand ‘silly error #34312’ doesn’t mean everyone will.
- Distributed tracing is another very powerful tool. It allows you to view how certain processes move around your infrastructure, drill down into individual requests and see how long they took. It can not only give you a much better view of how your applications work with each other, but also help you identify which ones are misbehaving. Lucky for us, funda already has distributed tracing with Datadog, so it’s something we can – and already do – use.
- Metrics are another thing we have at funda. These can be a big benefit during troubleshooting. We output standard performance metrics in APM (think: request count, latency, etc.) and we can have custom metrics per app.
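The logging advice from the list above can be made concrete with a small Python sketch that uses only the standard `logging` module. The service name, the `load_listing` function and the message wording are hypothetical and do not come from funda’s code. The point is simply to log the unexpected event with enough context that the next reader does not need to decode an opaque error number.

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s %(message)s",
)
logger = logging.getLogger("listing-service")  # hypothetical service name


def explain_missing_listing(listing_id: str, source: str) -> str:
    """Say what happened, for which input, and what the likely causes are."""
    return (
        f"Listing {listing_id!r} was requested but not found in {source}. "
        "It may have been removed, or the search index may be out of date."
    )


def load_listing(listing_id: str, store: dict, source: str = "listing store"):
    """Return the listing, or None after logging why it could not be loaded."""
    listing = store.get(listing_id)
    if listing is None:
        # Unexpected event: log it with context rather than a bare error code.
        logger.warning(explain_missing_listing(listing_id, source))
    return listing


if __name__ == "__main__":
    listings = {"a1": {"city": "Amsterdam"}}
    load_listing("a1", listings)  # found, nothing is logged
    load_listing("zz", listings)  # missing, logs a readable warning
```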
Check out these great extras
There were a few other great talks that are worth checking out when you have a minute.
- ‘Lessons learned from writing thousands of lines of IaC’ inspired us to further investigate drift detection for our base infrastructure in Terraform.
- ‘56 dog years as a cloud native’ was part history lesson, part advice for working in the cloud-native space. It was a great one.
- ‘Not so graceful: K8s node failure and stateful workloads’ is a great intro if you want to learn more about PersistentVolumes (PVs) in Kubernetes, and what happens to them in cluster fail states, like when nodes go down.
- ‘Hijack a Kubernetes Cluster – a walkthrough’ is a pretty interesting talk about how to escalate privileges in a Kubernetes Cluster. It shows really well why vigilance is necessary, because bad things can happen from simple misconfigurations.