Today it is time for the second part of my Trivadis mini-series "Security for Kubernetes". This time we're looking at runtime threat detection, alerting and actions. Many people now use analysis tools to check their frameworks and Docker images for potential security vulnerabilities. In the next part of the mini-series, I'll tell you about our current setup for Java.
What we want to look at now is the last step in the chain: what do we do when our components are running in the Kubernetes cluster?
So far, not much. Hopefully we rebuild the code in the CI pipeline from time to time so that the tools mentioned above can look for vulnerabilities. But after that, it usually stops.
Software ages and new vulnerabilities are discovered day by day. Wouldn't it be good if we were notified at runtime about unusual behavior of containers or Kubernetes nodes?
The open source project Falco comes to the rescue:
Falco, the cloud-native runtime security project, is the de facto Kubernetes threat detection engine. Falco was created by Sysdig in 2016 and is the first runtime security project to join CNCF as an incubation-level project. Falco detects unexpected application behavior and alerts on threats at runtime.
Falco hooks into the kernel and analyzes all system calls against its rules engine. If a rule is violated, an alert can be triggered.
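To make the idea of "system calls checked against a rules engine" a bit more tangible, here is a toy sketch in Python. It is not how Falco is implemented: real Falco rules are written in YAML with their own condition language, and the event fields (syscall, container, proc_name) as well as the rule shown here are simplified assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A simplified, hypothetical view of one system call event.
Event = Dict[str, str]


@dataclass
class Rule:
    name: str
    priority: str
    condition: Callable[[Event], bool]  # returns True if the event violates the rule
    output: str                         # message template filled from the event


SHELLS = {"sh", "bash", "csh", "zsh"}

# One illustrative rule: a shell binary is executed inside a container.
RULES: List[Rule] = [
    Rule(
        name="Shell spawned in container",
        priority="WARNING",
        condition=lambda e: (
            e["syscall"] == "execve"
            and e["container"] != "host"
            and e["proc_name"] in SHELLS
        ),
        output="shell {proc_name} started in container {container}",
    ),
]


def evaluate(event: Event, rules: List[Rule] = RULES) -> List[str]:
    """Check a single event against all rules and return the alert messages."""
    alerts = []
    for rule in rules:
        if rule.condition(event):
            message = rule.output.format(**event)
            alerts.append(f"{rule.priority} [{rule.name}] {message}")
    return alerts


if __name__ == "__main__":
    suspicious = {"syscall": "execve", "container": "web-1", "proc_name": "bash"}
    for alert in evaluate(suspicious):
        print(alert)
```

The point of the sketch is only the flow: every event passes through every rule, and a matching condition produces an alert with a priority and a readable message.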
Image source: https://falco.org/docs/getting-started/
An example: Someone gains shell access to one of the containers and tries to install additional tools via a package manager.
As an admin, I then get the following alerts in the Falco frontend:
I also get an alert via Google Chat message:
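How the message reaches the chat is not covered here, so the following is only a rough sketch of one possible way. It assumes that Falco is configured to emit alerts as JSON (with fields such as rule, priority and output) and that the chat space has an incoming webhook that accepts a JSON body with a "text" field. The function names and the webhook URL are hypothetical.

```python
import json
import urllib.request


def to_chat_payload(alert: dict) -> dict:
    """Turn one Falco JSON alert into a simple text message for a chat webhook."""
    text = (
        f"Falco alert [{alert.get('priority', 'n/a')}] "
        f"{alert.get('rule', 'unknown rule')}\n"
        f"{alert.get('output', '')}"
    )
    return {"text": text}


def send_to_chat(webhook_url: str, alert: dict) -> int:
    """POST the message to the webhook and return the HTTP status code.

    webhook_url is a placeholder; use the URL generated for your own chat space.
    """
    body = json.dumps(to_chat_payload(alert)).encode("utf-8")
    request = urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json; charset=UTF-8"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


if __name__ == "__main__":
    # Made-up sample alert, only used to show the resulting message text.
    sample = {
        "priority": "Warning",
        "rule": "Example rule",
        "output": "package manager started in container web-1",
    }
    print(to_chat_payload(sample)["text"])
```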
Ok, so we would get notified. But what if this happens in the middle of the night?
We can also define actions. In this specific case we could, for example, activate a NetworkPolicy. This leaves the affected Pod running, but cuts it off from the network. As admins, we can then analyze the Pod in the morning without the attacker being able to cause further damage.
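As a minimal sketch of such an action, the following Python snippet builds a NetworkPolicy manifest that blocks all incoming and outgoing traffic for Pods carrying a certain label. The label quarantine=true, the policy name and the function name are assumptions made up for this example, and the policy only has an effect if the cluster's network plugin enforces NetworkPolicies.

```python
import json


def quarantine_policy(namespace: str,
                      label_key: str = "quarantine",
                      label_value: str = "true") -> dict:
    """Build a deny-all NetworkPolicy for Pods with the given (hypothetical) label.

    Listing both policy types without any ingress or egress rules means
    that no traffic is allowed to or from the selected Pods.
    """
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "quarantine-deny-all", "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": {label_key: label_value}},
            "policyTypes": ["Ingress", "Egress"],
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests as well as YAML.
    print(json.dumps(quarantine_policy("default"), indent=2))
```

The printed manifest could be applied once with kubectl apply -f -, and the automated action would then only need to put the label on the suspicious Pod (for example with kubectl label pod <name> quarantine=true). The Pod keeps running and can be inspected later.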
What can we monitor with this? By default, the following cases are handled:
Privilege escalation using privileged containers
Namespace changes using tools like setns
Read/Writes to well-known directories such as /etc, /usr/bin, /usr/sbin, etc
Ownership and Mode changes
Unexpected network connections or socket mutations
Spawned processes using execve
Executing shell binaries such as sh, bash, csh, zsh, etc
Executing SSH binaries such as ssh, scp, sftp, etc
Mutating Linux coreutils executables
Mutating login binaries
Mutating shadowutil or passwd executables such as shadowconfig, pwck, chpasswd, getpasswd, change, useradd, etc, and others.
The default installation already comes with a lot of useful rules. If we have special cases or discover a new attack vector, we can add new rules ourselves.
In the beginning, it makes sense to run Falco for a few days and then adapt the alerting / rule-set to your Kubernetes cluster.
You can also stumble upon some strange things: On my first installation we found out that a Splunk daemon and an antivirus scanner were running on the base image of the Kubernetes nodes. I had not been aware of the virus scanner before, but now we know that it can affect performance at specific times. As you can see, Falco also lets you analyze the behavior of parts of the cluster that are not fully under your control (such as AMI images on AWS).
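To decide which rules need tuning after such an observation period, it helps to see which ones fire most often. Here is a small sketch, assuming the alerts were collected as one JSON object per line and that each object has a "rule" field. The function name and the sample lines are made up for illustration.

```python
import json
from collections import Counter
from typing import Iterable, List, Tuple


def count_alerts_by_rule(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Count alerts per rule name, most frequent first.

    Blank lines and lines that are not JSON objects are skipped.
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            alert = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(alert, dict):
            continue
        counts[alert.get("rule", "<unknown>")] += 1
    return counts.most_common()


if __name__ == "__main__":
    # Made-up sample data standing in for a few days of collected alerts.
    sample_lines = [
        '{"rule": "Read sensitive file untrusted", "priority": "Warning"}',
        '{"rule": "Terminal shell in container", "priority": "Notice"}',
        '{"rule": "Read sensitive file untrusted", "priority": "Warning"}',
        "not json at all",
    ]
    for rule, count in count_alerts_by_rule(sample_lines):
        print(f"{count:5d}  {rule}")
```

Rules at the top of this list are the first candidates to check: either they point to something unexpected, like the virus scanner above, or they are noise that should be handled with an exception in the rule-set.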
Falco is currently only at version 0.27, but it is already used by several large companies on their production clusters. So far I have not seen any negative impact on our cluster.
Did you know that there is an open source threat detection engine for Kubernetes? What do you think about it?