Container Escape Telemetry, Part 5: Tuning eBPF Tools From Defaults to Detection
What Tetragon, Falco, and Tracee ship with out of the box, what you have to build yourself, and every configuration pitfall we hit along the way. The practical tuning guide for container runtime security tools.
This is Part 5 of the container escape telemetry series (overview). Part 1 covered isolation primitives. Part 2 covered the lab and tools. Part 3 was the per-scenario data. Part 4 covered volume, signal-to-noise, and tool selection. This post is for the practitioner who just installed one of these tools and wants to know what to do next.
The gap between “tool installed” and “tool detecting container escapes” is larger than any of these projects’ documentation suggests. Here’s what that gap actually looks like and how to close it.
What You Get Out of the Box
Tetragon: Powerful Engine, Empty Hangar
Tetragon ships with zero escape-detection policies. The TracingPolicy CRD system is genuinely powerful – you can hook arbitrary kernel functions, filter on arguments in-kernel, and enforce SIGKILL on match – but the default install observes process lifecycle events and nothing else. No kprobes for setns, no hooks for splice, no file access monitoring. You’re writing YAML from scratch.
I wrote 8 TracingPolicy files covering 31 kernel function hooks for this research:
| Policy | What It Hooks | Why |
|---|---|---|
| 00-baseline-collection | cap_capable, wake_up_new_task, security_bprm_check, cgroup_mkdir/rmdir/attach, setuid/setgid/setresuid, dup3 | Ambient kernel behavior baseline. The cap_capable hook alone generated 55.8% of total Tetragon volume. |
| 01-namespace-changes | sys_unshare, sys_setns, sys_clone3, security_task_alloc | Core escape detection. Any setns from container to host namespace is near-certain escape. |
| 02-cgroup-manipulation | sys_mount, security_file_open (release_agent, notify_on_release, cgroup.procs) | S01/S02 cgroup escapes. The mount + write(release_agent) chain is the canonical pattern. |
| 03-privilege-escalation | security_bprm_check, cap_capable, sys_ptrace | Binary execution context and privilege checks during exploitation. |
| 04-sensitive-file-access | security_file_open (/proc/1/root/*, /etc/shadow, docker.sock, core_pattern), security_socket_connect | S04/S05 sensitive path monitoring. Namespace annotations make host-vs-container filtering trivial. |
| 05-network-activity | tcp_connect, tcp_close | Network connection tracking. Lower priority for escape detection but useful for lateral movement. |
| 06-cve-specific | do_splice, sys_splice, sys_fsopen, sys_fsconfig, sys_setsockopt, sys_symlinkat, sys_symlink, security_file_open (/proc/self/exe) | CVE-specific hooks: DirtyPipe (S08), fsconfig (S09), Netfilter (S10), maskedPaths (S11), runc overwrite (S12). |
| 07-capability-abuse | sys_mount, sys_finit_module, sys_init_module, sys_bpf, sys_keyctl, security_file_open (/dev/kmsg) | S13/S14 in-namespace capability abuse. These are the operations a --privileged container can do without ever calling setns. |
Writing these took multiple iterations. The YAML is straightforward once you understand the schema, but choosing which kernel functions to hook and which arguments to filter on requires understanding both the exploit chain and the kernel internals. You’re effectively writing eBPF programs through a YAML abstraction layer. That’s powerful, but the authoring cost is real.
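To make the authoring cost concrete, here is a hedged sketch of what a minimal policy in the spirit of 01-namespace-changes looks like – one syscall hook with typed arguments. The field names follow the Tetragon TracingPolicy schema as I understand it; this is an illustrative reduction, not one of the eight actual policy files, so verify it against your Tetragon version before relying on it:

```yaml
# Minimal TracingPolicy sketch: hook the setns syscall and capture its
# two arguments. Illustrative only -- the real 01-namespace-changes
# policy also hooks unshare, clone3, and security_task_alloc.
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: namespace-changes-sketch
spec:
  kprobes:
    - call: "sys_setns"
      syscall: true
      args:
        - index: 0
          type: "int"   # fd referring to the target namespace
        - index: 1
          type: "int"   # nstype (CLONE_NEWNS, CLONE_NEWPID, ...)
```

Even this trivial example shows the authoring model: you name a kernel function and describe its argument types yourself, which is why choosing the right hooks requires kernel-internals knowledge.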
Falco: Broad Coverage With Gaps
Falco ships with 400+ community-maintained rules, MITRE ATT&CK mappings, and broad container runtime integration. Out of the box, it detected 5 of 13 scoreable scenarios (S01-S14, excluding S06 baseline and S15 stress test):
Default rules that fired (4 rules across 5 detected scenarios; a 5th rule – “Drop and execute new binary” – caught gcc compilation during S08 but not the splice exploit itself, so it doesn’t count as detecting DirtyPipe):
- “Container Escape via release_agent” – S01, S02
- “Namespace Change from Container” – S02, S03
- “Docker Socket Access from Container” – S04 (partial: runc init only, not in-container abuse)
- “Sensitive Host File Read” – S05
- “Drop and execute new binary” – S08 (fired on gcc, not splice – does not count as detecting DirtyPipe)
Default rules that didn’t exist:
- No splice rule (S08 DirtyPipe)
- No fsconfig/fsopen rule (S09)
- No netfilter/netlink socket rule (S10)
- No symlink-to-proc rule (S11)
- No /proc/self/exe write rule (S12)
- No keyctl, kernel module, BPF, /dev/kmsg, or /proc/*/environ rules (S13, S14)
The engine supports every one of these syscalls. The gap is rule coverage, not capability. Closing it required 12 custom rules and rule changes, which I covered in detail in Part 4.
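As an illustration of how small each missing rule is once you know what to write, here is a hedged sketch of a splice rule for the S08 gap. It is not the exact rule from this series, and a production version would also carry the runc exclusions discussed later in this post:

```yaml
# Illustrative custom rule for the DirtyPipe (S08) gap. Field names
# follow standard Falco rule syntax; condition and output are a sketch,
# not the tuned rule used in this research.
- rule: Splice Syscall from Container
  desc: splice() invoked inside a container (DirtyPipe-style primitive)
  condition: container and evt.type = splice
  output: splice() from container (user=%user.name container=%container.name)
  priority: WARNING
  tags: [container_escape, CVE-2022-0847]
```

The engine evaluates this without any plugin or driver change, which is the point: the missing piece was never capability.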
Tracee: Behavioral Depth, Configuration Required
Tracee ships with 330+ syscall hooks and the TRC-* behavioral signature engine. The behavioral signatures (magic_write, commit_creds, switch_task_ns) are Tracee’s standout feature – they detect exploit *consequences* rather than specific syscall sequences. But the default event set and filtering configuration matter enormously.
The policy file I built enables 30 event types:
Process lifecycle: sched_process_exec, sched_process_exit, clone, clone3
Namespace operations: unshare, setns, switch_task_ns
File operations: security_file_open (with path prefix filters), openat (filtered to /proc/self/exe), symlink, symlinkat
Credential changes: setuid, setgid, setresuid, setresgid, capset, commit_creds
Network: security_socket_connect, setsockopt
Cgroup: cgroup_mkdir, cgroup_rmdir, mount
CVE-specific: splice, pipe, pipe2, dup, dup2, dup3, ptrace
Behavioral: magic_write
The migration from CLI flags to policy YAML was necessary because --events doesn’t support inline argument filters. You can’t do --events openat.args.pathname=/proc/self/exe on the command line; you need a policy YAML file with a filters block under the event entry.
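For reference, the overall shape of such a policy file is roughly the following. This is a hedged sketch against Tracee's policy schema as documented – the exact apiVersion string may differ across versions, and the full 30-event policy is not reproduced here:

```yaml
# Skeleton of a Tracee policy file: scope, default action, and rules
# with optional per-event filters. Illustrative, not the full policy.
apiVersion: tracee.aquasec.com/v1beta1
kind: Policy
metadata:
  name: escape-detection
spec:
  scope:
    - container            # observe container-namespaced processes only
  defaultActions:
    - log
  rules:
    - event: setns         # bare event, no filters
    - event: openat
      filters:
        - data.pathname=/proc/self/exe   # argument filter; impossible via --events
```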
What We Had to Enable or Customize
Tetragon Flags That Change Everything
```
--enable-process-ns
--enable-process-cred
--enable-ancestors base,kprobe,tracepoint
--rb-size-total 67108864
--process-cache-size 65536
```
Each of these is off by default. Here’s what you lose without them:
--enable-process-ns: Without this, you lose all 10 namespace inodes on every event. That means you can’t distinguish a container process from a host process in your telemetry. You can’t filter setns events to only those originating from containers. You can’t prove that a security_file_open on /proc/1/environ came from a container namespace. This flag is effectively mandatory for container security use cases. I don’t understand why it’s off by default.
--enable-process-cred: Without this, no uid/gid/euid/egid/suid/sgid/fsuid/fsgid on any event. No capability sets. You can see that cap_capable fired but not which capability was being checked or what the process’s effective set was.
--enable-ancestors: Without this, you get the immediate process only. No parent, no grandparent, no process tree. When an exploit spawns a chain of processes, you need the ancestry to trace it back to the entry point.
--rb-size-total 67108864 (64 MB): The default ring buffer size is much smaller. During the S15 stress test, the 64 MB buffer handled all 15 scenarios with zero drops. If you’re monitoring production workloads with high container churn, you may need to increase this further. The tradeoff is memory: this is pinned kernel memory that can’t be swapped.
--process-cache-size 65536: The process cache maps PIDs to process metadata. The default is smaller. If you see “process not found in cache” warnings in Tetragon logs, increase this. Each cache miss means an event ships without full process context.
Falco: rule_matching=all
This is the single most important Falco configuration change for detection engineering, and discovering it was one of the more frustrating debugging sessions in this project.
By default, Falco uses rule_matching=first: when multiple rules match the same event, only the first matching rule fires. I discovered this when the “Keyctl Invocation from Container” rule never fired on S13 despite keyctl clearly executing 504 times. The reason: “Drop and execute new binary” matched the same execve_exit event first (keyctl was freshly installed by apt-get), and Falco stopped evaluating.
Setting rule_matching=all makes all matching rules fire on the same event. This is essential if you have overlapping rules at different priority levels. A process that triggers both “Drop and execute new binary” (WARNING) and “Keyctl Invocation from Container” (WARNING) should fire both, because they represent different threat signals from the same event.
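The change itself is a single line in falco.yaml (key name per recent Falco versions):

```yaml
# falco.yaml -- evaluate every matching rule per event instead of
# stopping at the first match (the default is "first").
rule_matching: all
```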
Tracee: The data. vs args. Trap
Tracee’s policy YAML supports argument filtering under each event’s filters block. The syntax looks like this:
```yaml
- event: security_file_open
  filters:
    - data.pathname=/etc/shadow*
```
The data. prefix is required. An earlier version of Tracee used args. as the prefix. Here’s what makes this dangerous: args.pathname=/etc/shadow* passes YAML validation, the policy loads without error, Tracee starts normally, and the filter is silently ignored at runtime. Every security_file_open event passes through as if unfiltered.
I discovered this when Tracee generated 3.76M events instead of the expected ~15K. The policy file looked correct. Tracee reported no errors. The only symptom was a 248x event volume explosion. If I hadn’t been comparing against expected numbers, this misconfiguration would have been invisible.
The --events CLI flag and policy YAML rules block share keyword names but use different filter syntaxes. If you’re migrating from CLI to policy YAML (which you should, because CLI flags don’t support argument-level filtering), verify your filters are actually working by checking event counts, not just that the container starts.
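A minimal post-deployment sanity check can be as simple as tallying events per type and eyeballing the totals. The sketch below assumes Tracee's JSON-lines output with an eventName field on each event; adjust the key if your output format differs:

```python
import json
from collections import Counter

def count_events(lines):
    """Tally events per type from JSON-lines tool output.

    Assumes each line is a JSON object with an "eventName" field,
    as in Tracee's JSON output; adjust the key for other formats.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or corrupt lines
        counts[event.get("eventName", "unknown")] += 1
    return counts

# A silently ignored path filter shows up here as an implausibly large
# security_file_open share of the total -- millions instead of thousands.
sample = [
    '{"eventName": "security_file_open"}',
    '{"eventName": "security_file_open"}',
    '{"eventName": "setns"}',
]
print(count_events(sample)["security_file_open"])  # 2
```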
Configuration Pitfalls
These are the bugs that cost hours. Every one was silent until it caused a visible failure.
Falco: Folded Scalar YAML Bombs
Falco rules use YAML folded scalars (>) for multi-line condition and output fields. The syntax is:
```yaml
condition: >
  container and evt.type = splice
output: >
  splice() from container (user=%user.name)
```
If the output: > line ends up on the same line as the last line of the condition block – which can happen easily during copy-paste editing – Falco doesn’t report a parse error. Instead, it rejects the entire custom-rules.yaml file with a LOAD_ERR that appears only in the container log. All custom rules silently fail to load. The default rules still work, so Falco appears healthy.
I hit this twice in the same file. The first time, I noticed because the splice rule wasn’t firing on S08. The second time, Falco crash-looped 27 times before I checked docker logs falco and found the error. The fix was a single newline both times.
Recommendation: After any custom rule change, check docker logs falco 2>&1 | grep LOAD_ERR before running scenarios.
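Wrapped into a reusable guard, that check can look like the following. This is a lab-flavored sketch – the function name is my convention, and it reads from a saved log file rather than calling docker directly so it can be dropped into any harness:

```shell
# Fail fast if Falco rejected the custom rules file. Pass the path to a
# captured Falco log (e.g. the output of `docker logs falco 2>&1`).
check_falco_rules() {
  local log="$1"
  if grep -q "LOAD_ERR" "$log"; then
    echo "ERROR: custom rules failed to load:" >&2
    grep "LOAD_ERR" "$log" >&2
    return 1
  fi
  echo "custom rules loaded cleanly"
}
```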
Falco: evt.dir Deprecation
Falco 0.43 deprecated evt.dir = < (the “entry” direction filter). The field still exists but always evaluates to true. Six of the default rules used evt.dir = < as part of their conditions. After the deprecation, those rules match both entry and exit events, effectively doubling their firing rate and creating false positives.
There’s no warning at rule load time. The only way to discover this is by noticing unexpected event volumes or by reading the changelog. If you’re running Falco 0.43+ with rules that use evt.dir, audit them.
Falco: Event File Accumulation
The harness uses touch to create the Falco output file before starting the container. If Falco appends to an existing file (the default behavior with file_output), events from previous runs contaminate your analysis. Between runs, you need to truncate with : > falco-events.json, not just touch.
This is a lab-specific issue, but the same principle applies in production if you’re rotating Falco output files. Stale events in the current file will confuse any analysis pipeline that assumes the file only contains events from the current session.
Tracee: --events Doesn’t Support Argument Filters
You can’t do:
```
--events openat.args.pathname=/proc/self/exe
```
Tracee will silently ignore the filter part and enable openat with no filtering. To get argument-level filtering, you need a policy YAML file. The documentation mentions this, but the error mode is silent acceptance rather than a parse error, so it’s easy to miss.
Tetragon: Policy File Timing
Tetragon loads TracingPolicy YAML files from a directory mounted into the container. The policies must be on disk before the container starts. If you SCP policy files to the VM and start Tetragon in the same script, there’s a race condition: the SCP might not complete before docker run reads the policy directory.
I hit this with 07-capability-abuse.yaml. The file was created after the Tetragon container was already running. Tetragon loaded 7 of 8 policies, and the capability abuse hooks were silently missing from the entire S13/S14 run. The only way I caught it was comparing the docker logs tetragon | grep "Added TracingPolicy" count against the number of YAML files on disk.
Recommendation: After starting Tetragon, verify the loaded policy count matches the file count:
```bash
policy_count=$(find /path/to/policies -name '*.yaml' | wc -l)
loaded_count=$(docker logs tetragon 2>&1 | grep -c "Added TracingPolicy with success")
if [[ "$loaded_count" -lt "$policy_count" ]]; then
  echo "WARNING: $loaded_count/$policy_count policies loaded"
fi
```
The --restart=always + Skip Guard Bug
This one cost two full re-runs. All three install functions originally had a guard:
```bash
if docker ps --format '{{.Names}}' | grep -q "^toolname$"; then
  echo "Tool is already running -- skipping reinstall."
  return
fi
```
Combined with --restart=always on the container, here’s what happened after a VM reboot: Docker auto-restarted the old container with the old configuration. The install function saw the container running and returned early. The new configuration (updated policies, new filters, different flags) was never applied. The scenario ran against the old config.
The symptom was subtle: event counts and timestamps matched a previous run exactly. Not “similar” – identical. The container log’s start timestamp was from three hours ago, not from the current run. If I hadn’t been comparing timestamps, I’d have analyzed stale data and drawn wrong conclusions.
The fix was removing the skip guard entirely and always recreating:
```bash
# Always recreate -- the container may have been auto-restarted after
# reboot (--restart=always) with a stale config.
docker rm -f toolname 2>/dev/null || true
```
This applies to any containerized tool with --restart=always where you expect configuration changes to take effect on restart. The container restart and the configuration update are separate operations, and the skip guard conflates “container is running” with “container is running with the right config.”
False Positives and Noise Sources
Every tool generates noise. Understanding the noise sources is how you tune effectively.
runc Init: 18 False Positives Per Container Start
runc, the OCI runtime that Docker uses to create containers, calls fsopen, fsconfig, symlink, and creates netlink sockets during normal container initialization. These are exactly the syscalls that CVE-specific rules target. Without exclusions, every docker run generates ~18 false positive events across the CVE rules.
The required Falco exclusion:
```
not proc.name in (runc, "runc:[0:PARENT]", "runc:[1:CHILD]", "runc:[2:INIT]")
```
This needs to be on every CVE-specific rule. The four runc process names correspond to different stages of the container creation process – parent, child, and init. Missing any one of them leaks false positives.
Tetragon and Tracee have the same problem but handle it differently. Tetragon’s namespace metadata lets you filter in post-processing: runc init processes have is_host: true because they’re running in the host’s namespaces before the container’s namespaces are created. Tracee’s container scope (scope: container in the policy) filters out most runc init activity, but some events from runc:[2:INIT] still appear because it executes after the container’s namespaces are created but before the container’s entrypoint runs.
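The Tetragon post-processing approach can be sketched as below. The field layout here (a process object with a binary path and per-namespace inodes) and the host pid-namespace inode value are assumptions for illustration – adapt them to your actual event schema and verify the inode on your host:

```python
# Drop runc-init noise from parsed Tetragon events using namespace
# metadata. Field names and the host inode are illustrative assumptions.
HOST_PID_NS_INODE = 4026531836  # common host pid-ns inode; verify locally

RUNC_STAGES = {"runc", "runc:[0:PARENT]", "runc:[1:CHILD]", "runc:[2:INIT]"}

def is_container_event(event):
    proc = event.get("process", {})
    # Filter by binary name for the plain runc stages...
    if proc.get("binary", "").split("/")[-1] in RUNC_STAGES:
        return False
    # ...and by namespace: a process still in the host pid namespace is
    # host-side container setup, not container workload activity.
    return proc.get("ns", {}).get("pid") != HOST_PID_NS_INODE

events = [
    {"process": {"binary": "/usr/bin/runc", "ns": {"pid": 4026531836}}},
    {"process": {"binary": "/bin/sh", "ns": {"pid": 4026532999}}},
]
print([is_container_event(e) for e in events])  # [False, True]
```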
Tetragon cap_capable: 55.8% of All Volume
The cap_capable kprobe in the baseline policy fires on every kernel capability check. A single ls command triggers multiple capability checks. An apt-get install triggers thousands. Across all 15 scenarios, cap_capable accounted for 55.8% of total Tetragon event volume.
The events are genuinely useful – the 13x spike during DirtyPipe (1,360 vs 103 baseline) is a real anomaly signal. But for production, you have options:
- Remove from baseline policy entirely. You lose the ambient capability monitoring but cut volume in half. The CVE-specific and capability-abuse policies still catch the important signals.
- Add matchNamespaces to scope it to containers only. Host-side capability checks (the majority) are filtered in-kernel.
- Rate-limit via a downstream pipeline. Keep all events but sample or aggregate before storage.
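The in-kernel scoping option looks roughly like this. It is a hedged sketch against the TracingPolicy selector schema – in particular, the "host_ns" special value should be verified against your Tetragon version:

```yaml
# Sketch: keep cap_capable events only for processes outside the host
# pid namespace, filtering host-side checks in-kernel before they ever
# reach the ring buffer.
spec:
  kprobes:
    - call: "cap_capable"
      syscall: false
      selectors:
        - matchNamespaces:
            - namespace: Pid
              operator: NotIn
              values:
                - "host_ns"   # special value meaning the host namespace
```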
Tracee security_file_open: The Biggest Knob
Without path filters, security_file_open is the single biggest source of Tracee event volume. During S13, apt-get install keyutils alone generated 12,000+ security_file_open events from the package manager reading every file in the dpkg database. Across all 15 scenarios, unfiltered security_file_open accounted for ~98% of total events.
Path-prefix filtering reduced total events from 3.76M to 15,169 – a 99.6% reduction – with zero detection regressions. The filters I used match the same sensitive paths as Tetragon’s TracingPolicies:
```yaml
- event: security_file_open
  filters:
    - data.pathname=/etc/shadow*
    - data.pathname=/proc/1/*
    - data.pathname=/proc/self/*
    - data.pathname=/var/run/docker.sock*
    - data.pathname=/proc/sys/kernel/core_pattern*
    - data.pathname=/dev/kmsg*
    - data.pathname=/dev/sda*
    - data.pathname=/etc/passwd*
    # ... (full list in the install-tools.sh source)
```
This is not optional for production. Without these filters, Tracee is vulnerable to adversarial openat flooding (S15 demonstrated 10.1M dropped events and a missed escape). With them, Tracee handled the same flood with zero drops.
Falco “Drop and execute new binary”
This rule fires on any binary that was written to disk and then executed. It’s high-confidence for supply chain attacks (attacker drops a binary and runs it). It’s noisy in containers that install packages at runtime: apt-get install keyutils followed by keyctl triggers it, and so does every other freshly-installed package.
In production, this rule is valuable but needs context-dependent tuning. If your containers never install packages at runtime (immutable images), leave it at CRITICAL. If they do, either suppress it for known package managers or downgrade to INFORMATIONAL for containers with known install-at-runtime patterns.
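One way to express the suppression, assuming the rule-override syntax introduced in Falco 0.36, is to append a parent-process exclusion to the shipped rule rather than copying it wholesale. The rule name and the package-manager list below are illustrative – match them to the exact names in your version’s ruleset:

```yaml
# Sketch: suppress the rule for known package-manager parents in
# install-at-runtime containers. Adjust rule name and list to taste.
- rule: Drop and execute new binary in container
  condition: and not proc.pname in (apt, apt-get, dpkg, pip, npm)
  override:
    condition: append
```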
Tuning Recommendations
Tetragon
- Rate-limit or scope cap_capable. This is the single highest-impact change. Either add matchNamespaces to restrict to container processes, or remove it from the baseline policy and rely on the CVE-specific and capability-abuse policies for escape detection. Cuts volume by roughly half.
- Tune --rb-size-total based on workload. 64 MB (67108864 bytes) handled all 15 scenarios including S15’s syscall flood with zero drops. For production workloads with higher container churn, consider 128 MB or higher. Monitor the received and lost counters in Tetragon’s container log.
- Use matchNamespaces in TracingPolicy. This filters events in-kernel to only processes in specific namespaces, reducing volume without losing detection capability for container-targeted policies.
- Always verify policy load count. After starting Tetragon, check that the number of “Added TracingPolicy with success” log lines matches the number of YAML files. Silent policy load failures are the most common source of coverage gaps.
Falco
- Set rule_matching=all. Without this, overlapping rules suppress each other and you’ll have invisible detection gaps. The volume increase is modest.
- Add runc exclusions to every custom rule. Use the full four-process-name exclusion (runc, "runc:[0:PARENT]", "runc:[1:CHILD]", "runc:[2:INIT]"). Missing any one leaks false positives.
- Drop Debug-priority lifecycle rules in production. The “Container Process Started”, “Container Process Exited”, and “Container Process Forked” rules I used for research generate significant volume. In production, keep them only if you’re feeding a behavioral analytics pipeline.
- Validate custom rules after every change. Check docker logs falco 2>&1 | grep LOAD_ERR immediately after starting. Folded scalar YAML errors are silent until they crash the entire custom rules file.
- Audit rules using evt.dir. If you’re on Falco 0.43+, evt.dir = < always evaluates true. Remove or replace any rules that depend on directional filtering.
Tracee
- Use policy YAML files, not the --events CLI flag. CLI flags don’t support argument-level filtering. Policy YAML is the only way to get path-prefix filters on security_file_open and pathname filters on openat.
- Filter security_file_open by path prefix. This is the single most important tuning change. Without it, you’re vulnerable to adversarial flooding, you’ll generate 248x more events, and you’ll lose escape-critical events under load. The path list should match whatever sensitive paths your Tetragon policies or Falco rules target.
- Use the data. prefix, never args., in policy YAML filters. args. is silently ignored. There is no error, no warning, and no indication that your filter isn’t working. Verify by checking event counts after deployment.
- Scope to container workloads. The scope: container directive in the policy file limits Tracee’s observation to container-namespaced processes. Without it, host-side activity generates events that are irrelevant for container escape detection.
- Add openat with a pathname filter for /proc/self/exe. This captures CVE-2019-5736 write attempts that security_file_open misses because the VFS rejects the open with ETXTBSY before the LSM hook fires. It’s a single event addition but closes a real detection gap.
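In policy form, that last addition is two lines, using the same data. filter syntax shown earlier:

```yaml
# The CVE-2019-5736 gap-closer: watch openat on the runc binary's own
# /proc/self/exe path, which the security_file_open hook never sees.
- event: openat
  filters:
    - data.pathname=/proc/self/exe
```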
The Detection Authoring Cost
Across all three tools, closing the gap between “installed” and “maximum detection coverage” required:
- Tetragon: 8 TracingPolicy files with 31 kernel function hooks. Multiple iterations debugging YAML schemas and argument types. Total authoring effort was the highest of any tool because you’re specifying kernel function signatures.
- Falco: 12 custom rules and rule changes. Lower per-rule effort than Tetragon (you’re writing conditions against event fields, not kernel function signatures), but the YAML syntax pitfalls and the rule_matching discovery added debugging time.
- Tracee: 1 policy YAML file with 30 event types and argument filters. The lowest authoring effort because Tracee’s behavioral signatures (magic_write, commit_creds) provide escape detection without per-CVE rule authoring. The main effort was discovering the data. vs args. prefix issue and the path-filtering requirement.
Tetragon and Tracee can reach 12/13 with custom work. Falco reaches 11/13 – the S12 runc overwrite is blocked before Falco can observe the write, and S04’s socket attribution remains a partial detection. All three have the engine capability to detect the underlying syscalls. The difference is how much authoring each requires to get there. If your team has kernel expertise and wants maximum control, Tetragon’s TracingPolicy system rewards that investment. If you want fast iteration on rule conditions without kernel internals knowledge, Falco’s rule syntax is more accessible. If you want the broadest behavioral coverage with the least custom authoring, Tracee’s signature engine plus a well-tuned policy file gets you there with the least effort.
The gap between “tool supports this syscall” and “tool detects this attack” is entirely configuration. That’s simultaneously the most frustrating and most empowering finding from this research. Frustrating because every tool requires non-trivial tuning. Empowering because it means the tools you already have are more capable than their default configurations suggest.
Part 6 takes these tuning lessons and tests them against reality: TeamPCP, a threat actor active right now, uses the exact container escape techniques our lab scenarios simulate. The gap between default configurations and real-world threat coverage is where TeamPCP lives.
LLM Disclosure
Claude (Anthropic) was used throughout this project to assist with lab setup and automation, telemetry analysis and correlation, and authoring this blog series.
