# App-gateway with Multus-CNI example
# Gist by @astro-stan, last active February 20, 2026
# The gateway namespace needs to be privileged to set up firewall and routing rules
# The clients do not need to be in privileged namespaces
apiVersion: v1
kind: Namespace
metadata:
name: app-gateway
labels:
pod-security.kubernetes.io/enforce: privileged
---
# Inspired by https://github.com/solidDoWant/infra-mk3/tree/master/cluster/gitops/networking/vpn
#
# This network definition defines a macvlan interface, configured via DHCP. It
# creates a default route in a separate routing table (usually `100`) and will
# forward all traffic using its routing table through the macvlan interface.
#
# The MAC address of this interface can be used by external networking equipment
# to assign a static IP (via DHCP reservation) and to route its traffic via
# policy-based routing (for example through a site-to-site VPN).
#
# The idea behind this is the following:
#
# ```
# ------------ -------------
# | Client pod | <------bridge------> | Gateway pod | <---macvlan---> LAN/Internet
# ------------ (L2 only, ------------- (SNAT)
# public IP and DNS (CoreDNS)
# gateway)
# ```
#
# NOTE:
# If the primary CNI used is Cilium, then it must be configured to disable
# the socket-level loadbalancer and fall back to the tc loadbalancer at the
# veth interface (`socketLB.hostNamespaceOnly=true`). Without this, pods
# with this interface attached will not be able to connect to service IP
# addresses. It is unclear (to me) if this is expected or a bug. See
# https://github.com/cilium/cilium/issues/43896
#
# TODO: Use interface alias instead of `ethX` when
# https://github.com/siderolabs/talos/issues/12604 is resolved.
#
# TODO: Stop hardcoding a lower MTU than necessary when
# https://github.com/coredns/coredns/issues/7844 is resolved.
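#
# A quick way to confirm the `sbr` plugin created the separate routing table is
# to inspect the rules from inside the gateway pod. This is a sketch: the
# deployment name (`app-gateway-coredns`) and table ID (`100`) are assumptions
# that depend on your release name and CNI ordering:
#
# ```bash
# # One "from <macvlan IP> lookup 100"-style rule should exist
# kubectl -n app-gateway exec deploy/app-gateway-coredns -- ip rule show
# # The table should contain a default route out of macv-out0
# kubectl -n app-gateway exec deploy/app-gateway-coredns -- ip route show table 100
# ```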
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
name: app-gateway-macvlan
namespace: app-gateway
spec:
config: '{
"cniVersion": "0.3.1",
"name": "app-gateway-macvlan",
"plugins": [
{
"type": "macvlan",
"master": "eth1",
"mode": "bridge",
"ipam": {
"type": "dhcp"
},
"mtu": 1420
},
{
"type": "sbr"
},
{
"type": "tuning",
"sysctl": {
"net.ipv6.conf.all.disable_ipv6": "1",
"net.ipv6.conf.default.disable_ipv6": "1"
}
}
]
}'
---
# Inspired by https://github.com/solidDoWant/infra-mk3/tree/master/cluster/gitops/networking/vpn
#
# This network definition defines a bridge interface with a static IP.
#
# It is designed to be attached to a **single** pod, which has other means of
# dispatching traffic out of the cluster. For example using the `macvlan`
# interface defined above.
#
# The idea behind this is the following:
#
# ```
# ------------ -------------
# | Client pod | <------bridge------> | Gateway pod | <---macvlan---> LAN/Internet
# ------------ (L2 only, ------------- (SNAT)
# public IP and DNS (CoreDNS)
# gateway)
# ```
#
# This document defines the "gateway" side of the bridge interface.
#
# For this to work properly:
#
# 1. The "client" side of the interface **must**:
# - Have the same device name (in this case `br0-net`)
# - Have other interface options (such as MTU) match this interface config
# - Have its gateway set to the IP of this interface
# - Have a route for `${KUBEDNS_SVC_IP}/32` set with this interface as the
# gateway. Adding this route to client pods 'hijacks' the pod DNS traffic
# and reroutes it to the gateway pod, which in this case runs CoreDNS. The
# traffic is then DNAT-ed and the resolved DNS request is then returned to
# the client pod. Unless explicitly configured otherwise, the default
# `kube-dns` service IP address is the 10th IP of the service network. For
# example, if the service network is `10.0.0.0/16`, then the `kube-dns`
# service IP should be `10.0.0.10` by default.
# 2. The primary CNI:
# - If the primary CNI used is Cilium, then it must be configured to disable
# the socket-level loadbalancer and fall back to the tc loadbalancer at
# the veth interface (`socketLB.hostNamespaceOnly=true`). Without this the
#     client DNS traffic will leak through the `eth0` interface even if a
#     `${KUBEDNS_SVC_IP}/32` route is present in the client pod's routing table.
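#
# The "10th IP" convention above can be computed directly. A minimal sketch
# using only the Python standard library (the `10.0.0.0/16` service CIDR is
# just an example value):
#
# ```python
# import ipaddress
#
# svc_net = ipaddress.ip_network("10.0.0.0/16")
# kube_dns_ip = svc_net.network_address + 10  # kube-dns sits at base + 10
# print(kube_dns_ip)  # -> 10.0.0.10
# ```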
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
name: app-gateway-bridge
namespace: app-gateway
spec:
config: '{
"cniVersion": "0.3.1",
"name": "app-gateway-bridge",
"plugins": [
{
"type": "bridge",
"bridge": "br0-net",
"ipam": {
"type": "static",
"addresses": [
{
"address": "${APP_GATEWAY_IP_CIDR}"
}
]
}
},
{
"type": "tuning",
"sysctl": {
"net.ipv6.conf.all.disable_ipv6": "1",
"net.ipv6.conf.default.disable_ipv6": "1"
}
}
]
}'
---
# yaml-language-server: $schema=https://kubernetes-schemas.pages.dev/helm.toolkit.fluxcd.io/helmrelease_v2.json
#
# Heavily inspired by: https://github.com/solidDoWant/infra-mk3/blob/master/cluster/gitops/networking/vpn/dns/hr.yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
name: app-gateway
namespace: app-gateway
spec:
interval: 15m
chart:
spec:
chart: coredns
version: 1.43.3
sourceRef:
kind: HelmRepository
namespace: flux-system
name: coredns-charts
interval: 15m
timeout: 20m
maxHistory: 3
install:
createNamespace: true
remediation:
retries: 3
upgrade:
cleanupOnFail: true
remediation:
retries: 3
uninstall:
keepHistory: false
values:
    # Since DNS requests are not served over the primary CNI interface, no k8s
    # Service abstraction or load balancing is available, so run a single replica.
replicaCount: 1
resources:
requests:
cpu: 50m
prometheus:
service:
enabled: true
monitor:
enabled: true
service:
name: app-gateway
serviceAccount:
create: true
isClusterService: false
priorityClassName: system-cluster-critical
    # Containerd 2.0 allows binding to port numbers below 1024 by default, so
    # the NET_BIND_SERVICE capability is not needed.
    # For some reason, as of 1.12.3 this is needed, or the entrypoint fails
    # with "permission denied" upon `exec`.
# securityContext:
# capabilities:
# add: []
servers:
- zones:
- zone: .
scheme: dns://
use_tcp: true # Respond to both TCP and UDP requests
port: 53
plugins:
- name: errors
- name: health
parameters: 0.0.0.0:8080 # Default is localhost only
configBlock: lameduck 5s
- name: ready
parameters: 0.0.0.0:8181 # Default is localhost only
- name: log
- name: prometheus
parameters: 0.0.0.0:9153 # Default is localhost only
- name: reload
- name: loop
- name: loadbalance
- name: bind
parameters: 0.0.0.0 # Default is localhost only
# Reject requests from public IP space or pods not on the app-gateway bridge
# network
- name: acl
configBlock: |-
allow net ${APP_GATEWAY_NET}
drop
- name: whoami
- name: kubernetes
parameters: cluster.local in-addr.arpa
configBlock: |-
pods verified
fallthrough in-addr.arpa
# Resolve PTR records for services
- name: k8s_external
parameters: in-addr.arpa
configBlock: |-
fallthrough in-addr.arpa
# Fix PTR record lookups
- name: rewrite
parameters: stop
configBlock: |-
name suffix .in-addr.arpa. .in-addr.arpa.
answer name auto
answer value (.*)\.cluster\.local\. {1}.cluster.local
- name: cache
parameters: 30
# Forward everything else to public servers, which are only reachable via the macvlan gateway
- name: forward
parameters: . tls://9.9.9.9 tls://149.112.112.112
configBlock: |-
tls_servername dns.quad9.net
policy random
health_check 1s
failfast_all_unhealthy_upstreams
next NXDOMAIN SERVFAIL
- name: forward
parameters: . tls://1.1.1.1 tls://1.0.0.1
configBlock: |-
tls_servername one.one.one.one
policy random
health_check 1s
failfast_all_unhealthy_upstreams
podAnnotations:
k8s.v1.cni.cncf.io/networks: '[
{
"name": "app-gateway-bridge",
"interface": "br-in0"
},
{
"name": "app-gateway-macvlan",
"interface": "macv-out0",
"mac": "${APP_GATEWAY_MACVLAN_MAC}"
}
]'
livenessProbe:
initialDelaySeconds: 5
readinessProbe:
initialDelaySeconds: 5
# XXX: Enabling this will cause a deadlock during node drain in single-node
# clusters
# podDisruptionBudget:
# minAvailable: 1
k8sAppLabelOverride: app-gateway
initContainers:
- name: setup-routing
image: nicolaka/netshoot:v0.15
imagePullPolicy: IfNotPresent
securityContext:
capabilities:
add: ["NET_ADMIN"]
command:
- sh
- -c
- |
set -eu
MACVLAN_IP="$(ip -4 addr show macv-out0 | awk '/inet / {print $2}' | cut -d/ -f1)"
MACVLAN_TABLE_ID="$(ip rule show | grep "$MACVLAN_IP" | awk '/lookup/ {print $NF}')"
BRIDGE_GATEWAY_IP="$(ip -4 addr show br-in0 | awk '/inet / {print $2}' | cut -d/ -f1)"
BRIDGE_GATEWAY_NET="$(ip route show dev br-in0 | grep -v default | awk '{print $1}')"
ETH0_INGRESS_FWMARK="0x123"
IP_PORT_PAIRS="${APP_GATEWAY_CLIENT_PORT_FORWARDS}"
if [ -z "$MACVLAN_TABLE_ID" ] || [ "$MACVLAN_TABLE_ID" = "main" ]; then
echo "the macv-out0 interface (IP: $MACVLAN_IP) is expected to have its own routing table. Did you forget to add the 'sbr' binary to the NetworkAttachmentDefinition?"
exit 1
fi
#------------------------ Default firewall rules ------------------------#
# Drop everything by default
iptables -P INPUT DROP
iptables -P FORWARD DROP
iptables -P OUTPUT DROP
# Allow all response traffic
iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
iptables -A OUTPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
iptables -A FORWARD -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
#---------------------------------- lo ----------------------------------#
# Recreate the local traffic rule in case it has been changed by any of the
# CNI plugins
ip rule del priority 0
ip rule add from all lookup local priority 0
# Allow all local traffic
iptables -A INPUT -i lo -j ACCEPT
iptables -A OUTPUT -o lo -j ACCEPT
#--------------------------------- eth0 ---------------------------------#
# Set fwmark for ingress traffic coming through eth0
# This is used to identify and route the resulting response traffic
iptables -t mangle -A PREROUTING -i eth0 -m conntrack --ctstate NEW -j CONNMARK --set-mark "$ETH0_INGRESS_FWMARK"
iptables -t mangle -A OUTPUT -j CONNMARK --restore-mark
# Allow all ingress eth0 traffic
iptables -A INPUT -i eth0 -j ACCEPT
# Allow all egress eth0 traffic destined for the cluster nets
iptables -A OUTPUT -d "${SVC_NET}" -o eth0 -j ACCEPT
iptables -A OUTPUT -d "${POD_NET}" -o eth0 -j ACCEPT
#-------------------------------- br-in0 --------------------------------#
# DNAT between ${KUBEDNS_SVC_IP}:53 and $BRIDGE_GATEWAY_IP:53 (on which
# CoreDNS is listening) for all ingress traffic coming over br-in0.
# This effectively hijacks all KubeDNS destined traffic coming from client
            # pods and forces resolution by this CoreDNS instance instead of KubeDNS.
# Requires client pods to be configured to send KubeDNS requests to this pod
for PROTO in tcp udp; do
iptables -t nat -A PREROUTING -i br-in0 -d "${KUBEDNS_SVC_IP}" -p "$PROTO" --dport 53 -j DNAT --to-destination "$BRIDGE_GATEWAY_IP:53"
iptables -A INPUT -i br-in0 -d "$BRIDGE_GATEWAY_IP" -p "$PROTO" --dport 53 -j ACCEPT
done
# Allow pinging the gateway
iptables -A INPUT -i br-in0 -p icmp --icmp-type echo-request -d "$BRIDGE_GATEWAY_IP" -j ACCEPT
#------------------------------- macv-out0 ------------------------------#
# SNAT all egress macv-out0 traffic
iptables -t nat -A POSTROUTING -o macv-out0 -j MASQUERADE
# Allow all egress traffic through macv-out0
iptables -A OUTPUT -o macv-out0 -j ACCEPT
#------------------------- macv-out0 <--> br-in0 ------------------------#
# Port forwarding
for PAIR in $IP_PORT_PAIRS; do
# Split the pair by colon
TARGET_IP="$${PAIR%:*}"
TARGET_PORT="$${PAIR#*:}"
for PROTO in tcp udp; do
# Set up DNAT from macv-out0:$TARGET_PORT to $TARGET_IP:$TARGET_PORT
iptables -t nat -A PREROUTING -i macv-out0 -p "$PROTO" --dport "$TARGET_PORT" -j DNAT --to-destination "$TARGET_IP:$TARGET_PORT"
# Set up SNAT for $TARGET_IP:$TARGET_PORT to appear from $BRIDGE_GATEWAY_IP
iptables -t nat -A POSTROUTING -o br-in0 -d "$TARGET_IP" -p "$PROTO" --dport "$TARGET_PORT" -j SNAT --to-source "$BRIDGE_GATEWAY_IP"
# Allow forwarded ingress traffic between macv-out0:$TARGET_PORT and $TARGET_IP:$TARGET_PORT
iptables -A FORWARD -i macv-out0 -o br-in0 -d "$TARGET_IP" -p "$PROTO" --dport "$TARGET_PORT" -j ACCEPT
done
done
# Allow all traffic between br-in0 and macv-out0
iptables -A FORWARD -i br-in0 -o macv-out0 -j ACCEPT
#-------------------------------- Logging -------------------------------#
# Log anything that didn't match the ACCEPT rules above
iptables -A INPUT -m limit --limit 5/min -j LOG --log-prefix "APP_GATEWAY_INPUT_DROP: " --log-level 4
iptables -A OUTPUT -m limit --limit 5/min -j LOG --log-prefix "APP_GATEWAY_OUTPUT_DROP: " --log-level 4
iptables -A FORWARD -m limit --limit 5/min -j LOG --log-prefix "APP_GATEWAY_FORWARD_DROP: " --log-level 4
#----------------------------- Routing rules ----------------------------#
# Use the main table when routing traffic destined for the cluster nets
ip rule add to "${SVC_NET}" lookup main priority 100
ip rule add to "${POD_NET}" lookup main priority 100
# Ensure eth0 ingress traffic is always returned back out through eth0
# regardless of its destination. This allows things like health probes to
# work, since the destination is the kubelet (the node IP)
ip rule add fwmark "$ETH0_INGRESS_FWMARK" lookup main priority 100
# Use the main table when routing traffic destined for the br-in0 network
ip rule add to "$BRIDGE_GATEWAY_NET" lookup main priority 105
# For everything else use the macv-out0 table
            # This effectively forces all egress traffic not destined for the
            # br-in0 or cluster networks out through macv-out0.
ip rule add from all lookup "$MACVLAN_TABLE_ID" priority 110
#------------------------------------------------------------------------#
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
name: cilium
namespace: kube-system
annotations:
meta.helm.sh/release-name: cilium
meta.helm.sh/release-namespace: kube-system
labels:
app.kubernetes.io/managed-by: Helm
spec:
interval: 15m
chart:
spec:
chart: cilium
version: 1.18.6
sourceRef:
kind: HelmRepository
name: home-ops-mirror
namespace: flux-system
interval: 15m
timeout: 20m
maxHistory: 3
driftDetection:
mode: warn
install:
remediation:
retries: 3
upgrade:
cleanupOnFail: true
remediation:
retries: 3
remediateLastFailure: true
uninstall:
keepHistory: false
values:
######################## MULTUS RELEVANT CONFIG ##############################
cni:
exclusive: false
      # Make sure there isn't a lingering `/etc/cni/net.d/05-cilium.conflist`
      # file left on any of the nodes. See the multus chart troubleshooting
      # docs for more info.
confPath: /etc/cni/net.d/cilium
socketLB:
# Do not perform SVC IP DNAT at the socket level, instead defer DNAT until
# the packet hits the veth pair on the host
#
# This is needed for 2 reasons:
# - Allow SVC connectivity in multi-homed pods (see https://github.com/cilium/cilium/issues/43896)
# - Prevent leaking DNS traffic from client pods which route KubeDNS service IPs through an app gateway
hostNamespaceOnly: true
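      # Whether the flag took effect can be checked against the agent ConfigMap.
      # A sketch: the `bpf-lb-sock-hostns-only` key name is taken from the
      # Cilium docs and may differ between versions:
      #
      # ```bash
      # kubectl -n kube-system get configmap cilium-config \
      #   -o jsonpath='{.data.bpf-lb-sock-hostns-only}'
      # ```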
##############################################################################
hubble:
enabled: false
envoy:
enabled: false
cluster:
name: ${CLUSTERNAME}
id: 1
ipv4NativeRoutingCIDR: ${PODNET}
securityContext:
privileged: true
capabilities:
ciliumAgent:
- CHOWN
- KILL
- NET_ADMIN
- NET_RAW
- IPC_LOCK
- SYS_ADMIN
- SYS_RESOURCE
- DAC_OVERRIDE
- FOWNER
- SETGID
- SETUID
cleanCiliumState:
- NET_ADMIN
- SYS_ADMIN
- SYS_RESOURCE
cgroup:
automount:
enabled: false
hostRoot: /sys/fs/cgroup
enableRuntimeDeviceDetection: true
endpointRoutes:
enabled: true
ipam:
mode: kubernetes
k8sServiceHost: 127.0.0.1
k8sServicePort: 7445
kubeProxyReplacement: true
kubeProxyReplacementHealthzBindAddr: 0.0.0.0:10256
localRedirectPolicy: true
operator:
replicas: 1
rollOutPods: true
prometheus:
enabled: true
serviceMonitor:
enabled: true
dashboards:
enabled: true
prometheus:
enabled: true
serviceMonitor:
enabled: true
dashboards:
enabled: true
rollOutCiliumPods: true
---
apiVersion: v1
kind: Namespace
metadata:
name: app-gateway-client-nets
---
# Inspired by https://github.com/solidDoWant/infra-mk3/tree/master/cluster/gitops/networking/vpn
#
# This network definition defines a bridge interface with a static IP.
#
# It is designed to be attached to client pods, which wish to have their
# public traffic (and DNS!) routed over a common gateway (i.e. a gateway pod).
#
# The public IP routes are generated via this script:
#
# ```bash
# $ cat <<EOF | python -
# from netaddr import IPSet, IPNetwork
#
# subset = IPSet(IPNetwork("0.0.0.0/0"))
# subset.remove("10.0.0.0/8") # Private
# subset.remove("172.16.0.0/12") # Private
# subset.remove("192.168.0.0/16") # Private
# subset.remove("0.0.0.0/8") # Self-Identification/This network
# subset.remove("127.0.0.0/8") # Loopback
# subset.remove("169.254.0.0/16") # Link-Local/Cloud Metadata
# subset.remove("255.255.255.255/32") # Limited Broadcast
# subset.remove("224.0.0.0/4") # Local multicast
#
# for subnet in subset.iter_cidrs():
# print(f"{{\"dst\": \"{str(subnet)}\"}},")
# EOF
# ```
#
# The idea behind this is the following:
#
# ```
# ------------ -------------
# | Client pod | <------bridge------> | Gateway pod | <---macvlan---> LAN/Internet
# ------------ (L2 only, ------------- (SNAT)
# public IP and DNS (CoreDNS)
# gateway)
# ```
#
# This document defines the "client" side of the bridge interface. For this to
# work properly, the "gateway" side of the interface **must**:
#
# - Have the same device name (in this case `br0-net`)
# - Have other interface options (such as MTU) match this interface config
# - Have its IP address set to the value of this interface's gateway IP
# - Perform traffic SNAT and DNS DNAT (the kube-dns service IP must be DNAT-ed
# and rerouted towards the CoreDNS instance running on the gateway pod)
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
name: client-net-static
namespace: app-gateway-client-nets
spec:
config: '{
"cniVersion": "0.3.1",
"name": "client-net-static",
"plugins": [
{
"type": "bridge",
"bridge": "br0-net",
"ipam": {
"type": "static",
"routes": [
{"dst": "${KUBEDNS_SVC_IP}/32", "gw": "${APP_GATEWAY_IP}"},
{"dst": "${APP_GATEWAY_NET}", "gw": "${APP_GATEWAY_IP}"},
{"dst": "1.0.0.0/8", "gw": "${APP_GATEWAY_IP}"},
{"dst": "2.0.0.0/7", "gw": "${APP_GATEWAY_IP}"},
{"dst": "4.0.0.0/6", "gw": "${APP_GATEWAY_IP}"},
{"dst": "8.0.0.0/7", "gw": "${APP_GATEWAY_IP}"},
{"dst": "11.0.0.0/8", "gw": "${APP_GATEWAY_IP}"},
{"dst": "12.0.0.0/6", "gw": "${APP_GATEWAY_IP}"},
{"dst": "16.0.0.0/4", "gw": "${APP_GATEWAY_IP}"},
{"dst": "32.0.0.0/3", "gw": "${APP_GATEWAY_IP}"},
{"dst": "64.0.0.0/3", "gw": "${APP_GATEWAY_IP}"},
{"dst": "96.0.0.0/4", "gw": "${APP_GATEWAY_IP}"},
{"dst": "112.0.0.0/5", "gw": "${APP_GATEWAY_IP}"},
{"dst": "120.0.0.0/6", "gw": "${APP_GATEWAY_IP}"},
{"dst": "124.0.0.0/7", "gw": "${APP_GATEWAY_IP}"},
{"dst": "126.0.0.0/8", "gw": "${APP_GATEWAY_IP}"},
{"dst": "128.0.0.0/3", "gw": "${APP_GATEWAY_IP}"},
{"dst": "160.0.0.0/5", "gw": "${APP_GATEWAY_IP}"},
{"dst": "168.0.0.0/8", "gw": "${APP_GATEWAY_IP}"},
{"dst": "169.0.0.0/9", "gw": "${APP_GATEWAY_IP}"},
{"dst": "169.128.0.0/10", "gw": "${APP_GATEWAY_IP}"},
{"dst": "169.192.0.0/11", "gw": "${APP_GATEWAY_IP}"},
{"dst": "169.224.0.0/12", "gw": "${APP_GATEWAY_IP}"},
{"dst": "169.240.0.0/13", "gw": "${APP_GATEWAY_IP}"},
{"dst": "169.248.0.0/14", "gw": "${APP_GATEWAY_IP}"},
{"dst": "169.252.0.0/15", "gw": "${APP_GATEWAY_IP}"},
{"dst": "169.255.0.0/16", "gw": "${APP_GATEWAY_IP}"},
{"dst": "170.0.0.0/7", "gw": "${APP_GATEWAY_IP}"},
{"dst": "172.0.0.0/12", "gw": "${APP_GATEWAY_IP}"},
{"dst": "172.32.0.0/11", "gw": "${APP_GATEWAY_IP}"},
{"dst": "172.64.0.0/10", "gw": "${APP_GATEWAY_IP}"},
{"dst": "172.128.0.0/9", "gw": "${APP_GATEWAY_IP}"},
{"dst": "173.0.0.0/8", "gw": "${APP_GATEWAY_IP}"},
{"dst": "174.0.0.0/7", "gw": "${APP_GATEWAY_IP}"},
{"dst": "176.0.0.0/4", "gw": "${APP_GATEWAY_IP}"},
{"dst": "192.0.0.0/9", "gw": "${APP_GATEWAY_IP}"},
{"dst": "192.128.0.0/11", "gw": "${APP_GATEWAY_IP}"},
{"dst": "192.160.0.0/13", "gw": "${APP_GATEWAY_IP}"},
{"dst": "192.169.0.0/16", "gw": "${APP_GATEWAY_IP}"},
{"dst": "192.170.0.0/15", "gw": "${APP_GATEWAY_IP}"},
{"dst": "192.172.0.0/14", "gw": "${APP_GATEWAY_IP}"},
{"dst": "192.176.0.0/12", "gw": "${APP_GATEWAY_IP}"},
{"dst": "192.192.0.0/10", "gw": "${APP_GATEWAY_IP}"},
{"dst": "193.0.0.0/8", "gw": "${APP_GATEWAY_IP}"},
{"dst": "194.0.0.0/7", "gw": "${APP_GATEWAY_IP}"},
{"dst": "196.0.0.0/6", "gw": "${APP_GATEWAY_IP}"},
{"dst": "200.0.0.0/5", "gw": "${APP_GATEWAY_IP}"},
{"dst": "208.0.0.0/4", "gw": "${APP_GATEWAY_IP}"},
{"dst": "240.0.0.0/5", "gw": "${APP_GATEWAY_IP}"},
{"dst": "248.0.0.0/6", "gw": "${APP_GATEWAY_IP}"},
{"dst": "252.0.0.0/7", "gw": "${APP_GATEWAY_IP}"},
{"dst": "254.0.0.0/8", "gw": "${APP_GATEWAY_IP}"},
{"dst": "255.0.0.0/9", "gw": "${APP_GATEWAY_IP}"},
{"dst": "255.128.0.0/10", "gw": "${APP_GATEWAY_IP}"},
{"dst": "255.192.0.0/11", "gw": "${APP_GATEWAY_IP}"},
{"dst": "255.224.0.0/12", "gw": "${APP_GATEWAY_IP}"},
{"dst": "255.240.0.0/13", "gw": "${APP_GATEWAY_IP}"},
{"dst": "255.248.0.0/14", "gw": "${APP_GATEWAY_IP}"},
{"dst": "255.252.0.0/15", "gw": "${APP_GATEWAY_IP}"},
{"dst": "255.254.0.0/16", "gw": "${APP_GATEWAY_IP}"},
{"dst": "255.255.0.0/17", "gw": "${APP_GATEWAY_IP}"},
{"dst": "255.255.128.0/18", "gw": "${APP_GATEWAY_IP}"},
{"dst": "255.255.192.0/19", "gw": "${APP_GATEWAY_IP}"},
{"dst": "255.255.224.0/20", "gw": "${APP_GATEWAY_IP}"},
{"dst": "255.255.240.0/21", "gw": "${APP_GATEWAY_IP}"},
{"dst": "255.255.248.0/22", "gw": "${APP_GATEWAY_IP}"},
{"dst": "255.255.252.0/23", "gw": "${APP_GATEWAY_IP}"},
{"dst": "255.255.254.0/24", "gw": "${APP_GATEWAY_IP}"},
{"dst": "255.255.255.0/25", "gw": "${APP_GATEWAY_IP}"},
{"dst": "255.255.255.128/26", "gw": "${APP_GATEWAY_IP}"},
{"dst": "255.255.255.192/27", "gw": "${APP_GATEWAY_IP}"},
{"dst": "255.255.255.224/28", "gw": "${APP_GATEWAY_IP}"},
{"dst": "255.255.255.240/29", "gw": "${APP_GATEWAY_IP}"},
{"dst": "255.255.255.248/30", "gw": "${APP_GATEWAY_IP}"},
{"dst": "255.255.255.252/31", "gw": "${APP_GATEWAY_IP}"},
{"dst": "255.255.255.254/32", "gw": "${APP_GATEWAY_IP}"}
]
}
},
{
"type": "tuning",
"sysctl": {
"net.ipv6.conf.all.disable_ipv6": "1",
"net.ipv6.conf.default.disable_ipv6": "1"
}
}
]
}'
---
# Inspired by https://github.com/solidDoWant/infra-mk3/tree/master/cluster/gitops/networking/vpn
#
# This network definition defines a bridge interface with a range of IPs.
#
# It is designed to be attached to client pods, which wish to have their
# public traffic (and DNS!) routed over a common gateway (i.e. a gateway pod).
#
# The public IP routes are generated via this script:
#
# ```bash
# $ cat <<EOF | python -
# from netaddr import IPSet, IPNetwork
#
# subset = IPSet(IPNetwork("0.0.0.0/0"))
# subset.remove("10.0.0.0/8") # Private
# subset.remove("172.16.0.0/12") # Private
# subset.remove("192.168.0.0/16") # Private
# subset.remove("0.0.0.0/8") # Self-Identification/This network
# subset.remove("127.0.0.0/8") # Loopback
# subset.remove("169.254.0.0/16") # Link-Local/Cloud Metadata
# subset.remove("255.255.255.255/32") # Limited Broadcast
# subset.remove("224.0.0.0/4") # Local multicast
#
# for subnet in subset.iter_cidrs():
# print(f"{{\"dst\": \"{str(subnet)}\"}},")
# EOF
# ```
#
# The idea behind this is the following:
#
# ```
# ------------ -------------
# | Client pod | <------bridge------> | Gateway pod | <---macvlan---> LAN/Internet
# ------------ (L2 only, ------------- (SNAT)
# public IP and DNS (CoreDNS)
# gateway)
# ```
#
# This document defines the "client" side of the bridge interface. For this to
# work properly, the "gateway" side of the interface **must**:
#
# - Have the same device name (in this case `br0-net`)
# - Have other interface options (such as MTU) match this interface config
# - Have its IP address set to the value of this interface's gateway IP
# - Perform traffic SNAT and DNS DNAT (the kube-dns service IP must be DNAT-ed
# and rerouted towards the CoreDNS instance running on the gateway pod)
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
name: client-net
namespace: app-gateway-client-nets
spec:
config: '{
"cniVersion": "0.3.1",
"name": "client-net",
"plugins": [
{
"type": "bridge",
"bridge": "br0-net",
"ipam": {
"type": "host-local",
"subnet": "${APP_GATEWAY_NET}",
"rangeStart": "${APP_GATEWAY_CLIENTS_IP_DHCP_RANGE_START}",
"rangeEnd": "${APP_GATEWAY_CLIENTS_IP_DHCP_RANGE_END}",
"gateway": "${APP_GATEWAY_IP}",
"routes": [
{"dst": "${KUBEDNS_SVC_IP}/32"},
{"dst": "1.0.0.0/8"},
{"dst": "2.0.0.0/7"},
{"dst": "4.0.0.0/6"},
{"dst": "8.0.0.0/7"},
{"dst": "11.0.0.0/8"},
{"dst": "12.0.0.0/6"},
{"dst": "16.0.0.0/4"},
{"dst": "32.0.0.0/3"},
{"dst": "64.0.0.0/3"},
{"dst": "96.0.0.0/4"},
{"dst": "112.0.0.0/5"},
{"dst": "120.0.0.0/6"},
{"dst": "124.0.0.0/7"},
{"dst": "126.0.0.0/8"},
{"dst": "128.0.0.0/3"},
{"dst": "160.0.0.0/5"},
{"dst": "168.0.0.0/8"},
{"dst": "169.0.0.0/9"},
{"dst": "169.128.0.0/10"},
{"dst": "169.192.0.0/11"},
{"dst": "169.224.0.0/12"},
{"dst": "169.240.0.0/13"},
{"dst": "169.248.0.0/14"},
{"dst": "169.252.0.0/15"},
{"dst": "169.255.0.0/16"},
{"dst": "170.0.0.0/7"},
{"dst": "172.0.0.0/12"},
{"dst": "172.32.0.0/11"},
{"dst": "172.64.0.0/10"},
{"dst": "172.128.0.0/9"},
{"dst": "173.0.0.0/8"},
{"dst": "174.0.0.0/7"},
{"dst": "176.0.0.0/4"},
{"dst": "192.0.0.0/9"},
{"dst": "192.128.0.0/11"},
{"dst": "192.160.0.0/13"},
{"dst": "192.169.0.0/16"},
{"dst": "192.170.0.0/15"},
{"dst": "192.172.0.0/14"},
{"dst": "192.176.0.0/12"},
{"dst": "192.192.0.0/10"},
{"dst": "193.0.0.0/8"},
{"dst": "194.0.0.0/7"},
{"dst": "196.0.0.0/6"},
{"dst": "200.0.0.0/5"},
{"dst": "208.0.0.0/4"},
{"dst": "240.0.0.0/5"},
{"dst": "248.0.0.0/6"},
{"dst": "252.0.0.0/7"},
{"dst": "254.0.0.0/8"},
{"dst": "255.0.0.0/9"},
{"dst": "255.128.0.0/10"},
{"dst": "255.192.0.0/11"},
{"dst": "255.224.0.0/12"},
{"dst": "255.240.0.0/13"},
{"dst": "255.248.0.0/14"},
{"dst": "255.252.0.0/15"},
{"dst": "255.254.0.0/16"},
{"dst": "255.255.0.0/17"},
{"dst": "255.255.128.0/18"},
{"dst": "255.255.192.0/19"},
{"dst": "255.255.224.0/20"},
{"dst": "255.255.240.0/21"},
{"dst": "255.255.248.0/22"},
{"dst": "255.255.252.0/23"},
{"dst": "255.255.254.0/24"},
{"dst": "255.255.255.0/25"},
{"dst": "255.255.255.128/26"},
{"dst": "255.255.255.192/27"},
{"dst": "255.255.255.224/28"},
{"dst": "255.255.255.240/29"},
{"dst": "255.255.255.248/30"},
{"dst": "255.255.255.252/31"},
{"dst": "255.255.255.254/32"}
]
}
},
{
"type": "tuning",
"sysctl": {
"net.ipv6.conf.all.disable_ipv6": "1",
"net.ipv6.conf.default.disable_ipv6": "1"
}
}
]
}'
---
apiVersion: v1
kind: Namespace
metadata:
name: whatever
---
# yaml-language-server: $schema=https://kubernetes-schemas.pages.dev/helm.toolkit.fluxcd.io/helmrelease_v2.json
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
name: normal-client
namespace: whatever
spec:
interval: 15m
chart:
spec:
chart: app-template
version: 15.31.4
sourceRef:
kind: HelmRepository
name: truecharts
namespace: flux-system
interval: 15m
timeout: 20m
maxHistory: 3
install:
createNamespace: true
remediation:
retries: 3
upgrade:
cleanupOnFail: true
remediation:
retries: 3
values:
networkPolicy:
# Network policy will only block traffic over the primary CNI
# traffic over the Multus network is unaffected
main:
enabled: true
# Only block egress traffic
policyType: egress
egress:
# Block everything except to this chart's namespace
- to:
- podSelector:
matchExpressions:
- key: app.kubernetes.io/name
operator: Exists
workload:
main:
podSpec:
annotations:
k8s.v1.cni.cncf.io/networks: app-gateway-client-nets/client-net@br-out0
containers:
main:
enabled: false
service:
main:
enabled: false
addons:
netshoot:
enabled: true
container:
enabled: true
primary: true
command:
- /bin/sh
- -c
- sleep infinity
securityContext:
capabilities:
              add: [] # Otherwise needs a privileged namespace to deploy
---
# yaml-language-server: $schema=https://kubernetes-schemas.pages.dev/helm.toolkit.fluxcd.io/helmrelease_v2.json
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
name: port-forwarded-client
namespace: whatever
spec:
interval: 15m
chart:
spec:
chart: app-template
version: 15.31.4
sourceRef:
kind: HelmRepository
name: truecharts
namespace: flux-system
interval: 15m
timeout: 20m
maxHistory: 3
install:
createNamespace: true
remediation:
retries: 3
upgrade:
cleanupOnFail: true
remediation:
retries: 3
values:
networkPolicy:
# Network policy will only block traffic over the primary CNI
# traffic over the Multus network is unaffected
main:
enabled: true
# Only block egress traffic
policyType: egress
egress:
# Block everything except to this chart's namespace
- to:
- podSelector:
matchExpressions:
- key: app.kubernetes.io/name
operator: Exists
workload:
main:
podSpec:
annotations:
k8s.v1.cni.cncf.io/networks: '[
{
"namespace": "app-gateway-client-nets",
"name": "client-net-static",
"interface": "br-out0",
"ips": ["${APP_GATEWAY_CLIENT_TEST_FILE_SERVER_IP_CIDR}"]
}
]'
containers:
main:
enabled: false
service:
main:
enabled: false
addons:
netshoot:
enabled: true
container:
enabled: true
primary: true
command:
- /bin/sh
- -c
- python -m http.server ${APP_GATEWAY_CLIENT_TEST_FILE_SERVER_PORT} --bind 0.0.0.0
securityContext:
capabilities:
              add: [] # Otherwise needs a privileged namespace to deploy
---
# All of these values are more or less arbitrary and can be configured to your liking
APP_GATEWAY_MACVLAN_MAC: 00:11:22:33:44:55
APP_GATEWAY_NET: 192.168.123.0/24
APP_GATEWAY_IP: 192.168.123.1
APP_GATEWAY_IP_CIDR: 192.168.123.1/24
APP_GATEWAY_CLIENTS_IP_DHCP_RANGE_START: 192.168.123.100
APP_GATEWAY_CLIENTS_IP_DHCP_RANGE_END: 192.168.123.254
APP_GATEWAY_CLIENT_TEST_FILE_SERVER_IP_CIDR: 192.168.123.10/24
APP_GATEWAY_CLIENT_TEST_FILE_SERVER_PORT: "8989"
APP_GATEWAY_CLIENT_PORT_FORWARDS: 192.168.123.10:8989
KUBEDNS_SVC_IP: 172.17.0.10
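#
# A quick self-consistency check for the values above, using only the Python
# standard library (a sketch; the literals mirror the example values and should
# be adjusted to match your own):
#
# ```python
# import ipaddress
#
# net = ipaddress.ip_network("192.168.123.0/24")          # APP_GATEWAY_NET
# assert ipaddress.ip_address("192.168.123.1") in net     # APP_GATEWAY_IP
# assert ipaddress.ip_address("192.168.123.100") in net   # DHCP range start
# assert ipaddress.ip_address("192.168.123.254") in net   # DHCP range end
# # The test file server must live on the same bridge network
# assert ipaddress.ip_interface("192.168.123.10/24").network == net
# ```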
---
# Limitations:
# - Currently limited to a single node. If you need multi-node, you need to:
#   - Replace the `host-local` IPAM binary with `whereabouts`
#   - Add a VXLAN interface so client pods can communicate with the gateway pod across nodes
#   - Pin your gateway pod to the specific node you want it on
#   - See https://github.com/solidDoWant/infra-mk3/tree/master/cluster/gitops/networking/vpn for more info
#
# Tests
#
# $ kubectl -n whatever exec deployments/normal-client-app-template -- ping 1.1.1.1
# PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
# 64 bytes from 1.1.1.1: icmp_seq=1 ttl=58 time=13.1 ms
# 64 bytes from 1.1.1.1: icmp_seq=2 ttl=58 time=13.3 ms
# ^C
# $ kubectl -n whatever exec deployments/port-forwarded-client-app-template -- ping 1.1.1.1
# PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
# 64 bytes from 1.1.1.1: icmp_seq=1 ttl=58 time=13.8 ms
# 64 bytes from 1.1.1.1: icmp_seq=2 ttl=58 time=13.6 ms
# ^C
# $ curl ${MACVLAN_IP_ON_LAN}:8989
# <!DOCTYPE HTML>
# <html lang="en">
# <head>
# <meta charset="utf-8">
# <title>Directory listing for /</title>
# </head>
# <body>
# <h1>Directory listing for /</h1>
# <hr>
# <ul>
# <li><a href=".oh-my-zsh/">.oh-my-zsh/</a></li>
# <li><a href=".zshrc">.zshrc</a></li>
# <li><a href="motd">motd</a></li>
# </ul>
# <hr>
# </body>
# </html>
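#
# DNS from a client pod should be answered by the gateway pod's CoreDNS rather
# than by cluster KubeDNS. A sketch of such a check (the exact output will vary
# with your cluster):
#
# ```bash
# $ kubectl -n whatever exec deployments/normal-client-app-template -- \
#     nslookup kubernetes.default.svc.cluster.local
# ```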
# yaml-language-server: $schema=https://kubernetes-schemas.pages.dev/helm.toolkit.fluxcd.io/helmrelease_v2.json
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
name: multus-cni
namespace: kube-system
spec:
interval: 15m
chart:
spec:
chart: multus-cni
version: 1.1.1
sourceRef:
kind: HelmRepository
name: truecharts
namespace: flux-system
interval: 15m
timeout: 20m
maxHistory: 3
install:
crds: CreateReplace
createNamespace: true
remediation:
retries: 3
upgrade:
crds: CreateReplace
cleanupOnFail: true
remediation:
retries: 3
uninstall:
keepHistory: false
values:
multus:
# Primary CNI value can be:
      # - "", in which case the alphabetically first file in the CNI JSON
# configuration directory (pointed to by `persistence.cniconf.mountPath`)
# will be used
# - The relative path to a CNI config file inside the CNI JSON configuration
# directory (pointed to by `persistence.cniconf.mountPath`). For example
# `05-cilium.conflist`
primaryCniConfigFile: "cilium/05-cilium.conflist"
# Set to true to make Multus wait for the primary CNI to be ready before
# it starts attaching extra networks to pods. This can help with pods
# crash-looping when primary CNI is not ready
readinessIndicatorEnabled: false
logLevel: verbose # "debug", "error", "verbose", or "panic"
# Whether to isolate `NetworkAttachmentDefinition`s by namespace.
# Setting this to `true` will prevent sharing NAD resources across namespaces
namespaceIsolation: true
# A list of namespaces for which the `namespaceIsolation` (when set to `true`)
# does not apply. `NetworkAttachmentDefinition`s defined in these namespaces
# can be used across all namespaces. Note Multus treats the `default` namespace
# as global by default (when `globalNamespaces` is empty)
globalNamespaces:
- app-gateway-client-nets
# A list of additional networks to attach to every pod
# The items of this list can be the names of `NetworkAttachmentDefinition`s,
      # names of CNI configuration files, paths to CNI configuration files, and more.
# See Multus documentation for more information
defaultNetworks: []
      # A list of namespaces which are excluded from attaching networks from the
      # `defaultNetworks` list. By default, this chart's namespace is excluded.
systemNamespaces: []
      # A map of capabilities that are supported by at least one of the used CNI
# plugins. See Multus documentation for more information
capabilities: {}
# You should not need to change the Multus CNI config version, however,
# if you get an error about version incompatibility with the primary CNI it
# might help to match the multus CNI version to your primary CNI version
cniVersion: 0.3.1
integrations:
talos:
enabled: true # Must be enabled when installing on a Talos cluster
# Select extra reference CNIs to be installed
# Note:
# - If you are installing any of the reference CNIs, it is strongly
# recommended to override the `talosCniImage.tag` to match your
# Talos version
# - To install CNIs that are present in the `talosCniImage` but not listed
# below, simply add them below in the form of `<binary_name>: true`
installCni:
macvlan: true
ipvlan: false
sbr: true
dhcp: true
static: true
tuning: true
# Enable the chart's uninstall mode. This will clean up leftover chart
# configuration data and CNI plugins, allowing for a cleaner uninstall.
#
# Note:
# If the Talos integration is enabled, this chart assumes it has full control
        # over all CNIs listed in `integrations.talos.installCni` (and set to `true`).
# During uninstall, it will remove all CNIs that are enabled (`true`). If this
# is undesired, set the keys of the relevant CNI names to `false` before
# enabling the uninstall mode.
uninstall: false
---
# yaml-language-server: $schema=https://kubernetes-schemas.pages.dev/source.toolkit.fluxcd.io/helmrepository_v1.json
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
name: truecharts
namespace: flux-system
spec:
type: oci
interval: 5m
url: oci://oci.trueforge.org/truecharts
---
# yaml-language-server: $schema=https://kubernetes-schemas.pages.dev/source.toolkit.fluxcd.io/helmrepository_v1.json
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
name: coredns-charts
namespace: flux-system
spec:
type: oci
interval: 5m
url: oci://ghcr.io/coredns/charts