Datum Cloud Authoritative DNS Service: Research Report

1. Overview

The authoritative DNS service solves two distinct but related problems:

User-facing DNS management: Datum Cloud customers own domain names (e.g., example.com) and want Datum to serve authoritative DNS for them. Users create a Domain resource to claim ownership, a DNSZone resource to declare a hosted zone, and DNSRecordSet resources to manage records. Datum's infrastructure then serves live DNS responses for those zones.
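
For concreteness, here is a minimal sketch of the three resources (API groups as described in section 4; the zone-class reference field and the record value/owner-name field names are assumptions, not verified against the CRDs):

# illustrative manifests; fields marked below are assumptions
apiVersion: networking.datumapis.com/v1alpha
kind: Domain
metadata:
  name: example-com
  namespace: my-project
spec:
  domainName: example.com                       # immutable once set
---
apiVersion: dns.networking.miloapis.com/v1alpha1
kind: DNSZone
metadata:
  name: example-com
  namespace: my-project
spec:
  domainName: example.com
  dnsZoneClassName: datum-external-global-dns   # assumed name of the class reference field
---
apiVersion: dns.networking.miloapis.com/v1alpha1
kind: DNSRecordSet
metadata:
  name: www-a
  namespace: my-project
spec:
  dnsZoneRef:
    name: example-com
  recordType: A
  ttl: 300
  name: www                                     # assumed owner-name field
  records: ["203.0.113.10"]                     # assumed value field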

Infrastructure DNS bootstrapping: Datum's own infrastructure (datumdomains.net, datumproxy.net, datum-cloud.net, prism.*.datum.net) needs authoritative nameservers. This is handled by a separate, static system (datum-auth-dns) consisting of KnotDNS + HickoryDNS instances managed via DNSEndpoint CRs and ExternalDNS RFC2136 updates.

The two systems share the same physical nameserver fleet but serve different purposes: the dns-operator system is the dynamic, Kubernetes-API-driven path for customer zones; the datum-auth-dns system is the static infrastructure path for Datum's own zones.


2. Major Components

| Component | Role |
| --- | --- |
| network-services-operator | Watches user Domain resources; verifies domain ownership (TXT/HTTP/DNSZone); fetches RDAP/WHOIS registration metadata; creates DNSRecordSet resources for Gateway hostnames |
| dns-operator (control-plane role: --role=replicator) | Watches DNSZone and DNSRecordSet resources across all project control planes (via Milo multi-cluster discovery); replicates them into the downstream cluster |
| dns-operator (downstream/agent role) | Runs in the downstream cluster; reconciles DNSZone / DNSRecordSet objects into PowerDNS via its API; manages DNSZoneClass / DNSZone / DNSRecordSet lifecycle |
| PowerDNS Auth 5.1 | Authoritative DNS server; uses the LMDB backend; configured to expand ALIAS records via an in-pod recursor |
| PowerDNS Recursor 5.1 | In-pod sidecar used exclusively by PowerDNS for ALIAS record expansion (forwards to 1.1.1.1/8.8.8.8); listens on 127.0.0.1:5300 |
| LightningStream | Synchronizes LMDB state between the authoritative dns-operator writer and the read-only PowerDNS replicas; uses GCS (S3-compatible API) as the shared object store |
| GCS bucket (via Crossplane) | Central object store for LMDB snapshots; provisioned by a Crossplane storage.gcp.upbound.io/v1beta1 Bucket |
| external-dns-webhook | Reads DNSEndpoint and Gateway HTTPRoute resources; translates them into DNSRecordSet objects in the Milo control plane for Datum's own infrastructure zones |
| external-dns (infra system) | Reads Gateway routes; syncs DNS records to GCP Cloud DNS for the cluster's own *.staging.env.datum.net / *.production.env.datum.net hostnames |
| KnotDNS (datum-auth-dns) | Authoritative NS for the datumproxy.net, datum-cloud.net, and prism.*.datum.net infrastructure zones; updated via RFC2136 |
| HickoryDNS (datum-auth-dns) | Serves only the ns4.* addresses; exists to support the Prossimo memory-safety initiative by demonstrating a Rust DNS server in production |
| ExternalDNS RFC2136 sidecars | Paired with each KnotDNS/HickoryDNS pod; watch DNSEndpoint CRs and push updates via RFC2136 NSUPDATE |
| Milo control plane (milo-apiserver) | Provides per-project Kubernetes-compatible API servers; the DNS CRDs (DNSZone, DNSRecordSet) are installed into it and are the source of truth for customer DNS configuration |
| Redis (optional) | Shared cache for RDAP/WHOIS registry lookup results and rate-limit state across network-services-operator replicas |

3. Data Flow

3a. User creates a DNS zone and records (customer path)

1. User creates DNSZone + DNSRecordSet in their project control plane
   (Milo per-project apiserver, namespace = project namespace)

2. dns-operator (control-plane, --role=replicator) is watching all project
   control planes via Milo multi-cluster discovery
   -> Discovers new DNSZone, looks up its DNSZoneClass (e.g., datum-external-global-dns)
   -> Replicates DNSZone + all associated DNSRecordSets into the downstream cluster
      (datum-dns-system namespace)

3. dns-operator (downstream/agent) reconciles in the downstream cluster
   -> Calls PowerDNS API to create the zone in LMDB
   -> Writes each record set to LMDB via the PowerDNS API
   -> Sets status.Accepted=True, status.Programmed=True on DNSZone/DNSRecordSet
   -> Writes back nameservers (from DNSZoneClass.spec.nameServerPolicy.static)
      to DNSZone.status.nameservers

4. LightningStream detects the LMDB change (schema_tracks_changes: true)
   -> Uploads a new LMDB snapshot to the GCS bucket

5. Each PowerDNS DaemonSet pod's lightningstream container (in receive mode)
   polls the GCS bucket and downloads the latest snapshot to its local LMDB volume

6. PowerDNS reads zone data from the shared LMDB file
   -> DNS queries for the zone are now answered live on port 53

7. The GCS bucket has dual access:
   - Primary SA (objectAdmin) - the dns-operator writer
   - Secondary SA (objectViewer) - the read-only edge replicas
sequenceDiagram
    actor User
    participant Milo as Milo API Server<br/>(project control plane)
    participant Replicator as dns-operator<br/>(replicator)
    participant Downstream as Downstream Cluster<br/>API Server
    participant Agent as dns-operator<br/>(agent)
    participant PDNS as PowerDNS<br/>HTTP API :8082
    participant LSWriter as LightningStream<br/>(writer, primary SA)
    participant GCS as GCS Bucket<br/>datum-lightningstream
    participant LSReader as LightningStream<br/>(receiver, secondary SA)
    participant LMDB as /lmdb/db<br/>(emptyDir per pod)

    User->>Milo: kubectl create DNSZone + DNSRecordSet
    Milo-->>Replicator: watch event (DNSZone created)
    Replicator->>Downstream: replicate DNSZone + DNSRecordSet
    Downstream-->>Agent: watch event (DNSZone created)
    Agent->>PDNS: POST /api/v1/servers/localhost/zones (create zone)
    Agent->>PDNS: PUT /api/v1/.../zones/{zone}/records (write records)
    PDNS->>LMDB: write zone + records (LMDB append)
    Agent->>Milo: patch DNSZone status (Accepted=True, nameservers=[...])
    LMDB-->>LSWriter: schema_tracks_changes detects write
    LSWriter->>GCS: upload LMDB delta snapshot + update_marker
    loop each edge DaemonSet pod
        LSReader->>GCS: poll update_marker
        GCS-->>LSReader: new marker detected
        LSReader->>GCS: download delta snapshot
        LSReader->>LMDB: apply delta to local /lmdb/db
    end
    Note over LMDB: PowerDNS now serves live answers<br/>from memory-mapped LMDB

3b. Domain ownership verification

1. User creates Domain resource with spec.domainName = "example.com"

2. DomainReconciler (network-services-operator) runs:
   a. Validates eTLD+1 via publicsuffix
   b. Checks for existing DNSZone referencing this Domain
      -> If DNSZone exists, is Accepted+Programmed, and its status.nameservers
         overlap with Domain.status.nameservers from RDAP -> marks Verified=True
         via VerifiedDNSZone path (no TXT/HTTP challenge needed)
   c. If no DNSZone, generates a UUID verification token:
      -> DNS path: TXT record _datum-custom-hostname.<domainname>
      -> HTTP path: GET http://<domain>/.well-known/datum-custom-hostname-challenge/<uid>
   d. Retries on backoff (5s -> 1m -> 5m with 25% jitter)
   e. Concurrently runs RDAP/WHOIS lookup via registrydata.Client
      -> Populates status.registration (registrar, expiry, DNSSEC)
      -> Populates status.nameservers (from RDAP NS delegation)

3. Once Verified=True, Gateway controller can proceed to create DNS records
flowchart TD
    A[User creates Domain<br/>spec.domainName = example.com] --> B[Validate eTLD+1<br/>via publicsuffix]
    B --> C{DNSZone exists for<br/>this domain AND<br/>Accepted+Programmed?}
    C -->|Yes| D[Compare DNSZone.status.nameservers<br/>vs RDAP nameservers]
    D --> E{Nameservers overlap?}
    E -->|Yes| F[VerifiedDNSZone=True\nFastest path — no challenge needed]
    E -->|No| G[Fall through to TXT/HTTP]
    C -->|No| G
    G --> H[Generate UUID verification token]
    H --> I{Try TXT record<br/>_datum-custom-hostname.example.com}
    I -->|Found| J[VerifiedDNS=True]
    I -->|Not found| K{Try HTTP challenge<br/>/.well-known/datum-custom-hostname-challenge/UUID}
    K -->|200 OK| L[VerifiedHTTP=True]
    K -->|Fail| M[Backoff: 5s → 1m → 5m\n±25% jitter\nRetry]
    M --> I
    F --> N[RDAP/WHOIS lookup\npopulates status.registration\nstatus.nameservers]
    J --> N
    L --> N
    N --> O[Verified=True\nGateway controller can create DNSRecordSets]
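
Once verification succeeds, the Domain's status carries the results of the flow above. A hypothetical shape, using the condition reasons named in this report (the exact status layout is an assumption):

# illustrative status shape; exact field layout is an assumption
apiVersion: networking.datumapis.com/v1alpha
kind: Domain
metadata:
  name: example-com
  namespace: my-project
spec:
  domainName: example.com
status:
  conditions:
    - type: Verified
      status: "True"
      reason: VerifiedDNSZone            # or VerifiedDNS / VerifiedHTTP
  nameservers:                           # populated from RDAP NS delegation
    - ns1.datumdomains.net.
    - ns2.datumdomains.net.
  registration:                          # populated from RDAP/WHOIS
    registrar: Example Registrar, Inc.
    expiresAt: "2027-01-01T00:00:00Z"    # illustrative field name
    dnssec: false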

3c. Gateway hostname DNS programming

1. User creates Gateway with hostname "api.example.com" on their project control plane

2. GatewayReconciler (network-services-operator) processes the Gateway:
   a. Lists Domains in the same namespace
   b. For "api.example.com" -> checks zones [example.com]
   c. Finds Domain "example.com" with VerifiedDNSZone=True
   d. Finds DNSZone for "example.com"
   e. Determines record type:
      - Apex domain (example.com) -> ALIAS record
      - Subdomain (api.example.com) -> CNAME record
   f. Creates DNSRecordSet named "{gateway-name}-{sha256(hostname)[:8]}"
      pointing hostname -> GatewayDNSAddress (a words+entropy subdomain under
      TargetDomain); see the sketch after this list
   g. Sets owner reference on the DNSRecordSet (for GC when Gateway is deleted)

3. dns-operator picks up the new DNSRecordSet -> programs into PowerDNS

4. LightningStream replicates to all edge pods -> live DNS
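
As referenced in step 2f, a sketch of the DNSRecordSet the Gateway controller might create for api.example.com (the owner reference shape is standard Kubernetes; the spec field names are assumptions, as in the earlier examples):

# illustrative; created by the Gateway controller, garbage-collected with the Gateway
apiVersion: dns.networking.miloapis.com/v1alpha1
kind: DNSRecordSet
metadata:
  name: my-gateway-3f9a1c2b                        # {gateway-name}-{sha256(hostname)[:8]}
  namespace: my-project
  ownerReferences:
    - apiVersion: gateway.networking.k8s.io/v1
      kind: Gateway
      name: my-gateway
      uid: 4f8e2c1a-0000-0000-0000-000000000000    # placeholder UID
spec:
  dnsZoneRef:
    name: example-com
  recordType: CNAME                                # would be ALIAS if the hostname were the apex
  name: api                                        # assumed owner-name field
  records:
    - exciting-word-12ab.prism.global.datum-cloud.net.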

3d. Infrastructure zone updates (datum-auth-dns path)

1. Admin commits a DNSEndpoint CR to the infra Git repo
   (e.g., datumproxy.net_dnsendpoint.yaml with A/AAAA glue records for ns1-ns4;
   the shape is sketched after this list)

2. FluxCD applies the DNSEndpoint CR to the cluster

3. ExternalDNS sidecar in the KnotDNS pod watches DNSEndpoint CRs
   -> Sends RFC2136 NSUPDATE to the local knotd on 127.0.0.1:1053
   -> KnotDNS updates its zone in memory

4. HickoryDNS ExternalDNS sidecar does the same for its pod (ns4 only)

5. DNS clients query ns1-ns4.datumproxy.net -> hit the LoadBalancer IPs
   -> Routed to a KnotDNS or HickoryDNS pod via Cilium BGP
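
The DNSEndpoint shape follows external-dns's upstream CRD; the records below are illustrative documentation-range examples, not the real glue addresses:

# illustrative DNSEndpoint; addresses are examples only
apiVersion: externaldns.k8s.io/v1alpha1
kind: DNSEndpoint
metadata:
  name: datumproxy-net
  namespace: datum-auth-dns
spec:
  endpoints:
    - dnsName: ns1.datumproxy.net
      recordType: A
      recordTTL: 300
      targets: ["203.0.113.1"]
    - dnsName: ns1.datumproxy.net
      recordType: AAAA
      recordTTL: 300
      targets: ["2001:db8::1"]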

3e. DNS Query Request Flow (Live Query Path)

This section traces a UDP DNS query for a customer-managed record from the moment it leaves the client's resolver to the moment the response is returned.

Step 1: BGP advertisement and anycast delivery

The PowerDNS DaemonSet is exposed through a Kubernetes Service of type LoadBalancer named datum-managed-auth-dns. The Service carries a Cilium IPAM annotation that pins four specific IPv4 and four specific IPv6 addresses:

# apps/dns-operator/downstream/edge/service.yaml
lbipam.cilium.io/ips:
  67.14.160.128, 67.14.161.128, 67.14.162.128, 67.14.163.128   (IPv4)
  2607:ed40:0:8000::1, 2607:ed40:1:8000::1, ...                 (IPv6)

These IPs correspond to the published nameservers in the production DNSZoneClass:

# apps/dns-operator/downstream/production/dnszoneclass.yaml
nameServerPolicy:
  mode: Static
  static:
    servers:
      - ns1.datumdomains.net.   (-> 67.14.160.128)
      - ns2.datumdomains.net.   (-> 67.14.161.128)
      - ns3.datumdomains.net.   (-> 67.14.162.128)
      - ns4.datumdomains.net.   (-> 67.14.163.128)

Cilium's BGP control plane (bgpControlPlane: enabled: true in infrastructure/cilium/base/cilium-values.yaml) advertises these LoadBalancer IPs upstream via eBGP. The CiliumBGPAdvertisement resource named auth-dns selects Services whose app.kubernetes.io/part-of label is datum-auth-dns or datum-managed-auth-dns:

# infrastructure/bgp/edge/auth-dns-advertisement.yaml
spec:
  advertisements:
    - advertisementType: "Service"
      service:
        aggregationLengthIPv4: 24   # aggregate to /24
        aggregationLengthIPv6: 44   # aggregate to /44
        addresses:
          - LoadBalancerIP
      selector:
        matchExpressions:
          - key: app.kubernetes.io/part-of
            operator: In
            values:
              - datum-auth-dns
              - datum-managed-auth-dns

Each worker node in an edge cluster has a CiliumBGPClusterConfig generated per-node with sessions to two IPv4 and two IPv6 peers at NetActuate (ASN 36236), using the cluster's local ASN (e.g., 33438 for us-central-1-charlie):

# infrastructure/bgp/edge/generated/clusters/us-central-1-charlie/bgp-worker-2de7846f-dfw.json
"localASN": 33438,
"peers": [
  { "peerASN": 36236, "peerAddress": "209.177.156.100" },
  { "peerASN": 36236, "peerAddress": "209.177.156.254" },
  { "peerASN": 36236, "peerAddress": "2607:f740:100::f99"  },
  { "peerASN": 36236, "peerAddress": "2607:f740:100::fa1"  }
]

The BGP peer config (infrastructure/bgp/edge/peer-config.yaml) uses:

  • Hold time: 90 seconds, keepalive: 30 seconds
  • Graceful restart enabled with 120-second restart time
  • Dual-stack (IPv4 unicast + IPv6 unicast) families

Since every worker node in an edge cluster advertises the same /24 (IPv4) or /44 (IPv6) prefix, Internet traffic is routed to the nearest edge cluster by the upstream AS (anycast via BGP). Within the cluster, Cilium ECMP spreads traffic across multiple nodes.

A companion DaemonSet (cilium-bgp-route-reconciler, RECONCILE_INTERVAL_SECONDS=30) runs on every node with hostNetwork: true. Every 30 seconds it calls cilium bgp routes advertised and uses ip route add local <prefix> dev lo table local to install a local route for each advertised prefix. This makes the kernel treat the advertised LoadBalancer IPs as local addresses, so packets destined for them are accepted rather than dropped or forwarded before Cilium's eBPF programs can handle them.
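
A condensed sketch of how such a DaemonSet could be wired (the real manifest is cilium-bgp-local-routes-daemonset.yaml; the image name and the loop details, summarized in comments, are assumptions):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: cilium-bgp-route-reconciler
spec:
  selector:
    matchLabels:
      app: cilium-bgp-route-reconciler
  template:
    metadata:
      labels:
        app: cilium-bgp-route-reconciler
    spec:
      hostNetwork: true                     # must operate on the node's routing table
      containers:
        - name: reconciler
          image: registry.example/bgp-route-reconciler:latest   # hypothetical image
          env:
            - name: RECONCILE_INTERVAL_SECONDS
              value: "30"
          securityContext:
            capabilities:
              add: ["NET_ADMIN"]            # required for ip route add
          # loop: every $RECONCILE_INTERVAL_SECONDS, read the prefixes from
          # "cilium bgp routes advertised" and install
          # "ip route add local <prefix> dev lo table local" for each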

Cilium is configured with:

loadBalancer:
  mode: dsr          # Direct Server Return
  dsrDispatch: geneve
  serviceTopology: true
routingMode: tunnel
tunnelProtocol: geneve
kubeProxyReplacement: true

DSR mode means the node that receives the packet from the upstream BGP peer answers directly without hairpinning back through a central load balancer node. serviceTopology: true causes Cilium to prefer pods on the local node when they exist (trafficDistribution: PreferClose in the Service spec confirms this preference).

Step 2: Packet arrives at the node; Cilium eBPF processing

The incoming UDP packet on port 53 is intercepted by Cilium's XDP or TC eBPF program before it reaches the kernel's normal network stack. Cilium's kube-proxy replacement identifies the destination address as a LoadBalancer VIP, performs DNAT, and selects a backend Pod endpoint. Because the DaemonSet runs one pod per non-control-plane node and trafficDistribution: PreferClose is set, Cilium will select the pod on the same node if one is healthy there.

The DaemonSet pod does not run with hostNetwork: true — it uses a standard pod network namespace. Port 53 is declared as a named container port:

ports:
  - containerPort: 53
    name: dns
    protocol: UDP
  - containerPort: 53
    name: dns-tcp
    protocol: TCP

PowerDNS requires NET_BIND_SERVICE to bind a port below 1024 inside the container, which is explicitly granted:

securityContext:
  runAsUser: 953
  runAsGroup: 953
  capabilities:
    drop: ["ALL"]
    add: ["NET_BIND_SERVICE"]

After Cilium's DNAT the packet is delivered into the pod's network namespace and received by the PowerDNS process on 0.0.0.0:53 / [::]:53 (both IPv4 and IPv6, as configured by local-address=0.0.0.0,:: in pdns.conf).

Step 3: PowerDNS processes the query

PowerDNS Auth 5.1 is configured as a pure authoritative server with no recursion or zone transfer support:

# apps/dns-operator/downstream/edge/pdns.conf
primary=no
secondary=no
disable-axfr=yes

Zone discovery caches are disabled entirely so that zone additions are visible immediately from LMDB without requiring a cache flush:

zone-cache-refresh-interval=0
zone-metadata-cache-ttl=0

All DNS data is served from LMDB:

load-modules=liblmdbbackend.so
launch=lmdb
lmdb-filename=/lmdb/db
lmdb-shards=1
lmdb-random-ids=yes
lmdb-flag-deleted=yes
lmdb-map-size=1000   # megabytes
lmdb-lightning-stream=yes

The lmdb-lightning-stream=yes flag activates LightningStream-compatible operation: PowerDNS uses the LMDB file in a read-only or append-limited fashion and relies on LightningStream to manage the full file. lmdb-flag-deleted=yes means deleted records are flagged rather than physically removed so LightningStream can track tombstones across replicas.

Query processing flow inside PowerDNS:

  1. PowerDNS receives the UDP datagram and parses the DNS message.
  2. It looks up the zone name in the LMDB backend by walking up the owner name hierarchy until it finds a zone apex that matches.
  3. It looks up the requested record type (QTYPE) within that zone.
  4. If the record exists and is a normal type (A, AAAA, TXT, MX, etc.), PowerDNS builds the answer section and returns it immediately.
  5. If the record type is CNAME, PowerDNS returns the CNAME and, depending on the query, may follow it.
  6. If the record is an ALIAS type (used for apex domains), PowerDNS triggers ALIAS expansion (step 4 below).

The PowerDNS server-id is set per-pod to $(NODE_NAME)/$(POD_NAME), which is visible in id.server CHAOS TXT queries and in log output; this makes it easy to tell which replica answered when debugging.

Step 4: ALIAS expansion via in-pod recursor

When PowerDNS encounters an ALIAS record (the Datum equivalent of ANAME/CNAME-at-apex), it must resolve the target hostname to A/AAAA records so it can synthesize the answer to the client's query. It cannot use the system resolver, because that would create circular dependencies with any zones it is itself authoritative for.

Instead, PowerDNS is configured to forward ALIAS expansion queries to the in-pod recursor:

# pdns.conf
resolver=127.0.0.1:5300
expand-alias=yes

The recursor listens only on loopback and only accepts queries from loopback (hardened to prevent misuse):

# recursor.conf
incoming:
  listen:
    - "127.0.0.1:5300"
    - "[::1]:5300"
  allow_from:
    - "127.0.0.1/32"
    - "::1/128"

The recursor is not authoritative for anything and serves no local zone data. All queries are forwarded to the Cloudflare and Google public resolvers:

forward_zones_recurse:
  - zone: "."
    forwarders:
      - "1.1.1.1"
      - "8.8.8.8"

So for an ALIAS record pointing example.com. at exciting-word-12ab.prism.global.datum-cloud.net., the path is:

Client resolves A? for example.com
  -> PowerDNS finds ALIAS record pointing at exciting-word-12ab.prism.global.datum-cloud.net.
  -> PowerDNS sends A? for that target to 127.0.0.1:5300
  -> Recursor forwards to 1.1.1.1 or 8.8.8.8
  -> Gets A records for the canonical name
  -> PowerDNS synthesizes an A response for example.com with those addresses
  -> Returns to client

The recursor's resource allocation is deliberately generous (4Gi memory limit, 2 CPU) to handle its recursive resolution work, while the auth server itself needs far less (1Gi memory).

Prometheus metrics from the recursor are scraped on port 8083 by the PodMonitor.

Step 5: Response returned to client

PowerDNS builds the DNS response packet with the appropriate answer, authority, and additional sections, sets the AA (Authoritative Answer) bit, and sends the UDP datagram back. In DSR mode the response goes directly from the pod back to the client without passing through the ingress node again.

Step 6: LightningStream consistency model

The data PowerDNS reads is kept current by LightningStream. In the edge DaemonSet pods the lightningstream container runs in receive mode:

args: ["--config", "/etc/lightningstream/lightningstream.yaml",
       "--minimum-pid", "50", "receive"]

--minimum-pid 50 tells LightningStream to wait until the system has had at least 50 PIDs allocated (a proxy for "other processes in the pod have started") before beginning sync. This prevents it from downloading a snapshot before PowerDNS has opened the LMDB file.

LightningStream is configured to watch two LMDB databases on the same /lmdb/ volume:

# lightningstream.yaml (edge configmap)
lmdbs:
  main:
    path: /lmdb/db
    options:
      no_subdir: true
      create: true
    schema_tracks_changes: true
  shard:
    path: /lmdb/db-0
    options:
      no_subdir: true
      create: true
    schema_tracks_changes: true

schema_tracks_changes: true means LightningStream relies on schema-level change tracking rather than polling the entire LMDB for changes, which is the efficient mode for PowerDNS's LMDB backend.

Storage is GCS accessed via its S3-compatible API:

storage:
  type: s3
  options:
    endpoint_url: https://storage.googleapis.com/
    bucket: datum-lightningstream           # production
    use_update_marker: true

use_update_marker: true causes LightningStream to write a small marker object to the bucket after each snapshot upload. Receivers poll for this marker to detect when a new snapshot is available without having to list all objects on every check.

The edge pods use the secondary GCS service account (objectViewer only), provisioned by ExternalSecrets from GCP Secret Manager:

# apps/dns-operator/downstream/edge/external-secret.yaml
spec:
  secretStoreRef:
    name: gcp-secret-store
    kind: ClusterSecretStore
  target:
    name: s3-credentials
  dataFrom:
    - extract:
        key: dns-s3-credentials-secondary

The primary (objectAdmin) account is used only by the dns-operator agent's LightningStream instance (the StatefulSet in datum-dns-system) which writes new snapshots after PowerDNS zones are programmed through the API.

LightningStream polling interval: this config does not set an explicit poll interval; receivers instead poll the small update-marker object rather than listing the bucket. In practice, after the dns-operator agent writes a new zone or record via the PowerDNS API, LightningStream detects the LMDB change via schema_tracks_changes, uploads the snapshot to GCS (with an update marker), and receivers detect the marker and download the delta. End-to-end propagation from API write to all edge pods seeing the change is typically sub-minute under normal GCS connectivity.

LightningStream metrics are exposed on port 8500 (lmdb-metrics) and scraped by the PodMonitor alongside the PowerDNS API metrics on 8082 and recursor metrics on 8083.

Step 7: Caching layers summary

PowerDNS Auth does not enable a packet cache in this deployment. The zone-cache-refresh-interval=0 and zone-metadata-cache-ttl=0 settings eliminate internal metadata caches. This means every query results in a direct LMDB lookup, which is the correct behavior for a low-latency memory-mapped database — LMDB reads are effectively in-process memory accesses (the OS page cache holds the mapped pages). The absence of a packet cache ensures that record changes propagated by LightningStream are visible immediately without cache staleness.

The recursor does maintain its own internal cache for recursive lookups (standard PowerDNS Recursor behavior), but this only affects ALIAS expansion targets — not the authoritative records themselves. The recursor's cache TTL is governed by the TTLs returned by upstream resolvers for the target hostnames.

The default TTL for records within a zone is 300 seconds, inherited from DNSZoneClass.spec.defaults.defaultTTL: 300. Per-record TTLs can override this (e.g., the NS glue records for datumdomains.net use ttl: 300 explicitly in the static DNSRecordSet manifests). Clients' recursive resolvers will cache responses for those TTL durations, so in practice the propagation delay visible to an end user is: LightningStream sync time + client resolver cache TTL.

Networking diagram for the query path

Internet client (resolver)
    |  UDP port 53 to 67.14.160.128 (ns1.datumdomains.net)
    v
NetActuate upstream router (ASN 36236)
    |  BGP ECMP across edge clusters advertising the /24
    v
Edge node (Datum AS 33438, Cilium BGP peer)
    |  kernel receives packet; lo has local route for 67.14.160.128 (via reconciler)
    |  Cilium XDP/TC eBPF intercepts; DNAT to pod IP; DSR configured
    v
Pod: datum-managed-auth-dns-<node> (namespace: datum-managed-auth-dns)
  +-- container: pdns         (ports 53/udp, 53/tcp, 8082/tcp)
  |     |  reads from /lmdb/db via LMDB mmap
  |     |  for ALIAS: queries 127.0.0.1:5300
  |     v
  +-- container: pdns-recursor (port 5300/tcp+udp loopback only, 8083/tcp metrics)
  |     |  forwards to 1.1.1.1 / 8.8.8.8
  |     v
  +-- container: lightningstream (port 8500/tcp metrics)
        |  polls GCS bucket (storage.googleapis.com / datum-lightningstream)
        |  downloads LMDB deltas, applies to /lmdb/db
        v
     emptyDir volume: /lmdb   (shared by pdns + lightningstream)
sequenceDiagram
    participant Client as DNS Client<br/>(resolver)
    participant NetActuate as NetActuate<br/>ASN 36236
    participant Cilium as Cilium eBPF<br/>(edge node)
    participant PDNS as PowerDNS<br/>container :53
    participant LMDB as /lmdb/db<br/>(mmap)
    participant Recursor as pdns-recursor<br/>127.0.0.1:5300
    participant Upstream as Upstream Resolver<br/>1.1.1.1 / 8.8.8.8

    Client->>NetActuate: UDP query A? example.com<br/>dst=67.14.160.128:53
    Note over NetActuate: BGP ECMP selects<br/>nearest edge cluster
    NetActuate->>Cilium: forward packet to edge node
    Note over Cilium: XDP/TC intercepts; DNAT to pod IP<br/>DSR mode: response goes direct to client
    Cilium->>PDNS: deliver UDP datagram to pod :53

    PDNS->>LMDB: lookup zone for example.com (mmap read)
    LMDB-->>PDNS: zone found

    alt Normal record (A, AAAA, TXT, MX, CNAME, ...)
        PDNS->>LMDB: lookup QTYPE records (mmap read)
        LMDB-->>PDNS: records returned
        PDNS-->>Client: DNS response (AA bit set, DSR direct)
    else ALIAS record (apex domain)
        PDNS->>LMDB: lookup ALIAS target hostname (mmap read)
        LMDB-->>PDNS: ALIAS → exciting-word-12ab.prism.global.datum-cloud.net.
        PDNS->>Recursor: A? exciting-word-12ab.prism.global.datum-cloud.net<br/>UDP 127.0.0.1:5300
        Recursor->>Upstream: recursive query (forwards all via ".")
        Upstream-->>Recursor: A records for canonical name
        Recursor-->>PDNS: resolved A/AAAA records
        PDNS-->>Client: synthesized A response for example.com (AA bit set, DSR direct)
    end

4. Key CRDs / API Types

dns.networking.miloapis.com/v1alpha1 (from go.miloapis.com/dns-operator@v0.5.1)

| Resource | Scope | Purpose |
| --- | --- | --- |
| DNSZoneClass | Cluster | Defines a class of DNS backend (controller name, nameserver policy, TTL defaults). Example: datum-external-global-dns with the PowerDNS controller and four static nameservers |
| DNSZone | Namespaced | A hosted zone. References a DNSZoneClass. Status is populated with nameservers, recordCount, and domainRef. Selectable fields on spec.domainName and status.domainRef.name |
| DNSRecordSet | Namespaced | One DNS record type within a zone. References a DNSZone. Supports A, AAAA, ALIAS, CNAME, TXT, MX, SRV, CAA, NS, SOA, PTR, TLSA, HTTPS, SVCB. Selectable fields on spec.dnsZoneRef.name and spec.recordType |
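
Putting the pieces together, a plausible DNSZoneClass manifest; the nameServerPolicy and defaults blocks mirror the production config quoted in section 3e, while the controllerName value is an assumption:

apiVersion: dns.networking.miloapis.com/v1alpha1
kind: DNSZoneClass
metadata:
  name: datum-external-global-dns             # cluster-scoped
spec:
  controllerName: dns.miloapis.com/powerdns   # hypothetical value
  nameServerPolicy:
    mode: Static
    static:
      servers:
        - ns1.datumdomains.net.
        - ns2.datumdomains.net.
        - ns3.datumdomains.net.
        - ns4.datumdomains.net.
  defaults:
    defaultTTL: 300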

networking.datumapis.com/v1alpha (from network-services-operator)

| Resource | Scope | Purpose |
| --- | --- | --- |
| Domain | Namespaced | Represents a domain name a user wants to claim. Tracks ownership verification state (TXT/HTTP/DNSZone), RDAP/WHOIS registration metadata, and nameserver delegation. spec.domainName is immutable |
| HTTPProxy | Namespaced | High-level L7 proxy abstraction that creates a Gateway + HTTPRoute underneath |
| Gateway (Gateway API) | Namespaced | Extended by network-services-operator, which programs DNSRecordSet resources for each hostname whose domain has VerifiedDNSZone=True |

externaldns.k8s.io/v1alpha1

| Resource | Scope | Purpose |
| --- | --- | --- |
| DNSEndpoint | Namespaced | Used by the datum-auth-dns path; contains static A/AAAA/CNAME records that the ExternalDNS RFC2136 sidecars apply to KnotDNS/HickoryDNS |

5. External Integrations

| System | How Used |
| --- | --- |
| GCP Cloud DNS | Used by the external-dns infrastructure system for cluster-owned zones (*.staging.env.datum.net, *.production.env.datum.net). Authenticated via Workload Identity |
| RDAP (openrdap/rdap) | registrydata.Client queries RDAP providers (bootstrapped per TLD) to fetch domain registration metadata, expiry, registrar, and nameservers. Rate-limited with a token bucket plus block windows |
| WHOIS (domainr/whois) | Fallback when the RDAP bootstrap has no TLD match. Queries the IANA bootstrap, then the registry/registrar WHOIS host |
| GCS (via S3 API) | LightningStream uses GCS HMAC keys (S3-compatible) as the central replication bus for PowerDNS LMDB state. Crossplane provisions the bucket and HMAC keys |
| Milo / per-project control planes | dns-operator (control-plane role) uses multicluster-runtime (sigs.k8s.io/multicluster-runtime) with a Milo provider to discover all project control planes and watch their DNS resources |
| PowerDNS API | The dns-operator agent calls the local PowerDNS HTTP API (port 8082) directly to create/update zones and records in LMDB |
| NetActuate (ASN 36236) | BGP upstream provider at each edge PoP; peers with Cilium on each worker node to receive LoadBalancer IP prefix advertisements |
| cert-manager CSI driver | Provides TLS for the dns-operator webhook (csi.cert-manager.io); also used for client-certificate auth to the Milo control plane |
| Redis (optional) | Shared cache for RDAP/WHOIS results and rate-limit state across network-services-operator replicas |

6. Infrastructure Deployment

FluxCD Kustomization graph

clusters/{env}/apps/dns-operator.yaml
  -> apps/dns-operator/control-plane/{staging|production}/
     dependsOn: [victoria-metrics, milo-apiserver]
     sourceRef: GitRepository flux-system
     Kustomizations deployed:
       dns-operator-manager      (--role=replicator; watches Milo projects)
       dns-operator-core-control-plane-resources  (installs CRDs into Milo apiserver)

clusters/{env}/apps/dns-operator-downstream.yaml
  -> apps/dns-operator/downstream/{staging|production}/
     Kustomizations deployed:
       dns-operator-agent        (manages PowerDNS; sources from OCIRepository)
       [edge DaemonSet deployed via apps/dns-operator/downstream/edge/]

clusters/{env}/infrastructure/dns-operator-storage.yaml
  -> apps/dns-operator/storage/{staging|production}/
     Deploys Crossplane GCS Bucket + ServiceAccounts + HMAC keys for LightningStream

clusters/{env}/apps/datum-auth-dns.yaml  (staging only; edge has its own)
  -> apps/datum-auth-dns/{staging|edge}/
     dependsOn: [victoria-metrics]
     Deploys KnotDNS + HickoryDNS DaemonSets + ExternalDNS sidecars

clusters/{env}/apps/external-dns-webhook.yaml  (staging only)
  -> apps/external-dns-webhook/staging/
     dependsOn: [dns-operator]
     Deploys ExternalDNS with Datum webhook provider

channels/edge/stable/infrastructure/cilium-bgp-announcements/
  -> infrastructure/bgp/edge/ + generated/clusters/${cluster}
     Deploys CiliumBGPPeerConfig, CiliumBGPAdvertisement (auth-dns, downstream-gateway,
     edge-services), CiliumLoadBalancerIPPool, per-node CiliumBGPClusterConfig,
     and the cilium-bgp-route-reconciler DaemonSet
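
The cluster entrypoints above follow the standard Flux Kustomization shape. A sketch of what clusters/production/apps/dns-operator.yaml might contain (the interval and prune settings are assumptions; path, sourceRef, and dependsOn come from the graph above):

apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: dns-operator
  namespace: flux-system
spec:
  interval: 10m                # assumed
  path: ./apps/dns-operator/control-plane/production
  prune: true                  # assumed
  sourceRef:
    kind: GitRepository
    name: flux-system
  dependsOn:
    - name: victoria-metrics
    - name: milo-apiserver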

OCI artifacts

| Image | Purpose |
| --- | --- |
| ghcr.io/datum-cloud/dns-operator-kustomize | Kustomize bundle for dns-operator (both control-plane and agent paths) |
| ghcr.io/datum-cloud/external-dns-webhook | Custom ExternalDNS webhook provider |
| powerdns/pdns-auth-51 | PowerDNS authoritative server |
| powerdns/pdns-recursor-51:5.1.9 | PowerDNS recursor (ALIAS expansion sidecar) |
| powerdns/lightningstream:main | LightningStream LMDB sync agent |

Namespaces

| Namespace | Contents |
| --- | --- |
| datum-dns-system | dns-operator manager + agent, LMDB secrets |
| datum-managed-auth-dns (edge) | PowerDNS + Recursor + LightningStream DaemonSet pods |
| datum-auth-dns | KnotDNS + HickoryDNS DaemonSets |
| external-dns-webhook | ExternalDNS + webhook provider |
| external-dns | Infrastructure ExternalDNS (GCP Cloud DNS) |

7. Notable Patterns and Design Decisions

LightningStream for horizontal scale without a shared database. PowerDNS uses LMDB (a memory-mapped file) as its backend. The standard problem with LMDB in multi-pod deployments is that it cannot be shared across nodes. LightningStream solves this by having exactly one writer (the dns-operator agent, via the PowerDNS API) and N readers (edge pods). The agent writes to LMDB, LightningStream uploads deltas to GCS, and every edge pod's LightningStream container receives those deltas and applies them locally. This avoids a shared database entirely.

ALIAS record support via in-pod recursor. PowerDNS's ALIAS record type (ANAME-style) expands the target hostname to A/AAAA records on the fly. To do this, PowerDNS needs a resolver. A co-located pdns-recursor container listens on 127.0.0.1:5300, accepting only loopback traffic, and forwards recursion to Cloudflare/Google. The authoritative server is configured with resolver=127.0.0.1:5300 and expand-alias=yes. This means apex domains can point to a CDN hostname without needing the client to do CNAME chasing.

DSR (Direct Server Return) with anycast BGP. Cilium is configured with loadBalancer.mode: dsr and dsrDispatch: geneve. Combined with per-node BGP adjacencies to NetActuate, each edge node receives and directly answers DNS packets for its local pod without hairpinning. The cilium-bgp-route-reconciler DaemonSet (running every 30 seconds) ensures the local kernel routing table has a local route for each advertised prefix on lo, which is required for the node to accept packets destined for LoadBalancer IPs that Cilium's eBPF intercepts.

Zone cache completely disabled. zone-cache-refresh-interval=0 and zone-metadata-cache-ttl=0 disable PowerDNS's internal zone metadata caches. This is intentional: since LMDB is memory-mapped, lookups are already in-process memory accesses, and zone cache staleness would delay visibility of records just programmed by LightningStream. The tradeoff is slightly more LMDB reads per query (still O(1) by design).

No packet cache. There is no cache-ttl or query-cache-ttl setting in pdns.conf, so PowerDNS's packet cache is off by default. This means every DNS query causes an LMDB read. For a memory-mapped database on modern hardware this is extremely fast (sub-microsecond for pages already in the OS page cache), and it eliminates the possibility of serving stale data after a record update propagates via LightningStream.

Domain verification before DNS programming. The Gateway DNS controller will not create a DNSRecordSet for a hostname unless its apex Domain resource has VerifiedDNSZone=True. The fastest verification path is the DNSZone path: if the user has already delegated to Datum's nameservers (matching Domain.status.nameservers vs DNSZone.status.nameservers), no TXT record or HTTP challenge is needed. The controller reads nameservers from RDAP/WHOIS to make this determination.

Multi-cluster reconciliation via multicluster-runtime + Milo. The dns-operator control-plane component uses sigs.k8s.io/multicluster-runtime with the Milo provider. This means the operator dynamically discovers all project control planes (per-tenant Kubernetes-compatible API servers) and opens watch connections to each. A single operator instance handles all projects. In staging and production the discovery.mode is milo; in the single-cluster server config it is single.

Two separate DNS stacks for infrastructure. The datum-auth-dns system (KnotDNS + HickoryDNS) serves Datum's own infrastructure zones (ns1-ns4 glue records, prism-internal zones). This is deliberately kept separate from the customer DNS stack (PowerDNS + dns-operator). The infrastructure zones are managed by DNSEndpoint CRs and RFC2136 updates, not the dns-operator API, because they predate it and have simpler, static record sets. HickoryDNS is deployed alongside KnotDNS specifically to support the Prossimo memory-safety initiative by running ns4 on a Rust DNS implementation.

Conflict detection for DNSRecordSets. When the Gateway controller creates a DNSRecordSet, it checks whether an existing record with the same hostname annotation is already owned by a different manager (labelManagedBy). If a conflict is detected, the hostname condition is set to DNSRecordReasonConflict rather than silently overwriting.

GatewayDNSAddress uses words + entropy. The canonical hostname assigned to a Gateway (the CNAME/ALIAS target) is not a sequential name. It is generated by words.WordsAndEntropy(suffix, gatewayUID) — a deterministic but human-readable address derived from the gateway's UUID, scoped under a configured TargetDomain. This prevents hostname enumeration.

Dual GCS credentials with least-privilege. Two GCS service accounts are provisioned by Crossplane: one with roles/storage.objectAdmin (used by the dns-operator agent's LightningStream writer) and one with roles/storage.objectViewer (used by all edge pod LightningStream receivers). This ensures that a compromised edge pod cannot modify the DNS data that other pods read.
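
A sketch of the Crossplane side of this arrangement (the Bucket kind and API group come from this report; the forProvider fields and ProviderConfig name are assumptions):

apiVersion: storage.gcp.upbound.io/v1beta1
kind: Bucket
metadata:
  name: datum-lightningstream
spec:
  forProvider:
    location: US               # illustrative
  providerConfigRef:
    name: default              # hypothetical ProviderConfig name
# alongside: two GCP service accounts with HMAC keys, bound as
# roles/storage.objectAdmin (writer) and roles/storage.objectViewer (edge receivers)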


8. C4 Diagram (PlantUML)

@startuml Datum Authoritative DNS - Container Diagram
!include https://raw.githubusercontent.com/plantuml-stdlib/C4-PlantUML/master/C4_Container.puml

LAYOUT_WITH_LEGEND()

title Datum Cloud Authoritative DNS Service — Container Diagram

Person(user, "Datum User", "Creates domains, zones, record sets, gateways")
Person(infra_admin, "Infra Admin", "Manages Datum's own DNS zones via GitOps")

System_Boundary(control_plane, "Control Plane Cluster") {

    Container(milo_apiserver, "Milo API Server", "Kubernetes-compatible API server", "Per-tenant project control planes; hosts DNSZone, DNSRecordSet, Domain, Gateway CRDs")

    Container(network_services_operator, "network-services-operator", "Go / controller-runtime", "Reconciles Domain (RDAP/WHOIS verification), DNSRecordSet creation for Gateway hostnames")

    Container(dns_operator_manager, "dns-operator (replicator)", "Go / multicluster-runtime", "Discovers all project control planes via Milo; replicates DNSZone + DNSRecordSet to downstream cluster")

    Container(redis, "Redis", "Redis", "Optional shared cache for RDAP/WHOIS lookup results and rate-limit state")

    ContainerDb(milo_etcd, "Milo etcd", "etcd", "Stores DNSZone, DNSRecordSet, Domain, Gateway objects for all projects")
}

System_Boundary(downstream_cluster, "Downstream / Edge Cluster") {

    Container(dns_operator_agent, "dns-operator (agent)", "Go / controller-runtime", "Watches local DNSZone + DNSRecordSet; programs PowerDNS via HTTP API; runs LightningStream writer")

    Container(powerdns, "PowerDNS Auth 5.1", "C++ DNS server (LMDB backend)", "Serves authoritative DNS on :53 (UDP+TCP); no packet cache; LMDB reads are memory-mapped; ALIAS expansion via in-pod recursor")

    Container(pdns_recursor, "PowerDNS Recursor 5.1", "C++ DNS recursor", "Listens on 127.0.0.1:5300 loopback only; forwards to 1.1.1.1/8.8.8.8; used solely for ALIAS expansion")

    Container(lightningstream_writer, "LightningStream (writer)", "Go LMDB sync — in dns-operator agent pod", "Detects LMDB changes via schema_tracks_changes; uploads delta snapshots to GCS with update marker")

    Container(lightningstream_reader, "LightningStream (receiver)", "Go LMDB sync — in each DaemonSet pod", "Polls GCS update marker; downloads delta snapshots; applies to local /lmdb/db emptyDir volume")

    Container(bgp_reconciler, "cilium-bgp-route-reconciler", "Bash DaemonSet (hostNetwork)", "Every 30s: queries cilium bgp routes; installs local kernel routes on lo for advertised prefixes")

    ContainerDb(lmdb, "LMDB emptyDir", "Memory-mapped file per pod", "Per-pod local DNS zone data; written by lightningstream_reader; read by PowerDNS via mmap")
}

System_Boundary(infra_dns_stack, "Infrastructure Auth DNS (datum-auth-dns)") {
    Container(knotdns, "KnotDNS", "C DNS server", "Serves ns1-ns3 for datumproxy.net, datum-cloud.net, prism zones; RFC2136 updated")
    Container(hickorydns, "HickoryDNS", "Rust DNS server", "Serves ns4 only; Prossimo memory-safety initiative")
    Container(extdns_rfc2136, "ExternalDNS (RFC2136)", "Go sidecar per pod", "Watches DNSEndpoint CRs; pushes updates via RFC2136 NSUPDATE to local knotd/hickory")
}

System_Boundary(infra_external_dns, "Infrastructure ExternalDNS") {
    Container(extdns_gcp, "ExternalDNS (GCP)", "Go HelmRelease", "Watches Gateway routes; syncs *.staging/production.env.datum.net to GCP Cloud DNS")
    Container(extdns_webhook, "external-dns-webhook", "Go webhook provider HelmRelease", "Translates Gateway HTTPRoute / DNSEndpoint into DNSRecordSet objects for Datum infra zones")
}

System_Ext(gcs, "Google Cloud Storage", "Stores LightningStream LMDB delta snapshots; bucket datum-lightningstream; S3-compatible endpoint storage.googleapis.com")
System_Ext(gcp_cloud_dns, "GCP Cloud DNS", "Hosts cluster-level infrastructure zones")
System_Ext(rdap, "RDAP Providers", "Per-TLD RDAP endpoints (Verisign, IANA, etc.) — rate-limited, cached")
System_Ext(whois, "WHOIS Providers", "IANA bootstrap + registry WHOIS servers; fallback when the RDAP bootstrap has no TLD match")
System_Ext(dns_resolvers, "Cloudflare / Google DNS", "1.1.1.1, 8.8.8.8 — upstream resolvers used by recursor for ALIAS expansion")
System_Ext(netactuate, "NetActuate (ASN 36236)", "BGP upstream at each PoP; peers with Cilium per-node; receives /24 and /44 prefix advertisements for anycast")

' User interactions
Rel(user, milo_apiserver, "Creates Domain, DNSZone, DNSRecordSet, Gateway", "kubectl / API")
Rel(infra_admin, extdns_rfc2136, "Commits DNSEndpoint CRs via GitOps; FluxCD applies them to the cluster, where the sidecars watch them", "FluxCD Git")

' Control plane internal
Rel(network_services_operator, milo_apiserver, "Watches Domain, DNSZone, Gateway; writes DNSRecordSet", "k8s watch/patch")
Rel(network_services_operator, rdap, "RDAP domain lookup (cached, rate-limited)", "HTTPS")
Rel(network_services_operator, whois, "WHOIS fallback (cached, rate-limited)", "TCP/43")
Rel(network_services_operator, redis, "Cache RDAP/WHOIS results and rate-limit state", "Redis protocol")
Rel(dns_operator_manager, milo_apiserver, "Discovers project control planes; watches DNSZone + DNSRecordSet", "k8s watch multicluster")
Rel(dns_operator_manager, dns_operator_agent, "Replicates DNSZone + DNSRecordSet into downstream cluster", "k8s create/update")

' Downstream — write path
Rel(dns_operator_agent, powerdns, "Creates/updates zones and records", "HTTP :8082 PowerDNS API")
Rel(lightningstream_writer, gcs, "Uploads LMDB delta snapshots + update marker", "GCS S3 API (primary objectAdmin SA)")

' Downstream — read path
Rel(lightningstream_reader, gcs, "Polls update marker; downloads deltas", "GCS S3 API (secondary objectViewer SA)")
Rel(lightningstream_reader, lmdb, "Applies deltas to local LMDB file", "filesystem write")
Rel(powerdns, lmdb, "Reads zone + record data", "mmap read (O(1) in-process)")
Rel(powerdns, pdns_recursor, "ALIAS expansion queries", "UDP/TCP 127.0.0.1:5300 loopback")
Rel(pdns_recursor, dns_resolvers, "Recursive resolution for ALIAS targets", "UDP/TCP :53")

' BGP and networking
Rel(bgp_reconciler, netactuate, "Installs local kernel routes so node accepts LB VIP packets; Cilium peers advertise /24 /44 prefixes", "Cilium BGP + iproute2")

' Infrastructure DNS
Rel(extdns_rfc2136, knotdns, "RFC2136 NSUPDATE", "TCP loopback :1053")
Rel(extdns_rfc2136, hickorydns, "RFC2136 NSUPDATE", "TCP loopback")
Rel(extdns_gcp, gcp_cloud_dns, "Upsert DNS records", "GCP DNS API")
Rel(extdns_webhook, milo_apiserver, "Writes DNSRecordSet for infrastructure zones", "k8s create/update")

@enduml

Key Source Files

Operator source code:

  • /Users/aar/src/datum-cloud/network-services-operator/api/v1alpha/domain_types.go
  • /Users/aar/src/datum-cloud/network-services-operator/internal/controller/domain_controller.go
  • /Users/aar/src/datum-cloud/network-services-operator/internal/controller/gateway_dns_controller.go
  • /Users/aar/src/datum-cloud/network-services-operator/internal/registrydata/DESIGN.md
  • /Users/aar/src/datum-cloud/network-services-operator/internal/config/config.go

DNS operator API types (Go module cache):

  • /Users/aar/go/pkg/mod/go.miloapis.com/dns-operator@v0.5.1/api/v1alpha1/dnszone_types.go
  • /Users/aar/go/pkg/mod/go.miloapis.com/dns-operator@v0.5.1/api/v1alpha1/dnsrecordset_types.go
  • /Users/aar/go/pkg/mod/go.miloapis.com/dns-operator@v0.5.1/api/v1alpha1/dnszoneclass_types.go

Infrastructure repo — edge DNS serving:

  • /Users/aar/src/datum-cloud/infra/apps/dns-operator/downstream/edge/daemonset.yaml
  • /Users/aar/src/datum-cloud/infra/apps/dns-operator/downstream/edge/pdns.conf
  • /Users/aar/src/datum-cloud/infra/apps/dns-operator/downstream/edge/recursor.conf
  • /Users/aar/src/datum-cloud/infra/apps/dns-operator/downstream/edge/lightningstream.yaml
  • /Users/aar/src/datum-cloud/infra/apps/dns-operator/downstream/edge/service.yaml
  • /Users/aar/src/datum-cloud/infra/apps/dns-operator/downstream/edge/external-secret.yaml
  • /Users/aar/src/datum-cloud/infra/apps/dns-operator/downstream/edge/podmonitor.yaml

Infrastructure repo — BGP and networking:

  • /Users/aar/src/datum-cloud/infra/infrastructure/bgp/edge/auth-dns-advertisement.yaml
  • /Users/aar/src/datum-cloud/infra/infrastructure/bgp/edge/peer-config.yaml
  • /Users/aar/src/datum-cloud/infra/infrastructure/bgp/edge/cilium-bgp-local-routes-daemonset.yaml
  • /Users/aar/src/datum-cloud/infra/infrastructure/bgp/edge/tools/reconcile-cilium-bgp-routes.sh
  • /Users/aar/src/datum-cloud/infra/infrastructure/cilium/base/cilium-values.yaml
  • /Users/aar/src/datum-cloud/infra/infrastructure/bgp/edge/generated/clusters/us-central-1-charlie/bgp-worker-2de7846f-dfw.json

Infrastructure repo — control plane and downstream operator:

  • /Users/aar/src/datum-cloud/infra/apps/dns-operator/control-plane/base/manager.yaml
  • /Users/aar/src/datum-cloud/infra/apps/dns-operator/downstream/production/manager-kustomization-patch.yaml
  • /Users/aar/src/datum-cloud/infra/apps/dns-operator/downstream/production/dnszoneclass.yaml
  • /Users/aar/src/datum-cloud/infra/apps/dns-operator/downstream/production/dnsrecordsets.yaml
  • /Users/aar/src/datum-cloud/infra/apps/dns-operator/storage/base/gcs-lightningstream.yaml

Infrastructure repo — datum-auth-dns (static infra zones):

  • /Users/aar/src/datum-cloud/infra/apps/datum-auth-dns/README.md
  • /Users/aar/src/datum-cloud/infra/apps/datum-auth-dns/base/knotdns/knot.conf
  • /Users/aar/src/datum-cloud/infra/apps/datum-auth-dns/edge/zones/datumproxy.net_dnsendpoint.yaml

Infrastructure repo — cluster entrypoints:

  • /Users/aar/src/datum-cloud/infra/clusters/staging/apps/dns-operator.yaml
  • /Users/aar/src/datum-cloud/infra/clusters/staging/apps/dns-operator-downstream.yaml
  • /Users/aar/src/datum-cloud/infra/clusters/production/apps/dns-operator.yaml
  • /Users/aar/src/datum-cloud/infra/clusters/production/apps/dns-operator-downstream.yaml
  • /Users/aar/src/datum-cloud/infra/channels/edge/stable/infrastructure/cilium-bgp-announcements/cilium-bgp-announcements.yaml