Executive Summary
FIPS is an unusually well-specified protocol for alpha software. The cryptographic foundations are sound (Noise framework, secp256k1, ChaCha20-Poly1305 — all battle-tested primitives, no novel crypto). The layered encryption model (hop-by-hop IK + end-to-end XK) is correctly designed and mirrors proven architectures (WireGuard, Lightning, Tor).
The vulnerabilities I've found are primarily in the routing and mesh self-organization layer, which is where novel protocol design lives. The IPv6 adapter, despite your instinct, is actually the least concerning component — it's a thin shim with a well-constrained attack surface. The real risks are in spanning tree manipulation, bloom filter poisoning, and metadata leakage.
CRITICAL — Spanning Tree Attacks
- Root Hijacking (Sybil Root Attack)
Severity: CRITICAL
The root is the node with the lexicographically smallest node_addr (SHA-256 truncation of pubkey). There is no election, no stake, no proof-of-work. An attacker can grind keypairs offline until they find one whose node_addr is smaller than the current root's.
Since node_addr = SHA-256(pubkey)[0..16], grinding a specific 128-bit value (full impersonation of a chosen address) is infeasible, but grinding any value smaller than the current root's is trivial — you only need to beat one specific value. On modern GPUs generating secp256k1 keypairs at ~10M/sec, finding a node_addr with a smaller leading byte takes seconds.
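To make the cost concrete, here is a minimal sketch of the grind. Random 33-byte strings stand in for compressed secp256k1 public keys — the SHA-256 output distribution is identical either way, so the attempt count is representative; only the per-candidate cost differs in a real attack.

```python
import hashlib
import os

def node_addr(pubkey: bytes) -> bytes:
    # node_addr = SHA-256(pubkey)[0..16], per the spec
    return hashlib.sha256(pubkey).digest()[:16]

def grind_smaller(current_root_addr: bytes) -> tuple[bytes, int]:
    """Grind candidate keys until one hashes below the current root's
    node_addr. os.urandom stands in for real keypair generation."""
    attempts = 0
    while True:
        attempts += 1
        candidate = os.urandom(33)  # compressed-pubkey-sized stand-in
        if node_addr(candidate) < current_root_addr:
            return candidate, attempts

# Against a root addr with leading byte 0x80 (the median), roughly half
# of all candidates win, so success takes only a handful of attempts.
winner, attempts = grind_smaller(bytes([0x80]) + bytes(15))
```

Even an unusually small root addr beginning 0x00 0x01 needs only ~2^16 candidates on average — milliseconds at the quoted ~10M keypairs/sec.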
Impact: The attacker becomes root. Every node in the mesh recalculates its coordinates relative to the attacker. The attacker sits at depth 0 — all traffic routing through the tree passes through or near them. They can't read session-layer payloads (FSP encryption), but they gain:
- Complete visibility of all node_addr pairs communicating (metadata)
- Ability to selectively drop or delay traffic (DoS)
- Ability to partition the network by disconnecting
Mitigation (protocol-level): Root election needs a cost function — proof-of-work on the keypair, a minimum node age, or a configurable trusted root pubkey. The Yggdrasil project (which FIPS borrows from) has the same vulnerability and hasn't solved it either.
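One possible shape for such a cost function, sketched under the assumption that root eligibility is tied to leading zero bits in the node_addr (a proof-of-work on the keypair itself; the function names and `difficulty` value are hypothetical, not from the spec):

```python
import hashlib

def leading_zero_bits(addr: bytes) -> int:
    """Count leading zero bits of an address."""
    bits = 0
    for byte in addr:
        if byte == 0:
            bits += 8
        else:
            bits += 8 - byte.bit_length()
            break
    return bits

def eligible_as_root(pubkey: bytes, difficulty: int = 24) -> bool:
    """Hypothetical rule: a node may only claim root if its node_addr
    carries `difficulty` leading zero bits, so the keypair costs about
    2**difficulty hash evaluations to grind. This raises the attack
    cost; it does not eliminate grinding by a well-resourced attacker."""
    addr = hashlib.sha256(pubkey).digest()[:16]
    return leading_zero_bits(addr) >= difficulty
```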
- Transitive Trust in TreeAnnounce Ancestry (v1)
Severity: HIGH
The spec explicitly states: "In the v1 protocol, only the sender's outer signature on the TreeAnnounce is verified. Ancestry entries beyond the direct peer are accepted on transitive trust."
This means: if I'm your peer, and I claim my ancestry is [me, FakeNode1, FakeNode2, FakeRoot], you'll believe it. I can fabricate an entire ancestry chain and claim to be at any position in the tree.
Impact:
- Route hijacking: claim to be close to any destination
- Tree destabilization: force other nodes to recalculate parents
- Coordinate poisoning: pollute coordinate caches across the mesh
Mitigation: The spec acknowledges this and lists per-entry signatures as a future direction. This should be priority #1.
- Flap Attack via Rapid Connect/Disconnect
Severity: MEDIUM
While flap dampening exists (threshold of 4 switches in 60s → 120s extended hold-down), an attacker controlling multiple nodes can cycle connections strategically to force continuous tree reconvergence without triggering dampening on any single link. Each parent change:
- Increments sequence numbers
- Flushes the coordinate cache
- Forces re-announcement to all peers
- Cascades proportional to depth
Impact: Sustained routing instability, increased discovery traffic, degraded performance.
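The stated dampening parameters (4 switches in 60 s, 120 s hold-down) can be modeled directly. The sketch below is an assumed per-link implementation — the spec excerpt does not say whether the threshold fires at exactly 4 switches or on the next one:

```python
from collections import deque

class FlapDampener:
    """Per-link dampening modeled on the stated parameters: more than
    `threshold` parent switches inside `window` seconds triggers a
    `hold_down`-second extended hold."""
    def __init__(self, threshold: int = 4, window: float = 60.0,
                 hold_down: float = 120.0):
        self.threshold = threshold
        self.window = window
        self.hold_down = hold_down
        self.switches: deque[float] = deque()
        self.held_until = 0.0

    def record_switch(self, now: float) -> bool:
        """Record a parent switch; True means the link is dampened."""
        if now < self.held_until:
            return True
        self.switches.append(now)
        # Drop switch timestamps that have aged out of the window.
        while self.switches and self.switches[0] < now - self.window:
            self.switches.popleft()
        if len(self.switches) > self.threshold:
            self.held_until = now + self.hold_down
            return True
        return False
```

Nothing here aggregates across links, which is exactly the gap the attack exploits: M attacker nodes can each stay at 4 switches per window, forcing up to 4·M tree reconvergences per minute while no individual dampener ever fires.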
HIGH — Bloom Filter Manipulation
- Bloom Filter Poisoning
Severity: HIGH
A malicious node can advertise a bloom filter with all bits set (or nearly all). Since false positives cause traffic to be routed toward the poisoner's subtree, this creates a traffic sink.
The split-horizon rule prevents direct looping, but a poisoned filter at a high-depth node causes packets to be attracted into the wrong subtree. The self-distance check prevents infinite loops, but the packet hits a dead end and generates PathBroken errors.
Impact: Selective traffic black-holing. An attacker near a subtree boundary can attract and drop traffic for specific destinations.
Mitigation: Filter validation — reject filters where the set bit ratio exceeds a reasonable threshold for the claimed subtree size. Not currently implemented.
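A sketch of that validation, using the standard bloom filter expected-fill estimate and assuming k hash functions per entry (the hash count is not stated in the excerpt; the `slack` factor is illustrative):

```python
import math

M_BITS = 8192    # fixed 1 KB filter size from the spec
K_HASHES = 8     # assumed number of hash functions (not in the excerpt)

def expected_fill(n_entries: int, m: int = M_BITS, k: int = K_HASHES) -> float:
    """Expected fraction of set bits after n insertions: 1 - e^(-k*n/m)."""
    return 1.0 - math.exp(-k * n_entries / m)

def plausible_filter(set_bits: int, claimed_subtree_size: int,
                     slack: float = 2.0) -> bool:
    """Hypothetical check: reject a filter whose bit density is far above
    anything the claimed subtree size could legitimately produce."""
    density = set_bits / M_BITS
    return density <= min(1.0, slack * expected_fill(claimed_subtree_size))
```

An all-bits-set filter from a node claiming a 50-node subtree fails immediately: expected fill is about 4.8%, so a density of 1.0 is rejected even with generous slack. Note the residual hole: an attacker can instead inflate the claimed subtree size, so that claim itself needs a sanity check against the node's tree position.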
- Bloom Filter Size Fixed at 1KB (8,192 bits)
Severity: MEDIUM (scaling risk)
The spec notes that at ~1M nodes, hub nodes see 7-15% false positive rates. This is a designed limitation, not a bug. But it means an attacker can exploit elevated FPR to cause misrouting without even needing to send malicious filters — just being a high-connectivity hub in a large network degrades routing accuracy for everyone.
HIGH — Discovery Protocol Attacks
- LookupRequest Amplification
Severity: HIGH
LookupRequest floods through the mesh with a TTL of 64 and a 256-byte visited bloom filter. Each node forwards to all peers not in the visited filter, so in a mesh of N nodes with average connectivity P, a single LookupRequest triggers on the order of N·P forwarded transmissions: every node processes the request once, but each forwards its copy to up to P peers.
The dedup cache (10s window) prevents the same request from being processed twice, but an attacker can generate unique request_id values at high rates.
Impact: An attacker with a single node can flood the mesh with discovery requests, consuming bandwidth and CPU at every node.
Mitigation: Rate-limiting LookupRequest forwarding per source node_addr. Not currently mentioned in the spec.
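A per-source token bucket is one way to implement that. The sketch below is hypothetical (rate and burst values are illustrative, not from the spec), and it assumes the source node_addr on a request can be trusted — spoofed source addresses would need separate handling:

```python
from collections import defaultdict

class PerSourceLimiter:
    """Hypothetical token bucket keyed by source node_addr: each source
    may originate `rate` LookupRequests/second plus a small burst;
    excess requests are dropped instead of flooded onward."""
    def __init__(self, rate: float = 2.0, burst: float = 10.0):
        self.rate = rate
        self.burst = burst
        # addr -> (available tokens, timestamp of last refill)
        self.buckets = defaultdict(lambda: (burst, 0.0))

    def allow(self, src_addr: bytes, now: float) -> bool:
        tokens, last = self.buckets[src_addr]
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self.buckets[src_addr] = (tokens - 1.0, now)
            return True
        self.buckets[src_addr] = (tokens, now)
        return False
```

Enforced at every forwarding node, this confines a flood to the links adjacent to the attacker: unique request_id values defeat the dedup cache but not a source-keyed limit.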
- LookupResponse Spoofing (Mitigated)
Severity: LOW (properly mitigated)
LookupResponse includes a Schnorr proof covering (request_id || target || target_coords). The source verifies this proof. This correctly prevents forged responses. Well done.
However, path_mtu is excluded from the proof (intentionally, as transit nodes modify it). An attacker transit node can lie about path MTU — setting it artificially low to force tiny packets or artificially high to cause MtuExceeded errors downstream.
MEDIUM — Metadata and Privacy
- Node_addr Correlation Attack
Severity: MEDIUM
node_addr = SHA-256(pubkey)[0..16]. The spec correctly notes this is one-way — you can't recover the pubkey from the node_addr. But: if an attacker knows a target's npub (which is public on Nostr), they can compute the corresponding node_addr and then observe traffic patterns at any transit node to determine when and with whom the target is communicating.
Since Nostr identities are designed to be public (social media), this is a significant metadata leak for any user whose npub is known. Every transit node sees (src_node_addr, dest_node_addr) in cleartext.
Impact: Communication pattern analysis for any user with a known npub.
Not fixable without onion routing, which the spec explicitly rejected. This is a conscious design tradeoff.
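The cheap direction of the correlation looks like this. The exact public-key serialization fed to SHA-256 is an assumption here; the excerpt only defines node_addr = SHA-256(pubkey)[0..16]:

```python
import hashlib

def node_addr_from_pubkey(pubkey: bytes) -> bytes:
    """Anyone holding a target's public key (e.g. decoded from a public
    npub) derives the same 16-byte node_addr the mesh routes on."""
    return hashlib.sha256(pubkey).digest()[:16]

# The hash is one-way, but the attack runs the easy direction:
# pubkey -> node_addr. A transit node can now match this value against
# the cleartext (src_node_addr, dest_node_addr) pairs it forwards.
# (The hex pubkey below is a made-up placeholder.)
watched = node_addr_from_pubkey(bytes.fromhex("02" + "11" * 32))
```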
- Coordinate Leakage Reveals Physical Topology
Severity: MEDIUM
Tree coordinates encode the node's position relative to every ancestor up to the root. SessionSetup and CP-flagged packets expose these coordinates in cleartext to every transit node. Over time, a transit node can reconstruct significant portions of the tree topology.
Combined with node_addr correlation (above), this reveals not just who is talking, but where in the topology they sit — which, for geographically-correlated meshes, may reveal physical location.
MEDIUM — IPv6 Adapter (Less Bad Than Expected)
- TUN Interface Local Attack Surface
Severity: LOW-MEDIUM
The TUN interface requires CAP_NET_ADMIN or root. Once running, any local process can send packets to fd00::/8 addresses through the kernel routing table. If the identity cache has been primed (via DNS lookup), packets will be routed into the mesh.
This means: any process on the local machine can communicate over FIPS without authentication. The FIPS node itself is the trust boundary, not individual processes.
Impact: Local privilege escalation from "can run any process" to "can communicate over the mesh as this node's identity."
This is the same model as any VPN (WireGuard, OpenVPN) — it's not a FIPS-specific flaw. But worth noting.
- DNS Resolver Binding
Severity: LOW
The DNS responder binds to 127.0.0.1:5354 by default. If configured to bind to 0.0.0.0, it becomes a network-reachable service that performs deterministic computation (npub → address) and populates the identity cache. An attacker could pre-populate the cache with entries of their choosing.
Default is safe. Only a misconfiguration creates risk.
- No Fragmentation = Good
The decision to not fragment at the mesh layer is security-positive. Fragment reassembly is a perennial source of vulnerabilities (teardrop, fragmentation overlap, resource exhaustion). FIPS avoids all of this. TCP MSS clamping and ICMP Packet Too Big handle the sizing problem cleanly.
LOW — Cryptographic Observations
- Noise IK Exposes Initiator Identity in msg1
Severity: LOW (by design)
Noise IK sends the initiator's encrypted static key in the first message. The responder's public key encrypts it, so only the intended responder can decrypt. However, if the responder's private key is compromised, all historical msg1 captures can be decrypted to reveal initiator identities.
FSP correctly uses Noise XK for sessions (initiator identity in msg3, not msg1), providing stronger protection for end-to-end traffic. The IK pattern at the link layer is acceptable because direct peers already know each other's identities from configuration.
- Counter-Based Nonces (Correct)
The 8-byte monotonic counter as AEAD nonce with a sliding replay window is the standard approach. No issues.
- No Key Rotation Mechanism
Severity: MEDIUM
The spec mentions a K (key epoch) flag in the established frame header for rekeying, but the rekeying mechanism itself isn't specified. Without periodic rekeying, a long-lived session uses the same symmetric keys indefinitely. If a key is compromised, all past and future traffic on that session is exposed.
Recommendation: Specify a rekeying interval (e.g., every 2^60 nonce values or every N minutes) using the Noise framework's built-in rekeying support.
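A minimal sketch of such a sender-side policy, whichever limit trips first. The 2^60 figure is the recommendation's own example; the 15-minute time limit is an illustrative assumption for the unspecified N:

```python
NONCE_REKEY_LIMIT = 2 ** 60      # from the recommendation above
TIME_REKEY_LIMIT_S = 15 * 60.0   # illustrative; the spec leaves N open

class RekeyPolicy:
    """Sender-side rekey trigger. When rekey is signaled, the sender
    would invoke the Noise framework's rekey and flip the K (key epoch)
    flag in the established frame header."""
    def __init__(self, now: float):
        self.epoch_start = now
        self.nonce = 0

    def next_nonce(self, now: float) -> tuple[int, bool]:
        """Return (nonce to use, whether a rekey is now required)."""
        self.nonce += 1
        rekey = (self.nonce >= NONCE_REKEY_LIMIT
                 or now - self.epoch_start >= TIME_REKEY_LIMIT_S)
        return self.nonce, rekey
```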
Vulnerability Summary
┌─────┬─────────────────────────────────────┬────────────┬─────────────────────────┬──────────────────────────┐
│  #  │ Finding                             │ Severity   │ Exploitability          │ Status                   │
├─────┼─────────────────────────────────────┼────────────┼─────────────────────────┼──────────────────────────┤
│  1  │ Root hijacking via keypair grinding │ CRITICAL   │ Easy                    │ Unmitigated              │
│  2  │ Transitive trust in ancestry (v1)   │ HIGH       │ Easy                    │ Acknowledged, future fix │
│  3  │ Flap attack via multi-node cycling  │ MEDIUM     │ Moderate                │ Partially mitigated      │
│  4  │ Bloom filter poisoning              │ HIGH       │ Easy                    │ Unmitigated              │
│  5  │ Fixed bloom filter size at scale    │ MEDIUM     │ Passive                 │ By design                │
│  6  │ LookupRequest amplification         │ HIGH       │ Easy                    │ Unmitigated              │
│  7  │ Path MTU lying (in LookupResponse)  │ LOW        │ Moderate                │ By design                │
│  8  │ Node_addr ↔ npub correlation        │ MEDIUM     │ Easy                    │ By design (tradeoff)     │
│  9  │ Coordinate topology leakage         │ MEDIUM     │ Passive                 │ By design (tradeoff)     │
│ 10  │ TUN local process access            │ LOW-MEDIUM │ Local only              │ Same as any VPN          │
│ 11  │ DNS resolver misconfiguration       │ LOW        │ Requires misconfig      │ Default safe             │
│ 12  │ No fragmentation (positive)         │ N/A        │ N/A                     │ Correct decision         │
│ 13  │ Noise IK initiator exposure         │ LOW        │ Requires key compromise │ By design                │
│ 14  │ Counter nonces                      │ N/A        │ N/A                     │ Correct                  │
│ 15  │ No rekeying specification           │ MEDIUM     │ Requires key compromise │ Unspecified              │
└─────┴─────────────────────────────────────┴────────────┴─────────────────────────┴──────────────────────────┘
Overall Assessment
★ Insight ─────────────────────────────────────
- The cryptographic layer is solid — Noise IK/XK, ChaCha20-Poly1305, secp256k1 are all proven. No novel crypto.
- The attack surface is in the routing layer — spanning tree manipulation, bloom filter poisoning, and discovery flooding are the real risks.
- The IPv6 adapter is actually the safest part. ULA addresses can't leak, the TUN model is standard, MSS clamping prevents fragmentation attacks, and no-fragmentation-at-mesh-layer eliminates an entire class of vulnerabilities.
- The metadata exposure (node_addr pairs visible to transit nodes) is the philosophical weakness — it's the price of greedy routing without onion layers.
─────────────────────────────────────────────────
Bottom line: For a private mesh between trusted nodes (your use case), FIPS is solid. The critical vulnerabilities (root hijacking, ancestry spoofing, bloom poisoning) all require a malicious node inside the mesh. If you control who peers with whom, the threat model shrinks dramatically. For a public, permissionless mesh — which is the stated long-term goal — items 1, 2, 4, and 6 need to be solved before it's safe.
A friend asks, "why not just connect to the next node?"
The answer: it depends on the attack.
"Just connect to the next node" — when it works and when it doesn't
Where my friend is RIGHT:
Data packet dropping/delay — If a malicious node simply drops or delays your traffic, the mesh handles this. MMP detects the degraded link (rising loss, jitter, latency). Parent selection re-evaluates. Your node picks a better parent. Bloom filters reconverge. Within 30-60 seconds, traffic reroutes around the bad actor. This is the mesh working as designed — it self-heals around failures, and a node that just drops packets looks identical to a broken link.
Where my friend is WRONG:
The dangerous attacks don't just affect your connection to one node. They propagate through the protocol itself:
Root hijacking (#1) — The attacker doesn't sit next to you. They join the mesh anywhere, and because their ground-out keypair's node_addr is the smallest, every node in the entire mesh independently selects them as root. There is no "connect to the next node" — the spanning tree coordinate system itself is now centered on the attacker. Every node recalculates its position relative to them. You'd have to convince every node operator to manually blacklist that npub, and there's no mechanism for that in the protocol.
Ancestry spoofing (#2) — A malicious node tells its peers "my ancestry is [me, X, Y, Z, Root]" and the peers believe it (v1 transitive trust). Those peers then incorporate that false position into their bloom filters and TreeAnnounce messages. The lie propagates outward through gossip. It's not one bad link — it's poisoned routing state spreading hop by hop through the entire mesh. Propagation speed: ~500ms per hop. A mesh of depth 10 is fully poisoned in 5-10 seconds.
Bloom filter poisoning (#4) — A node advertising an all-bits-set bloom filter doesn't just affect its neighbors. Because bloom filters propagate along tree edges with split-horizon merging, the poisoned filter gets merged into the outbound filters of the attacker's parent and children. Those merged filters propagate further. Within a few seconds, nodes multiple hops away are routing traffic toward the attacker's subtree for destinations that aren't there. The packets hit dead ends, generate PathBroken errors, and the mesh wastes time on discovery floods trying to find routes that should exist but don't.
LookupRequest flooding (#6) — One node generates discovery requests with unique request_ids at high rate. Each request floods to all peers, who forward to all their peers (minus visited filter). A single attacker node in a 1000-node mesh with average 5 peers generates thousands of forwarded packets per request. At even 10 requests/second, that's tens of thousands of packets/second across the entire mesh. Every node pays the CPU and bandwidth cost. There's no "connect to the next node" — the flood reaches everyone.
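The arithmetic behind those numbers, as a back-of-envelope check using the paragraph's own figures:

```python
# The dedup cache means each node processes a given request only once,
# but in the worst case every link in the mesh still carries one copy.
N = 1000          # mesh size (from the text)
AVG_PEERS = 5     # average connectivity (from the text)
REQ_PER_SEC = 10  # attacker request rate (from the text)

links = N * AVG_PEERS // 2              # undirected links in the mesh
tx_per_request = links                  # worst case: one copy per link
tx_per_second = tx_per_request * REQ_PER_SEC

# 2,500 transmissions per request, 25,000 per second mesh-wide --
# "tens of thousands of packets/second" as stated.
```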
Propagation speed reality
┌─────────────────────┬────────────────────────┬────────────────────────────────┬───────────────────────────────────┐
│ Attack              │ Spread rate            │ Scope                          │ Recovery                          │
├─────────────────────┼────────────────────────┼────────────────────────────────┼───────────────────────────────────┤
│ Packet dropping     │ N/A (local)            │ 1 link                         │ 30-60s automatic                  │
│ Root hijack         │ ~500ms/hop             │ Entire mesh                    │ Manual intervention required      │
│ Ancestry spoofing   │ ~500ms/hop             │ Entire mesh                    │ Until attacker disconnects        │
│ Bloom poisoning     │ ~500ms/hop             │ Attacker's subtree + ancestors │ ~500ms/hop after attacker removed │
│ LookupRequest flood │ Speed of light (flood) │ Entire mesh                    │ Until attacker removed            │
└─────────────────────┴────────────────────────┴────────────────────────────────┴───────────────────────────────────┘
Should this be implemented for the planet?
For a private mesh between trusted nodes — your machines, your friends' machines, people you know — yes, absolutely. The threat model assumes you control membership. The crypto is excellent. The routing is elegant. It genuinely solves real problems (no DNS dependency, no CA dependency, no ISP dependency, multi-transport resilience).
For a public, permissionless mesh (anyone can join) — not yet. The four critical/high items (root hijacking, ancestry spoofing, bloom poisoning, discovery flooding) all assume a malicious node inside the mesh. In a permissionless network, that's guaranteed. The protocol needs:
- Root election with cost — proof-of-work, minimum age, or stake
- Per-entry ancestry signatures — already on their roadmap
- Bloom filter validation — reject suspiciously dense filters
- Per-source LookupRequest rate limiting — cap flood generation
These are all solvable. The architecture is sound enough that fixes can be layered on. But right now, one motivated attacker with a laptop could destabilize a global permissionless FIPS mesh in under 10 seconds. Your friend's intuition about routing around damage applies to the data plane but not to the control plane — and control plane attacks are what matters here.