russellhaering/PARSER_DIFF_AUDIT_FINDINGS.md

## PARSER_DIFF_AUDIT_FINDINGS.md

      
    Raw
  

              PARSER_DIFF_AUDIT_FINDINGS.md
            
          
    Parser Differential Audit Findings — goxmldsig v2

Summary

goxmldsig v2 is NOT vulnerable to the CVE-2020-29509/29510/29511 class of parser differential attacks. The switch from encoding/xml to etree for XML parsing eliminates the core vulnerability.
22 tests written to parser_diff_audit_test.go, all passing.

Key Findings

1. etree vs encoding/xml — SAFE ✅

etree preserves namespace prefixes through round-trips. encoding/xml still mutates them (confirmed by meta-test):

Input:  <saml:Assertion xmlns:saml="urn:...">
encoding/xml output: <Assertion xmlns="urn:..." xmlns:_xmlns="xmlns" _xmlns:saml="urn:...">
etree output: <saml:Assertion xmlns:saml="urn:..."> (preserved exactly)

Since goxmldsig v2 uses etree for all parsing, the original CVE-2020-29509 attack vector is eliminated.
2. Re-parse in verifyDigest — SAFE ✅

After computing the digest, verifyDigest() does doc.ReadFromBytes(referencedBytes) and returns doc.Root(). Testing confirms:

The re-parsed element faithfully represents the digested bytes
Canonicalization is idempotent: C14N(parse(C14N(x))) == C14N(x) for all tested inputs
Text content, attributes, namespace bindings, and element structure are all preserved
Tested across all 6 canonicalization algorithms (C14N 1.0, 1.1, exc-c14n, with/without comments)

3. NSUnmarshalElement — RESIDUAL RISK ⚠️

NSUnmarshalElement in etreeutils/unmarshal.go serializes the verified *etree.Element to bytes and passes them to encoding/xml.Unmarshal(). This creates a potential parser differential: etree serializes the element faithfully, and encoding/xml.Unmarshal then re-parses those bytes.
Current status: Testing shows that for the verified canonical forms produced by the library, encoding/xml.Unmarshal produces the same text values as etree.Text(). The encoding/xml namespace prefix mutation doesn't affect the decoded values from xml.Unmarshal — it only affects re-serialization. Since NSUnmarshalElement deserializes into Go structs (not re-serializes), the values are correct.
However: This is defense-in-depth fragile. If a future version of encoding/xml or etree changes behavior, or if more exotic XML constructs are used, this could become a real differential. Consumers (like gosaml2) should be aware they're relying on encoding/xml for final value extraction.
4. Namespace prefix rebinding — SAFE ✅

Tested XML where prefix p: is bound to http://first at root level and rebound to http://second in a child element. After sign→verify:

etree preserves both bindings correctly
Namespace context resolution produces correct URIs at each scope level
The re-parsed verified element has correct prefix→namespace mappings

5. CDATA sections — SAFE ✅

etree converts CDATA to regular text during parsing and serializes with entity escaping. C14N also requires no CDATA sections. After sign→verify:

<![CDATA[<script>alert(1)</script>]]> produces text <script>alert(1)</script>
The equivalent entity-escaped form &lt;script&gt;... produces identical text
No child elements are created from CDATA content
CDATA and entity-escaped forms produce identical verified text

6. Entity expansion — SAFE ✅

etree expands all standard XML entities (&amp;, &lt;, &gt;, &quot;, &apos;) and numeric character references (&#65;, &#x42;) during parsing. After sign→verify, the text content matches the expected expanded form. C14N requires entity expansion, so there is no differential.
7. Processing instructions and XML declarations — SAFE ✅

PIs within the signed element are preserved by C14N (they're part of the element content). Text content around PIs is correctly preserved after sign→verify.
8. Comment injection (CVE-2018 Duo Labs class) — SAFE ✅

Tested <name>Alice<!--injected-->Bob</name> through the pipeline:

Without-comments C14N (default): Comment is stripped, text concatenated to "AliceBob". After re-parse, Text() returns "AliceBob". encoding/xml also sees "AliceBob". No comment nodes survive. No differential.
With-comments C14N: Comment survives in the re-parsed tree, but etree.Text() concatenates CharData around comments, returning "AliceBob". encoding/xml.Unmarshal also returns "AliceBob". No differential.

9. Minor finding: Signer doesn't propagate exc-c14n PrefixList

When MakeC14N10ExclusiveCanonicalizerWithPrefixList("b") is used as the Signer's canonicalizer, the PrefixList affects the digest computation but the Signer does NOT emit <InclusiveNamespaces PrefixList="b"> into the Transform element in SignedInfo. The verifier then uses an empty PrefixList, producing a different canonical form, causing signature verification failure. This is a Signer limitation (not a security vulnerability) — documents signed with a non-empty PrefixList cannot be verified. In practice, PrefixList is almost never used.

Architecture Assessment

The v2 library's architecture is sound for preventing parser differentials:

Single parser (etree) for all XML operations — parsing, canonicalization, signature construction, and verification
Canonical bytes are the source of truth — the verified element is re-parsed from the exact bytes that were digested
No encoding/xml in the critical path — encoding/xml is only used in NSUnmarshalElement, which is a consumer-facing utility, not part of the core sign/verify pipeline
Canonicalization is idempotent — C14N(parse(C14N(x))) == C14N(x) for all tested inputs and algorithms

The main residual risk is in the NSUnmarshalElement function, which bridges from the safe etree world back into the potentially-buggy encoding/xml world. This is acceptable because encoding/xml's namespace bugs affect serialization (not deserialization into structs), but it should be monitored.

  
## SECURITY_AUDIT.md

      
    Raw
  

              SECURITY_AUDIT.md
            
          
    goxmldsig v2 Security Audit Report

Date: 2025-03-12
Scope: Full source code audit of goxmldsig v2 (github.com/russellhaering/goxmldsig/v2)
Method: 5 parallel automated security analysis passes, 187 total test cases

Executive Summary

No exploitable signature bypass vulnerabilities found. The v2 architecture's
core defense — returning a reconstructed element from verified canonical bytes
rather than the original XML tree — effectively neutralizes all classic XSW
attack variants and parser differential attacks.
The audit found 1 functional bug, 1 DoS vector, 1 spec compliance bug,
and several low-severity issues worth documenting.

Findings by Severity

🔴 HIGH: SignEnveloped output fails Verify without reparse

Location: sign.go:307
Impact: Functional correctness bug; not a security bypass
SignEnveloped appends the Signature element via:
ret.Child = append(ret.Child, sig)
This bypasses etree's parent-pointer tracking. The returned element produces
different canonical output than what was digested during signing, so
Verify(signed) fails with ErrDigestMismatch.
Workaround: Serialize and re-parse the output (which all existing tests do
via signAndReparse). This is what gosaml2 effectively does since SAML
responses arrive as serialized XML.
Fix: Use ret.AddChild(sig) instead of direct slice append, or document
that the output must be serialized before verification.
🔴 HIGH: canonicalPrep unbounded recursion (DoS)

Location: canonicalize.go, canonicalPrepInner()
Impact: Denial of service via stack exhaustion
Unlike NSTraverse (which has a 1000-element traversal limit),
canonicalPrepInner recurses into child elements without any depth limit.
A crafted XML document with 10,000+ levels of nesting causes the function
to consume excessive stack space (~8.7 seconds for 10K levels in tests).
Fix: Add a depth limit parameter to canonicalPrepInner, consistent with
the NSContext traversal limit.
🟠 MEDIUM: Signer omits InclusiveNamespaces PrefixList in Transform element

Location: sign.go, constructSignedInfo()
Impact: Documents signed with exc-c14n + non-empty PrefixList cannot be verified
When Signer uses MakeC14N10ExclusiveCanonicalizerWithPrefixList("xs"),
it computes the digest using the prefix list but the serialized <Transform>
element in SignedInfo has no <InclusiveNamespaces PrefixList="xs"/> child.
The verifier reads PrefixList="" and computes a different canonical form.
This fails safely (rejects valid signatures, doesn't accept invalid ones).
Verification of externally-signed documents with PrefixList (e.g., from Okta)
works correctly because those documents include the InclusiveNamespaces element.
🟠 MEDIUM: Verify(nil) and SignEnveloped(nil) panic

Location: verify.go (Verify), sign.go (SignEnveloped)
Impact: Denial of service via nil pointer dereference
Neither function checks for nil input. All 6 Canonicalize implementations
also panic on nil input.
Fix: Add nil guards returning descriptive errors.
🟡 LOW: #default in exc-c14n PrefixList not handled per spec

Location: etreeutils/canonicalize.go, TransformExcC14n()
Impact: Spec deviation; fails safely
The exc-c14n spec reserves #default in the PrefixList to mean the default
namespace. The code treats #default as a literal prefix name.
🟡 LOW: VerifyString error semantics differ from Verify

Location: verify.go, VerifyString()
Impact: Misleading error messages
VerifyString checks cert expiry before trying keys and silently skips
expired certs. When all certs are expired, it returns ErrSignatureInvalid
instead of ErrCertificateExpired. Callers checking for specific error types
get inconsistent behavior.
🟡 LOW: Empty Reference URI accepted without restriction

Location: verify.go, findSignature() and verifyDigest()
Impact: Spec-compliant but may surprise SAML consumers
When refURI == "", the signature is accepted regardless of the element's
ID attribute. The digest still protects content integrity. Consumers should
be aware that an empty-URI signature matches any element.
🟡 LOW: Multiple Reference elements silently ignored

Location: verify.go, findChildByTag()
Impact: Only the first Reference is verified; extras are dropped
Per XML-DSig spec, SignedInfo may contain multiple Reference elements, each
requiring independent verification. findChildByTag returns only the first.
In SAML usage this is not exploitable (SAML uses a single Reference).
🟡 LOW: uriRegexp defined but never used

Location: verify.go:14
Impact: Dead code
The compiled regex ^#[a-zA-Z_][\w.-]*$ is never used in the verification
path. Reference URIs are matched via inline string comparison.
ℹ️ INFO: ECDSA signature malleability

For any valid ECDSA signature (r, s), the pair (r, n-s) is also accepted by
Go's ecdsa.Verify. This is inherent to ECDSA and not a verification bypass,
but could affect replay-detection systems that deduplicate on raw signature
bytes.
ℹ️ INFO: canonicalPrep strip parameter is dead code

Location: canonicalize.go, canonicalPrepInner()
The strip bool parameter is accepted but never consulted. Namespace
deduplication always runs regardless of the flag.
ℹ️ INFO: NullCanonicalizer unconditionally preserves comments

Calls canonicalPrep(el, false, true) with comments=true. Not exploitable
since verified elements are reconstructed from canonical bytes.
ℹ️ INFO: No minimum RSA key size

1024-bit RSA keys are accepted. This is a policy decision for callers.
Callers should validate key sizes in their TrustedCerts.
ℹ️ INFO: mapPathToElement/removeElementAtPath fragile index-based path

These functions walk the tree by child index. If the tree were mutated between
map and remove, stale indices could point to wrong elements. Mitigated in
Verify() by el.Copy(), but the API is inherently fragile.

Security Properties Verified ✅

XSW Attack Resistance (30 tests)


All classic XSW variants (XSW1-XSW8) correctly rejected
Verify() returns reconstructed element from canonical bytes, not original tree
SignedInfo re-parsed from canonical bytes after signature check (prevents TOCTOU)
Multiple signatures on same element rejected
Cross-element reference attacks rejected
Signature must be direct child of signed element
Comment injection produces clean returned element

Parser Differential Resistance (22 tests)


NOT vulnerable to CVE-2020-29509/29510/29511 class attacks
etree preserves namespace prefixes through round-trips (encoding/xml does not)
Re-parse in verifyDigest is faithful to digested bytes
Canonicalization is idempotent across all 6 algorithms
CDATA sections, entity expansion, and processing instructions handled correctly
Namespace prefix rebinding preserved correctly through sign→verify

Certificate & Crypto (43 tests)


cert.Equal() DER matching: strict, not bypassable
Attacker self-signed cert in KeyInfo → ErrCertificateNotTrusted
Algorithm confusion (RSA↔ECDSA) → type assertion catches it
Unknown/custom algorithm URIs → ErrAlgorithmNotAllowed
SHA-1 blocked by default, allowed only with AllowSHA1=true
Expired/not-yet-valid certs → ErrCertificateExpired
Truncated/padded RSA signatures → rejected
ECDSA wrong-length/zero signatures → rejected
Invalid base64 → ErrMalformedSignature
Concurrent verification thread-safe
ECDSA P-256/P-384/P-521 all work

C14N Correctness (33 tests, 71 subtests)


All 6 C14N algorithms produce correct output
Attribute sorting by namespace URI (not prefix) is correct
Namespace tampering detected
Comment injection safe (result reconstructed from canonical bytes)
C14N idempotency verified

Input Validation (47 tests, 59 subtests)


NSTraverse 1000-element limit works
Shape validation rejects malformed Signature elements
Empty algorithm URIs rejected
Invalid base64 handled without panic
Large base64 (~10MB) handled without hang
Attribute sort (5100 attrs) completes in ~30ms
URI regex not vulnerable to ReDoS
Concurrent access safe (no data races)


Residual Risk: NSUnmarshalElement

etreeutils/unmarshal.go contains NSUnmarshalElement which serializes
verified elements to bytes and passes them to encoding/xml.Unmarshal().
This bridges back to the buggy encoding/xml package. Currently safe because
encoding/xml's namespace bugs affect re-serialization, not struct deserialization.
But this is defense-in-depth fragile — future changes to either library could
reopen a differential. Consumers (like gosaml2) should be aware.

Critical SP Contract

SPs MUST use result.Element from VerifyResult. If an SP calls
Verify() and then reads from the original XML tree instead of the returned
element, ALL XSW protections are bypassed. This is the library's primary
security contract.

Test Files


File
Tests
Focus


xsw_audit_test.go
30
XML Signature Wrapping attacks


c14n_audit_test.go
33+71
Canonicalization edge cases


cert_crypto_audit_test.go
43
Certificate & crypto verification


parser_diff_audit_test.go
22
Parser differential attacks


input_validation_audit_test.go
47+59
DoS, panics, edge cases


Total: ~187 new security audit tests, all passing.
File	Tests	Focus
`xsw_audit_test.go`	30	XML Signature Wrapping attacks
`c14n_audit_test.go`	33+71	Canonicalization edge cases
`cert_crypto_audit_test.go`	43	Certificate & crypto verification
`parser_diff_audit_test.go`	22	Parser differential attacks
`input_validation_audit_test.go`	47+59	DoS, panics, edge cases
No results found