goxmldsig v2 is NOT vulnerable to the CVE-2020-29509/29510/29511 class of parser differential attacks. The switch from encoding/xml to etree for XML parsing eliminates the core vulnerability.
22 tests written to parser_diff_audit_test.go, all passing.
etree preserves namespace prefixes through round-trips. encoding/xml still mutates them (confirmed by meta-test):
- Input: <saml:Assertion xmlns:saml="urn:...">
- encoding/xml output: <Assertion xmlns="urn:..." xmlns:_xmlns="xmlns" _xmlns:saml="urn:...">
- etree output: <saml:Assertion xmlns:saml="urn:..."> (preserved exactly)
Since goxmldsig v2 uses etree for all parsing, the original CVE-2020-29509 attack vector is eliminated.
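The encoding/xml half of that comparison can be reproduced with the standard library alone. The sketch below is not the audit's meta-test; it is a minimal illustration in which the helper name roundTripTokens and the urn:example namespace are made up for the example. It decodes a prefixed element into tokens and re-encodes the tokens unchanged, which is the round trip on which the prefix gets rewritten.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// roundTripTokens (hypothetical helper) decodes the input with encoding/xml
// and re-encodes every token as-is. No application logic touches the tokens,
// so any difference between input and output comes from encoding/xml itself.
func roundTripTokens(in string) (string, error) {
	dec := xml.NewDecoder(strings.NewReader(in))
	var buf bytes.Buffer
	enc := xml.NewEncoder(&buf)
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			break
		}
		if err != nil {
			return "", err
		}
		if err := enc.EncodeToken(tok); err != nil {
			return "", err
		}
	}
	if err := enc.Flush(); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	// urn:example is a placeholder namespace, not one taken from the audit.
	in := `<saml:Assertion xmlns:saml="urn:example"></saml:Assertion>`
	out, err := roundTripTokens(in)
	if err != nil {
		panic(err)
	}
	fmt.Println("input:  ", in)
	fmt.Println("output: ", out)
	fmt.Println("changed:", out != in)
}
```

Running it prints both forms side by side, so the rewritten prefix can be compared against the encoding/xml output listed above.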
After computing the digest, verifyDigest() does doc.ReadFromBytes(referencedBytes) and returns doc.Root(). Testing confirms:
- The re-parsed element faithfully represents the digested bytes
- Canonicalization is idempotent: C14N(parse(C14N(x))) == C14N(x) for all tested inputs
- Text content, attributes, namespace bindings, and element structure are all preserved
- Tested across all 6 canonicalization algorithms (C14N 1.0, 1.1, exc-c14n, with/without comments)
NSUnmarshalElement in etreeutils/unmarshal.go serializes the verified *etree.Element to bytes and passes them to encoding/xml.Unmarshal(). This creates a potential parser differential: etree serializes the element faithfully, and encoding/xml.Unmarshal then re-parses those bytes.
Current status: Testing shows that for the verified canonical forms produced by the library, encoding/xml.Unmarshal produces the same text values as etree.Text(). The encoding/xml namespace prefix mutation doesn't affect the values decoded by xml.Unmarshal; it only affects re-serialization. Since NSUnmarshalElement deserializes into Go structs and never re-serializes them, the values are correct.
However: From a defense-in-depth standpoint this is fragile. If a future version of encoding/xml or etree changes behavior, or if more exotic XML constructs are used, this could become a real differential. Consumers (like gosaml2) should be aware they're relying on encoding/xml for final value extraction.
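To illustrate why prefix handling does not change what xml.Unmarshal extracts, here is a minimal stdlib-only sketch. The assertion struct, the decodeNameID helper, and the urn:example namespace are hypothetical stand-ins for a consumer type; none of them come from goxmldsig or gosaml2. Struct tags match on namespace URI plus local name, so a prefixed document and a default-namespace document decode to the same value.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// assertion is a hypothetical consumer struct. The tags name the namespace
// URI, not a prefix, so matching is independent of how prefixes are spelled.
type assertion struct {
	XMLName xml.Name `xml:"urn:example Assertion"`
	NameID  string   `xml:"urn:example NameID"`
}

// decodeNameID (hypothetical helper) unmarshals a document and returns the
// text of its NameID child.
func decodeNameID(doc string) (string, error) {
	var a assertion
	if err := xml.Unmarshal([]byte(doc), &a); err != nil {
		return "", err
	}
	return a.NameID, nil
}

func main() {
	docs := []string{
		// Same infoset, two spellings: explicit prefix vs. default namespace.
		`<saml:Assertion xmlns:saml="urn:example"><saml:NameID>alice</saml:NameID></saml:Assertion>`,
		`<Assertion xmlns="urn:example"><NameID>alice</NameID></Assertion>`,
	}
	for _, d := range docs {
		id, err := decodeNameID(d)
		if err != nil {
			panic(err)
		}
		fmt.Printf("NameID=%q\n", id)
	}
}
```

This only covers the simple case described above. It says nothing about the more exotic constructs the caveat warns about.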
Tested XML where prefix p: is bound to http://first at root level and rebound to http://second in a child element. After sign→verify:
- etree preserves both bindings correctly
- Namespace context resolution produces correct URIs at each scope level
- The re-parsed verified element has correct prefix→namespace mappings
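For readers who want to see scoped rebinding in action, the following sketch resolves element names with the standard library's xml.Decoder. This is the consumer-side parser, not etree, and not the audit's own test. The document shape mirrors the tested case, with an extra sibling element added to show the outer binding being restored. The resolveNames helper is hypothetical.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// resolveNames (hypothetical helper) returns "namespaceURI local" for every
// start element, as resolved by encoding/xml at that element's scope.
func resolveNames(doc string) ([]string, error) {
	dec := xml.NewDecoder(strings.NewReader(doc))
	var names []string
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return names, nil
		}
		if err != nil {
			return nil, err
		}
		if se, ok := tok.(xml.StartElement); ok {
			names = append(names, se.Name.Space+" "+se.Name.Local)
		}
	}
}

func main() {
	// p: is bound at the root, rebound on the child, and falls back to the
	// root binding for the sibling that follows the child.
	doc := `<p:root xmlns:p="http://first">` +
		`<p:child xmlns:p="http://second"/>` +
		`<p:sibling/>` +
		`</p:root>`
	names, err := resolveNames(doc)
	if err != nil {
		panic(err)
	}
	for _, n := range names {
		fmt.Println(n)
	}
}
```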
etree converts CDATA to regular text during parsing and serializes with entity escaping. C14N likewise replaces CDATA sections with their character content. After sign→verify:
- <![CDATA[<script>alert(1)</script>]]> produces text <script>alert(1)</script>
- The equivalent entity-escaped form &lt;script&gt;... produces identical text
- No child elements are created from CDATA content
- CDATA and entity-escaped forms produce identical verified text
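The same equivalence can be checked on the encoding/xml side, which is where consumers ultimately read values. The sketch below is an illustration, not part of the audit's test file. The wrapper element v and the helpers textOf and countElements are invented for the example. It decodes a CDATA form and an entity-escaped form, then counts start elements to confirm the markup-looking text never becomes a child element.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// value captures the character data of a single wrapper element.
type value struct {
	Text string `xml:",chardata"`
}

// textOf (hypothetical helper) returns the decoded character data of the root.
func textOf(doc string) (string, error) {
	var v value
	if err := xml.Unmarshal([]byte(doc), &v); err != nil {
		return "", err
	}
	return v.Text, nil
}

// countElements (hypothetical helper) counts start elements in the document.
func countElements(doc string) (int, error) {
	dec := xml.NewDecoder(strings.NewReader(doc))
	n := 0
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return n, nil
		}
		if err != nil {
			return 0, err
		}
		if _, ok := tok.(xml.StartElement); ok {
			n++
		}
	}
}

func main() {
	cdata := `<v><![CDATA[<script>alert(1)</script>]]></v>`
	escaped := `<v>&lt;script&gt;alert(1)&lt;/script&gt;</v>`
	for _, doc := range []string{cdata, escaped} {
		text, err := textOf(doc)
		if err != nil {
			panic(err)
		}
		n, err := countElements(doc)
		if err != nil {
			panic(err)
		}
		fmt.Printf("text=%q elements=%d\n", text, n)
	}
}
```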
etree expands all standard XML entities (&amp;, &lt;, &gt;, &quot;, &apos;) and numeric character references (such as &#65; and &#66;, which decode to A and B) during parsing. After sign→verify, the text content matches the expected expanded form. C14N requires entity expansion, so there is no differential.
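As a quick cross-check of the consumer-side parser, this stdlib-only sketch decodes the five predefined entities plus one decimal and one hexadecimal character reference. The wrapper element v and the helper expand are assumptions made for the example, and the specific references chosen here are not necessarily the ones used in the audit's tests.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// expand (hypothetical helper) returns the character data of the root element
// after encoding/xml has expanded entity and character references.
func expand(doc string) (string, error) {
	var v struct {
		Text string `xml:",chardata"`
	}
	if err := xml.Unmarshal([]byte(doc), &v); err != nil {
		return "", err
	}
	return v.Text, nil
}

func main() {
	// &#65; is decimal for 'A'; &#x42; is hexadecimal for 'B'.
	doc := `<v>&amp;&lt;&gt;&quot;&apos;&#65;&#x42;</v>`
	text, err := expand(doc)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", text)
}
```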
Processing instructions (PIs) within the signed element are preserved by C14N (they're part of the element content). Text content around PIs is correctly preserved after sign→verify.
Tested <name>Alice<!--injected-->Bob</name> through the pipeline:
- Without-comments C14N (default): Comment is stripped, text concatenated to "AliceBob". After re-parse, Text() returns "AliceBob". encoding/xml also sees "AliceBob". No comment nodes survive. No differential.
- With-comments C14N: Comment survives in the re-parsed tree, but etree.Text() concatenates CharData around comments, returning "AliceBob". encoding/xml.Unmarshal also returns "AliceBob". No differential.
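The encoding/xml side of the with-comments case can be reproduced in a few lines. This is a minimal sketch using only the standard library; the helper nameText is hypothetical. It feeds the tested input to xml.Unmarshal and reads back the character data, which is collected from both sides of the comment.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// nameText (hypothetical helper) returns the character data that
// xml.Unmarshal collects for the root element, skipping any comments.
func nameText(doc string) (string, error) {
	var n struct {
		Text string `xml:",chardata"`
	}
	if err := xml.Unmarshal([]byte(doc), &n); err != nil {
		return "", err
	}
	return n.Text, nil
}

func main() {
	text, err := nameText(`<name>Alice<!--injected-->Bob</name>`)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", text) // "AliceBob"
}
```

The point of the check is that the value is neither truncated at the comment nor split into two values, which is the behavior a comment-injection attack would need.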
When MakeC14N10ExclusiveCanonicalizerWithPrefixList("b") is used as the Signer's canonicalizer, the PrefixList affects the digest computation but the Signer does NOT emit <InclusiveNamespaces PrefixList="b"> into the Transform element in SignedInfo. The verifier then uses an empty PrefixList, producing a different canonical form, causing signature verification failure. This is a Signer limitation (not a security vulnerability) — documents signed with a non-empty PrefixList cannot be verified. In practice, PrefixList is almost never used.
The v2 library's architecture is sound for preventing parser differentials:
- Single parser (etree) for all XML operations: parsing, canonicalization, signature construction, and verification
- Canonical bytes are the source of truth: the verified element is re-parsed from the exact bytes that were digested
- No encoding/xml in the critical path: encoding/xml is only used in NSUnmarshalElement, which is a consumer-facing utility, not part of the core sign/verify pipeline
- Canonicalization is idempotent: C14N(parse(C14N(x))) == C14N(x) for all tested inputs and algorithms
The main residual risk is in the NSUnmarshalElement function, which bridges from the safe etree world back into the potentially-buggy encoding/xml world. This is acceptable because encoding/xml's namespace bugs affect serialization (not deserialization into structs), but it should be monitored.