(Claude Code; Opus 4.6)
Runtime stack memory corruption on Windows amd64 causes a DEP violation (Exception
0xc0000005 code 0x8) when a goroutine jumps to a corrupted return address. The
corruption always overwrites the high 32 bits of a return address, replacing a valid
code pointer (e.g. 0x00007ff6XXXXXXXX) with a value whose high 32 bits are a
small number (e.g. 0x00000010XXXXXXXX). The low 32 bits are preserved. The
corrupted value is a normal heap/stack address that gets written over the return
address's upper dword.
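The pattern is what a single 4-byte store onto the upper half of an 8-byte stack slot would produce on little-endian amd64. A minimal sketch of that arithmetic follows; the return address and the stored value are made up for illustration and are not taken from a crash dump.

```go
package main

import "fmt"

// corruptHigh32 models a 4-byte store landing on the upper half of an
// 8-byte little-endian stack slot: the low 32 bits survive and the
// high 32 bits are replaced by the stored value.
func corruptHigh32(slot uint64, stored uint32) uint64 {
	return uint64(stored)<<32 | slot&0xffffffff
}

func main() {
	ret := uint64(0x00007ff612345678) // hypothetical return address
	bad := corruptHigh32(ret, 0x10)   // hypothetical 32-bit value
	// prints 0x00007ff612345678 -> 0x0000001012345678
	fmt.Printf("0x%016x -> 0x%016x\n", ret, bad)
}
```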
The crash was initially reported as a Go 1.26 regression, but testing showed it also reproduces with Go 1.25.0 and Go tip (master). It may have become more frequent in 1.26 due to changes in binary layout or stack usage patterns.
The crash reliably reproduces by running tailscale.com/tsnet tests on Windows
amd64 with -test.count=3:
GOOS=windows GOARCH=amd64 go test -c -o tsnet_test.exe ./tsnet/
tsnet_test.exe -test.timeout=90s -test.count=3
The crash typically occurs during TestConn or later tests, in a goroutine running
derpserver.(*sclient).run which reads DERP frames via ReadFrameHeader. The
crashing goroutine was created by net/http.(*Server).Serve for an HTTPS DERP
connection.
The crash was tested on:
- Windows 11 build 26200 (12th Gen Intel i7-1255U, hybrid P+E cores)
- GitHub Actions Windows runners (Azure VMs, various CPUs)
- Always the high 32 bits of an 8-byte return address on the goroutine stack
- The corrupted return address was pushed by a CALL instruction (to ReadFrameHeader)
- The goroutine's RET pops this corrupted value and jumps to it, causing a DEP violation
- The value written over the high 32 bits is a normal-range address (not a special constant)
- GODEBUG=asyncpreemptoff=1 prevents the crash (test times out instead)
- This was the first and most definitive finding
- Setting stackMin=4096 (or higher) prevents the crash (test times out instead)
- The default stackMin=2048 allows the crash
- The crashing goroutine consistently has a stackcopycount of 10-12, meaning its stack was copied/grown 10-12 times during its lifetime
- The combination of async preemption + small stacks + stack growth is the trigger
- Instrumented preemptM to verify the resumePC was correctly written to the goroutine stack by PushCall and still correct after SetThreadContext
- The value was always correct at that point
- The corruption occurs later, after the goroutine has been resumed
- GOGC=off still crashes
- GODEBUG=gcshrinkstackoff=1 still crashes
- stackPoisonCopy=1 (fills old stack with 0xfc after copy) still crashes with the same pattern (no 0xfc values in the corrupted data)
- stackFaultOnFree=1 (maps old stack pages as inaccessible) still crashes with the same pattern (no access violation on old stack pages)
- Added a post-adjustframe verification in copystack that compared every 8-byte value in the new stack against the old stack for the corruption pattern. It did not fire. This means the corruption is not introduced by memmove or adjustframe during the copy itself.
- debugCheckBP=true detected 98 instances of invalid frame pointers during copystack when goroutines in Windows cgo callback chains (e.g., the desktop session watcher's pumpThreadMessages -> getMessage -> Windows callback -> wndProc -> destroyWindow -> Windows callback -> callbackWrap) need stack growth
- The BP chain in these goroutines crosses from the Go goroutine stack to the Windows system/thread stack. adjustframe encounters a BP value outside the goroutine's stack range.
- In the non-debug path, adjustpointer correctly skips adjustment of out-of-range values, so this is "safe" in that it doesn't corrupt data, but it means the BP chain is broken after the stack copy.
- This is a SEPARATE bug from the DERP return address corruption. The DERP goroutine has no cgo frames. Both bugs involve copystack but in different goroutines with different stack structures.
The Go linker emits .pdata (Windows SEH function table) entries unsorted,
violating the PE/COFF spec requirement. Windows RtlLookupFunctionEntry does a
binary search on these entries. The sort fix exists on master but is not in Go
1.26.x. However, the crash still occurs with the sort fix applied (tested on
master). The .pdata issue is a real bug but is not the cause of this crash.
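To illustrate why sort order matters for a binary-search lookup, here is a small sketch. The entry type, the lookup function and the addresses are all hypothetical. It does not reproduce how RtlLookupFunctionEntry is implemented; it only models a binary search over a table keyed by function start address.

```go
package main

import (
	"fmt"
	"sort"
)

// entry is a simplified, hypothetical stand-in for a RUNTIME_FUNCTION
// record: a half-open address range [begin, end).
type entry struct{ begin, end uint32 }

// lookup binary-searches for the entry containing pc. The result is
// only meaningful when table is sorted by begin.
func lookup(table []entry, pc uint32) (entry, bool) {
	i := sort.Search(len(table), func(i int) bool { return table[i].begin > pc })
	if i == 0 || pc >= table[i-1].end {
		return entry{}, false
	}
	return table[i-1], true
}

func main() {
	sorted := []entry{{0x1000, 0x1100}, {0x1100, 0x1400}, {0x1400, 0x1800}, {0x1800, 0x2000}}
	unsorted := []entry{{0x1800, 0x2000}, {0x1000, 0x1100}, {0x1400, 0x1800}, {0x1100, 0x1400}}

	_, okSorted := lookup(sorted, 0x1500)
	_, okUnsorted := lookup(unsorted, 0x1500)
	// The same entries are present in both tables, but the search over
	// the unsorted table misses the one that covers 0x1500.
	fmt.Println(okSorted, okUnsorted)
}
```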
Added a check for GetThreadContext returning 0 (failure). It never failed.
Added a loop to scan the goroutine stack for the corruption pattern immediately
after PushCall + SetThreadContext. No corruption was found at that point,
confirming the corruption happens later.
Setting stackMin to 4096, 8192, or 65536 all prevent the crash, but also cause
the test to time out. Larger stacks mean less stack growth, which avoids the bug.
This doesn't pinpoint the mechanism but confirms stack growth is part of the
trigger.
Wrote two minimal Go programs that create many goroutines doing frame-reading over TCP/TLS connections with small stack frames, similar to the DERP server pattern. Neither crashed. The full tsnet test suite is needed to trigger the bug, suggesting it requires a specific combination of goroutine count, stack depth, I/O patterns, and timing.
The goroutine is deep in a call stack when async-preempted. It yields via
asyncPreempt -> asyncPreempt2 -> mcall -> gopreempt_m. When later
rescheduled, it returns through asyncPreempt back to the interrupted code. The
interrupted code (or a subsequent function call) triggers stack growth via
morestack -> newstack -> copystack.
During this stack growth, something goes wrong. The memmove and adjustframe
produce correct results (verified), but something after copystack returns uses a
stale or incorrect reference to the old stack location, writing data to what is now
either freed memory or another goroutine's stack. This write overwrites 4 bytes of
a return address with heap/stack address data.
The stackFaultOnFree test should have caught a write to freed stack pages but
didn't, which means either:
- The old stack pages were immediately reused (returned to the stack pool and given to another goroutine), making them accessible
- The corruption is on the NEW stack, not the old one, but the post-copy check missed it (perhaps due to timing - the corruption happens after copystack returns)
- The corruption involves a different mechanism entirely
When a goroutine is async-preempted via SuspendThread + SetThreadContext +
ResumeThread, and then its stack is grown, there might be a subtle interaction
where Windows retains internal references to the old stack (e.g., for APC delivery,
exception handling, or thread context restoration) that become stale after the
stack moves. This wouldn't be caught by stackFaultOnFree if Windows keeps its
own mappings.
- Use a real debugger: Set a hardware data breakpoint (DR0-DR3) on the return address location to catch the exact instruction that overwrites it. This requires a Windows debugger (WinDbg) attached to the process.
- Add per-goroutine stack-copy tracking with frame validation: After each copystack, walk the new stack's frame pointer chain and validate that all return addresses look like valid code pointers (high bits in the expected image range).
- Bisect the stack growth: Instead of growing the stack to a new allocation, try growing it in-place (remap to a larger region) to eliminate the memmove/pointer-adjustment path.
- Test with stackNoCache=1: Prevent stack page reuse to see if the corruption changes (would confirm if old stack pages being reused by other goroutines is part of the story).
- Investigate the cgo callback BP bug: The 98 invalid-BP-during-copystack warnings are a real bug that should be filed separately. While not the direct cause of the DERP corruption, they indicate that copystack has difficulty with certain frame layouts on Windows.
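The return-address validation proposed in the list above could look roughly like the following. This is only a sketch of the range check: it operates on a plain slice instead of walking a real frame pointer chain inside the runtime, and the function name, image base and image size are assumptions made for illustration.

```go
package main

import "fmt"

// suspectReturnAddrs returns the indexes of values that fall outside
// [imageBase, imageBase+imageSize) and so cannot point into the image.
func suspectReturnAddrs(addrs []uint64, imageBase, imageSize uint64) []int {
	var bad []int
	for i, a := range addrs {
		if a < imageBase || a-imageBase >= imageSize {
			bad = append(bad, i)
		}
	}
	return bad
}

func main() {
	// Hypothetical image range and return addresses.
	const base, size = 0x00007ff600000000, 0x02000000
	addrs := []uint64{
		0x00007ff6004a1c30,
		0x000000100049f2e8, // high 32 bits overwritten
		0x00007ff600512b04,
	}
	fmt.Println(suspectReturnAddrs(addrs, base, size)) // prints [1]
}
```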
Windows Stack Memory Corruption Investigation
Root Cause
Windows overlapped I/O APIs (WSARecv, WSARecvFrom, WSASend, WSASendto, AcceptEx, etc.) receive pointers to output parameters (lpNumberOfBytesRecvd, lpFlags, lpFromlen). When the operation completes asynchronously (ERROR_IO_PENDING), Windows writes completion results back to these addresses.
In Go's internal/poll/fd_windows.go, these output parameters were declared as stack-local variables inside closures passed to execIO.
The closure executes the Windows API call, then returns to execIO. execIO then enters waitIO to park the goroutine until the I/O completes. But by this point, the closure's stack frame has been popped - flags and qty are dead stack space, now reused by waitIO's call frames (return addresses, saved registers, etc.).
When Windows completes the I/O and writes the results to the original addresses, it overwrites whatever now occupies those stack locations. A 32-bit write of a small integer (like a flags value or byte count) to the high 32 bits of a return address produces the observed corruption pattern: 0x00007ff6XXXXXXXX becomes 0x00000010XXXXXXXX.
The affected call sites are every execIO closure that passes &qty, &flags, or &rsan (fromlen) to a Windows socket API:
- Read (kindNet path): WSARecv gets &qty and &flags
- ReadFrom, ReadFromInet4, ReadFromInet6: WSARecvFrom gets &qty, &flags, &rsan
- Write (kindNet path): WSASend gets &qty
- WriteTo, WriteToInet4, WriteToInet6: WSASendto gets &qty
- Writev: WSASend gets &qty
- acceptOne: AcceptEx gets &qty
- waitForReading: WSARecv gets &qty and &flags
- ReadMsg, ReadMsgInet4, ReadMsgInet6: WSARecvMsg gets &qty
- WSAGetOverlappedResult in execIO itself: gets &flags
The Fix
Move the output parameters from closure stack locals to fields in the heap-allocated operation struct, which is pooled and lives for the entire duration of the I/O operation.
Then change all closures to use &o.qty, &o.flags, &o.rsan instead of stack locals.
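The shape of the fix can be sketched in portable Go. This is not the internal/poll code: the operation struct is reduced to the three fields named above with assumed types, and fakeWSARecv, pending and startRecv are hypothetical stand-ins for an overlapped call that completes later. In ordinary Go the compiler's escape analysis would move a local to the heap when its address is retained this way, so the sketch cannot reproduce the corruption itself. It only shows where the output parameters live after the change.

```go
package main

import "fmt"

// operation is a simplified stand-in for the pooled, heap-allocated
// struct; field types are assumptions.
type operation struct {
	qty   uint32
	flags uint32
	rsan  int32
}

// pending models an overlapped call that returned "I/O pending" and
// remembers where the results will be written on completion.
type pending struct {
	qty, flags *uint32
}

// fakeWSARecv is hypothetical: it records the output pointers instead
// of performing any I/O.
func fakeWSARecv(qty, flags *uint32) pending {
	return pending{qty: qty, flags: flags}
}

// complete models the late write of completion results.
func (p pending) complete(n, fl uint32) {
	*p.qty = n
	*p.flags = fl
}

// startRecv passes addresses of operation fields, not of locals, so
// the targets outlive the frame that issued the call.
func startRecv(o *operation) pending {
	return fakeWSARecv(&o.qty, &o.flags)
}

func main() {
	o := &operation{}
	p := startRecv(o)     // the issuing frame is gone after this returns
	p.complete(512, 0x10) // the late write lands in the operation struct
	fmt.Println(o.qty, o.flags)
}
```

Because the operation outlives the call that issued the I/O, a completion that arrives after the closure has returned writes into memory that still belongs to the operation.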
Why Async Preemption and Stack Growth Appear Involved
The bug requires this sequence:
- A closure calls WSARecv/etc., which returns ERROR_IO_PENDING
- The closure returns and its locals (flags, qty) become dead stack space
- execIO calls waitIO, whose call frames reuse the dead stack space
- Windows completes the I/O and overwrites waitIO's return addresses or saved registers
Without stack growth, this corruption may be "silent" - the corrupted stack locations happen to be in the goroutine's current stack segment, and the values written (small integers) may not cause an immediate crash if the goroutine doesn't use those particular stack slots again. But with stack growth:
- copystack moves the goroutine stack to a new allocation
This explains why GODEBUG=asyncpreemptoff=1 and stackMin=4096 both prevent the crash - they reduce stack growth, reducing the chance of old stack addresses being freed and reused.
stackFaultOnFree=1 changed the corruption pattern (from 0x10XXXXXXXX to 0x12) because decommitting old stack pages changes what Windows reads/writes there.
Verification
- GODEBUG=asyncpreemptoff=1 (no fix)
- stackMin=4096 (no fix)
- GOGC=off (no fix)
- stackFaultOnFree=1 (no fix)
Note: the tsnet tests are independently flaky on Windows and sometimes hang regardless of this bug. During investigation we used a 90s timeout which was often too short. With a 300s timeout the fixed binary passes cleanly.
Reproduction
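The crash reliably reproduces by running tailscale.com/tsnet tests on Windows amd64 with -test.count=3:
GOOS=windows GOARCH=amd64 go test -c -o tsnet_test.exe ./tsnet/
tsnet_test.exe -test.timeout=90s -test.count=3
Tested on Windows 11 build 26200 (12th Gen Intel i7-1255U) and GitHub Actions Windows runners.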
Other Bugs Found During Investigation
Unsorted .pdata section (linker bug)
The Go linker emits .pdata (Windows SEH function table) entries unsorted, violating the PE/COFF spec requirement that RUNTIME_FUNCTION entries be sorted by function start address. This can cause RtlLookupFunctionEntry (binary search) to return incorrect results. Fixed on master by commit bbed50aaa3 but not yet in Go 1.26.x. Not the cause of the stack corruption, but a real bug.
Invalid frame pointers during cgo callback stack growth
debugCheckBP=true detected 98 instances of invalid frame pointers during copystack for goroutines in Windows cgo callback chains. The frame pointer chain in these goroutines crosses from the Go goroutine stack to the Windows system stack. adjustframe encounters BP values outside the goroutine's stack range. In the non-debug path, adjustpointer correctly skips adjustment of out-of-range values, so no data corruption occurs, but the BP chain is broken after the stack copy. This is a separate issue from the I/O corruption bug.
Investigation Path
What we tried (chronological)
- .pdata sorting - not the cause
- GODEBUG=asyncpreemptoff=1 - prevented crash, pointed to preemption
- GOGC=off - still crashed, ruled out GC
- GODEBUG=gcshrinkstackoff=1 - still crashed, ruled out stack shrinking
- Instrumented PushCall in preemptM - write was correct, corruption is later
- Checked that SetThreadContext didn't corrupt stack - it didn't
- stackPoisonCopy=1 - still crashed with same pattern (no 0xfc in corrupted data)
- stackMin=4096/8192/65536 - prevented crash, confirmed stack growth involvement
- Added stackcopycount field to g struct - crashing goroutine had 10-12 copies
- Post-adjustframe verification in copystack - no corruption detected during copy
- debugCheckBP=true - found the cgo callback BP bug (separate issue)
- debugCheckBP as warning (non-fatal) - 98 BP warnings plus DERP crash, confirmed separate
- stackFaultOnFree=1 - different crash pattern (PC=0x12), suggested stale pointer read
- stackNoCache=1 - still crashed, stack cache reuse not the mechanism
- stackFaultOnFree=1 + stackNoCache=1 - crash at PC=0x12, confirmed old stack involvement
- Reviewed execIO closures for stack-allocated output parameters passed to Windows APIs
- Moved output parameters to operation struct - FIXED