@djanatyn
Created November 25, 2025 16:48

Rust Atomic Spinlock Analysis

Overview

This analysis shows how Rust atomic operations with different memory orderings (Acquire, Release, Relaxed) translate through LLVM IR to actual ARM64 assembly instructions.

Key Findings

1. Lock Acquisition (Acquire Ordering)

Rust Source:

while self
    .locked
    .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
    .is_err()
{
    std::hint::spin_loop();
}

LLVM IR:

bb8.i:
  %8 = cmpxchg ptr %_4.i, i8 0, i8 1 acquire monotonic, align 1
  %9 = extractvalue { i8, i1 } %8, 1
  br i1 %9, label %bb11.i, label %bb4.i3.i

bb4.i3.i:  ; spin loop
  tail call void @llvm.aarch64.isb(i32 noundef 15)
  %10 = cmpxchg ptr %_4.i, i8 0, i8 1 acquire monotonic, align 1
  %11 = extractvalue { i8, i1 } %10, 1
  br i1 %11, label %bb11.i, label %bb4.i3.i

ARM64 Assembly:

LBB5_2:                         ; spin loop
    isb                         ; Instruction Synchronization Barrier
LBB5_3:
    mov   w13, #0              ; expected value (false)
    add   x14, x10, #16        ; address of lock
    casab w13, w12, [x14]      ; Compare And Swap Acquire Byte
    cmp   w13, #0              ; check if we got the lock
    b.ne  LBB5_2               ; if not, spin again
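
The failure ordering (Relaxed) is what appears in the IR as the monotonic half of acquire monotonic. In a retry loop like this, compare_exchange_weak is a common alternative: it may fail spuriously, which the loop already tolerates, and it can lower to slightly leaner code on targets that use load/store-exclusive pairs instead of casab. A sketch with a hypothetical function name, not taken from the gist:

use std::sync::atomic::{AtomicBool, Ordering};

fn lock_weak(locked: &AtomicBool) {
    // compare_exchange_weak may fail spuriously, so it must live inside a loop.
    while locked
        .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
        .is_err()
    {
        std::hint::spin_loop();
    }
}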

2. Lock Release (Release Ordering)

Rust Source:

self.locked.store(false, Ordering::Release);

LLVM IR:

store atomic i8 0, ptr %_4.i release, align 1

ARM64 Assembly:

stlurb wzr, [x10, #16]  ; STore reLease Unscaled Register Byte

3. Counter Operations (Relaxed Ordering)

Rust Source:

let current = counter.load(Ordering::Relaxed);
counter.store(current + 1, Ordering::Relaxed);

LLVM IR:

%12 = load atomic i64, ptr %_18.i monotonic, align 8
%_6.i = add i64 %12, 1
store atomic i64 %_6.i, ptr %_18.i monotonic, align 8

ARM64 Assembly:

ldr x13, [x11, #16]    ; regular load (no barriers)
add x13, x13, #1       ; increment
str x13, [x11, #16]    ; regular store (no barriers)
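
Note that this load/store pair is not itself an atomic increment; it is only correct because the spinlock serializes the critical section. If the counter were updated without the lock, a single atomic read-modify-write would be needed instead. A sketch, not part of the gist:

use std::sync::atomic::{AtomicUsize, Ordering};

fn increment(counter: &AtomicUsize) {
    // One atomic read-modify-write; with ARMv8.1 LSE this typically
    // lowers to a single ldadd instruction rather than separate ldr/str.
    counter.fetch_add(1, Ordering::Relaxed);
}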

Atomic Instruction Details

casab (Compare And Swap Acquire Byte)

  • Full Name: Compare And Swap Acquire Byte
  • Memory Ordering: Acquire semantics
  • Effect:
    • Atomically compares value at [x14] with w13 (0/false)
    • If equal, writes w12 (1/true) to [x14]
    • Returns original value in w13
    • Acquire semantics: later loads/stores cannot be reordered before this operation
  • x86-64 equivalent: lock cmpxchg (already a full barrier on x86, so no extra fence is needed)

stlurb (STore reLease Unscaled Register Byte)

  • Full Name: Store Release Unscaled Register Byte
  • Memory Ordering: Release semantics
  • Effect:
    • Stores byte value to memory
    • Ensures all prior loads/stores complete before this store becomes visible
    • Makes modifications visible to other threads
  • x86-64 equivalent: Regular mov (x86-64 has strong ordering)

isb (Instruction Synchronization Barrier)

  • Full Name: Instruction Synchronization Barrier
  • From: std::hint::spin_loop()
  • Effect:
    • Flushes pipeline and ensures subsequent instructions see any context changes
    • Optimizes spin loops by reducing power consumption
    • Prevents speculative execution during spinning
  • x86-64 equivalent: pause instruction
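
For locks that may be held longer than a few iterations, the spin hint is often combined with a fallback to the scheduler, as the std::hint::spin_loop documentation suggests. A sketch with hypothetical names, not part of the gist:

use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;

fn lock_with_backoff(locked: &AtomicBool) {
    let mut spins = 0u32;
    while locked
        .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
        .is_err()
    {
        if spins < 100 {
            // Short waits: stay on-core and just hint the pipeline (isb/pause).
            std::hint::spin_loop();
            spins += 1;
        } else {
            // Long waits: give the lock holder a chance to be scheduled.
            thread::yield_now();
        }
    }
}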

dmb ishld (Data Memory Barrier)

  • Full Name: Data Memory Barrier, Inner Shareable, Load
  • Memory Ordering: Acquire fence
  • Effect:
    • Ensures all prior loads complete before subsequent operations
    • Used when decrementing reference counts (Arc drops)
  • x86-64 equivalent: no instruction needed (an acquire fence is implicit under x86-TSO and only constrains the compiler)

ldaddl (Load-Add Release)

  • Full Name: Load and Add with Release ordering
  • Used for: Arc reference counting (LLVM atomicrmw sub, lowered to an add of the negated value)
  • Effect:
    • Atomically adds a value to the memory location and returns the original value
    • Release ordering ensures prior operations become visible before the updated count does
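
Both dmb ishld and ldaddl in this binary come from Arc's reference counting rather than the spinlock itself. A minimal sketch of that pattern (Arc does this internally; the function and names here are illustrative):

use std::sync::atomic::{fence, AtomicUsize, Ordering};

// Returns true if the caller was the last owner and may free the shared data.
fn release_ref(strong: &AtomicUsize) -> bool {
    // fetch_sub(1, Release) is what lowers to the negated ldaddl.
    if strong.fetch_sub(1, Ordering::Release) != 1 {
        return false; // other owners still exist
    }
    // The last owner takes an Acquire fence (the dmb ishld) so it observes
    // every write made before each of the earlier Release decrements.
    fence(Ordering::Acquire);
    true
}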

Memory Ordering Translation

Rust Ordering   LLVM Ordering   ARM64 Instruction             Memory Barrier
Acquire         acquire         casab (CAS with acquire)      Load barrier after
Release         release         stlurb (store with release)   Store barrier before
Relaxed         monotonic       ldr / str (regular)           None
Arc fence       fence acquire   dmb ishld                     Full load barrier

Cache Line Behavior

The atomic operations ensure cache coherency across cores:

  1. Lock Acquisition (casab):
    • Pulls the cache line into the writing core's cache in an exclusive state
    • Copies of the line in other cores' caches are invalidated
    • Acquire semantics prevent subsequent loads/stores from being reordered before the CAS
  2. Lock Release (stlurb):
    • The updated line is propagated to other cores through the coherency protocol
    • Makes the critical-section writes and the cleared flag visible to other cores
    • Release semantics prevent prior stores from being reordered after the release
  3. Spin Loop (isb):
    • Inserts a short pipeline stall between polls, easing contention on the line
    • Tempers aggressive polling that would otherwise cause cache line ping-ponging (see the test-and-test-and-set sketch after this list)
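
A common refinement that follows directly from these cache-line effects is test-and-test-and-set: spin on a plain Relaxed load, which keeps the line in the Shared state, and only attempt the CAS, which demands exclusive ownership of the line, once the lock looks free. A sketch, not the gist's implementation:

use std::sync::atomic::{AtomicBool, Ordering};

fn ttas_lock(locked: &AtomicBool) {
    loop {
        // Read-only spin: no invalidation traffic while the lock is held.
        while locked.load(Ordering::Relaxed) {
            std::hint::spin_loop();
        }
        // The lock looked free; now race for it with the acquiring CAS.
        if locked
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_ok()
        {
            return;
        }
    }
}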

Performance Characteristics

  • Uncontended lock: a single casab with no spinning; the cost is small and dominated by whether the cache line is already local to the core
  • Contended lock: Spin loop with isb reduces power consumption
  • Memory barriers:
    • Acquire: ~4-10 cycles depending on cache state
    • Release: ~4-10 cycles depending on cache state
    • Full barrier (dmb): ~10-20 cycles
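
These figures are rough and vary by core, frequency, and cache state. A quick way to sanity-check the uncontended cost on a given machine, using the SpinLock from the listing below (the measurement code is a sketch, not part of the gist):

use std::time::Instant;

fn measure_uncontended(lock: &SpinLock) {
    const N: u32 = 1_000_000;
    let start = Instant::now();
    for _ in 0..N {
        lock.lock();
        lock.unlock();
    }
    // Average wall-clock time per uncontended lock/unlock pair.
    println!("avg lock+unlock: {:?}", start.elapsed() / N);
}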

Verification

Running the program with 2 threads × 10,000 iterations:

Final counter value: 20000
Expected value: 20000

The spinlock successfully synchronizes access to the shared counter, demonstrating correct implementation of acquire-release semantics.

Full program:

use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Simple spinlock implementation using AtomicBool
struct SpinLock {
    locked: AtomicBool,
}

impl SpinLock {
    const fn new() -> Self {
        SpinLock {
            locked: AtomicBool::new(false),
        }
    }

    fn lock(&self) {
        // Try to acquire the lock using compare_exchange
        // Note: compare_and_swap is deprecated, using compare_exchange instead
        while self
            .locked
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            // Spin loop hint to optimize spinning
            std::hint::spin_loop();
        }
    }

    fn unlock(&self) {
        // Release the lock
        self.locked.store(false, Ordering::Release);
    }
}

fn main() {
    // Shared spinlock and counter
    let spinlock = Arc::new(SpinLock::new());
    let counter = Arc::new(AtomicUsize::new(0));
    let num_iterations = 10000;
    let mut handles = vec![];

    // Spawn thread 1
    let spinlock1 = Arc::clone(&spinlock);
    let counter1 = Arc::clone(&counter);
    let handle1 = thread::spawn(move || {
        for _ in 0..num_iterations {
            spinlock1.lock();
            // Critical section: increment counter
            let current = counter1.load(Ordering::Relaxed);
            counter1.store(current + 1, Ordering::Relaxed);
            spinlock1.unlock();
        }
    });
    handles.push(handle1);

    // Spawn thread 2
    let spinlock2 = Arc::clone(&spinlock);
    let counter2 = Arc::clone(&counter);
    let handle2 = thread::spawn(move || {
        for _ in 0..num_iterations {
            spinlock2.lock();
            // Critical section: increment counter
            let current = counter2.load(Ordering::Relaxed);
            counter2.store(current + 1, Ordering::Relaxed);
            spinlock2.unlock();
        }
    });
    handles.push(handle2);

    // Wait for both threads to complete
    for handle in handles {
        handle.join().unwrap();
    }

    println!("Final counter value: {}", counter.load(Ordering::Relaxed));
    println!("Expected value: {}", num_iterations * 2);
}
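
A common extension, not part of the gist, is to wrap the lock in an RAII guard so unlock() cannot be forgotten and still runs if the critical section panics. A sketch with hypothetical names:

struct SpinLockGuard<'a> {
    lock: &'a SpinLock,
}

impl SpinLock {
    // Hypothetical helper: acquire the lock and hand back a guard.
    fn lock_guarded(&self) -> SpinLockGuard<'_> {
        self.lock();
        SpinLockGuard { lock: self }
    }
}

impl Drop for SpinLockGuard<'_> {
    fn drop(&mut self) {
        // Same Ordering::Release store as unlock().
        self.lock.unlock();
    }
}

With this, the critical section becomes let _guard = spinlock.lock_guarded(); followed by the counter update, and the lock is released automatically when the guard goes out of scope.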