This analysis shows how Rust atomic operations with different memory orderings (Acquire, Release, Relaxed) translate through LLVM IR to actual ARM64 assembly instructions.
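For context, here is a minimal sketch of the kind of spinlock these snippets are assumed to come from. The type and method names (`SpinLock`, `new`, `lock`, `unlock`) are illustrative; only the `locked` field, the orderings, and the spin hint are taken from the snippets analyzed below.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

pub struct SpinLock {
    locked: AtomicBool,
}

impl SpinLock {
    pub const fn new() -> Self {
        SpinLock { locked: AtomicBool::new(false) }
    }

    /// Spin until the flag flips from false to true (Acquire on success).
    pub fn lock(&self) {
        while self
            .locked
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            std::hint::spin_loop();
        }
    }

    /// Publish all writes made inside the critical section (Release store).
    pub fn unlock(&self) {
        self.locked.store(false, Ordering::Release);
    }
}
```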
Rust Source:

```rust
while self
    .locked
    .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
    .is_err()
{
    std::hint::spin_loop();
}
```

LLVM IR:

```llvm
bb8.i:
%8 = cmpxchg ptr %_4.i, i8 0, i8 1 acquire monotonic, align 1
%9 = extractvalue { i8, i1 } %8, 1
br i1 %9, label %bb11.i, label %bb4.i3.i
bb4.i3.i: ; spin loop
tail call void @llvm.aarch64.isb(i32 noundef 15)
%10 = cmpxchg ptr %_4.i, i8 0, i8 1 acquire monotonic, align 1
%11 = extractvalue { i8, i1 } %10, 1
br i1 %11, label %bb11.i, label %bb4.i3.i
```

ARM64 Assembly:

```asm
LBB5_2: ; spin loop
isb ; Instruction Synchronization Barrier
LBB5_3:
mov w13, #0 ; expected value (false)
add x14, x10, #16 ; address of lock
casab w13, w12, [x14] ; Compare And Swap Acquire Byte
cmp w13, #0 ; check if we got the lock
b.ne LBB5_2 ; if not, spin again
```
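As an aside, spin loops like this often use `compare_exchange_weak`, which is allowed to fail spuriously. With LSE atomics (the `casab` seen here) both forms can compile to the same CAS instruction, but on cores that fall back to `ldaxr`/`stxr` pairs, the weak form avoids an extra retry loop. A sketch of that variant, written as a free function for illustration:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Hypothetical free-standing version of the acquire loop using the weak CAS.
fn lock(locked: &AtomicBool) {
    while locked
        .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
        .is_err()
    {
        std::hint::spin_loop();
    }
}
```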
Rust Source:

```rust
self.locked.store(false, Ordering::Release);
```

LLVM IR:

```llvm
store atomic i8 0, ptr %_4.i release, align 1
```

ARM64 Assembly:

```asm
stlurb wzr, [x10, #16] ; STore reLease Unscaled Register Byte
```
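One way to guarantee that this Release store always runs is to tie the unlock to a guard's `Drop`. The sketch below is not part of the analyzed program; the `Guard` and `with_lock` names are made up, and only the orderings match the snippets above.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Hypothetical guard: unlocking happens in Drop, so the Release store
// runs even on early returns or panics inside the critical section.
struct Guard<'a> {
    locked: &'a AtomicBool,
}

impl Drop for Guard<'_> {
    fn drop(&mut self) {
        self.locked.store(false, Ordering::Release);
    }
}

fn with_lock<R>(locked: &AtomicBool, f: impl FnOnce() -> R) -> R {
    while locked
        .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
        .is_err()
    {
        std::hint::spin_loop();
    }
    let _guard = Guard { locked };
    f() // the guard's Drop releases the lock after f returns
}
```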
Rust Source:

```rust
let current = counter.load(Ordering::Relaxed);
counter.store(current + 1, Ordering::Relaxed);
```

LLVM IR:

```llvm
%12 = load atomic i64, ptr %_18.i monotonic, align 8
%_6.i = add i64 %12, 1
store atomic i64 %_6.i, ptr %_18.i monotonic, align 8
```

ARM64 Assembly:

```asm
ldr x13, [x11, #16] ; regular load (no barriers)
add x13, x13, #1 ; increment
str x13, [x11, #16] ; regular store (no barriers)
```
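Note that the relaxed load/store pair is not an atomic read-modify-write; it is only correct here because the spinlock serializes access to the counter. If the increment had to stand on its own, a single RMW such as `fetch_add` would be used instead. A small sketch of the difference, with the counter declared locally for illustration:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

fn main() {
    let counter = AtomicU64::new(0);

    // Inside the critical section, a separate load + store is fine:
    let current = counter.load(Ordering::Relaxed);
    counter.store(current + 1, Ordering::Relaxed);

    // Outside a lock, the increment must be a single atomic RMW,
    // which on ARM64 with LSE can lower to a single ldadd:
    counter.fetch_add(1, Ordering::Relaxed);

    assert_eq!(counter.load(Ordering::Relaxed), 2);
}
```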
casab:
- Full Name: Compare And Swap Acquire Byte
- Memory Ordering: Acquire semantics
- Effect:
  - Atomically compares the value at `[x14]` with `w13` (0/false)
  - If equal, writes `w12` (1/true) to `[x14]`
  - Returns the original value in `w13`
  - Ensures all subsequent loads/stores see memory state after this operation
- x86-64 equivalent: `lock cmpxchg` followed by load barrier
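The "returns the original value in `w13`" behavior is visible at the Rust level too: `compare_exchange` hands back the previous value in both the `Ok` and `Err` cases. A small illustration, not taken from the analyzed program:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

fn main() {
    let locked = AtomicBool::new(false);

    // First CAS succeeds: Ok carries the previous value (false).
    assert_eq!(
        locked.compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed),
        Ok(false)
    );

    // Second CAS fails: Err carries the value that was actually there (true),
    // just as casab leaves the observed value in w13.
    assert_eq!(
        locked.compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed),
        Err(true)
    );
}
```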
stlurb:
- Full Name: Store Release Unscaled Register Byte
- Memory Ordering: Release semantics
- Effect:
  - Stores the byte value to memory
  - Ensures all prior loads/stores complete before this store becomes visible
  - Makes modifications visible to other threads
- x86-64 equivalent: Regular `mov` (x86-64 has strong ordering)
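The "prior stores become visible" guarantee is exactly what makes the classic publish pattern work: data written before a Release store is visible to any thread that observes the flag with an Acquire load. A sketch of that pattern, with the `DATA` and `READY` names assumed for illustration:

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::thread;

static DATA: AtomicU64 = AtomicU64::new(0);
static READY: AtomicBool = AtomicBool::new(false);

fn main() {
    let producer = thread::spawn(|| {
        DATA.store(42, Ordering::Relaxed);     // payload write
        READY.store(true, Ordering::Release);  // publish: prior write ordered before this
    });

    let consumer = thread::spawn(|| {
        while !READY.load(Ordering::Acquire) { // acquire pairs with the release store
            std::hint::spin_loop();
        }
        assert_eq!(DATA.load(Ordering::Relaxed), 42); // guaranteed to see the payload
    });

    producer.join().unwrap();
    consumer.join().unwrap();
}
```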
isb:
- Full Name: Instruction Synchronization Barrier
- From: `std::hint::spin_loop()`
- Effect:
  - Flushes the pipeline and ensures subsequent instructions see any context changes
  - Optimizes spin loops by reducing power consumption
  - Prevents speculative execution during spinning
- x86-64 equivalent: `pause` instruction
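`std::hint::spin_loop` is only a hint meant for short waits; for longer waits the usual advice is to fall back to `std::thread::yield_now` or a blocking primitive. A sketch of that pattern, with the spin bound chosen arbitrarily:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Spin briefly with the CPU hint, then yield the thread to the scheduler.
fn wait_until_set(flag: &AtomicBool) {
    let mut spins = 0u32;
    while !flag.load(Ordering::Acquire) {
        if spins < 100 {
            std::hint::spin_loop(); // isb on ARM64, pause on x86-64
            spins += 1;
        } else {
            std::thread::yield_now(); // stop burning cycles on long waits
        }
    }
}
```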
dmb ishld:
- Full Name: Data Memory Barrier, Inner Shareable, Load
- Memory Ordering: Acquire fence
- Effect:
  - Ensures all prior loads complete before subsequent operations
  - Used when decrementing reference counts (Arc drops); see the Arc-style sketch below
- x86-64 equivalent: `lfence` or implicit in loads
ldadd (release):
- Full Name: Load and Add with Release ordering
- Used for: Arc reference counting (`atomicrmw sub`)
- Effect:
  - Atomically adds a value to a memory location
  - Returns the original value
  - Release ordering ensures prior operations complete first
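These last two entries come together in the drop path of Arc-style reference counting: the count is decremented with a Release RMW, and only the thread that releases the last reference executes an Acquire fence before freeing, which is where the `dmb ishld` comes from. A simplified sketch of that pattern, not the actual std implementation:

```rust
use std::sync::atomic::{fence, AtomicUsize, Ordering};

struct RefCounted {
    strong: AtomicUsize,
    // ... payload would live here ...
}

// Simplified drop logic in the spirit of Arc::drop.
fn release(rc: &RefCounted) {
    // Release decrement: our prior writes happen-before the payload is dropped.
    if rc.strong.fetch_sub(1, Ordering::Release) != 1 {
        return; // other references remain; nothing else to do
    }
    // Last reference: the Acquire fence pairs with every earlier Release decrement,
    // so the payload is only freed after all threads' writes are visible.
    fence(Ordering::Acquire);
    // ... deallocate the payload here ...
}

fn main() {
    let rc = RefCounted { strong: AtomicUsize::new(1) };
    release(&rc);
}
```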
| Rust Ordering | LLVM Ordering | ARM64 Instruction | Memory Barrier |
|---|---|---|---|
| `Acquire` | `acquire` | `casab` (CAS with acquire) | Load barrier after |
| `Release` | `release` | `stlurb` (store with release) | Store barrier before |
| `Relaxed` | `monotonic` | `ldr` / `str` (regular) | None |
| Arc fence | `fence acquire` | `dmb ishld` | Full load barrier |
The atomic operations ensure cache coherency across cores:
- Lock Acquisition (`casab`):
  - Triggers cache line acquisition in Exclusive state
  - Other cores' cache lines are invalidated
  - Acquire semantics prevent reordering of subsequent loads
- Lock Release (`stlurb`):
  - Flushes the cache line to Shared state
  - Makes modifications visible to other cores
  - Release semantics prevent reordering of prior stores
- Spin Loop (`isb`):
  - Reduces contention by yielding CPU resources
  - Prevents aggressive polling that would cause cache line ping-ponging (see the sketch below)
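One common way to limit that ping-ponging is test-and-test-and-set: spin on a plain load, which keeps the line in Shared state, and only attempt the CAS once the lock looks free. A sketch of how the lock loop above could be adapted, written against a bare `AtomicBool` for illustration:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

fn lock(locked: &AtomicBool) {
    loop {
        // Try to take the lock with the same CAS as before.
        if locked
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_ok()
        {
            return;
        }
        // Lock is held: spin on a read-only load so the cache line stays Shared
        // instead of bouncing between cores on every failed CAS.
        while locked.load(Ordering::Relaxed) {
            std::hint::spin_loop();
        }
    }
}
```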
- Uncontended lock: ~2-3 cycles for `casab` + minimal overhead
- Contended lock: Spin loop with `isb` reduces power consumption
- Memory barriers:
  - Acquire: ~4-10 cycles depending on cache state
  - Release: ~4-10 cycles depending on cache state
  - Full barrier (`dmb`): ~10-20 cycles
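These cycle counts are rough estimates and depend on the core and cache state. A crude way to sanity-check the uncontended cost is to time a lock/unlock pair in a tight loop; the sketch below uses wall-clock time rather than cycle counters, so treat the result as an order-of-magnitude figure only.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::time::Instant;

fn main() {
    let locked = AtomicBool::new(false);
    let iterations = 10_000_000u32;

    let start = Instant::now();
    for _ in 0..iterations {
        // Uncontended acquire: the CAS succeeds on the first try.
        while locked
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            std::hint::spin_loop();
        }
        // Release the lock again.
        locked.store(false, Ordering::Release);
    }
    let elapsed = start.elapsed();

    println!(
        "~{:.1} ns per uncontended lock/unlock pair",
        elapsed.as_nanos() as f64 / iterations as f64
    );
}
```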
Running the program with 2 threads × 10,000 iterations:
```text
Final counter value: 20000
Expected value: 20000
```
The spinlock successfully synchronizes access to the shared counter, demonstrating correct implementation of acquire-release semantics.
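For reference, here is a self-contained sketch of the kind of test program that produces this output, with 2 threads each performing 10,000 locked increments. The structure and names are assumptions; only the orderings and the thread/iteration counts come from the analysis above.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::thread;

static LOCKED: AtomicBool = AtomicBool::new(false);
static COUNTER: AtomicU64 = AtomicU64::new(0);

fn lock() {
    while LOCKED
        .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
        .is_err()
    {
        std::hint::spin_loop();
    }
}

fn unlock() {
    LOCKED.store(false, Ordering::Release);
}

fn main() {
    const THREADS: usize = 2;
    const ITERATIONS: usize = 10_000;

    let handles: Vec<_> = (0..THREADS)
        .map(|_| {
            thread::spawn(|| {
                for _ in 0..ITERATIONS {
                    lock();
                    // Relaxed load + store is safe here because the lock
                    // serializes access to the counter.
                    let current = COUNTER.load(Ordering::Relaxed);
                    COUNTER.store(current + 1, Ordering::Relaxed);
                    unlock();
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    println!("Final counter value: {}", COUNTER.load(Ordering::Relaxed));
    println!("Expected value: {}", THREADS * ITERATIONS);
}
```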