This is a single-document reference for real-world x86 assembly programming, from the 8086 to modern x86-64 (including AVX-512, AVX10, and APX).
Everything you learn in the 16-bit part still works in real mode in 2025 and is critical for BIOS bootloaders. It is also the foundation for understanding UEFI-era and 64-bit code.
AX - Accumulator (AH/AL)
BX - Base (BH/BL)
CX - Count (CH/CL)
DX - Data (DH/DL)
SI - Source Index
DI - Destination Index
BP - Base Pointer
SP - Stack Pointer
IP - Instruction Pointer
FLAGS
Segment registers (in real mode each holds a 16-byte paragraph number; in protected mode, a selector):
CS - Code Segment
DS - Data Segment
SS - Stack Segment
ES - Extra Segment
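In real mode the CPU combines a segment register with a 16-bit offset as segment × 16 + offset. Here is a minimal C sketch of that arithmetic; the function name `real_mode_linear` and the `a20_enabled` parameter are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: models real-mode address formation.
 * linear = segment * 16 + offset. On the 8086 (20 address lines) the
 * result wraps at 1 MiB; with the A20 line enabled on later CPUs it
 * does not wrap. */
uint32_t real_mode_linear(uint16_t segment, uint16_t offset, int a20_enabled)
{
    uint32_t linear = ((uint32_t)segment << 4) + offset;
    return a20_enabled ? linear : (linear & 0xFFFFFu);
}
```

Both 07C0:0000 and 0000:7C00 name the same byte, 0x7C00, which is where a BIOS loads the boot sector.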
[bx + si] [bp + di] [si]
[bx + di] [bp + si] [di]
[bx] [bp] [disp16] ; [bp] alone is encoded as [bp + 0]; [disp16] is a direct 16-bit address
[bx + disp8] [bp + disp8]
[bx + disp16] [bp + disp16]
MOV dest, src
PUSH src
POP dest
ADD dest, src
SUB dest, src
ADC dest, src ; add with carry
SBB dest, src ; subtract with borrow
INC dest
DEC dest
CMP dest, src ; affects flags only
AND dest, src
OR dest, src
XOR dest, src
NOT dest
TEST dest, src ; AND but no write
JMP label
JE/JZ, JNE/JNZ, JA/JNBE, JB/JNAE, JAE/JNB, JBE/JNA
JG/JNLE, JL/JNGE, JGE/JNL, JLE/JNG
JC, JNC, JO, JNO, JS, JNS, JP/JPE, JNP/JPO
LOOP, LOOPE/LOOPZ, LOOPNE/LOOPNZ
CALL near/far
RET near/far
INT n
IRET
SHL/SAL, SHR, SAR ; shifts
ROL, ROR, RCL, RCR ; rotates
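MUL src ; unsigned: AX = AL * src, or DX:AX = AX * src
IMUL src ; signed multiply, same implicit operands
DIV src ; unsigned: AL = AX / src with AH = remainder, or AX = DX:AX / src with DX = remainder
IDIV src ; signed divide, same implicit operands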
LEA reg, [mem] ; load effective address (very important!)
CF - Carry PF - Parity
AF - Auxiliary ZF - Zero
SF - Sign OF - Overflow
IF - Interrupt DF - Direction
TF - Trap (single-step)
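To make the status flags concrete, here is a small C model of what a 16-bit `ADD dest, src` leaves in CF, ZF, SF, OF, AF, and PF. The struct and function names are assumptions for this sketch; only the flag definitions come from the architecture.

```c
#include <stdint.h>

/* Illustrative model (names are assumptions) of the status flags
 * produced by a 16-bit ADD dest, src. */
struct add16_result {
    uint16_t value;
    int cf, zf, sf, of, af, pf;
};

struct add16_result add16(uint16_t dest, uint16_t src)
{
    struct add16_result r;
    uint32_t wide = (uint32_t)dest + src;

    r.value = (uint16_t)wide;
    r.cf = (wide >> 16) & 1;                  /* carry out of bit 15 */
    r.zf = (r.value == 0);
    r.sf = (r.value >> 15) & 1;               /* copy of the sign bit */
    /* Signed overflow: both operands share a sign the result lacks. */
    r.of = ((~(dest ^ src) & (dest ^ r.value)) >> 15) & 1;
    r.af = ((dest ^ src ^ r.value) >> 4) & 1; /* carry out of bit 3 */

    /* PF is set when the low byte holds an even number of 1 bits. */
    uint8_t low = (uint8_t)r.value;
    low ^= low >> 4;
    low ^= low >> 2;
    low ^= low >> 1;
    r.pf = !(low & 1);
    return r;
}
```

For example, `add16(0x7FFF, 1)` sets OF and SF but not CF, while `add16(0xFFFF, 1)` sets CF and ZF but not OF. That difference is why unsigned code branches with JA/JB and signed code with JG/JL.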
Everything changes here. This is where modern x86 begins.
EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP
EIP
EFLAGS
FS, GS ; extra segment registers added by the 80386; in 64-bit mode they carry the thread-local and per-CPU base pointers
[base + index*scale + disp32]
Examples:
[eax + ecx*4 + 8]
[ebx + esi*8 - 1234h]
[ebp + 16] ; classic stack frame access
Scale can be 1, 2, 4, 8
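The address arithmetic behind `[base + index*scale + disp32]` can be sketched in C as below. The function name is hypothetical; LEA performs the same computation but stores the address itself instead of accessing memory.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of 32-bit effective-address formation:
 * EA = base + index * scale + disp, truncated to 32 bits. */
uint32_t effective_address32(uint32_t base, uint32_t index,
                             unsigned scale, int32_t disp)
{
    /* Only these four scale factors can be encoded in the SIB byte. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return base + index * scale + (uint32_t)disp;
}
```

With `eax = 0x1000` and `ecx = 3`, `[eax + ecx*4 + 8]` resolves to `effective_address32(0x1000, 3, 4, 8)`, which is 0x1014, the fourth element of a dword array that starts 8 bytes into a structure.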
MOVZX reg32, reg8/reg16 ; zero-extend
MOVSX reg32, reg8/reg16 ; sign-extend
BSF reg, src ; bit scan forward (index of lowest set bit)
BSR reg, src ; bit scan reverse (index of highest set bit)
SHLD dest, src, count ; double precision shift left
SHRD dest, src, count ; double precision shift right
CMOVcc reg, src ; conditional move (P6/Pentium Pro and later; huge for branchless code)
SETcc r/m8 ; set byte to 1 if condition true, else 0
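The branchless idea behind SETcc and CMOVcc is to turn a condition into data and then select with it instead of jumping. Here is a minimal C sketch of the mask-based variant; `max_branchless` is an illustrative name, and whether a compiler emits SETcc, CMOVcc, or something else for it depends on the compiler and flags.

```c
#include <stdint.h>

/* Branchless max built from a 0/1 condition value, the way
 * CMP + SETG + NEG + AND/OR would do it in assembly. */
int32_t max_branchless(int32_t a, int32_t b)
{
    /* SETG-like step: 1 if a > b, else 0. */
    uint32_t take_a = (uint32_t)(a > b);
    /* 0 - 1 gives all ones, 0 - 0 gives zero. */
    uint32_t mask = 0u - take_a;
    /* Keep a where the mask is set, b elsewhere. */
    return (int32_t)(((uint32_t)a & mask) | ((uint32_t)b & ~mask));
}
```

The plain form `a > b ? a : b` is what compilers typically lower to CMP followed by CMOVcc, which does the same selection in one instruction.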
CR0 - PE (bit0), PG (bit31), etc.
CR3 - Page directory base (paging)
CR4 - PSE, PAE, PGE, etc.
RAX RBX RCX RDX RSI RDI RBP RSP
R8 R9 R10 R11 R12 R13 R14 R15
RIP
RFLAGS
You now have 16 GP registers — life is beautiful.
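Microsoft x64 calling convention (Windows):
- RCX, RDX, R8, R9 = first 4 integer args
- RAX = return
- XMM0-XMM3 = first 4 float args
- Caller reserves 32 bytes of shadow space for the callee and cleans the stack (no more stdcall headache)
- Callee must preserve: RBX, RBP, RDI, RSI, RSP, R12-R15, XMM6-XMM15
- Stack aligned to 16 bytes before CALL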
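System V AMD64 calling convention (Linux, macOS, BSD):
- RDI, RSI, RDX, RCX, R8, R9 = first 6 integer args
- RAX = return
- XMM0-XMM7 = first 8 float args
- Callee must preserve: RBX, RBP, RSP, R12-R15
- Red zone: the 128 bytes below RSP are not clobbered by user-space signal handlers, so leaf functions can use them without adjusting RSP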
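Both 64-bit ABIs want RSP 16-byte aligned at every CALL, and the CALL itself pushes an 8-byte return address, so on entry RSP is 8 past a 16-byte boundary. The sketch below computes the `sub rsp, N` amount a prologue needs; the function name and parameters are invented for this example.

```c
#include <stdint.h>

/* Hypothetical helper: bytes to subtract from RSP in a prologue so that
 * RSP is 16-byte aligned again before this function issues a CALL.
 * On entry RSP % 16 == 8 because CALL pushed the return address. */
uint64_t frame_adjust(unsigned pushed_regs, uint64_t local_bytes)
{
    /* Everything below the caller's aligned RSP, return address included. */
    uint64_t used = 8 + 8ull * pushed_regs + local_bytes;
    uint64_t padded = (used + 15) & ~15ull;   /* round up to 16 */
    /* What remains once the return address and PUSHes are accounted for. */
    return padded - 8 - 8ull * pushed_regs;
}
```

`frame_adjust(1, 0)` is 0, which is why a lone `push rbp` already realigns the stack. `frame_adjust(0, 32)` is 40, the familiar `sub rsp, 40` of a Windows function that only needs 32 bytes of shadow space for its callees.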
The REX prefix enables:
- 64-bit operands (RAX etc.)
- R8-R15
- Extended low-byte registers (SIL, DIL, BPL, SPL)
MOVABS rax, 0x123456789ABCDEF0 ; only instruction with a full 64-bit immediate (GAS mnemonic; NASM writes mov rax, imm64)
LEA rax, [rip + label] ; RIP-relative (position-independent code)
CQO ; sign-extend RAX → RDX:RAX (for 128-bit divide)
CDQE ; sign-extend EAX → RAX
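Whether a constant really needs the 10-byte MOVABS form comes down to zero- and sign-extension. This C sketch classifies a 64-bit value by the shortest `mov` immediate form that can produce it; the enum and function names are illustrative only.

```c
#include <stdint.h>

enum imm_form {
    IMM_ZEXT32,  /* mov r32, imm32: the upper 32 bits are zeroed      */
    IMM_SEXT32,  /* mov r64, imm32: the immediate is sign-extended    */
    IMM_FULL64   /* mov r64, imm64 (MOVABS in GAS): all 8 bytes given */
};

enum imm_form classify_imm64(uint64_t value)
{
    if (value <= 0xFFFFFFFFull)
        return IMM_ZEXT32;
    /* Same widening that CDQE performs on EAX. */
    if ((uint64_t)(int64_t)(int32_t)(uint32_t)value == value)
        return IMM_SEXT32;
    return IMM_FULL64;
}
```

Note that 0xFFFFFFFFFFFFFFFF lands in `IMM_SEXT32`: it is -1, so a sign-extended 32-bit immediate is enough. A value such as 0x123456789ABCDEF0 needs the full 64-bit form.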
NOP ; short NOP = 90h
Multi-byte NOPs for alignment:
0F 1F 40 00 ; 4-byte NOP
0F 1F 84 00 00 00 00 00 ; 8-byte NOP
lea rax, [rip + symbol] ; GAS with .intel_syntax noprefix; NASM spells it [rel symbol] (or set DEFAULT REL)
mov rax, [rip + symbol]
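A RIP-relative displacement is measured from the address of the next instruction, not the current one. A minimal C sketch of the resolution step follows; the function name and the example load address are assumptions.

```c
#include <stdint.h>

/* RIP-relative operands are resolved against the address of the NEXT
 * instruction: the current instruction's address plus its length. */
uint64_t rip_relative_target(uint64_t insn_addr, unsigned insn_len,
                             int32_t disp32)
{
    return insn_addr + insn_len + (uint64_t)(int64_t)disp32;
}
```

A `lea rdx, [rip + 0x32c0145]` is 7 bytes long (REX.W, opcode, ModRM, disp32). If it sat at a made-up address 0x401000, it would load 0x401007 + 0x32c0145 = 0x36c114c.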
8 × 64-bit MM0-MM7 registers (aliases of x87 FPU)
XMM0-XMM7 (XMM0-XMM15 in x86-64)
128-bit registers
MOVDQA xmm, xmm/m128 ; aligned move
MOVDQU xmm, xmm/m128 ; unaligned move
MOVAPS/MOVUPS ; float versions
PADDD xmm, xmm/m128 ; add dwords
PSUBD
PCMPEQD ; compare equal → all 1s or 0s
PCMPGTD ; signed greater than → all 1s or 0s
PAND, POR, PXOR
MOVD xmm, r32/m32
MOVQ xmm, r64/m64
PEXTRW ; extract a word to a GP register
PINSRW ; insert a word from a GP register
PSLLD, PSRLD, PSRAD ; shifts
PSHUFD xmm, xmm/m128, imm8 ; shuffle
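The imm8 of PSHUFD is four 2-bit selectors, one per destination lane. Here is a scalar C model of that selection; the function name is hypothetical, and real code would use the instruction or an intrinsic rather than this loop.

```c
#include <stdint.h>

/* Scalar model of PSHUFD dst, src, imm8: each 2-bit field of imm8
 * picks which source dword lands in the corresponding result lane. */
void pshufd_model(uint32_t dst[4], const uint32_t src[4], uint8_t imm8)
{
    uint32_t tmp[4];

    for (int lane = 0; lane < 4; lane++)
        tmp[lane] = src[(imm8 >> (2 * lane)) & 3];
    for (int lane = 0; lane < 4; lane++)   /* copy last so dst may alias src */
        dst[lane] = tmp[lane];
}
```

Common values: 0xE4 is the identity, 0x1B reverses the four dwords, and 0x00 broadcasts lane 0 to every lane.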
PINSRB/PINSRD/PINSRQ
PEXTRB/PEXTRD/PEXTRQ
PMOVZX, PMOVSX ; zero/sign extend
PCMPESTRI, PCMPISTRI, etc. ; packed string compare (SSE4.2)
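YMM0-YMM15 (256-bit; the upper 128 bits are written only by VEX-encoded instructions)
VMOVAPS, VMOVDQA, etc.
VPADDD, VPSUBD, etc. ; 256-bit integer forms require AVX2
VPBROADCASTD ymm, xmm/m32 ; copy one dword to every lane (AVX2)
VPERMD ymm, ymm, ymm/m256 ; permute dwords across both 128-bit halves (AVX2)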
ZMM0-ZMM31 (512-bit)
K0-K7 mask registers
EVEX prefix
VPTERNLOGD zmm, zmm, zmm, imm8 ; any boolean operation (magic!)
VPROLD/VPROLQ ; rotate by immediate
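VPERMI2D ; two-source permute (indexes into a pair of registers)
VPCOMPRESSD, VPEXPANDD ; pack/unpack the lanes selected by a mask
VPMOVDB, VPMOVQB, etc. ; down-convert by truncation (VPMOVSDB/VPMOVUSDB saturate)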
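VPTERNLOGD looks like magic until you see that imm8 is simply an 8-entry truth table. The C sketch below models one 32-bit lane; `ternlog32` is an illustrative name, and the real instruction does this for every lane at once.

```c
#include <stdint.h>

/* Scalar model of one 32-bit lane of VPTERNLOGD dest, src1, src2, imm8.
 * At every bit position the three input bits form an index 0..7
 * (dest supplies the most significant bit) that selects a bit of imm8. */
uint32_t ternlog32(uint32_t dest, uint32_t src1, uint32_t src2, uint8_t imm8)
{
    uint32_t result = 0;

    for (int bit = 0; bit < 32; bit++) {
        unsigned index = (((dest >> bit) & 1u) << 2)
                       | (((src1 >> bit) & 1u) << 1)
                       |  ((src2 >> bit) & 1u);
        result |= (uint32_t)((imm8 >> index) & 1u) << bit;
    }
    return result;
}
```

To derive imm8 for any three-input function, evaluate it on the constants 0xF0, 0xCC, and 0xAA in place of dest, src1, and src2. For instance 0xF0 ^ 0xCC ^ 0xAA = 0x96 is three-way XOR, 0xE8 is majority, and 0xCA is the bitwise select `dest ? src1 : src2`.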
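- AVX10: the upper 16 vector registers (as XMM16-31/YMM16-31) are usable at 128-bit and 256-bit vector lengths, even without 512-bit support
- APX (Advanced Performance Extensions): 32 GP registers (adds R16-R31), reached through the new REX2 prefix or extended EVEX encodings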
- Coming 2025-2026
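Caller-saved (volatile) in both ABIs: RAX, RCX, RDX, R8-R11, XMM0-5
Callee-saved (non-volatile) in both ABIs: RBX, RBP, RSP, R12-R15
ABI-dependent: RSI, RDI, and XMM6-15 are callee-saved on Windows but caller-saved on System V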
Scratch: R10, R11 often used as temps
RAX - return value, often used in multiplication/division
RCX - loop counter and shift count (Windows: 1st arg; System V: 4th arg)
RDX - often remainder/output of div
RBX - callee-saved in both ABIs, with few implicit uses (CPUID, CMPXCHG16B, XLAT)
RBP - optional frame pointer (use -fno-omit-frame-pointer for debugging)
RSP - stack pointer (keep 16-byte aligned!)
R12-R15 - best for holding long-lived values (structs, this pointer, etc.)
R8-R9 - argument registers (Windows: args 3-4; System V: args 5-6), otherwise great temps
R10-R11 - pure scratch, compilers love them (SYSCALL overwrites RCX and R11)
XOR EAX, EAX ; 2 bytes, zero RAX, no partial register stall
MOVSXD RAX, ECX ; RAX = sign-extended ECX
LEA RAX, [RIP + symbol]
call rax ; indirect
call symbol
.text
align 64 ; start the function on a cache-line boundary
function:
vmovaps zmm... ; faults unless its memory operand is 64-byte aligned, so align the data as well
MFENCE ; full memory barrier
LFENCE ; load fence
SFENCE ; store fence
AT&T syntax (also called GAS syntax) is the default syntax of the GNU assembler (as) and of GCC/Clang toolchains on Linux, macOS, and BSD, including their inline assembly.
Intel syntax (NASM, MASM, etc.) writes the exact same instructions like this:
| AT&T syntax (GAS) | Intel syntax (NASM/MASM) | Meaning |
|---|---|---|
| mov %rdx, %r14 | mov r14, rdx | r14 ← rdx |
| mov %r14, %rdx | mov rdx, r14 | rdx ← r14 |
| mov (%r14), %eax | mov eax, [r14] | eax ← dword at address in r14 (dereference) |
| movl $0x6, 0x30(%rsp) | mov dword [rsp + 0x30], 6 | store 6 into stack slot 0x30 bytes above rsp |
| mov %eax, 0x20(%rsp) | mov dword [rsp + 0x20], eax | store eax into stack slot |
| lea 0x32c0145(%rip), %rdx | lea rdx, [rip + 0x32c0145] | rdx ← address of label/symbol (RIP-relative) |
| lea 0x20(%rsp), %rcx | lea rcx, [rsp + 0x20] | rcx ← address of stack slot (SIB form) |
| Feature | AT&T (GAS) syntax | Intel syntax | Notes |
|---|---|---|---|
| Operand order | src, dest | dest, src | Opposite! |
| Register prefix | %rax | rax | |
| Immediate prefix | $0x123 | 0x123 | |
| Memory reference | 0x30(%rsp) or (%r14) | [rsp + 0x30] or [r14] | |
| Size suffix | movb, movw, movl, movq | none (inferred or explicit byte ptr, etc.) | movq = 64-bit |
| RIP-relative LEA (default in 64-bit) | symbol(%rip) | [rip + symbol] | Both are identical in meaning |
| Comments | # comment (also /* */; // in preprocessed .S files) | ; comment | |
All addressing modes, shown explicitly in both syntaxes (complete list for x86-64):
| Mode | AT&T syntax | Intel syntax | Description |
|---|---|---|---|
| Register | %rax | rax | |
| Immediate | $0x123 | 0x123 | |
| Direct (absolute) | 0x403000 | [0x403000] | Rare in 64-bit |
| Register indirect | (%rax) | [rax] | |
| Base + displacement | 0x20(%rsp) | [rsp + 0x20] | |
| Base + index | (%rax,%rcx) | [rax + rcx] | |
| Base + index × scale | (%rax,%rcx,4) | [rax + rcx*4] | scale = 1,2,4,8 |
| Base + index × scale + displacement | 0x123(%rax,%rcx,8) | [rax + rcx*8 + 0x123] | Full SIB |
| RIP-relative (64-bit only) | symbol(%rip) or 0x123(%rip) | [rip + symbol] or [rip + 0x123] | Default & mandatory for PIC code |
| Scaled index only (rare) | (,%rcx,8) | [rcx*8] | No base register |
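The mapping between the two memory-operand spellings is mechanical, and a tiny formatter shows it. This is only a sketch: the function names are invented, registers are passed as plain strings, and it assumes at least one of base or index is given (no direct or RIP-relative forms).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustrative formatters (names are assumptions). Pass NULL for an
 * absent base or index; at least one of them must be present. */

/* AT&T: disp(%base,%index,scale) */
void format_att(char *out, size_t cap, const char *base,
                const char *index, int scale, long disp)
{
    size_t n;

    assert(cap > 0 && (base || index));
    out[0] = '\0';
    if (disp != 0)
        snprintf(out, cap, "%s0x%lx", disp < 0 ? "-" : "",
                 disp < 0 ? 0ul - (unsigned long)disp : (unsigned long)disp);
    n = strlen(out);
    if (base && index)
        snprintf(out + n, cap - n, "(%%%s,%%%s,%d)", base, index, scale);
    else if (base)
        snprintf(out + n, cap - n, "(%%%s)", base);
    else
        snprintf(out + n, cap - n, "(,%%%s,%d)", index, scale);
}

/* Intel: [base + index*scale + disp] */
void format_intel(char *out, size_t cap, const char *base,
                  const char *index, int scale, long disp)
{
    size_t n;

    assert(cap > 0 && (base || index));
    snprintf(out, cap, "[%s", base ? base : "");
    if (index) {
        n = strlen(out);
        snprintf(out + n, cap - n, "%s%s*%d", base ? " + " : "", index, scale);
    }
    if (disp != 0) {
        n = strlen(out);
        snprintf(out + n, cap - n, " %s 0x%lx", disp < 0 ? "-" : "+",
                 disp < 0 ? 0ul - (unsigned long)disp : (unsigned long)disp);
    }
    n = strlen(out);
    snprintf(out + n, cap - n, "]");
}
```

Given base `rax`, index `rcx`, scale 8, and displacement 0x123, the two functions produce `0x123(%rax,%rcx,8)` and `[rax + rcx*8 + 0x123]`. With no base, the AT&T form keeps the leading comma: `(,%rcx,8)`.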
mov (%r14),%eax # eax = *(uint32_t*)r14
movl $0x6,0x30(%rsp) # *(uint32_t*)(rsp + 0x30) = 6
mov %eax,0x20(%rsp) # *(uint32_t*)(rsp + 0x20) = eax
movl $0x6,0x50(%rsp) # *(uint32_t*)(rsp + 0x50) = 6
lea 0x32c0145(%rip),%rdx # rdx = address of next instruction + 0x32c0145 → address of global/static data
lea 0x20(%rsp),%rcx # rcx = rsp + 0x20 → address of local stack variable
These patterns are standard in Linux/macOS binaries compiled by GCC/Clang in 2025.
- GCC and Clang on Linux, macOS, BSD, iOS, and Android (NDK) emit AT&T syntax by default, as does Zig inline assembly; Rust's asm! defaults to Intel syntax, and Go uses its own Plan 9-derived syntax
- Windows (MSVC), NASM, YASM, FASM, most hand-written asm tutorials: use Intel syntax
- Modern compilers use RIP-relative addressing for global data by default in 64-bit position-independent code (both syntaxes)
- You need to be fluent in both syntaxes to read real-world binaries and write high-performance code in 2025.
You have now seen the lineage of x86 assembly from 1978 to 2025, along with its practical essentials.
The most important truth in 2025:
Modern x86-64 assembly is not slow. When you need absolute maximum performance, hand-written assembly with AVX-512 (and APX once it ships) can still beat compiler output in hot loops, provided you measure.