Research Report — February 2026
Audience: Senior kernel engineers (particularly RISC-V), for architectural comparison
Intel's introduction of hybrid (big.LITTLE-style) x86 CPUs, starting with Alder Lake in 2021, created an unprecedented problem in the x86 ecosystem: heterogeneous ISA capability across cores within a single package. The P-cores (Golden Cove) supported AVX-512 while the E-cores (Gracemont) did not. Intel's handling of this — disabling AVX-512 across the entire chip — and its multi-year journey toward AVX10 as a solution provide critical lessons for any ISA designer considering heterogeneous core designs. This report covers the technical details, the OS scheduling implications, and comparisons with Arm, Apple, and RISC-V approaches.
AVX-512 was introduced with Knights Landing (2016) and Skylake-SP (2017). By 2020, Intel's client Tiger Lake and Rocket Lake chips supported AVX-512 on all cores. Then came the hybrid architecture.
Architecture: Golden Cove P-cores + Gracemont E-cores
The Problem: Golden Cove P-cores had full AVX-512 execution units (two 256-bit FMA units fusable into one 512-bit unit, same as in Sunny Cove/Willow Cove). Gracemont E-cores had no AVX-512 support whatsoever — they topped out at AVX2 (256-bit).
The Mechanism — Multiple Layers of Disabling:
- BIOS/Firmware (primary mechanism): On hybrid SKUs with both P-cores and E-cores, the BIOS was configured to not enumerate AVX-512 in CPUID. The OS never sees AVX-512 as available. This was done through microcode/firmware configuration at boot time, not physical fuses.
- Microcode coordination: The processor's microcode, working with the BIOS, ensured that CPUID reported a consistent ISA across all cores. Since x86 has historically assumed ISA homogeneity, the simplest solution was to report only the intersection of capabilities (i.e., what both P-cores and E-cores support = AVX2).
- NOT a hard fuse-off: This is a critical distinction. The AVX-512 execution hardware was physically present and functional on Golden Cove P-cores. It was disabled in software/firmware, not by blown fuses.
The E-core Disable Hack:
Early Alder Lake adopters discovered that on desktop SKUs (LGA 1700), if you:
- Disabled all E-cores in BIOS
- Used specific BIOS versions from motherboard vendors (notably ASUS, MSI, Gigabyte)
then the system would report AVX-512 via CPUID, and it worked perfectly.
This was widely documented and benchmarked. Alder Lake P-cores running AVX-512 performed comparably to Rocket Lake on AVX-512 workloads. Some motherboard vendors even added explicit "AVX-512" toggle options in BIOS (with E-cores auto-disabled).
Intel's Response: Intel explicitly stated this was unsupported. They pushed microcode updates that removed the ability to enable AVX-512 even with E-cores disabled on some platforms, though enforcement was inconsistent. The company's official position was that Alder Lake does not support AVX-512, period.
Architecture: Raptor Cove P-cores + Gracemont E-cores
Raptor Cove is a derivative of Golden Cove, so the P-cores still had AVX-512 execution units physically present. The same firmware-level disabling applied. The E-core disable hack still worked on many boards. Nothing fundamentally changed from Alder Lake regarding AVX-512.
Architecture: Lion Cove P-cores + Skymont E-cores
Key Change: Skymont E-cores gained significant SIMD improvements but still did not support AVX-512 with 512-bit vector widths. However, Skymont did gain support for many AVX-512 instructions at 256-bit width (the EVEX encoding, masking registers k0-k7, and many AVX-512 sub-extensions). This was essentially the precursor to the AVX10.1/256 concept.
Lion Cove P-cores retained full AVX-512 execution units internally. However, AVX-512 (512-bit) was still disabled at the platform level. The official ISA extensions listed for Arrow Lake are: SSE4.1, SSE4.2, AVX2.
The Lunar Lake Wikipedia page confirms extensions listed as "SSE4.1, SSE4.2, AVX2" — same story.
Important nuance: While the marketing says "AVX2," both Lion Cove and Skymont actually support a significant subset of AVX-512 instructions at 128/256-bit widths via EVEX encoding. Intel calls this AVX10.1/256 internally, but didn't market it as such for Arrow Lake desktop.
Architecture: Lion Cove P-cores + Skymont E-cores (mobile-only, 4P+4E)
Same ISA situation as Arrow Lake. AVX-512 at 512-bit widths disabled. Both core types support the AVX-512 instruction encodings at 256-bit width.
| Generation | P-core | E-core | AVX-512 HW in P-core? | AVX-512 HW in E-core? | AVX-512 Enabled? | Mechanism |
|---|---|---|---|---|---|---|
| Alder Lake (12th Gen, Nov 2021) | Golden Cove | Gracemont | Yes (full 512-bit) | No | No (hackable) | BIOS/microcode CPUID masking |
| Raptor Lake (13th Gen, Oct 2022) | Raptor Cove | Gracemont | Yes (full 512-bit) | No | No (hackable) | BIOS/microcode CPUID masking |
| Arrow Lake (Core Ultra 200S, Oct 2024) | Lion Cove | Skymont | Yes (full 512-bit) | 256-bit EVEX only | No | BIOS/microcode; E-cores have partial support |
| Lunar Lake (Core Ultra 200V, Sep 2024) | Lion Cove | Skymont | Yes (full 512-bit) | 256-bit EVEX only | No | Same as Arrow Lake |
The core problem: x86 has no mechanism for an OS to schedule threads based on ISA capabilities. CPUID is global — it describes the processor, not individual cores. When cores have different capabilities, the only options are:
- Report the intersection (lose features) — what Intel did
- Report the union and let software crash on the wrong core — unacceptable
- Build OS awareness of per-core ISA — massive ecosystem change
- Make all cores support the same ISA — what AVX10 aims for
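The homogeneity assumption is baked into how software probes features. A minimal detection sketch (GCC/Clang on x86-64, using the compiler-provided `<cpuid.h>` helper); because a single CPUID query speaks for every core in the package, reporting anything other than the intersection would make code like this unsafe:

```c
#include <cpuid.h>
#include <stdbool.h>

/* CPUID is architecturally identical on every core, so one query
 * answers for the whole package -- the homogeneity assumption that
 * forced Intel to report only the P-core/E-core intersection. */
static bool leaf7_ebx_bit(unsigned bit)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return false;
    return (ebx >> bit) & 1;
}

/* CPUID.(EAX=7,ECX=0):EBX[5] = AVX2, EBX[16] = AVX512F.
 * (Real detection must also verify OSXSAVE/XCR0 state; omitted
 * here for brevity.) */
static bool cpu_has_avx2(void)    { return leaf7_ebx_bit(5); }
static bool cpu_has_avx512f(void) { return leaf7_ebx_bit(16); }
```

On any hybrid Alder Lake or later client part, `cpu_has_avx512f()` returns false regardless of which core runs it, even though the P-core hardware exists.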
Announced: July 2023
Key Design Principles:
- Versioned ISA: Instead of the combinatorial explosion of AVX-512 sub-extensions (AVX-512F, AVX-512BW, AVX-512DQ, AVX-512VL, AVX-512VNNI, AVX-512VBMI, AVX-512VBMI2, AVX-512BITALG, AVX-512VPOPCNTDQ, AVX-512FP16, AVX-512BF16, AVX-512IFMA, etc.), AVX10 introduces a single version number. AVX10.1 includes a specific fixed set of instructions equivalent to the union of extensions found in Sapphire Rapids.
- Width specification: AVX10 comes in two flavors:
  - AVX10.1/256: All AVX10.1 instructions at up to 256-bit vector width. Uses EVEX encoding, 32 vector registers (accessible as XMM/YMM), 8 mask registers. This is what E-cores can support.
  - AVX10.1/512: All AVX10.1 instructions at up to 512-bit vector width. This is what P-cores (and server chips) support.
- CPUID enumeration: A new CPUID leaf specifically for AVX10 reports (a) the AVX10 version number and (b) the maximum supported vector width (256 or 512).
- No new instructions over AVX-512: AVX10.1 is a reorganization, not an extension. It defines a convergence point.
Initial specification note: In early drafts (2023), 512-bit support was optional. Intel later revised this to make 512-bit mandatory for AVX10-capable processors, with the intent to bring 512-bit support to E-cores as well. This was a significant policy reversal driven by developer feedback — the community pushed back hard against fragmenting the width.
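As a sketch of the enumeration scheme, the following reads the AVX10 CPUID leaf as laid out in the initial 2023 specification: leaf 0x24 with the version in EBX[7:0] and width flags in EBX bits 16/17/18, gated by the support flag CPUID.(EAX=7,ECX=1):EDX[19]. The bit positions are taken from that early draft and may differ in later revisions:

```c
#include <cpuid.h>

/* AVX10 enumeration sketch per the initial 2023 spec: leaf 0x24,
 * version number in EBX[7:0], supported widths in EBX bit 16 (128),
 * bit 17 (256), bit 18 (512). Bit layout is from the early draft
 * and may change in later revisions. */
struct avx10_info {
    unsigned version;    /* 0 = AVX10 not supported */
    unsigned max_width;  /* in bits; 0 = not supported */
};

static struct avx10_info query_avx10(void)
{
    struct avx10_info info = {0, 0};
    unsigned int eax, ebx, ecx, edx;

    /* CPUID.(EAX=7,ECX=1):EDX[19] is the AVX10 support flag. */
    if (!__get_cpuid_count(7, 1, &eax, &ebx, &ecx, &edx)
        || !((edx >> 19) & 1))
        return info;

    if (!__get_cpuid_count(0x24, 0, &eax, &ebx, &ecx, &edx))
        return info;

    info.version = ebx & 0xff;
    info.max_width = (ebx & (1u << 18)) ? 512 :
                     (ebx & (1u << 17)) ? 256 :
                     (ebx & (1u << 16)) ? 128 : 0;
    return info;
}
```

On today's hybrid client parts this reports "not supported", since even the de facto AVX10.1/256 capability is not enumerated.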
Published: Late 2024 (specification updates)
AVX10.2 adds new instructions on top of AVX10.1:
- YMM-embedded rounding and SAE (Suppress All Exceptions): Previously, embedded rounding and SAE were only available for 512-bit instructions. AVX10.2 extends this to 256-bit operations — a significant usability improvement for code that wants to control rounding mode without modifying MXCSR.
- New minmax instructions for packed floating-point
- Saturating conversion instructions (e.g., convert FP to integer with saturation)
- BF16 and FP16 enhancements
- New media instructions for video encode/decode acceleration
The theory:
- Both P-cores and E-cores implement AVX10.1 (at minimum)
- P-cores implement AVX10.1/512 (or higher)
- E-cores implement AVX10.1/256 (initially) → eventually AVX10.1/512
- CPUID reports the minimum supported width across all cores
- Software that uses AVX10.1/256 can run on any core
- Software that uses AVX10.1/512 needs to be pinned to P-cores (or all cores need /512)
Intel's stated goal (as of late 2024/2025): Bring 512-bit execution to E-cores so that the entire chip can report AVX10/512 uniformly. This eliminates the scheduling problem entirely.
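During the interim period where only P-cores would offer /512, the pinning step would fall to userspace. A hypothetical sketch using Linux's `sched_setaffinity(2)`; the P-core ID list is a caller-supplied assumption here, since feature CPUID cannot distinguish core types (real code would derive it from sysfs topology or the hybrid core-type CPUID leaf 0x1A):

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch: pin the calling thread to an explicit CPU list
 * before entering a 512-bit code path. The P-core IDs are assumed to
 * come from sysfs topology or CPUID leaf 0x1A (hybrid core-type
 * enumeration); this function just applies whatever list it is given. */
static bool pin_to_cpus(const int *cpus, size_t n)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    for (size_t i = 0; i < n; i++)
        CPU_SET(cpus[i], &set);

    /* pid 0 means "the calling thread" */
    return sched_setaffinity(0, sizeof(set), &set) == 0;
}

/* Helper for demonstration: first CPU the current affinity allows. */
static int first_allowed_cpu(void)
{
    cpu_set_t cur;

    if (sched_getaffinity(0, sizeof(cur), &cur) != 0)
        return 0;
    for (int c = 0; c < CPU_SETSIZE; c++)
        if (CPU_ISSET(c, &cur))
            return c;
    return 0;
}
```

Note the fragility this model accepts: nothing stops another thread, or a library, from widening the affinity mask again, which is why Intel's stated goal is uniform /512 rather than userspace pinning.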
| CPU | Expected AVX10 | Vector Width | Status |
|---|---|---|---|
| Granite Rapids (server) | AVX10.1/512 (effectively, via AVX-512) | 512-bit | Shipping |
| Arrow Lake (client) | AVX10.1/256 (de facto, not marketed) | 256-bit | Shipping |
| Lunar Lake (client) | AVX10.1/256 (de facto) | 256-bit | Shipping |
| Panther Lake (client) | AVX10.1/256 likely | 256-bit | Launched Jan 2026 |
| Nova Lake (client) | AVX10.x/512 (goal) | 512-bit all cores? | TBD ~2027 |
| Diamond Rapids (server) | AVX10.2/512 | 512-bit | Expected 2025-2026 |
Note on Panther Lake: The Wikipedia listing for Panther Lake shows extensions as "SSE4, AVX, AVX2, AVX-VNNI, AVX-IFMA" — no explicit AVX10 or AVX-512 mention. The Cougar Cove P-cores and Darkmont E-cores likely support AVX10.1/256 internally. The E-cores (Darkmont) are an evolution of Skymont, which already had EVEX encoding support at 256-bit.
The Linux kernel scheduler (CFS/EEVDF) fundamentally assumes all CPUs in a system can execute the same instructions. There is no mechanism in struct rq, struct task_struct, or the load balancing code to say "this task uses AVX-512 and can only run on cores 0-7."
When a process executes an instruction not supported by the current core, the CPU generates a #UD (undefined instruction) exception, which Linux delivers as SIGILL. There is no trap-and-migrate mechanism.
Hardware Feedback Interface (HFI) is an Intel hardware feature (introduced with Alder Lake) that provides per-core performance and energy efficiency hints to the OS via a shared memory table.
What HFI provides:
- Per-core performance capability (0-255 scale)
- Per-core energy efficiency (0-255 scale)
- These values are dynamic — they change based on thermal conditions, power limits, and workload characteristics
- Updated by hardware via an interrupt when values change
Intel Thread Director (ITD) is the hardware classification engine that feeds into HFI:
- Monitors instruction mix per-thread in hardware
- Classifies workloads into categories (scalar, vectorized, FP-heavy, etc.)
- Adjusts HFI hints based on which core type would be best for the current workload
Linux support:
- HFI driver merged in Linux 5.18 (2022)
- The `intel_hfi` driver exposes performance/efficiency data
- The `intel_pstate` driver uses HFI data for frequency scaling
- The scheduler uses it via the Energy-Aware Scheduling (EAS) framework and Preferred Core ranking
What HFI/ITD does NOT do:
- Does not expose per-core ISA capabilities. HFI says "core 4 is 80% as performant as core 0" but does NOT say "core 4 lacks AVX-512."
- Does not prevent scheduling a task to an incompatible core. If AVX-512 were somehow enabled on P-cores only, the kernel would happily migrate an AVX-512 task to an E-core, resulting in SIGILL.
- Does not understand instruction-level requirements. ITD classifies workload type but doesn't enforce ISA compatibility.
Given the above, Intel had no viable option but to disable AVX-512 on hybrid chips:
- No per-core CPUID: CPUID is architecturally defined as returning the same value on all cores. Changing this would break a fundamental x86 assumption.
- No ISA-aware scheduling in Linux (or Windows): Neither OS had any mechanism to restrict tasks to cores based on instruction usage. Building this would require:
- Hardware trapping of unsupported instructions (not just #UD but with migration capability)
- Kernel support for ISA-capability-based affinity
- Userspace ABI changes
- Binary compatibility nightmare: Existing AVX-512 binaries couldn't know they needed core pinning. Library code (glibc, OpenSSL, etc.) uses runtime CPUID detection — if CPUID says AVX-512, code uses it everywhere.
Approach A: Trap-and-migrate
- E-core encounters AVX-512 instruction → #UD → kernel catches it → migrates task to P-core → resumes
- Problems: High latency for migration, complex state management, AVX-512 state (ZMM16-31, opmask registers) doesn't exist on E-core so can't be saved, instruction may be in a tight loop causing constant migrations
Approach B: Per-core CPUID
- Let CPUID return different values on different cores
- Problems: Breaks every piece of x86 software that caches CPUID results at startup. Glibc does this. Every JIT compiler does this. Total ecosystem breakage.
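The caching pattern at issue looks like the following (a simplified stand-in for glibc's ifunc resolvers; function names are illustrative). The CPUID query runs exactly once, on whichever core happened to execute the resolver:

```c
#include <cpuid.h>

static long sum_scalar(const long *v, int n)
{
    long s = 0;
    for (int i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* In real code this would be an AVX2-vectorized build of the same
 * loop; kept scalar here so the sketch runs anywhere. */
static long sum_avx2(const long *v, int n)
{
    return sum_scalar(v, n);
}

static long (*sum_impl)(const long *, int) = 0;

/* Resolved once for the process lifetime. If CPUID answered per-core,
 * this cached pointer would be valid only on the core that ran the
 * resolution, and a later migration would silently break it. */
static long sum(const long *v, int n)
{
    if (!sum_impl) {
        unsigned int eax, ebx, ecx, edx;
        int avx2 = __get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)
                   && ((ebx >> 5) & 1);   /* CPUID.(7,0):EBX[5] = AVX2 */
        sum_impl = avx2 ? sum_avx2 : sum_scalar;
    }
    return sum_impl(v, n);
}
```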
Approach C: ISA-affinity cpumask
- Extend task_struct with ISA requirements, auto-detected from instruction usage
- Problems: Detection requires decode of the instruction stream or runtime tracking. Massive overhead. False negatives (code that conditionally uses AVX-512).
Windows 11 added hybrid-aware scheduling specifically for Alder Lake:
- Thread Director hints feed into the Windows scheduler
- Windows uses a "Heterogeneous Policy" to classify threads (Efficiency, Neutral, Performance)
- But Windows similarly does NOT have ISA-aware scheduling — it relies on Intel disabling AVX-512
Key insight: Arm mandates ISA homogeneity across all cores in a big.LITTLE or DynamIQ cluster. A Cortex-A720 (big) and a Cortex-A520 (little) in the same SoC must support exactly the same ISA, including SIMD extensions (NEON/ASIMD, SVE if present).
How Arm achieves this:
- The Arm Architecture Reference Manual specifies the ISA independently of microarchitecture
- All cores implement the same Armv9.x profile
- NEON/ASIMD is mandatory for all AArch64 implementations
- SVE/SVE2 are optional, but if present, must be on ALL cores (or none)
- The vector length may differ between big and little cores, but the instruction set is the same
- SVE's vector-length-agnostic programming model means code works regardless of width
SVE heterogeneous vector length handling: When big and little cores have different SVE vector lengths, the Linux kernel handles this gracefully:
- Intersection, not minimum: At boot, each core probes its full set of supported SVE vector lengths (e.g., CPU0: {128, 256, 512}, CPU1: {128, 256}). The kernel computes the intersection of all cores' sets (→ {128, 256}). Userspace can choose any VL in this intersection, not just the minimum. (`fpsimd.c`: `vec_update_vq_map()` uses `bitmap_and()`.)
- Per-task VL via ZCR_EL1: The kernel writes `ZCR_EL1` on every context switch to set the current task's requested VL. Different processes can use different vector lengths simultaneously (e.g., process A uses VL=256, process B uses VL=128), as long as both are within the intersection. This is not a one-time system-wide setting.
- Userspace control: Processes request a specific VL via `prctl(PR_SVE_SET_VL, ...)`. The requested VL is rounded down to the nearest supported VL in the intersection.
- Hot-plug safety: If a CPU comes online late and doesn't support a VL already committed in the intersection, that CPU is killed (`cpu_die_early()`). The system continues running; only the non-conforming CPU is excluded.
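The intersection logic is simple to model. A userspace sketch of the kernel's per-VQ bitmap approach (names are illustrative, not the kernel's; bit n of a mask means VL = (n+1) × 128 bits is supported):

```c
#include <stdint.h>

typedef uint32_t vq_mask_t;   /* bit n => VL of (n+1)*128 bits supported */

/* Fold one CPU's supported-VL set into the system-wide intersection,
 * as vec_update_vq_map() does with bitmap_and() for each probed CPU. */
static vq_mask_t fold_cpu(vq_mask_t system, vq_mask_t cpu)
{
    return system & cpu;
}

/* Round a requested VL (in bits) down to the largest supported VL in
 * the intersection, mirroring PR_SVE_SET_VL semantics. Returns 0 if
 * the request is below every supported length. */
static unsigned int clamp_vl(vq_mask_t mask, unsigned int vl_bits)
{
    int vq = (int)(vl_bits / 128);

    if (vq > 32)
        vq = 32;              /* mask only tracks VQ 1..32 */
    for (; vq >= 1; vq--)
        if (mask & (1u << (vq - 1)))
            return (unsigned int)vq * 128;
    return 0;
}
```

With CPU0 = {128, 256, 512} (mask 0b1011) and CPU1 = {128, 256} (mask 0b0011), the intersection is 0b0011, and a request for VL=512 is rounded down to 256.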
This is directly relevant to RISC-V: RISC-V currently lacks a ZCR_EL1 equivalent, i.e., a CSR writable by S-mode to limit the effective VLEN for U-mode. Adding such a CSR would enable the same per-task, intersection-based approach for heterogeneous VLEN cores.
Scheduling: The Linux EAS (Energy-Aware Scheduling) framework handles Arm big.LITTLE purely as a performance/power optimization. No ISA concerns.
Lesson: Arm solved this problem at two levels: ISA homogeneity as an architectural requirement, and ZCR_EL1 as a hardware mechanism for the kernel to manage vector length heterogeneity per-task.
Apple's M1-M4 chips pair custom-designed performance (Firestorm/Avalanche/Everest) and efficiency (Icestorm/Blizzard/Sawtooth) cores; the cores are Apple's own rather than Arm's DynamIQ Cortex designs, but they follow Arm's homogeneity mandate:
- All cores support the same ISA (NEON, AMX is separate accelerator)
- Both P-cores and E-cores support the same SIMD width (128-bit NEON)
- Apple's AMX (matrix coprocessor) is accessible from any core via system registers
- macOS scheduler uses QoS classes to route threads, but any thread can run on any core
Key difference from Intel: Apple never had to sacrifice SIMD capability for heterogeneity because Arm's NEON is compact enough to implement everywhere.
RISC-V has the most interesting situation because its modular ISA design explicitly allows heterogeneity:
- V extension (vector): Optional. Different cores could theoretically have different VLEN.
- Discovery: RISC-V uses device tree or ACPI to describe per-hart capabilities. Unlike x86, per-core ISA discovery is architecturally supported.
- The misa CSR and various extension CSRs can differ per hart.
Current Linux RISC-V approach:
- The kernel builds an "ISA string" for each hart from the device tree
- `riscv_isa_extension_check()` tests per-hart capabilities
- The hwprobe syscall allows userspace to query which extensions are available
- But: The scheduler does not yet use ISA information for placement decisions
The RISC-V opportunity:
- RISC-V could implement ISA-aware scheduling because the architecture was designed for heterogeneity
- Device tree naturally describes per-hart capabilities
- The hwprobe() interface could be extended to expose which harts support which extensions
- A `cpumask` per ISA extension could enable affinity-based scheduling
- RVV's vector-length-agnostic model (like SVE) means code can run on cores with different VLEN
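A per-extension cpumask is easy to derive from what the kernel already knows. A hypothetical userspace model (all names are illustrative, not kernel or hwprobe constants) of the affinity structure an ISA-aware scheduler would need:

```c
#include <stdint.h>

#define MAX_HARTS 64

/* Illustrative extension IDs -- not real hwprobe key values. */
enum { EXT_V, EXT_ZBA, EXT_ZBB };

/* hart_exts[h] has bit e set iff hart h implements extension e --
 * exactly the per-hart data Linux already builds from the device
 * tree. The returned mask is the cpumask a scheduler would AND into
 * a task's allowed set once the task is known to use extension e. */
static uint64_t ext_cpumask(const uint64_t *hart_exts, int nr_harts, int ext)
{
    uint64_t mask = 0;

    for (int h = 0; h < nr_harts && h < MAX_HARTS; h++)
        if (hart_exts[h] & (1ull << ext))
            mask |= 1ull << h;
    return mask;
}
```

For a hypothetical 2 big + 2 little SoC where only the big harts (0-1) implement V, `ext_cpumask(..., EXT_V)` yields the mask 0b0011 that a V-using task would be confined to.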
Open question for RISC-V designers: If you build a big.LITTLE RISC-V SoC where big cores have RVV 1.0 and little cores don't, do you:
- Follow Arm's lead and mandate ISA homogeneity? (Simplest)
- Build ISA-aware scheduling infrastructure in Linux? (Most flexible but hard)
- Follow Intel's approach and report only the intersection? (Wastes capability)
| Aspect | Intel x86 | Arm big.LITTLE | Apple M-series | RISC-V |
|---|---|---|---|---|
| ISA homogeneity required? | No (by accident) | Yes (by design) | Yes (inherits Arm) | No (by design) |
| Per-core ISA discovery | No (CPUID is global) | N/A (all same) | N/A | Yes (device tree) |
| Scheduling ISA-aware? | No | N/A | N/A | Not yet, but possible |
| SIMD width heterogeneity | Yes (problem) | SVE lengths can vary | No | VLEN can vary (VLA helps) |
| Solution to heterogeneity | Disable to intersection | Don't allow | Don't allow | TBD |
From the Wikipedia article and Chips and Cheese ITT 2025 coverage:
Architecture:
- P-cores: Cougar Cove (evolution of Lion Cove)
- E-cores: Darkmont (evolution of Skymont)
- LP E-cores: Darkmont (low-power variant)
- Process: Intel 18A (compute tile), Intel 3 / TSMC N3E (GPU tile), TSMC N6 (platform controller)
Configurations:
- Low-power: 4 P-cores + 4 LP E-cores
- Mid: 4P + 8E + 4LP (16 cores total)
- High-end: 4P + 8E + 4LP + larger GPU
SIMD/AVX status: Listed extensions are "SSE4, AVX, AVX2, AVX-VNNI, AVX-IFMA, AES-NI, SHA-NI" — no AVX-512 or explicit AVX10. This strongly suggests:
- Cougar Cove P-cores likely still have AVX-512 execution hardware (evolutionary from Lion Cove)
- Darkmont E-cores support AVX-512 instruction encodings at 256-bit (like Skymont)
- But the platform still does NOT advertise 512-bit support
- De facto AVX10.1/256
Key observation: Intel has been shipping AVX10.1/256-equivalent hardware since Arrow Lake (Oct 2024) without marketing it as AVX10. The new EVEX encoding features, mask registers, and many AVX-512 instruction subsets are available at 256-bit across all cores. They just call it "AVX2" in marketing materials.
Very limited information available. Nova Lake is expected to succeed Panther Lake for client:
- Likely to use next-generation P-cores and E-cores
- This is where Intel may finally bring 512-bit execution to E-cores, enabling AVX10/512 across the entire chip
- Would be the first hybrid Intel client chip to officially support 512-bit vectors
- But this is speculative — no confirmed details
- Diamond Rapids: server counterpart, P-core only (no E-cores)
- Expected to support AVX10.2/512
- No heterogeneity problem since server parts don't use E-cores
- Interesting case: all E-cores (Darkmont-based)
- Will likely support AVX10.1/256 (since all cores are the same type)
- ISA-homogeneous by being all-E-core
How much die area does the disabled AVX-512 hardware cost? This is one of the most discussed but least precisely answered questions. Some data points:
AVX-512 execution unit area estimates:
The AVX-512 execution hardware primarily consists of:
- 512-bit FMA units: Two 256-bit FMA units that can fuse into one 512-bit unit (on Golden Cove / Raptor Cove / Lion Cove)
- 512-bit shuffle/permute networks
- Extended register file: ZMM0-ZMM31 (32 × 512-bit = 2KB just for architectural state, much more for physical register file with rename)
- Mask register file: k0-k7 (8 × 64-bit)
Estimates from die analysis and industry sources:
- Chips and Cheese / die shot analysis: The vector execution units (including AVX-512 support) are estimated to occupy roughly 10-15% of the P-core area. A Golden Cove P-core is approximately 3.5-4mm² on Intel 7. The vector/SIMD portion is perhaps 0.4-0.6mm² per core.
- The incremental cost argument: The more relevant question is not "how much area does the vector unit take" but "how much extra area does 512-bit support cost over 256-bit AVX2?" The incremental cost of widening from 256-bit to 512-bit is:
- Wider datapaths (doubled in the fusion case)
- Larger physical register file (more rename registers, each twice as wide)
- Wider shuffle/permute networks
- Estimated incremental cost: ~5-8% of P-core area
- For a typical Alder Lake die (8P+8E):
- 8 P-cores × ~0.5mm² wasted AVX-512 area ≈ 4mm²
- Total die area: ~215mm²
- Wasted area: ~1.5-2% of total die
- For Arrow Lake (Lion Cove P-cores):
- Lion Cove is designed for AVX-512. The 512-bit datapath is integral to the core design.
- Removing it would save area but would require a different core design
- The 6 P-cores × perhaps 0.4mm² (on TSMC N3) ≈ 2.4mm² on the compute tile
- Compute tile is estimated at ~50-60mm² — so ~4-5% of compute tile area is "wasted" AVX-512 capability
The area argument is somewhat misleading. The more significant costs of unused AVX-512 hardware are:
- Leakage power: Transistors leak even when unused. The AVX-512 units contribute to idle power draw.
- Design complexity: Supporting AVX-512 in the microarchitecture constrains other design decisions (pipeline width, register file organization).
- Validation cost: The AVX-512 execution paths must be validated even though they're disabled on client parts.
- Opportunity cost: The silicon could have been used for more cache, wider issue, or other features.
Intel uses the same P-core design across client and server products:
- Lion Cove appears in both Arrow Lake (client, AVX-512 disabled) and Granite Rapids (server, AVX-512 enabled)
- Designing two variants of the core — one with and one without AVX-512 — would double the design and validation cost
- The economics favor a single core design with runtime disable
This is the same logic as GPU CU disabling, PCIe lane disabling, etc. — ship one die, harvest defects and segment products through configuration.
- Assumed ISA homogeneity would persist: x86's CPUID was never designed for per-core ISA variation. This architectural debt made hybrid designs painful.
- No OS interface for ISA-aware scheduling: Neither the hardware (CPUID) nor the OS (scheduler) had mechanisms to handle ISA differences. Intel had to disable features rather than expose them selectively.
- Customer expectations vs. hardware capability: Users buying P-core silicon with AVX-512 hardware but unable to use it caused significant backlash.
- Multi-year gap between problem and solution: Alder Lake shipped in Nov 2021. AVX10 was announced in July 2023. Actual universal 512-bit support across all cores may not arrive until ~2027 (Nova Lake). That's a 6-year gap.
- Design per-hart ISA discovery from day one ✓ (already done — device tree, hwprobe)
- Consider mandating ISA homogeneity for SMP systems — Arm's approach is simpler and avoids the problem entirely. The RISC-V Profiles specification moves in this direction by defining mandatory extension sets.
- If allowing ISA heterogeneity, build scheduling infrastructure early:
  - Extend hwprobe() to support per-CPU queries
  - Implement ISA-aware cpumask constraints in the scheduler
  - Define trap-and-migrate behavior for unsupported extensions
  - Consider making the V extension's VLEN the only permitted axis of variation
- Vector-length agnosticism is key: Both RVV and SVE use VLA (vector-length-agnostic) programming, meaning code works correctly regardless of VLEN, even if performance varies. This is far superior to x86's fixed-width model, where 512-bit code simply doesn't work on a 256-bit core.
- Don't ship disabled hardware for years: If E-cores can't do wide vectors, either don't put wide vector units in P-cores, or invest in making E-cores capable. Intel's years of shipping (and paying for) unused silicon were economically suboptimal.
| Date | Event |
|---|---|
| 2016 | AVX-512 ships in Knights Landing |
| 2017 | AVX-512 in Skylake-SP server |
| 2020 | AVX-512 in Tiger Lake client (all cores identical) |
| 2021 Q1 | AVX-512 in Rocket Lake client (all cores identical, last hurrah) |
| 2021 Q4 | Alder Lake ships — AVX-512 disabled on hybrid parts |
| 2022 Q4 | Raptor Lake ships — same situation |
| 2023 Jul | Intel announces AVX10 specification |
| 2024 Q3 | Lunar Lake ships — Lion Cove + Skymont, AVX10.1/256 de facto |
| 2024 Q4 | Arrow Lake desktop ships — same de facto AVX10.1/256 |
| 2024 | AVX10.2 specification published |
| 2026 Q1 | Panther Lake ships — Cougar Cove + Darkmont, still AVX10.1/256 |
| ~2025-26 | Diamond Rapids server expected — AVX10.2/512 |
| ~2027-28 | Nova Lake client — potential AVX10/512 on all cores (speculative) |
- Wikipedia: AVX-512, Advanced Vector Extensions (AVX10 section), Alder Lake, Arrow Lake, Lunar Lake, Panther Lake articles
- Chips and Cheese: "Panther Lake's Reveal at ITT 2025" (Oct 2025), "Interviewing Intel's Chief Architect of x86 Cores"
- Intel Architecture Instruction Set Extensions Programming Reference (AVX10 specification documents)
- Linux kernel source: `arch/x86/kernel/cpu/`, `drivers/thermal/intel/intel_hfi.c`
- Linux kernel HFI/ITD support: merged in 5.18+ (`intel_hfi` driver)
- RISC-V ISA specification: Volume I (unprivileged), Vector Extension 1.0
- Various community documentation on Alder Lake AVX-512 enable hacks (2021-2022)
Report compiled February 2026. Information about unreleased products (Nova Lake, Diamond Rapids) is based on available leaks and Intel public statements; details may change.