Nathan Moinvaziri nmoinvaz

## zlib-ng-inflate-safe-mode-benchmark.md

      
Created March 10, 2026 19:08

zlib-ng: inflate_fast safe mode benchmark - small output buffer performance

zlib-ng: inflate_fast safe mode benchmark results

Summary

Adding a safe_mode parameter to inflate_fast() allows the fast path to run with
as few as 3 bytes of avail_out (down from 260). This eliminates the performance
cliff where PNG-style row-by-row decompression falls back to the slow inflate()
state-machine path once fewer than 260 bytes of output space remain in each row.

Related: zlib-ng/zlib-ng#2062
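The row-by-row access pattern described above can be reproduced from Python with the standard `zlib` module, where `max_length` caps how many bytes each `decompress()` call may produce (a small output window, much like a small avail_out). This is a minimal sketch: `inflate_rows` is a hypothetical helper, and Python's `zlib` is usually linked against stock zlib rather than zlib-ng, so it reproduces the call pattern, not the safe_mode fast path.

```python
import zlib


def inflate_rows(compressed: bytes, row_size: int) -> list[bytes]:
    """Decompress a zlib stream one fixed-size row at a time.

    Each call may produce at most `row_size` bytes, so the decoder only ever
    has a small output window, similar to a PNG decoder pulling one scanline.
    """
    d = zlib.decompressobj()
    rows = []
    pending = compressed
    while not d.eof:
        row = d.decompress(pending, row_size)
        # Input left over because the output cap was reached.
        pending = d.unconsumed_tail
        if row:
            rows.append(row)
        elif not pending:
            break  # no output and no input left: the stream is truncated
    return rows
```

With rows narrower than 260 bytes, the threshold stated above means every byte of every row would be produced with less than 260 bytes of output space available, which is exactly the case safe_mode targets.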

  
## zlib-ng-vpclmulqdq-avx2-benchmarks.md

      
Last active March 9, 2026 07:43

zlib-ng: VPCLMULQDQ AVX2 vs PCLMULQDQ CRC32 benchmark results on Intel i7-1185G7

zlib-ng: VPCLMULQDQ AVX2 vs PCLMULQDQ CRC32 Benchmark

Machine


- CPU: 11th Gen Intel Core i7-1185G7 @ 3.00GHz (Tiger Lake)
- Cores: 4 cores / 8 threads
- L1d/L1i: 48 KiB / 32 KiB (x4)
- L2: 1280 KiB (x4)
- L3: 12288 KiB
- OS: Windows 11 Pro


## zlib-ng-pr-2176-opt.md

      
Last active March 7, 2026 00:02

zlib-ng: CRC32 ARMv8 PMULL+EOR3 copy optimization

Summary

Replace memcpy calls in the CRC32+copy interleaved path with direct NEON stores
(vst1q_u64) of already-loaded vectors, and direct scalar stores of already-loaded
uint64_t values. This eliminates redundant load/store sequences that the compiler
generated for memcpy when the source data was already in registers.
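Whatever the store strategy, the fused routine must keep the same contract: the destination receives an exact copy of the source, and the returned checksum equals the CRC-32 of the source bytes. A minimal Python sketch of that contract follows. `crc32_copy` is a hypothetical name, and it uses the standard `zlib.crc32` in place of the PMULL/EOR3 folding, so it models only the semantics, not the NEON stores. Each chunk feeds both the copy and the CRC update, mirroring the idea of reusing data that has already been read instead of reading it again.

```python
import zlib


def crc32_copy(dst: bytearray, src: bytes, crc: int = 0, chunk: int = 64) -> int:
    """Copy src into dst and return the running CRC-32 of the copied bytes."""
    if len(dst) < len(src):
        raise ValueError("destination buffer too small")
    view = memoryview(src)
    for off in range(0, len(src), chunk):
        block = view[off:off + chunk]        # one slice per chunk...
        dst[off:off + len(block)] = block    # ...used for the copy
        crc = zlib.crc32(block, crc)         # ...and for the CRC update
    return crc
```

A change like the one described is correct only if this invariant still holds for every input length, including tails shorter than one vector.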
Additionally, reorder the vector loop so that stores happen before eor3 operations,

  
## zlibng-vs-zlibrs-benchmark.md

      
Last active February 26, 2026 20:28

zlib-ng vs zlib-rs benchmark comparison on Apple M3 (ARM64)

zlib-ng vs zlib-rs Benchmark Comparison (ARM64, Apple M3)

Machine Specs


- CPU: Apple M3 (8 cores)
- RAM: 24 GB
- OS: Darwin 24.6.0 arm64 (macOS Sequoia)
- Compiler: Apple clang 17.0.0 (clang-1700.6.3.2)
- Rust: rustc 1.93.1 (01f6ddf75 2026-02-11)


## top_senders_100.py
```python
import argparse
import imaplib
import email.utils
import sys
from collections import Counter

from rich.console import Console
from rich.table import Table
from rich.progress import Progress, BarColumn, TextColumn, TimeRemainingColumn, MofNCompleteColumn
```
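Only the imports of this script appear here. Judging from them (`imaplib`, `email.utils`, `Counter`) and the filename, the core step is presumably tallying `From:` addresses. The sketch below assumes the headers have already been fetched over IMAP into a list of strings and uses a hypothetical `top_senders` helper; lowercasing addresses so that case variants of one mailbox are counted together is a design choice of this sketch, not something the script is known to do.

```python
import email.utils
from collections import Counter


def top_senders(from_headers: list[str], n: int = 100) -> list[tuple[str, int]]:
    """Count messages per sender address and return the n most frequent."""
    counts: Counter[str] = Counter()
    for header in from_headers:
        _name, addr = email.utils.parseaddr(header)  # ("Display Name", "addr@host")
        if addr:  # skip headers that could not be parsed
            counts[addr.lower()] += 1
    return counts.most_common(n)
```

`Counter.most_common(n)` already returns the pairs sorted by count, so the result can be fed straight into a `rich` table.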

## zlib-ng-crc32-arm-copy-benchmarks.md

      
Last active February 24, 2026 18:02

zlib-ng: CRC32 ARM interleaved copy benchmark results (Apple M3)

zlib-ng: CRC32 ARM Interleaved Copy Benchmark Results

Comparison


- Baseline: develop @ 54352daf (Make extra length/distance bits computation branchless)
- Contender: improvements/crc32-arm-copy @ b4043c6f (Implement crc32 interleaved copy for ARM PMULL+EOR3)
- Repetitions: 5 per benchmark, aggregates only
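"Aggregates only" means each benchmark reports summary statistics over its 5 repetitions rather than every individual run. Assuming the runs come from Google Benchmark (which zlib-ng's benchmark suite uses), for example via `--benchmark_repetitions=5 --benchmark_report_aggregates_only=true`, those aggregates are the mean, median, and standard deviation, plus the coefficient of variation in recent versions. A minimal sketch of the same computation, with `aggregate` as a hypothetical helper:

```python
import statistics


def aggregate(samples_ns: list[float]) -> dict[str, float]:
    """Summarize repeated timings of one benchmark."""
    mean = statistics.mean(samples_ns)
    # Sample standard deviation (n - 1 denominator), assumed to match the
    # stddev aggregate reported by the benchmark framework.
    stddev = statistics.stdev(samples_ns)
    return {
        "mean": mean,
        "median": statistics.median(samples_ns),
        "stddev": stddev,
        "cv": stddev / mean,  # coefficient of variation: relative noise
    }
```

With only 5 repetitions, the median is less sensitive to a single outlier run than the mean, and the CV gives a quick check on whether a small difference between branches is within noise.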

Machine


## zlib-ng-CLAUDE.md

      
Last active February 28, 2026 01:17

zlib-ng CLAUDE.md

Project Basics


- Use the CMake build system.
- Always check which commits HEAD, BASE, and other branch names point to, as they change often.
- To build for architectures other than the current one, use llvm-clang unless gcc is specified.

Key Directories


- arch/ - Architecture-specific optimizations
- test/ - Unit tests written with the Google Test framework (gtest_zlib project)


## crc32-arm-copy-benchmarks.md

      
Last active February 24, 2026 04:49

Zlib-ng benchmark: crc32_armv8_pmull_eor3 - improvements/crc32-arm-copy vs develop

Benchmark: improvements/crc32-arm-copy vs develop

- Date: 2026-02-23
- Platform: Apple Silicon (ARM64), 8 cores, L1D 64 KiB, L2 4096 KiB
- Build: CMake Release, static libs
- Repetitions: 5 (median CPU time reported)
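The excerpt does not define how the Change column is computed. Assuming it is the relative difference between the two branches' median CPU times, with negative values meaning the feature branch is faster, it can be derived as follows (`percent_change` is a hypothetical helper):

```python
import statistics


def percent_change(develop_ns: list[float], feature_ns: list[float]) -> float:
    """Relative change of the feature branch's median time vs develop, in percent.

    Negative results mean the feature branch is faster.
    """
    base = statistics.median(develop_ns)
    new = statistics.median(feature_ns)
    return (new - base) / base * 100.0
```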
crc32/armv8_pmull_eor3 (CRC32 only)

| Size | develop (ns) | feature (ns) | Change |
|---|---|---|---|

  
## benchmark_compress_results.md

      
Created February 21, 2026 00:19

zlib-ng compress benchmark: improvements/tally-v2 vs develop

Compress Benchmark: HEAD (improvements/tally-v2) vs develop

Environment


- Platform: macOS Darwin 24.6.0, Apple Silicon (ARM64)
- CPU: 8 cores, L1D 64 KiB, L1I 128 KiB, L2 4096 KiB
- Build: CMake Release, static libs

Commits


- HEAD (improvements/tally-v2): c51ce99e (Combine extra_lbits/base_length and extra_dbits/base_dist lookup tables)
- develop: 1b880ba9 (Make extra length/distance bits computation branchless using bit masking)


## compress_block_bi_buf_register_optimization.md

      
Last active February 19, 2026 03:25

Zlib-ng PR 2167 analysis

Assembly Analysis: Keep bi_buf/bi_valid in Registers Across compress_block

Change

Hoist s->bi_buf and s->bi_valid into local variables in compress_block() and pass them by pointer to the emit functions. This eliminates redundant load/store pairs between zng_emit_lit and zng_emit_dist calls within the main compression loop.
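To make the data flow concrete, here is a minimal Python sketch of the hoisting pattern under stated assumptions: a 64-bit bit buffer filled least-significant bit first and flushed little-endian when full, similar in spirit to zlib-ng's bit writer but not its actual code. `DeflateStateSketch`, `put_bits`, and `compress_block_sketch` are hypothetical names, and because Python integers are immutable the helper returns the updated pair instead of writing through pointers. Python has no registers, so this shows only the structure of the change (one load before the loop, one store after it), not its performance effect.

```python
BIT_BUF_SIZE = 64  # assumed width of bi_buf in bits


class DeflateStateSketch:
    """Hypothetical stand-in for deflate_state with only the bit-buffer fields."""

    def __init__(self) -> None:
        self.bi_buf = 0             # pending output bits, least-significant bit first
        self.bi_valid = 0           # number of valid bits currently in bi_buf
        self.pending = bytearray()  # flushed output bytes


def put_bits(bi_buf: int, bi_valid: int, value: int, length: int,
             out: bytearray) -> tuple[int, int]:
    """Append `length` bits of `value`; flush a full 64-bit word little-endian.

    Returns the updated (bi_buf, bi_valid), standing in for pointer
    out-parameters. Assumes 0 <= length < 64 and value < 2**length.
    """
    total = bi_valid + length
    if total < BIT_BUF_SIZE:
        return bi_buf | (value << bi_valid), total
    # The low (64 - bi_valid) bits of value complete the word; the rest carry over.
    word = (bi_buf | (value << bi_valid)) & ((1 << BIT_BUF_SIZE) - 1)
    out.extend(word.to_bytes(8, "little"))
    return value >> (BIT_BUF_SIZE - bi_valid), total - BIT_BUF_SIZE


def compress_block_sketch(s: DeflateStateSketch, codes: list[tuple[int, int]]) -> None:
    """Emit (value, length) codes with bi_buf/bi_valid kept in locals."""
    bi_buf, bi_valid = s.bi_buf, s.bi_valid  # load once before the loop
    for value, length in codes:
        bi_buf, bi_valid = put_bits(bi_buf, bi_valid, value, length, s.pending)
    s.bi_buf, s.bi_valid = bi_buf, bi_valid  # store once after the loop
```

In the unhoisted form, each emit call would read and write `s.bi_buf` and `s.bi_valid` directly; the output is identical either way, which is what makes the change a pure code-generation optimization.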

Results

bi_buf/bi_valid Memory Operations (offsets 168/176 from deflate_state*)