Skip to content

Instantly share code, notes, and snippets.

@vassvik
Created February 18, 2026 21:01
Show Gist options
  • Select an option

  • Save vassvik/0205451ac3faec48f6f42115322798eb to your computer and use it in GitHub Desktop.

Select an option

Save vassvik/0205451ac3faec48f6f42115322798eb to your computer and use it in GitHub Desktop.
EmberGen 1.2 Feature & Algorithm Reference

EmberGen 1.2 — Feature & Algorithm Reference

This document describes what EmberGen 1.2 does at a feature/algorithm level, intended as a reproduction guide for reimplementing these features in the sparse rewrite. It is organized by system, not by UI node.


1. Simulation Overview

EmberGen simulates incompressible volumetric fluid on a uniform 3D grid (32–512 voxels per side). Two simulation modes exist: Combustion (smoke + temperature + fuel + flames, 4 channels) and Colored Smoke (smoke density + RGB color, 4 channels). The simulation runs entirely on the GPU via compute shaders.

Per-Frame Step Order

1. Advection              — transport all fields through velocity
2. Injection & Combustion — emitter SDF masking, source terms, reaction model
3. Diffusion              — viscous spreading of fields
4. Vorticity              — compute curl, apply confinement force
5. Pressure Projection    — enforces incompressibility (sub-steps below):
   5a. Collision Prep     — build solid boundary masks from SDFs
   5b. Hourglass Filter   — suppress checkerboard instability
   5c. Divergence         — compute ∇·v + extra combustion divergence
   5d. Multigrid V-Cycle  — Poisson pressure solve
   5e. Gradient Subtract  — v -= ∇p
   5f. (Optional second hourglass filter)
   (Steps 5c–5e repeat for multiple projection iterations)
6. GPU Particles          — spawn, compact, physics, sort

All time scaling is normalized to a 60 FPS reference (forces are multiplied by dt * 60).


2. Advection

Transports all fields (velocity, smoke, temperature, fuel, flames) through the velocity field using semi-Lagrangian backward tracing.

Backtracing Accuracy

Three modes, selected at compile time:

  • RK1 (1st order): Trace back by pos - v(pos) * dt
  • RK2 (2nd order): Midpoint/Heun's method — sample velocity at the RK1 destination, average with the original
  • RK4 (4th order): Full 4-stage Runge-Kutta

Interpolation

Velocity and voxel data use C0-continuous cubic interpolation on a 4×4×4 stencil. The weights are:

wm = -1/6 * (t-2)(t-1)t
w0 = +1/2 * (t-2)(t-1)(t+1)
w1 = -1/2 * (t-2)t(t+1)
w2 = +1/6 * (t-1)t(t+1)

This is explicitly not Catmull-Rom — it is a C0 (not C1) cubic that exactly resamples any cubic polynomial. The code notes it is more numerically stable than Catmull-Rom and does not require clamping to remain stable.

Per-channel interpolation in combustion mode: Only smoke (X) and flames (W) use cubic interpolation. Temperature (Y) and fuel (Z) fall back to hardware trilinear.

A fallback path uses hardware trilinear (texture()) when the shared memory optimization cannot fit the data.

Optional Clamping

After cubic interpolation, an optional min/max clamp against the local 2×2×2 trilinear neighborhood can be applied. This is the BFECC/MacCormack limiter concept applied as a simple spatial clamp — prevents overshoot artifacts from cubic interpolation.

Boundary Masking

A smoothstep(0.0, 0.05, dist) mask is applied when backtracing samples land outside the volume, preventing velocity source accumulation at boundaries.

Shared Memory Optimization

The shader cooperatively pre-loads the required 4×4×4 neighborhoods for the entire 8×8×8 workgroup into shared memory (fp16 packed as uvec2, two arrays of 4096 uints = 32,760 bytes). It first computes the bounding box of all backtrace destinations across the workgroup using atomicMin/Max. If this fits within 4092 voxels, all samples come from shared memory instead of global texture fetches. If not, it falls back to texelFetch.

Upscaling Support

The velocity field can run at coarser resolution than the voxel data (1/2×, 1/3×, 1/4×). The injection shader handles the resolution mismatch with an upscale factor.


3. Injection

After advection, each voxel evaluates every emitter's SDF mask and injects source terms. This is where new material enters the simulation.

Emitter SDF Masking

Each emitter references a shape (see Section 8). The GPU evaluates the emitter's SDF postfix stack per-voxel to determine an injection weight. The SDF distance is mapped to weight via the emission smoothness parameter — small smoothness = sharp boundary, large = diffuse falloff.

Injection Modes

Each emitter has a tightness parameter that blends between two injection modes:

  • Rate-based (tightness=0): adds rate × dt each frame
  • Replace (tightness=1): forces the value toward the target strength
  • Intermediate values interpolate: mask * min(max_voxel - current, lerp(rate*dt, strength-current, tightness))

Emitter Divergence

Emitters can inject divergence (expansion/contraction pressure) in addition to density/temperature/fuel. The divergence injection supports optional procedural noise randomization via 4D noise:

rnd = noise_4D(scaledPos, seed + time × speed) × 2.0
divergenceScale = lerp(1.0, rnd, noise_intensity)
extraDivergence += divergence_rate × mask × divergenceScale

4. Combustion Model

Based on the Intel GPU Gems combustion model. Runs in this order per voxel:

  1. Dissipation: All quantities decay exponentially. Temperature uses lerp(0, T, cooling_factor).

  2. Injection: Source terms from emitters are added.

  3. Oxygen model: oxygen = 1 - Fuel - Smoke - flames_weight*Flames + Fuel*OIF, clamped to [0,1]. The OIF (Oxygen In Fuel) term models pre-mixed fuel.

  4. Extinction (flames → smoke): When temperature drops below the extinction threshold, flames convert to smoke. Two blendable models:

    • Step: Hard threshold at extinction_smoke_temperature
    • Gaussian: exp(-(T/T_ext)²) — smooth gradual transition
    • ΔFlames = -flames_loss × temp_dep × Flames × dt × 60
    • ΔSmoke = +smoke_gain × temp_dep × Flames × dt × 60
  5. Combustion (fuel → flames): When temperature exceeds ignition threshold, fuel burns. Two blendable models:

    • Step: Hard threshold at ignition_temperature
    • Arrhenius: exp(-T_ignition / max(T, 0.01)) — physically-based exponential rate
    • combustionChange = temp_dep × min(Fuel, oxygen) × dt × 60
    • ΔFuel = -fuel_loss × combustionChange
    • ΔFlames = +flames_gain × combustionChange
    • ΔTemp = +temperature_gain × combustionChange
  6. Gas release divergence: additionalDivergence = gas_release × combustionChange — written to an Extra_Divergence texture that feeds into the pressure solve. This is what makes fire expand outward.

Colored Smoke Mode

Replaces the combustion model entirely with simple smoke + RGB color blending: Color = lerp(old_color, new_color, new_smoke / (old + new_smoke)). No temperature, fuel, or flames channels exist in this mode.

Buoyancy

Applied after injection as a velocity impulse in +Z:

plus  = (temperature / 1000) × buoyancy        // hot gas rises
minus = lerp(smoke_weight×smoke, fuel_weight×fuel, fuel_fraction)  // heavy material sinks
velocity.z += (plus - minus) × 60 × gravity × dt

Shredding

A velocity-magnitude modulation based on temperature that creates the torn/shredded look in fire:

intensity *= 0.05 × clamp(flames × 5.0, 0, 1)
stretch = smoothstep(threshold - width, threshold - width - 0.0005, temperature)
squash  = smoothstep(threshold + width, threshold + width + 0.0005, temperature)
velocity *= 1 + squash × intensity    // boost above threshold
velocity *= 1 - stretch × intensity   // reduce below threshold

The 0.05 and 5.0 are hardcoded scaling constants.

Wind

A time-varying directional force with 4D Perlin noise chaos added directly to velocity each step.

Simulation Mask

A global SDF (sphere, box, or cylinder) masks out voxel data near the domain edge with smoothstep falloff, preventing artifact accumulation at boundaries.


5. Diffusion

Applies viscous diffusion to all voxel channels and velocity using separable 1D Gaussian convolution in three passes (X, Y, Z).

Each pass loads a 1D strip into shared memory (fp16 packed), precomputes Gaussian weights exp(i² × (-1/2σ²)) × reciprocal_total_weight, and applies a dot product over the kernel. Kernel half-size is a compile-time constant. Valid half-sizes: 4, 8, 12, 16, 20, 24, 28, 32, 36, 40 (must be multiple of 4).

This approximates solving the diffusion equation forward-in-time by repeated Gaussian blurring.


6. Vorticity & Confinement

Vorticity Computation

Computes ω = curl(v) at each vertex location (corner between 8 voxels) using 4-point averaged finite differences:

ω.x = dw/dy - dv/dz
ω.y = du/dz - dw/dx
ω.z = dv/dx - du/dy

The magnitude |ω| is packed alongside the vorticity vector as fp16 pairs.

Vorticity Confinement

Re-injects small-scale rotational structure lost to numerical dissipation:

  1. Compute N = normalize(∇|ω|) using 8-point corner-averaged finite differences
  2. Compute ω as the average of 8 corner vorticity vectors
  3. Apply v += confinement × cross(N, ω) × dt × 60

Masking: The confinement strength is spatially modulated:

mask = vorticity_constant
     + vorticity_velocity × coarse_velocity
     + vorticity_temperature × temperature × 0.00025
     + vorticity_smoke × smoke

Gated to zero if temperature or smoke falls outside specified [low, high] range thresholds.

Cubic ramp: N *= pow(min(|ω| × ramp, 1), 3) suppresses confinement in regions of very weak vorticity to avoid noise amplification.

Large-scale vorticity: An optional second term operates on a downsampled vorticity field, injecting structure at a coarser scale independently.


7. Pressure Projection

Collision Preparation

Builds collision mask textures for all multigrid levels from SDF collider shapes and domain boundary walls. Uses a workgroup-level early-out: a coarse collision mask is precomputed so workgroups without any nearby colliders skip the expensive per-voxel SDF evaluation.

Animated mesh colliders inject their velocity directly into the velocity field via SDF velocity textures.

Boundary conditions (floor, ceiling, walls) are written explicitly into the outer face voxels of each collision texture level using a face bitmask.

Hourglass Filter

Required by the vertex-centered grid: velocity modes at Nyquist frequency (checkerboard patterns) are in the null space of the divergence operator — they produce zero divergence and thus cannot be removed by pressure projection. The hourglass filter explicitly damps these modes. For each voxel, samples 8 corner + 6 face neighbors:

w = (v_center × 16 - faces × 4 + corners) / 32
v_filtered = v_center - strength × w

The w vector is exactly the checkerboard-mode component (the 5-dimensional null space of the 27-point Laplacian: constant, 3D checker, and three 2D checkers). Applied both before and after pressure projection.

Divergence

Computes ∇·v at each vertex (corner between 8 voxels) by averaging the velocity components on each face:

divergence = -(vE-vW + vN-vS + vU-vD) + extra_divergence

Extra divergence from combustion gas release is added here. Pressure is optionally seeded as (8/9) × divergence / 8.

Multigrid V-Cycle Pressure Solve

Solves the Poisson equation Δp = -∇·v using a vertex-centered discretization where velocity lives at cell centers and pressure lives at vertices (the dual of the standard MAC grid).

Corner→Vertex Smoothing Scheme

The consistent discrete Laplacian for this grid is a full 27-point stencil with weights (-24 center, -4 face, +2 edge, +3 corner) / 16h². This is solved using two complementary smoothers in sequence:

Corner smoother (9-point): Uses only the 8 corner neighbors at (±1,±1,±1). This captures 75% of the full operator and is well-conditioned:

x_new = (4b + Σ(O_ijk × x_ijk)) / weight
x_new = lerp(x_old, x_new, ω)        // ω = 8/9 for Jacobi

where weight = Σ(O_ijk) counts non-solid neighbors.

Vertex smoother (27-point): Uses the full stencil including face and edge neighbors with gradient weighting: 3×corner + face_sum - negative_face_sum, divided by 16 × weight, with ω = 0.5. This handles the face/edge coupling the corner stencil cannot see.

Both are used per V-cycle level: corner smooth first, then vertex smooth. The corner stencil alone would leave systematic residual divergence in modes invisible to it (face/edge coupling), which would accumulate over frames.

SOR variant (corner stencil): Uses an 8-pass checkerboard pattern within a single dispatch exploiting the sublattice structure (corner stencil creates 4 independent bipartite pairs). The 8 offsets:

{0,0,0}, {1,1,1}, {1,0,0}, {0,1,1}, {0,1,0}, {1,0,1}, {1,1,0}, {0,0,1}

V-Cycle structure:

  1. Descent (Level 0→1→2→3→4): Corner smooth + vertex smooth, compute residual (with full stencil), restrict
    • Level 0→1: 1 pre-smooth iteration
    • Level 1→2, 2→3, 3→4: 2 pre-smooth iterations each
  2. Coarsest solve: Multiple SOR/Jacobi iterations with solve_omega weight, then vertex smooth
  3. Ascent (Level 4→0): Prolongate (additive trilinear injection), corner smooth + vertex smooth
    • Level 4→3, 3→2, 2→1: 2 post-smooth iterations each
    • Level 1→0: 1 post-smooth iteration

Restriction: [1,2,1]³ hierarchical Gaussian-like kernel applied in 3 separable stages, with 4× scaling correction for the coarsening ratio.

Prolongation: Trilinear injection averaging 8 coarse-grid corners with equal weight (1/8 each), applied additively as error correction.

Initial pressure at coarse level: x = (8/9) × (1/8) × restricted_residual

Coarsening policy: Up to 4 levels; coarser grid must be divisible by 8 in all dimensions. Additionally, the coarsest level must have at least 16,384 total cells (e.g. 16³ is too small at ~4K).

Multiple projection iterations: The outer loop can repeat the full divergence→multigrid→gradient cycle. On iteration 1, pressure is swapped into an accumulator. On iterations 2+, pressure is additively corrected into the accumulator. The final accumulated pressure is swapped back before use.

Pressure reuse: Previous frame's pressure serves as a warm start.

Gradient Projection

Subtracts pressure gradient from velocity: v -= ∇p. At each voxel, reads the 8 surrounding vertex pressures, computes face-averaged gradients:

v -= (pE-pW, pN-pS, pU-pD) × collision_mask

Collision mask = 0 inside solids (no-slip). Open boundaries enforce dp/dn = 0.


8. Shape System

Shapes define spatial regions for emission, collision, and force masking. They are analytical SDFs evaluated on the GPU from a postfix stack of operations.

Primitive SDFs (13 types)

Sphere, Box, Rounded Box, Capsule, Torus, Cylinder, Rounded Cylinder, Tube (hollow cylinder with optional end caps), Cone, Rounded Cone, Ellipsoid, Hemisphere, Spiral.

Each stores current + previous position/rotation (as quaternions) for motion velocity injection. SDFs are evaluated by inverse-transforming the query point: p = qrot(p - op.pos, qconj(op.rot)) / op.scale, then evaluating the local-space SDF.

The sphere has three motion trace modes:

  • Mode 0: Static sphere (no gap filling)
  • Mode 1: Capsule if radii match, cone if radii differ (previous/current positions)
  • Mode 2: Always capsule between previous/current positions

Shape Noise

Fractal Brownian motion (fBm) noise evaluated at world-space coordinates to define an implicit surface. Parameters: scale, seed, octaves (1–8), lacunarity, gain, amplitude, bias (shifts surface in/out), animation speed (4D noise).

Shape Blend (combine two shapes)

Operation Math
Union min(a, b)
Smooth Union smin(a, b, smoothness)
Subtraction max(-a + 3.0 + bias, b) — carves A out of B (note hardcoded +3.0 offset)
Smooth Subtraction smax(-a + bias, b, smoothness)
Intersection max(a, b)
Smooth Intersection smax(a, b, smoothness)
Add (Displacement) a + remap(b) × amplitude — sculpts A's surface using B
Morph lerp(a, b, weight)

A bias parameter offsets the SDF result, inflating/deflating the boundary.

Shape Modifier (single-shape post-processing)

Modifier Effect
Rounding sdf - thickness — grows surface outward
Golf Ball sdf + amplitude × sin(f×x) × sin(f×y) × sin(f×z) — sinusoidal bumps
Invert -(sdf) + offset — flips inside/outside
Shell abs(sdf) - thickness/2 — converts solid to thin shell (inward/centered/outward alignment)

Shape Transform

Pure transform node — applies position + rotation to all downstream shapes. Multiple shapes on the same input are implicitly unioned.

Shape Burst

Spawns N animated shapes that expand over time. Each burst element has:

  • Random position from distribution (box, ellipsoid, evenly distributed)
  • Random trigger timing (center-to-outside, outside-to-center, random)
  • Scale driven by a user curve over the burst's lifetime
  • Velocity injection from size rate-of-change: velocity = (size_next - size_current) × velocity_injection

Subtypes: sphere, cylinder, ring, hollow sphere, spike.

Shape Particles (CPU)

A CPU-simulated particle system acting as an emitter shape. Particles move, age, collide, and their positions are uploaded as Distance_Op entries (sphere or capsule per particle). Supports gravity, drag, advection into sim velocity, chaos perturbation, and per-property life curves. An "impact" output exposes collision zones as secondary shapes.

Shape Bitmap

Uses a 2D texture to define the shape with channel assignments for cutout, active region, fuel, smoke, temperature, and force masking.

SDF Assembly Pipeline

The CPU traverses the shape graph depth-first, building a postfix Distance_Op array:

  1. Leaf nodes push their Distance_Op
  2. Blend nodes pop two, push one result
  3. Modifier nodes pop one, push one
  4. Multiple shapes on the same pin auto-union with min()

The array is uploaded as an SSBO. The GPU evaluates this RPN stack per-voxel to compute: which voxels are inside the shape, emission weights, and velocity from emitter motion.


9. Forces

All forces inject velocity into the simulation each timestep. Each force is uploaded as a Force struct to the GPU.

Noise Force

Generates a 3D vector noise field using fractal Brownian motion. Two modes:

  • Plain fBm: Direct noise output
  • Curl noise: ∇ × F of an underlying fBm potential — divergence-free by construction, does not create artificial compression

Parameters: seed, octaves, lacunarity, gain, amplitude, 3D bias vector (directional tendency), animation speed (4D noise).

Point Force

Radial velocity from/toward a point.

  • Falloff: None, Linear, Quadratic, Cubic, or Custom exponent
  • attenuation = pow(saturate((dist - inner) / (outer - inner)), 1/exponent)
  • Can inject divergence (explosive/implosive pressure) at the point
  • Per-force chaos via fBm perturbation

Line Force

Force around an infinite line (or finite segment). Three independent components:

  • Push: along the line axis
  • Twist: rotational around the line (tornado effect)
  • Repel: radial from/toward the line

Falloff is by radial distance from the line.

Toroidal Force

Force around a torus ring — magnetic-field-like donut vortex. Same three components (push, twist, repel) referenced to the closest point on the ring circle.

Vector Field Force

Imports an FGA (Fluid Grid Asset) file — a dense 3D grid of velocity vectors. Trilinearly interpolated at each simulation voxel position.

Vorticles

GPU-simulated point vortices — persistent spinning primitives that inject rotational velocity into the field.

Each vorticle has: position, rotation axis, spin velocity, push velocity, size (radius in voxels), lifetime.

Update per step:

  1. Advected by local simulation velocity
  2. Rotation axis blends toward local vorticity curl: lerp(axis, curl, 1-damping)

Injection: For each nearby voxel:

weight = (1 - dist/size)^ramp
spin_dir = normalize(cross(rotation_axis, delta))
dv = (axis × push + spin_dir × rotation) × intensity × weight

Accumulated via 64-bit fixed-point atomics to avoid fp32 precision issues.

Spawn shapes: fill volume, box, sphere, line, spiral. Orientation: constant, shape-aligned, point-to-center, random.

Shared Force Weighting

All geometric forces share a 4-component per-voxel weighting:

weight = constant + smoke_w × smoke + temp_w × temperature + vel_w × |velocity|

Gated by range clamps on smoke and temperature.

Shape Masks on Forces

Forces accept mask shapes — the force is only active inside the mask SDF region.


10. Colliders

Define solid boundaries preventing flow passage. Uses SDF evaluation per voxel during pressure projection to mark solid cells. Velocity transfer: moving colliders inject their velocity into the fluid at the surface.

Supports: affect Volume only, Particles only, or both. An "ignore collisions" mode keeps the visual shape but bypasses the actual boundary.


11. GPU Particle System

A fully GPU-resident Lagrangian particle system (default pool: 128K particles, 96 bytes each, heavily bit-packed as fp16).

Spawning

Continuous (particles/second with fractional accumulator) or burst. Emission shapes map 1:1 from shape primitives. Can emit from mesh surfaces. Emission conditions: always, velocity above/below threshold.

Initial velocity modes: range (per-axis min/max), cone (direction + spread angle), from-position (radial from center).

Per-Frame Physics

  1. v *= (1 - damping × dt) — velocity damping
  2. v.z -= gravity × mass × physics × dt — gravity
  3. Sample force SSBOs, apply v += force × dt — force nodes
  4. SDF collision: decompose velocity into normal and tangential components:
    bounce = -min(0, dot(normal, normalized_vel)) × normal × |vel|
    slide  = velocity + bounce
    target = bounce × bounciness + slide × sliding + collider_vel × 0.1
    
  5. pos += v × dt — position integration
  6. Simulation coupling: v = lerp(v, sim_velocity, (1 - physics)³) — physics=0 means fully advected by sim, physics=1 means independent
  7. Boundary handling: kill, stop, or bounce at domain edges and floor

Compaction & Sorting

Dead particles are compacted via parallel prefix-sum (incremental, 65K-block, max 64 blocks). Alive particles are radix-sorted by camera depth for correct back-to-front alpha compositing.

Trail/Group System

Particles in groups with two modes:

  • Follow: Each trailing particle follows the one ahead with weighted inertia (ribbon/trail effects)
  • Sphere: Fibonacci sphere distribution around the leader, stretched by velocity direction

Freeze/Unfreeze

Particles can spawn frozen and only activate when: sim velocity exceeds threshold, applied force exceeds threshold, collision occurs, or they enter a mask shape region. Lifetime timer can pause while frozen.

Injection Into Volume

Each particle can write smoke/temperature/fuel/flames/velocity into an accumulation SSBO (atomic-add), which folds into the simulation fields. Injection amounts decay over life via per-property fading rate curves.


12. Lighting & Shadow System

Shadow Volume Computation

All lighting is precomputed into a 3D shadow texture before raymarching (8×8×8 compute tiles). This is the key architectural insight — the raymarch only samples this texture, making per-ray cost low.

For each voxel, the shader raymarches toward the light source through the density volume. The march uses cascaded MIP acceleration with exponentially growing step sizes:

  • First range distance: full-resolution voxel_sampler
  • range×2: mip0_sampler (2× downsampled)
  • range×4: mip1_sampler (4× downsampled)
  • range×8: mip2_sampler (8× downsampled)
  • Beyond: mip3_sampler (16× downsampled)

Step sizes grow by step *= 1.05 each iteration, so distant shadowing is very cheap. Integration uses the trapezoid rule (average of old and new sample) with a ramping blend weight:

integrated_density += multiplier × (old_sample + new_sample) × min(blendweight, 1.0) / 2.0
blendweight += 0.05 / shadowing_intensity

The ramp means the first few samples near the voxel contribute less, then blend weight reaches 1.0 and stays there.

After shadow integration, a separable 3D blur is applied in X, Y, Z passes (cubic falloff weights: pow(1 - abs(i)/radius, 3)). This is what gives the "shadow softness" effect — larger blur = softer, more diffuse shadows.

Extinction color (per-channel RGB): allows colored Beer-Lambert absorption. Max shadow density clamps integrated optical depth.

MOPF Scattering Model

The final per-voxel light color uses Multiple Octave Phase Function (MOPF), a multi-lobe Henyey-Greenstein extension similar to the "Deep Scattering" technique from Wrenninge et al.:

for n in 0..mopf_octaves:
    luminance += b × light_color × HenyeyGreenstein(g×c, cosθ)
                                  × exp(-density × a × extinction_color) × flt
    a *= mopf_attenuation       // extinction per octave
    b *= mopf_contribution      // energy contribution per octave
    c *= (1 - mopf_phase_attenuation)  // phase narrowing per octave

This simulates multiple forward-scattering lobes at different frequencies — approximating the appearance of multiple scattering without tracing multiple rays. The phase function uses a nonstandard pow(cosθ, 1.5) exponent for a sharper forward peak.

Powder effect: flt = saturate(lerp(1, 1-exp(-density×2), powder_intensity)) — darkens dense volumes at the illuminated surface to simulate sub-surface shadowing.

Ambient Light & Volumetric Normal

For ambient light, instead of raymarching toward a light, the shader computes a volumetric normal from density gradients (6 axis-aligned samples at 8-voxel distance), maps it to lat/lon to look up a sky palette color. AO is computed from the same 6 samples via exp(-density × ambient_shadowing).

AO Initialization (3-Scale)

On the initial pass, a 3×3×3 neighborhood is sampled at radii 2, 4, and 8 voxels. Three smoothed density values are combined with configurable low/med/hi frequency weights. Stored in shadow.a for use during raymarching.

Scattering Cache (Mipmap-Based)

Scattering/self-illumination is stored in the .yzw channels of mip0. These are built from density and flames data, then downsampled into successively coarser mips. During raymarching, the scattering contribution is sampled via tricubic interpolation. The ground plane and SDF shapes also sample this emissive cache for receiving scattered light from the volume.

Light Types

  • Directional: Shadow via axis-aligned blur; azimuth/elevation or source/target positioning. Supports light_inherit_sun — blends color toward the atmospheric sky palette color.
  • Point: Attenuation uses 1000 × 2 / (dist² + bulb² + dist × √(dist²+bulb²)) with a physical bulb size parameter. Shadow computed radially.
  • Spot: Same as point with cone angle masking and penumbra falloff ramp.
  • Ambient: Uniform ambient with per-voxel AO. Can inherit sky color from atmospheric model.

Ambient Occlusion (Volumetric, Ray-Traced)

During raymarching, 12 (or 36 in HQ) samples of density are taken in a hemisphere around the current position aligned to camera space:

AOdata += density_at(fixed_jittered_direction_i)
attenuation = SoftMax((1-ao_max), exp(-AOdata × ao_intensity))

Separate AO intensities for emissive light vs. directional light. The SoftMax (log-sum-exp with k=10) prevents hard clipping.

An optional occlusion cache stores the AO result in the 3D shadow texture for reuse.


13. Volume Raymarching

Architecture

All raymarching runs as compute shaders (16×16 tiles). Shader variants are compile-time selected: Combustion vs Colored Smoke, Full vs Lightweight, with/without SOIT.

Accelerator Pre-Pass (Empty Space Skipping)

Before the main raymarch, a conservative pre-pass finds front/back entry points of non-empty volume:

  1. Forward pass: step 1 voxel at a time, sample mip0_adv_sampler, record distance to first non-empty voxel (frontValue)
  2. Backward pass: reverse ray, record distance from back to last non-empty voxel (backValue)
  3. Store (frontValue, backValue) in a 2D R16F accelerator image

The main raymarch reads a 4×4 pixel neighborhood minimum of the accelerator (conservative), skips the empty front section, and clips the empty back. Safety margin of 3 voxels.

Ray-Volume Intersection

Standard slab method against the volume AABB, with dir == 0 edge case handling and camera near-plane smoothstep clipping over the first 8 steps.

Step Size & Dithering

User-controlled step size (minimum 0.1 voxels). Ray starting position is dithered using a 4×4 Bayer matrix pattern (4×2 interleaved for checkerboard mode):

offset = dir × (bayer_value - 0.5) × dithering_intensity

Voxel Sampling Options

  • Hardware trilinear (default)
  • Tricubic: Cubic B-spline via 8 hardware bilinear taps (weights: w0=1/6*(1-f)³, w1=2/3-1/2*f²*(2-f), w2=2/3-1/2*(1-f)²*(1+f), w3=1/6*f³). Note: this is a different cubic than the C0 cubic used in advection.
  • Sharpening: Base sample × 3.0 minus 6 axis-aligned samples × 1.75 each (with random rotation matrix for isotropy), normalized by 13.5. Negative sharpening intensity = blur.
  • Triquadratic: 8-corner quadratic B-spline with q*(q-1)+0.5 basis, averaged

Color Model: Combustion

Voxel data packing: x=smoke, y=temperature, z=fuel, w=flames.

  1. Fire color from temperature gradient lookup (1D texture array — blackbody or custom ramp)
  2. Smoke color from density/temperature ramp
  3. Blend with simplex-style interpolation:
    weight_smoke = smoke / (smoke + fuel)
    weight_fuel = fuel / (smoke + fuel)
    color = smoke×smokeColor×weight_smoke + fuel×fuelColor×weight_fuel + fireColor×fireLuminance
    
  4. Shadow contribution sampled from precomputed 3D shadow texture

Integration: 4th-Order Runge-Kutta

The volumetric rendering integral is solved using RK4, not Euler stepping. For each step, three sample points (entry s1, midpoint s2, exit s3) are evaluated:

f1 = s1 × (1 - accumulated_alpha)
f2 = s2 × (1 - (alpha + 0.5×dt×f1.w))
f3 = s2 × (1 - (alpha + 0.5×dt×f2.w))
f4 = s3 × (1 - (alpha + dt×f3.w))
current_color = (dt/6) × (f1 + 2f2 + 2f3 + f4)

The exit sample s3 is reused as the entry sample s1 for the next step, saving one texture fetch per step.

This same RK4 integration runs in parallel for: color, non-emissive (shadow-only) color, flames, and mask channels.

Empty Space Skipping

When enabled, the shader samples mip0_adv_sampler at each step midpoint. If zero (empty), the step is skipped — only the mask RK4 integration advances.

Checkerboard Rendering

Every other row of pixels is computed per frame (alternating by frame_oddness). Each thread computes pixel x = 2*thread_x + checker where checker = (y + frame_oddness) % 2. Missing pixels are reconstructed in post-processing by comparing a 4-neighbor average against the previous frame's value — whichever is closer to the current pixel is used.

Emissive Masking

Suppresses emissive light (fire glow) in regions of high density, making fire appear to "carve into" smoke rather than glow through it. A parallel RK4 integration of a non-emissive version of the color is maintained. The final color blends toward the non-emissive version based on accumulated density:

mask = 1 - linearstep(reference - width/2, reference + width/2, accumulated_shadow_density)
color = lerp(color, noEmissive, masking_intensity × smoothstep(0, 1, pow(mask, ramp)))

14. SOIT — Order-Independent Transparency (Particles + Volume)

SOIT correctly composites transparent rasterized particles with the raymarched volume. The challenge: particles are rendered by rasterization (back-to-front sorted), volume by raymarching, and they must occlude each other correctly.

Particle Write Phase

During particle rasterization, each fragment maps its depth into the volume's ray parameter space [0,1] and writes to 14 depth bins via additive blending of -log(1-alpha) into MRT buffers. Storing log-transmittance allows correct compositing via simple summation. A [0.25, 0.5, 0.25] smoothing kernel is applied across adjacent bins to reduce banding, then cumulative summation converts to transmittance: occlusion = 1 - exp(-cumulative_log_transmittance).

Volume Read Phase

During raymarching, each step looks up the particle occlusion at the current ray depth using linear interpolation across the 6 depth bins:

in_front = lerp(soit_cache[id], soit_cache[id+1], frac(depth × 13 - 1))
color.rgb *= (1 - in_front)  // attenuate by particle occlusion

Volume Write Phase

The volume writes back its own depth information (6 depth averages at 20% transmittance intervals) so particles know about volume occlusion.

Can run at 1/8th resolution for performance, with stochastic bilinear blending.


15. SDF Shape Rendering

Emitters, colliders, and force shapes are rendered via sphere-traced raymarching.

Pre-Pass

A 2D compute pass marches each screen ray, evaluating emitter/collider/force SDFs. When distance < 1.5, a hit is recorded. Step size uses sphere tracing: stepSize = clamp(minDist - 1, 0, 1) × 2.

Output: depth buffer used as the volume raymarch stopping distance.

Lighting Levels

  • L0 (full): Normal via tetrahedron method (4 SDF evals), shadow texture lookup, emissive + ambient scattering, optional hemispheric AO (24 samples)
  • L1: Albedo only, no shadow
  • L2: Solid black silhouette
  • L3: Shadow difference display

16. Particle Rendering

Rendered as camera-aligned billboards or velocity-stretched quads via instanced draw calls.

Blur Modes

  • None: Circular billboard
  • Symmetric/Asymmetric/Reverse: Velocity-stretched, long axis = velocity direction

Shape

Circular mask: dist = smoothstep(0, 1-sharpness, clamp((0.5-length(uv))×2, 0, 1)). HQ mode super-samples with 8 jittered sub-pixel samples using dFdx/dFdy.

Blend Modes

  • Translucent: (alpha×rgb×pow(dist,roundness), alpha×dist)
  • Additive: (rgb×dist, 0) — no alpha write
  • Translucent_Additive: Combination

Lighting

  • Lit: Samples the 3D shadow texture at particle position; particles can inject occlusion into the shadow map
  • Emissive: Self-illuminating, can inject into volumetric scattering

17. Ground Plane

Infinite horizontal plane with ray/plane intersection.

Patterns

  • Checkerboard: Analytically box-filtered to avoid aliasing at grazing angles
  • Grid: Analytic infinite grid with minor/major/axis lines and Gaussian distance fade
  • Noise: FBM pattern with mip-quality blending
  • Uniform: Solid color

Ground Shadow

Raymarches through the 3D volume toward the light using mip1 (downsampled density) for speed:

attenuation = exp(-integrated_density × ground_shadow)

Ground Emissive Scattering

Receives scattered light from the volume via mip0.yzw (emissive cache) sampled at the ground position projected into volume space. Fades with radial smoothstep from volume center.


18. Skybox / Atmosphere

Three modes:

  • Uniform color
  • Gradient: Linear gradient based on dot(ray_dir, up) between two colors
  • Atmospheric Scattering: Physically-based Rayleigh+Mie model:
    • Planet radius 600km, atmosphere top 670km
    • 5 primary integration steps (exponentially spaced), 3 secondary steps each
    • Rayleigh phase: 0.0597 × (1 + cos²θ)
    • Mie phase: Henyey-Greenstein 0.1194 × (1-g²)(1+cos²θ) / ((2+g²)(1+g²-2g×cosθ)^1.5)
    • Sun disc at ~1.8° angular extent

Sky color stored in a 2-texel palette:

  • sky_palette[0] = sun/directional color
  • sky_palette[1] = ambient sky color

Lights can inherit_sun / inherit_sky — blend toward the atmospheric palette color.


19. Volume Post-Processing

Applied as a 3D compute pass (8×8×8 tiles) before raymarching.

Operations (one active at a time)

  • Motion Blur: Samples density along velocity vector, weighted box filter
  • Sharpen: Unsharp mask per channel (negative = soften)
  • Dilate: Morphological dilation/erosion with softness control

Border Mask

Planar clip on each face (±X/Y/Z) with offset, width, and ramp exponent for cross-section rendering.

Modulation System (4 slots)

Each slot maps a source field → target field via an operator:

  • Sources: smoke, temperature, fuel, flames, velocity, vorticity, height, SDF, noise (4D simplex fBm)
  • Targets: smoke, temperature, fuel, flames, color
  • Operators: replace, add, multiply, min, max
  • Source/target ranges independently remappable via Bezier curves
  • Optional mask source gating

20. Post-Processing Pipeline

Pass 1 (Primary)

  1. Checkerboard reconstruction: Fetch 4 neighbors, compute average and delta vs current. Use whichever (average or previous frame) has smaller delta. Applied only to checker-pattern pixels based on (x + y + frame_oddness) % 2.

  2. Bloom composite: 6-level cascaded downsample, each level blurred with 7-tap Gaussian kernel [0.0205, 0.0855, 0.232, 0.324, 0.232, 0.0855, 0.0205] in separable H/V passes. Composited with per-level weights: [0.1, 0.2, 0.3, 0.5, 0.7, 1.0] × radius_blend. Final: color += bloom × 0.2 × bloom_intensity.

  3. Exposure: color.rgb *= exposure

  4. Tonemapping (pre-tonemap scale varies by operator):

    • Linear: clamp(color × 0.1, 0, 1)
    • ACES (cheap): (x×(2.51x+0.03)) / (x×(2.43x+0.59)+0.14) with 0.1× pre-scale
    • ACES (fitted): Full matrix transform + RRT/ODT with 0.2× pre-scale
    • Hable: Filmic curve with 0.4× pre-scale
    • Reinhard: x/(1+x) with 0.2× pre-scale
    • Lottes: a=1.6, d=0.977 with 0.1× pre-scale
    • Uchimura: 6-segment piecewise with 0.2× pre-scale
  5. Vibrance: rgb = lerp(luma, rgb, 1 + vibrance × (1 - sign(vibrance) × saturation_amount))

  6. Contrast + brightness: rgb = (rgb - 0.5) × (1 + contrast) + 0.5 + brightness

  7. Saturation: rgb = max(0, lerp(grey, rgb, saturation + 1.0)) where grey = 0.299r + 0.587g + 0.114b

  8. Gamma: color = pow(color, 1.0 / gamma)

Pass 2 (Secondary/Stylistic)

  • Pixelization with Bayer dithering
  • Quantization per RGB or per H/S/V independently
  • Vignette via rounded rectangle SDF
  • Palette remap (greyscale-to-gradient, nearest-palette-color brute force search)
  • Dithering patterns: Bayer 2×2/4×4, hatching, dots

Stylistic Filters

Filter Algorithm
Kuwahara Anisotropic Kuwahara (painterly blobs)
FDoG / FXDoG Flow-based Difference of Gaussians (line art)
AKF Anisotropic Kuwahara (Gaussian-based)
SST/TFM Structure tensor / tangent flow map
LIC Line Integral Convolution
Abstraction Image abstraction
Chromatic Aberration Radial RGB shift

FXAA

Standard Nvidia FXAA (Preset 5) with smart edge triggering — only runs at SDF/volume boundary transitions and large depth discontinuities, not uniformly.


21. Render Layers

27 output layer types:

Category Layers
Combined Full render, full render + shapes alpha
Light channels Key (directional), Ambient, Emissive, Scattering
Alpha Volume alpha, alpha-only
Data Depth, Motion vectors
Components Smoke only, Flames only, Temperature, Albedo
Six-point Top+Left+Right, Bottom+Back+Front (for game engine lightmaps)
Normals Derivative normal, six-point normal, assembled normal
Debug Shapes only

Six-point lighting renders the scene 6× with a virtual directional light from each axis direction. Normal maps are reconstructed from the six lighting passes:

normal.r = 0.5 + right×0.5 - left×0.5
normal.g = 0.5 + up×0.5 - down×0.5
normal.b = 0.5 + sqrt(1 - dot(normal.xy, normal.xy)) × 0.5

Derivative normals use multi-scale density gradients at 3 scales (1, 4, 24 voxels) along camera axes.


22. Export

VDB Export

Exports OpenVDB .vdb files per-frame. Channels: density, temperature, fuel, flames, velocity (as 3 scalar grids or 1 vector grid). Supports sparse representation — voxels below threshold are not written.

Value remapping option: linear remap from user-defined [min, max] range, with per-channel thresholds for sparsity. Coordinate system conversion via axis transposition and Y-flip:

System Transpose Flip Y
Y-up left-handed (0,2,1) no
Y-up right-handed (0,2,1) yes
Z-up left-handed (0,1,2) yes
Z-up right-handed (0,1,2) no

Import node inverse transform can be applied to VDB grid metadata for DCC alignment.

Readback: GPU textures → CPU via async PBO readback → float32 → OpenVDB.

Particle Export

Exports GPU particles to Alembic (.abc). Attributes: position (always), colors, densities, lifetimes/ages, display sizes (all optional with custom names). Single-file (all frames in one archive) or per-frame files.

Image Export

Triple-buffered PBOs for async GPU→CPU readback. Export passes: gamma compress, unmultiply alpha (straight alpha output), expand borders (fill zero-alpha pixels with neighbor average to prevent fringe artifacts), defringe.


23. Simulation Looping

Enables seamlessly looping simulation by blending two concurrent simulation states:

  1. At the capture point, the current state (S0) is copied to a secondary state (S1)
  2. Both S0 and S1 run in parallel (S1 uses separate Velocity_Sync/Advection_Sync textures)
  3. During the blend window: output = lerp(S0, S1, weight)
  4. At loop end, S1 becomes the new S0

Blend curves: Linear, Smoothstep, Quadratic Smoothstep, Quadratic, Cubic, Quintic.

Two blend modes: Dynamic (mixes live running fields) and Static (mixes a cached snapshot with live).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment