Get this paper and then walk me through how PyMC would be used to get a principled sampling procedure. https://ceur-ws.org/Vol-4090/paper9.pdf
This is a fascinating paper by Ian Horswill about generating demographically-plausible NPCs (non-player characters) for a tabletop RPG set in present-day San Francisco. The system uses stochastic logic programming to sample character features (name, ethnicity, gender, height, weight, hair color, eye color, age) from a Bayes net that encodes real-world demographic statistics from US Census data and other sources.
The key sampling technique used is rejection sampling - generate candidates according to prior distributions, then reject any that don't match user-specified constraints.
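As a rough illustration of that loop, here is a minimal sketch in plain Python. The two small tables are hypothetical toy numbers, not the paper's data, and the helper names (`weighted_choice`, `sample_prior`, `rejection_sample`) are invented for this example.

```python
import random

# Hypothetical toy tables (NOT the paper's data): P(ethnicity) and P(eye | ethnicity).
P_ETHNICITY = {"white": 0.4, "other": 0.6}
P_EYE = {
    "white": {"brown": 0.6, "blue": 0.4},
    "other": {"brown": 1.0},
}

def weighted_choice(dist, rng):
    """Draw one key from a {value: probability} dict."""
    values = list(dist)
    weights = [dist[v] for v in values]
    return rng.choices(values, weights=weights, k=1)[0]

def sample_prior(rng):
    """Forward-sample one character: parents first, then children."""
    ethnicity = weighted_choice(P_ETHNICITY, rng)
    eye = weighted_choice(P_EYE[ethnicity], rng)
    return {"ethnicity": ethnicity, "eye": eye}

def rejection_sample(constraints, rng, max_attempts=10_000):
    """Return (sample, attempts); sample is None if every attempt was rejected."""
    for attempt in range(1, max_attempts + 1):
        candidate = sample_prior(rng)
        if all(candidate[key] == value for key, value in constraints.items()):
            return candidate, attempt
    return None, max_attempts

rng = random.Random(0)
npc, attempts = rejection_sample({"eye": "blue"}, rng)
print(npc, attempts)
```

Every accepted sample is an exact draw from the conditional distribution; the cost is that the number of attempts grows as the constraint gets rarer.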
The paper's rejection sampling approach has known limitations: it is inefficient for low-probability constraints, and it cannot condition on an exact value of a continuous variable. PyMC offers several advantages for this kind of problem. Let me walk you through how you'd implement this:
PyMC lets you directly encode the Bayes net from Figure 1:
import numpy as np
import pymc as pm
import pytensor.tensor as pt

with pm.Model() as npc_model:
    # Root nodes (priors)
    # Demographic group: categorical prior over
    # 0 = CA resident, 1 = SFPD, 2 = STEM, 3 = unhoused
    group = pm.Categorical('group', p=[0.7, 0.1, 0.15, 0.05])

    # Age: could be uniform or use demographic data
    age = pm.Uniform('age', lower=18, upper=85)

    # Ethnicity conditioned on demographic group.
    # P(ethnicity | group) is a table with one row per group; indexing it by
    # `group` selects the matching row. (pm.math.switch only picks between two
    # branches, so it cannot select among four groups.)
    ethnicity_table = pt.as_tensor_variable(np.array([
        [0.37, 0.06, 0.15, 0.39, 0.01, 0.02],  # CA resident
        [0.45, 0.10, 0.20, 0.20, 0.02, 0.03],  # SFPD (example)
        # ... other groups (one row per group is required)
    ]))
    ethnicity = pm.Categorical('ethnicity', p=ethnicity_table[group])

    # Gender: a single marginal here for brevity; a per-group vector indexed by
    # `group` would make this P(gender | group)
    gender = pm.Bernoulli('gender', p=0.49)  # 0=female, 1=male

    # Height conditioned on age, gender, ethnicity (CDC data)
    # Using truncated normal with parameters from CDC percentiles
    height_mu = pm.math.switch(gender, 175.0, 162.0)  # cm, simplified to gender only
    height_sigma = 7.0
    height = pm.TruncatedNormal('height', mu=height_mu, sigma=height_sigma,
                                lower=140, upper=210)

    # BMI conditioned on age, gender, ethnicity (simplified to one distribution)
    bmi = pm.TruncatedNormal('bmi', mu=26, sigma=5, lower=15, upper=50)

    # Weight (kg) is deterministic given height (cm) and BMI
    weight = pm.Deterministic('weight', 0.0001 * bmi * height**2)

    # The get_*_probs helpers below are placeholders you would write yourself;
    # each returns a probability vector for the given parent values.

    # Eye color conditioned on ethnicity
    # Gating certain colors for non-white ethnicities
    eye_color = pm.Categorical('eye_color',
                               p=get_eye_probs(ethnicity))

    # Hair color similarly
    hair_color = pm.Categorical('hair_color',
                                p=get_hair_probs(ethnicity))

    # Names conditioned on ethnicity and gender
    # This would be a large categorical over name indices
    first_name = pm.Categorical('first_name',
                                p=get_first_name_probs(ethnicity, gender))
    last_name = pm.Categorical('last_name',
                               p=get_last_name_probs(ethnicity))

The key advantage of PyMC: instead of rejection sampling, you can condition on observed values directly:
with npc_model:
    # User wants a blue-eyed, 190 cm tall character
    # For discrete: use pm.Potential to add 0 to the log-probability when the
    # constraint holds and -inf otherwise
    # (BLUE_INDEX is the position of "blue" in the eye-color table)
    pm.Potential('eye_constraint',
                 pm.math.switch(pm.math.eq(eye_color, BLUE_INDEX), 0.0, -np.inf))
    # For continuous: add a tight pseudo-observation centered on the latent height
    height_obs = pm.Normal('height_obs', mu=height, sigma=0.1, observed=190)

This is the crucial improvement:
with npc_model:
    # PyMC assigns NUTS (No-U-Turn Sampler) to continuous variables
    # and Metropolis-family step methods to discrete variables
    trace = pm.sample(
        draws=1000,
        tune=500,
        cores=4,
        return_inferencedata=True
    )

# Get a single sample for an NPC
sample = trace.posterior.isel(chain=0, draw=np.random.randint(1000))

| Aspect | Paper's Rejection Sampling | PyMC's MCMC |
|---|---|---|
| Efficiency | O(1/p) where p is probability of constraints | Adapts to target distribution |
| Continuous variables | Essentially impossible to condition on exact values | Handles naturally via gradients |
| Complex constraints | Must restart from beginning | Explores constrained space efficiently |
| Inference | Only forward sampling | Can do both prior and posterior inference |
| Correlated features | Hard to capture covariance (paper notes hair+eye color issue) | Naturally captures via joint posterior |
The paper explicitly notes that conditioning on exact floating-point values is "effectively impossible" with rejection sampling. PyMC solves this elegantly:
with pm.Model() as constrained_model:
    # ... define model ...
    # Condition on height being ~190cm (with small tolerance)
    pm.Normal('height_constraint', mu=height, sigma=1.0, observed=190)
    # Or use a potential for a hard constraint
    pm.Potential('height_range',
                 pm.math.switch((height > 185) & (height < 195), 0, -np.inf))

If you need fast sampling (not MCMC's warmup time), you can:
- Pre-compute samples for common constraint combinations
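- Use variational inference for approximate but fast posteriors (ADVI needs every free variable to be continuous, so the discrete nodes would have to be marginalized out or fixed first):
with npc_model:
    approx = pm.fit(n=10000, method='advi')
    # Now sampling is instant:
    samples = approx.sample(100)
- Use PyMC's prior predictive for unconditional generation (fast):
with npc_model:
    # The draw count is passed positionally because its keyword name has changed between PyMC releases
    prior_samples = pm.sample_prior_predictive(1000)

The paper laments that hair and eye color are treated as independent. In PyMC, you can model their correlation: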
with pm.Model():
    # Joint distribution over (hair, eye) with correlation
    # Using a multivariate approach or copula
    hair_eye_joint = pm.Categorical('hair_eye_joint',
                                    p=joint_hair_eye_probs)  # 2D table flattened
    # Or model with latent "pigmentation" variable
    pigmentation = pm.Beta('pigmentation', alpha=2, beta=2)
    hair_dark_prob = pigmentation
    eye_dark_prob = 0.8 * pigmentation + 0.1  # correlated
    hair_is_dark = pm.Bernoulli('hair_dark', p=hair_dark_prob)
    eye_is_dark = pm.Bernoulli('eye_dark', p=eye_dark_prob)

PyMC would improve on the paper's approach by:
- Proper conditioning on continuous variables (height, BMI, weight)
- Efficient sampling even for low-probability constraints via MCMC
- Capturing correlations between features that the paper treats as independent
- Principled uncertainty quantification - you get a full posterior, not just point samples
- Flexibility - easy to add new features or change the model structure
The trade-off is that MCMC has startup costs: compiling the model and tuning the sampler usually take seconds rather than milliseconds. For a real-time game you'd want to use variational inference or pre-compute samples. For the paper's use case (tabletop RPG game master tool where a few hundred milliseconds is acceptable), PyMC could work and be more statistically rigorous, provided the model is compiled ahead of time or samples are cached.
Uh, I don't have Python, so just do HTML and JS to give me an example of this idea in a way that runs right here.
Here's a complete HTML/JS implementation that demonstrates the Bayesian NPC generator concept with proper conditioning:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Bayesian NPC Generator</title>
<style>
* {
box-sizing: border-box;
}
body {
font-family: 'Segoe UI', system-ui, sans-serif;
max-width: 900px;
margin: 0 auto;
padding: 20px;
background: #1a1a2e;
color: #eee;
line-height: 1.6;
}
h1 {
color: #7b68ee;
border-bottom: 2px solid #7b68ee;
padding-bottom: 10px;
}
h2 {
color: #9370db;
margin-top: 30px;
}
.controls {
background: #16213e;
padding: 20px;
border-radius: 10px;
margin-bottom: 20px;
}
.control-group {
display: flex;
flex-wrap: wrap;
gap: 15px;
margin-bottom: 15px;
}
.control-item {
flex: 1;
min-width: 150px;
}
label {
display: block;
margin-bottom: 5px;
color: #aaa;
font-size: 0.9em;
}
select, input {
width: 100%;
padding: 8px 12px;
border: 1px solid #444;
border-radius: 5px;
background: #0f0f23;
color: #eee;
font-size: 1em;
}
select:focus, input:focus {
outline: none;
border-color: #7b68ee;
}
button {
background: #7b68ee;
color: white;
border: none;
padding: 12px 30px;
border-radius: 5px;
cursor: pointer;
font-size: 1em;
margin-right: 10px;
margin-top: 10px;
transition: background 0.2s;
}
button:hover {
background: #6a5acd;
}
button.secondary {
background: #444;
}
button.secondary:hover {
background: #555;
}
.npc-card {
background: linear-gradient(135deg, #1e3a5f 0%, #16213e 100%);
border: 1px solid #2a4a7f;
border-radius: 10px;
padding: 25px;
margin-bottom: 20px;
}
.npc-name {
font-size: 1.8em;
font-weight: bold;
color: #7b68ee;
margin-bottom: 5px;
}
.npc-demo {
color: #aaa;
margin-bottom: 15px;
font-style: italic;
}
.npc-stats {
display: grid;
grid-template-columns: repeat(auto-fit, minmax(200px, 1fr));
gap: 10px;
}
.stat {
background: rgba(0,0,0,0.2);
padding: 10px 15px;
border-radius: 5px;
}
.stat-label {
color: #888;
font-size: 0.85em;
}
.stat-value {
color: #fff;
font-weight: 500;
}
.stats-panel {
background: #16213e;
padding: 20px;
border-radius: 10px;
margin-top: 20px;
}
.stats-grid {
display: grid;
grid-template-columns: repeat(auto-fit, minmax(250px, 1fr));
gap: 15px;
}
.stat-box {
background: rgba(0,0,0,0.2);
padding: 15px;
border-radius: 5px;
}
.stat-box h4 {
margin: 0 0 10px 0;
color: #7b68ee;
}
.bar-chart {
margin-top: 10px;
}
.bar-row {
display: flex;
align-items: center;
margin-bottom: 5px;
font-size: 0.85em;
}
.bar-label {
width: 100px;
flex-shrink: 0;
}
.bar-container {
flex: 1;
height: 18px;
background: rgba(0,0,0,0.3);
border-radius: 3px;
overflow: hidden;
}
.bar-fill {
height: 100%;
background: linear-gradient(90deg, #7b68ee, #9370db);
transition: width 0.3s;
}
.bar-value {
width: 50px;
text-align: right;
margin-left: 10px;
color: #aaa;
}
.method-comparison {
display: grid;
grid-template-columns: 1fr 1fr;
gap: 20px;
margin-top: 20px;
}
@media (max-width: 600px) {
.method-comparison {
grid-template-columns: 1fr;
}
}
.method-box {
background: #16213e;
padding: 20px;
border-radius: 10px;
}
.method-box h3 {
margin-top: 0;
color: #7b68ee;
}
.attempt-counter {
font-size: 2em;
color: #7b68ee;
text-align: center;
padding: 20px;
}
.info-box {
background: rgba(123, 104, 238, 0.1);
border-left: 3px solid #7b68ee;
padding: 15px;
margin: 20px 0;
border-radius: 0 5px 5px 0;
}
code {
background: rgba(0,0,0,0.3);
padding: 2px 6px;
border-radius: 3px;
font-family: 'Consolas', monospace;
}
</style>
</head>
<body>
<h1>🎲 Bayesian NPC Generator</h1>
<div class="info-box">
<strong>About:</strong> This demonstrates the difference between <em>rejection sampling</em>
(as used in the paper) and <em>MCMC-style sampling</em> (as PyMC would use) for generating
demographically-plausible characters. Set constraints below and see how each method handles them.
</div>
<div class="controls">
<h3>🎯 Constraints (leave as "Any" for no constraint)</h3>
<div class="control-group">
<div class="control-item">
<label>Demographic Group</label>
<select id="group">
<option value="">Any</option>
<option value="ca_resident">CA Resident</option>
<option value="sfpd">SF Police</option>
<option value="stem">STEM Worker</option>
<option value="unhoused">Unhoused</option>
</select>
</div>
<div class="control-item">
<label>Gender</label>
<select id="gender">
<option value="">Any</option>
<option value="male">Male</option>
<option value="female">Female</option>
</select>
</div>
<div class="control-item">
<label>Ethnicity</label>
<select id="ethnicity">
<option value="">Any</option>
<option value="white">White</option>
<option value="black">Black</option>
<option value="hispanic">Hispanic/Latino</option>
<option value="asian">Asian</option>
</select>
</div>
</div>
<div class="control-group">
<div class="control-item">
<label>Eye Color</label>
<select id="eyes">
<option value="">Any</option>
<option value="brown">Brown</option>
<option value="blue">Blue</option>
<option value="green">Green</option>
<option value="hazel">Hazel</option>
<option value="grey">Grey</option>
</select>
</div>
<div class="control-item">
<label>Hair Color</label>
<select id="hair">
<option value="">Any</option>
<option value="black">Black</option>
<option value="brown">Brown</option>
<option value="blonde">Blonde</option>
<option value="red">Red</option>
</select>
</div>
<div class="control-item">
<label>Min Height (cm)</label>
<input type="number" id="minHeight" placeholder="e.g., 180">
</div>
<div class="control-item">
<label>Max BMI</label>
<input type="number" id="maxBmi" placeholder="e.g., 25">
</div>
</div>
<div>
<button onclick="generateNPC()">🎲 Generate NPC</button>
<button onclick="compareMethod()" class="secondary">Compare Methods (100 samples)</button>
<button onclick="resetConstraints()" class="secondary">Reset</button>
</div>
</div>
<div id="npcResult"></div>
<div id="comparison"></div>
<h2>Current Probability Distributions</h2>
<div class="stats-panel">
<div class="stats-grid" id="distributions"></div>
</div>
<script>
// ============================================
// DEMOGRAPHIC DATA (simplified from paper)
// ============================================
const DEMOGRAPHICS = {
// P(ethnicity | group) - from paper's sources
ethnicity: {
ca_resident: { white: 0.37, hispanic: 0.39, asian: 0.15, black: 0.06 },
sfpd: { white: 0.45, hispanic: 0.20, asian: 0.25, black: 0.10 },
stem: { white: 0.63, hispanic: 0.08, asian: 0.21, black: 0.07 },
unhoused: { white: 0.35, hispanic: 0.22, asian: 0.06, black: 0.35 }
},
// P(gender | group)
gender: {
ca_resident: { male: 0.49, female: 0.51 },
sfpd: { male: 0.82, female: 0.18 },
stem: { male: 0.73, female: 0.27 },
unhoused: { male: 0.70, female: 0.30 }
},
// P(eye_color | ethnicity) - with gating from paper
eyeColor: {
white: { brown: 0.45, blue: 0.27, grey: 0.10, hazel: 0.09, green: 0.09 },
black: { brown: 0.85, hazel: 0.10, grey: 0.05 },
hispanic: { brown: 0.80, hazel: 0.12, grey: 0.08 },
asian: { brown: 0.90, hazel: 0.07, grey: 0.03 }
},
// P(hair_color | ethnicity) - with gating from paper
hairColor: {
white: { brown: 0.45, black: 0.15, blonde: 0.25, red: 0.15 },
black: { black: 0.85, brown: 0.15 },
hispanic: { black: 0.70, brown: 0.30 },
asian: { black: 0.90, brown: 0.10 }
},
// Height parameters (mean, std) by gender in cm
height: {
male: { mean: 175, std: 7 },
female: { mean: 162, std: 6.5 }
},
// BMI parameters by gender
bmi: {
male: { mean: 26.5, std: 5 },
female: { mean: 26.0, std: 5.5 }
}
};
// Name data (simplified)
const NAMES = {
firstName: {
male: {
white: ['James', 'John', 'Robert', 'Michael', 'William', 'David', 'Joseph', 'Thomas', 'Daniel', 'Matthew', 'Donald', 'Steven', 'Brian', 'Kevin', 'Eric'],
black: ['Michael', 'James', 'Anthony', 'Marcus', 'Terrence', 'Darnell', 'Jamal', 'DeShawn', 'Malik', 'Andre', 'Tyrone', 'Lamar', 'Cedric', 'Darius', 'Jerome'],
hispanic: ['Jose', 'Carlos', 'Miguel', 'Juan', 'Luis', 'Jorge', 'Ricardo', 'Eduardo', 'Francisco', 'Antonio', 'Manuel', 'Rafael', 'Diego', 'Alejandro', 'Fernando'],
asian: ['David', 'Michael', 'Kevin', 'Jason', 'Brian', 'William', 'Daniel', 'Andrew', 'James', 'Ryan', 'Wei', 'Chen', 'Hiroshi', 'Kenji', 'Min']
},
female: {
white: ['Mary', 'Patricia', 'Jennifer', 'Linda', 'Elizabeth', 'Susan', 'Jessica', 'Sarah', 'Karen', 'Nancy', 'Emily', 'Amanda', 'Megan', 'Lauren', 'Ashley'],
black: ['Michelle', 'Tiffany', 'Crystal', 'Ebony', 'Jasmine', 'Latoya', 'Keisha', 'Tamika', 'Aaliyah', 'Destiny', 'Shaniqua', 'Tanisha', 'Alicia', 'Briana', 'Imani'],
hispanic: ['Maria', 'Carmen', 'Rosa', 'Guadalupe', 'Ana', 'Sofia', 'Isabella', 'Valentina', 'Camila', 'Lucia', 'Elena', 'Gabriela', 'Mariana', 'Daniela', 'Paula'],
asian: ['Jennifer', 'Michelle', 'Jessica', 'Lisa', 'Amy', 'Christine', 'Angela', 'Stephanie', 'Emily', 'Grace', 'Mei', 'Yuki', 'Sakura', 'Lin', 'Anh']
}
},
lastName: {
white: ['Smith', 'Johnson', 'Williams', 'Brown', 'Jones', 'Miller', 'Davis', 'Wilson', 'Anderson', 'Taylor', 'Thomas', 'Moore', 'Martin', 'Thompson', 'White', 'Harris', 'Clark', 'Lewis', 'Robinson', 'Walker', 'Hall', 'Young', 'King', 'Wright', 'Hill', 'Scott', 'Adams', 'Baker', 'Nelson', 'Carter', "O'Brien", "O'Connor", 'Murphy', 'Kelly', 'Sullivan', 'Schmidt', 'Mueller', 'Weber', 'Meyer', 'Wagner'],
black: ['Williams', 'Johnson', 'Smith', 'Jones', 'Brown', 'Jackson', 'Davis', 'Thomas', 'Harris', 'Robinson', 'Taylor', 'Moore', 'Martin', 'Anderson', 'Thompson', 'White', 'Washington', 'Jefferson', 'Freeman', 'Banks'],
hispanic: ['Garcia', 'Rodriguez', 'Martinez', 'Hernandez', 'Lopez', 'Gonzalez', 'Perez', 'Sanchez', 'Ramirez', 'Torres', 'Flores', 'Rivera', 'Gomez', 'Diaz', 'Reyes', 'Morales', 'Cruz', 'Ortiz', 'Gutierrez', 'Chavez'],
asian: ['Wang', 'Li', 'Zhang', 'Liu', 'Chen', 'Yang', 'Huang', 'Wu', 'Kim', 'Park', 'Lee', 'Choi', 'Nguyen', 'Tran', 'Pham', 'Tanaka', 'Yamamoto', 'Suzuki', 'Patel', 'Shah']
}
};
// ============================================
// SAMPLING UTILITIES
// ============================================
function weightedChoice(distribution) {
const entries = Object.entries(distribution);
const total = entries.reduce((sum, [_, p]) => sum + p, 0);
let r = Math.random() * total;
for (const [value, prob] of entries) {
r -= prob;
if (r <= 0) return value;
}
return entries[entries.length - 1][0];
}
function randomChoice(arr) {
return arr[Math.floor(Math.random() * arr.length)];
}
function normalRandom(mean, std) {
// Box-Muller transform
const u1 = Math.random();
const u2 = Math.random();
const z = Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
return mean + z * std;
}
function truncatedNormal(mean, std, min, max) {
let value;
do {
value = normalRandom(mean, std);
} while (value < min || value > max);
return value;
}
// ============================================
// REJECTION SAMPLING (Paper's approach)
// ============================================
function generateUnconstrainedNPC(group) {
group = group || 'ca_resident';
// Sample from prior distributions following the Bayes net
const ethnicity = weightedChoice(DEMOGRAPHICS.ethnicity[group]);
const gender = weightedChoice(DEMOGRAPHICS.gender[group]);
const eyeColor = weightedChoice(DEMOGRAPHICS.eyeColor[ethnicity]);
const hairColor = weightedChoice(DEMOGRAPHICS.hairColor[ethnicity]);
const heightParams = DEMOGRAPHICS.height[gender];
const height = truncatedNormal(heightParams.mean, heightParams.std, 140, 210);
const bmiParams = DEMOGRAPHICS.bmi[gender];
const bmi = truncatedNormal(bmiParams.mean, bmiParams.std, 16, 45);
const weight = 0.0001 * bmi * height * height;
const age = Math.floor(18 + Math.random() * 62);
const firstName = randomChoice(NAMES.firstName[gender][ethnicity]);
const lastName = randomChoice(NAMES.lastName[ethnicity]);
return {
group, ethnicity, gender, eyeColor, hairColor,
height, bmi, weight, age, firstName, lastName
};
}
function rejectionSample(constraints, maxAttempts = 10000) {
let attempts = 0;
while (attempts < maxAttempts) {
attempts++;
const npc = generateUnconstrainedNPC(constraints.group || null);
// Check all constraints
let valid = true;
if (constraints.gender && npc.gender !== constraints.gender) valid = false;
if (constraints.ethnicity && npc.ethnicity !== constraints.ethnicity) valid = false;
if (constraints.eyes && npc.eyeColor !== constraints.eyes) valid = false;
if (constraints.hair && npc.hairColor !== constraints.hair) valid = false;
if (constraints.minHeight && npc.height < constraints.minHeight) valid = false;
if (constraints.maxBmi && npc.bmi > constraints.maxBmi) valid = false;
if (valid) {
return { npc, attempts };
}
}
return { npc: null, attempts };
}
// ============================================
// MCMC-STYLE SAMPLING (PyMC approach)
// ============================================
// Compute posterior distribution given constraints
function computePosterior(constraints) {
const group = constraints.group || 'ca_resident';
// Start with priors
let ethnicityDist = { ...DEMOGRAPHICS.ethnicity[group] };
let genderDist = { ...DEMOGRAPHICS.gender[group] };
// If ethnicity is constrained, set it deterministically
if (constraints.ethnicity) {
ethnicityDist = { [constraints.ethnicity]: 1.0 };
}
// If gender is constrained, set it deterministically
if (constraints.gender) {
genderDist = { [constraints.gender]: 1.0 };
}
// If eye color is constrained, update ethnicity posterior
// P(ethnicity | eye_color) ∝ P(eye_color | ethnicity) * P(ethnicity)
if (constraints.eyes && !constraints.ethnicity) {
const newEthDist = {};
let total = 0;
for (const [eth, prior] of Object.entries(ethnicityDist)) {
const eyeDist = DEMOGRAPHICS.eyeColor[eth];
const likelihood = eyeDist[constraints.eyes] || 0;
newEthDist[eth] = likelihood * prior;
total += newEthDist[eth];
}
// Normalize
if (total > 0) {
for (const eth of Object.keys(newEthDist)) {
newEthDist[eth] /= total;
}
ethnicityDist = newEthDist;
}
}
// Similar update for hair color constraint
if (constraints.hair && !constraints.ethnicity) {
const newEthDist = {};
let total = 0;
for (const [eth, prior] of Object.entries(ethnicityDist)) {
const hairDist = DEMOGRAPHICS.hairColor[eth];
const likelihood = hairDist[constraints.hair] || 0;
newEthDist[eth] = likelihood * prior;
total += newEthDist[eth];
}
if (total > 0) {
for (const eth of Object.keys(newEthDist)) {
newEthDist[eth] /= total;
}
ethnicityDist = newEthDist;
}
}
// Height constraint affects gender posterior
// Taller people more likely male
if (constraints.minHeight && !constraints.gender) {
const newGenderDist = {};
let total = 0;
for (const [g, prior] of Object.entries(genderDist)) {
const params = DEMOGRAPHICS.height[g];
// P(height > minHeight | gender) using normal CDF approximation
const z = (constraints.minHeight - params.mean) / params.std;
const likelihood = 1 - normalCDF(z);
newGenderDist[g] = likelihood * prior;
total += newGenderDist[g];
}
if (total > 0) {
for (const g of Object.keys(newGenderDist)) {
newGenderDist[g] /= total;
}
genderDist = newGenderDist;
}
}
return { ethnicityDist, genderDist, group };
}
function normalCDF(z) {
// Approximation of standard normal CDF
const a1 = 0.254829592;
const a2 = -0.284496736;
const a3 = 1.421413741;
const a4 = -1.453152027;
const a5 = 1.061405429;
const p = 0.3275911;
const sign = z < 0 ? -1 : 1;
z = Math.abs(z) / Math.sqrt(2);
const t = 1.0 / (1.0 + p * z);
const y = 1.0 - (((((a5 * t + a4) * t) + a3) * t + a2) * t + a1) * t * Math.exp(-z * z);
return 0.5 * (1.0 + sign * y);
}
function mcmcSample(constraints) {
const posterior = computePosterior(constraints);
// Sample from posterior distributions
const ethnicity = constraints.ethnicity || weightedChoice(posterior.ethnicityDist);
const gender = constraints.gender || weightedChoice(posterior.genderDist);
// Sample eye color given ethnicity (and constraint if any)
let eyeColor;
if (constraints.eyes) {
eyeColor = constraints.eyes;
} else {
eyeColor = weightedChoice(DEMOGRAPHICS.eyeColor[ethnicity]);
}
// Sample hair color given ethnicity (and constraint if any)
let hairColor;
if (constraints.hair) {
hairColor = constraints.hair;
} else {
hairColor = weightedChoice(DEMOGRAPHICS.hairColor[ethnicity]);
}
// Sample height given gender (with constraint if any)
const heightParams = DEMOGRAPHICS.height[gender];
let height;
if (constraints.minHeight) {
// Truncated normal above minHeight
height = truncatedNormal(heightParams.mean, heightParams.std, constraints.minHeight, 210);
} else {
height = truncatedNormal(heightParams.mean, heightParams.std, 140, 210);
}
// Sample BMI given gender (with constraint if any)
const bmiParams = DEMOGRAPHICS.bmi[gender];
let bmi;
if (constraints.maxBmi) {
bmi = truncatedNormal(bmiParams.mean, bmiParams.std, 16, constraints.maxBmi);
} else {
bmi = truncatedNormal(bmiParams.mean, bmiParams.std, 16, 45);
}
const weight = 0.0001 * bmi * height * height;
const age = Math.floor(18 + Math.random() * 62);
const firstName = randomChoice(NAMES.firstName[gender][ethnicity]);
const lastName = randomChoice(NAMES.lastName[ethnicity]);
return {
npc: {
group: posterior.group,
ethnicity, gender, eyeColor, hairColor,
height, bmi, weight, age, firstName, lastName
},
attempts: 1,
posterior
};
}
// ============================================
// UI FUNCTIONS
// ============================================
function getConstraints() {
return {
group: document.getElementById('group').value || null,
gender: document.getElementById('gender').value || null,
ethnicity: document.getElementById('ethnicity').value || null,
eyes: document.getElementById('eyes').value || null,
hair: document.getElementById('hair').value || null,
minHeight: parseFloat(document.getElementById('minHeight').value) || null,
maxBmi: parseFloat(document.getElementById('maxBmi').value) || null
};
}
function formatHeight(cm) {
const inches = cm / 2.54;
const feet = Math.floor(inches / 12);
const remainingInches = Math.round(inches % 12);
return `${feet}'${remainingInches}" (${Math.round(cm)} cm)`;
}
function formatWeight(kg) {
const lbs = kg * 2.205;
return `${Math.round(lbs)} lbs (${Math.round(kg)} kg)`;
}
function renderNPC(npc, attempts, method) {
if (!npc) {
return `
<div class="npc-card">
<div class="npc-name">❌ Failed to generate</div>
<div class="npc-demo">Could not find valid NPC after ${attempts} attempts</div>
<p>Try relaxing some constraints!</p>
</div>
`;
}
return `
<div class="npc-card">
<div class="npc-name">${npc.firstName} ${npc.lastName}</div>
<div class="npc-demo">${npc.ethnicity} ${npc.gender}, Age ${npc.age}</div>
<div class="npc-stats">
<div class="stat">
<div class="stat-label">Height</div>
<div class="stat-value">${formatHeight(npc.height)}</div>
</div>
<div class="stat">
<div class="stat-label">Weight</div>
<div class="stat-value">${formatWeight(npc.weight)}</div>
</div>
<div class="stat">
<div class="stat-label">BMI</div>
<div class="stat-value">${npc.bmi.toFixed(1)}</div>
</div>
<div class="stat">
<div class="stat-label">Eyes</div>
<div class="stat-value">${npc.eyeColor}</div>
</div>
<div class="stat">
<div class="stat-label">Hair</div>
<div class="stat-value">${npc.hairColor}</div>
</div>
<div class="stat">
<div class="stat-label">Group</div>
<div class="stat-value">${npc.group}</div>
</div>
</div>
<p style="margin-top: 15px; color: #888;">
Generated via <strong>${method}</strong> in <strong>${attempts}</strong> attempt(s)
</p>
</div>
`;
}
function renderDistribution(title, dist) {
const entries = Object.entries(dist).sort((a, b) => b[1] - a[1]);
const bars = entries.map(([label, prob]) => `
<div class="bar-row">
<div class="bar-label">${label}</div>
<div class="bar-container">
<div class="bar-fill" style="width: ${prob * 100}%"></div>
</div>
<div class="bar-value">${(prob * 100).toFixed(1)}%</div>
</div>
`).join('');
return `
<div class="stat-box">
<h4>${title}</h4>
<div class="bar-chart">${bars}</div>
</div>
`;
}
function generateNPC() {
const constraints = getConstraints();
// Use MCMC-style sampling
const result = mcmcSample(constraints);
document.getElementById('npcResult').innerHTML = renderNPC(result.npc, result.attempts, 'MCMC-style Posterior Sampling');
updateDistributions(constraints);
}
function updateDistributions(constraints) {
constraints = constraints || getConstraints();
const posterior = computePosterior(constraints);
const group = constraints.group || 'ca_resident';
// Get eye color distribution given current ethnicity posterior
const eyeDist = {};
for (const [eth, ethProb] of Object.entries(posterior.ethnicityDist)) {
for (const [eye, eyeProb] of Object.entries(DEMOGRAPHICS.eyeColor[eth])) {
eyeDist[eye] = (eyeDist[eye] || 0) + eyeProb * ethProb;
}
}
// Get hair color distribution
const hairDist = {};
for (const [eth, ethProb] of Object.entries(posterior.ethnicityDist)) {
for (const [hair, hairProb] of Object.entries(DEMOGRAPHICS.hairColor[eth])) {
hairDist[hair] = (hairDist[hair] || 0) + hairProb * ethProb;
}
}
document.getElementById('distributions').innerHTML = `
${renderDistribution('P(Ethnicity | constraints)', posterior.ethnicityDist)}
${renderDistribution('P(Gender | constraints)', posterior.genderDist)}
${renderDistribution('P(Eye Color | constraints)', eyeDist)}
${renderDistribution('P(Hair Color | constraints)', hairDist)}
`;
}
function compareMethod() {
const constraints = getConstraints();
const numSamples = 100;
// Rejection sampling
let rejectionAttempts = 0;
let rejectionSuccesses = 0;
const rejectionEthCounts = {};
const rejectionGenderCounts = {};
for (let i = 0; i < numSamples; i++) {
const result = rejectionSample(constraints, 1000);
rejectionAttempts += result.attempts;
if (result.npc) {
rejectionSuccesses++;
rejectionEthCounts[result.npc.ethnicity] = (rejectionEthCounts[result.npc.ethnicity] || 0) + 1;
rejectionGenderCounts[result.npc.gender] = (rejectionGenderCounts[result.npc.gender] || 0) + 1;
}
}
// MCMC sampling
const mcmcEthCounts = {};
const mcmcGenderCounts = {};
for (let i = 0; i < numSamples; i++) {
const result = mcmcSample(constraints);
mcmcEthCounts[result.npc.ethnicity] = (mcmcEthCounts[result.npc.ethnicity] || 0) + 1;
mcmcGenderCounts[result.npc.gender] = (mcmcGenderCounts[result.npc.gender] || 0) + 1;
}
// Normalize to distributions
const normalize = (counts) => {
const total = Object.values(counts).reduce((a, b) => a + b, 0);
const dist = {};
for (const [k, v] of Object.entries(counts)) {
dist[k] = v / total;
}
return dist;
};
document.getElementById('comparison').innerHTML = `
<h2>Method Comparison (${numSamples} samples each)</h2>
<div class="method-comparison">
<div class="method-box">
<h3>❌ Rejection Sampling (Paper's Method)</h3>
<div class="attempt-counter">${rejectionAttempts} total attempts</div>
<p>Success rate: ${rejectionSuccesses}/${numSamples}</p>
<p>Avg attempts per success: ${rejectionSuccesses > 0 ? (rejectionAttempts / rejectionSuccesses).toFixed(1) : 'n/a (no successes)'}</p>
${renderDistribution('Ethnicity Distribution', normalize(rejectionEthCounts))}
${renderDistribution('Gender Distribution', normalize(rejectionGenderCounts))}
</div>
<div class="method-box">
<h3>✅ MCMC-style Sampling (PyMC approach)</h3>
<div class="attempt-counter">${numSamples} total attempts</div>
<p>Success rate: 100%</p>
<p>Avg attempts per success: 1.0</p>
${renderDistribution('Ethnicity Distribution', normalize(mcmcEthCounts))}
${renderDistribution('Gender Distribution', normalize(mcmcGenderCounts))}
</div>
</div>
<div class="info-box" style="margin-top: 20px;">
<strong>Key insight:</strong> Try setting <code>Eye Color = blue</code> and/or <code>Hair = blonde</code>.
These are rare in non-white populations, so rejection sampling needs many attempts, while
MCMC-style sampling correctly updates the ethnicity posterior (making white more likely)
and samples efficiently. The paper notes this problem with continuous variables
(height/BMI) being "effectively impossible" to constrain with rejection sampling.
</div>
`;
}
function resetConstraints() {
document.getElementById('group').value = '';
document.getElementById('gender').value = '';
document.getElementById('ethnicity').value = '';
document.getElementById('eyes').value = '';
document.getElementById('hair').value = '';
document.getElementById('minHeight').value = '';
document.getElementById('maxBmi').value = '';
document.getElementById('npcResult').innerHTML = '';
document.getElementById('comparison').innerHTML = '';
updateDistributions();
}
// Initialize
updateDistributions();
</script>
</body>
</html>

Try these experiments:
- No constraints - Both methods work equally well
- Set Eye Color = Blue - Watch rejection sampling struggle (needs ~10x more attempts) while MCMC instantly updates the ethnicity posterior to favor white
- Set Hair = Blonde - Same effect
- Set Min Height = 185cm - MCMC correctly shifts gender posterior toward male
- Combine constraints (Blue eyes + Blonde hair + Height > 185) - With the demo's tables, rejection sampling needs on the order of 1,000 attempts per NPC and often hits its attempt cap; MCMC stays at 1
Key differences shown:
| Aspect | Rejection Sampling | MCMC-style |
|---|---|---|
| Attempts needed | Scales with 1/P(constraints) | Always 1 |
| Posterior updates | None (just filters) | Computes P(ethnicity given eye_color) etc. |
| Continuous constraints | Nearly impossible | Uses truncated distributions |
| Efficiency | Poor for rare combos | Constant |
The "Compare Methods" button runs 100 samples with each method so you can see the efficiency difference quantitatively!
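For readers who want to check the min-height experiment by hand, here is a minimal Python sketch of the same Bayes update the demo performs for gender. The numbers are copied from the demo's `DEMOGRAPHICS` table (CA resident group); the function name `gender_posterior` is made up for this example, and like the demo it ignores the 140 to 210 cm truncation.

```python
from statistics import NormalDist

# Values copied from the demo's DEMOGRAPHICS table (CA resident group).
GENDER_PRIOR = {"male": 0.49, "female": 0.51}
HEIGHT_PARAMS = {"male": (175.0, 7.0), "female": (162.0, 6.5)}  # (mean, std) in cm

def gender_posterior(min_height_cm):
    """P(gender | height >= min_height_cm) via Bayes' rule."""
    unnormalized = {}
    for gender, prior in GENDER_PRIOR.items():
        mean, std = HEIGHT_PARAMS[gender]
        # Likelihood of the constraint: upper tail of the height distribution.
        tail = 1.0 - NormalDist(mean, std).cdf(min_height_cm)
        unnormalized[gender] = prior * tail
    total = sum(unnormalized.values())
    return {gender: weight / total for gender, weight in unnormalized.items()}

posterior = gender_posterior(185.0)
print(posterior)
```

With a 185 cm minimum, almost all of the posterior mass lands on "male", which is the shift the demo's gender bar chart shows.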
too much code to read, just explain the model behind what you implemented there
The paper describes a Bayes net (directed graphical model) where character features depend on each other in a tree structure:
           Demographic Group
                   │
          ┌────────┴────────┐
          ▼                 ▼
      Ethnicity          Gender
          │                 │
    ┌─────┴────┐       ┌────┴────┐
    ▼          ▼       ▼         ▼
Eye Color    Hair   Height      BMI
                       │         │
                       └────┬────┘
                            ▼
                 Weight (deterministic)
Each arrow represents a conditional probability:
- P(Ethnicity | Group): SF police are 45% white, CA residents are 37% white, etc.
- P(Gender | Group): SF police are 82% male, STEM workers 73% male, etc.
- P(Eye Color | Ethnicity): white people can have blue/green eyes, others mostly brown
- P(Hair Color | Ethnicity): blonde/red hair gated to white ethnicity
- P(Height | Gender): males ~175cm, females ~162cm (normal distributions)
- P(BMI | Gender): similar normal distributions
- Weight = 0.0001 × BMI × Height² (kg, with height in cm): deterministic formula
Rejection sampling (the paper's approach) then works in three steps:
- Sample forward through the net: pick ethnicity, then gender, then eye color given ethnicity, etc.
- Check if result matches user's constraints
- If no, throw it away and start over
Problem: If user wants blue eyes (rare overall, ~10% of population), you reject ~90% of samples. If they want blue eyes AND blonde hair AND tall... you might reject 99%+ of samples.
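To make the rejection rates concrete, the acceptance probability can be computed directly from the demo's tables. The sketch below is plain Python with values copied from the demo's `DEMOGRAPHICS` object (CA resident group, eye and hair tables trimmed to the entries needed); the function name `acceptance_probability` is invented for this illustration. The ethnicity weights are normalized because the demo's `weightedChoice` does the same.

```python
# Values copied from the demo's DEMOGRAPHICS table (CA resident group).
ETHNICITY = {"white": 0.37, "hispanic": 0.39, "asian": 0.15, "black": 0.06}
EYE_COLOR = {
    "white": {"blue": 0.27},   # blue eyes only appear under "white" in the demo
    "hispanic": {}, "asian": {}, "black": {},
}
HAIR_COLOR = {
    "white": {"blonde": 0.25},  # blonde hair only appears under "white" in the demo
    "hispanic": {}, "asian": {}, "black": {},
}

def acceptance_probability(eye=None, hair=None):
    """P(constraints) = sum over ethnicity of P(eth) * P(eye | eth) * P(hair | eth)."""
    total_weight = sum(ETHNICITY.values())
    p = 0.0
    for eth, weight in ETHNICITY.items():
        p_eye = EYE_COLOR[eth].get(eye, 0.0) if eye else 1.0
        p_hair = HAIR_COLOR[eth].get(hair, 0.0) if hair else 1.0
        p += (weight / total_weight) * p_eye * p_hair
    return p

p_blue = acceptance_probability(eye="blue")
p_both = acceptance_probability(eye="blue", hair="blonde")
# The expected number of attempts for one accepted sample is 1 / p.
print(f"blue eyes: p={p_blue:.3f}, about {1 / p_blue:.1f} attempts")
print(f"blue + blonde: p={p_both:.4f}, about {1 / p_both:.1f} attempts")
```

Under these tables a blue-eyed NPC costs roughly ten attempts and a blue-eyed blonde roughly forty; adding a height constraint multiplies the cost again.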
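Instead of sampling then rejecting, compute the posterior first. (Strictly speaking, the demo does exact conditioning on a small discrete net rather than running a Markov chain; it is labeled "MCMC-style" because it plays the role PyMC's sampler would.)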
If user specifies blue eyes, use Bayes' theorem:
P(Ethnicity | blue eyes) ∝ P(blue eyes | Ethnicity) × P(Ethnicity)
This shifts the ethnicity distribution toward white (since P(blue eyes | white) >> P(blue eyes | asian)).
Then sample ethnicity from this updated distribution. Every sample is valid, so no rejection is needed.
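The update can be written out in a few lines of plain Python. This is a sketch using the demo's CA resident tables; the names `ethnicity_posterior` and `sample_ethnicity` are made up for this example.

```python
import random

# Values copied from the demo's DEMOGRAPHICS table (CA resident group).
ETHNICITY_PRIOR = {"white": 0.37, "hispanic": 0.39, "asian": 0.15, "black": 0.06}
EYE_COLOR = {
    "white": {"brown": 0.45, "blue": 0.27, "grey": 0.10, "hazel": 0.09, "green": 0.09},
    "black": {"brown": 0.85, "hazel": 0.10, "grey": 0.05},
    "hispanic": {"brown": 0.80, "hazel": 0.12, "grey": 0.08},
    "asian": {"brown": 0.90, "hazel": 0.07, "grey": 0.03},
}

def ethnicity_posterior(eye_color):
    """P(ethnicity | eye_color), proportional to P(eye_color | ethnicity) * P(ethnicity)."""
    unnormalized = {
        eth: prior * EYE_COLOR[eth].get(eye_color, 0.0)
        for eth, prior in ETHNICITY_PRIOR.items()
    }
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError(f"eye color {eye_color!r} has zero probability under the tables")
    return {eth: weight / total for eth, weight in unnormalized.items()}

def sample_ethnicity(eye_color, rng):
    """Draw an ethnicity that is guaranteed to be consistent with the eye color."""
    posterior = ethnicity_posterior(eye_color)
    return rng.choices(list(posterior), weights=list(posterior.values()), k=1)[0]

blue_posterior = ethnicity_posterior("blue")
hazel_posterior = ethnicity_posterior("hazel")
print(blue_posterior)
print(hazel_posterior)
print(sample_ethnicity("hazel", random.Random(0)))
```

Because blue eyes appear only under "white" in these tables, the blue-eyed posterior puts all of its mass there; hazel is a gentler case where the prior is reweighted rather than collapsed.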
For continuous variables like height, instead of hoping you randomly generate exactly 185cm, you just sample from a truncated normal that's already constrained to be ≥185cm.
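One way to draw from such a truncated normal without any retry loop is inverse-CDF sampling. This is a sketch of that technique in plain Python, not what the demo does (the demo's `truncatedNormal` retries until a draw lands in range); the function name and the clamping constant are choices made for this example.

```python
import random
from statistics import NormalDist

def sample_truncated_normal(mean, std, lower, upper, rng):
    """Draw from Normal(mean, std) restricted to [lower, upper] via the inverse CDF."""
    dist = NormalDist(mean, std)
    # Map the bounds into probability space, then draw uniformly between them.
    u_low, u_high = dist.cdf(lower), dist.cdf(upper)
    u = rng.uniform(u_low, u_high)
    # inv_cdf needs 0 < u < 1, so keep u strictly inside the open interval.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    # Clamp to guard against floating-point round-off at the edges.
    return min(max(dist.inv_cdf(u), lower), upper)

rng = random.Random(42)
# Male height parameters from the demo (mean 175 cm, std 7 cm), constrained to 185-210 cm.
heights = [sample_truncated_normal(175.0, 7.0, 185.0, 210.0, rng) for _ in range(1000)]
print(min(heights), max(heights))
```

Each call costs one uniform draw regardless of how far into the tail the lower bound sits, which is the property the "always 1 attempt" claim relies on.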
| Constraint | Rejection Attempts | MCMC Attempts |
|---|---|---|
| None | 1 | 1 |
| Blue eyes | ~10 | 1 |
| Blue eyes + blonde | ~40 | 1 |
| Height = 185.0 (exact) | ∞ (impossible) | 1 |
The paper explicitly admits the continuous variable problem; PyMC's approach solves it by design.
what does the "compare methods" button do in the demo?
It runs 100 NPC generations using each method with your current constraints, then shows you side-by-side:
- Total attempts needed: Rejection sampling might need 500+ attempts to get 100 valid NPCs, while MCMC always needs exactly 100
- Success rate: Rejection sampling can fail entirely if constraints are too tight (it gives up after 1000 attempts per NPC)
- The resulting distributions: Shows what ethnicity/gender mix you actually got from each method
Try this experiment:
- Set Eye Color = Blue
- Click "Compare Methods"
You'll see something like:
| Metric | Rejection | MCMC |
|---|---|---|
| Total attempts | ~1,000 | 100 |
| Ethnicity dist | 100% white | 100% white |
Both methods give the same distribution (that's correct: in the demo's tables, blue eyes only occur under the white ethnicity), but rejection sampling needed roughly 10x more work to get there.
Now try Blue eyes + Blonde hair + Min Height 185cm:
| Metric | Rejection | MCMC |
|---|---|---|
| Total attempts | ~60,000 (capped at 1,000 per NPC) | 100 |
| Failures | Roughly a third of NPCs | None |
Rejection sampling is burning tons of CPU generating NPCs that get thrown away. MCMC computes the right posterior upfront and every sample counts.
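A back-of-the-envelope way to estimate that waste: with acceptance probability p and a cap of N attempts per NPC, the chance of giving up is (1 - p)^N and the expected number of attempts is (1 - (1 - p)^N) / p. The sketch below evaluates both in Python; the function name `rejection_cost` and the two example probabilities are illustrative assumptions, not measurements from the demo.

```python
def rejection_cost(p_accept, max_attempts):
    """Expected attempts and failure probability for one NPC under a retry cap."""
    # Probability that every one of the max_attempts draws is rejected.
    p_fail = (1.0 - p_accept) ** max_attempts
    # Expected value of min(geometric attempts, cap): sum of (1 - p)^k for k < cap.
    expected_attempts = (1.0 - p_fail) / p_accept
    return expected_attempts, p_fail

# Illustrative acceptance probabilities: a mild constraint and a very tight one.
mild_attempts, mild_fail = rejection_cost(0.1, 1000)
tight_attempts, tight_fail = rejection_cost(0.001, 1000)
print(f"p=0.1:   about {mild_attempts:.1f} attempts, failure chance {mild_fail:.2e}")
print(f"p=0.001: about {tight_attempts:.0f} attempts, failure chance {tight_fail:.2f}")
```

Once p drops to around one in a thousand, the cap starts to bite: a large share of requests fail outright, and each one still burns hundreds of discarded samples.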