@nilp0inter
Created March 9, 2026 15:42
Stochastic Gherkin Extension

Stochastic Testing Framework: Roulette Implementation

This repository contains a reference implementation of a domain-specific extension to Gherkin (BDD), designed to automate the testing of non-deterministic and stochastic systems.

Problem Statement

Standard BDD frameworks (such as Behave, Cucumber, or Pytest-BDD) are designed for deterministic testing. They evaluate a single execution path and return a tri-state result (Pass/Fail/Error).

However, when testing stochastic systems (e.g., Random Number Generators, Machine Learning models, or complex behavioral economies), a single execution is insufficient to determine system correctness. System validation requires evaluating the statistical distribution of outcomes over $N$ iterations.
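A quick back-of-the-envelope sketch of why $N$ matters (the numbers assume a fair European wheel, i.e. $P(\text{Red}) = 18/37$; they are illustrative, not part of the framework):

```python
import math

# For a fair European wheel, P(Red) = 18/37 ~= 0.4865.
p = 18 / 37
n = 50_000

# Binomial standard error of the observed Red ratio over n samples.
stderr = math.sqrt(p * (1 - p) / n)

# With n = 50,000 the observed ratio concentrates within a fraction of a
# percentage point of 0.4865, so band-style assertions such as
# [0.470, 0.495] become statistically meaningful; a single run tells us
# almost nothing.
print(round(p, 4), round(stderr, 4))
```
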

Attempting to force iterative statistical evaluation into standard BDD typically results in semantic overloading: while loops, data aggregation, and statistical math hidden inside a single Then step. This destroys test readability, traceability, and state isolation.
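For illustration, the anti-pattern looks roughly like this (a hypothetical overloaded step, not part of this framework; the crude color mapping is a stand-in, not the real wheel layout):

```python
import random

# Hypothetical anti-pattern: one "Then" step body hiding the loop, the
# aggregation, and the statistical math all at once.
def then_red_occurs_roughly_half_the_time(samples=10_000):
    reds = 0
    for _ in range(samples):                 # hidden iteration loop
        pocket = random.randint(0, 36)       # hidden re-execution of the SUT
        if pocket != 0 and pocket % 2 == 1:  # crude color stand-in
            reds += 1
    ratio = reds / samples                   # hidden aggregation
    assert 0.45 <= ratio <= 0.53             # hidden statistical assertion
    return ratio
```

Everything the feature file should declare (sample count, schema, thresholds) is buried in step code, invisible to non-developers.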

Architectural Solution

This framework resolves the limitation by introducing a Meta-Testing Architecture via a custom Gherkin superset. It explicitly separates the orchestrator (Macro) from the payload (Micro).

  1. The Micro-Domain (Atomic Behavior): A standard, deterministic BDD scenario representing a single system interaction.
  2. The Macro-Domain (Stochastic Scenario): A wrapper that declares configuration limits, defines a strict data schema, iteratively executes the Micro-Domain, and queries the aggregated results.

Framework Semantics and Syntax

The extension introduces specialized keywords and blocks to handle the macro-execution lifecycle logically from top to bottom:

1. Top-Level Domain Boundaries

To prevent the standard BDD runner from attempting to execute a simulation sequentially, the framework introduces domain-specific root keywords: Stochastic Feature and Stochastic Scenario.

These act as routing directives. When the custom parser reads these keywords, it delegates the entire block to the Stochastic Orchestration Engine instead of the standard task runner.
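A minimal routing sketch of that dispatch (the engine and runner classes are illustrative assumptions; the gist does not publish the parser's real API):

```python
# Hypothetical dispatch: top-level blocks whose first keyword is stochastic
# are handed to the orchestration engine; everything else runs normally.
STOCHASTIC_KEYWORDS = ("Stochastic Feature:", "Stochastic Scenario:")

def route_block(block_text, stochastic_engine, standard_runner):
    first_line = block_text.lstrip().splitlines()[0]
    if first_line.startswith(STOCHASTIC_KEYWORDS):
        # Delegate the entire block to the Stochastic Orchestration Engine
        return stochastic_engine.run(block_text)
    # Otherwise fall through to the standard Gherkin task runner
    return standard_runner.run(block_text)
```
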

    Stochastic Feature: Roulette Fair Play Validation

      Stochastic Scenario: Verify the average house edge on a single number bet

2. Engine Configuration and Contract Definition

Inside the Stochastic Scenario, the environment is initialized by defining the execution boundaries and the strict shape of the data that will be collected during the iterations.

    Given the following Execution Strategy:  
      | Setting         | Value |  
      | Maximum Samples | 50000 |  

    And the following Sample Schema:  
      | Observation     | Type    | Description                               |  
      | color_result    | String  | The color evaluated by the engine         |  

Note: Declaring the schema in the feature file allows the framework to enforce strict type validation during runtime and establishes a single source of truth for downstream data-analysis tools.
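A sketch of what that strict enforcement could look like; the real `build_schema` internals are not shown in the gist, so these helper names and the type mapping are assumptions:

```python
# Hypothetical schema enforcement: map the Gherkin table's Type column to
# Python types and reject malformed observations at runtime.
TYPE_MAP = {"Integer": int, "Float": float, "String": str}

def build_schema(rows):
    """rows: (Observation, Type) pairs taken from the Sample Schema table."""
    return {name: TYPE_MAP[type_name] for name, type_name in rows}

def validate_observation(schema, observation):
    for key, value in observation.items():
        if key not in schema:
            raise KeyError(f"unknown observation field: {key}")
        if not isinstance(value, schema[key]):
            raise TypeError(
                f"{key}: expected {schema[key].__name__}, "
                f"got {type(value).__name__}"
            )
```
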

3. Execution Payload (Atomic Behavior)

This block represents the Micro-Domain. It is written in standard Gherkin. The Stochastic runner parses this block, treats it as an independent test suite, and executes it iteratively based on the Execution Strategy.

    When the following Atomic Behavior is executed iteratively:  
      Scenario Outline: Processing 1:1 payouts for color bets
        Given a new roulette game with a starting balance of 100 chips
        When the player bets 10 chips on "<color_choice>"
        Then the system identifies the winning color

Implementation Note: Inside the Python step definition for the core Then step, the developer calls context.sample.observe(color_result="Red"). This yields the iteration's data point back to the orchestration engine without breaking the deterministic test flow.
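The collector behind `context.sample` might be as simple as the following sketch; the gist only shows the `observe()` call site, so this class shape is an assumption:

```python
import pandas as pd

# Illustrative collector: each observe() call appends one row per iteration,
# and the engine compiles the rows into a DataFrame after the loop finishes.
class SampleCollector:
    def __init__(self, columns):
        self._columns = list(columns)
        self._rows = []

    def observe(self, **observation):
        # One data point per iteration, yielded back to the engine without
        # the atomic test ever leaving its deterministic flow.
        self._rows.append({c: observation.get(c) for c in self._columns})

    def to_dataframe(self):
        return pd.DataFrame(self._rows, columns=self._columns)
```
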

4. Statistical Aggregation and Assertion

Once the engine completes the specified iterations, it compiles the yielded observations into a localized, queryable dataset (e.g., a Pandas DataFrame). The final assertions act as declarative data queries against this exact dataset.

    Then the statistical assertion "Red_Occurrence" is met:  
      | Observation  | Filter | Operator | Value |  
      | color_result | Red    | >=       | 0.470 |  
      | color_result | Red    | <=       | 0.495 |  
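Conceptually, each Filter-style assertion row reduces to a one-line pandas query. The tiny DataFrame below is fabricated for illustration only; real runs hold the full 50,000 samples:

```python
import pandas as pd

# 100 fake samples standing in for 50,000 real observations.
df = pd.DataFrame({"color_result": ["Red"] * 48 + ["Black"] * 49 + ["Green"] * 3})

# occurrences / total samples, exactly what the Filter row expresses
red_ratio = (df["color_result"] == "Red").mean()

assert 0.470 <= red_ratio <= 0.495  # mirrors the table's bounds
```
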
Complete Example 1: Color Distribution Feature File

    Stochastic Feature: Roulette Fair Play Validation

      Stochastic Scenario: Verify the RNG color distribution conforms to European Roulette probabilities

        Given the following Execution Strategy:
          | Setting         | Value |
          | Maximum Samples | 50000 |
          | Warmup Samples  | 500   |
          | Fail Fast       | false |

        And the following Sample Schema:
          | Observation    | Type    | Description                             |
          | winning_number | Integer | The exact pocket the ball landed in     |
          | color_result   | String  | The color evaluated by the engine       |
          | player_payout  | Float   | The net chip fluctuation                |
          | rng_seed       | String  | The randomness seed for reproducibility |

        When the following Atomic Behavior is executed iteratively:
          Scenario Outline: Processing 1:1 payouts for color bets
            Given a new roulette game with a starting balance of 100 chips
            When the player bets 10 chips on "<color_choice>"
            And the wheel is spun
            Then the system identifies the winning color
            And pays the player if the winning color is Red and the bet was "Red"
            And pays the player if the winning color is Black and the bet was "Black"
            And awards the bet to the house in all other cases

          Examples:
            | color_choice |
            | Red          |
            | Black        |

        Then the statistical assertion "Green_Occurrence" is met:
          | Observation  | Filter | Operator | Value |
          | color_result | Green  | >=       | 0.025 |
          | color_result | Green  | <=       | 0.029 |

        And the statistical assertion "Red_Occurrence" is met:
          | Observation  | Filter | Operator | Value |
          | color_result | Red    | >=       | 0.470 |
          | color_result | Red    | <=       | 0.495 |

        And the statistical assertion "Black_Occurrence" is met:
          | Observation  | Filter | Operator | Value |
          | color_result | Black  | >=       | 0.470 |
          | color_result | Black  | <=       | 0.495 |
Complete Example 2: House Edge Feature File

    Stochastic Feature: Roulette Fair Play Validation

      Stochastic Scenario: Verify the average house edge on a single number bet

        Given the following Execution Strategy:
          | Setting         | Value |
          | Maximum Samples | 50000 |
          | Warmup Samples  | 500   |
          | Fail Fast       | false |

        And the following Sample Schema:
          | Observation    | Type    | Description                             |
          | winning_number | Integer | The exact pocket the ball landed in     |
          | color_result   | String  | The color evaluated by the engine       |
          | player_payout  | Float   | The net chip fluctuation                |
          | rng_seed       | String  | The randomness seed for reproducibility |

        When the following Atomic Behavior is executed iteratively:
          Scenario: Processing a 35:1 payout for a single number bet
            Given a new roulette game with a starting balance of 100 chips
            When the player bets 10 chips on "17"
            And the wheel is spun
            Then the system identifies the winning number
            And pays the player if the winning number is 17 and the bet was "17"
            And awards the bet to the house in all other cases

        Then the statistical assertion "Expected_House_Edge" is met:
          | Observation   | Aggregation | Operator | Value |
          | player_payout | Average     | >=       | -0.40 |
          | player_payout | Average     | <=       | -0.15 |
Reference Step Definitions (Python)

    from behave import given, when, then
    import pandas as pd  # Used by the framework to process observation metadata
    import random

    # ========================================================================
    # 1. PRETEND API (The System Under Test)
    # ========================================================================
    class RouletteEngine:
        """Mock of the casino's production roulette system."""
        COLORS = (
            {0: "Green"}
            | {i: "Red" for i in [1, 3, 5, 7, 9, 12, 14, 16, 18, 19, 21, 23, 25, 27, 30, 32, 34, 36]}
            | {i: "Black" for i in [2, 4, 6, 8, 10, 11, 13, 15, 17, 20, 22, 24, 26, 28, 29, 31, 33, 35]}
        )

        def __init__(self, start_balance):
            self.balance = start_balance
            self.bet_amount = 0
            self.bet_choice = None
            self.winning_number = None
            self.payout = 0

        def place_bet(self, amount, choice):
            self.balance -= amount
            self.bet_amount = amount
            self.bet_choice = choice

        def spin(self):
            # The RNG generates a pseudo-random number (0-36)
            self.winning_number = random.randint(0, 36)
            self.seed_used = hex(random.getrandbits(16))

        def resolve_bet(self):
            winning_color = self.COLORS[self.winning_number]
            # 35:1 payout for a single number
            if str(self.winning_number) == self.bet_choice:
                self.payout = self.bet_amount * 36
            # 1:1 payout for colors
            elif winning_color == self.bet_choice:
                self.payout = self.bet_amount * 2
            else:
                self.payout = 0
            self.balance += self.payout
            # Returns the financial delta (net win/loss)
            return self.payout - self.bet_amount

    # ========================================================================
    # 2. ATOMIC BEHAVIOR STEPS (Developer's Deterministic Tests)
    # ========================================================================
    @given('a new roulette game with a starting balance of {start_balance:d} chips')
    def step_start_game(context, start_balance):
        context.game = RouletteEngine(start_balance)
        context.initial_balance = start_balance
        context.net_payout = None  # reset between iterations

    @when('the player bets {amount:d} chips on "{bet_choice}"')
    def step_place_bet(context, amount, bet_choice):
        context.game.place_bet(amount, bet_choice)

    @when('the wheel is spun')
    def step_spin_wheel(context):
        context.game.spin()

    @then('the system identifies the winning {entity}')
    def step_identify_winner(context, entity):
        # Deterministic assertion ensures the API generated a valid state before continuing
        assert context.game.winning_number is not None

    @then('pays the player if the winning {entity} is {target} and the bet was "{bet_choice}"')
    def step_pay_winner(context, entity, target, bet_choice):
        # Resolve the bet exactly once, even though this step pattern matches
        # several "pays the player ..." lines (Red, Black) in the same scenario.
        if context.net_payout is None:
            context.net_payout = context.game.resolve_bet()
            # --- FRAMEWORK MAGIC: YIELDING THE OBSERVATION ---
            # The developer feeds the stochastic engine from inside the atomic test!
            context.sample.observe(
                winning_number=context.game.winning_number,
                color_result=context.game.COLORS[context.game.winning_number],
                player_payout=float(context.net_payout),
                rng_seed=context.game.seed_used,
            )
        # Deterministic assertion: did the API pay correctly for a win?
        if (entity == "color" and context.game.COLORS[context.game.winning_number] == target) or \
           (entity == "number" and str(context.game.winning_number) == target):
            assert context.game.balance > context.initial_balance

    @then('awards the bet to the house in all other cases')
    def step_house_wins(context):
        # A simple deterministic UI/state check could go here
        pass

    # ========================================================================
    # 3. STOCHASTIC FRAMEWORK STEPS (The Engine's Meta-Steps)
    # ========================================================================
    @given('the following Execution Strategy:')
    def step_execution_strategy(context):
        # The framework parses the table into a configuration object
        context.stochastic_engine.config = {row['Setting']: row['Value'] for row in context.table}

    @given('the following Sample Schema:')
    def step_sample_schema(context):
        # The framework prepares the strict schema validation for the observations
        context.stochastic_engine.build_schema(context.table)

    @when('the following Atomic Behavior is executed iteratively:')
    def step_execute_iterations(context):
        """
        Note: Because we are using the Custom Parser approach, this step acts as
        the trigger. Inside, the framework takes the child AST (the inner Scenarios),
        loops them up to 'Maximum Samples', collects the 'context.sample.observe()'
        calls, and compiles them into a Pandas DataFrame.
        """
        # Pretend API triggering the engine loop
        context.stochastic_engine.run_atomic_loop(
            max_iterations=int(context.stochastic_engine.config['Maximum Samples'])
        )
        # Once finished, the data is available as a DataFrame for the Then steps
        # context.samples_df = pd.DataFrame([...50,000 rows of observations...])

    @then('the statistical assertion "{assertion_name}" is met:')
    def step_statistical_assertion(context, assertion_name):
        # Retrieve the dataset of the 50,000 runs
        df = context.stochastic_engine.samples_df
        total_samples = len(df)

        for row in context.table:
            observation = row['Observation']
            operator = row['Operator']
            target_value = float(row['Value'])

            # ----------------------------------------------------
            # Logic A: Data Filtering (e.g., Occurrences of "Red")
            # ----------------------------------------------------
            if 'Filter' in row.headings:
                filter_val = row['Filter']
                # Pandas counts how many rows match the string (e.g., 'Red')
                occurrences = len(df[df[observation] == filter_val])
                actual_value = occurrences / total_samples
            # ----------------------------------------------------
            # Logic B: Data Aggregation (e.g., Average Payout)
            # ----------------------------------------------------
            elif 'Aggregation' in row.headings:
                aggregation = row['Aggregation']
                if aggregation == "Average":
                    # Pandas computes the mean of the float column
                    actual_value = df[observation].mean()
            else:
                raise ValueError("Assertion table needs a 'Filter' or 'Aggregation' column")

            # ----------------------------------------------------
            # Dynamic Evaluation
            # ----------------------------------------------------
            if operator == ">=":
                assert actual_value >= target_value, \
                    f"[{assertion_name}] FAIL: {observation} was {actual_value:.4f}, expected >= {target_value}"
            elif operator == "<=":
                assert actual_value <= target_value, \
                    f"[{assertion_name}] FAIL: {observation} was {actual_value:.4f}, expected <= {target_value}"