@rjzamora
Last active November 18, 2025 19:20
Example script to run TPC-H queries with query-specific parameters.
#!/bin/bash
# Run TPC-H benchmarks for queries 1-22 sequentially
# Usage: bash ./run_tpch_benchmarks.sh > summary.out 2>&1 &
export CUDA_VISIBLE_DEVICES=3
ITERATIONS=3
# Configuration (Viking)
PDSH_FILE=/raid/rzamora/benchmarking_rapidsmpf/cudf/python/cudf_polars/cudf_polars/experimental/benchmarks/pdsh.py
DATASET_PATH=/raid/rapidsmpf/data/tpch/scale-1000
# Configuration (PDX)
# PDSH_FILE=/home/rapids/src/cudf/python/cudf_polars/cudf_polars/experimental/benchmarks/pdsh.py
# DATASET_PATH=/data/tpch-rs/scale-1000
# Default values
DEFAULT_BLOCKSIZE=5_000_000_000
DEFAULT_SPILL_DEVICE=0.9
DEFAULT_MAX_IO_THREADS=8
# Per-query overrides (queries not listed here fall back to the defaults above)
declare -A QUERY_BLOCKSIZE
QUERY_BLOCKSIZE[1]=5_000_000_000
QUERY_BLOCKSIZE[4]=1_000_000_000
QUERY_BLOCKSIZE[13]=1_000_000_000
QUERY_BLOCKSIZE[18]=1_000_000_000
QUERY_BLOCKSIZE[21]=1_000_000_000
declare -A QUERY_SPILL_DEVICE
QUERY_SPILL_DEVICE[1]=0.9
QUERY_SPILL_DEVICE[4]=0.8
QUERY_SPILL_DEVICE[13]=0.8
QUERY_SPILL_DEVICE[18]=0.85
QUERY_SPILL_DEVICE[21]=0.4
declare -A QUERY_MAX_IO_THREADS
QUERY_MAX_IO_THREADS[1]=4
QUERY_MAX_IO_THREADS[4]=4
QUERY_MAX_IO_THREADS[13]=4
QUERY_MAX_IO_THREADS[18]=4
QUERY_MAX_IO_THREADS[21]=4
for QUERY_NUM in {1..22}; do
    # Get query-specific settings or use defaults
    BLOCKSIZE=${QUERY_BLOCKSIZE[$QUERY_NUM]:-$DEFAULT_BLOCKSIZE}
    SPILL_DEVICE=${QUERY_SPILL_DEVICE[$QUERY_NUM]:-$DEFAULT_SPILL_DEVICE}
    MAX_IO_THREADS=${QUERY_MAX_IO_THREADS[$QUERY_NUM]:-$DEFAULT_MAX_IO_THREADS}
    echo "Starting Query ${QUERY_NUM} (blocksize=${BLOCKSIZE}, spill-device=${SPILL_DEVICE}, max-io-threads=${MAX_IO_THREADS})..."
    # Each query's stdout and stderr go to its own log file
    python "$PDSH_FILE" "$QUERY_NUM" \
        --iterations "${ITERATIONS}" \
        --path "${DATASET_PATH}" \
        --suffix '' \
        --spill-device "${SPILL_DEVICE}" \
        --blocksize "${BLOCKSIZE}" \
        --max-io-threads "${MAX_IO_THREADS}" \
        --rmm-async \
        --explain \
        --runtime rapidsmpf > "output_q${QUERY_NUM}.out" 2>&1
    exit_code=$?
    if [ "$exit_code" -eq 0 ]; then
        echo "Query ${QUERY_NUM} completed successfully"
    else
        echo "Query ${QUERY_NUM} failed with exit code ${exit_code}"
    fi
    echo "---"
done
echo "All benchmarks completed!"
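
The per-query lookup in the loop relies on bash's `${ARRAY[key]:-default}` expansion, which falls back to the default when a key is unset or empty. A minimal standalone sketch of that pattern follows, assuming bash 4+ for associative arrays. The helper name `spill_for_query` and the sample values are hypothetical and only for illustration; they are not part of the script above.

```shell
#!/bin/bash
# Standalone sketch of the override-or-default lookup used in the benchmark loop.
# Requires bash 4+ (associative arrays). All names and values are illustrative.

DEFAULT_SPILL_DEVICE=0.9

# Only queries that need a non-default value get an entry.
declare -A QUERY_SPILL_DEVICE
QUERY_SPILL_DEVICE[4]=0.8
QUERY_SPILL_DEVICE[21]=0.4

# Hypothetical helper: print the override for query $1, or the default if none is set.
spill_for_query() {
    echo "${QUERY_SPILL_DEVICE[$1]:-$DEFAULT_SPILL_DEVICE}"
}

for q in 4 5 21; do
    echo "q${q}: spill-device=$(spill_for_query "$q")"
done
```

Query 5 has no entry in the array, so it picks up `DEFAULT_SPILL_DEVICE`, while queries 4 and 21 use their overrides.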