Clone the gorilla repo and install BFCL dependencies:
git clone https://github.com/ShishirPatil/gorilla.git
cd gorilla/berkeley-function-call-leaderboard
python -m venv venv
source venv/bin/activate
pip install -e .
Run Qwen3-0.6B in vLLM:
Adjust the args as necessary - this is how I run it locally on my Mac with a CPU-only build of vLLM from the latest main as of Sept 3rd, 2025.
vllm serve Qwen/Qwen3-0.6B \
--max-model-len 8192 \
--max-num-batched-tokens 8192 \
--reasoning-parser deepseek_r1 \
--enable-auto-tool-choice \
--tool-call-parser hermes
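Before wiring up BFCL, it helps to confirm the server is actually up and serving the model (an optional sanity check, assuming the default port 8000):
curl http://localhost:8000/v1/models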
Edit the bfcl model_config.py as needed - see bfcl.diff below for an example.
You need to create a MODEL_CONFIG_MAPPING entry for each model you want to test,
setting model_handler to OpenAICompletionsHandler and ensuring is_fc_model=True. A rough sketch of such an entry follows.
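The exact ModelConfig fields below are assumptions based on the existing entries in model_config.py; copy a neighbouring OpenAI-compatible entry and adjust it rather than trusting these verbatim (see bfcl.diff below for the exact change I made):

# Hypothetical MODEL_CONFIG_MAPPING entry for the locally served model.
# Field names mirror the other entries in model_config.py; verify against
# your checkout before using.
"Qwen/Qwen3-0.6B": ModelConfig(
    model_name="Qwen/Qwen3-0.6B",
    display_name="Qwen3-0.6B (FC, local vLLM)",
    url="https://huggingface.co/Qwen/Qwen3-0.6B",
    org="Qwen",
    license="apache-2.0",
    model_handler=OpenAICompletionsHandler,  # route requests through the OpenAI-compatible endpoint
    input_price=None,                        # local serving, no per-token cost
    output_price=None,
    is_fc_model=True,                        # use native function calling, not prompt-based
),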
Run the live_simple subset of the BFCL tests to confirm everything works against your vLLM server and deployed model. Make sure OPENAI_BASE_URL points at your vLLM server:
OPENAI_BASE_URL="http://localhost:8000/v1" \
OPENAI_API_KEY="fake" \
bfcl generate --model Qwen/Qwen3-0.6B --test-category live_simple
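If generation completes, BFCL writes the model's responses under its result/ directory. You can then score that run with the evaluate subcommand (a hedged example with the same model and category flags; check bfcl --help if the CLI has changed):
bfcl evaluate --model Qwen/Qwen3-0.6B --test-category live_simple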