{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"authorship_tag": "ABX9TyPmwIAZRhXbO1Mo5X6Fvoit",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/evelynmitchell/cd8505021df41a145f1463f4956ad437/agentlightning.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "224e36c7"
},
"source": [
"## Agent-Lightning: Tools and Potential Use Cases Summary\n",
"\n",
"**Key Tools and Features Demonstrated/Mentioned:**\n",
"\n",
"* **OpenTelemetry Tracing (`minimal` example)**: Integrates with OpenTelemetry to provide detailed tracing of agent executions, offering visibility into flow and behavior, including span generation, reward emission, and context propagation.\n",
"* **Automatic Prompt Optimization (APO) (`apo` example)**: Offers tools for optimizing prompts, enabling agents to learn and adapt their prompt templates for improved performance. Includes debugging workflows.\n",
"* **Supervised Fine-Tuning (SFT) with Azure OpenAI (`azure` example)**: Provides a framework for fine-tuning large language models (LLMs) on Azure OpenAI, allowing for custom model training based on collected agent traces.\n",
"* **VERL Algorithm with AutoGen and MCP Calculator (`calc_x` example)**: Enables training mathematical reasoning agents that can interact with external tools (e.g., an MCP calculator) and leverage frameworks like AutoGen for problem-solving.\n",
"* **Text-to-SQL Reinforcement Learning (`spider` example)**: Used to develop agents capable of translating natural language queries into SQL, often utilizing frameworks like LangGraph and LangChain.\n",
"* **Tinker Integration (`tinker` example)**: Supports integration with external reinforcement learning platforms like Tinker for advanced agent training and optimization within specialized RL infrastructure.\n",
"* **Efficient SFT with Unsloth (`unsloth` example)**: For resource-constrained environments, supports supervised fine-tuning using Unsloth, which leverages techniques like 4-bit quantization and LoRA to efficiently train LLMs.\n",
"\n",
"**Potential Use Cases:**\n",
"\n",
"* **Agent Development & Debugging**: Building, observing, and debugging complex AI agents, especially those interacting with LLMs and external tools, to ensure reliable and predictable operation.\n",
"* **Prompt Engineering & Optimization**: Automatically refining prompts to improve agent performance on specific tasks, reducing manual trial-and-error.\n",
"* **Custom LLM Fine-tuning**: Tailoring general-purpose LLMs to specific domains or tasks by fine-tuning them with proprietary datasets, whether on cloud platforms (Azure OpenAI) or using efficient local methods (Unsloth).\n",
"* **Reasoning & Tool Use**: Creating agents that can perform multi-step reasoning and effectively use external tools (e.g., calculators, databases, APIs) to solve problems.\n",
"* **Reinforcement Learning for Agents**: Applying advanced RL techniques to train agents, enabling them to learn optimal strategies through interaction with environments and rewards."
]
},
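{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Aside**:\n",
"To make the tracing feature concrete, here is a minimal, framework-agnostic sketch of the OpenTelemetry pattern that the `minimal` example builds on. It uses only the `opentelemetry-sdk` package and is an illustration rather than Agent-Lightning's own API; the library layers its `LightningSpanProcessor` and reward/annotation spans on top of this machinery, as the trace output later in this notebook shows.\n",
"\n",
"```python\n",
"# Minimal OpenTelemetry sketch (plain opentelemetry-sdk, not Agent-Lightning's wrapper).\n",
"from opentelemetry import trace\n",
"from opentelemetry.sdk.trace import TracerProvider\n",
"from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter\n",
"\n",
"# Register a provider that prints every finished span to stdout.\n",
"provider = TracerProvider()\n",
"provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))\n",
"trace.set_tracer_provider(provider)\n",
"\n",
"tracer = trace.get_tracer(\"demo\")\n",
"with tracer.start_as_current_span(\"parent-span\"):\n",
"    with tracer.start_as_current_span(\"child-span\") as span:\n",
"        # Attributes here play the role that reward/annotation attributes\n",
"        # play in Agent-Lightning traces.\n",
"        span.set_attribute(\"demo.reward\", 1.0)\n",
"```"
]
},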
{
"cell_type": "markdown",
"metadata": {
"id": "beda5f4c"
},
"source": [
"# Task\n",
"Demonstrate the Agent-Lightning library by cloning its repository, installing dependencies, and running an example within Google Colab."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "a13f7aca"
},
"source": [
"## Clone Agent-Lightning Repository\n",
"\n",
"### Subtask:\n",
"Clone the 'https://github.com/microsoft/agent-lightning' repository to access its code and examples.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "56405c90"
},
"source": [
"**Reasoning**:\n",
"To clone the specified GitHub repository, I will use the `git clone` command in a code cell.\n",
"\n"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "6469dfd9",
"outputId": "4befeb78-cbcb-41f3-a6e4-6d102336b3c4"
},
"source": [
"!git clone https://github.com/microsoft/agent-lightning\n",
"print(\"Repository cloned successfully.\")"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Cloning into 'agent-lightning'...\n",
"remote: Enumerating objects: 12912, done.\u001b[K\n",
"remote: Counting objects: 100% (2010/2010), done.\u001b[K\n",
"remote: Compressing objects: 100% (431/431), done.\u001b[K\n",
"remote: Total 12912 (delta 1841), reused 1591 (delta 1579), pack-reused 10902 (from 2)\u001b[K\n",
"Receiving objects: 100% (12912/12912), 23.03 MiB | 12.07 MiB/s, done.\n",
"Resolving deltas: 100% (7699/7699), done.\n",
"Repository cloned successfully.\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "8b14ed5f"
},
"source": [
"## Install Dependencies with uv\n",
"\n",
"### Subtask:\n",
"Install the necessary Python packages and any other dependencies required by the Agent-Lightning project using 'uv' to ensure its examples can run correctly. This will likely involve installing an editable version of the package and its dependencies with 'uv'.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ed2817f7"
},
"source": [
"**Reasoning**:\n",
"The first step is to install the `uv` package using pip, as it's required for managing dependencies in the Agent-Lightning project.\n",
"\n"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "91822c98",
"outputId": "2ddb1c72-c018-42b0-9db9-f5113751f233"
},
"source": [
"pip install uv"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Collecting uv\n",
" Downloading uv-0.9.13-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (11 kB)\n",
"Downloading uv-0.9.13-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (21.7 MB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m21.7/21.7 MB\u001b[0m \u001b[31m81.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hInstalling collected packages: uv\n",
"Successfully installed uv-0.9.13\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "9289d75b"
},
"source": [
"**Reasoning**:\n",
"Now that `uv` is installed, I need to change the current working directory to the `agent-lightning` directory so that `uv` can correctly identify the project's dependencies and install the package in editable mode.\n",
"\n"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "3d7b02fd",
"outputId": "759793bc-f0fd-4a2e-f0a4-36fec2eea5fa"
},
"source": [
"import os\n",
"os.chdir('agent-lightning')\n",
"print(f\"Current working directory changed to: {os.getcwd()}\")"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Current working directory changed to: /content/agent-lightning\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "3ae29996"
},
"source": [
"**Reasoning**:\n",
"With the working directory correctly set to the `agent-lightning` repository, I will now use `uv` to install all the project dependencies and the package itself in editable mode, as specified in the subtask instructions.\n",
"\n"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "9e9a969e",
"outputId": "dbf46b50-7613-4614-983f-023275d3268d"
},
"source": [
"import subprocess\n",
"\n",
"try:\n",
" # Use uv to install dependencies and the package in editable mode\n",
" result = subprocess.run(['uv', 'pip', 'install', '-e', '.'], capture_output=True, text=True, check=True)\n",
" print(\"Dependencies and agent-lightning package installed successfully with uv.\")\n",
" print(\"Standard Output:\", result.stdout)\n",
"except subprocess.CalledProcessError as e:\n",
" print(f\"Error installing dependencies: {e}\")\n",
" print(\"Standard Output:\", e.stdout)\n",
" print(\"Standard Error:\", e.stderr)\n",
"except FileNotFoundError:\n",
" print(\"Error: 'uv' command not found. Please ensure uv is installed and in your PATH.\")"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Dependencies and agent-lightning package installed successfully with uv.\n",
"Standard Output: \n"
]
}
]
},
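{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Aside**:\n",
"As an optional sanity check, the editable install can be verified from Python with the standard-library `importlib.metadata` (this assumes the distribution is named `agentlightning`, matching the package directory):\n",
"\n",
"```python\n",
"# Optional sanity check: confirm the editable install is visible to the interpreter.\n",
"import importlib.metadata\n",
"\n",
"print(importlib.metadata.version(\"agentlightning\"))\n",
"```"
]
},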
{
"cell_type": "markdown",
"metadata": {
"id": "3da533fb"
},
"source": [
"## Explore and Run an Example\n",
"\n",
"### Subtask:\n",
"Identify a suitable example from the cloned repository (e.g., from an 'examples' directory) that showcases the core functionalities of Agent-Lightning. Then, execute this example to demonstrate its capabilities within the notebook.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "53b40d3a"
},
"source": [
"**Reasoning**:\n",
"To identify a suitable example and its execution method, I need to read the `README.md` file located in the `examples` directory, as instructed.\n",
"\n"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "90e69de5",
"outputId": "60431439-23b7-44bc-9b13-c33728d1b0cc"
},
"source": [
"import subprocess\n",
"\n",
"try:\n",
" result = subprocess.run(['cat', 'examples/README.md'], capture_output=True, text=True, check=True)\n",
" print(result.stdout)\n",
"except subprocess.CalledProcessError as e:\n",
" print(f\"Error reading README.md: {e}\")\n",
" print(\"Standard Error:\", e.stderr)\n",
"except FileNotFoundError:\n",
" print(\"Error: 'cat' command not found or file not found. Ensure you are in the correct directory.\")"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"# ⚡ Examples Catalog\n",
"\n",
"This catalog highlights the examples shipped with Agent-lightning.\n",
"\n",
"| Example | Description | CI Maintenance |\n",
"|---------|-------------|----------------|\n",
"| [apo](./apo) | Automatic Prompt Optimization tutorials covering built-in, custom, and debugging workflows. | [![apo workflow status](https://github.com/microsoft/agent-lightning/actions/workflows/badge-apo.yml/badge.svg)](https://github.com/microsoft/agent-lightning/actions/workflows/examples-apo.yml) |\n",
"| [azure](./azure) | Supervised fine-tuning with Azure OpenAI. | [![azure workflow status](https://github.com/microsoft/agent-lightning/actions/workflows/badge-azure.yml/badge.svg)](https://github.com/microsoft/agent-lightning/actions/workflows/examples-azure.yml) |\n",
"| [calc_x](./calc_x) | VERL-powered math reasoning agent training that uses AutoGen with an MCP calculator tool. | [![calc_x workflow status](https://github.com/microsoft/agent-lightning/actions/workflows/badge-calc-x.yml/badge.svg)](https://github.com/microsoft/agent-lightning/actions/workflows/examples-calc-x.yml) |\n",
"| [minimal](./minimal) | Bite-sized programs that demonstrate how individual Agent-lightning building blocks behave in isolation. | [![minimal workflow status](https://github.com/microsoft/agent-lightning/actions/workflows/badge-unit.yml/badge.svg)](https://github.com/microsoft/agent-lightning/actions/workflows/badge-unit.yml) |\n",
"| [rag](./rag) | Retrieval-Augmented Generation pipeline targeting the MuSiQue dataset with Wikipedia retrieval. | **Unmaintained** — last verified with Agent-lightning v0.1.1 |\n",
"| [search_r1](./search_r1) | Framework-free Search-R1 reinforcement learning training workflow with a retrieval backend. | **Unmaintained** — last verified with Agent-lightning v0.1.2 |\n",
"| [spider](./spider) | Text-to-SQL reinforcement learning training on the Spider dataset using LangGraph. | [![spider workflow status](https://github.com/microsoft/agent-lightning/actions/workflows/badge-spider.yml/badge.svg)](https://github.com/microsoft/agent-lightning/actions/workflows/examples-spider.yml) |\n",
"| [tinker](./tinker) | Reinforcement learning with Tinker as the backend training service. | [![tinker workflow status](https://github.com/microsoft/agent-lightning/actions/workflows/badge-tinker.yml/badge.svg)](https://github.com/microsoft/agent-lightning/actions/workflows/examples-tinker.yml) |\n",
"| [unsloth](./unsloth) | Supervised fine-tuning example powered by Unsloth with 4-bit quantization and LoRA. | [![unsloth workflow status](https://github.com/microsoft/agent-lightning/actions/workflows/badge-unsloth.yml/badge.svg)](https://github.com/microsoft/agent-lightning/actions/workflows/examples-unsloth.yml) |\n",
"\n",
"*NOTE: CI status avoids taking any workflow running with latest dependencies into account. That's why we reference the corresponding `badge-*` workflows instead. Each example's own README also displays its `examples-*` workflow status whenever the project is maintained by CI.*\n",
"\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "491a741c"
},
"source": [
"**Reasoning**:\n",
"Now that I have read the `examples/README.md` and identified the 'minimal' example as a suitable simple demonstration, I need to list the files within the `examples/minimal` directory to find the main Python script that can be executed.\n",
"\n"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "cbc9978a",
"outputId": "a92e09c1-f69a-404f-eea8-f0fc6551f55a"
},
"source": [
"import os\n",
"print(os.listdir('examples/minimal'))"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"['write_traces.py', 'README.md', 'vllm_server.py', 'llm_proxy.py']\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "0aac4b16"
},
"source": [
"**Reasoning**:\n",
"Based on the files listed in `examples/minimal`, `write_traces.py` appears to be a suitable Python script to demonstrate the core functionalities of Agent-Lightning. I will execute this script.\n",
"\n"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "5996e0e5",
"outputId": "9acdb9e5-8af7-4bd8-aeff-a3d860ecdfbb"
},
"source": [
"import subprocess\n",
"\n",
"try:\n",
" # Execute the write_traces.py example script\n",
" result = subprocess.run(['python', 'examples/minimal/write_traces.py'], capture_output=True, text=True, check=True)\n",
" print(\"write_traces.py executed successfully.\")\n",
" print(\"Standard Output:\", result.stdout)\n",
" if result.stderr:\n",
" print(\"Standard Error:\", result.stderr)\n",
"except subprocess.CalledProcessError as e:\n",
" print(f\"Error executing write_traces.py: {e}\")\n",
" print(\"Standard Output:\", e.stdout)\n",
" print(\"Standard Error:\", e.stderr)\n",
"except FileNotFoundError:\n",
" print(\"Error: 'python' command not found or script not found.\")"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Error executing write_traces.py: Command '['python', 'examples/minimal/write_traces.py']' returned non-zero exit status 2.\n",
"Standard Output: \n",
"Standard Error: usage: write_traces.py [-h] [--use-client] {otel,agentops}\n",
"write_traces.py: error: the following arguments are required: mode\n",
"\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "93c6a60e"
},
"source": [
"**Reasoning**:\n",
"The previous execution of `write_traces.py` failed because it requires a 'mode' argument. Based on the error message `usage: write_traces.py [-h] [--use-client] {otel,agentops}`, I need to provide either 'otel' or 'agentops' as an argument. I will re-execute the script with the 'otel' mode to successfully run the example.\n",
"\n"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "da09807e",
"outputId": "04de2fec-3af1-48de-8ce1-bd9880c7400e"
},
"source": [
"import subprocess\n",
"\n",
"try:\n",
" # Execute the write_traces.py example script with 'otel' mode\n",
" result = subprocess.run(['python', 'examples/minimal/write_traces.py', 'otel'], capture_output=True, text=True, check=True)\n",
" print(\"write_traces.py executed successfully with 'otel' mode.\")\n",
" print(\"Standard Output:\", result.stdout)\n",
" if result.stderr:\n",
" print(\"Standard Error:\", result.stderr)\n",
"except subprocess.CalledProcessError as e:\n",
" print(f\"Error executing write_traces.py: {e}\")\n",
" print(\"Standard Output:\", e.stdout)\n",
" print(\"Standard Error:\", e.stderr)\n",
"except FileNotFoundError:\n",
" print(\"Error: 'python' command not found or script not found.\")"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"write_traces.py executed successfully with 'otel' mode.\n",
"Standard Output: [11/30/25 22:24:57] INFO [Worker 0] Setting up OpenTelemetry tracer... otel.py:51\n",
" INFO [Worker 0] OpenTelemetry tracer provider initialized. otel.py:65\n",
" DEBUG Enabling LightningSpanProcessor in TracerProvider. otel.py:193\n",
" DEBUG Rollout ID and Attempt ID not set; bypassing export. otlp.py:313\n",
" DEBUG Rollout ID and Attempt ID not set; bypassing export. otlp.py:313\n",
" DEBUG Rollout ID and Attempt ID not set; bypassing export. otlp.py:313\n",
" DEBUG Emitting reward: 1.0 reward.py:181\n",
" DEBUG Tracer provider: <opentelemetry.sdk.trace.TracerProvider object at 0x7e7b25a8a780>. Active span processors: otel.py:107\n",
" DEBUG * SynchronousMultiSpanProcessor - LightningSpanProcessor(disable_store_submission=False, store=<agentlightning.store.memory.InMemoryLightningStore object at otel.py:109\n",
" 0x7e7b258eedb0>, rollout_id='ro-471f9d5b3e6d', attempt_id='at-1ea49fd4') \n",
" DEBUG * SynchronousMultiSpanProcessor - SimpleSpanProcessor - LightningStoreOTLPExporter(endpoint='http://localhost:4318/v1/traces', rollout_id=None, otel.py:109\n",
" attempt_id=None, should_bypass=True) \n",
" DEBUG Emitting annotation span with keys {'agentlightning.reward.0.name': 'primary', 'agentlightning.reward.0.value': 1.0} annotation.py:42\n",
" DEBUG Rollout ID and Attempt ID not set; bypassing export. otlp.py:313\n",
" INFO [Worker 0] Tearing down OpenTelemetry tracer... otel.py:69\n",
"PaginatedResult(\n",
" items=[\n",
" Span(\n",
" rollout_id='ro-471f9d5b3e6d',\n",
" attempt_id='at-1ea49fd4',\n",
" sequence_id=1,\n",
" trace_id='83777f1c40bba5dea27f89c96e2d2081',\n",
" span_id='2009d56b20b54701',\n",
" parent_id='3f525257c61cf9ae',\n",
" name='grpc-span-2',\n",
" status=TraceStatus(status_code='UNSET', description=None),\n",
" attributes={},\n",
" events=[],\n",
" links=[],\n",
" start_time=1764541497.8636384,\n",
" end_time=1764541497.8737893,\n",
" context=SpanContext(\n",
" trace_id='83777f1c40bba5dea27f89c96e2d2081',\n",
" span_id='2009d56b20b54701',\n",
" is_remote=False,\n",
" trace_state={}\n",
" ),\n",
" parent=SpanContext(\n",
" trace_id='83777f1c40bba5dea27f89c96e2d2081',\n",
" span_id='3f525257c61cf9ae',\n",
" is_remote=False,\n",
" trace_state={}\n",
" ),\n",
" resource=OtelResource(\n",
" attributes={\n",
" 'telemetry.sdk.language': 'python',\n",
" 'telemetry.sdk.name': 'opentelemetry',\n",
" 'telemetry.sdk.version': '1.37.0',\n",
" 'service.name': 'unknown_service',\n",
" 'agentlightning.rollout_id': '',\n",
" 'agentlightning.attempt_id': ''\n",
" },\n",
" schema_url=''\n",
" )\n",
" ),\n",
" Span(\n",
" rollout_id='ro-471f9d5b3e6d',\n",
" attempt_id='at-1ea49fd4',\n",
" sequence_id=2,\n",
" trace_id='83777f1c40bba5dea27f89c96e2d2081',\n",
" span_id='3f525257c61cf9ae',\n",
" parent_id=None,\n",
" name='grpc-span-1',\n",
" status=TraceStatus(status_code='UNSET', description=None),\n",
" attributes={},\n",
" events=[],\n",
" links=[],\n",
" start_time=1764541497.8533633,\n",
" end_time=1764541497.8794494,\n",
" context=SpanContext(\n",
" trace_id='83777f1c40bba5dea27f89c96e2d2081',\n",
" span_id='3f525257c61cf9ae',\n",
" is_remote=False,\n",
" trace_state={}\n",
" ),\n",
" parent=None,\n",
" resource=OtelResource(\n",
" attributes={\n",
" 'telemetry.sdk.language': 'python',\n",
" 'telemetry.sdk.name': 'opentelemetry',\n",
" 'telemetry.sdk.version': '1.37.0',\n",
" 'service.name': 'unknown_service',\n",
" 'agentlightning.rollout_id': '',\n",
" 'agentlightning.attempt_id': ''\n",
" },\n",
" schema_url=''\n",
" )\n",
" ),\n",
" Span(\n",
" rollout_id='ro-471f9d5b3e6d',\n",
" attempt_id='at-1ea49fd4',\n",
" sequence_id=3,\n",
" trace_id='7edd77da78ed4c275acbb894af54d421',\n",
" span_id='4c8b637f5a1a5c28',\n",
" parent_id=None,\n",
" name='grpc-span-3',\n",
" status=TraceStatus(status_code='UNSET', description=None),\n",
" attributes={},\n",
" events=[],\n",
" links=[],\n",
" start_time=1764541497.8845048,\n",
" end_time=1764541497.8946412,\n",
" context=SpanContext(\n",
" trace_id='7edd77da78ed4c275acbb894af54d421',\n",
" span_id='4c8b637f5a1a5c28',\n",
" is_remote=False,\n",
" trace_state={}\n",
" ),\n",
" parent=None,\n",
" resource=OtelResource(\n",
" attributes={\n",
" 'telemetry.sdk.language': 'python',\n",
" 'telemetry.sdk.name': 'opentelemetry',\n",
" 'telemetry.sdk.version': '1.37.0',\n",
" 'service.name': 'unknown_service',\n",
" 'agentlightning.rollout_id': '',\n",
" 'agentlightning.attempt_id': ''\n",
" },\n",
" schema_url=''\n",
" )\n",
" ),\n",
" Span(\n",
" rollout_id='ro-471f9d5b3e6d',\n",
" attempt_id='at-1ea49fd4',\n",
" sequence_id=4,\n",
" trace_id='bc062e415394e44692c47ec3438d8d71',\n",
" span_id='ab9cfe7046fe9aa2',\n",
" parent_id=None,\n",
" name='agentlightning.annotation',\n",
" status=TraceStatus(status_code='UNSET', description=None),\n",
" attributes={\n",
" 'agentlightning.reward.0.name': 'primary',\n",
" 'agentlightning.reward.0.value': 1.0\n",
" },\n",
" events=[],\n",
" links=[],\n",
" start_time=1764541497.9080155,\n",
" end_time=1764541497.91004,\n",
" context=SpanContext(\n",
" trace_id='bc062e415394e44692c47ec3438d8d71',\n",
" span_id='ab9cfe7046fe9aa2',\n",
" is_remote=False,\n",
" trace_state={}\n",
" ),\n",
" parent=None,\n",
" resource=OtelResource(\n",
" attributes={\n",
" 'telemetry.sdk.language': 'python',\n",
" 'telemetry.sdk.name': 'opentelemetry',\n",
" 'telemetry.sdk.version': '1.37.0',\n",
" 'service.name': 'unknown_service',\n",
" 'agentlightning.rollout_id': '',\n",
" 'agentlightning.attempt_id': ''\n",
" },\n",
" schema_url=''\n",
" )\n",
" )\n",
" ],\n",
" limit=-1,\n",
" offset=0,\n",
" total=4\n",
")\n",
"\n"
]
}
]
},
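{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Aside**:\n",
"The printed spans illustrate context propagation: `grpc-span-2` carries `parent_id='3f525257c61cf9ae'`, which is the `span_id` of `grpc-span-1`, while the remaining spans are roots of their own traces. A small sketch (with the IDs hardcoded from the output above; a real script would read them from the store's query result) shows how those parent links define the trace tree:\n",
"\n",
"```python\n",
"# Rebuild the parent/child structure from the spans printed above.\n",
"spans = [\n",
"    (\"2009d56b20b54701\", \"3f525257c61cf9ae\", \"grpc-span-2\"),\n",
"    (\"3f525257c61cf9ae\", None, \"grpc-span-1\"),\n",
"    (\"4c8b637f5a1a5c28\", None, \"grpc-span-3\"),\n",
"    (\"ab9cfe7046fe9aa2\", None, \"agentlightning.annotation\"),\n",
"]\n",
"\n",
"children = {}\n",
"for span_id, parent_id, name in spans:\n",
"    children.setdefault(parent_id, []).append((span_id, name))\n",
"\n",
"def show(parent_id=None, depth=0):\n",
"    for span_id, name in children.get(parent_id, []):\n",
"        print(\"  \" * depth + name)\n",
"        show(span_id, depth + 1)\n",
"\n",
"show()  # grpc-span-1 > grpc-span-2; grpc-span-3 and the annotation span are roots\n",
"```"
]
},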
{
"cell_type": "markdown",
"metadata": {
"id": "ccbc61d9"
},
"source": [
"## Final Task\n",
"\n",
"### Subtask:\n",
"Summarize the demo, including what was successfully demonstrated, any key takeaways, and potential next steps for further exploration of the Agent-Lightning library.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "5fdc7bd5"
},
"source": [
"## Summary:\n",
"\n",
"### Data Analysis Key Findings\n",
"\n",
"* The Agent-Lightning repository was successfully cloned from `https://github.com/microsoft/agent-lightning`, making its contents available for further steps.\n",
"* All required Python packages and dependencies for Agent-Lightning were successfully installed using the `uv` tool, including an editable installation of the library itself.\n",
"* The `examples/minimal/write_traces.py` script was identified and successfully executed. This script demonstrated Agent-Lightning's core capability of generating and managing execution traces, specifically using the \"otel\" (OpenTelemetry) mode. The output included detailed OpenTelemetry tracing logs and `PaginatedResult` objects containing `Span` data with attributes like `rollout_id`, `attempt_id`, `trace_id`, and `span_id`.\n",
"\n",
"### Insights or Next Steps\n",
"\n",
"* **Insight:** The Agent-Lightning library provides a straightforward mechanism for instrumenting agent executions with tracing capabilities, as demonstrated by the `write_traces.py` example using OpenTelemetry. This is crucial for observability and debugging complex agent workflows.\n",
"* **Next Steps:** Further exploration could involve running other examples within the `agent-lightning` repository, especially those demonstrating more complex agent architectures or integrations (e.g., with specific LLMs or tools). Additionally, experimenting with the \"agentops\" mode in `write_traces.py` or similar tracing scripts would provide insight into alternative tracing backends.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "087a7958"
},
"source": [
"# Task\n",
"Explore and execute the 'apo' example from the Agent-Lightning repository, which covers Automatic Prompt Optimization tutorials. This will involve reviewing its README and running the main script."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "af64b1c8"
},
"source": [
"## Demonstrate APO Example\n",
"\n",
"### Subtask:\n",
"Explore and execute the 'apo' example from the Agent-Lightning repository, which covers Automatic Prompt Optimization tutorials. This will involve reviewing its README and running the main script.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "74776530"
},
"source": [
"**Reasoning**:\n",
"To understand the 'apo' example and identify the main script, I need to read the `README.md` file within the `examples/apo` directory as per the instructions.\n",
"\n"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 106
},
"id": "6e60b7c7",
"outputId": "66f3a596-7118-485c-e081-b56d1cc0c975"
},
"source": [
"**Reasoning**:\n",
"To identify a suitable example and its execution method, I need to read the `README.md` file located in the `examples/apo` directory, as instructed."
],
"execution_count": null,
"outputs": [
{
"output_type": "error",
"ename": "SyntaxError",
"evalue": "invalid syntax (ipython-input-4032107909.py, line 1)",
"traceback": [
"\u001b[0;36m File \u001b[0;32m\"/tmp/ipython-input-4032107909.py\"\u001b[0;36m, line \u001b[0;32m1\u001b[0m\n\u001b[0;31m **Reasoning**:\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m invalid syntax\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "fab9fb07"
},
"source": [
"**Reasoning**:\n",
"The previous code cell failed because markdown text was placed inside a Python code block. I need to correct this by using the `subprocess.run` command to read the `README.md` file, as specified in the instructions and previous reasoning.\n",
"\n"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "4598175e",
"outputId": "d21b4277-1623-4fa1-b94a-dbbc9cae4e0f"
},
"source": [
"import subprocess\n",
"\n",
"try:\n",
" result = subprocess.run(['cat', 'examples/apo/README.md'], capture_output=True, text=True, check=True)\n",
" print(result.stdout)\n",
"except subprocess.CalledProcessError as e:\n",
" print(f\"Error reading README.md: {e}\")\n",
" print(\"Standard Error:\", e.stderr)\n",
"except FileNotFoundError:\n",
" print(\"Error: 'cat' command not found or file not found. Ensure you are in the correct directory.\")"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"# APO Example\n",
"\n",
"[![apo CI status](https://github.com/microsoft/agent-lightning/actions/workflows/examples-apo.yml/badge.svg)](https://github.com/microsoft/agent-lightning/actions/workflows/examples-apo.yml)\n",
"\n",
"This example folder contains three complementary tutorials that demonstrate different aspects of Agent-Lightning. It's compatible with Agent-lightning v0.2 or later.\n",
"\n",
"## Overview\n",
"\n",
"The folder showcases three distinct use cases: using the built-in APO algorithm to train a room selection agent, creating custom training algorithms from scratch, and debugging agents effectively. Each tutorial is self-contained and demonstrates a specific workflow.\n",
"\n",
"## Requirements\n",
"\n",
"Follow the [installation guide](../../docs/tutorials/installation.md) to install Agent-Lightning and APO-extra dependencies. All examples also require an OpenAI-compatible API service.\n",
"\n",
"## Included Files\n",
"\n",
"| File/Directory | Description |\n",
"|----------------|-------------|\n",
"| `room_selector.py` | Room booking agent implementation using function calling |\n",
"| `room_selector_apo.py` | Training script using the built-in APO algorithm to optimize prompts |\n",
"| `room_tasks.jsonl` | Dataset with room booking scenarios and expected selections |\n",
"| `apo_custom_algorithm.py` | Tutorial on creating custom algorithms (runnable as algo or runner) |\n",
"| `apo_custom_algorithm_trainer.py` | Shows how to integrate custom algorithms into the Trainer |\n",
"| `apo_debug.py` | Tutorial demonstrating various agent debugging techniques |\n",
"| `legacy_apo_client.py` | Deprecated APO client implementation compatible with Agent-lightning v0.1.x |\n",
"| `legacy_apo_server.py` | Deprecated APO server implementation compatible with Agent-lightning v0.1.x |\n",
"\n",
"## Sample 1: Using Built-in APO Algorithm\n",
"\n",
"The `room_selector_apo.py` script demonstrates how to use Agent-Lightning's built-in APO (Asynchronous Prompt Optimization) algorithm to train a room booking agent. The agent learns to select meeting rooms based on duration, attendee count, equipment needs, accessibility requirements, and availability.\n",
"\n",
"Run the training with:\n",
"\n",
"```bash\n",
"python room_selector_apo.py\n",
"```\n",
"\n",
"This script initializes the APO algorithm with beam search parameters, loads the room booking dataset, and optimizes the agent's prompt template through iterative training. The algorithm automatically manages the training loop, gradient computation, and prompt updates. Read more about this example in [Train the First Agent with APO](../../docs/how-to/train-first-agent.md).\n",
"\n",
"## Sample 2: Creating Custom Algorithms\n",
"\n",
"The `apo_custom_algorithm.py` and `apo_custom_algorithm_trainer.py` files teach you how to implement custom training algorithms from scratch. This is useful when the built-in algorithms don't fit your specific needs. See [Custom Algorithm tutorial](../../docs/how-to/write-first-algorithm.md) for more details.\n",
"\n",
"### Option A: Run algorithm and runner separately\n",
"\n",
"Start the store, algorithm, and runner in three separate terminals:\n",
"\n",
"```bash\n",
"# Terminal 1: Start the store\n",
"agl store\n",
"\n",
"# Terminal 2: Run the algorithm\n",
"python apo_custom_algorithm.py algo\n",
"\n",
"# Terminal 3: Run the rollout runner\n",
"python apo_custom_algorithm.py runner\n",
"```\n",
"\n",
"### Option B: Run integrated version\n",
"\n",
"Use the integrated trainer that handles all components:\n",
"\n",
"```bash\n",
"python apo_custom_algorithm_trainer.py\n",
"```\n",
"\n",
"## Sample 3: Debugging Agents\n",
"\n",
"The `apo_debug.py` script demonstrates multiple approaches to debugging agents in Agent-Lightning:\n",
"\n",
"```bash\n",
"python apo_debug.py\n",
"```\n",
"\n",
"Read more about this example in [Debugging Agents](../../docs/tutorials/debug.md).\n",
"\n",
"## Appendix: Dataset Format\n",
"\n",
"The `room_tasks.jsonl` file contains meeting scenarios with the following structure:\n",
"\n",
"```json\n",
"{\n",
" \"id\": \"s01\",\n",
" \"task_input\": {\n",
" \"date\": \"2025-10-13\",\n",
" \"time\": \"16:30\",\n",
" \"duration_min\": 30,\n",
" \"attendees\": 12,\n",
" \"needs\": [\"projector\", \"confphone\"],\n",
" \"accessible_required\": true\n",
" },\n",
" \"expected_choice\": \"Nova\"\n",
"}\n",
"```\n",
"\n"
]
}
]
},
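{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Aside**:\n",
"Before running anything, it can help to peek at the dataset the APO sample trains on. A short sketch of loading `room_tasks.jsonl`, where each line is a standalone JSON object in the format shown in the README's appendix:\n",
"\n",
"```python\n",
"# Peek at the room booking dataset: one JSON object per line (JSONL).\n",
"import json\n",
"\n",
"with open(\"examples/apo/room_tasks.jsonl\") as f:\n",
"    tasks = [json.loads(line) for line in f]\n",
"\n",
"print(f\"{len(tasks)} tasks loaded\")\n",
"print(tasks[0])  # e.g. {'id': 's01', 'task_input': {...}, 'expected_choice': 'Nova'}\n",
"```"
]
},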
{
"cell_type": "markdown",
"metadata": {
"id": "225f1370"
},
"source": [
"**Reasoning**:\n",
"Now that I have read the `examples/apo/README.md` and identified `room_selector_apo.py` as a primary example to demonstrate the built-in APO algorithm, I will list the files in the `examples/apo` directory to confirm its presence and other available scripts before executing.\n",
"\n"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "46ead6f3",
"outputId": "796d42c0-2682-45b5-b1fb-9ff0f24f26a6"
},
"source": [
"import os\n",
"print(os.listdir('examples/apo'))"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"['room_selector.py', 'room_tasks.jsonl', 'apo_debug.py', 'README.md', 'apo_custom_algorithm_trainer.py', 'legacy_apo_server.py', 'room_selector_apo.py', 'apo_custom_algorithm.py', 'legacy_apo_client.py']\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "40f8fee6"
},
"source": [
"**Reasoning**:\n",
"With the `README.md` reviewed and the files in the `examples/apo` directory confirmed, I will now execute the `room_selector_apo.py` script as it demonstrates the built-in APO algorithm, aligning with the subtask's goal.\n",
"\n"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "a76d435e",
"outputId": "cde8b22f-1ee8-4a34-ecc4-51f083288b00"
},
"source": [
"import subprocess\n",
"\n",
"try:\n",
" # Execute the room_selector_apo.py example script\n",
" result = subprocess.run(['python', 'examples/apo/room_selector_apo.py'], capture_output=True, text=True, check=True)\n",
" print(\"room_selector_apo.py executed successfully.\")\n",
" print(\"Standard Output:\", result.stdout)\n",
" if result.stderr:\n",
" print(\"Standard Error:\", result.stderr)\n",
"except subprocess.CalledProcessError as e:\n",
" print(f\"Error executing room_selector_apo.py: {e}\")\n",
" print(\"Standard Output:\", e.stdout)\n",
" print(\"Standard Error:\", e.stderr)\n",
"except FileNotFoundError:\n",
" print(\"Error: 'python' command not found or script not found.\")"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Error executing room_selector_apo.py: Command '['python', 'examples/apo/room_selector_apo.py']' returned non-zero exit status 1.\n",
"Standard Output: \n",
"Standard Error: Traceback (most recent call last):\n",
" File \"/content/agent-lightning/examples/apo/room_selector_apo.py\", line 13, in <module>\n",
" from agentlightning.algorithm.apo import APO\n",
" File \"/content/agent-lightning/agentlightning/algorithm/apo/__init__.py\", line 3, in <module>\n",
" from .apo import APO\n",
" File \"/content/agent-lightning/agentlightning/algorithm/apo/apo.py\", line 18, in <module>\n",
" import poml\n",
"ModuleNotFoundError: No module named 'poml'\n",
"\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "6ada88c6"
},
"source": [
"**Reasoning**:\n",
"The previous attempt to run `room_selector_apo.py` failed due to a `ModuleNotFoundError: No module named 'poml'`. This indicates that an extra dependency for the APO example was not installed. The `examples/apo/README.md` mentioned installing 'APO-extra dependencies'. I need to install these extra dependencies using `uv` to resolve the missing module.\n",
"\n"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "65dab8a6",
"outputId": "9e2471c4-f2eb-439e-f8ae-e6a58a7913ff"
},
"source": [
"import subprocess\n",
"\n",
"try:\n",
" # Use uv to install the 'apo' extra dependencies\n",
" # The extra dependencies are specified in pyproject.toml as [tool.uv.extras]\n",
" result = subprocess.run(['uv', 'pip', 'install', '-e', '.[apo]'], capture_output=True, text=True, check=True)\n",
" print(\"APO extra dependencies installed successfully with uv.\")\n",
" print(\"Standard Output:\", result.stdout)\n",
"except subprocess.CalledProcessError as e:\n",
" print(f\"Error installing APO extra dependencies: {e}\")\n",
" print(\"Standard Output:\", e.stdout)\n",
" print(\"Standard Error:\", e.stderr)\n",
"except FileNotFoundError:\n",
" print(\"Error: 'uv' command not found. Please ensure uv is installed and in your PATH.\")"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"APO extra dependencies installed successfully with uv.\n",
"Standard Output: \n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "3efb2583"
},
"source": [
"**Reasoning**:\n",
"Now that the `apo` extra dependencies, including `poml`, have been successfully installed, I will re-attempt to execute the `room_selector_apo.py` script. This should resolve the `ModuleNotFoundError` and allow the example to run as intended.\n",
"\n"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "4fdaec01",
"outputId": "f8b986b6-af91-407e-dc1f-f13dd4b2e5b1"
},
"source": [
"import subprocess\n",
"\n",
"try:\n",
" # Execute the room_selector_apo.py example script\n",
" result = subprocess.run(['python', 'examples/apo/room_selector_apo.py'], capture_output=True, text=True, check=True)\n",
" print(\"room_selector_apo.py executed successfully.\")\n",
" print(\"Standard Output:\", result.stdout)\n",
" if result.stderr:\n",
" print(\"Standard Error:\", result.stderr)\n",
"except subprocess.CalledProcessError as e:\n",
" print(f\"Error executing room_selector_apo.py: {e}\")\n",
" print(\"Standard Output:\", e.stdout)\n",
" print(\"Standard Error:\", e.stderr)\n",
"except FileNotFoundError:\n",
" print(\"Error: 'python' command not found or script not found.\")"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Error executing room_selector_apo.py: Command '['python', 'examples/apo/room_selector_apo.py']' returned non-zero exit status 1.\n",
"Standard Output: \n",
"Standard Error: Traceback (most recent call last):\n",
" File \"/content/agent-lightning/examples/apo/room_selector_apo.py\", line 69, in <module>\n",
" main()\n",
" File \"/content/agent-lightning/examples/apo/room_selector_apo.py\", line 39, in main\n",
" openai_client = AsyncOpenAI()\n",
" ^^^^^^^^^^^^^\n",
" File \"/usr/local/lib/python3.12/dist-packages/openai/_client.py\", line 488, in __init__\n",
" raise OpenAIError(\n",
"openai.OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable\n",
"\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "fb629381"
},
"source": [
"**Reasoning**:\n",
"The previous execution of `room_selector_apo.py` failed because the `OPENAI_API_KEY` environment variable was not set, which is required by the `openai` client. I need to set this environment variable with a placeholder value before attempting to run the script again. The user can then replace the placeholder with their actual key.\n",
"\n"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "a4882ac6",
"outputId": "f17584fd-b432-49a8-9d15-77e81540027a"
},
"source": [
"import os\n",
"\n",
"# Set a placeholder for the OpenAI API key.\n",
"# IMPORTANT: Replace 'YOUR_OPENAI_API_KEY_HERE' with your actual OpenAI API key.\n",
"# If you don't have one, you can get it from https://platform.openai.com/account/api-keys\n",
"os.environ['OPENAI_API_KEY'] = 'YOUR_OPENAI_API_KEY_HERE'\n",
"\n",
"print(\"OPENAI_API_KEY environment variable set (placeholder value).\")"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"OPENAI_API_KEY environment variable set (placeholder value).\n"
]
}
]
},
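{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Aside**:\n",
"Rather than hardcoding a key, a safer pattern in Colab is to read it at runtime, either interactively with the standard-library `getpass` or from Colab's Secrets panel via `google.colab.userdata` (the latter assumes a secret named `OPENAI_API_KEY` has been configured there):\n",
"\n",
"```python\n",
"# Sketch: safer alternatives to hardcoding the key. Pick one.\n",
"import os\n",
"from getpass import getpass\n",
"\n",
"# Option 1: prompt interactively (the key never lands in the notebook file).\n",
"os.environ[\"OPENAI_API_KEY\"] = getpass(\"OpenAI API key: \")\n",
"\n",
"# Option 2: read from Colab's Secrets panel (configure the secret first).\n",
"# from google.colab import userdata\n",
"# os.environ[\"OPENAI_API_KEY\"] = userdata.get(\"OPENAI_API_KEY\")\n",
"```"
]
},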
{
"cell_type": "markdown",
"metadata": {
"id": "574399a6"
},
"source": [
"**Reasoning**:\n",
"Now that the `OPENAI_API_KEY` environment variable has been set (even with a placeholder), I will re-attempt to execute the `room_selector_apo.py` script. This should resolve the API key error and allow the example to proceed further.\n",
"\n"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "0d5cd8e9",
"outputId": "64208474-942c-435a-a6ea-c678b438c195"
},
"source": [
"import subprocess\n",
"\n",
"try:\n",
" # Execute the room_selector_apo.py example script\n",
" result = subprocess.run(['python', 'examples/apo/room_selector_apo.py'], capture_output=True, text=True, check=True)\n",
" print(\"room_selector_apo.py executed successfully.\")\n",
" print(\"Standard Output:\", result.stdout)\n",
" if result.stderr:\n",
" print(\"Standard Error:\", result.stderr)\n",
"except subprocess.CalledProcessError as e:\n",
" print(f\"Error executing room_selector_apo.py: {e}\")\n",
" print(\"Standard Output:\", e.stdout)\n",
" print(\"Standard Error:\", e.stderr)\n",
"except FileNotFoundError:\n",
" print(\"Error: 'python' command not found or script not found.\")"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Error executing room_selector_apo.py: Command '['python', 'examples/apo/room_selector_apo.py']' returned non-zero exit status 1.\n",
"Standard Output: \n",
"Standard Error: Traceback (most recent call last):\n",
" File \"/content/agent-lightning/examples/apo/room_selector_apo.py\", line 69, in <module>\n",
" main()\n",
" File \"/content/agent-lightning/examples/apo/room_selector_apo.py\", line 64, in main\n",
" dataset_train, dataset_val = load_train_val_dataset()\n",
" ^^^^^^^^^^^^^^^^^^^^^^^^\n",
" File \"/content/agent-lightning/examples/apo/room_selector_apo.py\", line 18, in load_train_val_dataset\n",
" dataset_full = load_room_tasks()\n",
" ^^^^^^^^^^^^^^^^^\n",
" File \"/content/agent-lightning/examples/apo/room_selector.py\", line 332, in load_room_tasks\n",
" for line in open(\"room_tasks.jsonl\"):\n",
" ^^^^^^^^^^^^^^^^^^^^^^^^\n",
"FileNotFoundError: [Errno 2] No such file or directory: 'room_tasks.jsonl'\n",
"\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "9a3beefe"
},
"source": [
"# Task\n",
"The previous execution of `room_selector_apo.py` failed with a `FileNotFoundError: [Errno 2] No such file or directory: 'room_tasks.jsonl'`. This happened because the script expects to find `room_tasks.jsonl` in the current working directory, but it's located in `examples/apo/`.\n",
"\n",
"More importantly, the `room_selector_apo.py` script also requires a valid OpenAI API key, which was highlighted as unavailable in the initial plan. Therefore, as per the plan, we will *not* proceed with `room_selector_apo.py`. Instead, we will focus on `apo_debug.py`, which is designed to demonstrate debugging techniques and might not require an OpenAI API key for initial execution.\n",
"\n",
"## Demonstrate APO Example (Debugging)\n",
"\n",
"### Subtask:\n",
"Execute the `apo_debug.py` script from the 'apo' example to demonstrate debugging techniques, as specified in the plan. This example is chosen due to the unavailability of an OpenAI API key for the `room_selector_apo.py` script.\n",
"\n",
"### Reasoning:\n",
"As per the plan, the `apo_debug.py` script is the chosen demonstration for the 'apo' example. I will now execute it directly from the `examples/apo` directory. This script demonstrates various agent debugging techniques and is less likely to immediately require an OpenAI API key for its initial execution.\n",
"\n",
"### Code:\n",
"```python\n",
"import subprocess\n",
"\n",
"try:\n",
" # Execute the apo_debug.py example script\n",
" result = subprocess.run(['python', 'examples/apo/apo_debug.py'], capture_output=True, text=True, check=True)\n",
" print(\"apo_debug.py executed successfully.\")\n",
" print(\"Standard Output:\", result.stdout)\n",
" if result.stderr:\n",
" print(\"Standard Error:\", result.stderr)\n",
"except subprocess.CalledProcessError as e:\n",
" print(f\"Error executing apo_debug.py: {e}\")\n",
" print(\"Standard Output:\", e.stdout)\n",
" print(\"Standard Error:\", e.stderr)\n",
"except FileNotFoundError:\n",
" print(\"Error: 'python' command not found or script not found.\")\n",
"```"
]
},
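{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Aside**:\n",
"For reference, the `FileNotFoundError` above is a working-directory issue: `room_selector.py` opens `room_tasks.jsonl` relative to the current directory. If a valid API key were available, one fix would be to run the script from its own folder via the `cwd` argument of `subprocess.run`, as in this sketch:\n",
"\n",
"```python\n",
"# Sketch: run the APO script from its own directory so the relative\n",
"# 'room_tasks.jsonl' path resolves (still requires a valid OPENAI_API_KEY).\n",
"import subprocess\n",
"\n",
"result = subprocess.run(\n",
"    [\"python\", \"room_selector_apo.py\"],\n",
"    cwd=\"examples/apo\",\n",
"    capture_output=True,\n",
"    text=True,\n",
")\n",
"print(result.stdout)\n",
"print(result.stderr)\n",
"```"
]
},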
{
"cell_type": "markdown",
"metadata": {
"id": "7c9df965"
},
"source": [
"## Demonstrate APO Example\n",
"\n",
"### Subtask:\n",
"Execute the `apo_debug.py` script from the 'apo' example to demonstrate debugging techniques, as specified in the plan. This example is chosen due to the unavailability of an OpenAI API key for the `room_selector_apo.py` script.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "c67b534d"
},
"source": [
"**Reasoning**:\n",
"The previous attempts to run `room_selector_apo.py` failed due to a missing OpenAI API key. As per the subtask instructions, I will now execute `apo_debug.py` as it demonstrates debugging techniques and does not require an OpenAI API key.\n",
"\n"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "8594cd8a",
"outputId": "f60e7226-027f-4299-9a22-297b12b873cb"
},
"source": [
"import subprocess\n",
"\n",
"try:\n",
" # Execute the apo_debug.py example script\n",
" result = subprocess.run(['python', 'examples/apo/apo_debug.py'], capture_output=True, text=True, check=True)\n",
" print(\"apo_debug.py executed successfully.\")\n",
" print(\"Standard Output:\", result.stdout)\n",
" if result.stderr:\n",
" print(\"Standard Error:\", result.stderr)\n",
"except subprocess.CalledProcessError as e:\n",
" print(f\"Error executing apo_debug.py: {e}\")\n",
" print(\"Standard Output:\", e.stdout)\n",
" print(\"Standard Error:\", e.stderr)\n",
"except FileNotFoundError:\n",
" print(\"Error: 'python' command not found or script not found.\")"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Error executing apo_debug.py: Command '['python', 'examples/apo/apo_debug.py']' returned non-zero exit status 1.\n",
"Standard Output: [11/30/25 22:31:00] INFO [Worker 0] Setting up OpenTelemetry tracer... otel.py:51\n",
" INFO [Worker 0] OpenTelemetry tracer provider initialized. otel.py:65\n",
" ERROR [Worker 0 | Rollout ro-10d064f08269] Exception during rollout. agent.py:512\n",
" Traceback (most recent call last): \n",
" File \"/content/agent-lightning/agentlightning/runner/agent.py\", line 485, in _step_impl \n",
" result = await rollout_method( \n",
" ^^^^^^^^^^^^^^^^^^^^^ \n",
" File \"/content/agent-lightning/agentlightning/litagent/litagent.py\", line 251, in validation_rollout_async \n",
" return await self.rollout_async(task, resources, rollout) \n",
" ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ \n",
" File \"/content/agent-lightning/agentlightning/litagent/decorator.py\", line 178, in rollout_async \n",
" return await self._rollout_func(task, **kwargs) # type: ignore \n",
" ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ \n",
" File \"/content/agent-lightning/examples/apo/apo_custom_algorithm.py\", line 107, in apo_rollout \n",
" result = await client.chat.completions.create( \n",
" ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ \n",
" File \"/usr/local/lib/python3.12/dist-packages/openai/resources/chat/completions/completions.py\", line 2672, in create \n",
" return await self._post( \n",
" ^^^^^^^^^^^^^^^^^ \n",
" File \"/usr/local/lib/python3.12/dist-packages/openai/_base_client.py\", line 1794, in post \n",
" return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls) \n",
" ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ \n",
" File \"/usr/local/lib/python3.12/dist-packages/openai/_base_client.py\", line 1594, in request \n",
" raise self._make_status_error_from_response(err.response) from None \n",
" openai.AuthenticationError: Error code: 401 - {'error': {'message': 'Incorrect API key provided: YOUR_OPE************HERE. You can find your API key at \n",
" https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}} \n",
" INFO [Worker 0] Tearing down OpenTelemetry tracer... otel.py:69\n",
"\n",
"Standard Error: Traceback (most recent call last):\n",
" File \"/content/agent-lightning/examples/apo/apo_debug.py\", line 121, in <module>\n",
" asyncio.run(debug_with_runner())\n",
" File \"/usr/lib/python3.12/asyncio/runners.py\", line 195, in run\n",
" return runner.run(main)\n",
" ^^^^^^^^^^^^^^^^\n",
" File \"/usr/lib/python3.12/asyncio/runners.py\", line 118, in run\n",
" return self._loop.run_until_complete(task)\n",
" ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
" File \"/usr/lib/python3.12/asyncio/base_events.py\", line 691, in run_until_complete\n",
" return future.result()\n",
" ^^^^^^^^^^^^^^^\n",
" File \"/content/agent-lightning/examples/apo/apo_debug.py\", line 40, in debug_with_runner\n",
" await runner.step(\n",
" File \"/content/agent-lightning/agentlightning/runner/agent.py\", line 650, in step\n",
" rollout_id = await self._step_impl(attempted_rollout, raise_on_exception=True)\n",
" ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
" File \"/content/agent-lightning/agentlightning/runner/agent.py\", line 485, in _step_impl\n",
" result = await rollout_method(\n",
" ^^^^^^^^^^^^^^^^^^^^^\n",
" File \"/content/agent-lightning/agentlightning/litagent/litagent.py\", line 251, in validation_rollout_async\n",
" return await self.rollout_async(task, resources, rollout)\n",
" ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
" File \"/content/agent-lightning/agentlightning/litagent/decorator.py\", line 178, in rollout_async\n",
" return await self._rollout_func(task, **kwargs) # type: ignore\n",
" ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
" File \"/content/agent-lightning/examples/apo/apo_custom_algorithm.py\", line 107, in apo_rollout\n",
" result = await client.chat.completions.create(\n",
" ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
" File \"/usr/local/lib/python3.12/dist-packages/openai/resources/chat/completions/completions.py\", line 2672, in create\n",
" return await self._post(\n",
" ^^^^^^^^^^^^^^^^^\n",
" File \"/usr/local/lib/python3.12/dist-packages/openai/_base_client.py\", line 1794, in post\n",
" return await self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)\n",
" ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
" File \"/usr/local/lib/python3.12/dist-packages/openai/_base_client.py\", line 1594, in request\n",
" raise self._make_status_error_from_response(err.response) from None\n",
"openai.AuthenticationError: Error code: 401 - {'error': {'message': 'Incorrect API key provided: YOUR_OPE************HERE. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}\n",
"\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "5e10fbcf"
},
"source": [
"## OpenAI API Key Required for Debugging Example\n",
"\n",
"Despite previous attempts to find an example that does not require an OpenAI API key, the `apo_debug.py` script also failed with an `openai.AuthenticationError`. This indicates that the debugging functionalities demonstrated in this script rely on making calls to an OpenAI-compatible API service, and therefore require a valid `OPENAI_API_KEY` to be set in the environment.\n",
"\n",
"To successfully execute `apo_debug.py` and demonstrate the debugging techniques of Agent-Lightning, please ensure you replace `'YOUR_OPENAI_API_KEY_HERE'` with an actual, valid OpenAI API key in the `os.environ['OPENAI_API_KEY']` line. Without a valid key, the script cannot proceed with its intended functionality.\n",
"\n",
"Since I cannot provide a valid API key, the subtask to execute and demonstrate `apo_debug.py` cannot be fully completed at this moment."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "e97d0054"
},
"source": [
"## Demonstrate Azure Example\n",
"\n",
"### Subtask:\n",
"Explore and execute the 'azure' example, which showcases supervised fine-tuning with Azure OpenAI. This example likely requires an Azure OpenAI API key. If you do not have one, you may need to skip this step. This will involve reviewing its README and running the main script.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ff83c2d0"
},
"source": [
"**Reasoning**:\n",
"To understand the 'azure' example and identify the main script for execution, I need to read the `README.md` file within the `examples/azure` directory as per the instructions.\n",
"\n"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "48b156c9",
"outputId": "7d79d2cd-a089-46cd-effa-a37e46ea9c7f"
},
"source": [
"import subprocess\n",
"\n",
"try:\n",
" result = subprocess.run(['cat', 'examples/azure/README.md'], capture_output=True, text=True, check=True)\n",
" print(result.stdout)\n",
"except subprocess.CalledProcessError as e:\n",
" print(f\"Error reading README.md: {e}\")\n",
" print(\"Standard Error:\", e.stderr)\n",
"except FileNotFoundError:\n",
" print(\"Error: 'cat' command not found or file not found. Ensure you are in the correct directory.\")"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"# Supervised Fine-tuning with Azure OpenAI\n",
"\n",
"This example walks through an end-to-end supervised fine-tuning loop on Azure OpenAI. The trainer runs a toy capital-lookup agent, collects traces with rewards, submits fine-tuning jobs using those traces, and deploys every successful checkpoint as a new Azure OpenAI deployment.\n",
"\n",
"**NOTE: The example is tested and compatible with Agent-lightning v0.2.x, but it's not yet maintained on CI due to the difficulty of maintaining a logged-in status in the testing environment.**\n",
"\n",
"## Prerequisites\n",
"\n",
"You need an Azure subscription with an Azure OpenAI resource that supports fine-tuning in your region and a base deployment you can reuse (the defaults assume `gpt-4.1-mini` backed by `gpt-4.1-mini-2025-04-14`). Sign in with the Azure CLI (`az login`) and install the project dependencies, for example via `uv sync` from the repository root.\n",
"\n",
"## Setup\n",
"\n",
"Copy the sample environment file `.env.example`, fill in your Azure values, and source it before running any scripts:\n",
"\n",
"```bash\n",
"cp examples/azure_finetune/.env.example examples/azure_finetune/.env\n",
"# edit examples/azure_finetune/.env with your keys and identifiers\n",
"source examples/azure_finetune/.env\n",
"```\n",
"\n",
"Confirm that you have successfully logged into Azure with:\n",
"\n",
"```bash\n",
"az account show\n",
"```\n",
"\n",
"## Included Files\n",
"\n",
"| File | Description |\n",
"| --- | --- |\n",
"| `aoai_finetune.py` | Fine-tuning algorithm that batches rollouts, filters traces, launches jobs, deploys checkpoints, and evaluates them. |\n",
"| `train_capital_agent.py` | Trainer entry point that loads `capital_samples.csv` and orchestrates three fine-tuning iterations. |\n",
"| `capital_agent.py` | Tool-enabled agent that calls `country_capital_lookup`, producing reward `1.0` when the response contains the expected capital. |\n",
"| `capital_samples.csv` | Prompt/answer pairs that the trainer splits 80/20 into training and validation sets. |\n",
"| `tests/test_deployment.py` | Smoke tests for deployment helper methods when live Azure credentials are configured. |\n",
"\n",
"## Workflow Overview\n",
"\n",
"- **Stage 1 – Collect traces.** `Trainer` points runners at your base deployment and gathers rollouts in batches of `finetune_every_n_rollouts`.\n",
"- **Stage 2 – Filter and package data.** Rewards and telemetries from `capital_agent` are collected by Agent-lightning, which drives filtering via `data_filter_ratio`, and the remaining traces are serialized into Azure OpenAI JSONL format.\n",
"- **Stage 3 – Fine-tune.** `AzureOpenAIFinetune.finetune` uploads the dataset, waits for the fine-tuning job to finish, and returns the new base model identifier.\n",
"- **Stage 4 – Deploy and evaluate.** A versioned deployment such as `gpt-4.1-mini-ft_v01` is created, old deployments are pruned when `max_deployments` is exceeded, and validation rollouts confirm the reward.\n",
"\n",
"The process is shown in the following diagram:\n",
"\n",
"<p align=\"center\">\n",
" <img src=\"./assets/aoai_finetune.svg\" alt=\"Azure OpenAI Finetune\" style=\"width:100%\"/>\n",
"</p>\n",
"\n",
"## Capital Agent\n",
"\n",
"`capital_agent.py` defines a tool-enabled agent that must call `country_capital_lookup` whenever a user asks for a capital city. The deterministic lookup table keeps the task simple, and the reward function checks that the final response contains the expected capital name. Run the script directly to validate credentials or debug tool call behavior:\n",
"\n",
"```bash\n",
"python capital_agent.py\n",
"```\n",
"\n",
"The agent executes five sample tasks, prints each tool interaction, and records traces via the Agent Lightning tracer.\n",
"\n",
"## Running the Example\n",
"\n",
"Start the full fine-tuning loop from the repository root:\n",
"\n",
"```bash\n",
"python train_capital_agent.py\n",
"```\n",
"\n",
"`train_capital_agent.py` divides the dataset into training and validation subsets, then completes three fine-tune → deploy → evaluate iterations. Expect short rollout times paired with longer waits (up to 4 hours in our experiments) for Azure’s fine-tuning queue; deployments usually reach `Succeeded` within 2-3 minutes. The console output looks like this:\n",
"\n",
"```log\n",
"10:13:02,624 Starting client-server execution with 2 runner(s) [role=both, main_process=algorithm]\n",
"10:13:02,639 Starting LightningStore server on localhost:4747\n",
"10:13:02,749 [AOAI FT 1/3] [Stage 1] Starting fine-tuning iteration with 24 tasks...\n",
"10:13:02,750 [AOAI FT 1/3] [Stage 2] Using model deployment: gpt-4.1-mini\n",
"10:13:03,428 [Worker 1] Started async rollouts (max: unlimited).\n",
"10:13:03,429 [Worker 0] Started async rollouts (max: unlimited).\n",
"10:13:05,279 [Worker 0 | Rollout ro-efab388d2f0e] Completed in 1.83s. Collected 4 span(s). Final reward: 1.0\n",
"10:13:05,454 [Worker 1 | Rollout ro-8ba08859ae85] Completed in 2.01s. Collected 4 span(s). Final reward: 1.0\n",
"[... 22 more rollouts omitted ...]\n",
"10:13:28,430 [AOAI FT 1/3] [Stage 3] Completed rollouts for 24 tasks.\n",
"10:13:28,431 [AOAI FT 1/3] Keeping 28 example(s) for fine-tuning after reward-based filtering.\n",
"10:13:28,431 [AOAI FT 1/3] [Stage 4] Prepared 28 training examples after filtering.\n",
"10:13:28,431 [AOAI FT 1/3] [Stage 5] Starting fine-tuning for model gpt-4.1-mini-2025-04-14...\n",
"10:13:29,854 [AOAI FT 1/3] Uploaded training file to Azure OpenAI (file_id=file-0fd6e72151094a0eb0306de7aae4883b).\n",
"10:13:41,216 [AOAI FT 1/3] Fine-tuning job ftjob-0ee45c42591b4f4a8bd4f49ef2301dcd created for base model gpt-4.1-mini-2025-04-14.\n",
"10:13:41,217 [AOAI FT 1/3] Waiting for fine-tuning job ftjob-0ee45c42591b4f4a8bd4f49ef2301dcd to complete.\n",
"12:29:11,444 [AOAI FT 1/3] Fine-tuning job ftjob-0ee45c42591b4f4a8bd4f49ef2301dcd succeeded with new model id gpt-4.1-mini-2025-04-14.ft-0ee45c42591b4f4a8bd4f49ef2301dcd-v01.\n",
"12:29:11,444 [AOAI FT 1/3] [Stage 6] Deploying fine-tuned model...\n",
"12:29:14,217 [AOAI FT 1/3] Waiting for deployment gpt-4.1-mini-ft_v01 to become ready.\n",
"12:29:15,458 [AOAI FT 1/3] Waiting for deployment to be ready. Current provisioning state of gpt-4.1-mini-ft_v01: Creating\n",
"[... 7 repetitive deployment status checks omitted ...]\n",
"12:32:53,773 [AOAI FT 1/3] Waiting for deployment to be ready. Current provisioning state of gpt-4.1-mini-ft_v01: Succeeded\n",
"12:32:53,773 [AOAI FT 1/3] Deployment gpt-4.1-mini-ft_v01 is ready with version 1.\n",
"12:32:53,774 [AOAI FT 1/3] [Stage 7] Evaluating on validation dataset...\n",
"[... 8 validation rollouts omitted ...]\n",
"12:33:03,979 [AOAI FT 1/3] [Stage 7] Evaluation completed. Average reward: 1.0000\n",
"12:33:03,979 [AOAI FT 2/3] [Stage 1] Starting fine-tuning iteration with 24 tasks...\n",
"12:33:03,979 [AOAI FT 2/3] [Stage 2] Using model deployment: gpt-4.1-mini-ft_v01\n",
"[... 24 rollouts omitted ...]\n",
"12:33:34,619 [AOAI FT 2/3] [Stage 3] Completed rollouts for 24 tasks.\n",
"12:33:34,620 [AOAI FT 2/3] [Stage 4] Prepared 27 training examples after filtering.\n",
"12:33:34,620 [AOAI FT 2/3] [Stage 5] Starting fine-tuning for model gpt-4.1-mini-2025-04-14.ft-0ee45c42591b4f4a8bd4f49ef2301dcd-v01...\n",
"12:35:12,694 [AOAI FT 2/3] Waiting for fine-tuning job ftjob-06366e441ee24a0ea242014fea8fbc3a to complete.\n",
"13:16:43,810 [AOAI FT 2/3] Fine-tuning job ftjob-06366e441ee24a0ea242014fea8fbc3a succeeded with new model id gpt-4.1-mini-2025-04-14.ft-06366e441ee24a0ea242014fea8fbc3a-v02.\n",
"13:16:43,810 [AOAI FT 2/3] [Stage 6] Deploying fine-tuned model...\n",
"13:16:46,263 [AOAI FT 2/3] Waiting for deployment gpt-4.1-mini-ft_v02 to become ready.\n",
"[... 5 repetitive deployment status checks omitted ...]\n",
"13:19:23,856 [AOAI FT 2/3] Waiting for deployment to be ready. Current provisioning state of gpt-4.1-mini-ft_v02: Succeeded\n",
"13:19:23,857 [AOAI FT 2/3] [Stage 7] Evaluating on validation dataset...\n",
"[... 8 validation rollouts omitted ...]\n",
"13:19:39,072 [AOAI FT 2/3] [Stage 7] Evaluation completed. Average reward: 1.0000\n",
"13:19:39,072 [AOAI FT 3/3] [Stage 1] Starting fine-tuning iteration with 24 tasks...\n",
"13:19:39,073 [AOAI FT 3/3] [Stage 2] Using model deployment: gpt-4.1-mini-ft_v02\n",
"[... 24 rollouts omitted ...]\n",
"13:20:04,721 [AOAI FT 3/3] [Stage 3] Completed rollouts for 24 tasks.\n",
"13:20:04,722 [AOAI FT 3/3] [Stage 4] Prepared 27 training examples after filtering.\n",
"13:20:04,722 [AOAI FT 3/3] [Stage 5] Starting fine-tuning for model gpt-4.1-mini-2025-04-14.ft-06366e441ee24a0ea242014fea8fbc3a-v02...\n",
"13:20:17,013 [AOAI FT 3/3] Waiting for fine-tuning job ftjob-2651d3183a4b40679d4c3fc886940c0c to complete.\n",
"14:02:47,241 [AOAI FT 3/3] Fine-tuning job ftjob-2651d3183a4b40679d4c3fc886940c0c succeeded with new model id gpt-4.1-mini-2025-04-14.ft-2651d3183a4b40679d4c3fc886940c0c-v03.\n",
"14:02:47,242 [AOAI FT 3/3] [Stage 6] Deploying fine-tuned model...\n",
"14:02:47,242 [AOAI FT 3/3] Maximum number of deployments reached (2). Cleaning up old deployments.\n",
"14:02:47,242 [AOAI FT 3/3] Deleting old deployment gpt-4.1-mini-ft_v01.\n",
"14:02:48,925 [AOAI FT 3/3] Deployment gpt-4.1-mini-ft_v01 deleted successfully.\n",
"14:02:51,168 [AOAI FT 3/3] Waiting for deployment gpt-4.1-mini-ft_v03 to become ready.\n",
"[... 7 repetitive deployment status checks omitted ...]\n",
"14:06:30,300 [AOAI FT 3/3] Waiting for deployment to be ready. Current provisioning state of gpt-4.1-mini-ft_v03: Succeeded\n",
"14:06:30,301 [AOAI FT 3/3] [Stage 7] Evaluating on validation dataset...\n",
"[... 8 validation rollouts omitted ...]\n",
"14:06:45,506 [AOAI FT 3/3] [Stage 7] Evaluation completed. Average reward: 1.0000\n",
"14:06:45,506 Stopping server...\n",
"14:06:45,657 Server stopped.\n",
"```\n",
"\n",
"## Tips and Cleanup\n",
"\n",
"Tweak `finetune_every_n_rollouts`, `max_deployments`, and `data_filter_ratio` in `train_capital_agent.py` to align with your quotas. While jobs run, visit the Azure OpenAI portal to confirm status. When you are done, delete unused deployments there.\n",
"\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "c0fa7346"
},
"source": [
"## Azure Example Requires Azure OpenAI Credentials\n",
"\n",
"To execute the `azure` example, specifically `train_capital_agent.py`, an active Azure subscription with an Azure OpenAI resource is required. The `README.md` explicitly mentions the need to sign in with the Azure CLI (`az login`) and configure environment variables from a `.env` file containing Azure-specific keys and identifiers.\n",
"\n",
"Without these Azure credentials, the example cannot be run, as it depends on interacting with Azure OpenAI services for supervised fine-tuning. Therefore, I am unable to proceed with the execution of this example at this time due to the lack of necessary authentication information."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "cb3174b6"
},
"source": [
"## Explore 'calc_x' Example README\n",
"\n",
"### Subtask:\n",
"Read the `README.md` file within the `examples/calc_x` directory to understand the example's purpose, requirements, and how to execute it.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "842dd25e"
},
"source": [
"**Reasoning**:\n",
"To understand the 'calc_x' example and identify the main script, I need to read the `README.md` file within the `examples/calc_x` directory as per the instructions.\n",
"\n"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "4eeaa95e",
"outputId": "9b25261b-7ce8-4094-cbf1-eb41cc4c638d"
},
"source": [
"import subprocess\n",
"\n",
"try:\n",
" result = subprocess.run(['cat', 'examples/calc_x/README.md'], capture_output=True, text=True, check=True)\n",
" print(result.stdout)\n",
"except subprocess.CalledProcessError as e:\n",
" print(f\"Error reading README.md: {e}\")\n",
" print(\"Standard Error:\", e.stderr)\n",
"except FileNotFoundError:\n",
" print(\"Error: 'cat' command not found or file not found. Ensure you are in the correct directory.\")"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"# Calc-X Example\n",
"\n",
"[![calc_x CI status](https://github.com/microsoft/agent-lightning/actions/workflows/examples-calc-x.yml/badge.svg)](https://github.com/microsoft/agent-lightning/actions/workflows/examples-calc-x.yml)\n",
"\n",
"This example demonstrates training a mathematical reasoning agent using Agent-Lightning with the VERL algorithm and AutoGen framework. The agent solves math problems using a calculator tool through the Model Context Protocol (MCP). It's compatible with Agent-lightning v0.2 or later.\n",
"\n",
"## Requirements\n",
"\n",
"This example requires a single node with at least one 40GB GPU. Follow the [installation guide](../../docs/tutorials/installation.md) to install Agent-Lightning and VERL-related dependencies.\n",
"\n",
"Additionally, ensure `uv` and the MCP calculator server are properly installed. The agent relies on the MCP protocol to access calculator functionality during problem-solving.\n",
"\n",
"```bash\n",
"pip install \"autogen-agentchat\" \"autogen-ext[openai]\" \"mcp>=1.10.0\"\n",
"```\n",
"\n",
"## Dataset\n",
"\n",
"Download the Calc-X dataset in parquet format from [here](https://drive.google.com/file/d/1FQMyKLLd6hP9dw9rfZn1EZOWNvKaDsqw/view?usp=sharing) and extract it to the `data` folder:\n",
"\n",
"```bash\n",
"unzip calc-x-data.zip -d data\n",
"```\n",
"\n",
"The dataset contains mathematical problems with ground truth solutions for training and evaluation.\n",
"\n",
"## Included Files\n",
"\n",
"| File/Directory | Description |\n",
"|----------------|-------------|\n",
"| `calc_agent.py` | Math problem-solving agent using AutoGen and MCP calculator tool |\n",
"| `train_calc_agent.py` | Training script using VERL algorithm with configurable hyperparameters |\n",
"| `eval_utils.py` | Evaluation utilities for assessing agent accuracy on math problems |\n",
"| `data/` | Directory containing training and test datasets in parquet format |\n",
"| `tests/` | Test files including MCP calculator verification script |\n",
"| `legacy_calc_agent.py` | Legacy agent implementation compatible with Agent-lightning v0.1.x (deprecated) |\n",
"| `legacy_calc_agent_debug.py` | Legacy debugging script compatible with Agent-lightning v0.1.x (deprecated) |\n",
"| `legacy_train.sh` | Legacy training script compatible with Agent-lightning v0.1.x (deprecated) |\n",
"\n",
"## Running Examples\n",
"\n",
"### Training\n",
"\n",
"The training process uses distributed Ray workers to run agent rollouts in parallel while the training server optimizes the model. Start Ray before launching the training:\n",
"\n",
"```bash\n",
"bash ../../scripts/restart_ray.sh\n",
"```\n",
"\n",
"If you want to track experiments with Weights & Biases, set the `WANDB_API_KEY` environment variable **before starting Ray**.\n",
"\n",
"Then run the training script:\n",
"\n",
"```bash\n",
"python train_calc_agent.py --train-file data/train.parquet --val-file data/test.parquet\n",
"```\n",
"\n",
"The script automatically launches agent workers and the training server. The agent workers execute math problem rollouts using the MCP calculator, while the training server applies the VERL algorithm to improve the model based on rewards.\n",
"\n",
"### Debugging\n",
"\n",
"To test the agent interactively without training:\n",
"\n",
"```bash\n",
"python calc_agent.py\n",
"```\n",
"\n",
"This runs the agent on sample problems to verify that the MCP calculator integration and AutoGen setup work correctly. This test relies on an OpenAI service available. Set `OPENAI_API_KEY` environment variable to the API key of the OpenAI service; and `OPENAI_API_BASE` environment variable to the base URL of the OpenAI service.\n",
"\n",
"A very common issue is that the agent may hang indefinitely if the environment is not properly configured. Verify that `uv` and the MCP calculator server are correctly installed by running:\n",
"\n",
"```bash\n",
"python tests/test_mcp_calculator.py\n",
"```\n",
"\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "68525994"
},
"source": [
"## Calc-X Example Requires Extensive Setup\n",
"\n",
"The `examples/calc_x/README.md` has been reviewed, and it outlines the following key requirements for running this example:\n",
"\n",
"* **Hardware**: A single node with at least one 40GB GPU is required.\n",
"* **Dependencies**: Installation of `autogen-agentchat`, `autogen-ext[openai]`, and `mcp>=1.10.0` is necessary.\n",
"* **Dataset**: A specific `calc-x-data.zip` dataset needs to be downloaded and extracted into a `data` directory.\n",
"* **OpenAI API Key**: Both the training and debugging scripts (`train_calc_agent.py` and `calc_agent.py`) rely on an OpenAI service, necessitating the `OPENAI_API_KEY` and `OPENAI_API_BASE` environment variables to be set.\n",
"\n",
"Given the stringent hardware requirement (40GB GPU) and the need for external API keys and dataset downloads, this example cannot be fully demonstrated in the current Google Colab environment without explicit user intervention to meet these prerequisites. Attempting to run it would lead to resource allocation errors or authentication failures, similar to previous examples."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "51da3ea0"
},
"source": [
"## Explore 'spider' Example README\n",
"\n",
"### Subtask:\n",
"Read the `README.md` file within the `examples/spider` directory to understand the example's purpose, requirements, and how to execute it.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "8caaaebd"
},
"source": [
"**Reasoning**:\n",
"To understand the 'spider' example and its execution method, I need to read the `README.md` file located in the `examples/spider` directory.\n",
"\n"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "cfc0877e",
"outputId": "3cc6515c-94e5-4dfc-8787-9ab5bf251bd5"
},
"source": [
"import subprocess\n",
"\n",
"try:\n",
" result = subprocess.run(['cat', 'examples/spider/README.md'], capture_output=True, text=True, check=True)\n",
" print(result.stdout)\n",
"except subprocess.CalledProcessError as e:\n",
" print(f\"Error reading README.md: {e}\")\n",
" print(\"Standard Error:\", e.stderr)\n",
"except FileNotFoundError:\n",
" print(\"Error: 'cat' command not found or file not found. Ensure you are in the correct directory.\")"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"# Spider Example\n",
"\n",
"[![spider CI status](https://github.com/microsoft/agent-lightning/actions/workflows/examples-spider.yml/badge.svg)](https://github.com/microsoft/agent-lightning/actions/workflows/examples-spider.yml)\n",
"\n",
"This example demonstrates how to train a text-to-SQL agent on the Spider dataset using Agent-Lightning with reinforcement learning. It's compatible with Agent-lightning v0.2 or later.\n",
"\n",
"## Requirements\n",
"\n",
"This example depends on LangChain v0.x and several SQL-related libraries. Install the required dependencies with:\n",
"\n",
"```bash\n",
"pip install \"langgraph<1.0\" \"langchain[openai]<1.0\" \"langchain-community\" \"langchain-text-splitters<1.0\" \"sqlparse\" \"nltk\"\n",
"```\n",
"\n",
"Additionally, follow the [installation guide](../../docs/tutorials/installation.md) to install Agent-Lightning and VERL-related dependencies.\n",
"\n",
"## Dataset\n",
"\n",
"Detailed dataset preparation instructions are available in the [How to Train a SQL Agent](../../docs/how-to/train-sql-agent.md) guide.\n",
"\n",
"## Included Files\n",
"\n",
"| File/Directory | Description |\n",
"|----------------|-------------|\n",
"| `train_sql_agent.py` | Training script for SQL agents with support for multiple model configurations (Qwen, LLaMA, fast mode for CI) |\n",
"| `sql_agent.py` | SQL agent implementation using LangGraph and LangChain, with debugging capabilities |\n",
"| `data/` | Directory containing the Spider dataset files |\n",
"| `spider_eval/` | Evaluation utilities for assessing SQL agent performance |\n",
"\n",
"## Running Examples\n",
"\n",
"### Training\n",
"\n",
"Train a SQL agent using the Qwen2.5-Coder-1.5B-Instruct model with the following command. This requires a single node with at least one 40GB GPU:\n",
"\n",
"```bash\n",
"python train_sql_agent.py qwen\n",
"```\n",
"\n",
"If you want to use an NPU for training, please refer to the **Launch Training with NPUS** section in [How to Train a SQL Agent](../../docs/how-to/train-sql-agent.md).\n",
"\n",
"### Debugging\n",
"\n",
"To test and debug the SQL agent interactively:\n",
"\n",
"```bash\n",
"python sql_agent.py\n",
"```\n",
"\n",
"This command requires an OpenAI-compatible API service. Configure your service endpoint and credentials using the `OPENAI_API_BASE` and `OPENAI_API_KEY` environment variables.\n",
"\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "dfbbb4c8"
},
"source": [
"## Spider Example Requires Extensive Setup\n",
"\n",
"The `examples/spider/README.md` has been reviewed, and it outlines the following key requirements for running this example:\n",
"\n",
"* **Hardware**: Training a SQL agent using this example \"requires a single node with at least one 40GB GPU\". This is a significant constraint for the typical Google Colab environment.\n",
"* **Dependencies**: Several specific Python packages (`langgraph`, `langchain[openai]`, `langchain-community`, `langchain-text-splitters`, `sqlparse`, `nltk`) need to be installed.\n",
"* **Dataset**: The example relies on the Spider dataset, which requires detailed preparation as outlined in the `How to Train a SQL Agent` guide.\n",
"* **OpenAI API Key**: Both the training and debugging scripts (`train_sql_agent.py` and `sql_agent.py`) require an OpenAI-compatible API service, necessitating the `OPENAI_API_BASE` and `OPENAI_API_KEY` environment variables to be set.\n",
"\n",
"Given these stringent hardware requirements, external API key dependencies, and the need for dataset preparation, this example cannot be fully demonstrated in the current Google Colab environment without explicit user intervention to meet these prerequisites. Attempting to run it would lead to resource allocation errors or authentication failures, similar to previous examples."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "8559e02b"
},
"source": [
"## Explore 'tinker' Example README\n",
"\n",
"### Subtask:\n",
"Read the `README.md` file within the `examples/tinker` directory to understand the example's purpose, requirements, and how to execute it.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "b04b908a"
},
"source": [
"**Reasoning**:\n",
"To understand the 'tinker' example and its execution method, I need to read the `README.md` file located in the `examples/tinker` directory.\n",
"\n"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "ab0d261d",
"outputId": "65bb3e9f-89e7-4496-d8b5-52473ad5df22"
},
"source": [
"import subprocess\n",
"\n",
"try:\n",
" result = subprocess.run(['cat', 'examples/tinker/README.md'], capture_output=True, text=True, check=True)\n",
" print(result.stdout)\n",
"except subprocess.CalledProcessError as e:\n",
" print(f\"Error reading README.md: {e}\")\n",
" print(\"Standard Error:\", e.stderr)\n",
"except FileNotFoundError:\n",
" print(\"Error: 'cat' command not found or file not found. Ensure you are in the correct directory.\")"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"# Tinker + Agent-lightning Integration\n",
"\n",
"This example shows how to use [Tinker's reinforcement-learning infrastructure](https://tinker-docs.thinkingmachines.ai/) as a fine-tuning backend for agents written against Agent-lightning. You author the agent exactly the way you would for deployment, while the bridge code reconstructs Tinker-compatible trajectories from Agent-lightning traces.\n",
"\n",
"**NOTE: The example is tested and compatible with Agent-lightning v0.2.x, but it's not yet maintained on CI due to the cost of running the Tinker training service.**\n",
"\n",
"## How this differs from the original Tinker Cookbook RL recipe\n",
"\n",
"Real-world agent apps orchestrate logic in familiar frameworks (CrewAI, LangChain, AutoGen, OpenAI Agents, etc.) or by calling OpenAI-compatible REST APIs. A simple number-guessing agent might look like this:\n",
"\n",
"```python\n",
"def guess_number_agent():\n",
" client = openai.OpenAI()\n",
" messages = [{\"role\": \"user\", \"content\": \"Guess a number between 1 and 100.\"}]\n",
" for _ in range(MAX_TURNS):\n",
" response = client.chat.completions.create(model=\"gpt-4.1\", messages=messages)\n",
" response_content = response.choices[0].message.content\n",
" messages.append({\"role\": \"assistant\", \"content\": response_content})\n",
" guessed_number = extract_number(response_content)\n",
" if guessed_number == gold_answer:\n",
" return 1.0\n",
" elif guessed_number < gold_answer:\n",
" messages.append({\"role\": \"user\", \"content\": \"Too low\"})\n",
" else:\n",
" messages.append({\"role\": \"user\", \"content\": \"Too high\"})\n",
" return 0.0\n",
"```\n",
"\n",
"The reference [Tinker Cookbook example](https://github.com/thinking-machines-lab/tinker-cookbook/tree/51d9e8226f2dcf82ceac272c734a5f6e3b4f0203/tinker_cookbook/recipes/multiplayer_rl/guess_number), however, expects you to rewrite the same logic into a callback-style `Env`, and it creates a simple loop to iterate between a language model (`TokenCompleter`) and the `Env`.\n",
"\n",
"```python\n",
"class GuessNumberEnv:\n",
" def __init__(self, gold_answer: int):\n",
" self.system_prompt: Message = {\"role\": \"system\", \"content\": SYSTEM_PROMPT}\n",
" self.turns: list[Message] = []\n",
" self.gold_answer: int = gold_answer\n",
"\n",
" async def initial_observation(self) -> list[int]:\n",
" return message_to_tokens(self.system_prompt)\n",
"\n",
" async def step(self, action_tokens: list[int]) -> tuple[list[int], float, bool]:\n",
" action_message = tokens_to_message(action_tokens)\n",
" guessed_number = extract_number(action_message[\"content\"])\n",
"\n",
" if guessed_number == self.gold_answer:\n",
" text, reward = \"Correct\", 1.0\n",
" elif guessed_number < self.gold_answer:\n",
" text, reward = \"Too low\", 0.0\n",
" else:\n",
" text, reward = \"Too high\", 0.0\n",
"\n",
" self.turns.append(action_message)\n",
" self.turns.append({\"role\": \"assistant\", \"content\": text})\n",
" episode_done = reward == 1 or len(self.turns) // 2 >= MAX_TURNS\n",
" return message_to_tokens(self.turns), reward, episode_done\n",
"```\n",
"\n",
"As agents grow more complex, writing them in callback style becomes increasingly painful. You have to break the control flow whenever an LLM call is required, which fragments the code and makes it harder to maintain.\n",
"\n",
"Agent-lightning hides that translation step: you keep the first style for development and production, while the framework queues tasks to the store, rebuilds trajectories from spans, and feeds them to the training loop. This example shows how to make Tinker's original training loop work with Agent-lightning.\n",
"\n",
"## Included files\n",
"\n",
"| Path | Purpose |\n",
"| ---- | ------- |\n",
"| `hello.py` | Minimal end-to-end fine-tuning example. Trains a model to repeat small identity strings. |\n",
"| `q20_agent.py` | CrewAI flow that powers the 20 Questions player, answerer, and mock search tool. Shared by training and evaluation. **Unrelated to Agent-lightning or Tinker.** |\n",
"| `q20_train.py` | Reinforcement-learning driver that adapts the Cookbook loop to Agent-lightning rollouts. Supports dry-run, distributed training, and search tool toggles. **Related to both Agent-lightning and Tinker.** |\n",
"| `q20_evaluate.py` | Offline evaluator that reuses the CrewAI flow to benchmark any OpenAI- or Qwen-backed model against the provided dataset. **Related to Tinker only.** |\n",
"| `q20_nouns.csv` | Categories and answers used for training and validation. Contains `split` and `search_enabled` metadata. |\n",
"| `agl_tinker/` | Bridge package for integrating Agent-lightning with Tinker (see breakdown below). |\n",
"| `tests/test_tinker_llm.py` | Sanity tests for the custom LiteLLM provider. Run with `pytest examples/tinker/tests`. |\n",
"| `.env.example` | Template for environment variables required by LiteLLM, CrewAI helpers, and the hosted Tinker service. |\n",
"\n",
"`agl_tinker/` components:\n",
"\n",
"| Path | Purpose |\n",
"| ---- | ------- |\n",
"| `agl_tinker/algo.py` | Agent-lightning `Algorithm` wrapper that plugs the training loop into `agl.Trainer`. |\n",
"| `agl_tinker/env.py` | Dummy env and dataset builders that adapt Agent-lightning tasks to Tinker expectations. |\n",
"| `agl_tinker/llm.py` | LiteLLM custom provider backed by the Tinker sampling client. |\n",
"| `agl_tinker/rollout.py` | Span-to-trajectory reconstruction and rollout batching helpers. |\n",
"| `agl_tinker/train.py` | RL training loop adapted from the Tinker Cookbook. |\n",
"\n",
"## Setup\n",
"\n",
"**1. Install dependencies.** From the repo root:\n",
"\n",
"```bash\n",
"uv sync --frozen --extra apo --group dev --group agents --group tinker\n",
"```\n",
"\n",
"If you are not using `uv`, make sure `tinker`, `tinker_cookbook`, `litellm`, `crewai`, and Agent-lightning are available in the same environment.\n",
"\n",
"**2. Copy the environment template and fill in credentials:**\n",
"\n",
"```bash\n",
"cp examples/tinker/.env.example examples/tinker/.env\n",
"```\n",
"\n",
"- `OPENAI_API_KEY` / `OPENAI_BASE_URL`: routes helper agents (answerer, search, tool simulations) through a LiteLLM or OpenAI-compatible endpoint.\n",
"- `TINKER_API_KEY`: required to talk to the hosted Tinker training service. Skip if you are using OpenAI models only.\n",
"- `WANDB_API_KEY`: optional, enables Weights & Biases logging when configured in `q20_train.py`.\n",
"- `CREWAI_DISABLE_TELEMETRY=true`: keeps CrewAI from emitting its own telemetry so that Agent-lightning tracing stays coherent.\n",
"\n",
"3. Load the environment before running commands, e.g. `dotenv run -- <command>` or export the variables manually.\n",
"\n",
"## Running the Hello 1024 example\n",
"\n",
"This is the quickest way to see the integration in action. It fine-tunes a Qwen model so it introduces itself with the target identity.\n",
"\n",
"**One-click workflow (spawns store, algorithm, and runners in a single process)**\n",
"\n",
"```bash\n",
"dotenv run python hello.py oneclick\n",
"```\n",
"\n",
"The script will pick free ports for the LiteLLM proxy and Agent-lightning store, then iterate through the synthetic dataset of identities.\n",
"\n",
"**Distributed workflow (useful for inspecting each component)**\n",
"\n",
"```bash\n",
"agl store --port 4747\n",
"dotenv run python hello.py algo\n",
"dotenv run python hello.py runner\n",
"```\n",
"\n",
"Start the commands in separate terminals. The algorithm process connects to the existing store, while the runner process launches eight worker processes by default. Logs are written to `examples/tinker/logs/hello`.\n",
"\n",
"## Training the 20 Questions agent\n",
"\n",
"The 20 Questions setup mirrors the official Cookbook recipe but drives rollouts through the shared CrewAI flow.\n",
"\n",
"**Dry run (in-memory store and LiteLLM proxy)**\n",
"\n",
"```bash\n",
"dotenv run python q20_train.py dryrun\n",
"```\n",
"\n",
"Useful to verify that the CrewAI flow, reward emission, and span reconstruction succeed on a handful of samples without touching the hosted Tinker service.\n",
"\n",
"**Full distributed training**\n",
"\n",
"```bash\n",
"agl store --port 4747\n",
"dotenv run python q20_train.py algo --model qwen30b --search --port 4747\n",
"dotenv run python q20_train.py runner --port 4747 --n-runners 4\n",
"```\n",
"\n",
"`--model` selects the Tinker-hosted checkpoint (`qwen4b` or `qwen30b`). Add `--search` to enable the mocked search tool, which relies on the helper LLM defined in the environment variables (the example uses an LLM-powered search simulation instead of a real API). Training metrics and checkpoints are recorded under `examples/tinker/logs/q20_*`. You can also use `verl` as a substitute for the `algo` command when Tinker service is not available.\n",
"\n",
"You can run additional runner processes at any time; they register with the store and start dequeuing tasks immediately.\n",
"\n",
"## Evaluating a model on 20 Questions\n",
"\n",
"Reuse the CrewAI flow to benchmark any OpenAI-compatible model (hosted on Tinker, OpenAI, or another LiteLLM backend):\n",
"\n",
"```bash\n",
"dotenv run python q20_evaluate.py \\\n",
" --model Qwen/Qwen3-30B-A3B-Instruct-2507 \\\n",
" --output-file logs/twenty_questions_results.jsonl \\\n",
" --search\n",
"```\n",
"\n",
"Results append to the specified JSONL file so you can compute aggregate stats later.\n",
"\n",
"## How the bridge works\n",
"\n",
"The `agl_tinker` package keeps the rest of the Tinker or Tinker Cookbook's codebase untouched by emulating the interfaces it expects:\n",
"\n",
"- `AGLDatasetBuilder` and `AGLDummyEnv` wrap plain Agent-lightning datasets so batches still yield Tinker `EnvGroupBuilder` objects, even though rollouts run remotely.\n",
"- `do_group_of_group_rollouts` (in [`rollout.py`](agl_tinker/rollout.py)) enqueues tasks to the Agent-lightning store, waits for runners to finish, then reconstructs `Trajectory` objects from span triplets collected by `TracerTraceToTriplet`.\n",
"- `TinkerLLM` implements LiteLLM's `CustomLLM` so the training loop can update sampling clients and expose them through an OpenAI-compatible endpoint without rewriting agent code.\n",
"- `agl_tinker.algo.Tinker` satisfies Agent-lightning's `Algorithm` contract, meaning you can launch training via `agl.Trainer` alongside other algorithms, schedulers, or resources.\n",
"\n",
"Because spans and rewards are emitted by the same rollout function you would deploy, evaluation and production stay in sync—no separate simulator code paths to maintain.\n",
"\n",
"## Troubleshooting tips\n",
"\n",
"- If the runner logs show `Triplet has no token_ids`, ensure your LiteLLM proxy returns logprobs and token IDs, and that the token IDs are present in the store. The provided adapter requires them to rebuild trajectories. See the debugging tutorial for more details.\n",
"- CrewAI telemetry must stay disabled (see `.env.example`) so AgentOps traces remain self-contained; otherwise, you may see malformed traces.\n",
"- Tune `learning_rate`, `batch_size` and `group_size` carefully. The training is sensitive to these hyper-parameters.\n",
"\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "7b5dd523"
},
"source": [
"## Tinker Example Requires Extensive Setup\n",
"\n",
"The `examples/tinker/README.md` has been reviewed, and it outlines the following key requirements for running this example:\n",
"\n",
"* **Dependencies**: Requires `tinker`, `tinker_cookbook`, `litellm`, and `crewai`. The setup instructions also recommend installing extras using `uv` (`uv sync --frozen --extra apo --group dev --group agents --group tinker`).\n",
"* **API Keys/Credentials**: Requires `OPENAI_API_KEY` and `OPENAI_BASE_URL` for helper agents (answerer, search, tool simulations). Additionally, `TINKER_API_KEY` is required to interact with the hosted Tinker training service.\n",
"* **External Service**: The core of this example integrates with Tinker's reinforcement learning infrastructure, which is an external training service. The `README` explicitly states, \"not yet maintained on CI due to the cost of running the Tinker training service,\" indicating a potentially high operational cost.\n",
"* **Complex Workflow**: The example involves distributed workflows, setting up store, algorithm, and runner components, and possibly large models (e.g., `qwen30b` mentioned for training), which often imply significant computational resources like high-end GPUs.\n",
"\n",
"Given the need for multiple external API keys, integration with a specialized and potentially costly external training service (Tinker), and complex distributed setup, this example cannot be fully demonstrated in the current Google Colab environment without extensive user intervention to meet these prerequisites. Attempting to run it would lead to installation, authentication, or resource allocation failures."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "e072e7aa"
},
"source": [
"## Explore 'unsloth' Example README\n",
"\n",
"### Subtask:\n",
"Read the `README.md` file within the `examples/unsloth` directory to understand the example's purpose, requirements, and how to execute it.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "2ae95a09"
},
"source": [
"**Reasoning**:\n",
"To understand the 'unsloth' example and identify its execution method, I need to read the `README.md` file located in the `examples/unsloth` directory as instructed.\n",
"\n"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "674e3b37",
"outputId": "17468f67-2b91-48da-c5ca-11a7031aa599"
},
"source": [
"import subprocess\n",
"\n",
"try:\n",
" result = subprocess.run(['cat', 'examples/unsloth/README.md'], capture_output=True, text=True, check=True)\n",
" print(result.stdout)\n",
"except subprocess.CalledProcessError as e:\n",
" print(f\"Error reading README.md: {e}\")\n",
" print(\"Standard Error:\", e.stderr)\n",
"except FileNotFoundError:\n",
" print(\"Error: 'cat' command not found or file not found. Ensure you are in the correct directory.\")"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"# Unsloth SFT Example\n",
"\n",
"[![unsloth CI status](https://github.com/microsoft/agent-lightning/actions/workflows/examples-unsloth.yml/badge.svg)](https://github.com/microsoft/agent-lightning/actions/workflows/examples-unsloth.yml)\n",
"\n",
"This example demonstrates Supervised Fine-Tuning (SFT) using the Unsloth library for efficient training with 4-bit quantization and LoRA. The example trains a math-solving agent on the GSM-hard dataset. It's compatible with Agent-lightning v0.2 or later.\n",
"\n",
"## Overview\n",
"\n",
"The SFT workflow iteratively improves the model by collecting rollouts, ranking them by reward, and fine-tuning on the top-performing examples. Unsloth optimizes the training process with memory-efficient techniques including 4-bit quantization, LoRA (Low-Rank Adaptation), and gradient checkpointing.\n",
"\n",
"## Requirements\n",
"\n",
"Follow the [installation guide](../../docs/tutorials/installation.md) to install Agent-Lightning, PyTorch and vLLM. You will not need VERL for this example. Additionally, install Unsloth and related packages.\n",
"\n",
"```bash\n",
"pip install torch==2.8.0 torchvision==0.23.0 --index-url https://download.pytorch.org/whl/cu128\n",
"pip install vllm==0.10.2\n",
"pip install unsloth==2025.10.1 unsloth_zoo==2025.10.1 bitsandbytes peft datasets transformers trl kernels\n",
"\n",
"pip install openai-agents mcp\n",
"```\n",
"\n",
"This example requires a GPU with 16GB memory to load models in 4-bit quantization. The training uses LoRA to reduce memory requirements during fine-tuning.\n",
"\n",
"## Dataset\n",
"\n",
"The example uses the GSM-hard dataset from Hugging Face. The dataset contains mathematical reasoning problems with numeric answers. A convenience function is provided in `math_agent.py` to download the first 64 samples for quick experimentation. The samples have already been included in the repository in `data_gsmhard.jsonl`.\n",
"\n",
"## Included Files\n",
"\n",
"| File/Directory | Description |\n",
"|----------------|-------------|\n",
"| `math_agent.py` | Math agent implementation using the OpenAI Agents library and MCP calculator tool |\n",
"| `sft_allinone.py` | All-in-one SFT training script that runs the complete workflow |\n",
"| `sft_algorithm.py` | Core SFT algorithm implementation with data collection and training logic |\n",
"| `sft_rollout_runners.py` | Rollout runner configuration for parallel agent execution |\n",
"| `unsloth_helper.py` | Unsloth training utilities with LoRA configuration and model management |\n",
"| `data_gsmhard.jsonl` | Local copy of GSM-hard dataset samples (64 samples) |\n",
"\n",
"## Running Examples\n",
"\n",
"### Training\n",
"\n",
"The all-in-one script handles the complete SFT workflow including store management, rollout execution, and model training:\n",
"\n",
"```bash\n",
"python sft_allinone.py\n",
"```\n",
"\n",
"See [How to Fine-tune with Unsloth](../../docs/how-to/unsloth-sft.md) for more details.\n",
"\n",
"### Training in Separate Processes\n",
"\n",
"The all-in-one script is recommended for most use cases. However, you can also run the algorithm, runners, and store in separate processes if needed:\n",
"\n",
"```bash\n",
"# Terminal 1: Start the store\n",
"agl store\n",
"\n",
"# Terminal 2: Run the algorithm\n",
"python sft_algorithm.py\n",
"\n",
"# Terminal 3: Run the rollout runners\n",
"python sft_rollout_runners.py\n",
"```\n",
"\n",
"This approach provides more control for debugging and distributed setups but requires manual coordination between processes.\n",
"\n",
"### Debugging\n",
"\n",
"To test the math agent without training:\n",
"\n",
"```bash\n",
"python math_agent.py\n",
"```\n",
"\n",
"This runs a dry run with a few problems to verify the agent setup. Set `OPENAI_API_KEY` and `OPENAI_BASE_URL` environment variables to configure the API endpoint.\n",
"\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "5a198318"
},
"source": [
"## Unsloth Example Requires Extensive Setup\n",
"\n",
"The `examples/unsloth/README.md` has been reviewed, and it outlines the following key requirements for running this example:\n",
"\n",
"* **Hardware**: Training with Unsloth \"requires a GPU with 16GB memory to load models in 4-bit quantization.\" This is a significant constraint for the typical Google Colab environment.\n",
"* **Dependencies**: A long list of specific Python packages are required, including `torch`, `vllm`, `unsloth`, `bitsandbytes`, `peft`, `transformers`, `trl`, `openai-agents`, and `mcp`. Installing these with specific versions can be complex and prone to conflicts.\n",
"* **API Keys**: The debugging script (`math_agent.py`) mentions the need for `OPENAI_API_KEY` and `OPENAI_BASE_URL` if an OpenAI service is used.\n",
"\n",
"Given the stringent hardware requirements (16GB GPU) and the need for numerous specific deep learning package versions, this example cannot be fully demonstrated in the current Google Colab environment without explicit user intervention to meet these prerequisites. Attempting to run it would likely lead to resource allocation errors or complex dependency issues."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "b0d39f98"
},
"source": [
"## Final Task\n",
"\n",
"### Subtask:\n",
"Summarize the demo, including what was successfully demonstrated, any key takeaways, and potential next steps for further exploration of the Agent-Lightning library.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "82461d93"
},
"source": [
"## Summary:\n",
"\n",
"### Q&A\n",
"\n",
"1. **Was the `apo_debug.py` script successfully executed to demonstrate debugging techniques?**\n",
" No, the `apo_debug.py` script failed to execute due to an `openai.AuthenticationError`, indicating that it requires a valid `OPENAI_API_KEY`, contrary to initial assumptions.\n",
"\n",
"2. **Could any of the `azure`, `calc_x`, `spider`, or `unsloth` examples be executed or fully demonstrated in the current environment?**\n",
" No, none of these examples could be fully demonstrated or executed due to stringent prerequisites such as specific hardware (e.g., GPUs with 16GB or 40GB memory), numerous external API keys (OpenAI, Azure OpenAI, Tinker API), complex dependency installations, and large dataset downloads/preparations.\n",
"\n",
"### Data Analysis Key Findings\n",
"\n",
"* The `apo_debug.py` example, intended to demonstrate debugging techniques, failed due to an `openai.AuthenticationError`, confirming that it also requires a valid OpenAI API key for execution.\n",
"* The `azure` example for supervised fine-tuning with Azure OpenAI requires an active Azure subscription, Azure OpenAI resources, and specific Azure credentials (`AZURE_OPENAI_API_KEY`, `AZURE_OPENAI_ENDPOINT`).\n",
"* The `calc_x` example for training a mathematical reasoning agent demands a single node with at least one 40GB GPU, an OpenAI API key, and a specific `calc-x-data.zip` dataset.\n",
"* The `spider` example for training a SQL agent also requires a single node with at least one 40GB GPU, an OpenAI API key, and detailed preparation of the Spider dataset.\n",
"* The `tinker` example, involving a reinforcement learning framework, necessitates `OPENAI_API_KEY`, `OPENAI_BASE_URL`, and a `TINKER_API_KEY` to interact with an external, costly Tinker training service, along with significant computational resources for large models.\n",
"* The `unsloth` example for supervised fine-tuning requires a GPU with at least 16GB memory for 4-bit quantization, numerous specific deep learning library dependencies, and potentially an OpenAI API key for debugging scripts.\n",
"* Across all reviewed examples (`apo_debug.py`, `azure`, `calc_x`, `spider`, `tinker`, `unsloth`), a common theme emerged: the requirement for external API keys (OpenAI, Azure, Tinker) and/or substantial hardware resources (e.g., high-memory GPUs) that are not readily available in a standard Google Colab environment.\n",
"\n",
"### Insights or Next Steps\n",
"\n",
"* To effectively demonstrate most Agent-Lightning examples, particularly those involving training or complex LLM interactions, access to valid API keys for services like OpenAI or Azure OpenAI, and/or powerful GPU hardware, is a critical prerequisite.\n",
"* For future demonstrations in environments with limited resources or without API keys, focus should be placed on examples that are explicitly designed to run locally with minimal dependencies or pre-trained models, or that offer mock API interfaces for testing purposes.\n"
]
}
]
}