@aurotripathy
Last active December 16, 2024 20:53
Furiosa RNGD Tool Calling Example
{
"cells": [
{
"attachments": {
"image.png": {
"image/png": ""
}
},
"cell_type": "markdown",
"metadata": {},
"source": [
"### Tool Calling with Llama 3.1 and Furiosa RNGD\n",
"This notebook highlights the tool-calling capabilities of the Llama 3.1 models executing on the \\\n",
"Furiosa RNGD LLM accelerator card. \\\n",
"Our goal is show the flow of a tool-calling system. \\\n",
"We use the Llama-3.2-8B-Instruct model.\n",
"\n",
"This notebook is inspired by the notebook at the link below \\\n",
"https://github.com/huggingface/huggingface-llama-recipes/blob/main/tool_calling/tool_calling.ipynb\n",
"\n",
"The tool calling capability is important to Enterprise AI. \\\n",
"We expect enterprise LLMs to compute functions, make database-dips,\\\n",
"read rows/cols in spreadsheets, draw charts & graphs, etc.\\\n",
"The sequence diagram below captures the flow of the code below.\n",
"\n",
"![image.png](attachment:image.png)"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/furiosa/.local/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
" from .autonotebook import tqdm as notebook_tqdm\n",
"2024-12-12 10:38:38,841\tINFO util.py:154 -- Missing packages: ['ipywidgets']. Run `pip install -U ipywidgets`, then restart the notebook server for rich notebook output.\n",
"Xformers is not installed correctly. If you want to use memory_efficient_attention to accelerate training use the following command to install Xformers\n",
"pip install xformers.\n"
]
}
],
"source": [
"from furiosa_llm import LLM, SamplingParams"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create a custom tool of your choice that Llama will invoke (if warrented)\n",
"We'll create a custom tool that adds two integer numbers.\\\n",
"You can create any custom tool, here we created on that adds two integers"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# our custom tool\n",
"def add_two_integers(x: int, y: int):\n",
" \"\"\"\n",
" Adds two integer numerals\n",
"\n",
" Args:\n",
" x: An integer\n",
" y: An integer\n",
" \"\"\"\n",
" return x + y"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Create the System Prompt and the Custom Tool spec \n",
"Below, we define the system_prompt.\n",
"We could use a chat template, we're doing it manually to illustrate the steps.\n",
"\n",
"The system prompt for tool calling is made up of two key components:\n",
"\n",
"`system instruction for tool calling`: This is the default system-level prompt. It describes the tool-calling functionality and outlines its workflow. \n",
"\n",
"`custom tool spec in JSON format`: This is the tool (function) the model can access (e.g., the `add_two_integers` function in our example).\n",
"These two parts are combined to form the complete system_prompt."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"*** Tool calling system prompt + custom tool spec: ***\n",
"<|start_header_id|>system<|end_header_id|>\n",
"\n",
"You are an expert in composing functions. You are given a question and a set of possible functions.\n",
"Based on the question, you will need to make one or more function/tool calls to achieve the purpose.\n",
"If none of the function can be used, point it out. If the given question lacks the parameters required by the function,\n",
"also point it out. You should only return the function call in tools call sections.\n",
"\n",
"If you decide to invoke any of the function(s), you MUST put it in the format of [func_name1(params_name1=params_value1, params_name2=params_value2...), func_name2(params)]\n",
"You SHOULD NOT include any other text in the response.\n",
"\n",
"Here is a list of functions in JSON format that you can invoke.[\n",
" {\n",
" \"name\": \"add_two_integers\",\n",
" \"description\": \"Adds two integer numerals\",\n",
" \"parameters\": {\n",
" \"type\": \"dict\",\n",
" \"required\": [\"x\", \"y\"],\n",
" \"properties\": {\n",
" \"x\": {\n",
" \"type\": \"integer\",\n",
" \"description\": \"An integer\"\n",
" },\n",
" \"y\": {\n",
" \"type\": \"integer\",\n",
" \"description\": \"An integer\"\n",
" },\n",
" }\n",
" }\n",
" }\n",
"]<|eot_id|><|start_header_id|>user<|end_header_id|>\n",
"\n",
"\n"
]
}
],
"source": [
"system_instruction_for_tool_calling = \"\"\"\\\n",
"<|start_header_id|>system<|end_header_id|>\n",
"\n",
"You are an expert in composing functions. You are given a question and a set of possible functions.\n",
"Based on the question, you will need to make one or more function/tool calls to achieve the purpose.\n",
"If none of the function can be used, point it out. If the given question lacks the parameters required by the function,\n",
"also point it out. You should only return the function call in tools call sections.\n",
"\n",
"If you decide to invoke any of the function(s), you MUST put it in the format of [func_name1(params_name1=params_value1, params_name2=params_value2...), func_name2(params)]\n",
"You SHOULD NOT include any other text in the response.\n",
"\n",
"Here is a list of functions in JSON format that you can invoke.\"\"\"\n",
"\n",
"custom_tool = \"\"\"\\\n",
"[\n",
" {\n",
" \"name\": \"add_two_integers\",\n",
" \"description\": \"Adds two integer numerals\",\n",
" \"parameters\": {\n",
" \"type\": \"dict\",\n",
" \"required\": [\"x\", \"y\"],\n",
" \"properties\": {\n",
" \"x\": {\n",
" \"type\": \"integer\",\n",
" \"description\": \"An integer\"\n",
" },\n",
" \"y\": {\n",
" \"type\": \"integer\",\n",
" \"description\": \"An integer\"\n",
" },\n",
" }\n",
" }\n",
" }\n",
"]\"\"\"\n",
"\n",
"\n",
"system_prompt = f\"{system_instruction_for_tool_calling}{custom_tool}<|eot_id|><|start_header_id|>user<|end_header_id|>\\n\\n\"\n",
"\n",
"\n",
"print(f'*** Tool calling system prompt + custom tool spec: ***\\n{system_prompt}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Add the user prompt\n",
"This is the two-number-addition question you'll pose to the LLM and expect it to invoke the tool calling capability"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"*** Prompt after adding the user prompt: ***\n",
"<|start_header_id|>system<|end_header_id|>\n",
"\n",
"You are an expert in composing functions. You are given a question and a set of possible functions.\n",
"Based on the question, you will need to make one or more function/tool calls to achieve the purpose.\n",
"If none of the function can be used, point it out. If the given question lacks the parameters required by the function,\n",
"also point it out. You should only return the function call in tools call sections.\n",
"\n",
"If you decide to invoke any of the function(s), you MUST put it in the format of [func_name1(params_name1=params_value1, params_name2=params_value2...), func_name2(params)]\n",
"You SHOULD NOT include any other text in the response.\n",
"\n",
"Here is a list of functions in JSON format that you can invoke.[\n",
" {\n",
" \"name\": \"add_two_integers\",\n",
" \"description\": \"Adds two integer numerals\",\n",
" \"parameters\": {\n",
" \"type\": \"dict\",\n",
" \"required\": [\"x\", \"y\"],\n",
" \"properties\": {\n",
" \"x\": {\n",
" \"type\": \"integer\",\n",
" \"description\": \"An integer\"\n",
" },\n",
" \"y\": {\n",
" \"type\": \"integer\",\n",
" \"description\": \"An integer\"\n",
" },\n",
" }\n",
" }\n",
" }\n",
"]<|eot_id|><|start_header_id|>user<|end_header_id|>\n",
"\n",
"What is the result of 12322 added to 1242453<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n",
"\n",
"\n"
]
}
],
"source": [
"user_prompt = \"What is the result of 12322 added to 1242453\"\n",
"\n",
"prompt = f\"{system_prompt}{user_prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\\n\\n\"\n",
"\n",
"print(f'*** Prompt after adding the user prompt: ***\\n{prompt}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Use Llama 3.1 to generate the completion based on the prompt\n",
"Now you're ready to execute step 1 generation\\\n",
"This step completes the prompt you just created \\\n",
"and should return the function call with the appropriate paramters filled in."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"INFO:2024-12-12 10:38:39 Prefill buckets: [Bucket(batch_size=1, attention_size=512), Bucket(batch_size=1, attention_size=1024)]\n",
"INFO:2024-12-12 10:38:39 Decode buckets: [Bucket(batch_size=64, attention_size=2048), Bucket(batch_size=128, attention_size=2048)]\n",
"INFO:2024-12-12 10:38:39 For some LLaMA V1 models, initializing the fast tokenizer may take a long time. To reduce the initialization time, consider using 'hf-internal-testing/llama-tokenizer' instead of the original tokenizer.\n",
"/home/furiosa/.local/lib/python3.10/site-packages/huggingface_hub/file_download.py:797: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.\n",
" warnings.warn(\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"***Response from step 1:\n",
"[add_two_integers(x=12322, y=1242453)]\n"
]
}
],
"source": [
"path = \"./Llama-3.1-8B-Instruct\"\n",
"llm = LLM.from_artifacts(path)\n",
"\n",
"# step 1 of 2\n",
"sampling_params = SamplingParams(min_tokens=10, top_p=0.3, top_k=100)\n",
"responses = llm.generate([prompt], sampling_params)\n",
"\n",
"\n",
"for response in responses:\n",
" print(f'***Response from step 1:\\n{response.outputs[0].text}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Execute the function call locally\n",
"Now that the LLM has given you he function call, execute it locally. \\\n",
"Note, the LLM has no way to execute it. "
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"*****To be used in step 2:\n",
"[add_two_integers(x=12322, y=1242453)]\n",
"Local function execution result:\n",
"1254775\n"
]
}
],
"source": [
"model_tool_call_response = responses[0].outputs[0].text\n",
"print(f'*****To be used in step 2:\\n{model_tool_call_response}')\n",
"\n",
"# function executed locally\n",
"tool_call_response = str(add_two_integers(x=12322, y=1242453))\n",
"print(f'Local function execution result:\\n{tool_call_response}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Assemble the entire prompt\n",
"Now that you executed the extracted function locally, you're ready to construct the entire prompt and elicit a completion"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"***Print the assembled from from step 2:\n",
"\n",
"<|start_header_id|>system<|end_header_id|>\n",
"\n",
"You are an expert in composing functions. You are given a question and a set of possible functions.\n",
"Based on the question, you will need to make one or more function/tool calls to achieve the purpose.\n",
"If none of the function can be used, point it out. If the given question lacks the parameters required by the function,\n",
"also point it out. You should only return the function call in tools call sections.\n",
"\n",
"If you decide to invoke any of the function(s), you MUST put it in the format of [func_name1(params_name1=params_value1, params_name2=params_value2...), func_name2(params)]\n",
"You SHOULD NOT include any other text in the response.\n",
"\n",
"Here is a list of functions in JSON format that you can invoke.[\n",
" {\n",
" \"name\": \"add_two_integers\",\n",
" \"description\": \"Adds two integer numerals\",\n",
" \"parameters\": {\n",
" \"type\": \"dict\",\n",
" \"required\": [\"x\", \"y\"],\n",
" \"properties\": {\n",
" \"x\": {\n",
" \"type\": \"integer\",\n",
" \"description\": \"An integer\"\n",
" },\n",
" \"y\": {\n",
" \"type\": \"integer\",\n",
" \"description\": \"An integer\"\n",
" },\n",
" }\n",
" }\n",
" }\n",
"]<|eot_id|><|start_header_id|>user<|end_header_id|>\n",
"\n",
"What is the result of 12322 added to 1242453<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n",
"\n",
"<|python_tag|>[add_two_integers(x=12322, y=1242453)]<|start_header_id|>ipython<|end_header_id|>\n",
"\n",
"\"1254775\"<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n",
"\n",
"\n"
]
}
],
"source": [
"# Step 2 of 2\n",
"# setting up the prompt for step 2\n",
"prompt = f\"{prompt}<|python_tag|>{model_tool_call_response}<|start_header_id|>ipython<|end_header_id|>\\n\\n\"\n",
"prompt = prompt + f'\"{tool_call_response}\"<|eot_id|><|start_header_id|>assistant<|end_header_id|>\\n\\n'\n",
"print(f'***Print the assembled from from step 2:\\n\\n{prompt}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Finally, have the LLm generate a cogent answer \n",
"From the prompt above generate a cogent answer"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Completing the step 2 prompt...\n",
"The result of 12322 added to 1242453 is 125477\n"
]
}
],
"source": [
"# Generation in step 2\n",
"print(f'Completing the step 2 prompt...')\n",
"responses = llm.generate([prompt], sampling_params)\n",
"for response in responses:\n",
" print(response.outputs[0].text)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that, in spite of LLM hallucinations, this answer is repeatable which is a requirement in Enterpise AI applications as they go about computing functions \\\n",
"making database-dips, extracting rows/columns/cells from spreadsheets, drawing charts and graphs, etc."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
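The notebook's local-execution step calls `add_two_integers` with arguments typed in by hand. Below is a minimal sketch of parsing the model's reply and dispatching it automatically, assuming the reply follows the `[func_name(param=value, ...)]` format requested in the system prompt. It uses Python's standard `ast` module. `TOOL_REGISTRY` and `run_tool_calls` are hypothetical names introduced for illustration and are not part of `furiosa_llm`.

```python
import ast


def add_two_integers(x: int, y: int):
    """Adds two integer numerals (same tool as in the notebook)."""
    return x + y


# Hypothetical registry: only functions listed here may be invoked.
TOOL_REGISTRY = {"add_two_integers": add_two_integers}


def run_tool_calls(model_text: str) -> list:
    """Parse '[f(a=1, b=2), ...]' and run each call against TOOL_REGISTRY."""
    tree = ast.parse(model_text.strip(), mode="eval")
    if not isinstance(tree.body, ast.List):
        raise ValueError("expected a list of function calls")
    results = []
    for node in tree.body.elts:
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
            raise ValueError("expected a plain function call")
        if node.args or any(kw.arg is None for kw in node.keywords):
            raise ValueError("only explicit keyword arguments are supported")
        func = TOOL_REGISTRY.get(node.func.id)
        if func is None:
            raise ValueError(f"unknown tool: {node.func.id}")
        # literal_eval accepts only literals, so no arbitrary code is evaluated.
        kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
        results.append(func(**kwargs))
    return results


print(run_tool_calls("[add_two_integers(x=12322, y=1242453)]"))
```

In the notebook, the string passed to `run_tool_calls` would be `model_tool_call_response`. Restricting calls to a registry and evaluating arguments with `ast.literal_eval` avoids running arbitrary model-generated code, which a bare `eval` would allow.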