bedwards/a.md

## a.md

      
    Raw
  

              a.md
            
          
    Introduction: Agenda and the "Agent" definition
[Music]
[Music]
Okay. Yeah, thanks for joining me. I uh I'm still on the West Coast time, so it feels like I'm doing this at like 7:00
a.m. Uh so yeah, but um glad to talk to you
about the Claude agent SDK. So um yeah,
I think like this is going to be like a rough agenda, but we're going to talk about we're going to talk about like what is the claud agent SDK? Why use it?
There's so many other agent frameworks. What is an agent? What is an agent framework? um how do you design an agent uh using
the agent SDK or or just in general? Um and then I'm going to do some like live
coding or Claude is going to do some live coding on prototyping an agent. Um and uh I've got some starter code. But
uh yeah, I I the whole goal of this is like know we got two hours. We're going
to be super collaborative, ask questions. Um, this is also going to be
not like a super canned demo in the sense that like we're going to be like thinking through things live. You know,
I'm not going to have all the answers right away. Um, and I think that'll be a good way of like building an agent loop
I think is like really very much like kind of an art or intuition. So, um, but
yeah, before we get started, just curious, a show of hands, like how many people have heard of the cloud agent SDK
or Okay, great. Cool. How many have like used it or tried it out? Okay, awesome.
Okay, so pretty good show of hands. Um, yeah, so I'll I'll just get started on
like the like, you know, overview on agents. I I think that like this is I I
I think something that people have seen before, but I think it still is taking some time to like really sink in. Uh how
AI features are evolving, you know? So I think like when GPT, you know, 3 came
out, it was really about like single LLM features, right? You're like, oh, like, hey, can you categorize this like return
a response in one of these categories? Um, and then we've got more like workflow like things, right? Hey, like
can you like take this email and label it or like, hey, here's my codebase like
index via rag. Can you give me like the next completion or the next um the next
file to edit, right? And so that's what we'd call like a workflow where you're very like structured. You're like, hey,
like given this code, give me code back out, right? And now we're getting to
agents, right? And uh like the canonical agent to use is cloud code, right? Cloud
code is a tool where you don't really tell it. We don't restrict what it can
do really, right? You're just talking to it in text and it will take a really wide variety of actions, right? And so
agents uh build their own context, like decide their own trajectories, are
working very very autonomously, right? And so, uh, yeah, and I think like as
the future goes on, like agents will get more and more autonomous. Um, and we,
uh, yeah, I think it's like we're kind of at a break point where we can start to build these agents. Um, they're not
perfect, you know, but it's definitely like the right time to get started. So,
um, yeah, Cloud Code, I'm sure many of you have have tried or used. Um it is
yeah I think the first true agent right like the first uh time where I saw an AI
working for like 10 20 30 minutes right so um yeah it's a coding agent and uh
the cloud agent SDK is actually built on top of cloud code and uh the reason we
did that is because um basically we found that when we were
building agents at anthropic we kept rebuilding the same parts over and over again. And
so to to give you a sense of like what that looks like, of course, they're the models to start, right? Um, and then in
the harness, you've got tools, right? And that's like sort of the first obvious step, like let's add some tools
to this harness. And later on, we'll give an example of sort of like trying
to build your own harness from scratch, too, and and what that looks like and and how challenging it can be. But tools
are not just like your own custom tools. might be tools to interact with your file system like with cloud code. Um did
the volume just go up or were they not holding it close enough? Okay. Now
anyways um got tools tools you run in a loop and then you have the prompts right like the core agent prompts the um the
prompts for the things like that. Uh and then finally you have the file system
right and or not finally but you have the file system. The file system is a way of context engineering that we'll
The "Harness" concept: Tools, Prompts, and Skills
talk more about later, right? And I think like I one of the key insights we had through cloud code was thinking a
lot more through the like context not just a prompt, it's also the tools, the files and scripts that it can use. Um,
and then there are skills which we've like rolled out recently and uh we can talk more about skills uh um if that's
interesting to you guys as well. Um and then yeah things like uh sub aents uh
web search you know like um like research compacting hooks memory there
all these like other things around the harness as well um and uh it ends up
being quite a lot. So the cloud agent SDK is all of these things packaged up for you to use right um and yeah you
have your application. So I I think like uh to give you a sense of uh yeah to
give you a sense of like maybe why the cloud agent SDK is um
yeah like like so yeah people are already building agents on the SDK a lot of software agents uh you know software
reliability security triaging bug finding um site and dashboard builders
if These are extremely popular. If you're using it, you should absolutely use the SDK. Um, I guess office agents, if
you're doing any sort of office work, tons of examples there. Um, got some like, you know, legal, finance,
healthcare ones. Um, so yeah, there are tons of people building on top of it.
Um, I want to Oh, yeah. Okay. So, why the cloud agent SDK, right? Like why did
we do it this way? It's why did we build it on top of cloud code? And we realized
basically that as soon as we put cloud code out, yeah, the engineers started using it, but then the finance people
started using it and the data science people started using it and the marketing people started using it and
yeah, I think it just like it we just realized that people were using cloud code for non-coding tasks and we felt
and and as we were building, you know, non-coding agents, we kept coming back to it, right? And so, um, it's a like,
and we'll go more into why that just works, why we you could use cloud code
for non-coding task. Uh, spoiler alert, it's like the bash tool. Um, but yeah,
it's uh it it was something that we saw as an emergent pattern that we want to use and we've built our agents on top of
it, right? And uh these are lessons that we've learned from deploying cloud code that we've sort of baked in. So, uh,
tool use errors or compacting or things like that, stuff that is like very can
take a lot of scale to find, you know, like what are the best practices we've sort of baked into the cloud agent SDK.
Um, as a result, we have a lot of strong opinions on the best way to build agents. Uh, like I think the cloud agent
SDK is quite opinionated. I'll talk over some of these opinions and and why like uh why we chose them, right? Um but
yeah, one of the big opinions of the bash tool is the most powerful agent tool. So okay, um what what are like
what I would describe as the anthropic way to build agents, right? And I'm not I'm not saying that you can only build agents using the API this way, right?
But this is like um if you're using our opinionated stack on the agent SDK, what
is it? Right? So roughly Unix primitives like the bash and file system and you
know we're going to go over like prototyping an agent using cloud code and my goal is really to sort of show
you what that looks like in real time right like why is batch useful why is the file system useful why not just use
tools um yeah agents uh I mean you can also make workflows and we'll talk about
that a bit later but agents build their own context um thinking about code generation for non-coding
um like we use codegen to generate docs, query the web, like do data analysis,
take uh unstructured action. So um there's a lot of like uh this can be
pretty counterintuitive to some people and again in the like prototyping session, we'll we'll go over how to use
code generation for non-coding agents. Um and yeah, every agent has a container
or is hosted locally because this is cloud code. uh it needs a file system, it needs bash, it needs to be able to
operate on it. And so it's a very very different architecture. I'm not planning to talk too much about the architecture
today, but we can at the end if that's what people are interested in in or sorry by architecture I mean hosting
architecture like how do you host an agent and like uh what are best practices there? Have you talked about
that at the end? Um yeah so well let me pause there because I feel
like I covered a lot already. any questions so far on the agent SDK agents
Live Coding Setup: Initializing the Agent class and environment
um yeah like what you get from it can you can you explain what code
generation for non-coding means exactly yeah um this is um like basically when
you ask cloud code to do a task right like let's say that you ask it to uh
find the weather in San Francisco and like you know tell me what I should wear
or something right? Like uh what it might do is it might start writing a
script uh to fetch a weather API, right? And then start like maybe it wants it to
be reusable. Like maybe you want to do this pretty often, right? So it might fetch the weather API and then get the
like maybe even get your location dynamically right based on your IP address and then it will like um you
know check the weather and then maybe like call out to like a sub agent to give you recommendations. Maybe there's
an API for your closet or wardrobe, right? It's like so that's an example. I
I think that like it's kind of um for any single example we can talk over how you might use code codegen. Uh a lot of
it is like composing APIs is like the high level way to think about it. Yeah.
Uh yeah. And yeah uh workflow versus agent uh like for repetitive task or you know like a
process a business process that is always the same. Do you will still prefer to build an agent versus a fully
deterministic workflow? Yeah. So, we do have
Oh, sure. Yeah. Yeah. Um, so the question the question was about workflows versus agents and would you still use the cloud agent SDK for
workflows? Is that right? Um, yes. And and so uh I mean we I just we just sort
of tell you what we do internally basically and what we do internally is we've done a lot of like GitHub automations and Slack automations built
on the cloud agent SDK. So, uh, you know, we have a bot that triages issues when it comes in. That's a pretty
workflow like thing, but we've still found that, you know, in order to triage issues, we want it to be able to clone
the codebase and sometimes spin up a Docker container and test it and things like that. And so, it's still ends up
being like a very like there's a lot of steps in the middle that need to be quite free flowing. Um, and then you
like give structured output at the end. So, um, yes. All right, we'll take one
more question and then we'll keep going. So, yeah, in the blue. Yeah. Uh so could you talk about security and guardians like if if you know you're using cloud
agent SDK and you know you lean towards using bash as the you know all powerful
generic tool and is the onus on uh building the agent builder to make sure
that you know you're preventing against like common attack vectors or is that something that the model is is is doing
itself? Yeah. So I I think this is sort of like the Swiss chief. Oh yeah. Okay. So the question was uh permissions on the bash
tool, right? Or like how do you think about permissions and guardrails the like in like when you're giving the
agent this much power over you know your its environment and the computer, how do you make sure it's aligned, right? And
so the way we think about this is uh what we call like the Swiss cheese defense, right? So like there is um like
on every layer some defenses and together we hope that it like blocks everything, right? So obviously on the
model layer uh we do a lot of um alignment there. We actually just put
out a really good paper on reward hacking. Super recommend you check that out. Um so like definitely I think cloud
models like we try and make them very very aligned, right? And uh so yeah there's the model alignment behavior
then there is like the harness itself, right? And so we have a lot of like permissioning and prompting um and uh
like we do a pass par parser on the bash tool for example so we know um fairly
reliably like what the bash tool is actually doing and definitely not something you want to build yourself. Um
and then finally the last layer is sandboxing right so like let's say that an someone has
maliciously taken over your agent what can it actually do uh we've included a
sandbox and like where you can sandbox network request um and sandbox uh file
system operations outside of the file system. And so, uh, yeah, ultimately
that's what they call like the lethal triacto, right? Is like, um, like the ability to like execute code in an
environment, change a file system, um, excfiltrate the code, right? I think I'm getting the lethal trifecta a little bit
wrong there, but like the idea is basically like if they can excfiltrate your like information back out, right?
Um, that's like they still need to be able to extract information. And so if you sandbox the network, that's a good
way of doing it. Um if you're hosting on a sandbox container like Cloudflare uh
modal or you know E2B Daytona like all of these like sound sandbox providers
they've also done like some level level of security there right it's like you're not hosting it on your personal computer
um or on a computer with like your prod secrets or something so uh yeah lots of different layers there and and yeah we
can talk more about hosting in depth um so okay so I'm going to uh talk a little
bit about bash is all you need you Um, I think this is something that Oh,
yeah. Um, this is like my stickick, you know? I'm just going to like keep talking about this until everyone like
uh agrees with me. Um, or like I think this is something that we found atanthropic. I think it is sort of
implementing the "Think" step: Getting the model to reason before acting
something I discovered once I got here. Um, bash is what makes code so good,
right? So, I think like you guys have probably seen like code mode or
programmatic tool use, right? like the uh different ways of like composing MLPS
uh cloudfl put out some blog post on that we put out some blog posts uh the way I think about code mode is like or
bash is that it was like the first code mode right so the bash tool allows you
to you know like store the results of your tool calls to files uh store memory dynamically generate scripts and call
them compose functionality like tail graph uh it lets you use existing
software like fmp or libra office right so there's a lot of like interesting things and powerful things that the
batch tool can do. And like think about like again what made cloud code so good. If you were designing an agent harness,
maybe what you would do is you'd have a search tool and a lint tool and an execute tool, right? And like you have
end tools, right? Like every time you thought of like a new use case, you're like, I need to have another tool now, right? Um instead now cloud just uses
grap, right? And nodes your package manager. So it runs like npm run like
test.ts or index.ts s or whatever, right? Like it can lint, right? And it can find out how you lint, right? And
can run npm run lint if if you don't have a llinter. It can be like what if I install eslint for you, right? So, um
this is like you know like I said the first programmatic tool calling first code mode, right? Like you can do a lot
of different actions very very generically, right? Um and so to talk
about this a little bit in the context of non-coding agents, right? So let's say that we have an email agent and the
user is like okay how much did I spend on ride sharing this week um a you know
like it's got one tool call or generally it's got the ability to search your inbox right and so it can run a query
like hey search Uber oryft right and
without bash it it searches Uber oryft it gets like a hundred emails or
something and now it's just got to think about it. You know what I mean? And I I think like a good like analogy
is sort of like imagine if someone came to you with like like a stack of papers and like hey, how much did I spend on
ride sharing this week? Can you like read through my emails? You know, I mean like that that would be really hard, right? Like uh you need very very good
precision and recall to do it. Um or with bash, right? Like let's say there's
a Gmail search script, right? It takes in a query function. Um, and then you
can start to save that query function to a file or pipe it. You can GP for
prices. You know, you can uh then add them together. You can check your work too, right? Like you can say, okay, let
me grab all my prices, store those as like in a file with line numbers and
then let me then be able to check afterwards like uh was this actually a price? Like what does each one correlate
to? Right? So there's a lot more like dynamic information you can do to check your work with the bash tool. So this is
like um just a simple example but like hopefully showing you sort of the power
of like the composability of bash right so I'll pause there any questions on
bash is all you need the bash tool any any thing I can make a little bit clearer do you have stats on how many people use
yolo mode uh stats on yolo mode we probably do
um I mean internally we we don't uh but that's just I think we just have a higher security posture. Um,
yeah, I'm not sure. Uh, I can probably pull that. Any other questions on bash?
Okay, cool. Um, yeah, just to give you like some more examples like let's say
that you had an email API and you wanted to uh, you know, like go through like
fetch my like tell me who emailed me this week, right? So, you've got two APIs. You've got an inbox API and a contact API. Um this is like a way you
can do it via bash. You can also do it via codegen. This is kind of like enough bash that it is codegen, right? Like um
bash is a ostensibly codegen tool. Um and then yeah like let's say that you
wanted to you had a video meeting agent, right? You wanted to say like find all the moments where the speaker says
quarterly results in this earnings call, right? You can use ffmpeg to like slice up this video, right? um you can use jq
to like uh start analyzing the information afterward. So um yeah, lots of like def like powerful ways to use uh
to use bash. So I'm going to talk a little bit about workflows and agents. Yeah, you can do
both. You could use uh build workflows and agents on the agent SDK. Um yeah,
agents are like cloud code. If if you are like building something where you want to talk to it in natural language
and take action flexibly, right? Then that's why you're building an agent, right? Like you want you have an agent
that talks to your like business data and you want to get insights or dashboards or answer questions or uh
write code or something like that's an agent, right? And then a workflow is kind of like, you know, we do a lot of
GitHub actions for example, right? So you define the inputs and outputs very closely, right? So you're like, "Okay, take it a PR and give me a code review."
Um, and yeah, both of these you can use agent SDK for. Um, when building workflows, you can use structured
outputs. We just released this. Um, you can, yeah, Google agent SDK structured
outputs. Um, but yeah, so you can do both. I'm going to primarily be talking
about agents right now. A lot of the things that you can like learn from this
are applicable to workflows as well. So, um, yeah, we'll we'll talk about this.
Uh, wait, show of hands. How many people have like designed an agent loop before?
Okay, cool. Okay, great. Great. Um, so yeah, I mean, I think the number one
thing the the metalarning for designing an agent loop to me is just to read the transcripts over and over again. Like
every time you see see the agent running, just read it and figure out like, hey, what is it doing? Why is it
doing this? can I uh help it out somehow? Right? Um and uh we'll do some
of that later, right? So we'll uh we'll build an agent loop. Um but here is the
uh the three parts to an agent loop, right? So uh first it's gather context,
right? Second is taking action and the third is verifying the work, right? And
uh this is like not the only way to build an agent, but I think a pretty good way to think about it. Um gathering
context is uh like you know for cloud code it's grepping and finding the files
needed, right? Um you know for an email agent it's like finding the relevant emails, right? Um, and so these are all
like pretty um, yeah, like I I think thinking about how it finds this context is very important and I think a lot of
people sort of uh, skip the step or like underthink it. This can be like very
very important. Uh, and then taking action um, how does it like do its work?
Uh, does it have the right tools to do it like code generation, uh, bash these are more flexible ways of taking action,
right? And then verification is another really important step. And so the
basically what I'd say right now is like if you're thinking of building an agent, think about like can you verify its
work, right? And if you can verify its work, it's like a great like candidate for an agent. If you can't verify its
work, like it's like you know coding you can verify by lending, right? And you can at least make sure it compiles. So
that's great. uh if you're doing let's say deep research for example it's actually a lot harder to verify your
work one way you can do it is by citing sources right so that's like a step in verification but obviously research is
less verifiable than code in some ways right because like code has a compile step right you can also like execute it
then see what it does right so um I think like thinking on you know like as
we build agents the ones that are closest to being very general are the ones with the verification step that is
very strong right So I I think there was a question here. Yeah. So when where do you generate a plan of
the work? Yeah. I mean you you might
question Oh yeah sorry the the question was when do you generate a plan um before you run
through it. So um like in cloud code you don't always generate a plan. Uh but if
you want to you'd insert it between the gathering context and taking action step, right? And so um plans sort of
help the agent think through step by step, but they add some latency, right? And so there is like some trade-off
there. Um but yeah, the agent SDK helps you like do some planning as well. So yeah.
Yeah. Can you like make the agent create that to-do list for like 100%
sure that it will create that to-do list and run by it? Uh yeah. So the question was will the
agent create the to-do list? Uh yes. Um if you're using the agent SDK, we have
The Agent Loop: connecting act, observe, and loop
like some to-do tools that come with it and so it will like maintain and check off to-dos and you can display that as
you go. So yep. Um, any other questions about this right
now? Okay, cool. Okay, so I'm going to quickly talk about like like how do you
do this stuff? You like what are your tools for doing it, right? And uh there
are three things you can do that you have tools, bash and code generation, right? And I I think traditionally I
think a lot of people are only thinking about tools and uh yeah, basically one of the call to actions is just figuring
out like thinking about it more broadly, right? So tools are extremely structured and very very reliable, right? Like if
you want to sort of have as fast an output as possible with minimal errors, uh minimal retries, uh tools are great.
Uh cons, they're high context usage. If anyone's built an agent with like 50 or
100 tools, right? Like they take up a lot of context and the model it kind of gets a little bit confused, right? Um
there's no like sort of discoverability of the tools. Um and they're not composable, right? and and I say tools
in the sense of like if you're using you know messages or completion API right now um that's how the tools work of
course like you know there's like code mode and programmatic tool calling so you can sort of blend some of these um
but there's bash so bash is very composable right like uh static scripts
low context usage uh it can take a little bit more discovery time because like let's say that you have whatever
you have like the playright MCP or something like that um or sorry the playright CLI the playright like bash
tool um you can do playright-help to figure out all the things you can do but the agent needs to do that every time
right so it needs to like discover what it can do um which is kind of powerful that it helps take away some of the high
context usage but add some latency um there might be slightly lower call rates
you know just because like it has a little bit more time to um it needs to
like find the tools and what it can do. Um but this will definitely like improve as it goes. And then finally, codegen
highly composable dynamic scripts. Um they take the longest to execute, right?
So they need linking possibly compilation. API design becomes like a very very interesting step here, right?
And I and I'll talk more about like uh best like how to think about API design in an agent. Um but yeah I think this is
like how we like the the three tools you have and so yeah using tools think you
still want some tools but you want to think about them as atomic actions your agent usually needs to execute in
sequence and you need a lot of control over right so for example in cloud code we don't use bash to write a file we
have a write file tool right because we want the user to be able to sort of see the output and approve it and um we're
not really composing write file with other things, right? It's like very atomic action. Um, sending an email is
another example. Like any sort of like non-destruct like destructible or sort of like you know uh unreversible change
is definitely like a a tool is a good place for that. Um then we've got bash.
Uh so for example there are like uh composable actions like searching a folder using GitHub linting code and
checking for errors or memory. Um and so yeah you can write files to memory and
that can be your bash like bash can be your memory system for example right so um and then finally you've got code
generation right so if you're trying to do this like highly dynamic very flexible logic composing APIs uh like
you're doing data analysis or deep research or like reusing patterns and so
um yeah we'll talk more about uh code generation in a bit um any questions so far about like the
SDK loop loop or tools versus bash versus codegen. Yeah. Yeah. Uh I was going to ask you are you
going to have any readymade tools for like offloading results
offloading tool called results like into the file system or like let's say goes to bash and then context explodes.
Does it like typed a command that like do everything up? Okay. Or or otherwise just like long outputs
polluting your history. Sure. Yeah. Yeah. Yeah. I imagine like all the time just uploading them to files.
Yeah. Yeah. I I think that's a good common practice. I think um we
I I remember seeing some PRs about this very recently on on cloud code about
handling very long outputs and I I I don't know exactly like I I think I
think we are moving towards a place where more and more things are being like just stored in the file system and this is like a good example. Yeah, like
it's storing like long outputs uh over time. Um, I think like generally
prompting the agent to do this is a good uh way to think about it. Or even if you have I think like something I just do
always now is like whenever I have a tool call I um I save it like the
results of the tool call to the file system so that you can like search across it and then have the tool call return the path of the result. Um just
because like that helps it like sort of recheck its work. So um yes. Um, do you
find that you need to use like the skills um kind of structure to help
claude along to use the bash better or out of the box? You know, that's not
necessary. Yeah. So, the question was about skills and like do we need skills to use bash better? Um, yeah, for context skills
maybe I can Okay, skills. Okay. Yeah, skills are
basically a way of like uh you know allowing our agent to take longer
complex tasks and like sort of load in things via context, right? So some like
for example we have uh a bunch of DOCX skills and these DOCX skills tell it how to do code generation to generate these
files, right? And so um yeah, I think overall skills are yeah, basically just
a collection of files. They're also sort of like an example of being very like file system or bash tool pilled, right?
Um because they're really just folders that your agent can like CD into and
like read, right? Um and so yeah, they give like what we found the skills are
really good for is pretty like repeatable instructions that need a lot of expertise in them. Uh like for
example, we released a front-end design skill recently that I really really like and um it's really just sort of a very
detailed and good prompt on how to do front-end design. Uh but it comes from like our best, you know, like uh AI
front-end engineer, you know what I mean? And he like really put a lot of top thought and iteration to it. So that's one way of using skills. Um
yeah, quick question. Why use that front skill? Sure. It's pretty good. Thanks for
publishing it. Uh I want to understand uh there are multiple MP files like MP
is also there and it is also at the user level and then there are skill files like is
there like a priority order should some stuff be relegated to claw.md and some
other stuff should only come to skill.md? H so the question was about skill.md versus claw.md and how to think
about uh that right and uh I think like I I will say all of these concepts are
so new you know I mean like even cloud code is like released it like eight or nine months ago right like um and so
Tool Execution: Handling XML parsing and tool inputs
skills were released like two weeks ago like I like I won't pretend to know all of the best practices for for everything
right um I think generally skills are a form of progressive context
disclosure closure and that's sort of a pattern that we've talked about a bunch right like with like uh bash and you
know like preferring that over like you know purely like normal tool calls is
like it's a way of like the agent being like okay I need to do this let me find out how to do this and then let me read
in this skill empty right so you ask it to make a docx file and then it like cds
into the directory reads how to do it writes some scripts and keeps going so um yeah I think like there's still some
intuition to build around like what what exactly you like define as a skill and how you split it out. Um but uh yeah, I
think uh yeah, lots of best practices to learn there still. Um
yeah, so yesterday we talked about the future of skills
over time. Do you see these as ultimately becoming part of the model and some of the skills
this is just a way to bridge the gap for now? Yeah. Yeah. So the question was are skills ultimately part of the model? Um
are they a way to bridge the gap? I missed Barry's talk at Barry and M's talk yesterday, but uh yeah, I think
roughly the idea is that the model will get better and better at doing a wide variety of tasks and skills are the best
way to give it out of distribution tasks, right? Um, but I I would broadly
say that like it's really really hard especially like you know if you're like
uh not at a lab to like tell where the models are going exactly. Um my general
rule of thumb is like I try and like rethink or rewrite my like agent code like every 6 months. Uh just cuz I'm
like uh things have probably changed enough that I've like baked in some assumptions here. And so like I think
that like our agent SDK is built to as much as possible sort of advance with capabilities, right? Like the bash tool
will get better and better. Uh we're building it on top of cloud code. So as cloud code evolves, you'll get those
wins out out of the gate. Um but at the same time like you know things are so
different now like than they were a year ago in in terms of like AI engineering, right? And I think like a general best
practice to me is sort of like, hey, we can write code 10 times faster. We should throw out code 10 times faster as
well. Um, and I think thinking about like not so like hedging your bets on
like where is the future right now, but like what can we do today that really works, right? And like like let's get
market share today and not be afraid to throw out code later. Um, if you're a
startup, this is arguably your largest advantage that you have over competitors. They're like, you know, larger companies have like six-month
incubation cycles. And so they're always like stuck in the past of like the agent
capabilities, right? And so your advantage is that you can like be like, hey, the agent the capabilities are here
right now. Let me build something that uses this right now, right? So, um,
yeah. Uh any any other questions on for we're
talking about skills in bash. Okay. It seems like there are a lot of skill questions. So um yeah uh I I think at
the back someone you might have to shout. Yeah. So why would you use a skill versus an API? They look very similar to
that Python program there could be a package, right? Yeah. The question was why use a skill
versus an API? Um, good question. I I think that like um when you like these
are all forms of progressive disclosure basically to the agent to figure out what it needs to do. Um, and I'll go
over like uh examples of like you just have an API, right? In in our like in
our prototyping session. Um, it's totally like use case dependent, right?
Like just I think like I don't have a like I don't think there's a general rule. I think it's like read the
transcript and see what your agent wants. If your agent always wants like thinks about the API better as like a
API.ts file or something or API.py file, do that. You know, that's great. Like I think skills are like like sort of an
introduction into like thinking about the file system as a way of storing context, right? And they're a great
abstraction. Um, but there are many ways to use the system. Um, and I I should
say that like something about skills that like you need the bash tool, you need a virtual file system, things like that. So the agent SDK is like basically
the only way to really use skills to like their full extent right now. So um
yeah. Yeah. Back there. Can we expect a marketplace for skills? Yeah. The question was can we expect a
marketplace for skills? So um yeah, clock code has a plug-in marketplace
that you can also use with the agent SDK. Uh we're evolving that over time, you know, like it was like a very much a
v0ero. Um, and by marketplace, I'm not sure if people will be charging for this exactly. It's more just like a discovery
system, I think. Um, but yeah, that exists right now. You can do SL plugins in cloud code. Um, and and you can find
some. So, yeah. Yep. What's your current thinking about when you're going to reach for like the SDK,
you know, to solve a problem? When? Yes. The question is when do I use the SDK to solve a problem? uh if I'm
building an agent basically I I think that like um my overall belief is that
like for any agent the bash tool gives you so much power and flexibility and using the file system gives you so much
power and flexibility that you can always ek out performance gains over it
right and so uh yeah in the prototyping part of this talk we're going to like look at an example with only tools and
an example without with you bash and the file system and compare those two. Um, and yeah, that's what I mean by being
bashful to build. I'm like I I just like start from the agent SDK, you know, and I think a lot of people at Enthropic
have started like doing that as well. So, um, of course I I do want to say that there are lots of times where the
agent SDK is kind of annoying because you've got like this network sandbox container and you're like, I hate like I
don't want to do this, you know? I mean, like I want to run on my browser locally, right? Um, I totally get that.
And I think it's there is like a real performance trade-off. Um the way I think about it is sort of like React
versus like jQuery, you know, like I like I when I was coming up, I was like very into webdev and like you know I was
using jQuery and Backbone and then React came out and it was by Facebook and they're like you have to here's JSX like
we just made this up and and now there's a bundler, right? I'm like it's so annoying. Um, but like they generally
makes the model or it makes it made web apps more powerful, right? And I think
we're sort of like the agent SDKs are like the react of agent frameworks to me because it's like we build our own stuff
on top of it. So, you know, it's real and all the annoying parts of it are just like things where we're annoyed
about it too, but we're like it just works like you have like got to do this, you know? Um, so yeah.
Uh, yeah. Okay. more more skill questions I guess. Yeah. Right here. Uh what's the question? Bash.
Oh, sure. Bash question. Great. I love bash. Custom internal like bash tools. Yeah. How do you let the agent discover that
or do those have to become tool tools? Okay. The question is if you have custom
agent bash tools, how do you let the agent discover that? By custom bash tools, you mean like bash scripts? We have we have bash scripts. Yeah.
Um yeah. So I I think uh where is it? you just put it in the file system and
you tell it like hey like here is a script. Uh you can call it you know I I'm generally thinking in the context of
the cloud agent SDK where it has the file system and the bash tools are tied together. This is kind of an anti
pattern I see sometimes where people are like, "Oh, like we're going to host the bash tool in this like virtualized place
and it's not going to interact with other parts of like the agent loop, you know, and that sort of, you know, makes
it hard cuz if if you got a tool result that's saving a file, then your bash tool can't like uh read it, you know, I
mean, unless it's all in one one container." So, does that answer your question? Like
Yeah, kind of. I mean, like, so you're just saying you just put it in like system prompt or something? Yeah, I just put in system prompting like hey you
have access to this. I would like sort of design all my CLI scripts to have like a d-help or something so that the
model can call that and then it can like progressively disclose like every like subcomand inside of the script. Yeah.
Uh yeah m yeah so uh like my question is around when to reach for the agent SDK. So have
The "Bash" Tool: Giving the agent command line access
you designed or rather would you recommend someone use the agent SDK to build like a generic chat agent as
compared to like oh you know I'm building an agent where you have some input and the agent goes and does some
stuff and finally I care about the output as compared to let's say someone like are you using or do you foresee
using the agent to build like the agent SDK to build like clot the the app
rather than clot code. Uh yeah. So the question is when do we reach for the agent SDK uh does um like uh like would
we use the agent SDK to build cloud.AI which is a more traditional chatbot uh
than cloud code. Um I one I think cloud code is like a very like like interface
is not a traditional chatbot interface but like the inputs and outputs are right like you input code in you you get
like or you input text in you get text out and you they take actions along the way um you might have seen that like
when we rolled out doc creation for cloud.ai AI. Um, now it has the ability
to spin up a file system and like create spreadsheets and PowerPoint files and
things like that by generating code. And so that is like you know we're in the midst of sort of like um like merging
our agent loops and stuff like that. But but broadly like uh like yeah cloud.ai
will like is getting more and more like you see it with skills and the memory tool and stuff more and more file system
pills, right? So, uh, we do think this like a broad thing that you can use just just generally and happy to talk through
examples. Um, yeah, one more question then we'll keep going. Yeah. Still trying to understand the rule of
thumb on when to build a tool or use a tool, when to wrap something with a script or just let
the agent go wild on the bash because I I'll give you an example. Let's say I
need to access a database from time to time. I can use an MCP. I
can wrap it in a script and I can just let the agent call an endpoint from B
directly from bash, right? Yeah, great question. Great question. So, it still trying to gro like when to
use tools versus bash versus codegen and you gave an example like okay, I have a database. Um, I want the agent to be
able to access it in some way. What should I do? Should I create a tool that queries the database in some way? Um,
should I use the bash? Should I use codegen? Right? These are all these are three ways of doing it. Um I think that
they are like you could use any of them and I I think like part of it is like I I think unfortunately there's no like
single best practice, right? This is like kind of a system design problem. But let's say that you want to access your bash your database via tool. You
would do that if your database was very very structured and you had to be very careful about like I don't know you're
accessing like user sensitive information or something like that and you're like hey I I can only take in
this input and I need to like give this output and I have to mask everything else about the database from the agent
right obviously that like sort of limits what the agent can do right like it can't write a very dynamic query right
um if you're writing a full-on SQL query, I would definitely use bash or cogen uh just because when the model is
writing a SQL query, it can make mistakes and the way it fixes it is is its mistakes is by like linting or like
running the file, looking at the output, seeing if there are errors and then iterating on it, right? Um, and so I
generally like if I'm building an agent today, I'm giving it as much access to
my database as possible and then I'm like putting in guard rails, right? Like I'm probably limiting its like right
access in different ways. But what I probably what I would do is like I would
give it right access and put in specific rules and then give it feedback if it
tries to do something it can't do. You know what I mean? And so I know this is like kind of a hard problem, but I think this is the like set of problems for us
to solve, right? Like we built a bash tool parser. Um, and that's a super
annoying problem. Uh, but we need to solve that in order to like let the agent work generally, right? And same
thing with like database like like yes, it's quite hard to understand what is a query doing, but if you can solve that,
you can let your agent work more generally over time. So um yeah I think thinking about it uh like flexibly as
much as possible and keeping tools to be like very very like sort of atomic actions right that you need a lot of
guarantees around. Um a follow on the same thing right uh how
do you ensure the role based access controls are taken care of
how do you uh so the question is how do you ensure that the role based act uh access controls are taken care of
usually that's in like how you provision your API key or your backend service or something like that right like um I
think that like probably what I do is like I create like temporary API keys sometimes people create proxies in
between to insert the API keys um if you're concerned about excfiltration of that. Um but yeah, I
would create like API keys for your agents that are scoped in certain ways and so then on the back end you can sort
of check it's like you know what it's trying to do and like uh if it's a an agent you can like give it different
feedback. So yeah. All right. I have one more question. Um anything you could tell us uh more
about the the memory tool the internal memory tool? Um,
I have I I'm not trying to like keep a secret. I I don't know exactly like I haven't read the code, but I I think it
generally works on on the file system. And so, um, you expose it to uh to the uh SDK or is
it already available? Um, I would say that like we we've had this question a bunch. I would just use
the file system on in the cloud agent SDK. I would just create like a memories folder or something and tell it to write memories there. Um it's like I I don't
know the exact implementation of the memory tool but it does use the file system in in in that way. So yeah
um all right yeah last question on this. Yeah yeah how you are manage for the b and the code how you are managing the like
reusability suppose the same agent is rolled out to hundreds of users and uh
same code every time it is generating and every time it is executing. So how can we use the reusability? Yeah, that's
a really good question. So, uh yeah, let's say you have two agents interacting with two different people.
The question is like how do you think about reusability between agents or how do agents communicate, right? Um I think
uh this is a thing to be discovered. I think like but I think there's a lot of best practices and system design to be
done on like um because traditionally with web apps you're serving one app to
like a million people right and with agents like with cloud code we serve like you know a onetoone like container
when you use cloud code on the web it's like it's your container right and so there's not a lot of like communication
between containers it's a very very different paradigm I'm not going to say that like I know exactly the best system
design to do that right and like I think there's a lots of best practices on like okay these agents are reusing work um
Safety & Permissions: "ReadOnly" vs "ReadWrite" file access
how can we give them like like general scripts that combine together the work that they've done how can we
make them share it um I would generally think this is sort of like a tangent but
on like agent communication frameworks I would say that like we probably don't
need like a whole we don't I I think this is more of a personal opinion I think like we probably don't need to
reinvent and uh like a new communication system. There are like the agents are good at using the things that we have
like HTTP requests and hash tools and API keys and uh named pipes and all of
these things. And so like probably like the agents are just making HTTP requests back and forth from each other, you
know, using HTTP server. Um there's a bunch of interesting work there. I've seen people make like a virtual forum
for their agents to communicate and they like post topics and like reply and
stuff like that. Um kind of cool. I think there's a lot of things to explore and discover there. Yeah. Okay. Um going
to keep going a little bit. How are we doing for time? Okay. It's got an hour left, I think. Okay. Um
cool. So, an example of designing an agent. Uh this is a like yeah let's this
is not the prototyping session but I think this is like will be a good sort of like like lee way into it. Let's say
we're making a spreadsheet agent. Uh what is the best way to search a spreadsheet? What's the best way to
execute code like or what's the best way to take action in a spreadsheet? What is the best way to link a spreadsheet?
Right? These are all like really interesting things to do. Uh I'm going to do like a Figma and we can go over it. Um, if someone could grab a water as
well, that'd be great. I like could really use water. I'm uh Yeah. Yeah. Okay. Um, thanks. Uh,
okay. So, we're going to um Yeah, let's let's talk through it. Uh,
or why don't you spend like a couple minutes yourselves thinking about this question? You have a spreadsheet agent.
You want it to be able to search. You want to be able to like gather context, take action, verify its work. How would
you think about it? Right? So like just spend some time thinking through that. Take some notes or something.
Okay. Is everyone get had a little bit of time to think about this? Does anyone want more time or want to just dive into
it? Okay. Uh, what's the best way for an agent to search a spreadsheet? Realizing
I have to type with one hand now. Um, I should figure this out because I'm
going to type later. Okay. Um, the Okay, searching a spreadsheet. Uh, any any
ideas how do you search a spreadsheet? Like what would you do? CSV.
Okay. You've got a CSV. Okay. And now like your agent wants to like search the CSV. What what does it do?
A GP. Okay. Uh what does the GP look like? Needs to look at all the headers.
Looks at the headers. Okay. Headers of all sheets. Okay. Great. Yeah. Yeah. And let's say
I'm looking for the revenue in 2024 or something. Um
now I've got my headers like uh I'm just going to pull up a spreadsheet, right?
Um let's say that the revenue is in there's a revenue column and then
there's like a uh so yeah let's see
okay so yeah let's say it's something like this right like um How do I get revenue in 2026? Right? So, this is sort
of like a tabular problem, right? Like there is revenue here and there's also 2026 here, right? So, it's like a
multi-dimensional step, right? We could look at the headers that will then give us uh like if you just pull this, you'll
get 100, 200, 300, right? So, we need a little bit more. Uh any other ideas?
Yeah, there's a bash tool for it. Uh, a AWK, I think. O. Okay. Yeah. Yeah. Yeah. And what
would it A for? Well, depends on what you what you're looking for. Yeah. Yeah. Yeah. That that's a
question, right? Like what is the user looking for, right? They're probably looking for something like this, like revenue in 2026, right? Um,
maybe use the APIs to use the Google tools to add all the numbers together or
V look up something like this, right? Yeah. So the idea is like use the APIs like use the Google APIs to like look it
up. Um that's great. Uh but yeah, let's say we're working locally. We need to sort of design these APIs. Yeah.
SQLite ord CSV directly and work. Oh, interesting. Okay. Yeah, I didn't
know that. That's great. So yeah, you you use SQLite to query a CSV. Um that's
a great like sort of creative way of thinking about API interfaces, right? like um if you can translate something
into a interface that the agent knows very well that's great right and so like
if you have a data source if you can convert it into a SQL query then your agent really knows how to search SQL
right so thinking about this transformation step is really really interesting it's a great way of like designing like an agentic search
interface so um yeah over there sorry real quick while we're talking about tools because you can use TSV for
some of the stuff as well um is there any good ranking within the with Is Cloud smart enough to start ranking the right tool for the right
job? Because that's kind of what we're talking about here is right tool for the right job. Yeah. Is Cloud smart enough to write rank the right cool tool for the right
job? Uh yeah, if you prompt it, you know, like or like I I think that's one of those things where like I don't know, let's find out like let's read the
transcript. Uh if it's not like how can you help it? Yeah. Just sort of like I I think all of
these things are like an intuition, you know? It's like like kind of like riding a horse. Not that I've ever rode a
horse, but I know just like I imagine it's like running.
Yeah. Like you like, you know, you're sort of giving these signals to the horse. You're calming it down. You're trying to understand what it how how do
you push it faster? You know what I mean? And sort of like it's a very organic like thing, right? Um like I
think we like to say that models are grown and not designed, right? And so we're like sort of understanding their
capabilities. Yeah. Uh yeah what and where it is. Yeah quick question. So is a way to add like
metadata to the spreadsheet? Can you give descriptions in a different document? Yeah that's for example KPI
to build intelligence to ask questions. Yeah. So that's another great pattern is like okay can you add metadata to a
spreadsheet? So these are some questions that you might want to think about before like when you're thinking about
search is like what pre-processing can you do to make the search better, right? And so one example is that you translate
it into like a SQL format or something where you use something that can query it, right? That's like a translation
step. Another step is like maybe you have a tool or um like a a
pre-processing step where another agent annotates the the spreadsheet and and like adds information so that the agent
can then like search across that information better. Right. So um yeah, one more. Um, I was just curious what I
mean all those tools sound great, but yeah, why can't the agent just, you know, do what was suggested, read the header and then just get the date?
Context Engineering: Using ls and cat to build dynamic context
Like I feel like that should pretty trivial or retest.
Yeah, probably I should have like prepared this in code. But yeah, I I built a ton of spreadsheet agents
before. Basically, it's not it's kind of hard to do. Yeah. Yeah. So, um, basically what what I would
think about is like, so we we've got like Okay, I Sean, do you have suggestions on how it
can talk and code at the same time? Go ahead.
Oh, I see. Yeah. Do you work at Whisper Flow or something or
Stick the mic in your shirt? There's a microphone button. There's a microphone button on the bat.
Stick the mic in your shirt. Oh, I I I just don't trust that stuff, man. Okay. Um,
maybe I shouldn't be working in an AI lab. Um, okay. So,
uh, let's see. Hold on. Hold on. Um,
search. So, one way to do it is like
you see in spreadsheets, right? Like you can say here you can design formulas right so like B3
2
right so this is a syntax for example that the agent's pretty familiar with like B3 to B5 right and so you can
design an agentic search interface which is like this right like B3 B5 or something right so like your
agentic search interface can take in a range right it can taking a range string, right? And these are things that
like the agent knows pretty well, right? Like you can um do SQL queries, right?
Agent knows SQL queries pretty well, right? Um and uh like these you can also
uh do XML, right? Sorry, the font is so small. Um
okay. Uh yeah, you can also do XML. I'm not sure if you guys know but like
uh XLX files are XML in the back end right and XML is very structured uh you
can do like an XML search query uh and there are different libraries that can do that so that's one example right is
like how do you search and gather context and I hope this sort of like illustrates to you that like gathering context is really really creative right
like and and like there's so many iterations and if you just if you've only tried one iteration it's probably
not enough right like think about like as many different ways as you can like try these out, right? Like try SQL, try
try the CER, try try the GP and O and like all of these things and um have a
few tests that you're trying across different things and and see what the agent likes and what it what it doesn't like. Um it's going to be different for
each case. Sorry. When you say agent, you're referring to
the the model or because we're building an agent here. Yeah. and you're relying on already free
existing knowledge of how to handle XML who's who's doing that the model.
Yeah, because the question is like who uh where is the knowledge come from? Is it the model? Is it like what is what do
I mean by the agent? Yeah, generally I think what you're looking for is like you have a problem you want to make it
as indistribution as possible for the agent, right? And so the agent knows a lot about a lot of different things. It
knows a lot about for example finance, right? So if you ask it to make a DCF model, it knows what DCF is, right? And
you can if if you want to give it more information, you can make a skill, right? But so it knows what DCF is. It
knows what SQL is. Can it combine those things together, right? And so like uh
ideally you want to like your your problem is going to be out of distribution in some way, right? like
like there's some like information that's not on the internet or something that you have um or something somewhat
unique to you and you want to try and like massage it to be as in distribution as possible. Um and uh yeah it's it's
very very creative I think like uh you know it's not like a it's not a science
to be very much like an art. So, um,
yeah. Okay. So, we we've tried gathering context, then taking action. Um, we can
probably do a lot of the same stuff here that we've done before, right? Like we can do like insert
2D array, right? Um, if we've got like a
SQL interface, right, we can um we can do a SQL query, we can edit XML. Um,
these are like often very similar, right? Like taking action and gathering context that that you probably want a similar API back and forth. And then the
last thing is verifying work, right? Like how do you think about how do you think about that? Um, check
for null pointers, right, is one of the ways to do it. Um, any other ideas on on
verification or Yeah. Sorry, I'm I'm a bit confused if you say
like when when you're using other SDKs to build Asian, I don't need to tell it
how it should gather the context. Sure. I just give it the context and explain this is what like basically I explain in
plain English what is meant to do. Yeah. And what I tend to do and you tell me if
I'm wrong, I actually end up creating a separate agent for QA. Oh, interesting.
To to verify because I don't trust the agent to verify itself.
But I'm just I'm I'm just a bit I confused about the level of detail I need to provide the agent in that
example. Yeah. Okay. So the question is about um giving context to the agent versus
having it gather its own context. Uh you mentioned that you sometimes use a Q&A agent. Uh can I ask like what like
domain you you're building your agent in or in uh cyber security.
Okay. Sure. Yeah. Yeah. Um, I think that
I I think I need to like look into more specifics, but the cloud agent SDK is great for cyber security and like I
would generally push people on like let the agent gather context as much as possible, you know, like let it find its
own work as much as possible. Um, you're trying to give it the tools to find its own work. The way I think about this is
kind of like let's say that someone locked you in a room and they were they were like giving you tasks, you know,
like that's your what your job was like a Mr. Beast sort of like scenario, right? Like you get $500,000 if you stay
The "Monitor": Viewing the agent's thought process in real-time
in this room for 6 months. Um then like like someone's giving you a message,
what tools would you want to be able to do it, right? Like would you just want like a list of papers or like would you
want a calculator or like a computer? Right? Probably I would want a computer, right? I'd want Google. I'd want like
all of these things, right? And so like I wouldn't want the person to send me like a stack of papers being like, "Hey,
this is probably all the information you need." I'd rather just be like, "Hey, just give me a computer. Give me the problem. Let me search it and figure it
out." Right? And so that's how I think about agents as well. like they need like like you know they're stuck in a
room. So I need to give them tools. So if you can go back to the slides you have to the graph you had
to the graph like this or yeah the so basically that gathering context is basically these are the tools
I'm offering it. Yeah. Exactly. Yeah. You're I'm giving it like maybe an API for code
generation. Maybe I'm giving it a SQL tool. Maybe I'm giving a bash. These are all like examples, right? So yeah, you
have one more question question. So for all the agents that you're having,
do they share the same context window? Interesting. Yeah. So do agents share
the context window? I think I think this is like an interesting question just overall about how you manage context. Um
I think and I haven't talked about this too much yet, but sub agents are like a very very important way of managing
context. Um, I think that this is like we're using more and more sub agents
inside of cloud code and I would think about like doing sub agents very generally. So like what we might do for
the spreadsheet agent is maybe we have a search sub agent, right? So like sub
aents are great for when you need to do a lot of work and return an answer to the main agent. So for search, let's say
the question is like how do I find my revenue in 2026? Maybe you need to do a bunch of results. Maybe you need to like
uh search the internet, maybe you need to search the spreadsheet, things like that. And there's a bunch of things that don't need to go into the context of the
main agent. The main agent just needs to see the follow result, right? And so that's a great sub agent task. Um I
don't have a dedicated sub aent slide here, but like yeah, they're very very useful and I I think a great way to
think about things. Um yeah, and just to just to build on that question actually
for verification for example, you can imagine doing that through a skill or a sub agent. You might even want to have
an adversarial like the security example is a great one. Want to have really go to town on it and not really have any
sympathetic relationship with the work already done. It's a very I I get it's a spectrum, but do you like are you saying
yes, you'd use a sub agent here, you'd use a skill? How would you think about this? Yeah, definitely. So question on like uh
do you sub agents or oh I'm sure it'll work just to make sure. Oh, sure. Okay. Yeah. Yeah. Thank you.
Appreciate it. Um okay. Yeah. Uh can you use sub agents for verification? Uh yes.
I I think this is a pattern. I think like ideally the the best form of verification is rulebased, right? You're
like is there like a null pointer or something? Uh that's like easy verification. it it doesn't lint or
compile like like as many rules as you can try and insert them and again be creative right like for example uh in
cloud code if the agent tries to write to a file that we know it hasn't read yet like we haven't seen we haven't seen
it enter the read cache we throw it an error we we tell it like hey uh you
haven't read this file yet try reading it first right and that's an example of sort of like a deterministic tool that
we insert into the verification step and So as much as possible like anytime you are thinking about you know verification
first step is like what can you do deterministically what like what like you know outputs can you do and again
like when you're choosing which a like types of agents to make the agents that have more deterministic rules are better
you know like they just like like it just makes a lot of sense right so um of
course as the models get better and better at reasoning then you can have these sub agents that check the work of the main agent the Main thing there is
to like avoid uh context pollution. So you probably wouldn't want to like fork the context. You'd probably want to
start a new context session and just be like, "Hey, yeah, adversarily check um
the work of like this this output was made by a junior analyst at McKenzie or
something. They graduated from like not a grade school like their GPA like you
know like like just like feed it a bunch of stuff and then tell it to critique it, right? Like that's like one of the
tools of the sub agent, right? And so um yeah, the more you like
uh yeah, as the models get better and better, that sort of verification will become better as well. Um but doing it
deterministically is like a great start. Yeah. Just a question about the verify work.
So yeah. Um so let's say we found no pointers.
It's probably easy to just say, "Okay, fix it." But like you know let's say we deploy it to production and the client
is using it that's not us and they somehow get into a spot where the whole spreadsheet is deleted and so like like
on what level do we need to bake in like the ability to like undo tools and
because like um let's say the QA agent returns that their spreadsheet is empty.
Yeah. Not necessarily is able to undo for so like what was your advice there?
Yeah. So the question is like how do you think about state and like undoing and redoing being able to um fix errors
basically right? I think this is like a really good question and honestly another sort of like um like when you
think about like what are agents good at right like or what problem domains are agents good at? How reversible is the
work is like a really good intuition right? So code is quite reversible. you can just like go back, you can undo the
git history. We we come with like, you know, these atomic operations right out of the gate, right? Like I use git
constantly through cloud code. I I don't type g commands anymore, right? So, um that's like a really good example. A
really bad example is computer use, you know, because computer use has is not
reversible in state, right? Like let's say you go to like door-ash.com and you add like the user wants you to order a
Coke and you add order a Pepsi now like you can't just go back and click on the Coke. You have to like go to the cart
and you have to remove the Pepsi, right? And so your mistake has like compounded this like you know this state and the
state machine has gotten more complex, right? And and so like whenever you're dealing with like very very complex state machines that you can't undo or
redo of it does become harder, right? And I think one of the questions for you as an engineer is like can you turn this
into a reversible state machine kind of like you said can you store state between checkpoints such that the user
can be like oh my spreadsheet is messed up right now just go back to the previous uh checkpoint right uh
potentially even can the model go back to previous checkpoints um I I think someone had this like time travel tool
um that they were giving one of the coding agents which was kind of cool where you're like it's like you can time
travel back to a point before this happened. You know what I mean? Uh it's kind of fun. I think like all of these
tools, some of them don't work that well yet, but you know, we'll we'll get there. Um yeah, thinking about state and
verification is is very useful, right? So, um yeah, quick question at the back.
Yeah. Um I'm kind of curious about scale. Um so
Handling "Stuck" States: Feedback loops and error correction
what if the spreadsheet is like millions of rows million and thou hundreds of
thousands of columns right or just like any sort of database like in that type of situation how would you go about
searching there's obviously a context you have to commentate for yeah this is great um I probably should
have done the spreadsheet example as my coding example for for a preview my coding like agent is a Pokemon agent um
probably spreadsheet would have been Okay. Uh the question was what if the spreadsheet is very big? If you have a
million rows, uh how do you think about 100 column yeah 100,000 columns or 100 columns or
whatever like how do you think about it right like your database is also very big like how do you how do you do that?
Um I think for all of these things uh one of course as the data becomes larger
and larger it's just a harder problem like you know it just absolutely is your accuracy will go down right like cloud
code is worse in larger code bases than it is in smaller code bases right as the models get better they will get better
at all of that um for all of these I would think about like how would I do this if I had a spreadsheet that was
like a million columns and a million rows what would I do I I mean I would need to start searching for it Right. I
would need to be like like if I'm searching for revenue, I'd be like searching Ctrl+F revenue and then I'd go
check each of these like results and I'd be like is this right? And then like I'd see like hey is there a number here? And
then I'd probably keep a scratch pad like a new sheet where I'm like hey like
equals revenue equals this you know and and and store this reference and and keep going. So I I think that's a good
way of thinking about it is like the model should you should never like read the entire spreadsheet into context
because it would it would take too much right like um you want to give it like the starting amount of context that's
also how you work right like let's say that you open up the spreadsheet what you see is rows is this right you see
like the first 10 rows and the first like you know 30 columns or something
right that's what you see you don't load all of it into context right away you probably have an intuition for like,
hey, I should load more of this into context, right? And and like, oh, I should navigate to this other sheet,
right? And this other sheet has more data, right? Um, but you need to like
sort of you gather context yourself, right? And so the agent can operate in the same way. It can like navigate to
these sheets, read them, like try and like keep a scratch pad, keep some notes and keep going. So that's how I would
think about it. Uh, yeah, in the back. Yeah. So my question is about managing context pollution and actually I guess
relates to the previous question. Um do you have a rule of thumb for you know
what fraction of the context window do you use before you start hitting diminishing returns or it becomes less
effective? Yeah the question is yeah context management. Do you have a rule of thumb for like uh how much of the context
window to use before it becomes less effective? This is actually I'd say a
pretty interesting problem right now. Um, I think a lot of times when I talk to
people who are using cloud code, they're like, I'm on my fifth compact. I'm like, what? Like like I've I like almost have
never done a compact before. You know what I mean? Like I have to like test the UX myself by like like forcing
myself to get compacted. Um just because like I I tend to like clear the context window very often right when I'm using
cloud code myself just because like um at least in in code the state is in the
the files of the codebase right so let's say that I've made some changes uh cloud code can just look at my git diff and be
like oh hey these are the changes you made it doesn't need to know like my entire chat history with it you know in
order to continue a new task right and so in cloud code I clear the context very very often often and I'm like,
"Hey, look at my outstanding get changes. I'm working on this. Can you help me extend it in this way?" Right?
That's like a way of thinking about it. And um when you're building your own agent, like let's say we're building a
spreadsheet agent, it gets a little bit more complex because your users are less technical, right? And they don't know what a context window is, right? Um that
is like I'd say a hard problem. I think there's like some UX design there of like can you reset the conversation
state, right? like can you maybe every time the user asks a new question can
you do your own compact or something and can you like uh summarize the context?
Um does it like in a spreadsheet a lot of the state is in the spreadsheet itself so it probably doesn't need you
know to know the entire context. um can you store user preferences um as it goes so that you remember some
of this stuff you know like there's a lot of like again like it's an art there's like so many different angles
and ways in which you can do this right um but yeah you are trying to like sort of minimize context usage um you
probably don't need s a million context or something you know I mean like you just need good context management like UX design yeah um yeah
um just I just want to ask the sub agents were made to protect the conduct of the core agent. Right.
That's right. Yeah. Sub agents were made to spreadsheet. Would we be able to use multiple sub agents and try to make a
process where we chunk up the spreadsheet in the case where it's super large. So then the agents can kind of run through each portion like in
parallel with each other. Yeah. Yeah. I mean um Yeah. So like one
of the things I love about cloud code is that we are like the best experience for using sub agents like especially sub
agents with bash. It is very very good. I didn't really quite realize uh all the
pain. Um I think if anyone's going to QCON, I believe Adam Wolf is giving a talk on QCON about how we did the bash
tool. Adam's a legend and the bash tool such a good job. Um when you're running
parallel sub agents at the same time, bash becomes like very complex and there are lots of like like race conditions
and stuff like that and and so there's a lot of work that we've solved there, right? So this is like one of the things
I love about cloud code is you can just be like hey like spin up three sub agents to do this task and it will do that and in the agent SDK as well you
you can just ask it to do that. So number one sub agents are a great primitive in the agent SDK and I haven't
seen anyone do it as well. So that's like a big reason to use it. Um yes generally you want it you want these sub
agents to preserve context. Let's say you have if you have a spreadsheet, you could potentially have multiple read sub aents going on at the same time, right?
So maybe the main agent is like, "Hey, can this agent read and summarize sheet one? Can this agent read and summarize
sheet two? Can this re agent summarize sheet three?" And then they return their results and then the agent maybe spins
off more sub agents again. Right? So this is like another knob you have. Um, and I I think what I
want to say is like there's like we've talked so many so much about like all these different
creative ways that you can like do things. This is like the level at which you should think about should have to
think about your problem. You should not really in my opinion think about like uh like how like how do I spin off a
process to make a sub agent or like you know like the system engineering between like behind like what is a compact or
something right? So like we take care of all of this for you in the harness so that you can think about like hey what
sub agents do I need to spin off right and like how do I create a a a genic search interface and how do I like
verify it's work these are the really core and hard problems that you have to solve and any time you spend not solving
these problems and and solving like lower level problems you're probably not delivering value to your users you know
and and so um yeah I I think sub agents big fan of the agent SDK in case of yeah
uh yeah so uh like we have this uh text and the verification task so where exactly we
need to put the verification in this example I let's say after generation of the SQL query I can verify it is the
right query is generated or not that is the one path second path is like generation the query directly executing
and once I will get the output then I will do the verification so uh and how
how agent can choose dynamically like which one is the right path? Yeah. So the question is like where do
you do verification? Uh is it only at the end? You do it in the middle like things like that. I would say like
everywhere you can just like constantly verify right like uh like I said we do
some verification in the read step of the of cloud code right so that's like a great example um you can do it at the
Multi-turn Complex Tasks: Building a "Research Agent" demo
end you should absolutely do it at the end but at any other point if you have rules or heruristics especially uh like
if for example you're like hey one of my rules is that you shouldn't do like the the total number of columns you should
search is should be under 10,000 or under a thousand or something that's like a a nice way of doing it, right?
Like similarly here like maybe you shouldn't be inserting like a huge like row like of of values like give feedback
to the model be like hey chunk this up right you throw an error and give a feedback and the great thing about the model is like it listens to feedback it
will read the error outputs right and then it'll just keep going so yeah verification is definitely like I I know
I have it in this like as a sort of a loop but um it's definitely more like
verification can happen anywhere and and should happen anywhere like like put it in as many places as you can. So, um all
right, I do need to start doing some of the prototyping, but I'll take one more question. So, right here. Yeah. How do we say how do we form the steps?
I mean, like how do we say the agent that go search first and then this step and then do that step?
How does the loop actually from the start point to the end? How do we you just tell it? So, like uh
like is it in a system prompt or Yeah, in the system prompt. Yeah. So like with cloud code, we just give it the bash tool and we're like, "Hey, like
gather context, read your files, uh do stuff like run your linting, you know
what I mean?" Um, and so yeah, again with the agent, you don't need to enforce this, right? You don't need to tell it, hey, like you need to do this
because like sometimes it might not be necessary, right? Like let's say that someone is asking a readonly question
for your spreadsheet. you don't need to like verify that uh
like you're that there are no compile errors, right? Because there you haven't done any write errors, write write
operations, right? So, um let the agent be intelligent and and like in the same way that you would like that same
freedom when you're doing your work, right? Uh you're trapped in this box or whatever like same way, right? Uh so,
okay, cool. I I I do want to try and see if I can do some prototyping now that we have this uh uh the the holder as well.
Um okay, yeah, execute lint. We've done a bunch of Q&A. Okay. Prototyping. Okay.
Let's say that you have an agent, right? Like you want you want to build an agent. You come out of this talk and
you're like great. I have a bunch of ideas. How how do I do this? Um I think what I say overall is like building an
agent should be simple. Your agent at the end should be simple, but simple is not the same as easy, right? So like it
should be very simple to get started and it is just go to cloud code, give cloud
code some scripts and libraries and uh custom cloud identities and ask it to do
it, right? That's what we're going to do, right? Um that's like it should be so easy to be like, hey, this is my API.
This would be like an API key. uh can you like go search like you know I don't
know like my customer support tickets or something and organize them by priority or something like that right and then
look at what cloud code does and and and iterate on it right and this is like a great way of like just skipping to like
the hard domain specific problems that you have right so you have a lot of like domain problems like how do you organize
your data your agentic search how do you like create guard rails on your database these are all questions that you can
just start solving right away with cloud code, right? And so try and like build something that feels pretty good with
cloud code. And I think generally what I've seen is that you can do this and get really good results just out of the
bat using cloud code locally, right? And and you should have high conviction by the end of it, right? And so um yeah, I
think like I forgot more info. Watch my AI engineer
talk. Uh this is like a deck for internal that we were using. Um okay. So
uh yeah, I'm going to be inserting this. So yeah, you're getting what we what we show customers, right? So um okay. Uh
yeah. So yeah, use use cloud code. Uh again, simple but simple is not easy,
right? So like the amount of code in your agent should not be like super large. Doesn't need to be huge. doesn't
need to be extremely complex, but it does need to be elegant. It needs to be
like what the model wants. You want to have this interesting insight. Let's turn the the model into a SQL query. Oh,
let's turn this spreadsheet into a SQL query and then go from there, right? So, um, think about it that way. And cloud code is like a great way of doing that.
So, okay, uh, let's make a Pokemon agent, right? This is what we're going to do. Uh, Pokemon is a game with a lot
of information. There are thousands of Pokemon, each with a ton of moves. Um,
uh, we want to be pretty general. And so there is actually like a Pokey API. Um, and the reason I chose Pokemon is just
because like I know that you guys have your own APIs as well, right? And they're all like very unique, right? And
uh, so I wanted to choose something with a kind of complex API that I haven't tried before. Um, so the Poke API has
like, you know, you can search up Pokemon like Ditto. Uh, you can search up like items and things like that. Um,
and so it's got this like yeah, this custom API. You've got everything in the
games, right? So, um, and yeah, like one of the Quest things
agent might want, your user might want to do is make a Pokemon team, right? I love Pokemon. I know very little about
making an interesting Pokemon team for competitive play. Uh, could my agent help me with that? That'd be that'd be
cool, right? So, um, my goal is to make an agent that can chat about Pokemon and
then we will like, you know, see what we can do, right? And and and how far we get. So, um, I've done like some of this
work already and I will like open up and show you. So, um, the first step and the
prompt here is like the first step is I'm I'm going to do mostly code generation for this, right? And so, um,
let me Is that going to be on GitHub somewhere? Uh, actually it is. Uh, yeah, it's on my
personal GitHub. Oh, yeah. I was going to commit all of this as well.
Yeah. Um, yeah. Yeah. So, uh, I think my personal GitHub is, let's see. All
right. Is it secure GitHub or does it have malware in it?
You guys are AI engineers. Yeah. Like, if you can get owned, that's that's your fault. Um,
yeah. So, um, yeah, you can you can clone this if you'd like. Um, I need to push the last
change this. So, okay. So, um, yeah. Can you guys see this? Should I put it in dark mode instead or is this fine? Like,
um, dark mode. Dark mode. Okay.
Okay. Okay, this better. No. You want a different dark mode?
Dark hard. Okay. I don't think this as good as it's going to get, guys. Um, okay.
I Is it How does this work? Can you guys still hear me or Okay. Um, okay. So here's an example of
like I've taken the the prompt I gave it was
hey I go search Pokey API for its API and create a TypeScript library right
and so this is all by coded um and so you can see here that it's created this like interface for Pokemon right and so
it's created like this Pokemon API I can get by name I can list Pokemon I can get
all Pokemon I can get species and ability abilities and stuff like that. And so like this is just a prompt that I
gave it, right? And it generated this like TypeScript API. It also did it for moves. Um and then it's created this um
like uh it's created this like API that I can use import Poke API right from the
Poke API SDK. And uh yeah, you can see like sort of how it's like set set this
up. And uh now in contrast, right, and and so this is the cloud. MB, right?
This is a TypeScript SDK for the Pokey API. Um, this is like the the modules in
the Pokey API. Here are some of the key features. Um, uh, I'm asking it to write
scripts in the examples directory and then it will execute those scripts to help me with my queries, right? Um, and
I give it some example scripts. It doesn't always need all this information, right? Like, uh, but yeah, fetching Pokemon, listing the resources,
getting data, things like that. So this is like my agent really. It's like a prompt I gave it to generate a
TypeScript library and then this cloud.md and I I can chat with it in cloud code. I'll also show you a version
of it that is just tools, right? So here I'm using the messages completion API,
right? And I've given it a bunch of tools from the API. So like get Pokemon, get Pokemon species, uh get Pokemon
ability, get Pokemon type, get move. So you've defined all of these tools and you can see that like you know I also
just gave it a prompt and told it to make the tools. Um it doesn't want to make a 100 tools right like there's a
ton of smoke on or sorry um pokey API data. Um but like it you know there's
only so many parameters it can do. So it's got this like tool call and now um
and I I made like a little chat interface with it. Right. So let me now go here and say like uh this is my tool
calling. Um
did I
great. So yeah, here we've got this chat.ts, right? Um
I I use bun when I'm prototyping stuff just cuz like I don't want to compile from Typescript to JavaScript. Um and uh
again bun has like linting built into it. Uh it's a way of like simplifying for the agent so the agent doesn't need
to remember to compile but TypeScript is better for generation because it has types right. I'm going to start this like fun chat and then I'm going to try
like, okay, what are the generation two water Pokemon?
Um, and you'll see that it's it's starting to like search and I'm logging
all the tool calls here. This is very very important, right? Because like it needs to like do the tool calls. And so
you can see that what it's doing is like it's searching a bunch of Pokemon. Um,
and then it told me, okay, here are the water Pokemon for Gen 2, right? It's got Toadile, Crocenoff, or alligator. You
can see sort of like how it's thought like in between each step, it's thinking through um the previous steps. Right
now, like let's say that I want to do with claw code. I think I might need to
uh I need to delete this example. Um Oh, yeah.
Small question. How do you log the the tool calls? It's like there's just an
argument you can Oh yeah, this is um this is like in the normal API, right? So I just like uh in
the model every time it logs it, I just call this this is in the like normal anthropic API um in the SDK. I I'll get
back to get to the SDK. Um it's just like you just log every system message. So, um, just doing it in console logs.
Does that make sense or Yeah. Okay. Yeah. So, so that chat interface you were showing, is that just using the regular
API or Yeah, that's using the regular API. So, not the agent SK, not the agent SDK. Yeah. Yeah. And so,
what I'm going to do here is um here I'm going to delete the script
because I don't want it to cheat. Um, but okay. So, here you know that um I've
I'm just opening cloud code. I've created a bunch of files here. I'm going to say like, can you tell me all the
generation 2 water Pokemon? Um, and then we'll see what it can do, right? So, um, I forget if I need to
prompt it to write a script or something. I think it'll be fine. We'll see what happens. Do you mind going to the core SDK file
and just showing you talked about getting context and then action and then verification? Can you show that in the
code and how we're configuring the tool description? Yeah. So, uh, we haven't done the SDK
part yet. So, so far I've just put put some APIs in cloud code. Yeah. Yeah.
Sorry, I thought I missed that. That's Yeah. Yeah. Yeah. Of course. Okay. Um, but yeah. So, okay. You can see here um
it's it's given me a lot more, right? And um
yeah, it's given me a lot more. So, it it it's it's saying there's 20 water Pokemon, right? And I think this is
roughly right. I've like um uh what did it do?
I think it just knows. Okay. That's funny. Live this. Um
um anyways uh yeah Pokemon is slightly in distribution
which is which is I I guess good um but yeah so like what what it will do
Refactoring patterns: "Hooks" and deterministic overrides
is like it will try and like write like a script and uh because you don't want it to think as much right so here it's
like okay what I'm going to do is um let's see gen two water type Pokemon
and where is it? Okay. So, yeah, you can see here it
knows like, okay, the start of the generations. It fetches these uh per API. Um I guess this decided not to use
like my pre-built API here. Um and then uh yeah, and and then runs it,
right? So, um I think I need to like improve the cloud. MV for this. But anyways, you can see that like it's able
to like check 200 plus Pokémon and then check for their type and and you know
get their get their information, right? So this is like uh just a quick example on like how to do codegen and how to use
cloud code to do it, right? So um we'll run this script and then like uh um like
keep going, right? So, uh it will give me the output and um yeah, basically
what I want to show, let's see, we have roughly 15 minutes left. Um
play Pokemon. The time play Pokemon. Yeah. Yeah. Actually, this is one of the demos I was thinking of doing. Um Cloud Code plays
Pokemon. So, like let's say you want to do like an agentic version of Cloud Plays Pokemon. How would you do it?
um what you would do I think is like you would give it access to the internal
memory of the uh the ROM right and so let's say that it wanted to find its
party it could search that in memory and Pokémon Red is like a very well in distribution uh reverse engineered uh
game right and so it could search in memory to be like hey these are the Pokemon um these are like this is how I
figure out where the map is this how I navigate it right so this is like maybe exercise to the reader if you want to
try it out. It's like um there is like a no.js GBA emulator. Um I think I have to
legally say you have to go buy Pokemon Red and try it. Um but yeah, I think like uh yeah, good example. Anyways,
here so it's it's fetched all of them and it's listed all their types and um yeah, you can see how it's like used
code generation to do this, right? So um a quick example of using cloud code to prototype this. Um now there can be like
more interesting like data here. So um I do want to leave time for example. So I
think I'll just sort of like for questions. So I'll just sort of go through like an example. Let's say
you're making competitive Pokemon. Competitive Pokemon has a lot of different variables and data. So, this is like a a
text file from this online like a library basically which stores like all
of the Pokemon and their like moves and who they work well with and don't work
well with and you know like who they're countered by and all of these things, right? So, there's a ton of data here,
right? And it's all in text file. Um, which is actually pretty good for cloud code, right? because I can say like,
okay, um, hey, I'm going to give it a little bit more data. Normally, I put this in the, um, check the data folder.
Tell me, I I want to make a team around Venusaur.
Can you give me some suggestions based on the Smogon data? Um,
and Smoke on is like this online API. And so I'm I'm not entirely sure what it'll do here yet. I haven't done this
query before. Uh, but we'll see. I think it'll be it'll be fun. Um,
where am I? Oh, I see.
Um, yeah. But what I wanted to do is sort of grapple through this this data,
right? And and sort of figure out from itself from first principles, not having seen this data before, how can I like
answer my query, right? So um while it does does that I'll I'll take any
questions. Yeah. Um first of all great work job. Uh so
this is like really on top of cloud code and so my question is if we were to
deploy this customer basically are we supposed to have cloud code running in like a like a swarm or are we
somehow able to take the cloud code part out just bot and the agent SDK?
Yeah. So, let me show you like very quickly like what the what it looks like to use the agent SDK here. Um, so I've
already done this file system, right? And again, I want you to think about the file system as a way of doing context
engineering, right? Like this is like a lot of the inputs into the agent. So, my actual agent file is like 50 lines,
right? Um, and it's mostly just like random like boiler plate, right? Like I
guess, yeah, it's decided to stop it from uh writing scripts outside of the custom
scripts directory. Again, fully backcoded. So, um yeah, you can see like it just runs this query, takes in the
working directory um and uh like like runs it in a loop,
right? And so probably I'd want to like turn into like some allowed tools here and stuff, but it it's very simple. And
and so um if I were to like productionize this, the first step I do is like okay, I I've tested it on cloud
cloud code. It seems to do pretty well. I write this file. Then I put it there are two ways to do it. So one is I do
think that like local apps might be coming back with AI because I think that
like there's such an overhead to running it. Like for example, cloud code is a front-end app, right? Like it works on
your computer. So maybe the way I shift this as a Pokemon app is like hey I have like an app that you install and it
works locally on your computer and writing scripts. I think that's one way of doing it, right? Um the other way is
yeah you have you host it in a sandbox. Um and again there's a bunch of different sandbox providers that make it
really easy like Cloudflare has a good example um of using the agent SDK and it's just like sandbox.st start you know
and then like bun agent.ts ts and that's kind of all it takes, right? Like it's
like like they've abstracted away a lot of it. Um so you run like the sandbox um
and then you communicate with it and um yeah I think there is like some very
interesting stuff that I'm not sure I had time to get to but um like I I think
some interesting questions are like um yeah like how do you do this sort of
like service now you're just spinning up a sub like a sandbox per user. Um, there's a lot of like I'd say best
practices to solve here. One thing I just want to call out for you guys to think about, um, if you're making a an
agent with a UI, like let's say that you have, uh, yeah, my Pokemon agent and I
wanted to have an UI that is adaptable to the user, right? Like maybe some users are doing team building, some
users are helping it with their game, some users just want pictures of Pokemon. How would how would I have an
agent that adapts in in real time to my user, right? Um the way I would do it is
in my sandbox, I would have a dev server, right? And the dev server would expose a port. Um it would run on bun or
node or something. It would like expose a port. The agent could edit code and it would live refresh and and your user
would be interacting with that website. This is how a lot of like site builders like lovable and stuff work, right? they
they use sandboxes and they host essentially a dev server. And so
thinking about this for your users, if you want a customized interface, this is a great way to do it. Um, okay, let's
see. Let's see what it did. Um,
okay, cool. Okay. So, um it's like written this like script has generated
like show me some base stats and suggested it a like um uh a move set and
some teammates and you can see sort of like see what did it do? Um control E.
Um yeah. Okay. Okay. So, you can see here what it started doing is like it started
searching for Venusaur, right? And it started finding uh those types the like
those Pokemon and when it does that it also gets other Pokemon that mentioned Venusaur. So, it gets like its teammates
and it counters and stuff, right? And it's sort of over this time found interesting Pokemon, right, that like it
might work with, right? So, it's done a bunch of these searches and it's gone these profile. It's found most common
teammates and and written a script to to analyze it, right? And so this is all based on a text file. Of course, I could
have pre-processed a text file a little bit more. Um, but yeah, it's like done
this sort of like interesting um analysis for me, right? And again, I'll I'll push out more code to the
GitHub repo. And um I'll also tweet about this. I'm on Twitter. I'm uh TRQ212.
Uh I tweet a lot. So, uh, definitely like mostly about agent SDK stuff. Um, but yeah, we have about 8 minutes left,
so I want to spend the rest of time taking questions about kind of anything, you know, and I'm sorry we didn't get to do more prototyping. Um, but, uh, yeah,
over there. Yeah, I was going to say it's a flaw play. Can you uh sort of plug this in with that just to see if the agent will
uh be more selective with the team it uh tries to capture? Yeah, put it in in Cloud Play's Pokemon.
Yeah. Yeah, I do want to play CL codeplays Pokemon. I think that would be fun. Yeah. Yeah. I I think cloud plays
Pokemon. I think we try and keep it like a pure reasoning task as much as possible. Yeah. Um other questions? Yeah.
I was curious about how people are monetizing like you know kind of like
you kind of like lose the opportunity to get all the margins. Yeah. I'm curious like
shipping your own SDK so that they kind of take the usage base.
Yeah. I I do think overall, especially right now, agents are kind of pricey,
you know what I mean? Because like um the models are have just started to get agentic. We really focus on like having
the most intelligent models, you know, and like you generally this is just like an overall like SAS business software
thing. You'd rather charge fewer people more money that really have like a hard problem, you know? And so I think this
is still good. like you probably should find um you know these hard use cases but I would say like number one make
sure you're solving a problem that people want to pay for right is like the number one step right and then number
two um yeah I think you could do subscription or token based I I think
this kind of comes down to like how much you expect people to use your product uh versus like how much you expect them to
like use it occasionally like cloud code obviously people use a lot and in order to like we do a mix of like if we give
you some rate limits and if you exceed it we do uh usage based pricing. Um I
think that like yeah it's very like dependent on your own user base and kind of like what they will do but I will say
monetization is something you should think about up front and design your you
know agent around because it's hard to walk back these promises. Um, yeah, back there.
I haven't heard you talk at all about hooks and be curious to hear your take on how
uh Yeah, there's so much to talk about. Um, hooks are great. We we do ship with hooks. Um, hooks are a way of doing
deterministic verification in particular or inserting context. So, um, you know,
we fire these hooks as events and you can register them in the in the agent SDK. There's like a guide on how to do
that. Um, examples of things you might use hooks for is like for example, um,
yeah, you can run it to verify the like a spreadsheet each time. Uh, you can also like let's say I'm working with an
agent and, uh, I'm the agent is doing some spreadsheet operations and the user has also changed the spreadsheet. This
is an interesting like place to use a hook because you can be like hey has after every tool call insert changes
that the user has made uh and and so you're giving it kind of live context
changes um in an interesting way. So um yeah I think uh uh yeah there there's
more stuff on like the docs about hooks um and happy to like talk about it
afterwards as well. Yeah, more questions. Yeah. like I do
in I realize it's working. I want to take the same conversation that I've already done because I'm going
through and convert that into a new okay which is that I followed a few steps now
it's actually working but I don't want to rewrite all of the code to write it
like because it works. Yeah. Sure. Yeah. So like let's say you've done this prototyping, you found something that works. What I would do is
Q&A: Reproducibility, helper scripts, and non-determinism
like I somewhere the cloud MD like obviously like when I tried doing this one time it like didn't use my API
directly and it wrote JavaScript. I should have been more specific in my cloud. Mmd to be like hey you should use
this. Um I yeah I think like that's one thing. Um the second thing is uh yeah do
summarize in terms have the helper scripts that you need and then like write something like this agent.ts
script for like to run the test again. Uh yeah more questions in the grade. Uh
yeah, I just put it a Pokemon and I think it's lying about using the scripts
answer. It tries a couple times like my SDK isn't very good it tries twice and
then it's like oh here's your comparison table but it's just because it's a distribution. Do you have any advice for that kind of problem?
Yeah, this is a good question and and you know like I'm I think there is some messiness, right? Like I I think one of
the things if an agent knows an answer um and you want to like sort of like
fight it kind of to be like okay like no it's generation 9 now and like Venusaur stats have changed and there's like this
new like charact like um this is hard I actually think uh one of the ways of
doing that is hooks. So you can say for example like hey uh don't if if you've
like returned a response without writing a script you know you can check that you
can be like give feedback to bit like please make sure you write a script please make sure you read this data
right and and you can use hooks to like give that feedback in in the same way that in cloud code uh we have these like
rules like make sure you read a file before you write to it right so add some determinism uh it can definitely be like
I said it's an art you know sometimes you know yeah maybe like like writing course I guess probably um yeah and the
Q&A: Strategies for massive codebases (50M+ lines)
gray how are you guys dealing with like large code bases I'm working like a 50 million plus code base and so
bre tool doesn't work really um so I'm having to build like my own like semantic indexing type thing to
kind of help with that right is there any kind of like addthropic maybe thinking about how that can be
more native to the product like you know in a couple months is the thing I'm writing just going to go away or like how how do you guys think about
Okay, your last question in a couple months is you're thinking to go away generally. Yes. Yeah. Anytime you ask
about AI, yeah. Uh I think that um semantic search this is a cloud code
question more than a security question, but happy to answer it. Like um we you know there are trade-offs of semantic
search. It's more brittle. Um I think you have to like index and and and search and we it's not necessar the
model is not trained on semantic search and so I think that's sort of like a problem like you know grap it's trained
on because it's like it's easy to do that but like semantic search you're implementing your bespoke query um for
like very large code bases you know we have lots of customers that work in large code bases I think what I've seen
is sort of like they just do like good claw mds you start in you know trying
Make sure you start in the directory you want, have like good like verification steps and hooks and links and things
like that. And so u you know that's what we do. We don't have you know a custom we dog food clunk code, right? So um
yeah. Okay. Yeah. Last question. We have to close unfortunately actually. Give it up for Tariq everyone.
Closing remarks and future SDK roadmap
[Music]
Heat.
All
From the series
No results found