- Be boring - use chat completions, don't use provider-specific features, and you probably don't need an agent (which is just a fancy for loop)
- Store LLM requests in the DB - time, provider, model, request, request headers, response, response headers, latency, token usage, estimated cost (see the migration sketch after this list)
- Use PrismPHP
- Use JSON structured output and provide a JSON schema of what output you want back from the AI - and validate it too (see the call sketch after this list)
- In the prompt say "You must output JSON" (you want both this and the JSON structured output flag)
- Keep as much determinism as you can - get the LLM to 'pick' from known-good options where possible
- Use a mixture of markdown and XML tags in the prompt:

  ```
  <goal>Summarize the things...etc</goal>
  <things>
    <thing name="blah">nfnfnfnf</thing>
  </things>
  ```

- Provide examples (edge cases, happy cases, etc.) - this kinda thing:

  ```
  # Examples
  <example name="impossible to categorize">
    <input>xyz</input>
    <output>ERR_IMPOSSIBLE</output>
  </example>
  ```

- Don't add dynamic data in the prompt if you can help it, or add it near the end - prompt caching is real and amazing
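A minimal sketch of that request-log table, assuming a Laravel migration (the `llm_requests` name and exact column choices are mine, not gospel):

```php
<?php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

return new class extends Migration {
    public function up(): void
    {
        Schema::create('llm_requests', function (Blueprint $table) {
            $table->id();
            $table->string('provider');                  // e.g. 'openai'
            $table->string('model');                     // e.g. 'gpt-5'
            $table->json('request');                     // full request body
            $table->json('request_headers');
            $table->json('response');                    // full response body
            $table->json('response_headers');
            $table->unsignedInteger('latency_ms');
            $table->unsignedInteger('prompt_tokens')->nullable();
            $table->unsignedInteger('completion_tokens')->nullable();
            $table->decimal('estimated_cost', 10, 6)->nullable();
            $table->timestamps();                        // covers 'time'
        });
    }

    public function down(): void
    {
        Schema::dropIfExists('llm_requests');
    }
};
```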
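And a boring chat-completions call tying the JSON bullets together - plain PHP + curl, schema enforced server-side, validated client-side anyway, with an `enum` so the model 'picks' from known-good options. The ticket-categorization schema is made up for illustration:

```php
<?php

$ticketText = 'My invoice is wrong and I was charged twice.';

// Known-good options via enum = more determinism.
$schema = [
    'type' => 'object',
    'properties' => [
        'category' => ['type' => 'string', 'enum' => ['billing', 'bug', 'other']],
        'summary'  => ['type' => 'string'],
    ],
    'required' => ['category', 'summary'],
    'additionalProperties' => false,
];

$payload = [
    'model' => 'gpt-5',
    'messages' => [
        ['role' => 'system', 'content' => 'Categorize the ticket. You must output JSON.'],
        ['role' => 'user', 'content' => $ticketText],
    ],
    // The structured output flag: the API enforces this schema server-side.
    'response_format' => [
        'type' => 'json_schema',
        'json_schema' => ['name' => 'ticket', 'strict' => true, 'schema' => $schema],
    ],
];

$ch = curl_init('https://api.openai.com/v1/chat/completions');
curl_setopt_array($ch, [
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_POST           => true,
    CURLOPT_HTTPHEADER     => [
        'Content-Type: application/json',
        'Authorization: Bearer ' . getenv('OPENAI_API_KEY'),
    ],
    CURLOPT_POSTFIELDS     => json_encode($payload),
]);
$response = json_decode(curl_exec($ch), true);

// Validate it too - never trust the model blindly.
$out = json_decode($response['choices'][0]['message']['content'] ?? '', true);
if (!is_array($out) || !isset($out['category'], $out['summary'])) {
    throw new RuntimeException('Model returned invalid JSON');
}
```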
- Tools and function calling - they're the same thing
- Use them after everything is working and you need more - they're great, but they're not boring: hard to follow and easy to mess up
- Prefer 'arrays' as inputs - the agent should be able to add 5 tasks at once, not make 5 tool calls
- You have to add the function call output back into the chat context with the tool call id - don't forget
- Function calls shouldn't be API endpoints - combine lower-level functions into a higher-level function related to a goal the user is trying to achieve
- Have a 'max tool call' limit so you don't loop 4,000 times (see the loop sketch below)
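A sketch of that loop against the chat completions API. `callOpenAI()` (posts `$messages` + `$tools`, returns `choices[0].message` as an array) and `runTool()` are hypothetical helpers; the array-of-tasks input, the `tool_call_id` echo, and the hard cap are the point:

```php
<?php

// One tool that takes an ARRAY of tasks, so the model can add 5 in one call.
$tools = [[
    'type' => 'function',
    'function' => [
        'name' => 'add_tasks',
        'description' => "Add one or more tasks to the user's list",
        'parameters' => [
            'type' => 'object',
            'properties' => [
                'tasks' => ['type' => 'array', 'items' => ['type' => 'string']],
            ],
            'required' => ['tasks'],
        ],
    ],
]];

$messages = [['role' => 'user', 'content' => 'Plan my week and add the tasks.']];

const MAX_TOOL_CALLS = 10; // so you don't loop 4,000 times

for ($i = 0; $i < MAX_TOOL_CALLS; $i++) {
    $reply = callOpenAI($messages, $tools); // hypothetical HTTP wrapper

    $messages[] = $reply; // keep the assistant turn (incl. its tool_calls) in context

    if (empty($reply['tool_calls'])) {
        break; // plain answer - we're done
    }

    foreach ($reply['tool_calls'] as $call) {
        $args   = json_decode($call['function']['arguments'], true);
        $result = runTool($call['function']['name'], $args); // hypothetical dispatcher

        // Don't forget: the output goes back in with the matching tool_call_id.
        $messages[] = [
            'role'         => 'tool',
            'tool_call_id' => $call['id'],
            'content'      => json_encode($result),
        ];
    }
}
```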
> **Note:** This is critical.
- This is much more important than you think
- Gather ~50 example pieces of data
- Write a basic-ass CSV with them in column 1 and the 'expected output' (or similar) in column 2
- Write a basic-ass PHP script that runs each row through the prompt above and checks whether you get the result you expect - that's your 'success rate'. Modify your prompt until you get what you want, at the price/latency you want (see the script sketch after this list)
- Run the script with different models to see if you get better/worse results
- By default: don't use OpenAI's evals/graders or a separate library
- This is still more important than you think. Do this.
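The basic-ass script, sketched. It assumes a two-column `evals.csv` (input, expected) and a hypothetical `runPrompt($model, $input)` that wraps your production prompt and returns the parsed output; exact-match scoring is the simplest thing that works:

```php
<?php

// Column 1 = input, column 2 = expected output.
$rows = array_map('str_getcsv', file('evals.csv', FILE_IGNORE_NEW_LINES));

foreach (['gpt-5', 'gpt-5-mini', 'gpt-5-nano'] as $model) {
    $pass  = 0;
    $start = microtime(true);

    foreach ($rows as [$input, $expected]) {
        $actual = runPrompt($model, $input); // your prompt + parsing, hypothetical here

        if (trim($actual) === trim($expected)) {
            $pass++;
        } else {
            echo "FAIL [$model] input=$input expected=$expected got=$actual\n";
        }
    }

    // Success rate + wall time = the quality/price/latency trade-off in one line.
    printf(
        "%s: %d/%d passed (%.0f%%) in %.1fs\n",
        $model, $pass, count($rows), 100 * $pass / count($rows), microtime(true) - $start
    );
}
```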
- Stick with OpenAI - don't complicate things with multiple providers
- Start with the most powerful model - gpt-5, probably with default thinking
- Once it's working and you know it's possible -> experiment with cheaper/faster models (gpt-5-nano/mini, lower thinking, etc.) - don't let price get in the way of the PoC
- By default: don't use OpenAI's Assistants/Responses APIs or anything like that
- By default: you don't need tools, connectors, MCP, web search, file search, retrieval, or fine-tuned models
- By default: you don't need streaming
- Should you be thinking about images and audio too? (really easy to add - you just need to consider it)
- `whisper-1` is still great for transcription - it supports `ogg` even though the docs don't say it does (sketch below)
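For reference, a transcription call in plain PHP - `CURLFile` does the multipart upload (the `.ogg` filename is illustrative):

```php
<?php

$ch = curl_init('https://api.openai.com/v1/audio/transcriptions');
curl_setopt_array($ch, [
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_POST           => true,
    CURLOPT_HTTPHEADER     => ['Authorization: Bearer ' . getenv('OPENAI_API_KEY')],
    CURLOPT_POSTFIELDS     => [
        'model' => 'whisper-1',
        'file'  => new CURLFile('voice-note.ogg'), // works, despite the docs
    ],
]);
$result = json_decode(curl_exec($ch), true);
echo $result['text'];
```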
- Use a RecursiveTextSplitter to chunk to 500 chars, with 100 overlap. Split in priority order: `##[^#]`, `###`, `\n\n`, `\n`, `.`, `?`, `,`, then a plain space (see the splitter sketch after this list)
- By default: `text-embedding-3-small` and 1536 vectors. Chunks need to be < 8192 tokens
- Use postgres, pgvector, and cosine (`<=>`) nearest neighbor with a maximum distance (see the query sketch after this list)
- Store relationship to 'parent' chunk if there is one
- Store the chunk's 'index' within the main document (allowing context expansion if needed, behind or ahead)
- Estimate tokens as `charCount / 4` - keep your life simple for now
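A sketch of that splitter - not a library, just the idea: try the highest-priority separator, pack pieces into ~500-char chunks with 100 chars of overlap, and recurse with the next separator on anything still too big (simplified: `preg_split` drops the matched delimiter):

```php
<?php

const CHUNK_SIZE = 500;  // chars
const OVERLAP    = 100;  // chars

// Priority order from above ('/##[^#]/' = an h2 that isn't an h3).
const SEPARATORS = ['/##[^#]/', '/###/', '/\n\n/', '/\n/', '/\./', '/\?/', '/,/', '/ /'];

function splitText(string $text, int $level = 0): array
{
    if (strlen($text) <= CHUNK_SIZE) {
        return [$text];
    }
    if ($level >= count(SEPARATORS)) {
        return str_split($text, CHUNK_SIZE); // nothing worked: hard cut
    }

    $chunks = [];
    $buffer = '';
    foreach (preg_split(SEPARATORS[$level], $text) as $piece) {
        if ($buffer !== '' && strlen($buffer) + strlen($piece) > CHUNK_SIZE) {
            $chunks[] = $buffer;
            $buffer   = substr($buffer, -OVERLAP); // carry overlap into the next chunk
        }
        $buffer .= $piece;
    }
    $chunks[] = $buffer;

    // Recurse with the next separator on anything still oversized.
    return array_merge(...array_map(
        fn ($c) => splitText($c, $level + 1),
        $chunks
    ));
}

// Keep life simple: ~4 chars per token.
function estimateTokens(string $s): int
{
    return (int) ceil(strlen($s) / 4);
}
```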
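And the pgvector side via PDO. The `chunks` table carries the parent/index bookkeeping from the bullets above; the query is cosine distance (`<=>`) with a max-distance cutoff. The DSN, table name, and 0.5 threshold are illustrative:

```php
<?php

$pdo = new PDO('pgsql:host=localhost;dbname=app'); // illustrative DSN

$pdo->exec(<<<'SQL'
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS chunks (
    id          bigserial PRIMARY KEY,
    document_id bigint NOT NULL,
    parent_id   bigint REFERENCES chunks(id),  -- 'parent' chunk, if any
    chunk_index int NOT NULL,                  -- position within the document
    content     text NOT NULL,
    embedding   vector(1536) NOT NULL          -- text-embedding-3-small
);
SQL);

// $embedding: the 1536 floats you got back from the embeddings API.
$queryEmbedding = '[' . implode(',', $embedding) . ']'; // pgvector's text format

// Nearest neighbours by cosine distance, capped at a maximum distance.
$stmt = $pdo->prepare(<<<'SQL'
SELECT * FROM (
    SELECT id, document_id, chunk_index, content,
           embedding <=> CAST(:query AS vector) AS distance
    FROM chunks
) AS nn
WHERE distance < 0.5
ORDER BY distance
LIMIT 5
SQL);
$stmt->execute([':query' => $queryEmbedding]);
$matches = $stmt->fetchAll(PDO::FETCH_ASSOC);
```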