This is a simple Opencode / Claude Code skill that you can use to delegate document and image processing to
another LLM, in this case gemini-2.5-pro.
Why?
While Claude Code and other harnesses come with their own ways to read files, I sometimes run into issues
when there are a lot of them, especially when they're handwritten or scanned PDFs.
Maybe I'm just unlucky, but I often end up with a session that gets stuck or takes longer than
usual, and this approach works for me.
I also just prefer Gemini models for document processing. They read far better for my use cases and cost
less than throwing other, more expensive models at the job.
Parse PDFs (including scanned/image PDFs) using the `llm` CLI with an LLM attachment workflow; extract structured data and/or text when native PDF text is unreliable.
opencode
input: pdf
tool: llm
default_model: gemini/gemini-2.5-pro
What I do
When a PDF needs parsing—especially scanned PDFs or PDFs where text extraction is unreliable—I use the llm CLI with the PDF attached.
I craft prompts to extract:
clean plain text
structured JSON (tables, fields, invoices, forms)
specific answers with citations to page numbers (when feasible)
I iterate: first get a high-level extraction, then refine prompts for missing fields, tables, or ambiguous sections.
When to use me
Use this skill when:
You encounter a .pdf that is likely scanned (image-based) or has broken/non-native text.
You need reliable extraction into text/JSON for downstream processing.
You need to interpret tables, forms, or multi-column layouts.
Do not use this skill when:
You already have high-quality extracted text and only need summarization (unless verification against the PDF is required).
Required tool
The llm utility is available for use.
Command template (required)
Use this exact structure:
llm -m {{llm-id}} "{{prompt}}" -a "{{path-to-pdf}}"
Attachment guidance
Attach up to five files at a time; use this only for images and small PDFs.
For large files or long documents, attach one file per request.
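The batching rule above can be sketched as a small helper. This is a minimal sketch, not part of the skill itself; the function name and the five-file default are mine, mirroring the limit stated above.

```python
def batch_attachments(paths, max_per_request=5):
    """Yield groups of at most max_per_request attachment paths.

    Use max_per_request=1 for large files or long documents,
    per the guidance above.
    """
    for i in range(0, len(paths), max_per_request):
        yield paths[i:i + max_per_request]
```

Each yielded group maps to one `llm` invocation with one `-a` flag per path.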
Default model
Unless instructed otherwise, use gemini/gemini-2.5-pro by default.
Prompting patterns
1) Fast full-text extraction
Use when you just need readable text.
Prompt example:
"Extract all readable text from this PDF. Preserve headings and paragraph breaks. If the PDF is scanned, perform OCR. Return plain text."
Command:
llm -m gemini/gemini-2.5-pro "Extract all readable text from this PDF. Preserve headings and paragraph breaks. If the PDF is scanned, perform OCR. Return plain text." -a "PATH/TO/FILE.pdf"
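If you drive this command from a script rather than an interactive shell, building it as an argv list avoids quoting problems with long prompts and paths with spaces. A minimal sketch; the helper name is mine, and only the `-m`/`-a` flags shown in the template above are assumed.

```python
import shlex

def build_llm_command(model, prompt, pdf_path):
    # Return the llm invocation as an argv list; passing a list to
    # subprocess.run (instead of a shell string) sidesteps quoting issues.
    return ["llm", "-m", model, prompt, "-a", pdf_path]

cmd = build_llm_command(
    "gemini/gemini-2.5-pro",
    "Extract all readable text from this PDF. Return plain text.",
    "PATH/TO/FILE.pdf",
)
print(shlex.join(cmd))  # shell-safe rendering of the same command
```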
2) Structured JSON extraction (preferred for automation)
Use when you need schema’d output.
Prompt example:
"Extract the following fields as JSON: invoice_number, invoice_date, vendor_name, vendor_address, customer_name, customer_address, line_items[{description, quantity, unit_price, amount}], subtotal, tax, total. If a field is missing, use null. Also include page_number for each extracted top-level field if possible."
Command:
llm -m gemini/gemini-2.5-pro "Extract the following fields as JSON: invoice_number, invoice_date, vendor_name, vendor_address, customer_name, customer_address, line_items[{description, quantity, unit_price, amount}], subtotal, tax, total. If a field is missing, use null. Also include page_number for each extracted top-level field if possible." -a "PATH/TO/FILE.pdf"
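For downstream automation, parse the model's reply defensively: models sometimes wrap JSON in a Markdown code fence even when asked for raw JSON. A small sketch, assuming that failure mode; the function name is mine.

```python
import json

def parse_model_json(raw):
    # Strip an optional ```json ... ``` fence before parsing.
    text = raw.strip()
    if text.startswith("```"):
        text = text.strip("`")
        if text.startswith("json"):
            text = text[4:]
    return json.loads(text)
```

Remember the prompt asks for null on missing fields, so check `is None` rather than assuming every key holds data.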
3) Table extraction
Prompt example:
"Locate the table(s) in this PDF and return them as JSON arrays. Preserve column headers. If multiple tables exist, return an array of tables with {title, page_number, rows}."
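Once tables come back in the `{title, page_number, rows}` shape requested above, converting them to CSV-like text is straightforward. A sketch under one assumption I'm making: that `rows` is a list of lists with the header row first; adjust if the model returns a different shape.

```python
import csv
import io

def table_to_csv(table):
    # Render one extracted table dict ({title, page_number, rows})
    # as CSV text, header row included.
    buf = io.StringIO()
    csv.writer(buf).writerows(table["rows"])
    return buf.getvalue()
```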
4) Targeted Q&A with page grounding
Prompt example:
"Answer: What is the contract termination notice period? Quote the exact clause and provide the page number."
Workflow guidance
Identify the user’s objective (full text vs structured fields vs tables vs Q&A).
Run an initial extraction prompt.
If the output is incomplete/garbled:
narrow to specific pages/sections in the prompt
request alternative renderings (e.g., “focus on the header area of each page”)
extract tables separately from narrative text
Return results in the format the user asked for (plain text / JSON / CSV-like text), and note uncertainties explicitly.
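The refinement step above (narrowing to specific pages or sections when a first pass is incomplete) can be sketched as a prompt builder. The helper and its parameters are illustrative, not part of the skill.

```python
def refine_prompt(base_prompt, pages=None, focus=None):
    # Append narrowing constraints to a base extraction prompt,
    # e.g. limiting to certain pages or a region of each page.
    parts = [base_prompt]
    if pages:
        parts.append(f"Only consider pages {', '.join(map(str, pages))}.")
    if focus:
        parts.append(f"Focus on {focus}.")
    return " ".join(parts)
```

Feed the refined prompt back into the same `llm -m ... -a ...` command for the second pass.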