
@jerieljan
Created February 26, 2026 04:22
pdf-parse-with-llm - skill to process PDFs with `llm`

pdf-parse-with-llm

This is a simple Opencode / Claude Code skill for delegating document and image processing to another LLM, in this case gemini-2.5-pro.

Why?

While Claude Code and other harnesses come with their own ways to read files, I sometimes run into issues when there are a lot of them, especially when they're handwritten or scanned PDFs.

Maybe I'm just unlucky sometimes, but I usually end up with a session that gets stuck or takes longer than usual, and this approach works around that for me.

I also just prefer Gemini models for document processing. They read documents far better for my use cases and cost less than throwing other, more expensive models at the task.

How to Use

  1. Install llm: https://llm.datasette.io/en/stable/
  2. Configure it to use llm-gemini or whichever model plugin you'd like for document processing.
  3. Place the SKILL.md from this page into your coding agent's skill directory (e.g., in Opencode, ~/.config/opencode/skill/pdf-parse-with-llm/SKILL.md)
  4. Verify that it works by asking your agent to use the skill (or by running something like /skills)
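For steps 1 and 2, a minimal setup might look like the following (a sketch; it assumes pipx is available, so adjust to pip or brew as you prefer):

```shell
# Install the llm CLI and the Gemini plugin
pipx install llm
llm install llm-gemini

# Store your Gemini API key (prompts interactively)
llm keys set gemini

# Confirm the Gemini models are registered
llm models list | grep -i gemini
```

The `llm keys set` step is only needed if you haven't already exported the key as an environment variable.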
---
name: pdf-parse-with-llm
description: Parse PDFs (including scanned/image PDFs) using the `llm` CLI with an LLM attachment workflow; extract structured data and/or text when native PDF text is unreliable.
compatibility: opencode
metadata:
  input: pdf
  tool: llm
  default_model: gemini/gemini-2.5-pro
---

What I do

  • When a PDF needs parsing—especially scanned PDFs or PDFs where text extraction is unreliable—I use the llm CLI with the PDF attached.
  • I craft prompts to extract:
    • clean plain text
    • structured JSON (tables, fields, invoices, forms)
    • specific answers with citations to page numbers (when feasible)
  • I iterate: first get a high-level extraction, then refine prompts for missing fields, tables, or ambiguous sections.

When to use me

Use this skill when:

  • You encounter a .pdf that is likely scanned (image-based) or has broken/non-native text.
  • You need reliable extraction into text/JSON for downstream processing.
  • You need to interpret tables, forms, or multi-column layouts.

Do not use this skill when:

  • You already have high-quality extracted text and only need summarization (unless verification against the PDF is required).

Required tool

The llm utility is available for use.

Command template (required)

Use this exact structure:

llm -m {{llm-id}} "{{prompt}}" -a "{{path-to-pdf}}"

Attachment guidance

  • Attach up to five files at a time; use this only for images and small PDFs.
  • For large files or long documents, attach one file per request.
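The llm CLI accepts repeated -a flags, so a small batch can be sent in a single request (filenames here are hypothetical):

```shell
# A few small attachments in one request; -a may be repeated per file
llm -m gemini/gemini-2.5-pro \
  "Transcribe the text in each attached page image, labeling each by filename." \
  -a page1.png -a page2.png -a page3.png
```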

Default model

Unless instructed otherwise, use gemini/gemini-2.5-pro by default.

Prompting patterns

1) Fast full-text extraction

Use when you just need readable text.

Prompt example:

  • "Extract all readable text from this PDF. Preserve headings and paragraph breaks. If the PDF is scanned, perform OCR. Return plain text."

Command:

llm -m gemini/gemini-2.5-pro "Extract all readable text from this PDF. Preserve headings and paragraph breaks. If the PDF is scanned, perform OCR. Return plain text." -a "PATH/TO/FILE.pdf"

2) Structured JSON extraction (preferred for automation)

Use when you need schema’d output.

Prompt example:

  • "Extract the following fields as JSON: invoice_number, invoice_date, vendor_name, vendor_address, customer_name, customer_address, line_items[{description, quantity, unit_price, amount}], subtotal, tax, total. If a field is missing, use null. Also include page_number for each extracted top-level field if possible."

Command:

llm -m gemini/gemini-2.5-pro "Extract the following fields as JSON: invoice_number, invoice_date, vendor_name, vendor_address, customer_name, customer_address, line_items[{description, quantity, unit_price, amount}], subtotal, tax, total. If a field is missing, use null. Also include page_number for each extracted top-level field if possible." -a "PATH/TO/FILE.pdf"

3) Table extraction

Prompt example:

  • "Locate the table(s) in this PDF and return them as JSON arrays. Preserve column headers. If multiple tables exist, return an array of tables with {title, page_number, rows}."
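Command, following the same template as the patterns above (the path is a placeholder):

```shell
llm -m gemini/gemini-2.5-pro "Locate the table(s) in this PDF and return them as JSON arrays. Preserve column headers. If multiple tables exist, return an array of tables with {title, page_number, rows}." -a "PATH/TO/FILE.pdf"
```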

4) Targeted Q&A with page grounding

Prompt example:

  • "Answer: What is the contract termination notice period? Quote the exact clause and provide the page number."
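Command, again instantiating the required template (the path is a placeholder):

```shell
llm -m gemini/gemini-2.5-pro "Answer: What is the contract termination notice period? Quote the exact clause and provide the page number." -a "PATH/TO/FILE.pdf"
```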

Workflow guidance

  1. Identify the user’s objective (full text vs structured fields vs tables vs Q&A).
  2. Run an initial extraction prompt.
  3. If the output is incomplete/garbled:
    • narrow to specific pages/sections in the prompt
    • request alternative renderings (e.g., “focus on the header area of each page”)
    • extract tables separately from narrative text
  4. Return results in the format the user asked for (plain text / JSON / CSV-like text), and note uncertainties explicitly.
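When the goal is JSON for downstream processing, one way to sanity-check step 3 is to ask for bare JSON and validate it before use (a sketch; jq and the output filename are assumptions, not part of the skill):

```shell
# Ask for bare JSON so the output can be machine-validated
llm -m gemini/gemini-2.5-pro \
  "Extract the invoice fields as JSON. Return only the JSON object, with no prose or code fences." \
  -a "PATH/TO/FILE.pdf" > invoice.json

# jq exits non-zero if the output is not valid JSON
jq empty invoice.json && echo "invoice.json is valid JSON"
```

If validation fails, re-prompt for the missing or malformed fields rather than retrying the whole extraction.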