# Antigravity Prompt: Build an In-Browser PDF Chat App

@DamirGadiev · Created November 28, 2025 10:03
## Project Goal
Build a **privacy-first PDF question-answering application** that runs 100% in the browser with **no backend**.
- All documents and embeddings are processed locally.
- No data is sent to any server.
- All AI runs client-side.
Your output must include:
- Complete `index.html` file
- Complete `app.js` file
- Any minimal inline CSS (in `index.html`) needed for a clean UI
---
## Core Features
Implement the following:
1. **PDF Upload**
   - User can upload a PDF file.
   - For performance, limit processing to the **first 3 pages**.
2. **Chat Interface**
   - Input box to ask questions about the uploaded PDF.
   - Display chat history (user messages + model responses).
3. **Context Visualization**
   - Show which text chunks were used to answer the question (e.g., as a list or panel).
4. **Live PDF Preview**
   - Display the uploaded PDF pages in a scrollable panel using PDF.js.
5. **100% Client-Side**
   - All processing is done in the browser.
   - No network calls for inference or vector search once models are downloaded.
---
## Technical Stack
Use exactly this stack:
**Database:**
- **PGLite** (PostgreSQL compiled to WebAssembly, running in the browser)
- **pgvector** extension for vector storage and similarity search
**AI Models (via transformers.js + ONNX Runtime):**
- **Embeddings**: `Xenova/all-MiniLM-L6-v2` (384-dimensional vectors)
- **QA Model**: `Xenova/LaMini-Flan-T5-783M` (generative question answering)
**Libraries:**
- `transformers.js` (ONNX Runtime backend for in-browser ML)
- `PDF.js` (for parsing and rendering PDFs)
**Frontend:**
- Pure HTML, CSS, and JavaScript
- No frontend frameworks (no React, Vue, etc.)
- Use WebGPU acceleration when available (through transformers.js config)
---
## UI Layout
Create a **3-column responsive layout**:
- **Left (25%)**
  - PDF upload controls
  - Chunking strategy selection (dropdown)
  - Status messages (e.g., “Embedding…”, “Loading model…”)
  - Visualization of retrieved context chunks
- **Middle (25%)**
  - Chat interface:
    - Scrollable history of messages
    - Input box for user questions
    - Send button
    - Typing indicator during answer generation
- **Right (50%)**
  - PDF preview rendered via PDF.js
  - Vertical scrolling
  - Prevent aggressive auto-scrolling during rendering
Use simple but clean styling (e.g., light theme, clear borders, enough spacing).
---
## Chunking Strategies
Implement **6 user-selectable chunking strategies** in JavaScript.
Provide a dropdown in the UI to choose the strategy before processing.
1. **Standard**
   - Fixed-size chunks of ~400 characters
   - 50-character overlap between chunks
2. **Semantic**
   - Group sentences by embedding similarity
   - Use sentence-level segmentation before grouping
3. **Outline-based**
   - Use the model (LLM) to infer document structure (e.g., headings, sections)
   - Chunk text by sections (e.g., heading + following paragraphs)
4. **Atomic Facts**
   - Use the model to extract independent factual statements
   - Each fact becomes a separate chunk
5. **Q&A-oriented**
   - Generate one or more questions for each passage
   - Store: `{ question, answer_context }`
   - Retrieval is based on similarity to the generated questions
6. **Hybrid**
   - Combine outline-based structure with atomic facts
   - Use sections as containers and then extract facts per section
**Dynamic sizing:**
Implement logic to target approximately **3 chunks per page** regardless of strategy.
For example, adjust chunk size or number of facts/questions based on total text length per page.
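The Standard strategy and the dynamic-sizing rule can be sketched as follows. `chunkStandard` and `dynamicChunkSize` are illustrative names, and the 100-character floor is an assumption, not part of the spec:

```javascript
// Standard strategy: fixed-size chunks with a fixed overlap (in characters).
function chunkStandard(text, chunkSize = 400, overlap = 50) {
  if (overlap >= chunkSize) throw new RangeError('overlap must be < chunkSize');
  const chunks = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // final chunk reached
  }
  return chunks;
}

// Dynamic sizing: choose a chunk size that yields roughly 3 chunks per page.
function dynamicChunkSize(pageTextLength, targetChunksPerPage = 3) {
  return Math.max(100, Math.ceil(pageTextLength / targetChunksPerPage));
}
```

For the model-driven strategies, the same target applies by capping the number of facts or questions generated per page rather than the character count.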
---
## Implementation Details
### Database Setup (PGLite + pgvector)
On page load:
- Initialize PGLite with pgvector enabled.
- Create a table `chunks` if it does not exist:
```sql
CREATE TABLE IF NOT EXISTS chunks (
  id SERIAL PRIMARY KEY,
  text TEXT NOT NULL,
  embedding vector(384) NOT NULL,
  metadata JSONB
);
```
- Ensure you can clear the `chunks` table when a new document is uploaded.
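Wiring this up in `app.js` might look like the sketch below. The CDN module URLs are assumptions based on the PGlite documentation and should be checked against the version you pin:

```javascript
// Initialize PGLite with the pgvector extension and ensure the schema exists.
// Module URLs are assumed; see the PGlite docs for the exact build you use.
async function initDatabase() {
  const { PGlite } = await import('https://cdn.jsdelivr.net/npm/@electric-sql/pglite/dist/index.js');
  const { vector } = await import('https://cdn.jsdelivr.net/npm/@electric-sql/pglite/dist/vector/index.js');
  const db = new PGlite({ extensions: { vector } });
  await db.exec('CREATE EXTENSION IF NOT EXISTS vector;');
  await db.exec(`CREATE TABLE IF NOT EXISTS chunks (
    id SERIAL PRIMARY KEY,
    text TEXT NOT NULL,
    embedding vector(384) NOT NULL,
    metadata JSONB
  );`);
  return db;
}

// Called on every new upload so stale chunks never leak into retrieval.
async function clearChunks(db) {
  await db.exec('DELETE FROM chunks;');
}
```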
### Processing Pipeline
When a PDF is uploaded:
1. Extract text from the **first 3 pages** using PDF.js.
2. Apply the selected chunking strategy to produce a list of chunks.
3. Normalize chunks into a consistent format (see “Data Normalization”).
4. Embed chunks in **batches of 10** using `all-MiniLM-L6-v2`.
5. Insert chunks into PGLite with their embeddings and metadata.
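Steps 4–5 can be sketched as below, assuming `embedder` is a transformers.js feature-extraction pipeline and `db` is the PGLite instance (both names are placeholders):

```javascript
// Split an array into fixed-size batches.
function batchItems(items, batchSize = 10) {
  const batches = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Embed chunks in batches of 10 and insert them into PGLite.
// Errors are contained per batch so one failure does not kill the pipeline.
async function embedAndStore(chunks, embedder, db) {
  for (const batch of batchItems(chunks, 10)) {
    try {
      const tensor = await embedder(batch.map(c => c.text), { pooling: 'mean', normalize: true });
      const vectors = tensor.tolist();
      for (let i = 0; i < batch.length; i++) {
        await db.query(
          'INSERT INTO chunks (text, embedding, metadata) VALUES ($1, $2, $3)',
          [batch[i].text, JSON.stringify(vectors[i]), batch[i].metadata ?? {}]
        );
      }
    } catch (err) {
      console.error('Embedding batch failed; skipping it:', err);
    }
  }
}
```

`JSON.stringify(vectors[i])` produces the `[0.1,0.2,…]` string form that pgvector accepts as input to the `vector` type.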
### Query Flow
When the user asks a question:
1. Embed the user question using `all-MiniLM-L6-v2`.
2. Run a vector similarity search in PGLite using SQL similar to:
```sql
SELECT id, text, metadata,
       1 - (embedding <=> $1::vector) AS similarity
FROM chunks
ORDER BY similarity DESC
LIMIT 3;
```
3. Retrieve the top-3 chunks.
4. Construct a prompt for `LaMini-Flan-T5-783M` that includes:
   - The user question
   - The retrieved chunks as context
5. Generate an answer with streaming token-by-token output.
6. Update the chat UI as tokens arrive.
7. Show the retrieved chunks in the context visualization area.
For the **Q&A-oriented** strategy:
- Store and retrieve based on the generated questions.
- Embed the stored questions for similarity, but still display the original passage as context.
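Prompt construction (step 4) can be as simple as the sketch below; the exact wording is an assumption to be tuned for LaMini-Flan-T5:

```javascript
// Assemble the generation prompt from the question and the top-3 chunks.
function buildPrompt(question, chunks) {
  const context = chunks.map((c, i) => `[${i + 1}] ${c.text}`).join('\n\n');
  return `Answer the question using only the context below.\n\n` +
         `Context:\n${context}\n\nQuestion: ${question}\nAnswer:`;
}
```

Numbering the chunks lets the context-visualization panel reuse the same `[n]` labels.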
---
## Performance Requirements
Implement the following optimizations:
- **Batch embedding**
  - Process chunks in batches of 10.
  - Handle errors per batch without breaking the whole pipeline.
- **Model quantization**
  - Use quantized models (4-bit or 8-bit) where supported by transformers.js to reduce download sizes.
- **Browser caching**
  - Ensure models are cached locally so subsequent loads are significantly faster.
- **Progress indicators**
  - Show download progress for each model (percentage or clear textual updates).
  - Show status messages during each stage: model loading, embedding, querying, answer generation.
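For the progress indicators, transformers.js accepts a `progress_callback` when a pipeline is created; a small formatter like the one below keeps the status line readable. The payload shape (`{ status, file, progress }`) is an assumption to verify against the transformers.js version you load:

```javascript
// Turn a transformers.js progress_callback payload into a UI status string.
function formatProgress(p) {
  if (p.status === 'progress' && typeof p.progress === 'number') {
    return `Downloading ${p.file ?? 'model'}: ${p.progress.toFixed(0)}%`;
  }
  return p.status ?? 'working…'; // e.g. "initiate", "download", "done"
}
```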
---
## UX and Diagnostics
Add the following UX and diagnostic elements:
- Modern, minimal light theme (blue/white is fine).
- Indicator showing whether **WebGPU** is being used or if the app is falling back to CPU.
- Basic memory/usage indicator (even if approximate, like count of chunks).
- Custom scroll styling for the PDF preview panel (simple but visible).
- Typing indicator while the model is generating an answer.
- Avoid jarring auto-scroll when new PDF pages are rendered.
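The WebGPU indicator can be driven by a simple capability check. `detectBackend` is an illustrative name, and `'wasm'` here stands for the CPU fallback:

```javascript
// Report which backend is available; drives the WebGPU/CPU indicator.
function detectBackend() {
  const hasWebGPU = typeof navigator !== 'undefined' && 'gpu' in navigator;
  return hasWebGPU ? 'webgpu' : 'wasm'; // WASM = CPU fallback
}
```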
---
## Data Normalization and Edge Cases
Implement a helper function, for example:
```javascript
function normalizeChunks(rawChunks) {
  // Input: array of strings or objects
  // Output: array of objects with at least: { text, metadata? }
}
```
Requirements:
1. **Chunk Format Consistency**
   - Some strategies may produce plain strings.
   - Others produce objects (e.g., `{ text, question, facts, sectionTitle }`).
   - Normalize everything into a common structure:
     `{ text, question?, sectionTitle?, metadata }`.
2. **Embedding Validation**
   - Filter out null/undefined/empty `text` before embedding.
   - Log (console) any chunks that are skipped and why.
3. **Q&A Strategy**
   - Embed the generated **question**, not the full passage.
   - Store both the question and the passage in the `metadata` column.
   - Retrieval uses the question embedding; display the passage as context.
4. **Database Cleanup**
   - Clear the `chunks` table when a new document is uploaded.
   - Avoid inserting duplicate embeddings if the same document is reprocessed.
5. **Model Loading Errors**
   - Handle failures (e.g., network issues, 401 for gated models).
   - Show clear error messages in the UI.
   - Fall back gracefully to CPU if WebGPU fails or is unsupported.
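One possible implementation of the `normalizeChunks` helper sketched above, meeting the consistency and validation requirements (a sketch, not a prescribed implementation):

```javascript
// Normalize strategy output into { text, question?, sectionTitle?, metadata }.
// Strings become objects; entries without usable text are skipped and logged.
function normalizeChunks(rawChunks) {
  const normalized = [];
  for (const raw of rawChunks ?? []) {
    const obj = typeof raw === 'string' ? { text: raw } : (raw ?? {});
    const text = typeof obj.text === 'string' ? obj.text.trim() : '';
    if (!text) {
      console.warn('normalizeChunks: skipping chunk with empty text:', raw);
      continue;
    }
    const entry = { text, metadata: { ...(obj.metadata ?? {}) } };
    if (obj.question) {
      entry.question = obj.question;
      entry.metadata.question = obj.question; // Q&A strategy: keep both
    }
    if (obj.sectionTitle) {
      entry.sectionTitle = obj.sectionTitle;
      entry.metadata.sectionTitle = obj.sectionTitle;
    }
    normalized.push(entry);
  }
  return normalized;
}
```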
---
## File Structure
Generate code assuming the following structure:
```
in-browser-rag/
├── index.html # Main UI and basic styles
├── app.js # All JavaScript logic
└── favicon.png # App icon (you can reference a placeholder path)
```
Assume the user will run the app locally using:
```bash
python3 -m http.server 8002
```
and open `http://localhost:8002` in the browser.
---
## Development Approach for the Agent
As the agent, follow this approach:
1. **Plan**
   - Briefly outline the architecture in comments at the top of `app.js`.
2. **Implement**
   - Generate full, ready-to-run contents of `index.html` and `app.js`.
   - Include all necessary `<script>` tags for external libraries (with CDN URLs) in `index.html`.
3. **Wire Up**
   - Ensure all UI controls are connected to their handlers in `app.js`.
   - Verify the full flow:
     upload → chunk → embed → store → ask question → retrieve → answer → display.
4. **Do Not Omit Code**
   - Do not describe code in prose.
   - Output complete code listings so the user can save them as files and run immediately.
---
Generate the final answer as:
1. Full `index.html` content.
2. Full `app.js` content.