# Antigravity Prompt: Build an In-Browser PDF Chat App

@DamirGadiev · Created November 28, 2025 10:03
## Project Goal
Build a **privacy-first PDF question-answering application** that runs 100% in the browser with **no backend**.
- All documents and embeddings are processed locally.
- No data is sent to any server.
- All AI runs client-side.
Your output must include:
- Complete `index.html` file
- Complete `app.js` file
- Any minimal inline CSS (in `index.html`) needed for a clean UI
---
## Core Features
Implement the following:
1. **PDF Upload**
   - User can upload a PDF file.
   - For performance, limit processing to the **first 3 pages**.
2. **Chat Interface**
   - Input box to ask questions about the uploaded PDF.
   - Display chat history (user messages + model responses).
3. **Context Visualization**
   - Show which text chunks were used to answer the question (e.g., as a list or panel).
4. **Live PDF Preview**
   - Display the uploaded PDF pages in a scrollable panel using PDF.js.
5. **100% Client-Side**
   - All processing is done in the browser.
   - No network calls for inference or vector search once models are downloaded.
---
## Technical Stack
Use exactly this stack:
**Database:**
- **PGLite** (PostgreSQL compiled to WebAssembly, running in the browser)
- **pgvector** extension for vector storage and similarity search
**AI Models (via transformers.js + ONNX Runtime):**
- **Embeddings**: `Xenova/all-MiniLM-L6-v2` (384-dimensional vectors)
- **QA Model**: `Xenova/LaMini-Flan-T5-783M` (generative question answering)
**Libraries:**
- `transformers.js` (ONNX Runtime backend for in-browser ML)
- `PDF.js` (for parsing and rendering PDFs)
**Frontend:**
- Pure HTML, CSS, and JavaScript
- No frontend frameworks (no React, Vue, etc.)
- Use WebGPU acceleration when available (through transformers.js config)
---
## UI Layout
Create a **3-column responsive layout**:
- **Left (25%)**
  - PDF upload controls
  - Chunking strategy selection (dropdown)
  - Status messages (e.g., “Embedding…”, “Loading model…”)
  - Visualization of retrieved context chunks
- **Middle (25%)**
  - Chat interface:
    - Scrollable history of messages
    - Input box for user questions
    - Send button
    - Typing indicator during answer generation
- **Right (50%)**
  - PDF preview rendered via PDF.js
  - Vertical scrolling
  - Prevent aggressive auto-scrolling during rendering
Use simple but clean styling (e.g., light theme, clear borders, enough spacing).
---
## Chunking Strategies
Implement **6 user-selectable chunking strategies** in JavaScript.
Provide a dropdown in the UI to choose the strategy before processing.
1. **Standard**
   - Fixed-size chunks of ~400 characters
   - 50-character overlap between chunks
2. **Semantic**
   - Group sentences by embedding similarity
   - Use sentence-level segmentation before grouping
3. **Outline-based**
   - Use the model (LLM) to infer document structure (e.g., headings, sections)
   - Chunk text by sections (e.g., heading + following paragraphs)
4. **Atomic Facts**
   - Use the model to extract independent factual statements
   - Each fact becomes a separate chunk
5. **Q&A-oriented**
   - Generate one or more questions for each passage
   - Store: `{ question, answer_context }`
   - Retrieval is based on similarity to the generated questions
6. **Hybrid**
   - Combine outline-based structure with atomic facts
   - Use sections as containers and then extract facts per section
**Dynamic sizing:**
Implement logic to target approximately **3 chunks per page** regardless of strategy.
For example, adjust chunk size or number of facts/questions based on total text length per page.
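The Standard strategy and the dynamic-sizing rule can be sketched as follows. `chunkStandard` and `dynamicChunkSize` are illustrative names, and the 100-character floor is an assumption, not part of the spec:

```javascript
// Standard strategy: fixed-size chunks with a fixed overlap (in characters).
function chunkStandard(text, chunkSize = 400, overlap = 50) {
  if (overlap >= chunkSize) throw new RangeError('overlap must be < chunkSize');
  const chunks = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // final chunk reached
  }
  return chunks;
}

// Dynamic sizing: choose a chunk size that yields roughly 3 chunks per page.
function dynamicChunkSize(pageTextLength, targetChunksPerPage = 3) {
  return Math.max(100, Math.ceil(pageTextLength / targetChunksPerPage));
}
```

For the model-driven strategies, the same target applies by capping the number of facts or questions generated per page rather than the character count.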
---
## Implementation Details
### Database Setup (PGLite + pgvector)
On page load:
- Initialize PGLite with pgvector enabled.
- Create a table `chunks` if it does not exist:
```sql
CREATE TABLE IF NOT EXISTS chunks (
  id SERIAL PRIMARY KEY,
  text TEXT NOT NULL,
  embedding vector(384) NOT NULL,
  metadata JSONB
);
```
- Ensure you can clear the `chunks` table when a new document is uploaded.
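Wiring this up in `app.js` might look like the sketch below. The CDN module URLs are assumptions based on the PGlite documentation and should be checked against the version you pin:

```javascript
// Initialize PGLite with the pgvector extension and ensure the schema exists.
// Module URLs are assumed; see the PGlite docs for the exact build you use.
async function initDatabase() {
  const { PGlite } = await import('https://cdn.jsdelivr.net/npm/@electric-sql/pglite/dist/index.js');
  const { vector } = await import('https://cdn.jsdelivr.net/npm/@electric-sql/pglite/dist/vector/index.js');
  const db = new PGlite({ extensions: { vector } });
  await db.exec('CREATE EXTENSION IF NOT EXISTS vector;');
  await db.exec(`CREATE TABLE IF NOT EXISTS chunks (
    id SERIAL PRIMARY KEY,
    text TEXT NOT NULL,
    embedding vector(384) NOT NULL,
    metadata JSONB
  );`);
  return db;
}

// Called on every new upload so stale chunks never leak into retrieval.
async function clearChunks(db) {
  await db.exec('DELETE FROM chunks;');
}
```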
### Processing Pipeline
When a PDF is uploaded:
1. Extract text from the **first 3 pages** using PDF.js.
2. Apply the selected chunking strategy to produce a list of chunks.
3. Normalize chunks into a consistent format (see “Data Normalization”).
4. Embed chunks in **batches of 10** using `all-MiniLM-L6-v2`.
5. Insert chunks into PGLite with their embeddings and metadata.
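Steps 4–5 can be sketched as below, assuming `embedder` is a transformers.js feature-extraction pipeline and `db` is the PGLite instance (both names are placeholders):

```javascript
// Split an array into fixed-size batches.
function batchItems(items, batchSize = 10) {
  const batches = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Embed chunks in batches of 10 and insert them into PGLite.
// Errors are contained per batch so one failure does not kill the pipeline.
async function embedAndStore(chunks, embedder, db) {
  for (const batch of batchItems(chunks, 10)) {
    try {
      const tensor = await embedder(batch.map(c => c.text), { pooling: 'mean', normalize: true });
      const vectors = tensor.tolist();
      for (let i = 0; i < batch.length; i++) {
        await db.query(
          'INSERT INTO chunks (text, embedding, metadata) VALUES ($1, $2, $3)',
          [batch[i].text, JSON.stringify(vectors[i]), batch[i].metadata ?? {}]
        );
      }
    } catch (err) {
      console.error('Embedding batch failed; skipping it:', err);
    }
  }
}
```

`JSON.stringify(vectors[i])` produces the `[0.1,0.2,…]` string form that pgvector accepts as input to the `vector` type.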
### Query Flow
When the user asks a question:
1. Embed the user question using `all-MiniLM-L6-v2`.
2. Run a vector similarity search in PGLite using SQL similar to:
```sql
SELECT id, text, metadata,
       1 - (embedding <=> $1::vector) AS similarity
FROM chunks
ORDER BY similarity DESC
LIMIT 3;
```
3. Retrieve the top-3 chunks.
4. Construct a prompt for `LaMini-Flan-T5-783M` that includes:
   - The user question
   - The retrieved chunks as context
5. Generate an answer with streaming token-by-token output.
6. Update the chat UI as tokens arrive.
7. Show the retrieved chunks in the context visualization area.
For the **Q&A-oriented** strategy:
- Store and retrieve based on the generated questions.
- Embed the stored questions for similarity, but still display the original passage as context.
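Prompt construction (step 4) can be as simple as the sketch below; the exact wording is an assumption to be tuned for LaMini-Flan-T5:

```javascript
// Assemble the generation prompt from the question and the top-3 chunks.
function buildPrompt(question, chunks) {
  const context = chunks.map((c, i) => `[${i + 1}] ${c.text}`).join('\n\n');
  return `Answer the question using only the context below.\n\n` +
         `Context:\n${context}\n\nQuestion: ${question}\nAnswer:`;
}
```

Numbering the chunks lets the context-visualization panel reuse the same `[n]` labels.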
---
## Performance Requirements
Implement the following optimizations:
- **Batch embedding**
  - Process chunks in batches of 10.
  - Handle errors per batch without breaking the whole pipeline.
- **Model quantization**
  - Use quantized models (4-bit or 8-bit) where supported by transformers.js to reduce download sizes.
- **Browser caching**
  - Ensure models are cached locally so subsequent loads are significantly faster.
- **Progress indicators**
  - Show download progress for each model (percentage or clear textual updates).
  - Show status messages during each stage: model loading, embedding, querying, answer generation.
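For the progress indicators, transformers.js accepts a `progress_callback` when a pipeline is created; a small formatter like the one below keeps the status line readable. The payload shape (`{ status, file, progress }`) is an assumption to verify against the transformers.js version you load:

```javascript
// Turn a transformers.js progress_callback payload into a UI status string.
function formatProgress(p) {
  if (p.status === 'progress' && typeof p.progress === 'number') {
    return `Downloading ${p.file ?? 'model'}: ${p.progress.toFixed(0)}%`;
  }
  return p.status ?? 'working…'; // e.g. "initiate", "download", "done"
}
```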
---
## UX and Diagnostics
Add the following UX and diagnostic elements:
- Modern, minimal light theme (blue/white is fine).
- Indicator showing whether **WebGPU** is being used or if the app is falling back to CPU.
- Basic memory/usage indicator (even if approximate, like count of chunks).
- Custom scroll styling for the PDF preview panel (simple but visible).
- Typing indicator while the model is generating an answer.
- Avoid jarring auto-scroll when new PDF pages are rendered.
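The WebGPU indicator can be driven by a simple capability check. `detectBackend` is an illustrative name, and `'wasm'` here stands for the CPU fallback:

```javascript
// Report which backend is available; drives the WebGPU/CPU indicator.
function detectBackend() {
  const hasWebGPU = typeof navigator !== 'undefined' && 'gpu' in navigator;
  return hasWebGPU ? 'webgpu' : 'wasm'; // WASM = CPU fallback
}
```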
---
## Data Normalization and Edge Cases
Implement a helper function, for example:
```javascript
function normalizeChunks(rawChunks) {
  // Input: array of strings or objects
  // Output: array of objects with at least: { text, metadata? }
}
```
Requirements:
1. **Chunk Format Consistency**
   - Some strategies may produce plain strings.
   - Others produce objects (e.g., `{ text, question, facts, sectionTitle }`).
   - Normalize everything into a common structure:
     `{ text, question?, sectionTitle?, metadata }`.
2. **Embedding Validation**
   - Filter out null/undefined/empty `text` before embedding.
   - Log (console) any chunks that are skipped and why.
3. **Q&A Strategy**
   - Embed the generated **question**, not the full passage.
   - Store both the question and the passage in the `metadata` column.
   - Retrieval uses the question embedding; display the passage as context.
4. **Database Cleanup**
   - Clear the `chunks` table when a new document is uploaded.
   - Avoid inserting duplicate embeddings if the same document is reprocessed.
5. **Model Loading Errors**
   - Handle failures (e.g., network issues, 401 for gated models).
   - Show clear error messages in the UI.
   - Fall back gracefully to CPU if WebGPU fails or is unsupported.
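One possible implementation of the `normalizeChunks` helper sketched above, meeting the consistency and validation requirements (a sketch, not a prescribed implementation):

```javascript
// Normalize strategy output into { text, question?, sectionTitle?, metadata }.
// Strings become objects; entries without usable text are skipped and logged.
function normalizeChunks(rawChunks) {
  const normalized = [];
  for (const raw of rawChunks ?? []) {
    const obj = typeof raw === 'string' ? { text: raw } : (raw ?? {});
    const text = typeof obj.text === 'string' ? obj.text.trim() : '';
    if (!text) {
      console.warn('normalizeChunks: skipping chunk with empty text:', raw);
      continue;
    }
    const entry = { text, metadata: { ...(obj.metadata ?? {}) } };
    if (obj.question) {
      entry.question = obj.question;
      entry.metadata.question = obj.question; // Q&A strategy: keep both
    }
    if (obj.sectionTitle) {
      entry.sectionTitle = obj.sectionTitle;
      entry.metadata.sectionTitle = obj.sectionTitle;
    }
    normalized.push(entry);
  }
  return normalized;
}
```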
---
## File Structure
Generate code assuming the following structure:
```
in-browser-rag/
├── index.html # Main UI and basic styles
├── app.js # All JavaScript logic
└── favicon.png # App icon (you can reference a placeholder path)
```
Assume the user will run the app locally using:
```bash
python3 -m http.server 8002
```
and open `http://localhost:8002` in the browser.
---
## Development Approach for the Agent
As the agent, follow this approach:
1. **Plan**
   - Briefly outline the architecture in comments at the top of `app.js`.
2. **Implement**
   - Generate full, ready-to-run contents of `index.html` and `app.js`.
   - Include all necessary `<script>` tags for external libraries (with CDN URLs) in `index.html`.
3. **Wire Up**
   - Ensure all UI controls are connected to their handlers in `app.js`.
   - Verify the full flow:
     upload → chunk → embed → store → ask question → retrieve → answer → display.
4. **Do Not Omit Code**
   - Do not describe code in prose.
   - Output complete code listings so the user can save them as files and run immediately.
---
Generate the final answer as:
1. Full `index.html` content.
2. Full `app.js` content.