A step-by-step blueprint for building a personal bookmark saving and archiving tool. Built with Next.js, Neon Postgres, and LLM-powered auto-tagging. Designed to be pasted into a Claude Code session and built in one sitting.
What you get: A self-hosted web app that captures URLs, extracts content as clean markdown, auto-tags and summarizes via LLM, and optionally syncs everything as markdown files to a GitHub repo for your second brain.
Cost: $0/month (Vercel free tier + Neon free tier). Only cost is LLM usage via OpenRouter (~$0.0002 per bookmark with Gemini Flash — roughly 5,000 bookmarks per dollar).
- Create a new folder for your project
- Open Claude Code in that folder
- Paste this entire document and say: "Build this project step by step"
- Claude will scaffold, implement, and deploy the entire thing
URL saved (extension / iOS shortcut / web UI)
→ API creates bookmark record instantly (< 1s response)
→ Background processing starts:
1. Detect content type from URL (tweet? youtube? article? tool?)
2. Type-specific extraction:
Tweet: FixTweet API → structured data (text, author, media)
YouTube: oEmbed API (title/author) + caption extractor (transcript)
Generic: Readability → article HTML → Turndown → clean markdown
3. Extract metadata (title, OG image, author, date)
4. If content is low quality (ASCII art, broken HTML): LLM cleanup
5. LLM auto-tagging (categories + tags + content type classification)
6. LLM summarization (clean title + 2-3 sentence summary)
7. Save tags and summary to database
8. Commit markdown file to GitHub archive repo
→ Web UI polls and shows processing status → final result
| Component | Technology | Why |
|---|---|---|
| Framework | Next.js 15+ (App Router) | API routes + React UI + Vercel-native |
| Language | TypeScript | Type safety |
| Database | Neon Postgres (via Vercel integration) | Free tier, zero maintenance, no separate signup needed |
| ORM | Drizzle ORM (pg dialect) | Lightweight, type-safe |
| Hosting | Vercel (free tier) | Auto-deploys from Git, zero maintenance |
| LLM | OpenRouter API (Gemini Flash) | Cheap, fast, model flexibility |
| Styling | Tailwind CSS v4 | Fast to build |
| IDs | nanoid | Short, URL-safe unique IDs |
| Library | Purpose | npm Package |
|---|---|---|
| Mozilla Readability | Article extraction from HTML | @mozilla/readability |
| linkedom | Lightweight DOM for serverless (no jsdom) | linkedom |
| Turndown | HTML → Markdown conversion | turndown + turndown-plugin-gfm |
| FixTweet API | Tweet content extraction (free, no API key) | HTTP call to api.fxtwitter.com |
| youtube-transcript-plus | YouTube transcript extraction | youtube-transcript-plus |
| YouTube oEmbed API | Reliable video title/author/thumbnail | HTTP call to youtube.com/oembed |
| YouTube Innertube API | Video description extraction (MWEB client) | HTTP POST to youtube.com/youtubei/v1/player |
| octokit | GitHub API for markdown sync | octokit |
Important: Do NOT use `metascraper` — its `re2` native dependency breaks Turbopack/Vercel builds. Use a custom metadata extractor with linkedom instead (parse `<meta>` and `<title>` tags directly).
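A dependency-free sketch of the shape such an extractor takes. The real version should use linkedom's `parseHTML` rather than regexes; `extractMetadata` and `metaContent` are illustrative names, not the exact implementation:

```typescript
interface PageMeta {
  title: string | null;
  author: string | null;
  imageUrl: string | null;
}

// Naive <meta> lookup — assumes the property/name attribute precedes content,
// which covers most real pages. linkedom handles the general case.
function metaContent(html: string, property: string): string | null {
  const re = new RegExp(
    `<meta[^>]+(?:property|name)=["']${property}["'][^>]*content=["']([^"']*)["']`,
    "i"
  );
  const m = html.match(re);
  return m ? m[1] : null;
}

function extractMetadata(html: string): PageMeta {
  const titleTag = html.match(/<title[^>]*>([^<]*)<\/title>/i);
  return {
    // Prefer OG title, fall back to the <title> tag
    title: metaContent(html, "og:title") ?? (titleTag ? titleTag[1].trim() : null),
    author: metaContent(html, "author") ?? metaContent(html, "article:author"),
    imageUrl: metaContent(html, "og:image"),
  };
}
```

Swapping the regexes for linkedom keeps the same interface while surviving unquoted attributes and odd markup.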
Use Drizzle ORM with Neon Postgres. The schema includes an original_title field to preserve the raw extracted title alongside the LLM-generated clean title.
// src/lib/db/schema.ts
import { pgTable, text, boolean, primaryKey } from "drizzle-orm/pg-core";
export const bookmarks = pgTable("bookmarks", {
id: text("id").primaryKey(), // nanoid
url: text("url").notNull().unique(),
title: text("title"), // LLM-generated clean title
originalTitle: text("original_title"), // Raw extracted title
type: text("type"), // article | tweet | video | tool
contentMd: text("content_md"), // Full extracted markdown
summary: text("summary"), // LLM-generated 2-3 sentence summary
author: text("author"),
imageUrl: text("image_url"),
createdAt: text("created_at").notNull(),
updatedAt: text("updated_at").notNull(),
processed: boolean("processed").default(false),
synced: boolean("synced").default(false),
});
export const tags = pgTable("tags", {
id: text("id").primaryKey(),
name: text("name").notNull().unique(),
isPredefined: boolean("is_predefined").default(false),
});
export const bookmarkTags = pgTable("bookmark_tags", {
bookmarkId: text("bookmark_id").notNull()
.references(() => bookmarks.id, { onDelete: "cascade" }),
tagId: text("tag_id").notNull()
.references(() => tags.id, { onDelete: "cascade" }),
}, (table) => [primaryKey({ columns: [table.bookmarkId, table.tagId] })]);
export const categories = pgTable("categories", {
id: text("id").primaryKey(),
name: text("name").notNull().unique(),
});
export const bookmarkCategories = pgTable("bookmark_categories", {
bookmarkId: text("bookmark_id").notNull()
.references(() => bookmarks.id, { onDelete: "cascade" }),
categoryId: text("category_id").notNull()
.references(() => categories.id, { onDelete: "cascade" }),
}, (table) => [primaryKey({ columns: [table.bookmarkId, table.categoryId] })]);

POST /api/bookmarks — Save a new URL (from extension, shortcut, or web UI)
GET /api/bookmarks — List bookmarks (supports ?q=search&tag=name&type=article&page=1)
GET /api/bookmarks/:id — Get single bookmark with full content
PATCH /api/bookmarks/:id — Update bookmark (edit tags, title)
DELETE /api/bookmarks/:id — Delete bookmark
GET /api/tags — List all tags (sorted by usage count)
POST /api/process/:id — Manually trigger reprocessing
POST /api/sync — Trigger full markdown sync
POST /api/auth — Login (sets session cookie)
All endpoints require Authorization: Bearer <API_KEY> header (or session cookie for web UI).
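A minimal sketch of that check. The helper name, the fail-closed behavior, and the cookie handling are assumptions, not the exact implementation (real code should store a signed session value rather than the raw key):

```typescript
// Hypothetical isAuthorized helper: accepts either the Bearer header
// (extension / iOS shortcut) or a session cookie value (web UI).
function isAuthorized(authHeader: string | null, sessionCookie: string | null): boolean {
  const apiKey = process.env.API_KEY;
  if (!apiKey) return false; // fail closed if the key isn't configured
  if (authHeader === `Bearer ${apiKey}`) return true;
  return sessionCookie === apiKey; // sketch only — prefer a signed session token
}
```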
These are the exact prompts used for LLM processing. All use OpenRouter with Gemini Flash and JSON mode.
Classifies content type, assigns categories, and suggests tags. Called with response_format: { type: "json_object" }.
SYSTEM:
You are a bookmark categorization assistant. Given a bookmark's title, URL, and content, you must:
1. Determine the content type:
- "tool" — software, SaaS, libraries, frameworks, apps, component galleries, developer tools, any product/service page
- "article" — blog posts, essays, tutorials, documentation, news
- "tweet" — tweets / X posts
- "video" — YouTube or other video content
2. Pick 1-2 categories from the predefined list
3. Suggest 2-5 tags (short, lowercase, hyphenated)
Rules:
- Tags describe the TOPIC and SUBJECT of the content, NOT the format
- NEVER use format tags like "tweet", "article", "video", "blog-post", "thread"
- Good tags: "ai", "anthropic", "react", "productivity", "startup-funding", "css-grid"
- Bad tags: "tweet", "article", "blog", "video", "post", "thread"
- Include names of companies, tools, or notable people when the content is ABOUT them
- Don't tag random tweet authors just because they posted it — only tag a person if the content is about that person or their notable work
- Prefer existing tags over creating new ones
- Max 5 tags total
- Categories must come from the provided list only
Respond in JSON format:
{
"contentType": "tool",
"categories": ["Category Name"],
"tags": ["tag-one", "tag-two"]
}
USER:
Title: {title}
URL: {url}
Content (truncated): {first 2000 chars of content}
Available categories: {comma-separated list}
Existing tags (prefer these): {comma-separated list}
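These prompts go through OpenRouter's OpenAI-compatible chat completions endpoint with JSON mode. A sketch of the call — the model slug and helper names are assumptions:

```typescript
// Assemble the request body for the tagging prompt (JSON mode)
function buildTagRequest(system: string, user: string) {
  return {
    model: "google/gemini-2.0-flash-001", // any cheap OpenRouter slug works here
    messages: [
      { role: "system", content: system },
      { role: "user", content: user },
    ],
    response_format: { type: "json_object" },
  };
}

async function callOpenRouter(system: string, user: string) {
  const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(buildTagRequest(system, user)),
  });
  if (!res.ok) throw new Error(`OpenRouter ${res.status}`);
  const data = await res.json();
  // JSON mode makes message.content parseable JSON
  return JSON.parse(data.choices[0].message.content);
}
```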
Generates a clean descriptive title and a 2-3 sentence summary. Also uses JSON mode.
SYSTEM:
You are a summarization assistant. Given a bookmark's title, URL, and content, generate:
1. A clean, descriptive title (max 80 chars) — describe WHAT the content is about, not who posted it. For tweets, don't start with the author's handle. For articles, improve vague titles.
2. A concise 2-3 sentence summary.
Examples of good titles:
- "Anthropic's Report on AI Capability Across Occupations"
- "CSS Container Queries: Complete Guide"
- "Why SQLite Is Good Enough for Most Web Apps"
Examples of bad titles:
- "@username: Anthropic just released..."
- "Thread about AI"
- "Interesting article"
Respond in JSON format:
{
"title": "Clean descriptive title",
"summary": "2-3 sentence summary."
}
USER:
Original title: {title}
URL: {url}
Content (truncated): {first 3000 chars of content}
Used only when extracted content is detected as low quality (ASCII art, SVG links, broken HTML). Not JSON mode.
SYSTEM:
You are a content cleaning assistant. You receive raw extracted markdown from a web page that may contain noise: ASCII art, SVG references, broken links, navigation elements, footers, cookie notices, etc.
Clean it up into readable, useful markdown. Keep:
- The main description/explanation of what this page is about
- Key features, capabilities, or points
- Code examples if relevant
Remove:
- ASCII art and decorative elements
- Navigation links, footers, headers
- SVG/image references that aren't meaningful
- Repetitive or boilerplate text
- Login/signup CTAs
If the content is mostly noise, write a concise description based on whatever you can extract. Keep the output under 500 words.
USER:
Page: {title} ({url})
Raw extracted content:
{first 4000 chars of content}
URL-detected types (tweet, video) are authoritative — the LLM cannot override them. The LLM only refines between article and tool for generic URLs.
function detectContentType(url: string): "tweet" | "video" | "article" | "tool" {
const host = new URL(url).hostname.replace("www.", "");
if (host === "twitter.com" || host === "x.com") return "tweet";
if (host === "youtube.com" || host === "youtu.be" || host === "m.youtube.com") return "video";
if (host === "github.com") {
const parts = new URL(url).pathname.split("/").filter(Boolean);
if (parts.length === 2) return "tool"; // repo root
}
return "article"; // LLM will refine to "tool" if appropriate
}
// In processBookmark():
const urlType = bookmark.type;
const finalType = (urlType === "tweet" || urlType === "video")
? urlType // URL detection is authoritative
: tagResult.contentType; // LLM refines article vs tool

A heuristic that detects garbage extraction output (common with SPAs, landing pages with lots of SVGs/ASCII art):
function isLowQualityContent(content: string | null): boolean {
if (!content) return true;
const lines = content.split("\n").filter((l) => l.trim());
if (lines.length < 3) return true;
// Lines that are mostly non-alphanumeric (ASCII art, symbols)
const noiseLines = lines.filter((l) => {
const alphaChars = l.replace(/[^a-zA-Z]/g, "").length;
return alphaChars < l.trim().length * 0.3;
});
if (noiseLines.length > lines.length * 0.4) return true;
// Too many links relative to text
const linkCount = (content.match(/\[.*?\]\(.*?\)/g) || []).length;
const wordCount = content.split(/\s+/).length;
if (linkCount > 0 && linkCount > wordCount * 0.15) return true;
return false;
}

When detected, the content is sent to the LLM content cleaner before storage.
The full async pipeline that runs after a bookmark is saved:
async function processBookmarkPipeline(bookmarkId: string) {
try {
// 0. Load the bookmark record saved by the API route
const bookmark = await db.query.bookmarks.findFirst({ where: eq(bookmarks.id, bookmarkId) });
if (!bookmark) return;
// 1. Extract content (type-specific)
const result = await extractContent(bookmark.url);
// 2. Clean low-quality content via LLM (if needed)
let content = result.content;
if (isLowQualityContent(content)) {
content = await cleanContent(content, result.title, bookmark.url);
}
// 3. Save extraction results to DB
await db.update(bookmarks).set({
title: result.title,
type: result.type,
contentMd: content,
author: result.author,
imageUrl: result.imageUrl,
}).where(eq(bookmarks.id, bookmarkId));
// 4. LLM processing: tagging + summarization (run in parallel)
const [tagResult, summaryResult] = await Promise.all([
autoTag(result.title, bookmark.url, content, existingCategories, existingTags),
summarize(result.title, bookmark.url, content),
]);
const finalType = (result.type === "tweet" || result.type === "video")
? result.type // URL-detected type wins for tweets/videos
: tagResult.contentType;
// 5. Save LLM results
await db.update(bookmarks).set({
originalTitle: result.title, // preserve the raw extracted title
title: summaryResult.title,
summary: summaryResult.summary,
type: finalType, // URL-detected types are authoritative
processed: true,
}).where(eq(bookmarks.id, bookmarkId));
// 6. Save tags (with upsert)
for (const tagName of tagResult.tags) {
// Insert tag if new, then link to bookmark
await db.insert(tags).values({ id: nanoid(), name: tagName }).onConflictDoNothing();
const tag = await db.query.tags.findFirst({ where: eq(tags.name, tagName) });
if (!tag) continue; // re-fetch is required: onConflictDoNothing may have skipped our insert
await db.insert(bookmarkTags).values({ bookmarkId, tagId: tag.id }).onConflictDoNothing();
}
// 7. Sync markdown file to GitHub
await syncBookmark(bookmarkId);
} catch (error) {
console.error(`Processing failed for bookmark ${bookmarkId}:`, error);
// Bookmark is still saved — can retry via /api/process/:id
}
}

Each bookmark becomes a markdown file committed to a private GitHub repo via the GitHub API (octokit).
File naming: YYYY-MM-DD-slugified-title.md
Structure: Flat (all files in repo root, organized by frontmatter not folders)
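A sketch of both pieces — the filename scheme and the commit. The project uses octokit; this version calls the equivalent REST contents endpoint directly, and `bookmarkFilename`/`commitMarkdown` are illustrative names:

```typescript
// Filename per the scheme above: YYYY-MM-DD-slugified-title.md
function bookmarkFilename(savedAt: Date, title: string): string {
  const date = savedAt.toISOString().slice(0, 10);
  const slug = title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse runs of non-alphanumerics
    .replace(/^-+|-+$/g, "")     // trim leading/trailing hyphens
    .slice(0, 60);
  return `${date}-${slug}.md`;
}

// Create-or-update via the GitHub contents API (PUT). octokit wraps the same
// endpoint; pass the existing file's sha when updating.
async function commitMarkdown(path: string, markdown: string, sha?: string) {
  const res = await fetch(
    `https://api.github.com/repos/${process.env.GITHUB_REPO}/contents/${path}`,
    {
      method: "PUT",
      headers: {
        Authorization: `Bearer ${process.env.GITHUB_TOKEN}`,
        Accept: "application/vnd.github+json",
      },
      body: JSON.stringify({
        message: `Add bookmark: ${path}`,
        content: Buffer.from(markdown).toString("base64"), // API requires base64
        ...(sha ? { sha } : {}),
      }),
    }
  );
  if (!res.ok) throw new Error(`GitHub sync failed: ${res.status}`);
}
```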
---
title: "Article Title"
url: "https://example.com/article"
saved: 2026-03-07T12:00:00Z
type: article
tags: [productivity, ai, tools]
categories: [AI Tools]
summary: "Brief 2-3 sentence summary generated by LLM."
author: "Author Name"
image: "https://example.com/og-image.jpg"
---
# Article Title
Extracted content as clean markdown...

Aesthetic: Terminal-inspired but readable. Think "markdown rendered beautifully", not a SaaS dashboard.
- Typography: Monospaced font throughout (Geist Mono, JetBrains Mono, or IBM Plex Mono)
- Colors: Black text on white. Subtle grays for secondary info. One accent color (dark pink `#c2185b`) for active states, links, hover effects
- Layout: Centered content column (max ~800px), generous whitespace, line-height 1.6+
- Components: Text-driven, minimal iconography. Tags as small inline labels. Search as underlined input. Expand/collapse with a rotating `›` arrow.
- Interactions: Standard mouse-driven UI. No terminal simulation.
Minimal unpacked extension — no Chrome Web Store needed. Works in Chrome, Arc, Edge, Brave.
{
"manifest_version": 3,
"name": "Link Stash",
"version": "1.0",
"permissions": ["activeTab"],
"action": { "default_popup": "popup.html" },
"commands": {
"save-bookmark": {
"suggested_key": { "default": "Ctrl+Shift+S", "mac": "Command+Shift+S" },
"description": "Save current page"
}
},
"background": { "service_worker": "background.js" }
}

The popup auto-saves the current page on open (no click needed), shows status, and closes after 1.2s. The background script handles the keyboard shortcut. Options page lets user configure API URL and key. Add host_permissions for your deployment domain.
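A sketch of that popup logic. The storage keys and the status element id are assumptions, not the exact implementation:

```javascript
// popup.js — auto-saves the active tab when the popup opens
function buildSaveRequest(apiUrl, apiKey, pageUrl) {
  return {
    endpoint: `${apiUrl.replace(/\/$/, "")}/api/bookmarks`,
    options: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ url: pageUrl }),
    },
  };
}

async function savePage() {
  // apiUrl/apiKey set by the options page (assumed storage keys)
  const { apiUrl, apiKey } = await chrome.storage.sync.get(["apiUrl", "apiKey"]);
  const [tab] = await chrome.tabs.query({ active: true, currentWindow: true });
  const { endpoint, options } = buildSaveRequest(apiUrl, apiKey, tab.url);
  const status = document.getElementById("status"); // assumed element in popup.html
  try {
    const res = await fetch(endpoint, options);
    status.textContent = res.ok ? "Saved!" : `Error ${res.status}`;
  } catch {
    status.textContent = "Network error";
  }
  setTimeout(() => window.close(), 1200); // auto-close after 1.2s
}

// Only run inside the extension popup (guard lets the file load elsewhere)
if (typeof chrome !== "undefined" && chrome.tabs) savePage();
```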
Create an iOS Shortcut that:
- Receives URL from share sheet (or clipboard)
- Sends HTTP POST to `/api/bookmarks` with `Authorization: Bearer <API_KEY>` header and `{ "url": "..." }` body
- Shows "Saved!" notification
DATABASE_URL= # Neon Postgres URL (auto-set by Vercel Neon integration)
API_KEY= # Shared secret for API auth (generate a random string)
OPENROUTER_API_KEY= # From openrouter.ai
GITHUB_TOKEN= # GitHub personal access token (repo scope) for markdown sync
GITHUB_REPO= # e.g., "username/bookmark-archive"
AI & Machine Learning
Development Tools
Design & UI
Productivity
Business & Startups
Science & Research
Culture & Media
Personal Development
Reference / Tutorial
The LLM picks from these. Edit them in the database as needed.
These are real issues encountered during development. Addressing them upfront will save hours.
Problem: Vercel hobby plan defaults to 10s function timeout. The full processing pipeline (extraction + 2 LLM calls + DB writes + GitHub sync) can exceed this.
Fix: Add export const maxDuration = 60; to API routes that run the processing pipeline. This extends the timeout to 60s on the hobby plan.
Problem: Floating promises (void processBookmark()) get killed when the serverless function returns.
Fix: Use after() from next/server — it keeps the function alive after the response is sent. Also expose a /api/process/:id endpoint for manual retries.
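A sketch of the resulting route shape, combining this with the timeout fix above (`createBookmark` is an assumed helper that inserts the row and returns its id; the full handler is elided):

```typescript
// src/app/api/bookmarks/route.ts — pattern sketch, not the full handler
import { after, NextResponse } from "next/server";

export const maxDuration = 60; // extend the 10s hobby-plan timeout
export const dynamic = "force-dynamic"; // don't pre-render at build time

export async function POST(req: Request) {
  const { url } = await req.json();
  const id = await createBookmark(url);
  // after() keeps the serverless function alive past the response,
  // unlike a floating `void processBookmarkPipeline(id)` promise
  after(() => processBookmarkPipeline(id));
  return NextResponse.json({ id }, { status: 201 });
}
```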
Problem: The metascraper npm package depends on re2 (a native C++ module) which fails to build in Turbopack/Vercel.
Fix: Don't use metascraper. Build a simple metadata extractor using linkedom that parses <meta> OG tags and <title> directly from HTML. It's 30 lines of code and works everywhere.
Problem: Drizzle client initialized at module import time tries to connect during build, when DATABASE_URL isn't available.
Fix: Use a Proxy-based lazy initialization pattern — wrap getDb() in a Proxy so the real Neon connection is only created on first use at runtime.
Problem: onConflictDoNothing() silently skips if a tag already exists, but the code may reference a newly generated ID that was never inserted.
Fix: After onConflictDoNothing(), always re-fetch the tag from the DB to get the correct existing ID.
Problem: Next.js tries to pre-render API routes at build time, which fails because the database isn't available.
Fix: Add export const dynamic = "force-dynamic"; to every API route file.
Problem: YouTube blocks cloud server IPs (Vercel, AWS, etc.) from fetching page HTML and transcripts. Most transcript libraries fail silently with 0 results.
Fix: Use the YouTube oEmbed API for title/author/thumbnail (always works). Use youtube-transcript-plus for transcripts (works locally, may fail on Vercel). Use YouTube's Innertube API with the MWEB client for descriptions — this works from cloud servers when page scraping doesn't. For reliable transcripts on Vercel, consider the YouTube Data API with an API key or a proxy service like Apify.
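A sketch of the two reliable pieces — video-id parsing and the oEmbed call. `extractVideoId`/`fetchVideoMeta` are illustrative names; the oEmbed response fields (`title`, `author_name`, `thumbnail_url`) are standard. The Innertube call is omitted here:

```typescript
// Pull the video id from the common YouTube URL shapes
function extractVideoId(url: string): string | null {
  const u = new URL(url);
  const host = u.hostname.replace(/^(www\.|m\.)/, "");
  if (host === "youtu.be") return u.pathname.slice(1) || null;
  if (host === "youtube.com") {
    if (u.pathname === "/watch") return u.searchParams.get("v");
    const m = u.pathname.match(/^\/(shorts|embed)\/([\w-]+)/);
    if (m) return m[2];
  }
  return null;
}

// oEmbed works from cloud IPs and needs no API key
async function fetchVideoMeta(url: string) {
  const res = await fetch(
    `https://www.youtube.com/oembed?url=${encodeURIComponent(url)}&format=json`
  );
  if (!res.ok) return null;
  const data = await res.json();
  return { title: data.title, author: data.author_name, imageUrl: data.thumbnail_url };
}
```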
Problem: The LLM tends to generate format-based tags ("tweet", "article", "thread") rather than topic-based tags. It also tags tweet authors even when the content isn't about them.
Fix: Explicit rules in the prompt: "NEVER use format tags", "only tag a person if the content is ABOUT that person". Provide examples of good and bad tags. Pass existing tags to the LLM and tell it to "prefer existing tags over creating new ones".
Problem: The LLM sometimes reclassifies tweets as "tool" or "article" based on content, overriding the URL-based type detection.
Fix: URL-detected types (tweet from x.com, video from youtube.com) should be authoritative. The LLM should only refine between article and tool for generic URLs.
Problem: If your project folder path contains spaces (e.g., in a Dropbox folder), npx commands break.
Fix: Run commands directly via node: node node_modules/.bin/next build instead of npx next build. Or just avoid spaces in the project path.
Problem: Using the "Get URLs from Input" action in iOS Shortcuts as an intermediate step can return an empty string, causing the API to receive {"url":""}.
Fix: Skip the intermediate "Get URLs from Input" action entirely. Pass Shortcut Input directly as the JSON url value in the "Get Contents of URL" action. Also handle URL as array on the API side (if (Array.isArray(url)) url = url[0]) since Shortcuts may send it that way.
Problem: Chrome extensions get "Network error" when calling your Vercel API without explicit host permissions.
Fix: Add "host_permissions": ["https://*.vercel.app/*"] to manifest.json. Adjust the pattern to match your actual deployment domain.
Problem: Content extraction silently fails for some sites that block or challenge requests from cloud server IPs (Cloudflare, bot detection). The fetchPage function gets an error page instead of article HTML, and Readability returns null.
Fix: Add proper browser-like headers (Accept, Accept-Language) to fetch requests. Log non-200 responses so failures are visible. The LLM can still tag and summarize from the URL and OG metadata alone, so the bookmark isn't lost — just missing the full article content.
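A sketch of such a fetcher — `fetchPage` and the exact header values are assumptions:

```typescript
// Browser-like headers reduce bot-detection failures from cloud IPs
function browserHeaders(): Record<string, string> {
  return {
    "User-Agent":
      "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    Accept: "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
  };
}

// Logs non-200 responses so extraction failures are visible instead of silent
async function fetchPage(url: string): Promise<string | null> {
  const res = await fetch(url, { headers: browserHeaders(), redirect: "follow" });
  if (!res.ok) {
    console.error(`fetchPage: ${url} returned ${res.status}`);
    return null;
  }
  return res.text();
}
```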
- Push to GitHub
- Connect repo to Vercel
- Add Neon Postgres via Vercel integrations (this auto-sets `DATABASE_URL`)
- Set remaining env vars in Vercel dashboard: `API_KEY`, `OPENROUTER_API_KEY`, `GITHUB_TOKEN`, `GITHUB_REPO`
- Deploy
- Push database schema: `npx drizzle-kit push`
- Visit the app URL, log in with your API key
- Load the Chrome extension via `chrome://extensions` → "Load unpacked"
- Create the iOS Shortcut
link-stash/
├── src/
│ ├── app/ # Next.js App Router
│ │ ├── page.tsx # Main bookmark list UI
│ │ ├── layout.tsx # Root layout with monospace font
│ │ ├── globals.css # Tailwind + CSS variables
│ │ ├── login/page.tsx # Simple password login
│ │ └── api/
│ │ ├── bookmarks/ # CRUD + background processing via after()
│ │ ├── sync/ # GitHub markdown sync trigger
│ │ ├── tags/ # Tag listing (sorted by usage count)
│ │ ├── process/[id]/ # Manual LLM reprocessing trigger
│ │ └── auth/ # Login endpoint (sets cookie)
│ ├── lib/
│ │ ├── db/
│ │ │ ├── schema.ts # Drizzle schema (see above)
│ │ │ └── index.ts # Lazy-init DB client (Proxy pattern)
│ │ ├── extraction/
│ │ │ ├── detect.ts # URL → content type detection
│ │ │ ├── article.ts # Readability + Turndown
│ │ │ ├── tweet.ts # FixTweet API
│ │ │ ├── youtube.ts # oEmbed + caption extraction
│ │ │ ├── metadata.ts # Custom linkedom-based OG tag parser
│ │ │ └── index.ts # Extraction orchestrator
│ │ ├── llm/
│ │ │ ├── client.ts # OpenRouter API client
│ │ │ ├── tagger.ts # Auto-tagging prompt + parsing
│ │ │ ├── summarizer.ts # Title generation + summarization
│ │ │ ├── clean-content.ts # Content cleanup for garbage HTML
│ │ │ └── index.ts # LLM processing orchestrator
│ │ ├── sync/index.ts # GitHub markdown sync via octokit
│ │ ├── process.ts # Full async pipeline orchestrator
│ │ └── auth.ts # API key + cookie session auth
│ ├── components/
│ │ ├── bookmark-item.tsx # Bookmark display with expand/collapse
│ │ ├── add-bookmark.tsx # URL input form
│ │ ├── search-bar.tsx # Search input
│ │ ├── type-filter.tsx # Filter by content type
│ │ └── tag-filter.tsx # Filter by tag (sorted by usage, truncated)
│ └── middleware.ts # Auth redirect for unauthenticated requests
├── extension/ # Chrome extension (Manifest V3)
│ ├── manifest.json
│ ├── popup.html / popup.js
│ ├── background.js # Keyboard shortcut handler
│ └── options.html / options.js # API URL + key config
├── drizzle.config.ts
├── package.json
└── .env.example
| Decision | Resolution | Why |
|---|---|---|
| Database | Neon Postgres via Vercel integration | No separate signup, free tier, auto-configured |
| LLM provider | OpenRouter (Gemini Flash) | Extremely cheap (~$0.0002/bookmark), fast, model flexibility |
| Metadata extraction | Custom linkedom parser | metascraper's native deps break serverless builds |
| Tag application | Auto-apply, no review queue | Low friction; can edit later in UI |
| Type detection | URL pattern is authoritative for tweets/videos | LLM sometimes misclassifies tweet content as "tool" |
| Git sync structure | Flat (one folder, no subdirectories) | Organized by frontmatter, not filesystem |
| Duplicate URLs | Update existing bookmark, reprocess | No duplicates in archive |
| Failed extraction | Save whatever we get | LLM can still tag/summarize from title + URL alone |
| Background processing | `after()` from next/server | Keeps Vercel function alive after response |
| Function timeout | `maxDuration = 60` on processing routes | Default 10s is too short for extraction + 2 LLM calls |
| Chrome Web Store | Not needed | Unpacked extension loaded in dev mode works fine |
| Archive repo | Private GitHub repo | Personal bookmarks shouldn't be public |