A step-by-step blueprint for building a personal bookmark saving and archiving tool. Built with Next.js, Neon Postgres, and LLM-powered auto-tagging. Designed to be pasted into a Claude Code session and built in one sitting.
What you get: A self-hosted web app that captures URLs, extracts content as clean markdown, auto-tags and summarizes via LLM, and optionally syncs everything as markdown files to a GitHub repo for your second brain.
Cost: $0/month (Vercel free tier + Neon free tier). Only cost is LLM usage via OpenRouter (~$0.0002 per bookmark with Gemini Flash — roughly 5,000 bookmarks per dollar).
- Create a new folder for your project
- Open Claude Code in that folder
- Paste this entire document and say: "Build this project step by step"
- Claude will scaffold, implement, and deploy the entire thing
URL saved (extension / iOS shortcut / web UI)
→ API creates bookmark record instantly (< 1s response)
→ Background processing starts:
1. Detect content type from URL (tweet? youtube? article? tool?)
2. Type-specific extraction:
Tweet: FixTweet API → structured data (text, author, media)
YouTube: oEmbed API (title/author) + caption extractor (transcript)
Generic: Readability → article HTML → Turndown → clean markdown
3. Extract metadata (title, OG image, author, date)
4. If content is low quality (ASCII art, broken HTML): LLM cleanup
5. LLM auto-tagging (categories + tags + content type classification)
6. LLM summarization (clean title + 2-3 sentence summary)
7. Save tags and summary to database
8. Commit markdown file to GitHub archive repo
→ Web UI polls and shows processing status → final result
| Component | Technology | Why |
|---|---|---|
| Framework | Next.js 15+ (App Router) | API routes + React UI + Vercel-native |
| Language | TypeScript | Type safety |
| Database | Neon Postgres (via Vercel integration) | Free tier, zero maintenance, no separate signup needed |
| ORM | Drizzle ORM (pg dialect) | Lightweight, type-safe |
| Hosting | Vercel (free tier) | Auto-deploys from Git, zero maintenance |
| LLM | OpenRouter API (Gemini Flash) | Cheap, fast, model flexibility |
| Styling | Tailwind CSS v4 | Fast to build |
| IDs | nanoid | Short, URL-safe unique IDs |
| Library | Purpose | npm Package |
|---|---|---|
| Mozilla Readability | Article extraction from HTML | @mozilla/readability |
| linkedom | Lightweight DOM for serverless (no jsdom) | linkedom |
| Turndown | HTML → Markdown conversion | turndown + turndown-plugin-gfm |
| FixTweet API | Tweet content extraction (free, no API key) | HTTP call to api.fxtwitter.com |
| youtube-transcript-plus | YouTube transcript extraction | youtube-transcript-plus |
| YouTube oEmbed API | Reliable video title/author/thumbnail | HTTP call to youtube.com/oembed |
| YouTube Innertube API | Video description extraction (MWEB client) | HTTP POST to youtube.com/youtubei/v1/player |
| octokit | GitHub API for markdown sync | octokit |
Important: Do NOT use `metascraper` — its `re2` native dependency breaks Turbopack/Vercel builds. Use a custom metadata extractor with linkedom instead (parse `<meta>` and `<title>` tags directly).
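A dependency-free sketch of the shape such an extractor takes. The real version should use linkedom's `parseHTML` rather than regexes; `extractMetadata` and `metaContent` are illustrative names, not the exact implementation:

```typescript
interface PageMeta {
  title: string | null;
  author: string | null;
  imageUrl: string | null;
}

// Naive <meta> lookup — assumes the property/name attribute precedes content,
// which covers most real pages. linkedom handles the general case.
function metaContent(html: string, property: string): string | null {
  const re = new RegExp(
    `<meta[^>]+(?:property|name)=["']${property}["'][^>]*content=["']([^"']*)["']`,
    "i"
  );
  const m = html.match(re);
  return m ? m[1] : null;
}

function extractMetadata(html: string): PageMeta {
  const titleTag = html.match(/<title[^>]*>([^<]*)<\/title>/i);
  return {
    // Prefer OG title, fall back to the <title> tag
    title: metaContent(html, "og:title") ?? (titleTag ? titleTag[1].trim() : null),
    author: metaContent(html, "author") ?? metaContent(html, "article:author"),
    imageUrl: metaContent(html, "og:image"),
  };
}
```

Swapping the regexes for linkedom keeps the same interface while surviving unquoted attributes and odd markup.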
Use Drizzle ORM with Neon Postgres. The schema includes an original_title field to preserve the raw extracted title alongside the LLM-generated clean title.
// src/lib/db/schema.ts
import { pgTable, text, boolean, primaryKey } from "drizzle-orm/pg-core";
export const bookmarks = pgTable("bookmarks", {
id: text("id").primaryKey(), // nanoid
url: text("url").notNull().unique(),
title: text("title"), // LLM-generated clean title
originalTitle: text("original_title"), // Raw extracted title
type: text("type"), // article | tweet | video | tool
contentMd: text("content_md"), // Full extracted markdown
summary: text("summary"), // LLM-generated 2-3 sentence summary
author: text("author"),
imageUrl: text("image_url"),
createdAt: text("created_at").notNull(),
updatedAt: text("updated_at").notNull(),
processed: boolean("processed").default(false),
synced: boolean("synced").default(false),
});
export const tags = pgTable("tags", {
id: text("id").primaryKey(),
name: text("name").notNull().unique(),
isPredefined: boolean("is_predefined").default(false),
});
export const bookmarkTags = pgTable("bookmark_tags", {
bookmarkId: text("bookmark_id").notNull()
.references(() => bookmarks.id, { onDelete: "cascade" }),
tagId: text("tag_id").notNull()
.references(() => tags.id, { onDelete: "cascade" }),
}, (table) => [primaryKey({ columns: [table.bookmarkId, table.tagId] })]);
export const categories = pgTable("categories", {
id: text("id").primaryKey(),
name: text("name").notNull().unique(),
});
export const bookmarkCategories = pgTable("bookmark_categories", {
bookmarkId: text("bookmark_id").notNull()
.references(() => bookmarks.id, { onDelete: "cascade" }),
categoryId: text("category_id").notNull()
.references(() => categories.id, { onDelete: "cascade" }),
}, (table) => [primaryKey({ columns: [table.bookmarkId, table.categoryId] })]);

POST /api/bookmarks — Save a new URL (from extension, shortcut, or web UI)
GET /api/bookmarks — List bookmarks (supports ?q=search&tag=name&type=article&page=1)
GET /api/bookmarks/:id — Get single bookmark with full content
PATCH /api/bookmarks/:id — Update bookmark (edit tags, title)
DELETE /api/bookmarks/:id — Delete bookmark
GET /api/tags — List all tags (sorted by usage count)
POST /api/process/:id — Manually trigger reprocessing
POST /api/sync — Trigger full markdown sync
POST /api/auth — Login (sets session cookie)
All endpoints require Authorization: Bearer <API_KEY> header (or session cookie for web UI).
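A minimal sketch of that check. The helper name, the fail-closed behavior, and the cookie handling are assumptions, not the exact implementation (real code should store a signed session value rather than the raw key):

```typescript
// Hypothetical isAuthorized helper: accepts either the Bearer header
// (extension / iOS shortcut) or a session cookie value (web UI).
function isAuthorized(authHeader: string | null, sessionCookie: string | null): boolean {
  const apiKey = process.env.API_KEY;
  if (!apiKey) return false; // fail closed if the key isn't configured
  if (authHeader === `Bearer ${apiKey}`) return true;
  return sessionCookie === apiKey; // sketch only — prefer a signed session token
}
```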
These are the exact prompts used for LLM processing. All use OpenRouter with Gemini Flash and JSON mode.
Classifies content type, assigns categories, and suggests tags. Called with response_format: { type: "json_object" }.
SYSTEM:
You are a bookmark categorization assistant. Given a bookmark's title, URL, and content, you must:
1. Determine the content type:
- "tool" — software, SaaS, libraries, frameworks, apps, component galleries, developer tools, any product/service page
- "article" — blog posts, essays, tutorials, documentation, news
- "tweet" — tweets / X posts
- "video" — YouTube or other video content
2. Pick 1-2 categories from the predefined list
3. Suggest 2-5 tags (short, lowercase, hyphenated)
Rules:
- Tags describe the TOPIC and SUBJECT of the content, NOT the format
- NEVER use format tags like "tweet", "article", "video", "blog-post", "thread"
- Good tags: "ai", "anthropic", "react", "productivity", "startup-funding", "css-grid"
- Bad tags: "tweet", "article", "blog", "video", "post", "thread"
- Include names of companies, tools, or notable people when the content is ABOUT them
- Don't tag random tweet authors just because they posted it — only tag a person if the content is about that person or their notable work
- Prefer existing tags over creating new ones
- Max 5 tags total
- Categories must come from the provided list only
Respond in JSON format:
{
"contentType": "tool",
"categories": ["Category Name"],
"tags": ["tag-one", "tag-two"]
}
USER:
Title: {title}
URL: {url}
Content (truncated): {first 2000 chars of content}
Available categories: {comma-separated list}
Existing tags (prefer these): {comma-separated list}
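These prompts go through OpenRouter's OpenAI-compatible chat completions endpoint with JSON mode. A sketch of the call — the model slug and helper names are assumptions:

```typescript
// Assemble the request body for the tagging prompt (JSON mode)
function buildTagRequest(system: string, user: string) {
  return {
    model: "google/gemini-2.0-flash-001", // any cheap OpenRouter slug works here
    messages: [
      { role: "system", content: system },
      { role: "user", content: user },
    ],
    response_format: { type: "json_object" },
  };
}

async function callOpenRouter(system: string, user: string) {
  const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(buildTagRequest(system, user)),
  });
  if (!res.ok) throw new Error(`OpenRouter ${res.status}`);
  const data = await res.json();
  // JSON mode makes message.content parseable JSON
  return JSON.parse(data.choices[0].message.content);
}
```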
Generates a clean descriptive title and a 2-3 sentence summary. Also uses JSON mode.
SYSTEM:
You are a summarization assistant. Given a bookmark's title, URL, and content, generate:
1. A clean, descriptive title (max 80 chars) — describe WHAT the content is about, not who posted it. For tweets, don't start with the author's handle. For articles, improve vague titles.
2. A concise 2-3 sentence summary.
Examples of good titles:
- "Anthropic's Report on AI Capability Across Occupations"
- "CSS Container Queries: Complete Guide"
- "Why SQLite Is Good Enough for Most Web Apps"
Examples of bad titles:
- "@username: Anthropic just released..."
- "Thread about AI"
- "Interesting article"
Respond in JSON format:
{
"title": "Clean descriptive title",
"summary": "2-3 sentence summary."
}
USER:
Original title: {title}
URL: {url}
Content (truncated): {first 3000 chars of content}
Used only when extracted content is detected as low quality (ASCII art, SVG links, broken HTML). Not JSON mode.
SYSTEM:
You are a content cleaning assistant. You receive raw extracted markdown from a web page that may contain noise: ASCII art, SVG references, broken links, navigation elements, footers, cookie notices, etc.
Clean it up into readable, useful markdown. Keep:
- The main description/explanation of what this page is about
- Key features, capabilities, or points
- Code examples if relevant
Remove:
- ASCII art and decorative elements
- Navigation links, footers, headers
- SVG/image references that aren't meaningful
- Repetitive or boilerplate text
- Login/signup CTAs
If the content is mostly noise, write a concise description based on whatever you can extract. Keep the output under 500 words.
USER:
Page: {title} ({url})
Raw extracted content:
{first 4000 chars of content}
URL-detected types (tweet, video) are authoritative — the LLM cannot override them. The LLM only refines between article and tool for generic URLs.
function detectContentType(url: string): "tweet" | "video" | "article" | "tool" {
const host = new URL(url).hostname.replace("www.", "");
if (host === "twitter.com" || host === "x.com") return "tweet";
if (host === "youtube.com" || host === "youtu.be" || host === "m.youtube.com") return "video";
if (host === "github.com") {
const parts = new URL(url).pathname.split("/").filter(Boolean);
if (parts.length === 2) return "tool"; // repo root
}
return "article"; // LLM will refine to "tool" if appropriate
}
// In processBookmark():
const urlType = bookmark.type;
const finalType = (urlType === "tweet" || urlType === "video")
? urlType // URL detection is authoritative
: tagResult.contentType; // LLM refines article vs tool

A heuristic that detects garbage extraction output (common with SPAs, landing pages with lots of SVGs/ASCII art):
function isLowQualityContent(content: string | null): boolean {
if (!content) return true;
const lines = content.split("\n").filter((l) => l.trim());
if (lines.length < 3) return true;
// Lines that are mostly non-alphanumeric (ASCII art, symbols)
const noiseLines = lines.filter((l) => {
const alphaChars = l.replace(/[^a-zA-Z]/g, "").length;
return alphaChars < l.trim().length * 0.3;
});
if (noiseLines.length > lines.length * 0.4) return true;
// Too many links relative to text
const linkCount = (content.match(/\[.*?\]\(.*?\)/g) || []).length;
const wordCount = content.split(/\s+/).length;
if (linkCount > 0 && linkCount > wordCount * 0.15) return true;
return false;
}

When detected, the content is sent to the LLM content cleaner before storage.
The full async pipeline that runs after a bookmark is saved:
async function processBookmarkPipeline(bookmarkId: string) {
try {
// 0. Load the bookmark record saved by the API route
const bookmark = await db.query.bookmarks.findFirst({ where: eq(bookmarks.id, bookmarkId) });
if (!bookmark) return;
// 1. Extract content (type-specific)
const result = await extractContent(bookmark.url);
// 2. Clean low-quality content via LLM (if needed)
let content = result.content;
if (isLowQualityContent(content)) {
content = await cleanContent(content, result.title, bookmark.url);
}
// 3. Save extraction results to DB
await db.update(bookmarks).set({
title: result.title,
type: result.type,
contentMd: content,
author: result.author,
imageUrl: result.imageUrl,
}).where(eq(bookmarks.id, bookmarkId));
// 4. LLM processing: tagging + summarization (run in parallel)
const [tagResult, summaryResult] = await Promise.all([
autoTag(result.title, bookmark.url, content, existingCategories, existingTags),
summarize(result.title, bookmark.url, content),
]);
const finalType = (result.type === "tweet" || result.type === "video")
? result.type // URL-detected type wins for tweets/videos
: tagResult.contentType;
// 5. Save LLM results
await db.update(bookmarks).set({
originalTitle: result.title, // preserve the raw extracted title
title: summaryResult.title,
summary: summaryResult.summary,
type: finalType, // URL-detected types are authoritative
processed: true,
}).where(eq(bookmarks.id, bookmarkId));
// 6. Save tags (with upsert)
for (const tagName of tagResult.tags) {
// Insert tag if new, then link to bookmark
await db.insert(tags).values({ id: nanoid(), name: tagName }).onConflictDoNothing();
const tag = await db.query.tags.findFirst({ where: eq(tags.name, tagName) });
if (!tag) continue; // re-fetch is required: onConflictDoNothing may have skipped our insert
await db.insert(bookmarkTags).values({ bookmarkId, tagId: tag.id }).onConflictDoNothing();
}
// 7. Sync markdown file to GitHub
await syncBookmark(bookmarkId);
} catch (error) {
console.error(`Processing failed for bookmark ${bookmarkId}:`, error);
// Bookmark is still saved — can retry via /api/process/:id
}
}

Each bookmark becomes a markdown file committed to a private GitHub repo via the GitHub API (octokit).
File naming: YYYY-MM-DD-slugified-title.md
Structure: Flat (all files in repo root, organized by frontmatter not folders)
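A sketch of both pieces — the filename scheme and the commit. The project uses octokit; this version calls the equivalent REST contents endpoint directly, and `bookmarkFilename`/`commitMarkdown` are illustrative names:

```typescript
// Filename per the scheme above: YYYY-MM-DD-slugified-title.md
function bookmarkFilename(savedAt: Date, title: string): string {
  const date = savedAt.toISOString().slice(0, 10);
  const slug = title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse runs of non-alphanumerics
    .replace(/^-+|-+$/g, "")     // trim leading/trailing hyphens
    .slice(0, 60);
  return `${date}-${slug}.md`;
}

// Create-or-update via the GitHub contents API (PUT). octokit wraps the same
// endpoint; pass the existing file's sha when updating.
async function commitMarkdown(path: string, markdown: string, sha?: string) {
  const res = await fetch(
    `https://api.github.com/repos/${process.env.GITHUB_REPO}/contents/${path}`,
    {
      method: "PUT",
      headers: {
        Authorization: `Bearer ${process.env.GITHUB_TOKEN}`,
        Accept: "application/vnd.github+json",
      },
      body: JSON.stringify({
        message: `Add bookmark: ${path}`,
        content: Buffer.from(markdown).toString("base64"), // API requires base64
        ...(sha ? { sha } : {}),
      }),
    }
  );
  if (!res.ok) throw new Error(`GitHub sync failed: ${res.status}`);
}
```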
---
title: "Article Title"
url: "https://example.com/article"
saved: 2026-03-07T12:00:00Z
type: article
tags: [productivity, ai, tools]
categories: [AI Tools]
summary: "Brief 2-3 sentence summary generated by LLM."
author: "Author Name"
image: "https://example.com/og-image.jpg"
---
# Article Title
Extracted content as clean markdown...

Aesthetic: Terminal-inspired but readable. Think "markdown rendered beautifully", not a SaaS dashboard.
- Typography: Monospaced font throughout (Geist Mono, JetBrains Mono, or IBM Plex Mono)
- Colors: Black text on white. Subtle grays for secondary info. One accent color (dark pink `#c2185b`) for active states, links, hover effects
- Layout: Centered content column (max ~800px), generous whitespace, line-height 1.6+
- Components: Text-driven, minimal iconography. Tags as small inline labels. Search as underlined input. Expand/collapse with a rotating `›` arrow.
- Interactions: Standard mouse-driven UI. No terminal simulation.
Minimal unpacked extension — no Chrome Web Store needed. Works in Chrome, Arc, Edge, Brave.
{
"manifest_version": 3,
"name": "Link Stash",
"version": "1.0",
"permissions": ["activeTab"],
"action": { "default_popup": "popup.html" },
"commands": {
"save-bookmark": {
"suggested_key": { "default": "Ctrl+Shift+S", "mac": "Command+Shift+S" },
"description": "Save current page"
}
},
"background": { "service_worker": "background.js" }
}

The popup auto-saves the current page on open (no click needed), shows status, and closes after 1.2s. The background script handles the keyboard shortcut. Options page lets user configure API URL and key. Add host_permissions for your deployment domain.
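A sketch of that popup logic. The storage keys and the status element id are assumptions, not the exact implementation:

```javascript
// popup.js — auto-saves the active tab when the popup opens
function buildSaveRequest(apiUrl, apiKey, pageUrl) {
  return {
    endpoint: `${apiUrl.replace(/\/$/, "")}/api/bookmarks`,
    options: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ url: pageUrl }),
    },
  };
}

async function savePage() {
  // apiUrl/apiKey set by the options page (assumed storage keys)
  const { apiUrl, apiKey } = await chrome.storage.sync.get(["apiUrl", "apiKey"]);
  const [tab] = await chrome.tabs.query({ active: true, currentWindow: true });
  const { endpoint, options } = buildSaveRequest(apiUrl, apiKey, tab.url);
  const status = document.getElementById("status"); // assumed element in popup.html
  try {
    const res = await fetch(endpoint, options);
    status.textContent = res.ok ? "Saved!" : `Error ${res.status}`;
  } catch {
    status.textContent = "Network error";
  }
  setTimeout(() => window.close(), 1200); // auto-close after 1.2s
}

// Only run inside the extension popup (guard lets the file load elsewhere)
if (typeof chrome !== "undefined" && chrome.tabs) savePage();
```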
Create an iOS Shortcut that:
- Receives URL from share sheet (or clipboard)
- Sends HTTP POST to `/api/bookmarks` with `Authorization: Bearer <API_KEY>` header and `{ "url": "..." }` body
- Shows "Saved!" notification
DATABASE_URL= # Neon Postgres URL (auto-set by Vercel Neon integration)
API_KEY= # Shared secret for API auth (generate a random string)
OPENROUTER_API_KEY= # From openrouter.ai
GITHUB_TOKEN= # GitHub personal access token (repo scope) for markdown sync
GITHUB_REPO= # e.g., "username/bookmark-archive"
AI & Machine Learning
Development Tools
Design & UI
Productivity
Business & Startups
Science & Research
Culture & Media
Personal Development
Reference / Tutorial
The LLM picks from these. Edit them in the database as needed.
These are real issues encountered during development. Addressing them upfront will save hours.
Problem: Vercel hobby plan defaults to 10s function timeout. The full processing pipeline (extraction + 2 LLM calls + DB writes + GitHub sync) can exceed this.
Fix: Add export const maxDuration = 60; to API routes that run the processing pipeline. This extends the timeout to 60s on the hobby plan.
Problem: Floating promises (void processBookmark()) get killed when the serverless function returns.
Fix: Use after() from next/server — it keeps the function alive after the response is sent. Also expose a /api/process/:id endpoint for manual retries.
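A sketch of the resulting route shape, combining this with the timeout fix above (`createBookmark` is an assumed helper that inserts the row and returns its id; the full handler is elided):

```typescript
// src/app/api/bookmarks/route.ts — pattern sketch, not the full handler
import { after, NextResponse } from "next/server";

export const maxDuration = 60; // extend the 10s hobby-plan timeout
export const dynamic = "force-dynamic"; // don't pre-render at build time

export async function POST(req: Request) {
  const { url } = await req.json();
  const id = await createBookmark(url);
  // after() keeps the serverless function alive past the response,
  // unlike a floating `void processBookmarkPipeline(id)` promise
  after(() => processBookmarkPipeline(id));
  return NextResponse.json({ id }, { status: 201 });
}
```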
Problem: The metascraper npm package depends on re2 (a native C++ module) which fails to build in Turbopack/Vercel.
Fix: Don't use metascraper. Build a simple metadata extractor using linkedom that parses <meta> OG tags and <title> directly from HTML. It's 30 lines of code and works everywhere.
Problem: Drizzle client initialized at module import time tries to connect during build, when DATABASE_URL isn't available.
Fix: Use a Proxy-based lazy initialization pattern — wrap getDb() in a Proxy so the real Neon connection is only created on first use at runtime.
Problem: onConflictDoNothing() silently skips if a tag already exists, but the code may reference a newly generated ID that was never inserted.
Fix: After onConflictDoNothing(), always re-fetch the tag from the DB to get the correct existing ID.
Problem: Next.js tries to pre-render API routes at build time, which fails because the database isn't available.
Fix: Add export const dynamic = "force-dynamic"; to every API route file.
Problem: YouTube blocks cloud server IPs (Vercel, AWS, etc.) from fetching page HTML and transcripts. Most transcript libraries fail silently with 0 results.
Fix: Use the YouTube oEmbed API for title/author/thumbnail (always works). Use youtube-transcript-plus for transcripts (works locally, may fail on Vercel). Use YouTube's Innertube API with the MWEB client for descriptions — this works from cloud servers when page scraping doesn't. For reliable transcripts on Vercel, consider the YouTube Data API with an API key or a proxy service like Apify.
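A sketch of the two reliable pieces — video-id parsing and the oEmbed call. `extractVideoId`/`fetchVideoMeta` are illustrative names; the oEmbed response fields (`title`, `author_name`, `thumbnail_url`) are standard. The Innertube call is omitted here:

```typescript
// Pull the video id from the common YouTube URL shapes
function extractVideoId(url: string): string | null {
  const u = new URL(url);
  const host = u.hostname.replace(/^(www\.|m\.)/, "");
  if (host === "youtu.be") return u.pathname.slice(1) || null;
  if (host === "youtube.com") {
    if (u.pathname === "/watch") return u.searchParams.get("v");
    const m = u.pathname.match(/^\/(shorts|embed)\/([\w-]+)/);
    if (m) return m[2];
  }
  return null;
}

// oEmbed works from cloud IPs and needs no API key
async function fetchVideoMeta(url: string) {
  const res = await fetch(
    `https://www.youtube.com/oembed?url=${encodeURIComponent(url)}&format=json`
  );
  if (!res.ok) return null;
  const data = await res.json();
  return { title: data.title, author: data.author_name, imageUrl: data.thumbnail_url };
}
```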
Problem: The LLM tends to generate format-based tags ("tweet", "article", "thread") rather than topic-based tags. It also tags tweet authors even when the content isn't about them.
Fix: Explicit rules in the prompt: "NEVER use format tags", "only tag a person if the content is ABOUT that person". Provide examples of good and bad tags. Pass existing tags to the LLM and tell it to "prefer existing tags over creating new ones".
Problem: The LLM sometimes reclassifies tweets as "tool" or "article" based on content, overriding the URL-based type detection.
Fix: URL-detected types (tweet from x.com, video from youtube.com) should be authoritative. The LLM should only refine between article and tool for generic URLs.
Problem: If your project folder path contains spaces (e.g., in a Dropbox folder), npx commands break.
Fix: Run commands directly via node: node node_modules/.bin/next build instead of npx next build. Or just avoid spaces in the project path.
Problem: Using the "Get URLs from Input" action in iOS Shortcuts as an intermediate step can return an empty string, causing the API to receive {"url":""}.
Fix: Skip the intermediate "Get URLs from Input" action entirely. Pass Shortcut Input directly as the JSON url value in the "Get Contents of URL" action. Also handle URL as array on the API side (if (Array.isArray(url)) url = url[0]) since Shortcuts may send it that way.
Problem: Chrome extensions get "Network error" when calling your Vercel API without explicit host permissions.
Fix: Add "host_permissions": ["https://*.vercel.app/*"] to manifest.json. Adjust the pattern to match your actual deployment domain.
Problem: Content extraction silently fails for some sites that block or challenge requests from cloud server IPs (Cloudflare, bot detection). The fetchPage function gets an error page instead of article HTML, and Readability returns null.
Fix: Add proper browser-like headers (Accept, Accept-Language) to fetch requests. Log non-200 responses so failures are visible. The LLM can still tag and summarize from the URL and OG metadata alone, so the bookmark isn't lost — just missing the full article content.
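A sketch of such a fetcher — `fetchPage` and the exact header values are assumptions:

```typescript
// Browser-like headers reduce bot-detection failures from cloud IPs
function browserHeaders(): Record<string, string> {
  return {
    "User-Agent":
      "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    Accept: "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
  };
}

// Logs non-200 responses so extraction failures are visible instead of silent
async function fetchPage(url: string): Promise<string | null> {
  const res = await fetch(url, { headers: browserHeaders(), redirect: "follow" });
  if (!res.ok) {
    console.error(`fetchPage: ${url} returned ${res.status}`);
    return null;
  }
  return res.text();
}
```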
- Push to GitHub
- Connect repo to Vercel
- Add Neon Postgres via Vercel integrations (this auto-sets `DATABASE_URL`)
- Set remaining env vars in Vercel dashboard: `API_KEY`, `OPENROUTER_API_KEY`, `GITHUB_TOKEN`, `GITHUB_REPO`
- Deploy
- Push database schema: `npx drizzle-kit push`
- Visit the app URL, log in with your API key
- Load the Chrome extension via `chrome://extensions` → "Load unpacked"
- Create the iOS Shortcut
link-stash/
├── src/
│ ├── app/ # Next.js App Router
│ │ ├── page.tsx # Main bookmark list UI
│ │ ├── layout.tsx # Root layout with monospace font
│ │ ├── globals.css # Tailwind + CSS variables
│ │ ├── login/page.tsx # Simple password login
│ │ └── api/
│ │ ├── bookmarks/ # CRUD + background processing via after()
│ │ ├── sync/ # GitHub markdown sync trigger
│ │ ├── tags/ # Tag listing (sorted by usage count)
│ │ ├── process/[id]/ # Manual LLM reprocessing trigger
│ │ └── auth/ # Login endpoint (sets cookie)
│ ├── lib/
│ │ ├── db/
│ │ │ ├── schema.ts # Drizzle schema (see above)
│ │ │ └── index.ts # Lazy-init DB client (Proxy pattern)
│ │ ├── extraction/
│ │ │ ├── detect.ts # URL → content type detection
│ │ │ ├── article.ts # Readability + Turndown
│ │ │ ├── tweet.ts # FixTweet API
│ │ │ ├── youtube.ts # oEmbed + caption extraction
│ │ │ ├── metadata.ts # Custom linkedom-based OG tag parser
│ │ │ └── index.ts # Extraction orchestrator
│ │ ├── llm/
│ │ │ ├── client.ts # OpenRouter API client
│ │ │ ├── tagger.ts # Auto-tagging prompt + parsing
│ │ │ ├── summarizer.ts # Title generation + summarization
│ │ │ ├── clean-content.ts # Content cleanup for garbage HTML
│ │ │ └── index.ts # LLM processing orchestrator
│ │ ├── sync/index.ts # GitHub markdown sync via octokit
│ │ ├── process.ts # Full async pipeline orchestrator
│ │ └── auth.ts # API key + cookie session auth
│ ├── components/
│ │ ├── bookmark-item.tsx # Bookmark display with expand/collapse
│ │ ├── add-bookmark.tsx # URL input form
│ │ ├── search-bar.tsx # Search input
│ │ ├── type-filter.tsx # Filter by content type
│ │ └── tag-filter.tsx # Filter by tag (sorted by usage, truncated)
│ └── middleware.ts # Auth redirect for unauthenticated requests
├── extension/ # Chrome extension (Manifest V3)
│ ├── manifest.json
│ ├── popup.html / popup.js
│ ├── background.js # Keyboard shortcut handler
│ └── options.html / options.js # API URL + key config
├── drizzle.config.ts
├── package.json
└── .env.example
| Decision | Resolution | Why |
|---|---|---|
| Database | Neon Postgres via Vercel integration | No separate signup, free tier, auto-configured |
| LLM provider | OpenRouter (Gemini Flash) | Extremely cheap (~$0.0002/bookmark), fast, model flexibility |
| Metadata extraction | Custom linkedom parser | metascraper's native deps break serverless builds |
| Tag application | Auto-apply, no review queue | Low friction; can edit later in UI |
| Type detection | URL pattern is authoritative for tweets/videos | LLM sometimes misclassifies tweet content as "tool" |
| Git sync structure | Flat (one folder, no subdirectories) | Organized by frontmatter, not filesystem |
| Duplicate URLs | Update existing bookmark, reprocess | No duplicates in archive |
| Failed extraction | Save whatever we get | LLM can still tag/summarize from title + URL alone |
| Background processing | `after()` from next/server | Keeps Vercel function alive after response |
| Function timeout | `maxDuration = 60` on processing routes | Default 10s is too short for extraction + 2 LLM calls |
| Chrome Web Store | Not needed | Unpacked extension loaded in dev mode works fine |
| Archive repo | Private GitHub repo | Personal bookmarks shouldn't be public |