@bkataru
Created February 24, 2026 23:55
SKILL: Chrome Extension LevelDB Recovery — rusty_leveldb pitfalls, double-encoded JSON, hot-copy corruption, Edge WAL extraction
# Skill: Chrome Extension LevelDB Recovery
> Recovering data from Chrome/Edge/Brave extension LevelDB stores (e.g., OneTab, 1Password, any browser extension that uses `chrome.storage.local`).
---
## Extension Storage Paths
| Browser | Extension LevelDB path |
|---------|----------------------|
| Chrome (Windows) | `%LOCALAPPDATA%\Google\Chrome\User Data\{Profile}\Local Extension Settings\{ext_id}\` |
| Edge (Windows) | `%LOCALAPPDATA%\Microsoft\Edge\User Data\{Profile}\Local Extension Settings\{ext_id}\` |
| Brave (Windows) | `%LOCALAPPDATA%\BraveSoftware\Brave-Browser\User Data\{Profile}\Local Extension Settings\{ext_id}\` |
| Chrome (Linux) | `~/.config/google-chrome/{Profile}/Local Extension Settings/{ext_id}/` |
| Edge (Linux) | `~/.config/microsoft-edge/{Profile}/Local Extension Settings/{ext_id}/` |
Profile is usually `Default`. Check `chrome://version` → Profile Path to confirm.
**Note:** Edge uses a different extension ID than Chrome/Brave for the same extension (e.g., OneTab: Chrome = `chphlpgkkbolifaimnlloiipkdnihall`, Edge = `hoimpamkkoehapgenciaoajfkfkpgfop`).
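The table can be wrapped in a small lookup helper, for example when scripting recovery across several machines. This is a sketch under assumptions. `extension_leveldb_dir` and its parameters are hypothetical names, and it covers only the browser/OS pairs listed in the table.

```python
from pathlib import PurePosixPath, PureWindowsPath

# "User Data" roots from the table above (hypothetical helper data).
# Windows roots are relative to %LOCALAPPDATA%, Linux roots to $HOME.
_WINDOWS_ROOTS = {
    "chrome": r"Google\Chrome\User Data",
    "edge": r"Microsoft\Edge\User Data",
    "brave": r"BraveSoftware\Brave-Browser\User Data",
}
_LINUX_ROOTS = {
    "chrome": ".config/google-chrome",
    "edge": ".config/microsoft-edge",
}


def extension_leveldb_dir(browser, ext_id, *, os_name, base, profile="Default"):
    """Return the Local Extension Settings directory for one extension.

    base: the %LOCALAPPDATA% path on Windows, the home directory on Linux.
    Raises KeyError for browser/OS pairs that are not in the table.
    """
    if os_name == "windows":
        root = PureWindowsPath(base) / _WINDOWS_ROOTS[browser]
    elif os_name == "linux":
        root = PurePosixPath(base) / _LINUX_ROOTS[browser]
    else:
        raise ValueError(f"unsupported OS: {os_name}")
    return root / profile / "Local Extension Settings" / ext_id


print(extension_leveldb_dir("chrome", "chphlpgkkbolifaimnlloiipkdnihall",
                            os_name="linux", base="/home/me"))
# /home/me/.config/google-chrome/Default/Local Extension Settings/chphlpgkkbolifaimnlloiipkdnihall
```

Pure path classes are used so that Windows paths can be built (for example, for an `scp` command line) from a Linux machine and vice versa.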
---
## Critical: Values Are Double-Encoded JSON Strings
Chrome serializes every `chrome.storage.local` value to JSON before writing it to LevelDB. Extensions that store their data as a pre-serialized JSON **string** (as OneTab does with its `state` value) therefore end up **double-encoded**: an object like `{"key":"val"}` is stored as the JSON string `"{\"key\":\"val\"}"`, outer quotes included.
In raw bytes: the value starts with `"` (0x22), then `{` (0x7b), then `\` (0x5c).
**In Rust with rusty_leveldb:**
```rust
// `value_str: &str` is the raw LevelDB value decoded as UTF-8,
// e.g. std::str::from_utf8(&value)?

// WRONG: returns Err for double-encoded values (type mismatch:
// the outer JSON value is a string, not an object)
serde_json::from_str::<MyStruct>(value_str)

// CORRECT: unwrap the outer JSON string first, then parse the inner JSON
let json_to_parse: String = if value_str.starts_with('"') {
    serde_json::from_str::<String>(value_str)
        .unwrap_or_else(|_| value_str.to_string())
} else {
    value_str.to_string()
};
let data: MyStruct = serde_json::from_str(&json_to_parse)?;
```
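On the Python side (raw value bytes from `plyvel` or from the WAL), the same unwrapping might look like the sketch below. `decode_storage_value` is a hypothetical helper. Unlike the Rust snippet, it decides whether to parse a second time by checking whether the first decode produced a string that itself parses as a JSON object or array.

```python
import json


def decode_storage_value(raw: bytes):
    """Decode one chrome.storage.local LevelDB value (hypothetical helper).

    The stored bytes are JSON. If they decode to a string that itself
    parses as a JSON object/array (the double-encoded case), return the
    inner value; otherwise return the result of the first decode.
    """
    value = json.loads(raw.decode("utf-8"))
    if isinstance(value, str) and value.lstrip().startswith(("{", "[")):
        try:
            return json.loads(value)
        except json.JSONDecodeError:
            pass  # an ordinary string that merely starts with { or [
    return value


double = b'"{\\"key\\":\\"val\\"}"'       # the bytes "{\"key\":\"val\"}"
print(decode_storage_value(double))        # {'key': 'val'}
print(decode_storage_value(b'{"a": 1}'))   # {'a': 1}
```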
---
## Critical: Iterator Must Call `seek_to_first()` Before Loop
`rusty_leveldb` iterators start **unpositioned**. Calling `advance()` before checking `valid()` means the loop body never executes.
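```rust
// WRONG: loop never runs
let mut iter = db.new_iter()?;
iter.advance();
while iter.valid() { /* ... */ }

// CORRECT
let mut iter = db.new_iter()?;
// Buffer types are inferred from the signature of `current`
let (mut key, mut value) = (Default::default(), Default::default());
iter.seek_to_first();
while iter.valid() {
    iter.current(&mut key, &mut value);
    // process...
    iter.advance();
}
```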
---
## Critical: Copy the LevelDB BEFORE Reading (Browser Must Be Closed)
If the browser is running during `scp` or file copy:
- The MANIFEST file may be mid-write → `Corruption: no meta-lognumber entry in descriptor`
- SSTable block CRCs will mismatch → `Corruption: checksum mismatch`
- Both `rusty_leveldb` and `plyvel` (Python) will fail
- Raw binary extraction yields only fragments (SSTable blocks split data across boundaries)
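**Fix:** Fully exit the browser (use the browser menu's Exit/Close command rather than just closing the window, and confirm in Task Manager that no `chrome.exe`, `msedge.exe` or `brave.exe` processes remain), then copy. To copy from Windows over SCP: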
```powershell
# Close Chrome/Edge/Brave first; this prints nothing once they are all closed:
Get-Process chrome, msedge, brave -ErrorAction SilentlyContinue
scp -r "$env:LOCALAPPDATA\Google\Chrome\User Data\Default\Local Extension Settings\chphlpgkkbolifaimnlloiipkdnihall" user@host:/tmp/onetab-chrome
```
If the `scp` destination already exists, the copied folder is placed *inside* it as a subdirectory named after the extension ID. Adjust `--db-path` accordingly.
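Before opening the copy, a quick sanity check can catch a copy that is obviously incomplete. The sketch below is a hypothetical helper, not part of any tool mentioned here. It copies the directory with `shutil.copytree` and checks that `CURRENT` ends with a newline and names a `MANIFEST-*` file that exists in the copy (LevelDB itself rejects a `CURRENT` file without the trailing newline). It also lists `.ldb` and `.log` files, which shows whether the data lives only in the WAL. Passing it does not prove the copy is consistent; a MANIFEST or SSTable caught mid-write will still only surface as corruption when the DB is opened.

```python
import shutil
from pathlib import Path


def snapshot_leveldb(src, dst):
    """Copy a LevelDB directory and run basic checks (hypothetical helper).

    Raises RuntimeError if CURRENT is empty, lacks its trailing newline,
    or names a MANIFEST that is missing from the copy.
    """
    src, dst = Path(src), Path(dst)
    shutil.copytree(src, dst)  # dst must not exist yet

    current = (dst / "CURRENT").read_text()
    # CURRENT holds one line such as "MANIFEST-000001\n"
    if not current.strip() or not current.endswith("\n"):
        raise RuntimeError("CURRENT is empty or truncated; re-copy with the browser closed")
    manifest = current.strip()
    if not (dst / manifest).is_file():
        raise RuntimeError(f"CURRENT names {manifest}, which is missing from the copy")

    return {
        "manifest": manifest,
        "ldb_files": sorted(p.name for p in dst.glob("*.ldb")),
        "log_files": sorted(p.name for p in dst.glob("*.log")),
    }
```

An empty `ldb_files` list with a non-empty `log_files` list is the WAL-only layout described in the next section.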
---
## Edge Case: No `.ldb` Files (Data Only in WAL)
Edge (and sometimes Chrome on first run) may not have compacted the write-ahead log (WAL) yet. The extension directory then holds a `.log` file (possibly 100KB+) next to `CURRENT` and `MANIFEST-*`, with **no `.ldb` files**.
`rusty_leveldb` handles this transparently — the WAL is replayed into the in-memory DB on open.
If you need to extract data directly from the raw WAL bytes (e.g., when tablitz's recover falls back to raw extraction):
```python
import json
import re

# The WAL file number varies (000003.log here); use the .log file in your copy.
with open('000003.log', 'rb') as f:
    data = f.read()

# Find the last "state" entry (most complete snapshot).
# The value in the WAL has the same double-encoding as in the .ldb files.
pattern = b'"{\\"tabGroups'
idx = data.rfind(pattern)
if idx == -1:
    raise SystemExit('No tabGroups snapshot found in the WAL')
text = data[idx:].decode('utf-8', errors='replace')

# Walk to the closing `"` of the outer JSON string, skipping escaped characters
esc = False
for i in range(1, len(text)):
    c = text[i]
    if esc:
        esc = False
    elif c == '\\':
        esc = True
    elif c == '"':
        outer_str = text[:i + 1]
        # Strip WAL block boundary noise (control chars, U+FFFD replacements)
        cleaned = re.sub(r'[\x00-\x08\x0b\x0c\x0e-\x1f\x7f\ufffd]', '', outer_str)
        inner = json.loads(cleaned)  # unwrap outer string
        data2 = json.loads(inner)    # parse inner JSON
        print(f"Groups: {len(data2['tabGroups'])}")
        break
```
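Control characters (`\x0f`, etc.) appear inside string values at WAL block boundaries (every 32KB), where LevelDB writes the header of the next record fragment. Strip them before parsing. This is best-effort, since some of those header bytes can also be printable characters.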
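A more robust alternative to the control-character stripping in the script above is to parse the LevelDB log format itself. Each 32KB block holds records with a 7-byte header (4-byte masked CRC32C, 2-byte little-endian length, 1-byte type), and a write that spans blocks is split into FIRST/MIDDLE/LAST fragments that can be concatenated back together. Each reassembled record is a write batch (8-byte sequence number, 4-byte count, then tagged key/value entries). The sketch below follows LevelDB's `doc/log_format.md` and `db/write_batch.cc`. It does **not** verify CRCs (the Python standard library has no CRC32C), it may raise on a corrupted batch, and the function names are hypothetical.

```python
import struct

BLOCK_SIZE = 32768
HEADER_SIZE = 7  # masked crc32c (4) + length (2) + type (1)
FULL, FIRST, MIDDLE, LAST = 1, 2, 3, 4


def read_log_records(data: bytes):
    """Yield reassembled records from a LevelDB .log file (CRCs not checked)."""
    pending = None
    for block_start in range(0, len(data), BLOCK_SIZE):
        block = data[block_start:block_start + BLOCK_SIZE]
        pos = 0
        # A block tail shorter than a header is zero padding.
        while pos + HEADER_SIZE <= len(block):
            _crc, length, rtype = struct.unpack_from("<IHB", block, pos)
            start = pos + HEADER_SIZE
            if rtype == 0 or start + length > len(block):
                break  # zero-filled or torn/corrupt: skip rest of block
            fragment = block[start:start + length]
            pos = start + length
            if rtype == FULL:
                pending = None
                yield fragment
            elif rtype == FIRST:
                pending = bytearray(fragment)
            elif rtype in (MIDDLE, LAST) and pending is not None:
                pending += fragment
                if rtype == LAST:
                    yield bytes(pending)
                    pending = None
            else:
                pending = None  # orphan fragment or unknown type: drop it
    # An unfinished FIRST/MIDDLE chain at the end (torn write) is discarded.


def _varint32(buf, pos):
    """Decode a LevelDB varint32 at buf[pos]; return (value, new_pos)."""
    result = shift = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if b < 0x80:
            return result, pos
        shift += 7


def parse_write_batch(rec: bytes):
    """Yield (sequence, key, value) per entry; value is None for deletions."""
    seq, count = struct.unpack_from("<QI", rec, 0)
    pos = 12
    for i in range(count):
        tag = rec[pos]
        pos += 1
        klen, pos = _varint32(rec, pos)
        key = rec[pos:pos + klen]
        pos += klen
        if tag == 1:  # value entry
            vlen, pos = _varint32(rec, pos)
            value = rec[pos:pos + vlen]
            pos += vlen
        else:  # tag 0: deletion entry, no value
            value = None
        yield seq + i, key, value
```

To use it, pass the bytes of the `.log` file to `read_log_records`, feed each record to `parse_write_batch`, and keep the last non-`None` value for the key of interest (`b"state"` for OneTab, per the comment in the script above). Entries come back in write order, so the last one is the newest. That value still needs the double-decoding described earlier.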
---
## Rust Crate Recommendation
Use `rusty_leveldb` (pure Rust, no C deps) over `plyvel` (Python C bindings) or `leveldb` (C++ FFI). It handles the WAL replay and compaction transparently, and works on any platform without needing LevelDB system libraries.
```toml
[dependencies]
rusty-leveldb = "1"
```