@CodyEngel
Last active December 30, 2025 12:18
For situations where you want to bulk check if "redacted" files are actually redacted...

Check Redaction

This software shouldn't be necessary. The only reason it exists is to bulk check whether redactions were made improperly. It doesn't do any wizardry or voodoo magic: it simply extracts the text from each PDF and checks whether any of that text sits under a black rectangle. You can do all of this manually by viewing a PDF, copying the text, and pasting it into a text editor of your choice.
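The idea can be sketched without any PDF library. The snippet below is a minimal illustration, not the script itself: `Box`, `overlaps`, and `text_under_boxes` are hypothetical names, and the coordinates are made up. The script gets real boxes from PyMuPDF and relies on that library's own rectangle intersection check.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle: (x0, y0) is top-left, (x1, y1) is bottom-right."""
    x0: float
    y0: float
    x1: float
    y1: float


def overlaps(a: Box, b: Box) -> bool:
    """True when the two boxes share a region with non-zero area."""
    return min(a.x1, b.x1) > max(a.x0, b.x0) and min(a.y1, b.y1) > max(a.y0, b.y0)


def text_under_boxes(spans, black_boxes):
    """Return the text of every span whose box overlaps any black box."""
    return [text for text, box in spans if any(overlaps(box, r) for r in black_boxes)]


# Hypothetical page: two text spans and one black rectangle drawn over the second.
spans = [
    ("Case No.: ST-20-CV-14", Box(72, 100, 300, 112)),
    ("a name that should be hidden", Box(72, 130, 300, 142)),
]
black_boxes = [Box(70, 128, 310, 144)]

print(text_under_boxes(spans, black_boxes))
```

If the text is still present in the file, drawing a rectangle on top of it hides nothing from a text extractor, which is all this check relies on.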

Usage

This assumes the script's file name is lowercase. Gists sort files by ASCII order, so uppercase names come first; to keep this gist from being titled "LICENSE.md", the file name had to be capitalized.

# Scan current directory (where script is saved)
python check_redaction.py

# Scan specific directory
python check_redaction.py --directory /path/to/pdfs

# Scan recursively
python check_redaction.py --recursive

# Scan specific directory recursively
python check_redaction.py --directory /path/to/pdfs --recursive
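The difference between the plain and `--recursive` scans comes down to `Path.glob` versus `Path.rglob`. The sketch below shows that difference on a throwaway directory. `find_pdfs` is a hypothetical helper, not part of the script, and it deliberately differs in one way: it compares the extension case-insensitively, since a `"*.pdf"` pattern on a case-sensitive file system would likely skip a file named `B.PDF`.

```python
import tempfile
from pathlib import Path


def find_pdfs(scan_dir: Path, recursive: bool = False) -> list:
    """List PDFs in scan_dir, matching the extension case-insensitively."""
    candidates = scan_dir.rglob("*") if recursive else scan_dir.glob("*")
    return sorted(p for p in candidates if p.is_file() and p.suffix.lower() == ".pdf")


# Build a small temporary tree: two PDFs at the top level, one in a subfolder.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    for name in ("a.pdf", "B.PDF", "notes.txt", "sub/c.pdf"):
        (root / name).touch()

    top_level = [p.name for p in find_pdfs(root)]
    everything = [p.name for p in find_pdfs(root, recursive=True)]

print(top_level)   # top-level PDFs only
print(everything)  # also includes sub/c.pdf
```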

Setup Instructions

1. Install Python

macOS:

# Install Homebrew if not already installed
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# Install Python
brew install python

Linux (Ubuntu/Debian):

sudo apt update
sudo apt install python3 python3-pip

Windows: Download and install from python.org

  • Check "Add Python to PATH" during installation

2. Verify Python Installation

python3 --version
# Should show: Python 3.x.x

3. Install Required Dependency

pip3 install PyMuPDF
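To confirm the install worked before running a scan, a quick check along these lines may help. It is only a sketch: `has_module` is a hypothetical helper, and the check only tells you that a module named `fitz` (the name the script imports) can be found, not which version it is.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module called `name` can be located."""
    return importlib.util.find_spec(name) is not None


if has_module("fitz"):
    print("A module named 'fitz' is importable; the script should be able to load PyMuPDF.")
else:
    print("No 'fitz' module found; try: pip3 install PyMuPDF")
```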

4. Save the Script

Save the script as check_redaction.py in your desired location.

5. Make it Executable (Optional - macOS/Linux)

chmod +x check_redaction.py

6. Run the Script

From script directory:

python3 check_redaction.py

Scan specific directory:

python3 check_redaction.py --directory ~/Documents/epstein/dev-test/VOL00008/IMAGES/0001

Scan recursively:

python3 check_redaction.py --directory ~/Documents/epstein/dev-test --recursive

Example Output

Given a PDF that wasn't properly redacted, you can expect a text file with the following details:

  1. REDACTED WORDS: all of the text that was found under black rectangles, in the order it was found. This is raw output intended to help you quickly scan a file for unredacted redactions.
  2. FULL TEXT: the full text of the document in its unredacted form.
*-unredacted.txt

REDACTED WORDS: ['JURY TRIAL DEMANDED', 'SECOND AMENDED COMPLAINT', 'Epstein’s real property (as laid out below); and of Financial Strategy Group, Ltd.; Financial', 'Epstein’s real property (as laid out below); and of Financial Strategy Group, Ltd.; Financial', 'Trust, Inc.; FT Real Estate Inc.; Gratitude America, Inc.; Hyperion Air, Inc.; J. Epstein Virgin', 'Trust, Inc.; FT Real Estate Inc.; Gratitude America, Inc.; Hyperion Air, Inc.; J. Epstein Virgin', '-', '-', 'As of October 23, 2007, Indyke was listed as President of the Foundation. He', 'also was a signatory on the Foundation’s checking accounts.', 'Between September 2015 and June 2019, Indyke signed Foundation account', 'checks for over $400,000 made payable to young female models and actresses, including a', 'former Russian model who received over $380,000 through monthly payments of $8,333 made', '■', 'In November 2017, Indyke signed a Foundation check made payable to the', 'immigration lawyer in New York who was involved in one or more forced marriages arranged', 'among Epstein’s victims to secure a victim’s immigration status. The check’s memo line', 'references the former Russian model’s last name.', 'JSC Interiors, LLC is a New York Limited Liability Company, the Articles of', 'Organization of which were filed in November 2014. The Articles list JSC, who was forced and', 'coerced to have sex with Epstein, as the company’s sole owner.  JSC was manipulated,', 'coerced to have sex with Epstein, as the company’s sole owner.  JSC was manipulated,', 'According to JSC’s operating agreement, Kahn was to be the initial Manager of', 'for JSC’s bank accounts.', '1111', '1111', '•', '•', '•', 'One of JSC’s bank accounts was funded entirely with money transferred from', 'JSC’s payroll was paid to two persons, one of whom was the listed sole owner.', 'Kahn gave conflicting reports to JSC’s bank about the second person on the company’s payroll', 'and the reasons for its payments to her. 
Once, he described her as an interior designer, which', 'would justify the payments in light of JSC’s purported line of business, but which appears to', 'have been false. The other time, Kahn described this payroll recipient as a dentist, which would', 'not justify JSC Interiors’ payments to her, but which appears to be true.', '1111', '1111', '1111', '1111', '-', '-', 'different bank totaling almost $50,000 between November 2016 and July 2019 (just before', 'Epstein’s arrest) to women with Eastern European surnames, including one known to have', 'recruited young women and girls for Epstein.', '-', '-', 'which Indyke had signatory authority, someone acting on Epstein’s behalf made a total of 21', 'which Indyke had signatory authority, someone acting on Epstein’s behalf made a total of 21', 'separate withdrawals each in the amount of $1,000 on every but one business day from April 9,', '2019 to May 8, 2019.', 'Payments from this account totaling almost $60,000 were transferred by wire to', 'young women mostly at foreign beneficiary banks in February and March 2016.', 'From 2011 to 2019, Epstein and Epstein-owned entities paid over $16 million', 'net to Defendant/Co-Executor Indyke, and over $10 million net to Defendant/Co-Executor', 'net to Defendant/Co-Executor Indyke, and over $10 million net to Defendant/Co-Executor', 'Kahn.  This includes loans that are still outstanding to Indyke- and Kahn-related entities.  Based', 'on records obtained so far, tax forms provided by Epstein entities did not report nearly the full', 'compensation to Indyke and Kahn.', '-', '■', 'F. The Epstein Enterprise Used Corporate Entities to Defraud the Government', 'and Fund its Criminal Activities', '1. 
Defendant Southern Trust Company, Inc.', 'above whom Kahn represented to be, alternatively, an interior designer and a dentist, was', 'paid employee of Southern Trust Company, which did not actually or even pretend to perform', 'either interior design or dentistry services, in 2019.', 'year ended December 31, 2018, despite it paying $106,394.60 in Santa Fe property taxes on', 'November 6, 2018.', '$29,736 and expenses of $150, despite it paying $55,770.41 and $113,679.56 in Santa Fe', 'property taxes during 2017.', 'ended December 31, 2018, despite it paying $336,471.87 in New York City property taxes', 'during 2018.', '-', '$18,281 and expenses of $150, despite it paying $327,497.48 and $6,487.04 in New York City', 'property taxes during 2017.', 'for the year ended December 31, 2018, despite it paying $196,673.56 in Palm Beach property', 'taxes on November 6, 2018.', '$37,129 and expenses of $150, despite it paying $191,941.52 in Palm Beach property taxes on', 'October 31, 2017.', 'Defendants also attempted to conceal their criminal sex trafficking and abuse', 'conduct by paying large sums of money to participant-witnesses, including by paying for their', 'attorneys’ fees and case costs in litigation related to this conduct.', 'Epstein also threatened harm to victims and helped release damaging stories', 'about them to damage their credibility when they tried to go public with their stories of being', 'trafficked and sexually abused.', 'Epstein also instructed one or more Epstein Enterprise participant-witnesses to', 'destroy evidence relevant to ongoing court proceedings involving Defendants’ criminal sex', 'trafficking and abuse conduct.', 'Dated: February 10, 2021', '/s/ Carol Thomas-Jacobs']

FULL TEXT:

Government Exhibit 1 
 
I~ THE SUPERIOR COURT 
OF THE VIRGIN ISLANDS 
FILED 
March 17, 2022 06:09 eM 
ST-2021-RV-OOOOS 
TAMARA CHARLES 
CLERK OF THE COURT 

IN THE SUPERIOR COURT OF THE VIRGIN ISLANDS 
DIVISION OF ST. THOMAS AND ST. JOHN 
******************************** 
 
GOVERNMENT OF THE UNITED STATES 
 
VIRGIN ISLANDS,  

Case No.:  ST-20-CV-14 

PLAINTIFF,  
 
...wayyyyyyyyyy too much text to be practical in a readme file. You should get the gist, though: it's the text from the
pdf, and in situations where the text behind the black box still exists, it will be here so you can read the entirety of
the file.
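Because the script writes the REDACTED WORDS line as a Python list literal, a report can be read back for further processing. The sketch below assumes the layout the script produces (a `REDACTED WORDS: [...]` line, a blank line, then `FULL TEXT:`). `parse_report` is a hypothetical helper, and the sample string is a shortened stand-in for a real report.

```python
import ast


def parse_report(report_text: str):
    """Split a *-unredacted.txt report into (flagged_spans, full_text)."""
    header, _, rest = report_text.partition("\n\n")
    prefix = "REDACTED WORDS: "
    if not header.startswith(prefix):
        raise ValueError("not a check_redaction report")
    # The list was written with Python's repr, so literal_eval can read it safely.
    flagged = ast.literal_eval(header[len(prefix):])
    # Drop repeats while keeping first-seen order.
    unique = list(dict.fromkeys(flagged))
    full_text = rest.removeprefix("FULL TEXT:\n")
    return unique, full_text


sample = (
    "REDACTED WORDS: ['-', '-', 'Dated: February 10, 2021']\n\n"
    "FULL TEXT:\nGovernment Exhibit 1\n"
)
words, text = parse_report(sample)
print(words)
print(text)
```

Removing duplicates is optional; the raw list repeats an entry when one span is reported more than once.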
#!/usr/bin/env python3
import sys
import argparse
from pathlib import Path

import fitz  # PyMuPDF


def check_improper_redaction(pdf_path):
    """Detect if text exists under black rectangles."""
    try:
        doc = fitz.open(pdf_path)
        improper_redactions = []
        full_text = []

        for page_num, page in enumerate(doc):
            # Get all text with coordinates
            text_instances = page.get_text("dict")

            # Get all drawing operations (rectangles)
            drawings = page.get_drawings()

            # Find black-filled rectangles
            black_rects = []
            for drawing in drawings:
                if drawing.get("fill"):  # Has fill color
                    fill_color = drawing.get("fill")
                    # Check if black or very dark (RGB close to 0)
                    if all(c < 0.1 for c in fill_color[:3]):
                        black_rects.append(drawing["rect"])

            # Extract full page text for context
            full_text.append(page.get_text())

            # Check if any text falls within black rectangles
            for block in text_instances.get("blocks", []):
                if block.get("type") == 0:  # Text block
                    for line in block.get("lines", []):
                        for span in line.get("spans", []):
                            text = span.get("text", "").strip()
                            if text:
                                bbox = fitz.Rect(span["bbox"])
                                # Check overlap with black rectangles
                                # (a span is recorded once per rectangle it overlaps)
                                for rect in black_rects:
                                    if bbox.intersects(rect):
                                        improper_redactions.append(text)

        doc.close()

        if improper_redactions:
            # Save extracted improperly redacted text
            output_path = pdf_path.parent / f"{pdf_path.stem}-unredacted.txt"
            # Write as UTF-8 so non-ASCII characters don't fail on platforms
            # whose default encoding can't represent them
            with open(output_path, 'w', encoding='utf-8') as f:
                f.write(f"REDACTED WORDS: {improper_redactions}\n\n")
                f.write("FULL TEXT:\n")
                f.write("\n".join(full_text))
            print(f"⚠️ {pdf_path.name}: Found {len(improper_redactions)} improper redactions → {output_path.name}")
            return True
        else:
            print(f"✓ {pdf_path.name}: No improper redactions detected")
            return False
    except Exception as e:
        print(f"✗ Error processing {pdf_path.name}: {e}")
        return False


def main():
    parser = argparse.ArgumentParser(
        description='Scan PDF files for improper redactions (text under black rectangles)'
    )
    parser.add_argument(
        '--directory',
        type=str,
        default=None,
        help='Directory to scan (default: script location)'
    )
    parser.add_argument(
        '--recursive',
        action='store_true',
        help='Recursively scan subdirectories'
    )
    args = parser.parse_args()

    # Determine directory to scan
    if args.directory:
        scan_dir = Path(args.directory).expanduser()
    else:
        scan_dir = Path(__file__).parent

    if not scan_dir.exists():
        print(f"✗ Directory not found: {scan_dir}")
        sys.exit(1)

    # Find all PDF files
    if args.recursive:
        pdf_files = list(scan_dir.rglob("*.pdf"))
    else:
        pdf_files = list(scan_dir.glob("*.pdf"))

    if not pdf_files:
        print(f"No PDF files found in {scan_dir}")
        sys.exit(0)

    print(f"Scanning {len(pdf_files)} PDF file(s) in {scan_dir}")
    print(f"Recursive: {args.recursive}\n")

    # Process each PDF
    total_improper = 0
    for pdf_file in pdf_files:
        if check_improper_redaction(pdf_file):
            total_improper += 1

    print(f"\n{'=' * 80}")
    print(f"Summary: {total_improper} file(s) with improper redactions out of {len(pdf_files)} total")


if __name__ == "__main__":
    main()
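
The script flags a span as soon as its box intersects a dark rectangle at all, so a line of text that merely brushes the edge of a black box may be reported too. One possible refinement, which is not part of the script above, is to require that a minimum share of the span's area be covered. The sketch below uses plain `(x0, y0, x1, y1)` tuples; `overlap_fraction`, `is_covered`, and the 0.5 threshold are all assumptions for illustration.

```python
def overlap_fraction(span_box, rect_box) -> float:
    """Fraction of span_box's area covered by rect_box; boxes are (x0, y0, x1, y1)."""
    sx0, sy0, sx1, sy1 = span_box
    rx0, ry0, rx1, ry1 = rect_box
    width = min(sx1, rx1) - max(sx0, rx0)
    height = min(sy1, ry1) - max(sy0, ry0)
    span_area = (sx1 - sx0) * (sy1 - sy0)
    if width <= 0 or height <= 0 or span_area <= 0:
        return 0.0
    return (width * height) / span_area


def is_covered(span_box, rect_box, min_fraction: float = 0.5) -> bool:
    """True when at least min_fraction of the span lies under the rectangle."""
    return overlap_fraction(span_box, rect_box) >= min_fraction


span = (72, 100, 300, 112)        # hypothetical text span
grazing = (70, 110, 310, 130)     # black box that only clips the bottom edge
covering = (70, 98, 310, 114)     # black box drawn fully over the span

print(is_covered(span, grazing))
print(is_covered(span, covering))
```

A threshold like this trades false positives for the risk of missing a partially covered word, so it would need tuning against real documents.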

EPSTEIN DOCUMENT TRANSPARENCY LICENSE v1.0

Permission is hereby granted, free of charge, to any person or organization to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of this software, EXCEPT:

GOVERNMENT ENTITIES AND INDIVIDUALS WORKING IN THE PUBLIC SECTOR must pay a licensing fee of $420 per document scanned. Failure to disclose usage will result in a retroactive licensing fee of $28,980 per document scanned.

Payment can be made to: The Epstein Victims’ Compensation Fund

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY ARISING FROM THE USE OF THIS SOFTWARE, INCLUDING BUT NOT LIMITED TO FINDING OUT THINGS YOU WISH YOU HADN'T.

By using this software, you acknowledge that transparency and accountability are not optional, and that black rectangles over text in PDFs are the legal equivalent of a child covering their eyes and saying "you can't see me."

SPECIAL EXEMPTION: The FBI, DOJ, and Donald J. Trump (no JUNIORS) may use this software for free, but only after publicly releasing all unredacted Epstein-related documents in their possession in accordance with The Epstein Files Transparency Act, H.R.4405.
