@AndrewAltimit
Last active August 21, 2025 00:50

URL fetcher with imported cookies

A Python script for fetching content from any URL, with automatic browser-cookie loading, authentication support, and configurable SSL verification.

Features

  • Universal URL Fetching: Works with any HTTP/HTTPS URL
  • Automatic Cookie Loading: Load cookies from Chrome, Firefox, and Waterfox browsers (macOS)
  • Authentication Support: Basic auth and custom headers
  • SSL Options: Disable SSL verification for development/testing
  • Confluence Integration: Special support for Confluence REST API
  • File Output: Save content to files
  • JSON Output: Machine-readable response format
  • Comprehensive Error Handling: Detailed error reporting and debugging

Installation

Requirements

  • Python 3.6+
  • macOS (for automatic cookie loading)

Dependencies

pip install requests

Download

Save the script as url_fetcher.py and make it executable:

chmod +x url_fetcher.py

Quick Start

# Basic URL fetch
python url_fetcher.py https://example.com

# Fetch with cookies from Chrome
python url_fetcher.py https://intranet.company.com --cookies chrome

# Fetch with authentication
python url_fetcher.py https://api.example.com -u username -p password

# Save to file
python url_fetcher.py https://example.com -o output.html

Usage

Basic Syntax

python url_fetcher.py <URL> [OPTIONS]

Command Line Options

Option            Description
---------------   -----------------------------------------------------------
URL               The URL to fetch (required)
-u, --username    Username for authentication
-p, --password    Password or API token
-o, --output      Save content to file
-H, --header      Custom header (format: "Key: Value"); may be repeated
-t, --timeout     Request timeout in seconds (default: 30)
--no-ssl-verify   Disable SSL certificate verification
--cookies         Load cookies from browser (chrome, firefox, waterfox, auto)
--json            Output response as JSON
--confluence      Use Confluence API format

Examples

Basic Usage

# Simple GET request
python url_fetcher.py https://httpbin.org/get

# Fetch and save to file
python url_fetcher.py https://example.com -o page.html

# Get response as JSON
python url_fetcher.py https://api.github.com/users/octocat --json

Authentication

# Basic authentication
python url_fetcher.py https://protected-site.com -u admin -p secret123

# API token authentication
python url_fetcher.py https://api.example.com -u user -p api_token_here

# Custom authorization header
python url_fetcher.py https://api.example.com -H "Authorization: Bearer token123"

Cookie Loading (macOS only)

# Load cookies from Chrome
python url_fetcher.py https://intranet.company.com --cookies chrome

# Load cookies from Firefox
python url_fetcher.py https://confluence.company.com --cookies firefox

# Try all available browsers
python url_fetcher.py https://site.com --cookies auto

SSL and Development

# Disable SSL verification for self-signed certificates
python url_fetcher.py https://localhost:8443/api --no-ssl-verify

# Development server with cookies
python url_fetcher.py https://dev.company.com --cookies chrome --no-ssl-verify

Confluence Integration

# Fetch Confluence page using REST API
python url_fetcher.py https://company.atlassian.net/wiki/spaces/DOCS/pages/123456 \
  --confluence -u email@company.com -p api_token

# With cookies (if already logged in)
python url_fetcher.py https://confluence.company.com/pages/123456 \
  --confluence --cookies chrome
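Under the hood, --confluence rewrites the page URL into a Confluence REST API request. A minimal sketch of that transformation, mirroring how the script derives the base URL and trailing page ID (helper name is illustrative, not part of the script):

```python
from urllib.parse import urlparse

def confluence_api_url(page_url: str) -> str:
    """Rebuild a Confluence page URL as a REST API content request:
    scheme://host plus the trailing path segment as the page ID."""
    parsed = urlparse(page_url)
    base_url = f"{parsed.scheme}://{parsed.netloc}"
    page_id = parsed.path.rstrip('/').split('/')[-1]
    return f"{base_url}/rest/api/content/{page_id}?expand=body.storage,version"

print(confluence_api_url("https://confluence.company.com/pages/123456"))
# → https://confluence.company.com/rest/api/content/123456?expand=body.storage,version
```

Note that only the trailing path segment is used, so the numeric page ID must be the last component of the URL.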

Advanced Usage

# Multiple custom headers
python url_fetcher.py https://api.example.com \
  -H "Accept: application/json" \
  -H "User-Agent: MyApp/1.0" \
  --json

# Complete example with all options
python url_fetcher.py https://secure-api.company.com/data \
  -u apiuser -p secret123 \
  --cookies chrome \
  --no-ssl-verify \
  -H "Accept: application/json" \
  -t 60 \
  -o response.json \
  --json
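Repeated -H options are each split on the first colon only, so header values that themselves contain colons (such as "Authorization: Bearer abc:def") survive intact. A sketch of that parsing (function name is illustrative):

```python
def parse_headers(raw_headers):
    """Split each "Key: Value" string on the first colon only,
    mirroring how the script handles repeated -H options."""
    headers = {}
    for header in raw_headers or []:
        if ':' in header:
            key, value = header.split(':', 1)
            headers[key.strip()] = value.strip()
    return headers

print(parse_headers(["Accept: application/json", "User-Agent: MyApp/1.0"]))
# → {'Accept': 'application/json', 'User-Agent': 'MyApp/1.0'}
```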

Cookie Support

The script automatically loads cookies from browser databases on macOS:

Supported Browsers

  • Google Chrome: ~/Library/Application Support/Google/Chrome/Default/Cookies
  • Firefox: ~/Library/Application Support/Firefox/Profiles/*/cookies.sqlite
  • Waterfox: ~/Library/Application Support/Waterfox/Profiles/*/cookies.sqlite

Cookie Features

  • Automatically finds default browser profiles
  • Filters out expired cookies
  • Safely handles locked databases (creates temporary copies)
  • Works with complex authentication systems
  • Supports session persistence
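Expiry filtering differs per browser: Firefox and Waterfox store expiry as Unix seconds, while Chrome stores microseconds since 1601-01-01 (the Windows epoch). A small sketch of the conversion the script relies on:

```python
# Microseconds between 1601-01-01 and 1970-01-01
CHROME_EPOCH_OFFSET_US = 11_644_473_600_000_000

def unix_to_chrome(unix_seconds: int) -> int:
    """Convert a Unix timestamp to Chrome's cookie timestamp format."""
    return CHROME_EPOCH_OFFSET_US + unix_seconds * 1_000_000

def chrome_to_unix(chrome_us: int) -> int:
    """Convert a Chrome cookie timestamp back to Unix seconds."""
    return (chrome_us - CHROME_EPOCH_OFFSET_US) // 1_000_000

print(chrome_to_unix(unix_to_chrome(1_700_000_000)))
# → 1700000000
```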

Cookie Usage Tips

  1. Auto Mode: Use --cookies auto to try all available browsers
  2. Corporate Networks: Great for accessing intranet sites you're already logged into
  3. Development: Perfect for testing authenticated endpoints
  4. Session Management: Maintains complex login states automatically

Output Formats

Standard Output

✅ Successfully fetched https://example.com
Status: 200
Content length: 1234 characters
Encoding: utf-8

==================================================
CONTENT:
==================================================
<!DOCTYPE html>...

JSON Output

{
  "url": "https://example.com",
  "status_code": 200,
  "success": true,
  "headers": {
    "content-type": "text/html"
  },
  "content": "<!DOCTYPE html>...",
  "content_length": 1234,
  "encoding": "utf-8"
}
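The JSON output is easy to feed into other tools. For example, a short Python consumer of the structure shown above (field names as in the sample):

```python
import json

# Sample --json output, as shown above
raw = '''{
  "url": "https://example.com",
  "status_code": 200,
  "success": true,
  "headers": {"content-type": "text/html"},
  "content": "<!DOCTYPE html>...",
  "content_length": 1234,
  "encoding": "utf-8"
}'''

result = json.loads(raw)
if result["success"]:
    print(f"{result['url']} -> {result['status_code']}")
# → https://example.com -> 200
```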

Error Handling

The script provides detailed error information:

❌ Failed to fetch https://invalid-url.com
Error: HTTPSConnectionPool(host='invalid-url.com', port=443): 
Max retries exceeded with url: / (Caused by NameResolutionError(...))

Common error types:

  • Connection errors: Network issues, invalid URLs
  • Authentication errors: Wrong credentials, expired tokens
  • SSL errors: Certificate issues, self-signed certs
  • Timeout errors: Slow responses, unresponsive servers
  • HTTP errors: 404, 500, etc.
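In --json mode these surface through the error_type field, which carries the requests exception class name. A sketch of routing on that field (the mapping itself is illustrative; the class names are standard requests exceptions):

```python
# Map requests exception class names (as reported in the JSON
# "error_type" field) to the categories listed above.
ERROR_CATEGORIES = {
    'ConnectionError': 'Connection errors',
    'SSLError': 'SSL errors',
    'Timeout': 'Timeout errors',
    'ConnectTimeout': 'Timeout errors',
    'ReadTimeout': 'Timeout errors',
    'HTTPError': 'HTTP errors',
}

def categorize(error_type: str) -> str:
    """Return a human-readable category for a reported error type."""
    return ERROR_CATEGORIES.get(error_type, 'Other request errors')

print(categorize('ConnectTimeout'))
# → Timeout errors
```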

Security Considerations

SSL Verification

  • Default: SSL verification is enabled
  • Development: Use --no-ssl-verify only for trusted development environments
  • Warning: Disabling SSL verification removes protection against man-in-the-middle attacks

Cookie Security

  • Cookies are loaded from local browser databases
  • Script creates temporary copies to avoid conflicts
  • Only non-expired cookies are loaded
  • Secure and HttpOnly flags are preserved

Authentication

  • Passwords can be provided via environment variables
  • Use API tokens instead of passwords when possible
  • Credentials are not logged or stored
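The script checks the URL_FETCHER_PASSWORD environment variable before falling back to an interactive prompt. The lookup order can be sketched as (helper name is illustrative):

```python
import os

def resolve_password(cli_password=None):
    """Resolve a password the way the script does: the -p flag wins,
    then the URL_FETCHER_PASSWORD environment variable; None means
    the script will fall back to an interactive getpass prompt."""
    return cli_password or os.getenv('URL_FETCHER_PASSWORD')

os.environ['URL_FETCHER_PASSWORD'] = 'token-from-env'
print(resolve_password())        # → token-from-env
print(resolve_password('cli'))   # → cli
```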

Troubleshooting 🔧

Common Issues

"No cookies found"

# Check which browsers are available
python url_fetcher.py --cookies auto https://example.com

SSL Certificate errors

# Disable SSL verification
python url_fetcher.py https://site.com --no-ssl-verify

Permission denied (cookies)

# Close browser and try again, or use different browser
python url_fetcher.py https://site.com --cookies firefox

Timeout errors

# Increase timeout
python url_fetcher.py https://slow-site.com -t 120

Debug Tips

  1. Use --json for machine-readable output
  2. Check browser cookie locations manually
  3. Verify URLs are accessible in browser first
  4. Test with simple URLs before complex ones
#!/usr/bin/env python3
"""
URL Fetcher Script
Fetches content from any URL, including Confluence pages.
Supports authentication for protected resources.
"""
import argparse
import json
import os
import platform
import shutil
import sqlite3
import sys
import tempfile
import time
from pathlib import Path
from urllib.parse import urlparse

import requests
from requests.auth import HTTPBasicAuth


class URLFetcher:
    def __init__(self, timeout=30, verify_ssl=True, load_cookies=None):
        self.session = requests.Session()
        self.timeout = timeout
        self.verify_ssl = verify_ssl
        # Disable SSL warnings if verification is disabled
        if not verify_ssl:
            import urllib3
            urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
        # Load cookies if requested
        if load_cookies:
            self.load_browser_cookies(load_cookies)

    def get_browser_cookie_paths(self):
        """Get default cookie database paths for different browsers on macOS."""
        home = Path.home()
        paths = {
            'chrome': home / 'Library/Application Support/Google/Chrome/Default/Cookies',
            'firefox': self._find_firefox_cookies(home),
            'waterfox': self._find_waterfox_cookies(home),
        }
        # Filter out None values and non-existent paths
        return {k: v for k, v in paths.items() if v and v.exists()}

    def _find_firefox_cookies(self, home):
        """Find Firefox cookies.sqlite in the default profile."""
        firefox_dir = home / 'Library/Application Support/Firefox/Profiles'
        if not firefox_dir.exists():
            return None
        # Look for the default profile (usually ends with .default or .default-release)
        for profile_dir in firefox_dir.iterdir():
            if profile_dir.is_dir() and '.default' in profile_dir.name:
                cookies_path = profile_dir / 'cookies.sqlite'
                if cookies_path.exists():
                    return cookies_path
        return None

    def _find_waterfox_cookies(self, home):
        """Find Waterfox cookies.sqlite in the default profile."""
        waterfox_dir = home / 'Library/Application Support/Waterfox/Profiles'
        if not waterfox_dir.exists():
            return None
        # Look for the default profile
        for profile_dir in waterfox_dir.iterdir():
            if profile_dir.is_dir() and '.default' in profile_dir.name:
                cookies_path = profile_dir / 'cookies.sqlite'
                if cookies_path.exists():
                    return cookies_path
        return None

    def load_browser_cookies(self, browser_name):
        """Load cookies from the specified browser."""
        if platform.system() != 'Darwin':
            print("⚠️ Cookie loading currently only supports macOS")
            return
        cookie_paths = self.get_browser_cookie_paths()
        if browser_name == 'auto':
            # Try all available browsers
            for browser, path in cookie_paths.items():
                print(f"🍪 Loading cookies from {browser}...")
                try:
                    self._load_cookies_from_db(path, browser)
                    print(f"✅ Loaded cookies from {browser}")
                except Exception as e:
                    print(f"❌ Failed to load cookies from {browser}: {e}")
        else:
            if browser_name not in cookie_paths:
                available = list(cookie_paths.keys())
                print(f"❌ {browser_name} cookies not found. Available: {available}")
                return
            path = cookie_paths[browser_name]
            try:
                self._load_cookies_from_db(path, browser_name)
                print(f"✅ Loaded cookies from {browser_name}")
            except Exception as e:
                print(f"❌ Failed to load cookies from {browser_name}: {e}")

    def _load_cookies_from_db(self, db_path, browser_type):
        """Load cookies from a browser SQLite database."""
        # Create a temporary copy since browsers may lock the database
        with tempfile.NamedTemporaryFile(suffix='.sqlite', delete=False) as tmp_file:
            shutil.copy2(db_path, tmp_file.name)
            temp_db_path = tmp_file.name
        try:
            conn = sqlite3.connect(temp_db_path)
            cursor = conn.cursor()
            if browser_type == 'chrome':
                # Chrome/Chromium cookie schema
                query = """
                    SELECT host_key, name, value, path, expires_utc, is_secure, is_httponly
                    FROM cookies
                    WHERE expires_utc > ? OR expires_utc = 0
                """
                # Current time in Chrome's epoch (microseconds since 1601-01-01)
                chrome_epoch_start = 11644473600000000  # microseconds
                current_time = chrome_epoch_start + (int(time.time()) * 1000000)
                cursor.execute(query, (current_time,))
            elif browser_type in ('firefox', 'waterfox'):
                # Firefox/Waterfox cookie schema
                query = """
                    SELECT host, name, value, path, expiry, isSecure, isHttpOnly
                    FROM moz_cookies
                    WHERE expiry > ? OR expiry = 0
                """
                current_time = int(time.time())
                cursor.execute(query, (current_time,))
            else:
                conn.close()
                raise ValueError(f"Unsupported browser type: {browser_type}")
            cookies = cursor.fetchall()
            conn.close()
            # Add cookies to the session
            for cookie_data in cookies:
                host, name, value, path, expires, secure, httponly = cookie_data
                # Chrome prefixes domain cookies with a leading dot; strip it
                domain = host.lstrip('.') if browser_type == 'chrome' else host
                self.session.cookies.set(
                    name=name,
                    value=value,
                    domain=domain,
                    path=path or '/',
                    secure=bool(secure),
                    rest={'HttpOnly': bool(httponly)} if httponly else None,
                )
        finally:
            # Clean up the temporary file
            try:
                os.unlink(temp_db_path)
            except OSError:
                pass

    def fetch_url(self, url, auth=None, headers=None, save_to_file=None):
        """
        Fetch content from a URL.

        Args:
            url (str): The URL to fetch
            auth (tuple): Optional (username, password) for basic auth
            headers (dict): Optional custom headers
            save_to_file (str): Optional filename to save content

        Returns:
            dict: Response data including status, content, etc.
        """
        try:
            # Set default headers
            default_headers = {
                'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
            }
            if headers:
                default_headers.update(headers)
            # Make the request
            response = self.session.get(
                url,
                headers=default_headers,
                auth=HTTPBasicAuth(*auth) if auth else None,
                timeout=self.timeout,
                allow_redirects=True,
                verify=self.verify_ssl,
            )
            # Prepare the result
            result = {
                'url': url,
                'status_code': response.status_code,
                'success': response.status_code == 200,
                'headers': dict(response.headers),
                'content': response.text,
                'content_length': len(response.content),
                'encoding': response.encoding,
            }
            # Save to file if requested
            if save_to_file and result['success']:
                with open(save_to_file, 'w', encoding='utf-8') as f:
                    f.write(response.text)
                result['saved_to'] = save_to_file
            return result
        except requests.exceptions.RequestException as e:
            return {
                'url': url,
                'success': False,
                'error': str(e),
                'error_type': type(e).__name__,
            }

    def fetch_confluence_page(self, base_url, page_id, username=None, token=None):
        """
        Fetch a specific Confluence page using the REST API.

        Args:
            base_url (str): Confluence base URL (e.g., 'https://company.atlassian.net/wiki')
            page_id (str): The page ID
            username (str): Username for authentication
            token (str): API token or password

        Returns:
            dict: Page content and metadata
        """
        # Construct the API URL
        api_url = f"{base_url.rstrip('/')}/rest/api/content/{page_id}?expand=body.storage,version"
        auth = (username, token) if username and token else None
        headers = {'Accept': 'application/json'}
        return self.fetch_url(api_url, auth=auth, headers=headers)


def main():
    parser = argparse.ArgumentParser(description='Fetch content from any URL')
    parser.add_argument('url', help='URL to fetch')
    parser.add_argument('-u', '--username', help='Username for authentication')
    parser.add_argument('-p', '--password', help='Password or API token')
    parser.add_argument('-o', '--output', help='Save content to file')
    parser.add_argument('-H', '--header', action='append',
                        help='Custom header (format: "Key: Value")')
    parser.add_argument('-t', '--timeout', type=int, default=30,
                        help='Request timeout in seconds')
    parser.add_argument('--no-ssl-verify', action='store_true',
                        help='Disable SSL certificate verification')
    parser.add_argument('--cookies', choices=['chrome', 'firefox', 'waterfox', 'auto'],
                        help='Load cookies from browser (auto tries all available)')
    parser.add_argument('--json', action='store_true', help='Output response as JSON')
    parser.add_argument('--confluence', action='store_true', help='Use Confluence API format')
    args = parser.parse_args()

    # Parse custom headers
    headers = {}
    if args.header:
        for header in args.header:
            if ':' in header:
                key, value = header.split(':', 1)
                headers[key.strip()] = value.strip()

    # Create the fetcher
    fetcher = URLFetcher(timeout=args.timeout, verify_ssl=not args.no_ssl_verify,
                         load_cookies=args.cookies)

    # Handle authentication
    auth = None
    if args.username:
        password = args.password
        if not password:
            # Try the environment first, then prompt interactively
            password = os.getenv('URL_FETCHER_PASSWORD')
            if not password:
                import getpass
                password = getpass.getpass(f"Password for {args.username}: ")
        auth = (args.username, password)

    # Fetch the URL
    if args.confluence:
        # Extract the base URL and page ID for Confluence
        parsed = urlparse(args.url)
        base_url = f"{parsed.scheme}://{parsed.netloc}"
        page_id = parsed.path.rstrip('/').split('/')[-1]
        result = fetcher.fetch_confluence_page(base_url, page_id,
                                               args.username,
                                               auth[1] if auth else None)
    else:
        result = fetcher.fetch_url(args.url, auth=auth, headers=headers,
                                   save_to_file=args.output)

    # Output results
    if args.json:
        print(json.dumps(result, indent=2))
    else:
        if result['success']:
            print(f"✅ Successfully fetched {result['url']}")
            print(f"Status: {result['status_code']}")
            print(f"Content length: {result['content_length']} characters")
            print(f"Encoding: {result.get('encoding', 'unknown')}")
            if args.output:
                print(f"💾 Saved to: {args.output}")
            else:
                print("\n" + "=" * 50)
                print("CONTENT:")
                print("=" * 50)
                print(result['content'])
        else:
            print(f"❌ Failed to fetch {result['url']}")
            print(f"Error: {result.get('error', 'Unknown error')}")
            sys.exit(1)


if __name__ == "__main__":
    main()