Phil Snyder (philerooski)

San Francisco, CA
@philerooski
philerooski / verify_load_snapshot.py
Created February 24, 2026 19:32
A helper script to verify that data loaded by `load_snapshot_data.py` looks as expected
"""
Verify snapshot data load integrity.
This script checks:
1. LOAD_LOG table for any errors or anomalies
2. Compares tables between PROD_576 and PROD_568 schemas
3. Validates record counts (PROD_576 should have more records)
Usage:
python verify_load_snapshot.py --database SYNAPSE_RDS_SNAPSHOT \\
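Check 3 above can be sketched as a pure comparison of per-table row counts. This is a minimal illustration, not the gist's code: `compare_row_counts` is a hypothetical helper, and the assumption is that counts have already been fetched for each schema (for example from `INFORMATION_SCHEMA.TABLES.ROW_COUNT`).

```python
# Sketch only: compare per-table row counts between two snapshot schemas.
# ASSUMPTION: counts are gathered elsewhere, e.g. with a query such as
#   SELECT TABLE_NAME, ROW_COUNT FROM <database>.INFORMATION_SCHEMA.TABLES
#   WHERE TABLE_SCHEMA = 'PROD_576'
# compare_row_counts is a hypothetical helper, not part of the gist.
from typing import Dict, List


def compare_row_counts(new: Dict[str, int], old: Dict[str, int]) -> Dict[str, List[str]]:
    """Flag tables present in only one schema, and tables whose count did not grow."""
    report: Dict[str, List[str]] = {
        "missing_in_new": sorted(set(old) - set(new)),
        "missing_in_old": sorted(set(new) - set(old)),
        "not_increased": [],
    }
    for table in sorted(set(new) & set(old)):
        # The newer snapshot (PROD_576) is expected to hold strictly more records.
        if new[table] <= old[table]:
            report["not_increased"].append(table)
    return report


if __name__ == "__main__":
    prod_576 = {"TABLE_A": 120, "TABLE_B": 50, "TABLE_C": 9}  # hypothetical counts
    prod_568 = {"TABLE_A": 110, "TABLE_B": 50, "TABLE_D": 4}
    print(compare_row_counts(prod_576, prod_568))
```

The strict `<=` check is a design choice. If some tables can legitimately stay the same size between snapshots, relax it to `<`.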
philerooski / download_all_forms.py
Created February 13, 2026 18:54
Download all form data from a specific form group to local directory
#!/usr/bin/env python3
"""
Download all form data from a specific form group to local directory.
"""
import argparse
import sys
import json
import tempfile
import shutil
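Downloading "all" form data usually means following a paginated list endpoint and then writing each record to disk. The sketch below shows that pattern using the same standard-library modules the preview imports. It is an assumption-laden illustration, not the gist's logic: the endpoint path, request body fields, and `page`/`nextPageToken` response keys are placeholders to check against the Synapse REST API docs, and `iter_form_data`/`save_records` are hypothetical names.

```python
# Sketch only: page through a token-paginated Synapse list call and save each
# record locally. ASSUMPTIONS: the endpoint path, the request body, and the
# "page"/"nextPageToken" response keys are illustrative; confirm them against
# the Synapse REST API documentation before use.
import json
import os
import shutil
import tempfile
from typing import Any, Callable, Dict, Iterable, Iterator

FORM_DATA_LIST_URI = "/form/data/list"  # hypothetical endpoint for this sketch


def iter_form_data(post: Callable[[str, Dict[str, Any]], Dict[str, Any]],
                   group_id: str) -> Iterator[Dict[str, Any]]:
    """Yield every record in a form group, following nextPageToken until it runs out."""
    body: Dict[str, Any] = {"groupId": group_id}
    while True:
        response = post(FORM_DATA_LIST_URI, body)
        yield from response.get("page", [])
        token = response.get("nextPageToken")
        if not token:
            break
        body = {"groupId": group_id, "nextPageToken": token}


def save_records(records: Iterable[Dict[str, Any]], dest_dir: str) -> None:
    """Write records to a temporary directory first, then move them into dest_dir."""
    os.makedirs(dest_dir, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        for i, record in enumerate(records):
            with open(os.path.join(tmp, f"record_{i}.json"), "w") as fh:
                json.dump(record, fh, indent=2)
        for name in os.listdir(tmp):
            shutil.move(os.path.join(tmp, name), os.path.join(dest_dir, name))
```

Because nothing is moved until the listing finishes, a failure partway through leaves `dest_dir` untouched. With `synapseclient`, `post` could plausibly wrap `syn.restPOST(uri, body=json.dumps(body))`. Treat that wiring as an assumption as well.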
philerooski / load_snapshot_data.py
Last active February 24, 2026 19:49
The latest version of the script used to load RDS snapshot data into Snowflake
"""
Load RDS snapshot data from S3 via Snowflake external stage into tables.
This script supports two modes:
1. Bootstrap mode (--bootstrap-stack): Creates a new schema, external stage,
file format, and grants privileges before loading data.
2. Manual mode: Loads data into an existing schema with pre-configured stage.
The script dynamically discovers all data types from the S3 stage URL,
creates tables using INFER_SCHEMA from Parquet files, loads the data,
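The bootstrap step, data-type discovery, and INFER_SCHEMA-based table creation described above can be sketched as SQL-string builders. This is not the gist's implementation. It assumes files are laid out as `<stage-url>/<data_type>/.../<file>.parquet`, and every schema, stage, file-format, storage-integration and role name is a placeholder. The `CREATE TABLE ... USING TEMPLATE` over `INFER_SCHEMA` and `COPY INTO ... MATCH_BY_COLUMN_NAME` forms follow Snowflake's documented Parquet pattern.

```python
# Sketch only, not the gist's implementation. ASSUMPTIONS: the stage holds
# files laid out as <stage-url>/<data_type>/.../<file>.parquet, and every
# schema, stage, file-format, integration and role name is a placeholder.
from typing import Iterable, List


def bootstrap_statements(schema: str, stage: str, file_format: str,
                         stage_url: str, integration: str, role: str) -> List[str]:
    """Schema, Parquet file format, external stage and a usage grant (bootstrap mode)."""
    return [
        f"CREATE SCHEMA IF NOT EXISTS {schema}",
        f"CREATE FILE FORMAT IF NOT EXISTS {schema}.{file_format} TYPE = PARQUET",
        f"CREATE STAGE IF NOT EXISTS {schema}.{stage} URL = '{stage_url}' "
        f"STORAGE_INTEGRATION = {integration} "
        f"FILE_FORMAT = (FORMAT_NAME = '{schema}.{file_format}')",
        f"GRANT USAGE ON SCHEMA {schema} TO ROLE {role}",
    ]


def discover_data_types(paths: Iterable[str], stage_url: str) -> List[str]:
    """Return the first folder under stage_url for every listed .parquet file."""
    prefix = stage_url.rstrip("/") + "/"
    found = set()
    for path in paths:  # e.g. the "name" column returned by LIST @<stage>
        if path.startswith(prefix) and path.endswith(".parquet"):
            relative = path[len(prefix):]
            if "/" in relative:
                found.add(relative.split("/", 1)[0])
    return sorted(found)


def load_statements(table: str, stage: str, data_type: str, file_format: str) -> List[str]:
    """CREATE TABLE ... USING TEMPLATE over INFER_SCHEMA, then COPY INTO by column name."""
    location = f"@{stage}/{data_type}/"
    create = (
        f"CREATE TABLE {table} USING TEMPLATE ("
        "SELECT ARRAY_AGG(OBJECT_CONSTRUCT(*)) FROM TABLE(INFER_SCHEMA("
        f"LOCATION => '{location}', FILE_FORMAT => '{file_format}')))"
    )
    copy = (
        f"COPY INTO {table} FROM {location} "
        f"FILE_FORMAT = (FORMAT_NAME = '{file_format}') "
        "MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE"
    )
    return [create, copy]
```

Each returned string could be executed in order with a `snowflake.connector` cursor. `MATCH_BY_COLUMN_NAME` lets the load tolerate column-order differences between the inferred table and the Parquet files.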
"""
Analyze errors from the LOAD_LOG table.
This script queries the LOAD_LOG table for failed operations,
categorizes errors by type, and groups data types by error category.
This is a complementary script to https://gist.github.com/philerooski/a740b25f066f1ad205344637160aa969
"""
import snowflake.connector
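Categorizing errors and grouping data types by category can be sketched with a small ordered list of regex rules. The `DATA_TYPE` and `ERROR_MESSAGE` column names and the category patterns are hypothetical, not taken from the gist. Rows could come from a `snowflake.connector.DictCursor` over the failed LOAD_LOG entries.

```python
# Sketch: bucket LOAD_LOG error messages into categories and group data types.
# ASSUMPTIONS: each failed row has DATA_TYPE and ERROR_MESSAGE fields, and
# the category patterns below are illustrative, not the gist's rules.
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Optional, Set

# Checked in order; the first matching pattern wins.
CATEGORY_PATTERNS = [
    ("permission", re.compile(r"insufficient privileges|not authorized|access denied", re.I)),
    ("missing_object", re.compile(r"does not exist", re.I)),
    ("schema_mismatch", re.compile(r"column|type mismatch|numeric value", re.I)),
]


def categorize(message: Optional[str]) -> str:
    """Map one error message to a category name, or "other" if nothing matches."""
    for name, pattern in CATEGORY_PATTERNS:
        if pattern.search(message or ""):
            return name
    return "other"


def group_by_category(rows: Iterable[Mapping[str, Optional[str]]]) -> Dict[str, List[str]]:
    """Return {category: sorted data types whose load failed with that category}."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for row in rows:
        groups[categorize(row["ERROR_MESSAGE"])].add(row["DATA_TYPE"])
    return {category: sorted(types) for category, types in sorted(groups.items())}
```

Ordering matters because a single Snowflake message can mention several causes. Put the most specific patterns first.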
"""
Load snapshot data from Snowflake stage into tables.
This script processes prefixes from PREFIX_LIST table, derives table names,
creates tables using INFER_SCHEMA, and logs all operations to LOAD_LOG.
See `--help` for the optional `--only-affected` parameter.
"""
import snowflake.connector
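Deriving a table name from each prefix and logging every operation can be sketched as follows. The naming rule (last path segment, non-identifier characters replaced by `_`, upper-cased) and the log-record fields are hypothetical. In the real script the records would presumably be inserted into LOAD_LOG rather than appended to a list.

```python
# Sketch: turn a prefix into a Snowflake table name and record the outcome of
# each step. ASSUMPTIONS: the naming rule and the log-record fields are
# hypothetical; a real loader would insert these records into LOAD_LOG.
import re
import time
from contextlib import contextmanager
from typing import Any, Dict, Iterator, List


def table_name_from_prefix(prefix: str) -> str:
    """Use the last non-empty path segment as an unquoted, upper-case identifier."""
    segments = [part for part in prefix.split("/") if part]
    if not segments:
        raise ValueError(f"cannot derive a table name from {prefix!r}")
    name = re.sub(r"[^A-Za-z0-9_]", "_", segments[-1]).upper()
    # Unquoted Snowflake identifiers must start with a letter or underscore.
    if not re.match(r"[A-Z_]", name):
        name = "_" + name
    return name


@contextmanager
def logged_step(log: List[Dict[str, Any]], prefix: str, operation: str) -> Iterator[None]:
    """Append a SUCCESS or FAILED record for the wrapped step; failures are re-raised."""
    started = time.time()
    try:
        yield
    except Exception as exc:
        log.append({"prefix": prefix, "operation": operation, "status": "FAILED",
                    "error": str(exc), "seconds": time.time() - started})
        raise
    else:
        log.append({"prefix": prefix, "operation": operation, "status": "SUCCESS",
                    "error": None, "seconds": time.time() - started})
```

Re-raising leaves the decision to the caller. A loop over PREFIX_LIST could catch the exception and move on to the next prefix.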
philerooski / create_confluence_pages.py
Created June 25, 2025 17:12
A rough draft of a script which pulls Snowflake table/column comments into Confluence
import os
import random
import logging
import argparse
import toml
import snowflake.connector
from atlassian import Confluence
# Configure logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s [%(levelname)s] %(message)s')
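The comment-to-Confluence step can be sketched as rendering column metadata into Confluence's storage format, which is XHTML. The query in the comments and the page layout are assumptions, and `render_table_page` is a hypothetical helper. Values are HTML-escaped so comments containing `<` or `&` do not break the page.

```python
# Sketch: render Snowflake column comments as a Confluence storage-format table.
# ASSUMPTIONS: rows come from a query like
#   SELECT COLUMN_NAME, DATA_TYPE, COMMENT FROM <db>.INFORMATION_SCHEMA.COLUMNS
#   WHERE TABLE_SCHEMA = %s AND TABLE_NAME = %s ORDER BY ORDINAL_POSITION
# and the page layout is illustrative.
from html import escape
from typing import Iterable, Optional, Tuple


def render_table_page(table_comment: Optional[str],
                      columns: Iterable[Tuple[str, str, Optional[str]]]) -> str:
    """Build a paragraph with the table comment followed by a column table."""
    parts = [
        f"<p>{escape(table_comment or 'No table comment.')}</p>",
        "<table><tbody>",
        "<tr><th>Column</th><th>Type</th><th>Description</th></tr>",
    ]
    for name, data_type, comment in columns:
        parts.append(
            f"<tr><td>{escape(name)}</td><td>{escape(data_type)}</td>"
            f"<td>{escape(comment or '')}</td></tr>"
        )
    parts.append("</tbody></table>")
    return "".join(parts)
```

The resulting string could serve as the `body` argument to `Confluence.create_page(space, title, body)` from `atlassian-python-api`, which defaults to the storage representation. Check the signature in the installed version.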
### Does not work
import great_expectations as gx
expectation_suite_name = "my_expectation_suite"
checkpoint_name = "my_checkpoint"
context = gx.get_context()
# # Initialize expectation suite
def init_expectation_suite():
    expectation_suite = context.add_expectation_suite(
"""
Run this script from within the unzipped directory `JMV_fitbit_dta`
Download zipped data here: https://www.synapse.org/Synapse:syn62667431
"""
import pandas as pd
import json
import os
"""
A script which uploads validation results and a data validation
report to S3 for the FitbitSleepLogs data type. It was run in AWS
Glue 4.0 with the job parameter --additional-python-modules great_expectations==0.18.11,boto3==1.24.70
"""
import json
import logging
import os
import subprocess
import sys
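The upload step described in the docstring can be sketched with boto3's S3 `put_object`. The key layout and the `upload_validation_artifacts` name are hypothetical. The client is passed in so the function works with `boto3.client("s3")` inside a Glue job and can be exercised with a stub.

```python
# Sketch: upload a validation result (JSON) and an HTML report to S3.
# ASSUMPTIONS: the key layout and function name are hypothetical; `s3` is any
# object exposing boto3's S3 client put_object(Bucket=, Key=, Body=, ContentType=).
import json
from typing import Any, Dict, List


def upload_validation_artifacts(s3: Any, bucket: str, prefix: str, data_type: str,
                                results: Dict[str, Any], report_html: str) -> List[str]:
    """Upload results and report under <prefix>/<data_type>/ and return the keys."""
    base = f"{prefix.rstrip('/')}/{data_type}"
    uploads = [
        (f"{base}/validation_results.json",
         json.dumps(results, indent=2, default=str), "application/json"),
        (f"{base}/validation_report.html", report_html, "text/html"),
    ]
    for key, body, content_type in uploads:
        s3.put_object(Bucket=bucket, Key=key, Body=body.encode("utf-8"),
                      ContentType=content_type)
    return [key for key, _, _ in uploads]
```

Setting `ContentType="text/html"` lets the report render in a browser when it is served from S3.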