Ben Welsh palewire

palewire / README.md
Last active December 2, 2025 14:28
Python Parallelization Example

This script demonstrates how to upload multiple PDF documents to DocumentCloud in parallel across your CPU cores using Python. It reads a list of PDF URLs from a CSV file, uploads each document to DocumentCloud, and saves the resulting document IDs back to the CSV.

It was created to demonstrate the use of the parallelize function for concurrent processing, which can significantly speed up tasks. In practice, our data team at Reuters uses this function to spread tens of thousands of operations across dozens of computer cores.
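The general idea of spreading independent tasks across a pool of workers can be sketched as below. This is a minimal illustration built on the standard library's `concurrent.futures`, not the gist's actual `parallelize` function; the names `parallelize_sketch` and `fake_upload`, the `executor_cls` parameter, and the made-up document IDs are all assumptions introduced for this example.

```python
from concurrent.futures import Executor, ProcessPoolExecutor
from typing import Callable, Iterable, List, Optional, Type, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def parallelize_sketch(
    func: Callable[[T], R],
    items: Iterable[T],
    max_workers: Optional[int] = None,
    executor_cls: Type[Executor] = ProcessPoolExecutor,
) -> List[R]:
    """Apply func to every item with a pool of workers, keeping input order.

    Hypothetical helper: with the default ProcessPoolExecutor and
    max_workers=None, the pool size is chosen from the machine's CPU count.
    """
    with executor_cls(max_workers=max_workers) as pool:
        # Executor.map returns results in the same order as the inputs.
        return list(pool.map(func, items))


def fake_upload(url: str) -> str:
    """Stand-in for a real upload; returns a made-up ID, no network calls."""
    return f"doc-{len(url)}"
```

With a process pool, the worker function has to be defined at module level so it can be pickled, and the call itself, for example `parallelize_sketch(fake_upload, urls)`, is best placed under an `if __name__ == "__main__":` guard. Passing `executor_cls=ThreadPoolExecutor` swaps in threads, which may suit network-bound work such as uploads.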

A more basic example would be something like the following:

from main import parallelize
def stochastic_oscillator(
    series: pd.Series,
    sample_window: int = 200,
    smoothing_window: int = 3,
) -> pd.Series:
    """Calculate the stochastic oscillator.

    Args:
        series: The series to calculate the oscillator for. Must be in chronological order.
        sample_window: The window to sample over. Default is 200.
palewire / new-school.ipynb
Created October 15, 2024 16:28
"New School" LLM Classifier
palewire / old-school.ipynb
Created October 15, 2024 16:25
"Old School" Machine Learning Classifier
palewire / athena.py
Last active April 2, 2024 23:31
Python helpers for running Amazon Athena queries
"""Utilities for working with Amazon Athena.

Example:
    To run a query and get the results as a pandas DataFrame:

    >>> query = "SELECT * FROM my_table"
    >>> df = get_df_from_athena(query)
    >>> df.head()
palewire / README.md
Last active December 18, 2022 10:39
Install The Tor Project's snowflake-proxy on a Raspberry Pi

Your Raspberry Pi can join The Tor Project network that helps Russians read censored news sites, including Twitter.

These are all the commands necessary to spin up a Snowflake proxy that gives Tor users a way around government attempts to block access.

You can also run a Snowflake proxy from your web browser with Firefox or Chrome. Standing it up on your Raspberry Pi is a way to support the system 24 hours a day. And, unlike other Tor server setups, it doesn't require a static IP address.

palewire / numoji.py
Last active March 14, 2022 11:51
numoji.py
def numoji(number):
    """Convert a number into a series of emojis.

    Args:
        number (int): The number to convert into emoji

    Returns: An emoji string
    """
    # Convert the provided number to a string
    s = str(number)
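The preview above stops after the string conversion. As a rough, self-contained illustration of how a digit-to-emoji conversion could work, the sketch below appends the Unicode keycap sequence (U+FE0F followed by U+20E3) to each digit. It is not the gist's actual implementation; the name `digits_to_keycaps` and the choice to reject negative numbers are assumptions made for this example.

```python
# Variation selector-16 plus the combining enclosing keycap turns a digit
# character into its keycap emoji when appended after it.
KEYCAP_SUFFIX = "\ufe0f\u20e3"


def digits_to_keycaps(number: int) -> str:
    """Hypothetical sketch: render each digit of a non-negative int as a keycap emoji."""
    if number < 0:
        raise ValueError("only non-negative integers are supported in this sketch")
    # Convert the number to a string, then decorate each digit in turn.
    return "".join(digit + KEYCAP_SUFFIX for digit in str(number))
```

How the keycaps render depends on the font and platform displaying the string.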
palewire / README.md
Last active February 28, 2022 17:51
U.S. Earthquake Risk in 3D

[
  {
    "case_number": "2017-04514",
    "slug": "eddie-rosendo-lino",
    "first_name": "Eddie",
    "middle_name": "Rosendo",
    "last_name": "Lino",
    "death_date": "2017-06-18T00:00:00.000Z",
    "death_year": 2017,
    "age": 23.0,
palewire / README.md
Last active May 24, 2023 18:09
How to deploy a Prefect agent to Google Kubernetes Engine

This post contains code and commands you can use to deploy Prefect agents to Google Cloud’s Google Kubernetes Engine. The agents stand ready to execute workflows triggered by Prefect projects. One agent can run tasks from multiple projects.

The example here demonstrates how to create a single agent with minimal customization. It is configured with a Dockerfile, which installs necessary dependencies, and a k8s.cfg file, which connects the system to a Prefect account.

Agents are deployed via the gcloud command-line utility and its kubectl extension. Proper permissions within a project on Google Cloud are required.

Getting started