Duke University - Durham, North Carolina
2024-02-28, 10:05 - 11:20 AM ET
Class visit: what is investigative data journalism?
NICAR 2024 - Baltimore, Maryland
2024-03-08, 9:00 - 10:00 AM ET
Workshop: Finding and using undocumented APIs
2023-07-19
This is my last week at The Markup. It’s been a true privilege to practice and produce impactful hypothesis-driven journalism with first-class journalists over the past four years.
In year one of publication, Adrianne Jeffries, Sam Morris, Evelyn Larrubia and I measured Google’s self-preferential search results using a method adapted from the life sciences. Our findings were cited in a congressional hearing on Big Tech and antitrust.
Aaron Sankin, Sam Morris, Evelyn Larrubia and I found that Google blocked advertisers from finding YouTube videos related to Black Lives Matter and other [social justice phrases](https://themarkup.org/google-the-giant/2021/04/09/google-blocks-advertisers-from-targeting-black-lives-mat
Tow Tea @ The Tow Center - New York, New York
2023-02-17, 5:00 - 6:30 PM ET
Workshop: Finding and using undocumented APIs
Net Inclusion - San Antonio, Texas
2023-03-01, 2:30 - 3:30 PM CT
Panel: Advancing Digital Inclusion Data Quality, Tools, and Applications
Co-paneling with David Keyes, Christine Parker, and Ryan Palmer
numpy
tqdm
pdf2image
opencv-python
pytesseract
Pillow
Machine Bias - Julia Angwin, Jeff Larson, Surya Mattu and Lauren Kirchner (2016)
Gender Shades - Joy Buolamwini and Timnit Gebru (2018)
Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor - Virginia Eubanks (2018)
How We Analyzed Google's Search Results - Leon Yin and Adrianne Jeffries (2020)
import pandas as pd


def value_counts(df: pd.DataFrame,
                 col: str,
                 *args, **kwargs) -> pd.DataFrame:
    """
    For a DataFrame (`df`): display normalized (percentage)
    `value_counts(normalize=True)` and regular counts
    `value_counts()` for a given `col`.
    """
    count = df[col].value_counts(*args, **kwargs).to_frame(name='count')
    perc = df[col].value_counts(normalize=True, *args, **kwargs) \
                  .to_frame(name='perc')
    # The source is cut off after the line continuation above; the column
    # name 'perc' and this concat are an assumed completion.
    return pd.concat([count, perc], axis=1)
import json

fn = 'notebook.ipynb'
with open(fn) as f:
    notebook = json.load(f)
notebook.keys()

for cell in notebook['cells']:
    if cell['cell_type'] == "markdown":
        for sent in cell['source']:
            if sent == '\n':
                # The source is truncated here; skipping blank lines is an assumption.
                continue
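The excerpt above stops inside the inner loop. As a hedged sketch, assuming the nbformat v4 layout (a top-level `cells` list whose entries carry `cell_type` and `source`), the non-blank Markdown lines could be collected as follows. The helper name `markdown_lines` and the in-memory sample notebook are illustrative and do not come from the original.

```python
import json


def markdown_lines(notebook: dict) -> list[str]:
    """Return the non-blank lines of every Markdown cell, in order."""
    lines = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue
        source = cell.get("source", [])
        # nbformat allows `source` to be one string or a list of strings.
        if isinstance(source, str):
            source = source.splitlines(keepends=True)
        for sent in source:
            if sent.strip():  # skip blank lines such as '\n'
                lines.append(sent.rstrip("\n"))
    return lines


# Hypothetical sample standing in for the JSON loaded from 'notebook.ipynb'.
sample = json.loads("""
{
  "cells": [
    {"cell_type": "markdown", "source": ["# Title\\n", "\\n", "Some text."]},
    {"cell_type": "code", "source": ["print(1)"]},
    {"cell_type": "markdown", "source": "One string\\n\\nTwo paragraphs"}
  ]
}
""")
print(markdown_lines(sample))
```

Normalizing `source` to a list first keeps the loop body identical for both shapes the field can take.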
"""
A simple script to make a Markdown table for a data dictionary
(assumes you have just a column name and a description).
"""
import pandas as pd

col2description = {
    "Name": "What you can call me",
    "Id": "The identifier",
    "Nickname": "Do you have to ask?"
}  # closed here; the source excerpt ends after this entry
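A minimal sketch of how such a table could be built from `col2description` with plain string formatting. The helper name `to_markdown_table` and the header labels are hypothetical; the original script imports pandas and may take a different route.

```python
def to_markdown_table(col2description: dict[str, str],
                      headers: tuple[str, str] = ("Column", "Description")) -> str:
    """Render a {column: description} mapping as a Markdown pipe table."""
    def escape(text: str) -> str:
        # A literal pipe would otherwise end the cell early.
        return str(text).replace("|", "\\|")

    lines = [
        f"| {headers[0]} | {headers[1]} |",
        "| --- | --- |",
    ]
    for col, description in col2description.items():
        lines.append(f"| {escape(col)} | {escape(description)} |")
    return "\n".join(lines)


col2description = {
    "Name": "What you can call me",
    "Id": "The identifier",
    "Nickname": "Do you have to ask?",
}
table = to_markdown_table(col2description)
print(table)
```

Building the rows by hand avoids an extra dependency; dictionaries preserve insertion order, so the rows appear in the order the columns were declared.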