@MichelNivard
MichelNivard / FastBlockJackKnife.R
Created February 25, 2026 19:49
Fast block jackknife for OLS (based on LD score regression)
# Fast block jackknife for linear regression (the LDSC trick):
# compute block-wise XtX and Xty once, then get all 200 leave-one-block-out fits
# by subtracting 1 block at a time and re-solving.
block_jackknife_lm <- function(X, y, n_blocks = 200L, separators = NULL) {
X <- as.matrix(X)
y <- matrix(as.numeric(y), ncol = 1)
n <- nrow(X); p <- ncol(X)
stopifnot(nrow(y) == n, ncol(y) == 1, p <= n)
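The preview above stops after the input checks. As a rough illustration of the idea described in the header comment (per-block XtX and Xty computed once, then each leave-one-block-out fit obtained by subtraction and a re-solve), here is a minimal NumPy sketch. It is not the gist's R implementation: the function name `block_jackknife_ols`, the contiguous near-equal block layout, and the pseudo-value standard errors are assumptions made for this example, and the `separators` argument is not modelled.

```python
import numpy as np

def block_jackknife_ols(X, y, n_blocks=200):
    """Sketch of a fast block jackknife for OLS (names and layout are assumptions)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(-1)
    n, p = X.shape
    assert y.shape[0] == n and p <= n and n_blocks <= n

    # Assumption: contiguous blocks of near-equal size.
    bounds = np.linspace(0, n, n_blocks + 1).astype(int)
    spans = list(zip(bounds[:-1], bounds[1:]))

    # Per-block cross-products, computed once.
    xtx_blocks = np.stack([X[s:e].T @ X[s:e] for s, e in spans])  # (B, p, p)
    xty_blocks = np.stack([X[s:e].T @ y[s:e] for s, e in spans])  # (B, p)

    xtx_tot = xtx_blocks.sum(axis=0)
    xty_tot = xty_blocks.sum(axis=0)
    beta = np.linalg.solve(xtx_tot, xty_tot)  # full-sample OLS fit

    # Leave-one-block-out fits: subtract one block's cross-products, re-solve.
    lhs = xtx_tot - xtx_blocks                  # (B, p, p)
    rhs = (xty_tot - xty_blocks)[..., None]     # (B, p, 1)
    delete = np.linalg.solve(lhs, rhs)[..., 0]  # (B, p)

    # Delete-one jackknife pseudo-values and their covariance.
    pseudo = n_blocks * beta - (n_blocks - 1) * delete
    est = pseudo.mean(axis=0)
    cov = np.atleast_2d(np.cov(pseudo, rowvar=False)) / n_blocks
    se = np.sqrt(np.diag(cov))
    return beta, est, se, delete
```

Each leave-one-block-out fit costs one p-by-p solve instead of a pass over all n rows, which is where the speed-up comes from when p is much smaller than n.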
import os
import numpy as np
import sidechainnet as scn
from denoising_diffusion_pytorch import Unet, GaussianDiffusion
import torch
from torch.utils.data import Dataset, DataLoader
import torchvision.transforms as T
from PIL import Image
from pathlib import Path
from tqdm import tqdm
# Full training script for protein contact map diffusion model
# Using lucidrains' denoising-diffusion-pytorch (grayscale input)
from denoising_diffusion_pytorch import Unet, GaussianDiffusion, Trainer
import matplotlib.pyplot as plt
import numpy as np
model = Unet(
# t-test vs wilcox vs ordinal
library(tidyverse)
library(multidplyr) # parallelize
library(rms) # ordinal regression
sample_size = 500 # N
# generate paired sets and calculate p-values with different techniques
## The problem with (personal) non-ordinality:
n <- 30000 # 30k fictional people
# a pair of exposures, no measurement issues:
xa <- rnorm(n)
xb <- rnorm(n)
# Personal threshold 1 - 4 for each person,
# a reasonable scale design, I think: the bins fill up roughly "normally", and this matters a lot!
require(MASS)
# fixed a2 and e2 for the entire script:
a <- .87 # additive genetic variance
e <- .13 # environmental variance
# make MZ covariance, i.e. the cov is a, the var = 1
sigma_mz <- matrix(c(1,a,a,1),2,2)
# sample size (is big because rare traits)
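The preview ends before any data are drawn. To illustrate what the MZ covariance matrix above implies, here is a small NumPy sketch that samples twin pairs with unit variance and covariance `a` and checks that the sample correlation lands near 0.87. The number of pairs (`n_pairs`) and the seed are hypothetical values chosen for the example, not taken from the gist.

```python
import numpy as np

a = 0.87  # additive genetic variance (from the snippet above)
e = 0.13  # environmental variance (from the snippet above)

# MZ twins share all additive genetic effects: covariance a, variance a + e = 1.
sigma_mz = np.array([[a + e, a],
                     [a, a + e]])

rng = np.random.default_rng(1)  # hypothetical seed
n_pairs = 100_000               # hypothetical sample size, large on purpose

# Each row is one twin pair's liabilities.
twins = rng.multivariate_normal(mean=[0.0, 0.0], cov=sigma_mz, size=n_pairs)

# The twin correlation should be close to a.
r_mz = np.corrcoef(twins, rowvar=False)[0, 1]
```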
MichelNivard / openllama-3b-bnb-4bit-training.ipynb
Created June 12, 2023 07:24
OpenLlama 3b bnb-4bit-training.ipynb
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel
# Load the tokenizer and model
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('./results')
# Set the device to GPU if available, otherwise use CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)
import torch
from torch.utils.data import Dataset, DataLoader
from transformers import GPT2Tokenizer, GPT2LMHeadModel, TextDataset, DataCollatorForLanguageModeling, Trainer, TrainingArguments
# Set the path to the text file to fine-tune on
path_to_file = "path/to/text/file.txt"
# Load the tokenizer and model
tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
model = GPT2LMHeadModel.from_pretrained('gpt2')
MichelNivard / Example _transcript.md
Last active March 6, 2023 11:52
Example long training data

Speaker 0:

You wrote a piece a follow-up piece to your oral history titled, there is no replacement for black Twitter. I think back in November, What do you think we lose if we lose black Twitter? Tell

Speaker 1:

me not to meet your Mac, but we lose everything. I'm John Favreau. Welcome to offline.

Speaker 0: