Calvin McCarter calvinmccarter

## hallucinate.py
import modal

def download_boltz2():
    from mosaic.models.boltz2 import Boltz2
    Boltz2()


### Build modal image: install mosaic + deps and download boltz2 model.
image = (
    modal.Image.debian_slim(python_version="3.12")

## rl-wrong-about-rewards.md

      
yoavg / rl-wrong-about-rewards.md
Last active February 24, 2026 09:54

Computer-science Reinforcement Learning got Rewards Wrong

In a recent blog post, Ben Recht described the Reinforcement Learning (RL) setup as:

> Paraphrasing Thorndike’s Law of Effect, Lior defines reinforcement learning as the iterative process:
>
> 1. Receive external validation on how good you’re currently doing
> 2. Adjust what you’re currently doing so that you are better the next time around.
>
> Whether or not this is how humans or animals learn, this is a spot-on definition of computer scientific reinforcement learning.
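
The two-step loop in the quoted definition can be made concrete with a toy example. The sketch below is not from the post: it assumes a two-armed bandit with Gaussian rewards, an epsilon-greedy rule, and sample-average value estimates, and every name in it (`run_bandit`, `arm_means`, `epsilon`) is hypothetical.

```python
import random


def run_bandit(arm_means, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy loop over a toy multi-armed bandit (illustrative only)."""
    rng = random.Random(seed)
    n_arms = len(arm_means)
    estimates = [0.0] * n_arms  # current guess of each action's value
    counts = [0] * n_arms       # how often each action has been tried

    for _ in range(steps):
        # Act: mostly exploit the current best guess, sometimes explore.
        if rng.random() < epsilon:
            action = rng.randrange(n_arms)
        else:
            action = max(range(n_arms), key=lambda a: estimates[a])

        # Step 1: receive external validation (a noisy scalar reward).
        reward = rng.gauss(arm_means[action], 0.5)

        # Step 2: adjust the estimate so the next choice is better informed.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates


estimates = run_bandit([0.2, 1.0])
best_arm = max(range(len(estimates)), key=lambda a: estimates[a])
```

Here the only feedback the learner ever sees is the scalar `reward`, which is exactly the "external validation" in the definition.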


## nanochat_simple_rl.py
"""
Simple RL training script for teaching a model to add.
Demonstrates REINFORCE and GRPO algorithms in a minimal implementation.

If you want to run this script, put it inside of nanochat/scripts/ and run it with:
python -m scripts.simple_rl

First add "matplotlib>=3.9.0" to pyproject.toml and run 'uv sync'

I wrote a separate script to download the weights for the model:

## Transcript.txt
Hi, everybody. I'm Nicolai Tangen of the Norwegian Sovereign Wealth Fund. And today we are hosting an investor legend, Paul Singer, who founded Elliott Asset Management and is probably the most important activist investor in the world. Paul, warm welcome. Thank you.

What is activist investing? Activist investing is taking a position largely in an equity security of a company and trying to engage with the company to improve outcomes, control or influence outcomes, better outcomes to unlock value. It could be management changes that are requested. It could be capital structure changes, finance strategies and tactics. Anything that will make the company earn more money, be better positioned, more rationally deploy assets.

Why do you have to do this? Don't companies do this themselves? Well, as you know, the trend away from active investing, and by active investing, I don't necessarily mean activist. Active investing just means you open the mail from the company in which you invest, you try to figure it out, you t

## twohot.md

      
wassname / twohot.md
Last active July 22, 2025 00:38
two-hot encoding notes

What is two-hot encoding?

Description

Two-hot encoding was introduced in 2017 by Marc G. Bellemare et al. in "A Distributional Perspective on Reinforcement Learning", but the clearest description is in the DreamerV3 paper by Danijar Hafner et al. (2023), where it is used for reward and value distributions.

Two-hot encoding is a generalization of one-hot encoding to continuous values. It produces a vector of length |B| where all elements are 0 except for the two entries closest to the encoded continuous number, at positions k and k + 1. These two entries sum to 1, with more weight given to the entry that is closer to the encoded number.
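
As a rough illustration of the description above, here is a minimal pure-Python sketch. It assumes sorted bin centres, linear interpolation between the two neighbouring bins, and clamping of out-of-range values to the end bins; the names `two_hot` and `two_hot_decode` are hypothetical, and the symlog transform that DreamerV3 applies before encoding is left out.

```python
from bisect import bisect_right


def two_hot(x, bins):
    """Encode scalar x as weights over sorted `bins`; at most two entries are non-zero."""
    # Assumption: values outside the bin range are clamped to the end bins.
    x = min(max(x, bins[0]), bins[-1])
    # Index k of the left neighbour, so that bins[k] <= x <= bins[k + 1].
    k = min(bisect_right(bins, x) - 1, len(bins) - 2)
    lo, hi = bins[k], bins[k + 1]
    # The closer x is to bins[k + 1], the more weight that entry receives.
    w_hi = (x - lo) / (hi - lo)
    encoding = [0.0] * len(bins)
    encoding[k] = 1.0 - w_hi
    encoding[k + 1] = w_hi
    return encoding


def two_hot_decode(encoding, bins):
    """Recover the scalar as the weighted average of the bin centres."""
    return sum(w * b for w, b in zip(encoding, bins))


bins = [-2.0, -1.0, 0.0, 1.0, 2.0]
code = two_hot(0.25, bins)
# code == [0.0, 0.0, 0.75, 0.25, 0.0]
value = two_hot_decode(code, bins)
# value == 0.25
```

Decoding with the weighted average returns the original number exactly whenever it lies inside the bin range, which is why the encoding can serve as a regression target for a categorical output head.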

Code samples


## FlatCnnLayer.py
import torch
import torch.nn as nn
import torch.nn.init as init

dropout_prob = 0.5


class FlatCnnLayer(nn.Module):
    def __init__(self, embedding_size, sequence_length, filter_sizes=(3, 4, 5), out_channels=128):
        super().__init__()

## text.py
# -*- coding: utf-8 -*-
# Authors: Olivier Grisel <olivier.grisel@ensta.org>
#          Mathieu Blondel <mathieu@mblondel.org>
#          Lars Buitinck <L.J.Buitinck@uva.nl>
#          Robert Layton <robertlayton@gmail.com>
#          Jochen Wersdörfer <jochen@wersdoerfer.de>
#          Roman Sinayev <roman.sinayev@gmail.com>
#
# License: BSD 3 clause
"""
	"""