## RLHF.md

      
              1 file
            
          
              9 forks
            
          
                39 comments
              
            
              128 stars
            
          
                JoaoLages
                / RLHF.md
            
            
              Last active
              January 6, 2026 11:28
            
              
                Reinforcement Learning from Human Feedback (RLHF) - a simplified explanation 
              
          
    Maybe you've heard about this technique but you haven't completely understood it, especially the PPO part. This explanation might help.
We will focus on text-to-text language models 📝, such as GPT-3, BLOOM, and T5. Models like BERT, which are encoder-only, are not addressed.
Reinforcement Learning from Human Feedback (RLHF) has been successfully applied in ChatGPT, hence its major increase in popularity. 📈
RLHF is especially useful in two scenarios 🌟:

- You can’t create a good loss function.
  - Example: how do you calculate a metric to measure if the model’s output was funny?
- You want to train with production data, but you can’t easily label your production data.


## wgsl_noise.md

      
              1 file
            
          
              6 forks
            
          
                5 comments
              
            
              77 stars
            
          
                munrocket
                / wgsl_noise.md
            
            
              Last active
              February 24, 2026 22:02
            
              
                WGSL Noise Algorithms
              
          
    WGSL Noise Algorithms

Good and fast integer hash

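```wgsl
// https://www.pcg-random.org/
// Maps a 32-bit integer to a pseudo-random 32-bit integer.
fn pcg(n: u32) -> u32 {
    // Linear congruential step; u32 arithmetic wraps modulo 2^32.
    var h = n * 747796405u + 2891336453u;
    // Xorshift by an amount taken from the top 4 bits, then multiply.
    h = ((h >> ((h >> 28u) + 4u)) ^ h) * 277803737u;
    // Final xorshift to mix the high bits into the low bits.
    return (h >> 22u) ^ h;
}
```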
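
Noise code usually needs a float rather than a raw integer. A minimal sketch of one way to map the hash to `[0, 1)` follows; the helper name `pcg_to_unit_float` is hypothetical and not part of the original gist, and it assumes the `pcg` function above is in scope.

```wgsl
// Hypothetical helper (not from the original gist): hash n to a float in [0, 1).
fn pcg_to_unit_float(n: u32) -> f32 {
    // Keep the top 24 bits so the integer is exactly representable in an f32,
    // then scale by 2^-24. The largest result is (2^24 - 1) / 2^24, which is below 1.
    return f32(pcg(n) >> 8u) * (1.0 / 16777216.0);
}
```

Dropping the low 8 bits avoids rounding up to exactly `1.0`, which can happen if the full 32-bit value is converted to `f32` and divided by 2^32.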