The Evolution of Policy Optimization: Understanding GRPO, DAPO, and Dr. GRPO's Theoretical Foundations

Introduction

This article serves as the theoretical companion to "Bridging Theory and Practice: Understanding GRPO Implementation Details in Hugging Face's TRL Library." While the companion piece focuses on implementation specifics, this one covers the mathematical foundations and conceptual evolution of these reinforcement learning algorithms for language models.

I'll examine three key algorithms that represent the rapid advancement in this field:

  • GRPO (Group Relative Policy Optimization): The pioneering approach from DeepSeek that established a new paradigm for training reasoning capabilities in LLMs

  • DAPO (Decouple Clip and Dynamic sAmpling Policy Optimization): An open-source system that scaled reinforcement learning for LLMs while addressing key limitations in GRPO
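To make the "group relative" idea in the GRPO bullet concrete, here is a minimal sketch of how a per-response advantage can be computed from the rewards of a group of responses sampled for the same prompt. The function name `group_relative_advantages`, the `eps` term, and the use of the population standard deviation are assumptions made for illustration; they are not taken from this article or from any particular library.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each reward relative to the mean and spread of its own group.

    Hypothetical helper for illustration only: `eps` guards against a zero
    standard deviation, and the population std is an assumed choice.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled responses to one prompt, with binary correctness rewards.
group = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(group)
```

Responses that beat the group average get a positive advantage and the rest get a negative one, so no separately learned value function is needed to form the baseline.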

@linoytsaban
linoytsaban / flux_with_cfg
Last active December 9, 2024 06:26
Flux with CFG and negative prompts
# download FluxCFGPipeline
!wget https://raw.githubusercontent.com/linoytsaban/diffusers/refs/heads/dreambooth-lora-flux-exploration/examples/community/pipeline_flux_with_cfg.py
# load pipeline
import diffusers
import torch
from pipeline_flux_with_cfg import FluxCFGPipeline
pipe = FluxCFGPipeline.from_pretrained("black-forest-labs/FLUX.1-dev",
                                       torch_dtype=torch.bfloat16)
@unixzii
unixzii / ForceEnablingXcodeLLM.md
Last active December 2, 2025 21:19
A guide to force enabling Xcode LLM feature on China-SKU Macs.

Introduction

Apple restricts access to the Xcode LLM (predictive code completion) feature on China-SKU Macs. This guide provides a way to bypass that restriction. It has been verified on macOS 15.0 Beta (24A5264n), but there is no guarantee that it will keep working on later macOS versions.

Prerequisites

  • Xcode is installed and run at least once.
  • SIP debugging restrictions are disabled (via the csrutil enable --without debug command in recovery mode).

Disclaimer

@ctlllll
ctlllll / longest_chinese_tokens_gpt4o.py
Created May 13, 2024 19:53
Longest Chinese tokens in gpt4o
import tiktoken
import langdetect
T = tiktoken.get_encoding("o200k_base")
length_dict = {}
for i in range(T.n_vocab):
    try:
        length_dict[i] = len(T.decode([i]))
    except:
@Proteas
Proteas / xnu-4570.1.46-arm64-steps.txt
Created October 9, 2017 02:46
steps to build arm64 version of xnu-4570.1.46
These are my steps to build the ARM64 version of xnu-4570.1.46; I hope they save you some time.
1. Use Xcode 9.0
2. Preparation is the same as for macOS, and there is a guide: https://0xcc.re/building-xnu-kernel-macosx-sierrra-10-12-x/
3. An ARM64 version of libfirehose is available: https://github.com/Proteas/install_firehose_lib
4. Copy and edit the ARM64 config (CFLAGS, LDFLAGS) from darwin-on-arm/xnu to your target project
5. Example CFLAGS: -Darm64 -DARM64 -D__arm64__ -D__ARM64__ -DLP64 -DCONFIG_EMBEDDED -mkernel -DARM64_BOARD_CONFIG_T8011=1
6. Fix compiling stage errors by directly importing the missing headers or editing the code
7. Fix linking stage errors by implementing placeholder functions for: chudxnu_cpu_alloc, etc
8. If the symbol __divti3 is missing at the linking stage, get the runtime from LLVM.
@nirizr
nirizr / idapython_get_stack_refs.py
Last active September 1, 2022 20:11
IDAPYTHON: List all references to all stack variables of a function
import idc, idaapi, idautils, ida_xref
def find_stack_members(func_ea):
    members = {}
    base = None
    frame = idc.GetFrame(func_ea)
    for frame_member in idautils.StructMembers(frame):
        member_offset, member_name, _ = frame_member
        members[member_offset] = member_name
        if member_name == ' r':
@jaredcatkinson
jaredcatkinson / Get-InjectedThread.ps1
Last active October 14, 2025 02:45
Code from "Taking Hunting to the Next Level: Hunting in Memory" presentation at SANS Threat Hunting Summit 2017 by Jared Atkinson and Joe Desimone
function Get-InjectedThread
{
<#
.SYNOPSIS
Looks for threads that were created as a result of code injection.
.DESCRIPTION
@Arinerron
Arinerron / root.sh
Last active May 24, 2025 14:53
"Root" via dirtyc0w privilege escalation exploit (automation script) / Android (32 bit)
#!/bin/bash
# Give the usual warning.
clear;
echo -e "[INFO] Automated Android root script started.\n\n[WARN] Exploit requires sdk module \"NDK\".\nFor more information, visit the installation guide @ https://goo.gl/E2nmLF\n[INFO] Press Ctrl+C to stop the script if you need to install the NDK module. Waiting 10 seconds...";
sleep 10;
clear;
# Download and extract exploit files.
echo "[INFO] Downloading exploit files from GitHub...";
@mattifestation
mattifestation / wmi_provider_association.ps1
Last active August 16, 2022 05:14
Enumerates WMI providers, the DLLs that back the provider, and the classes hosted by the provider.
<#
Author: Matthew Graeber (@mattifestation)
License: BSD 3-Clause
#>
function Get-WmiNamespace {
[OutputType([String])]
Param (
[String]
[ValidateNotNullOrEmpty()]
@alirobe
alirobe / reclaimWindows10.ps1
Last active December 6, 2025 06:24
This Windows 10 Setup Script turns off a bunch of unnecessary Windows 10 telemetry, bloatware, & privacy things. Not guaranteed to catch everything. Review and tweak before running. Reboot after running. Scripts for reversing are included and commented. Fork of https://github.com/Disassembler0/Win10-Initial-Setup-Script (different defaults). N.…
###
###
### UPDATE: For Win 11, I recommend using this tool in place of this script:
### https://christitus.com/windows-tool/
### https://github.com/ChrisTitusTech/winutil
### https://www.youtube.com/watch?v=6UQZ5oQg8XA
### iwr -useb https://christitus.com/win | iex
###
### OR take a look at
### https://github.com/HotCakeX/Harden-Windows-Security