I spent weeks training a single PPO agent to make portfolio decisions. It worked well in calm markets. Then VIX spiked from 14 to 22 and the model froze — its learned policy didn't generalize to a volatility regime it hadn't trained in.
This isn't a novel observation. Any quantitative researcher who's deployed a single model to trade across regimes has hit this wall. The issue isn't the algorithm. It's that markets have structural breaks — volatility regimes, sector rotations, macro shocks — and no single model captures all of them.



