Tag
#rlaif
2 posts tagged rlaif.
- deep-dive
Constitutional AI Explained: How Principle-Based Training Builds Safer Models
Constitutional AI replaces human harm labels with a written set of principles and AI self-critique. Here is how the method works, where it sits in your
- deep-dive
KV Cache Compression Is Now an Alignment Problem
A new preprint argues that compressing KV cache during RL rollouts silently biases the policy you ship. For teams treating RLHF as a defensive control