Painless Activation Steering (PAS): Automated, Lightweight Post-Training for LLM Behavior
Introduction
Reproduced with permission from Sasha Cui
We're releasing "Painless Activation Steering (PAS)," a fully automated approach for steering large language models after training, without modifying weights and without hand-crafted prompt pairs or labor-intensive feature labeling. Existing post-training options tend to trade precision for convenience (prompting) or accept high cost in exchange for control (fine-tuning, RL). PAS instead constructs a lightweight activation vector that is cheap to train, easy to store, and can be toggled on demand, bringing controllable steering closer to a "plug-and-play" workflow.
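To make "train cheaply, store easily, and toggle on demand" concrete, here is a minimal sketch of generic activation steering on toy activation lists. It assumes a simple difference-of-means direction computed from two labeled groups of activations. That technique is a stand-in chosen for illustration, not the PAS construction itself. The names `difference_of_means` and `SteeringVector` are hypothetical. In a real model, `apply` would run inside a forward hook (for example PyTorch's `register_forward_hook`) that adds the vector to one layer's hidden states.

```python
import json
from dataclasses import dataclass
from typing import List, Sequence

Vector = List[float]


def mean_vector(rows: Sequence[Vector]) -> Vector:
    """Element-wise mean of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def difference_of_means(pos: Sequence[Vector], neg: Sequence[Vector]) -> Vector:
    """Hypothetical stand-in: mean activation on target-behavior examples
    minus mean activation on the remaining examples (no paired prompts needed)."""
    mp, mn = mean_vector(pos), mean_vector(neg)
    return [a - b for a, b in zip(mp, mn)]


@dataclass
class SteeringVector:
    """Hypothetical container: a direction, a strength, and an on/off switch."""
    direction: Vector
    scale: float = 1.0
    enabled: bool = True

    def apply(self, hidden: Vector) -> Vector:
        # Toggle: when disabled, hidden states pass through unchanged.
        if not self.enabled:
            return list(hidden)
        # Steer by adding the scaled direction; model weights are never touched.
        return [h + self.scale * d for h, d in zip(hidden, self.direction)]

    def to_json(self) -> str:
        # Storage is just the vector plus its scale.
        return json.dumps({"direction": self.direction, "scale": self.scale})

    @classmethod
    def from_json(cls, s: str) -> "SteeringVector":
        data = json.loads(s)
        return cls(direction=data["direction"], scale=data["scale"])
```

Usage: `SteeringVector(difference_of_means(pos, neg), scale=0.5).apply(h)` shifts `h` along the learned direction, and setting `enabled = False` restores the unsteered behavior without reloading anything. Keeping the vector separate from the weights is what allows it to be saved, shared, and switched per request.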