Publications

Preprints

Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data

deep reinforcement learning fine-tuning
Zhiyuan Zhou*, Andy Peng*, Qiyang Li, Sergey Levine, Aviral Kumar
arXiv preprint, 2024
[paper] [website] [code]

Can we finetune policies and values from offline RL *without retaining the offline data*? Current methods require keeping the offline data for stability and performance, but this make RL hard to scale up when the offline dataset gets bigger and bigger. Turns out a simple receipe, Warm-start RL, is able to finetune rapidly without data retention!