Publications

Preprints

Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data

deep reinforcement learning fine-tuning
Zhiyuan Zhou*, Andy Peng*, Qiyang Li, Sergey Levine, Aviral Kumar
arXiv preprint, 2024
[paper] [website] [code]

Can we finetune policies and values from offline RL *without retaining the offline data*? Current methods require keeping the offline data for stability and performance, but this make RL hard to scale up when the offline dataset gets bigger and bigger. Turns out a simple receipe, Warm-start RL, is able to finetune rapidly without data retention!