Publications
Preprints
Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data
deep reinforcement learning
fine-tuning
Zhiyuan Zhou*, Andy Peng*, Qiyang Li, Sergey Levine, Aviral Kumar
arXiv preprint, 2024 [paper] [website] [code]
Can we fine-tune policies and values from offline RL *without retaining the offline data*? Current methods keep the offline data around for stability and performance, but this makes RL hard to scale as offline datasets grow. It turns out a simple recipe, Warm-Start RL, can fine-tune rapidly without any data retention!
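To make the idea concrete, here is a minimal, illustrative sketch of the no-retention fine-tuning setup described above: the agent arrives already pretrained by offline RL, and the online loop trains on *only* freshly collected transitions, never reloading the offline dataset. This is not the paper's exact recipe; `PretrainedAgent`, `ToyEnv`, and the loop structure are hypothetical placeholders standing in for a real offline-pretrained actor-critic and environment.

```python
import random
from collections import deque

# Toy stand-ins so the sketch runs end to end; in practice these would be the
# offline-pretrained actor-critic and the real environment.
class PretrainedAgent:
    def act(self, obs):
        return random.choice([0, 1])          # placeholder for the offline-RL policy
    def update(self, batch):
        pass                                  # placeholder for one online RL update step

class ToyEnv:
    def reset(self):
        return 0.0
    def step(self, action):
        return random.random(), 1.0, random.random() < 0.05  # next_obs, reward, done

def finetune_without_offline_data(agent, env, num_steps=1_000, batch_size=32):
    """Fine-tune using only freshly collected online transitions:
    the offline dataset is never loaded into the replay buffer."""
    buffer = deque(maxlen=100_000)            # holds online data only
    obs = env.reset()
    for _ in range(num_steps):
        action = agent.act(obs)               # pretrained policy drives data collection
        next_obs, reward, done = env.step(action)
        buffer.append((obs, action, reward, next_obs, done))
        obs = env.reset() if done else next_obs
        if len(buffer) >= batch_size:
            agent.update(random.sample(buffer, batch_size))
    return agent

if __name__ == "__main__":
    finetune_without_offline_data(PretrainedAgent(), ToyEnv())
```

The point of the sketch is only the data flow: nothing from the offline dataset ever enters `buffer`, so memory and compute no longer grow with the size of the offline dataset; see the paper for the actual algorithm and its stabilization details.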