First-year PhD student working on reinforcement learning (with diffusion models), trust region methods, dexterous hands, and, more recently, RLVR for large language models.