Reinforcement Learning

Our group works on reinforcement learning (RL) algorithms, where the goal is to optimize the parameters of a policy purely from interaction with the environment. This learning paradigm is particularly interesting for robotics because the robot can potentially learn to solve problems autonomously. Our group focuses on several aspects of reinforcement learning.

One key question is how to represent the policy in RL so that it can acquire skills for solving complex tasks. Advances in supervised learning show that we need reinforcement learning algorithms that can train complex policy representations. These representations include diffusion- and flow-based policies, which have proven very successful in supervised learning and are often part of large models such as vision-language-action models (VLAs). Training them with reinforcement learning, however, is not straightforward.

Training these policies also requires improved exploration in order to generate high-quality data during training. One promising option is to generate time-correlated actions during exploration using action chunking or motion primitives. The resulting policies have a drastically larger action space, which generally makes the optimization harder and calls for RL algorithms designed specifically for efficient training in this setting.

A very promising approach towards generalist policies is fine-tuning existing foundation-model-based policy representations. Vision-language-action (VLA) policies have the potential to solve tasks that are specified by text instructions, which eases communication with humans. We believe that RL is a key approach in this field: it enables robots to adapt their generalist policies so that they can learn to solve tasks that are not part of their supervised training data set.

Key Areas

Complex policy representations for reinforcement learning
Time-correlated action selection using action chunking and motion primitives
Fine-tuning of foundation models using reinforcement learning

Members

Onur Celik

Reinforcement Learning

Tai Hoang

Reinforcement Learning, Physical Modelling with Graphs

Max Nagy

FZI

Reinforcement Learning, Real World Robotics, Heavy Machines

Emiliyan Gospodinov

Vision-Language-Action Models, Reinforcement Learning

Weiran Liao

Reinforcement Learning, Imitation Learning

Serge Thilges

Reinforcement Learning, Diffusion Models, Dexterous Hands

Andreas Boltres

SAP

Computer Networking, Geometric Deep Learning, Multi-Agent Reinforcement Learning

Huy Le

Bosch

Robot Manipulation, Diffusion Policies, Massively Parallel Reinforcement Learning

Publications

2025

Scaffolding Dexterous Manipulation with Vision-Language Models

Vincent de Bakker, Joey Hejna, Tyler Ga Wei Lum, Onur Celik, Aleksandar Taranovic, Denis Blessing, Gerhard Neumann, Jeannette Bohg, Dorsa Sadigh

Preprint · 2025

TROLL: Trust Regions improve Reinforcement Learning for Large Language Models

Philipp Becker, Niklas Freymuth, Serge Thilges, Fabian Otto, Gerhard Neumann

Preprint · 2025

DIME: Diffusion-Based Maximum Entropy Reinforcement Learning

Onur Celik, Zechu Li, Denis Blessing, Ge Li, Daniel Palenicek, Jan Peters, Georgia Chalvatzaki, Gerhard Neumann

ICML · 2025

Geometry-aware RL for Manipulation of Varying Shapes and Deformable Objects

Tai Hoang, Huy Le, Philipp Becker, Ngo Anh Vien, Gerhard Neumann

ICLR · 2025 Oral

Enhancing Exploration With Diffusion Policies in Hybrid Off-Policy RL: Application to Non-Prehensile Manipulation

Huy Le, Tai Hoang, Miroslav Gabriel, Gerhard Neumann, Ngo Anh Vien

IEEE Robotics and Automation Letters · 2025

Vlearn: Off-Policy Learning with Efficient State-Value Function Estimation

Fabian Otto, Philipp Becker, Ngo Anh Vien, Gerhard Neumann

ICLR · 2025

Chunking the Critic: A Transformer-Based Soft Actor-Critic with N-Step Returns

Dong Tian, Ge Li, Hongyi Zhou, Onur Celik, Gerhard Neumann

Preprint · 2025

2024

Beyond Shortest-Paths: A Benchmark for Reinforcement Learning on Traffic Engineering

Andreas Boltres, Niklas Freymuth, Patrick Jahnke, Gerhard Neumann

· 2024

Learning Sub-Second Routing Optimization in Computer Networks requires Packet-Level Dynamics

Andreas Boltres, Niklas Freymuth, Patrick Jahnke, Holger Karl, Gerhard Neumann

Preprint · 2024

Acquiring Diverse Skills using Curriculum Reinforcement Learning with Mixture of Experts

Onur Celik, Aleksandar Taranovic, Gerhard Neumann

ICML · 2024

PointPatchRL - Masked Reconstruction Improves Reinforcement Learning on Point Clouds

Balázs Gyenes, Nikolai Franke, Philipp Becker, Gerhard Neumann

CoRL · 2024

Robust Black-Box Optimization for Stochastic Search and Episodic Reinforcement Learning

Maximilian Hüttenrauch, Gerhard Neumann

JMLR · 2024

Open the Black Box: Step-based Policy Updates for Temporally-Correlated Episodic Reinforcement Learning

Ge Li, Hongyi Zhou, Dominik Roth, Serge Thilges, Fabian Otto, Rudolf Lioutikov, Gerhard Neumann

ICLR · 2024

TOP-ERL: Transformer-Based Off-Policy Episodic Reinforcement Learning

Ge Li, Dong Tian, Hongyi Zhou, Xinkai Jiang, Rudolf Lioutikov, Gerhard Neumann

Preprint · 2024

Efficient Off-Policy Learning for High-Dimensional Action Spaces

Fabian Otto, Philipp Becker, Ngo Anh Vien, Gerhard Neumann

Preprint · 2024

2023

Reinforcement Learning from Multiple Sensors via Joint Representations

Philipp Becker, Sebastian Markgraf, Fabian Otto, Gerhard Neumann

Reinforcement Learning Conference (RLC) · 2023

Swarm Reinforcement Learning for Adaptive Mesh Refinement

Niklas Freymuth, Philipp Dahlinger, Tobias Daniel Würth, Simon Reisch, Luise Kärger, Gerhard Neumann

NeurIPS · 2023

MP3: Movement Primitive-Based (Re-) Planning Policy

Fabian Otto, Hongyi Zhou, Onur Celik, Ge Li, Rudolf Lioutikov, Gerhard Neumann

Preprint · 2023

2022

On Uncertainty in Deep State Space Models for Model-Based Reinforcement Learning

Philipp Becker, Gerhard Neumann

TMLR · 2022

Specializing Versatile Skill Libraries using Local Mixture of Experts

Onur Celik, Dongzhuoran Zhou, Ge Li, Philipp Becker, Gerhard Neumann

CoRL · 2022

Deep Black-Box Reinforcement Learning with Movement Primitives

Fabian Otto, Onur Celik, Hongyi Zhou, Hanna Ziesche, Ngo Anh Vien, Gerhard Neumann

CoRL · 2022

Push-to-See: Learning Non-Prehensile Manipulation to Enhance Instance Segmentation via Deep Q-Learning

Baris Serhan, Harit Pandya, Ayse Kucukyilmaz, Gerhard Neumann

ICRA · 2022

2021

Differentiable Trust Region Layers for Deep Reinforcement Learning

Fabian Otto, Philipp Becker, Ngo Anh Vien, Hanna Carolin Ziesche, Gerhard Neumann

ICLR · 2021

Residual Feedback Learning for Contact-Rich Manipulation Tasks with Uncertainty

Alireza Ranjbar, Ngo Anh Vien, Hanna Ziesche, Joschka Boedecker, Gerhard Neumann

Preprint · 2021