BeamDojo: Learning Agile Humanoid Locomotion on Sparse Footholds

Overview

Stepping Stones (Forward)

*Stones are randomly distributed, each 20 cm in size (≈ foot length), with a maximum distance of 45 cm and an average distance of 35 cm.
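To make these numbers concrete, the sketch below generates a one-dimensional row of stones along the walking direction. It assumes that "distance" means centre-to-centre spacing and that spacings are drawn uniformly. Both are assumptions, not the authors' terrain generator. The names sample_stone_centres, MAX_SPACING and MEAN_SPACING are hypothetical. Under those assumptions, a uniform range whose upper bound is 45 cm and whose mean is 35 cm has a lower bound of 25 cm, which still exceeds the 20 cm stone size, so neighbouring stones never overlap.

```python
import random

STONE_SIZE = 0.20    # stone edge length in metres (from the caption)
MAX_SPACING = 0.45   # assumed: maximum centre-to-centre distance (m)
MEAN_SPACING = 0.35  # assumed: average centre-to-centre distance (m)


def sample_stone_centres(n_stones, seed=0):
    """Hypothetical 1-D stepping-stone layout.

    Spacings are drawn uniformly from [2 * MEAN_SPACING - MAX_SPACING, MAX_SPACING],
    an interval whose mean is MEAN_SPACING. Returns stone centre positions in metres.
    """
    rng = random.Random(seed)
    low = 2 * MEAN_SPACING - MAX_SPACING  # about 0.25 m, larger than STONE_SIZE
    centres = [0.0]
    for _ in range(n_stones - 1):
        centres.append(centres[-1] + rng.uniform(low, MAX_SPACING))
    return centres


if __name__ == "__main__":
    print([round(c, 3) for c in sample_stone_centres(6, seed=42)])
```

A real terrain generator would also randomise the lateral offset and place stones in 2-D. The uniform draw is only the simplest distribution that matches both stated statistics.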

Stepping Stones (Backward)

Balancing Beams

*The beam width is 20 cm.

Zero-shot Transfer

Gap

Mixed Terrain

*BeamDojo showcases zero-shot transfer to gaps and stepping beams, and demonstrates robustness to missteps. The gap width is 50 cm.

Robustness Test

6 kg Payload (≈ Torso Mass)

External Force

Simulation Experiments

Abstract

Traversing risky terrains with sparse footholds poses a significant challenge for humanoid robots, requiring precise foot placements and stable locomotion. Existing learning-based approaches often struggle on such complex terrains due to sparse foothold rewards and inefficient learning processes. To address these challenges, we introduce BeamDojo, a reinforcement learning (RL) framework designed to enable agile humanoid locomotion on sparse footholds. BeamDojo begins by introducing a sampling-based foothold reward tailored for polygonal feet, along with a double critic to balance the learning process between dense locomotion rewards and sparse foothold rewards. To encourage sufficient trial-and-error exploration, BeamDojo incorporates a two-stage RL approach: the first stage relaxes the terrain dynamics by training the humanoid on flat terrain while providing it with perceptive observations of the task terrain, and the second stage fine-tunes the policy on the actual task terrain. Moreover, we implement an onboard LiDAR-based elevation map to enable real-world deployment. Extensive simulation and real-world experiments demonstrate that BeamDojo achieves efficient learning in simulation and enables agile locomotion with precise foot placement on sparse footholds in the real world, maintaining a high success rate even under significant external disturbances.
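A sampling-based foothold reward for a polygonal foot can be sketched as follows. The sketch samples a grid of points on a rectangular sole, moves them into the world frame using the foot's planar position and yaw, and looks up the terrain height under each point. While the foot is in contact, it penalises the fraction of points whose terrain lies more than a threshold below the sole, meaning those points are hanging over a gap. The foot dimensions, grid density, threshold eps, the height_fn callback and both function names are illustrative assumptions. This is not BeamDojo's exact reward formula.

```python
import numpy as np


def sample_foot_points(length=0.22, width=0.10, n_len=5, n_wid=3):
    """Grid of sample points on a rectangular foot, in the foot frame (metres).

    Foot size and grid density are illustrative assumptions.
    """
    xs = np.linspace(-length / 2, length / 2, n_len)
    ys = np.linspace(-width / 2, width / 2, n_wid)
    gx, gy = np.meshgrid(xs, ys, indexing="ij")
    return np.stack([gx.ravel(), gy.ravel()], axis=1)  # shape (n_len * n_wid, 2)


def foothold_reward(foot_xy, foot_yaw, foot_z, in_contact, height_fn, points, eps=0.05):
    """Hypothetical foothold penalty for one foot.

    Returns minus the fraction of sampled points whose terrain height is more than
    eps below the sole. Feet in the air receive no penalty.
    """
    if not in_contact:
        return 0.0
    c, s = np.cos(foot_yaw), np.sin(foot_yaw)
    rot = np.array([[c, -s], [s, c]])                 # foot frame -> world frame (yaw only)
    world = points @ rot.T + np.asarray(foot_xy)      # (N, 2) world-frame sample points
    terrain = np.array([height_fn(x, y) for x, y in world])
    unsupported = (foot_z - terrain) > eps            # points with no terrain under the sole
    return -float(unsupported.mean())


if __name__ == "__main__":
    pts = sample_foot_points()

    def beam_height(x, y):
        # Illustrative 20 cm wide beam along x with its top at z = 0; a deep pit elsewhere.
        return 0.0 if abs(y) <= 0.10 else -1.0

    print(foothold_reward((0.0, 0.0), 0.0, 0.0, True, beam_height, pts))   # foot centred on the beam
    print(foothold_reward((0.0, 0.12), 0.0, 0.0, True, beam_height, pts))  # foot partly off the edge
```

Counting points instead of checking only the foot centre is what makes the signal graded. A foot that is two-thirds off the beam is penalised more than one that barely overhangs the edge.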

Framework

(a) Training in Simulation. BeamDojo incorporates a two-stage RL approach.

In stage 1, we let the humanoid robot traverse flat terrain, while simultaneously receiving the elevation map of the task terrain. This setup enables the robot to "imagine" walking on the true task terrain while actually traversing the safer flat terrain, where missteps do not lead to termination.
Therefore, during stage 1, proprioceptive and perceptive information are decoupled, and so are the locomotion rewards and the foothold reward: proprioception and the locomotion rewards come from the flat terrain, while perception and the foothold reward come from the task terrain. The double-critic module learns the two reward groups separately.
In stage 2, the policy is fine-tuned on the task terrain using the full set of observations and rewards, and the double-critic module is carried over from stage 1 as a deep copy.
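One plausible way for a double critic to drive a single policy update is sketched below. Each critic predicts values for its own reward group (dense locomotion vs. sparse foothold). Generalized advantage estimation (GAE) is computed per group, each group's advantages are normalised independently, and the results are combined with a weighted sum. This scheme, the weights w_loco and w_foot, the GAE hyperparameters and the function names are assumptions for illustration, not code from BeamDojo.

```python
import numpy as np


def gae(rewards, values, dones, gamma=0.99, lam=0.95):
    """Generalized advantage estimation for one reward group.

    rewards, dones: shape (T,); values: shape (T + 1,), including the bootstrap value.
    """
    T = len(rewards)
    adv = np.zeros(T)
    last = 0.0
    for t in reversed(range(T)):
        not_done = 1.0 - dones[t]
        delta = rewards[t] + gamma * values[t + 1] * not_done - values[t]
        last = delta + gamma * lam * not_done * last
        adv[t] = last
    return adv


def combine_advantages(adv_loco, adv_foot, w_loco=1.0, w_foot=1.0, eps=1e-8):
    """Normalise each group's advantages separately, then take a weighted sum.

    The weights are placeholder assumptions.
    """
    def normalise(a):
        return (a - a.mean()) / (a.std() + eps)

    return w_loco * normalise(adv_loco) + w_foot * normalise(adv_foot)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T = 8
    dones = np.zeros(T)
    # Dense locomotion rewards vs. a sparse foothold penalty that fires once.
    r_loco = rng.normal(1.0, 0.1, T)
    r_foot = np.zeros(T)
    r_foot[5] = -1.0
    adv_loco = gae(r_loco, rng.normal(0.0, 0.1, T + 1), dones)  # critic 1 values (stand-in)
    adv_foot = gae(r_foot, np.zeros(T + 1), dones)              # critic 2 values (stand-in)
    print(combine_advantages(adv_loco, adv_foot))
```

Each group is normalised before the sum. As a result, a rarely triggered foothold penalty enters the policy gradient on the same scale as the dense locomotion terms. This matches the balancing role the abstract attributes to the double critic.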

(b) Deployment. The robot-centric elevation map, reconstructed using LiDAR data, is combined with proprioceptive information to serve as the input for the actor.
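A minimal version of the deployment input path is sketched below. It transforms LiDAR points into the robot's heading frame, bins them into a robot-centric height grid that keeps the highest point per cell, and concatenates the flattened grid with the proprioceptive vector to form the actor input. The grid extent, the 5 cm resolution, the max-height rule, the fill value for empty cells and the function names are all assumptions. A real elevation-mapping pipeline would also fuse scans over time and handle occlusions.

```python
import numpy as np


def elevation_map(points_xyz, base_xyz, base_yaw=0.0, size=(1.5, 1.0), res=0.05, fill=-1.0):
    """Hypothetical robot-centric height grid from world-frame LiDAR points (N, 3).

    Heights are relative to the base; empty cells get `fill` (assumed value).
    """
    nx, ny = int(round(size[0] / res)), int(round(size[1] / res))
    grid = np.full((nx, ny), np.nan)
    rel = np.asarray(points_xyz, dtype=float) - np.asarray(base_xyz, dtype=float)
    c, s = np.cos(base_yaw), np.sin(base_yaw)
    x = c * rel[:, 0] + s * rel[:, 1]      # world offsets rotated into the robot heading frame
    y = -s * rel[:, 0] + c * rel[:, 1]
    z = rel[:, 2]
    ix = np.floor((x + size[0] / 2) / res).astype(int)
    iy = np.floor((y + size[1] / 2) / res).astype(int)
    ok = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    for i, j, h in zip(ix[ok], iy[ok], z[ok]):
        if np.isnan(grid[i, j]) or h > grid[i, j]:
            grid[i, j] = h                 # keep the highest return in each cell
    return np.nan_to_num(grid, nan=fill)


def actor_input(proprio, height_grid):
    """Concatenate proprioception with the flattened height grid."""
    return np.concatenate([np.asarray(proprio, dtype=float).ravel(), height_grid.ravel()])


if __name__ == "__main__":
    cloud = np.array([[0.30, 0.00, -0.70], [0.32, 0.01, -0.68], [-0.40, 0.20, -1.50]])
    grid = elevation_map(cloud, base_xyz=np.zeros(3))
    obs = actor_input(np.zeros(45), grid)  # 45-D proprioception is a placeholder size
    print(grid.shape, obs.shape)
```

Rotating by the base yaw keeps the map aligned with the robot's heading. Because of this, the same grid cell always means "in front of the robot" regardless of which way it faces in the world.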

BibTeX

@inproceedings{wang2025beamdojo,
  title     = {BeamDojo: Learning Agile Humanoid Locomotion on Sparse Footholds},
  author    = {Wang, Huayi and Wang, Zirui and Ren, Junli and Ben, Qingwei and Huang, Tao and Zhang, Weinan and Pang, Jiangmiao},
  booktitle = {Robotics: Science and Systems ({RSS})},
  year      = {2025},
}