<div class="csl-bib-body">
<div class="csl-entry">Nguyen, K., Le, A. T., Peters, J., &amp; Vu, M. N. (2025). DoublyAware: Dual Planning and Policy Awareness for Temporal Difference Learning in Humanoid Locomotion. <i>IEEE Robotics and Automation Letters</i>, <i>11</i>(2), 2162–2169. https://doi.org/10.1109/LRA.2025.3648611</div>
</div>
-
dc.identifier.issn
2377-3766
-
dc.identifier.uri
http://hdl.handle.net/20.500.12708/226129
-
dc.description.abstract
Achieving robust robot learning for humanoid locomotion is a fundamental challenge in model-based reinforcement learning (MBRL), where environmental stochasticity and randomness can hinder efficient exploration and learning stability. The environmental, so-called aleatoric, uncertainty can be amplified in high-dimensional action spaces with complex contact dynamics and entangled with epistemic uncertainty in the models during learning phases. In this work, we propose DoublyAware, an uncertainty-aware extension of Temporal Difference Model Predictive Control (TD-MPC) that explicitly decomposes uncertainty into two disjoint, interpretable components, i.e., planning and policy uncertainties. To handle the planning uncertainty, DoublyAware employs conformal prediction to filter candidate trajectories using quantile-calibrated risk bounds, ensuring statistical consistency and robustness against stochastic dynamics. Meanwhile, policy rollouts are leveraged as structured informative priors to support the learning phase with Group-Relative Policy Constraint (GRPC) optimizers, which impose a group-based adaptive trust region in the latent action space. This combination enables the robot agent to prioritize high-confidence, high-reward behavior while maintaining effective, targeted exploration under uncertainty. Evaluated on the HumanoidBench locomotion suite with the Unitree 26-DoF H1-2 humanoid, DoublyAware demonstrates improved sample efficiency, accelerated convergence, and enhanced motion feasibility compared to RL baselines. Our results emphasize the significance of structured uncertainty modeling for data-efficient and reliable decision-making in TD-MPC-based humanoid locomotion learning.
en
dc.language.iso
en
-
dc.publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
-
dc.relation.ispartof
IEEE Robotics and Automation Letters
-
dc.subject
humanoid and bipedal locomotion
en
dc.subject
reinforcement learning
en
dc.title
DoublyAware : Dual Planning and Policy Awareness for Temporal Difference Learning in Humanoid Locomotion