SuperDyno: Scalable Humanoid Whole-Body Control via Differentiable Neural Network Dynamics


  1. Why Neural Network?
  2. More Comparisons with Baselines and SOTAs
  3. Evaluation of the Dynamics Model
  4. Dense/Sparse Reward Evaluation


Why Neural Network?

In this section, we start from a case study, the ball-wall experiment, to illustrate why we choose neural network dynamics over differentiable simulators. We then discuss the advantages and limitations of neural network dynamics and their implications for future applications.

Policy Performance Comparison

[Videos: reference motion | tracking with gradients from MJX | tracking with gradients from neural networks]

Case Study: Ball-wall experiment

As shown in the figure below, consider a simplified ball-wall experiment: a point mass (the ball) is launched forward from the ground at speed \(v\) and angle \(\theta\), with a wall of height \(H\) standing at distance \(L\). The goal is to maximize the forward distance \(x\) by optimizing the launch angle \(\theta\), which can be formulated as \(\max_\theta x = f(\theta, v, H, L)\). For simplicity, we assume the ball sticks to the wall on impact (no complex contact). With the objective defined, we learn it with two MLPs: the first uses ReLU activations, while the second uses SiLU. Both models are initialized with identical random parameters and trained with the RAdam optimizer for 200 epochs with a batch size of \(B=300\). We provide interactive 3D loss landscapes and compare them against the ground truth, i.e., the differentiable simulator. The learned landscapes of the neural networks are visibly smoother, which yields smoother gradients. More concretely, in \(X(\theta, v)\) the policy is much more likely to get stuck in a local optimum with the GT simulator, where the gradient stays zero over a wide region.
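To make the zero-gradient failure mode concrete, below is a minimal JAX sketch of the ground-truth objective under standard projectile-motion assumptions (launch from the ground, stick-on-contact wall); the gravity constant and the query points are illustrative choices, not the exact values behind the plots. Differentiating the true \(f\) shows that \(\partial x / \partial \theta\) is exactly zero wherever the ball sticks to the wall, which is where gradient-based optimization against the ground-truth simulator stalls, whereas a smooth neural approximation of the same landscape keeps providing informative gradients.

```python
# Minimal sketch (assumed physics and illustrative parameters, not the exact
# setup behind the figures): the ground-truth objective x = f(theta, v, H, L)
# for the ball-wall experiment, written in JAX so its gradient can be inspected.
import jax
import jax.numpy as jnp

G = 9.81  # gravitational acceleration (assumed)

def forward_distance(theta, v, H, L):
    """Forward distance of a point mass launched from the ground at angle
    `theta` (rad) and speed `v`, with a sticky wall of height `H` at distance `L`."""
    flight_range = v ** 2 * jnp.sin(2.0 * theta) / G                 # landing distance on open ground
    y_at_wall = L * jnp.tan(theta) - G * L ** 2 / (2.0 * (v * jnp.cos(theta)) ** 2)
    hits_wall = (flight_range > L) & (y_at_wall < H)                 # reaches the wall below its top
    return jnp.where(hits_wall, L, flight_range)                     # sticks at L, otherwise lands normally

# d x / d theta vanishes on the whole "stuck to the wall" region of the landscape.
dx_dtheta = jax.grad(forward_distance, argnums=0)
print(dx_dtheta(0.3, 10.0, 2.0, 5.0))  # hits the wall   -> 0.0 (no learning signal)
print(dx_dtheta(0.8, 10.0, 2.0, 5.0))  # clears the wall -> informative gradient
```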

[Interactive 3D loss landscapes of the GT-simulator, NN-ReLU, and NN-SiLU, each shown for X(θ, v), X(θ, H), and X(θ, L)]

Advantages & Limitations

Beyond smoother gradients, neural-network world models offer further advantages.

They also come with a number of limitations.

Connection to the Choice of Activation Function

Since the SiLU activation provides smoother gradients, training the dynamics model with SiLU should perform better. We test this hypothesis with the following comparison over activation-function combinations, where P(·) denotes the activation of the policy network and W(·) that of the world (dynamics) model; a minimal sketch of the activation switch follows the table. All of these experiments are trained on AMASS for 24 hours.

Activations (Policy / World Model)   Success Rate (%)   Global MPJPE (mm)   Local MPJPE (mm)   Acc   Vel
P(SiLU) W(SiLU)                      96.8               24.3                19.8               2.3   3.1
P(SiLU) W(ReLU)                      93.9               30.6                24.4               2.6   3.6
P(ReLU) W(ReLU)                      95.8               25.7                19.9               2.7   3.3
P(ReLU) W(SiLU)                      97.9               21.1                16.6               2.5   3.0
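For concreteness, here is a small pure-JAX sketch of how the P(·)/W(·) activation switch can be wired up; the layer widths, initialization, and input sizes are placeholders rather than the architecture actually used in these experiments.

```python
# Illustrative sketch only: an MLP whose hidden activation can be toggled
# between ReLU and SiLU, standing in for the policy (P) and world model (W).
import jax
import jax.numpy as jnp

ACTIVATIONS = {"relu": jax.nn.relu, "silu": jax.nn.silu}

def init_mlp(key, widths):
    """He-style initialization for an MLP with layer widths `widths` (placeholder sizes)."""
    params = []
    for n_in, n_out in zip(widths[:-1], widths[1:]):
        key, sub = jax.random.split(key)
        params.append((jax.random.normal(sub, (n_in, n_out)) * jnp.sqrt(2.0 / n_in),
                       jnp.zeros(n_out)))
    return params

def mlp(params, x, activation):
    """Forward pass; `activation` selects the hidden non-linearity ('relu' or 'silu')."""
    act = ACTIVATIONS[activation]
    for w, b in params[:-1]:
        x = act(x @ w + b)
    w, b = params[-1]
    return x @ w + b

key = jax.random.PRNGKey(0)
world_params = init_mlp(key, [64, 256, 256, 64])    # (state, action) -> next state; sizes assumed
policy_params = init_mlp(key, [64, 256, 256, 32])   # state -> action; sizes assumed
next_state = mlp(world_params, jnp.ones(64), activation="silu")   # W(SiLU)
action = mlp(policy_params, jnp.ones(64), activation="relu")      # P(ReLU)
```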

More Comparisons with Baselines and SOTAs

More quantitative results on imitating MoCap motion sequences. AMASS-Train* and AMASS-Test* contain 11313 and 140 high-quality MoCap sequences, respectively. FT denotes future tracks. * indicates that the results are produced on a single NVIDIA A6000 GPU. Our+ additionally changes the activation of the policy network from SiLU to ReLU. PULSE is a distillation method and is therefore not directly comparable.

To compare with DreamerV1, we first show the average reward curves below, which are trained on a single motion sequence. We will include the full results of DreamerV1 and DreamerV3 on AMASS later.

[Reward curves: overfit on "Standing" | overfit on "Handball"]

Evaluation of the Dynamics Model

As shown by the figures below, the neural dynamics model we trained can stably predict future states, as demonstrated on the AMASS training and test sets. The per-joint error remains smaller than 0.066 m after 1.5 seconds.

[Per-joint prediction error: on the AMASS training set | on the AMASS test set]
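For reference, the sketch below shows one way such per-joint open-loop error curves can be computed; the dynamics-model interface, array shapes, and helper names are assumptions for illustration, not our exact implementation.

```python
# Illustrative sketch: roll a learned dynamics model forward with no state
# correction and measure the mean per-joint position error at every step.
import jax.numpy as jnp

def open_loop_joint_error(dynamics_fn, params, init_state, actions, gt_joints):
    """`dynamics_fn(params, state, action) -> (next_state, joint_positions)` is an
    assumed interface; `gt_joints` holds ground-truth joint positions, shape (T, J, 3) in meters."""
    state, errors = init_state, []
    for t in range(actions.shape[0]):
        state, joints = dynamics_fn(params, state, actions[t])
        # Euclidean distance per joint at step t, averaged over the J joints.
        errors.append(jnp.linalg.norm(joints - gt_joints[t], axis=-1).mean())
    return jnp.stack(errors)  # error curve over the prediction horizon, shape (T,)
```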

We also provide visualizations of the world model's open-loop predictions. Red is the reference motion, blue is our policy's tracking result, and orange is the open-loop prediction of the dynamics model.

[Videos: AMASS Running | AMASS Running2 (local)]

Dense/Sparse Reward Evaluation

In this section, we visualize SuperDyno's ability to imitate high-quality motion capture (MoCap) data on sequences both seen and unseen during training. All rendered SMPL meshes (bottom left) are produced from the simulation results without any post-processing.

AMASS Train & Test

[Videos: AMASS-Train overview | AMASS-Train dynamic motion | AMASS-Test]

Comparison with SOTA

[Videos: SuperDyno on Handball | PHC+ on Handball]

Sparse Reward Tasks

We demonstrate our framework's capability on two downstream tasks with sparse rewards: velocity tracking and trajectory following; an illustrative sparse-reward sketch is given after the videos.

[Videos: velocity tracking (commanded speed in a given direction) | trajectory following]
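To make the sparsity concrete, below is a minimal sketch of what a sparse velocity-tracking reward could look like: the agent is rewarded only when its planar root velocity falls within a tolerance of the commanded velocity. The tolerance, reward values, and function signature are illustrative assumptions, not SuperDyno's actual reward.

```python
# Illustrative sketch only: a sparse velocity-tracking reward (assumed form).
import jax.numpy as jnp

def sparse_velocity_reward(root_velocity, target_velocity, tol=0.2):
    """Return 1.0 only when the planar (x, y) root velocity is within `tol` m/s
    of the commanded velocity, and 0.0 otherwise."""
    err = jnp.linalg.norm(root_velocity[:2] - target_velocity[:2])
    return jnp.where(err < tol, 1.0, 0.0)

# Command 1.5 m/s forward: a near-match is rewarded, a large deviation is not.
print(sparse_velocity_reward(jnp.array([1.4, 0.1, 0.0]), jnp.array([1.5, 0.0, 0.0])))  # 1.0
print(sparse_velocity_reward(jnp.array([0.3, 0.0, 0.0]), jnp.array([1.5, 0.0, 0.0])))  # 0.0
```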