SuperDyno: Scalable Humanoid Whole-Body Control via Differentiable Neural Network Dynamics


  1. Why Neural Network?
  2. More Comparisons with Baselines and SOTAs
  3. Evaluation of the Dynamics Model
  4. Dense/Sparse Reward Evaluation


Why Neural Network?

In this section, we start from a case study, the ball-wall experiment, to illustrate why we choose neural network dynamics over differentiable simulators. We then discuss the advantages and limitations of neural network dynamics and their implications for future applications.

Policy Performance Comparison

[Videos: reference motion | tracking with gradients from MJX | tracking with gradients from neural networks]

Case Study: Ball-wall experiment

As shown in the figure below, consider a simplified ball-wall experiment: a point mass (the ball) is launched forward from the ground at speed \(v\) and angle \(\theta\), with a wall of height \(H\) standing at distance \(L\). The goal is to maximize the forward distance \(x\) by optimizing the launch angle \(\theta\), which can be formulated as \(\max_\theta x = f(\theta, v, H, L)\). For simplicity, we assume the ball sticks to the wall on impact (no complex contact). With the objective defined, we learn it with two MLPs: the first uses ReLU activations, while the second uses SiLU. Both models are initialized with identical random parameters and trained with the RAdam optimizer for 200 epochs with a batch size of \(B=300\). We provide interactive 3D loss landscapes and compare them against the ground truth, i.e., the differentiable simulator. The learned landscapes of the neural networks are visibly smoother, which yields smoother gradients. More concretely, in \(X(\theta, v)\) the policy is much more likely to get stuck in a local optimum with the GT simulator, where the gradient stays zero over a wide region.
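To make the zero-gradient failure mode concrete, below is a minimal JAX sketch of the ground-truth objective under standard projectile-motion assumptions (launch from the ground, stick-on-contact wall); the gravity constant and the query points are illustrative choices, not the exact values behind the plots. Differentiating the true \(f\) shows that \(\partial x / \partial \theta\) is exactly zero wherever the ball sticks to the wall, which is where gradient-based optimization against the ground-truth simulator stalls, whereas a smooth neural approximation of the same landscape keeps providing informative gradients.

```python
# Minimal sketch (assumed physics and illustrative parameters, not the exact
# setup behind the figures): the ground-truth objective x = f(theta, v, H, L)
# for the ball-wall experiment, written in JAX so its gradient can be inspected.
import jax
import jax.numpy as jnp

G = 9.81  # gravitational acceleration (assumed)

def forward_distance(theta, v, H, L):
    """Forward distance of a point mass launched from the ground at angle
    `theta` (rad) and speed `v`, with a sticky wall of height `H` at distance `L`."""
    flight_range = v ** 2 * jnp.sin(2.0 * theta) / G                 # landing distance on open ground
    y_at_wall = L * jnp.tan(theta) - G * L ** 2 / (2.0 * (v * jnp.cos(theta)) ** 2)
    hits_wall = (flight_range > L) & (y_at_wall < H)                 # reaches the wall below its top
    return jnp.where(hits_wall, L, flight_range)                     # sticks at L, otherwise lands normally

# d x / d theta vanishes on the whole "stuck to the wall" region of the landscape.
dx_dtheta = jax.grad(forward_distance, argnums=0)
print(dx_dtheta(0.3, 10.0, 2.0, 5.0))  # hits the wall   -> 0.0 (no learning signal)
print(dx_dtheta(0.8, 10.0, 2.0, 5.0))  # clears the wall -> informative gradient
```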

[Interactive 3D loss landscapes of the GT-simulator, NN-ReLU, and NN-SiLU, each shown for X(θ, v), X(θ, H), and X(θ, L)]

Advantages & Limitations

Beyond smoother gradients, neural-network world models offer further advantages.

They also come with a number of limitations.

Connection to the Choice of Activation Function

Since the SiLU activation provides smoother gradients, training the dynamics model with SiLU should perform better. We test this hypothesis with the following comparison over activation-function combinations, where P(·) denotes the activation of the policy network and W(·) that of the world (dynamics) model; a minimal sketch of the activation switch follows the table. All of these experiments are trained on AMASS for 24 hours.

Activations (Policy / World Model)   Success Rate (%)   Global MPJPE (mm)   Local MPJPE (mm)   Acc   Vel
P(SiLU) W(SiLU)                      96.8               24.3                19.8               2.3   3.1
P(SiLU) W(ReLU)                      93.9               30.6                24.4               2.6   3.6
P(ReLU) W(ReLU)                      95.8               25.7                19.9               2.7   3.3
P(ReLU) W(SiLU)                      97.9               21.1                16.6               2.5   3.0
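For concreteness, here is a small pure-JAX sketch of how the P(·)/W(·) activation switch can be wired up; the layer widths, initialization, and input sizes are placeholders rather than the architecture actually used in these experiments.

```python
# Illustrative sketch only: an MLP whose hidden activation can be toggled
# between ReLU and SiLU, standing in for the policy (P) and world model (W).
import jax
import jax.numpy as jnp

ACTIVATIONS = {"relu": jax.nn.relu, "silu": jax.nn.silu}

def init_mlp(key, widths):
    """He-style initialization for an MLP with layer widths `widths` (placeholder sizes)."""
    params = []
    for n_in, n_out in zip(widths[:-1], widths[1:]):
        key, sub = jax.random.split(key)
        params.append((jax.random.normal(sub, (n_in, n_out)) * jnp.sqrt(2.0 / n_in),
                       jnp.zeros(n_out)))
    return params

def mlp(params, x, activation):
    """Forward pass; `activation` selects the hidden non-linearity ('relu' or 'silu')."""
    act = ACTIVATIONS[activation]
    for w, b in params[:-1]:
        x = act(x @ w + b)
    w, b = params[-1]
    return x @ w + b

key = jax.random.PRNGKey(0)
world_params = init_mlp(key, [64, 256, 256, 64])    # (state, action) -> next state; sizes assumed
policy_params = init_mlp(key, [64, 256, 256, 32])   # state -> action; sizes assumed
next_state = mlp(world_params, jnp.ones(64), activation="silu")   # W(SiLU)
action = mlp(policy_params, jnp.ones(64), activation="relu")      # P(ReLU)
```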

More Comparisons with Baselines and SOTAs

More quantitative results on imitating MoCap motion sequences. AMASS-Train* and AMASS-Test* contain 11313 and 140 high-quality MoCap sequences, respectively. FT denotes future tracks. * indicates that the results are produced on a single NVIDIA A6000 GPU. Our+ additionally changes the activation of the policy network from SiLU to ReLU. PULSE is a distillation method and is therefore not directly comparable.

To compare with DreamerV1, we first show the average reward curves below, which are trained on a single motion sequence. We will include the full results of DreamerV1 and DreamerV3 on AMASS later.

[Reward curves: overfit on "Standing" | overfit on "Handball"]

Evaluation of the Dynamics Model

As shown by the figures below, the neural dynamics model we trained can stably predict future states, as demonstrated on the AMASS training and test sets. The per-joint error remains smaller than 0.066 m after 1.5 seconds.

[Per-joint prediction error: on the AMASS training set | on the AMASS test set]
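For reference, the sketch below shows one way such per-joint open-loop error curves can be computed; the dynamics-model interface, array shapes, and helper names are assumptions for illustration, not our exact implementation.

```python
# Illustrative sketch: roll a learned dynamics model forward with no state
# correction and measure the mean per-joint position error at every step.
import jax.numpy as jnp

def open_loop_joint_error(dynamics_fn, params, init_state, actions, gt_joints):
    """`dynamics_fn(params, state, action) -> (next_state, joint_positions)` is an
    assumed interface; `gt_joints` holds ground-truth joint positions, shape (T, J, 3) in meters."""
    state, errors = init_state, []
    for t in range(actions.shape[0]):
        state, joints = dynamics_fn(params, state, actions[t])
        # Euclidean distance per joint at step t, averaged over the J joints.
        errors.append(jnp.linalg.norm(joints - gt_joints[t], axis=-1).mean())
    return jnp.stack(errors)  # error curve over the prediction horizon, shape (T,)
```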

We also provide visualizations of the world model's open-loop predictions. Red is the reference motion, blue is our policy's tracking result, and orange is the open-loop prediction of the dynamics model.

[Videos: AMASS Running | AMASS Running2 (local)]

Dense/Sparse Reward Evaluation

In this section, we visualize SuperDyno's ability to imitate high-quality motion capture (MoCap) data on sequences both seen and unseen during training. All rendered SMPL meshes (bottom left) are produced from the simulation results without any post-processing.

AMASS Train & Test

[Videos: AMASS-Train overview | AMASS-Train dynamic motion | AMASS-Test]

Comparison with SOTA

[Videos: SuperDyno on Handball | PHC+ on Handball]

Sparse Reward Tasks

We demonstrate our framework's capability on two downstream tasks with sparse rewards: velocity tracking and trajectory following; an illustrative sparse-reward sketch is given after the videos.

[Videos: velocity tracking (commanded speed in a given direction) | trajectory following]
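To make the sparsity concrete, below is a minimal sketch of what a sparse velocity-tracking reward could look like: the agent is rewarded only when its planar root velocity falls within a tolerance of the commanded velocity. The tolerance, reward values, and function signature are illustrative assumptions, not SuperDyno's actual reward.

```python
# Illustrative sketch only: a sparse velocity-tracking reward (assumed form).
import jax.numpy as jnp

def sparse_velocity_reward(root_velocity, target_velocity, tol=0.2):
    """Return 1.0 only when the planar (x, y) root velocity is within `tol` m/s
    of the commanded velocity, and 0.0 otherwise."""
    err = jnp.linalg.norm(root_velocity[:2] - target_velocity[:2])
    return jnp.where(err < tol, 1.0, 0.0)

# Command 1.5 m/s forward: a near-match is rewarded, a large deviation is not.
print(sparse_velocity_reward(jnp.array([1.4, 0.1, 0.0]), jnp.array([1.5, 0.0, 0.0])))  # 1.0
print(sparse_velocity_reward(jnp.array([0.3, 0.0, 0.0]), jnp.array([1.5, 0.0, 0.0])))  # 0.0
```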