In this section, we start from a case study, a ball-wall experiment, to illustrate why we choose neural network dynamics over differentiable simulators. We then discuss the advantages and disadvantages of neural network dynamics, as well as future applications and limitations.
As shown by the figure below, consider a simplified ball-wall experiment: a point mass (the ball) is launched forward from the ground at velocity \(v\). The other problem variables are the wall height \(H\) and the distance to the wall \(L\). The goal is to maximize the forward distance \(x\) by optimizing the launch angle \(\theta\), which can be formulated as \(\max_{\theta} x = f(\theta, v, H, L)\). For simplicity, we assume the ball sticks to the wall on impact (no complex contact). With the objective defined, we try to learn it with two MLPs: the first uses ReLU activation functions, while the second uses SiLU activation functions. Both models are initialized with identical random parameters and trained with the RAdam optimizer for 200 epochs using a batch size of \(B=300\). We provide an interactive 3D loss landscape and compare it with the ground truth, i.e., the landscape given by the differentiable simulator. The learned loss landscape of the neural networks is smoother, which leads to smoother gradients. In particular, in the \(x(\theta, v)\) slice, the policy is much more likely to get stuck in a local optimum with the ground-truth simulator, whose landscape contains regions where the gradient is identically zero.
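For reference, here is a minimal PyTorch sketch of the toy study above. The analytic ball-wall function, sampling ranges, network widths, and learning rate are illustrative assumptions; the two activations, identical initialization, the RAdam optimizer, 200 epochs, and batch size \(B=300\) follow the setup described above.

```python
import torch
import torch.nn as nn

G = 9.81  # gravity (m/s^2); assumed value for the toy simulator

def ball_wall_distance(theta, v, H, L):
    """Ground-truth f(theta, v, H, L): forward distance of a point mass launched
    from the ground at angle theta and speed v. If the trajectory reaches the wall
    (distance L) below its top (height H), the ball sticks and x = L; otherwise it
    lands ballistically. The sticking branch creates flat, zero-gradient regions."""
    vx, vy = v * torch.cos(theta), v * torch.sin(theta)
    x_land = 2.0 * vx * vy / G                  # projectile range without the wall
    t_wall = L / vx.clamp(min=1e-6)             # time to reach the wall
    y_wall = vy * t_wall - 0.5 * G * t_wall**2  # height when passing the wall
    hits_wall = (x_land > L) & (y_wall < H)
    return torch.where(hits_wall, L * torch.ones_like(x_land), x_land)

def make_mlp(act):
    # Illustrative architecture; widths are assumptions, not the paper's setup.
    return nn.Sequential(nn.Linear(4, 256), act(),
                         nn.Linear(256, 256), act(),
                         nn.Linear(256, 1))

# Identical random initialization for both networks (same seed, same layer shapes).
torch.manual_seed(0)
relu_net = make_mlp(nn.ReLU)
torch.manual_seed(0)
silu_net = make_mlp(nn.SiLU)

B, epochs = 300, 200
for net in (relu_net, silu_net):
    opt = torch.optim.RAdam(net.parameters(), lr=1e-3)
    for _ in range(epochs):
        # Sampling ranges below are illustrative assumptions.
        theta = torch.rand(B, 1) * (torch.pi / 2)
        v = torch.rand(B, 1) * 10.0 + 1.0
        H = torch.rand(B, 1) * 3.0 + 0.5
        L = torch.rand(B, 1) * 5.0 + 1.0
        target = ball_wall_distance(theta, v, H, L)
        pred = net(torch.cat([theta, v, H, L], dim=-1))
        loss = nn.functional.mse_loss(pred, target)
        opt.zero_grad(); loss.backward(); opt.step()
```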
Beyond smoother gradients, neural network world models have additional advantages.
There are also many disadvantages of neural network world models.
Since the SiLU activation provides smoother gradients, training the dynamics model with SiLU should perform better. We test this hypothesis with the following comparison of activation-function combinations for the policy network P and the world model W. All of these experiments are trained on AMASS for 24 hours.
| Activation (P / W) | Success Rate (%) ↑ | Global MPJPE (mm) ↓ | Local MPJPE (mm) ↓ | Acc ↓ | Vel ↓ |
|---|---|---|---|---|---|
| P(SiLU) W(SiLU) | 96.8 | 24.3 | 19.8 | 2.3 | 3.1 |
| P(SiLU) W(ReLU) | 93.9 | 30.6 | 24.4 | 2.6 | 3.6 |
| P(ReLU) W(ReLU) | 95.8 | 25.7 | 19.9 | 2.7 | 3.3 |
| P(ReLU) W(SiLU) | 97.9 | 21.1 | 16.6 | 2.5 | 3.0 |
More quantitative results on imitating MoCap motion sequences. AMASS-Train* and AMASS-Test* contain 11,313 and 140 high-quality MoCap sequences, respectively. FT denotes future tracks. * indicates results produced on a single NVIDIA A6000 GPU. Ours+ additionally changes the activation of the policy network from SiLU to ReLU. PULSE is a distillation method and is therefore not directly comparable.
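As a reference for the P(·)/W(·) notation used above, here is a minimal sketch of how the four activation combinations could be instantiated, assuming both the policy and the world model are plain MLPs; the layer widths and function names are illustrative assumptions, not the actual architecture.

```python
import torch.nn as nn

ACTS = {"ReLU": nn.ReLU, "SiLU": nn.SiLU}

def mlp(in_dim, out_dim, hidden, act):
    layers, d = [], in_dim
    for h in hidden:
        layers += [nn.Linear(d, h), act()]
        d = h
    layers.append(nn.Linear(d, out_dim))
    return nn.Sequential(*layers)

def build_policy_and_world_model(obs_dim, act_dim, state_dim,
                                 policy_act="ReLU", world_act="SiLU"):
    """Instantiate the policy (P) and dynamics/world model (W) for one row of the
    ablation table, e.g. P(ReLU) W(SiLU). Hidden sizes are placeholder values."""
    policy = mlp(obs_dim, act_dim, hidden=[512, 512], act=ACTS[policy_act])
    world = mlp(state_dim + act_dim, state_dim, hidden=[1024, 1024],
                act=ACTS[world_act])
    return policy, world
```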
To compare with DreamerV1, we first show the average reward curves, trained on a single motion sequence. Full results on AMASS for DreamerV1 and DreamerV3 will be included later.
As shown by the figures below, the neural dynamics model we trained can stably predict future states, as demonstrated on both the AMASS training and test datasets. The per-joint error remains below 0.066 m even after 1.5 seconds of prediction.
We also provide a visualization of the world model's open-loop prediction. The red character is the reference motion, the blue one is our policy's tracking result, and the orange one is the open-loop prediction of the dynamics model.
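For reference, here is a minimal sketch of how such an open-loop rollout and the per-joint error could be computed. It assumes the world model maps a (state, action) pair to the next state, that 1.5 seconds corresponds to roughly 45 steps at 30 Hz, and that the compared states contain flattened 3D joint positions; `world_model`, `policy`, and the helper names are placeholders, not our exact interface.

```python
import torch

@torch.no_grad()
def open_loop_rollout(world_model, policy, s0, num_steps=45):
    """Roll the dynamics model forward without reading the simulator state:
    each action is computed from the model's own predicted state."""
    states, s = [s0], s0
    for _ in range(num_steps):
        a = policy(s)
        s = world_model(torch.cat([s, a], dim=-1))
        states.append(s)
    return torch.stack(states, dim=0)  # (num_steps + 1, state_dim)

def per_joint_error(pred_states, sim_states, num_joints):
    """Mean Euclidean error per joint (meters) at each prediction step,
    assuming states are flattened (num_joints * 3) joint positions."""
    pred = pred_states.reshape(pred_states.shape[0], num_joints, 3)
    gt = sim_states.reshape(sim_states.shape[0], num_joints, 3)
    return (pred - gt).norm(dim=-1).mean(dim=-1)  # (num_steps + 1,)
```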
In this section, we visualize SuperDyno's ability to imitate high-quality motion capture (MoCap) data on both sequences seen and unseen during training. All rendered SMPL meshes (bottom left) are produced from simulation results without any post-processing.
We demonstrate our framework's capability on two downstream tasks with sparse rewards: velocity tracking and trajectory following.
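For concreteness, here is a minimal sketch of what such sparse rewards could look like; the thresholds and function names are illustrative assumptions, not the exact reward definitions used in our experiments.

```python
import torch

def velocity_tracking_reward(root_vel, target_vel, tol=0.2):
    """Sparse reward: 1 only when the root velocity is within `tol` m/s
    of the commanded velocity, 0 otherwise. `tol` is an assumed threshold."""
    return (torch.linalg.norm(root_vel - target_vel, dim=-1) < tol).float()

def trajectory_following_reward(root_pos_xy, waypoint_xy, tol=0.3):
    """Sparse reward: 1 only when the character's root is within `tol` m
    of the current waypoint, 0 otherwise. `tol` is an assumed threshold."""
    return (torch.linalg.norm(root_pos_xy - waypoint_xy, dim=-1) < tol).float()
```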