Additionally, high-quality motion capture (mocap) data and realistic human shape models are two important components of animation generation. Each involves a specific skeleton definition, which includes the number of joints, the naming convention, the hierarchical relationships among joints, and the underlying physical meaning of each joint. Ideally, animation software drives a human shape model to move and articulate according to the given mocap data and optimizes the deformation of the body surface so that it remains naturally smooth, as shown in Fig. 1. With the development of computer graphics and mocap technology, plenty of mocap data and 3D human models are available for various research activities. However, because they come from different sources, there is a major gap between the mocap data, the shape models, and the animation software, which often makes animation generation a challenging task. Three skeleton definitions are involved in animation generation: those of the mocap data, the shape model, and the software's built-in skeleton. The incompatibility among these skeletons often makes synthesized animation sequences unrealistic, inaccurate, or even distorted.
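To make the notion of skeleton incompatibility concrete, the following minimal sketch represents a skeleton definition as a mapping from each joint name to its parent and reports where two definitions disagree. The `Skeleton` class, the `compare` helper, and both toy skeletons are hypothetical illustrations. They are not taken from any particular mocap dataset, shape model, or animation package, and the physical meaning of each joint is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class Skeleton:
    """Illustrative skeleton definition: joint names and their hierarchy."""
    name: str
    parents: dict  # joint name -> parent joint name, or None for the root

    @property
    def joints(self):
        # The set of joint names defined by this skeleton.
        return set(self.parents)


def compare(a, b):
    """Report joints unique to each skeleton and shared joints whose parent differs."""
    shared = a.joints & b.joints
    return {
        "only_in_a": sorted(a.joints - b.joints),
        "only_in_b": sorted(b.joints - a.joints),
        "parent_mismatch": sorted(
            j for j in shared if a.parents[j] != b.parents[j]
        ),
    }


# Hypothetical toy skeletons (assumed for illustration only).
mocap = Skeleton(
    "mocap",
    {"Hips": None, "Spine": "Hips", "Neck": "Spine", "Head": "Neck"},
)
shape = Skeleton(
    "shape",
    {"pelvis": None, "Spine": "pelvis", "Head": "Spine"},
)

report = compare(mocap, shape)
print(report)
```

In this toy case the two definitions differ in joint count (four versus three), in naming (`Hips` versus `pelvis`), and in hierarchy (`Head` attaches to `Neck` in one and to `Spine` in the other). These are the kinds of mismatch that would have to be reconciled before one skeleton could drive the other.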