A Review of Multi-modal Human Motion Recognition Based on Deep Learning
Keywords:
Human motion recognition, Computer vision, Multi-modal, Deep learning
Abstract
Human motion recognition is a research hotspot in computer vision, with a wide range of applications that include biometrics, intelligent surveillance and human-computer interaction. In vision-based human motion recognition, the main input modalities are RGB images, depth images and skeleton data. Each modality captures a particular kind of information that is likely to complement the others: some modalities capture the global information of an action, while others capture its local details. Intuitively, fusing data from multiple modalities can therefore improve recognition accuracy. In addition, how to correctly model and exploit spatiotemporal information remains one of the main challenges in human motion recognition. Focusing on the feature extraction methods used for human action recognition in video, this paper reviews traditional hand-crafted feature extraction methods from the perspectives of global and local feature extraction, and introduces in detail the feature learning models commonly used in deep-learning-based feature extraction. Finally, this paper summarizes the opportunities and challenges in the field of motion recognition and discusses possible directions for future research.