Real-time Human Action Recognition (HAR) has become a pivotal field, given its wide range of applications across industries such as health and wellness, sports, robotics, security and surveillance, fitness monitoring, and immediate diagnostics. Human actions and related information can be accurately detected and classified via cameras and sensors such as accelerometers, integrated with computer vision, image processing, artificial intelligence, pattern recognition, data analytics, nonlinear modeling, and biomechanics technologies. By obtaining two-dimensional and thermal images through enhanced cameras placed at the right viewing angles and illumination, one can observe diverse aspects of human actions such as motions, postures, balance, environmental interactions, smartphone usage, and gestures, as well as different body sizes, clothing, and appearances.
As part of recognition systems, apart from cameras powered by computer vision based human activity recognition, virtual sensors are software components built on AI models that gather and aggregate data, synchronously or asynchronously, from physical sensors in the vicinity and simulate their behavior of generating sensor readings. The virtual sensor market reached a value of USD 8.78 billion in 2022 and is expected to grow at a CAGR of 33.3% to reach a market value of USD 15.407 billion by 2032 [GlobeNewsWire]. It is a fair assumption that computer vision based human activity recognition and related analytics will surge at a similar pace, given that both fall under the same umbrella of artificial intelligence, where virtual sensors may or may not be part of a complete machine vision powered HAR system.
Applications of HAR Systems
● Patient Monitoring
Human activity recognition systems can be used to monitor patients' actions, ensure adherence to therapy, and check progress during recovery. This is of utmost importance in cases of stroke and other heart conditions. These systems are also useful for fall detection in elderly care, where emergency alerts help caregivers arrive in time for assistance. Wearable sensors and smartphones also help in fitness monitoring, for example by counting steps and estimating calorie expenditure through physical activity tracking.
● Performance Analysis
HAR can be used for sports performance analysis, where athletes' movements and techniques are monitored and analyzed. Coaches can use such data to guide athletes toward better performance while preventing potential injuries. Other applications include posture checkpoints during yoga classes, employee performance on assembly lines in the manufacturing industry, detecting smartphone use by sales personnel and evaluating their overall performance in retail stores, and recognizing and analyzing body movements and general actions during sports or other forms of rehabilitation.
● Security and Surveillance
Another important application of HAR is detecting suspicious customer activity in self-checkout stores, alerting people who enter restricted or hazardous areas in manufacturing plants, and classifying types of abnormal behavior, breaches due to unauthorized access, and intrusions. Similarly, it can be used in defense to surveil soldier movements, monitor suspicious border activity, and recognize potential threats. HAR can even assist retailers in measuring in-store foot traffic and customer sentiment.
● Smart Homes
Smart homes can now be made more user-friendly and efficient using HAR, where gestures can control lighting, air conditioning, ventilation, and devices and appliances such as kettles, mixers, and AR/VR gaming systems. Automatic HAR also plays a major role in energy conservation in commercial buildings through HVAC and lighting control. Streaming services, smart TVs, and e-commerce platforms can adjust and showcase products, channels, and more according to user preferences upon detecting and classifying their activities.
● Driver Monitoring
With HAR systems integrated into the automotive industry, road safety can be enhanced and potential accidents avoided, since these systems can detect driver behavior such as aggressive driving, drowsiness, and distraction. In the retail and marketing industry, HAR can estimate the number of drivers observing billboard ads and recognize their related gestures.
Recent Advances in HAR Systems
Improved HAR Accuracy
Since device-agnostic software and applications are the need of the hour, it is now possible to gather important data from smartphone sensors such as gyroscopes and accelerometers, as well as microphones and cameras. This makes it possible to monitor users' daily activities in real time without interfering in their routines or requiring needless interaction with their smartphones, enabling accurate measurement, action differentiation, and collection of human motion datasets that can be used to train AI models; daily step counter applications in smartphones and smartwatches are a familiar example. Complementing this, deep learning techniques that need no manually modeled features are evolving in this space because of their generalization and pattern recognition capabilities, such as predicting boxing moves from previously learned patterns even when inputs arrive in real time.
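As a minimal sketch of the step counter idea mentioned above, the snippet below counts steps from tri-axial accelerometer samples by finding peaks in the acceleration magnitude. The threshold, debounce gap, and synthetic walking signal are all illustrative assumptions, not values from any real device.

```python
import numpy as np

def count_steps(accel_xyz, threshold=11.0, min_gap=10):
    """Count steps in tri-axial accelerometer data via thresholded peak detection.

    accel_xyz : (N, 3) array of acceleration samples in m/s^2.
    threshold : magnitude (above ~1 g = 9.81 m/s^2) a peak must exceed.
    min_gap   : minimum samples between successive steps (debounce).
    """
    mag = np.linalg.norm(accel_xyz, axis=1)  # combined magnitude per sample
    steps, last = 0, -min_gap
    for i in range(1, len(mag) - 1):
        # a local maximum above the threshold, far enough from the last step
        if (mag[i] > threshold and mag[i] >= mag[i - 1]
                and mag[i] >= mag[i + 1] and i - last >= min_gap):
            steps += 1
            last = i
    return steps

# Synthetic walking signal: gravity baseline plus periodic impact spikes
# at ~2 steps per second, sampled at 50 Hz for 10 seconds.
t = np.arange(0, 10, 0.02)
z = 9.81 + 3.0 * np.maximum(0, np.sin(2 * np.pi * 2.0 * t))
data = np.stack([np.zeros_like(t), np.zeros_like(t), z], axis=1)
print(count_steps(data))  # 20 peaks in 10 s at 2 steps/s
```

Real pedometer pipelines add low-pass filtering and adaptive thresholds, but the peak-over-threshold core is the same.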
Another major game changer in vision based human activity recognition is the Convolutional Neural Network (CNN) and its variants and combinations for image based recognition, such as the 50-layer Residual Network (ResNet-50, along with Fourier ResNet and 3D ResNet), Long Short-Term Memory networks (LSTMs), and the 2-D Vision Transformer (ViT), as well as Recurrent Neural Networks (RNNs) for time series data and related predictions. These are capable of increasing the accuracy of detecting and classifying human activities to approximately 95.25%. In addition, there are highly accurate hybrid models formed by combining CNNs and RNNs, trained to classify on the basis of visual features extracted from human motion data.
For example, in a hybrid model the CNN can accurately capture and extract features from video frames of human actions such as walking or standing, after which an RNN processes and classifies these features, e.g., for gait analysis in patient monitoring. Moreover, multimodal systems are paving the path to rapid processing of video-based HAR by training on combined data from cameras, audio, and wearable sensors. This helps remove visual ambiguities and data variability issues, as the AI can compare images of human actions against audio cues. An example is ambient assisted living centers that detect and identify motion and health conditions of the elderly: wearable tri-axial accelerometers, RGB-Depth sensors (like the Asus Xtion), and sensing nodes (such as those based on Libelium) feed data receivers (for indoor localization), a person recognition library (such as OpenNI, with the help of 2D and 3D bounding boxes), and PIR motion sensors, respectively, for accurate HAR.
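To make the hybrid CNN-plus-RNN structure concrete, here is a toy forward pass in plain NumPy: per-frame convolutional feature extraction followed by a simple recurrent layer over the frame sequence and a softmax over action classes. All sizes, weights, and class labels are hypothetical, and the network is untrained; this only illustrates the data flow, not a production model.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_features(frame, kernels):
    """Per-frame feature extraction: valid 2-D convolution, ReLU, global average pool."""
    H, W = frame.shape
    k = kernels.shape[1]
    feats = []
    for kern in kernels:
        out = np.zeros((H - k + 1, W - k + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(frame[i:i + k, j:j + k] * kern)
        feats.append(np.maximum(out, 0).mean())  # one pooled value per kernel
    return np.array(feats)

def rnn_classify(feature_seq, Wx, Wh, Wo):
    """Simple (Elman) RNN over the per-frame features, softmax over action classes."""
    h = np.zeros(Wh.shape[0])
    for x in feature_seq:
        h = np.tanh(Wx @ x + Wh @ h)   # hidden state carries temporal context
    logits = Wo @ h
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Hypothetical setup: 8 frames of 16x16 grayscale video, 4 conv kernels,
# hidden size 8, 3 action classes (e.g. walk / stand / fall), random weights.
frames = rng.random((8, 16, 16))
kernels = rng.standard_normal((4, 3, 3))
Wx = rng.standard_normal((8, 4))
Wh = rng.standard_normal((8, 8))
Wo = rng.standard_normal((3, 8))

seq = np.stack([conv_features(f, kernels) for f in frames])  # (8 frames, 4 features)
probs = rnn_classify(seq, Wx, Wh, Wo)
print("class probabilities:", probs.round(3))
```

In practice the CNN stage would be a pretrained backbone such as ResNet-50 and the recurrent stage an LSTM, trained end to end on labeled action clips.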
Fusion of CV and Machine Learning
Custom computer vision solutions enable machines to interpret visual data frame by frame for object detection, facial recognition, feature extraction, and pose estimation in HAR systems. This is followed by data preprocessing, which involves normalizing image resolution, resizing images, and removing noise and occlusions, and then by the effective pattern recognition required for machine learning algorithms such as decision trees, deep learning, and support vector machines. One example is detecting fraudulent actions during exams as per UCA rules and regulations.
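A minimal sketch of the preprocessing step described above: resizing every frame to a fixed resolution and normalizing pixel intensities before they reach a classifier. The nearest-neighbour resize and the 64x64 target size are illustrative choices; real pipelines typically use a library resampler and add denoising.

```python
import numpy as np

def preprocess_frame(frame, size=(64, 64)):
    """Nearest-neighbour resize to a fixed resolution, then scale pixels to [0, 1]."""
    H, W = frame.shape[:2]
    rows = np.arange(size[0]) * H // size[0]   # source row index per output row
    cols = np.arange(size[1]) * W // size[1]   # source column index per output column
    resized = frame[rows][:, cols]
    return resized.astype(np.float64) / 255.0  # normalize 8-bit intensities

# Hypothetical 480x640 grayscale camera frame with 8-bit pixel values.
frame = np.random.randint(0, 256, (480, 640), dtype=np.uint8)
x = preprocess_frame(frame)
print(x.shape)  # (64, 64)
```

Normalizing every frame to the same shape and value range is what lets one trained model serve cameras with different native resolutions.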
Challenges Associated with HAR Systems
Although human activity recognition using machine learning has wide applications, certain challenges stand in the way of its global acceptance: differentiating between important and unimportant tasks, non-explicit poses or actions, partial occlusion caused by an excessive number of objects in the area of observation, videos and images whose poor quality or illumination interferes with analysis, processing of large-scale datasets in multinational organizations, and delayed actions or actions performed with large time gaps in between. Model training itself poses challenges, such as minimizing the cross-entropy loss between a machine learning model's predictions and the true probability distribution.
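The cross-entropy loss mentioned above can be computed in a few lines; it penalizes a model heavily when it assigns low probability to the true action class. The one-hot target and the two example predictions below are illustrative.

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Cross-entropy between the true distribution and the model's predicted one."""
    y_pred = np.clip(y_pred, eps, 1.0)   # avoid log(0)
    return -np.sum(y_true * np.log(y_pred))

# One-hot target: the true action is class 1 (say, "walking").
y_true = np.array([0.0, 1.0, 0.0])
confident = np.array([0.05, 0.90, 0.05])   # good prediction -> low loss
uncertain = np.array([0.40, 0.30, 0.30])   # poor prediction -> high loss
print(round(cross_entropy(y_true, confident), 4))  # 0.1054  (= -ln 0.90)
print(round(cross_entropy(y_true, uncertain), 4))  # 1.204   (= -ln 0.30)
```

Training a HAR classifier amounts to adjusting weights so this quantity, averaged over labeled examples, is driven down.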
Challenges related to data variability can also occur through differences in intensity, speed, and situation, as well as other parameters such as changes in weather and viewing angle. Another issue for HAR systems is the demand for real-time, low-latency processing, for example in border security. There have also been ethical concerns around data privacy in HAR systems, such as in the healthcare industry, which must be thoroughly addressed through confidentiality measures.
The Bottom Line
The use of computer vision and machine learning to detect, recognize, and classify human activities turns out to be a boon, given the accuracy and low manpower required for receiving important updates. The challenges associated with human action recognition systems can nevertheless be conquered in the near future; for example, data variability in classifying actions can be tackled by collecting varied and representative datasets for training robust models. Accurate and swift real-time processing can be achieved through computation parallelization, hardware acceleration, and optimized model architectures. The use of HAR can be widened by applying multimodal systems across industries. Furthermore, methods like DINO (self-distillation with no labels, by Meta) are being used to enhance the overall classification performance of CNNs through transformer-based self-supervised learning.
KritiKal can enable you to push the boundaries of what is possible by developing secure and intuitive customized HAR systems, such as for human pose estimation, fitness monitoring, eldercare, head pose estimation, sports performance analysis, and much more. We assure accurate visual reasoning by mapping human actions in red, green and blue (RGB) videos and modeling entity interactions in the necessary spatial dimensions and their optical flows to achieve the best performance metrics in a handy, feasible manner as per your requirements. This can be achieved initially via a semi-supervised learning approach, followed by further refinement that clusters human actions over the learned representations. Join hands with KritiKal for the best vision related services as well as custom software development in the USA to attain exponential business growth, especially in the HAR field. Please call us or mail us at sales@kritikalsolutions.com to avail our services.
Abhay Mishra holds the position of Lead Engineer at KritiKal Solutions. He is dedicated to tackling real-world problems with the help of various Machine Learning and Deep Learning enabled technologies and to deploying them in production environments. With considerable experience in the fields of Artificial Intelligence and Deep Neural Networks, he has helped KritiKal Solutions deliver several notable projects on time.