Action recognition, the task of identifying and classifying human actions from video sequences, is an important field within computer vision. However, its reliance on large-scale datasets containing footage of people raises significant challenges related to privacy, ethics, and data protection. These issues arise from the potential identification of individuals based on personal attributes and from data collected without explicit consent. Furthermore, biases related to gender, race, or the specific actions performed by certain groups can affect the accuracy and fairness of models trained on such datasets.
In action recognition, advances in pre-training on massive video datasets have been pivotal. However, these advances come with challenges, such as ethical considerations, privacy issues, and biases inherent in datasets containing human imagery. Existing approaches to these problems include blurring faces, downsampling videos, or training on synthetic data. Despite these efforts, there has been little evaluation of how well privacy-preserving pre-trained models transfer their learned representations to downstream tasks. State-of-the-art models can still fail to predict actions accurately because of biases or a lack of diverse representations in the training data. These challenges call for novel approaches that address privacy concerns while improving the transferability of learned representations to a wide range of action recognition tasks.
To overcome the challenges posed by privacy concerns and biases in the human-centric datasets used for action recognition, a new method was recently presented at NeurIPS 2023, the well-known conference. This newly published work devises a technique to pre-train action recognition models on a combination of synthetic videos containing virtual humans and real-world videos with humans removed. By leveraging this pre-training strategy, termed Privacy-Preserving MAE-Align (PPMA), the model learns temporal dynamics from synthetic data and contextual features from real videos without humans. The method helps address privacy and ethical concerns associated with human data, and it significantly improves the transferability of learned representations to diverse downstream action recognition tasks, closing the performance gap between models trained with and without human-centric data.
Concretely, the proposed PPMA method follows these key steps:
- Privacy-Preserving Real Data: The method starts from the Kinetics dataset, from which humans are removed using the HAT framework, yielding the No-Human Kinetics dataset.
- Synthetic Data Addition: Synthetic videos from SynAPT are added, providing virtual human actions that let the model focus on temporal features.
- Downstream Evaluation: Six diverse tasks evaluate the model's transferability across a range of action recognition challenges.
- MAE-Align Pre-training: This two-stage strategy involves:
  - Stage 1: MAE training to predict pixel values, learning real-world contextual features.
  - Stage 2: Supervised alignment using both No-Human Kinetics and synthetic data for action-label-based training.
- Privacy-Preserving MAE-Align (PPMA): Combining Stage 1 (MAE trained on No-Human Kinetics) with Stage 2 (alignment using both No-Human Kinetics and synthetic data), PPMA ensures robust representation learning while safeguarding privacy.
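The two-stage recipe above can be sketched in PyTorch. This is a minimal, hypothetical illustration of the control flow only: a tiny MLP on flattened stand-in "clips" replaces the paper's ViT-B video backbone, and all function names, dimensions, and data are invented for the example.

```python
# Hypothetical sketch of the two-stage PPMA idea (not the authors' code).
# Stage 1: MAE-style masked reconstruction on human-free clips.
# Stage 2: supervised alignment with action labels on combined data.
import torch
import torch.nn as nn

torch.manual_seed(0)

class TinyEncoder(nn.Module):
    """Stand-in for the ViT-B video encoder used in the paper."""
    def __init__(self, dim=64, feat=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, feat), nn.ReLU())

    def forward(self, x):
        return self.net(x)

def stage1_mae(encoder, clips, mask_ratio=0.75, steps=10):
    """Reconstruct randomly masked inputs (MAE-style) on human-free clips."""
    decoder = nn.Linear(32, clips.shape[1])
    opt = torch.optim.Adam(
        list(encoder.parameters()) + list(decoder.parameters()), lr=1e-2)
    for _ in range(steps):
        keep = (torch.rand_like(clips) > mask_ratio).float()  # 1 = visible
        recon = decoder(encoder(clips * keep))
        # Reconstruction loss is computed only on the masked-out positions.
        loss = ((recon - clips) ** 2 * (1 - keep)).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

def stage2_align(encoder, clips, labels, num_classes, steps=10):
    """Supervised alignment: classify actions on No-Human + synthetic data."""
    head = nn.Linear(32, num_classes)
    opt = torch.optim.Adam(
        list(encoder.parameters()) + list(head.parameters()), lr=1e-2)
    ce = nn.CrossEntropyLoss()
    for _ in range(steps):
        loss = ce(head(encoder(clips)), labels)
        opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

enc = TinyEncoder()
no_human = torch.randn(16, 64)       # stand-in for No-Human Kinetics clips
synthetic = torch.randn(16, 64)      # stand-in for SynAPT synthetic clips
labels = torch.randint(0, 5, (32,))  # stand-in action labels
l1 = stage1_mae(enc, no_human)                                       # Stage 1
l2 = stage2_align(enc, torch.cat([no_human, synthetic]), labels, 5)  # Stage 2
```

The key design point mirrored here is that the same encoder carries over from Stage 1 to Stage 2, so the supervised alignment refines, rather than replaces, the contextual features learned by masked reconstruction.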
The research team conducted experiments to evaluate the proposed approach. Using ViT-B models trained from scratch without ImageNet pre-training, they employed the two-stage process: MAE training for 200 epochs followed by supervised alignment for 50 epochs. Across six diverse tasks, PPMA outperformed other privacy-preserving methods by 2.5% in finetuning (FT) and 5% in linear probing (LP). Although slightly less effective on tasks with high scene-object bias, PPMA significantly reduced the performance gap compared with models trained on real human-centric data, showing promise for learning robust representations while preserving privacy. Ablation experiments highlighted the effectiveness of MAE pre-training in learning transferable features, particularly when finetuned on downstream tasks. Moreover, in exploring the combination of contextual and temporal features, methods such as averaging model weights and dynamically learning mixing proportions showed potential for improving representations, opening avenues for further exploration.
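Weight averaging, one of the feature-combination methods mentioned above, can be illustrated in a few lines. This is a generic, hypothetical sketch of parameter-wise interpolation between two models with identical architectures; the model names and `alpha=0.5` choice are assumptions for the example, not details from the paper.

```python
# Hypothetical sketch: merge two same-architecture models by averaging
# their parameters, one way to combine contextual and temporal features.
import torch
import torch.nn as nn

def average_state_dicts(sd_a, sd_b, alpha=0.5):
    """Parameter-wise interpolation: alpha * a + (1 - alpha) * b."""
    return {k: alpha * sd_a[k] + (1 - alpha) * sd_b[k] for k in sd_a}

ctx_model = nn.Linear(8, 4)  # stand-in for a contextual-feature model
tmp_model = nn.Linear(8, 4)  # stand-in for a temporal-feature model

merged = nn.Linear(8, 4)
merged.load_state_dict(
    average_state_dicts(ctx_model.state_dict(), tmp_model.state_dict()))
```

Dynamically learning the mixing proportion would amount to making `alpha` a trainable parameter optimized on a downstream objective instead of fixing it at 0.5.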
This article introduced PPMA, a novel privacy-preserving approach for action recognition models that addresses the privacy, ethics, and bias challenges of human-centric datasets. By leveraging synthetic and human-free real-world data, PPMA effectively transfers learned representations to diverse action recognition tasks, minimizing the performance gap between models trained with and without human-centric data. The experiments underscore PPMA's effectiveness in advancing action recognition while preserving privacy and mitigating the ethical concerns and biases linked to conventional datasets.
Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.
Mahmoud is a PhD researcher in machine learning. He also holds a bachelor's degree in physical science and a master's degree in telecommunications and networking systems. His current research interests include computer vision, stock market prediction, and deep learning. He has produced several scientific articles on person re-identification and on the robustness and stability of deep networks.