Max Planck Researchers Introduce PoseGPT: An Artificial Intelligence Framework Employing Large Language Models (LLMs) to Understand and Reason about 3D Human Poses from Images or Textual Descriptions

Human posture is crucial to overall health, well-being, and many facets of life. It encompasses the alignment and positioning of the body while sitting, standing, or lying down. Good posture supports the optimal alignment of muscles, joints, and ligaments, reducing the risk of muscular imbalances, joint pain, and overuse injuries. It helps distribute the body’s weight evenly, preventing excessive stress on specific body parts.

Proper posture allows for greater lung expansion and facilitates adequate respiration. Slouching or poor posture can compress the chest cavity, restricting lung capacity and hindering efficient respiration. Moreover, good posture supports healthy circulation throughout the body. Research suggests that maintaining good posture can positively influence mood and self-confidence. Adopting an upright and open posture is associated with increased assertiveness, positivity, and reduced stress levels.

A team of researchers from the Max Planck Institute for Intelligent Systems, ETH Zurich, Meshcapade, and Tsinghua University built PoseGPT, a framework that employs a Large Language Model to understand and reason about 3D human poses from images or textual descriptions. Traditional human pose estimation methods, whether image-based or text-based, often lack holistic scene comprehension and nuanced reasoning, resulting in a disconnect between visual data and its real-world implications. PoseGPT addresses these limitations by embedding SMPL poses as a distinct signal token inside a multimodal LLM, enabling the direct generation of 3D body poses from both textual and visual inputs.

Their method represents SMPL poses with a novel token, which the LLM is prompted to output when it is asked an SMPL pose-related question. They extract the language embedding of this token and use an MLP (multi-layer perceptron) to predict the SMPL pose parameters directly. This allows the model to take either text or images as input and output 3D body poses.
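The token-to-pose step described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the researchers' implementation: the article says only that an MLP maps the pose token's embedding to SMPL pose parameters. The two-layer shape, the ReLU activation, the toy layer sizes, the axis-angle output layout (24 joints with 3 values each), the token id 999, and every function name here are hypothetical. The weights are random and untrained, so the output values are meaningless; only the data flow is shown.

```python
import math
import random

NUM_SMPL_JOINTS = 24            # SMPL: global orientation + 23 body joints
POSE_DIM = NUM_SMPL_JOINTS * 3  # assumed axis-angle layout, 3 values per joint


def init_linear(rng, n_in, n_out):
    """Create a dense layer (weights, bias) with small random weights."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(x, layer):
    """Apply y = W x + b to a single vector x."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def pose_head(embedding, layers):
    """Two-layer MLP (an assumption): pose-token embedding -> flat pose vector."""
    hidden = [max(0.0, v) for v in linear(embedding, layers[0])]  # ReLU
    return linear(hidden, layers[1])


def decode_pose(token_ids, hidden_states, pose_token_id, layers):
    """Return 24 rotation triples, or None if no pose token was generated."""
    if pose_token_id not in token_ids:
        return None
    position = token_ids.index(pose_token_id)  # first pose token in the output
    flat = pose_head(hidden_states[position], layers)
    return [flat[i:i + 3] for i in range(0, POSE_DIM, 3)]


if __name__ == "__main__":
    rng = random.Random(0)
    EMBED_DIM, HIDDEN_DIM, POSE_TOKEN_ID = 16, 32, 999  # toy sizes, hypothetical id
    layers = [init_linear(rng, EMBED_DIM, HIDDEN_DIM),
              init_linear(rng, HIDDEN_DIM, POSE_DIM)]
    token_ids = [5, 17, POSE_TOKEN_ID, 2]  # stand-in for generated LLM tokens
    hidden_states = [[rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)]
                     for _ in token_ids]   # stand-in for per-token embeddings
    pose = decode_pose(token_ids, hidden_states, POSE_TOKEN_ID, layers)
    print(len(pose), len(pose[0]))
```

In a real system the per-token embeddings would come from the multimodal LLM, and the head would be trained jointly with it. The sketch fixes only the interface: if the generated sequence contains the pose token, the embedding at that position is projected to pose parameters, otherwise the answer is text only.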

They evaluated PoseGPT on a range of tasks, including the standard task of 3D human pose estimation from a single image and pose generation from text descriptions. Its accuracy on these classical tasks does not yet match that of specialised methods, but the researchers see this as a first proof of concept. More importantly, once LLMs understand SMPL poses, they can use their inherent world knowledge to relate and reason about human poses without requiring extensive additional data or training.

Contrary to traditional approaches to pose regression, their methodology doesn’t involve providing the multimodal LLM with a cropped bounding box surrounding the person. Instead, the model is exposed to the entire scene, enabling users to formulate queries about the individuals and their respective poses within that context.

Once the LLM grasps the concept of 3D body pose, it gains the dual ability to generate human poses and to understand the world. This allows it to reason through complex verbal and visual inputs and generate human poses. This capability makes novel tasks possible, which the researchers introduce along with benchmarks to evaluate the performance of any model.


Check out the Paper and Project. All credit for this research goes to the researchers of this project.


Arshad is an intern at MarktechPost. He is currently pursuing his Int. MSc in Physics at the Indian Institute of Technology Kharagpur. Understanding things at a fundamental level leads to new discoveries, which in turn lead to advances in technology. He is passionate about understanding nature fundamentally with the help of tools like mathematical models, ML models, and AI.

