Meet MindGPT: A Non-Invasive Neural Decoder that Interprets Perceived Visual Stimuli into Natural Languages from fMRI Signals


When speaking with others, humans can use only a limited number of words to describe what they see in the outside world. This adaptable cognitive ability suggests that the semantic information conveyed through language is intricately interwoven with different types of sensory input, particularly vision. Neuroscientific investigations indicate that amodal semantic representations are shared across visual and linguistic experiences: the word “cat,” for instance, evokes conceptual information comparable to the mental image of a cat. Nevertheless, the semantic relationships between conceptual categories, and the smooth transitions between visual and linguistic (V&L) modalities, have rarely been quantified or realized with computational models.

Recent research on neural decoders has shown that visual content can be reconstructed from representations of the visual cortex (VC) captured via functional magnetic resonance imaging (fMRI). Nevertheless, the reconstructed images remained blurry and semantically meaningless or mismatched. Meanwhile, the neuroscience community has provided strong evidence that the brain’s VC can access semantic concepts in both visual and linguistic forms. These findings motivate the development of new “mind-reading” technology that translates what a person perceives into words. Such an effort has considerable scientific value for illuminating cross-modal semantic-integration mechanisms and would offer useful guidance for augmentative and restorative brain-computer interfaces.

Researchers from Zhejiang University introduce MindGPT, a non-invasive neural language decoder that converts the blood-oxygen-level-dependent (BOLD) patterns evoked by static visual stimuli into well-formed word sequences, as seen in Fig. 1 (left). To their knowledge, Tang et al. made the first attempt at a non-invasive language decoder, reconstructing perceived speech and even recovering the meaning of silent movies. Nevertheless, because fMRI has poor temporal resolution, a large amount of fMRI data must be gathered to predict the fine-grained semantic correspondence between candidate words and the evoked brain responses.

Figure 1: Left: The overall pipeline of the MindGPT non-invasive language decoder. Right: Reconstruction results from MindGPT, the SMALLCAP image-captioning model, and the VQ-fMRI and MinD-Vis visual decoding approaches.

As an alternative, this research focuses on whether, and to what degree, amodal language maps are semantically labeled by static visual sensory experiences, i.e., a single image. MindGPT is built to satisfy two requirements: (i) it must be able to extract visual semantic representations (VSRs) from brain activity, and (ii) it must include a mechanism for converting the learned VSRs into well-formed word sequences. The authors first chose a large language model, GPT-2, as their text generator. This model was pre-trained on WebText, a dataset of millions of web pages, which allows them to constrain sentence patterns to resemble well-formed natural English.
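The paper itself does not ship a reference implementation here, but the role of GPT-2 as a constrained text generator can be sketched as a greedy decoding loop: at each step the language model scores every vocabulary word given the prefix, and the highest-scoring word is appended until an end-of-sequence token appears. The scoring function below is a hand-crafted toy stand-in for illustration only, not the real GPT-2 or the authors' decoder.

```python
# Toy greedy decoding loop mirroring how a GPT-2-style decoder turns a
# visual-semantic prefix into a word sequence. `toy_lm` is a stand-in
# scoring function, NOT the real GPT-2.
vocab = ["<eos>", "a", "cat", "sits", "on", "the", "mat"]

def toy_lm(prefix_tokens):
    """Stand-in for GPT-2: returns one score per vocab word given the prefix."""
    # Hand-crafted next-word preferences, purely for illustration.
    follow = {"<bos>": "a", "a": "cat", "cat": "sits", "sits": "on",
              "on": "the", "the": "mat", "mat": "<eos>"}
    last = prefix_tokens[-1]
    return [1.0 if w == follow.get(last, "<eos>") else 0.0 for w in vocab]

def greedy_decode(max_len=10):
    tokens = ["<bos>"]
    for _ in range(max_len):
        scores = toy_lm(tokens)
        nxt = vocab[max(range(len(vocab)), key=scores.__getitem__)]
        if nxt == "<eos>":  # stop once the model emits end-of-sequence
            break
        tokens.append(nxt)
    return " ".join(tokens[1:])  # drop the <bos> marker

print(greedy_decode())  # → "a cat sits on the mat"
```

In the actual system the prefix would carry the fMRI-derived visual semantics, and GPT-2's pre-trained distribution over English keeps the output grammatical.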

Then, to close the semantic gap between brain, visual, and linguistic representations end-to-end, they adopt a simple yet effective CLIP-guided fMRI encoder with cross-attention layers. This neural decoding formulation has a very small number of learnable parameters, making it both lightweight and efficient. They show in this work that MindGPT can serve as a bridge between the brain’s VC and a machine for reliable V&L semantic transformations. Their technique learns generalizable brain semantic representations and a thorough understanding of the brain, vision, and language (B&V&L) modalities, since the language it produces accurately captures the visual semantics of the observed inputs.
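To make the cross-attention idea concrete, here is a minimal scaled dot-product cross-attention in NumPy, in which a few learned query tokens attend over projected fMRI voxel features. The shapes, the number of query tokens, and the identity projection of keys/values are assumptions for illustration; the authors' actual encoder architecture and CLIP-guidance loss are not reproduced here.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention: query tokens attend to fMRI features."""
    d_k = keys.shape[-1]
    scores = queries @ keys.T / np.sqrt(d_k)  # (n_queries, n_voxels)
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ values, weights          # (n_queries, d_v), attention map

rng = np.random.default_rng(0)
n_voxels, d = 128, 32
fmri_feats = rng.standard_normal((n_voxels, d))  # projected voxel activity (assumed)
query_tokens = rng.standard_normal((4, d))       # learned latent queries (assumed)
out, attn = cross_attention(query_tokens, fmri_feats, fmri_feats)
print(out.shape, attn.shape)  # (4, 32) (4, 128)
```

Because only the query tokens and the small projection layers are trainable, a design like this keeps the learnable parameter count low, which is consistent with the lightweight formulation the authors describe.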

In addition, they found that even with very little fMRI training data, the well-trained MindGPT appears to acquire the capacity to capture visual cues from the stimulus images, which makes it easier to analyze how visual features contribute to language semantics. They also observed, with the help of a visualization tool, that the latent brain representations learned by MindGPT exhibit useful locality-sensitive properties for both low-level visual features and high-level semantic concepts, consistent with certain findings from neuroscience. Overall, MindGPT shows that, in contrast to previous work, it is feasible to infer the semantic relationships between V&L representations from the brain’s VC without relying on the temporal resolution of fMRI.

Check out the Paper and GitHub. All credit for this research goes to the researchers on this project.



Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing his undergraduate degree in Data Science and Artificial Intelligence at the Indian Institute of Technology (IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interest is image processing, and he is passionate about building solutions around it. He loves to connect with people and collaborate on interesting projects.


