The brain 🧠. Arguably the most fascinating organ of the human body. Understanding how it really works is the key to unlocking the secrets of life. How do we predict, sense, smell, and act? The answer to all these questions lies in understanding how the brain works.
Understanding how the brain responds to what we see is a hot research topic, as this knowledge could lead to the development of advanced computational cognitive systems. With tools like functional magnetic resonance imaging (fMRI) and electroencephalography (EEG), scientists can now record brain activity triggered by visual stimuli. This has led to a growing interest in decoding and reconstructing the actual content that provokes these responses in the human brain.
One common approach to studying human visual perception is to reconstruct the images or videos that subjects viewed during experiments. This is done using computational methods, particularly deep neural networks, and is mostly based on fMRI data. However, collecting fMRI data is expensive and inconvenient for practical use. I mean, if you have ever been in an MRI machine, you probably know how uncomfortable it is to stay in there. Nobody willingly agrees to take part in an experiment like that.
This is where EEG comes in. EEG is a more efficient way to record and analyze brain signals while subjects view various stimuli, but it has its own challenges. EEG signals are time-series data, which is very different from static images. This makes it difficult to match stimuli to the corresponding segments of the brain signal. Moreover, issues like electrode misplacement and body motion can introduce significant noise into the data. Simply mapping EEG inputs to pixels for image reconstruction produces low-quality results.
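To make the matching problem concrete, here is a minimal sketch of cutting a continuous recording into one fixed-length segment per stimulus. It is only an illustration, not the method from the paper: the function name `extract_epochs`, the single-channel toy signal, and the mean-removal step are all assumptions made for this example.

```python
from typing import List, Sequence


def extract_epochs(
    signal: Sequence[float],
    onsets: Sequence[int],
    window: int,
) -> List[List[float]]:
    """Cut one fixed-length segment per stimulus onset (given as sample indices)."""
    epochs = []
    for onset in onsets:
        end = onset + window
        if onset < 0 or end > len(signal):
            continue  # skip stimuli whose window runs outside the recording
        segment = list(signal[onset:end])
        mean = sum(segment) / window
        # Remove the per-segment mean so slow drifts do not dominate the segment.
        epochs.append([x - mean for x in segment])
    return epochs


# Hypothetical single-channel recording with stimuli shown at samples 2, 6 and 9.
recording = [0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 2.0, 4.0, 2.0, 0.0]
epochs = extract_epochs(recording, onsets=[2, 6, 9], window=3)
print(len(epochs))  # the onset at sample 9 is dropped: its window is incomplete
```

Real EEG recordings have many channels and far more noise, so a segment like this is only the starting point for any decoding model.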
Meanwhile, diffusion models have emerged as state-of-the-art approaches in generative modeling. They have been successfully applied to a variety of tasks, including image synthesis and video generation. By operating in the latent space of powerful pre-trained autoencoders, researchers overcome the limitations of working in pixel space, enabling faster inference and reducing training costs.
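A quick back-of-the-envelope calculation shows why working in latent space is cheaper. The shapes below are assumptions chosen for illustration (a 512×512 RGB image and a 64×64 latent with 4 channels, which is a common configuration for latent diffusion autoencoders); the article does not state the shapes used in this work.

```python
def num_values(height: int, width: int, channels: int) -> int:
    """Number of scalar values in a tensor of the given shape."""
    return height * width * channels


# Assumed shapes, not taken from the paper.
pixel_values = num_values(512, 512, 3)   # RGB image in pixel space
latent_values = num_values(64, 64, 4)    # compressed latent representation

ratio = pixel_values / latent_values
print(pixel_values, latent_values, ratio)  # 786432 16384 48.0
```

Under these assumed shapes, every denoising step touches about 48 times fewer values than it would in pixel space.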
Let us meet NeuroImageGen, which tackles this problem using the power of diffusion models.
NeuroImageGen is a pipeline for neural image generation using EEG signals. It addresses the challenges associated with EEG-based image reconstruction by incorporating a multi-level semantics extraction module. This module decodes different levels of semantic information from EEG signals, ranging from sample-level semantics to pixel-level details like saliency maps. These multi-level outputs are then fed into pretrained diffusion models, effectively controlling the generation process at various semantic levels.
EEG signals are complex time-series data prone to noise, making them difficult to work with. NeuroImageGen overcomes this by extracting multi-level semantics, which include both pixel-level and sample-level information. Pixel-level semantics capture fine-grained color, position, and shape details of visual stimuli through saliency maps. Sample-level semantics, on the other hand, provide a more coarse-grained understanding, such as recognizing image categories or text captions. This multi-level approach enables NeuroImageGen to handle noisy EEG data effectively, facilitating high-quality visual stimulus reconstruction.
NeuroImageGen integrates these multi-level semantics into a latent diffusion model for image reconstruction. The pixel-level semantics, represented as saliency maps generated from EEG features, are used as an initial image. Sample-level semantics, derived from CLIP model embeddings of image captions, guide the denoising process in the diffusion model. This integration allows for flexible control of semantic information at different levels during the reconstruction process. The result is the reconstructed visual stimulus, which effectively combines fine-grained and coarse-grained information to produce high-quality images.
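One common way to use an initial image in a diffusion model is to partially noise it and then run the denoiser from that point, so the output keeps the coarse layout of the starting image. The sketch below shows only that noising step, under stated assumptions: the linear beta schedule, the function names, and the flattened toy "saliency map" are hypothetical and are not taken from the paper, and the caption-conditioned denoising loop is omitted.

```python
import math
import random
from typing import List, Sequence


def alpha_bar(t: int, num_steps: int,
              beta_start: float = 1e-4, beta_end: float = 0.02) -> float:
    """Cumulative product of (1 - beta_i) for i < t under a linear beta schedule."""
    prod = 1.0
    for i in range(t):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        prod *= 1.0 - beta
    return prod


def noise_initial_image(x0: Sequence[float], t: int, num_steps: int,
                        rng: random.Random) -> List[float]:
    """Forward-diffuse x0 to step t: sqrt(a) * x0 + sqrt(1 - a) * eps."""
    a = alpha_bar(t, num_steps)
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0)
            for x in x0]


# Hypothetical flattened saliency map (1.0 = salient region, 0.0 = background).
saliency = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0]
rng = random.Random(0)

# A larger t keeps less of the saliency map and gives the denoiser more freedom.
noisy_start = noise_initial_image(saliency, t=500, num_steps=1000, rng=rng)
print(alpha_bar(500, 1000), len(noisy_start))
```

In a full pipeline, a denoiser conditioned on the sample-level embedding would then run from step `t` back to step 0, starting at `noisy_start`.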
The results of this approach are promising, outperforming traditional image reconstruction methods on EEG data. NEUROIMAGEN significantly enhances the structural similarity and semantic accuracy of reconstructed images, improving our understanding of the impact of visual stimuli on the human brain.
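Assuming "structural similarity" here refers to the SSIM index, the sketch below shows how such a score can be computed between a target image and a reconstruction. It is a simplified single-window version over flattened grayscale values (standard SSIM averages the same formula over local windows), and the function name and toy inputs are made up for this example.

```python
from typing import Sequence


def global_ssim(x: Sequence[float], y: Sequence[float],
                data_range: float = 1.0) -> float:
    """Single-window SSIM between two equally sized grayscale images."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Small constants that stabilize the division, as in the usual SSIM definition.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )


# Hypothetical flattened images with values in [0, 1].
target = [0.0, 0.5, 1.0, 0.5]
inverted = [1.0, 0.5, 0.0, 0.5]

print(global_ssim(target, target))
print(global_ssim(target, inverted))
```

A score close to 1 means the reconstruction preserves the luminance, contrast, and structure of the target; a structurally inverted image scores much lower.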
Check out the Paper. All credit for this research goes to the researchers on this project.
Ekrem Çetinkaya received his B.Sc. in 2018 and his M.Sc. in 2019 from Ozyegin University, Istanbul, Türkiye. He wrote his M.Sc. thesis about image denoising using deep convolutional networks. He received his Ph.D. degree in 2023 from the University of Klagenfurt, Austria, with his dissertation titled “Video Coding Enhancements for HTTP Adaptive Streaming Using Machine Learning.” His research interests include deep learning, computer vision, video encoding, and multimedia networking.