![Gamifying medical data labeling to advance AI](https://news.mit.edu/sites/default/files/images/202306/MIT-Centaur-labs-01.jpg)
When Erik Duhaime PhD ’19 was working on his thesis in MIT’s Center for Collective Intelligence, he noticed his wife, then a medical student, spending hours studying on apps that offered flash cards and quizzes. His research had shown that, as a group, medical students could classify skin lesions more accurately than experienced dermatologists; the trick was to repeatedly measure each student’s performance on cases with known answers, throw out the opinions of people who were bad at the task, and intelligently pool the opinions of people who were good.
Combining his wife’s studying habits with his research, Duhaime founded Centaur Labs, a company that created a mobile app called DiagnosUs to collect the opinions of medical professionals on real-world scientific and biomedical data. Through the app, users review anything from images of potentially cancerous skin lesions to audio clips of heart and lung sounds that could indicate a problem. If the users prove accurate, Centaur uses their opinions and awards them small cash prizes. Those opinions, in turn, help medical AI companies train and improve their algorithms.
The approach pairs the desire of medical professionals to hone their skills with the need for well-labeled medical data among companies using AI in biotech, pharmaceutical development, and medical devices.
“I noticed my wife’s studying could be productive work for AI developers,” Duhaime recalls. “Today we have tens of thousands of people using our app, and about half are medical students who are blown away that they can win money in the process of studying. So, we have this gamified platform where people are competing with each other to label data, winning money if they’re good, and improving their skills at the same time. And by doing that, they’re labeling data for teams building lifesaving AI.”
Gamifying medical labeling
Duhaime completed his PhD under Thomas Malone, the Patrick J. McGovern Professor of Management and founding director of the Center for Collective Intelligence.
“What interested me was the wisdom of crowds phenomenon,” Duhaime says. “Ask a bunch of people how many jelly beans are in a jar, and the average of everybody’s answers is pretty close. I was interested in how you navigate that phenomenon in a task that requires skill or expertise. Obviously you don’t just want to ask a bunch of random people whether you have cancer, but at the same time, we know that second opinions in health care can be extremely valuable. You can think of our platform as a supercharged way of getting a second opinion.”
Duhaime began exploring ways to leverage collective intelligence to improve medical diagnoses. In one experiment, he trained groups of lay people and medical school students, whom he describes as “semiexperts,” to classify skin conditions, finding that by combining the opinions of the best performers he could outperform expert dermatologists. He also found that combining algorithms trained to detect skin cancer with the opinions of experts outperformed either method on its own.
“The core insight was that you do two things,” Duhaime explains. “The first thing is to measure people’s performance, which sounds obvious, but even in the medical domain it isn’t done much. If you ask a dermatologist whether they’re good, they say, ‘Yeah, of course, I’m a dermatologist.’ They don’t necessarily know how good they are at specific tasks. The second thing is that when you get multiple opinions, you need to identify complementarities between different people. You need to recognize that expertise is multidimensional, so it’s a little more like putting together the optimal trivia team than it is getting the five people who are all the best at the same thing. For instance, one dermatologist might be better at identifying melanoma, whereas another might be better at classifying the severity of psoriasis.”
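The filter-and-pool idea Duhaime describes can be sketched in a few lines of Python. This is only a minimal illustration under simple assumptions, not Centaur's actual algorithm: the labeler names, the 0.7 accuracy cutoff, and the accuracy-weighted vote are choices made for the example.

```python
from collections import defaultdict

def score_labelers(gold_answers, labeler_votes):
    """Measure each labeler's accuracy on 'gold' cases with known answers.
    gold_answers: {case_id: true_label}
    labeler_votes: {labeler_id: {case_id: label}}"""
    scores = {}
    for labeler, votes in labeler_votes.items():
        graded = [(case, lab) for case, lab in votes.items() if case in gold_answers]
        if not graded:
            continue
        correct = sum(1 for case, lab in graded if lab == gold_answers[case])
        scores[labeler] = correct / len(graded)
    return scores

def pool_opinions(case_votes, scores, min_accuracy=0.7):
    """Pool opinions on one new case: drop labelers below the accuracy cutoff,
    then weight the remaining votes by each labeler's measured accuracy."""
    weights = defaultdict(float)
    for labeler, label in case_votes.items():
        acc = scores.get(labeler, 0.0)
        if acc >= min_accuracy:
            weights[label] += acc
    return max(weights, key=weights.get) if weights else None

# Toy usage: two hypothetical labelers scored on two gold cases
gold = {"img1": "melanoma", "img2": "benign"}
votes = {
    "labeler_a": {"img1": "melanoma", "img2": "benign"},  # accuracy 1.0
    "labeler_b": {"img1": "benign", "img2": "benign"},    # accuracy 0.5
}
scores = score_labelers(gold, votes)
print(pool_opinions({"labeler_a": "melanoma", "labeler_b": "benign"}, scores))  # melanoma
```

A fuller version would track accuracy per task type (melanoma detection versus psoriasis severity, say) to capture the complementarities Duhaime mentions, rather than a single overall score.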
While still pursuing his PhD, Duhaime founded Centaur and began using MIT’s entrepreneurial ecosystem to further develop the idea. He received funding from MIT’s Sandbox Innovation Fund in 2017 and took part in the delta v startup accelerator run by the Martin Trust Center for MIT Entrepreneurship over the summer of 2018. The experience helped him get into the prestigious Y Combinator accelerator later that year.
The DiagnosUs app, which Duhaime developed with Centaur co-founders Zach Rausnitz and Tom Gellatly, is designed to help users test and improve their skills. Duhaime says about half of users are medical school students and the other half are mostly doctors, nurses, and other medical professionals.
“It’s better than studying for exams, where you might have multiple-choice questions,” Duhaime says. “They get to see real cases and practice.”
Centaur gathers millions of opinions each week from tens of thousands of people around the world. Duhaime says most people earn coffee money, although the person who has earned the most from the platform is a doctor in Eastern Europe who has made around $10,000.
“People can do it on the couch, they can do it on the T,” Duhaime says. “It doesn’t feel like work. It’s fun.”
The approach stands in sharp contrast to traditional data labeling and AI content moderation, which are typically outsourced to low-resource countries.
Centaur’s approach produces accurate results, too. In a paper with researchers from Brigham and Women’s Hospital, Massachusetts General Hospital (MGH), and Eindhoven University of Technology, Centaur showed its crowdsourced opinions labeled lung ultrasounds as reliably as experts did. Another study with researchers at Memorial Sloan Kettering showed crowdsourced labeling of dermoscopic images was more accurate than that of highly experienced dermatologists. Beyond images, Centaur’s platform also works with video, audio, text from sources like research papers or anonymized conversations between doctors and patients, and waveforms from electroencephalograms (EEGs) and electrocardiograms (ECGs).
Finding the experts
Centaur has found that the best performers come from surprising places. In 2021, to collect expert opinions on EEG patterns, researchers held a contest through the DiagnosUs app at a conference featuring about 50 epileptologists, each with more than 10 years of experience. The organizers made a custom shirt to give to the contest’s winner, who they assumed would be in attendance at the conference.
But when the results came in, a pair of medical students in Ghana, Jeffery Danquah and Andrews Gyabaah, had beaten everyone in attendance. The highest-ranked conference attendee had come in ninth.
“I started out doing it for the money, but I realized it actually started helping me a lot,” Gyabaah told Centaur’s team later. “There were times in the clinic where I realized I was doing better than others because of what I had learned on the DiagnosUs app.”
As AI continues to change the nature of work, Duhaime believes Centaur Labs will increasingly be used as an ongoing check on AI models.
“Right now, we’re primarily helping people train algorithms, but increasingly I think we’ll be used for monitoring algorithms and working alongside algorithms, basically serving as the humans in the loop for a variety of tasks,” Duhaime says. “You can think of us less as a way to train AI and more as part of the full life cycle, where we’re providing feedback on models’ outputs or monitoring the model.”
Duhaime sees the work of humans and AI algorithms becoming increasingly integrated and believes Centaur Labs has an important role to play in that future.
“It’s not just train algorithm, deploy algorithm,” Duhaime says. “Instead, there will be these digital assembly lines all throughout the economy, and you need on-demand expert human judgment infused at different places along the value chain.”