Amid a huge amount of hype around generative AI, a new study from researchers at MIT sheds light on the technology’s impact on work, finding that it increased productivity for workers assigned tasks like writing cover letters, delicate emails, and cost-benefit analyses.
The tasks in the study weren’t quite replicas of real work: They didn’t require precise factual accuracy or context about things like an organization’s goals or a customer’s preferences. Still, a number of the study’s participants said the assignments were similar to things they’d written in their real jobs, and the benefits were substantial. Access to the assistive chatbot ChatGPT decreased the time it took workers to complete the tasks by 40 percent, and output quality, as measured by independent evaluators, rose by 18 percent.
The researchers hope the study, which appears today in open-access form in the journal , helps people understand the impact that AI tools like ChatGPT can have on the workforce.
“What we can say for sure is generative AI is going to have a big effect on white collar work,” says Shakked Noy, a PhD student in MIT’s Department of Economics, who co-authored the paper with fellow PhD student Whitney Zhang ’21. “I think what our study shows is that this kind of technology has important applications in white collar work. It’s a useful technology. But it’s still too early to tell if it will be good or bad, or how exactly it’s going to cause society to adjust.”
Simulating work for chatbots
For centuries, people have worried that new technological advancements would lead to mass automation and job loss. But new technologies also create new jobs, and when they increase worker productivity, they can have a net positive effect on the economy.
“Productivity is front of mind for economists when thinking of new technological developments,” Noy says. “The classical view in economics is that the most important thing that technological advancement does is raise productivity, in the sense of letting us produce economic output more efficiently.”
To test generative AI’s effect on worker productivity, the researchers gave 453 college-educated marketers, grant writers, consultants, data analysts, human resource professionals, and managers two writing tasks specific to their occupation. The 20- to 30-minute tasks included writing cover letters for grant applications, emails about organizational restructuring, and plans for analyses helping a company decide which customers to send push notifications to based on given customer data. Experienced professionals in the same occupations as each participant evaluated each submission as if they were encountering it in a work setting. Evaluators did not know which submissions were created with the help of ChatGPT.
Half of the participants were given access to the chatbot ChatGPT-3.5, developed by the company OpenAI, for the second task. Those users finished tasks 11 minutes faster than the control group, while their average quality evaluations increased by 18 percent.
The data also showed that performance inequality between workers decreased, meaning workers who received a lower grade in the first task benefitted more from using ChatGPT for the second task.
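As a rough illustration of how comparisons like these can be computed, the Python sketch below measures the time saved by a treated group relative to a control group, and compares grade gains for lower and higher first-task scorers. It is not the researchers' analysis: the records, the field layout, the helper names (`time_saved`, `grade_gain_by_baseline`), and the grade cutoff are all hypothetical and invented for this example.

```python
from statistics import mean

# Hypothetical records, invented for illustration only (not the study's data).
# Each tuple: (used_chatgpt, task2_minutes, task1_grade, task2_grade)
records = [
    (False, 28.0, 3.0, 3.1),
    (False, 26.0, 5.0, 5.0),
    (False, 27.0, 4.0, 4.2),
    (True, 17.0, 3.0, 4.6),
    (True, 15.0, 5.0, 5.4),
    (True, 16.0, 4.0, 4.9),
]


def group(rows, treated):
    """Select the rows for the treated (True) or control (False) group."""
    return [r for r in rows if r[0] == treated]


def time_saved(rows):
    """Return (minutes saved, fractional reduction) for treated vs. control."""
    control = mean(r[1] for r in group(rows, False))
    treated = mean(r[1] for r in group(rows, True))
    return control - treated, (control - treated) / control


def grade_gain_by_baseline(rows, cutoff):
    """Mean task-1-to-task-2 grade gain among treated users,
    split into those below the cutoff and those at or above it."""
    treated = group(rows, True)
    low = [r[3] - r[2] for r in treated if r[2] < cutoff]
    high = [r[3] - r[2] for r in treated if r[2] >= cutoff]
    return mean(low), mean(high)


minutes, fraction = time_saved(records)
low_gain, high_gain = grade_gain_by_baseline(records, cutoff=4.0)
print(f"time saved: {minutes:.1f} min ({fraction:.0%})")
print(f"gain for lower scorers: {low_gain:.2f}, higher scorers: {high_gain:.2f}")
```

A larger gain for the lower-scoring group than for the higher-scoring group is what a narrowing of performance inequality would look like in data of this shape.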
The researchers say the tasks were broadly representative of assignments such professionals see in their real jobs, but they noted a number of limitations. Because they were using anonymous participants, the researchers couldn’t require contextual knowledge about a specific company or customer. They also had to give explicit instructions for each task, whereas real-world tasks may be more open-ended. Additionally, the researchers didn’t think it was feasible to hire fact-checkers to evaluate the accuracy of the outputs. Accuracy is a major problem for today’s generative AI technologies.
The researchers said those limitations could lessen ChatGPT’s productivity-boosting potential in the real world. Still, they believe the results show the technology’s promise, an idea supported by another of the study’s findings: Workers exposed to ChatGPT during the experiment were twice as likely to report using it in their real job two weeks after the experiment.
“The experiment demonstrates that it does bring significant speed benefits, even if those speed benefits are smaller in the real world because you need to spend time fact-checking and writing the prompts,” Noy says.
Taking the macro view
The study offered a close-up look at the impact that tools like ChatGPT can have on certain writing tasks. But extrapolating that impact out to understand generative AI’s effect on the economy is more difficult. That’s what the researchers hope to work on next.
“There are so many other factors that are going to affect wages, employment, and shifts across sectors that would require pieces of evidence that aren’t in our paper,” Zhang says. “But the magnitude of time saved and quality increases are very large in our paper, so it does seem like this is pretty revolutionary, at least for certain types of work.”
Both researchers agree that, even if it’s accepted that ChatGPT will increase many workers’ productivity, much work remains to be done to figure out how society should respond to generative AI’s proliferation.
“The policy needed to adjust to these technologies can be very different depending on what future research finds,” Zhang says. “If we think this will boost wages for lower-paid workers, that’s a very different implication than if it’s going to increase wage inequality by boosting the wages of already high earners. I think there’s a lot of downstream economic and political effects that are important to pin down.”
The study was supported by an Emergent Ventures grant, the Mercatus Center, George Mason University, a George and Obie Shultz Fund grant, the MIT Department of Economics, and a National Science Foundation Graduate Research Fellowship Grant.