Researchers teach an AI to write better chart captions

Chart captions that describe complex trends and patterns are essential for improving a reader’s ability to grasp and retain the information being presented. And for people with visual disabilities, the information in a caption often provides their only means of understanding the chart.

But writing effective, detailed captions is a labor-intensive process. While autocaptioning techniques can alleviate this burden, they often struggle to describe cognitive features that provide additional context.

To help people author high-quality chart captions, MIT researchers have developed a dataset to improve automatic captioning systems. Using this tool, researchers could teach a machine-learning model to vary the level of complexity and type of content included in a chart caption based on the needs of users.

The MIT researchers found that machine-learning models trained for autocaptioning with their dataset consistently generated captions that were precise, semantically rich, and described data trends and complex patterns. Quantitative and qualitative analyses revealed that their models captioned charts more effectively than other autocaptioning systems.

The team’s goal is to provide the dataset, called VisText, as a tool researchers can use as they work on the thorny problem of chart autocaptioning. These automatic systems could help provide captions for uncaptioned online charts and improve accessibility for people with visual disabilities, says co-lead author Angie Boggust, a graduate student in electrical engineering and computer science at MIT and member of the Visualization Group in the Computer Science and Artificial Intelligence Laboratory (CSAIL).

“We’ve tried to embed a lot of human values into our dataset so that when we and other researchers are building automatic chart-captioning systems, we don’t end up with models that aren’t what people want or need,” she says.

Boggust is joined on the paper by co-lead author and fellow graduate student Benny J. Tang and senior author Arvind Satyanarayan, associate professor of computer science at MIT who leads the Visualization Group in CSAIL. The research will be presented at the Annual Meeting of the Association for Computational Linguistics.

Human-centered evaluation

The researchers were inspired to develop VisText from prior work in the Visualization Group that explored what makes a good chart caption. In that study, researchers found that sighted users and blind or low-vision users had different preferences for the complexity of semantic content in a caption.

The group wanted to bring that human-centered analysis into autocaptioning research. To do that, they developed VisText, a dataset of charts and associated captions that could be used to train machine-learning models to generate accurate, semantically rich, customizable captions.

Developing effective autocaptioning systems is no easy task. Existing machine-learning methods often try to caption charts the way they would an image, but people and models interpret natural images differently from how people read charts. Other techniques skip the visual content entirely and caption a chart using its underlying data table. However, such data tables are often not available after charts are published.

Given the shortfalls of using images and data tables, VisText also represents charts as scene graphs. Scene graphs, which can be extracted from a chart image, contain all the chart data but also include additional image context.

“A scene graph is like the best of both worlds: it contains almost all of the information present in an image while being easier to extract from images than data tables. Since it’s also text, we can leverage advances in modern large language models for captioning,” Tang explains.
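To make the idea concrete, here is a minimal sketch of how a scene graph extracted from a simple bar chart might be serialized into plain text for a language model. The structure and field names are hypothetical illustrations, not the actual VisText format.

```python
import json

# Hypothetical, simplified scene graph for a bar chart. The field names
# are illustrative only and do not mirror the actual VisText schema.
scene_graph = {
    "mark": "bar",
    "axes": [
        {"name": "x", "title": "Year", "ticks": [2019, 2020, 2021]},
        {"name": "y", "title": "Revenue (millions USD)", "domain": [0, 50]},
    ],
    "marks": [
        {"x": 2019, "y": 12},
        {"x": 2020, "y": 31},
        {"x": 2021, "y": 44},
    ],
}

def flatten_scene_graph(graph: dict) -> str:
    """Serialize the nested scene graph into a single text string,
    so a text-to-text captioning model can consume it."""
    return json.dumps(graph, separators=(",", ":"))

print(flatten_scene_graph(scene_graph))
```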

They compiled a dataset that contains more than 12,000 charts, each represented as a data table, image, and scene graph, as well as associated captions. Each chart has two separate captions: a low-level caption that describes the chart’s construction (like its axis ranges) and a higher-level caption that describes statistics, relationships in the data, and complex trends.
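A single training example, then, pairs a chart’s representations with both caption levels. The record below is a hypothetical sketch of such a pairing, not the dataset’s actual file format.

```python
from dataclasses import dataclass

@dataclass
class ChartRecord:
    """One hypothetical training example: three representations of a chart
    plus its two captions. Field names are illustrative, not the VisText schema."""
    image_path: str    # rendered chart image
    data_table: str    # underlying data, serialized as text
    scene_graph: str   # flattened scene-graph text (see the earlier sketch)
    caption_low: str   # describes construction: axes, scales, units
    caption_high: str  # describes statistics, relationships, and trends

example = ChartRecord(
    image_path="charts/0001.png",
    data_table="Year,Revenue\n2019,12\n2020,31\n2021,44",
    scene_graph='{"mark":"bar","axes":[{"title":"Year"},{"title":"Revenue"}]}',
    caption_low="A bar chart with Year on the x-axis and Revenue (millions USD) "
                "on the y-axis, ranging from 0 to 50.",
    caption_high="Revenue grew steadily each year, almost quadrupling "
                 "between 2019 and 2021.",
)
```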

The researchers generated low-level captions using an automated system and crowdsourced higher-level captions from human workers.

“Our captions were informed by two key pieces of prior research: existing guidelines on accessible descriptions of visual media and a conceptual model from our group for categorizing semantic content. This ensured that our captions featured essential low-level chart elements like axes, scales, and units for readers with visual disabilities, while retaining human variability in how captions can be written,” says Tang.

Translating charts

Once they had gathered chart images and captions, the researchers used VisText to train five machine-learning models for autocaptioning. They wanted to see how each representation (image, data table, and scene graph) and combinations of the representations affected the quality of the caption.

“You can think about a chart-captioning model like a model for language translation. But instead of saying, translate this German text to English, we’re saying translate this ‘chart language’ to English,” Boggust says.
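A minimal sketch of that framing with an off-the-shelf sequence-to-sequence model is shown below. The checkpoint and input are placeholders for illustration; this is not the authors’ trained model.

```python
# Sketch of chart-to-text "translation" with a generic sequence-to-sequence
# model from the Hugging Face transformers library. "t5-small" is a
# placeholder checkpoint, not a model trained on VisText.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# The "source language" is the chart's textual representation, such as a
# flattened scene graph; the "target language" is an English caption.
chart_text = (
    '{"mark":"bar","axes":[{"title":"Year"},{"title":"Revenue"}],'
    '"marks":[{"x":2019,"y":12},{"x":2020,"y":31},{"x":2021,"y":44}]}'
)

inputs = tokenizer(chart_text, return_tensors="pt", truncation=True)
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```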

Their results showed that models trained with scene graphs performed as well as or better than those trained using data tables. Since scene graphs are easier to extract from existing charts, the researchers argue that they might be a more useful representation.

They also trained models with low-level and high-level captions separately. This technique, known as semantic prefix tuning, enabled them to teach the model to vary the complexity of the caption’s content.
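One way to picture that technique is with a control prefix that encodes the desired caption level and is prepended to the model’s input, as in the sketch below. The prefix strings are hypothetical illustrations, not the exact prompts used for VisText.

```python
# Hypothetical sketch of steering caption complexity with a semantic prefix:
# the desired caption level is prepended to the model input, so a single model
# can be asked for either a structural or a high-level caption.
def build_input(scene_graph_text: str, level: str) -> str:
    prefixes = {
        "low": "translate chart to low-level caption: ",    # axes, scales, units
        "high": "translate chart to high-level caption: ",  # trends, statistics
    }
    return prefixes[level] + scene_graph_text

# At training time, each (chart, caption) pair carries the prefix matching its
# caption level; at inference time, the user chooses the prefix they need.
print(build_input('{"mark":"bar","axes":[{"title":"Year"}]}', "high"))
```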

In addition, they conducted a qualitative examination of captions produced by their best-performing method and categorized six types of common errors. For instance, a directional error occurs if a model says a trend is decreasing when it is actually increasing.

This fine-grained, robust qualitative evaluation was important for understanding how the model was making its errors. For example, using quantitative methods, a directional error might incur the same penalty as a repetition error, in which the model repeats the same word or phrase. But a directional error could be more misleading to a user than a repetition error. The qualitative analysis helped them understand these types of subtleties, Boggust says.

These kinds of errors also expose limitations of current models and raise ethical considerations that researchers must weigh as they work to develop autocaptioning systems, she adds.

Generative machine-learning models, such as those that power ChatGPT, have been shown to hallucinate or give misinformation that can be misleading. While there is a clear benefit to using these models for autocaptioning existing charts, it could lead to the spread of misinformation if charts are captioned incorrectly.

“Maybe that means we don’t just caption everything in sight with AI. Instead, perhaps we provide these autocaptioning systems as authorship tools for people to edit. It is important to think about these ethical implications throughout the research process, not just at the end when we have a model to deploy,” she says.

Boggust, Tang, and their colleagues want to continue optimizing the models to reduce some common errors. They also want to expand the VisText dataset to include more charts, and more complex charts, such as those with stacked bars or multiple lines. And they would also like to gain insights into what these autocaptioning models are actually learning about chart data.

This research was supported, in part, by a Google Research Scholar Award, the National Science Foundation, the MLA@CSAIL Initiative, and the United States Air Force Research Laboratory.
