
Meet Surya: A Multilingual Text Line Detection AI Model for Documents


In a recent tweet, Vik Paruchuri, the founder of Dataquest.io, announced the launch of Surya, a multilingual document OCR toolkit. The framework can efficiently detect line-level bounding boxes (bboxes) and column breaks in documents, scanned images, or presentations. Prevailing text detection models like Tesseract work at the word or character level, while this open-source model works at the line level. The biggest challenge in building a text-line detection model is the lack of fully accurate datasets with line-level annotations.

Surya is an encoder-decoder model that takes an image of the document as input and produces an image with boxes drawn around the detected text lines on the original input image. The initial layers of the decoder contain SegFormer, a transformer for semantic segmentation, while a second convolutional stage with batch-normalization layers forms the head of the decoder network. Before inference, the pages of the image or PDF are split into segments no larger than the maximum image dimension and undergo various pre-processing steps.
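The article does not say how pages are split, so the following is only a minimal sketch of one possible approach: tiling a page vertically into segments that are no taller than a chosen maximum. The function name `split_page` and the `max_height` parameter are hypothetical and are not taken from the Surya codebase.

```python
import numpy as np


def split_page(image, max_height):
    """Split an H x W (x C) page array into vertical segments.

    Assumption for illustration: segments are cut top to bottom and
    each one is at most `max_height` pixels tall. The last segment
    may be shorter than the others.
    """
    if max_height <= 0:
        raise ValueError("max_height must be positive")
    height = image.shape[0]
    return [image[top:top + max_height] for top in range(0, height, max_height)]


# Usage: a blank 2500 x 800 RGB "page" split into segments of at most 1200 rows.
page = np.zeros((2500, 800, 3), dtype=np.uint8)
segments = split_page(page, max_height=1200)
segment_heights = [segment.shape[0] for segment in segments]
```

In a real pipeline, each segment would then be resized and normalized before being passed to the model, and the predicted boxes would be shifted back by each segment's vertical offset.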

To evaluate bbox accuracy, the researchers used precision and recall computed on the coverage area instead of the standard IoU (Intersection over Union) metric. Precision measures how well predicted bboxes cover ground truth bboxes, and recall measures how well ground truth bboxes cover predicted bboxes. In a comparison with Tesseract, experiments suggested that Surya's precision is much higher, while Tesseract's recall is slightly higher; overall, Surya outperforms Tesseract. Another advantage of Surya is that it runs on both CPU and GPU and is much faster than Tesseract.
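As a rough illustration of a coverage-based metric, the sketch below rasterizes boxes into pixel masks and measures what fraction of one set's area is covered by the other set. This is not the Surya benchmark code. The function names, the `(x1, y1, x2, y2)` box format, and the mapping of precision to "share of predicted area inside ground truth" and recall to "share of ground-truth area covered by predictions" are assumptions made for this example, and the actual benchmark may define them differently.

```python
import numpy as np


def boxes_to_mask(boxes, height, width):
    """Rasterize (x1, y1, x2, y2) boxes into a boolean pixel mask."""
    mask = np.zeros((height, width), dtype=bool)
    for x1, y1, x2, y2 in boxes:
        mask[y1:y2, x1:x2] = True
    return mask


def coverage(boxes, other_boxes, height, width):
    """Fraction of the area of `boxes` that is covered by `other_boxes`."""
    mask = boxes_to_mask(boxes, height, width)
    other = boxes_to_mask(other_boxes, height, width)
    area = mask.sum()
    if area == 0:
        return 0.0
    return float((mask & other).sum() / area)


def precision_recall(pred_boxes, gt_boxes, height, width):
    """Coverage-based precision and recall (assumed definitions, see above)."""
    precision = coverage(pred_boxes, gt_boxes, height, width)
    recall = coverage(gt_boxes, pred_boxes, height, width)
    return precision, recall


# One ground-truth line, predicted as two adjacent half-line boxes.
gt = [(0, 0, 100, 10)]
split_pred = [(0, 0, 50, 10), (50, 0, 100, 10)]
print(precision_recall(split_pred, gt, height=20, width=100))  # (1.0, 1.0)

# A prediction that finds only the left half of the line.
half_pred = [(0, 0, 50, 10)]
print(precision_recall(half_pred, gt, height=20, width=100))  # (1.0, 0.5)
```

In the first case, each predicted box on its own would score an IoU of only 0.5 against the ground-truth line, even though together the predictions cover it completely. An area-based measure does not penalize that kind of split.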

Surya, named after the Hindu god of the Sun, has worked successfully on multiple languages and is expected to work on nearly all languages. One limitation is that the model is unlikely to work on photos or other non-document images because it is specialized for documents. Experiments also show that it does not work well on images that look like ads. Despite this limitation, the model is still of great use and could be extended further to text detection as well as table and chart detection.


Pragati Jhunjhunwala is a consulting intern at MarktechPost. She is currently pursuing her B.Tech at the Indian Institute of Technology (IIT), Kharagpur. She is a tech enthusiast with a keen interest in software and data science applications, and she is always reading about developments in different fields of AI and ML.

