Automatic Labeling With GroundingDino

Prompt Engineering

The GroundingDino model encodes text prompts into a learned latent space. Altering the prompt produces different text features, which can affect the performance of the detector. To improve prediction performance, it’s advisable to experiment with multiple prompts and select the one that delivers the best results. Note that while writing this article I had to try several prompts before finding the best one, and I sometimes encountered unexpected results.

Getting Started

To start, we’ll clone the GroundingDino repository from GitHub, set up the environment by installing the required dependencies, and download the pre-trained model weights.

# Clone:
!git clone https://github.com/IDEA-Research/GroundingDINO.git

# Install
%cd GroundingDINO/
!pip install -r requirements.txt
!pip install -q -e .

# Get weights
!wget -q https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth

Inference on an image

We’ll start our exploration of the object detection algorithm by applying it to a single image of tomatoes. Our initial goal is to detect all the tomatoes in the image, so we’ll use the text prompt tomato. If you want to use several category names, you can separate them with a dot (.). Note that the colors of the bounding boxes are random and have no particular meaning.

python3 demo/inference_on_a_image.py \
    --config_file 'groundingdino/config/GroundingDINO_SwinT_OGC.py' \
    --checkpoint_path 'groundingdino_swint_ogc.pth' \
    --image_path 'tomatoes_dataset/tomatoes1.jpg' \
    --text_prompt 'tomato' \
    --box_threshold 0.35 \
    --text_threshold 0.01 \
    --output_dir 'outputs'
Annotations with the ‘tomato’ prompt. Image by Markus Spiske.
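When several categories are needed, the dot-separated prompt can be assembled from a list of names. The snippet below is a minimal sketch: build_text_prompt is a hypothetical helper, not part of the GroundingDino repository, and the lowercasing and trailing dot are assumptions about a convenient convention rather than documented requirements.

```python
def build_text_prompt(categories):
    """Join category names into a single dot-separated text prompt."""
    # Normalise each name: trim whitespace, lowercase, drop empty entries.
    cleaned = [name.strip().lower() for name in categories if name.strip()]
    if not cleaned:
        raise ValueError("at least one category name is required")
    # Assumed convention: names separated by " . " with a trailing dot.
    return " . ".join(cleaned) + " ."


prompt = build_text_prompt(["tomato", "leaf", "stem"])
print(prompt)
```

The resulting string can be passed as the --text_prompt value in the command above.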

GroundingDino not only detects objects as categories, such as tomato, but also comprehends the input text, a task known as Referring Expression Comprehension (REC). Let’s change the text prompt from tomato to ripened tomato and see the result:

python3 demo/inference_on_a_image.py \
    --config_file 'groundingdino/config/GroundingDINO_SwinT_OGC.py' \
    --checkpoint_path 'groundingdino_swint_ogc.pth' \
    --image_path 'tomatoes_dataset/tomatoes1.jpg' \
    --text_prompt 'ripened tomato' \
    --box_threshold 0.35 \
    --text_threshold 0.01 \
    --output_dir 'outputs'
Annotations with the ‘ripened tomato’ prompt. Image by Markus Spiske.

Remarkably, the model can ‘understand’ the text and differentiate between a ‘tomato’ and a ‘ripened tomato’. It even tags partially ripened tomatoes that aren’t fully red. If our task requires tagging only fully ripened red tomatoes, we can raise the box_threshold from the default 0.35 to 0.5.

python3 demo/inference_on_a_image.py \
    --config_file 'groundingdino/config/GroundingDINO_SwinT_OGC.py' \
    --checkpoint_path 'groundingdino_swint_ogc.pth' \
    --image_path 'tomatoes_dataset/tomatoes1.jpg' \
    --text_prompt 'ripened tomato' \
    --box_threshold 0.5 \
    --text_threshold 0.01 \
    --output_dir 'outputs'
Annotations with the ‘ripened tomato’ prompt, with box_threshold = 0.5. Image by Markus Spiske.
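Trying several prompts and thresholds by hand quickly becomes tedious, so the runs above can be scripted. The following is a rough sketch that only builds the command lines, using the same flags as the commands shown earlier. build_command and the per-run output directory naming are illustrative choices, not part of the repository.

```python
import itertools
import shlex

CONFIG = "groundingdino/config/GroundingDINO_SwinT_OGC.py"
CHECKPOINT = "groundingdino_swint_ogc.pth"


def build_command(image_path, prompt, box_threshold, output_root="outputs"):
    """Return the argument list for one run of the demo inference script."""
    # Hypothetical naming scheme: one output directory per setting,
    # so that runs do not share a directory.
    tag = f"{prompt.replace(' ', '_')}_box{box_threshold}"
    return [
        "python3", "demo/inference_on_a_image.py",
        "--config_file", CONFIG,
        "--checkpoint_path", CHECKPOINT,
        "--image_path", image_path,
        "--text_prompt", prompt,
        "--box_threshold", str(box_threshold),
        "--text_threshold", "0.01",
        "--output_dir", f"{output_root}/{tag}",
    ]


prompts = ["tomato", "ripened tomato"]
thresholds = [0.35, 0.5]

# Every combination of prompt and box threshold.
commands = [
    build_command("tomatoes_dataset/tomatoes1.jpg", prompt, threshold)
    for prompt, threshold in itertools.product(prompts, thresholds)
]

for command in commands:
    # Print a shell-safe version of each command for inspection.
    print(shlex.join(command))
```

To actually execute a run from inside the GroundingDINO directory, each list can be handed to subprocess.run(command, check=True). Comparing the annotated images in the resulting directories side by side makes it easier to pick the prompt and threshold that suit the task.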

Generation of tagged dataset

Though GroundingDino has remarkable capabilities, it’s a large and slow model. If real-time object detection is required, consider using a faster model such as YOLO. Training YOLO and similar models requires a lot of tagged data, which can be expensive and time-consuming to produce. However, if your data isn’t unique, you can use GroundingDino to tag it. To learn more about efficient YOLO training, refer to my previous article [4].

The GroundingDino repository includes a script that annotates image datasets in the COCO format, which is suitable for YOLOx, for instance.

from demo.create_coco_dataset import main

main(image_directory='tomatoes_dataset',
     text_prompt='tomato',
     box_threshold=0.35,
     text_threshold=0.01,
     export_dataset=True,
     view_dataset=False,
     export_annotated_images=True,
     weights_path='groundingdino_swint_ogc.pth',
     config_path='groundingdino/config/GroundingDINO_SwinT_OGC.py',
     subsample=None)

  • export_dataset: if set to True, the COCO format annotations will be saved in a directory named ‘coco_dataset’.
  • view_dataset: if set to True, the annotated dataset will be displayed for visualization in the FiftyOne app.
  • export_annotated_images: if set to True, the annotated images will be stored in a directory named ‘images_with_bounding_boxes’.
  • subsample (int): if specified, only this number of images from the dataset will be annotated.
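Before training on automatically generated annotations, it can help to sanity-check them. The sketch below assumes the annotations follow the standard COCO layout (images, annotations and categories lists). find_suspicious_images and its max_boxes limit are hypothetical, and the small dictionary stands in for a real annotation file.

```python
from collections import Counter


def find_suspicious_images(coco, max_boxes=50):
    """Return (file_name, box_count) for images with no boxes or too many."""
    # Count how many annotations point at each image id.
    counts = Counter(ann["image_id"] for ann in coco["annotations"])
    flagged = []
    for image in coco["images"]:
        n_boxes = counts.get(image["id"], 0)
        if n_boxes == 0 or n_boxes > max_boxes:
            flagged.append((image["file_name"], n_boxes))
    return flagged


# Tiny in-memory stand-in for an exported COCO annotation file.
example = {
    "images": [
        {"id": 1, "file_name": "tomatoes1.jpg"},
        {"id": 2, "file_name": "tomatoes2.jpg"},
    ],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 0, "bbox": [10, 20, 30, 40]},
    ],
    "categories": [{"id": 0, "name": "tomato"}],
}

print(find_suspicious_images(example))
```

For a real export, the annotation JSON written to the ‘coco_dataset’ directory can be loaded with json.load and passed to the same function. Images flagged this way are good candidates for a closer manual look.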

Different YOLO algorithms require different annotation formats. If you’re planning to train YOLOv5 or YOLOv8, you’ll need to export your dataset in the YOLOv5 format. Although the export type is hard-coded in the main script, you can easily change it by adjusting the dataset_type argument in create_coco_dataset.main from fo.types.COCODetectionDataset to fo.types.YOLOv5Dataset (line 72). To keep things organized, we’ll also change the output directory name from ‘coco_dataset’ to ‘yolov5_dataset’. After changing the script, run create_coco_dataset.main again.

  if export_dataset:
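To see what actually differs between the two formats: COCO stores each box as [x_min, y_min, width, height] in pixels, while YOLOv5 label files store the box center and size normalized by the image dimensions. The export handles this conversion for you, so the snippet below is only an illustration, and coco_to_yolo_bbox is a hypothetical helper.

```python
def coco_to_yolo_bbox(bbox, image_width, image_height):
    """Convert a COCO box [x_min, y_min, w, h] in pixels to YOLO format.

    Returns (x_center, y_center, width, height), each normalized to [0, 1].
    """
    x_min, y_min, width, height = bbox
    x_center = (x_min + width / 2) / image_width
    y_center = (y_min + height / 2) / image_height
    return (x_center, y_center, width / image_width, height / image_height)


# A 200x100 box starting at (100, 50) in a 400x200 image.
yolo_box = coco_to_yolo_bbox([100, 50, 200, 100], image_width=400, image_height=200)
print(yolo_box)
```

In a YOLOv5 label file, each line holds the class index followed by these four values.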

Concluding remarks

GroundingDino offers a significant leap in object detection annotation by using text prompts. In this tutorial, we have explored how to use the model for automated labeling of a single image or an entire dataset. It’s crucial, however, to manually review and verify these annotations before they are used to train subsequent models.


A user-friendly Jupyter notebook containing the full code is included for your convenience:

Want to learn more?

[1] Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection, 2023.

[2] DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection, 2022.

[3] An Open and Comprehensive Pipeline for Unified Object Grounding and Detection, 2023.

[4] The practical guide for Object Detection with YOLOv5 algorithm, by Dr. Lihi Gur Arie.

