Zero-shot object detection and image segmentation with Grounding DINO
Grounding DINO backend integration
This integration will allow you to:
- Use text prompts for zero-shot detection of objects in images.
- Detect any object you describe and get state-of-the-art results without fine-tuning the model.
- Get segmentation predictions from the Segment Anything Model (SAM) using only text prompts.
See here for more details about the pre-trained Grounding DINO model.
Quickstart
Make sure Docker is installed.
Edit docker-compose.yml to set the following environment variables:
- LABEL_STUDIO_HOST sets the endpoint of the Label Studio host. It must begin with the http:// scheme.
- LABEL_STUDIO_ACCESS_TOKEN sets the API access token for the Label Studio host. You can find it by logging into Label Studio and going to the Account & Settings page.
Example:
LABEL_STUDIO_HOST=http://123.456.7.8:8080
LABEL_STUDIO_ACCESS_TOKEN=c9djf998eii2948ee9hh835nferkj959923
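If you want to catch a malformed host or a missing token before running the backend, a small pre-flight check can help. The following is a minimal Python sketch, not part of the backend itself; the function name check_label_studio_env is hypothetical, and it only encodes the two rules stated above (both variables must be set, and the host must begin with http://).

```python
import os
from typing import Mapping


def check_label_studio_env(env: Mapping[str, str]) -> list[str]:
    """Return a list of problems with the Label Studio connection settings.

    An empty list means both variables are present and the host
    begins with http:// as the README requires.
    """
    problems = []
    host = env.get("LABEL_STUDIO_HOST", "").strip()
    token = env.get("LABEL_STUDIO_ACCESS_TOKEN", "").strip()

    if not host:
        problems.append("LABEL_STUDIO_HOST is not set")
    elif not host.startswith("http://"):
        # Rule taken from the README: the host must begin with http://
        problems.append(f"LABEL_STUDIO_HOST must begin with http:// (got {host!r})")

    if not token:
        problems.append("LABEL_STUDIO_ACCESS_TOKEN is not set")
    return problems


if __name__ == "__main__":
    # Check the variables of the current shell environment
    for problem in check_label_studio_env(os.environ):
        print("config problem:", problem)
```

Passing a plain dict instead of os.environ makes the check easy to reuse against values read from a file.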
Run docker compose up.
Check the host and port of your backend using docker ps. You will use this URL when connecting the backend to a Label Studio project. Usually this is http://localhost:9090.
Create a project and edit the labeling config (an example is provided below). When editing the labeling config, make sure to add all rectangle labels under the RectangleLabels tag and all corresponding brush labels under the BrushLabels tag.
<View>
  <Image name="image" value="$image"/>
  <Style>
    .lsf-main-content.lsf-requesting .prompt::before { content: ' loading...'; color: #808080; }
  </Style>
  <View className="prompt">
    <TextArea name="prompt" toName="image" editable="true" rows="2" maxSubmissions="1" showSubmitButton="true"/>
  </View>
  <RectangleLabels name="label" toName="image">
    <Label value="cats" background="yellow"/>
    <Label value="house" background="blue"/>
  </RectangleLabels>
  <BrushLabels name="label2" toName="image">
    <Label value="cats" background="yellow"/>
    <Label value="house" background="blue"/>
  </BrushLabels>
</View>
- From the Model page in the project settings, connect the model using the backend URL from the earlier step.
- Open an image task in your project and enable Auto-annotation (found at the bottom of the labeling interface). Enter a prompt in the prompt box and press Add. You should then receive your predictions. See the video above for a demo.
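The labeling config rule from the steps above, that every rectangle label needs a corresponding brush label, can be checked before you save the config. This is a minimal sketch using Python's standard xml.etree.ElementTree; unmatched_labels is a hypothetical helper, not something the backend provides, and it assumes labels are matched by their value attribute, as in the example config.

```python
import xml.etree.ElementTree as ET


def unmatched_labels(config_xml: str) -> tuple[set[str], set[str]]:
    """Compare label values under RectangleLabels and BrushLabels.

    Returns (rectangle-only values, brush-only values). Both sets are
    empty when every rectangle label has a matching brush label.
    """
    root = ET.fromstring(config_xml)

    def values(tag: str) -> set[str]:
        # Collect the value attribute of every <Label> nested under `tag`
        return {
            label.get("value")
            for block in root.iter(tag)
            for label in block.iter("Label")
        }

    rect = values("RectangleLabels")
    brush = values("BrushLabels")
    return rect - brush, brush - rect
```

Running it on the example config above returns two empty sets, because "cats" and "house" appear under both tags.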
Using GPU
For the best user experience, we recommend using a GPU. To enable it, add the following lines to the backend service in docker-compose.yml:
environment:
  - NVIDIA_VISIBLE_DEVICES=all
deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          count: 1
          capabilities: [gpu]
Using GroundingSAM
Combine the Segment Anything Model with your text input to automatically generate mask predictions!
To do this, set USE_SAM=true before running.
Warning: Using GroundingSAM without a GPU may result in slow performance and is not recommended. If you must use a CPU-only machine and experience slow performance or don’t see any predictions on the labeling screen, consider the following:
- Increase the memory allocated to the Docker container (e.g. memory: 16G in docker-compose.yml).
- Increase the prediction timeout on the Label Studio instance by setting the ML_TIMEOUT_PREDICT environment variable (e.g. ML_TIMEOUT_PREDICT=100).
- Use “MobileSAM” as a lightweight alternative to “SAM”.
To use MobileSAM, a more efficient version of SAM, set USE_MOBILE_SAM=true.
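How dino.py actually reads USE_SAM and USE_MOBILE_SAM is not shown here. As a rough sketch, assuming the flags are compared as case-insensitive strings and that USE_MOBILE_SAM takes precedence when both are set, the model selection could look like the following; env_flag and select_segmenter are hypothetical names, not functions from the backend.

```python
from typing import Mapping, Optional

# Assumption: these strings count as "enabled"; anything else is off
TRUE_VALUES = {"1", "true", "yes"}


def env_flag(env: Mapping[str, str], name: str) -> bool:
    # Missing or unrecognized values are treated as disabled
    return env.get(name, "").strip().lower() in TRUE_VALUES


def select_segmenter(env: Mapping[str, str]) -> Optional[str]:
    """Return which mask model to load, or None for boxes only.

    Assumption: USE_MOBILE_SAM wins over USE_SAM when both are set.
    """
    if env_flag(env, "USE_MOBILE_SAM"):
        return "MobileSAM"
    if env_flag(env, "USE_SAM"):
        return "SAM"
    return None  # plain Grounding DINO: rectangle predictions only
```

Treating unknown strings as "off" keeps a typo such as USE_SAM=ture from silently loading a heavy model on a CPU-only machine.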
Batching inputs
Note: This is an experimental feature.
Clone the Label Studio feature branch that includes the experimental batching functionality.
git clone -b feature/dino-support https://github.com/HumanSignal/label-studio.git
Run this branch with docker compose up.
Repeat steps 2-5 of the Quickstart section, this time using the access token and host IP of the newly cloned Label Studio instance. GroundingSAM is supported.
Go to the Data Manager in your project and select the tasks you would like to annotate.
Select Actions > Add Text Prompt for GroundingDINO.
Enter the prompt you would like to retrieve predictions for and click Submit.
Note: If your prompt differs from the label values in your labeling config, append an underscore followed by the label value to map the prompt output to the correct label. For example, to select all brown cats but still give them the label value “cats” from your labeling config, use the prompt “brown cat_cats”.
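The underscore convention can be illustrated with a small parser. This is a sketch under stated assumptions, not the backend's actual parsing code: split_prompt is a hypothetical helper, and it assumes the label value follows the last underscore and that a prompt without an underscore is used as its own label.

```python
def split_prompt(prompt: str) -> tuple[str, str]:
    """Split 'text prompt_label' into (text prompt, label value).

    Assumptions: the label follows the LAST underscore, and a prompt
    with no underscore doubles as its own label value.
    """
    text, sep, label = prompt.rpartition("_")
    if not sep:
        # No underscore found: rpartition returned ("", "", prompt)
        return prompt.strip(), prompt.strip()
    return text.strip(), label.strip()
```

With this reading, "brown cat_cats" sends "brown cat" to the model and assigns the label "cats" to the results. Splitting on the last underscore is a design choice that lets the prompt text itself contain underscores.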
Other environment variables
If you are experimenting, adjust the BOX_THRESHOLD and TEXT_THRESHOLD values in the Dockerfile to a number between 0 and 1. Defaults are set in dino.py. For more information about these values, click here.
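In GroundingDINO's reference inference code, BOX_THRESHOLD broadly decides which predicted boxes are kept (a box survives when its highest score against any prompt token exceeds the threshold), and TEXT_THRESHOLD decides which prompt tokens form that box's phrase. The Python sketch below illustrates both ideas on plain lists; read_threshold and filter_detections are hypothetical helpers, not functions from dino.py, and any default you pass is a placeholder rather than the value defined there.

```python
from typing import Mapping, Sequence


def read_threshold(env: Mapping[str, str], name: str, default: float) -> float:
    """Read a threshold from the environment and check it lies in [0, 1]."""
    raw = env.get(name)
    value = default if raw is None else float(raw)
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return value


def filter_detections(
    token_scores: Sequence[Sequence[float]],
    tokens: Sequence[str],
    box_threshold: float,
    text_threshold: float,
) -> list[tuple[int, str]]:
    """Return (box index, phrase) for each box that passes box_threshold.

    token_scores[i][j] is the score of box i against prompt token j.
    """
    kept = []
    for box_index, scores in enumerate(token_scores):
        if max(scores) <= box_threshold:
            continue  # box is not confidently grounded to any token
        # Keep only the tokens this box matches above text_threshold
        phrase = " ".join(
            tok for tok, s in zip(tokens, scores) if s > text_threshold
        )
        kept.append((box_index, phrase))
    return kept
```

This is a simplification: the reference code works on model logits and decodes subword tokens with the tokenizer rather than joining strings. Raising BOX_THRESHOLD generally yields fewer but more confident boxes, while raising TEXT_THRESHOLD yields shorter phrases.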
To load SAM or MobileSAM checkpoints saved in other directories, set the SAM_CHECKPOINT and MOBILESAM_CHECKPOINT environment variables as shown in the Dockerfile.
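When pointing these variables at custom locations, it can help to fail fast if a path is wrong instead of discovering it at model load time. A minimal sketch, assuming each variable holds a path to a single checkpoint file; resolve_checkpoint and the fallback argument are hypothetical, and the real default paths live in the Dockerfile, which is not reproduced here.

```python
from pathlib import Path
from typing import Mapping


def resolve_checkpoint(env: Mapping[str, str], var: str, fallback: str) -> Path:
    """Return the checkpoint path named by env[var], or `fallback` if unset.

    Raises FileNotFoundError up front rather than when the model loads.
    """
    # An unset or empty variable falls back to the supplied default
    path = Path(env.get(var) or fallback).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"{var}: no checkpoint file at {path}")
    return path
```

For example, resolve_checkpoint(os.environ, "MOBILESAM_CHECKPOINT", fallback) would check the MobileSAM path, where fallback is whatever default your Dockerfile uses.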