
We’ve taught our AI to classify an image (e.g., “This is a cat”). Now let’s teach it to find the cat.
Object detection is a computer vision task that identifies what objects are in an image and where each one is, marking each object with a “bounding box” drawn around it.
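Concretely, the Hugging Face pipeline describes each bounding box with four pixel coordinates: `xmin`, `ymin`, `xmax` and `ymax`, measured from the image's top-left corner. The plain-Python sketch below shows how those four numbers give a box's width, height, area and center. It does not call the library, and `box_stats` is a hypothetical helper name used only for illustration.

```python
def box_stats(box: dict) -> dict:
    """Derive simple geometry from a pipeline-style box dict.

    Assumes `box` has integer pixel keys xmin, ymin, xmax, ymax,
    with (0, 0) at the top-left corner of the image.
    """
    width = box["xmax"] - box["xmin"]
    height = box["ymax"] - box["ymin"]
    return {
        "width": width,
        "height": height,
        "area": width * height,
        # Center point of the box, in pixels
        "center": ((box["xmin"] + box["xmax"]) / 2, (box["ymin"] + box["ymax"]) / 2),
    }


# Example: the first "cat" box from this article's sample output
cat_box = {"xmin": 30, "ymin": 19, "xmax": 289, "ymax": 375}
print(box_stats(cat_box))
```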
Step 1: Installation
You’ll need Pillow to handle images, timm (DETR relies on it for its ResNet-50 backbone), and requests to download the sample image.
pip install transformers torch pillow timm requests
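Note that Pillow is imported as `PIL`, not `pillow`. If you want to confirm everything is importable before running Step 2, a small standard-library check like the one below works. `missing_modules` is a hypothetical helper, and the module list assumes the import names used in this article's code, including `requests`, which Step 2 imports to download the image.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names (not pip package names) used in this tutorial;
# Pillow installs as "PIL".
required = ["transformers", "torch", "PIL", "timm", "requests"]
missing = missing_modules(required)
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All packages found.")
```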
Step 2: The Code
We’ll use Hugging Face’s object-detection pipeline with DETR (DEtection TRansformer), a popular model from Facebook AI, via the facebook/detr-resnet-50 checkpoint.
from transformers import pipeline
from PIL import Image
import requests  # To get an image from the web

# 1. Load the pipeline
# Naming the checkpoint explicitly pins the DETR model (ResNet-50 backbone)
# instead of relying on the pipeline's default; it is downloaded on first use.
detector = pipeline("object-detection", model="facebook/detr-resnet-50")

# 2. Get an image
# Let's use a sample image from the COCO validation set
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
img = Image.open(requests.get(url, stream=True).raw)

# 3. Run the detector!
results = detector(img)

# 4. Print the results
print("--- Objects Found ---")
for obj in results:
    print(f"Label: {obj['label']}")
    print(f"Confidence: {obj['score']:.4f}")
    print(f"Location: {obj['box']}")
    print("-----")
Step 3: The Result
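The output is a list of every object the model detected with a confidence score above the pipeline’s default threshold of 0.5. Each entry has a label, a score, and a box in pixel coordinates.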
--- Objects Found ---
Label: remote
Confidence: 0.9982
Location: {'ymin': 74, 'xmin': 42, 'ymax': 118, 'xmax': 176}
-----
Label: cat
Confidence: 0.9960
Location: {'ymin': 19, 'xmin': 30, 'ymax': 375, 'xmax': 289}
-----
Label: cat
Confidence: 0.9952
Location: {'ymin': 12, 'xmin': 255, 'ymax': 375, 'xmax': 640}
-----
It found the remote and both cats! You can build on this to count items or, by running it frame by frame, as a starting point for tracking objects in videos.
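To count items, you can tally the labels in `results`. The sketch below assumes the list-of-dicts shape shown above (`label`, `score`, `box` keys). The 0.9 cut-off and the `count_objects` helper are illustrative choices, not part of the pipeline. If you would rather filter at the source, the pipeline also accepts a `threshold` argument, for example `detector(img, threshold=0.9)`.

```python
from collections import Counter


def count_objects(results, min_score=0.9):
    """Count detections per label, ignoring those scored below min_score."""
    return Counter(obj["label"] for obj in results if obj["score"] >= min_score)


# The sample output above, copied as plain data so this runs without a model
results = [
    {"label": "remote", "score": 0.9982, "box": {"xmin": 42, "ymin": 74, "xmax": 176, "ymax": 118}},
    {"label": "cat", "score": 0.9960, "box": {"xmin": 30, "ymin": 19, "xmax": 289, "ymax": 375}},
    {"label": "cat", "score": 0.9952, "box": {"xmin": 255, "ymin": 12, "xmax": 640, "ymax": 375}},
]

counts = count_objects(results)
print(dict(counts))  # {'remote': 1, 'cat': 2}
```

In a real run, replace the hard-coded list with the `results` returned by `detector(img)`.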
Key Takeaways
- This article shows how to use Hugging Face object detection to locate objects in images.
- Object detection identifies what is in an image and where it is, using bounding boxes.
- Installation requires Pillow for image handling and timm for DETR’s backbone.
- Use the object-detection pipeline with Facebook AI’s DETR model to find objects.
- The output is a list of detected items (label, confidence score, and box), which you can use to count objects or as a starting point for tracking them in videos.





