Computer Vision

TACO Trash Detection & Segmentation

Five-model benchmark on 1,500 trash images (4,784 annotations, 60 categories) comparing YOLOv8n/s/l, RT-DETR-L, and Faster R-CNN. RT-DETR-L scored best, with mAP50 = 0.2778 and precision = 0.4833. Faster R-CNN training loss fell from 0.76 to 0.11.

View on Kaggle

RT-DETR-L mAP50: 0.2778

RT-DETR-L Precision: 0.4833

YOLOv8l mAP50: 0.196

Faster R-CNN loss: 0.76 → 0.11

Dataset

TACO: 1,500 images, 4,784 annotations, 60 categories

Approach

5-model benchmark: 3 YOLO variants + RT-DETR transformer + Faster R-CNN region-based

Tech Stack

Python, YOLOv8 (Ultralytics), RT-DETR, Faster R-CNN, ResNet50-FPN

Keywords

RT-DETR, YOLOv8, Faster R-CNN, TACO, Environmental AI, 60-class

Visualizations: 6 charts

Deep Dive

Waste detection and segmentation benchmark on TACO, a challenging real-world trash dataset.

Dataset

▸1,500 images: 1,200 train / 300 val
▸4,784 bounding box annotations across 60 waste categories
▸Categories: plastics (bottles, bags, wrappers), metals (cans, foil), organics, hazardous, glass, cardboard
▸COCO JSON format
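
Because the annotations ship as COCO JSON, training the YOLO variants would typically require converting each box to the normalized center format that Ultralytics expects. The sketch below is a minimal illustration under assumptions, not the project's actual preprocessing: the `coco` dictionary is made-up sample data, and `coco_to_yolo_lines` is a hypothetical helper name.

```python
from collections import defaultdict


def coco_to_yolo_lines(coco):
    """Map each image file name to a list of YOLO-format label lines."""
    images = {img["id"]: img for img in coco["images"]}
    # COCO category ids need not be contiguous; YOLO wants 0-based indices.
    ordered = sorted(coco["categories"], key=lambda c: c["id"])
    class_index = {cat["id"]: i for i, cat in enumerate(ordered)}

    labels = defaultdict(list)
    for ann in coco["annotations"]:
        img = images[ann["image_id"]]
        x, y, w, h = ann["bbox"]  # COCO: top-left corner plus size, in pixels
        xc = (x + w / 2) / img["width"]   # normalized box center
        yc = (y + h / 2) / img["height"]
        wn, hn = w / img["width"], h / img["height"]
        cls = class_index[ann["category_id"]]
        labels[img["file_name"]].append(
            f"{cls} {xc:.6f} {yc:.6f} {wn:.6f} {hn:.6f}"
        )
    return dict(labels)


# Made-up sample in COCO layout (not taken from TACO itself).
coco = {
    "images": [{"id": 1, "file_name": "img_0001.jpg", "width": 640, "height": 480}],
    "categories": [{"id": 5, "name": "Bottle"}, {"id": 12, "name": "Can"}],
    "annotations": [{"image_id": 1, "category_id": 12, "bbox": [160, 120, 320, 240]}],
}
yolo_labels = coco_to_yolo_lines(coco)
print(yolo_labels)
```

Each resulting line would be written to a `.txt` file alongside its image; the remapping step matters whenever the COCO category ids have gaps.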

5-Model Comparison

Model	mAP50	mAP50-95	Precision	Recall
YOLOv8n	0.123	0.097	0.457	0.137
YOLOv8s	0.167	0.139	0.355	0.174
YOLOv8l	0.196	0.162	0.330	0.232
RT-DETR-L	0.278	0.233	0.483	0.313
Faster R-CNN	(loss 0.11)	—	—	—
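
To make the gap between models easier to read, the table values can be combined into a single F1 score and a relative mAP50 gain. This is a small sketch: the numbers are copied from the table above, while the F1 values and the relative gain are derived here and were not reported in the original results.

```python
# (precision, recall, mAP50) copied from the comparison table above.
results = {
    "YOLOv8n": (0.457, 0.137, 0.123),
    "YOLOv8s": (0.355, 0.174, 0.167),
    "YOLOv8l": (0.330, 0.232, 0.196),
    "RT-DETR-L": (0.483, 0.313, 0.278),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Derived values (not part of the reported results).
f1 = {name: f1_score(p, r) for name, (p, r, _) in results.items()}
best = max(results, key=lambda name: results[name][2])
gain_over_yolov8l = results["RT-DETR-L"][2] / results["YOLOv8l"][2] - 1

for name, value in f1.items():
    print(f"{name}: F1 = {value:.3f}")
print(f"Best mAP50: {best}, relative gain over YOLOv8l: {gain_over_yolov8l:.1%}")
```

YOLOv8n has high precision but low recall, which is why a combined score is a fairer single-number comparison than precision alone.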

Why Low mAP?

The 60 classes share 1,200 training images, roughly 20 images per class on average, and intra-class variation is extreme (for example, crumpled versus intact bottles). RT-DETR's transformer attention appears to handle irregular trash shapes better.
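
The per-class averages follow directly from the dataset figures, and a per-category tally over the COCO annotations would show which classes fall far below that average. The following is a rough sketch: `coco_sample` is made-up data, and `rare_categories` with its `min_count` threshold is a hypothetical helper, not something from the original notebook.

```python
from collections import Counter

# Figures quoted in the dataset summary.
TRAIN_IMAGES, NUM_CLASSES, TOTAL_ANNOTATIONS = 1200, 60, 4784

avg_train_images_per_class = TRAIN_IMAGES / NUM_CLASSES      # 20.0
avg_annotations_per_class = TOTAL_ANNOTATIONS / NUM_CLASSES  # about 79.7, whole dataset


def annotations_per_category(coco):
    """Count annotations per category name in a COCO-style dictionary."""
    names = {c["id"]: c["name"] for c in coco["categories"]}
    return Counter(names[a["category_id"]] for a in coco["annotations"])


def rare_categories(coco, min_count):
    """Names of categories with fewer than `min_count` annotations (including zero)."""
    counts = annotations_per_category(coco)
    return sorted(c["name"] for c in coco["categories"] if counts[c["name"]] < min_count)


# Made-up sample (not real TACO counts).
coco_sample = {
    "categories": [
        {"id": 0, "name": "Bottle"},
        {"id": 1, "name": "Can"},
        {"id": 2, "name": "Carton"},
    ],
    "annotations": [
        {"category_id": 0}, {"category_id": 0}, {"category_id": 0},
        {"category_id": 1},
    ],
}
counts = annotations_per_category(coco_sample)
rare = rare_categories(coco_sample, min_count=2)
print(counts, rare)
```

An average hides the imbalance: a few categories can hold most of the annotations while others have almost none, which affects mAP because every class contributes equally to the mean.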

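Faster R-CNN Training

Trained with SGD and a StepLR schedule for 15 epochs; the training loss fell from 0.7608 to 0.1141 (an 85% reduction). A ResNet50-FPN v2 backbone provides multi-scale features. No mAP was reported for this model, so the loss curve shows convergence only and is not directly comparable to the other rows.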
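
The schedule and the reported reduction can be checked with a few lines of arithmetic. The sketch below mirrors the rule that PyTorch's `StepLR` applies (multiply the learning rate by `gamma` every `step_size` epochs). The base learning rate, step size, and gamma are illustrative assumptions, since the source does not state them; only the 15 epochs and the two loss values come from the text.

```python
def step_lr(base_lr, epoch, step_size, gamma):
    """Learning rate at a given epoch under a StepLR-style decay."""
    return base_lr * gamma ** (epoch // step_size)


def relative_reduction(start, end):
    """Fractional drop from `start` to `end`."""
    return (start - end) / start


# Assumed hyperparameters (hypothetical, not from the source).
BASE_LR, STEP_SIZE, GAMMA = 0.01, 5, 0.1
EPOCHS = 15  # from the source

schedule = [step_lr(BASE_LR, e, STEP_SIZE, GAMMA) for e in range(EPOCHS)]

# Loss values from the source.
loss_drop = relative_reduction(0.7608, 0.1141)

print([f"{lr:.0e}" for lr in schedule])
print(f"Loss reduction: {loss_drop:.1%}")
```

With these assumed values the learning rate would drop twice over the 15 epochs, at epochs 5 and 10.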

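RT-DETR-L Advantage

RT-DETR-L is a transformer-based end-to-end detector: it uses no anchor boxes and needs no NMS post-processing. YOLOv8 is also anchor-free, but it still relies on NMS to filter duplicate predictions, which can suppress valid detections of overlapping objects. This may explain part of RT-DETR-L's lead on cluttered scenes with irregular shapes.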
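
For contrast, the NMS step that RT-DETR skips can be sketched as greedy suppression by IoU. This is a generic textbook version written for illustration, not the exact routine any of the benchmarked models use; the boxes, scores, and the 0.5 threshold are made-up values.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Made-up example: boxes 0 and 1 overlap heavily, box 2 is far away.
boxes = [(10, 10, 110, 110), (20, 20, 120, 120), (200, 200, 260, 260)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores)
print(kept)
```

If boxes 0 and 1 were two genuinely different items piled on top of each other, the lower-scoring one would still be discarded, which is the failure mode an NMS-free detector avoids.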
