TACO Trash Detection & Segmentation
A 5-model benchmark (YOLOv8n/s/l, RT-DETR-L, Faster R-CNN) on 1,500 trash images (4,784 annotations, 60 categories). RT-DETR-L scores best: mAP50 = 0.2778, precision = 0.4833. Faster R-CNN training loss falls from 0.76 to 0.11.
5-model benchmark: 3 YOLO variants + RT-DETR transformer + Faster R-CNN region-based
Waste detection and segmentation benchmark on TACO — one of the most challenging real-world trash datasets.
Dataset
- 1,500 images: 1,200 train / 300 val
- 4,784 bounding box annotations across 60 waste categories
- Categories: plastics (bottles, bags, wrappers), metals (cans, foil), organics, hazardous, glass, cardboard
- COCO JSON format
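As a rough illustration of working with the COCO JSON layout, the sketch below counts images, annotations, and categories in an annotation dictionary. It relies only on the standard COCO keys (`images`, `annotations`, `categories`, `category_id`); the helper names `load_coco` and `summarize_coco` and the inline sample data are hypothetical and not part of this project.

```python
import json
from collections import Counter


def load_coco(path: str) -> dict:
    """Read a COCO-style annotation file from disk (path is supplied by the caller)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize_coco(coco: dict) -> dict:
    """Count images, annotations, categories, and annotations per category name."""
    id_to_name = {c["id"]: c["name"] for c in coco["categories"]}
    per_category = Counter(id_to_name[a["category_id"]] for a in coco["annotations"])
    return {
        "images": len(coco["images"]),
        "annotations": len(coco["annotations"]),
        "categories": len(coco["categories"]),
        "per_category": dict(per_category),
    }


# Tiny made-up sample in COCO layout; bbox is [x, y, width, height].
sample = {
    "images": [{"id": 1}, {"id": 2}],
    "categories": [{"id": 5, "name": "Bottle"}, {"id": 6, "name": "Can"}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 5, "bbox": [10, 10, 40, 80]},
        {"id": 2, "image_id": 1, "category_id": 6, "bbox": [60, 20, 30, 30]},
        {"id": 3, "image_id": 2, "category_id": 5, "bbox": [5, 5, 50, 90]},
    ],
}

summary = summarize_coco(sample)
print(summary)
```

Run against the real annotation file, the same summary should match the counts listed above (1,500 images, 4,784 annotations, 60 categories) if the file follows the standard layout.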
5-Model Comparison
| Model | mAP50 | mAP50-95 | Precision | Recall |
|---|---|---|---|---|
| YOLOv8n | 0.123 | 0.097 | 0.457 | 0.137 |
| YOLOv8s | 0.167 | 0.139 | 0.355 | 0.174 |
| YOLOv8l | 0.196 | 0.162 | 0.330 | 0.232 |
| RT-DETR-L | 0.278 | 0.233 | 0.483 | 0.313 |
| Faster R-CNN | n/a (final training loss 0.11) | n/a | n/a | n/a |
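Precision and recall move in different directions across the models (YOLOv8n has high precision but very low recall), so a single combined number can make the comparison easier to read. The sketch below derives F1 from the precision and recall values in the table; F1 is not reported in this benchmark, and it is computed here only as an illustration.

```python
# (precision, recall) pairs copied from the comparison table.
results = {
    "YOLOv8n": (0.457, 0.137),
    "YOLOv8s": (0.355, 0.174),
    "YOLOv8l": (0.330, 0.232),
    "RT-DETR-L": (0.483, 0.313),
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


f1_by_model = {name: f1_score(p, r) for name, (p, r) in results.items()}
best_model = max(f1_by_model, key=f1_by_model.get)

# Print models from highest to lowest derived F1.
for name, f1 in sorted(f1_by_model.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: F1 = {f1:.3f}")
print("Best by F1:", best_model)
```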
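Why Low mAP?
With 60 classes and 1,200 training images, each class averages only about 20 training images (roughly 80 annotations per class across the full dataset). Intra-class variation is extreme (for example, crumpled vs. intact bottles). RT-DETR's transformer attention appears to handle irregular trash shapes better.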
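The per-class figures follow directly from the dataset counts. The short calculation below reproduces them; these are crude averages, since one image can contain several classes and the real per-class counts are probably uneven.

```python
# Dataset counts as stated in the Dataset section.
train_images = 1200
val_images = 300
annotations = 4784
categories = 60

# Crude averages: they ignore that one image may show several classes.
train_images_per_class = train_images / categories
annotations_per_class = annotations / categories
annotations_per_image = annotations / (train_images + val_images)

print(f"Training images per class: {train_images_per_class:.1f}")
print(f"Annotations per class:     {annotations_per_class:.1f}")
print(f"Annotations per image:     {annotations_per_image:.2f}")
```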
Faster R-CNN Training
Trained with SGD and a StepLR schedule for 15 epochs: loss fell from 0.7608 to 0.1141 (an 85% reduction). A ResNet50-FPN v2 backbone provides multi-scale features.
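To make the schedule and the reported reduction concrete, here is a minimal pure-Python sketch of how a StepLR-style schedule decays the learning rate and how the 85% figure is obtained. The base learning rate, step size, and gamma are assumptions chosen for illustration; only the 15 epochs and the two loss values come from the run described above.

```python
def step_lr(base_lr: float, epoch: int, step_size: int, gamma: float) -> float:
    """StepLR rule: multiply the learning rate by gamma every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)


def relative_reduction(start: float, end: float) -> float:
    """Fraction by which a value dropped relative to its starting point."""
    return (start - end) / start


EPOCHS = 15     # from the training description
BASE_LR = 0.01  # assumption: not stated in the write-up
STEP_SIZE = 5   # assumption: not stated in the write-up
GAMMA = 0.1     # assumption: not stated in the write-up

schedule = [step_lr(BASE_LR, epoch, STEP_SIZE, GAMMA) for epoch in range(EPOCHS)]
reduction = relative_reduction(0.7608, 0.1141)

for epoch, lr in enumerate(schedule):
    print(f"epoch {epoch:2d}: lr = {lr:.6f}")
print(f"Loss reduction: {reduction:.1%}")
```

With these assumed settings the learning rate drops by a factor of ten at epochs 5 and 10; the actual decay points depend on the hyperparameters used in the run.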
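RT-DETR-L Advantage
RT-DETR-L is a transformer-based end-to-end detector: it uses no anchor boxes and needs no NMS post-processing. It handles overlapping objects and irregular shapes better than the YOLOv8 models, which are also anchor-free but still depend on NMS to prune duplicate boxes.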
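For context on the post-processing step that RT-DETR skips, the sketch below shows a plain greedy non-maximum suppression (NMS) routine over `(x1, y1, x2, y2)` boxes. It is a generic textbook version written for illustration, not the implementation used by any of the benchmarked models; the function names, sample boxes, and 0.5 threshold are assumptions.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring boxes, drop heavy overlaps with kept ones."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Two heavily overlapping boxes and one separate box (made-up values).
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]

kept = nms(boxes, scores)
print("Kept box indices:", kept)
```

In this sample the second box is suppressed because it overlaps the first one heavily. The same rule can also discard a genuine second object, such as two adjacent bottles, which is one reason an NMS-free detector can be attractive on cluttered trash scenes.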