TACO Trash Detection & Segmentation

5-model benchmark (YOLOv8n/s/l, RT-DETR-L, Faster R-CNN) on 1,500 trash images with 4,784 annotations across 60 categories. RT-DETR-L performs best: mAP50 = 0.2778, precision = 0.4833. Faster R-CNN training loss converges from 0.76 to 0.11.

RT-DETR-L mAP50: 0.2778
RT-DETR-L Precision: 0.4833
YOLOv8l mAP50: 0.196
Faster R-CNN training loss: 0.76 → 0.11

Dataset

TACO: 1,500 images, 4,784 annotations, 60 categories

Approach

5-model benchmark: three YOLOv8 variants (n, s, l), the RT-DETR-L transformer detector, and the region-based Faster R-CNN

Tech Stack
Python, YOLOv8 (Ultralytics), RT-DETR, Faster R-CNN, ResNet50-FPN
Keywords
RT-DETR, YOLOv8, Faster R-CNN, TACO, Environmental AI, 60-class
Deep Dive

Waste detection and segmentation benchmark on TACO (Trash Annotations in Context), one of the most challenging real-world trash datasets.

Dataset

  • 1,500 images: 1,200 train / 300 val
  • 4,784 bounding box annotations across 60 waste categories
  • Categories: plastics (bottles, bags, wrappers), metals (cans, foil), organics, hazardous, glass, cardboard
  • COCO JSON format
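
In COCO JSON each `bbox` is `[x_min, y_min, width, height]` in pixels, but the models in this benchmark expect other layouts: Ultralytics YOLOv8 trains from one text file per image with normalized `class cx cy w h` rows, and torchvision's Faster R-CNN takes absolute `[x1, y1, x2, y2]` boxes. The sketch below shows both conversions. It is an illustration under those assumptions, not the project's own conversion code (which the source does not show), and the helper names `coco_to_yolo`, `coco_to_xyxy`, and `yolo_label_lines` are hypothetical.

```python
def coco_to_yolo(bbox, img_w, img_h):
    """COCO [x_min, y_min, w, h] in pixels -> YOLO [cx, cy, w, h] normalized to [0, 1]."""
    x, y, w, h = bbox
    return [(x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h]


def coco_to_xyxy(bbox):
    """COCO [x_min, y_min, w, h] -> absolute [x1, y1, x2, y2], the layout torchvision detectors take."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


def yolo_label_lines(coco, image_id):
    """Hypothetical helper: YOLO label rows for one image of a COCO-format dict."""
    img = next(im for im in coco["images"] if im["id"] == image_id)
    # COCO category ids need not be contiguous; YOLO class indices must run 0..N-1.
    ids = sorted(c["id"] for c in coco["categories"])
    index = {cid: i for i, cid in enumerate(ids)}
    lines = []
    for ann in coco["annotations"]:
        if ann["image_id"] != image_id:
            continue
        cx, cy, w, h = coco_to_yolo(ann["bbox"], img["width"], img["height"])
        lines.append(f"{index[ann['category_id']]} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")
    return lines


# Tiny hand-made example (not TACO data).
coco = {
    "images": [{"id": 1, "width": 100, "height": 200}],
    "categories": [{"id": 5, "name": "Can"}, {"id": 2, "name": "Bottle"}],
    "annotations": [{"image_id": 1, "category_id": 5, "bbox": [10, 20, 30, 40]}],
}
print(yolo_label_lines(coco, 1))  # ['1 0.250000 0.200000 0.300000 0.200000']
```

Sorting the category ids before indexing keeps the class mapping deterministic across train and val splits.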

5-Model Comparison

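Model | mAP50 | mAP50-95 | Precision | Recall
YOLOv8n | 0.123 | 0.097 | 0.457 | 0.137
YOLOv8s | 0.167 | 0.139 | 0.355 | 0.174
YOLOv8l | 0.196 | 0.162 | 0.330 | 0.232
RT-DETR-L | 0.278 | 0.233 | 0.483 | 0.313
Faster R-CNN | n/a | n/a | n/a | n/a (only final training loss reported: 0.11)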
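
mAP50 scores a detection as correct when its IoU with a ground-truth box is at least 0.5, while mAP50-95 averages AP over the COCO thresholds 0.50, 0.55, ..., 0.95, which is why it is lower in every row. The sketch below is a simplified, pure-Python version of that computation for illustration only: greedy score-ordered matching and 101-point interpolated AP, with no crowd or object-size handling. It is not the Ultralytics or pycocotools implementation, and the function names are hypothetical.

```python
def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds, gts, iou_thr):
    """AP for one class at one IoU threshold.

    preds: list of (image_id, score, box); gts: dict image_id -> list of boxes.
    Each ground-truth box can be matched by at most one detection.
    """
    n_gt = sum(len(boxes) for boxes in gts.values())
    if n_gt == 0:
        return 0.0
    used = {img: [False] * len(boxes) for img, boxes in gts.items()}
    precisions, recalls, tp = [], [], 0
    ranked = sorted(preds, key=lambda p: p[1], reverse=True)
    for rank, (img, _, box) in enumerate(ranked, start=1):
        # Find the best still-unmatched ground truth in the same image.
        best_j, best_iou = -1, 0.0
        for j, gt in enumerate(gts.get(img, [])):
            overlap = iou(box, gt)
            if not used[img][j] and overlap > best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0 and best_iou >= iou_thr:
            used[img][best_j] = True
            tp += 1
        precisions.append(tp / rank)
        recalls.append(tp / n_gt)
    # Precision envelope: precision must not increase as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sample precision at recall levels 0.00, 0.01, ..., 1.00.
    total = 0.0
    for k in range(101):
        total += next((p for p, r in zip(precisions, recalls) if r >= k / 100), 0.0)
    return total / 101


def mean_ap(per_class, thresholds):
    """Mean AP over classes and IoU thresholds. per_class: class -> (preds, gts)."""
    aps = [average_precision(p, g, t) for p, g in per_class.values() for t in thresholds]
    return sum(aps) / len(aps)


MAP50 = [0.5]
MAP50_95 = [round(0.5 + 0.05 * i, 2) for i in range(10)]  # 0.50, 0.55, ..., 0.95

# One ground truth and one slightly-too-short detection (IoU = 0.62).
gt = {"img1": [[0, 0, 10, 10]]}
pred = [("img1", 0.9, [0, 0, 10, 6.2])]
per_class = {"Bottle": (pred, gt)}
print(mean_ap(per_class, MAP50))     # 1.0
print(mean_ap(per_class, MAP50_95))  # 0.3 (matches only at 0.50, 0.55, 0.60)
```

The toy example shows how a box that counts as fully correct under mAP50 earns only partial credit under mAP50-95, the same gap visible between the two columns above.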

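Why Low mAP? The dataset is small relative to its label space: 1,200 training images spread over 60 classes is only about 20 images per class on average (1,200 / 60), and intra-class variation is extreme (crumpled vs. intact bottles, for example). RT-DETR's transformer attention plausibly copes better with irregular trash shapes, which is consistent with it leading on all four metrics in the table.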
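
An average of roughly 20 training images (and 4,784 / 60 ≈ 80 annotations) per class can hide how unevenly examples are spread across categories. The sketch below counts annotations and distinct images per category straight from a COCO-format file, which is one way to check which classes are actually data-starved. The function names and the `min_images` cutoff are hypothetical illustration choices, not taken from the project.

```python
import json
from collections import Counter, defaultdict


def load_coco(path):
    """Load a COCO-format annotation file (path depends on your local TACO copy)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def per_class_stats(coco):
    """Map category name -> (annotation count, number of distinct images containing it)."""
    names = {c["id"]: c["name"] for c in coco["categories"]}
    ann_counts = Counter(a["category_id"] for a in coco["annotations"])
    images = defaultdict(set)
    for a in coco["annotations"]:
        images[a["category_id"]].add(a["image_id"])
    return {names[cid]: (ann_counts[cid], len(images[cid])) for cid in names}


def scarce_classes(stats, min_images):
    """Categories appearing in fewer than `min_images` images, rarest first."""
    rare = [(name, imgs) for name, (_, imgs) in stats.items() if imgs < min_images]
    return [name for name, _ in sorted(rare, key=lambda item: item[1])]


# Tiny hand-made example (not TACO data).
coco = {
    "categories": [{"id": 1, "name": "Bottle"}, {"id": 2, "name": "Can"}, {"id": 3, "name": "Foil"}],
    "annotations": [
        {"image_id": 10, "category_id": 1},
        {"image_id": 10, "category_id": 1},
        {"image_id": 11, "category_id": 1},
        {"image_id": 11, "category_id": 2},
    ],
}
stats = per_class_stats(coco)
print(stats)                     # {'Bottle': (3, 2), 'Can': (1, 1), 'Foil': (0, 0)}
print(scarce_classes(stats, 2))  # ['Foil', 'Can']
```

Counting distinct images as well as annotations matters here, because one cluttered photo can contribute many boxes of the same class without adding much visual variety.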

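Faster R-CNN Training: SGD with a StepLR learning-rate schedule for 15 epochs; the training loss fell from 0.7608 to 0.1141 (an 85% reduction). The ResNet50-FPN v2 backbone provides multi-scale features. Because only training loss is reported for this model, it cannot be ranked directly against the other models' validation mAP.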
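
PyTorch's `torch.optim.lr_scheduler.StepLR` multiplies the learning rate by `gamma` every `step_size` epochs. The source does not state the base learning rate, step size, or gamma, so the values below (0.005, 5, 0.1) are hypothetical and serve only to show the shape of a 15-epoch step schedule; the second helper checks the 85% loss-reduction figure quoted above.

```python
def step_lr(base_lr, epoch, step_size, gamma):
    """Learning rate in effect during `epoch` (0-based) under a StepLR-style schedule."""
    return base_lr * gamma ** (epoch // step_size)


def relative_reduction(start, end):
    """Fractional decrease from start to end, e.g. of a training loss."""
    return (start - end) / start


# Hypothetical hyperparameters: the source only says "SGD + StepLR, 15 epochs".
BASE_LR, STEP_SIZE, GAMMA, EPOCHS = 0.005, 5, 0.1, 15

schedule = [step_lr(BASE_LR, e, STEP_SIZE, GAMMA) for e in range(EPOCHS)]
print(sorted({round(lr, 6) for lr in schedule}, reverse=True))  # [0.005, 0.0005, 5e-05]
print(f"{relative_reduction(0.7608, 0.1141):.1%}")  # 85.0%
```

With these assumed settings the rate drops tenfold at epochs 5 and 10, so the last third of training runs at 1% of the initial rate.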

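RT-DETR-L Advantage: a transformer-based, end-to-end detector with no anchor boxes and no NMS post-processing. YOLOv8 is also anchor-free, but it depends on NMS, which can discard one of two heavily overlapping objects; avoiding that step plausibly helps RT-DETR with the overlapping, irregularly shaped items common in TACO.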
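
The NMS step that separates the two detector families can be sketched in a few lines. This is a greedy, single-class version written for illustration, not Ultralytics' implementation, and the function names, boxes, and thresholds are hypothetical. It shows the failure mode described above: when two real items overlap heavily, the lower-scoring one can be suppressed as if it were a duplicate.

```python
def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thr):
    """Greedy NMS: keep the best box, drop boxes overlapping it above iou_thr, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [j for j in order if iou(boxes[best], boxes[j]) <= iou_thr]
    return keep


# Two distinct bottles lying on top of each other (IoU about 0.67) plus a separate can.
boxes = [[0, 0, 10, 10], [2, 0, 12, 10], [20, 20, 30, 30]]
scores = [0.9, 0.8, 0.6]
print(nms(boxes, scores, iou_thr=0.5))  # [0, 2]: the second bottle is lost
print(nms(boxes, scores, iou_thr=0.7))  # [0, 1, 2]: looser threshold keeps both
```

The threshold is a trade-off: a lower value removes more genuine duplicates but also more real, overlapping objects, whereas a set-prediction detector such as RT-DETR is trained to emit one box per object and skips this tuning.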