YOLO models are a popular choice for many different machine learning applications, including (oriented) object detection, object segmentation, image classification, and pose estimation. To demonstrate how AIminify works and what your options are, we will show you how to use it to compress a simple YOLOv5 neural network.
The YOLO implementation
By far the most popular library for implementing YOLO models is the Ultralytics library. It provides a comprehensive code base for training, fine-tuning, evaluating, and benchmarking different versions of YOLO models. In addition, it provides pretrained models.
Implementing AIminify with Ultralytics required a few overrides of class methods; more on that later in this post. To keep the example easy to recreate, we chose the small YOLOv5s model, pretrained on the COCO dataset by Ultralytics, and fine-tuned it on COCO again after compression. Because we wrote our own PyTorch dataset class, this example can easily be extended to other use cases with proprietary datasets.
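For reference, a common way to load these pretrained weights is via torch.hub; this is a minimal sketch, not necessarily the exact loading code we used:

```python
import torch

# Load the COCO-pretrained YOLOv5s weights published by Ultralytics.
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)
```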
Baseline performance
GFLOPS | Parameters | Model size (MB) | mAP50-95 |
---|---|---|---|
24.0 | ≈9,142,000 | 36.8 | 0.412 |
The baseline results are comparable to the benchmark results published by Ultralytics. The experiment was carried out on an A10G GPU. The training split of the COCO dataset contains 118,287 images, and an image size of 640 × 640 was used for both testing and validation.
Code changes
A couple of code changes were made to leverage the combined power of Ultralytics and AIminify. The code is available upon request. The following changes were made (a sketch of the first two follows the list):
- A modification to the `v8DetectionLoss` class, to return only the loss.
- A modification to the `torchvision.datasets.CocoDetection` class, to return the file path from the `__getitem__` method and to apply our own data augmentation.
- A modification to the `ultralytics.models.yolo.detect.val.DetectionValidator` class, to handle a different directory structure and images with only one annotation.
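The sketch below illustrates what the first two overrides could look like; the class names come from Ultralytics and torchvision, but the bodies are simplified illustrations rather than AIminify's actual code:

```python
import torchvision
from ultralytics.utils.loss import v8DetectionLoss


class DetectionLossOnly(v8DetectionLoss):
    """Return only the scalar loss instead of the (loss, loss_items) tuple."""

    def __call__(self, preds, batch):
        loss, _loss_items = super().__call__(preds, batch)
        return loss


class CocoDetectionWithPath(torchvision.datasets.CocoDetection):
    """Also return the image path from __getitem__, leaving room for custom augmentation."""

    def __getitem__(self, index):
        image, target = super().__getitem__(index)
        # Look up the file name for this sample in the COCO annotations.
        image_id = self.ids[index]
        path = self.coco.loadImgs(image_id)[0]["file_name"]
        # ... apply custom data augmentation to `image` and `target` here ...
        return image, target, path
```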
Compressing the model
With these code changes, compressing a model is just a few lines of code!
```python
compressed_model, _ = minify(
    model,
    compression_intensity,
    training_generator=train_loader,
    validation_generator=val_loader,
    loss_function=loss_function,
    quantization=False,
    precision='mixed',
    smart_pruning=True,
    fine_tune=True,
)
```
AIminify offers compression strengths from 0 to 5, which currently only influence the pruning part of the algorithm. Strength 0 applies no pruning at all, while strengths 1 through 5 are settings of the pruning algorithm that prune progressively more of the model. By passing training and validation generators and setting `fine_tune` to `True`, AIminify automatically fine-tunes the model after pruning, which is needed to get it as close to the original accuracy as possible. In this example we also enabled mixed-precision training, which speeds up training without affecting the final accuracy.
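To produce the numbers in the tables below, the same call can simply be repeated at different strengths; a minimal sketch, assuming the same `model`, data loaders, and `loss_function` as above:

```python
# Sweep the pruning strengths reported in the tables below.
compressed_models = {}
for strength in [1, 3, 5]:
    compressed_models[strength], _ = minify(
        model,
        strength,
        training_generator=train_loader,
        validation_generator=val_loader,
        loss_function=loss_function,
        quantization=False,
        precision='mixed',
        smart_pruning=True,
        fine_tune=True,
    )
```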
We compare the number of FLOPS, the number of parameters, and the model size for different compression settings. For performance we use the mAP50-95 metric: the mean Average Precision averaged across Intersection over Union (IoU) thresholds from 0.5 to 0.95, which offers a more comprehensive assessment of an object detection model’s performance than a single IoU threshold.
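As an illustration of the metric (using the third-party torchmetrics package rather than the Ultralytics validator we used in the experiments):

```python
import torch
from torchmetrics.detection import MeanAveragePrecision

# By default, MeanAveragePrecision averages AP over the COCO IoU
# thresholds 0.50, 0.55, ..., 0.95, i.e. mAP50-95.
metric = MeanAveragePrecision()
preds = [{
    "boxes": torch.tensor([[10.0, 10.0, 50.0, 50.0]]),  # xyxy format
    "scores": torch.tensor([0.9]),
    "labels": torch.tensor([0]),
}]
targets = [{
    "boxes": torch.tensor([[12.0, 12.0, 48.0, 48.0]]),
    "labels": torch.tensor([0]),
}]
metric.update(preds, targets)
print(metric.compute()["map"])  # the mAP50-95 value
```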
Compression strength | GFLOPS | Parameters | Model size (MB) | mAP50-95 |
---|---|---|---|---|
1 | 22.3 (−7.08%) | 8,648,691 (−5.40%) | 34.8 (−5.43%) | 0.406 (−1.46%) |
3 | 19.9 (−17.07%) | 7,873,802 (−13.88%) | 31.7 (−13.86%) | 0.394 (−4.37%) |
5 | 17.3 (−27.83%) | 7,037,404 (−23.03%) | 28.4 (−22.83%) | 0.365 (−11.41%) |
We also compare this to a smaller, uncompressed YOLO model, YOLOv5n; the percentages are again relative to the YOLOv5s baseline.
Compression strength | GFLOPS | Parameters | Model size (MB) | mAP50-95 |
---|---|---|---|---|
0 | 7.3 (−69.58%) | 2,649,200 (−71.02%) | 10.8 (−70.65%) | 0.332 (−19.42%) |
From these tables we can draw several conclusions:
- Pruning a model and reducing its number of parameters also reduces the number of FLOPS and the model size.
- With fine-tuning, the performance loss is low, even at high pruning settings.
- A YOLOv5s model compressed at the highest compression strength still clearly outperforms the baseline YOLOv5n model (0.365 vs. 0.332 mAP50-95), highlighting how AIminify’s compression can deliver efficient YOLO architectures with minimal performance trade-offs: a good fit for real-world production settings where resource optimization is critical.

*Figure: detections from the model compressed with `compression_intensity=5`. The model is still able to accurately predict most objects.*

Conclusion
In conclusion, these results demonstrate that AIminify effectively balances model size, speed, and accuracy, allowing YOLOv5s to retain high performance while significantly reducing its computational overhead. With minor code adjustments and its pruning and fine-tuning features, AIminify streamlines the compression process and cuts model FLOPS, parameters, and disk size without sacrificing much mAP. Even at the highest compression strength, YOLOv5s still outperforms the smaller YOLOv5n baseline, highlighting the potential for resource savings in production environments. This makes AIminify a compelling solution for anyone seeking to optimize YOLO models for real-world applications where efficiency and scalability are crucial. The compressed models are shared on Hugging Face, and the code is available upon request by contacting us.