YOLO models are a popular choice for many different machine learning applications, including (oriented) object detection, object segmentation, image classification, and pose estimation. To demonstrate how AIminify works and what your options are, we will show you how to use it to compress a simple YOLOv5 neural network.
The YOLO implementation
By far the most popular library for implementing YOLO models is the Ultralytics library. It provides a comprehensive code base for training, fine-tuning, evaluating, and benchmarking different versions of YOLO models. In addition, it provides pretrained models.
Implementing AIminify with Ultralytics required a few overrides of class methods; more on that later in this post. To keep the example easy to recreate, we chose the small YOLOv5s model, pretrained on the COCO dataset by Ultralytics, and fine-tuned it on COCO again after compression. Because we wrote our own PyTorch dataset class, this example can easily be extended to other use cases with proprietary datasets.
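For reference, a common way to load these pretrained weights is via torch.hub; this is a minimal sketch, not necessarily the exact loading code we used:

```python
import torch

# Load the COCO-pretrained YOLOv5s weights published by Ultralytics.
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)
```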
Baseline performance
GFLOPS | Parameters | Model size (MB) | mAP50-95 |
---|---|---|---|
24.0 | ≈9,142,000 | 36.8 | 0.412 |
The baseline results are comparable to the benchmark results published by Ultralytics. The experiment was carried out on an A10G GPU. The training split of the COCO dataset contains 118,287 images, and an image size of 640 × 640 was used for both testing and validation.
Code changes
A couple of code changes were made to leverage the combined power of Ultralytics and AIminify. The code is available upon request. The following changes were made (a sketch of the first two follows the list):
- A modification to the `v8DetectionLoss` class, to return only the loss.
- A modification to the `torchvision.datasets.CocoDetection` class, to return the file path from the `__getitem__` method and to apply our own data augmentation.
- A modification to the `ultralytics.models.yolo.detect.val.DetectionValidator` class, to handle a different directory structure and images with only one annotation.
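The sketch below illustrates what the first two overrides could look like; the class names come from Ultralytics and torchvision, but the bodies are simplified illustrations rather than AIminify's actual code:

```python
import torchvision
from ultralytics.utils.loss import v8DetectionLoss


class DetectionLossOnly(v8DetectionLoss):
    """Return only the scalar loss instead of the (loss, loss_items) tuple."""

    def __call__(self, preds, batch):
        loss, _loss_items = super().__call__(preds, batch)
        return loss


class CocoDetectionWithPath(torchvision.datasets.CocoDetection):
    """Also return the image path from __getitem__, leaving room for custom augmentation."""

    def __getitem__(self, index):
        image, target = super().__getitem__(index)
        # Look up the file name for this sample in the COCO annotations.
        image_id = self.ids[index]
        path = self.coco.loadImgs(image_id)[0]["file_name"]
        # ... apply custom data augmentation to `image` and `target` here ...
        return image, target, path
```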
Compressing the model
With these code changes, compressing a model is just a few lines of code!
```python
compressed_model, _ = minify(
    model,
    compression_intensity,
    training_generator=train_loader,
    validation_generator=val_loader,
    loss_function=loss_function,
    quantization=False,
    precision='mixed',
    smart_pruning=True,
    fine_tune=True,
)
```
AIminify offers compression strengths from 0 to 5, which currently only influence the pruning part of the algorithm. Strength 0 applies no pruning at all, while strengths 1 through 5 are settings of the pruning algorithm that prune progressively more of the model. By passing training and validation generators and setting `fine_tune` to `True`, AIminify automatically fine-tunes the model after pruning, which is needed to get it as close to the original accuracy as possible. In this example we also enabled mixed-precision training, which speeds up training without affecting the final accuracy.
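To produce the numbers in the tables below, the same call can simply be repeated at different strengths; a minimal sketch, assuming the same `model`, data loaders, and `loss_function` as above:

```python
# Sweep the pruning strengths reported in the tables below.
compressed_models = {}
for strength in [1, 3, 5]:
    compressed_models[strength], _ = minify(
        model,
        strength,
        training_generator=train_loader,
        validation_generator=val_loader,
        loss_function=loss_function,
        quantization=False,
        precision='mixed',
        smart_pruning=True,
        fine_tune=True,
    )
```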
We compare the number of FLOPS, the number of parameters, and the model size for different compression settings. For performance we use the mAP50-95 metric: the mean Average Precision averaged across Intersection over Union (IoU) thresholds from 0.5 to 0.95, which offers a more comprehensive assessment of an object detection model’s performance than a single IoU threshold.
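As an illustration of the metric (using the third-party torchmetrics package rather than the Ultralytics validator we used in the experiments):

```python
import torch
from torchmetrics.detection import MeanAveragePrecision

# By default, MeanAveragePrecision averages AP over the COCO IoU
# thresholds 0.50, 0.55, ..., 0.95, i.e. mAP50-95.
metric = MeanAveragePrecision()
preds = [{
    "boxes": torch.tensor([[10.0, 10.0, 50.0, 50.0]]),  # xyxy format
    "scores": torch.tensor([0.9]),
    "labels": torch.tensor([0]),
}]
targets = [{
    "boxes": torch.tensor([[12.0, 12.0, 48.0, 48.0]]),
    "labels": torch.tensor([0]),
}]
metric.update(preds, targets)
print(metric.compute()["map"])  # the mAP50-95 value
```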
Compression strength | GFLOPS | Parameters | Model size (MB) | mAP50-95 |
---|---|---|---|---|
1 | 22.3 (−7.08%) | 8,648,691 (−5.40%) | 34.8 (−5.43%) | 0.406 (−1.46%) |
3 | 19.9 (−17.07%) | 7,873,802 (−13.88%) | 31.7 (−13.86%) | 0.394 (−4.37%) |
5 | 17.3 (−27.83%) | 7,037,404 (−23.03%) | 28.4 (−22.83%) | 0.365 (−11.41%) |
We also compare this to a smaller, uncompressed YOLO model, YOLOv5n; the percentages are again relative to the YOLOv5s baseline.
Compression strength | GFLOPS | Parameters | Model size (MB) | mAP50-95 |
---|---|---|---|---|
0 | 7.3 (−69.58%) | 2,649,200 (−71.02%) | 10.8 (−70.65%) | 0.332 (−19.42%) |
From these tables we can draw several conclusions:
- Pruning a model and reducing its number of parameters also reduces the number of FLOPS and the model size.
- With fine-tuning, the performance loss is low, even at high pruning settings.
- A YOLOv5s model compressed at the highest compression strength still clearly outperforms the baseline YOLOv5n model (0.365 vs. 0.332 mAP50-95), highlighting how AIminify’s compression can deliver efficient YOLO architectures with minimal performance trade-offs: a good fit for real-world production settings where resource optimization is critical.

*Figure: detections from the model compressed with `compression_intensity=5`. The model is still able to accurately predict most objects.*

Conclusion
In conclusion, these results demonstrate that AIminify effectively balances model size, speed, and accuracy, allowing YOLOv5s to retain high performance while significantly reducing its computational overhead. With minor code adjustments and its pruning and fine-tuning features, AIminify streamlines the compression process and cuts model FLOPS, parameters, and disk size without sacrificing much mAP. Even at the highest compression strength, YOLOv5s still outperforms the smaller YOLOv5n baseline, highlighting the potential for resource savings in production environments. This makes AIminify a compelling solution for anyone seeking to optimize YOLO models for real-world applications where efficiency and scalability are crucial. The compressed models are shared on Hugging Face, and the code is available upon request by contacting us.