July 2024

Compress it! 8bit Post-Training Model Quantization

July 07, 2024 • Category: Framework


This week, I want to share a few notes on 8-bit post-training quantization of a PyTorch model using the Neural Network Compression Framework (NNCF). The goal is to shrink the model enough to get excellent performance for local inference on your own device, without spending money on expensive new hardware or cloud API providers.

To get there, we will walk through several steps, starting with the download of a PyTorch model fine-tuned on MRPC (the Microsoft Research Paraphrase Corpus).
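Before diving into the framework, it may help to see what "8-bit" means for a handful of numbers. The snippet below is only a minimal sketch of affine (scale plus zero-point) quantization in plain Python. It is not NNCF's API and not necessarily the exact scheme the framework applies, and the helper names `quantize_int8` and `dequantize_int8` are made up for this illustration.

```python
def quantize_int8(values):
    """Map floats to signed 8-bit integers using a scale and a zero-point."""
    # Widen the range to include 0.0 so that zero is represented exactly.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    # 255 steps separate -128 and 127; fall back to 1.0 if every value is zero.
    scale = (hi - lo) / 255 or 1.0
    # The zero-point is the integer that the float 0.0 maps to.
    zero_point = round(-128 - lo / scale)
    quantized = [
        max(-128, min(127, round(v / scale) + zero_point)) for v in values
    ]
    return quantized, scale, zero_point


def dequantize_int8(quantized, scale, zero_point):
    """Recover approximate floats from the 8-bit integers."""
    return [(q - zero_point) * scale for q in quantized]


# Hypothetical weights, chosen only for illustration.
weights = [-1.0, 0.0, 0.5, 2.0]
q_weights, scale, zero_point = quantize_int8(weights)
restored = dequantize_int8(q_weights, scale, zero_point)
```

Each weight now fits in one byte instead of four, and `restored` differs from `weights` by at most about one quantization step (`scale`). That small error is the price paid for the smaller, faster model.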


Tags: Natural Language Processing, Natural Language Understanding, Natural Language Generation

Bare-Metal AI

Generative AI on prem: secure, ethical, and accessible.
