Igniting 2025 with tons of INT4 Quantizations!

With 2025 just ignited and 2024 behind us, I am proud to share that I have successfully uploaded over 230 quantized SLM/LLM models to my HuggingFace account. These models were quantized entirely on the computational resources of my homelab, which delivers approximately 72 TFLOPS of performance, powered solely by "domestic" hardware.
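To give a flavour of what INT4 weight-only quantization does, here is a minimal sketch in plain PyTorch. The symmetric per-group scheme and the group size of 128 are illustrative assumptions on my part, not necessarily the exact recipe behind these uploads.

```python
# A minimal sketch of group-wise, symmetric INT4 weight-only quantization
# in plain PyTorch. Group size 128 and the symmetric scheme are assumptions
# for illustration only.
import torch

def quantize_int4(weight: torch.Tensor, group_size: int = 128):
    """Quantize a 2-D weight matrix to INT4 with one scale per group."""
    out_features, in_features = weight.shape
    w = weight.reshape(out_features, in_features // group_size, group_size)
    # One scale per group: map the group's max magnitude to the INT4 extreme (7).
    scales = w.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 7.0
    # Round to the nearest INT4 level and clamp to the signed range [-8, 7].
    q = torch.clamp(torch.round(w / scales), -8, 7).to(torch.int8)
    return q.reshape(out_features, in_features), scales

def dequantize_int4(q: torch.Tensor, scales: torch.Tensor, group_size: int = 128):
    """Recover an approximate float weight matrix from INT4 values and scales."""
    out_features, in_features = q.shape
    w = q.reshape(out_features, in_features // group_size, group_size).float()
    return (w * scales).reshape(out_features, in_features)

# Quick round-trip check on a random weight matrix.
w = torch.randn(4096, 4096)
q, s = quantize_int4(w)
err = (w - dequantize_int4(q, s)).abs().mean()
print(f"mean absolute round-trip error: {err:.5f}")
```

In practice the scales (and optionally zero-points) are stored alongside the packed INT4 weights, which is what cuts the model's memory footprint roughly fourfold compared to FP16.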
Continue Reading >>>
Advanced Weight-only Quantization Technique on CPU

When LLMs began spreading at the end of 2022, it seemed truly impossible: training, or even just fine-tuning, a model on modest consumer-grade hardware was pure fantasy.
Now, in mid-2024, thanks to intensive scientific research, considerable investment, open governance, open collaboration, and a good dose of human ingenuity, we can fine-tune models directly on our own devices. Incredible!
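As one concrete illustration of what fine-tuning on modest hardware can look like today, here is a minimal LoRA setup using the Hugging Face peft library. The "gpt2" base model and the hyperparameters below are placeholder assumptions chosen so the snippet runs on CPU; they are not the configuration from the full article.

```python
# A minimal LoRA (low-rank adapter) fine-tuning setup with the peft library.
# Model choice and hyperparameters are illustrative assumptions only.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("gpt2")

# Inject small low-rank adapter matrices into the attention projection;
# only the adapters are trained, the base weights stay frozen.
config = LoraConfig(
    r=8,                        # rank of the low-rank update
    lora_alpha=16,              # scaling factor applied to the update
    target_modules=["c_attn"],  # GPT-2's fused QKV projection
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)

# Reports how few parameters are actually trainable (well under 1%).
model.print_trainable_parameters()
```

Because only the adapter matrices receive gradients, optimizer state and activation memory shrink dramatically, which is exactly what puts fine-tuning within reach of a CPU-only homelab.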
Continue Reading >>>