
Releases: intel/auto-round

v0.4.7

01 Apr 09:50

Highlights

Support W4AFP8 for HPU (please refer to Intel Neural Compressor for guidance on running these models), by @yiliu30 in #467

Support immediate packing in the new quantization API to reduce RAM usage, by @wenhuach21 in #466 (see the API sketch after this list)

20x AWQ and 4x GPTQ packing speedup on CUDA, by @wenhuach21 in #459

Support auto-round-light to speed up the tuning process, by @WeiweiZhang1 in #454

Fix a critical bug of MXFP4 in tuning, by @wenhuach21 in #451
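
For context on the new quantization API noted above, here is a minimal sketch of the usual quantize-and-export flow, following the pattern shown in the project README. The model name and output directory are illustrative placeholders, and these notes do not say whether immediate packing needs any extra option, so none is passed.

```python
# Minimal sketch of the quantize-and-export flow, following the README
# pattern; "facebook/opt-125m" and the output path are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "facebook/opt-125m"  # example model, not from the notes
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# 4-bit symmetric weight-only quantization with group size 128.
autoround = AutoRound(model, tokenizer, bits=4, group_size=128, sym=True)
autoround.quantize()

# "auto_round" has been the default export format since v0.3.1.
autoround.save_quantized("./opt-125m-4bit", format="auto_round")
```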

What's Changed

Full Changelog: v0.4.6...v0.4.7

v0.4.6

24 Feb 09:23

Highlights:

1. Set torch compile to False by default in #447
2. Fix packing hang and force FP16 at export in #430
3. Align auto_quantizer with Transformers 4.49 in #437

What's Changed

Full Changelog: v0.4.5...v0.4.6

v0.4.5

27 Jan 12:12

Highlights:
We have enhanced support for extremely large models with the following updates:

Multi-Card Tuning Support: Added basic support for naive multi-GPU tuning (#415).

Accelerated Packing Stage: Improved packing speed by 2X-4X for the AutoGPTQ and AutoAWQ formats by leveraging CUDA (#407).

DeepSeek V3 GGUF Export: Introduced support for exporting models to the DeepSeek V3 GGUF format (#416).

What's Changed

Full Changelog: v0.4.4...v0.4.5

v0.4.4 release

10 Jan 01:47

Highlights:
1. Fix install issue in #387
2. Support exporting GGUF q4_0 and q4_1 formats in #393
3. Fix the LLM command-line seqlen issue in #399

What's Changed

Full Changelog: v0.4.3...v0.4.4

v0.4.3: bug fix release

16 Dec 03:24

Highlights:
Fix incorrect device setting in AutoRound format inference, by @WeiweiZhang1 in #383
Remove the dependency on AutoGPTQ, by @XuehaoSun in #380

What's Changed

Full Changelog: v0.4.2...v0.4.3

v0.4.2: bug fix release

09 Dec 09:44

Highlights

1. Fix AutoAWQ exporting issue
2. Remove bias exporting if possible in the AutoGPTQ format

What's Changed

Full Changelog: v0.4.1...v0.4.2

v0.4.1: bug fix release

27 Nov 09:53

Highlights:

  • Fixed the vLLM calibration infinite-loop issue
  • Corrected the default value of the sym argument in the API configuration

What's Changed

Full Changelog: v0.4...v0.4.1

v0.4

22 Nov 13:32

Highlights

[Experimental Feature] We provide API support for VLM models
[Kernel] We add IPEX support for Intel CPU
[Bug fix] We fix a tuning bug for the GLM-4 model
[Enhancement] Better align gradient_accumulate_steps behavior for varied-length inputs

What's Changed

Full Changelog: v0.3.1...v0.4

Intel® auto-round v0.3.1 Release

21 Oct 04:12

Release Highlights:

New Features:

Full-Range Symmetric Quantization: We’ve introduced full-range symmetric quantization, which often matches or even exceeds the performance of asymmetric quantization, especially at lower bit widths such as 2 bits (see the sketch after this list).

Command-Line Support: You can now quantize models using the command auto-round --model xxx --format xxx.

Default Exporting Format Change: The default format has been updated to auto_round instead of auto_gptq.

Multi-threaded packing: up to 2X speedup in the packing phase
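
On the full-range idea above: restricted-range symmetric quantization derives the scale from the largest positive grid point, 2^(b-1) - 1, while full-range uses the magnitude of the most negative point, 2^(b-1), so the step size shrinks, which matters most at very low bit widths. Below is a conceptual NumPy sketch of the difference; it is illustrative only, not auto-round's actual implementation.

```python
# Conceptual sketch only, not auto-round's implementation: contrast
# restricted-range and full-range symmetric quantization at 2 bits.
import numpy as np

def sym_quant_dequant(w, bits, full_range):
    qmax = 2 ** (bits - 1) - 1    # largest positive grid point, +1 at 2 bits
    qmin = -(2 ** (bits - 1))     # most negative grid point, -2 at 2 bits
    # Restricted range scales by qmax; full range scales by |qmin|,
    # typically yielding a finer step size and lower rounding error.
    scale = np.abs(w).max() / (-qmin if full_range else qmax)
    q = np.clip(np.round(w / scale), qmin, qmax)
    return q * scale              # dequantized weights

w = np.random.randn(1024).astype(np.float32)
for fr in (False, True):
    err = np.abs(w - sym_quant_dequant(w, bits=2, full_range=fr)).mean()
    print(f"full_range={fr}: mean abs error = {err:.4f}")
```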

Bug Fixes:

Resolved Missing Cached Position Embeddings: Fixed an issue with missing cached position embeddings in Transformers version 4.45.2.

Mutable Default Values Issue: Addressed problems related to mutable default values.

Fixed a 3-bit packing bug in the AutoGPTQ format

What's Changed

New Contributors

Full Changelog: v0.3...v0.3.1

Intel® auto-round v0.3 Release

14 Aug 11:33
Highlights:

• Broader Device Support: Expanded support for CPU, HPU, and CUDA inference in the AutoRound format, resolving the 2-bit accuracy issue.
• New Recipes and Model Releases
• Experimental Features: Introduced several experimental features, including activation quantization and mx_fp, with promising results.
• Multimodal Model Support: Extended tuning and inference capabilities across several multimodal models.

Lowlights:

• Implemented support for low_cpu_mem_usage, the auto_awq format, calibration dataset concatenation, and calibration datasets with chat templates.