Releases: intel/auto-round
v0.4.7
Highlights
Support W4AFP8 for HPU; please refer to Intel Neural Compressor for guidance on running these models, by @yiliu30 in #467
Support immediate packing in the new quantization API to reduce RAM usage (see the sketch below) by @wenhuach21 in #466
20x AWQ and 4x GPTQ packing speedup on CUDA by @wenhuach21 in #459
Support auto-round-light to speed up the tuning process by @WeiweiZhang1 in #454
Fix a critical bug of mxfp4 in tuning by @wenhuach21 in #451
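For context, a minimal sketch of the quantize-and-export flow these changes touch. The model name is a placeholder and argument names may differ across versions, so treat this as illustrative rather than the exact API introduced by #466:

```python
# Minimal AutoRound flow (illustrative; argument names may differ by version).
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "facebook/opt-125m"  # placeholder model
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

autoround = AutoRound(model, tokenizer, bits=4, group_size=128, sym=True)
autoround.quantize()
# Packing into the target kernel layout happens around export time; #466 moves
# packing earlier (right after each block is quantized) to reduce peak RAM.
autoround.save_quantized("./opt-125m-w4", format="auto_round")
```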
What's Changed
- step-1 support naive double quant in tuning by @wenhuach21 in #442
- fix critical bug of mxfp4 by @wenhuach21 in #451
- update readme by @wenhuach21 in #455
- update eval by @n1ck-guo in #450
- awq exporting bugfix by @WeiweiZhang1 in #456
- Support force loading into autoround Format by @WeiweiZhang1 in #453
- 20x for awq and 4x for gptq packing speedup by @wenhuach21 in #459
- fix eval bug by @n1ck-guo in #461
- [STEP-1]W4Afp8 export by @wenhuach21 in #378
- [HPU] Update W4A8 for HPU by @yiliu30 in #467
- support for gemma3 by @n1ck-guo in #468
- upload_auto-round-light results by @WeiweiZhang1 in #454
- GGUF support step2: add naive Q2_KS and Q4_KS by @n1ck-guo in #448
- fix incorrect recipe data by @WeiweiZhang1 in #471
- support for mistral3 by @n1ck-guo in #472
- support to export gemma3 gguf format by @n1ck-guo in #470
- Increase unit test timeout from 120 to 240 minutes by @XuehaoSun in #474
- support packing immediately in new quantization api to save ram usage by @wenhuach21 in #466
- rm redundant line break by @WeiweiZhang1 in #475
- Temporarily close qxk api for new release by @n1ck-guo in #478
- add restrict for exporting act-quant models by @n1ck-guo in #480
Full Changelog: v0.4.6...v0.4.7
v0.4.6
Highlights:
1. Set torch compile to False by default in #447 (see the sketch below)
2. Fix packing hang and force to FP16 at exporting in #430
3. Align auto_quantizer with Transformers 4.49 in #437
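If you still want compilation after this change, a minimal sketch follows. It assumes the constructor exposes an `enable_torch_compile` flag; the flag name is an assumption and may differ in this release:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "facebook/opt-125m"  # placeholder model
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# torch.compile is now off by default (#447); opt back in explicitly.
# `enable_torch_compile` is an assumed flag name, not confirmed for this release.
autoround = AutoRound(model, tokenizer, bits=4, enable_torch_compile=True)
autoround.quantize()
```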
What's Changed
- Fix packing hang, torch compile and force to fp16 at exporting by @wenhuach21 in #430
- fix nblocks issues by @wenhuach21 in #432
- rm gc collect in packing by @wenhuach21 in #438
- align auto_quantizer with main branch in Transformers by @WeiweiZhang1 in #437
- [HPU]Fix compile bug when quant layer by @yiliu30 in #441
- remove tricky setting in mxfp4 by @wenhuach21 in #445
- fix bug of evaluate user model by @n1ck-guo in #444
- Refine funcs by @WeiweiZhang1 in #446
- set torch compile to false by default by @WeiweiZhang1 in #447
Full Changelog: v0.4.5...v0.4.6
v0.4.5
Highlights:
We have enhanced support for extremely large models with the following updates:
Multi-Card Tuning Support: added basic support for naive multi-GPU tuning (#415); see the sketch below.
Accelerated Packing Stage: improved packing speed (2x-4x) for the AutoGPTQ and AutoAWQ formats by leveraging CUDA (#407).
DeepSeek V3 GGUF Export: introduced support for exporting models to the DeepSeek V3 GGUF format (#416).
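A sketch of multi-card tuning, under the assumption that passing `device="auto"` lets AutoRound spread tuning across the visible GPUs; the exact knob added in #415 may differ:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "meta-llama/Llama-3.1-70B-Instruct"  # placeholder large model
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# device="auto" is assumed here to trigger the naive multi-card tuning path (#415).
autoround = AutoRound(model, tokenizer, bits=4, group_size=128, device="auto")
autoround.quantize()
autoround.save_quantized("./llama-3.1-70b-w4", format="auto_round")
```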
What's Changed
- update format readme by @wenhuach21 in #411
- fix log bug and device "auto" bug by @n1ck-guo in #409
- speedup packing stage for autogptq and autoawq format by @wenhuach21 in #407
- support naive multi-card tuning by @wenhuach21 in #415
- support bf16 inference for autoround format by @wenhuach21 in #420
- enable backup pile dataset loading by @WeiweiZhang1 in #417
- fix evaluation device bug, relate to issue 413 by @n1ck-guo in #419
- support to export deepseek v3 gguf format by @n1ck-guo in #416
- fix cuda UT torch_dtype by @WeiweiZhang1 in #423
- fix eval trust_remote_code by @n1ck-guo in #424
Full Changelog: v0.4.4...v0.4.5
v0.4.4 release
Highlights:
1. Fix install issue in #387
2. Support exporting the GGUF q4_0 and q4_1 formats in #393 (see the sketch below)
3. Fix LLM command-line seqlen issue in #399
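A sketch of the new GGUF export. The `"gguf:q4_0"` format string is an assumption about how the export added in #393 is selected:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder model
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# GGUF q4_0 quantizes weights in 32-element blocks, hence group_size=32.
autoround = AutoRound(model, tokenizer, bits=4, group_size=32)
autoround.quantize()
# "gguf:q4_0" is an assumed format string for the q4_0 export added in #393.
autoround.save_quantized("./qwen2.5-0.5b-gguf", format="gguf:q4_0")
```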
What's Changed
- fix a critical bug of static activation quantization by @wenhuach21 in #392
- vlm 70B+ in single card by @n1ck-guo in #395
- enhance calibration dataset and add awq pre quantization warning by @wenhuach21 in #396
- support awq format for vlms by @WeiweiZhang1 in #398
- [critical bug] fix llm example seqlen issue by @WeiweiZhang1 in #399
- fix device auto issue by @wenhuach21 in #400
- Fix auto-round install & bump into 0.4.4 by @XuehaoSun in #387
- fix dtype converting issue by @wenhuach21 in #403
- support for deepseek vl2 by @n1ck-guo in #401
- llm_layer_config_bugfix by @WeiweiZhang1 in #406
- support awq with qbits, only support sym by @wenhuach21 in #402
- support to export gguf q4_0 and q4_1 format by @n1ck-guo in #393
Full Changelog: v0.4.3...v0.4.4
v0.4.3: bug fix release
Highlights:
fix incorrect device setting in autoround format inference by @WeiweiZhang1 in #383
remove the dependency on AutoGPTQ by @XuehaoSun in #380
What's Changed
- support_llava_hf_vlm_example by @WeiweiZhang1 in #381
- fix block_name_to_quantize by @WeiweiZhang1 in #382
- fix incorrect device setting in autoround format inference by @WeiweiZhang1 in #383
- refine homepage, update model links by @WeiweiZhang1 in #385
- update eval basic usage by @n1ck-guo in #384
- refine error msg and dump more log in the tuning by @wenhuach21 in #386
- remove the dependency on AutoGPTQ for CPU and bump to V0.4.3 by @XuehaoSun in #380
Full Changelog: v0.4.2...v0.4.3
v0.4.2: bug fix release
Highlights
1. Fix autoawq exporting issue
2. Remove bias exporting if possible in autogptq format
What's Changed
- bump version into v0.4.1 by @XuehaoSun in #350
- Update docker user and remove baseline UT by @XuehaoSun in #347
- delete llm example and refine readme by @wenhuach21 in #354
- Simulated W4Afp8 Quantization by @wenhuach21 in #331
- add QWQ-32B, VLM, Qwen2.5, Llama3.1 int4 models by @wenhuach21 in #356
- fix awq exporting by @wenhuach21 in #358
- Tensor reshape bugfix by @WeiweiZhang1 in #364
- fix awq backend and fp_layers issue by @wenhuach21 in #363
- fix awq exporting bugs by @wenhuach21 in #365
- fix bug of only_text_test check due to inference issue on cpu by @n1ck-guo in #362
- add gpu test by @wenhuach21 in #367
- using multicard when device set to "auto" by @n1ck-guo in #368
- quant_block_names enhancement by @WeiweiZhang1 in #369
- [HPU] Add lazy mode back by @yiliu30 in #371
- remove bias exporting if possible in autogptq format by @wenhuach21 in #375
- save processor automatically by @n1ck-guo in #372
- Add gpu ut by @wenhuach21 in #370
- fix gpu ut by @n1ck-guo in #376
- fix typos by @wenhuach21 in #377
Full Changelog: v0.4.1...v0.4.2
v0.4.1: bug fix release
Highlights:
- Fixed vllm calibration infinite loop issue
- Corrected the default value for the sym argument in the API configuration.
What's Changed
- fix typo by @wenhuach21 in #342
- vllm/llama-vision llava calibration infinite loop fix by @WeiweiZhang1 in #343
- [HPU] Enhance `numba` check by @yiliu30 in #345
- [VLM] fix bs and grad reset by @n1ck-guo in #344
- [HPU]Enhance installation check by @yiliu30 in #346
- [Critical Bug]API use sym as default by @wenhuach21 in #349
- triton backend requires < 3.0 by @wenhuach21 in #348
Full Changelog: v0.4...v0.4.1
v0.4
Highlights
[Experimental Feature] We provide API support for VLM models (see the sketch below)
[Kernel] We add ipex support for Intel CPU
[Bug fix] We fix a tuning bug for the glm4 model
[Enhancement] Better align `gradient_accumulate_steps` behavior for varied-length input
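A sketch of the experimental VLM API. The class name `AutoRoundMLLM` and the `processor` argument are assumptions based on the MLLM-related PRs (#276, #334), so the real signature may differ:

```python
from transformers import AutoProcessor, AutoTokenizer, Qwen2VLForConditionalGeneration
from auto_round import AutoRoundMLLM  # assumed class name for the VLM API

model_name = "Qwen/Qwen2-VL-2B-Instruct"  # placeholder VLM
model = Qwen2VLForConditionalGeneration.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
processor = AutoProcessor.from_pretrained(model_name)

autoround = AutoRoundMLLM(model, tokenizer, processor=processor, bits=4, group_size=128)
autoround.quantize()
autoround.save_quantized("./qwen2-vl-2b-w4", format="auto_round")
```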
What's Changed
- refine AutoRound format and support marlin repacking by @wenhuach21 in #280
- update readme for v0.3.1 release by @wenhuach21 in #283
- update readme for cpu inference by @wenhuach21 in #284
- avoid deterministic algorithm warning in inference by @wenhuach21 in #285
- fix mx_fp issues by @wenhuach21 in #286
- update torch ao integration information by @wenhuach21 in #287
- Refine code by @wenhuach21 in #291
- Add ipex support for intel cpu by @wenhuach21 in #292
- fix ipex tqdm mismatch issue by @wenhuach21 in #293
- fix bug of backend by @wenhuach21 in #294
- [Experimental Feature] support for common hf multimodal models by @n1ck-guo in #276
- use torch.compile by default for PyTorch versions 2.6 and above by @wenhuach21 in #295
- refine forward hook by @WeiweiZhang1 in #290
- eval for MLLMs by @n1ck-guo in #296
- mllm eval bug fix by @n1ck-guo in #297
- Port Numba-based packing from INC by @yiliu30 in #301
- refine model config file for mixed precision quantization by @wenhuach21 in #300
- fix glm4-9b batch dim issue by @wenhuach21 in #304
- better align gradient_accumulate_steps for varied length input by @wenhuach21 in #309
- Enable torch.compile on HPU by @yiliu30 in #307
- Update autogptq exporting by @wenhuach21 in #310
- fix typo by @wenhuach21 in #311
- qwen2 vision quantization bugfix by @WeiweiZhang1 in #313
- multiple gpu evaluation/calibration refine by @wenhuach21 in #312
- HPU only release binary by @yiliu30 in #302
- patch 1 for mllm by @n1ck-guo in #298
- add torch compile arg by @wenhuach21 in #314
- fix merge error by @n1ck-guo in #316
- Update the check for HPU by @yiliu30 in #318
- fix eval device issue by @wenhuach21 in #319
- fix multiple device bug by @wenhuach21 in #321
- add warning for no gptq exllamav2 kernel by @wenhuach21 in #324
- add pile calib, rename quant_block_list to to_quant_block_names by @WeiweiZhang1 in #322
- fix autogptq version error by @wenhuach21 in #325
- new mllm eval by @n1ck-guo in #317
- Add cpu only version by @XuehaoSun in #315
- set default mllm dataset by @n1ck-guo in #327
- fix fp_layers issue and force to FP16 on cuda for autoround format inference by @wenhuach21 in #326
- fix the bug of test model support for test-only by @n1ck-guo in #328
- Increase unit test timeout to 120 minutes by @XuehaoSun in #330
- fix mllm dataset config bug and add gptq cuda backend by @wenhuach21 in #329
- add tips and tricks for llm&mllm quantization by @wenhuach21 in #333
- fix eval_bs in fake format and reset auto-gptq exporting max_shard_size by @wenhuach21 in #332
- fix model_dtype issue and reformat mllm code by @wenhuach21 in #335
- Exclude markdown files from unit test pipelines by @XuehaoSun in #337
- refine mllm docs by @WeiweiZhang1 in #336
- cogvlm doc by @n1ck-guo in #339
- add qwen2.5 recipe and refine readme by @WeiweiZhang1 in #338
- add cogvlm recipe and refine readme by @WeiweiZhang1 in #340
- refine mllm API and add help info by @n1ck-guo in #334
Full Changelog: v0.3.1...v0.4
Intel® auto-round v0.3.1 Release
Release Highlights:
New Features:
Full-Range Symmetric Quantization: We've introduced full-range symmetric quantization, which often matches or even exceeds the performance of asymmetric quantization, especially at lower bit widths such as 2 bits (see the sketch after this list).
Command-Line Support: You can now quantize models from the command line, e.g. `auto-round --model xxx --format xxx`.
Default Exporting Format Change: The default export format has been updated to `auto_round` instead of `auto_gptq`.
Multi-thread Packing: up to 2x speedup in the packing phase.
Bug Fixes:
Resolved Missing Cached Position Embeddings: Fixed an issue with missing cached position embeddings in Transformers version 4.45.2.
Mutable Default Values Issue: Addressed problems related to mutable default values.
Fixed a 3-bit packing bug for the AutoGPTQ format.
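A short sketch of the two API-visible changes above: full-range symmetric quantization via `sym=True` and the new `auto_round` default export format. Argument names are illustrative, not confirmed for this exact release:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "facebook/opt-125m"  # placeholder model
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Full-range symmetric quantization is now the default (#278); sym=True is
# shown explicitly here, with 2-bit weights to match the highlight above.
autoround = AutoRound(model, tokenizer, bits=2, group_size=64, sym=True)
autoround.quantize()
# The default export format changed from auto_gptq to auto_round (#205).
autoround.save_quantized("./opt-125m-w2", format="auto_round")
```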
What's Changed
- Add setseed in autoround by @WeiweiZhang1 in #201
- support autoawq format by @yintong-lu in #115
- Remove UT coverage check by @XuehaoSun in #202
- set autoround format as default to unify CPU/HPU/CUDA by @wenhuach21 in #205
- add local file of pile-10k by @WeiweiZhang1 in #198
- modify setup.py by @n1ck-guo in #206
- limit the scale minimum value not to 0 by @WeiweiZhang1 in #211
- fix example dataset regression by @WeiweiZhang1 in #212
- remove local pile file by @WeiweiZhang1 in #213
- update xpu format exporting by @WeiweiZhang1 in #214
- fix a bug in autoround format inference by @wenhuach21 in #215
- avoid underflow and overflow for exllamav2 by @wenhuach21 in #218
- add qwen int4 model, refine example by @WeiweiZhang1 in #217
- [Experimental Feature]fast tuning norm/bias at 2 bits by @wenhuach21 in #208
- update readme by @wenhuach21 in #220
- refine eval_042 to enable parallelize evaluation by @WeiweiZhang1 in #221
- Enable phi3v tuning by @WeiweiZhang1 in #197
- Bump setuptools from 69.5.1 to 70.0.0 in /examples/multimodal-modeling/Phi-3-vision by @dependabot in #223
- refine example by @WeiweiZhang1 in #224
- change the scale thresh generally by @WeiweiZhang1 in #229
- add quantized models by 3rd party by @WeiweiZhang1 in #230
- add meta3.1-70B-instruct model, refine docs by @WeiweiZhang1 in #231
- fix model link by @WeiweiZhang1 in #232
- refine docs, add accuracy data, add receip and eval scripts by @WeiweiZhang1 in #226
- add brief formats introduction by @wenhuach21 in #236
- update readme and add itrex in the requirements.txt by @wenhuach21 in #238
- add tritonv2, improve packing and pbar by @wenhuach21 in #239
- refine the code and the speedup is notable by @wenhuach21 in #240
- move some settings from example to main by @wenhuach21 in #241
- add runable script for autoround by @n1ck-guo in #225
- update readme by @n1ck-guo in #242
- Add MANIFEST.in file to include requirements.txt by @XuehaoSun in #243
- fix example bug by @n1ck-guo in #245
- enable llava int4 inference with autoround format by @WeiweiZhang1 in #237
- remove autoawq requirement at packing stage by @n1ck-guo in #249
- remove unused log by @n1ck-guo in #252
- support INC API by @WeiweiZhang1 in #255
- avoid potential bug for auto-gptq 0.8 by @wenhuach21 in #250
- fix example by @n1ck-guo in #256
- fix preci by @n1ck-guo in #258
- enable_qwen2-vl_quantization by @WeiweiZhang1 in #248
- update eval and fix example by @n1ck-guo in #260
- refine autoawq exporting code by @wenhuach21 in #261
- better support quant_lm_head for larger models by @wenhuach21 in #263
- Fix 3bit packing for auto-gptq format by @wenhuach21 in #264
- Add a warning for improper export formats. by @wenhuach21 in #265
- Update readme for VLM support and integration by @wenhuach21 in #266
- remove g_idx in gptq format by @wenhuach21 in #267
- keep the dtype after qdq by @wenhuach21 in #268
- enable llama3.2-vision model quantization by @WeiweiZhang1 in #269
- fix mutable default value by @wenhuach21 in #272
- change to even rounding for mantissa of mx_fp by @wenhuach21 in #277
- adamround bugfix, refine import by @WeiweiZhang1 in #275
- [Important Change]set full range sym as the default by @wenhuach21 in #278
- refine eval by @wenhuach21 in #282
- qwen2_bugfix, add adamround vision UT by @WeiweiZhang1 in #281
New Contributors
- @dependabot made their first contribution in #223
Full Changelog: v0.3...v0.3.1
Intel® auto-round v0.3 Release
Highlights:
- Broader Device Support:
  - Expanded support for CPU, HPU, and CUDA inference in the AutoRound format, resolving the 2-bit accuracy issue.
- New Recipes and Model Releases:
  - Published numerous recipes on the Low Bit Open LLM Leaderboard, showcasing impressive results on LLaMa 3.1 and other leading models.
- Experimental Features:
  - Introduced several experimental features, including activation quantization and `mx_fp`, with promising outcomes with AutoRound.
- Multimodal Model Support:
  - Extended capabilities for tuning and inference across several multimodal models.
Lowlights:
- Implemented support for `low_cpu_mem_usage`, the `auto_awq` format, calibration dataset concatenation, and calibration datasets with chat templates.