
Questions about bf16 support and correctness check #12

Open
xiabingquan opened this issue Sep 11, 2024 · 1 comment

Comments

@xiabingquan

Dear Alex,
Thanks for this great repo. The flash attention community really needs this feature.
I'm trying to integrate this repo into my own project, but I've run into two issues:

  • torch.bfloat16 is not supported. It seems that this repo only supports torch.float32, torch.float16, and fp8. How should I modify the code to support bfloat16? (See my sketch after this list.)
  • How can I verify that the results are correct? The benchmark scripts only provide speed comparisons, not a numerical check against a ground-truth implementation. How can I make sure the forward and backward results are correct? (See the second sketch below.)
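For the first point, here is roughly the kind of change I imagine, sketched against a hypothetical dtype table (`TORCH_TO_TRITON` and `_check_dtype` are my own illustrative names, not this repo's actual code):

```python
import torch
import triton.language as tl

# Hypothetical torch-dtype -> Triton-dtype table; adding a bfloat16 entry
# is presumably the core of the change, plus whatever the kernels assume.
TORCH_TO_TRITON = {
    torch.float32: tl.float32,
    torch.float16: tl.float16,
    torch.bfloat16: tl.bfloat16,  # new entry; tl.bfloat16 is a standard Triton type
}

def _check_dtype(t: torch.Tensor) -> None:
    # Reject dtypes the kernels were not written for.
    if t.dtype not in TORCH_TO_TRITON:
        raise TypeError(f"unsupported dtype: {t.dtype}")
```

Is that the right direction, or do the kernels themselves hard-code fp16/fp32 paths?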

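For the second point, I was planning to check against PyTorch's reference attention along these lines (`flash_attention` below is a stand-in for this repo's actual entry point, which I don't know yet):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, H, S, D = 2, 4, 128, 64
q, k, v = (torch.randn(B, H, S, D, device="cuda", dtype=torch.float16,
                       requires_grad=True) for _ in range(3))
# Independent leaf tensors for the reference path so gradients don't mix.
q_ref, k_ref, v_ref = (t.detach().clone().requires_grad_(True) for t in (q, k, v))

out = flash_attention(q, k, v)  # stand-in for this repo's API
out_ref = F.scaled_dot_product_attention(q_ref, k_ref, v_ref)

# Forward check: loose tolerances, since fp16 accumulation order differs by algorithm.
torch.testing.assert_close(out, out_ref, atol=2e-2, rtol=2e-2)

# Backward check: feed the same upstream gradient to both paths.
grad = torch.randn_like(out)
out.backward(grad)
out_ref.backward(grad)
for got, ref in ((q.grad, q_ref.grad), (k.grad, k_ref.grad), (v.grad, v_ref.grad)):
    torch.testing.assert_close(got, ref, atol=2e-2, rtol=2e-2)
```

Would something like this be a fair ground truth, or is there a better reference to compare against?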
Thanks for your reply!

@xiabingquan
Author

Also, any pointers to a detailed tutorial on implementing flash attention with Triton would be appreciated!

Tri Dao's repo and the official Triton tutorials offer their own implementations, but reading them from scratch is a tough job.
