-
-
Notifications
You must be signed in to change notification settings - Fork 723
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
core/crypto: More improvements #4124
Conversation
8450ddd
to
26fb0a1
Compare
b6f512d
to
363df33
Compare
4198575
to
f69bb60
Compare
7d37012
to
00e013b
Compare
6f62ca6
to
e8dc9bc
Compare
e8dc9bc
to
0dedf4a
Compare
st.rate = _RATE_128L | ||
|
||
for _ in 0 ..< 10 { | ||
update_hw_128l(st, iv, key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For a bit of performance improvement: since the input is constant, and we don't need to compute a key stream here, the state can be kept in bitsliced form during the initialization rounds. Same for finalization.
In bitsliced form, the rotation of the AES blocks is just a one-bit shift of the bytes.
During AD absorption, you can also keep the state in bitsliced form, and only bitslice the input blocks.
Also for the sbox, there are faster circuits than the Boyar-Peralta: https://eprint.iacr.org/2019/802.pdf
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay in replying to this. I will leave adding those optimizations for a future pass, since my time has been somewhat limited as of late, and what is there is easy to review and gets adequate performance.
384c6c2
to
3a90ce7
Compare
585dd35
to
5d30bd5
Compare
567e5f8
to
8da0ae1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General question about all the contextless procedures and contextless panics/ensures, is that because it actually performs better, or some other reason?
@@ -235,7 +235,7 @@ gctr_hw :: proc( | |||
// BUG: Sticking this in gctr_hw (like the other implementations) crashes | |||
// the compiler. | |||
// | |||
// src/check_expr.cpp(7892): Assertion Failure: `c->curr_proc_decl->entity` | |||
// src/check_expr.cpp(8104): Assertion Failure: `c->curr_proc_decl->entity` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prob good idea to make an issue with steps to reproduce so this can get fixed
To be honest it probably doesn't matter either way. A lot of the crypto code was contextless when I started working on it, and it is exceedingly rare that anything except panic/ensure needs the context. Going from "with context" to "without context" is easier than the reverse in terms of not breaking code that calls the routines, so my inclination is to continue the trend. |
Definitely, but then the panics and ensures can't be overwritten, so it's a bit of a balance act. I'm fine with it though, just wanted to know the reasoning, I am glad it isn't performance. |
- Detect the RISC-V `v` profile - Don't bother trying to process 4 blocks at a time if emulated
- Use text/table for results - Add more benchmarks
Refactor the blake2 code to use SIMD?(Kind of not worth it with just SSE2)ensure
to enforce API contractsArgon2(Delayed to next patch series)ChaCha8Rng(Delayed to next patch series)