Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance hints #1

Open
Yawning opened this issue Oct 25, 2021 · 1 comment
Open

Performance hints #1

Yawning opened this issue Oct 25, 2021 · 1 comment
Labels
performance New feature or change to improve software performance

Comments

@Yawning
Copy link

Yawning commented Oct 25, 2021

Shrek is the fastest .onion vanity address generator written in Go (at time of writing), but it's still very slow compared to native C programs like mkp224o.

As fast as my SSE2/AVX2 scalar-basepoint multiply implementation is, it is not going to compare to mkp244o because it cheats and only does one scalar-basepoint multiply per worker in the common case.

The gist of it is that instead of regenerating the entire key (expensive) each iteration of the search loop, you instead do something like:

eightTimesBasePoint = Point.Mul(Basepoint, 8) # Precompute this, though this is only called once/worker, so you could not bother.

priv, pub = GenerateKey()
scalar = ParseScalar(priv[:32])
point = ParsePoint(pub)
for {
  if Match(Encode(point) {
    // blah blah blah
  }

  scalar = scalar.Add(scalar, 1)
  point = point.Add(point, eightTimesBasePoint)
}

This does have the downside of not supporting RFC-style seed based private key representations, but IIRC tor supports/uses the expanded scalar | nonce form so keys generated by this algorithm work.

The relevant datastructures and math routines are in the curve and curve/scalar package of your ed25519 import.

nb: mkp244o does the scalar addition somewhat differently because of how I originally implemented it in the code they derived their implementation from, but the addition routine in the scalar package is sufficient (and arguably better because it is computationally impractical for wrapping to happen, so you can omit the check).

@innix
Copy link
Owner

innix commented Jan 4, 2022

Hey, thanks for the tips and apologies for replying so late. I finally had some free time over Xmas to work on some side projects 😄

Using your pseudo-code above and using horse25519's code as a reference, I've been able to write up a Go implementation that skips doing the full key generation / scalar multiplication on every iteration. I had some problems with your scalar.Add method in the curve25519-voi library, so I borrowed the addsztoscalar32 function from mkp244o and got it working that way. Maybe I was doing something wrong; crypto isn't really my area of expertise.

This new implementation is much faster; around 4x faster than before. I think it's getting close to being the same performance as mkp2440p when running it using the -z flag. Unfortunately my app still gets obliterated by mkp2440p when it uses -B batching mode. It's so much faster it's kinda unbelievable.

@innix innix added the performance New feature or change to improve software performance label Apr 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance New feature or change to improve software performance
Projects
None yet
Development

No branches or pull requests

2 participants