[Cryptech Tech] RSA timing experiments with multiple cores
Rob Austein
sra at hactrn.net
Wed Apr 11 19:42:02 UTC 2018
Update after some profiling: it looks like the current bottleneck is
loading keys from the keystore, not ModExp. More specifically:
* We're spending twice as much time in hal_ks_fetch() as we are in
hal_rsa_decrypt();
* Most of the time spent in hal_ks_fetch() is spent waiting for
hal_aes_keyunwrap(), which in turn breaks down into time spent
decrypting keys and time spent waiting for the keystore lock because
some other client is currently hogging it.
Adding more AES cores did not help, so the bottleneck here is probably
the C code, not the AES core.
This looks like an architectural issue, and in retrospect it's sort of
obvious: the current C code makes no attempt to track which signer
cores were just loaded with which key components (in part because,
until relatively recently, we were using bitstreams with one signer
core: since RSA CRT involves two ModExp operations, there would have
been no point in tracking this). So we reload the key components for
every signature, and the profiling results say that's expensive.
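For readers who don't have the CRT arithmetic in their heads, here is a toy sketch of why one RSA private key operation means two ModExp runs with different key components. Everything in it is illustrative: the names modexp(), rsa_crt_key_t and rsa_crt_decrypt() are made up for this example and are not libhal functions, and the numbers are textbook-sized (p = 61, q = 53), nowhere near real key sizes.

```c
#include <stdint.h>

/* Square-and-multiply modular exponentiation; fine for toy-sized moduli only. */
uint64_t modexp(uint64_t base, uint64_t exp, uint64_t mod)
{
    uint64_t result = 1 % mod;
    base %= mod;
    while (exp > 0) {
        if (exp & 1)
            result = (result * base) % mod;
        base = (base * base) % mod;
        exp >>= 1;
    }
    return result;
}

/* Hypothetical container for the CRT form of an RSA private key. */
typedef struct {
    uint64_t p, q;      /* prime factors of the modulus */
    uint64_t dP, dQ;    /* d mod (p-1), d mod (q-1) */
    uint64_t qInv;      /* q^-1 mod p */
} rsa_crt_key_t;

uint64_t rsa_crt_decrypt(const rsa_crt_key_t *k, uint64_t c)
{
    /* Two independent ModExp operations, each with its own modulus and
     * exponent. With two cores these could run in parallel; with one
     * core the second run has to replace whatever the first one loaded. */
    uint64_t m1 = modexp(c, k->dP, k->p);
    uint64_t m2 = modexp(c, k->dQ, k->q);

    /* Garner recombination: h = qInv * (m1 - m2) mod p, m = m2 + h * q. */
    uint64_t h = (k->qInv * ((m1 + k->p - (m2 % k->p)) % k->p)) % k->p;
    return m2 + h * k->q;
}
```

With a single core, the (q, dQ) run follows the (p, dP) run on the same core, so in this simplified picture remembering what the core last held buys nothing. With several cores, each half could in principle stay resident between signatures.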
Which of course also raises the question of whether we *should* be
preserving key components in the signer cores. Doctrine for the C
code has been to wipe any copy of private key components immediately
after use; we're not currently doing that for the signer cores (oops)
but adding code to do that would be straightforward. Adding code to
be more clever about keeping key components in signer cores seems like
a fun source of additional complexity; we do have a notion of an
"open" key object, so presumably we could somehow hook into that,
perhaps with some kind of LRU mechanism for reclaiming cores when
there are too many open keys for the number of cores available.
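To make the LRU idea a bit more concrete, here is a minimal sketch of what such bookkeeping might look like. It is not the libhal design: core_cache_t, core_cache_acquire(), core_cache_release(), NUM_CORES and KEY_ID_LEN are all hypothetical names, it pretends one core holds one key (ignoring the two CRT halves), and core_wipe() is only a stand-in for whatever would actually clear key components out of a signer core.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_CORES   2    /* hypothetical: number of signer cores in the bitstream */
#define KEY_ID_LEN 16    /* hypothetical: length of a key identifier */

typedef struct {
    int      in_use;                /* nonzero if this core holds key components */
    uint8_t  key_id[KEY_ID_LEN];    /* which key is loaded */
    uint64_t last_used;             /* logical timestamp of last use */
} core_slot_t;

typedef struct {
    core_slot_t slot[NUM_CORES];
    uint64_t    clock;              /* incremented on every acquire */
    unsigned    loads;              /* counts the expensive key loads */
} core_cache_t;

void core_cache_init(core_cache_t *c)
{
    memset(c, 0, sizeof(*c));
}

/* Stand-in for wiping a core; volatile keeps the stores from being elided. */
static void core_wipe(core_slot_t *s)
{
    volatile uint8_t *p = (volatile uint8_t *) s;
    for (size_t i = 0; i < sizeof(*s); i++)
        p[i] = 0;
}

/* Return the index of a core holding key_id, loading the key only on a miss. */
int core_cache_acquire(core_cache_t *c, const uint8_t *key_id)
{
    int victim = 0;
    c->clock++;

    /* Hit: the key is already resident, so no reload is needed. */
    for (int i = 0; i < NUM_CORES; i++) {
        core_slot_t *s = &c->slot[i];
        if (s->in_use && memcmp(s->key_id, key_id, KEY_ID_LEN) == 0) {
            s->last_used = c->clock;
            return i;
        }
    }

    /* Miss: take the first free core, else the least recently used one. */
    for (int i = 0; i < NUM_CORES; i++) {
        if (!c->slot[i].in_use) {
            victim = i;
            break;
        }
        if (c->slot[i].last_used < c->slot[victim].last_used)
            victim = i;
    }

    core_wipe(&c->slot[victim]);    /* evict the old key before reuse */
    memcpy(c->slot[victim].key_id, key_id, KEY_ID_LEN);
    c->slot[victim].in_use = 1;
    c->slot[victim].last_used = c->clock;
    c->loads++;                     /* fetch/unwrap/load would happen here */
    return victim;
}

/* Called when a key object is closed: wipe any core still holding it. */
void core_cache_release(core_cache_t *c, const uint8_t *key_id)
{
    for (int i = 0; i < NUM_CORES; i++)
        if (c->slot[i].in_use &&
            memcmp(c->slot[i].key_id, key_id, KEY_ID_LEN) == 0)
            core_wipe(&c->slot[i]);
}
```

Hooking core_cache_release() into closing a key object would keep the "wipe after use" doctrine at the granularity of open keys rather than individual signatures. Whether that is an acceptable trade-off is exactly the open question above.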
At any rate: some work to do, but the above at least sort of makes
sense, which is an improvement over not understanding the results.