[Cryptech Tech] Happier RSA timing numbers

Rob Austein sra at hactrn.net
Wed May 23 13:53:57 UTC 2018


On Wed, 23 May 2018 01:34:28 -0400, Joachim Strömbergson wrote:
> 
> Just to get some clarifications - what was the number of 2048 sigs/s
> before the AES updates?

Don't remember (might be in old email), but I saved the bitstream, so
we can find out what current firmware does with the older bitstream,
which is probably the answer you really want for comparison anyway.

> It's a bit of a shame not to have numbers that can be compared
> directly, to be able to confirm the improvements.

We can also run the older test sequence, it's just that the numbers in
it may be misleading (artifacts of deliberate overloading can produce
(even) strange(r) results when profiling).

> Since you are running w parallel AES cores, that way of improving
> things is already used. The next thing should be to double sys_clk to
> 100 MHz. That should drop the wait time.

Yep.

On Wed, 23 May 2018 01:37:44 -0400, Joachim Strömbergson wrote:
> 
> Oh by the way, do you test that keys are wrapped/unwrapped
> correctly? Just to get a confirmation that the modified AES core
> works. If that is the case I will update/replace the old core with
> the new one.

Indirectly, yes.  We don't export the KEK from the MKM for obvious
reasons, but the backup mechanism also uses AES keywrap (with a
separate transfer KEK) and the unit tests for the backup mechanism do
attempt to verify the results, including a Python implementation of
AES Keywrap and attempts to use both imported and exported keys in
parallel on the HSM and in PyCrypto.  It is of course possible that I
messed up these tests (wouldn't be the first time), but if they do
what I think they do, we have some reason to believe that AES works.

In any case, these are the tests we've been using all along, so either
they work or we have no reason to believe the older core either. :)

See the TestPKeyBackup class in sw/libhal/unit-tests.py.

On Wed, 23 May 2018 04:32:02 -0400, Joachim Strömbergson wrote:
> 
> After the modifications I did yesterday, the AES core now performs one
> round/cycle. Improving that considerably would mean trying to do more
> than one round/cycle. Based on the worst delay path in the core, this
> will have a bad effect on the possible clock frequency. It will quite
> probably be below 100 MHz. So improving single AES core performance a
> lot must come from increasing the clock frequency it is running at, not
> changes to the design.

Makes sense.

> A question: Are the parallel cores loaded with the same keys? If so,
> what happens if you increase the number of AES cores?

For their principal use (wrapping and unwrapping keys in the
keystore), yes, they're all loaded with the KEK from the MKM.

You may recall that at one point we had this theory that the ARM
should never even see the primary KEK, which is why the MKM is wired
to the FPGA rather than the ARM.  As I recall the theory, the (not yet
written) AES keywrap implementation would (somehow) get the KEK
directly from the MKM.  A miracle having thus far failed to occur[*],
we hacked around the missing bits with software and your mkmif core. :)

For their secondary use (backup and restore) one or more of the AES
cores will be loaded with transfer KEKs.

[*] http://socsci2.ucsd.edu/~aronatas/project/cartoon.math.gif

> After this the big thing I can do is the streaming interface I've been
> talking about.
...

Sounds cool, but not sure it helps for the specific case of AES keywrap.
You might want to look at sw/libhal/aes_keywrap.c (or the equivalent
Python implementation in the unit tests) as well as RFC 5649.

Basic problem I see is that the input to each ECB round in AES keywrap
is a composite of two 64-bit fields, one of which is constructed by
XORing a counter with 64 bits of the output of the previous round.  I
see no obvious way to stream this.

One could of course use your streaming approach if the core were
performing the complete AES keywrap transform rather than just ECB. :)
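
To make the data dependency concrete, here is a minimal sketch of the
wrapping loop from RFC 3394, which RFC 5649 reuses once the padded
input is longer than one 64-bit block.  This is not the code in
sw/libhal/aes_keywrap.c or in the unit tests.  The names wrap_rounds
and toy_block are made up for illustration, and toy_block is a
SHA-256-based stand-in rather than AES, used only so the sketch runs
without a crypto library.

```python
import hashlib
import struct


def wrap_rounds(encrypt_block, iv, blocks):
    """Wrapping loop of RFC 3394 over a list of 8-byte blocks.

    encrypt_block is any callable mapping 16 bytes to 16 bytes.  In a real
    implementation it would be AES-ECB under the KEK (assumption: supplied
    by the caller, e.g. the hardware core).
    """
    a = iv                      # 64-bit integrity register A
    r = list(blocks)            # 64-bit registers R[1..n]
    n = len(r)
    for j in range(6):
        for i in range(n):
            # The block input is A | R[i]; A comes from the previous output.
            b = encrypt_block(a + r[i])
            t = n * j + i + 1   # round counter
            # A = MSB64(B) XOR t, so the next input cannot be formed
            # until this block operation has finished.
            a = bytes(x ^ y for x, y in zip(b[:8], struct.pack(">Q", t)))
            r[i] = b[8:]        # R[i] = LSB64(B)
    return a + b"".join(r)


def toy_block(block):
    """Stand-in for the block cipher.  NOT AES, for illustration only."""
    assert len(block) == 16
    return hashlib.sha256(b"demo-key" + block).digest()[:16]


if __name__ == "__main__":
    iv = bytes.fromhex("A6A6A6A6A6A6A6A6")   # default IV from RFC 3394
    blocks = [bytes(range(8)), bytes(range(8, 16))]
    print(wrap_rounds(toy_block, iv, blocks).hex())
```

Every call to encrypt_block takes half of its input from the result of
the call just before it, so the 6*n block operations form a strict
chain with nothing to pipeline within a single wrap.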

