[Cryptech Tech] AES keywrap core ready for test and benchmarking

Paul Selkirk paul at psgd.org
Mon Aug 20 19:43:15 UTC 2018

Summary: The keywrap core does what it says on the package, and does it
a lot faster than software keywrap.

The following table shows wrap and unwrap times (in ms) for various
object sizes. The speedup increases slightly for larger sizes, but in
general let's call it a 5x improvement.

        |              WRAP            |             UNWRAP
   size | software |   core  | speedup | software |   core  | speedup
   128  |   1.086  |  0.220  |  4.94x  |   1.102  |  0.221  |  4.99x
   256  |   2.151  |  0.411  |  5.22x  |   2.186  |  0.414  |  5.28x
   512  |   4.283  |  0.795  |  5.39x  |   4.352  |  0.799  |  5.45x
  1024  |   8.547  |  1.564  |  5.46x  |   8.686  |  1.572  |  5.53x
  2048  |  17.075  |  3.103  |  5.50x  |  17.354  |  3.117  |  5.57x
  4096  |  34.130  |  6.182  |  5.52x  |  34.689  |  6.208  |  5.59x

(Note: I haven't yet tested the practical effects on keygen, signing,
and verification. This is purely about key wrap/unwrap.)
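
As a quick sanity check on the "5x" figure, the speedup column can be
recomputed from the timings in the table. This is only a sketch: it
uses the rounded millisecond values as printed, so the last digit may
differ slightly from the table, and the dictionary and function names
are made up for illustration.

```python
# Timings in ms, copied from the table above: size -> (software, core).
WRAP = {
    128: (1.086, 0.220),
    256: (2.151, 0.411),
    512: (4.283, 0.795),
    1024: (8.547, 1.564),
    2048: (17.075, 3.103),
    4096: (34.130, 6.182),
}
UNWRAP = {
    128: (1.102, 0.221),
    256: (2.186, 0.414),
    512: (4.352, 0.799),
    1024: (8.686, 1.572),
    2048: (17.354, 3.117),
    4096: (34.689, 6.208),
}


def speedups(timings):
    """Return {size: software_time / core_time} for each row."""
    return {size: sw / core for size, (sw, core) in timings.items()}


wrap_speedup = speedups(WRAP)
unwrap_speedup = speedups(UNWRAP)

for size in WRAP:
    print(f"{size:5d}  wrap {wrap_speedup[size]:.2f}x  "
          f"unwrap {unwrap_speedup[size]:.2f}x")
```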

To reproduce this:

Check out branch 'master' on repo:

Check out branch 'js_keywrap' on repos:

In core/platform/alpha/build, run
  `make keywrap`
  `cryptech_upload --fpga -i alpha_fmc_keywrap.bit`

In sw/stm32, run
  `make cli-test`
  `bin/flash-target projects/cli-test/cli-test`

In cryptech_console, log in with username=ct, password=ct, and run
  `keywrap test` (for test vectors)
  `keywrap test 256 1000` (for timing tests)

Enhancement requests:
> The core does not implement the magic value calculation as specified
> in RFC 5649. Nor does it do any padding of object to a complete 8
> byte block. The caller needs to calculate the 64-bit magic value and
> store it in the A registers.

It seems like this shouldn't be too hard to do in the core. It already
has the length in the RLEN register, which directly gives it both the
MLI field of the AIV ("magic value"), and the padding length.
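
For reference, this is roughly what the caller has to compute today.
In RFC 5649 the AIV is the fixed 32-bit constant 0xA65959A6 followed by
the 32-bit big-endian message length indicator (MLI, the unpadded
length in octets), and the plaintext is zero-padded to a multiple of 8
octets. The sketch below assumes the length is available in octets;
how RLEN actually encodes it is not stated here, and the function names
are hypothetical, not libhal's.

```python
import struct

# Fixed upper 32 bits of the alternative initial value (RFC 5649).
AIV_PREFIX = bytes.fromhex("A65959A6")


def aiv(mli):
    """Return the 64-bit AIV: fixed prefix + 32-bit big-endian MLI."""
    return AIV_PREFIX + struct.pack(">I", mli)


def pad_length(mli):
    """Number of zero octets needed to reach a multiple of 8."""
    return -mli % 8


def pad(plaintext):
    """Zero-pad the plaintext to a whole number of 8-octet blocks."""
    return plaintext + bytes(pad_length(len(plaintext)))


# A 20-octet object needs 4 octets of padding.
print(aiv(20).hex(), pad_length(20))
```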

> Due to address space limitations in the cryptech cores, the core
> implements bank switching (with four banks). The caller must write
> to the BANK register to ensure that subsequent writes to the DATA
> addresses ends up in the correct bank. And similarly, the caller
> needs to set the bank as needed when reading the wrapped/unwrapped
> object from the core. Basically the first 128 words of the object is
> stored in bank 0, the next 128 in bank 1 etc.
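
To make the bookkeeping concrete, here is a minimal sketch of the
mapping the caller has to maintain under this scheme: word index to
(bank, offset), with a BANK write whenever the bank changes. The
operation tuples are purely illustrative and do not correspond to real
register addresses or libhal calls.

```python
WORDS_PER_BANK = 128  # from the description above
NUM_BANKS = 4         # from the description above


def locate(word_index):
    """Map an object word index to (bank, offset within that bank)."""
    bank, offset = divmod(word_index, WORDS_PER_BANK)
    if word_index < 0 or bank >= NUM_BANKS:
        raise ValueError("word index outside the banked memory")
    return bank, offset


def write_ops(num_words):
    """Yield the sequence of writes needed to load num_words words.

    ("bank", n) stands for a write to the BANK register, and
    ("data", offset, index) for writing object word `index` at
    `offset` within the currently selected bank.
    """
    current_bank = None
    for index in range(num_words):
        bank, offset = locate(index)
        if bank != current_bank:
            yield ("bank", bank)
            current_bank = bank
        yield ("data", offset, index)


# 130 words span two banks: 2 BANK writes plus 130 data writes.
print(sum(1 for op in write_ops(130)))
```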

I would like to get rid of the bank switching, and unroll the block
memory into a set of contiguous core register blocks (where each block
is 256 4-byte registers), as Pavel did for modexps6 and modexpa7. Of
course, this would require agreement about the block memory size
between keywrap_mem.v, core/platform/common/config/core.cfg, and
sw/libhal/core.c. At present, the HSM keystore uses fixed-size
8096-byte slots, so that's the practical limit. (In the long term,
each core should encode its own length, right after its name and
version, and we could remove the core-specific knowledge from
core.c. But that's a project for another day.)
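
As a back-of-the-envelope sizing for the unrolled layout: a block of
256 4-byte registers holds 1024 bytes, so an 8096-byte slot would need
8 contiguous blocks. The sketch below only counts data registers and
ignores whatever control registers the core also needs; the names are
hypothetical.

```python
REGS_PER_BLOCK = 256  # registers per core register block
BYTES_PER_REG = 4     # each register is 4 bytes
SLOT_BYTES = 8096     # keystore slot size as stated above

BLOCK_BYTES = REGS_PER_BLOCK * BYTES_PER_REG  # 1024 bytes per block


def blocks_needed(object_bytes):
    """Number of whole register blocks needed (ceiling division)."""
    return -(-object_bytes // BLOCK_BYTES)


def flat_locate(byte_offset):
    """Map a byte offset in the object to (block, register in block)."""
    return divmod(byte_offset // BYTES_PER_REG, REGS_PER_BLOCK)


print(BLOCK_BYTES, blocks_needed(SLOT_BYTES))
```

With a flat layout like this, the caller addresses the whole object
with a single running offset and never touches a bank register.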

Finally, I would like this core to directly fetch the KEK from the
MKM, rather than continuing to require the caller to fetch the KEK via
the mkmif core, and feed it right back to the keywrap core. Keeping
the KEK out of RAM can only improve the security of the device.

