[Cryptech Core] further improvements in RSA signing performance

Paul Selkirk paul at psgd.org
Tue Mar 10 22:51:56 UTC 2020


Recall that, when we last left things with the modexpng + keywrap
cores, we had sigs/sec in this range, with 1-4 simultaneous signers
(listed as signers: sigs/sec).

1:  13.358
2:  22.188
3:  25.696
4:  25.688

While I was digging into this, I noticed that libtfm was being built
with BITS=8192. This is the largest size of number that we expect to
be handling, but (for reasons of bignum math overflow handling)
storage is twice that, or 16384 bits (2KB) per bignum.

However, the modexp cores have been configured to handle a maximum key
size of 4096 bits. (We could check for this limit in the software
driver, but currently don't.) Also, an 8192-bit key, with all the
extra fields, would have a DER size of about 8824 bytes, which exceeds
the keystore block size of 8192 bytes. So there's really no point in
setting BITS above 4096.
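
For anyone who wants to check the sizing argument, here is a small
Python sketch. It assumes the only thing that matters is the 2x
headroom factor described above (any few extra digits of slack in the
real build are ignored), and the function name is made up for
illustration.

```python
def bignum_storage_bytes(bits, headroom=2):
    """Bytes of digit storage for a bignum sized for `bits`,
    assuming `headroom`x space for intermediate results."""
    return bits * headroom // 8

for bits in (8192, 4096):
    print(f"BITS={bits}: {bignum_storage_bytes(bits)} bytes per bignum")

# Figures quoted in the text above.
DER_SIZE_8192_KEY = 8824   # approximate DER size of an 8192-bit key
KEYSTORE_BLOCK = 8192      # keystore block size in bytes

print("8192-bit key fits in a keystore block:",
      DER_SIZE_8192_KEY <= KEYSTORE_BLOCK)
```

Halving BITS halves the storage that every bignum copy and clear has
to touch, which is at least consistent with the speedup reported
below.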

When I rebuilt libtfm with the smaller bignum size, I immediately (and
surprisingly) saw 16-36% improvement in signing speeds:

1:  15.562
2:  28.384
3:  35.140
4:  35.105
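
The percentage range can be reproduced from the two tables above. A
quick Python sketch (the variable names are just for illustration):

```python
# sigs/sec by number of simultaneous signers, copied from the tables above
before = {1: 13.358, 2: 22.188, 3: 25.696, 4: 25.688}   # BITS=8192
after = {1: 15.562, 2: 28.384, 3: 35.140, 4: 35.105}    # BITS=4096

def pct_gain(old, new):
    """Relative improvement of new over old, in percent."""
    return (new / old - 1.0) * 100.0

gains = {n: pct_gain(before[n], after[n]) for n in before}
for n, g in gains.items():
    print(f"{n} signer(s): {g:.1f}% faster")
```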

At this point, I was still using a minimally-resourced bitstream, with
only 1 modexpng core. So I built a bitstream with 4 modexpng cores
but, surprisingly, that didn't perform significantly better than 1
core.

Digging further, I noticed that hal_rsa_private_key_from_der, which
(mostly) copies byte strings into fp_ints, had much better performance
than unpack_fp, which copies fp_ints into byte strings. Lo and behold,
fp_to_unsigned_bin is almost pathologically pessimized. Copying code
from fp_read_unsigned_bin, I saw a further improvement of 44-105% in
signing speeds:

1:  22.426
2:  43.913
3:  60.711
4:  72.244
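
To illustrate why the direction of the copy can matter so much, here
is a Python model of two ways to turn a bignum (an array of digits)
into a big-endian byte string. This is a sketch, not the libtfm
source: the 32-bit digit width, the function names, and the assumption
that the slow path works by repeatedly shifting the whole number right
by 8 bits are all mine.

```python
DIGIT_BITS = 32  # assumed digit width for this sketch

def to_digits(n, ndigits):
    """Split a non-negative int into little-endian DIGIT_BITS-wide digits."""
    mask = (1 << DIGIT_BITS) - 1
    return [(n >> (DIGIT_BITS * i)) & mask for i in range(ndigits)]

def to_bytes_by_shifting(digits):
    """Slow model: emit the low byte, then shift the whole bignum right
    by 8 bits, so every output byte costs a pass over every digit."""
    t = list(digits)                       # full copy of the bignum
    out = []
    while any(t):
        out.append(t[0] & 0xFF)
        carry = 0
        for i in reversed(range(len(t))):  # multi-digit right shift by 8
            cur = t[i]
            t[i] = (cur >> 8) | (carry << (DIGIT_BITS - 8))
            carry = cur & 0xFF
    out.reverse()                          # bytes were produced low-first
    return bytes(out)

def to_bytes_direct(digits):
    """Fast model: pull each byte straight out of its digit, one pass total."""
    out = []
    for d in digits:
        for shift in range(0, DIGIT_BITS, 8):
            out.append((d >> shift) & 0xFF)
    while out and out[-1] == 0:            # drop leading (high-order) zeros
        out.pop()
    out.reverse()
    return bytes(out)

n = 0x0102030405060708090A
digits = to_digits(n, 8)                   # storage larger than the value
print(to_bytes_by_shifting(digits) == to_bytes_direct(digits))
```

Both functions give the same bytes, but the shifting version does work
proportional to (output bytes) x (storage digits), while the direct
version is linear in the storage size.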

Note that these tests are for 1000 signatures. Bumping it to 10000
signatures, I can drive the performance over 80 sig/sec, because it
amortizes the initial precalc across more signatures:

1:  22.565
2:  45.577
3:  64.916
4:  80.128
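
A rough way to picture the amortization is a two-parameter model:
total time = fixed cost + N x per-signature cost. The Python sketch
below fits that model to the two 4-signer figures (72.244 sigs/sec at
1000 signatures, 80.128 at 10000). The model and the names fixed_s and
per_sig_s are assumptions for illustration; nothing here was measured
directly.

```python
def fit_fixed_and_per_sig(n1, rate1, n2, rate2):
    """Solve total = fixed + n * per_sig from two (count, sigs/sec) points."""
    total1 = n1 / rate1            # seconds for the shorter run
    total2 = n2 / rate2            # seconds for the longer run
    per_sig_s = (total2 - total1) / (n2 - n1)
    fixed_s = total1 - n1 * per_sig_s
    return fixed_s, per_sig_s

fixed_s, per_sig_s = fit_fixed_and_per_sig(1000, 72.244, 10000, 80.128)
print(f"implied fixed cost:     {fixed_s:.2f} s")
print(f"implied per-sig cost:   {per_sig_s * 1000:.2f} ms")
print(f"implied rate at large N: {1.0 / per_sig_s:.1f} sigs/sec")
```

Under this model the rate keeps creeping up with longer runs, but
levels off at 1/per_sig_s.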

GUYS! This is legit 10x the last releng version!

Anyway, this is the current breakdown of what goes into signing (in
ms/sig).

parallel-signatures                 43.869
RPC overhead                        23.483
hal_rpc_pkey_sign                   20.386
  hal_ks_fetch                       1.529
    hal_mkm_get_kek                  0.419
    hal_aes_keyunwrap                1.026
  pkey_local_sign_rsa               18.811
    hal_rsa_private_key_from_der     0.557
    hal_rsa_decrypt                 18.243
      rsa_crt                       18.124
        modexpng                    18.110
          unpack_fp                  0.325
          hal_modexpng              17.663

An important thing to notice is that the RPC overhead (marshalling/
unmarshalling, serial communications, dispatch to an RPC task, etc) is
now more than the signing time itself.
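
The top-level rows of the breakdown can be cross-checked with a few
lines of Python (numbers copied from the table above; variable names
are illustrative):

```python
total_ms = 43.869   # parallel-signatures
rpc_ms = 23.483     # RPC overhead
sign_ms = 20.386    # hal_rpc_pkey_sign

rpc_share = rpc_ms / total_ms
single_signer_rate = 1000.0 / total_ms   # sigs/sec implied by ms/sig

print(f"rows add up: {abs(rpc_ms + sign_ms - total_ms) < 1e-6}")
print(f"RPC share of each signature: {rpc_share:.1%}")
print(f"implied single-signer rate:  {single_signer_rate:.1f} sigs/sec")
```

The implied rate is in line with the single-signer figures reported
earlier.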

And this is still with the modexpng core at 90MHz. Just think what
180MHz would buy us.
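
As a back-of-the-envelope estimate only: if the hal_modexpng time
(17.663 ms) scaled inversely with the core clock and everything else
stayed the same, the per-signature time would look like the Python
sketch below. Both of those conditions are assumptions, and the
function name is made up for illustration.

```python
def ms_per_sig_at(clock_mhz, total_ms=43.869, core_ms=17.663,
                  base_mhz=90.0):
    """Estimated ms/sig if only the core time scales with clock speed."""
    other_ms = total_ms - core_ms             # RPC, keystore, etc.
    return other_ms + core_ms * base_mhz / clock_mhz

for mhz in (90, 180):
    ms = ms_per_sig_at(mhz)
    print(f"{mhz} MHz: {ms:.3f} ms/sig, {1000.0 / ms:.1f} sigs/sec")
```

Under those assumptions the RPC overhead stays fixed, so doubling the
clock gives much less than double the throughput for a single signer.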

				paul


