[Cryptech Core] further improvements in RSA signing performance

Joachim Strömbergson joachim at assured.se
Wed Mar 11 13:53:21 UTC 2020


Aloha!

Amazing! Very cool.

For comparison, this performance exceeds that of some commercial HSMs,
for example the SafeNet Luna USB HSM.

Good work!

BR,
JoachimS


On 2020-03-10 23:51, Paul Selkirk wrote:
> Recall that, when we last left things with modexpng + keywrap cores, we
> had sigs/sec in this range, with 1-4 simultaneous signers.
> 
> 1:  13.358
> 2:  22.188
> 3:  25.696
> 4:  25.688
> 
> While I was digging into this, I noticed that libtfm was being built
> with BITS=8192. This is the largest size of number that we expect to
> be handling, but (for reasons of bignum math overflow handling)
> storage is twice that, or 16 Kbit (2 KB) per number.
> 
> However, the modexp cores have been configured to handle a maximum key
> size of 4096 bits. (This is a limit we could (but don't) check for in
> the software driver.) Also, an 8192-bit key, with all the extra
> fields, would have a DER size of about 8824 bytes, which exceeds the
> keystore block size of 8192 bytes. So there's really no point in
> setting BITS above 4096.
> 
> When I rebuilt libtfm with the smaller bignum size, I immediately (and
> surprisingly) saw 16-36% improvement in signing speeds:
> 
> 1:  15.562
> 2:  28.384
> 3:  35.140
> 4:  35.105
> 
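
As a quick check on the arithmetic, the quoted improvement can be
recomputed from the two tables above. This is only an illustrative
sketch: the helper name improvement_pct is made up here, and the
figures are copied from the mail.

```python
# Signatures/sec for 1-4 simultaneous signers, copied from the tables above.
before = [13.358, 22.188, 25.696, 25.688]  # libtfm built with BITS=8192
after = [15.562, 28.384, 35.140, 35.105]   # libtfm rebuilt with the smaller bignum size


def improvement_pct(old, new):
    """Percent increase in throughput from old to new (hypothetical helper)."""
    return (new / old - 1.0) * 100.0


gains = [improvement_pct(o, n) for o, n in zip(before, after)]
for signers, gain in enumerate(gains, start=1):
    print(f"{signers} signer(s): +{gain:.1f}%")
```

The gains come out between roughly 16% and 37%, with the smallest gain
for a single signer.
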
> At this point, I was still using a minimally-resourced bitstream, with
> only 1 modexpng core. So I built a bitstream with 4 modexpng cores,
> but (surprisingly), that didn't perform significantly better than 1
> core.
> 
> Digging further, I noticed that hal_rsa_private_key_from_der, which
> (mostly) copies byte strings into fp_ints, had much better performance
> than unpack_fp, which copies fp_ints into byte strings. Lo and behold,
> fp_to_unsigned_bin is almost pathologically pessimized. Copying code
> from fp_read_unsigned_bin, I saw a further improvement of 44-105% in
> signing speeds:
> 
> 1:  22.426
> 2:  43.913
> 3:  60.711
> 4:  72.244
> 
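
To illustrate why a byte-export routine can be so much slower than the
matching import, here is a small Python model. It is not libtfm code:
it assumes 32-bit digits stored least-significant first, and it assumes
the slow path is the classic "take the low byte, shift the whole number
right by 8 bits, repeat" loop. The function names are hypothetical.

```python
DIGIT_BITS = 32  # assumed digit width for this model


def to_bytes_shift(digits):
    """Slow model: peel off the low byte, then shift the whole bignum by 8 bits."""
    t = list(digits)
    out = bytearray()
    while any(t):
        out.append(t[0] & 0xFF)
        carry = 0
        # Shift every digit right by 8 bits, carrying bytes down from above.
        for i in range(len(t) - 1, -1, -1):
            low = t[i] & 0xFF
            t[i] = (t[i] >> 8) | (carry << (DIGIT_BITS - 8))
            carry = low
    out.reverse()  # bytes were produced least-significant first
    return bytes(out)


def to_bytes_direct(digits):
    """Fast model: walk the digits once, most significant first."""
    out = bytearray()
    for d in reversed(digits):
        for shift in range(DIGIT_BITS - 8, -1, -8):
            out.append((d >> shift) & 0xFF)
    return bytes(out).lstrip(b"\x00")  # drop leading zero bytes


digits = [0x89ABCDEF, 0x01234567]  # value 0x0123456789ABCDEF
slow = to_bytes_shift(digits)
fast = to_bytes_direct(digits)
print(slow.hex(), fast.hex())
```

Both functions return the same big-endian byte string, but the shifting
version touches every digit once per output byte, so its cost grows
quadratically with the size of the number, while the direct walk is
linear.
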
> Note that these tests are for 1000 signatures. Bumping it to 10000
> signatures, I can drive the performance over 80 sig/sec, because it
> amortizes the initial precalc across more signatures:
> 
> 1:  22.565
> 2:  45.577
> 3:  64.916
> 4:  80.128
> 
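
The amortization effect can be roughly quantified from the two
4-signer figures. The sketch below assumes a simple linear model
(total time = fixed cost + n * per-signature time) and ignores run-to-run
noise, so treat the result as a ballpark only. The function name
fit_fixed_cost is made up for this example.

```python
def fit_fixed_cost(n1, rate1, n2, rate2):
    """Fit total_time = fixed + n * per_sig through two (count, sig/sec) points."""
    t1 = n1 / rate1  # total seconds for the shorter run
    t2 = n2 / rate2  # total seconds for the longer run
    per_sig = (t2 - t1) / (n2 - n1)
    fixed = t1 - n1 * per_sig
    return fixed, per_sig


# 4 simultaneous signers: 1000 signatures at 72.244/sec, 10000 at 80.128/sec.
fixed, per_sig = fit_fixed_cost(1000, 72.244, 10000, 80.128)
asymptotic_rate = 1.0 / per_sig
print(f"fixed cost ~{fixed:.2f} s, asymptotic rate ~{asymptotic_rate:.1f} sig/sec")
```

Under that assumption the fixed cost is on the order of 1.5 seconds,
and the rate levels off at roughly 81 sig/sec for longer runs.
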
> GUYS! This is legit 10x the last releng version!
> 
> Anyway, this is the current breakdown of what goes into signing (in
> ms/sig).
> 
> parallel-signatures                 43.869
> RPC overhead                        23.483
> hal_rpc_pkey_sign                   20.386
>   hal_ks_fetch                       1.529
>     hal_mkm_get_kek                  0.419
>     hal_aes_keyunwrap                1.026
>   pkey_local_sign_rsa               18.811
>     hal_rsa_private_key_from_der     0.557
>     hal_rsa_decrypt                 18.243
>       rsa_crt                       18.124
>         modexpng                    18.110
>           unpack_fp                  0.325
>           hal_modexpng              17.663
> 
> An important thing to notice is that the RPC overhead (marshalling/
> unmarshalling, serial communications, dispatch to an RPC task, etc.) is
> now larger than the signing time itself (23.483 vs. 20.386 ms/sig).
> 
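
The shares implied by the breakdown can be worked out directly. This
sketch uses only the numbers from the table above; the variable names
are illustrative.

```python
# ms per signature, copied from the breakdown table.
ms_per_sig = {
    "parallel-signatures": 43.869,
    "RPC overhead": 23.483,
    "hal_rpc_pkey_sign": 20.386,
    "hal_modexpng": 17.663,
}

total = ms_per_sig["parallel-signatures"]
rpc_share = ms_per_sig["RPC overhead"] / total
core_share = ms_per_sig["hal_modexpng"] / total
sigs_per_sec = 1000.0 / total  # rate implied by the per-signature latency

print(f"RPC overhead: {rpc_share:.1%} of each signature")
print(f"hal_modexpng: {core_share:.1%} of each signature")
print(f"implied rate: {sigs_per_sec:.1f} sig/sec")
```

RPC overhead comes to a little over half of the total, and the implied
rate of about 22.8 sig/sec is close to the single-signer figures in the
tables above.
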
> And this is still with the modexpng core at 90MHz. Just think what
> 180MHz would buy us.
> 
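
A back-of-the-envelope estimate for the 180MHz case, in the style of
Amdahl's law. It assumes the whole hal_modexpng time scales inversely
with the core clock and that nothing else changes; in practice some of
that time is probably bus transfer that would not speed up, so this is
an optimistic bound. The function name projected_ms is hypothetical.

```python
def projected_ms(total_ms, core_ms, clock_ratio):
    """Per-signature time if only the core portion speeds up by clock_ratio."""
    return total_ms - core_ms + core_ms / clock_ratio


total_ms = 43.869  # parallel-signatures, from the breakdown
core_ms = 17.663   # hal_modexpng, from the breakdown

new_ms = projected_ms(total_ms, core_ms, 180 / 90)
speedup = total_ms / new_ms
print(f"{new_ms:.3f} ms/sig, {speedup:.2f}x overall")
```

Even with the core time halved, the per-signature time only drops to
about 35 ms (roughly 1.25x), because the RPC overhead is untouched.
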
> 				paul
> 
> _______________________________________________
> Core mailing list
> Core at cryptech.is
> https://lists.cryptech.is/listinfo/core
> 


-- 
Med vänlig hälsning, Yours

Joachim Strömbergson
========================================================================
                               Assured AB
========================================================================

