[Cryptech Core] further improvements in RSA signing performance
Joachim Strömbergson
joachim at assured.se
Wed Mar 11 13:53:21 UTC 2020
Aloha!
Amazing! Very cool.
For comparison, this performance exceeds some commercial HSMs, for
example the SafeNet Luna USB HSM.
Good work!
BR,
JoachimS
On 2020-03-10 23:51, Paul Selkirk wrote:
> Recall that, when we last left things with modexpng + keywrap cores, we
> had sigs/sec in this range, with 1-4 simultaneous signers.
>
> 1: 13.358
> 2: 22.188
> 3: 25.696
> 4: 25.688
>
> While I was digging into this, I noticed that libtfm was being built
> with BITS=8192. This is the largest size of number that we expect to
> be handling, but (for reasons of bignum math overflow handling)
> storage is twice that, or 16 Kbit (2 KB).
>
> However, the modexp cores have been configured to handle a maximum key
> size of 4096 bits. (This is a limit we could (but don't) check for in
> the software driver.) Also, an 8192-bit key, with all the extra
> fields, would have a DER size of about 8824 bytes, which exceeds the
> keystore block size of 8192 bytes. So there's really no point in
> setting BITS above 4096.
>
> When I rebuilt libtfm with the smaller bignum size, I immediately (and
> surprisingly) saw 16-36% improvement in signing speeds:
>
> 1: 15.562
> 2: 28.384
> 3: 35.140
> 4: 35.105
>
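As a quick check of the quoted 16-36% range, here is a minimal sketch that recomputes the per-signer improvement from the two tables above. The variable names are my own, and the numbers are simply copied from the post.

```python
# Signatures/sec for 1-4 simultaneous signers, copied from the two tables above.
before = [13.358, 22.188, 25.696, 25.688]  # libtfm built with BITS=8192
after = [15.562, 28.384, 35.140, 35.105]   # libtfm rebuilt with the smaller bignum size


def improvement_percent(old, new):
    """Relative improvement of new over old, in percent."""
    return (new / old - 1.0) * 100.0


gains = [improvement_percent(o, n) for o, n in zip(before, after)]

for signers, gain in enumerate(gains, start=1):
    print(f"{signers} signer(s): {gain:.1f}% faster")
```

The smallest gain is for a single signer and the largest for three or four signers, which is consistent with the range stated in the post.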
> At this point, I was still using a minimally-resourced bitstream, with
> only 1 modexpng core. So I built a bitstream with 4 modexpng cores,
> but (surprisingly), that didn't perform significantly better than 1
> core.
>
> Digging further, I noticed that hal_rsa_private_key_from_der, which
> (mostly) copies byte strings into fp_ints, had much better performance
> than unpack_fp, which copies fp_ints into byte strings. Lo and behold,
> fp_to_unsigned_bin is almost pathologically pessimized. Copying code
> from fp_read_unsigned_bin, I saw a further improvement of 44-105% in
> signing speeds:
>
> 1: 22.426
> 2: 43.913
> 3: 60.711
> 4: 72.244
>
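To illustrate why a bignum-to-bytes conversion can be so much slower than the reverse direction, here is a hedged sketch of two generic patterns. It is not the libtfm code and not Paul's patch: the 32-bit digit width, the little-endian digit list and all function names are assumptions made for illustration. The first pattern peels off one byte and then shifts the whole number right by 8 bits, so the cost grows quadratically with the size of the number. The second indexes each byte of each digit exactly once.

```python
DIGIT_BITS = 32  # assumed digit width, for illustration only
DIGIT_MASK = (1 << DIGIT_BITS) - 1


def int_to_digits(value, ndigits):
    """Split a non-negative int into ndigits little-endian digits."""
    return [(value >> (DIGIT_BITS * i)) & DIGIT_MASK for i in range(ndigits)]


def to_bytes_shift(digits):
    """Slow pattern: take the low byte, then shift the whole number right by 8."""
    t = list(digits)  # working copy so the caller's digits are untouched
    out = bytearray()
    while any(t):
        out.append(t[0] & 0xFF)
        carry = 0
        # Multi-digit right shift by 8 bits, from the top digit down.
        for i in reversed(range(len(t))):
            cur = t[i]
            t[i] = (cur >> 8) | (carry << (DIGIT_BITS - 8))
            carry = cur & 0xFF
    out.reverse()  # bytes were produced least significant first
    return bytes(out)


def to_bytes_direct(digits):
    """Fast pattern: read each byte of each digit exactly once."""
    out = bytearray()
    for d in digits:
        for shift in range(0, DIGIT_BITS, 8):
            out.append((d >> shift) & 0xFF)
    while out and out[-1] == 0:  # drop high-order zero bytes
        out.pop()
    out.reverse()
    return bytes(out)


value = 0x0102030405060708090A
digits = int_to_digits(value, 4)
print(to_bytes_shift(digits).hex())
print(to_bytes_direct(digits).hex())
```

Both functions return the same minimal-length big-endian byte string. A real driver would more likely write into a fixed-length buffer, which this sketch does not attempt.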
> Note that these tests are for 1000 signatures. Bumping it to 10000
> signatures, I can drive the performance over 80 sig/sec, because it
> amortizes the initial precalc across more signatures:
>
> 1: 22.565
> 2: 45.577
> 3: 64.916
> 4: 80.128
>
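The amortization effect can be sketched with a simple cost model: total time = fixed setup + n * per-signature time. This is my own assumption, not something measured in the post. It also assumes that the 1000 and 10000 signature counts are totals for the whole run, and it uses only the 4-signer figures.

```python
def fit_fixed_plus_linear(n1, rate1, n2, rate2):
    """Fit total_time = fixed + n * per_sig to two (count, sig/sec) measurements.

    Returns (fixed_seconds, per_sig_seconds).
    """
    t1 = n1 / rate1  # total run time of the first measurement
    t2 = n2 / rate2  # total run time of the second measurement
    per_sig = (t2 - t1) / (n2 - n1)
    fixed = t1 - n1 * per_sig
    return fixed, per_sig


# 4 simultaneous signers: 72.244 sig/sec at 1000 sigs, 80.128 sig/sec at 10000.
fixed, per_sig = fit_fixed_plus_linear(1000, 72.244, 10000, 80.128)

print(f"implied fixed cost:    {fixed:.2f} s")
print(f"implied per-signature: {per_sig * 1000:.2f} ms")
print(f"implied rate ceiling:  {1.0 / per_sig:.1f} sig/sec")
```

Under this model the rate approaches 1/per_sig as the run gets longer, so the 10000-signature figure is already close to the ceiling.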
> GUYS! This is legit 10x the last releng version!
>
> Anyway, this is the current breakdown of what goes into signing (in
> ms/sig).
>
> parallel-signatures                    43.869
>   RPC overhead                         23.483
>   hal_rpc_pkey_sign                    20.386
>     hal_ks_fetch                        1.529
>       hal_mkm_get_kek                   0.419
>       hal_aes_keyunwrap                 1.026
>     pkey_local_sign_rsa                18.811
>       hal_rsa_private_key_from_der      0.557
>       hal_rsa_decrypt                  18.243
>         rsa_crt                        18.124
>           modexpng                     18.110
>             unpack_fp                   0.325
>             hal_modexpng               17.663
>
> An important thing to notice is that the RPC overhead (marshalling/
> unmarshalling, serial communications, dispatch to an RPC task, etc) is
> now more than the signing time itself.
>
> And this is still with the modexpng core at 90MHz. Just think what
> 180MHz would buy us.
>
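A rough back-of-the-envelope for the two points above, using the ms/sig figures from the breakdown. The assumption that the time spent in hal_modexpng scales inversely with the core clock, while everything else stays fixed, is mine. Treat the result as an estimate, not a measurement.

```python
# ms/sig figures copied from the breakdown in the post.
total_ms = 43.869         # parallel-signatures
rpc_overhead_ms = 23.483  # RPC overhead
hal_modexpng_ms = 17.663  # hal_modexpng

# Share of the total time that is RPC overhead.
rpc_share = rpc_overhead_ms / total_ms


def estimate_total_ms(core_speedup):
    """Estimated ms/sig if only hal_modexpng gets faster by core_speedup.

    Assumes hal_modexpng time scales inversely with the core clock and
    that no other part of the path changes.
    """
    return total_ms - hal_modexpng_ms + hal_modexpng_ms / core_speedup


doubled_ms = estimate_total_ms(2.0)  # 90 MHz -> 180 MHz

print(f"RPC share of signing time: {rpc_share:.1%}")
print(f"current rate:    {1000.0 / total_ms:.1f} sig/sec")
print(f"estimated rate:  {1000.0 / doubled_ms:.1f} sig/sec at 2x core clock")
```

Because the RPC overhead is untouched by a faster core, doubling the clock yields much less than a doubling of the signing rate under these assumptions.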
> paul
>
> _______________________________________________
> Core mailing list
> Core at cryptech.is
> https://lists.cryptech.is/listinfo/core
>
--
Med vänlig hälsning, Yours
Joachim Strömbergson
========================================================================
Assured AB
========================================================================