[Cryptech Core] further improvements in RSA signing performance
Stephen Farrell
stephen.farrell at cs.tcd.ie
Wed Mar 11 13:54:27 UTC 2020
On 11/03/2020 13:53, Joachim Strömbergson wrote:
> Aloha!
>
> Amazing! Very cool.
>
> For comparison, this performance exceeds some commercial HSMs, for
> example the SafeNet Luna USB HSM.
Excellent!
S.
>
> Good work!
>
> BR,
> JoachimS
>
>
> On 2020-03-10 23:51, Paul Selkirk wrote:
>> Recall that, when we last left things with modexpng + keywrap cores, we
>> had sigs/sec in this range, with 1-4 simultaneous signers.
>>
>> 1: 13.358
>> 2: 22.188
>> 3: 25.696
>> 4: 25.688
>>
>> While I was digging into this, I noticed that libtfm was being built
>> with BITS=8192. This is the largest size of number that we expect to
>> be handling, but (for reasons of bignum math overflow handling)
>> storage is twice that, or 16 Kbit (2 KB) per bignum.
>>
>> However, the modexp cores have been configured to handle a maximum key
>> size of 4096 bits. (This is a limit we could (but don't) check for in
>> the software driver.) Also, an 8192-bit key, with all the extra
>> fields, would have a DER size of about 8824 bytes, which exceeds the
>> keystore block size of 8192 bytes. So there's really no point in
>> setting BITS above 4096.
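The sizing argument above can be checked with a few lines of arithmetic. This is only a sketch: `bignum_storage_bytes` and `fits_in_keystore_block` are hypothetical helpers, not libtfm or libhal functions, and the model assumes storage is exactly twice the largest operand size, ignoring any guard digits the library may add.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption taken from the message: each bignum reserves twice the
 * largest operand size, so intermediate products cannot overflow. */
static size_t bignum_storage_bytes(size_t max_operand_bits)
{
    assert(max_operand_bits % 8 == 0);
    return (2 * max_operand_bits) / 8;
}

/* Nonzero if an encoded key of der_bytes fits in one keystore block. */
static int fits_in_keystore_block(size_t der_bytes, size_t block_bytes)
{
    return der_bytes <= block_bytes;
}
```

Under these assumptions, BITS=8192 gives 2048 bytes per bignum and BITS=4096 gives 1024 bytes, while the estimated 8824-byte DER encoding of an 8192-bit key does not fit in an 8192-byte block.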
>>
>> When I rebuilt libtfm with the smaller bignum size, I immediately (and
>> surprisingly) saw 16-36% improvement in signing speeds:
>>
>> 1: 15.562
>> 2: 28.384
>> 3: 35.140
>> 4: 35.105
>>
>> At this point, I was still using a minimally-resourced bitstream, with
>> only 1 modexpng core. So I built a bitstream with 4 modexpng cores,
>> but (surprisingly), that didn't perform significantly better than 1
>> core.
>>
>> Digging further, I noticed that hal_rsa_private_key_from_der, which
>> (mostly) copies byte strings into fp_ints, had much better performance
>> than unpack_fp, which copies fp_ints into byte strings. Lo and behold,
>> fp_to_unsigned_bin is almost pathologically pessimized. Copying code
>> from fp_read_unsigned_bin, I saw a further improvement of 44-105% in
>> signing speeds:
>>
>> 1: 22.426
>> 2: 43.913
>> 3: 60.711
>> 4: 72.244
>>
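To illustrate why the export direction can be so much slower than the import direction, here is a minimal sketch of two ways to write a multi-digit integer as a big-endian byte string. Everything in it is an assumption for illustration: `bigint_t`, `to_bin_slow`, and `to_bin_fast` are hypothetical names, not the libtfm types or the actual patch. The slow variant models a shift-the-whole-number-per-byte approach, and the fast variant walks the digits directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DIGITS 256              /* hypothetical: 8192 bits of 32-bit digits */

typedef struct {
    uint32_t dp[MAX_DIGITS];        /* digits, dp[0] is least significant */
    size_t used;                    /* digits in use; dp[used-1] != 0 if used > 0 */
} bigint_t;

/* Length of the minimal big-endian encoding (0 for the value zero). */
static size_t bigint_unsigned_size(const bigint_t *a)
{
    assert(a->used <= MAX_DIGITS);
    if (a->used == 0)
        return 0;
    size_t n = (a->used - 1) * 4;
    for (uint32_t top = a->dp[a->used - 1]; top != 0; top >>= 8)
        n++;
    return n;
}

/* Slow: copy the number, emit its low byte, shift the whole number
 * right by 8 bits, repeat, then reverse. Quadratic in the length. */
static size_t to_bin_slow(const bigint_t *a, uint8_t *out)
{
    bigint_t t = *a;
    size_t n = 0;
    while (t.used > 0) {
        out[n++] = (uint8_t)(t.dp[0] & 0xff);
        for (size_t i = 0; i < t.used; i++) {
            uint32_t hi = (i + 1 < t.used) ? t.dp[i + 1] : 0;
            t.dp[i] = (t.dp[i] >> 8) | (hi << 24);
        }
        while (t.used > 0 && t.dp[t.used - 1] == 0)
            t.used--;
    }
    for (size_t i = 0; i < n / 2; i++) {
        uint8_t tmp = out[i];
        out[i] = out[n - 1 - i];
        out[n - 1 - i] = tmp;
    }
    return n;
}

/* Fast: pick each output byte straight out of its digit. Linear. */
static size_t to_bin_fast(const bigint_t *a, uint8_t *out)
{
    size_t n = bigint_unsigned_size(a);
    for (size_t i = 0; i < n; i++)          /* i counts from the low end */
        out[n - 1 - i] = (uint8_t)(a->dp[i / 4] >> (8 * (i % 4)));
    return n;
}
```

Both functions produce the same bytes; the caller must supply a buffer of at least `bigint_unsigned_size(a)` bytes. The slow variant touches every digit once per output byte, so its cost grows with the square of the key size.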
>> Note that these tests are for 1000 signatures. Bumping it to 10000
>> signatures, I can drive the performance over 80 sig/sec, because it
>> amortizes the initial precalc across more signatures:
>>
>> 1: 22.565
>> 2: 45.577
>> 3: 64.916
>> 4: 80.128
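The amortization effect can be made concrete with a rough two-point fit. This sketch assumes a simple linear model, total time = one-time setup + N * per-signature cost, and assumes that 1000 and 10000 are the total signature counts behind the 4-signer rates; `cost_model_t`, `fit_cost_model`, and `predicted_rate` are hypothetical names.

```c
#include <assert.h>

typedef struct {
    double setup_s;      /* estimated one-time cost, in seconds */
    double per_sig_s;    /* estimated marginal cost per signature, in seconds */
} cost_model_t;

/* Fit T(N) = setup + N * per_sig through two (count, rate) measurements. */
static cost_model_t fit_cost_model(double n1, double rate1,
                                   double n2, double rate2)
{
    assert(n1 > 0 && n2 > n1 && rate1 > 0 && rate2 > 0);
    double t1 = n1 / rate1;          /* elapsed time of the short run */
    double t2 = n2 / rate2;          /* elapsed time of the long run */
    cost_model_t m;
    m.per_sig_s = (t2 - t1) / (n2 - n1);
    m.setup_s = t1 - n1 * m.per_sig_s;
    return m;
}

/* Throughput the model predicts for a run of n signatures. */
static double predicted_rate(cost_model_t m, double n)
{
    assert(n > 0);
    return n / (m.setup_s + n * m.per_sig_s);
}
```

Calling `fit_cost_model(1000, 72.244, 10000, 80.128)` gives a setup cost of roughly 1.5 seconds and a marginal cost of about 12.3 ms per signature, which would put the ceiling for 4 signers near 81 sig/sec under this model.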
>>
>> GUYS! This is legit 10x the last releng version!
>>
>> Anyway, this is the current breakdown of what goes into signing (in
>> ms/sig).
>>
>> parallel-signatures                   43.869
>>   RPC overhead                        23.483
>>   hal_rpc_pkey_sign                   20.386
>>     hal_ks_fetch                       1.529
>>       hal_mkm_get_kek                  0.419
>>       hal_aes_keyunwrap                1.026
>>     pkey_local_sign_rsa               18.811
>>       hal_rsa_private_key_from_der     0.557
>>       hal_rsa_decrypt                 18.243
>>         rsa_crt                       18.124
>>           modexpng                    18.110
>>             unpack_fp                  0.325
>>             hal_modexpng              17.663
>>
>> An important thing to notice is that the RPC overhead (marshalling/
>> unmarshalling, serial communications, dispatch to an RPC task, etc) is
>> now more than the signing time itself.
>>
>> And this is still with the modexpng core at 90MHz. Just think what
>> 180MHz would buy us.
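A back-of-the-envelope estimate of the 180MHz case, in the spirit of Amdahl's law. This is a sketch under optimistic assumptions: the hal_modexpng time is taken to scale inversely with the core clock, everything else in the breakdown is taken to stay the same, and the figures are read as the single-signer case. `projected_total_ms` and `sigs_per_sec` are hypothetical helpers.

```c
#include <assert.h>

/* Total ms/sig if one component becomes `speedup` times faster and
 * the rest of the path is unchanged. */
static double projected_total_ms(double total_ms, double part_ms,
                                 double speedup)
{
    assert(total_ms > 0 && part_ms >= 0 && part_ms <= total_ms);
    assert(speedup > 0);
    return (total_ms - part_ms) + part_ms / speedup;
}

/* Convert a per-signature latency into signatures per second. */
static double sigs_per_sec(double ms_per_sig)
{
    assert(ms_per_sig > 0);
    return 1000.0 / ms_per_sig;
}
```

With 43.869 ms total and 17.663 ms in hal_modexpng, `projected_total_ms(43.869, 17.663, 2.0)` comes to about 35.0 ms/sig, or roughly 28.5 sig/sec against about 22.8 today. On these assumptions the RPC overhead, not the core clock, limits the gain.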
>>
>> paul
>>