[Cryptech Tech] RSA timing experiments with multiple cores

Pavel Shatov meisterpaul1 at yandex.ru
Mon Apr 2 18:13:55 UTC 2018


On 02.04.2018 1:44, Rob Austein wrote:
> One thing we discussed at the last face to face meeting was what kind
> of overall signature rate we could get with RSA if we were to load up
> the FPGA with enough cores that we could run batches of signatures in
> parallel.  What numbers we were able to generate quickly at the time
> showed surprisingly little benefit from running parallel signers.
> Clearly, more research was required.
> 
> The following is a report on what I saw with an FPGA configuration
> using eight ModExpA7 cores.  Our CRT implementation uses two cores in
> parallel, so in theory this allows four signatures in parallel.
> 
> Each line in the results below represents 10,000 samples.
> 
> Explanation of fields:
> 
> * "sigs/sec" is overall signing rate: number of signatures (10,000)
>    divided by elapsed time from start of first signature until end of
>    last signature.
> 
> * "secs/sig" is inverse of "sigs/sec", as the name would suggest.
> 
> * "mean" is the arithmetic mean of the actual measured times for each
>    of the 10,000 signatures.
> 
> * "clients" is how many separate RPC client connections the test
>    program was driving in parallel.  In theory, separate client streams
>    should execute in parallel on the HSM, resources permitting.
> 
> Given the above, one might expect four clients to be optimal with
> eight cores.  In practice it's weirder than that, and we don't (yet)
> really know why.
> 
> RSA blinding (as currently implemented, anyway) is a significant
> component of the overall workload, so I ran the same set of tests
> twice, once with blinding enabled, once with blinding disabled.  As
> mentioned at the last face-to-face meeting, I'm also looking into
> options to amortize the cost of blinding factor generation over
> multiple signatures (e.g., the squaring method that Cryptlib uses), but
> the "no blinding" case below is probably still useful for comparison,
> since it seems unlikely that any blinding implementation, no matter
> how clever, would be faster than no blinding at all.
> 
> Results with RSA blinding enabled:
> 
> rsa_1024 sigs/sec 8.23051184652 secs/sig 0:00:00.121499 mean 0:00:00.242737 clients 2
> rsa_1024 sigs/sec 7.98897745442 secs/sig 0:00:00.125172 mean 0:00:00.375233 clients 3
> rsa_1024 sigs/sec 7.81232996586 secs/sig 0:00:00.128002 mean 0:00:00.511688 clients 4
> rsa_1024 sigs/sec 7.47154351327 secs/sig 0:00:00.133841 mean 0:00:00.668838 clients 5
> rsa_1024 sigs/sec 7.39642568217 secs/sig 0:00:00.135200 mean 0:00:00.810773 clients 6
> rsa_1024 sigs/sec 7.39639878825 secs/sig 0:00:00.135200 mean 0:00:00.945906 clients 7
> rsa_1024 sigs/sec 7.39569001033 secs/sig 0:00:00.135213 mean 0:00:01.081113 clients 8
> rsa_1024 sigs/sec 6.94447171114 secs/sig 0:00:00.143999 mean 0:00:00.143754 clients 1
> 
> rsa_2048 sigs/sec 3.58840968475 secs/sig 0:00:00.278674 mean 0:00:00.557077 clients 2
> rsa_2048 sigs/sec 3.57318846246 secs/sig 0:00:00.279862 mean 0:00:00.839253 clients 3
> rsa_2048 sigs/sec 3.52841577927 secs/sig 0:00:00.283413 mean 0:00:01.133249 clients 4
> rsa_2048 sigs/sec 3.43730017148 secs/sig 0:00:00.290926 mean 0:00:01.454111 clients 5
> rsa_2048 sigs/sec 3.43150490015 secs/sig 0:00:00.291417 mean 0:00:02.039086 clients 7
> rsa_2048 sigs/sec 3.43119570233 secs/sig 0:00:00.291443 mean 0:00:01.748002 clients 6
> rsa_2048 sigs/sec 3.43097294848 secs/sig 0:00:00.291462 mean 0:00:02.330666 clients 8
> rsa_2048 sigs/sec 2.83980804247 secs/sig 0:00:00.352136 mean 0:00:00.351889 clients 1
> 
> rsa_4096 sigs/sec 1.26531176783 secs/sig 0:00:00.790319 mean 0:00:02.370533 clients 3
> rsa_4096 sigs/sec 1.25766369367 secs/sig 0:00:00.795125 mean 0:00:03.179863 clients 4
> rsa_4096 sigs/sec 1.24165903669 secs/sig 0:00:00.805374 mean 0:00:04.025947 clients 5
> rsa_4096 sigs/sec 1.23907931329 secs/sig 0:00:00.807050 mean 0:00:04.841009 clients 6
> rsa_4096 sigs/sec 1.23877049069 secs/sig 0:00:00.807252 mean 0:00:05.649015 clients 7
> rsa_4096 sigs/sec 1.23818618332 secs/sig 0:00:00.807632 mean 0:00:06.458782 clients 8
> rsa_4096 sigs/sec 1.20014282040 secs/sig 0:00:00.833234 mean 0:00:01.666174 clients 2
> rsa_4096 sigs/sec 0.75256731769 secs/sig 0:00:01.328784 mean 0:00:01.328541 clients 1
> 
> Results with RSA blinding disabled:
> 
> rsa_1024 sigs/sec 20.8328738383 secs/sig 0:00:00.048001 mean 0:00:00.095756 clients 2
> rsa_1024 sigs/sec 19.9879970479 secs/sig 0:00:00.050030 mean 0:00:00.149835 clients 3
> rsa_1024 sigs/sec 19.2903102572 secs/sig 0:00:00.051839 mean 0:00:00.207086 clients 4
> rsa_1024 sigs/sec 18.0929850137 secs/sig 0:00:00.055270 mean 0:00:00.276055 clients 5
> rsa_1024 sigs/sec 17.6741526966 secs/sig 0:00:00.056579 mean 0:00:00.339158 clients 6
> rsa_1024 sigs/sec 17.6726372423 secs/sig 0:00:00.056584 mean 0:00:00.395741 clients 7
> rsa_1024 sigs/sec 17.6706028853 secs/sig 0:00:00.056591 mean 0:00:00.452338 clients 8
> rsa_1024 sigs/sec 15.6249343020 secs/sig 0:00:00.064000 mean 0:00:00.063758 clients 1
> 
> rsa_2048 sigs/sec 11.8560079254 secs/sig 0:00:00.084345 mean 0:00:00.252769 clients 3
> rsa_2048 sigs/sec 11.5105757962 secs/sig 0:00:00.086876 mean 0:00:00.347218 clients 4
> rsa_2048 sigs/sec 10.8679436493 secs/sig 0:00:00.092013 mean 0:00:00.183782 clients 2
> rsa_2048 sigs/sec 10.8513037353 secs/sig 0:00:00.092154 mean 0:00:00.460453 clients 5
> rsa_2048 sigs/sec 10.7127620993 secs/sig 0:00:00.093346 mean 0:00:00.559706 clients 6
> rsa_2048 sigs/sec 10.7120583942 secs/sig 0:00:00.093352 mean 0:00:00.653052 clients 7
> rsa_2048 sigs/sec 10.7114778457 secs/sig 0:00:00.093357 mean 0:00:00.746364 clients 8
> rsa_2048 sigs/sec 6.25004214872 secs/sig 0:00:00.159998 mean 0:00:00.159755 clients 1
> 
> rsa_4096 sigs/sec 5.14717930991 secs/sig 0:00:00.194281 mean 0:00:00.776755 clients 4
> rsa_4096 sigs/sec 5.13575633082 secs/sig 0:00:00.194713 mean 0:00:00.973146 clients 5
> rsa_4096 sigs/sec 5.10692320688 secs/sig 0:00:00.195812 mean 0:00:01.174336 clients 6
> rsa_4096 sigs/sec 5.10315211883 secs/sig 0:00:00.195957 mean 0:00:01.371050 clients 7
> rsa_4096 sigs/sec 5.10234529358 secs/sig 0:00:00.195988 mean 0:00:01.567119 clients 8
> rsa_4096 sigs/sec 4.07444053710 secs/sig 0:00:00.245432 mean 0:00:00.736003 clients 3
> rsa_4096 sigs/sec 2.78873489846 secs/sig 0:00:00.358585 mean 0:00:00.716912 clients 2
> rsa_4096 sigs/sec 1.48805229671 secs/sig 0:00:00.672019 mean 0:00:00.671777 clients 1
> 

Well, that's very weird. I guess it's stupid to ask, but are you sure
you're using the cores correctly? I mean, when you need to sign, do you
look for the first idle core, load it with input data and toggle the
corresponding control bit? And do you then save the core's number (to
keep track of which core is signing what) and go do something else?
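
To make the question concrete, here is a minimal sketch of the dispatch pattern I have in mind. It is only a toy model in Python, not the actual HSM firmware: the class name, the method names and the dictionaries are all hypothetical, and it treats every job as needing a single core (the CRT case would claim two).

```python
from typing import Dict, List, Optional


class CorePool:
    """Toy model of a pool of ModExp cores; all names here are hypothetical."""

    def __init__(self, n_cores: int) -> None:
        self.busy: List[bool] = [False] * n_cores  # one "control bit" per core
        self.inputs: Dict[int, bytes] = {}         # data loaded into each core
        self.owner: Dict[int, str] = {}            # core number -> job id

    def dispatch(self, job_id: str, data: bytes) -> Optional[int]:
        """Start a job on the first idle core; return its number, or None if all are busy."""
        for core, is_busy in enumerate(self.busy):
            if not is_busy:
                self.inputs[core] = data   # load the core with input data
                self.busy[core] = True     # toggle the corresponding control bit
                self.owner[core] = job_id  # remember which core is signing what
                return core                # caller is now free to do something else
        return None

    def complete(self, core: int) -> str:
        """Mark a core idle again and return the id of the job it was running."""
        self.busy[core] = False
        self.inputs.pop(core, None)
        return self.owner.pop(core)


pool = CorePool(n_cores=8)
first = pool.dispatch("sig-A", b"\x01")
second = pool.dispatch("sig-B", b"\x02")
finished = pool.complete(first)
third = pool.dispatch("sig-C", b"\x03")  # reuses the core that sig-A released
```

The point of the sketch is that `dispatch` returns immediately. If the caller instead waits for each core to finish before starting the next job, extra cores cannot help.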

Another question: with, say, three clients, does each one of them do
3333 signatures on average? I mean, you aren't by any chance doing
30000 tests for 3 clients?
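
One rough way to check this from the numbers alone is sketched below. It assumes that every client keeps exactly one signature in flight and issues them back to back (my assumption, not something the report states). Under that assumption the overall rate should be about "clients" divided by "mean". If the elapsed time actually covered N times 10,000 signatures while the rate was computed from 10,000, the reported rate would be roughly 1/N of that value. The rows are copied from the rsa_1024 results with blinding enabled; the function name is made up for illustration.

```python
# (clients, reported sigs/sec, mean seconds per signature), copied from the
# rsa_1024 rows of the "blinding enabled" results quoted above.
rows = [
    (1, 6.94447171114, 0.143754),
    (2, 8.23051184652, 0.242737),
    (3, 7.98897745442, 0.375233),
    (4, 7.81232996586, 0.511688),
    (8, 7.39569001033, 1.081113),
]


def implied_rate(clients: int, mean_secs: float) -> float:
    """Rate expected if each client always has exactly one signature in flight."""
    return clients / mean_secs


# A ratio near 1 means the reported rate matches the per-signature timings;
# a ratio near 1/clients would point at a miscounted number of signatures.
ratios = [reported / implied_rate(c, m) for c, reported, m in rows]

for (c, reported, m), ratio in zip(rows, ratios):
    print(f"clients {c}: reported {reported:.3f}, implied {implied_rate(c, m):.3f}, ratio {ratio:.4f}")
```

For these rows the ratios come out close to 1, which would fit 10,000 signatures in total rather than per client, provided the back-to-back assumption holds.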

-- 
With best regards,
Pavel Shatov


