[Cryptech Tech] Happier RSA timing numbers
Joachim Strömbergson
joachim.strombergson at assured.se
Wed May 23 05:34:28 UTC 2018
Aloha!
Just to clarify: what was the 2048-bit RSA sig/s number before the AES updates?
It is a bit of a shame not to have directly comparable numbers to confirm the improvement.
Since you are already running with parallel AES cores, that route to improvement has been used. The next step should be to double sys_clk to 100 MHz, which should reduce the wait time.
After that, removing the per-block wait by implementing streaming/bulk data support is probably the next improvement.
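For a rough idea of how much of the per-block cost is actually waiting, the do_block [6] entry in the call graph quoted below can be broken down with a few lines of Python. This is only a sketch: the numbers are copied from that excerpt, the names (CALLEES, breakdown) are made up for illustration, and the absolute times include the roughly 2x profiling penalty mentioned in the quoted mail.

```python
# Figures copied from the do_block [6] entry of the quoted gprof call graph.
# Times are in seconds and were measured with profiling enabled.
SIGNATURES = 3000          # calls to pkey_local_sign in the run
DO_BLOCK_CALLS = 6000552   # total calls to do_block
DO_BLOCK_SELF = 0.33       # seconds spent in do_block itself

# callee -> (self seconds, children seconds, calls made from do_block)
CALLEES = {
    "hal_io_wait": (0.76, 17.30, 6000552),
    "hal_io_write": (1.96, 12.44, 18001656),
    "hal_io_read": (1.05, 12.25, 12001104),
}


def breakdown():
    """Return (total do_block seconds, {callee: (time share, calls per block)})."""
    total = DO_BLOCK_SELF + sum(s + c for s, c, _ in CALLEES.values())
    parts = {}
    for name, (self_s, child_s, calls) in CALLEES.items():
        share = (self_s + child_s) / total
        parts[name] = (share, calls / DO_BLOCK_CALLS)
    return total, parts


if __name__ == "__main__":
    total, parts = breakdown()
    print(f"do_block total: {total:.2f} s over {DO_BLOCK_CALLS} blocks")
    print(f"blocks per signature: {DO_BLOCK_CALLS / SIGNATURES:.1f}")
    print(f"time per block: {total / DO_BLOCK_CALLS * 1e6:.2f} us")
    for name, (share, per_block) in parts.items():
        print(f"{name:13s} {share:6.1%}  {per_block:.0f} call(s) per block")
```

With the quoted figures, each block costs three writes, two reads and one wait, and hal_io_wait accounts for roughly 39% of the time spent under do_block. That share is the part a faster sys_clk or a streaming interface would be expected to shrink.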
Regards,
JoachimS
> On 23 May 2018, at 07:14, Rob Austein <sra at hactrn.net> wrote:
>
> Summary: 10.3 sig/sec throughput for 2048-bit RSA with eight Modexp
> cores and four AES cores (Joachim's most recent version).
> AES keyunwrap still dominates, but less of it is thumb
> twiddling waiting for the AES core.
>
> There are some relatively minor improvements we might be able to make
> to the FMC I/O code (remove some vestigial stuff which dates back to
> the bridge board and preemptive tasking, move byteswapping to the FPGA
> where we can do it with wires), but none of it's likely to be radical.
> Might be worth doing anyway, since the ARM is underpowered and some of
> the vestigial stuff chews up ARM CPU time to no useful purpose.
>
> The following call graph excerpt was from a test version of the
> firmware with the "vestigial stuff" removed from the FMC code (it seems
> to work; at least, it passed all unit tests as well as a signature run
> with eight clients and validation enabled):
>
> index % time    self  children              called  name
> -----------------------------------------------
>                 0.00     51.25           3000/3000      hal_rpc_pkey_sign [4]
> [5]     88.2    0.00     51.25                3000  pkey_local_sign [5]
>                 0.00     46.04           3000/3000      hal_ks_fetch [7]
>                 0.00      5.21           3000/3000      pkey_local_sign_rsa [19]
>                 0.00      0.00        6000/6806031      memset [44]
>                 0.00      0.00          3000/33907      hal_critical_section_start [122]
>                 0.00      0.00          3000/33907      hal_critical_section_end [192]
> -----------------------------------------------
>                 0.00      0.28       37122/6000552      hal_aes_keywrap [57]
>                 0.33     45.48     5963430/6000552      hal_aes_keyunwrap [8]
> [6]     79.3    0.33     45.76             6000552  do_block [6]
>                 0.76     17.30     6000552/6003854      hal_io_wait [12]
>                 1.96     12.44   18001656/19712064      hal_io_write [14]
>                 1.05     12.25   12001104/21468882      hal_io_read [10]
> -----------------------------------------------
>                 0.00     46.04           3000/3000      pkey_local_sign [5]
> [7]     79.2    0.00     46.04                3000  hal_ks_fetch [7]
>                 0.00     45.82           3000/3000      hal_aes_keyunwrap [8]
>                 0.00      0.20           3000/3072      hal_ks_lock [65]
>                 0.00      0.02           3000/3024      hal_mkm_get_kek [85]
>                 0.00      0.00           3000/6036      hal_ks_index_find [126]
>                 0.00      0.00         3000/369530      memcpy [83]
>                 0.00      0.00           3000/3024      ks_volatile_test_owner [205]
>                 0.00      0.00           3000/6036      hal_ks_block_read_cached [198]
>                 0.00      0.00           3000/6084      hal_ks_cache_mark_used [195]
>                 0.00      0.00           3000/3072      hal_ks_unlock [202]
> -----------------------------------------------
>                 0.00     45.82           3000/3000      hal_ks_fetch [7]
> [8]     78.9    0.00     45.82                3000  hal_aes_keyunwrap [8]
>                 0.33     45.48     5963430/6000552      do_block [6]
>                 0.00      0.01           3000/3024      load_kek [98]
>                 0.00      0.00           3000/9042      hal_core_free [90]
>                 0.00      0.00           3000/3042      hal_core_alloc [110]
>                 0.00      0.00           3000/6073      memmove [156]
> -----------------------------------------------
>                 3.12      5.02   31734336/98674308      fmc_write_32 [16]
>                 6.59     10.59   66939972/98674308      fmc_read_32 [11]
> [9]     43.6    9.71     15.61            98674308  _fmc_nwait_idle [9]
>                15.61      0.00   98674308/98674308      HAL_GPIO_ReadPin [15]
> -----------------------------------------------
>
> Be warned that this call graph is not strictly comparable to the
> previous ones: the profiling script had been testing not just multiple
> key sizes but also multiple numbers of clients; I dropped the latter
> in favor of always trying just what should be the optimal number of
> clients (now that theory and reality seem to match there --
> deliberately running too many clients is useful as a torture test, but
> may not give a very accurate picture of where the time is going for
> the case we really care about).
>
> Full profiling report available if folks want to look at it. Be
> warned that we're already past the point where enabling profiling has
> a significant effect on throughput (current penalty is roughly 2x).
>
> I'll send a copy of the proposed change to the FMC code to Paul and
> Pavel for review before pushing it to master.