[Cryptech Tech] Happier RSA timing numbers

Paul Selkirk paul at psgd.org
Thu May 24 15:41:23 UTC 2018


If you're referring to [6], it just means that both keywrap and
keyunwrap call do_block, not that we're wrapping and unwrapping in the
same operation.
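
As a rough cross-check, the counts in the quoted call graph can be compared directly. The sketch below is only arithmetic on figures copied from entries [5], [6] and [8]; the constant and function names are invented for illustration and are not part of the firmware.

```python
# Figures copied from the quoted gprof call graph; all names here are
# illustrative only and do not exist in the Cryptech code base.
DO_BLOCK_CALLS = 6_000_552          # total calls to do_block [6]
CALLS_FROM_KEYWRAP = 37_122         # do_block calls from hal_aes_keywrap [57]
CALLS_FROM_KEYUNWRAP = 5_963_430    # do_block calls from hal_aes_keyunwrap [8]
IO_WRITES_FROM_DO_BLOCK = 18_001_656  # hal_io_write calls made by do_block
IO_READS_FROM_DO_BLOCK = 12_001_104   # hal_io_read calls made by do_block
SIGN_SECONDS = 51.25                # pkey_local_sign [5], children column
UNWRAP_SECONDS = 45.82              # hal_aes_keyunwrap [8], children column
RSA_SECONDS = 5.21                  # pkey_local_sign_rsa [19], children column


def summarize():
    """Return simple ratios derived from the call-graph figures."""
    return {
        # Fraction of do_block calls attributable to each caller.
        "unwrap_call_share": CALLS_FROM_KEYUNWRAP / DO_BLOCK_CALLS,
        "wrap_call_share": CALLS_FROM_KEYWRAP / DO_BLOCK_CALLS,
        # I/O calls issued per do_block invocation.
        "writes_per_block": IO_WRITES_FROM_DO_BLOCK / DO_BLOCK_CALLS,
        "reads_per_block": IO_READS_FROM_DO_BLOCK / DO_BLOCK_CALLS,
        # Share of pkey_local_sign time spent under each child.
        "unwrap_time_share": UNWRAP_SECONDS / SIGN_SECONDS,
        "rsa_time_share": RSA_SECONDS / SIGN_SECONDS,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.4f}")
```

On these figures, the two callers account for every do_block call, about 99.4% of them come from hal_aes_keyunwrap, and each do_block call issues three hal_io_write calls and two hal_io_read calls.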

				paul

On 05/24/2018 11:14 AM, Russ Housley wrote:
> Rob:
> 
> Am I reading this correctly?  I am seeing both wrap and unwrap.  I expected to see an unwrap, then the private key operation, then an overwrite of the memory where the key was stored.  This would save a whole bunch of AES operations.
> 
> Russ
> 
>   
>> On May 23, 2018, at 1:14 AM, Rob Austein <sra at hactrn.net> wrote:
>>
>> Summary: 10.3 sig/sec throughput for 2048-bit RSA with eight Modexp
>> 	 cores and four AES cores (Joachim's most recent version).
>> 	 AES keyunwrap still dominates, but less of it is thumb
>> 	 twiddling waiting for the AES core.
>>
>> There are some relatively minor improvements we might be able to make
>> to the FMC I/O code (remove some vestigial stuff which dates back to
>> the bridge board and preemptive tasking, move byteswapping to the FPGA
>> where we can do it with wires), but none of it's likely to be radical.
>> Might be worth doing anyway, since the ARM is underpowered and some of
>> the vestigial stuff chews up ARM CPU time to no useful purpose.
>>
>> The following call graph excerpt is from a test version of the
>> firmware with the "vestigial stuff" removed from the FMC code (it
>> seems to work: it passed all unit tests as well as a signature run
>> with eight clients and validation enabled):
>>
>> index % time    self  children    called     name
>> -----------------------------------------------
>>                0.00   51.25    3000/3000        hal_rpc_pkey_sign [4]
>> [5]     88.2    0.00   51.25    3000         pkey_local_sign [5]
>>                0.00   46.04    3000/3000        hal_ks_fetch [7]
>>                0.00    5.21    3000/3000        pkey_local_sign_rsa [19]
>>                0.00    0.00    6000/6806031     memset [44]
>>                0.00    0.00    3000/33907       hal_critical_section_start [122]
>>                0.00    0.00    3000/33907       hal_critical_section_end [192]
>> -----------------------------------------------
>>                0.00    0.28   37122/6000552     hal_aes_keywrap [57]
>>                0.33   45.48 5963430/6000552     hal_aes_keyunwrap [8]
>> [6]     79.3    0.33   45.76 6000552         do_block [6]
>>                0.76   17.30 6000552/6003854     hal_io_wait [12]
>>                1.96   12.44 18001656/19712064     hal_io_write [14]
>>                1.05   12.25 12001104/21468882     hal_io_read [10]
>> -----------------------------------------------
>>                0.00   46.04    3000/3000        pkey_local_sign [5]
>> [7]     79.2    0.00   46.04    3000         hal_ks_fetch [7]
>>                0.00   45.82    3000/3000        hal_aes_keyunwrap [8]
>>                0.00    0.20    3000/3072        hal_ks_lock [65]
>>                0.00    0.02    3000/3024        hal_mkm_get_kek [85]
>>                0.00    0.00    3000/6036        hal_ks_index_find [126]
>>                0.00    0.00    3000/369530      memcpy [83]
>>                0.00    0.00    3000/3024        ks_volatile_test_owner [205]
>>                0.00    0.00    3000/6036        hal_ks_block_read_cached [198]
>>                0.00    0.00    3000/6084        hal_ks_cache_mark_used [195]
>>                0.00    0.00    3000/3072        hal_ks_unlock [202]
>> -----------------------------------------------
>>                0.00   45.82    3000/3000        hal_ks_fetch [7]
>> [8]     78.9    0.00   45.82    3000         hal_aes_keyunwrap [8]
>>                0.33   45.48 5963430/6000552     do_block [6]
>>                0.00    0.01    3000/3024        load_kek [98]
>>                0.00    0.00    3000/9042        hal_core_free [90]
>>                0.00    0.00    3000/3042        hal_core_alloc [110]
>>                0.00    0.00    3000/6073        memmove [156]
>> -----------------------------------------------
>>                3.12    5.02 31734336/98674308     fmc_write_32 [16]
>>                6.59   10.59 66939972/98674308     fmc_read_32 [11]
>> [9]     43.6    9.71   15.61 98674308         _fmc_nwait_idle [9]
>>               15.61    0.00 98674308/98674308     HAL_GPIO_ReadPin [15]
>> -----------------------------------------------
>>
>> Be warned that this call graph is not strictly comparable to the
>> previous ones: the profiling script had been testing not just multiple
>> key sizes but also multiple numbers of clients.  I dropped the latter
>> in favor of always trying just what should be the optimal number of
>> clients (now that theory and reality seem to match there --
>> deliberately running too many clients is useful as a torture test, but
>> may not give a very accurate picture of where the time is going for
>> the case we really care about).
>>
>> Full profiling report available if folks want to look at it.  Be
>> warned that we're already past the point where enabling profiling has
>> a significant effect on throughput (current penalty is roughly 2x).
>>
>> I'll send a copy of the proposed change to the FMC code to Paul and
>> Pavel for review before pushing it to master.
>> _______________________________________________
>> Tech mailing list
>> Tech at cryptech.is
>> https://lists.cryptech.is/listinfo/tech
> 