[Cryptech Tech] Happier RSA timing numbers

Rob Austein sra at hactrn.net
Sun May 20 21:00:02 UTC 2018


Adding Peter Gutmann's trick for amortizing the cost of RSA blinding
factors over a long run of signatures was good for approximately 2x
speedup in the RSA code per se.  Having blinding on still makes a
difference, but not a dramatic one, so while we might still want to
turn it off occasionally for testing, it's now cheap enough that I
don't think we need to worry about the cost of it in production.
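
For readers who have not seen the trick, here is a minimal Python sketch
of one common way to amortize blinding: compute the expensive pair
(r^e mod n, r^-1 mod n) once, then refresh both halves by squaring after
each signature. This is an illustration under assumptions, not the
Cryptech code: the class and function names are hypothetical, the key is
a toy textbook key, and the squaring update is assumed from how the
trick is usually described.

```python
import secrets
from math import gcd


class BlindingState:
    """Hypothetical per-key blinding pair, refreshed by squaring."""

    def __init__(self, n, e):
        self.n = n
        # Pick a random r that is invertible mod n.
        while True:
            r = secrets.randbelow(n - 2) + 2
            if gcd(r, n) == 1:
                break
        # Expensive setup, done once per key instead of once per signature.
        self.bf = pow(r, e, n)      # blinding factor   r^e  mod n
        self.ubf = pow(r, -1, n)    # unblinding factor r^-1 mod n (Python 3.8+)

    def next_pair(self):
        """Return the current pair, then square both halves for next time."""
        bf, ubf = self.bf, self.ubf
        # (r^e)^2 = (r^2)^e and (r^-1)^2 = (r^2)^-1, so the pair stays matched.
        self.bf = (bf * bf) % self.n
        self.ubf = (ubf * ubf) % self.n
        return bf, ubf


def blinded_sign(m, d, state):
    """Compute m^d mod n without exposing m directly to the modexp."""
    n = state.n
    bf, ubf = state.next_pair()
    blinded = pow((m * bf) % n, d, n)   # (m * r^e)^d = m^d * r  (mod n)
    return (blinded * ubf) % n          # multiply by r^-1 to recover m^d


# Toy textbook key (p=61, q=53); far too small for real use.
N, E, D = 3233, 17, 2753
state = BlindingState(N, E)
signatures = [blinded_sign(m, D, state) for m in range(2, 20)]
```

Each signature then costs two extra modular multiplications and two
modular squarings on top of the modexp, rather than a fresh modexp and a
modular inverse.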

Brief excerpt from latest profiling run (full log available upon
request, but not necessary for this message):

  index % time    self  children    called     name
  -----------------------------------------------
		  0.00  291.96   24000/24000       hal_rpc_pkey_sign [4]
  [5]     75.1    0.00  291.96   24000         pkey_local_sign [5]
		  0.00  254.21   24000/24000       hal_ks_fetch [7]
		  0.00   37.75   24000/24000       pkey_local_sign_rsa [18]
		  0.01    0.00   48000/54695399     memset [44]
		  0.01    0.00   24000/6811963     hal_critical_section_start [70]
		  0.00    0.00   24000/6811963     hal_critical_section_end [267]
  -----------------------------------------------
		  0.02    1.70  315522/46710960     hal_aes_keywrap [72]
		  3.58  250.08 46395438/46710960     hal_aes_keyunwrap [8]
  [6]     65.7    3.60  251.78 46710960         do_block [6]
		  8.22  109.89 46710960/46735838     hal_io_wait [9]
		 16.75   63.03 140132880/153807632     hal_io_write [12]
		 11.57   42.32 93421920/192643038     hal_io_read [10]
  -----------------------------------------------

So out of roughly 292 seconds spent signing in this test run, we spent
38 seconds on the actual signatures (ASN.1, blinding, modexp with its
own FMC I/O, and other arithmetic), more than twice that just on FMC
I/O talking to the AES cores, and about 110 seconds waiting for the
AES core.  At least that's what the profiler thinks happened.
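
As a cross-check on that reading, the following Python sketch redoes the
arithmetic using only the figures in the excerpt above. The variable
names are made up for illustration, and "self + children" is summed per
callee, which is one reasonable way to read the call graph rather than
the only one.

```python
# Seconds, copied from the gprof excerpt.
total_sign = 291.96         # pkey_local_sign (self + children)
rsa_sign = 37.75            # pkey_local_sign_rsa
ks_fetch = 254.21           # hal_ks_fetch
do_block_children = 251.78  # children column of do_block

# Callees of do_block, self + children for each.
io = {
    "hal_io_wait": 8.22 + 109.89,
    "hal_io_write": 16.75 + 63.03,
    "hal_io_read": 11.57 + 42.32,
}


def share(part, whole):
    """Percentage of `whole` accounted for by `part`."""
    return 100.0 * part / whole


print(f"RSA proper:  {share(rsa_sign, total_sign):5.1f}% of signing time")
print(f"key fetch:   {share(ks_fetch, total_sign):5.1f}% of signing time")
for name, secs in io.items():
    print(f"{name:13s}{secs:7.2f} s, {share(secs, do_block_children):5.1f}% of do_block")
```

The two top-level pieces add up to the pkey_local_sign total, and the
three I/O callees add up to do_block's children column, so the excerpt
is at least internally consistent.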

FWIW, totals here are from a long series of signatures, currently:

  for client in [1..8]:
    for keysize in [1024,2048,4096]:
       for 1000 iterations:
          rsa_decrypt(pkcs1_5_blob)
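
A quick sanity check in Python that this loop accounts for the call
count in the profile. The constant names are hypothetical, and the
sketch assumes each step of the client loop performs 1000 operations
per key size in total, which is what the 24000 in the "called" column
suggests. The per-call averages are derived from profiler time only.

```python
CLIENT_STEPS = 8       # "for client in [1..8]"
KEY_SIZES = 3          # three RSA key sizes in the loop
ITERATIONS = 1000      # "for 1000 iterations"

PROFILED_CALLS = 24000      # "called" column for pkey_local_sign
TOTAL_SIGN_SECONDS = 291.96 # pkey_local_sign, self + children
RSA_SIGN_SECONDS = 37.75    # pkey_local_sign_rsa

expected_calls = CLIENT_STEPS * KEY_SIZES * ITERATIONS

# Average profiler time per call, in milliseconds.
avg_total_ms = 1000.0 * TOTAL_SIGN_SECONDS / PROFILED_CALLS
avg_rsa_ms = 1000.0 * RSA_SIGN_SECONDS / PROFILED_CALLS

print(f"expected calls: {expected_calls}, profiled calls: {PROFILED_CALLS}")
print(f"average per sign call: {avg_total_ms:.2f} ms total, {avg_rsa_ms:.2f} ms in RSA")
```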

Testing for silly numbers of clients has been quite useful up until
now, but it is of course possible that it's skewing the numbers so we
may want to drop those cases, or perhaps even go back to just testing
four clients as the optimal target.

