[Cryptech Core] Input to NGI_trust presentation?

Pavel Shatov meisterpaul1 at yandex.ru
Thu Mar 5 09:07:17 UTC 2020



On 05.03.2020 10:55, Joachim Strömbergson wrote:
> Aloha!
> 
> Thank you Paul, a great summary.
> 
> For comparison, the SafeNet USB HSM performs about 60 RSA-2048
> signatures/s. If we could get the 180 MHz clock speed to work we should
> be in the 50+ range. So fairly close to a commercial machine with
> similar interfaces.
> 
> https://safenet.gemalto.com/data-encryption/hardware-security-modules-hsms/usb-hsm/
> 
> BR,
> JoachimS
> 

The latest ModExpNG can do about 120 exponentiations per second with a
2048-bit modulus. This is the upper limit on how many signatures per
second one core instance can generate. Note that we aren't exploiting
the core to its full potential yet. Paul has already identified most of
the things that should be done in that direction: get rid of byte
swapping, get rid of blinding factor mutation in software, get the
180 MHz internal clock to build, throw away ModExpA7 to free DSP slices
and add more instances of ModExpNG instead, etc.
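
To put that ceiling next to the measurements quoted later in this
message, here is a rough sketch of the arithmetic. The rates are copied
from Paul's table; the helper names (speedup, utilization) are made up
for illustration. It assumes one exponentiation per signature, and the
message does not say at which clock the ~120/s figure was measured, so
the utilization ratio is only indicative.

```python
# Signatures/second copied from the table quoted later in this message
# (2048-bit key, 1000 signatures per run; list index 0..3 = 1..4 signers).
RATES = {
    "releng":       [6.924, 11.660, 14.898, 7.865],
    "clocking":     [8.106, 13.450, 9.836, 6.848],
    "modexpng":     [10.815, 16.302, 7.095, 4.975],
    "ng + keywrap": [13.358, 22.188, 25.696, 25.688],
}

# Approximate single-core ceiling quoted above (exponentiations/second).
# Assumption: one exponentiation per signature.
CORE_LIMIT = 120.0


def speedup(config, signers, baseline="releng"):
    """Ratio of signing rates for the same number of signers (1-4)."""
    return RATES[config][signers - 1] / RATES[baseline][signers - 1]


def utilization(config):
    """Best measured rate for a configuration as a fraction of CORE_LIMIT."""
    return max(RATES[config]) / CORE_LIMIT


if __name__ == "__main__":
    for n in range(1, 5):
        print(f"{n} signer(s): {speedup('ng + keywrap', n):.2f}x vs releng")
    print(f"best 'ng + keywrap' rate uses {utilization('ng + keywrap'):.0%} "
          f"of the ~{CORE_LIMIT:.0f}/s ceiling")
```

With these numbers the gain over releng is a little under 2x for one
signer and a little over 3x for four signers, and the best measured rate
is roughly a fifth of the single-core ceiling.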

As agreed in the latest bi-weekly chat, I'm concentrating on the planned
changes to the hardware design at the moment, since that is the next
milestone. We should definitely keep the aforementioned improvements in
mind, and maybe do them in the background if time permits or delay them
until some later milestone.


> On 2020-03-04 21:00, Paul Selkirk wrote:
>> (Copied to core@ because I think it's a matter of general interest.)
>>
>> Here are a few performance numbers from recent work.
>>
>> The following table is signatures/second, using
>> libhal/tests/parallel-signatures.py, 2048-bit key, 1000 signatures per
>> run, with 1-4 signers.
>>
>>          releng      clocking    modexpng    ng + keywrap
>> 1:       6.924       8.106      10.815      13.358
>> 2:      11.660      13.450      16.302      22.188
>> 3:      14.898       9.836       7.095      25.696
>> 4:       7.865       6.848       4.975      25.688
>>
>> releng: from 2019-09-03 releng tarball
>> clocking: Pavel's clocking work (90MHz FPGA, 45MHz FMC)
>> modexpng: Pavel's modexpng core (clocked at 90MHz)
>> ng + keywrap: modexpng + Joachim's keywrap core
>>
>> Note that, in all cases, the bitstream is minimally resourced: 1 pair of
>> modexpa7 cores and/or 1 modexpng core for signing, 1 AES core and/or 1
>> keywrap core for key wrap/unwrap.
>>
>> In particular, note that there is a regression going from "releng" to
>> "clocking" with >2 signers. I believe this is because of excessive
>> contention for the one AES core for key unwrap. This is made worse by
>> the fact that the clocking changes reduced the FMC clock while
>> increasing the FPGA clock, and the "old" keywrap spends a lot of its
>> time pushing data back and forth across the FMC bus.
>>
>> Anyway, I don't know how you want to report this. We can show a
>> performance increase of 2x or 3x, depending on how you pick the numbers.
>>
>> BTW, Pavel built and tested modexpng with an internal clock of 180MHz; I
>> wasn't able to get that to meet timing, so my tests were all with a
>> 90MHz modexpng. I did recently fix the driver to take advantage of the
>> hardware blinding factor mutation, but I haven't addressed the
>> byte-swapping issues. Point being that we should be able to squeeze more
>> performance out of this core without a lot of trouble.
>>
>> 				paul
>>

-- 
With best regards,
Pavel Shatov
