[Cryptech Core] Input to NGI_trust presentation?
Pavel Shatov
meisterpaul1 at yandex.ru
Thu Mar 5 09:07:17 UTC 2020
On 05.03.2020 10:55, Joachim Strömbergson wrote:
> Aloha!
>
> Thank you Paul, a great summary.
>
> For comparison, the SafeNet USB HSM performs about 60 RSA-2048
> signatures/s. If we could get the 180 MHz clock speed to work, we should
> be in the 50+ range, so fairly close to a commercial machine with
> similar interfaces.
>
> https://safenet.gemalto.com/data-encryption/hardware-security-modules-hsms/usb-hsm/
>
> BR,
> JoachimS
>
The latest ModExpNG can do about 120 exponentiations per second with a
2048-bit modulus. This is the upper limit on how many signatures per
second can be generated using one core instance. Note that we aren't
exploiting the core to its full potential yet. Paul has already
identified most of the things that should be done in that direction: get
rid of byte swapping, get rid of blinding factor mutation in software,
get the 180 MHz internal clock to build, throw away ModExpA7 to free DSP
slices and add more instances of ModExpNG instead, etc.

As agreed in the latest bi-weekly chat, I'm currently concentrating on
the planned changes to the hardware design, since that is the next
milestone. We should definitely keep the aforementioned improvements in
mind, and either do them in the background if time permits or defer them
to a later milestone.
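
As a rough back-of-the-envelope check of that headroom, the sketch below
compares the best measured rate in Paul's table (25.696 signatures/s at
90 MHz) with the ~120 exponentiations/s ceiling, and projects the rate
at 180 MHz. Two things here are assumptions, not measurements: that
throughput scales linearly with the core clock, and that the 120/s
figure is directly comparable to the measured signature rate (the thread
does not say at which clock the 120/s was obtained). The function names
are made up for illustration.

```python
# Figures quoted in this thread.
CORE_CEILING_PER_S = 120.0    # approx. 2048-bit exponentiations/s, one ModExpNG core
BEST_MEASURED_PER_S = 25.696  # best "ng + keywrap" result (3 signers, 90 MHz core)


def utilization(measured_per_s, ceiling_per_s):
    """Fraction of the single-core ceiling reached end to end."""
    return measured_per_s / ceiling_per_s


def scaled_rate(measured_per_s, old_mhz, new_mhz):
    """ASSUMPTION: throughput scales linearly with the core clock."""
    return measured_per_s * new_mhz / old_mhz


if __name__ == "__main__":
    share = utilization(BEST_MEASURED_PER_S, CORE_CEILING_PER_S)
    projected = scaled_rate(BEST_MEASURED_PER_S, 90, 180)
    print(f"share of single-core ceiling: {share:.1%}")
    print(f"projected rate at 180 MHz:    {projected:.1f} signatures/s")
```

Under those assumptions the measured rate is roughly a fifth of the
single-core ceiling, and doubling the clock lands in the 50+ range
mentioned above.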
> On 2020-03-04 21:00, Paul Selkirk wrote:
>> (Copied to core@ because I think it's a matter of general interest.)
>>
>> Here are a few performance numbers from recent work.
>>
>> The following table is signatures/second, using
>> libhal/tests/parallel-signatures.py, 2048-bit key, 1000 signatures per
>> run, with 1-4 signers.
>>
>>      releng  clocking  modexpng  ng + keywrap
>> 1:    6.924     8.106    10.815        13.358
>> 2:   11.660    13.450    16.302        22.188
>> 3:   14.898     9.836     7.095        25.696
>> 4:    7.865     6.848     4.975        25.688
>>
>> releng: from 2019-09-03 releng tarball
>> clocking: Pavel's clocking work (90MHz FPGA, 45MHz FMC)
>> modexpng: Pavel's modexpng core (clocked at 90MHz)
>> ng + keywrap: modexpng + Joachim's keywrap core
>>
>> Note that, in all cases, the bitstream is minimally resourced: 1 pair of
>> modexpa7 cores and/or 1 modexpng core for signing, 1 AES core and/or 1
>> keywrap core for key wrap/unwrap.
>>
>> In particular, note that there is a regression going from "releng" to
>> "clocking" with >2 signers. I believe this is because of excessive
>> contention for the one AES core for key unwrap. This is made worse by
>> the fact that the clocking changes reduced the FMC clock while
>> increasing the FPGA clock, and the "old" keywrap spends a lot of its
>> time pushing data back and forth across the FMC bus.
>>
>> Anyway, I don't know how you want to report this. We can show a
>> performance increase of 2x or 3x, depending on how you pick the numbers.
>>
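
To make "depending on how you pick the numbers" concrete, here is a
small sketch that computes the speedup of "ng + keywrap" over "releng"
from the table values quoted in this message. The data are copied from
the table; the dictionary layout and function names are illustrative
only and are not part of libhal.

```python
# Signatures/second from the table; list index 0..3 corresponds to 1..4 signers.
RESULTS = {
    "releng":       [6.924, 11.660, 14.898, 7.865],
    "clocking":     [8.106, 13.450, 9.836, 6.848],
    "modexpng":     [10.815, 16.302, 7.095, 4.975],
    "ng + keywrap": [13.358, 22.188, 25.696, 25.688],
}


def speedup(new, old, signers):
    """Ratio of two configurations at the same signer count (1-4)."""
    return RESULTS[new][signers - 1] / RESULTS[old][signers - 1]


def peak_speedup(new, old):
    """Ratio of each configuration's best rate, whatever the signer count."""
    return max(RESULTS[new]) / max(RESULTS[old])


if __name__ == "__main__":
    for n in range(1, 5):
        print(f"{n} signer(s): {speedup('ng + keywrap', 'releng', n):.2f}x")
    print(f"best vs best: {peak_speedup('ng + keywrap', 'releng'):.2f}x")
```

Comparing like for like gives a bit under 2x at 1 signer and a bit over
3x at 4 signers, while comparing each configuration's best result gives
about 1.7x.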
>> BTW, Pavel built and tested modexpng with an internal clock of 180MHz; I
>> wasn't able to get that to meet timing, so my tests were all with a
>> 90MHz modexpng. I did recently fix the driver to take advantage of the
>> hardware blinding factor mutation, but I haven't addressed the
>> byte-swapping issues. Point being that we should be able to squeeze more
>> performance out of this core without a lot of trouble.
>>
>> paul
>>
--
With best regards,
Pavel Shatov