[Cryptech Tech] Getting sync_fmc branch to work

Pavel Shatov meisterpaul1 at yandex.ru
Thu Dec 20 11:57:16 UTC 2018


Hi,

I want to report my progress with the fmc_clk branch that will hopefully 
allow us to further increase the signing performance.

The primary obstacle was that the bitstream failed timing at 90 MHz, 
mostly in the ECDSA cores. I rewrote those using the approach I learned 
while working on the Ed25519 and X25519 cores. I also got rid of the 
dedicated modular inversion sub-module and replaced it with a 
micro-coded inversion based on Fermat's little theorem, which actually 
turned out to be ~10% faster.
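
For readers unfamiliar with the technique: Fermat's little theorem gives 
a^(p-2) = a^(-1) (mod p) for a prime p and a not divisible by p, so an 
inversion becomes a fixed sequence of squarings and multiplications. The 
Python sketch below only illustrates the arithmetic; it is not the 
microcode from the cores. The function name and the choice of the P-256 
field prime are assumptions made for the example, and Python integers 
are not constant-time.

```python
# Illustrative sketch only: modular inversion via Fermat's little theorem.
# The name mod_inverse_fermat and the use of the P-256 prime are assumptions
# made for this example, not taken from the Cryptech sources.

# NIST P-256 field prime: 2^256 - 2^224 + 2^192 + 2^96 - 1
P256 = 2**256 - 2**224 + 2**192 + 2**96 - 1


def mod_inverse_fermat(a: int, p: int) -> int:
    """Return the inverse of a modulo the prime p, computed as a^(p-2) mod p."""
    base = a % p
    if base == 0:
        raise ValueError("zero has no inverse modulo p")
    result = 1
    # Left-to-right square-and-multiply over the bits of the exponent p - 2.
    # The exponent is public, so the sequence of operations is the same for
    # every input a; this is what makes the approach easy to micro-code.
    for bit in bin(p - 2)[2:]:
        result = (result * result) % p
        if bit == "1":
            result = (result * base) % p
    return result


if __name__ == "__main__":
    a = 0x1234567890ABCDEF
    inv = mod_inverse_fermat(a, P256)
    print((a * inv) % P256 == 1)
```

Because the exponent depends only on the modulus, the same operation 
schedule can be reused for every inversion on a given curve.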

As a side note, that constant-time modular inversion module can be 
turned into a separate modular inversion math core (with a certain 
amount of sanding, of course). I'm not sure there's currently an urgent 
need for such a core. From Paul's profiling report it looks like we might 
want a "full" RSA core that will also do the final part of CRT 
("Garner's formula") in hardware, not just the exponentiation part.
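
As a reminder of what that final CRT step involves, here is a small 
Python sketch of RSA private-key exponentiation with Garner's 
recombination. It is a textbook illustration, not code from the HSM 
firmware or the cores; the function name and the toy key in the usage 
example are assumptions, and there is no padding or blinding.

```python
# Illustrative sketch only: RSA-CRT with Garner's recombination step.
# The name rsa_crt_garner and the toy parameters are assumptions made for
# this example; real keys are far larger and need padding and blinding.

def rsa_crt_garner(c: int, p: int, q: int, d: int) -> int:
    """Compute c^d mod (p*q) using two half-size exponentiations."""
    d_p = d % (p - 1)            # exponent reduced for the mod-p half
    d_q = d % (q - 1)            # exponent reduced for the mod-q half
    q_inv = pow(q, p - 2, p)     # q^(-1) mod p (p is prime, so Fermat applies)

    # The two modular exponentiations (the "exponentiation part").
    m_p = pow(c, d_p, p)
    m_q = pow(c, d_q, q)

    # Garner's formula (the "final part"): one modular subtraction, one
    # modular multiplication, then one multiplication and one addition.
    h = (q_inv * (m_p - m_q)) % p
    return m_q + h * q


if __name__ == "__main__":
    p, q, e, d = 61, 53, 17, 2753   # toy key, n = 3233
    message = 65
    ciphertext = pow(message, e, p * q)
    print(rsa_crt_garner(ciphertext, p, q, d) == message)
```

The recombination itself is only a handful of multi-precision 
operations, but without hardware support each of them runs in software 
after the two exponentiations have finished.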

The second change is that I overhauled the Verilog repository a bit. The 
problem was that both the ECDSA multipliers and ModExpA7 use vendor math 
slices, and each had its own, essentially identical copy of the wrappers 
and of the generic replacements for simulation. I moved all that shared 
code to core/lib and updated the cores to use the primitives from there. 
I updated the Makefile and the core configuration file accordingly. I 
believe I was also able to fix the issue that made it impossible to 
build a bitstream with only ECDSA cores or only ModExp cores, but I 
would really appreciate it if someone could pull the changes and try to 
build the bitstream to make sure that I haven't broken anything by 
accident, given the amount of updates I've just pushed.

Now the bad news. For some reason the new bitstream (`fpga show cores' 
should now report ECDSA version 0.20 instead of 0.11) fails 
unit-tests.py. It locks up in hal_get_random() waiting for the valid bit 
of the CSPRNG core to go high. If I disable this test, it locks up a bit 
further on in 'test_attribute_bloat_token_big', which calls 
hal_uuid_gen(), which in turn calls hal_get_random() again. If I drop 
the FMC clock to 60 MHz (line 156 of stm-fmc.c sets the divisor: 2 is 
for 90 MHz, 3 is for 60 MHz), then all the tests pass just fine. This is 
strange: we've seen this behaviour before, but back then timing had 
clearly failed, while now everything should be fine. I haven't done any 
thorough investigation yet; my very preliminary guess is that maybe we 
need to change the number of taps in the ring oscillator entropy source 
or something like that. I do have a platform cable, so given hints on 
where to look I can try to debug, but I suggest that someone first tries 
to reproduce the situation, because maybe I'm doing something wrong (not 
having pulled the latest changes from Joachim, etc.).


-- 
With best regards,
Pavel Shatov

