[Cryptech Tech] Getting sync_fmc branch to work
Pavel Shatov
meisterpaul1 at yandex.ru
Thu Dec 20 11:57:16 UTC 2018
Hi,
I want to report my progress with the fmc_clk branch that will hopefully
allow us to further increase the signing performance.
The primary obstacle was that the bitstream failed timing at 90 MHz,
mostly in the ECDSA cores. I rewrote those using the approach I learned
while working on the Ed25519 and X25519 cores, and also got rid of the
dedicated modular inversion sub-module. I replaced it with a micro-coded
inversion based on Fermat's little theorem, which actually turned out to
be ~10% faster.
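For readers unfamiliar with the technique, here is a minimal Python
sketch of inversion via Fermat's little theorem: for a prime p and a
not divisible by p, a^(p-2) mod p is the inverse of a. This is only an
illustration, not the microcode from the core; the function name and the
choice of the P-256 field prime as the example modulus are assumptions.

```python
# NIST P-256 field prime, used here only as an example modulus (assumption).
P256 = 2**256 - 2**224 + 2**192 + 2**96 - 1


def fermat_inverse(a, p):
    """Return a^(p-2) mod p, i.e. the inverse of a modulo a prime p.

    Hypothetical helper for illustration; requires a % p != 0.
    """
    exponent = p - 2
    result = 1
    # Left-to-right square-and-multiply. The exponent depends only on the
    # public modulus, so the sequence of squarings and multiplications is
    # the same for every input a. (Python's big-integer arithmetic itself
    # is not constant-time; this only shows the structure.)
    for bit in bin(exponent)[2:]:
        result = (result * result) % p
        if bit == "1":
            result = (result * a) % p
    return result
```

Because the exponent is fixed for a given curve, the operation schedule
can be precomputed, which is what makes this approach a natural fit for
a micro-coded implementation.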
As a side note, that constant-time modular inversion module can be
turned into a separate modular inversion math core (with a certain
amount of sanding, of course). I'm not sure that there's currently an
urgent need for such a core. From Paul's profiling report it looks like
we might want a "full" RSA core that will also do the final part of CRT
("Garner's formula") in hardware, not just the exponentiation part.
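To make the "final part of CRT" concrete, here is a minimal Python
sketch of an RSA private-key operation with Garner's recombination. It
is a textbook illustration, not the code path used in the HSM; the
function and variable names are assumptions, and the toy key in the
usage note is far too small for real use.

```python
def rsa_crt_private(c, p, q, d):
    """Compute c^d mod (p*q) using CRT and Garner's formula.

    Hypothetical helper for illustration; p and q must be distinct primes.
    """
    # Reduced exponents for the two half-size exponentiations.
    dp = d % (p - 1)
    dq = d % (q - 1)
    # q^-1 mod p, computed via Fermat's little theorem since p is prime.
    q_inv = pow(q, p - 2, p)

    # The two modular exponentiations: the part a ModExp core accelerates.
    m1 = pow(c, dp, p)
    m2 = pow(c, dq, q)

    # Garner's formula: recombine the two halves into the result mod p*q.
    h = (q_inv * (m1 - m2)) % p
    return m2 + h * q
```

With the toy key p = 61, q = 53, e = 17, d = 2753, encrypting m = 65
and passing the ciphertext to rsa_crt_private() gives back 65. The last
two lines of the function are the recombination step that would move
into hardware in a "full" RSA core.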
The second change is that I overhauled the Verilog repository a bit. The
problem was that both the ECDSA multipliers and ModExpA7 use vendor math
slices, and each had its own essentially identical copy of the wrappers
and generic replacements for simulation. I moved all that reused stuff
to core/lib and updated the cores to use primitives from there. I
updated the Makefile and the core configuration file accordingly. I
believe I was also able to fix the issue that it was not possible to
build a bitstream with only ECDSA cores or only ModExp cores, but I
would really appreciate it if someone could pull the changes and try to
build the bitstream to make sure that I haven't broken anything by
chance, given the amount of updates I've just pushed.
Now the bad news. For some reason the new bitstream (`fpga show cores'
should report ECDSA version 0.20 instead of 0.11 now) fails
unit-tests.py. It locks up in hal_get_random() waiting for the valid bit
of the CSPRNG core to go high. If I disable this test, it locks up a bit
further on in 'test_attribute_bloat_token_big', which calls
hal_uuid_gen(), which in turn calls hal_get_random() again. If I drop
the FMC clock to 60 MHz (line 156 of stm-fmc.c sets the divisor: 2 is
for 90 MHz, 3 is for 60 MHz), then all the tests pass just fine. This is
strange, because we've seen this situation before, but back then we
clearly had failed timing, while now everything should be fine. I
haven't done any thorough investigation yet; my very preliminary guess
is that maybe we need to change the number of taps in the ring
oscillator entropy source or something like that. I do have a platform
cable, so given hints on where to look I can try to debug, but I suggest
that someone first tries to reproduce the situation, because maybe I'm
doing something wrong (not having pulled the latest changes from
Joachim, etc.).
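As a quick sanity check on the divisor values mentioned above, the two
pairs (2 gives 90 MHz, 3 gives 60 MHz) are consistent with a 180 MHz
source clock. The sketch below just spells out that arithmetic; the
180 MHz figure is inferred from those two pairs rather than stated in
the message, and the names are hypothetical.

```python
# Assumption: source clock inferred from "2 is for 90 MHz, 3 is for 60 MHz".
FMC_SOURCE_CLOCK_MHZ = 180


def fmc_clock_mhz(divisor):
    """Return the FMC clock in MHz for a given divisor (hypothetical helper)."""
    return FMC_SOURCE_CLOCK_MHZ / divisor
```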
--
With best regards,
Pavel Shatov