[Cryptech Tech] EIM: mostly working
Bernd Paysan
bernd at net2o.de
Thu Nov 6 22:53:43 UTC 2014
On Thursday, 6 November 2014, 16:55:53, Paul Selkirk wrote:
> On 11/06/2014 03:27 AM, Joachim Strömbergson wrote:
> > But do you get timing closure for these cores at 133 MHz? I would have
> > expected them not to be clockable that high in the Spartan-6.
> > What does the timing report say?
> >
> > (1) Ensure that all important timings are met. If not, signals will not
> > have the correct, expected values. Can you extract the final timing
> > report?
>
> I've attached build logs for both the 133MHz and 25MHz clocks. As
> expected, there are timing differences.
>
> Even clocking the coretest circuit at 25MHz, there are still reported
> timing errors around the EIM code. Bunnie says:
> ///// It's tricky to change code that relates to
> ///// the details of talking to the i.MX6 EIM -- it's difficult to
> ///// close timing, so you'll need to understand a bit about FPGA timing
> ///// closure to make changes to that section
>
> And I don't understand FPGA timing closure.
>
> > (2) Ensure that the 16-32 bit and big-endian/little-endian conversion
> > works as expected. How about adding a 32-bit register, writing to it from
> > EIM and then extracting it via I2C? Since you have I2C access working you
> > should be able to see if the conversion works as expected. That is, we
> > end up with the expected values in the FPGA.
> >
> > (3) Properly debug the EIM interface and the SW interface including
> > setup with something simple. Possibly a single hash core or something
> > even simpler. An adder, multiplier etc with extra ports for debugging.
> > Then we add one of the hash cores and then move towards the complete
> > shebang.
>
> Given that all 3 of the hash cores produce correct digests for single
> and double block tests most of the time, I'm pretty confident in both of
> these things. But I think we're riding the edge of our timing
> tolerances, and it's biting us most often in the 1000-block test (while
> the other tests hash 18 blocks in total).
SHA-1 and the related SHA-2 have a total of 4 dependent 32/64 bit additions
per iteration (SHA-2-512 has 64 bit adders). On average, with random inputs
and outputs, the likelihood of a carry propagating through n bits is 1/2^n for
every bit added. That means short tests are more likely to run fine at higher
speed than longer tests, and the chance that you actually get a full 64 bit
carry propagation is very low, unless the designer of the test has made the
effort to find the rare case where this is actually needed.
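
A minimal Python sketch of this argument, assuming uniformly random 64 bit
operands (real SHA-2 intermediate values are only approximately random). The
helper names longest_carry_chain and simulate are made up for illustration.
It measures the longest carry ripple per addition and shows how rarely long
chains turn up in a short random test:

```python
import random


def longest_carry_chain(a: int, b: int, width: int = 64) -> int:
    """Longest run of consecutive bit positions a carry ripples through in a + b."""
    carry = 0
    run = 0
    longest = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        generate = x & y       # this bit creates a carry on its own
        propagate = x ^ y      # this bit passes an incoming carry along
        if propagate and carry:
            run += 1           # the incoming carry ripples through this bit
        else:
            run = 0            # chain ends (or a fresh carry starts here)
        carry = generate | (propagate & carry)
        longest = max(longest, run)
    return longest


def simulate(trials: int, width: int = 64, seed: int = 0) -> dict:
    """Histogram of longest carry-chain lengths over random additions."""
    rng = random.Random(seed)
    histogram = {}
    for _ in range(trials):
        n = longest_carry_chain(rng.getrandbits(width), rng.getrandbits(width), width)
        histogram[n] = histogram.get(n, 0) + 1
    return histogram


if __name__ == "__main__":
    # Worst case: all ones plus one ripples through 63 positions.
    print("worst case:", longest_carry_chain((1 << 64) - 1, 1))
    for length, count in sorted(simulate(100_000).items()):
        print(f"longest chain {length:2d}: {count} additions")
```

A directed test that adds 0xFFFFFFFFFFFFFFFF and 1 exercises the full chain,
which random blocks will practically never do.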
There's a good reason why Keccak chose not to use adders as non-linear
elements: it avoids that delay in a hardware implementation. The AND-based
non-linear element in Keccak has a constant and short gate delay. If you look
at other finalists like Skein or BLAKE, they all have 64 bit adders in them,
which is fine if you do it on a 64 bit CPU, but not so good if you do it in
hardware.
The important result of the timing analysis is this:
Timing constraint: Default period analysis for Clock 'EIM_BCLK'
Clock period: 11.468ns (frequency: 87.201MHz)
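
For reference, the arithmetic behind these two numbers and the two clock
rates discussed in this thread can be sketched in Python. The function names
are invented for illustration, and this only covers the clock-period estimate
quoted above, not any other constraints in the build logs:

```python
def period_ns_to_mhz(period_ns: float) -> float:
    """Convert a clock period in nanoseconds to a frequency in MHz."""
    return 1000.0 / period_ns


def mhz_to_period_ns(freq_mhz: float) -> float:
    """Convert a frequency in MHz to a clock period in nanoseconds."""
    return 1000.0 / freq_mhz


def period_slack_ns(target_mhz: float, critical_path_ns: float) -> float:
    """Positive slack means the critical path fits into one clock period."""
    return mhz_to_period_ns(target_mhz) - critical_path_ns


if __name__ == "__main__":
    critical_path_ns = 11.468  # from the period analysis for EIM_BCLK
    print(f"max frequency: {period_ns_to_mhz(critical_path_ns):.1f} MHz")
    for target_mhz in (133.0, 25.0):
        slack = period_slack_ns(target_mhz, critical_path_ns)
        print(f"{target_mhz:5.1f} MHz: period {mhz_to_period_ns(target_mhz):.3f} ns, "
              f"slack {slack:+.3f} ns")
```

At 133 MHz the period is about 7.5 ns, so this path misses by roughly 4 ns;
at 25 MHz the same path has plenty of margin.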
The critical path is through the carry chain of a 64 bit adder in the sha512
block. So SHA-1 is probably fine, but SHA-2-512 isn't. This is limited by
the 63 carries the signal has to propagate through. The carries are fast and
the placement is good, so the only thing you can do about this is to make
sha512 take 2 cycles per add, putting it on a tick/tock clock which is
disabled every second cycle, but still fully synchronous to the rest of the
chip.
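
A cycle-level Python model of the tick/tock idea, as a sketch only: the class
TickTockAccumulator and the helper budget_ns are hypothetical, and the model
assumes the register is gated by an enable that toggles every cycle. In a real
flow the tools would also have to be told that this is a two-cycle path, and
the 11.468 ns figure is a synthesis estimate.

```python
class TickTockAccumulator:
    """Register fed by a wide adder, updated only on every second clock edge."""

    def __init__(self, width: int = 64):
        self.mask = (1 << width) - 1
        self.acc = 0
        self.enable = False  # "tock" phase: register holds its value

    def clock(self, operand: int) -> bool:
        """One rising edge of the fast clock. Returns True if acc was updated."""
        updated = self.enable
        if self.enable:
            # The adder output has had two fast-clock periods to settle.
            self.acc = (self.acc + operand) & self.mask
        self.enable = not self.enable  # alternate tick/tock
        return updated


def budget_ns(clock_mhz: float, cycles: int) -> float:
    """Time available to a path that is sampled only every `cycles` clocks."""
    return cycles * 1000.0 / clock_mhz


if __name__ == "__main__":
    reg = TickTockAccumulator()
    print([reg.clock(1) for _ in range(6)], reg.acc)
    print(f"two-cycle budget at 133 MHz: {budget_ns(133.0, 2):.2f} ns")
```

With two cycles per add at 133 MHz the adder gets about 15 ns, which is more
than the 11.468 ns reported for the critical path.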
The next problem seems to be addressing; there's a significant routing delay.
You can probably solve that by providing local address latches, which make a
read take two cycles, with the global propagation delay in the first cycle,
and the decode and mux delay in the second cycle.
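
A small Python model of such a two-cycle read, purely as an illustration. The
class LatchedReadPort and the register map are made up; the actual coretest
address decoding is not shown in this thread.

```python
class LatchedReadPort:
    """Read port with a local address latch in front of the decode/mux stage."""

    def __init__(self, registers: dict):
        self.registers = dict(registers)
        self.addr_latch = None  # local copy of the address, captured in cycle 1
        self.data_out = None    # registered read data, valid after cycle 2

    def clock(self, address: int):
        """One clock edge: decode the latched address, then latch the new one."""
        if self.addr_latch is not None:
            # Cycle 2: decode and mux use only the short, local address path.
            self.data_out = self.registers.get(self.addr_latch, 0)
        # Cycle 1: the global routing delay ends at this latch.
        self.addr_latch = address
        return self.data_out


if __name__ == "__main__":
    port = LatchedReadPort({0x10: 0xAA, 0x20: 0xBB})
    for addr in (0x10, 0x20, 0x20):
        print(hex(addr), port.clock(addr))
```

The data for an address shows up one edge after the address was latched, so
the routing delay and the decode/mux delay no longer add up in one period.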
Another timing problem is in the synchronous reset; here again the routing
delay is the culprit. That's one reason why I recommend async reset and
delayed clock enable even on an FPGA (the more important reason is that you
can use the same code on an ASIC, where async reset is more commonly used than
sync reset). You can take that timing problem out completely by having a small
down-counter, started by reset, which waits for n cycles (4 or 8, say) until
it allows the clock to pass.
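
The down-counter can be modelled in a few lines of Python. This is a sketch
under assumptions: ResetSequencer is a hypothetical name, and asserting reset
is modelled as reloading the counter immediately, which is how an async reset
would behave.

```python
class ResetSequencer:
    """Holds the clock enable low for hold_cycles clocks after reset is released."""

    def __init__(self, hold_cycles: int = 4):
        self.hold_cycles = hold_cycles
        self.count = hold_cycles  # start in the "just reset" state

    def clock(self, reset: bool) -> bool:
        """One clock edge. Returns True once the clock enable is asserted."""
        if reset:
            self.count = self.hold_cycles  # reset reloads the counter
        elif self.count > 0:
            self.count -= 1                # count down after reset is released
        return self.count == 0


if __name__ == "__main__":
    seq = ResetSequencer(hold_cycles=4)
    resets = [True, True, False, False, False, False, False]
    print([seq.clock(r) for r in resets])
```

Because the rest of the logic only starts n cycles after reset is released,
the reset net itself no longer has to meet the clock period.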
--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://bernd-paysan.de/