[Cryptech Tech] keystore flash performance

Paul Selkirk paul at psgd.org
Mon Feb 20 23:14:20 UTC 2017

In the course of reviewing Rob's new keystore architecture (ksng
branches in the libhal, pkcs11, and stm32 repos), I've been looking at
performance of the SPI flash chip. In short, most of the poor
performance is self-inflicted.

As originally coded, there were 1ms delays after every SPI operation
(both transmit and receive), and an additional 10ms delay in the
_wait_while_wip() loop. I was able to significantly speed up flash
operations just by removing these delays, with no loss of stability or

I'm not an expert, and I haven't fully re-read the 84-page data sheet
for the chip, but looking at the timing diagrams, there doesn't seem to
be any need for a delay between transmitting the command and receiving
the result, or for a delay before de-selecting the chip, especially
since the low-level SPI code has its own timeout mechanism. Maybe
Fredrik can comment?
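
To make the idea concrete, here is a minimal sketch of a busy-wait loop
with no fixed delay between polls. It is not the code in the stm32 repo.
The function names, the return codes, the max_polls bound, and the
assumption that WIP is bit 0 of the status register are all illustrative,
and fake_read_status() only stands in for a real READ_STATUS transaction.

```c
#include <stdint.h>

/* Assumption: write-in-progress is bit 0 of the status register. */
#define STATUS_WIP 0x01u

/* Hypothetical callback: reads the status register, returns 0 on success. */
typedef int (*read_status_fn)(uint8_t *status);

/* Stand-in for the flash chip: reports "busy" for the next
 * fake_busy_polls reads, then reports "ready". */
uint32_t fake_busy_polls = 0;

int fake_read_status(uint8_t *status)
{
    if (fake_busy_polls > 0) {
        fake_busy_polls--;
        *status = STATUS_WIP;
    } else {
        *status = 0;
    }
    return 0;
}

/* Poll the status register back to back until WIP clears.
 * Returns 0 when the chip is ready, -1 if a status read fails,
 * and -2 if WIP is still set after max_polls reads. */
int wait_while_wip_sketch(read_status_fn read_status, uint32_t max_polls)
{
    for (uint32_t i = 0; i < max_polls; i++) {
        uint8_t status;
        if (read_status(&status) != 0)
            return -1;
        if ((status & STATUS_WIP) == 0)
            return 0;
    }
    return -2;
}
```

With fake_busy_polls set to 3, wait_while_wip_sketch(fake_read_status, 10)
returns 0 on the fourth read. The poll bound is only there so the loop
cannot spin forever if the chip never reports ready.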

For my test program, I read the entire flash chip, erase it in different
ways, read again to verify erasure, write out a pattern, and read again
to verify the write. Results with the original delays in place:


read page       131.072 sec for 65536 rounds, 2.00 ms each
erase subsector 290.548 sec for 4096 rounds, 70.93 ms each
erase sector    68.652 sec for 256 rounds, 268.17 ms each
erase bulk      45.907 sec
verify erase    131.072 sec for 65536 rounds, 2.00 ms each
write page      1048.576 sec for 65536 rounds, 16.00 ms each
verify write    131.072 sec for 65536 rounds, 2.00 ms each

After removing delays (with speed-up factor):

read page       15.681 sec for 65536 rounds, 0.23 ms each       8.4
erase subsector 238.017 sec for 4096 rounds, 58.10 ms each      1.2
erase sector    66.339 sec for 256 rounds, 259.13 ms each       1.0
erase bulk      45.907 sec                                      1.0
verify erase    16.073 sec for 65536 rounds, 0.24 ms each       8.2
write page      24.327 sec for 65536 rounds, 0.37 ms each       43.1
verify write    16.073 sec for 65536 rounds, 0.24 ms each       8.2

How did writing 16MB go from 17.5 min to 24 sec? It turns out that
almost all the time was spent in delays. The sequence of events is like this:

send WRITE_ENABLE, delay 1ms
send READ_STATUS (write enable), delay 1ms
send PROGRAM_PAGE, delay 1ms
send data, delay 1ms
send READ_STATUS (write in progress), delay 1ms
WIP flag is asserted, so delay 10ms
send READ_STATUS (write in progress), delay 1ms
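
The delays in that sequence add up to 16ms per page, which matches the
16.00 ms measured per write. The arithmetic can be sketched as follows;
the array and function names are made up for this illustration, and the
values are simply the delays listed above.

```c
#include <stddef.h>
#include <stdint.h>

/* Fixed delay after each step of the original page-write sequence, in ms. */
const uint32_t page_write_delays_ms[] = {
    1,   /* WRITE_ENABLE */
    1,   /* READ_STATUS (write enable) */
    1,   /* PROGRAM_PAGE */
    1,   /* data */
    1,   /* READ_STATUS (write in progress) */
    10,  /* WIP asserted, wait before polling again */
    1,   /* READ_STATUS (write in progress) */
};

/* Total fixed delay spent on one page write, in ms. */
uint32_t delay_per_page_ms(void)
{
    uint32_t total = 0;
    size_t n = sizeof(page_write_delays_ms) / sizeof(page_write_delays_ms[0]);
    for (size_t i = 0; i < n; i++)
        total += page_write_delays_ms[i];
    return total;
}

/* Total fixed delay for writing the given number of pages, in ms. */
uint32_t delay_total_ms(uint32_t pages)
{
    return pages * delay_per_page_ms();
}
```

For the 65536 pages in the test, delay_total_ms(65536) gives 1048576 ms,
which is the 1048.576 sec in the first "write page" line.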

Erase shows more modest gains, because each operation takes a lot
longer, so the delays are a smaller percentage. There is also some
jitter in the erase timing, so there's only a 9ms difference in the
sector erase in these examples.
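
As a rough check on the numbers above, the speed-up factor and the share
of each operation that was spent in delays can be computed from the two
tables. This is only a sketch; the function names are hypothetical, and
it treats the whole before/after difference as delay, which ignores the
jitter just mentioned.

```c
/* Speed-up factor: time before removing delays divided by time after. */
double speedup(double before, double after)
{
    return before / after;
}

/* Fraction of the original time that went away when delays were removed. */
double delay_share(double before, double after)
{
    return (before - after) / before;
}
```

Using the per-operation times in ms, delay_share(16.00, 0.37) is about
0.98 for a page write, while delay_share(268.17, 259.13) is about 0.03
for a sector erase, which is why erase barely moves.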

