[Cryptech Tech] keystore flash performance
Paul Selkirk
paul at psgd.org
Mon Feb 20 23:14:20 UTC 2017
In the course of reviewing Rob's new keystore architecture (ksng
branches in the libhal, pkcs11, and stm32 repos), I've been looking at
the performance of the SPI flash chip. In short, most of the poor
performance is self-inflicted.
As originally coded, there were 1ms delays after every SPI operation
(both transmit and receive), and an additional 10ms delay in the
_wait_while_wip() loop. I was able to significantly speed up flash
operations just by removing these delays, with no loss of stability or
functionality.
I'm not an expert, and I haven't fully re-read the 84-page data sheet
for the chip, but looking at the timing diagrams, there doesn't seem to
be any need for a delay between transmitting the command and receiving
the result, or for a delay before de-selecting the chip, especially
since the low-level SPI code has its own timeout mechanism. Maybe
Fredrik can comment?
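For illustration, here is a minimal sketch of a wait-while-WIP loop that polls the status register back to back and relies on a timeout instead of fixed delays. Everything in it is an assumption made for the example, not the actual stm32 code: the function-pointer hooks, the fake chip used to exercise the loop on a host, and the WIP flag being bit 0 of the status register (the usual position on SPI NOR parts, but check the data sheet).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Write-in-progress flag; bit 0 is an assumption, check the data sheet. */
#define STATUS_WIP 0x01u

/* Hypothetical hooks: read the status register, read a millisecond tick. */
typedef uint8_t  (*read_status_fn)(void *ctx);
typedef uint32_t (*tick_ms_fn)(void *ctx);

/*
 * Poll the status register with no fixed delay between reads.
 * Returns true once WIP clears, false if timeout_ms elapses first.
 */
bool wait_while_wip(read_status_fn read_status, tick_ms_fn tick_ms,
                    void *ctx, uint32_t timeout_ms)
{
    assert(read_status && tick_ms);
    uint32_t start = tick_ms(ctx);

    for (;;) {
        if ((read_status(ctx) & STATUS_WIP) == 0)
            return true;
        /* Unsigned subtraction stays correct across tick wrap-around. */
        if ((uint32_t)(tick_ms(ctx) - start) >= timeout_ms)
            return false;
    }
}

/*
 * Fake chip for trying the loop on a host: it reports WIP until
 * busy_until_ms, and each status read is modelled as costing one tick.
 */
struct fake_flash {
    uint32_t now_ms;
    uint32_t busy_until_ms;
    uint32_t polls;
};

uint8_t fake_read_status(void *ctx)
{
    struct fake_flash *f = (struct fake_flash *)ctx;
    f->polls++;
    f->now_ms++;
    return (f->now_ms < f->busy_until_ms) ? STATUS_WIP : 0;
}

uint32_t fake_tick_ms(void *ctx)
{
    return ((struct fake_flash *)ctx)->now_ms;
}
```

On real hardware the two hooks would wrap the SPI status read and the system tick; the point of the sketch is only that the loop is bounded by elapsed time rather than padded with sleeps.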
My test program reads the entire flash chip, erases it in three
different ways (subsector, sector, and bulk), reads again to verify the
erasure, writes out a pattern, and reads again to verify the write.
Results are shown below.
Before:
    read page         131.072 sec for 65536 rounds,   2.00 ms each
    erase subsector   290.548 sec for  4096 rounds,  70.93 ms each
    erase sector       68.652 sec for   256 rounds, 268.17 ms each
    erase bulk         45.907 sec
    verify erase      131.072 sec for 65536 rounds,   2.00 ms each
    write page       1048.576 sec for 65536 rounds,  16.00 ms each
    verify write      131.072 sec for 65536 rounds,   2.00 ms each
After removing delays (with speed-up factor):
    read page          15.681 sec for 65536 rounds,   0.23 ms each   8.4
    erase subsector   238.017 sec for  4096 rounds,  58.10 ms each   1.2
    erase sector       66.339 sec for   256 rounds, 259.13 ms each   1.0
    erase bulk         45.907 sec                                    1.0
    verify erase       16.073 sec for 65536 rounds,   0.24 ms each   8.2
    write page         24.327 sec for 65536 rounds,   0.37 ms each  43.1
    verify write       16.073 sec for 65536 rounds,   0.24 ms each   8.2
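The per-operation times and speed-up factors in these tables are simple ratios. A minimal sketch of the arithmetic, with helper names invented for this example:

```c
#include <assert.h>

/* Average time per operation in milliseconds. */
double per_op_ms(double total_sec, unsigned rounds)
{
    assert(rounds > 0);
    return total_sec * 1000.0 / rounds;
}

/* Speed-up factor: elapsed time before divided by elapsed time after. */
double speedup(double before_sec, double after_sec)
{
    assert(after_sec > 0.0);
    return before_sec / after_sec;
}
```

For example, speedup(1048.576, 24.327) comes out at about 43.1, the factor shown for page writes.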
How did writing 16MB go from 17.5 min to 24 sec? It turns out that
almost all the time was spent in delays. The sequence of events for
each page write is like this:
    send WRITE_ENABLE, delay 1ms
    send READ_STATUS (write enable), delay 1ms
    send PROGRAM_PAGE, delay 1ms
    send data, delay 1ms
    send READ_STATUS (write in progress), delay 1ms
    WIP flag is asserted, so delay 10ms
    send READ_STATUS (write in progress), delay 1ms
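The delay budget for this sequence can be checked with a small model. This is a sketch under the assumption that the sequence above is the whole story: six SPI operations each followed by a 1ms delay, plus one 10ms wait while WIP is asserted. The function and constant names are made up for the example and are not taken from the stm32 code.

```c
#include <assert.h>
#include <stdint.h>

enum {
    SPI_DELAY_MS = 1,   /* delay after every SPI operation */
    WIP_DELAY_MS = 10   /* extra delay each time WIP is seen asserted */
};

/* Fixed delay per page write: one 1ms delay per SPI operation,
 * plus one 10ms delay per status poll that finds the chip busy. */
uint32_t page_write_delay_ms(uint32_t spi_ops, uint32_t busy_polls)
{
    return spi_ops * SPI_DELAY_MS + busy_polls * WIP_DELAY_MS;
}

/* Total fixed delay, in milliseconds, for writing n_pages pages. */
uint64_t total_delay_ms(uint32_t n_pages, uint32_t per_page_ms)
{
    return (uint64_t)n_pages * per_page_ms;
}

/* Self-check against the figures in the "before" table. */
void check_model(void)
{
    uint32_t per_page = page_write_delay_ms(6, 1);
    assert(per_page == 16);
    assert(total_delay_ms(65536, per_page) == 1048576u);
}
```

Under those assumptions the fixed delays come to 16ms per page, and 65536 pages give 1048576 ms, which is the 1048.576 sec measured for page writes before the delays were removed.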
Erase shows more modest gains, because each operation takes a lot
longer, so the delays are a smaller percentage of the total. There is
also some jitter in the erase timing, so there's only a 9ms difference
in the sector erase in these examples.
paul