[Cryptech Core] Proposal for new FPGA architecture

Joachim Strömbergson joachim at assured.se
Thu Oct 25 13:48:49 UTC 2018


AlohA!

On the F2F we talked about the possibility to ensure that unwrapped
secrets never leave the FPGA. I've looked at the architecture in the
FPGA and how we could achieve this, hopefully in a way that minimizes
the changes to all cores. And I think I have an idea how to do this.

Initially the idea was to have a simple command handler and storage for
multiple object buffers. The MCU would write data to be processed into
an object buffer including metadata about object size, type etc. Then
the MCU would instruct the command handler to perform one or more
operations. For example: "perform SHA-512 hash on object0 and then perform
ECDSA sign using the private key in object1. Put result in object3."
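
To make the contrast with the DMA proposal concrete, here is a minimal Python sketch of that command-handler model (object slots, a command record and a tiny interpreter). It is an illustration, not something that was built: the opcode names, the Command record and the use of object2 for the intermediate digest are assumptions. HMAC-SHA-512 is a placeholder standing in for ECDSA signing, since ECDSA is not in the Python standard library.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Dict, List, Optional

# Object buffers inside the FPGA, indexed by slot number. Size metadata is
# implied by len(); type metadata is omitted in this sketch.
objects: Dict[int, bytes] = {}


@dataclass
class Command:
    op: str                    # hypothetical opcode name
    src: int                   # slot holding the input object
    dst: int                   # slot receiving the result
    key: Optional[int] = None  # slot holding a key, for keyed operations


def run(program: List[Command]) -> None:
    """Execute commands in order; intermediate results stay in the slots."""
    for cmd in program:
        data = objects[cmd.src]
        if cmd.op == "sha512":
            objects[cmd.dst] = hashlib.sha512(data).digest()
        elif cmd.op == "sign":
            if cmd.key is None:
                raise ValueError("sign needs a key slot")
            # Placeholder: HMAC-SHA-512 stands in for ECDSA signing.
            objects[cmd.dst] = hmac.new(objects[cmd.key], data, hashlib.sha512).digest()
        else:
            raise ValueError(f"unsupported op {cmd.op!r}")


# "SHA-512 hash object0, sign with the key in object1, result in object3"
objects[0] = b"message to be signed"
objects[1] = bytes(32)  # dummy key material
run([Command("sha512", src=0, dst=2), Command("sign", src=2, dst=3, key=1)])
```

Even this toy needs per-opcode argument handling; real variants such as key lengths and curve choices would multiply those branches.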

Clearly this would be a rather complex design. Even though the command
set would be finite, even small, it would still have variants and
options such as key lengths. This solution might be interesting to
target in the long run, but it is hard to get there incrementally.

Another issue is that some cores would not benefit much from having a
command handler in front of them. The TRNG is one example: the only
really interesting operation there is reading random data words.

So, instead I would like to propose a simpler solution. Rather than a
command handler that understands commands and what the cores do, we
simply implement a DMA engine that provides a few different access
patterns. Please see the included block diagram for an illustration.



The FPGA on the Alpha board is connected to the MCU via the FMC bus. The
core_selector is the module that implements the MUX needed to connect
the different cores (illustrated by TRNG, SHA512 etc). The core_selector
is a transparent module that doesn't have an API. It simply decodes the
address and control signals from the MCU and connects the corresponding
core.
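
As a rough model of what the core_selector does today, the Python sketch below assumes a hypothetical address layout where the low 8 bits select a register inside a core and the bits above select the core. The core list, bit widths and per-core register dictionaries are illustrative only and are not taken from the generated configuration.

```python
from typing import Dict, Tuple

REG_BITS = 8  # hypothetical: low address bits select a register in a core
CORES = ["trng", "sha512", "aes", "modexp"]  # illustrative core list

# One register file per core; the selector itself stores nothing.
regs: Dict[str, Dict[int, int]] = {name: {} for name in CORES}


def decode(addr: int) -> Tuple[str, int]:
    """Split an FMC address into (core name, register offset)."""
    core_index = addr >> REG_BITS
    reg = addr & ((1 << REG_BITS) - 1)
    if core_index >= len(CORES):
        raise ValueError(f"no core mapped at address {addr:#06x}")
    return CORES[core_index], reg


def write(addr: int, value: int) -> None:
    """Forward an MCU write to the addressed core's register."""
    core, reg = decode(addr)
    regs[core][reg] = value


def read(addr: int) -> int:
    """Forward an MCU read from the addressed core's register."""
    core, reg = decode(addr)
    return regs[core].get(reg, 0)
```

For example, decode(0x0105) selects register 5 of the second core in the list. The selector has no registers of its own, which is what "transparent" means here.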

What I propose is to add API functionality to the core selector that
would expose a DMA mechanism with from-to copy functionality to the MCU.
Basically, SW would set up a transfer from a set of addresses (base +
length) to another set of addresses. The copy mechanism would support
N-N, 1-N and N-1 operations.
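
One possible reading of the three copy patterns, as a small Python model: N-N advances both addresses, 1-N holds the source fixed (for example a core's data register read repeatedly), and N-1 holds the destination fixed (for example a core input port written repeatedly). Which address is held fixed in each mode is an assumption about the intended semantics; the read/write callbacks stand in for the buffer memory and the core address space.

```python
import itertools
from enum import Enum
from typing import Callable


class Mode(Enum):
    N_TO_N = "N-N"    # source and destination both advance
    ONE_TO_N = "1-N"  # fixed source, advancing destination
    N_TO_ONE = "N-1"  # advancing source, fixed destination


def dma_copy(read: Callable[[int], int],
             write: Callable[[int, int], None],
             src: int, dst: int, length: int, mode: Mode) -> None:
    """Copy `length` words, stepping each address according to `mode`."""
    src_step = 0 if mode is Mode.ONE_TO_N else 1
    dst_step = 0 if mode is Mode.N_TO_ONE else 1
    for i in range(length):
        write(dst + i * dst_step, read(src + i * src_step))


# Block copy between two address ranges.
mem = {0x100 + i: i for i in range(4)}
dma_copy(mem.__getitem__, mem.__setitem__, 0x100, 0x200, 4, Mode.N_TO_N)

# Repeatedly read one (fake) TRNG data register into consecutive words.
trng_words = itertools.count(42)
dma_copy(lambda a: next(trng_words), mem.__setitem__, 0x300, 0x400, 3, Mode.ONE_TO_N)

# Feed consecutive words into one (fake) single-address input port.
port = []
dma_copy(mem.__getitem__, lambda a, v: port.append((a, v)), 0x100, 0x500, 4, Mode.N_TO_ONE)
```

With this model, the whole engine reduces to two step sizes of 0 or 1, which fits the goal of a low-complexity DMA.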

The core selector would also have a dual-port buffer memory. This would
allow SW to read or write the contents of the memory while the DMA
engine is performing copy operations between the buffer memory and a core.

This means that SW would continue to be responsible for controlling the
cores, including checking their status. This would also mean that the
other cores would not have to change at all. Zero changes.
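
A toy Python model of one round trip under this scheme shows that the DMA only moves words while SW keeps driving the core through its existing registers. Base addresses, register offsets and the core's behaviour (summing four input words) are made up for illustration. The dual-port aspect is not modelled, and each DMA transfer completes instantly.

```python
# All addresses, register offsets and the core's behaviour are hypothetical.
BUF = 0x0000                                    # buffer memory base
CORE = 0x1000                                   # a core behind the core_selector
CTRL, STATUS, IN, OUT = 0x08, 0x09, 0x40, 0x80  # offsets within the core

mem = {}  # flat FPGA address space seen through the core_selector


def core_start() -> None:
    """Unchanged toy core: OUT = sum of four IN words (mod 2**32), then ready."""
    total = sum(mem.get(CORE + IN + i, 0) for i in range(4)) & 0xFFFFFFFF
    mem[CORE + OUT] = total
    mem[CORE + STATUS] = 1


def mcu_write(addr: int, value: int) -> None:
    """An MCU write over FMC; writing 1 to the core's CTRL starts it."""
    mem[addr] = value
    if addr == CORE + CTRL and value & 1:
        core_start()


def dma_copy(src: int, dst: int, n: int) -> None:
    """Stand-in for the DMA engine (N-N mode); runs inside the FPGA."""
    for i in range(n):
        mem[dst + i] = mem.get(src + i, 0)


# 1. SW writes the object into the buffer memory.
for i, word in enumerate([1, 2, 3, 4]):
    mcu_write(BUF + i, word)
# 2. DMA moves it from the buffer into the core's input registers.
dma_copy(BUF, CORE + IN, 4)
# 3. SW still controls the core directly: start it, then check status
#    (a real driver would poll; the toy core is ready immediately).
mcu_write(CORE + CTRL, 1)
assert mem[CORE + STATUS] & 1, "core not ready"
# 4. DMA moves the result back into the buffer, where SW can read it.
dma_copy(CORE + OUT, BUF + 0x10, 1)
```

The core only ever sees ordinary register writes, whether they come from the MCU or the DMA, which is why it needs no changes.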



The number of transfers would not go down. In fact, an object that is
sent to a core by the MCU, processed and then read back to the MCU
would now require two DMA transfers inside the FPGA. But for any object
on which more than one operation is performed, the number of FMC
transfers would decrease. One could of course instead consider sending
pointers to cores, but that would mean changing the cores and adding a
bus system so that all cores can access the shared buffer memory.
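
The trade-off can be counted with a small Python sketch. It counts FMC data words only and assumes each DMA transfer costs a fixed number of register writes to set up (4 here, an arbitrary assumption). Core start and status accesses are left out since they are the same in both schemes. The sizes use one SHA-512 block (1024 bits, 32 words) and its 512-bit digest (16 words); the second operation's 16-word result is hypothetical.

```python
from typing import List


def fmc_words_direct(in_words: int, out_words: List[int]) -> int:
    """FMC data words when the MCU shuttles every intermediate result.

    out_words[k] is the size of operation k's result. The MCU writes the
    input, reads each result back and, except for the last one, writes it
    on to the next core.
    """
    total = in_words
    for k, out in enumerate(out_words):
        total += out              # MCU reads the result
        if k < len(out_words) - 1:
            total += out          # MCU writes it to the next core
    return total


def fmc_words_dma(in_words: int, out_words: List[int], setup_words: int = 4) -> int:
    """FMC words with the buffer + DMA scheme.

    The MCU writes the input once and reads the final result once. Each
    operation needs two DMA transfers (buffer -> core, core -> buffer),
    each costing `setup_words` FMC register writes (assumed).
    """
    return in_words + out_words[-1] + 2 * len(out_words) * setup_words


# One operation: hash a block, read the digest back.
single = (fmc_words_direct(32, [16]), fmc_words_dma(32, [16]))
# Two chained operations: hash, then process the digest further.
chained = (fmc_words_direct(32, [16, 16]), fmc_words_dma(32, [16, 16]))
```

Under these assumptions the single operation goes from 48 to 56 words, while the chained case drops from 80 to 64. The intermediate digest also never crosses the bus.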

The proposed change does not provide any new security against the MCU
being taken over by rogue SW, nor against physical attacks on the FMC
bus. It simply provides a way to eliminate the need for transferring
sensitive data back and forth between the MCU and the FPGA.

One could consider partitioning the buffer memory into two parts and
blocking the MCU from accessing one of them. The DMA could then be made
to move sensitive data (an unwrapped key, for example) into the
protected part, and the DMA could be blocked from transferring data from
the protected part to the MCU-accessible part. But the MCU could of
course make the DMA first write the sensitive data to a core memory, and
then copy it from there to the memory area the MCU can access (or simply
read it from the core directly).
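
The bypass can be shown with a short Python sketch. The address map is hypothetical, and the toy treats the core's register window as plain readable and writable memory, which real key registers may not be.

```python
# Hypothetical address map: buffer memory split into an MCU-visible half
# and an MCU-blocked half, plus one core's register window.
PUBLIC = range(0x000, 0x100)
SECRET = range(0x100, 0x200)
CORE = range(0x1000, 0x1100)

mem = {}


def mcu_read(addr: int) -> int:
    """MCU reads are blocked from the secret partition."""
    if addr in SECRET:
        raise PermissionError("MCU may not read the secret partition")
    return mem.get(addr, 0)


def dma_copy(src: int, dst: int, n: int) -> None:
    """DMA with a single rule: no direct SECRET -> PUBLIC copy."""
    for i in range(n):
        s, d = src + i, dst + i
        if s in SECRET and d in PUBLIC:
            raise PermissionError("DMA may not copy secret -> public")
        mem[d] = mem.get(s, 0)


mem[0x100] = 0xDEADBEEF  # stands in for one word of an unwrapped key

# The direct path is refused...
blocked = False
try:
    dma_copy(0x100, 0x000, 1)
except PermissionError:
    blocked = True

# ...but the detour through a core is not: SECRET -> core, core -> PUBLIC.
dma_copy(0x100, CORE.start, 1)
dma_copy(CORE.start, 0x000, 1)
leaked = mcu_read(0x000)
```

Closing the hole would require the DMA to track where secret data has been copied, which is considerably more than a simple address check.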


One thing that will be affected is how the core_selector is implemented.
Today it is generated by a Python program based on the core
configuration. This generator would have to be modified, and would
probably become a bit more complicated, since the generated
core_selector must include its own API, implement a MUX that can be
controlled by the DMA engine, and instantiate the buffer memory and the
DMA engine.
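
A much reduced Python sketch of the kind of output the modified generator would need to emit: a Verilog read-data MUX whose select comes from the DMA engine while it is active, otherwise from the FMC address. The signal names (fmc_addr, dma_addr, dma_active, read_data, <core>_read_data) and the address layout are hypothetical and do not reflect the existing generator's templates.

```python
from typing import List


def gen_read_mux(cores: List[str], reg_bits: int = 8, data_width: int = 32) -> str:
    """Emit a Verilog read-data MUX; core i sits at address prefix i.

    The select comes from the DMA engine while it is active, otherwise
    from the FMC address, so both masters reach the same cores.
    """
    sel_bits = max(1, (len(cores) - 1).bit_length())
    hi, lo = reg_bits + sel_bits - 1, reg_bits
    lines = [
        f"wire [{sel_bits - 1}:0] core_sel = "
        f"dma_active ? dma_addr[{hi}:{lo}] : fmc_addr[{hi}:{lo}];",
        "always @* begin",
        "  case (core_sel)",
    ]
    for i, name in enumerate(cores):
        lines.append(f"    {sel_bits}'d{i}: read_data = {name}_read_data;")
    lines += [f"    default: read_data = {data_width}'h0;", "  endcase", "end"]
    return "\n".join(lines)


print(gen_read_mux(["trng", "sha512", "aes"]))
```

A real generator would also need the matching write-path MUX, the DMA's own register API and the buffer memory instance.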

Still, I think this is the simplest way forward that would allow us to
stop moving sensitive data back and forth. And we can implement it with
a fairly low-complexity DMA engine, and without updating the other cores.


Comments, questions, suggestions?
-- 
Kind regards,

Joachim Strömbergson
========================================================================
                               Assured AB
========================================================================
Attachment (block diagram, scrubbed from the archive): Screen Shot 2018-10-25 at 14.47.25.png
<https://lists.cryptech.is/archives/core/attachments/20181025/d9444543/attachment-0001.png>