[Cryptech Tech] [Cryptech-Commits] [core/platform/novena] 21/21: Sick hacks to compensate for sparse MUX within TRNG core.

Rob Austein sra at hactrn.net
Thu Oct 1 18:56:28 UTC 2015


At Thu, 01 Oct 2015 13:33:48 +0200, Joachim Strömbergson wrote:
> Rob Austein wrote:
> > At Wed, 30 Sep 2015 15:31:35 +0200, Joachim Strömbergson wrote:
...
> >> The second issue is if we should also flag for improper use of the
> >> API
> > 
> > that's the function that seems like it could just be a bit in the
> > status word.
> 
> Sure. The only issue I have with that is deciding the temporal validity
> of that status bit. My suggestion is that any violation sets the bit and
> any read of the status register (where the bit is located) clears the
> bit. This means that if you do more than one violation before reading
> the status register you won't know which one caused it. Is this an OK
> solution?

Or violation sets the bit and it stays set until the CPU performs some
kind of explicit reset, eg, sets new flag in the control word.  Either
could work, but the latter seems more robust.
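
A minimal C sketch of the two policies, modelling the status register
as a plain variable.  The bit positions, the "clear error" flag in the
control word, and all names are hypothetical; none of this is taken
from an existing core.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
#define STATUS_ERROR_BIT     2u   /* API-violation flag in the status word */
#define CTRL_CLEAR_ERROR_BIT 0u   /* assumed "clear error" flag in the control word */

typedef struct {
    uint32_t status;
    bool     clear_on_read;   /* true: clear-on-read policy, false: sticky policy */
} core_model_t;

/* Core side: any API violation sets the error bit. */
void core_violation(core_model_t *c)
{
    c->status |= 1u << STATUS_ERROR_BIT;
}

/* CPU side: read the status register.  Under the clear-on-read
 * policy the read itself clears the error bit. */
uint32_t core_read_status(core_model_t *c)
{
    uint32_t value = c->status;
    if (c->clear_on_read)
        c->status &= ~(1u << STATUS_ERROR_BIT);
    return value;
}

/* CPU side: write the control word.  Under the sticky policy this
 * explicit request is the only way to clear the error bit. */
void core_write_control(core_model_t *c, uint32_t ctrl)
{
    if (ctrl & (1u << CTRL_CLEAR_ERROR_BIT))
        c->status &= ~(1u << STATUS_ERROR_BIT);
}
```

One possible reason to prefer the sticky variant: an unrelated status
poll (say, code that only checks the ready bit) cannot swallow the
error before the code that cares about it gets to look.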

> > Doctrine as I was taught is to do as little as one possibly can at 
> > interrupt level: set a flag, signal a condition, whatever, then get 
> > the hell out and let main program level handle it.
> > Using an interrupt to schedule an immediate poll is fine, and using an
> > interrupt to say "everything you were doing on the FPGA is now 
> > hopelessly borked, bye, have a nice life" would also be appropriate
> > if such a condition could occur, but having a real conversation with
> > the FPGA at interrupt level would be frowned upon.
> 
> We might have been taught different doctrines. In RTOSes at least, being
> able to handle high prio stuff including reading an external register is
> imho quite common.

There's a reason why I phrased it as "having a real conversation with
the FPGA."  Reading a register might be OK.  It's when it crosses the
line from that to attempting to execute some kind of complex recovery
behavior that it starts getting to be too much, eg, several round
trips of reading and writing registers, mucking with data structures
on the CPU side, etc.

The point is that one wants to minimize the footprint of things that
the interrupt handler touches directly, because anything that an
interrupt handler can touch needs to be protected, reentrant, blah blah
blah.  One does it when one must, but one keeps it to a minimum.

It gets more fun with a hierarchy of interrupts, so that interrupt
handlers themselves can be interrupted.
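
A minimal C sketch of the "set a flag and get out" pattern.  The
handler name, the flag, and the counter are hypothetical, and
sig_atomic_t is only an assumption about what is safe to share with
an interrupt handler; a real platform may need something else.

```c
#include <signal.h>
#include <stdbool.h>

/* The only state shared between interrupt level and main level. */
static volatile sig_atomic_t fpga_event_pending = 0;

/* Stand-in for the real recovery work; counts events handled at
 * main program level. */
static unsigned handled_events = 0;

/* Hypothetical interrupt handler: note the event and leave. */
void fpga_irq_handler(void)
{
    fpga_event_pending = 1;
}

/* Called from the main loop.  Returns true if an event was handled. */
bool poll_fpga_event(void)
{
    if (!fpga_event_pending)
        return false;

    /* Clear the flag before doing the work, so an interrupt that
     * arrives while we are busy is seen on the next poll. */
    fpga_event_pending = 0;

    /* Register round trips, data structure updates, and so on
     * would go here, outside interrupt context. */
    handled_events++;
    return true;
}
```

The handler touches exactly one variable, so that variable is the
only thing that needs protecting.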

> But let's not digress. The important thing is if we are in agreement
> design wise. And if I'm understanding it, it is this:
> 
> (1) We remove all error signals from the core APIs. Instead API access
> violations are logged internally by setting an error bit in the status
> register. Which bit to use needs to be defined across the cores. Since
> we have:
> 
> STATUS_READY_BIT = 0;
> STATUS_VALID_BIT = 1;
> 
> We could for example select 2 for error. And we also reserve some part
> of the status register bit space, [24..31] for example, for individual
> cores to specify different error conditions not related to access
> violations. (see example below).
> 
> 
> (2) Cores that have possible internal error conditions not directly
> related to API access problems should have a separate, one bit wide
> error signal out. This signal will then be connected to a utility core
> that muxes error signals onto an external pin, provides status info
> about which error event came from which core etc.

I think the issue is less whether such errors are connected with the
API usage/access/...  than whether they're synchronous with respect to
the API.  TRNG breaking in the middle of the night is asynchronous, so
it needs to interrupt.

> As an example let's say that the TRNG has separate health monitors in
> each entropy provider as well as the output of the csprng. So with three
> entropy providers the TRNG would have four different error bits in the
> status register.
> 
> Suddenly the rosc based entropy source monitor detects that the entropy
> source is dead. The TRNG sets its defined "BAD_ENTROPY_SOURCE2" bit in
> its status register and pulls the error signal.
> 
> SW would get the IRQ and very soonish would read the appropriate
> register in the utility core to figure out that the IRQ was caused by
> a problem in the TRNG. SW then reads the status register in the TRNG and
> finds out that entropy source 2 is b0rked. SW can then decide to disable
> this source or stop allowing any further use of the TRNG etc.
> 
> The API error bit in the core would in this case never be set since no
> access violation has occurred.

Yep.
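
For concreteness, a C sketch of that lookup as main-level code might
see it, with the registers modelled as a struct.  READY, VALID, error
bit 2, and the [24..31] range come from the proposal quoted above;
the position of the BAD_ENTROPY_SOURCE2 bit, the utility core's
source register, and every other name are hypothetical.

```c
#include <stdint.h>

/* Status word layout from the proposal. */
#define STATUS_READY_BIT      0u
#define STATUS_VALID_BIT      1u
#define STATUS_ERROR_BIT      2u    /* API access violation */
#define STATUS_CORE_ERR_SHIFT 24u   /* [24..31]: core-specific errors */
#define STATUS_CORE_ERR_MASK  (0xFFu << STATUS_CORE_ERR_SHIFT)

/* Hypothetical assignments, for illustration only. */
#define TRNG_BAD_ENTROPY_SOURCE2_BIT 26u
#define UTIL_SRC_TRNG_BIT            0u

typedef struct {
    uint32_t util_error_source;   /* utility core: which core raised the error */
    uint32_t trng_status;         /* TRNG status register */
} fpga_regs_t;

/* Run after the IRQ has been noted.  Returns the TRNG's
 * core-specific error bits, or 0 if the TRNG was not the source. */
uint32_t diagnose_trng(const fpga_regs_t *r)
{
    if (!(r->util_error_source & (1u << UTIL_SRC_TRNG_BIT)))
        return 0;
    return r->trng_status & STATUS_CORE_ERR_MASK;
}
```

In the dead-rosc scenario the result would have only the
BAD_ENTROPY_SOURCE2 bit set, and STATUS_ERROR_BIT in the TRNG status
word would still be clear.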

