[Cryptech Tech] Seeking comments on a proposal for changes to the Cryptech RNG design.

Pavel Shatov meisterpaul1 at yandex.ru
Wed Mar 28 13:07:38 UTC 2018

22.03.2018 14:52, Joachim Strömbergson wrote:
> Aloha!

Hi Joachim, sorry for the late reply.

> I'm considering changing part of the Cryptech RNG and would like
> feedback on it. The note below tries to explain why, how etc.
> Replacing the mixer in the Cryptech RNG
> =======================================
> 1. Intro:
> This is a short note about a possible change of the mixer stage in the
> Cryptech RNG. The note describes the current solution, what we
> want to change and why. It presents three possible solutions and
> then discusses the security implications.
> 2. Current architecture:
> The random number generator in Cryptech [1] currently consists of a
> chain comprising three major stages:
> 1. A set of separate entropy sources. Currently one entropy source
>     external to the FPGA and one internal entropy source.
> 2. A stateful entropy mixer that generates seeds.
> 3. A deterministic random bit generator (DRBG)
> Between stage 2 and 3 there is also a FIFO capable of storing a few
> output words from the mixer. There are three reasons for the mixer:
>    1. Ensure that the entropy collected from the sources is mixed, so
>    that bias in one source cannot dominate or otherwise affect the
>    generated seed.
>    2. Ensure that an attack on the entropy sources at a given time does
>    not propagate to the seeds generated. This is because the seeds
>    generated depend not only on the entropy collected at a given
>    time, but also on an internal state.
>    3. Decouple the entropy collection and seed generation from the random
>    number generation. This allows the seed generation to continue
>    independently, updating the internal state. And when the random bit
>    generator is to be reseeded the seeds needed are ready.
> The last reason is for both security and efficiency. The
> decoupled mixer allows us to reseed with very little effect on the
> random number generation performance. And we probably want to reseed
> more frequently than we do today [1].
> The stateful entropy mixer consumes 32-bit words from the entropy
> sources selected in round-robin order. The mixer treats the words as
> being part of a continuous message. The words are combined into message
> blocks and hashed using SHA-512 (NIST FIPS 180-4) one block at a
> time. The output from the mixer is the updated message digest given by
> the consumed block and the previous hash state. This means that the
> output depends not only on the consumed entropy words in a given block,
> but also on the digest state from all previous message blocks.
> The mixer message block size is 1024 bits. The internal state is 512
> bits and the output is 512 bits.
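
To make the statefulness concrete, here is a rough software model of the mixer as described above. It is only a sketch: `Mixer`, `push_word` and `run` are names I made up, not taken from the RTL, and hashlib returns a padded digest of everything absorbed so far rather than the raw SHA-512 chaining value the hardware would output. The dependence on all earlier blocks is the same, though.

```python
import hashlib
import struct

BLOCK_BYTES = 128  # one 1024-bit SHA-512 message block


class Mixer:
    """Toy model: absorb 32-bit words, emit one 512-bit output per block."""

    def __init__(self):
        self._hash = hashlib.sha512()   # running state over the whole message
        self._pending = b""             # words that do not yet fill a block

    def push_word(self, word):
        """Absorb one 32-bit word; return 64 output bytes when a block fills."""
        self._pending += struct.pack(">I", word & 0xFFFFFFFF)
        if len(self._pending) < BLOCK_BYTES:
            return None
        self._hash.update(self._pending)
        self._pending = b""
        # digest() does not disturb the running state, so later blocks
        # still chain on everything absorbed before them.
        return self._hash.digest()


def run(sources, n_words):
    """Feed n_words words, taking them from the sources in round-robin order."""
    mixer, outputs = Mixer(), []
    for i in range(n_words):
        out = mixer.push_word(next(sources[i % len(sources)]))
        if out is not None:
            outputs.append(out)
    return outputs
```

Feeding two identical blocks in a row gives two different outputs, which is the property reason 2 relies on.
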
> The extracted 512-bit words are used in pairs to seed the DRBG. The DRBG
> is the stream cipher ChaCha with 256 bit key, 64 bit IV, 64 bit counter
> initial value and 512 bit input data block. In total 896 bits are used
> to seed the DRBG to its initial state.
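
For reference, the arithmetic behind the 896 bits, assuming the data block is a full 512-bit ChaCha block (256 + 64 + 64 + 512 = 896). The field order and the name `split_seed` are my own illustration; the actual RTL may slice the two digests differently.

```python
KEY_BITS, IV_BITS, CTR_BITS, BLOCK_BITS = 256, 64, 64, 512
SEED_BITS = KEY_BITS + IV_BITS + CTR_BITS + BLOCK_BITS  # 896


def split_seed(digest_a, digest_b):
    """Carve key, IV, counter and data block out of two 64-byte mixer outputs."""
    material = digest_a + digest_b            # 1024 bits available
    assert len(material) * 8 >= SEED_BITS
    fields, offset = {}, 0
    for name, bits in (("key", KEY_BITS), ("iv", IV_BITS),
                       ("ctr", CTR_BITS), ("block", BLOCK_BITS)):
        fields[name] = material[offset:offset + bits // 8]
        offset += bits // 8
    return fields                             # the last 128 bits stay unused
```
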
> 3. Problem definition:
> There are two drivers for the change we want to make:
>    1. Decrease the FPGA resources used by the RNG.
>    2. Remove a major obstacle for running the Cryptech FPGA at 100 MHz.
> SHA-512 is a big core. In Xilinx Artix-7 the core requires 1575 slices
> and 3918 registers. This is bigger than the ChaCha core, which requires
> 1076 slices and 1949 registers.
> The SHA-512 core can be clocked at 96 MHz in Artix-7. This is just below
> the target frequency of 100 MHz, and to meet that clock comfortably we
> would need a margin. Getting that margin would require a redesign
> (pipelining of the round function), which would basically halve the
> performance and thus nullify anything gained by the increased clock speed.

I was wondering where exactly in SHA-512 the timing checks fail. I bet
it's in the carry chains of the 64-bit adders. One thing you can try is
replacing them with 32-bit adders that take 2 clock cycles to compute
the entire 64-bit sum. I mean, suppose that one needs to calculate
s[63:0] = x[63:0] + y[63:0]; then during the first cycle one does:

// cycle 1: add the lower halves, register the upper halves
x_msb_dly <= x[63:32];
y_msb_dly <= y[63:32];
{carry_int, sum_lsb} <= x[31:0] + y[31:0];

and during the second cycle one does:

// cycle 2: add the upper halves plus the carry from cycle 1
sum_lsb_dly[31:0] <= sum_lsb;
sum_msb <= x_msb_dly + y_msb_dly + carry_int;

The resulting sum is {sum_msb, sum_lsb_dly}. This way we may be able to
keep SHA-512 in place, because SHA-256 with 32-bit adders works fine at
100 MHz, as far as I understand. I don't know whether this change is
viable; you definitely have a better understanding of your core's
internals. Do you have any idea how difficult it would be to pipeline
SHA-512? I agree with Rob that we should most probably go with whichever
option is easier: rewriting the core or replacing it altogether.
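
As a sanity check of the split-adder idea, here is a small behavioural model in Python. It reuses the signal names from the fragments above; `add64_two_cycle` and `check` are just names I picked, and the model says nothing about timing, only that the two-step arithmetic gives the same result as a plain addition modulo 2^64.

```python
import random

MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def add64_two_cycle(x, y):
    """Model the split adder: low half in cycle 1, high half plus carry in cycle 2."""
    # Cycle 1: add the low words, register the high words for later.
    x_msb_dly = (x >> 32) & MASK32
    y_msb_dly = (y >> 32) & MASK32
    low = (x & MASK32) + (y & MASK32)
    carry_int, sum_lsb = low >> 32, low & MASK32

    # Cycle 2: add the delayed high words and the carry.
    # The carry out of the high half is dropped (addition modulo 2**64).
    sum_lsb_dly = sum_lsb
    sum_msb = (x_msb_dly + y_msb_dly + carry_int) & MASK32

    return (sum_msb << 32) | sum_lsb_dly


def check(trials=10000, seed=1):
    """Compare against a plain 64-bit modular add on corner and random inputs."""
    rng = random.Random(seed)
    cases = [(0, 0), (MASK32, 1), (MASK64, 1), (MASK64, MASK64)]
    cases += [(rng.getrandbits(64), rng.getrandbits(64)) for _ in range(trials)]
    return all(add64_two_cycle(x, y) == (x + y) & MASK64 for x, y in cases)
```

Note that this only checks the arithmetic. In the SHA-512 round function several additions are chained (T1 alone is a sum of five terms), so where the extra register stage goes would still have to be worked out.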

> 4. Possible solutions:
> I see three possible solutions:
> 4.1 Replace with SHA-256
> Cryptech has a SHA-256 core in the repo. The core is well tested and
> used in several designs besides Cryptech. The SHA-256 core
> requires 688 slices and 1929 registers, basically halving the size of
> the mixer.
> The difference in block size and digest size requires some
> adaptations. Using SHA-256 would require generating double the number of
> digests to seed ChaCha. The block size would change, but
> the rate of entropy consumption would stay the same. The internal state
> would decrease from 512 to 256 bits.
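
The "double the number of digests" figure follows directly from the 896-bit seed size. A trivial sketch (the helper name `digests_needed` is mine):

```python
import math

SEED_BITS = 896  # total DRBG seed size quoted in the note


def digests_needed(digest_bits, seed_bits=SEED_BITS):
    """Number of whole mixer outputs required to cover one DRBG seed."""
    return math.ceil(seed_bits / digest_bits)


sha512_count = digests_needed(512)   # 2 digests, 1024 bits of material
sha256_count = digests_needed(256)   # 4 digests, 1024 bits of material
```
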
> 4.2 Replace with Blake2s
> The Blake2s hash function is much faster than SHA-256. In other
> respects the changes in block size and internal state would be the same as
> for SHA-256. I have a partial Blake2s core that could be completed if
> this is the direction chosen. Blake2s also supports keyed hashing,
> which would allow per-device unique mixing.
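
Python's hashlib exposes the BLAKE2s key parameter, so the per-device idea is easy to try in software. This is only an illustration: `mix_block` and the two key values are made up, and how a real device key would be provisioned is a separate question.

```python
import hashlib


def mix_block(entropy_block, device_key):
    """Keyed BLAKE2s over one 64-byte block; device_key may be up to 32 bytes."""
    return hashlib.blake2s(entropy_block, key=device_key, digest_size=32).digest()


block = bytes(64)                          # identical (all-zero) entropy input
out_a = mix_block(block, b"device-A-key")  # hypothetical key of device A
out_b = mix_block(block, b"device-B-key")  # hypothetical key of device B
```

The same input block mixes to different outputs on the two "devices".
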
> 4.3 Replace with SipHash
> SipHash [3] is a short message PRF. The block size is 64 bits and
> SipHash generates digests of 128 bits. SipHash is really compact in
> hardware and is also very fast. SipHash is used as a hash
> function in FreeBSD, OpenDNS, Perl, Ruby, and Rust, among others.
> I have a HW implementation of SipHash [4]. That core requires 432 Slices
> and 789 registers in Xilinx Artix-7. The core is capable of running at
> 173 MHz. In terms of size and clock speed, this core is really
> attractive.
> With SipHash the mixer would have to change quite a bit more. The
> solution I have in mind is to:
>    I. Process a total of 256 bits of entropy as a separate message and
>       generate a 128 bit digest for that message. This means that each
>       message is four blocks. Each block contains one 32-bit word from
>       each entropy source. No round-robin collector is needed.
>   II. Every odd message is used to generate a 128 bit key. The key is
>       used to generate the digest for the next message. The digest of
>       every even message (generated with the key from the previous
>       message) is used as seed for the CSPRNG.
> This means that the 128 bit key is the internal state of the mixer.
> The FIFO between the mixer and the CSPRNG could also decrease in size
> since the seed word size would decrease from 512 to 128 bits.
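
If I read steps I and II correctly, the chaining looks roughly like the sketch below. Several things here are my assumptions: the Python standard library has no SipHash API, so keyed BLAKE2s truncated to 128 bits stands in for it (it is NOT SipHash); the odd message is hashed under the current key; and the initial key is all zeros. `prf128` and `ChainedMixer` are made-up names.

```python
import hashlib


def prf128(key, message):
    """Stand-in 128-bit keyed PRF (BLAKE2s here, not SipHash)."""
    return hashlib.blake2s(message, key=key, digest_size=16).digest()


class ChainedMixer:
    """Odd messages refresh the key; even messages produce seed words."""

    def __init__(self, initial_key=bytes(16)):
        self.key = initial_key      # the 128-bit key is the whole mixer state
        self.count = 0

    def absorb(self, message):
        """Take one 256-bit message; return a 128-bit seed word or None."""
        assert len(message) == 32
        self.count += 1
        digest = prf128(self.key, message)
        if self.count % 2 == 1:
            self.key = digest       # odd message: digest becomes the next key
            return None
        return digest               # even message: digest goes to the seed FIFO
```

Only every second message yields a seed word, so half of the collected entropy goes into the key rather than directly into the output.
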
> 5. Discussion:
> - From a technical point of view it would be quite easy to replace the
> SHA-512 core with another hash core (SHA-256 or Blake2s), or even
> SipHash. The difference in block size and digest size requires some
> adaptations. Using SHA-256 for example would require generating double
> the number of digests to seed ChaCha. The block size would change, but
> the rate of entropy consumption would stay the same.
> The SipHash based solution would be the more radical change. It would
> provide the biggest reduction in FPGA resources used. It would also
> change and simplify entropy collection.
> The big question is how the different solutions affect the security of
> the RNG chain. As I see it, this mainly comes down to the size of the internal
> state. SHA-256 has an internal state of 256 bits. Blake2s in comparison
> has an internal state of 512 bits. With SipHash the state is given by
> the key and would be 128 bits.
> - From a high level perspective, we consume a large number of entropy bits
> to end up with a ChaCha key of 256 bits (we also set the IV and the initial
> value of the counter), but the security of the CSPRNG is in the key. With
> SHA-512, SHA-256 or Blake2s the internal state is bigger than or equal
> to 256 bits. With SipHash the internal state is 128 bits.
> Another question with SipHash is if the 128 bit digests can be
> considered independent enough to be concatenated into a 256 bit key?
> Should we replace SHA-512 with something else? If so, under which
> conditions? What core should we use instead?
> I personally think that Blake2s would be interesting, but that would
> require more work than SHA-256. SipHash is very tiny and very fast, but
> it feels like a bigger change security-wise.
> 6. Other changes considered
> I have a list of other changes that we might want to consider making in
> the near term.
> 1. Feedback from CSPRNG to mixer. Basically extract the first N words
> from the CSPRNG and send back to the mixer as a third entropy
> source. These N words would not be provided to the applications. This
> adds another internal state that makes it harder for an attacker to
> control.
> 2. Change the seed FIFO into a mutator queue. Instead of simply holding a
> number of seed words and signal to the CSPRNG that there are seed words
> available, the queue would act as a mutator ring buffer. The mixer can
> keep on inserting words and inserted words are XOR:ed into the seed
> words stored in the buffer.
> 3. Add entropy monitors to the entropy providers. Currently the entropy
> providers perform warm-up before providing words. But the quality (or
> lack thereof) of the entropy generated is not checked. The idea,
> proposed long ago, is to add some AIS31 [5] tests to detect that the
> entropy sources at least are not totally broken, and to react if they
> are.
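
Regarding item 2, this is how I understand the mutator queue, as a small software model. `MutatorQueue`, `insert` and `ready` are names of my own, and the "ready once every slot was written" rule is an assumption. What happens to a slot when the CSPRNG consumes it is not specified in the note, so I left that out.

```python
class MutatorQueue:
    """Ring buffer of seed words; new words are XOR:ed into existing slots."""

    def __init__(self, depth, word_bytes=64):
        self.slots = [bytes(word_bytes) for _ in range(depth)]
        self.write_idx = 0
        self.filled = 0             # slots written at least once

    def insert(self, word):
        """XOR a mixer output into the next slot; the mixer never has to stall."""
        old = self.slots[self.write_idx]
        assert len(word) == len(old)
        self.slots[self.write_idx] = bytes(a ^ b for a, b in zip(old, word))
        self.write_idx = (self.write_idx + 1) % len(self.slots)
        self.filled = min(self.filled + 1, len(self.slots))

    def ready(self):
        """True once every slot has received at least one word."""
        return self.filled == len(self.slots)
```

Compared to a plain FIFO, the mixer keeps running when the buffer is full, and each stored seed word accumulates more than one mixer output.
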
> 7. References
> [1] Cryptech project. The Cryptech TRNG repository.
>      https://trac.cryptech.is/browser/core/rng/trng
> [2] Bernstein, D. J. "Fast-key-erasure random-number generators"
>      2017-07-23. https://blog.cr.yp.to/20170723-random.html
> [3] Aumasson, J.-P., Bernstein, D. J. SipHash: a fast short-input PRF.
>      https://www.131002.net/siphash/
> [4] Strömbergson, J. SipHash Verilog Implementation.
>      https://github.com/secworks/siphash
> [5] Bundesamt für Sicherheit in der Informationstechnik (BSI). "A
>      proposal for: Functionality classes and evaluation methodology for
>      true (physical) random number generators"
> https://www.bsi.bund.de/SharedDocs/Downloads/DE/BSI/Zertifizierung/Interpretationen/AIS_31_Functionality_classes_evaluation_methodology_for_true_RNG_e.pdf?__blob=publicationFile
> -- 
> Med vänlig hälsning, Yours
> Joachim Strömbergson - Assured AB

With best regards,
Pavel Shatov
