[Cryptech Core] modexp optimization plans (was: Re: teleconf?)

Joachim Strömbergson joachim at secworks.se
Thu Jun 18 12:46:39 UTC 2015


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Aloha!

Randy Bush wrote:
> o joachim, can anybody provide useful help with modexp and
> underlying bignum

There isn't really a problem with modexp or bignum functionality wise as
I see it. And I believe I have a good grasp of the operation as such.
The problem is getting it to run much, much faster. After reading up on
several papers on high speed FPGA modexp, montprod and RSA
implementations, I'm fairly certain that the architecture we have is
correct, but it needs to be optimized.

Optimization is basically  a matter of hard work and loads of testing to
make sure that the baseline isn't affected. I have set up good test
cases that allows for fast testing while making sure that the
functionality isn't b0rked during changes.

The following is a short description on the work plan and optimization
strategy.

The montprod (montprod.v) module is the module that does all the hard
work. For real size operands (2048), we will do thousands of montprod
operations.

1. montprod optimization
Internally the montprod consists of two adders and a shifter that
iterates over the operands. The baseline we have today does each of
these operations separately.

Shaving off cycles in the montprod inner loop has great impact on the
performance gains without really adding hw resources. Yesterday I
competed fusing the adder iterations into a single iteration. This
removed 30+% of all cycles.

Right now I'm working on fusing the shifter into the adder iteration.
This will remove another 30% or so. The fusing requires a pipeline stage
which adds a single cycle/loop but the end result is a net loss of
cycles. Finally there are a few minor operations for init and ending in
montprod. Even a single cycle saved in montprod gives measurable
performance gains for real operand sizes.

But basically, fusing the big iterations, making a smarter init of the
internal s_memory and the montpod should be done in terms of cycle
optimization.


2. internal operand size
Currently montprod and modexp) operates on 32-bit words. Increasing the
size to 64 bits will halve the number of cycles in an iteration. The
current version of modexp can do 115 MHz in Spartan-6, and in
combination with test implementations, increasing operands to 64 bits is
possible, 128 bits is probably possible too.

Changing operands to a bigger size means changing a lot of wires and
operations as well as counters etc all over modexp and montprod. But the
functionality per se will be the same. A lot of details to fix, but
basically fairly easy. This change will increase the number of block
memories used in the FPGA. But we have a lot of those still.


3. parallel montprod operations
The montprod core is actually used in two different settings in modexp.
In the current version of modexp a single montprod is used and muxes are
used to allow sharing of the single instance. Instantiating a second
montprod will allow for some parallel processing and this increased
performance.

The cost for this is more logic and registers from a second montprod as
well as another block memory. Some of the increased logic will
compensated for by the removal of the logic (muxes) needed to share the
single montprod instance.


4. Minor tweaks
There are several possible minor tweaks possible to do. Some of them we
want to do anyway


4.1 Optimize modexp FSM. There are a few cycles to be gained here, but
will not gain a lot of performance.


4.2 Implement support for short exponent. Currently the size of the
exponent is ignored. This means that an operation with a public exponent
(such as 65537) takes as long time as if the exponent is as big as the
modulus. This fix is easy to do and will drastically reduce the time to
do operations with short exponents.


4.3 Fix MSW padding. In order to do correct calculations when MSB is set
in modulus, the operands needs to be padded with an extra word. This
means that for an operation with 1024 bit operands (i.e. 32 words), we
today needs to use 33 words. This is what for example Java BigInteger
does too, but is hidden from the user application.

The cost of padding with an extra word is that the processing time is
increased. So both because it makes modexp harder to use due to padding
requirements and to improve performance, the padding/high bit should be
handled internally in modexp and in in a smarter way.


4.4 operation skipping. In the current version of the core, we do all
adds and shifts even though the result is zero (i.e. mult with a bit
that is zero). We could basically halve the processing time by skipping
iterations that will be zero (half of the bits in the modulus for
example is probably zero.).

Doing operation skipping means introducing data/operand/key dependent
processing time and power consumption and is the source of typical side
channel attacks on RSA. If we want to implement operation skipping I
think it should be a compile time flag to allow an implementator to
enable this function if he/she thinks the application is ok with side
channel problems.


4.5 Revert to explicit operand memories. To make Rob happier.


The plan is to work on these changes in the order given above. Step one
is close to completion and I should have done that and at least started
changing operand sizes to 64 bits. When that works and the performance
gained has been verified I will quite probably increase operand size to
128 bits before moving to step three. (having learned what changes to do
in order to increase operand size, increasing again should be easy.).
Step 4.1 should be easy to do and could be done asap. But it will not
provide any gains for our primary use case. Step 4.4, if done at all
will be last.

But basically focus on optimizing where the big gains are, esp those
that don't increase resources or affects the API. Then cleanup optimize
the API.

I have been extremely bogged down with other things the last couple of
weeks, but have now completed everything else besides Cryptech before
summer. I have two more weeks of work before vacation and will fully
focus on Cryptech.

We could perhaps parallelize the work, but will run into the risk of
messing in each others changes. I feel confident that I can do these
changes and is up to speed.


Happy Midsummers Eve!
- -- 
Med vänlig hälsning, Yours

Joachim Strömbergson - Alltid i harmonisk svängning.
========================================================================
 Joachim Strömbergson          Secworks AB          joachim at secworks.se
========================================================================
-----BEGIN PGP SIGNATURE-----
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQIcBAEBCAAGBQJVgr2vAAoJEF3cfFQkIuyN8xUP/1sZmvAwEwtDFpHbAmrMK6vr
+KsJOn/gwkRVgU456/2yJN14JllmCWKsNYm5uOCXY/VtcwZIeeZDfaBYUOsbKlWq
uWFVRVA7XAdx9VgV7tSejZ5h8IlHs4RXBD7LiJUJbmGLBqMvsw+cknxnQ/0lr23r
Md03tC1bsnZaJ285RtFMfH+hHaiy0xinLIkt0sdUcZ/9Q7klBrkrFWAaz55AcUsd
az0KqbObNa0IiH8m8/qq303wQVHKZ2Vdjtks+FeXUdQzENXCSkGASNPNiJh3I+iA
tCfGKTC55JS1ixS6M2bTPyNYxH+QjgCqsana1K4vmRZqEw3P1Df7B06l8OfHTmNO
fkJIWQDygnraH0ru2Uc1ZTRQ+f4fa2r6VuH9+J792MzAbSD2jZw+JNCUYi/yn/yr
Y2CWkVP3oL0O/lvT2SZwYhXK57oHoUSTHQ560ssfUBfC4zM/eZ88fQNX85VfU4Nx
1luxeWS02gFf9iCoxrw/Q2smr3NdSPHZSoaebPIXylOXaU/TePO11f9vKKI/7MSJ
UhhN43PoyC0j6RcOjp2SIDq658s+8yp92JjcezfXQDSZ89Hb4uj0E2Su1ZTSs+cA
Pa4hkZGNcQh3pPpdqLf5jKw7vDterf18doYXrNmvQKxu1Vr4QQiylszTZs1vIZpm
3xTHulZtxA2a4TGUJOTI
=v6Ju
-----END PGP SIGNATURE-----



More information about the Core mailing list