[Cryptech Tech] RSA blinding

Pavel Shatov meisterpaul1 at yandex.ru
Tue Jul 4 22:23:40 UTC 2017

03.07.2017 11:48, Peter Gutmann пишет:
> Rob Austein <sra at hactrn.net> writes:
>> We've left RSA blinding enabled unconditionally in all cases for now, out of
>> paranoia, but would be interested in opinions from wider heads about how
>> necessary this really is.
> Definitely a good idea in any case.  Most of the side-channel attacks require
> repeated sampling of an operation and then statistical analysis to break out
> the details of interest, with randomisation each time you're making that a lot
> harder.
> Russ Housley <housley at vigilsec.com> writes:
>> If the Verilog is constant-time and constant-power-consumption, then the
>> major side channels are protected.  
> I don't think anyone's ever managed to do a constant-time, constant-power,
> constant-EMI, constant-* implementation of something like that have they?  You
> occasionally get conference papers demonstrating some new side-channel-
> analysis-resistant implementation, but then the following year at the same
> conference you get another paper un-demonstrating it.
> The thing is, for hardware you don't actually need to have a side-channel-
> resistant implementation.  For software you do because the person writing the
> conference paper can give themselves whatever privileges they need to perform
> the attack (hostile code running on the same CPU with insight into the cache
> or memory access patterns or whatever), but with an HSM you define how far
> they can go, and that's the outside of the HSM.  
> There's a twenty-year-old HSM, IBM's 4758, that was resistant to pretty much
> all of the side-channel attacks that came along after it was developed, not
> because the developers were magically aware of them but because they used good
> engineering practice, power supply decoupling, filtering, etc (I asked them
> about how they managed it and that was their explanation, we designed it
> properly from the outset).
> So worst case all you need to worry about is timing attacks, which is what you
> have blinding for.  If you're still worried then given that you're building an
> HSM rather than a crypto accelerator, so security is more important than
> throughput, quantise the production of results.  That can eventually be
> defeated too with enough samples, but since you're blinding as well you're
> making it pretty difficult for the attacker.
> Peter.
> _______________________________________________
> Tech mailing list
> Tech at cryptech.is
> https://lists.cryptech.is/listinfo/tech


I'm currently in the process of re-designing our ModExp FPGA core. The
primary reason behind this is that our RSA is currently, well, painfully
slow. The software implementation for the STM32 processor has little
potential for improvement, because its performance is ultimately limited
by instruction throughput of the processor. The hardware implementation
is currently on par with the software, but it doesn't exploit all of the
features offered by field-programmable architecture.

Rob and I had an off-list discussion about what can be done to improve
the situation. The obvious thing to do is to convert the FPGA core to
systolic architecture, and that's what I've been doing lately. Getting
rid of blinding was one of the things we considered. As far as I
understand, the question is whether we believe the hardware has no
side-channel leakages to allow removal of blinding. Speaking of timing,
Verilog easily allows true constant-time by design implementation. There
are no cache hits and misses, because there's simply no cache, and
there's no instruction pipeline that can execute operations out of
order, and so on. I can guarantee, that for given operand width and
exponent length ModExp does everything in exactly the same number of
clock cycles.

Speaking of power consumption, as far as I understand, if dummy
multiplications are done for zero bits of the exponent, then major power
fluctuations are eliminated. I can only think of one problem, Verilog
does the following during exponentiation under the hood:

1. R = 1
2. P = M
for i = 0 to k-1
  3. R0 = R
  4. R1 = R * P
  5. P  = P * P
  6. R = E[i] ? R1 : R0

In step 6. block memory R is either overwritten with its own contents
(that changes no bits) or with a different value that on average changes
half of the bits. I have a feeling that those will have different power

I agree, that it's most likely impossible to create a
constant-everything exponentiation design. FPGA allows us to have true
constant-time design, and with careful coding we can keep power
fluctuations as small as possible, but I guess no one can guarantee,
that tomorrow someone doesn't find some new side-channel leakage. In
that sense I tend to support Rob's view of keeping blinding intact just
out of paranoia.

Speaking of that left-to-right scanning vulnerability, the current
hardware ModExp doesn't use sliding windows. It has a control bit, that
either tells the core to skip all multiplications where bits of the
exponent are not set ("public" mode for encryption) or do all the
multiplications and discard the result where bits are not set ("private"
mode for signing).

Note that while left-to-right and right-to-left scanning is equivalent
on paper, the latter one is more suitable for hardware. The inner loop
when going left-to-right looks like:

1. C = C * C
2. if E[i] then C = C * M

Note, that 1. and 2. can only be done sequentially. Going right-to-left
needs an extra temporary variable to hold the current power of M, but
allows 1. and 2. to be done simultaneously:

1. if E[i] then C = C * M
2. M = M * M

Given that we have an FPGA, two independent modular multipliers can be
instantiated, that will be able to do squaring and multiplication at the
same time for 2x performance increase. The hardware core currently does
not do this, because at the time I wrote it we only had dev-bridge board
with much smaller FPGA. The newer systolic version for the Alpha board
will take advantage of this trick.

As far as I understand, the fundamental limit for exponentiation is that
for k-bit exponent one must do k squarings. In that sense when doing
multiplication and squaring at the same time, there's no point in trying
to skip multiplication, because it will just not give any speed
increase. Moreover not skipping dummy multiplications should keep power
fluctuations at minimum.

With best regards,
Pavel Shatov

More information about the Tech mailing list