[Cryptech Tech] NaCL in hardware

Pavel Shatov meisterpaul1 at yandex.ru
Tue Sep 29 09:28:03 UTC 2015


On 29.09.2015 11:04, Joachim Strömbergson wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
>
> Aloha!
>
> Peter Stuge wrote:
>> Pavel Shatov wrote:
>>> in nowadays FPGAs we have dedicated hardware multipliers, that
>>> come at no cost. Yes, they are vendor-specific, and I obviously
>>> cannot decide for the whole team, whether we can use them or not.
>>> But my personal view is that as long as we clearly document, that
>>> in this particular module we are using vendor-specific multiplier,
>>> and provide generic replacement module for simulation, it's OK.
>>
>> I don't think it's OK at all to restrict the usefulness of cryptech
>> modules so severely, making it impossible to use them with any other
>> hardware than one particular hardware from one particular vendor.

I don't think our modules are impossible to use on different hardware.
To use them with a different device, the vendor-specific wrappers will
need to be ported, but that applies to virtually any HDL code out
there. Even with a completely vendor-independent module, we still need
to generate a bitstream from it, and that process requires constraints,
which are always device- and vendor-specific.

> Not only multipliers, but DSP functionality that are fairly complex
> black boxes. We should clearly document where these black boxes are. And
> if possible either also provide a generic implementation (not only
> simulation) module or an implementation that infers the correct macro
> instead of instantiation.

I agree that the DSP slices in Spartan-6 are fairly complex, but they
are not black boxes. Xilinx offers a user guide that describes their
internal structure and modes of operation:

http://www.xilinx.com/support/documentation/user_guides/ug389.pdf

So far we use a DSP-based 32x32-bit multiplier in the modular
exponentiation core. This multiplier can also be synthesized using only
generic fabric resources; it requires 1088 LUTs in that case, which is
about 4% of Novena's FPGA. The modular exponentiation core contains
five instances of this multiplier (5 x 1088 = 5440 LUTs for the
multipliers alone), so without DSP slices the core would occupy more
than 20% of the FPGA. I remember that the minimum set of cores (besides
ModExp) that we wanted to show off in Prague occupied 77% of the FPGA.
I'm afraid that without DSP-based multipliers we wouldn't have been
able to show anything at all. I don't think we are restricting people.
If they trust the DSP slices from Xilinx, they can use the hardware
multiplier; if they don't, they can use the generic replacement, for
which we don't have enough space right now.

Speaking of the generic replacement, a 32x32-bit multiplier is 1 LoC:

always @(posedge clk) c[63:0] <= a[31:0] * b[31:0];

The fun fact is that, depending on the current phase of the moon (or
the weather on Mars), ISE may still recognize that your generic wrapper
is a multiplier and shove in a DSP slice during synthesis without
explicitly telling you. The default implementation strategy has the
"Use DSP Block" option set to "auto", and I think very few people know
for sure how exactly this "auto" works. Maybe it's documented in the
XST user guide, I don't know. What I'm trying to say is that even if
you don't explicitly use any vendor-specific stuff, synthesis tools may
still use vendor-specific features during implementation. Speaking of
multipliers, if you think that the DSP slices from Xilinx are
backdoored, you can force ISE to avoid them altogether by using the
(* USE_DSP48="NO" *) attribute. Yes, a vendor-specific attribute to
disable vendor-specific stuff :)
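
A rough sketch of how the two pieces above could be combined, assuming
XST picks up the attribute when it is placed on the module declaration.
The module name generic_mult32 and its port list are made up for
illustration and are not taken from the actual Cryptech sources:

```verilog
// Hypothetical generic 32x32-bit multiplier wrapper. The module and
// port names are illustrative only.
//
// use_dsp48 is an XST-specific synthesis attribute; simulators and
// other vendors' tools are expected to ignore it.
(* use_dsp48 = "no" *)
module generic_mult32
  (
   input  wire        clk,
   input  wire [31:0] a,
   input  wire [31:0] b,
   output reg  [63:0] c
   );

   // Registered full-width product. Because the left-hand side is
   // 64 bits wide, the multiplication is evaluated at 64 bits and
   // no upper bits of the product are lost.
   always @(posedge clk)
     c <= a * b;

endmodule
```

With the attribute set to "no" the multiplier should end up in LUTs and
carry logic; setting it to "yes" or leaving it out would let XST decide
to map it onto DSP slices again. It is worth checking the synthesis
report either way, since that is the only place the tool tells you what
it actually did.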

> Having separate implementation and simulation modules means that we are
> in fact not simulating what we are implementing. In most projects this
> is ok, but we have more tinfoil on than most projects.
>
> Also there is still a question about licensing of the source code.

--
With best regards,
Pavel Shatov

