[Cryptech Tech] NaCL in hardware
Pavel Shatov
meisterpaul1 at yandex.ru
Tue Sep 29 09:28:03 UTC 2015
On 29.09.2015 11:04, Joachim Strömbergson wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
>
> Aloha!
>
> Peter Stuge wrote:
>> Pavel Shatov wrote:
>>> in nowadays FPGAs we have dedicated hardware multipliers, that
>>> come at no cost. Yes, they are vendor-specific, and I obviously
>>> cannot decide for the whole team, whether we can use them or not.
>>> But my personal view is that as long as we clearly document, that
>>> in this particular module we are using vendor-specific multiplier,
>>> and provide generic replacement module for simulation, it's OK.
>>
>> I don't think it's OK at all to restrict the usefulness of cryptech
>> modules so severely, making it impossible to use them with any other
>> hardware than one particular hardware from one particular vendor.
I don't think our modules are impossible to use on different
hardware. To use them with a different device, the vendor-specific
wrappers will need to be ported, but that applies to virtually any HDL
code out there. Even with a completely vendor-independent module,
we still need to generate a bitstream from it, and that process requires
constraints, which are always device- and vendor-specific.
> Not only multipliers, but DSP functionality that are fairly complex
> black boxes. We should clearly document where these black boxes are. And
> if possible either also provide a generic implementation (not only
> simulation) module or an implementation that infers the correct macro
> instead of instantiation.
I agree that the DSP slices in Spartan-6 are fairly complex, but they are
not black boxes: Xilinx offers a user guide that describes their
internal structure and modes of operation:
http://www.xilinx.com/support/documentation/user_guides/ug389.pdf
So far we use a DSP-based 32x32-bit multiplier in the modular
exponentiation core. This multiplier can also be synthesized using only
generic fabric resources; in that case it requires 1088 LUTs, which is
about 4% of Novena's FPGA. The modular exponentiation core contains five
instances of this multiplier, so without DSP slices the core would occupy
more than 20% of the FPGA. I remember that the minimum set of cores
(besides ModExp) that we wanted to show off in Prague occupied 77% of the
FPGA. I'm afraid that without DSP-based multipliers we wouldn't have been
able to show anything at all. I don't think we are restricting people. If
they trust the DSP slices from Xilinx, they can use the hardware
multiplier; if they don't, they can use the generic replacement, for
which we don't have enough space right now.
Speaking of the generic replacement, a 32x32-bit multiplier is 1 LoC:
always @(posedge clk) c[63:0] <= a[31:0] * b[31:0];
The fun fact is that, depending on the current phase of the moon (or the
weather on Mars), ISE may well figure out that your generic wrapper is a
multiplier and still shove in a DSP slice during synthesis without
explicitly telling you. The default implementation strategy has the "Use
DSP Block" option set to "auto", and I think very few people know for
sure how exactly this "auto" works. Maybe it's documented in the XST user
guide, I don't know. What I'm trying to say is that even if you don't
explicitly use any vendor-specific stuff, synthesis tools may still use
vendor-specific features during implementation. Speaking of multipliers,
if you think that the DSP slices from Xilinx are backdoored, you can force
ISE to avoid them altogether by using the (* USE_DSP48="NO" *) attribute.
Yes, a vendor-specific attribute to disable vendor-specific stuff :)
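To make this concrete, here is a rough sketch of how the one-line
multiplier and the attribute could be combined into a fabric-only wrapper.
The module name mul32x32_generic, the port names, and the placement of the
attribute on the module declaration are assumptions made for illustration;
this is not the actual Cryptech wrapper, and the exact placement ISE
expects should be checked against the XST user guide. The small testbench
is likewise hypothetical and only checks one product in simulation.

```verilog
// Hypothetical fabric-only 32x32-bit multiplier (names are illustrative).
// The attribute asks XST not to map the multiplication onto DSP48 slices;
// simulators and other vendors' tools are expected to ignore it.
(* USE_DSP48 = "NO" *)
module mul32x32_generic (
    input  wire        clk,
    input  wire [31:0] a,
    input  wire [31:0] b,
    output reg  [63:0] c
);
    // The 64-bit left-hand side widens the expression, so the full
    // 64-bit product is kept rather than only the low 32 bits.
    always @(posedge clk)
        c <= a * b;
endmodule

// Hypothetical self-checking testbench, for simulation only.
module mul32x32_generic_tb;
    reg         clk = 1'b0;
    reg  [31:0] a, b;
    wire [63:0] c;

    mul32x32_generic dut (.clk(clk), .a(a), .b(b), .c(c));

    always #5 clk = ~clk;

    initial begin
        a = 32'hFFFF_FFFF;
        b = 32'hFFFF_FFFF;
        @(posedge clk);  // the product is registered on this edge
        #1;              // let the non-blocking update settle
        // (2^32 - 1)^2 = 2^64 - 2^33 + 1
        if (c !== 64'hFFFF_FFFE_0000_0001)
            $display("FAIL: c = %h", c);
        else
            $display("PASS");
        $finish;
    end
endmodule
```

Since the only vendor-specific part is the attribute, the same source
serves for both simulation and implementation, which goes some way towards
the concern about not simulating what we implement.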
> Having separate implementation and simulation modules means that we are
> in fact not simulating what we are implementing. In most projects this
> is ok, but we have more tinfoil on than most projects.
>
> Also there is still a question about licensing of the source code.
--
With best regards,
Pavel Shatov