[Cryptech Tech] NaCL in hardware

Pavel Shatov meisterpaul1 at yandex.ru
Wed Sep 23 09:15:59 UTC 2015


On 19.09.2015 13:41, Peter Schwabe wrote:
> Joachim Strömbergson <joachim at secworks.se> wrote:
>
>> Aloha!
>
> Dear all,
>
>> After looking at the source code I'd say its not that hard to do a
>> manual translation to Verilog. Most tools (Xilinx ISE for example)
>> supports both languages. But projects with mixed languages often
>> requires buying a license for that feature.
>
> I checked with Michi Hutter and he also said that it would probably not
> be too hard, but also not really worth the effort because almost all
> tools support both languages. If somebody ends up porting the
> implementation to Verilog, I'll of course be very happy to link to that
> port.
>
> Cheers,
>
> Peter
>

Hello!

I've taken a look at the source code and read the accompanying paper.
I'm afraid VHDL -> Verilog translation is not the only thing we'll
need to do in order to adapt this library to our needs.

This library is said to be optimized for small, slow, low-power,
low-resource devices, while our FPGA is going to be large and fast. For
example, they decided to use single-port RAM for storage to save
resources, but contemporary FPGAs all have inherently dual-port block
memory, so that choice saves us nothing. Another problem is their
multiplier. As far as I understand, they implemented a digit-serial
multiplier, which is not very fast according to their report (a few
MHz). Granted, they are talking about 130 nm technology, while Artix-7
is 28 nm, but the clock rate won't simply scale by 130/28 = 4.6 times.
Besides, today's FPGAs have dedicated hardware multipliers that come at
no cost. Yes, they are vendor-specific, and I obviously cannot decide
for the whole team whether we can use them or not. But my personal view
is that as long as we clearly document that this particular module uses
a vendor-specific multiplier, and provide a generic replacement module
for simulation, it's OK.
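
To make "digit-serial" concrete, here is a rough Python model of the
general technique. It is only a sketch, not their design: the function
name, the 256-bit operand width and the 8-bit digit size are assumptions
chosen for illustration. It shows why the cycle count grows with operand
width divided by digit width.

```python
def digit_serial_mul(a, b, width=256, digit_bits=8):
    """Multiply a*b by consuming b one digit per cycle, LSB first.

    width and digit_bits are illustrative assumptions, not values taken
    from the library under discussion. b is expected to fit in width bits.
    """
    mask = (1 << digit_bits) - 1
    cycles = -(-width // digit_bits)  # ceil(width / digit_bits)
    acc = 0
    for i in range(cycles):
        digit = (b >> (i * digit_bits)) & mask  # next digit of b
        acc += (a * digit) << (i * digit_bits)  # one partial product per cycle
    return acc, cycles


a = 2**255 - 19
b = 0x1234567890ABCDEF
product, cycles = digit_serial_mul(a, b)
assert product == a * b
print(cycles)  # 32 cycles for 256-bit operands with 8-bit digits
```

A wider digit means fewer cycles but a larger partial-product
multiplier, which is exactly the trade-off that dedicated multiplier
blocks make cheap on an FPGA.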

Another issue is that this library is a very specific crypto
co-processor with its own very specific instruction set, and it has
something similar to microcode. As far as I understand, the co-processor
itself can only do low-level math, while the actual curve arithmetic
is implemented by small programs residing in a dedicated ROM. I see that
they distribute microcode assembly sources (.nacl files) for Curve25519
and a special assembler tool written in Java. As far as I understand, to
add support for curves P-256 and P-384, which we need for DNSSEC, we'll
have to rewrite their microcode. At first sight, this is not trivial.
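
The split between a low-level math datapath and curve programs in ROM
can be pictured with a toy Python model. Everything here is
hypothetical: the opcode names, the register layout and the `run`
function are invented for illustration and do not reflect the
library's real instruction set or its .nacl format.

```python
P = 2**255 - 19  # Curve25519 field prime


def run(program, regs, p=P):
    """Execute a list of (opcode, dst, src1, src2) tuples on a register file.

    The "datapath" only knows modular add, subtract and multiply; anything
    curve-specific has to be expressed as a program.
    """
    for op, dst, x, y in program:
        if op == "ADD":
            regs[dst] = (regs[x] + regs[y]) % p
        elif op == "SUB":
            regs[dst] = (regs[x] - regs[y]) % p
        elif op == "MUL":
            regs[dst] = (regs[x] * regs[y]) % p
        else:
            raise ValueError(f"unknown opcode {op}")
    return regs


# Toy "microcode": r3 = (r0 + r1)^2 - (r0 - r1)^2, which equals 4*r0*r1.
PROGRAM = [
    ("ADD", 2, 0, 1),
    ("SUB", 3, 0, 1),
    ("MUL", 2, 2, 2),
    ("MUL", 3, 3, 3),
    ("SUB", 3, 2, 3),
]

regs = run(PROGRAM, [5, 7, 0, 0])
assert regs[3] == (4 * 5 * 7) % P
```

In a model like this, moving to another curve means not only a
different prime but a different program, since the point formulas
themselves change.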

I don't want to say that this library is bad. It's great, and it
represents a huge amount of work, but it's oriented more towards ASICs
than FPGAs, and much of that effort went into making the ASIC
implementation cheap. As I see it, we have a slightly different target,
at least in the near future. Please correct me if I'm wrong.

--
With best regards,
Pavel Shatov

