[Cryptech Tech] Fwd: Re: Alpha Platform Upgrade

Pavel Shatov meisterpaul1 at yandex.ru
Thu Feb 27 11:23:59 UTC 2020


<forwarding my answer, which Paul apparently missed>

-------- Forwarded Message --------
Subject: Re: [Cryptech Tech] Alpha Platform Upgrade
Date: Thu, 20 Feb 2020 10:38:59 +0300
From: Pavel Shatov <meisterpaul1 at yandex.ru>
To: Paul Selkirk <paul at psgd.org>

On 20.02.2020 1:14, Paul Selkirk wrote:
> I'm trying to build your new stuff, but it fails to meet timing on
> TS_clkmgr_mmcm_inst_mmcm_clkout2.
> 

That would be the new high-speed core clock. Haven't had time to look at 
the reports yet, but here's what immediately comes to mind:

1) What set of cores are you trying to build? I added a non-default 
"hsm_ng" project to core/platform/alpha, which is the current "hsm" 
project plus 1x ModExpNG. This configuration should build just fine; I 
tried it using both the GUI push-button flow and the console Makefile 
flow in my Ubuntu VM. I believe I even got identical bitstreams, since 
the "checksums" printed throughout both processes were identical.

2) I tweaked the synthesis settings in core/platform/alpha; namely, I 
disabled resource sharing. Have you pulled the changes? I believe the 
new stuff won't build with resource sharing enabled.

Speaking of resource sharing, I now see that I didn't explain the change 
in the commit message. The problem is that during synthesis XST detects 
certain arithmetic operations and then tries to find those that are 
never carried out simultaneously. It then tries to combine them to save 
logic resources. Sometimes this does the trick; sometimes it only makes 
things worse.

Montgomery modular multiplication has three phases: computation of the 
full-size product, computation of the reduction coefficient, and 
computation of the multiple of the modulus. Each intermediate product is 
written into a different internal storage space, which has its own 
address register. Since the phases are done in series, the phase address 
registers are never incremented at the same time. With resource sharing 
enabled, XST becomes too smart: it throws away two of the three adders 
and instead adds an input 3:1 mux and an output 1:3 selector with clock 
enable. This introduces unnecessarily complex logic and makes timing 
impossible to meet. Disabling resource sharing increases slice 
utilization by ~1%, but makes a much higher clock frequency possible.
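To show why the three phases have to run in series, here is a toy 
single-word sketch in C, assuming R = 2^32 and an odd modulus below 
2^31. It is not the ModExpNG implementation, which works on multi-word 
operands in hardware with each phase writing to its own storage space; 
the function names are hypothetical. Each phase consumes the previous 
phase's result, so the three address registers are never active at the 
same time.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy model: one 32-bit word, R = 2^32. */

/* n' = -n^{-1} mod 2^32 via Newton iteration (n must be odd). */
static uint32_t mont_neg_inverse(uint32_t n)
{
    assert(n & 1u);
    uint32_t inv = n;              /* n * n == 1 mod 8: 3 correct bits */
    for (int i = 0; i < 4; i++)
        inv *= 2u - n * inv;       /* each step doubles the correct bits */
    return 0u - inv;
}

/* Returns a * b * R^{-1} mod n, for odd n < 2^31 and a, b < n. */
static uint32_t mont_mul(uint32_t a, uint32_t b, uint32_t n, uint32_t n_prime)
{
    /* Phase 1: full-size product T = A * B */
    uint64_t t = (uint64_t)a * b;

    /* Phase 2: reduction coefficient M = (T mod R) * N' mod R;
     * it needs the low word of T, so it follows phase 1 */
    uint32_t m = (uint32_t)t * n_prime;

    /* Phase 3: multiple of the modulus U = M * N; it needs M */
    uint64_t u = (uint64_t)m * n;

    /* T + U is a multiple of R, so dropping the low word divides by R;
     * n < 2^31 keeps T + U below 2^64 */
    uint32_t s = (uint32_t)((t + u) >> 32);

    /* The sum is below 2n, so one conditional subtraction finishes it */
    return s >= n ? s - n : s;
}
```

Conversion of operands into and out of Montgomery form (multiplying by 
R mod n) is left out to keep the phases visible.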


> I've attached a couple report files that seem relevant.
> 
> Also, I'm updating the FMC driver, and I see where I need to change
> fmc_timing.CLKDivision, but do I need to do anything with
> fmc_timing.DataLatency to account for the 2-cycle read?
> 
> And is the following still true?
> 
>      // not needed, since nwait will be polled manually
>      fmc_timing.BusTurnAroundDuration = 0;
> 
> 				paul
> 


Hm, I believe I already made those changes:
https://trac.cryptech.is/changeset/39f2d7ec4f35191884978db447bd97638b283d1d/sw/stm32

CLKDivision is now 4, and DataLatency is now 6.

Speaking of NWAIT, you're right: we no longer poll it, so I removed the 
misleading comment. That should be in the same commit.
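To make the new values concrete, here is a small C sketch of how 
CLKDivision, DataLatency and BusTurnAroundDuration map onto the 
FMC_BTRx register fields, following the encoding in the STM32F4 
reference manual (RM0090). The struct and function names are 
hypothetical; the real driver presumably fills in the HAL's 
FMC_NORSRAM_TimingTypeDef instead. The 180 MHz HCLK in the note below 
is an assumption, not a value from this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the three timing fields discussed above. */
struct fmc_sync_timing {
    uint32_t bus_turnaround;  /* BusTurnAroundDuration, in HCLK cycles */
    uint32_t clk_division;    /* CLKDivision: FMC_CLK = HCLK / clk_division */
    uint32_t data_latency;    /* DataLatency: FMC_CLK cycles before first data */
};

/* Pack the fields into their FMC_BTRx positions (per RM0090):
 * BUSTURN[19:16] = turnaround, CLKDIV[23:20] = division - 1,
 * DATLAT[27:24] = latency - 2. */
static uint32_t fmc_btr_fields(const struct fmc_sync_timing *t)
{
    assert(t->bus_turnaround <= 15u);
    assert(t->clk_division >= 2u && t->clk_division <= 16u);
    assert(t->data_latency >= 2u && t->data_latency <= 17u);
    return (t->bus_turnaround << 16) |
           ((t->clk_division - 1u) << 20) |
           ((t->data_latency - 2u) << 24);
}

/* Resulting FMC clock frequency for a given HCLK, both in Hz. */
static uint32_t fmc_clk_hz(uint32_t hclk_hz, const struct fmc_sync_timing *t)
{
    return hclk_hz / t->clk_division;
}
```

With the values from the changeset ({0, 4, 6}) and an assumed 180 MHz 
HCLK, the FMC clock comes out at 45 MHz, and DataLatency = 6 is encoded 
as DATLAT = 4.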


-- 
With best regards,
Pavel Shatov

