[Cryptech Tech] Fwd: Re: Alpha Platform Upgrade
Pavel Shatov
meisterpaul1 at yandex.ru
Thu Feb 27 11:23:59 UTC 2020
<forwarding my answer, which Paul apparently missed>
-------- Forwarded Message --------
Subject: Re: [Cryptech Tech] Alpha Platform Upgrade
Date: Thu, 20 Feb 2020 10:38:59 +0300
From: Pavel Shatov <meisterpaul1 at yandex.ru>
To: Paul Selkirk <paul at psgd.org>
On 20.02.2020 1:14, Paul Selkirk wrote:
> I'm trying to build your new stuff, but it fails to meet timing on
> TS_clkmgr_mmcm_inst_mmcm_clkout2.
>
That would be the new high-speed core clock. I haven't had time to look
at the reports yet, but here's what immediately comes to mind:
1) Which set of cores are you trying to build? I added a non-default
"hsm_ng" project to core/platform/alpha, which is the current "hsm"
project plus 1x ModExpNG. This configuration should build just fine: I
tried it using both the GUI push-button flow and the console Makefile
flow in my Ubuntu VM. I believe I even got identical bitstreams, since
the "checksums" printed throughout both processes were identical.
2) I tweaked the synthesis settings in core/platform/alpha, namely I
disabled resource sharing. Have you pulled the changes? I believe the
new stuff won't build with resource sharing enabled.
Speaking of resource sharing, I now see that I didn't write an
explanation of the change in the commit message. The problem is that
during synthesis XST detects certain arithmetic operations and then
looks for those that are never carried out simultaneously. It then
tries to combine them to save logic resources. Sometimes this does the
trick, sometimes it only makes things worse. Montgomery modular
multiplication has three phases: computation of the full-size product,
computation of the reduction coefficient, and computation of the
multiple of the modulus. Each intermediate product is written into a
different internal storage space, which has its own address register.
Since the phases are done in series, the phase address registers are
never incremented at the same time. With resource sharing enabled, XST
becomes too smart: it throws away two of the three adders and instead
adds a 3:1 mux on the input and a 1:3 selector with clock enable on the
output. This introduces unnecessarily complex logic and makes timing
impossible to meet. Disabling resource sharing increases slice
utilization by ~1%, but makes a much higher clock frequency possible.
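
To make the three phases concrete, here is a minimal C sketch of a
textbook single-word Montgomery multiplication with R = 2^16. It is an
illustration only and not the ModExpNG datapath, which works on
multi-word operands in hardware. The names mont_n_coeff and mont_mul
are hypothetical and do not come from the Cryptech sources.

```c
#include <assert.h>
#include <stdint.h>

#define MONT_BITS 16u
#define MONT_MASK 0xFFFFu   /* R - 1, with R = 2^16 (illustrative size) */

/* Hypothetical helper: returns -n^(-1) mod R for odd n < R. */
uint32_t mont_n_coeff(uint32_t n)
{
    assert((n & 1u) == 1u && n <= MONT_MASK);
    uint32_t inv = n;               /* n*n == 1 mod 8, so 3 bits are correct */
    for (int i = 0; i < 4; i++)     /* each Newton step doubles the correct bits */
        inv = (inv * (2u - n * inv)) & MONT_MASK;
    return (0u - inv) & MONT_MASK;
}

/* Hypothetical helper: returns a * b * R^(-1) mod n, for a, b < n, n odd. */
uint32_t mont_mul(uint32_t a, uint32_t b, uint32_t n)
{
    assert(a < n && b < n);
    uint32_t n_coeff = mont_n_coeff(n);

    /* Phase 1: full-size product. */
    uint64_t ab = (uint64_t)a * b;

    /* Phase 2: reduction coefficient, computed from the low word only. */
    uint32_t q = (uint32_t)(((ab & MONT_MASK) * n_coeff) & MONT_MASK);

    /* Phase 3: multiple of the modulus. */
    uint64_t qn = (uint64_t)q * n;

    /* ab + qn is divisible by R, so the shift is exact; the sum is < 2*n*R. */
    uint32_t r = (uint32_t)((ab + qn) >> MONT_BITS);
    return (r >= n) ? r - n : r;
}
```

The phases run strictly one after another, and each one produces its
own intermediate value (ab, q, qn). That is the software analogue of the
three separate storage spaces, each with its own address counter,
described above.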
> I've attached a couple report files that seem relevant.
>
> Also, I'm updating the FMC driver, and I see where I need to change
> fmc_timing.CLKDivision, but do I need to do anything with
> fmc_timing.DataLatency to account for the 2-cycle read?
>
> And is the following still true?
>
> // not needed, since nwait will be polled manually
> fmc_timing.BusTurnAroundDuration = 0;
>
> paul
>
Hm, I believe I already made those changes:
https://trac.cryptech.is/changeset/39f2d7ec4f35191884978db447bd97638b283d1d/sw/stm32
CLKDivision is now 4 and DataLatency is now 6.
Speaking of NWAIT, you're right, we no longer poll, so I removed the
misleading comment; that should be in the same commit.
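
For reference, a small C sketch of the two values mentioned above. The
struct is a stand-in holding only the two fields discussed here (the
real STM32 HAL timing structure has more members), and the names
fmc_timing_sketch_t, fmc_timing_sketch_init and fmc_bus_clock_hz are
hypothetical. The bus clock formula assumes CLKDivision is expressed in
HCLK cycles, as the HAL documents it; the HCLK frequency is left as a
parameter rather than assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the two timing fields named in this thread. */
typedef struct {
    uint32_t CLKDivision;   /* bus clock period, in HCLK cycles */
    uint32_t DataLatency;   /* read latency, in bus clock cycles */
} fmc_timing_sketch_t;

/* Fill in the values quoted above: CLKDivision = 4, DataLatency = 6. */
void fmc_timing_sketch_init(fmc_timing_sketch_t *t)
{
    assert(t != NULL);
    t->CLKDivision = 4;
    t->DataLatency = 6;
}

/* Resulting bus clock for a given HCLK (caller supplies the frequency). */
uint32_t fmc_bus_clock_hz(uint32_t hclk_hz, const fmc_timing_sketch_t *t)
{
    assert(t != NULL && t->CLKDivision != 0u);
    return hclk_hz / t->CLKDivision;
}
```

The authoritative values and the full initialization are in the
changeset linked above; this sketch only restates the two numbers.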
--
With best regards,
Pavel Shatov