From git at cryptech.is Wed Jan 1 19:27:50 2020
From: git at cryptech.is (git at cryptech.is)
Date: Wed, 01 Jan 2020 19:27:50 +0000
Subject: [Cryptech-Commits] [releng/alpha] branch master updated: Accumulated minor changes on master branches
Message-ID: <157790687030.39624.9115286806700417078@bikeshed.cryptech.is>

This is an automated email from the git hooks/post-receive script.

sra at hactrn.net pushed a commit to branch master
in repository releng/alpha.

The following commit(s) were added to refs/heads/master by this push:
     new fb0f2aa Accumulated minor changes on master branches
fb0f2aa is described below

commit fb0f2aa084b24921fbeb9076601c56c981bef5e9
Author: Rob Austein
AuthorDate: Wed Jan 1 14:27:55 2020 -0500

    Accumulated minor changes on master branches
---
 source/core/cipher/aes | 2 +-
 source/core/lib | 2 +-
 source/sw/libhal | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/source/core/cipher/aes b/source/core/cipher/aes
index a86d494..4c03b5b 160000
--- a/source/core/cipher/aes
+++ b/source/core/cipher/aes
@@ -1 +1 @@
-Subproject commit a86d49407d86cde1e782b7e54fc6848f450cba23
+Subproject commit 4c03b5b0542988680821ec10b50fb93a9a0e6389
diff --git a/source/core/lib b/source/core/lib
index fbcbd42..2578241 160000
--- a/source/core/lib
+++ b/source/core/lib
@@ -1 +1 @@
-Subproject commit fbcbd4218e2711da279d8097620a5b26637bf45b
+Subproject commit 2578241c6795bacfd6819d0065edb855681bd669
diff --git a/source/sw/libhal b/source/sw/libhal
index 1c07fa5..9e6edd6 160000
--- a/source/sw/libhal
+++ b/source/sw/libhal
@@ -1 +1 @@
-Subproject commit 1c07fa52b09b04f0cd3807bf11005b03739d1c2a
+Subproject commit 9e6edd6082cc8d501e2b062983ed58b01ef677d7

--
To stop receiving notification emails like this one, please contact
the administrator of this repository.

From git at cryptech.is Wed Jan 1 19:30:05 2020
From: git at cryptech.is (git at cryptech.is)
Date: Wed, 01 Jan 2020 19:30:05 +0000
Subject: [Cryptech-Commits] [releng/alpha] branch fmc_clk updated: Accumulated minor changes on fmc_clk branches
Message-ID: <157790700565.39820.16823787167023486497@bikeshed.cryptech.is>

This is an automated email from the git hooks/post-receive script.

sra at hactrn.net pushed a commit to branch fmc_clk
in repository releng/alpha.

The following commit(s) were added to refs/heads/fmc_clk by this push:
     new 8fedb2c Accumulated minor changes on fmc_clk branches
8fedb2c is described below

commit 8fedb2c5901e862a6043eed5e2aa08940131a600
Author: Rob Austein
AuthorDate: Wed Jan 1 14:30:23 2020 -0500

    Accumulated minor changes on fmc_clk branches
---
 source/core/cipher/aes | 2 +-
 source/core/cipher/chacha | 2 +-
 source/core/lib | 2 +-
 source/core/rng/rosc_entropy | 2 +-
 source/sw/libhal | 2 +-
 5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/source/core/cipher/aes b/source/core/cipher/aes
index a193ff4..4c03b5b 160000
--- a/source/core/cipher/aes
+++ b/source/core/cipher/aes
@@ -1 +1 @@
-Subproject commit a193ff4771e8bcfcfb02b798422bf963bcfc2e96
+Subproject commit 4c03b5b0542988680821ec10b50fb93a9a0e6389
diff --git a/source/core/cipher/chacha b/source/core/cipher/chacha
index 447efe9..136f26d 160000
--- a/source/core/cipher/chacha
+++ b/source/core/cipher/chacha
@@ -1 +1 @@
-Subproject commit 447efe94126531908899a0749a21766534d78965
+Subproject commit 136f26d704cb6b318b6cd3f79c19b9258b19813b
diff --git a/source/core/lib b/source/core/lib
index fbcbd42..2578241 160000
--- a/source/core/lib
+++ b/source/core/lib
@@ -1 +1 @@
-Subproject commit fbcbd4218e2711da279d8097620a5b26637bf45b
+Subproject commit 2578241c6795bacfd6819d0065edb855681bd669
diff --git a/source/core/rng/rosc_entropy b/source/core/rng/rosc_entropy
index 47edd24..db81c0e 160000
--- a/source/core/rng/rosc_entropy
+++ b/source/core/rng/rosc_entropy
@@ -1 +1 @@
-Subproject commit 47edd24044d58d5368f8e5a45c26c94d6f6e4b26 +Subproject commit db81c0e56dca48a0d8dee8a4359fb2096e4c1faf diff --git a/source/sw/libhal b/source/sw/libhal index ccb61f2..9e6edd6 160000 --- a/source/sw/libhal +++ b/source/sw/libhal @@ -1 +1 @@ -Subproject commit ccb61f28bde3760bf1fb0ff18edb981da3434cff +Subproject commit 9e6edd6082cc8d501e2b062983ed58b01ef677d7 -- To stop receiving notification emails like this one, please contact the administrator of this repository. From git at cryptech.is Mon Jan 20 20:41:38 2020 From: git at cryptech.is (git at cryptech.is) Date: Mon, 20 Jan 2020 20:41:38 +0000 Subject: [Cryptech-Commits] [user/shatov/modexpng_fpga_model] branch master updated: * mostly cosmetic fixes related to debug output * fixed REGULAR_ADD opcode to match how hardware works Message-ID: <157955289828.71439.10256644209662075705@bikeshed.cryptech.is> This is an automated email from the git hooks/post-receive script. meisterpaul1 at yandex.ru pushed a commit to branch master in repository user/shatov/modexpng_fpga_model. The following commit(s) were added to refs/heads/master by this push: new 9519ec4 * mostly cosmetic fixes related to debug output * fixed REGULAR_ADD opcode to match how hardware works 9519ec4 is described below commit 9519ec41975ddfa8c66a79c4ebc45ef9ebd4243d Author: Pavel V. 
Shatov (Meister) AuthorDate: Mon Jan 20 23:39:34 2020 +0300 * mostly cosmetic fixes related to debug output * fixed REGULAR_ADD opcode to match how hardware works --- modexpng_fpga_model.py | 55 +++++++++++++++++++++++++------------------------- 1 file changed, 27 insertions(+), 28 deletions(-) diff --git a/modexpng_fpga_model.py b/modexpng_fpga_model.py index 41acbff..334eecc 100644 --- a/modexpng_fpga_model.py +++ b/modexpng_fpga_model.py @@ -93,7 +93,7 @@ DUMP_EXPONENTS = False # dump secret exponents FORCE_OVERFLOW = False # force rarely seen internal overflow situation to verify how its handler works DUMP_PROGRESS_FACTOR = 16 # once per how many ladder steps to update progress indicator DUMP_FORMAT_BUS = True # False: dump 18-bit words, True: dump 32-bit words -DUMP_FORMAT_C_ARRAY = False # False: dump in Verilog format, True: dump as C array initializer +DUMP_FORMAT_C_ARRAY = False # False: dump in Verilog format, True: dump as C array initializer # @@ -1102,7 +1102,7 @@ class ModExpNG_Worker(): ab = ModExpNG_Operand(None, 2 * ab_num_words, ab_words) if dump and DUMP_VECTORS: - ab.format_verilog_concat("%s_%s_AB" % (dump_crt, dump_ladder)) + ab.format("%s_%s_AB" % (dump_crt, dump_ladder)) if not bnk is None: bnk._set_wide(ModExpNG_WideBankEnum.L, ab.lower_half()) @@ -1121,7 +1121,7 @@ class ModExpNG_Worker(): q = ModExpNG_Operand(None, ab_num_words + 1, q_words) if dump and DUMP_VECTORS: - q.format_verilog_concat("%s_%s_Q" % (dump_crt, dump_ladder)) + q.format("%s_%s_Q" % (dump_crt, dump_ladder)) if not bnk is None: bnk._set_narrow(ModExpNG_NarrowBankEnum.Q, q) @@ -1139,7 +1139,7 @@ class ModExpNG_Worker(): m = ModExpNG_Operand(None, 2 * ab_num_words + 1, m_words) if dump and DUMP_VECTORS: - m.format_verilog_concat("%s_%s_M" % (dump_crt, dump_ladder)) + m.format("%s_%s_M" % (dump_crt, dump_ladder)) # # 4. R = AB + M @@ -1160,7 +1160,6 @@ class ModExpNG_Worker(): r_cy = r_cy_new - # # 4b. 
Initialize empty result # @@ -1216,12 +1215,12 @@ class ModExpNG_Core(): def _dump_bank_indices(self, n): print(" ", end='') - for i in range(64): print("[ %3d ] " % i, end='') + for i in range(n): print("[ %3d ] " % i, end='') print(""); def _dump_bank_seps(self, n): print(" ", end='') - for i in range(64): print(" ------ ", end='') + for i in range(n): print("------- ", end='') print(""); def _dump_bank_entry_narrow(self, name, op, val, n): @@ -1561,11 +1560,11 @@ class ModExpNG_Core(): # adds sel_narrow_a_in to sel_narrow_b_in # stores result in sel_narrow_out # - def regular_add(self, sel_narrow_a_in, sel_narrow_b_in, sel_narrow_out, num_words): - xxa = self.bnk.crt_x.ladder_x._get_narrow(sel_narrow_a_in) - xya = self.bnk.crt_x.ladder_y._get_narrow(sel_narrow_a_in) - yxa = self.bnk.crt_y.ladder_x._get_narrow(sel_narrow_a_in) - yya = self.bnk.crt_y.ladder_y._get_narrow(sel_narrow_a_in) + def regular_add(self, sel_wide_a_in, sel_narrow_b_in, sel_narrow_out, num_words): + xxa = self.bnk.crt_x.ladder_x._get_wide(sel_wide_a_in) + xya = self.bnk.crt_x.ladder_y._get_wide(sel_wide_a_in) + yxa = self.bnk.crt_y.ladder_x._get_wide(sel_wide_a_in) + yya = self.bnk.crt_y.ladder_y._get_wide(sel_wide_a_in) xxb = self.bnk.crt_x.ladder_x._get_narrow(sel_narrow_b_in) xyb = self.bnk.crt_x.ladder_y._get_narrow(sel_narrow_b_in) @@ -1589,23 +1588,23 @@ class ModExpNG_Core(): print("num_words = %d" % pq) print("\rladder_mode_x = %d" % m[0]) print("\rladder_mode_y = %d" % m[1]) - self.bnk.crt_x.ladder_x._get_narrow(N.C).format_verilog_concat("X_X") - self.bnk.crt_x.ladder_y._get_narrow(N.C).format_verilog_concat("X_Y") - self.bnk.crt_y.ladder_x._get_narrow(N.C).format_verilog_concat("Y_X") - self.bnk.crt_y.ladder_y._get_narrow(N.C).format_verilog_concat("Y_Y") - self.bnk.crt_x.ladder_x._get_wide(W.N).format_verilog_concat("X_N") - self.bnk.crt_x.ladder_x._get_wide(W.N).format_verilog_concat("Y_N") - self.bnk.crt_x.ladder_x._get_narrow(N.N_COEFF).format_verilog_concat("X_N_COEFF") - 
self.bnk.crt_x.ladder_x._get_narrow(N.N_COEFF).format_verilog_concat("Y_N_COEFF") + self.bnk.crt_x.ladder_x._get_narrow(N.C).format("X_X") + self.bnk.crt_x.ladder_y._get_narrow(N.C).format("X_Y") + self.bnk.crt_y.ladder_x._get_narrow(N.C).format("Y_X") + self.bnk.crt_y.ladder_y._get_narrow(N.C).format("Y_Y") + self.bnk.crt_x.ladder_x._get_wide(W.N).format("X_N") + self.bnk.crt_x.ladder_x._get_wide(W.N).format("Y_N") + self.bnk.crt_x.ladder_x._get_narrow(N.N_COEFF).format("X_N_COEFF") + self.bnk.crt_x.ladder_x._get_narrow(N.N_COEFF).format("Y_N_COEFF") # # dump working variables after ladder step # def dump_after_step_using_crt(self): - self.bnk.crt_x.ladder_x._get_narrow(N.C).format_verilog_concat("X_X") - self.bnk.crt_x.ladder_y._get_narrow(N.C).format_verilog_concat("X_Y") - self.bnk.crt_y.ladder_x._get_narrow(N.C).format_verilog_concat("Y_X") - self.bnk.crt_y.ladder_y._get_narrow(N.C).format_verilog_concat("Y_Y") + self.bnk.crt_x.ladder_x._get_narrow(N.C).format("X_X") + self.bnk.crt_x.ladder_y._get_narrow(N.C).format("X_Y") + self.bnk.crt_y.ladder_x._get_narrow(N.C).format("Y_X") + self.bnk.crt_y.ladder_y._get_narrow(N.C).format("Y_Y") # # this deliberately converts narrow operand into redundant representation @@ -1733,7 +1732,7 @@ def sign_using_crt(): c.modular_multiply(W.C, N.I, W.D, N.D, n) # | [XY] / N_FACTOR | [XY]F | [XY]YM | [XY]M | M | [XY]M = [XY]MF * 1 # +------------------------+-------+------------------+---------+-----------+ c.propagate_carries(N.D, n) # | [XY] / N_FACTOR | [XY]F | [XY]YM | [XY]M | M | - # +------------------------+-------+------------------+---------+-----------+ + # +------------------------+-------+------------------+---------+-----------+ c.set_output_from_narrow_x(O.XM, c.bnk.crt_x, N.D) # | [XY] / N_FACTOR | [XY]F | [XY]YM | [XY]M | M | c.set_output_from_narrow_x(O.YM, c.bnk.crt_y, N.D) # | [XY] / N_FACTOR | [XY]F | [XY]YM | [XY]M | M | # +------------------------+-------+------------------+---------+-----------+ @@ -1762,7 
+1761,7 @@ def sign_using_crt(): c.modular_multiply(W.A, N.I, W.C, N.C, pq) # | [PQ]_FACTOR | [XY]F | [PQ]IF | [PQ]MBF | QINV | [PQ]IF = 1 * [PQ]_FACTOR # +------------------------+-------+------------------+---------+-----------+ c.copy_ladders_x2y(W.D, N.D, W.C, N.C) # | [PQ]_FACTOR | [XY]F | [PQ]IF / [PQ]MBF | [PQ]MBF | QINV | - # +------------------------+-------+------------------+---------+-----------+ + # +------------------------+-------+------------------+---------+-----------+ ########################### # | | | | | | # Begin Montgomery Ladder # # | | | | | | ########################### # | | | | | | @@ -1807,8 +1806,8 @@ def sign_using_crt(): # +------------------------+-------+------------------+---------+-----------+ c.copy_crt_y2x(W.D, N.D) # | [PQ]_FACTOR / QRSBI | [XY]F | RSBI | QSB* | | # +------------------------+-------+------------------+---------+-----------+ - c.regular_add(N.D, N.A, N.C, pq) # | [PQ]_FACTOR / QRSBI | [XY]F | SB | QSB* | | SB = QSB + RSBI - # +------------------------+-------+------------------+---------+-----------+ + c.regular_add(W.D, N.A, N.C, pq) # | [PQ]_FACTOR / QRSBI | [XY]F | SB | QSB* | | SB = QSB + RSBI + # +------------------------+-------+------------------+---------+-----------+ c.set_wide_from_input (c.bnk.crt_x, W.N, I.N) # | | | | | | c.set_wide_from_input (c.bnk.crt_y, W.N, I.N) # | | | | | | # +------------------------+-------+------------------+---------+-----------+ -- To stop receiving notification emails like this one, please contact the administrator of this repository. From git at cryptech.is Mon Jan 20 21:18:01 2020 From: git at cryptech.is (git at cryptech.is) Date: Mon, 20 Jan 2020 21:18:01 +0000 Subject: [Cryptech-Commits] [user/shatov/modexpng] branch master updated (6791175 -> 97d5b79) Message-ID: <157955508198.72689.15683234965619703441@bikeshed.cryptech.is> This is an automated email from the git hooks/post-receive script. 

meisterpaul1 at yandex.ru pushed a change to branch master
in repository user/shatov/modexpng.

    from 6791175 One more cosmetic fix.
     new 83f8779 Had to rework the general worker module to reach 180 MHz core clock. The module is responsible for doing certain supporting operations (mostly moving operands between banks and doing some simple math operations, such as modular subtraction and regular addition). Depending on the particular operation, one of three bank address space sweep patterns was used: * one-pass (for things like carry propagation) * two-pass (for things like modular subtraction that produce interm [...]
     new 6a0438e Turns out, fabric addition and subtraction in the general worker module are actually in the critical paths of the ModExpNG core and are plaguing the place and route tools. I was barely able to achieve timing closure at 180 MHz even with the highest Map and PaR effort levels. This means that any further clock frequency increase is effectively impossible, moreover any small change in the design may prevent it from meeting timing constraints. The obvious solution is to use DS [...]
     new e5f4454 Reworked modular subtraction micro-operation. Previously it used "two-pass" bank address space sweep, during the first pass (a-b) and (a-b+n) were computed, during the second pass either the former or the latter quantity was written to the output bank (depending on the very last borrow flag value). This is no longer possible, since the FSM now only generates one "interleaved" address space sweep. The solution is to split one complex modular subtraction operation into sim [...]
     new ab061af This commit modifies the REGULAR_ADD_UNEVEN micro-operation to use DSP slices for addition instead of fabric logic. This opcode is only necessary when in CRT mode and is executed once per entire exponentiation to recombine the two "easier" exponentiations. This was the final change necessary to get rid of using fabric math in the general worker module.
     new b985d45 * DSP slices now have two use modes: MULT and ADD/SUB * cosmetic rename of Verilog include file
     new 147dcd3 Removed old DSP wrappers.
     new a1314f3 Added two pairs of new wrappers.
     new b585d25 Cosmetic fix that only involves debug output during simulation.
     new b590661 Updated microcode source to match the changes made to general worker module.
     new b8c0536 Updated uOP engine to match the changes made to the general worker module (modular subtraction was split into three micro-operations instead of one).
     new 6492883 For the new general worker module to work we need dynamic switching of DSP OPMODE, ALUMODE and CARRYINSEL ports, thus more defined constants.
     new 8f7829c Tiny cosmetic typo fix ("dst" -> "dsp")
     new 2345e42 Added more meaningful constants to avoid certain hardcoded numbers.
     new c551625 Refactored modular reductor module.
     new 04bc457 Renumbered micro-operations.
     new 76f89d6 The I/O manager has to work in sync with the general worker module. Made the necessary changes to make it work after the general worker update. Also moved debug simulation-time code into a separate file.
     new e68e11a Update DSP wrapper instance names.
     new c6029af Refactored MMM recombinator module, accommodated the changes in DSP slice wrapper names.
     new b0cbd33 Refactored the MMM module, now uses meaningful constant names from the include file, not hardcoded widths.
     new c4bee71 Cosmetic change to easily switch tests on/off.
     new 97d5b79 Bump version number.

The 21 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.

Summary of changes:
 bench/tb_core_full_512.v | 6 +-
 rtl/modexpng_core_top_debug.vh | 144 +-
 rtl/modexpng_dsp48e1.vh | 51 +-
 rtl/modexpng_dsp_array_block.v | 8 +-
 rtl/modexpng_dsp_slice_addsub_wrapper_generic.v | 224 +++
 ... => modexpng_dsp_slice_addsub_wrapper_xilinx.v} | 131 +-
 ...v => modexpng_dsp_slice_mult_wrapper_generic.v} | 2 +-
 ....v => modexpng_dsp_slice_mult_wrapper_xilinx.v} | 2 +-
 ...imitive.vh => modexpng_dsp_slice_primitives.vh} | 6 +-
 rtl/modexpng_general_worker.v | 1995 +++++++++-----------
 rtl/modexpng_io_manager.v | 364 ++--
 ...png_dsp48e1.vh => modexpng_io_manager_debug.vh} | 18 +-
 rtl/modexpng_microcode.vh | 21 +-
 rtl/modexpng_mmm_dual.v | 733 +++----
 rtl/modexpng_parameters.vh | 10 +-
 rtl/modexpng_recombinator_block.v | 1244 ++++++------
 rtl/modexpng_recombinator_cell.v | 94 +-
 rtl/modexpng_reductor.v | 388 ++--
 rtl/modexpng_uop_engine.v | 14 +-
 rtl/modexpng_uop_rom.v | 281 +--
 rtl/modexpng_wrapper.v | 12 +-
 21 files changed, 3096 insertions(+), 2652 deletions(-)
 create mode 100644 rtl/modexpng_dsp_slice_addsub_wrapper_generic.v
 copy rtl/{modexpng_dsp_slice_wrapper_xilinx.v => modexpng_dsp_slice_addsub_wrapper_xilinx.v} (58%)
 rename rtl/{modexpng_dsp_slice_wrapper_generic.v => modexpng_dsp_slice_mult_wrapper_generic.v} (98%)
 rename rtl/{modexpng_dsp_slice_wrapper_xilinx.v => modexpng_dsp_slice_mult_wrapper_xilinx.v} (99%)
 rename rtl/{modexpng_dsp_slice_primitive.vh => modexpng_dsp_slice_primitives.vh} (85%)
 copy rtl/{modexpng_dsp48e1.vh => modexpng_io_manager_debug.vh} (82%)

From git at cryptech.is Mon Jan 20 21:18:02 2020
From: git at cryptech.is (git at cryptech.is)
Date: Mon, 20 Jan 2020 21:18:02 +0000
Subject: [Cryptech-Commits] [user/shatov/modexpng] 01/21: Had to rework the general worker module to reach 180 MHz core clock. The module is responsible for doing certain supporting operations (mostly moving operands between banks and doing some simple math operations, such as modular subtraction and regular addition). Depending on the particular operation, one of three bank address space sweep patterns was used: * one-pass (for things like carry propagation) * two-pass (for things like modular subtraction that produce intermediate values in [...]
In-Reply-To: <157955508198.72689.15683234965619703441@bikeshed.cryptech.is>
References: <157955508198.72689.15683234965619703441@bikeshed.cryptech.is>
Message-ID: <20200120211802.A9F948BAF0F@bikeshed.cryptech.is>

This is an automated email from the git hooks/post-receive script.

meisterpaul1 at yandex.ru pushed a commit to branch master
in repository user/shatov/modexpng.

commit 83f8779a661202183f5866a4e80ef36f24b9e1ea
Author: Pavel V. Shatov (Meister)
AuthorDate: Thu Jan 16 14:45:26 2020 +0300

    Had to rework the general worker module to reach 180 MHz core clock. The
    module is responsible for doing certain supporting operations (mostly
    moving operands between banks and doing some simple math operations, such
    as modular subtraction and regular addition). Depending on the particular
    operation, one of three bank address space sweep patterns was used:

    * one-pass (for things like carry propagation)
    * two-pass (for things like modular subtraction that produce intermediate
      values in the process)
    * one-pass interleaved (for copying when only either CRT_?.X or CRT_?.Y is
      rewritten: we can only write to X and Y simultaneously, so we have to
      interleave reads from the source bank with reads from the destination
      bank and overwrite the destination with its just read value, otherwise
      the second destination operand is lost)

    I initially coded three FSMs, one for each of the address space sweeps and
    triggered one of them depending on the opcode, but that turned out too
    complicated. There's now only one FSM that always does the "one-pass
    interleaved" pattern, whereas the second read (from the destination bank)
    is inhibited when not needed by the opcode.
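The "one-pass interleaved" sweep described in the commit message above can be sketched in software roughly as follows. This is only an illustrative model under stated assumptions, not the RTL and not part of modexpng_fpga_model.py: the names `sweep`, `mem_x`, `mem_y` and `read_dst` are hypothetical, banks are plain Python lists, and the pipeline latency states (the PRE/POST states of the FSM) are ignored.

```python
# Hypothetical software model of the "one-pass interleaved" sweep.
# All names here are illustrative assumptions; none come from the RTL.

def sweep(mem_x, mem_y, src_bank, dst_bank, num_words, read_dst):
    """Run one address space sweep and return the list of bank accesses.

    read_dst=True  models a copy from X to Y: only Y should change, but
                   X and Y share one write enable, so X's destination
                   word is read back and rewritten unchanged.
    read_dst=False models an opcode that rewrites both X and Y, so the
                   second read (from the destination bank) is inhibited.
    """
    trace = []
    for addr in range(num_words):
        # First read slot: source bank, X and Y are read in parallel.
        src_x = mem_x[src_bank][addr]
        src_y = mem_y[src_bank][addr]
        trace.append(("rd", src_bank, addr))

        if read_dst:
            # Second read slot: fetch X's destination word to preserve it.
            keep_x = mem_x[dst_bank][addr]
            trace.append(("rd", dst_bank, addr))
            new_x, new_y = keep_x, src_x
        else:
            new_x, new_y = src_x, src_y

        # Single write slot: X and Y are always written together.
        mem_x[dst_bank][addr] = new_x
        mem_y[dst_bank][addr] = new_y
        trace.append(("wr", dst_bank, addr))
    return trace


# Example: copy bank "C" of X into bank "D" of Y without losing X's bank "D".
mem_x = {"C": [1, 2, 3], "D": [7, 8, 9]}
mem_y = {"C": [4, 5, 6], "D": [0, 0, 0]}
trace = sweep(mem_x, mem_y, "C", "D", 3, read_dst=True)
```

With `read_dst=False` the same loop issues only one read per address, which is the sense in which a single FSM can serve both the interleaved and the plain one-pass opcodes in this model.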
--- rtl/modexpng_general_worker.v | 1898 ++++++++++++++++++----------------------- 1 file changed, 839 insertions(+), 1059 deletions(-) diff --git a/rtl/modexpng_general_worker.v b/rtl/modexpng_general_worker.v index eadd284..0620bd6 100644 --- a/rtl/modexpng_general_worker.v +++ b/rtl/modexpng_general_worker.v @@ -127,67 +127,46 @@ module modexpng_general_worker // // FSM Declaration // - localparam [5:0] WRK_FSM_STATE_IDLE = 6'h00; - - localparam [5:0] WRK_FSM_STATE_LATENCY_PRE1 = 6'h01; - localparam [5:0] WRK_FSM_STATE_LATENCY_PRE2 = 6'h02; - localparam [5:0] WRK_FSM_STATE_BUSY = 6'h03; - localparam [5:0] WRK_FSM_STATE_LATENCY_POST1 = 6'h05; // NOTE: 4 is skipped to match the numbering in IO_MANAGER to ease debug! - localparam [5:0] WRK_FSM_STATE_LATENCY_POST2 = 6'h06; - - localparam [5:0] WRK_FSM_STATE_STOP = 6'h07; - - localparam [5:0] WRK_FSM_STATE_LATENCY_PRE1_M1 = 6'h10; - localparam [5:0] WRK_FSM_STATE_LATENCY_PRE1_M2 = 6'h11; - localparam [5:0] WRK_FSM_STATE_LATENCY_PRE2_M1 = 6'h12; - localparam [5:0] WRK_FSM_STATE_LATENCY_PRE2_M2 = 6'h13; - localparam [5:0] WRK_FSM_STATE_BUSY_M1 = 6'h14; - localparam [5:0] WRK_FSM_STATE_BUSY_M2 = 6'h15; - localparam [5:0] WRK_FSM_STATE_LATENCY_POST1_M1 = 6'h16; - localparam [5:0] WRK_FSM_STATE_LATENCY_POST1_M2 = 6'h17; - localparam [5:0] WRK_FSM_STATE_LATENCY_POST2_M1 = 6'h18; - localparam [5:0] WRK_FSM_STATE_LATENCY_POST2_M2 = 6'h19; - - localparam [5:0] WRK_FSM_STATE_LATENCY_PRE1_TP = 6'h20; - localparam [5:0] WRK_FSM_STATE_LATENCY_PRE2_TP = 6'h21; - localparam [5:0] WRK_FSM_STATE_LATENCY_PRE3_TP = 6'h22; - localparam [5:0] WRK_FSM_STATE_LATENCY_PRE4_TP = 6'h23; - localparam [5:0] WRK_FSM_STATE_BUSY_TP = 6'h24; - localparam [5:0] WRK_FSM_STATE_LATENCY_POST1_TP = 6'h25; - localparam [5:0] WRK_FSM_STATE_LATENCY_POST2_TP = 6'h26; - localparam [5:0] WRK_FSM_STATE_LATENCY_POST3_TP = 6'h27; - localparam [5:0] WRK_FSM_STATE_LATENCY_POST4_TP = 6'h28; - localparam [5:0] WRK_FSM_STATE_HOLDOFF_TP = 6'h29; - - reg [5:0] 
wrk_fsm_state = WRK_FSM_STATE_IDLE; - reg [5:0] wrk_fsm_state_next_one_pass; // single address space sweep - reg [5:0] wrk_fsm_state_next_one_pass_meander; // single address space sweep with interleaving source/destination banks (needed by copy_ladders_x2y) - reg [5:0] wrk_fsm_state_next_two_pass; // two address space sweeps - reg wrk_fsm_two_pass_pass; // 0=first pass, 1=second pass - reg wrk_fsm_two_pass_pass_dly; // 0=first pass, 1=second pass - - - // TODO: Comment on how narrow/wide address increment works (narrow is one long sweep, wide is two twice shorter sweeps) + + localparam [3:0] WRK_FSM_STATE_IDLE = 4'h0; + + localparam [3:0] WRK_FSM_STATE_LATENCY_PRE1 = 4'h1; + localparam [3:0] WRK_FSM_STATE_LATENCY_PRE2 = 4'h2; + localparam [3:0] WRK_FSM_STATE_LATENCY_PRE3 = 4'h3; + localparam [3:0] WRK_FSM_STATE_LATENCY_PRE4 = 4'h4; + + localparam [3:0] WRK_FSM_STATE_BUSY1 = 4'hA; + localparam [3:0] WRK_FSM_STATE_BUSY2 = 4'hB; + + localparam [3:0] WRK_FSM_STATE_LATENCY_POST1 = 4'h5; + localparam [3:0] WRK_FSM_STATE_LATENCY_POST2 = 4'h6; + localparam [3:0] WRK_FSM_STATE_LATENCY_POST3 = 4'h7; + localparam [3:0] WRK_FSM_STATE_LATENCY_POST4 = 4'h8; + localparam [3:0] WRK_FSM_STATE_STOP = 4'hF; + + reg [3:0] wrk_fsm_state = WRK_FSM_STATE_IDLE; + reg [3:0] wrk_fsm_state_next; + // // Control Signals // - reg rd_wide_xy_ena_x = 1'b0; - reg [BANK_ADDR_W -1:0] rd_wide_xy_bank_x; - reg [ OP_ADDR_W -1:0] rd_wide_xy_addr_x; + reg rd_wide_ena_x = 1'b0; + reg [BANK_ADDR_W -1:0] rd_wide_bank_x; + reg [ OP_ADDR_W -1:0] rd_wide_addr_x; - reg rd_narrow_xy_ena_x = 1'b0; - reg [BANK_ADDR_W -1:0] rd_narrow_xy_bank_x; - reg [ OP_ADDR_W -1:0] rd_narrow_xy_addr_x; + reg rd_narrow_ena_x = 1'b0; + reg [BANK_ADDR_W -1:0] rd_narrow_bank_x; + reg [ OP_ADDR_W -1:0] rd_narrow_addr_x; - reg rd_wide_xy_ena_y = 1'b0; - reg [BANK_ADDR_W -1:0] rd_wide_xy_bank_y; - reg [ OP_ADDR_W -1:0] rd_wide_xy_addr_y; + reg rd_wide_ena_y = 1'b0; + reg [BANK_ADDR_W -1:0] rd_wide_bank_y; + reg [ OP_ADDR_W -1:0] 
rd_wide_addr_y; - reg rd_narrow_xy_ena_y = 1'b0; - reg [BANK_ADDR_W -1:0] rd_narrow_xy_bank_y; - reg [ OP_ADDR_W -1:0] rd_narrow_xy_addr_y; + reg rd_narrow_ena_y = 1'b0; + reg [BANK_ADDR_W -1:0] rd_narrow_bank_y; + reg [ OP_ADDR_W -1:0] rd_narrow_addr_y; reg wr_wide_xy_ena_x = 1'b0; reg [BANK_ADDR_W -1:0] wr_wide_xy_bank_x; @@ -217,21 +196,21 @@ module modexpng_general_worker // // Mapping // - assign wrk_rd_wide_xy_ena_x = rd_wide_xy_ena_x; - assign wrk_rd_wide_xy_bank_x = rd_wide_xy_bank_x; - assign wrk_rd_wide_xy_addr_x = rd_wide_xy_addr_x; + assign wrk_rd_wide_xy_ena_x = rd_wide_ena_x; + assign wrk_rd_wide_xy_bank_x = rd_wide_bank_x; + assign wrk_rd_wide_xy_addr_x = rd_wide_addr_x; - assign wrk_rd_narrow_xy_ena_x = rd_narrow_xy_ena_x; - assign wrk_rd_narrow_xy_bank_x = rd_narrow_xy_bank_x; - assign wrk_rd_narrow_xy_addr_x = rd_narrow_xy_addr_x; + assign wrk_rd_narrow_xy_ena_x = rd_narrow_ena_x; + assign wrk_rd_narrow_xy_bank_x = rd_narrow_bank_x; + assign wrk_rd_narrow_xy_addr_x = rd_narrow_addr_x; - assign wrk_rd_wide_xy_ena_y = rd_wide_xy_ena_y; - assign wrk_rd_wide_xy_bank_y = rd_wide_xy_bank_y; - assign wrk_rd_wide_xy_addr_y = rd_wide_xy_addr_y; + assign wrk_rd_wide_xy_ena_y = rd_wide_ena_y; + assign wrk_rd_wide_xy_bank_y = rd_wide_bank_y; + assign wrk_rd_wide_xy_addr_y = rd_wide_addr_y; - assign wrk_rd_narrow_xy_ena_y = rd_narrow_xy_ena_y; - assign wrk_rd_narrow_xy_bank_y = rd_narrow_xy_bank_y; - assign wrk_rd_narrow_xy_addr_y = rd_narrow_xy_addr_y; + assign wrk_rd_narrow_xy_ena_y = rd_narrow_ena_y; + assign wrk_rd_narrow_xy_bank_y = rd_narrow_bank_y; + assign wrk_rd_narrow_xy_addr_y = rd_narrow_addr_y; assign wrk_wr_wide_xy_ena_x = wr_wide_xy_ena_x; assign wrk_wr_wide_xy_bank_x = wr_wide_xy_bank_x; @@ -260,172 +239,111 @@ module modexpng_general_worker // // Delays - // - reg [OP_ADDR_W -1:0] rd_wide_xy_addr_x_dly1; - reg [OP_ADDR_W -1:0] rd_wide_xy_addr_x_dly2; - reg [OP_ADDR_W -1:0] rd_wide_xy_addr_x_dly3; - reg [OP_ADDR_W -1:0] rd_wide_xy_addr_x_dly4; 
- reg [OP_ADDR_W -1:0] rd_wide_xy_addr_y_dly1; - reg [OP_ADDR_W -1:0] rd_wide_xy_addr_y_dly2; - reg [OP_ADDR_W -1:0] rd_wide_xy_addr_y_dly3; - reg [OP_ADDR_W -1:0] rd_wide_xy_addr_y_dly4; - - reg [OP_ADDR_W -1:0] rd_narrow_xy_addr_x_dly1; - reg [OP_ADDR_W -1:0] rd_narrow_xy_addr_x_dly2; - reg [OP_ADDR_W -1:0] rd_narrow_xy_addr_x_dly3; - reg [OP_ADDR_W -1:0] rd_narrow_xy_addr_x_dly4; - reg [OP_ADDR_W -1:0] rd_narrow_xy_addr_y_dly1; - reg [OP_ADDR_W -1:0] rd_narrow_xy_addr_y_dly2; - reg [OP_ADDR_W -1:0] rd_narrow_xy_addr_y_dly3; - reg [OP_ADDR_W -1:0] rd_narrow_xy_addr_y_dly4; - - reg [WORD_EXT_W -1:0] wrk_rd_wide_x_din_x_dly1; - reg [WORD_EXT_W -1:0] wrk_rd_wide_x_din_x_dly2; - reg [WORD_EXT_W -1:0] wrk_rd_wide_x_din_x_dly3; - //reg [WORD_EXT_W -1:0] wrk_rd_wide_x_din_x_dly4; - - reg [WORD_EXT_W -1:0] wrk_rd_wide_x_din_y_dly1; - reg [WORD_EXT_W -1:0] wrk_rd_wide_x_din_y_dly2; - reg [WORD_EXT_W -1:0] wrk_rd_wide_x_din_y_dly3; - //reg [WORD_EXT_W -1:0] wrk_rd_wide_x_din_y_dly4; - - reg [WORD_EXT_W -1:0] wrk_rd_narrow_x_din_x_dly1; - reg [WORD_EXT_W -1:0] wrk_rd_narrow_x_din_x_dly2; - reg [WORD_EXT_W -1:0] wrk_rd_narrow_x_din_x_dly3; - reg [WORD_EXT_W -1:0] wrk_rd_narrow_y_din_x_dly1; - reg [WORD_EXT_W -1:0] wrk_rd_narrow_y_din_x_dly2; - - reg [WORD_EXT_W -1:0] wrk_rd_narrow_x_din_y_dly1; - reg [WORD_EXT_W -1:0] wrk_rd_narrow_x_din_y_dly2; - reg [WORD_EXT_W -1:0] wrk_rd_narrow_x_din_y_dly3; - reg [WORD_EXT_W -1:0] wrk_rd_narrow_y_din_y_dly1; - reg [WORD_EXT_W -1:0] wrk_rd_narrow_y_din_y_dly2; + // + reg [OP_ADDR_W -1:0] rd_narrow_addr_x_dly[0:3]; + reg [OP_ADDR_W -1:0] rd_narrow_addr_y_dly[0:3]; + + reg [OP_ADDR_W -1:0] rd_wide_addr_x_dly[0:3]; + reg [OP_ADDR_W -1:0] rd_wide_addr_y_dly[0:3]; + + reg [WORD_EXT_W -1:0] rd_wide_x_din_x_dly1; + reg [WORD_EXT_W -1:0] rd_wide_y_din_x_dly1; + reg [WORD_EXT_W -1:0] rd_wide_x_din_y_dly1; + reg [WORD_EXT_W -1:0] rd_wide_y_din_y_dly1; + reg [WORD_EXT_W -1:0] rd_narrow_x_din_x_dly1; + reg [WORD_EXT_W -1:0] rd_narrow_y_din_x_dly1; 
+ reg [WORD_EXT_W -1:0] rd_narrow_x_din_y_dly1; + reg [WORD_EXT_W -1:0] rd_narrow_y_din_y_dly1; always @(posedge clk) begin // - {rd_wide_xy_addr_x_dly4, rd_wide_xy_addr_x_dly3, rd_wide_xy_addr_x_dly2, rd_wide_xy_addr_x_dly1} <= {rd_wide_xy_addr_x_dly3, rd_wide_xy_addr_x_dly2, rd_wide_xy_addr_x_dly1, rd_wide_xy_addr_x}; - {rd_wide_xy_addr_y_dly4, rd_wide_xy_addr_y_dly3, rd_wide_xy_addr_y_dly2, rd_wide_xy_addr_y_dly1} <= {rd_wide_xy_addr_y_dly3, rd_wide_xy_addr_y_dly2, rd_wide_xy_addr_y_dly1, rd_wide_xy_addr_y}; + {rd_wide_x_din_x_dly1} <= {wrk_rd_wide_x_din_x}; + {rd_wide_y_din_x_dly1} <= {wrk_rd_wide_y_din_x}; + {rd_wide_x_din_y_dly1} <= {wrk_rd_wide_x_din_y}; + {rd_wide_y_din_y_dly1} <= {wrk_rd_wide_y_din_y}; // - {rd_narrow_xy_addr_x_dly4, rd_narrow_xy_addr_x_dly3, rd_narrow_xy_addr_x_dly2, rd_narrow_xy_addr_x_dly1} <= {rd_narrow_xy_addr_x_dly3, rd_narrow_xy_addr_x_dly2, rd_narrow_xy_addr_x_dly1, rd_narrow_xy_addr_x}; - {rd_narrow_xy_addr_y_dly4, rd_narrow_xy_addr_y_dly3, rd_narrow_xy_addr_y_dly2, rd_narrow_xy_addr_y_dly1} <= {rd_narrow_xy_addr_y_dly3, rd_narrow_xy_addr_y_dly2, rd_narrow_xy_addr_y_dly1, rd_narrow_xy_addr_y}; + {rd_narrow_x_din_x_dly1} <= {wrk_rd_narrow_x_din_x}; + {rd_narrow_y_din_x_dly1} <= {wrk_rd_narrow_y_din_x}; + {rd_narrow_x_din_y_dly1} <= {wrk_rd_narrow_x_din_y}; + {rd_narrow_y_din_y_dly1} <= {wrk_rd_narrow_y_din_y}; // - {/*wrk_rd_wide_x_din_x_dly4,*/ wrk_rd_wide_x_din_x_dly3, wrk_rd_wide_x_din_x_dly2, wrk_rd_wide_x_din_x_dly1} <= {/*wrk_rd_wide_x_din_x_dly3,*/ wrk_rd_wide_x_din_x_dly2, wrk_rd_wide_x_din_x_dly1, wrk_rd_wide_x_din_x}; - {/*wrk_rd_wide_x_din_y_dly4,*/ wrk_rd_wide_x_din_y_dly3, wrk_rd_wide_x_din_y_dly2, wrk_rd_wide_x_din_y_dly1} <= {/*wrk_rd_wide_x_din_y_dly3,*/ wrk_rd_wide_x_din_y_dly2, wrk_rd_wide_x_din_y_dly1, wrk_rd_wide_x_din_y}; + {rd_narrow_addr_x_dly[3], rd_narrow_addr_x_dly[2], rd_narrow_addr_x_dly[1], rd_narrow_addr_x_dly[0]} <= {rd_narrow_addr_x_dly[2], rd_narrow_addr_x_dly[1], rd_narrow_addr_x_dly[0], 
rd_narrow_addr_x}; + {rd_narrow_addr_y_dly[3], rd_narrow_addr_y_dly[2], rd_narrow_addr_y_dly[1], rd_narrow_addr_y_dly[0]} <= {rd_narrow_addr_y_dly[2], rd_narrow_addr_y_dly[1], rd_narrow_addr_y_dly[0], rd_narrow_addr_y}; // - {wrk_rd_narrow_x_din_x_dly3, wrk_rd_narrow_x_din_x_dly2, wrk_rd_narrow_x_din_x_dly1} <= {wrk_rd_narrow_x_din_x_dly2, wrk_rd_narrow_x_din_x_dly1, wrk_rd_narrow_x_din_x}; - {wrk_rd_narrow_y_din_x_dly2, wrk_rd_narrow_y_din_x_dly1} <= {wrk_rd_narrow_y_din_x_dly1, wrk_rd_narrow_y_din_x}; - {wrk_rd_narrow_x_din_y_dly3, wrk_rd_narrow_x_din_y_dly2, wrk_rd_narrow_x_din_y_dly1} <= {wrk_rd_narrow_x_din_y_dly2, wrk_rd_narrow_x_din_y_dly1, wrk_rd_narrow_x_din_y}; - {wrk_rd_narrow_y_din_y_dly2, wrk_rd_narrow_y_din_y_dly1} <= {wrk_rd_narrow_y_din_y_dly1, wrk_rd_narrow_y_din_y}; + {rd_wide_addr_x_dly[3], rd_wide_addr_x_dly[2], rd_wide_addr_x_dly[1], rd_wide_addr_x_dly[0]} <= {rd_wide_addr_x_dly[2], rd_wide_addr_x_dly[1], rd_wide_addr_x_dly[0], rd_wide_addr_x}; + {rd_wide_addr_y_dly[3], rd_wide_addr_y_dly[2], rd_wide_addr_y_dly[1], rd_wide_addr_y_dly[0]} <= {rd_wide_addr_y_dly[2], rd_wide_addr_y_dly[1], rd_wide_addr_y_dly[0], rd_wide_addr_y}; // end - - + + // // Source Read Enable Logic // + task _update_wide_rd_en; input _en; {rd_wide_ena_x, rd_wide_ena_y } <= {2{_en}}; endtask + task _update_narrow_rd_en; input _en; {rd_narrow_ena_x, rd_narrow_ena_y} <= {2{_en}}; endtask - task _update_wide_xy_rd_en; input _en; {rd_wide_xy_ena_x, rd_wide_xy_ena_y } <= {2{_en}}; endtask - task _update_narrow_xy_rd_en; input _en; {rd_narrow_xy_ena_x, rd_narrow_xy_ena_y} <= {2{_en}}; endtask + task enable_wide_rd_en; _update_wide_rd_en(1'b1); endtask + task disable_wide_rd_en; _update_wide_rd_en(1'b0); endtask - task enable_wide_xy_rd_en; _update_wide_xy_rd_en(1'b1); endtask - task disable_wide_xy_rd_en; _update_wide_xy_rd_en(1'b0); endtask - - task enable_narrow_xy_rd_en; _update_narrow_xy_rd_en(1'b1); endtask - task disable_narrow_xy_rd_en; _update_narrow_xy_rd_en(1'b0); 
endtask + task enable_narrow_rd_en; _update_narrow_rd_en(1'b1); endtask + task disable_narrow_rd_en; _update_narrow_rd_en(1'b0); endtask always @(posedge clk or negedge rst_n) // if (!rst_n) begin // - disable_wide_xy_rd_en; - disable_narrow_xy_rd_en; + disable_wide_rd_en; + disable_narrow_rd_en; // end else begin // - disable_wide_xy_rd_en; - disable_narrow_xy_rd_en; - // - // one_pass + disable_wide_rd_en; + disable_narrow_rd_en; // - case (wrk_fsm_state_next_one_pass) + case (opcode) // - WRK_FSM_STATE_LATENCY_PRE1, - WRK_FSM_STATE_LATENCY_PRE2, - WRK_FSM_STATE_BUSY: + UOP_OPCODE_PROPAGATE_CARRIES, + UOP_OPCODE_OUTPUT_FROM_NARROW, + UOP_OPCODE_MODULAR_REDUCE_INIT, + UOP_OPCODE_MODULAR_SUBTRACT_X: // - case (opcode) - // - UOP_OPCODE_PROPAGATE_CARRIES, - UOP_OPCODE_OUTPUT_FROM_NARROW, - UOP_OPCODE_MODULAR_REDUCE_INIT: - // - enable_narrow_xy_rd_en; - // - UOP_OPCODE_COPY_CRT_Y2X: begin - // - enable_wide_xy_rd_en; - enable_narrow_xy_rd_en; - // - end - // - UOP_OPCODE_MERGE_LH: - // - enable_wide_xy_rd_en; - // + case (wrk_fsm_state_next) + WRK_FSM_STATE_LATENCY_PRE1, + WRK_FSM_STATE_LATENCY_PRE3, + WRK_FSM_STATE_BUSY1: enable_narrow_rd_en; endcase // - endcase - // - // one_pass_meander - // - case (wrk_fsm_state_next_one_pass_meander) - // - WRK_FSM_STATE_LATENCY_PRE1_M1, - WRK_FSM_STATE_LATENCY_PRE1_M2, - WRK_FSM_STATE_LATENCY_PRE2_M1, - WRK_FSM_STATE_LATENCY_PRE2_M2, - WRK_FSM_STATE_BUSY_M1, - WRK_FSM_STATE_BUSY_M2: + UOP_OPCODE_COPY_CRT_Y2X, + UOP_OPCODE_MODULAR_SUBTRACT_Y, + UOP_OPCODE_MODULAR_SUBTRACT_Z, + UOP_OPCODE_REGULAR_ADD_UNEVEN: // - case (opcode) - // - UOP_OPCODE_COPY_LADDERS_X2Y, - UOP_OPCODE_CROSS_LADDERS_X2Y: begin - // - enable_wide_xy_rd_en; - enable_narrow_xy_rd_en; - // - end - // - UOP_OPCODE_REGULAR_ADD_UNEVEN: - // - enable_narrow_xy_rd_en; - // + case (wrk_fsm_state_next) + WRK_FSM_STATE_LATENCY_PRE1, + WRK_FSM_STATE_LATENCY_PRE3, + WRK_FSM_STATE_BUSY1: begin enable_wide_rd_en; enable_narrow_rd_en; end endcase // - endcase - // - // 
two_pass - // - case (wrk_fsm_state_next_two_pass) - // - WRK_FSM_STATE_LATENCY_PRE1_TP, - WRK_FSM_STATE_LATENCY_PRE2_TP, - WRK_FSM_STATE_LATENCY_PRE3_TP, - WRK_FSM_STATE_LATENCY_PRE4_TP, - WRK_FSM_STATE_BUSY_TP: + UOP_OPCODE_COPY_LADDERS_X2Y, + UOP_OPCODE_CROSS_LADDERS_X2Y: // - case (opcode) - UOP_OPCODE_MODULAR_SUBTRACT: - // - if (!wrk_fsm_two_pass_pass) begin - enable_wide_xy_rd_en; - enable_narrow_xy_rd_en; - end else - enable_narrow_xy_rd_en; - // + case (wrk_fsm_state_next) + WRK_FSM_STATE_LATENCY_PRE1, + WRK_FSM_STATE_LATENCY_PRE2, + WRK_FSM_STATE_LATENCY_PRE3, + WRK_FSM_STATE_LATENCY_PRE4, + WRK_FSM_STATE_BUSY1, + WRK_FSM_STATE_BUSY2: begin enable_wide_rd_en; enable_narrow_rd_en; end endcase // + UOP_OPCODE_MERGE_LH: + // + case (wrk_fsm_state_next) + WRK_FSM_STATE_LATENCY_PRE1, + WRK_FSM_STATE_LATENCY_PRE3, + WRK_FSM_STATE_BUSY1: enable_wide_rd_en; + endcase + // endcase // end @@ -435,490 +353,330 @@ module modexpng_general_worker // Destination Write Enable Logic // - task _update_wide_xy_wr_en; input _en; {wr_wide_xy_ena_x, wr_wide_xy_ena_y } <= {2{_en}}; endtask - task _update_narrow_xy_wr_en; input _en; {wr_narrow_xy_ena_x, wr_narrow_xy_ena_y} <= {2{_en}}; endtask + task _update_wide_wr_en; input _en; {wr_wide_xy_ena_x, wr_wide_xy_ena_y } <= {2{_en}}; endtask + task _update_narrow_wr_en; input _en; {wr_narrow_xy_ena_x, wr_narrow_xy_ena_y} <= {2{_en}}; endtask - task enable_wide_xy_wr_en; _update_wide_xy_wr_en(1'b1); endtask - task disable_wide_xy_wr_en; _update_wide_xy_wr_en(1'b0); endtask + task enable_wide_wr_en; _update_wide_wr_en(1'b1); endtask + task disable_wide_wr_en; _update_wide_wr_en(1'b0); endtask - task enable_narrow_xy_wr_en; _update_narrow_xy_wr_en(1'b1); endtask - task disable_narrow_xy_wr_en; _update_narrow_xy_wr_en(1'b0); endtask + task enable_narrow_wr_en; _update_narrow_wr_en(1'b1); endtask + task disable_narrow_wr_en; _update_narrow_wr_en(1'b0); endtask always @(posedge clk or negedge rst_n) // if (!rst_n) begin // - 
disable_wide_xy_wr_en; - disable_narrow_xy_wr_en; + disable_wide_wr_en; + disable_narrow_wr_en; // end else begin // - disable_wide_xy_wr_en; - disable_narrow_xy_wr_en; + disable_wide_wr_en; + disable_narrow_wr_en; // - // one_pass - // - case (wrk_fsm_state) + case (opcode) // - WRK_FSM_STATE_BUSY, - WRK_FSM_STATE_LATENCY_POST1, - WRK_FSM_STATE_LATENCY_POST2: + UOP_OPCODE_PROPAGATE_CARRIES, + UOP_OPCODE_MODULAR_SUBTRACT_X, + UOP_OPCODE_MERGE_LH, + UOP_OPCODE_REGULAR_ADD_UNEVEN: // - case (opcode) - // - UOP_OPCODE_PROPAGATE_CARRIES, - UOP_OPCODE_MERGE_LH: - // - enable_narrow_xy_wr_en; - // - UOP_OPCODE_COPY_CRT_Y2X: begin - // - enable_wide_xy_wr_en; - enable_narrow_xy_wr_en; - // - end - // - UOP_OPCODE_MODULAR_REDUCE_INIT: - // - enable_wide_xy_wr_en; - // + case (wrk_fsm_state) + WRK_FSM_STATE_BUSY1, + WRK_FSM_STATE_LATENCY_POST1, + WRK_FSM_STATE_LATENCY_POST3: enable_narrow_wr_en; endcase // - endcase - // - // one_pass_meander - // - case (wrk_fsm_state) - // - WRK_FSM_STATE_BUSY_M2, - WRK_FSM_STATE_LATENCY_POST1_M2, - WRK_FSM_STATE_LATENCY_POST2_M2: + UOP_OPCODE_COPY_CRT_Y2X, + UOP_OPCODE_COPY_LADDERS_X2Y, + UOP_OPCODE_CROSS_LADDERS_X2Y, + UOP_OPCODE_MODULAR_SUBTRACT_Z: // - case (opcode) - // - UOP_OPCODE_COPY_LADDERS_X2Y, - UOP_OPCODE_CROSS_LADDERS_X2Y: begin - // - enable_wide_xy_wr_en; - enable_narrow_xy_wr_en; - // - end - // - UOP_OPCODE_REGULAR_ADD_UNEVEN: - // - enable_narrow_xy_wr_en; - // + case (wrk_fsm_state) + WRK_FSM_STATE_BUSY1, + WRK_FSM_STATE_LATENCY_POST1, + WRK_FSM_STATE_LATENCY_POST3: begin enable_wide_wr_en; enable_narrow_wr_en; end endcase // - endcase - // - // two_pass - // - case (wrk_fsm_state) - // - WRK_FSM_STATE_BUSY_TP, - WRK_FSM_STATE_LATENCY_POST1_TP, - WRK_FSM_STATE_LATENCY_POST2_TP, - WRK_FSM_STATE_LATENCY_POST3_TP, - WRK_FSM_STATE_LATENCY_POST4_TP: + UOP_OPCODE_MODULAR_REDUCE_INIT, + UOP_OPCODE_MODULAR_SUBTRACT_Y: // - case (opcode) - // - UOP_OPCODE_MODULAR_SUBTRACT: - // - if (!wrk_fsm_two_pass_pass) - 
enable_narrow_xy_wr_en; - else begin - enable_wide_xy_wr_en; - enable_narrow_xy_wr_en; - end - // + case (wrk_fsm_state) + WRK_FSM_STATE_BUSY1, + WRK_FSM_STATE_LATENCY_POST1, + WRK_FSM_STATE_LATENCY_POST3: enable_wide_wr_en; endcase - // + // endcase // end - + // - // Source to Destination Data Logic + // Source Read Address Logic // + reg [OP_ADDR_W -1:0] rd_wide_addr_next; + reg [OP_ADDR_W -1:0] rd_narrow_addr_next; + + reg rd_wide_addr_is_last = 1'b0; + reg rd_narrow_addr_is_last = 1'b0; + + reg rd_wide_addr_is_last_half = 1'b0; + reg rd_narrow_addr_is_last_half = 1'b0; + + reg rd_wide_addr_next_is_last = 1'b0; + reg rd_narrow_addr_next_is_last = 1'b0; + reg rd_wide_addr_next_is_last_half = 1'b0; + reg rd_narrow_addr_next_is_last_half = 1'b0; + + reg [3:0] rd_wide_addr_is_last_half_dly = 4'h0; + reg [3:0] rd_narrow_addr_is_last_half_dly = 4'h0; + always @(posedge clk) begin // - update_wide_dout (WORD_EXT_DNC, WORD_EXT_DNC, WORD_EXT_DNC, WORD_EXT_DNC); - update_narrow_dout(WORD_EXT_DNC, WORD_EXT_DNC, WORD_EXT_DNC, WORD_EXT_DNC); - // - // one_pass - // - case (wrk_fsm_state) - // - WRK_FSM_STATE_BUSY, - WRK_FSM_STATE_LATENCY_POST1, - WRK_FSM_STATE_LATENCY_POST2: - // - case (opcode) - // - UOP_OPCODE_PROPAGATE_CARRIES: - // - update_narrow_dout(rd_narrow_x_din_x_w_cry_reduced, - rd_narrow_y_din_x_w_cry_reduced, - rd_narrow_x_din_y_w_cry_reduced, - rd_narrow_y_din_y_w_cry_reduced); - // - UOP_OPCODE_COPY_CRT_Y2X: begin - // - update_wide_dout(wrk_rd_wide_x_din_y, - wrk_rd_wide_y_din_y, - wrk_rd_wide_x_din_y, - wrk_rd_wide_y_din_y); - // - update_narrow_dout(wrk_rd_narrow_x_din_y, - wrk_rd_narrow_y_din_y, - wrk_rd_narrow_x_din_y, - wrk_rd_narrow_y_din_y); - // - end - // - UOP_OPCODE_MODULAR_REDUCE_INIT: - // - update_wide_dout(wrk_rd_narrow_x_din_x, - wrk_rd_narrow_y_din_x, - wrk_rd_narrow_x_din_y, - wrk_rd_narrow_y_din_y); - // - UOP_OPCODE_MERGE_LH: - // - update_narrow_dout(wrk_rd_wide_x_din_x, - wrk_rd_wide_y_din_x, - wrk_rd_wide_x_din_y, - 
wrk_rd_wide_y_din_y); - // - endcase - // - endcase - // - // one_pass_meander - // - case (wrk_fsm_state) - // - WRK_FSM_STATE_BUSY_M2, - WRK_FSM_STATE_LATENCY_POST1_M2, - WRK_FSM_STATE_LATENCY_POST2_M2: - // - case (opcode) - // - UOP_OPCODE_COPY_LADDERS_X2Y: begin - // - update_wide_dout(wrk_rd_wide_x_din_x_dly3, - wrk_rd_wide_x_din_x_dly2, - wrk_rd_wide_x_din_y_dly3, - wrk_rd_wide_x_din_y_dly2); - // - update_narrow_dout(wrk_rd_narrow_x_din_x_dly3, - wrk_rd_narrow_x_din_x_dly2, - wrk_rd_narrow_x_din_y_dly3, - wrk_rd_narrow_x_din_y_dly2); - // - end - // - UOP_OPCODE_CROSS_LADDERS_X2Y: begin - // - update_wide_dout(wrk_rd_wide_x_din_x_dly3, - wrk_rd_wide_x_din_y_dly2, - wrk_rd_wide_x_din_y_dly3, - wrk_rd_wide_x_din_x_dly2); - // - update_narrow_dout(wrk_rd_narrow_x_din_x_dly3, - wrk_rd_narrow_x_din_y_dly2, - wrk_rd_narrow_x_din_y_dly3, - wrk_rd_narrow_x_din_x_dly2); - // - end - // - UOP_OPCODE_REGULAR_ADD_UNEVEN: begin - // - update_narrow_dout(regadd_x_x_trunc, - regadd_y_x_trunc, - regadd_x_y_trunc, - regadd_y_y_trunc); - // - end - // - endcase - // - endcase - // - // two_pass - // - case (wrk_fsm_state) - // - WRK_FSM_STATE_BUSY_TP, - WRK_FSM_STATE_LATENCY_POST1_TP, - WRK_FSM_STATE_LATENCY_POST2_TP, - WRK_FSM_STATE_LATENCY_POST3_TP, - WRK_FSM_STATE_LATENCY_POST4_TP: - // - case (opcode) - // - UOP_OPCODE_MODULAR_SUBTRACT: - // - if (!wrk_fsm_two_pass_pass) - update_narrow_dout(modsub_x_ab_dly_trunc, modsub_x_abn_trunc, modsub_y_ab_dly_trunc, modsub_y_abn_trunc); - else begin - update_wide_dout (modsub_x_mux, modsub_x_mux, modsub_y_mux, modsub_y_mux); - update_narrow_dout(modsub_x_mux, modsub_x_mux, modsub_y_mux, modsub_y_mux); - end - // - endcase - // - endcase + rd_wide_addr_is_last_half_dly <= {rd_wide_addr_is_last_half_dly[2:0], rd_wide_addr_is_last_half}; + rd_narrow_addr_is_last_half_dly <= {rd_narrow_addr_is_last_half_dly[2:0], rd_narrow_addr_is_last_half}; // end - - // - // Source Read Address Logic - // - - reg [OP_ADDR_W -1:0] 
rd_wide_xy_addr_xy_next; - reg [OP_ADDR_W -1:0] rd_narrow_xy_addr_xy_next; - - reg rd_wide_xy_addr_xy_next_last_seen; - reg rd_wide_xy_addr_xy_next_last_seen_dly1; - reg rd_wide_xy_addr_xy_next_last_seen_dly2; - - wire rd_wide_xy_addr_xy_next_is_last = rd_wide_xy_addr_xy_next == word_index_last_half; - wire rd_narrow_xy_addr_xy_next_is_last = rd_narrow_xy_addr_xy_next == word_index_last; + task preset_rd_wide_bank_addr; + input [BANK_ADDR_W -1:0] bank; + input [ OP_ADDR_W -1:0] addr; + begin + {rd_wide_bank_x, rd_wide_addr_x} <= {bank, addr}; + {rd_wide_bank_y, rd_wide_addr_y} <= {bank, addr}; + rd_wide_addr_is_last <= 1'b0; + rd_wide_addr_is_last_half <= 1'b0; + end + endtask - task update_rd_wide_bank_addr; + task preset_rd_narrow_bank_addr; input [BANK_ADDR_W -1:0] bank; input [ OP_ADDR_W -1:0] addr; begin - {rd_wide_xy_bank_x, rd_wide_xy_addr_x} <= {bank, addr}; - {rd_wide_xy_bank_y, rd_wide_xy_addr_y} <= {bank, addr}; + {rd_narrow_bank_x, rd_narrow_addr_x} <= {bank, addr}; + {rd_narrow_bank_y, rd_narrow_addr_y} <= {bank, addr}; + rd_narrow_addr_is_last <= 1'b0; + rd_narrow_addr_is_last_half <= 1'b0; + end + endtask + + task preset_rd_wide_addr_next; + input [OP_ADDR_W -1:0] addr; + begin + rd_wide_addr_next <= addr; + rd_wide_addr_next_is_last <= 1'b0; + rd_wide_addr_next_is_last_half <= 1'b0; end endtask - task update_rd_wide_bank; - input [BANK_ADDR_W -1:0] bank; + task preset_rd_narrow_addr_next; + input [OP_ADDR_W -1:0] addr; + begin + rd_narrow_addr_next <= addr; + rd_narrow_addr_next_is_last <= 1'b0; + rd_narrow_addr_next_is_last_half <= 1'b0; + end + endtask + + task keep_rd_wide_bank; begin - {rd_wide_xy_bank_x, rd_wide_xy_addr_x} <= {bank, rd_wide_xy_addr_x}; - {rd_wide_xy_bank_y, rd_wide_xy_addr_y} <= {bank, rd_wide_xy_addr_y}; + {rd_wide_bank_x} <= {rd_wide_bank_x}; + {rd_wide_bank_y} <= {rd_wide_bank_y}; end endtask - task update_rd_narrow_bank_addr; + task switch_rd_wide_bank; input [BANK_ADDR_W -1:0] bank; - input [ OP_ADDR_W -1:0] addr; begin - 
{rd_narrow_xy_bank_x, rd_narrow_xy_addr_x} <= {bank, addr}; - {rd_narrow_xy_bank_y, rd_narrow_xy_addr_y} <= {bank, addr}; + {rd_wide_bank_x} <= {bank}; + {rd_wide_bank_y} <= {bank}; + end + endtask + + task keep_rd_wide_addr; + begin + {rd_wide_addr_x} <= {rd_wide_addr_x}; + {rd_wide_addr_y} <= {rd_wide_addr_y}; + end + endtask + + task advance_rd_wide_addr; + begin + {rd_wide_addr_x} <= {rd_wide_addr_next}; + {rd_wide_addr_y} <= {rd_wide_addr_next}; + rd_wide_addr_is_last <= rd_wide_addr_next == word_index_last; + rd_wide_addr_is_last_half <= rd_wide_addr_next == word_index_last_half; + end + endtask + + task keep_rd_narrow_bank; + begin + {rd_narrow_bank_x} <= {rd_narrow_bank_x}; + {rd_narrow_bank_y} <= {rd_narrow_bank_y}; end endtask - task update_rd_narrow_bank; + task switch_rd_narrow_bank; input [BANK_ADDR_W -1:0] bank; begin - {rd_narrow_xy_bank_x, rd_narrow_xy_addr_x} <= {bank, rd_narrow_xy_addr_x}; - {rd_narrow_xy_bank_y, rd_narrow_xy_addr_y} <= {bank, rd_narrow_xy_addr_y}; + {rd_narrow_bank_x} <= {bank}; + {rd_narrow_bank_y} <= {bank}; end endtask - task update_rd_wide_addr_next; - input [OP_ADDR_W -1:0] addr; - rd_wide_xy_addr_xy_next <= addr; + task keep_rd_narrow_addr; + begin + {rd_narrow_addr_x} <= {rd_narrow_addr_x}; + {rd_narrow_addr_y} <= {rd_narrow_addr_y}; + end + endtask + + task advance_rd_narrow_addr; + begin + {rd_narrow_addr_x} <= {rd_narrow_addr_next}; + {rd_narrow_addr_y} <= {rd_narrow_addr_next}; + rd_narrow_addr_is_last <= rd_narrow_addr_next == word_index_last; + rd_narrow_addr_is_last_half <= rd_narrow_addr_next == word_index_last_half; + end + endtask + + task update_rd_wide_addr_flags; + begin + rd_wide_addr_next_is_last <= rd_wide_addr_next == (word_index_last - 1'b1); + rd_wide_addr_next_is_last_half <= rd_wide_addr_next == (word_index_last_half - 1'b1); + end endtask - task update_rd_narrow_addr_next; - input [OP_ADDR_W -1:0] addr; - rd_narrow_xy_addr_xy_next <= addr; + task update_rd_narrow_addr_flags; + begin + 
rd_narrow_addr_next_is_last <= rd_narrow_addr_next == (word_index_last - 1'b1); + rd_narrow_addr_next_is_last_half <= rd_narrow_addr_next == (word_index_last_half - 1'b1); + end endtask task advance_rd_wide_addr_next; - rd_wide_xy_addr_xy_next <= !rd_wide_xy_addr_xy_next_is_last ? rd_wide_xy_addr_xy_next + 1'b1 : OP_ADDR_ZERO; + begin + rd_wide_addr_next <= !rd_wide_addr_next_is_last ? rd_wide_addr_next + 1'b1 : OP_ADDR_ZERO; + update_rd_wide_addr_flags; + end endtask task advance_rd_narrow_addr_next; - rd_narrow_xy_addr_xy_next <= !rd_narrow_xy_addr_xy_next_is_last ? rd_narrow_xy_addr_xy_next + 1'b1 : OP_ADDR_ZERO; + begin + rd_narrow_addr_next <= !rd_narrow_addr_next_is_last ? rd_narrow_addr_next + 1'b1 : OP_ADDR_ZERO; + update_rd_narrow_addr_flags; + end + endtask + + task advance_rd_wide_addr_next_half; + begin + rd_wide_addr_next <= !rd_wide_addr_next_is_last_half ? rd_wide_addr_next + 1'b1 : OP_ADDR_ZERO; + update_rd_wide_addr_flags; + end + endtask + + task advance_rd_narrow_addr_next_half; + begin + rd_narrow_addr_next <= !rd_narrow_addr_next_is_last_half ? 
rd_narrow_addr_next + 1'b1 : OP_ADDR_ZERO; + update_rd_narrow_addr_flags; + end endtask - - always @(posedge clk) - // - case (opcode) - UOP_OPCODE_MERGE_LH: - case (wrk_fsm_state_next_one_pass) - WRK_FSM_STATE_LATENCY_PRE1: - rd_wide_xy_addr_xy_next_last_seen <= 1'b0; - WRK_FSM_STATE_BUSY: - if (!rd_wide_xy_addr_xy_next_last_seen && rd_wide_xy_addr_xy_next_is_last) - rd_wide_xy_addr_xy_next_last_seen <= 1'b1; - endcase - UOP_OPCODE_REGULAR_ADD_UNEVEN: - case (wrk_fsm_state_next_one_pass_meander) - WRK_FSM_STATE_LATENCY_PRE1_M1: begin - rd_wide_xy_addr_xy_next_last_seen <= 1'b0; - rd_wide_xy_addr_xy_next_last_seen_dly1 <= 1'b0; - rd_wide_xy_addr_xy_next_last_seen_dly2 <= 1'b0; - end - WRK_FSM_STATE_BUSY_M1: begin - if (!rd_wide_xy_addr_xy_next_last_seen && rd_wide_xy_addr_xy_next_is_last) - rd_wide_xy_addr_xy_next_last_seen <= 1'b1; - rd_wide_xy_addr_xy_next_last_seen_dly1 <= rd_wide_xy_addr_xy_next_last_seen; - rd_wide_xy_addr_xy_next_last_seen_dly2 <= rd_wide_xy_addr_xy_next_last_seen_dly1; - end - endcase - endcase always @(posedge clk) begin // - update_rd_wide_bank_addr (BANK_DNC, OP_ADDR_DNC); - update_rd_narrow_bank_addr(BANK_DNC, OP_ADDR_DNC); + preset_rd_wide_bank_addr (BANK_DNC, OP_ADDR_DNC); + preset_rd_narrow_bank_addr(BANK_DNC, OP_ADDR_DNC); // - // one_pass - // - case (wrk_fsm_state_next_one_pass) + case (opcode) // - WRK_FSM_STATE_LATENCY_PRE1: - // - case (opcode) - // - UOP_OPCODE_PROPAGATE_CARRIES, - UOP_OPCODE_OUTPUT_FROM_NARROW, - UOP_OPCODE_COPY_CRT_Y2X, - UOP_OPCODE_MODULAR_REDUCE_INIT: begin - // - update_rd_wide_bank_addr (sel_wide_in, OP_ADDR_ZERO); update_rd_wide_addr_next (OP_ADDR_ONE); - update_rd_narrow_bank_addr(sel_narrow_in, OP_ADDR_ZERO); update_rd_narrow_addr_next(OP_ADDR_ONE); - // - end - // - UOP_OPCODE_MERGE_LH: begin - update_rd_wide_bank_addr (BANK_WIDE_L, OP_ADDR_ZERO); update_rd_wide_addr_next (OP_ADDR_ONE); - update_rd_narrow_bank_addr(sel_narrow_in, OP_ADDR_ZERO); update_rd_narrow_addr_next(OP_ADDR_ONE); - end - // - 
endcase - // - WRK_FSM_STATE_LATENCY_PRE2, - WRK_FSM_STATE_BUSY: + UOP_OPCODE_PROPAGATE_CARRIES, + UOP_OPCODE_OUTPUT_FROM_NARROW, + UOP_OPCODE_MODULAR_SUBTRACT_X: // - case (opcode) - // - UOP_OPCODE_PROPAGATE_CARRIES, - UOP_OPCODE_OUTPUT_FROM_NARROW, - UOP_OPCODE_COPY_CRT_Y2X: begin - // - update_rd_wide_bank_addr (sel_wide_in, rd_narrow_xy_addr_xy_next); advance_rd_wide_addr_next ; - update_rd_narrow_bank_addr(sel_narrow_in, rd_narrow_xy_addr_xy_next); advance_rd_narrow_addr_next; - // - end - // - UOP_OPCODE_MODULAR_REDUCE_INIT: begin - // - update_rd_wide_bank_addr (sel_wide_in, rd_wide_xy_addr_xy_next ); advance_rd_wide_addr_next ; - update_rd_narrow_bank_addr(sel_narrow_in, rd_narrow_xy_addr_xy_next); advance_rd_narrow_addr_next; - // - end - // - UOP_OPCODE_MERGE_LH: begin - // - if (!rd_wide_xy_addr_xy_next_last_seen) update_rd_wide_bank_addr (BANK_WIDE_L, rd_wide_xy_addr_xy_next ); - else update_rd_wide_bank_addr (BANK_WIDE_H, rd_wide_xy_addr_xy_next ); - advance_rd_wide_addr_next ; - update_rd_narrow_bank_addr(sel_narrow_in, rd_narrow_xy_addr_xy_next); advance_rd_narrow_addr_next; - // - end - // + case (wrk_fsm_state_next) + WRK_FSM_STATE_LATENCY_PRE1: begin preset_rd_narrow_bank_addr(sel_narrow_in, OP_ADDR_ZERO); preset_rd_narrow_addr_next(OP_ADDR_ONE); end + WRK_FSM_STATE_LATENCY_PRE3, + WRK_FSM_STATE_BUSY1: begin keep_rd_narrow_bank; advance_rd_narrow_addr; advance_rd_narrow_addr_next; end + WRK_FSM_STATE_LATENCY_PRE2, + WRK_FSM_STATE_LATENCY_PRE4, + WRK_FSM_STATE_BUSY2: keep_rd_narrow_bank; endcase // - endcase - // - // one_pass_meander - // - case (wrk_fsm_state_next_one_pass_meander) - // - WRK_FSM_STATE_LATENCY_PRE1_M1: - case (opcode) - UOP_OPCODE_COPY_LADDERS_X2Y, - UOP_OPCODE_CROSS_LADDERS_X2Y: begin - update_rd_wide_bank_addr (sel_wide_out, OP_ADDR_ZERO); update_rd_wide_addr_next (OP_ADDR_ONE); - update_rd_narrow_bank_addr(sel_narrow_out, OP_ADDR_ZERO); update_rd_narrow_addr_next(OP_ADDR_ONE); - end - UOP_OPCODE_REGULAR_ADD_UNEVEN: begin - 
update_rd_wide_bank_addr (sel_wide_in, OP_ADDR_ZERO); update_rd_wide_addr_next (OP_ADDR_ONE); - update_rd_narrow_bank_addr(sel_wide_in, OP_ADDR_ZERO); update_rd_narrow_addr_next(OP_ADDR_ONE); - end + UOP_OPCODE_COPY_CRT_Y2X, + UOP_OPCODE_MODULAR_SUBTRACT_Z, + UOP_OPCODE_REGULAR_ADD_UNEVEN: + // + case (wrk_fsm_state_next) + WRK_FSM_STATE_LATENCY_PRE1: begin preset_rd_wide_bank_addr (sel_wide_in, OP_ADDR_ZERO); preset_rd_wide_addr_next (OP_ADDR_ONE); + preset_rd_narrow_bank_addr(sel_narrow_in, OP_ADDR_ZERO); preset_rd_narrow_addr_next(OP_ADDR_ONE); end + WRK_FSM_STATE_LATENCY_PRE3, + WRK_FSM_STATE_BUSY1: begin keep_rd_wide_bank; advance_rd_wide_addr; advance_rd_wide_addr_next; + keep_rd_narrow_bank; advance_rd_narrow_addr; advance_rd_narrow_addr_next; end + WRK_FSM_STATE_LATENCY_PRE2, + WRK_FSM_STATE_LATENCY_PRE4, + WRK_FSM_STATE_BUSY2: begin keep_rd_wide_bank; keep_rd_narrow_bank; end endcase // - WRK_FSM_STATE_LATENCY_PRE2_M1, - WRK_FSM_STATE_BUSY_M1: - case (opcode) - UOP_OPCODE_COPY_LADDERS_X2Y, - UOP_OPCODE_CROSS_LADDERS_X2Y: begin - update_rd_wide_bank_addr (sel_wide_out, rd_narrow_xy_addr_xy_next); advance_rd_wide_addr_next ; - update_rd_narrow_bank_addr(sel_narrow_out, rd_narrow_xy_addr_xy_next); advance_rd_narrow_addr_next; - end - UOP_OPCODE_REGULAR_ADD_UNEVEN: begin - update_rd_wide_bank_addr (sel_wide_in, rd_narrow_xy_addr_xy_next); advance_rd_wide_addr_next ; - update_rd_narrow_bank_addr(sel_wide_in, rd_narrow_xy_addr_xy_next); advance_rd_narrow_addr_next; - end + UOP_OPCODE_MODULAR_REDUCE_INIT: + // + case (wrk_fsm_state_next) + WRK_FSM_STATE_LATENCY_PRE1: begin preset_rd_wide_bank_addr (BANK_DNC, OP_ADDR_ZERO); preset_rd_wide_addr_next (OP_ADDR_ONE); + preset_rd_narrow_bank_addr(sel_narrow_in, OP_ADDR_ZERO); preset_rd_narrow_addr_next(OP_ADDR_ONE); end + WRK_FSM_STATE_LATENCY_PRE3, + WRK_FSM_STATE_BUSY1: begin advance_rd_wide_addr; advance_rd_wide_addr_next_half; + keep_rd_narrow_bank; advance_rd_narrow_addr; advance_rd_narrow_addr_next; end + 
WRK_FSM_STATE_LATENCY_PRE2, + WRK_FSM_STATE_LATENCY_PRE4, + WRK_FSM_STATE_BUSY2: keep_rd_narrow_bank; endcase // - WRK_FSM_STATE_LATENCY_PRE1_M2, - WRK_FSM_STATE_LATENCY_PRE2_M2, - WRK_FSM_STATE_BUSY_M2: - case (opcode) - UOP_OPCODE_COPY_LADDERS_X2Y, - UOP_OPCODE_CROSS_LADDERS_X2Y: begin - update_rd_wide_bank (sel_wide_in ); - update_rd_narrow_bank(sel_narrow_in); - end - UOP_OPCODE_REGULAR_ADD_UNEVEN: begin - update_rd_wide_bank (sel_narrow_in); - update_rd_narrow_bank(sel_narrow_in); - end + UOP_OPCODE_COPY_LADDERS_X2Y, + UOP_OPCODE_CROSS_LADDERS_X2Y: + // + case (wrk_fsm_state_next) + WRK_FSM_STATE_LATENCY_PRE1: begin preset_rd_wide_bank_addr (sel_wide_in, OP_ADDR_ZERO); preset_rd_wide_addr_next (OP_ADDR_ONE); + preset_rd_narrow_bank_addr(sel_narrow_in, OP_ADDR_ZERO); preset_rd_narrow_addr_next(OP_ADDR_ONE); end + WRK_FSM_STATE_LATENCY_PRE2: begin switch_rd_wide_bank (sel_wide_out); keep_rd_wide_addr; + switch_rd_narrow_bank(sel_narrow_out); keep_rd_narrow_addr; end + WRK_FSM_STATE_LATENCY_PRE3, + WRK_FSM_STATE_BUSY1: begin advance_rd_wide_addr; advance_rd_wide_addr_next; switch_rd_wide_bank(sel_wide_in); + advance_rd_narrow_addr; advance_rd_narrow_addr_next; switch_rd_narrow_bank(sel_narrow_in); end + WRK_FSM_STATE_LATENCY_PRE4, + WRK_FSM_STATE_BUSY2: begin keep_rd_wide_addr; switch_rd_wide_bank (sel_wide_out); + keep_rd_narrow_addr; switch_rd_narrow_bank(sel_narrow_out); end endcase // - endcase - // - // two_pass - // - case (wrk_fsm_state_next_two_pass) - // - WRK_FSM_STATE_LATENCY_PRE1_TP: + UOP_OPCODE_MODULAR_SUBTRACT_Y: // - case (opcode) - // - UOP_OPCODE_MODULAR_SUBTRACT: - // - if (!wrk_fsm_two_pass_pass) begin - update_rd_wide_bank_addr (BANK_WIDE_N, OP_ADDR_ZERO); update_rd_wide_addr_next (OP_ADDR_ONE); - update_rd_narrow_bank_addr(sel_narrow_in, OP_ADDR_ZERO); update_rd_narrow_addr_next(OP_ADDR_ONE); - end else begin - update_rd_narrow_bank_addr(sel_narrow_out, OP_ADDR_ZERO); update_rd_narrow_addr_next(OP_ADDR_ONE); - end - // + case 
(wrk_fsm_state_next) + WRK_FSM_STATE_LATENCY_PRE1: begin preset_rd_wide_bank_addr (BANK_WIDE_N, OP_ADDR_ZERO); preset_rd_wide_addr_next (OP_ADDR_ONE); + preset_rd_narrow_bank_addr(sel_narrow_in, OP_ADDR_ZERO); preset_rd_narrow_addr_next(OP_ADDR_ONE); end + WRK_FSM_STATE_LATENCY_PRE3, + WRK_FSM_STATE_BUSY1: begin keep_rd_wide_bank; advance_rd_wide_addr; advance_rd_wide_addr_next; + keep_rd_narrow_bank; advance_rd_narrow_addr; advance_rd_narrow_addr_next; end + WRK_FSM_STATE_LATENCY_PRE2, + WRK_FSM_STATE_LATENCY_PRE4, + WRK_FSM_STATE_BUSY2: begin keep_rd_wide_bank; keep_rd_narrow_bank; end endcase + // + UOP_OPCODE_MERGE_LH: // - WRK_FSM_STATE_LATENCY_PRE2_TP, - WRK_FSM_STATE_LATENCY_PRE3_TP, - WRK_FSM_STATE_LATENCY_PRE4_TP, - WRK_FSM_STATE_BUSY_TP: - // - case (opcode) - // - UOP_OPCODE_MODULAR_SUBTRACT: - // - if (!wrk_fsm_two_pass_pass) begin - update_rd_wide_bank_addr (BANK_WIDE_N, rd_narrow_xy_addr_xy_next); advance_rd_wide_addr_next ; - update_rd_narrow_bank_addr(sel_narrow_in, rd_narrow_xy_addr_xy_next); advance_rd_narrow_addr_next; - end else begin - update_rd_narrow_bank_addr(sel_narrow_out, rd_narrow_xy_addr_xy_next); advance_rd_narrow_addr_next; - end - // + case (wrk_fsm_state_next) + WRK_FSM_STATE_LATENCY_PRE1: begin preset_rd_wide_bank_addr (BANK_WIDE_L, OP_ADDR_ZERO); preset_rd_wide_addr_next (OP_ADDR_ONE); + preset_rd_narrow_bank_addr(BANK_DNC, OP_ADDR_ZERO); preset_rd_narrow_addr_next(OP_ADDR_ONE); end + WRK_FSM_STATE_LATENCY_PRE3: begin keep_rd_wide_bank; advance_rd_wide_addr; advance_rd_wide_addr_next_half; + advance_rd_narrow_addr; advance_rd_narrow_addr_next; end + WRK_FSM_STATE_BUSY1: begin if (!rd_wide_addr_is_last_half_dly[0]) keep_rd_wide_bank; + else switch_rd_wide_bank(BANK_WIDE_H); + advance_rd_wide_addr; advance_rd_wide_addr_next_half; + advance_rd_narrow_addr; advance_rd_narrow_addr_next; end + WRK_FSM_STATE_LATENCY_PRE2, + WRK_FSM_STATE_LATENCY_PRE4, + WRK_FSM_STATE_BUSY2: keep_rd_wide_bank; endcase - // + // endcase // end @@ -927,13 
+685,21 @@ module modexpng_general_worker // // Destination Write Address Logic // - - wire uop_modular_reduce_init_feed_lsb_x = rd_narrow_xy_addr_x_dly2 <= word_index_last_half; - wire uop_modular_reduce_init_feed_lsb_y = rd_narrow_xy_addr_y_dly2 <= word_index_last_half; - - wire [BANK_ADDR_W -1:0] uop_modular_reduce_init_bank_x = uop_modular_reduce_init_feed_lsb_x ? BANK_WIDE_L : BANK_WIDE_H; - wire [BANK_ADDR_W -1:0] uop_modular_reduce_init_bank_y = uop_modular_reduce_init_feed_lsb_y ? BANK_WIDE_L : BANK_WIDE_H; + reg modular_reduce_init_first_half_x; + reg modular_reduce_init_first_half_y; + reg [BANK_ADDR_W -1:0] modular_reduce_init_sel_wide_out_x; + reg [BANK_ADDR_W -1:0] modular_reduce_init_sel_wide_out_y; + always @(posedge clk) begin + // + modular_reduce_init_first_half_x <= rd_narrow_addr_x_dly[1] <= word_index_last_half; + modular_reduce_init_first_half_y <= rd_narrow_addr_y_dly[1] <= word_index_last_half; + // + modular_reduce_init_sel_wide_out_x <= modular_reduce_init_first_half_x ? BANK_WIDE_L : BANK_WIDE_H; + modular_reduce_init_sel_wide_out_y <= modular_reduce_init_first_half_y ? 
BANK_WIDE_L : BANK_WIDE_H; + // + end + task update_wr_wide_bank_addr; input [BANK_ADDR_W -1:0] x_bank; input [BANK_ADDR_W -1:0] y_bank; @@ -955,120 +721,351 @@ module modexpng_general_worker {wr_narrow_xy_bank_y, wr_narrow_xy_addr_y} <= {y_bank, y_addr}; end endtask - + always @(posedge clk) begin // update_wr_wide_bank_addr (BANK_DNC, BANK_DNC, OP_ADDR_DNC, OP_ADDR_DNC); update_wr_narrow_bank_addr(BANK_DNC, BANK_DNC, OP_ADDR_DNC, OP_ADDR_DNC); // - // one_pass - // - case (wrk_fsm_state) + case (opcode) // - WRK_FSM_STATE_BUSY, - WRK_FSM_STATE_LATENCY_POST1, - WRK_FSM_STATE_LATENCY_POST2: + UOP_OPCODE_PROPAGATE_CARRIES, + UOP_OPCODE_MODULAR_SUBTRACT_X, + UOP_OPCODE_MERGE_LH, + UOP_OPCODE_REGULAR_ADD_UNEVEN: // - case (opcode) - // - UOP_OPCODE_PROPAGATE_CARRIES, - UOP_OPCODE_COPY_CRT_Y2X: begin - update_wr_wide_bank_addr (sel_wide_out, sel_wide_out, rd_narrow_xy_addr_x_dly2, rd_narrow_xy_addr_y_dly2); - update_wr_narrow_bank_addr(sel_narrow_out, sel_narrow_out, rd_narrow_xy_addr_x_dly2, rd_narrow_xy_addr_y_dly2); - end - // - UOP_OPCODE_MODULAR_REDUCE_INIT: - update_wr_wide_bank_addr(uop_modular_reduce_init_bank_x, uop_modular_reduce_init_bank_y, rd_wide_xy_addr_x_dly2, rd_wide_xy_addr_y_dly2); - // - UOP_OPCODE_MERGE_LH: - update_wr_narrow_bank_addr(sel_narrow_out, sel_narrow_out, rd_narrow_xy_addr_x_dly2, rd_narrow_xy_addr_y_dly2); - // + case (wrk_fsm_state) + WRK_FSM_STATE_BUSY1, + WRK_FSM_STATE_LATENCY_POST1, + WRK_FSM_STATE_LATENCY_POST3: update_wr_narrow_bank_addr(sel_narrow_out, sel_narrow_out, rd_narrow_addr_x_dly[3], rd_narrow_addr_y_dly[3]); endcase - // - endcase - // - // one_pass_meander - // - case (wrk_fsm_state) // - WRK_FSM_STATE_BUSY_M2, - WRK_FSM_STATE_LATENCY_POST1_M2, - WRK_FSM_STATE_LATENCY_POST2_M2: - // - case (opcode) - UOP_OPCODE_COPY_LADDERS_X2Y, - UOP_OPCODE_CROSS_LADDERS_X2Y: begin - update_wr_wide_bank_addr (sel_wide_out, sel_wide_out, rd_narrow_xy_addr_x_dly4, rd_narrow_xy_addr_y_dly4); - update_wr_narrow_bank_addr(sel_narrow_out, 
sel_narrow_out, rd_narrow_xy_addr_x_dly4, rd_narrow_xy_addr_y_dly4); - end - UOP_OPCODE_REGULAR_ADD_UNEVEN: - update_wr_narrow_bank_addr(sel_narrow_out, sel_narrow_out, rd_narrow_xy_addr_x_dly4, rd_narrow_xy_addr_y_dly4); + UOP_OPCODE_COPY_CRT_Y2X, + UOP_OPCODE_COPY_LADDERS_X2Y, + UOP_OPCODE_CROSS_LADDERS_X2Y, + UOP_OPCODE_MODULAR_SUBTRACT_Z: + // + case (wrk_fsm_state) + WRK_FSM_STATE_BUSY1, + WRK_FSM_STATE_LATENCY_POST1, + WRK_FSM_STATE_LATENCY_POST3: begin update_wr_narrow_bank_addr(sel_narrow_out, sel_narrow_out, rd_narrow_addr_x_dly[3], rd_narrow_addr_y_dly[3]); + update_wr_wide_bank_addr (sel_wide_out, sel_wide_out, rd_wide_addr_x_dly[3], rd_wide_addr_y_dly[3] ); end endcase + // + UOP_OPCODE_MODULAR_REDUCE_INIT: + // + case (wrk_fsm_state) + WRK_FSM_STATE_BUSY1, + WRK_FSM_STATE_LATENCY_POST1, + WRK_FSM_STATE_LATENCY_POST3: update_wr_wide_bank_addr(modular_reduce_init_sel_wide_out_x, modular_reduce_init_sel_wide_out_y, rd_wide_addr_x_dly[3], rd_wide_addr_y_dly[3]); + endcase + // + UOP_OPCODE_MODULAR_SUBTRACT_Y: // + case (wrk_fsm_state) + WRK_FSM_STATE_BUSY1, + WRK_FSM_STATE_LATENCY_POST1, + WRK_FSM_STATE_LATENCY_POST3: update_wr_wide_bank_addr(sel_wide_out, sel_wide_out, rd_wide_addr_x_dly[3], rd_wide_addr_y_dly[3]); + endcase + // endcase // - // two_pass + end + + + // + // UOP_OPCODE_PROPAGATE_CARRIES + // + reg [CARRY_W -1:0] propagate_carries_x_x_cry_r; + reg [CARRY_W -1:0] propagate_carries_y_x_cry_r; + reg [CARRY_W -1:0] propagate_carries_x_y_cry_r; + reg [CARRY_W -1:0] propagate_carries_y_y_cry_r; + + wire [WORD_EXT_W -1:0] propagate_carries_x_x_w_cry = rd_narrow_x_din_x_dly1 + {{WORD_W{1'b0}}, propagate_carries_x_x_cry_r}; + wire [WORD_EXT_W -1:0] propagate_carries_y_x_w_cry = rd_narrow_y_din_x_dly1 + {{WORD_W{1'b0}}, propagate_carries_y_x_cry_r}; + wire [WORD_EXT_W -1:0] propagate_carries_x_y_w_cry = rd_narrow_x_din_y_dly1 + {{WORD_W{1'b0}}, propagate_carries_x_y_cry_r}; + wire [WORD_EXT_W -1:0] propagate_carries_y_y_w_cry = rd_narrow_y_din_y_dly1 
+ {{WORD_W{1'b0}}, propagate_carries_y_y_cry_r}; + + reg [WORD_EXT_W -1:0] propagate_carries_x_x_w_cry_r; + reg [WORD_EXT_W -1:0] propagate_carries_y_x_w_cry_r; + reg [WORD_EXT_W -1:0] propagate_carries_x_y_w_cry_r; + reg [WORD_EXT_W -1:0] propagate_carries_y_y_w_cry_r; + + wire [CARRY_W -1:0] propagate_carries_x_x_w_cry_msb = propagate_carries_x_x_w_cry_r[WORD_EXT_W -1:WORD_W]; + wire [CARRY_W -1:0] propagate_carries_y_x_w_cry_msb = propagate_carries_y_x_w_cry_r[WORD_EXT_W -1:WORD_W]; + wire [CARRY_W -1:0] propagate_carries_x_y_w_cry_msb = propagate_carries_x_y_w_cry_r[WORD_EXT_W -1:WORD_W]; + wire [CARRY_W -1:0] propagate_carries_y_y_w_cry_msb = propagate_carries_y_y_w_cry_r[WORD_EXT_W -1:WORD_W]; + + wire [WORD_W -1:0] propagate_carries_x_x_w_cry_lsb = propagate_carries_x_x_w_cry_r[WORD_W -1:0]; + wire [WORD_W -1:0] propagate_carries_y_x_w_cry_lsb = propagate_carries_y_x_w_cry_r[WORD_W -1:0]; + wire [WORD_W -1:0] propagate_carries_x_y_w_cry_lsb = propagate_carries_x_y_w_cry_r[WORD_W -1:0]; + wire [WORD_W -1:0] propagate_carries_y_y_w_cry_lsb = propagate_carries_y_y_w_cry_r[WORD_W -1:0]; + + wire [WORD_EXT_W -1:0] propagate_carries_x_x_w_cry_reduced = {{CARRY_W{1'b0}}, propagate_carries_x_x_w_cry_lsb}; + wire [WORD_EXT_W -1:0] propagate_carries_y_x_w_cry_reduced = {{CARRY_W{1'b0}}, propagate_carries_y_x_w_cry_lsb}; + wire [WORD_EXT_W -1:0] propagate_carries_x_y_w_cry_reduced = {{CARRY_W{1'b0}}, propagate_carries_x_y_w_cry_lsb}; + wire [WORD_EXT_W -1:0] propagate_carries_y_y_w_cry_reduced = {{CARRY_W{1'b0}}, propagate_carries_y_y_w_cry_lsb}; + + task _propagate_carries_update_cry; + input [CARRY_W-1:0] x_x_cry, y_x_cry, x_y_cry, y_y_cry; + { propagate_carries_x_x_cry_r, propagate_carries_y_x_cry_r, propagate_carries_x_y_cry_r, propagate_carries_y_y_cry_r} <= + { x_x_cry, y_x_cry, x_y_cry, y_y_cry}; + endtask + + task propagate_carries_clear_cry; _propagate_carries_update_cry( CARRY_ZERO, CARRY_ZERO, CARRY_ZERO, CARRY_ZERO); endtask + task 
propagate_carries_store_cry; _propagate_carries_update_cry(propagate_carries_x_x_w_cry_msb, propagate_carries_y_x_w_cry_msb, propagate_carries_x_y_w_cry_msb, propagate_carries_y_y_w_cry_msb); endtask + + task _propagate_carries_update_sum_w_cry; + input [WORD_EXT_W-1:0] x_x_sum_w_cry, y_x_sum_w_cry, x_y_sum_w_cry, y_y_sum_w_cry; + { propagate_carries_x_x_w_cry_r, propagate_carries_y_x_w_cry_r, propagate_carries_x_y_w_cry_r, propagate_carries_y_y_w_cry_r} <= + { x_x_sum_w_cry, y_x_sum_w_cry, x_y_sum_w_cry, y_y_sum_w_cry}; + endtask + + task propagate_carries_store_sum_w_cry; _propagate_carries_update_sum_w_cry(propagate_carries_x_x_w_cry, propagate_carries_y_x_w_cry, propagate_carries_x_y_w_cry, propagate_carries_y_y_w_cry); endtask + + always @(posedge clk) // - case (wrk_fsm_state) + if (opcode == UOP_OPCODE_PROPAGATE_CARRIES) + // + case (wrk_fsm_state) + // + WRK_FSM_STATE_LATENCY_PRE3: propagate_carries_clear_cry; + WRK_FSM_STATE_BUSY1, + WRK_FSM_STATE_LATENCY_POST1: propagate_carries_store_cry; + // + WRK_FSM_STATE_LATENCY_PRE4, + WRK_FSM_STATE_BUSY2, + WRK_FSM_STATE_LATENCY_POST2: propagate_carries_store_sum_w_cry; + // + endcase + + + // + // UOP_OPCODE_MODULAR_SUBTRACT_X + // UOP_OPCODE_MODULAR_SUBTRACT_Y + // + reg modular_subtract_x_brw_r; + reg modular_subtract_y_brw_r; + + reg modular_subtract_x_cry_r; + reg modular_subtract_y_cry_r; + + wire [WORD_W:0] modular_subtract_x_w_brw = rd_narrow_x_din_x_dly1[WORD_W:0] - rd_narrow_y_din_x_dly1[WORD_W:0] - {{WORD_W{1'b0}}, modular_subtract_x_brw_r}; + wire [WORD_W:0] modular_subtract_y_w_brw = rd_narrow_x_din_y_dly1[WORD_W:0] - rd_narrow_y_din_y_dly1[WORD_W:0] - {{WORD_W{1'b0}}, modular_subtract_y_brw_r}; + + wire [WORD_W:0] modular_subtract_x_w_cry = rd_narrow_x_din_x_dly1[WORD_W:0] + rd_wide_x_din_x_dly1[WORD_W:0] + {{WORD_W{1'b0}}, modular_subtract_x_cry_r}; + wire [WORD_W:0] modular_subtract_y_w_cry = rd_narrow_x_din_y_dly1[WORD_W:0] + rd_wide_x_din_y_dly1[WORD_W:0] + {{WORD_W{1'b0}}, 
modular_subtract_y_brw_r}; + + reg [WORD_W:0] modular_subtract_x_w_brw_r; + reg [WORD_W:0] modular_subtract_y_w_brw_r; + + reg [WORD_W:0] modular_subtract_x_w_cry_r; + reg [WORD_W:0] modular_subtract_y_w_cry_r; + + wire modular_subtract_x_w_brw_msb = modular_subtract_x_w_brw_r[WORD_W]; + wire modular_subtract_y_w_brw_msb = modular_subtract_y_w_brw_r[WORD_W]; + + wire modular_subtract_x_w_cry_msb = modular_subtract_x_w_cry_r[WORD_W]; + wire modular_subtract_y_w_cry_msb = modular_subtract_y_w_cry_r[WORD_W]; + + wire [WORD_W -1:0] modular_subtract_x_w_brw_lsb = modular_subtract_x_w_brw_r[WORD_W -1:0]; + wire [WORD_W -1:0] modular_subtract_y_w_brw_lsb = modular_subtract_y_w_brw_r[WORD_W -1:0]; + + wire [WORD_W -1:0] modular_subtract_x_w_cry_lsb = modular_subtract_x_w_cry_r[WORD_W -1:0]; + wire [WORD_W -1:0] modular_subtract_y_w_cry_lsb = modular_subtract_y_w_cry_r[WORD_W -1:0]; + + wire [WORD_EXT_W -1:0] modular_subtract_x_w_brw_reduced = {{CARRY_W{1'b0}}, modular_subtract_x_w_brw_lsb}; + wire [WORD_EXT_W -1:0] modular_subtract_y_w_brw_reduced = {{CARRY_W{1'b0}}, modular_subtract_y_w_brw_lsb}; + + wire [WORD_EXT_W -1:0] modular_subtract_x_w_cry_reduced = {{CARRY_W{1'b0}}, modular_subtract_x_w_cry_lsb}; + wire [WORD_EXT_W -1:0] modular_subtract_y_w_cry_reduced = {{CARRY_W{1'b0}}, modular_subtract_y_w_cry_lsb}; + + reg [WORD_EXT_W -1:0] modular_subtract_x_mux; + reg [WORD_EXT_W -1:0] modular_subtract_y_mux; + + wire [WORD_EXT_W -1:0] modular_subtract_x_mux_reduced = {{CARRY_W{1'b0}}, modular_subtract_x_mux[WORD_W-1:0]}; + wire [WORD_EXT_W -1:0] modular_subtract_y_mux_reduced = {{CARRY_W{1'b0}}, modular_subtract_y_mux[WORD_W-1:0]}; + + task _modular_subtract_update_brw; + input x_brw, y_brw; + {modular_subtract_x_brw_r, modular_subtract_y_brw_r} <= {x_brw, y_brw}; + endtask + + task _modular_subtract_update_cry; + input x_cry, y_cry; + {modular_subtract_x_cry_r, modular_subtract_y_cry_r} <= {x_cry, y_cry}; + endtask + + task modular_subtract_clear_brw; 
_modular_subtract_update_brw( 1'b0, 1'b0); endtask + task modular_subtract_store_brw; _modular_subtract_update_brw(modular_subtract_x_w_brw_msb, modular_subtract_y_w_brw_msb); endtask + + task modular_subtract_clear_cry; _modular_subtract_update_cry( 1'b0, 1'b0); endtask + task modular_subtract_store_cry; _modular_subtract_update_cry(modular_subtract_x_w_cry_msb, modular_subtract_y_w_cry_msb); endtask + + task _modular_subtract_update_diff_w_brw; + input [WORD_W:0] x_diff_w_brw, y_diff_w_brw; + {modular_subtract_x_w_brw_r, modular_subtract_y_w_brw_r} <= {x_diff_w_brw, y_diff_w_brw}; + endtask + + task _modular_subtract_update_sum_w_cry; + input [WORD_W:0] x_sum_w_cry, y_sum_w_cry; + {modular_subtract_x_w_cry_r, modular_subtract_y_w_cry_r} <= {x_sum_w_cry, y_sum_w_cry}; + endtask + + task modular_subtract_store_diff_w_brw; _modular_subtract_update_diff_w_brw(modular_subtract_x_w_brw, modular_subtract_y_w_brw); endtask + + task modular_subtract_store_sum_w_cry; _modular_subtract_update_sum_w_cry(modular_subtract_x_w_cry, modular_subtract_y_w_cry); endtask + + always @(posedge clk) + // + case (opcode) // - WRK_FSM_STATE_BUSY_TP, - WRK_FSM_STATE_LATENCY_POST1_TP, - WRK_FSM_STATE_LATENCY_POST2_TP, - WRK_FSM_STATE_LATENCY_POST3_TP, - WRK_FSM_STATE_LATENCY_POST4_TP: + UOP_OPCODE_MODULAR_SUBTRACT_X: // - case (opcode) + case (wrk_fsm_state) // - UOP_OPCODE_MODULAR_SUBTRACT: - // - if (!wrk_fsm_two_pass_pass) begin - update_wr_narrow_bank_addr(sel_narrow_out, sel_narrow_out, rd_narrow_xy_addr_x_dly4, rd_narrow_xy_addr_y_dly4); - end else begin - update_wr_wide_bank_addr (sel_wide_out, sel_wide_out, rd_narrow_xy_addr_x_dly4, rd_narrow_xy_addr_y_dly4); - update_wr_narrow_bank_addr(sel_narrow_out, sel_narrow_out, rd_narrow_xy_addr_x_dly4, rd_narrow_xy_addr_y_dly4); - end + WRK_FSM_STATE_LATENCY_PRE3: modular_subtract_clear_brw; + WRK_FSM_STATE_BUSY1, + WRK_FSM_STATE_LATENCY_POST1, + WRK_FSM_STATE_LATENCY_POST3: modular_subtract_store_brw; // we need the very last borrow here 
too! + // + WRK_FSM_STATE_LATENCY_PRE4, + WRK_FSM_STATE_BUSY2, + WRK_FSM_STATE_LATENCY_POST2: modular_subtract_store_diff_w_brw; + // + endcase + // + UOP_OPCODE_MODULAR_SUBTRACT_Y: + // + case (wrk_fsm_state) + // + WRK_FSM_STATE_LATENCY_PRE3: modular_subtract_clear_cry; + WRK_FSM_STATE_BUSY1, + WRK_FSM_STATE_LATENCY_POST1: modular_subtract_store_cry; + // + WRK_FSM_STATE_LATENCY_PRE4, + WRK_FSM_STATE_BUSY2, + WRK_FSM_STATE_LATENCY_POST2: modular_subtract_store_sum_w_cry; + // + endcase + // + UOP_OPCODE_MODULAR_SUBTRACT_Z: + // + case (wrk_fsm_state) + // + WRK_FSM_STATE_LATENCY_PRE4, + WRK_FSM_STATE_BUSY2, + WRK_FSM_STATE_LATENCY_POST2: // - endcase + begin modular_subtract_x_mux <= !modular_subtract_x_brw_r ? rd_narrow_x_din_x_dly1 : rd_wide_x_din_x_dly1; + modular_subtract_y_mux <= !modular_subtract_y_brw_r ? rd_narrow_x_din_y_dly1 : rd_wide_x_din_y_dly1; end + // + endcase + // + endcase + + + // + // UOP_OPCODE_REGULAR_ADD_UNEVEN + // + reg [CARRY_W -1:0] regular_add_uneven_x_x_cry_r; + reg [CARRY_W -1:0] regular_add_uneven_y_x_cry_r; + reg [CARRY_W -1:0] regular_add_uneven_x_y_cry_r; + reg [CARRY_W -1:0] regular_add_uneven_y_y_cry_r; + + wire [WORD_EXT_W -1:0] regular_add_uneven_x_x_msb_w_cry = rd_narrow_x_din_x_dly1 + {{WORD_W{1'b0}}, regular_add_uneven_x_x_cry_r}; + wire [WORD_EXT_W -1:0] regular_add_uneven_y_x_msb_w_cry = rd_narrow_y_din_x_dly1 + {{WORD_W{1'b0}}, regular_add_uneven_y_x_cry_r}; + wire [WORD_EXT_W -1:0] regular_add_uneven_x_y_msb_w_cry = rd_narrow_x_din_y_dly1 + {{WORD_W{1'b0}}, regular_add_uneven_x_y_cry_r}; + wire [WORD_EXT_W -1:0] regular_add_uneven_y_y_msb_w_cry = rd_narrow_y_din_y_dly1 + {{WORD_W{1'b0}}, regular_add_uneven_y_y_cry_r}; + + wire [WORD_EXT_W -1:0] regular_add_uneven_x_x_lsb_w_cry = regular_add_uneven_x_x_msb_w_cry + rd_wide_x_din_x_dly1; + wire [WORD_EXT_W -1:0] regular_add_uneven_y_x_lsb_w_cry = regular_add_uneven_y_x_msb_w_cry + rd_wide_y_din_x_dly1; + wire [WORD_EXT_W -1:0] regular_add_uneven_x_y_lsb_w_cry = 
regular_add_uneven_x_y_msb_w_cry + rd_wide_x_din_y_dly1; + wire [WORD_EXT_W -1:0] regular_add_uneven_y_y_lsb_w_cry = regular_add_uneven_y_y_msb_w_cry + rd_wide_y_din_y_dly1; + + reg [WORD_EXT_W -1:0] regular_add_uneven_x_x_w_cry_r; + reg [WORD_EXT_W -1:0] regular_add_uneven_y_x_w_cry_r; + reg [WORD_EXT_W -1:0] regular_add_uneven_x_y_w_cry_r; + reg [WORD_EXT_W -1:0] regular_add_uneven_y_y_w_cry_r; + + wire [CARRY_W -1:0] regular_add_uneven_x_x_w_cry_msb = regular_add_uneven_x_x_w_cry_r[WORD_EXT_W -1:WORD_W]; + wire [CARRY_W -1:0] regular_add_uneven_y_x_w_cry_msb = regular_add_uneven_y_x_w_cry_r[WORD_EXT_W -1:WORD_W]; + wire [CARRY_W -1:0] regular_add_uneven_x_y_w_cry_msb = regular_add_uneven_x_y_w_cry_r[WORD_EXT_W -1:WORD_W]; + wire [CARRY_W -1:0] regular_add_uneven_y_y_w_cry_msb = regular_add_uneven_y_y_w_cry_r[WORD_EXT_W -1:WORD_W]; + + wire [WORD_W -1:0] regular_add_uneven_x_x_w_cry_lsb = regular_add_uneven_x_x_w_cry_r[WORD_W -1:0]; + wire [WORD_W -1:0] regular_add_uneven_y_x_w_cry_lsb = regular_add_uneven_y_x_w_cry_r[WORD_W -1:0]; + wire [WORD_W -1:0] regular_add_uneven_x_y_w_cry_lsb = regular_add_uneven_x_y_w_cry_r[WORD_W -1:0]; + wire [WORD_W -1:0] regular_add_uneven_y_y_w_cry_lsb = regular_add_uneven_y_y_w_cry_r[WORD_W -1:0]; + + wire [WORD_EXT_W -1:0] regular_add_uneven_x_x_w_cry_reduced = {{CARRY_W{1'b0}}, regular_add_uneven_x_x_w_cry_lsb}; + wire [WORD_EXT_W -1:0] regular_add_uneven_y_x_w_cry_reduced = {{CARRY_W{1'b0}}, regular_add_uneven_y_x_w_cry_lsb}; + wire [WORD_EXT_W -1:0] regular_add_uneven_x_y_w_cry_reduced = {{CARRY_W{1'b0}}, regular_add_uneven_x_y_w_cry_lsb}; + wire [WORD_EXT_W -1:0] regular_add_uneven_y_y_w_cry_reduced = {{CARRY_W{1'b0}}, regular_add_uneven_y_y_w_cry_lsb}; + + reg regular_add_uneven_store_lsb_now; + + task _regular_add_uneven_update_cry; + input [CARRY_W-1:0] x_x_cry, y_x_cry, x_y_cry, y_y_cry; + { regular_add_uneven_x_x_cry_r, regular_add_uneven_y_x_cry_r, regular_add_uneven_x_y_cry_r, regular_add_uneven_y_y_cry_r} <= + { 
x_x_cry, y_x_cry, x_y_cry, y_y_cry}; + endtask + + task regular_add_uneven_clear_cry; _regular_add_uneven_update_cry( CARRY_ZERO, CARRY_ZERO, CARRY_ZERO, CARRY_ZERO); endtask + task regular_add_uneven_store_cry; _regular_add_uneven_update_cry(regular_add_uneven_x_x_w_cry_msb, regular_add_uneven_y_x_w_cry_msb, regular_add_uneven_x_y_w_cry_msb, regular_add_uneven_y_y_w_cry_msb); endtask + + task _regular_add_uneven_update_sum_w_cry; + input [WORD_EXT_W-1:0] x_x_sum_w_cry, y_x_sum_w_cry, x_y_sum_w_cry, y_y_sum_w_cry; + { regular_add_uneven_x_x_w_cry_r, regular_add_uneven_y_x_w_cry_r, regular_add_uneven_x_y_w_cry_r, regular_add_uneven_y_y_w_cry_r} <= + { x_x_sum_w_cry, y_x_sum_w_cry, x_y_sum_w_cry, y_y_sum_w_cry}; + endtask + + task regular_add_uneven_store_sum_lsb_w_cry; _regular_add_uneven_update_sum_w_cry(regular_add_uneven_x_x_lsb_w_cry, regular_add_uneven_y_x_lsb_w_cry, regular_add_uneven_x_y_lsb_w_cry, regular_add_uneven_y_y_lsb_w_cry); endtask + + task regular_add_uneven_store_sum_msb_w_cry; _regular_add_uneven_update_sum_w_cry(regular_add_uneven_x_x_msb_w_cry, regular_add_uneven_y_x_msb_w_cry, regular_add_uneven_x_y_msb_w_cry, regular_add_uneven_y_y_msb_w_cry); endtask + + always @(posedge clk) + // + case (wrk_fsm_state) + // + WRK_FSM_STATE_LATENCY_PRE3: regular_add_uneven_store_lsb_now <= 1'b1; + WRK_FSM_STATE_BUSY1: if (rd_wide_addr_is_last_half_dly[3]) regular_add_uneven_store_lsb_now <= 1'b0; // endcase + + always @(posedge clk) // - end + case (wrk_fsm_state) + // + WRK_FSM_STATE_LATENCY_PRE3: regular_add_uneven_clear_cry; + WRK_FSM_STATE_BUSY1, + WRK_FSM_STATE_LATENCY_POST1: regular_add_uneven_store_cry; + // + WRK_FSM_STATE_LATENCY_PRE4: regular_add_uneven_store_sum_lsb_w_cry; + WRK_FSM_STATE_BUSY2: if (regular_add_uneven_store_lsb_now) regular_add_uneven_store_sum_lsb_w_cry; + else regular_add_uneven_store_sum_msb_w_cry; + WRK_FSM_STATE_LATENCY_POST2: regular_add_uneven_store_sum_msb_w_cry; + // + endcase // // FSM Process // - always @(posedge clk or 
negedge rst_n) // if (!rst_n) wrk_fsm_state <= WRK_FSM_STATE_IDLE; - else case (opcode) - UOP_OPCODE_PROPAGATE_CARRIES, - UOP_OPCODE_OUTPUT_FROM_NARROW, - UOP_OPCODE_COPY_CRT_Y2X, - UOP_OPCODE_MODULAR_REDUCE_INIT, - UOP_OPCODE_MERGE_LH: wrk_fsm_state <= wrk_fsm_state_next_one_pass; - UOP_OPCODE_COPY_LADDERS_X2Y, - UOP_OPCODE_CROSS_LADDERS_X2Y, - UOP_OPCODE_REGULAR_ADD_UNEVEN: wrk_fsm_state <= wrk_fsm_state_next_one_pass_meander; - UOP_OPCODE_MODULAR_SUBTRACT: wrk_fsm_state <= wrk_fsm_state_next_two_pass; - default: wrk_fsm_state <= WRK_FSM_STATE_IDLE; - endcase - - + else wrk_fsm_state <= wrk_fsm_state_next; + + // // Busy Exit Logic - // - - reg wrk_fsm_done_one_pass = 1'b0; - reg wrk_fsm_done_one_pass_meander = 1'b0; - reg wrk_fsm_done_two_pass = 1'b0; + // + reg wrk_fsm_done = 1'b0; always @(posedge clk) begin // - wrk_fsm_done_one_pass <= 1'b0; - wrk_fsm_done_one_pass_meander <= 1'b0; - wrk_fsm_done_two_pass <= 1'b0; + wrk_fsm_done <= 1'b0; // case (opcode) // @@ -1076,47 +1073,22 @@ module modexpng_general_worker UOP_OPCODE_OUTPUT_FROM_NARROW, UOP_OPCODE_COPY_CRT_Y2X, UOP_OPCODE_MODULAR_REDUCE_INIT, - UOP_OPCODE_MERGE_LH: - // - case (wrk_fsm_state) - WRK_FSM_STATE_BUSY: - if (rd_narrow_xy_addr_xy_next_is_last) wrk_fsm_done_one_pass <= 1'b1; - endcase - // UOP_OPCODE_COPY_LADDERS_X2Y, UOP_OPCODE_CROSS_LADDERS_X2Y, + UOP_OPCODE_MODULAR_SUBTRACT_X, + UOP_OPCODE_MODULAR_SUBTRACT_Y, + UOP_OPCODE_MODULAR_SUBTRACT_Z, + UOP_OPCODE_MERGE_LH, UOP_OPCODE_REGULAR_ADD_UNEVEN: // case (wrk_fsm_state) - WRK_FSM_STATE_BUSY_M2: - if (rd_narrow_xy_addr_xy_next_is_last) wrk_fsm_done_one_pass_meander <= 1'b1; - WRK_FSM_STATE_BUSY_M1: - wrk_fsm_done_one_pass_meander <= wrk_fsm_done_one_pass_meander; + WRK_FSM_STATE_BUSY1: + if (rd_narrow_addr_is_last) wrk_fsm_done <= 1'b1; endcase - // - UOP_OPCODE_MODULAR_SUBTRACT: - // - case (wrk_fsm_state) - WRK_FSM_STATE_BUSY_TP: - if (rd_narrow_xy_addr_xy_next_is_last) wrk_fsm_done_two_pass <= 1'b1; - endcase - // // endcase // end - - - // 
- // FSM Helper Logic - // - always @(posedge clk) - // - case (wrk_fsm_state) - WRK_FSM_STATE_IDLE: if (ena) {wrk_fsm_two_pass_pass, wrk_fsm_two_pass_pass_dly} <= {1'b0, 1'b0}; - WRK_FSM_STATE_LATENCY_POST4_TP: wrk_fsm_two_pass_pass <= 1'b1; - WRK_FSM_STATE_HOLDOFF_TP: wrk_fsm_two_pass_pass_dly <= 1'b1; - endcase // @@ -1125,64 +1097,26 @@ module modexpng_general_worker always @* begin // case (wrk_fsm_state) - WRK_FSM_STATE_IDLE: wrk_fsm_state_next_one_pass = ena ? WRK_FSM_STATE_LATENCY_PRE1 : WRK_FSM_STATE_IDLE ; - WRK_FSM_STATE_LATENCY_PRE1: wrk_fsm_state_next_one_pass = WRK_FSM_STATE_LATENCY_PRE2 ; - WRK_FSM_STATE_LATENCY_PRE2: wrk_fsm_state_next_one_pass = WRK_FSM_STATE_BUSY ; - WRK_FSM_STATE_BUSY: wrk_fsm_state_next_one_pass = wrk_fsm_done_one_pass ? WRK_FSM_STATE_LATENCY_POST1 : WRK_FSM_STATE_BUSY ; - WRK_FSM_STATE_LATENCY_POST1: wrk_fsm_state_next_one_pass = WRK_FSM_STATE_LATENCY_POST2 ; - WRK_FSM_STATE_LATENCY_POST2: wrk_fsm_state_next_one_pass = WRK_FSM_STATE_STOP ; - WRK_FSM_STATE_STOP: wrk_fsm_state_next_one_pass = WRK_FSM_STATE_IDLE ; - default: wrk_fsm_state_next_one_pass = WRK_FSM_STATE_IDLE ; - endcase - // - end - - always @* begin - // - case (wrk_fsm_state) - WRK_FSM_STATE_IDLE: wrk_fsm_state_next_one_pass_meander = ena ? WRK_FSM_STATE_LATENCY_PRE1_M1 : WRK_FSM_STATE_IDLE ; - // - WRK_FSM_STATE_LATENCY_PRE1_M1: wrk_fsm_state_next_one_pass_meander = WRK_FSM_STATE_LATENCY_PRE1_M2 ; - WRK_FSM_STATE_LATENCY_PRE1_M2: wrk_fsm_state_next_one_pass_meander = WRK_FSM_STATE_LATENCY_PRE2_M1 ; - WRK_FSM_STATE_LATENCY_PRE2_M1: wrk_fsm_state_next_one_pass_meander = WRK_FSM_STATE_LATENCY_PRE2_M2 ; - WRK_FSM_STATE_LATENCY_PRE2_M2: wrk_fsm_state_next_one_pass_meander = WRK_FSM_STATE_BUSY_M1 ; - WRK_FSM_STATE_BUSY_M1: wrk_fsm_state_next_one_pass_meander = WRK_FSM_STATE_BUSY_M2 ; - WRK_FSM_STATE_BUSY_M2: wrk_fsm_state_next_one_pass_meander = wrk_fsm_done_one_pass_meander ? 
WRK_FSM_STATE_LATENCY_POST1_M1 : WRK_FSM_STATE_BUSY_M1 ; - WRK_FSM_STATE_LATENCY_POST1_M1: wrk_fsm_state_next_one_pass_meander = WRK_FSM_STATE_LATENCY_POST1_M2 ; - WRK_FSM_STATE_LATENCY_POST1_M2: wrk_fsm_state_next_one_pass_meander = WRK_FSM_STATE_LATENCY_POST2_M1 ; - WRK_FSM_STATE_LATENCY_POST2_M1: wrk_fsm_state_next_one_pass_meander = WRK_FSM_STATE_LATENCY_POST2_M2 ; - WRK_FSM_STATE_LATENCY_POST2_M2: wrk_fsm_state_next_one_pass_meander = WRK_FSM_STATE_STOP ; - // - WRK_FSM_STATE_STOP: wrk_fsm_state_next_one_pass_meander = WRK_FSM_STATE_IDLE ; - // - default: wrk_fsm_state_next_one_pass_meander = WRK_FSM_STATE_IDLE ; - endcase - // - end - - always @* begin - // - case (wrk_fsm_state) - WRK_FSM_STATE_IDLE: wrk_fsm_state_next_two_pass = ena ? WRK_FSM_STATE_LATENCY_PRE1_TP : WRK_FSM_STATE_IDLE; - WRK_FSM_STATE_LATENCY_PRE1_TP: wrk_fsm_state_next_two_pass = WRK_FSM_STATE_LATENCY_PRE2_TP ; - WRK_FSM_STATE_LATENCY_PRE2_TP: wrk_fsm_state_next_two_pass = WRK_FSM_STATE_LATENCY_PRE3_TP ; - WRK_FSM_STATE_LATENCY_PRE3_TP: wrk_fsm_state_next_two_pass = WRK_FSM_STATE_LATENCY_PRE4_TP ; - WRK_FSM_STATE_LATENCY_PRE4_TP: wrk_fsm_state_next_two_pass = WRK_FSM_STATE_BUSY_TP ; - WRK_FSM_STATE_BUSY_TP: wrk_fsm_state_next_two_pass = wrk_fsm_done_two_pass ? WRK_FSM_STATE_LATENCY_POST1_TP : WRK_FSM_STATE_BUSY_TP; - WRK_FSM_STATE_LATENCY_POST1_TP: wrk_fsm_state_next_two_pass = WRK_FSM_STATE_LATENCY_POST2_TP ; - WRK_FSM_STATE_LATENCY_POST2_TP: wrk_fsm_state_next_two_pass = WRK_FSM_STATE_LATENCY_POST3_TP ; - WRK_FSM_STATE_LATENCY_POST3_TP: wrk_fsm_state_next_two_pass = WRK_FSM_STATE_LATENCY_POST4_TP ; - WRK_FSM_STATE_LATENCY_POST4_TP: wrk_fsm_state_next_two_pass = WRK_FSM_STATE_HOLDOFF_TP ; - WRK_FSM_STATE_HOLDOFF_TP: wrk_fsm_state_next_two_pass = wrk_fsm_two_pass_pass_dly ? 
WRK_FSM_STATE_STOP : WRK_FSM_STATE_LATENCY_PRE1_TP; - WRK_FSM_STATE_STOP: wrk_fsm_state_next_two_pass = WRK_FSM_STATE_IDLE ; - default: wrk_fsm_state_next_two_pass = WRK_FSM_STATE_IDLE ; + WRK_FSM_STATE_IDLE: wrk_fsm_state_next = ena ? WRK_FSM_STATE_LATENCY_PRE1 : WRK_FSM_STATE_IDLE ; + WRK_FSM_STATE_LATENCY_PRE1: wrk_fsm_state_next = WRK_FSM_STATE_LATENCY_PRE2 ; + WRK_FSM_STATE_LATENCY_PRE2: wrk_fsm_state_next = WRK_FSM_STATE_LATENCY_PRE3 ; + WRK_FSM_STATE_LATENCY_PRE3: wrk_fsm_state_next = WRK_FSM_STATE_LATENCY_PRE4 ; + WRK_FSM_STATE_LATENCY_PRE4: wrk_fsm_state_next = WRK_FSM_STATE_BUSY1 ; + WRK_FSM_STATE_BUSY1: wrk_fsm_state_next = WRK_FSM_STATE_BUSY2 ; + WRK_FSM_STATE_BUSY2: wrk_fsm_state_next = wrk_fsm_done ? WRK_FSM_STATE_LATENCY_POST1 : WRK_FSM_STATE_BUSY1 ; + WRK_FSM_STATE_LATENCY_POST1: wrk_fsm_state_next = WRK_FSM_STATE_LATENCY_POST2 ; + WRK_FSM_STATE_LATENCY_POST2: wrk_fsm_state_next = WRK_FSM_STATE_LATENCY_POST3 ; + WRK_FSM_STATE_LATENCY_POST3: wrk_fsm_state_next = WRK_FSM_STATE_LATENCY_POST4 ; + WRK_FSM_STATE_LATENCY_POST4: wrk_fsm_state_next = WRK_FSM_STATE_STOP ; + WRK_FSM_STATE_STOP: wrk_fsm_state_next = WRK_FSM_STATE_IDLE ; + default: wrk_fsm_state_next = WRK_FSM_STATE_IDLE ; endcase // end - - + + // - // Ready Logic + // Ready Flag Logic // reg rdy_reg = 1'b1; @@ -1198,321 +1132,167 @@ module modexpng_general_worker // - // UOP_OPCODE_PROPAGATE_CARRIES + // Source to Destination Data Logic // - reg [CARRY_W -1:0] rd_narrow_x_din_x_cry_r; - reg [CARRY_W -1:0] rd_narrow_y_din_x_cry_r; - reg [CARRY_W -1:0] rd_narrow_x_din_y_cry_r; - reg [CARRY_W -1:0] rd_narrow_y_din_y_cry_r; - - wire [WORD_EXT_W -1:0] rd_narrow_x_din_x_w_cry = wrk_rd_narrow_x_din_x + {{WORD_W{1'b0}}, rd_narrow_x_din_x_cry_r}; - wire [WORD_EXT_W -1:0] rd_narrow_y_din_x_w_cry = wrk_rd_narrow_y_din_x + {{WORD_W{1'b0}}, rd_narrow_y_din_x_cry_r}; - wire [WORD_EXT_W -1:0] rd_narrow_x_din_y_w_cry = wrk_rd_narrow_x_din_y + {{WORD_W{1'b0}}, rd_narrow_x_din_y_cry_r}; - wire [WORD_EXT_W -1:0] 
rd_narrow_y_din_y_w_cry = wrk_rd_narrow_y_din_y + {{WORD_W{1'b0}}, rd_narrow_y_din_y_cry_r}; - - wire [CARRY_W -1:0] rd_narrow_x_din_x_w_cry_msb = rd_narrow_x_din_x_w_cry[WORD_EXT_W -1:WORD_W]; - wire [CARRY_W -1:0] rd_narrow_y_din_x_w_cry_msb = rd_narrow_y_din_x_w_cry[WORD_EXT_W -1:WORD_W]; - wire [CARRY_W -1:0] rd_narrow_x_din_y_w_cry_msb = rd_narrow_x_din_y_w_cry[WORD_EXT_W -1:WORD_W]; - wire [CARRY_W -1:0] rd_narrow_y_din_y_w_cry_msb = rd_narrow_y_din_y_w_cry[WORD_EXT_W -1:WORD_W]; - - wire [WORD_EXT_W -1:0] rd_narrow_x_din_x_w_cry_reduced = {{CARRY_W{1'b0}}, rd_narrow_x_din_x_w_cry[WORD_W -1:0]}; - wire [WORD_EXT_W -1:0] rd_narrow_y_din_x_w_cry_reduced = {{CARRY_W{1'b0}}, rd_narrow_y_din_x_w_cry[WORD_W -1:0]}; - wire [WORD_EXT_W -1:0] rd_narrow_x_din_y_w_cry_reduced = {{CARRY_W{1'b0}}, rd_narrow_x_din_y_w_cry[WORD_W -1:0]}; - wire [WORD_EXT_W -1:0] rd_narrow_y_din_y_w_cry_reduced = {{CARRY_W{1'b0}}, rd_narrow_y_din_y_w_cry[WORD_W -1:0]}; - + reg [WORD_EXT_W -1:0] rd_wide_x_din_x_dly2; + reg [WORD_EXT_W -1:0] rd_wide_y_din_x_dly2; + reg [WORD_EXT_W -1:0] rd_wide_x_din_y_dly2; + reg [WORD_EXT_W -1:0] rd_wide_y_din_y_dly2; + reg [WORD_EXT_W -1:0] rd_narrow_x_din_x_dly2; + reg [WORD_EXT_W -1:0] rd_narrow_y_din_x_dly2; + reg [WORD_EXT_W -1:0] rd_narrow_x_din_y_dly2; + reg [WORD_EXT_W -1:0] rd_narrow_y_din_y_dly2; + + always @(posedge clk) begin + {rd_wide_x_din_x_dly2, rd_wide_y_din_x_dly2, rd_wide_x_din_y_dly2, rd_wide_y_din_y_dly2 } <= {rd_wide_x_din_x_dly1, rd_wide_y_din_x_dly1, rd_wide_x_din_y_dly1, rd_wide_y_din_y_dly1 }; + {rd_narrow_x_din_x_dly2, rd_narrow_y_din_x_dly2, rd_narrow_x_din_y_dly2, rd_narrow_y_din_y_dly2} <= {rd_narrow_x_din_x_dly1, rd_narrow_y_din_x_dly1, rd_narrow_x_din_y_dly1, rd_narrow_y_din_y_dly1}; + end + task update_wide_dout; input [WORD_EXT_W-1:0] x_x, y_x, x_y, y_y; {wr_wide_x_dout_x, wr_wide_y_dout_x, wr_wide_x_dout_y, wr_wide_y_dout_y} <= - { x_x, y_x, x_y, y_y }; + { x_x, y_x, x_y, y_y}; endtask task update_narrow_dout; input 
[WORD_EXT_W-1:0] x_x, y_x, x_y, y_y; {wr_narrow_x_dout_x, wr_narrow_y_dout_x, wr_narrow_x_dout_y, wr_narrow_y_dout_y} <= - { x_x, y_x, x_y, y_y }; - endtask - - task update_narrow_carries; - input [CARRY_W-1:0] x_x_cry, y_x_cry, x_y_cry, y_y_cry; - {rd_narrow_x_din_x_cry_r, rd_narrow_y_din_x_cry_r, rd_narrow_x_din_y_cry_r, rd_narrow_y_din_y_cry_r} <= - { x_x_cry, y_x_cry, x_y_cry, y_y_cry }; + { x_x, y_x, x_y, y_y}; endtask - - always @(posedge clk) - // - if (opcode == UOP_OPCODE_PROPAGATE_CARRIES) - // - case (wrk_fsm_state) - // - WRK_FSM_STATE_LATENCY_PRE2: - // - update_narrow_carries(CARRY_ZERO, CARRY_ZERO, CARRY_ZERO, CARRY_ZERO); - // - WRK_FSM_STATE_BUSY, - WRK_FSM_STATE_LATENCY_POST1: - // - update_narrow_carries(rd_narrow_x_din_x_w_cry_msb, - rd_narrow_y_din_x_w_cry_msb, - rd_narrow_x_din_y_w_cry_msb, - rd_narrow_y_din_y_w_cry_msb); - // - endcase - - - // - // UOP_OPCODE_MODULAR_SUBTRACT - // - - reg [WORD_W:0] modsub_x_ab; - reg [WORD_W:0] modsub_y_ab; - reg [WORD_W:0] modsub_x_ab_dly; - reg [WORD_W:0] modsub_y_ab_dly; - - reg [WORD_W:0] modsub_x_abn; - reg [WORD_W:0] modsub_y_abn; - - reg modsub_x_ab_mask_now; - reg modsub_y_ab_mask_now; - - reg modsub_x_abn_mask_now; - reg modsub_y_abn_mask_now; - - reg modsub_x_borrow_r; - reg modsub_y_borrow_r; - - wire modsub_x_ab_masked = modsub_x_ab_mask_now ? 1'b0 : modsub_x_ab[WORD_W]; - wire modsub_y_ab_masked = modsub_y_ab_mask_now ? 1'b0 : modsub_y_ab[WORD_W]; - - wire modsub_x_abn_masked = modsub_x_abn_mask_now ? 1'b0 : modsub_x_abn[WORD_W]; - wire modsub_y_abn_masked = modsub_y_abn_mask_now ? 
1'b0 : modsub_y_abn[WORD_W]; - - wire [WORD_W:0] modsub_x_narrow_x_lsb_pad = {1'b0, wrk_rd_narrow_x_din_x[WORD_W-1:0]}; - wire [WORD_W:0] modsub_y_narrow_x_lsb_pad = {1'b0, wrk_rd_narrow_y_din_x[WORD_W-1:0]}; - wire [WORD_W:0] modsub_x_narrow_y_lsb_pad = {1'b0, wrk_rd_narrow_x_din_y[WORD_W-1:0]}; - wire [WORD_W:0] modsub_y_narrow_y_lsb_pad = {1'b0, wrk_rd_narrow_y_din_y[WORD_W-1:0]}; - - wire [WORD_W:0] modsub_x_wide_x_lsb_pad = {1'b0, wrk_rd_wide_x_din_x_dly1[WORD_W-1:0]}; - wire [WORD_W:0] modsub_x_wide_y_lsb_pad = {1'b0, wrk_rd_wide_x_din_y_dly1[WORD_W-1:0]}; - - wire [WORD_EXT_W -1:0] modsub_x_ab_dly_trunc = {{CARRY_W{1'b0}}, modsub_x_ab_dly[WORD_W-1:0]}; - wire [WORD_EXT_W -1:0] modsub_y_ab_dly_trunc = {{CARRY_W{1'b0}}, modsub_y_ab_dly[WORD_W-1:0]}; - - wire [WORD_EXT_W -1:0] modsub_x_abn_trunc = {{CARRY_W{1'b0}}, modsub_x_abn[WORD_W-1:0]}; - wire [WORD_EXT_W -1:0] modsub_y_abn_trunc = {{CARRY_W{1'b0}}, modsub_y_abn[WORD_W-1:0]}; - - wire [WORD_EXT_W -1:0] modsub_x_mux = !modsub_x_borrow_r ? wrk_rd_narrow_x_din_x_dly2 : wrk_rd_narrow_y_din_x_dly2; - wire [WORD_EXT_W -1:0] modsub_y_mux = !modsub_y_borrow_r ? 
wrk_rd_narrow_x_din_y_dly2 : wrk_rd_narrow_y_din_y_dly2; - - wire [WORD_W:0] modsub_x_ab_lsb_pad = {1'b0, modsub_x_ab[WORD_W-1:0]}; - wire [WORD_W:0] modsub_y_ab_lsb_pad = {1'b0, modsub_y_ab[WORD_W-1:0]}; - - task update_modsub_ab; - begin - modsub_x_ab <= modsub_x_narrow_x_lsb_pad - modsub_y_narrow_x_lsb_pad - modsub_x_ab_masked; - modsub_y_ab <= modsub_x_narrow_y_lsb_pad - modsub_y_narrow_y_lsb_pad - modsub_y_ab_masked; - end - endtask - - task update_modsub_abn; - begin - modsub_x_abn <= modsub_x_ab_lsb_pad + modsub_x_wide_x_lsb_pad + modsub_x_abn_masked; - modsub_y_abn <= modsub_y_ab_lsb_pad + modsub_x_wide_y_lsb_pad + modsub_y_abn_masked; - end - endtask - - always @(posedge clk) - // - if (opcode == UOP_OPCODE_MODULAR_SUBTRACT) - // - case (wrk_fsm_state) - WRK_FSM_STATE_LATENCY_POST4_TP: - if (!wrk_fsm_two_pass_pass) - {modsub_x_borrow_r, modsub_y_borrow_r} <= {modsub_x_ab_dly[WORD_W], modsub_y_ab_dly[WORD_W]}; - endcase - - always @(posedge clk) begin - modsub_x_ab_dly <= modsub_x_ab; - modsub_y_ab_dly <= modsub_y_ab; - end - always @(posedge clk) begin // - modsub_x_ab <= {1'bX, WORD_DNC}; - modsub_y_ab <= {1'bX, WORD_DNC}; - // - modsub_x_abn <= {1'bX, WORD_DNC}; - modsub_y_abn <= {1'bX, WORD_DNC}; + update_wide_dout (WORD_EXT_DNC, WORD_EXT_DNC, WORD_EXT_DNC, WORD_EXT_DNC); + update_narrow_dout(WORD_EXT_DNC, WORD_EXT_DNC, WORD_EXT_DNC, WORD_EXT_DNC); // - if (opcode == UOP_OPCODE_MODULAR_SUBTRACT) + case (opcode) // - case (wrk_fsm_state) - // - WRK_FSM_STATE_LATENCY_PRE3_TP: - update_modsub_ab; - - WRK_FSM_STATE_LATENCY_PRE4_TP, - WRK_FSM_STATE_BUSY_TP, - WRK_FSM_STATE_LATENCY_POST1_TP, - WRK_FSM_STATE_LATENCY_POST2_TP: begin - update_modsub_ab; - update_modsub_abn; - end + UOP_OPCODE_PROPAGATE_CARRIES: // - WRK_FSM_STATE_LATENCY_POST3_TP: + case (wrk_fsm_state) // - update_modsub_abn; - // - endcase - // - end - - always @(posedge clk) begin - // - modsub_x_ab_mask_now <= 1'b0; - modsub_y_ab_mask_now <= 1'b0; - // - modsub_x_abn_mask_now <= 1'b0; - 
modsub_y_abn_mask_now <= 1'b0; - // - if (opcode == UOP_OPCODE_MODULAR_SUBTRACT) + WRK_FSM_STATE_BUSY1, + WRK_FSM_STATE_LATENCY_POST1, + WRK_FSM_STATE_LATENCY_POST3: + // + update_narrow_dout(propagate_carries_x_x_w_cry_reduced, propagate_carries_y_x_w_cry_reduced, propagate_carries_x_y_w_cry_reduced, propagate_carries_y_y_w_cry_reduced); + // + endcase // - case (wrk_fsm_state) - // - WRK_FSM_STATE_LATENCY_PRE2_TP: begin - modsub_x_ab_mask_now <= 1'b1; - modsub_y_ab_mask_now <= 1'b1; - end + UOP_OPCODE_COPY_CRT_Y2X: // - WRK_FSM_STATE_LATENCY_PRE3_TP: begin - modsub_x_abn_mask_now <= 1'b1; - modsub_y_abn_mask_now <= 1'b1; - end + case (wrk_fsm_state) + // + WRK_FSM_STATE_BUSY1, + WRK_FSM_STATE_LATENCY_POST1, + WRK_FSM_STATE_LATENCY_POST3: + // + begin update_narrow_dout(rd_narrow_x_din_y_dly2, rd_narrow_y_din_y_dly2, rd_narrow_x_din_y_dly2, rd_narrow_y_din_y_dly2); + update_wide_dout (rd_wide_x_din_y_dly2, rd_wide_y_din_y_dly2, rd_wide_x_din_y_dly2, rd_wide_y_din_y_dly2); end + // + endcase + // + UOP_OPCODE_MODULAR_REDUCE_INIT: // - endcase - // - end - - - // - // UOP_OPCODE_ADD_UNEVEN - // - reg [WORD_W:0] regadd_x_x; - reg [WORD_W:0] regadd_y_x; - reg [WORD_W:0] regadd_x_y; - reg [WORD_W:0] regadd_y_y; - - reg regadd_x_x_cry; - reg regadd_y_x_cry; - reg regadd_x_y_cry; - reg regadd_y_y_cry; - - wire [WORD_EXT_W-1:0] regadd_x_x_trunc = {{CARRY_W{1'b0}}, regadd_x_x[WORD_W-1:0]}; - wire [WORD_EXT_W-1:0] regadd_y_x_trunc = {{CARRY_W{1'b0}}, regadd_y_x[WORD_W-1:0]}; - wire [WORD_EXT_W-1:0] regadd_x_y_trunc = {{CARRY_W{1'b0}}, regadd_x_y[WORD_W-1:0]}; - wire [WORD_EXT_W-1:0] regadd_y_y_trunc = {{CARRY_W{1'b0}}, regadd_y_y[WORD_W-1:0]}; - - //wire regadd_x_x_masked = regadd_xy_ab_x_mask_now ? 1'b0 : regadd_x_x[WORD_W]; - //wire regadd_y_x_masked = regadd_xy_ab_x_mask_now ? 1'b0 : regadd_y_x[WORD_W]; - //wire regadd_x_y_masked = regadd_xy_ab_y_mask_now ? 1'b0 : regadd_x_y[WORD_W]; - //wire regadd_y_y_masked = regadd_xy_ab_y_mask_now ? 
1'b0 : regadd_y_y[WORD_W]; - /**/ - reg [WORD_W:0] regadd_x_x_a_lsb_pad; //= {1'b0, wrk_rd_narrow_x_din_x_dly2[WORD_W-1:0]}; - reg [WORD_W:0] regadd_x_x_b_lsb_pad; //= {1'b0, wrk_rd_narrow_x_din_x_dly1[WORD_W-1:0]}; - reg [WORD_W:0] regadd_y_x_a_lsb_pad; //= {1'b0, wrk_rd_narrow_y_din_x_dly2[WORD_W-1:0]}; - reg [WORD_W:0] regadd_y_x_b_lsb_pad; //= {1'b0, wrk_rd_narrow_y_din_x_dly1[WORD_W-1:0]}; - reg [WORD_W:0] regadd_x_y_a_lsb_pad; //= {1'b0, wrk_rd_narrow_x_din_y_dly2[WORD_W-1:0]}; - reg [WORD_W:0] regadd_x_y_b_lsb_pad; //= {1'b0, wrk_rd_narrow_x_din_y_dly1[WORD_W-1:0]}; - reg [WORD_W:0] regadd_y_y_a_lsb_pad; //= {1'b0, wrk_rd_narrow_y_din_y_dly2[WORD_W-1:0]}; - reg [WORD_W:0] regadd_y_y_b_lsb_pad; //= {1'b0, wrk_rd_narrow_y_din_y_dly1[WORD_W-1:0]}; - /**/ - //WRK_FSM_STATE_BUSY_M1, - //WRK_FSM_STATE_LATENCY_POST1_M1, - //WRK_FSM_STATE_LATENCY_POST2_M1: - - always @(posedge clk) begin - // - regadd_x_x_a_lsb_pad <= {1'bX, WORD_DNC}; - regadd_x_x_b_lsb_pad <= {1'bX, WORD_DNC}; - regadd_y_x_a_lsb_pad <= {1'bX, WORD_DNC}; - regadd_y_x_b_lsb_pad <= {1'bX, WORD_DNC}; - regadd_x_y_a_lsb_pad <= {1'bX, WORD_DNC}; - regadd_x_y_b_lsb_pad <= {1'bX, WORD_DNC}; - regadd_y_y_a_lsb_pad <= {1'bX, WORD_DNC}; - regadd_y_y_b_lsb_pad <= {1'bX, WORD_DNC}; - // - if (opcode == UOP_OPCODE_REGULAR_ADD_UNEVEN) + case (wrk_fsm_state) + // + WRK_FSM_STATE_BUSY1, + WRK_FSM_STATE_LATENCY_POST1, + WRK_FSM_STATE_LATENCY_POST3: + // + update_wide_dout(rd_narrow_x_din_x_dly2, rd_narrow_y_din_x_dly2, rd_narrow_x_din_y_dly2, rd_narrow_y_din_y_dly2); + // + endcase // - case (wrk_fsm_state) + UOP_OPCODE_COPY_LADDERS_X2Y: // - WRK_FSM_STATE_LATENCY_PRE2_M2, - WRK_FSM_STATE_BUSY_M2, - WRK_FSM_STATE_LATENCY_POST1_M2: begin - regadd_x_x_a_lsb_pad <= {1'b0, !rd_wide_xy_addr_xy_next_last_seen_dly2 ? wrk_rd_narrow_x_din_x_dly1[WORD_W-1:0] : WORD_ZERO}; - regadd_x_x_b_lsb_pad <= {1'b0, wrk_rd_narrow_x_din_x [WORD_W-1:0] }; - regadd_y_x_a_lsb_pad <= {1'b0, !rd_wide_xy_addr_xy_next_last_seen_dly2 ? 
wrk_rd_narrow_y_din_x_dly1[WORD_W-1:0] : WORD_ZERO}; - regadd_y_x_b_lsb_pad <= {1'b0, wrk_rd_narrow_y_din_x [WORD_W-1:0] }; - regadd_x_y_a_lsb_pad <= {1'b0, !rd_wide_xy_addr_xy_next_last_seen_dly2 ? wrk_rd_narrow_x_din_y_dly1[WORD_W-1:0] : WORD_ZERO}; - regadd_x_y_b_lsb_pad <= {1'b0, wrk_rd_narrow_x_din_y [WORD_W-1:0] }; - regadd_y_y_a_lsb_pad <= {1'b0, !rd_wide_xy_addr_xy_next_last_seen_dly2 ? wrk_rd_narrow_y_din_y_dly1[WORD_W-1:0] : WORD_ZERO}; - regadd_y_y_b_lsb_pad <= {1'b0, wrk_rd_narrow_y_din_y [WORD_W-1:0] }; - end + case (wrk_fsm_state) + // + WRK_FSM_STATE_BUSY1, + WRK_FSM_STATE_LATENCY_POST1, + WRK_FSM_STATE_LATENCY_POST3: + // + begin update_wide_dout (rd_wide_x_din_x_dly1, rd_wide_x_din_x_dly2, rd_wide_x_din_y_dly1, rd_wide_x_din_y_dly2); + update_narrow_dout(rd_narrow_x_din_x_dly1, rd_narrow_x_din_x_dly2, rd_narrow_x_din_y_dly1, rd_narrow_x_din_y_dly2); end + // + endcase + // + UOP_OPCODE_CROSS_LADDERS_X2Y: // - endcase - end - - always @(posedge clk) begin - // - regadd_x_x <= {1'bX, WORD_DNC}; - regadd_y_x <= {1'bX, WORD_DNC}; - regadd_x_y <= {1'bX, WORD_DNC}; - regadd_y_y <= {1'bX, WORD_DNC}; - // - if (opcode == UOP_OPCODE_REGULAR_ADD_UNEVEN) + case (wrk_fsm_state) + // + WRK_FSM_STATE_BUSY1, + WRK_FSM_STATE_LATENCY_POST1, + WRK_FSM_STATE_LATENCY_POST3: + // + begin update_wide_dout (rd_wide_x_din_x_dly1, rd_wide_x_din_y_dly2, rd_wide_x_din_y_dly1, rd_wide_x_din_x_dly2); + update_narrow_dout(rd_narrow_x_din_x_dly1, rd_narrow_x_din_y_dly2, rd_narrow_x_din_y_dly1, rd_narrow_x_din_x_dly2); end + // + endcase // - case (wrk_fsm_state) + UOP_OPCODE_MODULAR_SUBTRACT_X: // - WRK_FSM_STATE_BUSY_M1, - WRK_FSM_STATE_LATENCY_POST1_M1, - WRK_FSM_STATE_LATENCY_POST2_M1: begin - regadd_x_x <= regadd_x_x_a_lsb_pad + regadd_x_x_b_lsb_pad + regadd_x_x_cry; - regadd_y_x <= regadd_y_x_a_lsb_pad + regadd_y_x_b_lsb_pad + regadd_y_x_cry; - regadd_x_y <= regadd_x_y_a_lsb_pad + regadd_x_y_b_lsb_pad + regadd_x_y_cry; - regadd_y_y <= regadd_y_y_a_lsb_pad + 
regadd_y_y_b_lsb_pad + regadd_y_y_cry; - end + case (wrk_fsm_state) + // + WRK_FSM_STATE_BUSY1, + WRK_FSM_STATE_LATENCY_POST1, + WRK_FSM_STATE_LATENCY_POST3: + // + update_narrow_dout(modular_subtract_x_w_brw_reduced, modular_subtract_x_w_brw_reduced, modular_subtract_y_w_brw_reduced, modular_subtract_y_w_brw_reduced); + // + endcase + // + UOP_OPCODE_MODULAR_SUBTRACT_Y: // - endcase - // - end - - always @(posedge clk) begin - // - regadd_x_x_cry <= 1'bX; - regadd_y_x_cry <= 1'bX; - regadd_x_y_cry <= 1'bX; - regadd_y_y_cry <= 1'bX; - // - if (opcode == UOP_OPCODE_REGULAR_ADD_UNEVEN) + case (wrk_fsm_state) + // + WRK_FSM_STATE_BUSY1, + WRK_FSM_STATE_LATENCY_POST1, + WRK_FSM_STATE_LATENCY_POST3: + // + update_wide_dout(modular_subtract_x_w_cry_reduced, modular_subtract_x_w_cry_reduced, modular_subtract_y_w_cry_reduced, modular_subtract_y_w_cry_reduced); + // + endcase // - case (wrk_fsm_state) + UOP_OPCODE_MODULAR_SUBTRACT_Z: // - WRK_FSM_STATE_LATENCY_PRE2_M2: begin - regadd_x_x_cry <= 1'b0; - regadd_y_x_cry <= 1'b0; - regadd_x_y_cry <= 1'b0; - regadd_y_y_cry <= 1'b0; - end + case (wrk_fsm_state) + // + WRK_FSM_STATE_BUSY1, + WRK_FSM_STATE_LATENCY_POST1, + WRK_FSM_STATE_LATENCY_POST3: + // + begin update_wide_dout (modular_subtract_x_mux_reduced, modular_subtract_x_mux_reduced, modular_subtract_y_mux_reduced, modular_subtract_y_mux_reduced); + update_narrow_dout(modular_subtract_x_mux_reduced, modular_subtract_x_mux_reduced, modular_subtract_y_mux_reduced, modular_subtract_y_mux_reduced); end + // + endcase + // + UOP_OPCODE_MERGE_LH: // - WRK_FSM_STATE_BUSY_M2, - WRK_FSM_STATE_LATENCY_POST1_M2: begin - regadd_x_x_cry <= regadd_x_x[WORD_W]; - regadd_y_x_cry <= regadd_y_x[WORD_W]; - regadd_x_y_cry <= regadd_x_y[WORD_W]; - regadd_y_y_cry <= regadd_y_y[WORD_W]; - end + case (wrk_fsm_state) + // + WRK_FSM_STATE_BUSY1, + WRK_FSM_STATE_LATENCY_POST1, + WRK_FSM_STATE_LATENCY_POST3: + // + update_narrow_dout(rd_wide_x_din_x_dly2, rd_wide_y_din_x_dly2, rd_wide_x_din_y_dly2, 
rd_wide_y_din_y_dly2); + // + endcase + // + UOP_OPCODE_REGULAR_ADD_UNEVEN: // - endcase - // + case (wrk_fsm_state) + // + WRK_FSM_STATE_BUSY1, + WRK_FSM_STATE_LATENCY_POST1, + WRK_FSM_STATE_LATENCY_POST3: + // + update_narrow_dout(regular_add_uneven_x_x_w_cry_reduced, regular_add_uneven_y_x_w_cry_reduced, regular_add_uneven_x_y_w_cry_reduced, regular_add_uneven_y_y_w_cry_reduced); + // + endcase + endcase + // end + endmodule From git at cryptech.is Mon Jan 20 21:18:03 2020 From: git at cryptech.is (git at cryptech.is) Date: Mon, 20 Jan 2020 21:18:03 +0000 Subject: [Cryptech-Commits] [user/shatov/modexpng] 02/21: Turns out, fabric addition and subtraction in the general worker module are actually in the critical paths of the ModExpNG core and are plaguing the place and route tools. I was barely able to achieve timing closure at 180 MHz even with the highest Map and PaR effort levels. This means that any further clock frequency increase is effectively impossible, moreover any small change in the design may prevent it from meeting timing constants. The obvious solution is to use DSP slices not only f [...] In-Reply-To: <157955508198.72689.15683234965619703441@bikeshed.cryptech.is> References: <157955508198.72689.15683234965619703441@bikeshed.cryptech.is> Message-ID: <20200120211803.2E2AE8BAF10@bikeshed.cryptech.is> This is an automated email from the git hooks/post-receive script. meisterpaul1 at yandex.ru pushed a commit to branch master in repository user/shatov/modexpng. commit 6a0438e33fa300822216c259668180f177ac0343 Author: Pavel V. Shatov (Meister) AuthorDate: Thu Jan 16 15:09:59 2020 +0300 Turns out, fabric addition and subtraction in the general worker module are actually in the critical paths of the ModExpNG core and are plaguing the place and route tools. I was barely able to achieve timing closure at 180 MHz even with the highest Map and PaR effort levels. 
This means that any further clock frequency increase is effectively impossible; moreover, any small change in the design may prevent it from meeting timing constraints. The obvious solution is to use DSP slices not only for modular multiplication, but also for supporting math operations. When fully pipelined, they can be clocked even faster than the block memory, so there definitely should not be any timing problems with them. The general worker module does three things that currently involve fabric-based math operations: * carry propagation (conversion to non-redundant representation) * modular subtraction * regular addition This commit adds four DSP slice instances and makes the carry propagation opcode use DSP slice products instead of fabric logic. --- rtl/modexpng_general_worker.v | 172 +++++++++++++++++++++++++++++++++++++++--- 1 file changed, 161 insertions(+), 11 deletions(-) diff --git a/rtl/modexpng_general_worker.v b/rtl/modexpng_general_worker.v index 0620bd6..684af5a 100644 --- a/rtl/modexpng_general_worker.v +++ b/rtl/modexpng_general_worker.v @@ -54,7 +54,9 @@ module modexpng_general_worker // `include "modexpng_parameters.vh" `include "modexpng_microcode.vh" - + `include "modexpng_dsp48e1.vh" + `include "modexpng_dsp_slice_primitives.vh" + // // Ports @@ -240,8 +242,8 @@ module modexpng_general_worker // // Delays // - reg [OP_ADDR_W -1:0] rd_narrow_addr_x_dly[0:3]; - reg [OP_ADDR_W -1:0] rd_narrow_addr_y_dly[0:3]; + reg [OP_ADDR_W -1:0] rd_narrow_addr_x_dly[0:4]; + reg [OP_ADDR_W -1:0] rd_narrow_addr_y_dly[0:4]; reg [OP_ADDR_W -1:0] rd_wide_addr_x_dly[0:3]; reg [OP_ADDR_W -1:0] rd_wide_addr_y_dly[0:3]; @@ -255,6 +257,11 @@ module modexpng_general_worker reg [WORD_EXT_W -1:0] rd_narrow_x_din_y_dly1; reg [WORD_EXT_W -1:0] rd_narrow_y_din_y_dly1; + reg rd_narrow_ena_x_dly1 = 1'b0; + reg rd_narrow_ena_y_dly1 = 1'b0; + reg rd_narrow_ena_x_dly2 = 1'b0; + reg rd_narrow_ena_y_dly2 = 1'b0; + always @(posedge clk) begin // {rd_wide_x_din_x_dly1} <=
{wrk_rd_wide_x_din_x}; @@ -267,12 +274,15 @@ module modexpng_general_worker {rd_narrow_x_din_y_dly1} <= {wrk_rd_narrow_x_din_y}; {rd_narrow_y_din_y_dly1} <= {wrk_rd_narrow_y_din_y}; // - {rd_narrow_addr_x_dly[3], rd_narrow_addr_x_dly[2], rd_narrow_addr_x_dly[1], rd_narrow_addr_x_dly[0]} <= {rd_narrow_addr_x_dly[2], rd_narrow_addr_x_dly[1], rd_narrow_addr_x_dly[0], rd_narrow_addr_x}; - {rd_narrow_addr_y_dly[3], rd_narrow_addr_y_dly[2], rd_narrow_addr_y_dly[1], rd_narrow_addr_y_dly[0]} <= {rd_narrow_addr_y_dly[2], rd_narrow_addr_y_dly[1], rd_narrow_addr_y_dly[0], rd_narrow_addr_y}; + {rd_narrow_addr_x_dly[4], rd_narrow_addr_x_dly[3], rd_narrow_addr_x_dly[2], rd_narrow_addr_x_dly[1], rd_narrow_addr_x_dly[0]} <= {rd_narrow_addr_x_dly[3], rd_narrow_addr_x_dly[2], rd_narrow_addr_x_dly[1], rd_narrow_addr_x_dly[0], rd_narrow_addr_x}; + {rd_narrow_addr_y_dly[4], rd_narrow_addr_y_dly[3], rd_narrow_addr_y_dly[2], rd_narrow_addr_y_dly[1], rd_narrow_addr_y_dly[0]} <= {rd_narrow_addr_y_dly[3], rd_narrow_addr_y_dly[2], rd_narrow_addr_y_dly[1], rd_narrow_addr_y_dly[0], rd_narrow_addr_y}; // {rd_wide_addr_x_dly[3], rd_wide_addr_x_dly[2], rd_wide_addr_x_dly[1], rd_wide_addr_x_dly[0]} <= {rd_wide_addr_x_dly[2], rd_wide_addr_x_dly[1], rd_wide_addr_x_dly[0], rd_wide_addr_x}; {rd_wide_addr_y_dly[3], rd_wide_addr_y_dly[2], rd_wide_addr_y_dly[1], rd_wide_addr_y_dly[0]} <= {rd_wide_addr_y_dly[2], rd_wide_addr_y_dly[1], rd_wide_addr_y_dly[0], rd_wide_addr_y}; // + {rd_narrow_ena_x_dly2, rd_narrow_ena_x_dly1} <= {rd_narrow_ena_x_dly1, rd_narrow_ena_x}; + {rd_narrow_ena_y_dly2, rd_narrow_ena_y_dly1} <= {rd_narrow_ena_y_dly1, rd_narrow_ena_y}; + // end @@ -376,7 +386,14 @@ module modexpng_general_worker // case (opcode) // - UOP_OPCODE_PROPAGATE_CARRIES, + UOP_OPCODE_PROPAGATE_CARRIES: + // + case (wrk_fsm_state) + WRK_FSM_STATE_BUSY2, + WRK_FSM_STATE_LATENCY_POST2, + WRK_FSM_STATE_LATENCY_POST4: enable_narrow_wr_en; + endcase + // UOP_OPCODE_MODULAR_SUBTRACT_X, UOP_OPCODE_MERGE_LH, 
UOP_OPCODE_REGULAR_ADD_UNEVEN: @@ -729,7 +746,14 @@ module modexpng_general_worker // case (opcode) // - UOP_OPCODE_PROPAGATE_CARRIES, + UOP_OPCODE_PROPAGATE_CARRIES: + // + case (wrk_fsm_state) + WRK_FSM_STATE_BUSY2, + WRK_FSM_STATE_LATENCY_POST2, + WRK_FSM_STATE_LATENCY_POST4: update_wr_narrow_bank_addr(sel_narrow_out, sel_narrow_out, rd_narrow_addr_x_dly[4], rd_narrow_addr_y_dly[4]); + endcase + // UOP_OPCODE_MODULAR_SUBTRACT_X, UOP_OPCODE_MERGE_LH, UOP_OPCODE_REGULAR_ADD_UNEVEN: @@ -773,6 +797,131 @@ module modexpng_general_worker end + + // + // DSP Slice Array + // + wire [DSP48E1_C_W-1:0] dsp_x_x_x = 'bX;//{{(DSP48E1_C_W-WORD_EXT_W){1'b0}}, rd_narrow_x_din_x_dly1}; + wire [DSP48E1_C_W-1:0] dsp_y_x_x = 'bX;//{{(DSP48E1_C_W-WORD_EXT_W){1'b0}}, rd_narrow_y_din_x_dly1}; + wire [DSP48E1_C_W-1:0] dsp_x_y_x = 'bX;//{{(DSP48E1_C_W-WORD_EXT_W){1'b0}}, rd_narrow_x_din_y_dly1}; + wire [DSP48E1_C_W-1:0] dsp_y_y_x = 'bX;//{{(DSP48E1_C_W-WORD_EXT_W){1'b0}}, rd_narrow_y_din_y_dly1}; + + wire [DSP48E1_C_W-1:0] dsp_x_x_y = {{(DSP48E1_C_W-(WORD_EXT_W+1)){1'b0}}, rd_narrow_x_din_x_dly1[WORD_EXT_W-1:WORD_W], 1'b1, rd_narrow_x_din_x_dly1[WORD_W-1:0]}; + wire [DSP48E1_C_W-1:0] dsp_y_x_y = {{(DSP48E1_C_W-(WORD_EXT_W+1)){1'b0}}, rd_narrow_y_din_x_dly1[WORD_EXT_W-1:WORD_W], 1'b1, rd_narrow_y_din_x_dly1[WORD_W-1:0]}; + wire [DSP48E1_C_W-1:0] dsp_x_y_y = {{(DSP48E1_C_W-(WORD_EXT_W+1)){1'b0}}, rd_narrow_x_din_y_dly1[WORD_EXT_W-1:WORD_W], 1'b1, rd_narrow_x_din_y_dly1[WORD_W-1:0]}; + wire [DSP48E1_C_W-1:0] dsp_y_y_y = {{(DSP48E1_C_W-(WORD_EXT_W+1)){1'b0}}, rd_narrow_y_din_y_dly1[WORD_EXT_W-1:WORD_W], 1'b1, rd_narrow_y_din_y_dly1[WORD_W-1:0]}; + + wire [DSP48E1_P_W-1:0] dsp_x_x_p; + wire [DSP48E1_P_W-1:0] dsp_y_x_p; + wire [DSP48E1_P_W-1:0] dsp_x_y_p; + wire [DSP48E1_P_W-1:0] dsp_y_y_p; + + wire [WORD_EXT_W-1:0] dsp_x_x_p_reduced = {CARRY_ZERO, dsp_x_x_p[WORD_W-1:0]}; + wire [WORD_EXT_W-1:0] dsp_y_x_p_reduced = {CARRY_ZERO, dsp_y_x_p[WORD_W-1:0]}; + wire [WORD_EXT_W-1:0] dsp_x_y_p_reduced 
= {CARRY_ZERO, dsp_x_y_p[WORD_W-1:0]}; + wire [WORD_EXT_W-1:0] dsp_y_y_p_reduced = {CARRY_ZERO, dsp_y_y_p[WORD_W-1:0]}; + + reg dsp_ce_x = 1'b0; + reg dsp_ce_y = 1'b0; + reg dsp_ce_x_dly = 1'b0; + reg dsp_ce_y_dly = 1'b0; + reg [DSP48E1_OPMODE_W-1:0] dsp_opmode_x; + reg [DSP48E1_OPMODE_W-1:0] dsp_opmode_y; + + always @(posedge clk or negedge rst_n) + // + if (!rst_n) {dsp_ce_x, dsp_ce_y} <= {1'b0, 1'b0}; + else case (opcode) + // + UOP_OPCODE_PROPAGATE_CARRIES: {dsp_ce_x, dsp_ce_y} <= {rd_narrow_ena_x_dly2, rd_narrow_ena_y_dly2}; + default: {dsp_ce_x, dsp_ce_y} <= {1'b0, 1'b0}; + // + endcase + + always @(posedge clk) begin + // + dsp_opmode_x <= {DSP48E1_OPMODE_W{1'bX}}; + dsp_opmode_y <= {DSP48E1_OPMODE_W{1'bX}}; + // + if (rd_narrow_ena_x_dly2) + // + case (opcode) + // + UOP_OPCODE_PROPAGATE_CARRIES: if (rd_narrow_addr_x_dly[1] == OP_ADDR_ZERO) dsp_opmode_x <= DSP48E1_OPMODE_Z0_YC_X0; + else dsp_opmode_x <= DSP48E1_OPMODE_ZP17_YC_X0; + // + endcase + // + if (rd_narrow_ena_y_dly2) + // + case (opcode) + // + UOP_OPCODE_PROPAGATE_CARRIES: if (rd_narrow_addr_y_dly[1] == OP_ADDR_ZERO) dsp_opmode_y <= DSP48E1_OPMODE_Z0_YC_X0; + else dsp_opmode_y <= DSP48E1_OPMODE_ZP17_YC_X0; + // + endcase + // + end + + always @(posedge clk) {dsp_ce_x_dly, dsp_ce_y_dly} <= {dsp_ce_x, dsp_ce_y}; + + `MODEXPNG_DSP_SLICE_ADDSUB dst_inst_x_x + ( + .clk (clk), + .ce_abc (dsp_ce_x), + .ce_p (dsp_ce_x_dly), + .ce_opmode (dsp_ce_x), + .x (dsp_x_x_x), + .y (dsp_x_x_y), + .p (dsp_x_x_p), + .opmode (dsp_opmode_x), + .casc_p_in (), + .casc_p_out () + ); + + `MODEXPNG_DSP_SLICE_ADDSUB dst_inst_y_x + ( + .clk (clk), + .ce_abc (dsp_ce_x), + .ce_p (dsp_ce_x_dly), + .ce_opmode (dsp_ce_x), + .x (dsp_y_x_x), + .y (dsp_y_x_y), + .p (dsp_y_x_p), + .opmode (dsp_opmode_x), + .casc_p_in (), + .casc_p_out () + ); + + `MODEXPNG_DSP_SLICE_ADDSUB dst_inst_x_y + ( + .clk (clk), + .ce_abc (dsp_ce_y), + .ce_p (dsp_ce_y_dly), + .ce_opmode (dsp_ce_y), + .x (dsp_x_y_x), + .y (dsp_x_y_y), + .p (dsp_x_y_p), + 
.opmode (dsp_opmode_y), + .casc_p_in (), + .casc_p_out () + ); + + `MODEXPNG_DSP_SLICE_ADDSUB dst_inst_y_y + ( + .clk (clk), + .ce_abc (dsp_ce_y), + .ce_p (dsp_ce_y_dly), + .ce_opmode (dsp_ce_y), + .x (dsp_y_y_x), + .y (dsp_y_y_y), + .p (dsp_y_y_p), + .opmode (dsp_opmode_y), + .casc_p_in (), + .casc_p_out () + ); + + // // UOP_OPCODE_PROPAGATE_CARRIES // @@ -1171,11 +1320,12 @@ module modexpng_general_worker // case (wrk_fsm_state) // - WRK_FSM_STATE_BUSY1, - WRK_FSM_STATE_LATENCY_POST1, - WRK_FSM_STATE_LATENCY_POST3: + WRK_FSM_STATE_BUSY2, + WRK_FSM_STATE_LATENCY_POST2, + WRK_FSM_STATE_LATENCY_POST4: // - update_narrow_dout(propagate_carries_x_x_w_cry_reduced, propagate_carries_y_x_w_cry_reduced, propagate_carries_x_y_w_cry_reduced, propagate_carries_y_y_w_cry_reduced); + //update_narrow_dout(propagate_carries_x_x_w_cry_reduced, propagate_carries_y_x_w_cry_reduced, propagate_carries_x_y_w_cry_reduced, propagate_carries_y_y_w_cry_reduced); + update_narrow_dout(dsp_x_x_p_reduced, dsp_y_x_p_reduced, dsp_x_y_p_reduced, dsp_y_y_p_reduced); // endcase // From git at cryptech.is Mon Jan 20 21:18:04 2020 From: git at cryptech.is (git at cryptech.is) Date: Mon, 20 Jan 2020 21:18:04 +0000 Subject: [Cryptech-Commits] [user/shatov/modexpng] 03/21: Reworked modular subtraction micro-operation. Previously it used "two-pass" bank address space sweep, during the first pass (a-b) and (a-b+n) were computed, during the second pass either the former or the latter quantity was written to the output bank (depending on the very last borrow flag value). This is no longer possible, since the FSM now only generates one "interleaved" address space sweep. The solution is to split one complex modular subtraction operation into simpler sub-operations [...] 
In-Reply-To: <157955508198.72689.15683234965619703441@bikeshed.cryptech.is> References: <157955508198.72689.15683234965619703441@bikeshed.cryptech.is> Message-ID: <20200120211803.BA4758BAF11@bikeshed.cryptech.is> This is an automated email from the git hooks/post-receive script. meisterpaul1 at yandex.ru pushed a commit to branch master in repository user/shatov/modexpng. commit e5f4454e3ac52fa761f301e7d11ad144cd23d590 Author: Pavel V. Shatov (Meister) AuthorDate: Thu Jan 16 21:38:04 2020 +0300 Reworked modular subtraction micro-operation. Previously it used a "two-pass" bank address space sweep: during the first pass (a-b) and (a-b+n) were computed; during the second pass either the former or the latter quantity was written to the output bank (depending on the very last borrow flag value). This is no longer possible, since the FSM now only generates one "interleaved" address space sweep. The solution is to split one complex modular subtraction operation into simpler sub-operations. Currently modular subtraction is achieved by running a sequence of three micro-operations: * MODULAR_SUBTRACT_X computes (a-b) and latches the final borrow flag * MODULAR_SUBTRACT_Y computes (a-b+n) * MODULAR_SUBTRACT_Z writes either (a-b) or (a-b+n) into the output bank depending on the latched value of the borrow flag Unfortunately we can't compute both (a-b) and (a-b+n) during one address space sweep, since a fully pipelined adder/subtractor DSP slice has a 2-cycle latency.
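The three-step sequence described above can be sketched as a small behavioral model. This is only an illustration in Python, not the RTL from this commit: the 16-bit word width, the little-endian word lists and all function names (subtract_x, subtract_y, subtract_z, modular_subtract) are assumptions made for the example, and the model ignores pipelining, bank addressing and the DSP slice control signals entirely. It assumes 0 <= a, b < n.

```python
# Behavioral sketch of the three modular-subtraction micro-operations.
# WORD_W and every name below are hypothetical, chosen for illustration only.
WORD_W = 16
MASK = (1 << WORD_W) - 1


def to_words(value, count):
    """Split an integer into `count` little-endian words of WORD_W bits."""
    return [(value >> (WORD_W * i)) & MASK for i in range(count)]


def from_words(words):
    """Reassemble little-endian words into an integer."""
    return sum(w << (WORD_W * i) for i, w in enumerate(words))


def subtract_x(a, b):
    """Step X: one sweep computing (a-b) word by word; returns the
    difference words and the final borrow flag to be latched."""
    diff, borrow = [], 0
    for aw, bw in zip(a, b):
        t = aw - bw - borrow
        diff.append(t & MASK)
        borrow = 1 if t < 0 else 0
    return diff, borrow


def subtract_y(diff, n):
    """Step Y: one sweep computing (a-b+n) word by word; the carry out of
    the most significant word is discarded."""
    total, carry = [], 0
    for dw, nw in zip(diff, n):
        t = dw + nw + carry
        total.append(t & MASK)
        carry = t >> WORD_W
    return total


def subtract_z(diff, total, borrow):
    """Step Z: pick (a-b+n) if step X borrowed, otherwise (a-b)."""
    return total if borrow else diff


def modular_subtract(a, b, n, count):
    """Run the X, Y, Z sequence on integers with 0 <= a, b < n."""
    aw, bw, nw = to_words(a, count), to_words(b, count), to_words(n, count)
    diff, borrow = subtract_x(aw, bw)
    total = subtract_y(diff, nw)
    return from_words(subtract_z(diff, total, borrow))
```

For example, modular_subtract(5, 7, 0xFFFFFFFB, 2) takes the borrow path and yields n-2, while modular_subtract(7, 5, 0xFFFFFFFB, 2) yields 2 directly. Keeping step Y as a separate sweep over the stored (a-b) words mirrors the point made in the commit message that both quantities cannot be produced in a single sweep.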
--- rtl/modexpng_general_worker.v | 569 ++++++++++++++++++++++++------------------ 1 file changed, 332 insertions(+), 237 deletions(-) diff --git a/rtl/modexpng_general_worker.v b/rtl/modexpng_general_worker.v index 684af5a..6652f14 100644 --- a/rtl/modexpng_general_worker.v +++ b/rtl/modexpng_general_worker.v @@ -245,8 +245,8 @@ module modexpng_general_worker reg [OP_ADDR_W -1:0] rd_narrow_addr_x_dly[0:4]; reg [OP_ADDR_W -1:0] rd_narrow_addr_y_dly[0:4]; - reg [OP_ADDR_W -1:0] rd_wide_addr_x_dly[0:3]; - reg [OP_ADDR_W -1:0] rd_wide_addr_y_dly[0:3]; + reg [OP_ADDR_W -1:0] rd_wide_addr_x_dly[0:4]; + reg [OP_ADDR_W -1:0] rd_wide_addr_y_dly[0:4]; reg [WORD_EXT_W -1:0] rd_wide_x_din_x_dly1; reg [WORD_EXT_W -1:0] rd_wide_y_din_x_dly1; @@ -277,8 +277,8 @@ module modexpng_general_worker {rd_narrow_addr_x_dly[4], rd_narrow_addr_x_dly[3], rd_narrow_addr_x_dly[2], rd_narrow_addr_x_dly[1], rd_narrow_addr_x_dly[0]} <= {rd_narrow_addr_x_dly[3], rd_narrow_addr_x_dly[2], rd_narrow_addr_x_dly[1], rd_narrow_addr_x_dly[0], rd_narrow_addr_x}; {rd_narrow_addr_y_dly[4], rd_narrow_addr_y_dly[3], rd_narrow_addr_y_dly[2], rd_narrow_addr_y_dly[1], rd_narrow_addr_y_dly[0]} <= {rd_narrow_addr_y_dly[3], rd_narrow_addr_y_dly[2], rd_narrow_addr_y_dly[1], rd_narrow_addr_y_dly[0], rd_narrow_addr_y}; // - {rd_wide_addr_x_dly[3], rd_wide_addr_x_dly[2], rd_wide_addr_x_dly[1], rd_wide_addr_x_dly[0]} <= {rd_wide_addr_x_dly[2], rd_wide_addr_x_dly[1], rd_wide_addr_x_dly[0], rd_wide_addr_x}; - {rd_wide_addr_y_dly[3], rd_wide_addr_y_dly[2], rd_wide_addr_y_dly[1], rd_wide_addr_y_dly[0]} <= {rd_wide_addr_y_dly[2], rd_wide_addr_y_dly[1], rd_wide_addr_y_dly[0], rd_wide_addr_y}; + {rd_wide_addr_x_dly[4], rd_wide_addr_x_dly[3], rd_wide_addr_x_dly[2], rd_wide_addr_x_dly[1], rd_wide_addr_x_dly[0]} <= {rd_wide_addr_x_dly[3], rd_wide_addr_x_dly[2], rd_wide_addr_x_dly[1], rd_wide_addr_x_dly[0], rd_wide_addr_x}; + {rd_wide_addr_y_dly[4], rd_wide_addr_y_dly[3], rd_wide_addr_y_dly[2], rd_wide_addr_y_dly[1], 
rd_wide_addr_y_dly[0]} <= {rd_wide_addr_y_dly[3], rd_wide_addr_y_dly[2], rd_wide_addr_y_dly[1], rd_wide_addr_y_dly[0], rd_wide_addr_y}; // {rd_narrow_ena_x_dly2, rd_narrow_ena_x_dly1} <= {rd_narrow_ena_x_dly1, rd_narrow_ena_x}; {rd_narrow_ena_y_dly2, rd_narrow_ena_y_dly1} <= {rd_narrow_ena_y_dly1, rd_narrow_ena_y}; @@ -386,15 +386,15 @@ module modexpng_general_worker // case (opcode) // - UOP_OPCODE_PROPAGATE_CARRIES: + UOP_OPCODE_PROPAGATE_CARRIES, + UOP_OPCODE_MODULAR_SUBTRACT_X: // case (wrk_fsm_state) WRK_FSM_STATE_BUSY2, WRK_FSM_STATE_LATENCY_POST2, WRK_FSM_STATE_LATENCY_POST4: enable_narrow_wr_en; endcase - // - UOP_OPCODE_MODULAR_SUBTRACT_X, + // UOP_OPCODE_MERGE_LH, UOP_OPCODE_REGULAR_ADD_UNEVEN: // @@ -415,8 +415,15 @@ module modexpng_general_worker WRK_FSM_STATE_LATENCY_POST3: begin enable_wide_wr_en; enable_narrow_wr_en; end endcase // - UOP_OPCODE_MODULAR_REDUCE_INIT, UOP_OPCODE_MODULAR_SUBTRACT_Y: + // + case (wrk_fsm_state) + WRK_FSM_STATE_BUSY2, + WRK_FSM_STATE_LATENCY_POST2, + WRK_FSM_STATE_LATENCY_POST4: enable_wide_wr_en; + endcase + // + UOP_OPCODE_MODULAR_REDUCE_INIT: // case (wrk_fsm_state) WRK_FSM_STATE_BUSY1, @@ -746,7 +753,8 @@ module modexpng_general_worker // case (opcode) // - UOP_OPCODE_PROPAGATE_CARRIES: + UOP_OPCODE_PROPAGATE_CARRIES, + UOP_OPCODE_MODULAR_SUBTRACT_X: // case (wrk_fsm_state) WRK_FSM_STATE_BUSY2, @@ -754,7 +762,6 @@ module modexpng_general_worker WRK_FSM_STATE_LATENCY_POST4: update_wr_narrow_bank_addr(sel_narrow_out, sel_narrow_out, rd_narrow_addr_x_dly[4], rd_narrow_addr_y_dly[4]); endcase // - UOP_OPCODE_MODULAR_SUBTRACT_X, UOP_OPCODE_MERGE_LH, UOP_OPCODE_REGULAR_ADD_UNEVEN: // @@ -787,29 +794,28 @@ module modexpng_general_worker UOP_OPCODE_MODULAR_SUBTRACT_Y: // case (wrk_fsm_state) - WRK_FSM_STATE_BUSY1, - WRK_FSM_STATE_LATENCY_POST1, - WRK_FSM_STATE_LATENCY_POST3: update_wr_wide_bank_addr(sel_wide_out, sel_wide_out, rd_wide_addr_x_dly[3], rd_wide_addr_y_dly[3]); + WRK_FSM_STATE_BUSY2, + WRK_FSM_STATE_LATENCY_POST2, 
+ WRK_FSM_STATE_LATENCY_POST4: update_wr_wide_bank_addr(sel_wide_out, sel_wide_out, rd_wide_addr_x_dly[4], rd_wide_addr_y_dly[4]); endcase // endcase // end - - - + + // // DSP Slice Array // - wire [DSP48E1_C_W-1:0] dsp_x_x_x = 'bX;//{{(DSP48E1_C_W-WORD_EXT_W){1'b0}}, rd_narrow_x_din_x_dly1}; - wire [DSP48E1_C_W-1:0] dsp_y_x_x = 'bX;//{{(DSP48E1_C_W-WORD_EXT_W){1'b0}}, rd_narrow_y_din_x_dly1}; - wire [DSP48E1_C_W-1:0] dsp_x_y_x = 'bX;//{{(DSP48E1_C_W-WORD_EXT_W){1'b0}}, rd_narrow_x_din_y_dly1}; - wire [DSP48E1_C_W-1:0] dsp_y_y_x = 'bX;//{{(DSP48E1_C_W-WORD_EXT_W){1'b0}}, rd_narrow_y_din_y_dly1}; + reg [DSP48E1_C_W-1:0] dsp_x_x_x; + reg [DSP48E1_C_W-1:0] dsp_y_x_x; + reg [DSP48E1_C_W-1:0] dsp_x_y_x; + reg [DSP48E1_C_W-1:0] dsp_y_y_x; - wire [DSP48E1_C_W-1:0] dsp_x_x_y = {{(DSP48E1_C_W-(WORD_EXT_W+1)){1'b0}}, rd_narrow_x_din_x_dly1[WORD_EXT_W-1:WORD_W], 1'b1, rd_narrow_x_din_x_dly1[WORD_W-1:0]}; - wire [DSP48E1_C_W-1:0] dsp_y_x_y = {{(DSP48E1_C_W-(WORD_EXT_W+1)){1'b0}}, rd_narrow_y_din_x_dly1[WORD_EXT_W-1:WORD_W], 1'b1, rd_narrow_y_din_x_dly1[WORD_W-1:0]}; - wire [DSP48E1_C_W-1:0] dsp_x_y_y = {{(DSP48E1_C_W-(WORD_EXT_W+1)){1'b0}}, rd_narrow_x_din_y_dly1[WORD_EXT_W-1:WORD_W], 1'b1, rd_narrow_x_din_y_dly1[WORD_W-1:0]}; - wire [DSP48E1_C_W-1:0] dsp_y_y_y = {{(DSP48E1_C_W-(WORD_EXT_W+1)){1'b0}}, rd_narrow_y_din_y_dly1[WORD_EXT_W-1:WORD_W], 1'b1, rd_narrow_y_din_y_dly1[WORD_W-1:0]}; + reg [DSP48E1_C_W-1:0] dsp_x_x_y; + reg [DSP48E1_C_W-1:0] dsp_y_x_y; + reg [DSP48E1_C_W-1:0] dsp_x_y_y; + reg [DSP48E1_C_W-1:0] dsp_y_y_y; wire [DSP48E1_P_W-1:0] dsp_x_x_p; wire [DSP48E1_P_W-1:0] dsp_y_x_p; @@ -821,213 +827,314 @@ module modexpng_general_worker wire [WORD_EXT_W-1:0] dsp_x_y_p_reduced = {CARRY_ZERO, dsp_x_y_p[WORD_W-1:0]}; wire [WORD_EXT_W-1:0] dsp_y_y_p_reduced = {CARRY_ZERO, dsp_y_y_p[WORD_W-1:0]}; - reg dsp_ce_x = 1'b0; - reg dsp_ce_y = 1'b0; - reg dsp_ce_x_dly = 1'b0; - reg dsp_ce_y_dly = 1'b0; - reg [DSP48E1_OPMODE_W-1:0] dsp_opmode_x; - reg [DSP48E1_OPMODE_W-1:0] 
dsp_opmode_y; + reg dsp_ce_x = 1'b0; + reg dsp_ce_y = 1'b0; + reg dsp_ce_x_dly = 1'b0; + reg dsp_ce_y_dly = 1'b0; + reg [ DSP48E1_OPMODE_W -1:0] dsp_op_mode_x; + reg [ DSP48E1_OPMODE_W -1:0] dsp_op_mode_y; + reg [ DSP48E1_ALUMODE_W -1:0] dsp_alu_mode_x; + reg [ DSP48E1_ALUMODE_W -1:0] dsp_alu_mode_y; + reg [DSP48E1_CARRYINSEL_W -1:0] dsp_carry_in_sel_x; + reg [DSP48E1_CARRYINSEL_W -1:0] dsp_carry_in_sel_y; + wire dsp_carry_out_x; + wire dsp_carry_out_y; + + + // + // DSP - CE + // + always @(posedge clk) {dsp_ce_x_dly, dsp_ce_y_dly} <= {dsp_ce_x, dsp_ce_y}; always @(posedge clk or negedge rst_n) // if (!rst_n) {dsp_ce_x, dsp_ce_y} <= {1'b0, 1'b0}; else case (opcode) // - UOP_OPCODE_PROPAGATE_CARRIES: {dsp_ce_x, dsp_ce_y} <= {rd_narrow_ena_x_dly2, rd_narrow_ena_y_dly2}; - default: {dsp_ce_x, dsp_ce_y} <= {1'b0, 1'b0}; + UOP_OPCODE_PROPAGATE_CARRIES, + UOP_OPCODE_MODULAR_SUBTRACT_X, + UOP_OPCODE_MODULAR_SUBTRACT_Y: {dsp_ce_x, dsp_ce_y} <= {rd_narrow_ena_x_dly2, rd_narrow_ena_y_dly2}; + default: {dsp_ce_x, dsp_ce_y} <= {1'b0, 1'b0}; // endcase + + // + // DSP - OPMODE, ALUMODE, CARRYINSEL + // always @(posedge clk) begin // - dsp_opmode_x <= {DSP48E1_OPMODE_W{1'bX}}; - dsp_opmode_y <= {DSP48E1_OPMODE_W{1'bX}}; + dsp_op_mode_x <= DSP48E1_OPMODE_DNC; + dsp_op_mode_y <= DSP48E1_OPMODE_DNC; + // + dsp_alu_mode_x <= DSP48E1_ALUMODE_DNC; + dsp_alu_mode_y <= DSP48E1_ALUMODE_DNC; // - if (rd_narrow_ena_x_dly2) + dsp_carry_in_sel_x <= DSP48E1_CARRYINSEL_DNC; + dsp_carry_in_sel_y <= DSP48E1_CARRYINSEL_DNC; + // + case (opcode) // - case (opcode) + UOP_OPCODE_PROPAGATE_CARRIES: begin // - UOP_OPCODE_PROPAGATE_CARRIES: if (rd_narrow_addr_x_dly[1] == OP_ADDR_ZERO) dsp_opmode_x <= DSP48E1_OPMODE_Z0_YC_X0; - else dsp_opmode_x <= DSP48E1_OPMODE_ZP17_YC_X0; + if (rd_narrow_ena_x_dly2) begin + if (rd_narrow_addr_x_dly[1] == OP_ADDR_ZERO) dsp_op_mode_x <= DSP48E1_OPMODE_Z0_YC_X0; + else dsp_op_mode_x <= DSP48E1_OPMODE_ZP17_YC_X0; + dsp_alu_mode_x <= 
DSP48E1_ALUMODE_Z_PLUS_X_AND_Y_AND_CIN; + dsp_carry_in_sel_x <= DSP48E1_CARRYINSEL_CARRYIN; + end // - endcase - // - if (rd_narrow_ena_y_dly2) + if (rd_narrow_ena_y_dly2) begin + if (rd_narrow_addr_y_dly[1] == OP_ADDR_ZERO) dsp_op_mode_y <= DSP48E1_OPMODE_Z0_YC_X0; + else dsp_op_mode_y <= DSP48E1_OPMODE_ZP17_YC_X0; + dsp_alu_mode_y <= DSP48E1_ALUMODE_Z_PLUS_X_AND_Y_AND_CIN; + dsp_carry_in_sel_y <= DSP48E1_CARRYINSEL_CARRYIN; + end + // + end // - case (opcode) + UOP_OPCODE_MODULAR_SUBTRACT_X: begin // - UOP_OPCODE_PROPAGATE_CARRIES: if (rd_narrow_addr_y_dly[1] == OP_ADDR_ZERO) dsp_opmode_y <= DSP48E1_OPMODE_Z0_YC_X0; - else dsp_opmode_y <= DSP48E1_OPMODE_ZP17_YC_X0; + if (rd_narrow_ena_x_dly2) begin + dsp_op_mode_x <= DSP48E1_OPMODE_ZC_Y0_XAB; + dsp_alu_mode_x <= DSP48E1_ALUMODE_Z_MINUS_X_AND_Y_AND_CIN; + if (rd_narrow_addr_x_dly[1] == OP_ADDR_ZERO) dsp_carry_in_sel_x <= DSP48E1_CARRYINSEL_CARRYIN; + else dsp_carry_in_sel_x <= DSP48E1_CARRYINSEL_CARRYCASCOUT; + end // - endcase + if (rd_narrow_ena_y_dly2) begin + dsp_op_mode_y <= DSP48E1_OPMODE_ZC_Y0_XAB; + dsp_alu_mode_y <= DSP48E1_ALUMODE_Z_MINUS_X_AND_Y_AND_CIN; + if (rd_narrow_addr_y_dly[1] == OP_ADDR_ZERO) dsp_carry_in_sel_y <= DSP48E1_CARRYINSEL_CARRYIN; + else dsp_carry_in_sel_y <= DSP48E1_CARRYINSEL_CARRYCASCOUT; + end + // + end + // + UOP_OPCODE_MODULAR_SUBTRACT_Y: begin + // + if (rd_narrow_ena_x_dly2) begin + dsp_op_mode_x <= DSP48E1_OPMODE_ZC_Y0_XAB; + dsp_alu_mode_x <= DSP48E1_ALUMODE_Z_PLUS_X_AND_Y_AND_CIN; + if (rd_narrow_addr_x_dly[1] == OP_ADDR_ZERO) dsp_carry_in_sel_x <= DSP48E1_CARRYINSEL_CARRYIN; + else dsp_carry_in_sel_x <= DSP48E1_CARRYINSEL_CARRYCASCOUT; + end + // + if (rd_narrow_ena_y_dly2) begin + dsp_op_mode_y <= DSP48E1_OPMODE_ZC_Y0_XAB; + dsp_alu_mode_y <= DSP48E1_ALUMODE_Z_PLUS_X_AND_Y_AND_CIN; + if (rd_narrow_addr_y_dly[1] == OP_ADDR_ZERO) dsp_carry_in_sel_y <= DSP48E1_CARRYINSEL_CARRYIN; + else dsp_carry_in_sel_y <= DSP48E1_CARRYINSEL_CARRYCASCOUT; + end + // + end + // + endcase 
// end - always @(posedge clk) {dsp_ce_x_dly, dsp_ce_y_dly} <= {dsp_ce_x, dsp_ce_y}; + // + // DSP Feed Logic + // + always @(posedge clk) begin + // + dsp_x_x_x <= {DSP48E1_C_W{1'bX}}; + dsp_x_x_y <= {DSP48E1_C_W{1'bX}}; + dsp_y_x_x <= {DSP48E1_C_W{1'bX}}; + dsp_y_x_y <= {DSP48E1_C_W{1'bX}}; + dsp_x_y_x <= {DSP48E1_C_W{1'bX}}; + dsp_x_y_y <= {DSP48E1_C_W{1'bX}}; + dsp_y_y_x <= {DSP48E1_C_W{1'bX}}; + dsp_y_y_y <= {DSP48E1_C_W{1'bX}}; + // + case (opcode) + // + UOP_OPCODE_PROPAGATE_CARRIES: begin + // + if (rd_narrow_ena_x_dly2) begin + dsp_x_x_y <= {{(DSP48E1_C_W-(WORD_EXT_W+1)){1'b0}}, wrk_rd_narrow_x_din_x[WORD_EXT_W-1:WORD_W], 1'b1, wrk_rd_narrow_x_din_x[WORD_W-1:0]}; + dsp_y_x_y <= {{(DSP48E1_C_W-(WORD_EXT_W+1)){1'b0}}, wrk_rd_narrow_y_din_x[WORD_EXT_W-1:WORD_W], 1'b1, wrk_rd_narrow_y_din_x[WORD_W-1:0]}; + end + // + if (rd_narrow_ena_y_dly2) begin + dsp_x_y_y <= {{(DSP48E1_C_W-(WORD_EXT_W+1)){1'b0}}, wrk_rd_narrow_x_din_y[WORD_EXT_W-1:WORD_W], 1'b1, wrk_rd_narrow_x_din_y[WORD_W-1:0]}; + dsp_y_y_y <= {{(DSP48E1_C_W-(WORD_EXT_W+1)){1'b0}}, wrk_rd_narrow_y_din_y[WORD_EXT_W-1:WORD_W], 1'b1, wrk_rd_narrow_y_din_y[WORD_W-1:0]}; + end + // + end + // + UOP_OPCODE_MODULAR_SUBTRACT_X: begin + // + if (rd_narrow_ena_x_dly2) begin + dsp_x_x_y <= {{(DSP48E1_C_W-WORD_W){1'b0}}, wrk_rd_narrow_x_din_x[WORD_W-1:0]}; + dsp_x_x_x <= {{(DSP48E1_C_W-WORD_W){1'b0}}, wrk_rd_narrow_y_din_x[WORD_W-1:0]}; + dsp_y_x_y <= {{(DSP48E1_C_W-WORD_W){1'b0}}, wrk_rd_narrow_x_din_x[WORD_W-1:0]}; + dsp_y_x_x <= {{(DSP48E1_C_W-WORD_W){1'b0}}, wrk_rd_narrow_y_din_x[WORD_W-1:0]}; + end + // + if (rd_narrow_ena_y_dly2) begin + dsp_x_y_y <= {{(DSP48E1_C_W-WORD_W){1'b0}}, wrk_rd_narrow_x_din_y[WORD_W-1:0]}; + dsp_x_y_x <= {{(DSP48E1_C_W-WORD_W){1'b0}}, wrk_rd_narrow_y_din_y[WORD_W-1:0]}; + dsp_y_y_y <= {{(DSP48E1_C_W-WORD_W){1'b0}}, wrk_rd_narrow_x_din_y[WORD_W-1:0]}; + dsp_y_y_x <= {{(DSP48E1_C_W-WORD_W){1'b0}}, wrk_rd_narrow_y_din_y[WORD_W-1:0]}; + end + // + end + // + 
UOP_OPCODE_MODULAR_SUBTRACT_Y: begin + // + if (rd_narrow_ena_x_dly2) begin + dsp_x_x_y <= {{(DSP48E1_C_W-WORD_W){1'b1}}, wrk_rd_narrow_x_din_x[WORD_W-1:0]}; + dsp_x_x_x <= {{(DSP48E1_C_W-WORD_W){1'b0}}, wrk_rd_wide_x_din_x[WORD_W-1:0]}; + dsp_y_x_y <= {{(DSP48E1_C_W-WORD_W){1'b1}}, wrk_rd_narrow_x_din_x[WORD_W-1:0]}; + dsp_y_x_x <= {{(DSP48E1_C_W-WORD_W){1'b0}}, wrk_rd_wide_y_din_x[WORD_W-1:0]}; + end + // + if (rd_narrow_ena_y_dly2) begin + dsp_x_y_y <= {{(DSP48E1_C_W-WORD_W){1'b1}}, wrk_rd_narrow_x_din_y[WORD_W-1:0]}; + dsp_x_y_x <= {{(DSP48E1_C_W-WORD_W){1'b0}}, wrk_rd_wide_x_din_y[WORD_W-1:0]}; + dsp_y_y_y <= {{(DSP48E1_C_W-WORD_W){1'b1}}, wrk_rd_narrow_x_din_y[WORD_W-1:0]}; + dsp_y_y_x <= {{(DSP48E1_C_W-WORD_W){1'b0}}, wrk_rd_wide_y_din_y[WORD_W-1:0]}; + end + // + end + // + endcase + // + end + + + // + // DSP Slices + // `MODEXPNG_DSP_SLICE_ADDSUB dst_inst_x_x ( - .clk (clk), - .ce_abc (dsp_ce_x), - .ce_p (dsp_ce_x_dly), - .ce_opmode (dsp_ce_x), - .x (dsp_x_x_x), - .y (dsp_x_x_y), - .p (dsp_x_x_p), - .opmode (dsp_opmode_x), - .casc_p_in (), - .casc_p_out () + .clk (clk), + .ce_abc (dsp_ce_x), + .ce_p (dsp_ce_x_dly), + .ce_ctrl (dsp_ce_x), + .x (dsp_x_x_x), + .y (dsp_x_x_y), + .p (dsp_x_x_p), + .op_mode (dsp_op_mode_x), + .alu_mode (dsp_alu_mode_x), + .carry_in_sel (dsp_carry_in_sel_x), + .casc_p_in (), + .casc_p_out (), + .carryout (dsp_carry_out_x) ); `MODEXPNG_DSP_SLICE_ADDSUB dst_inst_y_x ( - .clk (clk), - .ce_abc (dsp_ce_x), - .ce_p (dsp_ce_x_dly), - .ce_opmode (dsp_ce_x), - .x (dsp_y_x_x), - .y (dsp_y_x_y), - .p (dsp_y_x_p), - .opmode (dsp_opmode_x), - .casc_p_in (), - .casc_p_out () + .clk (clk), + .ce_abc (dsp_ce_x), + .ce_p (dsp_ce_x_dly), + .ce_ctrl (dsp_ce_x), + .x (dsp_y_x_x), + .y (dsp_y_x_y), + .p (dsp_y_x_p), + .op_mode (dsp_op_mode_x), + .alu_mode (dsp_alu_mode_x), + .carry_in_sel (dsp_carry_in_sel_x), + .casc_p_in (), + .casc_p_out (), + .carryout () ); `MODEXPNG_DSP_SLICE_ADDSUB dst_inst_x_y ( - .clk (clk), - .ce_abc (dsp_ce_y), - .ce_p 
(dsp_ce_y_dly),
-        .ce_opmode (dsp_ce_y),
-        .x (dsp_x_y_x),
-        .y (dsp_x_y_y),
-        .p (dsp_x_y_p),
-        .opmode (dsp_opmode_y),
-        .casc_p_in (),
-        .casc_p_out ()
+        .clk (clk),
+        .ce_abc (dsp_ce_y),
+        .ce_p (dsp_ce_y_dly),
+        .ce_ctrl (dsp_ce_y),
+        .x (dsp_x_y_x),
+        .y (dsp_x_y_y),
+        .p (dsp_x_y_p),
+        .op_mode (dsp_op_mode_y),
+        .alu_mode (dsp_alu_mode_y),
+        .carry_in_sel (dsp_carry_in_sel_y),
+        .casc_p_in (),
+        .casc_p_out (),
+        .carryout (dsp_carry_out_y)
     );

     `MODEXPNG_DSP_SLICE_ADDSUB dst_inst_y_y
     (
-        .clk (clk),
-        .ce_abc (dsp_ce_y),
-        .ce_p (dsp_ce_y_dly),
-        .ce_opmode (dsp_ce_y),
-        .x (dsp_y_y_x),
-        .y (dsp_y_y_y),
-        .p (dsp_y_y_p),
-        .opmode (dsp_opmode_y),
-        .casc_p_in (),
-        .casc_p_out ()
+        .clk (clk),
+        .ce_abc (dsp_ce_y),
+        .ce_p (dsp_ce_y_dly),
+        .ce_ctrl (dsp_ce_y),
+        .x (dsp_y_y_x),
+        .y (dsp_y_y_y),
+        .p (dsp_y_y_p),
+        .op_mode (dsp_op_mode_y),
+        .alu_mode (dsp_alu_mode_y),
+        .carry_in_sel (dsp_carry_in_sel_y),
+        .casc_p_in (),
+        .casc_p_out (),
+        .carryout ()
     );

     //
-    // UOP_OPCODE_PROPAGATE_CARRIES
-    //
-    reg [CARRY_W -1:0] propagate_carries_x_x_cry_r;
-    reg [CARRY_W -1:0] propagate_carries_y_x_cry_r;
-    reg [CARRY_W -1:0] propagate_carries_x_y_cry_r;
-    reg [CARRY_W -1:0] propagate_carries_y_y_cry_r;
-
-    wire [WORD_EXT_W -1:0] propagate_carries_x_x_w_cry = rd_narrow_x_din_x_dly1 + {{WORD_W{1'b0}}, propagate_carries_x_x_cry_r};
-    wire [WORD_EXT_W -1:0] propagate_carries_y_x_w_cry = rd_narrow_y_din_x_dly1 + {{WORD_W{1'b0}}, propagate_carries_y_x_cry_r};
-    wire [WORD_EXT_W -1:0] propagate_carries_x_y_w_cry = rd_narrow_x_din_y_dly1 + {{WORD_W{1'b0}}, propagate_carries_x_y_cry_r};
-    wire [WORD_EXT_W -1:0] propagate_carries_y_y_w_cry = rd_narrow_y_din_y_dly1 + {{WORD_W{1'b0}}, propagate_carries_y_y_cry_r};
-
-    reg [WORD_EXT_W -1:0] propagate_carries_x_x_w_cry_r;
-    reg [WORD_EXT_W -1:0] propagate_carries_y_x_w_cry_r;
-    reg [WORD_EXT_W -1:0] propagate_carries_x_y_w_cry_r;
-    reg [WORD_EXT_W -1:0] propagate_carries_y_y_w_cry_r;
-
-    wire [CARRY_W -1:0] propagate_carries_x_x_w_cry_msb = propagate_carries_x_x_w_cry_r[WORD_EXT_W -1:WORD_W];
-    wire [CARRY_W -1:0] propagate_carries_y_x_w_cry_msb = propagate_carries_y_x_w_cry_r[WORD_EXT_W -1:WORD_W];
-    wire [CARRY_W -1:0] propagate_carries_x_y_w_cry_msb = propagate_carries_x_y_w_cry_r[WORD_EXT_W -1:WORD_W];
-    wire [CARRY_W -1:0] propagate_carries_y_y_w_cry_msb = propagate_carries_y_y_w_cry_r[WORD_EXT_W -1:WORD_W];
-
-    wire [WORD_W -1:0] propagate_carries_x_x_w_cry_lsb = propagate_carries_x_x_w_cry_r[WORD_W -1:0];
-    wire [WORD_W -1:0] propagate_carries_y_x_w_cry_lsb = propagate_carries_y_x_w_cry_r[WORD_W -1:0];
-    wire [WORD_W -1:0] propagate_carries_x_y_w_cry_lsb = propagate_carries_x_y_w_cry_r[WORD_W -1:0];
-    wire [WORD_W -1:0] propagate_carries_y_y_w_cry_lsb = propagate_carries_y_y_w_cry_r[WORD_W -1:0];
-
-    wire [WORD_EXT_W -1:0] propagate_carries_x_x_w_cry_reduced = {{CARRY_W{1'b0}}, propagate_carries_x_x_w_cry_lsb};
-    wire [WORD_EXT_W -1:0] propagate_carries_y_x_w_cry_reduced = {{CARRY_W{1'b0}}, propagate_carries_y_x_w_cry_lsb};
-    wire [WORD_EXT_W -1:0] propagate_carries_x_y_w_cry_reduced = {{CARRY_W{1'b0}}, propagate_carries_x_y_w_cry_lsb};
-    wire [WORD_EXT_W -1:0] propagate_carries_y_y_w_cry_reduced = {{CARRY_W{1'b0}}, propagate_carries_y_y_w_cry_lsb};
-
-    task _propagate_carries_update_cry;
-        input [CARRY_W-1:0] x_x_cry, y_x_cry, x_y_cry, y_y_cry;
-        { propagate_carries_x_x_cry_r, propagate_carries_y_x_cry_r, propagate_carries_x_y_cry_r, propagate_carries_y_y_cry_r} <=
-        { x_x_cry, y_x_cry, x_y_cry, y_y_cry};
-    endtask
-
-    task propagate_carries_clear_cry; _propagate_carries_update_cry( CARRY_ZERO, CARRY_ZERO, CARRY_ZERO, CARRY_ZERO); endtask
-    task propagate_carries_store_cry; _propagate_carries_update_cry(propagate_carries_x_x_w_cry_msb, propagate_carries_y_x_w_cry_msb, propagate_carries_x_y_w_cry_msb, propagate_carries_y_y_w_cry_msb); endtask
-
-    task _propagate_carries_update_sum_w_cry;
-        input [WORD_EXT_W-1:0] x_x_sum_w_cry, y_x_sum_w_cry, x_y_sum_w_cry, y_y_sum_w_cry;
-        { propagate_carries_x_x_w_cry_r, propagate_carries_y_x_w_cry_r, propagate_carries_x_y_w_cry_r, propagate_carries_y_y_w_cry_r} <=
-        { x_x_sum_w_cry, y_x_sum_w_cry, x_y_sum_w_cry, y_y_sum_w_cry};
-    endtask
-
-    task propagate_carries_store_sum_w_cry; _propagate_carries_update_sum_w_cry(propagate_carries_x_x_w_cry, propagate_carries_y_x_w_cry, propagate_carries_x_y_w_cry, propagate_carries_y_y_w_cry); endtask
+    // UOP_OPCODE_MODULAR_SUBTRACT_X
+    //
+    reg modular_subtract_x_brw_flag;
+    reg modular_subtract_y_brw_flag;
+    //
+    // IMPORTANT: DSP48E1 turns out to have a very non-obvious feature: when doing _subtraction_,
+    // the CARRYOUT[3] is _NOT_ equivalent to the borrow flag! See "CARRYOUT/CARRYCASCOUT"
+    // section of Appendix A on pp. 55-56 of UG479 for more details.
+    //

     always @(posedge clk)
         //
-        if (opcode == UOP_OPCODE_PROPAGATE_CARRIES)
-            //
-            case (wrk_fsm_state)
-                //
-                WRK_FSM_STATE_LATENCY_PRE3: propagate_carries_clear_cry;
-                WRK_FSM_STATE_BUSY1,
-                WRK_FSM_STATE_LATENCY_POST1: propagate_carries_store_cry;
-                //
-                WRK_FSM_STATE_LATENCY_PRE4,
-                WRK_FSM_STATE_BUSY2,
-                WRK_FSM_STATE_LATENCY_POST2: propagate_carries_store_sum_w_cry;
-                //
+        case (opcode)
+            UOP_OPCODE_MODULAR_SUBTRACT_X:
+                case (wrk_fsm_state)
+                    WRK_FSM_STATE_LATENCY_POST4:
+                        //{modular_subtract_x_brw_flag, modular_subtract_y_brw_flag} <= {1'bX, 1'bZ};
+                        {modular_subtract_x_brw_flag, modular_subtract_y_brw_flag} <= {~dsp_carry_out_x, ~dsp_carry_out_y};
+                endcase
         endcase
+
+    //reg modular_subtract_x_brw_r;
+    //reg modular_subtract_y_brw_r;

-    //
-    // UOP_OPCODE_MODULAR_SUBTRACT_X
-    // UOP_OPCODE_MODULAR_SUBTRACT_Y
-    //
-    reg modular_subtract_x_brw_r;
-    reg modular_subtract_y_brw_r;
-
-    reg modular_subtract_x_cry_r;
-    reg modular_subtract_y_cry_r;
+    //reg modular_subtract_x_cry_r;
+    //reg modular_subtract_y_cry_r;

-    wire [WORD_W:0] modular_subtract_x_w_brw = rd_narrow_x_din_x_dly1[WORD_W:0] - rd_narrow_y_din_x_dly1[WORD_W:0] - {{WORD_W{1'b0}}, modular_subtract_x_brw_r};
-    wire [WORD_W:0] modular_subtract_y_w_brw = rd_narrow_x_din_y_dly1[WORD_W:0] - rd_narrow_y_din_y_dly1[WORD_W:0] - {{WORD_W{1'b0}}, modular_subtract_y_brw_r};
+    //wire [WORD_W:0] modular_subtract_x_w_brw = rd_narrow_x_din_x_dly1[WORD_W:0] - rd_narrow_y_din_x_dly1[WORD_W:0] - {{WORD_W{1'b0}}, modular_subtract_x_brw_r};
+    //wire [WORD_W:0] modular_subtract_y_w_brw = rd_narrow_x_din_y_dly1[WORD_W:0] - rd_narrow_y_din_y_dly1[WORD_W:0] - {{WORD_W{1'b0}}, modular_subtract_y_brw_r};

-    wire [WORD_W:0] modular_subtract_x_w_cry = rd_narrow_x_din_x_dly1[WORD_W:0] + rd_wide_x_din_x_dly1[WORD_W:0] + {{WORD_W{1'b0}}, modular_subtract_x_cry_r};
-    wire [WORD_W:0] modular_subtract_y_w_cry = rd_narrow_x_din_y_dly1[WORD_W:0] + rd_wide_x_din_y_dly1[WORD_W:0] + {{WORD_W{1'b0}}, modular_subtract_y_brw_r};
+    //wire [WORD_W:0] modular_subtract_x_w_cry = rd_narrow_x_din_x_dly1[WORD_W:0] + rd_wide_x_din_x_dly1[WORD_W:0] + {{WORD_W{1'b0}}, modular_subtract_x_cry_r};
+    //wire [WORD_W:0] modular_subtract_y_w_cry = rd_narrow_x_din_y_dly1[WORD_W:0] + rd_wide_x_din_y_dly1[WORD_W:0] + {{WORD_W{1'b0}}, modular_subtract_y_cry_r};

-    reg [WORD_W:0] modular_subtract_x_w_brw_r;
-    reg [WORD_W:0] modular_subtract_y_w_brw_r;
+    //reg [WORD_W:0] modular_subtract_x_w_brw_r;
+    //reg [WORD_W:0] modular_subtract_y_w_brw_r;

-    reg [WORD_W:0] modular_subtract_x_w_cry_r;
-    reg [WORD_W:0] modular_subtract_y_w_cry_r;
+    //reg [WORD_W:0] modular_subtract_x_w_cry_r;
+    //reg [WORD_W:0] modular_subtract_y_w_cry_r;

-    wire modular_subtract_x_w_brw_msb = modular_subtract_x_w_brw_r[WORD_W];
-    wire modular_subtract_y_w_brw_msb = modular_subtract_y_w_brw_r[WORD_W];
+    //wire modular_subtract_x_w_brw_msb = modular_subtract_x_w_brw_r[WORD_W];
+    //wire modular_subtract_y_w_brw_msb = modular_subtract_y_w_brw_r[WORD_W];

-    wire modular_subtract_x_w_cry_msb = modular_subtract_x_w_cry_r[WORD_W];
-    wire modular_subtract_y_w_cry_msb = modular_subtract_y_w_cry_r[WORD_W];
+    //wire modular_subtract_x_w_cry_msb = modular_subtract_x_w_cry_r[WORD_W];
+    //wire modular_subtract_y_w_cry_msb = modular_subtract_y_w_cry_r[WORD_W];

-    wire [WORD_W -1:0] modular_subtract_x_w_brw_lsb = modular_subtract_x_w_brw_r[WORD_W -1:0];
-    wire [WORD_W -1:0] modular_subtract_y_w_brw_lsb = modular_subtract_y_w_brw_r[WORD_W -1:0];
+    //wire [WORD_W -1:0] modular_subtract_x_w_brw_lsb = modular_subtract_x_w_brw_r[WORD_W -1:0];
+    //wire [WORD_W -1:0] modular_subtract_y_w_brw_lsb = modular_subtract_y_w_brw_r[WORD_W -1:0];

-    wire [WORD_W -1:0] modular_subtract_x_w_cry_lsb = modular_subtract_x_w_cry_r[WORD_W -1:0];
-    wire [WORD_W -1:0] modular_subtract_y_w_cry_lsb = modular_subtract_y_w_cry_r[WORD_W -1:0];
+    //wire [WORD_W -1:0] modular_subtract_x_w_cry_lsb = modular_subtract_x_w_cry_r[WORD_W -1:0];
+    //wire [WORD_W -1:0] modular_subtract_y_w_cry_lsb = modular_subtract_y_w_cry_r[WORD_W -1:0];

-    wire [WORD_EXT_W -1:0] modular_subtract_x_w_brw_reduced = {{CARRY_W{1'b0}}, modular_subtract_x_w_brw_lsb};
-    wire [WORD_EXT_W -1:0] modular_subtract_y_w_brw_reduced = {{CARRY_W{1'b0}}, modular_subtract_y_w_brw_lsb};
+    //wire [WORD_EXT_W -1:0] modular_subtract_x_w_brw_reduced = {{CARRY_W{1'b0}}, modular_subtract_x_w_brw_lsb};
+    //wire [WORD_EXT_W -1:0] modular_subtract_y_w_brw_reduced = {{CARRY_W{1'b0}}, modular_subtract_y_w_brw_lsb};

-    wire [WORD_EXT_W -1:0] modular_subtract_x_w_cry_reduced = {{CARRY_W{1'b0}}, modular_subtract_x_w_cry_lsb};
-    wire [WORD_EXT_W -1:0] modular_subtract_y_w_cry_reduced = {{CARRY_W{1'b0}}, modular_subtract_y_w_cry_lsb};
+    //wire [WORD_EXT_W -1:0] modular_subtract_x_w_cry_reduced = {{CARRY_W{1'b0}}, modular_subtract_x_w_cry_lsb};
+    //wire [WORD_EXT_W -1:0] modular_subtract_y_w_cry_reduced = {{CARRY_W{1'b0}}, modular_subtract_y_w_cry_lsb};

     reg [WORD_EXT_W -1:0] modular_subtract_x_mux;
     reg [WORD_EXT_W -1:0] modular_subtract_y_mux;
@@ -1035,68 +1142,68 @@ module modexpng_general_worker
     wire [WORD_EXT_W -1:0] modular_subtract_x_mux_reduced = {{CARRY_W{1'b0}}, modular_subtract_x_mux[WORD_W-1:0]};
     wire [WORD_EXT_W -1:0] modular_subtract_y_mux_reduced = {{CARRY_W{1'b0}}, modular_subtract_y_mux[WORD_W-1:0]};

-    task _modular_subtract_update_brw;
-        input x_brw, y_brw;
-        {modular_subtract_x_brw_r, modular_subtract_y_brw_r} <= {x_brw, y_brw};
-    endtask
+    //task _modular_subtract_update_brw;
+        //input x_brw, y_brw;
+        //{modular_subtract_x_brw_r, modular_subtract_y_brw_r} <= {x_brw, y_brw};
+    //endtask

-    task _modular_subtract_update_cry;
-        input x_cry, y_cry;
-        {modular_subtract_x_cry_r, modular_subtract_y_cry_r} <= {x_cry, y_cry};
-    endtask
+    //task _modular_subtract_update_cry;
+        //input x_cry, y_cry;
+        //{modular_subtract_x_cry_r, modular_subtract_y_cry_r} <= {x_cry, y_cry};
+    //endtask

-    task modular_subtract_clear_brw; _modular_subtract_update_brw( 1'b0, 1'b0); endtask
-    task modular_subtract_store_brw; _modular_subtract_update_brw(modular_subtract_x_w_brw_msb, modular_subtract_y_w_brw_msb); endtask
+    //task modular_subtract_clear_brw; _modular_subtract_update_brw( 1'b0, 1'b0); endtask
+    //task modular_subtract_store_brw; _modular_subtract_update_brw(modular_subtract_x_w_brw_msb, modular_subtract_y_w_brw_msb); endtask

-    task modular_subtract_clear_cry; _modular_subtract_update_cry( 1'b0, 1'b0); endtask
-    task modular_subtract_store_cry; _modular_subtract_update_cry(modular_subtract_x_w_cry_msb, modular_subtract_y_w_cry_msb); endtask
+    //task modular_subtract_clear_cry; _modular_subtract_update_cry( 1'b0, 1'b0); endtask
+    //task modular_subtract_store_cry; _modular_subtract_update_cry(modular_subtract_x_w_cry_msb, modular_subtract_y_w_cry_msb); endtask

-    task _modular_subtract_update_diff_w_brw;
-        input [WORD_W:0] x_diff_w_brw, y_diff_w_brw;
-        {modular_subtract_x_w_brw_r, modular_subtract_y_w_brw_r} <= {x_diff_w_brw, y_diff_w_brw};
-    endtask
+    //task _modular_subtract_update_diff_w_brw;
+        //input [WORD_W:0] x_diff_w_brw, y_diff_w_brw;
+        //{modular_subtract_x_w_brw_r, modular_subtract_y_w_brw_r} <= {x_diff_w_brw, y_diff_w_brw};
+    //endtask

-    task _modular_subtract_update_sum_w_cry;
-        input [WORD_W:0] x_sum_w_cry, y_sum_w_cry;
-        {modular_subtract_x_w_cry_r, modular_subtract_y_w_cry_r} <= {x_sum_w_cry, y_sum_w_cry};
-    endtask
+    //task _modular_subtract_update_sum_w_cry;
+        //input [WORD_W:0] x_sum_w_cry, y_sum_w_cry;
+        //{modular_subtract_x_w_cry_r, modular_subtract_y_w_cry_r} <= {x_sum_w_cry, y_sum_w_cry};
+    //endtask

-    task modular_subtract_store_diff_w_brw; _modular_subtract_update_diff_w_brw(modular_subtract_x_w_brw, modular_subtract_y_w_brw); endtask
+    //task modular_subtract_store_diff_w_brw; _modular_subtract_update_diff_w_brw(modular_subtract_x_w_brw, modular_subtract_y_w_brw); endtask

-    task modular_subtract_store_sum_w_cry; _modular_subtract_update_sum_w_cry(modular_subtract_x_w_cry, modular_subtract_y_w_cry); endtask
+    //task modular_subtract_store_sum_w_cry; _modular_subtract_update_sum_w_cry(modular_subtract_x_w_cry, modular_subtract_y_w_cry); endtask

     always @(posedge clk)
         //
         case (opcode)
             //
-            UOP_OPCODE_MODULAR_SUBTRACT_X:
+            //UOP_OPCODE_MODULAR_SUBTRACT_X:
                 //
-                case (wrk_fsm_state)
+                //case (wrk_fsm_state)
                     //
-                    WRK_FSM_STATE_LATENCY_PRE3: modular_subtract_clear_brw;
-                    WRK_FSM_STATE_BUSY1,
-                    WRK_FSM_STATE_LATENCY_POST1,
-                    WRK_FSM_STATE_LATENCY_POST3: modular_subtract_store_brw; // we need the very last borrow here too!
+                    //WRK_FSM_STATE_LATENCY_PRE3: modular_subtract_clear_brw;
+                    //WRK_FSM_STATE_BUSY1,
+                    //WRK_FSM_STATE_LATENCY_POST1,
+                    //WRK_FSM_STATE_LATENCY_POST3: modular_subtract_store_brw; // we need the very last borrow here too!
                     //
-                    WRK_FSM_STATE_LATENCY_PRE4,
-                    WRK_FSM_STATE_BUSY2,
-                    WRK_FSM_STATE_LATENCY_POST2: modular_subtract_store_diff_w_brw;
+                    //WRK_FSM_STATE_LATENCY_PRE4,
+                    //WRK_FSM_STATE_BUSY2,
+                    //WRK_FSM_STATE_LATENCY_POST2: modular_subtract_store_diff_w_brw;
                     //
-                endcase
+                //endcase
                 //
-            UOP_OPCODE_MODULAR_SUBTRACT_Y:
+            //UOP_OPCODE_MODULAR_SUBTRACT_Y:
                 //
-                case (wrk_fsm_state)
+                //case (wrk_fsm_state)
                     //
-                    WRK_FSM_STATE_LATENCY_PRE3: modular_subtract_clear_cry;
-                    WRK_FSM_STATE_BUSY1,
-                    WRK_FSM_STATE_LATENCY_POST1: modular_subtract_store_cry;
+                    //WRK_FSM_STATE_LATENCY_PRE3: modular_subtract_clear_cry;
+                    //WRK_FSM_STATE_BUSY1,
+                    //WRK_FSM_STATE_LATENCY_POST1: modular_subtract_store_cry;
                     //
-                    WRK_FSM_STATE_LATENCY_PRE4,
-                    WRK_FSM_STATE_BUSY2,
-                    WRK_FSM_STATE_LATENCY_POST2: modular_subtract_store_sum_w_cry;
+                    //WRK_FSM_STATE_LATENCY_PRE4,
+                    //WRK_FSM_STATE_BUSY2,
+                    //WRK_FSM_STATE_LATENCY_POST2: modular_subtract_store_sum_w_cry;
                     //
-                endcase
+                //endcase
                 //
             UOP_OPCODE_MODULAR_SUBTRACT_Z:
                 //
@@ -1106,8 +1213,8 @@ module modexpng_general_worker
                     WRK_FSM_STATE_BUSY2,
                     WRK_FSM_STATE_LATENCY_POST2:
                         //
-                        begin modular_subtract_x_mux <= !modular_subtract_x_brw_r ? rd_narrow_x_din_x_dly1 : rd_wide_x_din_x_dly1;
-                              modular_subtract_y_mux <= !modular_subtract_y_brw_r ? rd_narrow_x_din_y_dly1 : rd_wide_x_din_y_dly1; end
+                        begin modular_subtract_x_mux <= !modular_subtract_x_brw_flag ? rd_narrow_x_din_x_dly1 : rd_wide_x_din_x_dly1;
+                              modular_subtract_y_mux <= !modular_subtract_y_brw_flag ? rd_narrow_x_din_y_dly1 : rd_wide_x_din_y_dly1; end
                         //
                 endcase
                 //
@@ -1316,7 +1423,8 @@ module modexpng_general_worker
             //
             case (opcode)
                 //
-                UOP_OPCODE_PROPAGATE_CARRIES:
+                UOP_OPCODE_PROPAGATE_CARRIES,
+                UOP_OPCODE_MODULAR_SUBTRACT_X:
                     //
                     case (wrk_fsm_state)
                         //
@@ -1324,7 +1432,6 @@ module modexpng_general_worker
                         WRK_FSM_STATE_LATENCY_POST2,
                         WRK_FSM_STATE_LATENCY_POST4:
                             //
-                            //update_narrow_dout(propagate_carries_x_x_w_cry_reduced, propagate_carries_y_x_w_cry_reduced, propagate_carries_x_y_w_cry_reduced, propagate_carries_y_y_w_cry_reduced);
                             update_narrow_dout(dsp_x_x_p_reduced, dsp_y_x_p_reduced, dsp_x_y_p_reduced, dsp_y_y_p_reduced);
                             //
                     endcase
@@ -1380,27 +1487,15 @@ module modexpng_general_worker
                             //
                     endcase
                     //
-                UOP_OPCODE_MODULAR_SUBTRACT_X:
-                    //
-                    case (wrk_fsm_state)
-                        //
-                        WRK_FSM_STATE_BUSY1,
-                        WRK_FSM_STATE_LATENCY_POST1,
-                        WRK_FSM_STATE_LATENCY_POST3:
-                            //
-                            update_narrow_dout(modular_subtract_x_w_brw_reduced, modular_subtract_x_w_brw_reduced, modular_subtract_y_w_brw_reduced, modular_subtract_y_w_brw_reduced);
-                            //
-                    endcase
-                    //
                 UOP_OPCODE_MODULAR_SUBTRACT_Y:
                     //
                     case (wrk_fsm_state)
                         //
-                        WRK_FSM_STATE_BUSY1,
-                        WRK_FSM_STATE_LATENCY_POST1,
-                        WRK_FSM_STATE_LATENCY_POST3:
-                            //
-                            update_wide_dout(modular_subtract_x_w_cry_reduced, modular_subtract_x_w_cry_reduced, modular_subtract_y_w_cry_reduced, modular_subtract_y_w_cry_reduced);
+                        WRK_FSM_STATE_BUSY2,
+                        WRK_FSM_STATE_LATENCY_POST2,
+                        WRK_FSM_STATE_LATENCY_POST4:
+                            //
+                            update_wide_dout(dsp_x_x_p_reduced, dsp_y_x_p_reduced, dsp_x_y_p_reduced, dsp_y_y_p_reduced);
                             //
                     endcase
                     //

From git at cryptech.is Mon Jan 20 21:18:05 2020
From: git at cryptech.is (git at cryptech.is)
Date: Mon, 20 Jan 2020 21:18:05 +0000
Subject: [Cryptech-Commits] [user/shatov/modexpng] 04/21: This commit modifies the REGULAR_ADD_UNEVEN micro-operation to use DSP slices for addition instead of fabric logic.
 This opcode is only necessary when in CRT mode and is executed once per entire exponentiation to recombine the two "easier" exponentiations. This was the final change necessary to get rid of using fabric math in the general worker module.
In-Reply-To: <157955508198.72689.15683234965619703441@bikeshed.cryptech.is>
References: <157955508198.72689.15683234965619703441@bikeshed.cryptech.is>
Message-ID: <20200120211806.0B8E08BAF18@bikeshed.cryptech.is>

This is an automated email from the git hooks/post-receive script.

meisterpaul1 at yandex.ru pushed a commit to branch master
in repository user/shatov/modexpng.

commit ab061afd20523bdb0342613f4eb343daee6571c6
Author: Pavel V. Shatov (Meister)
AuthorDate: Thu Jan 16 21:47:01 2020 +0300

    This commit modifies the REGULAR_ADD_UNEVEN micro-operation to use DSP slices
    for addition instead of fabric logic. This opcode is only necessary when in CRT
    mode and is executed once per entire exponentiation to recombine the two
    "easier" exponentiations. This was the final change necessary to get rid of
    using fabric math in the general worker module.
---
 rtl/modexpng_general_worker.v | 310 ++++++++++++------------------------------
 1 file changed, 88 insertions(+), 222 deletions(-)

diff --git a/rtl/modexpng_general_worker.v b/rtl/modexpng_general_worker.v
index 6652f14..6618b5f 100644
--- a/rtl/modexpng_general_worker.v
+++ b/rtl/modexpng_general_worker.v
@@ -30,7 +30,7 @@
 //
 //======================================================================

-module modexpng_general_worker
+module modexpng_general_worker_new
 (
     clk, rst_n,
     ena, rdy,
@@ -387,7 +387,8 @@ module modexpng_general_worker
             case (opcode)
                 //
                 UOP_OPCODE_PROPAGATE_CARRIES,
-                UOP_OPCODE_MODULAR_SUBTRACT_X:
+                UOP_OPCODE_MODULAR_SUBTRACT_X,
+                UOP_OPCODE_REGULAR_ADD_UNEVEN:
                     //
                     case (wrk_fsm_state)
                         WRK_FSM_STATE_BUSY2,
@@ -395,8 +396,7 @@ module modexpng_general_worker
                         WRK_FSM_STATE_LATENCY_POST4: enable_narrow_wr_en;
                     endcase
                 //
-                UOP_OPCODE_MERGE_LH,
-                UOP_OPCODE_REGULAR_ADD_UNEVEN:
+                UOP_OPCODE_MERGE_LH:
                     //
                     case (wrk_fsm_state)
                         WRK_FSM_STATE_BUSY1,
@@ -754,7 +754,8 @@ module modexpng_general_worker
             case (opcode)
                 //
                 UOP_OPCODE_PROPAGATE_CARRIES,
-                UOP_OPCODE_MODULAR_SUBTRACT_X:
+                UOP_OPCODE_MODULAR_SUBTRACT_X,
+                UOP_OPCODE_REGULAR_ADD_UNEVEN:
                     //
                     case (wrk_fsm_state)
                         WRK_FSM_STATE_BUSY2,
@@ -762,8 +763,7 @@ module modexpng_general_worker
                         WRK_FSM_STATE_LATENCY_POST4: update_wr_narrow_bank_addr(sel_narrow_out, sel_narrow_out, rd_narrow_addr_x_dly[4], rd_narrow_addr_y_dly[4]);
                     endcase
                 //
-                UOP_OPCODE_MERGE_LH,
-                UOP_OPCODE_REGULAR_ADD_UNEVEN:
+                UOP_OPCODE_MERGE_LH:
                     //
                     case (wrk_fsm_state)
                         WRK_FSM_STATE_BUSY1,
@@ -802,6 +802,22 @@ module modexpng_general_worker
             endcase
             //
         end
+
+
+    //
+    // UOP_OPCODE_REGULAR_ADD_UNEVEN
+    //
+    reg regular_add_uneven_flag;
+
+    always @(posedge clk)
+        //
+        case (opcode)
+            UOP_OPCODE_REGULAR_ADD_UNEVEN:
+                case (wrk_fsm_state)
+                    WRK_FSM_STATE_LATENCY_PRE4: regular_add_uneven_flag <= 1'b0;
+                    WRK_FSM_STATE_BUSY2: if (rd_wide_addr_is_last_half_dly[2]) regular_add_uneven_flag <= 1'b1;
+                endcase
+        endcase


     //
@@ -853,7 +869,8 @@ module modexpng_general_worker
                 //
                 UOP_OPCODE_PROPAGATE_CARRIES,
                 UOP_OPCODE_MODULAR_SUBTRACT_X,
-                UOP_OPCODE_MODULAR_SUBTRACT_Y: {dsp_ce_x, dsp_ce_y} <= {rd_narrow_ena_x_dly2, rd_narrow_ena_y_dly2};
+                UOP_OPCODE_MODULAR_SUBTRACT_Y,
+                UOP_OPCODE_REGULAR_ADD_UNEVEN: {dsp_ce_x, dsp_ce_y} <= {rd_narrow_ena_x_dly2, rd_narrow_ena_y_dly2};
                 default: {dsp_ce_x, dsp_ce_y} <= {1'b0, 1'b0};
                 //
             endcase
@@ -929,6 +946,30 @@ module modexpng_general_worker
                     //
                 end
                 //
+                UOP_OPCODE_REGULAR_ADD_UNEVEN: begin
+                    //
+                    if (rd_narrow_ena_x_dly2) begin
+                        if (rd_narrow_addr_x_dly[1] == OP_ADDR_ZERO) dsp_op_mode_x <= DSP48E1_OPMODE_Z0_YC_XAB;
+                        else begin
+                            if (!regular_add_uneven_flag) dsp_op_mode_x <= DSP48E1_OPMODE_ZP17_YC_XAB;
+                            else dsp_op_mode_x <= DSP48E1_OPMODE_ZP17_YC_X0;
+                        end
+                        dsp_alu_mode_x <= DSP48E1_ALUMODE_Z_PLUS_X_AND_Y_AND_CIN;
+                        dsp_carry_in_sel_x <= DSP48E1_CARRYINSEL_CARRYIN;
+                    end
+                    //
+                    if (rd_narrow_ena_y_dly2) begin
+                        if (rd_narrow_addr_y_dly[1] == OP_ADDR_ZERO) dsp_op_mode_y <= DSP48E1_OPMODE_Z0_YC_XAB;
+                        else begin
+                            if (!regular_add_uneven_flag) dsp_op_mode_y <= DSP48E1_OPMODE_ZP17_YC_XAB;
+                            else dsp_op_mode_y <= DSP48E1_OPMODE_ZP17_YC_X0;
+                        end
+                        dsp_alu_mode_y <= DSP48E1_ALUMODE_Z_PLUS_X_AND_Y_AND_CIN;
+                        dsp_carry_in_sel_y <= DSP48E1_CARRYINSEL_CARRYIN;
+                    end
+                    //
+                end
+                //
             endcase
             //
         end
@@ -988,14 +1029,32 @@ module modexpng_general_worker
                         dsp_x_x_y <= {{(DSP48E1_C_W-WORD_W){1'b1}}, wrk_rd_narrow_x_din_x[WORD_W-1:0]};
                         dsp_x_x_x <= {{(DSP48E1_C_W-WORD_W){1'b0}}, wrk_rd_wide_x_din_x[WORD_W-1:0]};
                         dsp_y_x_y <= {{(DSP48E1_C_W-WORD_W){1'b1}}, wrk_rd_narrow_x_din_x[WORD_W-1:0]};
-                        dsp_y_x_x <= {{(DSP48E1_C_W-WORD_W){1'b0}}, wrk_rd_wide_y_din_x[WORD_W-1:0]};
+                        dsp_y_x_x <= {{(DSP48E1_C_W-WORD_W){1'b0}}, wrk_rd_wide_x_din_x[WORD_W-1:0]};
                     end
                     //
                     if (rd_narrow_ena_y_dly2) begin
                         dsp_x_y_y <= {{(DSP48E1_C_W-WORD_W){1'b1}}, wrk_rd_narrow_x_din_y[WORD_W-1:0]};
                         dsp_x_y_x <= {{(DSP48E1_C_W-WORD_W){1'b0}}, wrk_rd_wide_x_din_y[WORD_W-1:0]};
                         dsp_y_y_y <= {{(DSP48E1_C_W-WORD_W){1'b1}}, wrk_rd_narrow_x_din_y[WORD_W-1:0]};
-                        dsp_y_y_x <= {{(DSP48E1_C_W-WORD_W){1'b0}}, wrk_rd_wide_y_din_y[WORD_W-1:0]};
+                        dsp_y_y_x <= {{(DSP48E1_C_W-WORD_W){1'b0}}, wrk_rd_wide_x_din_y[WORD_W-1:0]};
+                    end
+                    //
+                end
+                //
+                UOP_OPCODE_REGULAR_ADD_UNEVEN: begin
+                    //
+                    if (rd_narrow_ena_x_dly2) begin
+                        dsp_x_x_y <= {{(DSP48E1_C_W-(WORD_EXT_W+1)){1'b0}}, wrk_rd_narrow_x_din_x[WORD_EXT_W-1:WORD_W], 1'b1, wrk_rd_narrow_x_din_x[WORD_W-1:0]};
+                        dsp_x_x_x <= {{(DSP48E1_C_W-(WORD_EXT_W+1)){1'b0}}, wrk_rd_wide_x_din_x [WORD_EXT_W-1:WORD_W], 1'b0, wrk_rd_wide_x_din_x [WORD_W-1:0]};
+                        dsp_y_x_y <= {{(DSP48E1_C_W-(WORD_EXT_W+1)){1'b0}}, wrk_rd_narrow_y_din_x[WORD_EXT_W-1:WORD_W], 1'b1, wrk_rd_narrow_y_din_x[WORD_W-1:0]};
+                        dsp_y_x_x <= {{(DSP48E1_C_W-(WORD_EXT_W+1)){1'b0}}, wrk_rd_wide_y_din_x [WORD_EXT_W-1:WORD_W], 1'b0, wrk_rd_wide_y_din_x [WORD_W-1:0]};
+                    end
+                    //
+                    if (rd_narrow_ena_y_dly2) begin
+                        dsp_x_y_y <= {{(DSP48E1_C_W-(WORD_EXT_W+1)){1'b0}}, wrk_rd_narrow_x_din_y[WORD_EXT_W-1:WORD_W], 1'b1, wrk_rd_narrow_x_din_y[WORD_W-1:0]};
+                        dsp_x_y_x <= {{(DSP48E1_C_W-(WORD_EXT_W+1)){1'b0}}, wrk_rd_wide_x_din_y [WORD_EXT_W-1:WORD_W], 1'b0, wrk_rd_wide_x_din_y [WORD_W-1:0]};
+                        dsp_y_y_y <= {{(DSP48E1_C_W-(WORD_EXT_W+1)){1'b0}}, wrk_rd_narrow_y_din_y[WORD_EXT_W-1:WORD_W], 1'b1, wrk_rd_narrow_y_din_y[WORD_W-1:0]};
+                        dsp_y_y_x <= {{(DSP48E1_C_W-(WORD_EXT_W+1)){1'b0}}, wrk_rd_wide_y_din_y [WORD_EXT_W-1:WORD_W], 1'b0, wrk_rd_wide_y_din_y [WORD_W-1:0]};
                     end
                     //
                 end
@@ -1022,7 +1081,7 @@ module modexpng_general_worker
         .carry_in_sel (dsp_carry_in_sel_x),
         .casc_p_in (),
         .casc_p_out (),
-        .carryout (dsp_carry_out_x)
+        .carry_out (dsp_carry_out_x)
     );

     `MODEXPNG_DSP_SLICE_ADDSUB dst_inst_y_x
@@ -1039,7 +1098,7 @@ module modexpng_general_worker
         .carry_in_sel (dsp_carry_in_sel_x),
         .casc_p_in (),
         .casc_p_out (),
-        .carryout ()
+        .carry_out ()
     );

     `MODEXPNG_DSP_SLICE_ADDSUB dst_inst_x_y
@@ -1056,7 +1115,7 @@ module modexpng_general_worker
         .carry_in_sel (dsp_carry_in_sel_y),
         .casc_p_in (),
         .casc_p_out (),
-        .carryout (dsp_carry_out_y)
+        .carry_out (dsp_carry_out_y)
     );

     `MODEXPNG_DSP_SLICE_ADDSUB dst_inst_y_y
@@ -1073,140 +1132,43 @@ module modexpng_general_worker
         .carry_in_sel (dsp_carry_in_sel_y),
         .casc_p_in (),
         .casc_p_out (),
-        .carryout ()
+        .carry_out ()
     );

     //
     // UOP_OPCODE_MODULAR_SUBTRACT_X
     //
-    reg modular_subtract_x_brw_flag;
-    reg modular_subtract_y_brw_flag;
     //
     // IMPORTANT: DSP48E1 turns out to have a very non-obvious feature: when doing _subtraction_,
     // the CARRYOUT[3] is _NOT_ equivalent to the borrow flag! See "CARRYOUT/CARRYCASCOUT"
     // section of Appendix A on pp. 55-56 of UG479 for more details.
     //
-    always @(posedge clk)
-        //
-        case (opcode)
-            UOP_OPCODE_MODULAR_SUBTRACT_X:
-                case (wrk_fsm_state)
-                    WRK_FSM_STATE_LATENCY_POST4:
-                        //{modular_subtract_x_brw_flag, modular_subtract_y_brw_flag} <= {1'bX, 1'bZ};
-                        {modular_subtract_x_brw_flag, modular_subtract_y_brw_flag} <= {~dsp_carry_out_x, ~dsp_carry_out_y};
-                endcase
-        endcase
-
-
-    //reg modular_subtract_x_brw_r;
-    //reg modular_subtract_y_brw_r;
-
-    //reg modular_subtract_x_cry_r;
-    //reg modular_subtract_y_cry_r;
-
-    //wire [WORD_W:0] modular_subtract_x_w_brw = rd_narrow_x_din_x_dly1[WORD_W:0] - rd_narrow_y_din_x_dly1[WORD_W:0] - {{WORD_W{1'b0}}, modular_subtract_x_brw_r};
-    //wire [WORD_W:0] modular_subtract_y_w_brw = rd_narrow_x_din_y_dly1[WORD_W:0] - rd_narrow_y_din_y_dly1[WORD_W:0] - {{WORD_W{1'b0}}, modular_subtract_y_brw_r};
-
-    //wire [WORD_W:0] modular_subtract_x_w_cry = rd_narrow_x_din_x_dly1[WORD_W:0] + rd_wide_x_din_x_dly1[WORD_W:0] + {{WORD_W{1'b0}}, modular_subtract_x_cry_r};
-    //wire [WORD_W:0] modular_subtract_y_w_cry = rd_narrow_x_din_y_dly1[WORD_W:0] + rd_wide_x_din_y_dly1[WORD_W:0] + {{WORD_W{1'b0}}, modular_subtract_y_cry_r};
-
-    //reg [WORD_W:0] modular_subtract_x_w_brw_r;
-    //reg [WORD_W:0] modular_subtract_y_w_brw_r;
-
-    //reg [WORD_W:0] modular_subtract_x_w_cry_r;
-    //reg [WORD_W:0] modular_subtract_y_w_cry_r;
-
-    //wire modular_subtract_x_w_brw_msb = modular_subtract_x_w_brw_r[WORD_W];
-    //wire modular_subtract_y_w_brw_msb = modular_subtract_y_w_brw_r[WORD_W];
-
-    //wire modular_subtract_x_w_cry_msb = modular_subtract_x_w_cry_r[WORD_W];
-    //wire modular_subtract_y_w_cry_msb = modular_subtract_y_w_cry_r[WORD_W];
-
-    //wire [WORD_W -1:0] modular_subtract_x_w_brw_lsb = modular_subtract_x_w_brw_r[WORD_W -1:0];
-    //wire [WORD_W -1:0] modular_subtract_y_w_brw_lsb = modular_subtract_y_w_brw_r[WORD_W -1:0];
-
-    //wire [WORD_W -1:0] modular_subtract_x_w_cry_lsb = modular_subtract_x_w_cry_r[WORD_W -1:0];
-    //wire [WORD_W -1:0] modular_subtract_y_w_cry_lsb = modular_subtract_y_w_cry_r[WORD_W -1:0];

-    //wire [WORD_EXT_W -1:0] modular_subtract_x_w_brw_reduced = {{CARRY_W{1'b0}}, modular_subtract_x_w_brw_lsb};
-    //wire [WORD_EXT_W -1:0] modular_subtract_y_w_brw_reduced = {{CARRY_W{1'b0}}, modular_subtract_y_w_brw_lsb};
+    reg modular_subtract_x_brw_flag;
+    reg modular_subtract_y_brw_flag;

-    //wire [WORD_EXT_W -1:0] modular_subtract_x_w_cry_reduced = {{CARRY_W{1'b0}}, modular_subtract_x_w_cry_lsb};
-    //wire [WORD_EXT_W -1:0] modular_subtract_y_w_cry_reduced = {{CARRY_W{1'b0}}, modular_subtract_y_w_cry_lsb};
-
     reg [WORD_EXT_W -1:0] modular_subtract_x_mux;
     reg [WORD_EXT_W -1:0] modular_subtract_y_mux;

     wire [WORD_EXT_W -1:0] modular_subtract_x_mux_reduced = {{CARRY_W{1'b0}}, modular_subtract_x_mux[WORD_W-1:0]};
     wire [WORD_EXT_W -1:0] modular_subtract_y_mux_reduced = {{CARRY_W{1'b0}}, modular_subtract_y_mux[WORD_W-1:0]};
-
-    //task _modular_subtract_update_brw;
-        //input x_brw, y_brw;
-        //{modular_subtract_x_brw_r, modular_subtract_y_brw_r} <= {x_brw, y_brw};
-    //endtask
-
-    //task _modular_subtract_update_cry;
-        //input x_cry, y_cry;
-        //{modular_subtract_x_cry_r, modular_subtract_y_cry_r} <= {x_cry, y_cry};
-    //endtask
-
-    //task modular_subtract_clear_brw; _modular_subtract_update_brw( 1'b0, 1'b0); endtask
-    //task modular_subtract_store_brw; _modular_subtract_update_brw(modular_subtract_x_w_brw_msb, modular_subtract_y_w_brw_msb); endtask
-
-    //task modular_subtract_clear_cry; _modular_subtract_update_cry( 1'b0, 1'b0); endtask
-    //task modular_subtract_store_cry; _modular_subtract_update_cry(modular_subtract_x_w_cry_msb, modular_subtract_y_w_cry_msb); endtask
-
-    //task _modular_subtract_update_diff_w_brw;
-        //input [WORD_W:0] x_diff_w_brw, y_diff_w_brw;
-        //{modular_subtract_x_w_brw_r, modular_subtract_y_w_brw_r} <= {x_diff_w_brw, y_diff_w_brw};
-    //endtask
-
-    //task _modular_subtract_update_sum_w_cry;
-        //input [WORD_W:0] x_sum_w_cry, y_sum_w_cry;
-        //{modular_subtract_x_w_cry_r, modular_subtract_y_w_cry_r} <= {x_sum_w_cry, y_sum_w_cry};
-    //endtask
-
-    //task modular_subtract_store_diff_w_brw; _modular_subtract_update_diff_w_brw(modular_subtract_x_w_brw, modular_subtract_y_w_brw); endtask

-    //task modular_subtract_store_sum_w_cry; _modular_subtract_update_sum_w_cry(modular_subtract_x_w_cry, modular_subtract_y_w_cry); endtask
-
     always @(posedge clk)
         //
         case (opcode)
-            //
-            //UOP_OPCODE_MODULAR_SUBTRACT_X:
-                //
-                //case (wrk_fsm_state)
-                    //
-                    //WRK_FSM_STATE_LATENCY_PRE3: modular_subtract_clear_brw;
-                    //WRK_FSM_STATE_BUSY1,
-                    //WRK_FSM_STATE_LATENCY_POST1,
-                    //WRK_FSM_STATE_LATENCY_POST3: modular_subtract_store_brw; // we need the very last borrow here too!
-                    //
-                    //WRK_FSM_STATE_LATENCY_PRE4,
-                    //WRK_FSM_STATE_BUSY2,
-                    //WRK_FSM_STATE_LATENCY_POST2: modular_subtract_store_diff_w_brw;
-                    //
-                //endcase
-                //
-            //UOP_OPCODE_MODULAR_SUBTRACT_Y:
-                //
-                //case (wrk_fsm_state)
-                    //
-                    //WRK_FSM_STATE_LATENCY_PRE3: modular_subtract_clear_cry;
-                    //WRK_FSM_STATE_BUSY1,
-                    //WRK_FSM_STATE_LATENCY_POST1: modular_subtract_store_cry;
-                    //
-                    //WRK_FSM_STATE_LATENCY_PRE4,
-                    //WRK_FSM_STATE_BUSY2,
-                    //WRK_FSM_STATE_LATENCY_POST2: modular_subtract_store_sum_w_cry;
-                    //
-                //endcase
-                //
+            UOP_OPCODE_MODULAR_SUBTRACT_X:
+                case (wrk_fsm_state)
+                    WRK_FSM_STATE_LATENCY_POST4:
+                        {modular_subtract_x_brw_flag, modular_subtract_y_brw_flag} <= {~dsp_carry_out_x, ~dsp_carry_out_y};
+                endcase
+        endcase
+
+    always @(posedge clk)
+        //
+        case (opcode)
             UOP_OPCODE_MODULAR_SUBTRACT_Z:
-                //
                 case (wrk_fsm_state)
                     //
                     WRK_FSM_STATE_LATENCY_PRE4,
@@ -1215,96 +1177,10 @@ module modexpng_general_worker
                         //
                         begin modular_subtract_x_mux <= !modular_subtract_x_brw_flag ? rd_narrow_x_din_x_dly1 : rd_wide_x_din_x_dly1;
                               modular_subtract_y_mux <= !modular_subtract_y_brw_flag ? rd_narrow_x_din_y_dly1 : rd_wide_x_din_y_dly1; end
-                        //
-                endcase
-                //
+                endcase
         endcase


-    //
-    // UOP_OPCODE_REGULAR_ADD_UNEVEN
-    //
-    reg [CARRY_W -1:0] regular_add_uneven_x_x_cry_r;
-    reg [CARRY_W -1:0] regular_add_uneven_y_x_cry_r;
-    reg [CARRY_W -1:0] regular_add_uneven_x_y_cry_r;
-    reg [CARRY_W -1:0] regular_add_uneven_y_y_cry_r;
-
-    wire [WORD_EXT_W -1:0] regular_add_uneven_x_x_msb_w_cry = rd_narrow_x_din_x_dly1 + {{WORD_W{1'b0}}, regular_add_uneven_x_x_cry_r};
-    wire [WORD_EXT_W -1:0] regular_add_uneven_y_x_msb_w_cry = rd_narrow_y_din_x_dly1 + {{WORD_W{1'b0}}, regular_add_uneven_y_x_cry_r};
-    wire [WORD_EXT_W -1:0] regular_add_uneven_x_y_msb_w_cry = rd_narrow_x_din_y_dly1 + {{WORD_W{1'b0}}, regular_add_uneven_x_y_cry_r};
-    wire [WORD_EXT_W -1:0] regular_add_uneven_y_y_msb_w_cry = rd_narrow_y_din_y_dly1 + {{WORD_W{1'b0}}, regular_add_uneven_y_y_cry_r};
-
-    wire [WORD_EXT_W -1:0] regular_add_uneven_x_x_lsb_w_cry = regular_add_uneven_x_x_msb_w_cry + rd_wide_x_din_x_dly1;
-    wire [WORD_EXT_W -1:0] regular_add_uneven_y_x_lsb_w_cry = regular_add_uneven_y_x_msb_w_cry + rd_wide_y_din_x_dly1;
-    wire [WORD_EXT_W -1:0] regular_add_uneven_x_y_lsb_w_cry = regular_add_uneven_x_y_msb_w_cry + rd_wide_x_din_y_dly1;
-    wire [WORD_EXT_W -1:0] regular_add_uneven_y_y_lsb_w_cry = regular_add_uneven_y_y_msb_w_cry + rd_wide_y_din_y_dly1;
-
-    reg [WORD_EXT_W -1:0] regular_add_uneven_x_x_w_cry_r;
-    reg [WORD_EXT_W -1:0] regular_add_uneven_y_x_w_cry_r;
-    reg [WORD_EXT_W -1:0] regular_add_uneven_x_y_w_cry_r;
-    reg [WORD_EXT_W -1:0] regular_add_uneven_y_y_w_cry_r;
-
-    wire [CARRY_W -1:0] regular_add_uneven_x_x_w_cry_msb = regular_add_uneven_x_x_w_cry_r[WORD_EXT_W -1:WORD_W];
-    wire [CARRY_W -1:0] regular_add_uneven_y_x_w_cry_msb = regular_add_uneven_y_x_w_cry_r[WORD_EXT_W -1:WORD_W];
-    wire [CARRY_W -1:0] regular_add_uneven_x_y_w_cry_msb = regular_add_uneven_x_y_w_cry_r[WORD_EXT_W -1:WORD_W];
-    wire [CARRY_W -1:0] regular_add_uneven_y_y_w_cry_msb = regular_add_uneven_y_y_w_cry_r[WORD_EXT_W -1:WORD_W];
-
-    wire [WORD_W -1:0] regular_add_uneven_x_x_w_cry_lsb = regular_add_uneven_x_x_w_cry_r[WORD_W -1:0];
-    wire [WORD_W -1:0] regular_add_uneven_y_x_w_cry_lsb = regular_add_uneven_y_x_w_cry_r[WORD_W -1:0];
-    wire [WORD_W -1:0] regular_add_uneven_x_y_w_cry_lsb = regular_add_uneven_x_y_w_cry_r[WORD_W -1:0];
-    wire [WORD_W -1:0] regular_add_uneven_y_y_w_cry_lsb = regular_add_uneven_y_y_w_cry_r[WORD_W -1:0];
-
-    wire [WORD_EXT_W -1:0] regular_add_uneven_x_x_w_cry_reduced = {{CARRY_W{1'b0}}, regular_add_uneven_x_x_w_cry_lsb};
-    wire [WORD_EXT_W -1:0] regular_add_uneven_y_x_w_cry_reduced = {{CARRY_W{1'b0}}, regular_add_uneven_y_x_w_cry_lsb};
-    wire [WORD_EXT_W -1:0] regular_add_uneven_x_y_w_cry_reduced = {{CARRY_W{1'b0}}, regular_add_uneven_x_y_w_cry_lsb};
-    wire [WORD_EXT_W -1:0] regular_add_uneven_y_y_w_cry_reduced = {{CARRY_W{1'b0}}, regular_add_uneven_y_y_w_cry_lsb};
-
-    reg regular_add_uneven_store_lsb_now;
-
-    task _regular_add_uneven_update_cry;
-        input [CARRY_W-1:0] x_x_cry, y_x_cry, x_y_cry, y_y_cry;
-        { regular_add_uneven_x_x_cry_r, regular_add_uneven_y_x_cry_r, regular_add_uneven_x_y_cry_r, regular_add_uneven_y_y_cry_r} <=
-        { x_x_cry, y_x_cry, x_y_cry, y_y_cry};
-    endtask
-
-    task regular_add_uneven_clear_cry; _regular_add_uneven_update_cry( CARRY_ZERO, CARRY_ZERO, CARRY_ZERO, CARRY_ZERO); endtask
-    task regular_add_uneven_store_cry; _regular_add_uneven_update_cry(regular_add_uneven_x_x_w_cry_msb, regular_add_uneven_y_x_w_cry_msb, regular_add_uneven_x_y_w_cry_msb, regular_add_uneven_y_y_w_cry_msb); endtask
-
-    task _regular_add_uneven_update_sum_w_cry;
-        input [WORD_EXT_W-1:0] x_x_sum_w_cry, y_x_sum_w_cry, x_y_sum_w_cry, y_y_sum_w_cry;
-        { regular_add_uneven_x_x_w_cry_r, regular_add_uneven_y_x_w_cry_r, regular_add_uneven_x_y_w_cry_r, regular_add_uneven_y_y_w_cry_r} <=
-        { x_x_sum_w_cry, y_x_sum_w_cry, x_y_sum_w_cry, y_y_sum_w_cry};
-    endtask
-
-    task regular_add_uneven_store_sum_lsb_w_cry; _regular_add_uneven_update_sum_w_cry(regular_add_uneven_x_x_lsb_w_cry, regular_add_uneven_y_x_lsb_w_cry, regular_add_uneven_x_y_lsb_w_cry, regular_add_uneven_y_y_lsb_w_cry); endtask
-
-    task regular_add_uneven_store_sum_msb_w_cry; _regular_add_uneven_update_sum_w_cry(regular_add_uneven_x_x_msb_w_cry, regular_add_uneven_y_x_msb_w_cry, regular_add_uneven_x_y_msb_w_cry, regular_add_uneven_y_y_msb_w_cry); endtask
-
-    always @(posedge clk)
-        //
-        case (wrk_fsm_state)
-            //
-            WRK_FSM_STATE_LATENCY_PRE3: regular_add_uneven_store_lsb_now <= 1'b1;
-            WRK_FSM_STATE_BUSY1: if (rd_wide_addr_is_last_half_dly[3]) regular_add_uneven_store_lsb_now <= 1'b0;
-            //
-        endcase
-
-    always @(posedge clk)
-        //
-        case (wrk_fsm_state)
-            //
-            WRK_FSM_STATE_LATENCY_PRE3: regular_add_uneven_clear_cry;
-            WRK_FSM_STATE_BUSY1,
-            WRK_FSM_STATE_LATENCY_POST1: regular_add_uneven_store_cry;
-            //
-            WRK_FSM_STATE_LATENCY_PRE4: regular_add_uneven_store_sum_lsb_w_cry;
-            WRK_FSM_STATE_BUSY2: if (regular_add_uneven_store_lsb_now) regular_add_uneven_store_sum_lsb_w_cry;
-                                 else regular_add_uneven_store_sum_msb_w_cry;
-            WRK_FSM_STATE_LATENCY_POST2: regular_add_uneven_store_sum_msb_w_cry;
-            //
-        endcase
-
-
     //
     // FSM Process
     //
@@ -1424,7 +1300,8 @@ module modexpng_general_worker
             case (opcode)
                 //
                 UOP_OPCODE_PROPAGATE_CARRIES,
-                UOP_OPCODE_MODULAR_SUBTRACT_X:
+                UOP_OPCODE_MODULAR_SUBTRACT_X,
+                UOP_OPCODE_REGULAR_ADD_UNEVEN:
                     //
                     case (wrk_fsm_state)
                         //
@@ -1524,17 +1401,6 @@ module modexpng_general_worker
                             //
                     endcase
                     //
-                UOP_OPCODE_REGULAR_ADD_UNEVEN:
-                    //
-                    case (wrk_fsm_state)
-                        //
-                        WRK_FSM_STATE_BUSY1,
-                        WRK_FSM_STATE_LATENCY_POST1,
-                        WRK_FSM_STATE_LATENCY_POST3:
-                            //
-                            update_narrow_dout(regular_add_uneven_x_x_w_cry_reduced, regular_add_uneven_y_x_w_cry_reduced, regular_add_uneven_x_y_w_cry_reduced, regular_add_uneven_y_y_w_cry_reduced);
-                            //
-                    endcase
             endcase
             //
         end

From git at cryptech.is Mon Jan 20 21:18:06 2020
From: git at cryptech.is (git at cryptech.is)
Date: Mon, 20 Jan 2020 21:18:06 +0000
Subject:
[Cryptech-Commits] [user/shatov/modexpng] 05/21: * DSP slices now have two use modes: MULT and ADD/SUB * cosmetic rename of Verilog include file
In-Reply-To: <157955508198.72689.15683234965619703441@bikeshed.cryptech.is>
References: <157955508198.72689.15683234965619703441@bikeshed.cryptech.is>
Message-ID: <20200120211807.0B7A48BAF1B@bikeshed.cryptech.is>

This is an automated email from the git hooks/post-receive script.

meisterpaul1 at yandex.ru pushed a commit to branch master
in repository user/shatov/modexpng.

commit b985d4516f1e739be0ea0dabb66da0bbeb5c8f86
Author: Pavel V. Shatov (Meister)
AuthorDate: Mon Jan 20 23:44:15 2020 +0300

    * DSP slices now have two use modes: MULT and ADD/SUB
    * cosmetic rename of Verilog include file
---
 ...xpng_dsp_slice_primitive.vh => modexpng_dsp_slice_primitives.vh} | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/rtl/modexpng_dsp_slice_primitive.vh b/rtl/modexpng_dsp_slice_primitives.vh
similarity index 85%
rename from rtl/modexpng_dsp_slice_primitive.vh
rename to rtl/modexpng_dsp_slice_primitives.vh
index 20a0b8f..be20e9e 100644
--- a/rtl/modexpng_dsp_slice_primitive.vh
+++ b/rtl/modexpng_dsp_slice_primitives.vh
@@ -32,10 +32,12 @@
 
 `ifndef MODEXPNG_ENABLE_DEBUG
 
-`define MODEXPNG_DSP_SLICE modexpng_dsp_slice_wrapper_xilinx
+`define MODEXPNG_DSP_SLICE_MULT modexpng_dsp_slice_mult_wrapper_xilinx
+`define MODEXPNG_DSP_SLICE_ADDSUB modexpng_dsp_slice_addsub_wrapper_xilinx
 
 `else
 
-`define MODEXPNG_DSP_SLICE modexpng_dsp_slice_wrapper_generic
+`define MODEXPNG_DSP_SLICE_MULT modexpng_dsp_slice_mult_wrapper_generic
+`define MODEXPNG_DSP_SLICE_ADDSUB modexpng_dsp_slice_addsub_wrapper_generic
 
 `endif

From git at cryptech.is Mon Jan 20 21:18:07 2020
From: git at cryptech.is (git at cryptech.is)
Date: Mon, 20 Jan 2020 21:18:07 +0000
Subject: [Cryptech-Commits] [user/shatov/modexpng] 06/21: Removed old DSP wrappers.
In-Reply-To: <157955508198.72689.15683234965619703441@bikeshed.cryptech.is>
References: <157955508198.72689.15683234965619703441@bikeshed.cryptech.is>
Message-ID: <20200120211807.E74498BAF1E@bikeshed.cryptech.is>

This is an automated email from the git hooks/post-receive script.

meisterpaul1 at yandex.ru pushed a commit to branch master
in repository user/shatov/modexpng.

commit 147dcd379655d15e9804f3c5155e939ad25ffcfb
Author: Pavel V. Shatov (Meister)
AuthorDate: Mon Jan 20 23:45:48 2020 +0300

    Removed old DSP wrappers.
---
 rtl/modexpng_dsp_slice_wrapper_generic.v | 124 -----------------------
 rtl/modexpng_dsp_slice_wrapper_xilinx.v  | 165 -------------------------------
 2 files changed, 289 deletions(-)

diff --git a/rtl/modexpng_dsp_slice_wrapper_generic.v b/rtl/modexpng_dsp_slice_wrapper_generic.v
deleted file mode 100644
index 5c198bb..0000000
--- a/rtl/modexpng_dsp_slice_wrapper_generic.v
+++ /dev/null
@@ -1,124 +0,0 @@
-//======================================================================
-//
-// Copyright (c) 2019, NORDUnet A/S All rights reserved.
-//
-// Redistribution and use in source and binary forms, with or without
-// modification, are permitted provided that the following conditions
-// are met:
-// - Redistributions of source code must retain the above copyright
-// notice, this list of conditions and the following disclaimer.
-//
-// - Redistributions in binary form must reproduce the above copyright
-// notice, this list of conditions and the following disclaimer in the
-// documentation and/or other materials provided with the distribution.
-//
-// - Neither the name of the NORDUnet nor the names of its contributors may
-// be used to endorse or promote products derived from this software
-// without specific prior written permission.
-// -// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS -// IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED -// TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A -// PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT -// HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED -// TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR -// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF -// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING -// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS -// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. -// -//====================================================================== - -module modexpng_dsp_slice_wrapper_generic # -( - AB_INPUT = "DIRECT", - B_REG = 2 -) -( - clk, - ce_a1, ce_b1, ce_a2, ce_b2, - ce_m, ce_p, ce_mode, - a, b, p, - inmode, opmode, alumode, - casc_a_in, casc_b_in, - casc_a_out, casc_b_out -); - - `include "modexpng_parameters.vh" - `include "modexpng_dsp48e1.vh" - - input clk; // - input ce_a1; // - input ce_b1; // - input ce_a2; // - input ce_b2; // - input ce_m; // - input ce_p; // - input ce_mode; // - input [ WORD_EXT_W -1:0] a; // - input [ WORD_W -1:0] b; // - output [ MAC_W -1:0] p; // - input [ DSP48E1_INMODE_W -1:0] inmode; // - input [ DSP48E1_OPMODE_W -1:0] opmode; // - input [DSP48E1_ALUMODE_W -1:0] alumode; // - input [ DSP48E1_A_W -1:0] casc_a_in; // - input [ DSP48E1_B_W -1:0] casc_b_in; // - output [ DSP48E1_A_W -1:0] casc_a_out; // - output [ DSP48E1_B_W -1:0] casc_b_out; // - - // - // A Port - // - wire [WORD_EXT_W -1:0] a_mux = AB_INPUT == "DIRECT" ? 
a : casc_a_in[WORD_EXT_W-1:0]; - reg [WORD_EXT_W -1:0] a_reg1; - reg [WORD_EXT_W -1:0] a_reg2; - - assign casc_a_out = a_reg1; - - always @(posedge clk) begin - if (ce_a1) a_reg1 <= a_mux; - if (ce_a2) a_reg2 <= a_reg1; - end - - // - // B Port - // - wire [WORD_W -1:0] b_mux = AB_INPUT == "DIRECT" ? b : casc_b_in[WORD_W-1:0]; - reg [WORD_W -1:0] b_reg1; - reg [WORD_W -1:0] b_reg2; - - assign casc_b_out = b_reg1; - - always @(posedge clk) begin - if (ce_b1) b_reg1 <= b_mux; - if (ce_b2) b_reg2 <= B_REG == 2 ? b_reg1 : b_mux; - end - - // - // OPMODE Port - // - reg [DSP48E1_OPMODE_W -1:0] opmode_reg; - - always @(posedge clk) begin - if (ce_mode) opmode_reg <= opmode; - end - - // - // M, P - // - reg [MAC_W-1:0] m_reg; - reg [MAC_W-1:0] p_reg; - - wire [MAC_W-1:0] a_pad = {{MAC_W-WORD_EXT_W{1'b0}}, a_reg2}; - wire [MAC_W-1:0] b_pad = {{MAC_W-WORD_W{1'b0}}, b_reg2}; - wire [MAC_W-1:0] p_pad = opmode_reg[5] ? p_reg : {MAC_W{1'b0}}; - - assign p = p_reg; - - always @(posedge clk) begin - if (ce_m) m_reg <= a_pad * b_pad; - if (ce_p) p_reg <= m_reg + p_pad; - end - -endmodule diff --git a/rtl/modexpng_dsp_slice_wrapper_xilinx.v b/rtl/modexpng_dsp_slice_wrapper_xilinx.v deleted file mode 100644 index 90407ab..0000000 --- a/rtl/modexpng_dsp_slice_wrapper_xilinx.v +++ /dev/null @@ -1,165 +0,0 @@ -//====================================================================== -// -// Copyright (c) 2019, NORDUnet A/S All rights reserved. -// -// Redistribution and use in source and binary forms, with or without -// modification, are permitted provided that the following conditions -// are met: -// - Redistributions of source code must retain the above copyright -// notice, this list of conditions and the following disclaimer. -// -// - Redistributions in binary form must reproduce the above copyright -// notice, this list of conditions and the following disclaimer in the -// documentation and/or other materials provided with the distribution. 
-// -// - Neither the name of the NORDUnet nor the names of its contributors may -// be used to endorse or promote products derived from this software -// without specific prior written permission. -// -// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS -// IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED -// TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A -// PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT -// HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED -// TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR -// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF -// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING -// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS -// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
-// -//====================================================================== - -module modexpng_dsp_slice_wrapper_xilinx # -( - AB_INPUT = "DIRECT", - B_REG = 2 -) -( - clk, - ce_a1, ce_b1, ce_a2, ce_b2, - ce_m, ce_p, ce_mode, - a, b, p, - inmode, opmode, alumode, - casc_a_in, casc_b_in, - casc_a_out, casc_b_out -); - - `include "modexpng_parameters.vh" - `include "modexpng_dsp48e1.vh" - - input clk; - input ce_a1; - input ce_b1; - input ce_a2; - input ce_b2; - input ce_m; - input ce_p; - input ce_mode; - input [ WORD_EXT_W -1:0] a; - input [ WORD_W -1:0] b; - output [ MAC_W -1:0] p; - input [ DSP48E1_INMODE_W -1:0] inmode; - input [ DSP48E1_OPMODE_W -1:0] opmode; - input [DSP48E1_ALUMODE_W -1:0] alumode; - input [ DSP48E1_A_W -1:0] casc_a_in; - input [ DSP48E1_B_W -1:0] casc_b_in; - output [ DSP48E1_A_W -1:0] casc_a_out; - output [ DSP48E1_B_W -1:0] casc_b_out; - - wire [DSP48E1_P_W - MAC_W -1:0] p_dummy; - - DSP48E1 # - ( - .AREG (2), - .BREG (B_REG), - .CREG (0), - .DREG (0), - .ADREG (0), - .MREG (1), - .PREG (1), - .ACASCREG (1), - .BCASCREG (1), - .INMODEREG (0), - .OPMODEREG (1), - .ALUMODEREG (0), - .CARRYINREG (0), - .CARRYINSELREG (0), - - .A_INPUT (AB_INPUT), - .B_INPUT (AB_INPUT), - - .USE_DPORT ("FALSE"), - .USE_MULT ("DYNAMIC"), - .USE_SIMD ("ONE48"), - - .MASK ({DSP48E1_P_W{1'b1}}), - .PATTERN ({DSP48E1_P_W{1'b0}}), - .SEL_MASK ("MASK"), - .SEL_PATTERN ("PATTERN"), - - .USE_PATTERN_DETECT ("NO_PATDET"), - .AUTORESET_PATDET ("NO_RESET") - ) - DSP48E1_inst - ( - .CLK (clk), - - .CEA1 (ce_a1), - .CEB1 (ce_b1), - .CEA2 (ce_a2), - .CEB2 (ce_b2), - .CEAD (1'b0), - .CEC (1'b0), - .CED (1'b0), - .CEM (ce_m), - .CEP (ce_p), - .CEINMODE (1'b0), - .CECTRL (ce_mode), - .CEALUMODE (1'b0), - .CECARRYIN (1'b0), - - .A ({{(DSP48E1_A_W-WORD_EXT_W){1'b0}}, a}), - .B ({{(DSP48E1_B_W-WORD_W){1'b0}}, b}), - .C ({DSP48E1_C_W{1'b0}}), - .D ({DSP48E1_D_W{1'b0}}), - .P ({p_dummy, p}), - - .INMODE (inmode), - .OPMODE (opmode), - .ALUMODE (alumode), - - .ACIN (casc_a_in), - 
.BCIN (casc_b_in), - .ACOUT (casc_a_out), - .BCOUT (casc_b_out), - .PCIN ({DSP48E1_P_W{1'b0}}), - .PCOUT (), - .CARRYCASCIN (1'b0), - .CARRYCASCOUT (), - - .RSTA (1'b0), - .RSTB (1'b0), - .RSTC (1'b0), - .RSTD (1'b0), - .RSTM (1'b0), - .RSTP (1'b0), - .RSTINMODE (1'b0), - .RSTCTRL (1'b0), - .RSTALUMODE (1'b0), - .RSTALLCARRYIN (1'b0), - - .UNDERFLOW (), - .OVERFLOW (), - .PATTERNDETECT (), - .PATTERNBDETECT (), - - .CARRYIN (1'b0), - .CARRYOUT (), - .CARRYINSEL (3'b000), - - .MULTSIGNIN (1'b0), - .MULTSIGNOUT () - ); - -endmodule From git at cryptech.is Mon Jan 20 21:18:08 2020 From: git at cryptech.is (git at cryptech.is) Date: Mon, 20 Jan 2020 21:18:08 +0000 Subject: [Cryptech-Commits] [user/shatov/modexpng] 07/21: Added two pairs of new wrappers. In-Reply-To: <157955508198.72689.15683234965619703441@bikeshed.cryptech.is> References: <157955508198.72689.15683234965619703441@bikeshed.cryptech.is> Message-ID: <20200120211809.879D88BC2CF@bikeshed.cryptech.is> This is an automated email from the git hooks/post-receive script. meisterpaul1 at yandex.ru pushed a commit to branch master in repository user/shatov/modexpng. commit a1314f3f0650e2806d099c7943b63436b431ea05 Author: Pavel V. Shatov (Meister) AuthorDate: Mon Jan 20 23:47:19 2020 +0300 Added two pairs of new wrappers. --- rtl/modexpng_dsp_slice_addsub_wrapper_generic.v | 224 ++++++++++++++++++++++++ rtl/modexpng_dsp_slice_addsub_wrapper_xilinx.v | 168 ++++++++++++++++++ rtl/modexpng_dsp_slice_mult_wrapper_generic.v | 124 +++++++++++++ rtl/modexpng_dsp_slice_mult_wrapper_xilinx.v | 165 +++++++++++++++++ 4 files changed, 681 insertions(+) diff --git a/rtl/modexpng_dsp_slice_addsub_wrapper_generic.v b/rtl/modexpng_dsp_slice_addsub_wrapper_generic.v new file mode 100644 index 0000000..0ef45fe --- /dev/null +++ b/rtl/modexpng_dsp_slice_addsub_wrapper_generic.v @@ -0,0 +1,224 @@ +//====================================================================== +// +// Copyright (c) 2019, NORDUnet A/S All rights reserved. 
+// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions +// are met: +// - Redistributions of source code must retain the above copyright +// notice, this list of conditions and the following disclaimer. +// +// - Redistributions in binary form must reproduce the above copyright +// notice, this list of conditions and the following disclaimer in the +// documentation and/or other materials provided with the distribution. +// +// - Neither the name of the NORDUnet nor the names of its contributors may +// be used to endorse or promote products derived from this software +// without specific prior written permission. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS +// IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED +// TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A +// PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +// HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED +// TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR +// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF +// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING +// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS +// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+// +//====================================================================== + +module modexpng_dsp_slice_addsub_wrapper_generic +( + clk, + ce_abc, + ce_p, + ce_ctrl, + x, y, p, + op_mode, + alu_mode, + carry_in_sel, + casc_p_in, + casc_p_out, + carry_out +); + + `include "modexpng_dsp48e1.vh" + + input clk; + input ce_abc; + input ce_p; + input ce_ctrl; + input [ DSP48E1_C_W -1:0] x; + input [ DSP48E1_C_W -1:0] y; + output [ DSP48E1_P_W -1:0] p; + input [ DSP48E1_OPMODE_W -1:0] op_mode; + input [ DSP48E1_ALUMODE_W -1:0] alu_mode; + input [DSP48E1_CARRYINSEL_W -1:0] carry_in_sel; + input [ DSP48E1_P_W -1:0] casc_p_in; + output [ DSP48E1_P_W -1:0] casc_p_out; + output carry_out; + + + // + // Internal Registers + // + reg [ DSP48E1_C_W -1:0] reg_x; + reg [ DSP48E1_C_W -1:0] reg_y; + + reg [ DSP48E1_OPMODE_W -1:0] reg_op_mode; + reg [ DSP48E1_ALUMODE_W -1:0] reg_alu_mode; + reg [DSP48E1_CARRYINSEL_W -1:0] reg_carry_in_sel; + + always @(posedge clk) + // + if (ce_abc) begin + reg_x <= x; + reg_y <= y; + end + + always @(posedge clk) + // + if (ce_ctrl) begin + reg_op_mode <= op_mode; + reg_alu_mode <= alu_mode; + reg_carry_in_sel <= carry_in_sel; + end + + + // + // Generic Math Slice Model + // + function [DSP48E1_C_W-1:0] calc_mux_x; + input [ 1:0] opmode_x; + input [DSP48E1_C_W-1:0] ab_value; + case (opmode_x) + DSP48E1_OPMODE_X_0: calc_mux_x = {DSP48E1_C_W{1'b0}}; + DSP48E1_OPMODE_X_AB: calc_mux_x = ab_value; + default: begin + $display("ERROR: Bad X opmode (%b)!", opmode_x); + $finish; + end + endcase + endfunction + + function [DSP48E1_C_W-1:0] calc_mux_y; + input [ 1:0] opmode_y; + input [DSP48E1_C_W-1:0] c_value; + case (opmode_y) + DSP48E1_OPMODE_Y_0: calc_mux_y = {DSP48E1_C_W{1'b0}}; + DSP48E1_OPMODE_Y_C: calc_mux_y = c_value; + default: begin + $display("ERROR: Bad Y opmode (%b)!", opmode_y); + $finish; + end + endcase + endfunction + + function [DSP48E1_C_W-1:0] calc_mux_z; + input [ 2:0] opmode_z; + input [DSP48E1_P_W-1:0] p_value; + input 
[DSP48E1_C_W-1:0] c_value; + input [DSP48E1_P_W-1:0] pcin_value; + case (opmode_z) + DSP48E1_OPMODE_Z_0: calc_mux_z = {DSP48E1_C_W{1'b0}}; // 000 + DSP48E1_OPMODE_Z_C: calc_mux_z = c_value; // 011 + DSP48E1_OPMODE_Z_PCIN17: calc_mux_z = {{17{1'b0}}, pcin_value[DSP48E1_C_W-1:17]}; // 101 + DSP48E1_OPMODE_Z_P17: calc_mux_z = {{17{1'b0}}, p_value[DSP48E1_C_W-1:17]}; // 110 + default: begin + $display("ERROR: Bad Z opmode (%b)!", opmode_z); + $finish; + end + endcase + endfunction + + function [ DSP48E1_C_W -1:0] calc_mux_cin; + input [DSP48E1_CARRYINSEL_W -1:0] carryinsel_value; + input [ DSP48E1_ALUMODE_W -1:0] alumode_value; + input carry_value; + case (carryinsel_value) + DSP48E1_CARRYINSEL_CARRYIN: calc_mux_cin = 1'b0; // 000 + DSP48E1_CARRYINSEL_CARRYCASCOUT: // 100 + case (alumode_value) + DSP48E1_ALUMODE_Z_PLUS_X_AND_Y_AND_CIN: calc_mux_cin = carry_value; + DSP48E1_ALUMODE_Z_MINUS_X_AND_Y_AND_CIN: calc_mux_cin = ~carry_value; + default: begin + $display("ERROR: Invalid ALUMODE (%b)!", alumode_value); + $finish; + end + endcase + default: begin + $display("ERROR: Bad CARRYINSEL (%b)!", carryinsel_value); + $finish; + end + endcase + endfunction + + function [ DSP48E1_P_W :0] calc_p; + input [ DSP48E1_ALUMODE_W -1:0] alumode_value; + input [ DSP48E1_OPMODE_W -1:0] opmode_value; + input [DSP48E1_CARRYINSEL_W -1:0] carryinsel_value; + input [ DSP48E1_P_W -1:0] p_value; + input [ DSP48E1_C_W -1:0] ab_value; + input [ DSP48E1_C_W -1:0] c_value; + input carry_value; + input [ DSP48E1_P_W -1:0] pcin_value; + reg [ DSP48E1_C_W -1:0] mux_x; + reg [ DSP48E1_C_W -1:0] mux_y; + reg [ DSP48E1_C_W -1:0] mux_z; + reg mux_cin; + reg [ DSP48E1_P_W :0] int_p; + begin + mux_x = calc_mux_x(opmode_value[1:0], ab_value); + mux_y = calc_mux_y(opmode_value[3:2], c_value); + mux_z = calc_mux_z(opmode_value[6:4], p_value, c_value, pcin_value); + mux_cin = calc_mux_cin(carryinsel_value, alumode_value, carry_value); + case (alumode_value) + // + DSP48E1_ALUMODE_Z_PLUS_X_AND_Y_AND_CIN: 
begin + int_p = {1'b0, mux_z} + {1'b0, mux_x} + {1'b0, mux_y} + {{DSP48E1_P_W{1'b0}}, mux_cin}; + calc_p = {int_p[DSP48E1_P_W], int_p[DSP48E1_P_W-1:0]}; + end + // + DSP48E1_ALUMODE_Z_MINUS_X_AND_Y_AND_CIN: begin + int_p = {1'b0, mux_z} - {1'b0, mux_x} - {1'b0, mux_y} - {{DSP48E1_P_W{1'b0}}, mux_cin}; + calc_p = {~int_p[DSP48E1_P_W], int_p[DSP48E1_P_W-1:0]}; + end + // + default: begin + $display("ERROR: Invalid ALUMODE (%b)!", alumode_value); + $finish; + end + endcase + end + endfunction + + + // + // Output Registers + // + reg [DSP48E1_P_W -1:0] reg_p; + reg reg_carry_out; + + always @(posedge clk) + // + if (ce_p) {reg_carry_out, reg_p} <= + calc_p(reg_alu_mode, reg_op_mode, reg_carry_in_sel, reg_p, reg_x, reg_y, reg_carry_out, casc_p_in); + + + // + // Output Buses + // + assign p = reg_p; + assign carry_out = reg_carry_out; + + + // + // Cascade Bus + // + assign casc_p_out = reg_p; + + +endmodule diff --git a/rtl/modexpng_dsp_slice_addsub_wrapper_xilinx.v b/rtl/modexpng_dsp_slice_addsub_wrapper_xilinx.v new file mode 100644 index 0000000..fee8216 --- /dev/null +++ b/rtl/modexpng_dsp_slice_addsub_wrapper_xilinx.v @@ -0,0 +1,168 @@ +//====================================================================== +// +// Copyright (c) 2019, NORDUnet A/S All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions +// are met: +// - Redistributions of source code must retain the above copyright +// notice, this list of conditions and the following disclaimer. +// +// - Redistributions in binary form must reproduce the above copyright +// notice, this list of conditions and the following disclaimer in the +// documentation and/or other materials provided with the distribution. 
+// +// - Neither the name of the NORDUnet nor the names of its contributors may +// be used to endorse or promote products derived from this software +// without specific prior written permission. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS +// IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED +// TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A +// PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +// HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED +// TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR +// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF +// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING +// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS +// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+// +//====================================================================== + +module modexpng_dsp_slice_addsub_wrapper_xilinx +( + clk, + ce_abc, + ce_p, + ce_ctrl, + x, y, p, + op_mode, + alu_mode, + carry_in_sel, + casc_p_in, + casc_p_out, + carry_out +); + + `include "modexpng_dsp48e1.vh" + + input clk; + input ce_abc; + input ce_p; + input ce_ctrl; + input [ DSP48E1_C_W -1:0] x; + input [ DSP48E1_C_W -1:0] y; + output [ DSP48E1_P_W -1:0] p; + input [ DSP48E1_OPMODE_W -1:0] op_mode; + input [ DSP48E1_ALUMODE_W -1:0] alu_mode; + input [DSP48E1_CARRYINSEL_W -1:0] carry_in_sel; + input [ DSP48E1_P_W -1:0] casc_p_in; + output [ DSP48E1_P_W -1:0] casc_p_out; + output carry_out; + + wire [ DSP48E1_A_W -1:0] a_int; + wire [ DSP48E1_B_W -1:0] b_int; + wire [ DSP48E1_C_W -1:0] c_int; + wire [ DSP48E1_P_W -1:0] p_int; + wire [DSP48E1_CARRYOUT_W -1:0] carry_out_int; + + assign {a_int, b_int} = {x}; + assign {c_int} = {y}; + assign {p} = {p_int}; + assign {carry_out} = {carry_out_int[DSP48E1_CARRYOUT_W-1]}; + + DSP48E1 # + ( + .AREG (1), + .BREG (1), + .CREG (1), + .DREG (0), + .ADREG (0), + .MREG (0), + .PREG (1), + .ACASCREG (1), + .BCASCREG (1), + .INMODEREG (0), + .OPMODEREG (1), + .ALUMODEREG (1), + .CARRYINREG (0), + .CARRYINSELREG (1), + + .A_INPUT ("DIRECT"), + .B_INPUT ("DIRECT"), + + .USE_DPORT ("FALSE"), + .USE_MULT ("NONE"), + .USE_SIMD ("ONE48"), + + .MASK ({DSP48E1_P_W{1'b1}}), + .PATTERN ({DSP48E1_P_W{1'b0}}), + .SEL_MASK ("MASK"), + .SEL_PATTERN ("PATTERN"), + + .USE_PATTERN_DETECT ("NO_PATDET"), + .AUTORESET_PATDET ("NO_RESET") + ) + DSP48E1_inst + ( + .CLK (clk), + + .CEA1 (1'b0), + .CEB1 (1'b0), + .CEA2 (ce_abc), + .CEB2 (ce_abc), + .CEAD (1'b0), + .CEC (ce_abc), + .CED (1'b0), + .CEM (1'b0), + .CEP (ce_p), + .CEINMODE (1'b0), + .CECTRL (ce_ctrl), + .CEALUMODE (ce_ctrl), + .CECARRYIN (1'b0), + + .A (a_int), + .B (b_int), + .C (c_int), + .D ({DSP48E1_D_W{1'b0}}), + .P (p_int), + + .INMODE ({DSP48E1_INMODE_W{1'b0}}), + .OPMODE (op_mode), + .ALUMODE 
(alu_mode), + + .ACIN (), + .BCIN (), + .ACOUT (), + .BCOUT (), + .PCIN (casc_p_in), + .PCOUT (casc_p_out), + .CARRYCASCIN (1'b0), + .CARRYCASCOUT (), + + .RSTA (1'b0), + .RSTB (1'b0), + .RSTC (1'b0), + .RSTD (1'b0), + .RSTM (1'b0), + .RSTP (1'b0), + .RSTINMODE (1'b0), + .RSTCTRL (1'b0), + .RSTALUMODE (1'b0), + .RSTALLCARRYIN (1'b0), + + .UNDERFLOW (), + .OVERFLOW (), + .PATTERNDETECT (), + .PATTERNBDETECT (), + + .CARRYIN (1'b0), + .CARRYOUT (carry_out_int), + .CARRYINSEL (carry_in_sel), + + .MULTSIGNIN (1'b0), + .MULTSIGNOUT () + ); + +endmodule diff --git a/rtl/modexpng_dsp_slice_mult_wrapper_generic.v b/rtl/modexpng_dsp_slice_mult_wrapper_generic.v new file mode 100644 index 0000000..524d0dd --- /dev/null +++ b/rtl/modexpng_dsp_slice_mult_wrapper_generic.v @@ -0,0 +1,124 @@ +//====================================================================== +// +// Copyright (c) 2019, NORDUnet A/S All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions +// are met: +// - Redistributions of source code must retain the above copyright +// notice, this list of conditions and the following disclaimer. +// +// - Redistributions in binary form must reproduce the above copyright +// notice, this list of conditions and the following disclaimer in the +// documentation and/or other materials provided with the distribution. +// +// - Neither the name of the NORDUnet nor the names of its contributors may +// be used to endorse or promote products derived from this software +// without specific prior written permission. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS +// IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED +// TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A +// PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT +// HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED +// TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR +// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF +// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING +// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS +// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +// +//====================================================================== + +module modexpng_dsp_slice_mult_wrapper_generic # +( + AB_INPUT = "DIRECT", + B_REG = 2 +) +( + clk, + ce_a1, ce_b1, ce_a2, ce_b2, + ce_m, ce_p, ce_mode, + a, b, p, + inmode, opmode, alumode, + casc_a_in, casc_b_in, + casc_a_out, casc_b_out +); + + `include "modexpng_parameters.vh" + `include "modexpng_dsp48e1.vh" + + input clk; // + input ce_a1; // + input ce_b1; // + input ce_a2; // + input ce_b2; // + input ce_m; // + input ce_p; // + input ce_mode; // + input [ WORD_EXT_W -1:0] a; // + input [ WORD_W -1:0] b; // + output [ MAC_W -1:0] p; // + input [ DSP48E1_INMODE_W -1:0] inmode; // + input [ DSP48E1_OPMODE_W -1:0] opmode; // + input [DSP48E1_ALUMODE_W -1:0] alumode; // + input [ DSP48E1_A_W -1:0] casc_a_in; // + input [ DSP48E1_B_W -1:0] casc_b_in; // + output [ DSP48E1_A_W -1:0] casc_a_out; // + output [ DSP48E1_B_W -1:0] casc_b_out; // + + // + // A Port + // + wire [WORD_EXT_W -1:0] a_mux = AB_INPUT == "DIRECT" ? a : casc_a_in[WORD_EXT_W-1:0]; + reg [WORD_EXT_W -1:0] a_reg1; + reg [WORD_EXT_W -1:0] a_reg2; + + assign casc_a_out = a_reg1; + + always @(posedge clk) begin + if (ce_a1) a_reg1 <= a_mux; + if (ce_a2) a_reg2 <= a_reg1; + end + + // + // B Port + // + wire [WORD_W -1:0] b_mux = AB_INPUT == "DIRECT" ? 
b : casc_b_in[WORD_W-1:0]; + reg [WORD_W -1:0] b_reg1; + reg [WORD_W -1:0] b_reg2; + + assign casc_b_out = b_reg1; + + always @(posedge clk) begin + if (ce_b1) b_reg1 <= b_mux; + if (ce_b2) b_reg2 <= B_REG == 2 ? b_reg1 : b_mux; + end + + // + // OPMODE Port + // + reg [DSP48E1_OPMODE_W -1:0] opmode_reg; + + always @(posedge clk) begin + if (ce_mode) opmode_reg <= opmode; + end + + // + // M, P + // + reg [MAC_W-1:0] m_reg; + reg [MAC_W-1:0] p_reg; + + wire [MAC_W-1:0] a_pad = {{MAC_W-WORD_EXT_W{1'b0}}, a_reg2}; + wire [MAC_W-1:0] b_pad = {{MAC_W-WORD_W{1'b0}}, b_reg2}; + wire [MAC_W-1:0] p_pad = opmode_reg[5] ? p_reg : {MAC_W{1'b0}}; + + assign p = p_reg; + + always @(posedge clk) begin + if (ce_m) m_reg <= a_pad * b_pad; + if (ce_p) p_reg <= m_reg + p_pad; + end + +endmodule diff --git a/rtl/modexpng_dsp_slice_mult_wrapper_xilinx.v b/rtl/modexpng_dsp_slice_mult_wrapper_xilinx.v new file mode 100644 index 0000000..cd9baf8 --- /dev/null +++ b/rtl/modexpng_dsp_slice_mult_wrapper_xilinx.v @@ -0,0 +1,165 @@ +//====================================================================== +// +// Copyright (c) 2019, NORDUnet A/S All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions +// are met: +// - Redistributions of source code must retain the above copyright +// notice, this list of conditions and the following disclaimer. +// +// - Redistributions in binary form must reproduce the above copyright +// notice, this list of conditions and the following disclaimer in the +// documentation and/or other materials provided with the distribution. +// +// - Neither the name of the NORDUnet nor the names of its contributors may +// be used to endorse or promote products derived from this software +// without specific prior written permission. 
+// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS +// IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED +// TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A +// PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +// HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED +// TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR +// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF +// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING +// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS +// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +// +//====================================================================== + +module modexpng_dsp_slice_mult_wrapper_xilinx # +( + AB_INPUT = "DIRECT", + B_REG = 2 +) +( + clk, + ce_a1, ce_b1, ce_a2, ce_b2, + ce_m, ce_p, ce_mode, + a, b, p, + inmode, opmode, alumode, + casc_a_in, casc_b_in, + casc_a_out, casc_b_out +); + + `include "modexpng_parameters.vh" + `include "modexpng_dsp48e1.vh" + + input clk; + input ce_a1; + input ce_b1; + input ce_a2; + input ce_b2; + input ce_m; + input ce_p; + input ce_mode; + input [ WORD_EXT_W -1:0] a; + input [ WORD_W -1:0] b; + output [ MAC_W -1:0] p; + input [ DSP48E1_INMODE_W -1:0] inmode; + input [ DSP48E1_OPMODE_W -1:0] opmode; + input [DSP48E1_ALUMODE_W -1:0] alumode; + input [ DSP48E1_A_W -1:0] casc_a_in; + input [ DSP48E1_B_W -1:0] casc_b_in; + output [ DSP48E1_A_W -1:0] casc_a_out; + output [ DSP48E1_B_W -1:0] casc_b_out; + + wire [DSP48E1_P_W - MAC_W -1:0] p_dummy; + + DSP48E1 # + ( + .AREG (2), + .BREG (B_REG), + .CREG (0), + .DREG (0), + .ADREG (0), + .MREG (1), + .PREG (1), + .ACASCREG (1), + .BCASCREG (1), + .INMODEREG (0), + .OPMODEREG (1), + .ALUMODEREG (0), + .CARRYINREG (0), + .CARRYINSELREG (0), + + .A_INPUT (AB_INPUT), 
+ .B_INPUT (AB_INPUT), + + .USE_DPORT ("FALSE"), + .USE_MULT ("DYNAMIC"), + .USE_SIMD ("ONE48"), + + .MASK ({DSP48E1_P_W{1'b1}}), + .PATTERN ({DSP48E1_P_W{1'b0}}), + .SEL_MASK ("MASK"), + .SEL_PATTERN ("PATTERN"), + + .USE_PATTERN_DETECT ("NO_PATDET"), + .AUTORESET_PATDET ("NO_RESET") + ) + DSP48E1_inst + ( + .CLK (clk), + + .CEA1 (ce_a1), + .CEB1 (ce_b1), + .CEA2 (ce_a2), + .CEB2 (ce_b2), + .CEAD (1'b0), + .CEC (1'b0), + .CED (1'b0), + .CEM (ce_m), + .CEP (ce_p), + .CEINMODE (1'b0), + .CECTRL (ce_mode), + .CEALUMODE (1'b0), + .CECARRYIN (1'b0), + + .A ({{(DSP48E1_A_W-WORD_EXT_W){1'b0}}, a}), + .B ({{(DSP48E1_B_W-WORD_W){1'b0}}, b}), + .C ({DSP48E1_C_W{1'b0}}), + .D ({DSP48E1_D_W{1'b0}}), + .P ({p_dummy, p}), + + .INMODE (inmode), + .OPMODE (opmode), + .ALUMODE (alumode), + + .ACIN (casc_a_in), + .BCIN (casc_b_in), + .ACOUT (casc_a_out), + .BCOUT (casc_b_out), + .PCIN ({DSP48E1_P_W{1'b0}}), + .PCOUT (), + .CARRYCASCIN (1'b0), + .CARRYCASCOUT (), + + .RSTA (1'b0), + .RSTB (1'b0), + .RSTC (1'b0), + .RSTD (1'b0), + .RSTM (1'b0), + .RSTP (1'b0), + .RSTINMODE (1'b0), + .RSTCTRL (1'b0), + .RSTALUMODE (1'b0), + .RSTALLCARRYIN (1'b0), + + .UNDERFLOW (), + .OVERFLOW (), + .PATTERNDETECT (), + .PATTERNBDETECT (), + + .CARRYIN (1'b0), + .CARRYOUT (), + .CARRYINSEL (3'b000), + + .MULTSIGNIN (1'b0), + .MULTSIGNOUT () + ); + +endmodule From git at cryptech.is Mon Jan 20 21:18:09 2020 From: git at cryptech.is (git at cryptech.is) Date: Mon, 20 Jan 2020 21:18:09 +0000 Subject: [Cryptech-Commits] [user/shatov/modexpng] 08/21: Cosmetic fix that only involves debug output during simulation. In-Reply-To: <157955508198.72689.15683234965619703441@bikeshed.cryptech.is> References: <157955508198.72689.15683234965619703441@bikeshed.cryptech.is> Message-ID: <20200120211811.D96A98BC2E3@bikeshed.cryptech.is> This is an automated email from the git hooks/post-receive script. meisterpaul1 at yandex.ru pushed a commit to branch master in repository user/shatov/modexpng. 
commit b585d259590bb5f39c512a067c5a5d173350a72b
Author: Pavel V. Shatov (Meister)
AuthorDate: Mon Jan 20 23:48:34 2020 +0300

    Cosmetic fix that only involves debug output during simulation.
---
 rtl/modexpng_core_top_debug.vh | 144 ++++++++++++++++++++---------------------
 1 file changed, 72 insertions(+), 72 deletions(-)

diff --git a/rtl/modexpng_core_top_debug.vh b/rtl/modexpng_core_top_debug.vh
index 38bee0b..0b69340 100644
--- a/rtl/modexpng_core_top_debug.vh
+++ b/rtl/modexpng_core_top_debug.vh
@@ -40,86 +40,86 @@ always @(posedge clk)
 //
 // X.X
 //
- $write(" "); for (i=0; i<64; i=i+1) $write("[ %3d ] ", i); $write("\n");
- $write("X.X.WIDE.A: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_x.gen_wide[0].wide_x.mem[0*256+i]); $write("\n");
- $write("X.X.WIDE.B: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_x.gen_wide[0].wide_x.mem[1*256+i]); $write("\n");
- $write("X.X.WIDE.C: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_x.gen_wide[0].wide_x.mem[2*256+i]); $write("\n");
- $write("X.X.WIDE.D: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_x.gen_wide[0].wide_x.mem[3*256+i]); $write("\n");
- $write("X.X.WIDE.E: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_x.gen_wide[0].wide_x.mem[4*256+i]); $write("\n");
- $write("X.X.WIDE.N: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_x.gen_wide[0].wide_x.mem[5*256+i]); $write("\n");
- $write("X.X.WIDE.L: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_x.gen_wide[0].wide_x.mem[6*256+i]); $write("\n");
- $write("X.X.WIDE.H: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_x.gen_wide[0].wide_x.mem[7*256+i]); $write("\n");
- $write(" "); for (i=0; i<64; i=i+1) $write(" ------ "); $write("\n");
- $write("X.X.NARROW.A: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_x.narrow_x.mem[0*256+i]); $write("\n");
- $write("X.X.NARROW.B: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_x.narrow_x.mem[1*256+i]); $write("\n");
- $write("X.X.NARROW.C: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_x.narrow_x.mem[2*256+i]); $write("\n");
- $write("X.X.NARROW.D: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_x.narrow_x.mem[3*256+i]); $write("\n");
- $write("X.X.NARROW.E: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_x.narrow_x.mem[4*256+i]); $write("\n");
- $write("X.X.NARROW.COEFF: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_x.narrow_x.mem[5*256+i]); $write("\n");
- $write("X.X.NARROW.Q: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_x.narrow_x.mem[6*256+i]); $write("\n");
- $write("X.X.NARROW.EXT: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_x.narrow_x.mem[7*256+i]); $write("\n");
+ $write(" "); for (i=0; i<32; i=i+1) $write("[ %3d ] ", i); $write("\n");
+ $write("X.X.WIDE.A: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_x.gen_wide[0].wide_x.mem[0*256+i]); $write("\n");
+ $write("X.X.WIDE.B: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_x.gen_wide[0].wide_x.mem[1*256+i]); $write("\n");
+ $write("X.X.WIDE.C: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_x.gen_wide[0].wide_x.mem[2*256+i]); $write("\n");
+ $write("X.X.WIDE.D: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_x.gen_wide[0].wide_x.mem[3*256+i]); $write("\n");
+ $write("X.X.WIDE.E: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_x.gen_wide[0].wide_x.mem[4*256+i]); $write("\n");
+ $write("X.X.WIDE.N: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_x.gen_wide[0].wide_x.mem[5*256+i]); $write("\n");
+ $write("X.X.WIDE.L: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_x.gen_wide[0].wide_x.mem[6*256+i]); $write("\n");
+ $write("X.X.WIDE.H: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_x.gen_wide[0].wide_x.mem[7*256+i]); $write("\n");
+ $write(" "); for (i=0; i<32; i=i+1) $write("------- "); $write("\n");
+ $write("X.X.NARROW.A: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_x.narrow_x.mem[0*256+i]); $write("\n");
+ $write("X.X.NARROW.B: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_x.narrow_x.mem[1*256+i]); $write("\n");
+ $write("X.X.NARROW.C: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_x.narrow_x.mem[2*256+i]); $write("\n");
+ $write("X.X.NARROW.D: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_x.narrow_x.mem[3*256+i]); $write("\n");
+ $write("X.X.NARROW.E: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_x.narrow_x.mem[4*256+i]); $write("\n");
+ $write("X.X.NARROW.COEFF: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_x.narrow_x.mem[5*256+i]); $write("\n");
+ $write("X.X.NARROW.Q: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_x.narrow_x.mem[6*256+i]); $write("\n");
+ $write("X.X.NARROW.EXT: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_x.narrow_x.mem[7*256+i]); $write("\n");
 //
 // X.Y
 //
- $write(" "); for (i=0; i<64; i=i+1) $write("[ %3d ] ", i); $write("\n");
- $write("X.Y.WIDE.A: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_x.gen_wide[0].wide_y.mem[0*256+i]); $write("\n");
- $write("X.Y.WIDE.B: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_x.gen_wide[0].wide_y.mem[1*256+i]); $write("\n");
- $write("X.Y.WIDE.C: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_x.gen_wide[0].wide_y.mem[2*256+i]); $write("\n");
- $write("X.Y.WIDE.D: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_x.gen_wide[0].wide_y.mem[3*256+i]); $write("\n");
- $write("X.Y.WIDE.E: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_x.gen_wide[0].wide_y.mem[4*256+i]); $write("\n");
- $write("X.Y.WIDE.N: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_x.gen_wide[0].wide_y.mem[5*256+i]); $write("\n");
- $write("X.Y.WIDE.L: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_x.gen_wide[0].wide_y.mem[6*256+i]); $write("\n");
- $write("X.Y.WIDE.H: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_x.gen_wide[0].wide_y.mem[7*256+i]); $write("\n");
- $write(" "); for (i=0; i<64; i=i+1) $write(" ------ "); $write("\n");
- $write("X.Y.NARROW.A: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_x.narrow_y.mem[0*256+i]); $write("\n");
- $write("X.Y.NARROW.B: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_x.narrow_y.mem[1*256+i]); $write("\n");
- $write("X.Y.NARROW.C: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_x.narrow_y.mem[2*256+i]); $write("\n");
- $write("X.Y.NARROW.D: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_x.narrow_y.mem[3*256+i]); $write("\n");
- $write("X.Y.NARROW.E: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_x.narrow_y.mem[4*256+i]); $write("\n");
- $write("X.Y.NARROW.COEFF: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_x.narrow_y.mem[5*256+i]); $write("\n");
- $write("X.Y.NARROW.Q: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_x.narrow_y.mem[6*256+i]); $write("\n");
- $write("X.Y.NARROW.EXT: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_x.narrow_y.mem[7*256+i]); $write("\n");
+ $write(" "); for (i=0; i<32; i=i+1) $write("[ %3d ] ", i); $write("\n");
+ $write("X.Y.WIDE.A: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_x.gen_wide[0].wide_y.mem[0*256+i]); $write("\n");
+ $write("X.Y.WIDE.B: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_x.gen_wide[0].wide_y.mem[1*256+i]); $write("\n");
+ $write("X.Y.WIDE.C: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_x.gen_wide[0].wide_y.mem[2*256+i]); $write("\n");
+ $write("X.Y.WIDE.D: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_x.gen_wide[0].wide_y.mem[3*256+i]); $write("\n");
+ $write("X.Y.WIDE.E: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_x.gen_wide[0].wide_y.mem[4*256+i]); $write("\n");
+ $write("X.Y.WIDE.N: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_x.gen_wide[0].wide_y.mem[5*256+i]); $write("\n");
+ $write("X.Y.WIDE.L: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_x.gen_wide[0].wide_y.mem[6*256+i]); $write("\n");
+ $write("X.Y.WIDE.H: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_x.gen_wide[0].wide_y.mem[7*256+i]); $write("\n");
+ $write(" "); for (i=0; i<32; i=i+1) $write("------- "); $write("\n");
+ $write("X.Y.NARROW.A: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_x.narrow_y.mem[0*256+i]); $write("\n");
+ $write("X.Y.NARROW.B: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_x.narrow_y.mem[1*256+i]); $write("\n");
+ $write("X.Y.NARROW.C: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_x.narrow_y.mem[2*256+i]); $write("\n");
+ $write("X.Y.NARROW.D: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_x.narrow_y.mem[3*256+i]); $write("\n");
+ $write("X.Y.NARROW.E: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_x.narrow_y.mem[4*256+i]); $write("\n");
+ $write("X.Y.NARROW.COEFF: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_x.narrow_y.mem[5*256+i]); $write("\n");
+ $write("X.Y.NARROW.Q: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_x.narrow_y.mem[6*256+i]); $write("\n");
+ $write("X.Y.NARROW.EXT: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_x.narrow_y.mem[7*256+i]); $write("\n");
 //
 // Y.X
 //
- $write(" "); for (i=0; i<64; i=i+1) $write("[ %3d ] ", i); $write("\n");
- $write("Y.X.WIDE.A: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_y.gen_wide[0].wide_x.mem[0*256+i]); $write("\n");
- $write("Y.X.WIDE.B: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_y.gen_wide[0].wide_x.mem[1*256+i]); $write("\n");
- $write("Y.X.WIDE.C: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_y.gen_wide[0].wide_x.mem[2*256+i]); $write("\n");
- $write("Y.X.WIDE.D: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_y.gen_wide[0].wide_x.mem[3*256+i]); $write("\n");
- $write("Y.X.WIDE.E: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_y.gen_wide[0].wide_x.mem[4*256+i]); $write("\n");
- $write("Y.X.WIDE.N: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_y.gen_wide[0].wide_x.mem[5*256+i]); $write("\n");
- $write("Y.X.WIDE.L: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_y.gen_wide[0].wide_x.mem[6*256+i]); $write("\n");
- $write("Y.X.WIDE.H: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_y.gen_wide[0].wide_x.mem[7*256+i]); $write("\n");
- $write(" "); for (i=0; i<64; i=i+1) $write(" ------ "); $write("\n");
- $write("Y.X.NARROW.A: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_y.narrow_x.mem[0*256+i]); $write("\n");
- $write("Y.X.NARROW.B: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_y.narrow_x.mem[1*256+i]); $write("\n");
- $write("Y.X.NARROW.C: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_y.narrow_x.mem[2*256+i]); $write("\n");
- $write("Y.X.NARROW.D: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_y.narrow_x.mem[3*256+i]); $write("\n");
- $write("Y.X.NARROW.E: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_y.narrow_x.mem[4*256+i]); $write("\n");
- $write("Y.X.NARROW.COEFF: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_y.narrow_x.mem[5*256+i]); $write("\n");
- $write("Y.X.NARROW.Q: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_y.narrow_x.mem[6*256+i]); $write("\n");
- $write("Y.X.NARROW.EXT: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_y.narrow_x.mem[7*256+i]); $write("\n");
+ $write(" "); for (i=0; i<32; i=i+1) $write("[ %3d ] ", i); $write("\n");
+ $write("Y.X.WIDE.A: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_y.gen_wide[0].wide_x.mem[0*256+i]); $write("\n");
+ $write("Y.X.WIDE.B: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_y.gen_wide[0].wide_x.mem[1*256+i]); $write("\n");
+ $write("Y.X.WIDE.C: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_y.gen_wide[0].wide_x.mem[2*256+i]); $write("\n");
+ $write("Y.X.WIDE.D: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_y.gen_wide[0].wide_x.mem[3*256+i]); $write("\n");
+ $write("Y.X.WIDE.E: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_y.gen_wide[0].wide_x.mem[4*256+i]); $write("\n");
+ $write("Y.X.WIDE.N: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_y.gen_wide[0].wide_x.mem[5*256+i]); $write("\n");
+ $write("Y.X.WIDE.L: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_y.gen_wide[0].wide_x.mem[6*256+i]); $write("\n");
+ $write("Y.X.WIDE.H: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_y.gen_wide[0].wide_x.mem[7*256+i]); $write("\n");
+ $write(" "); for (i=0; i<32; i=i+1) $write("------- "); $write("\n");
+ $write("Y.X.NARROW.A: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_y.narrow_x.mem[0*256+i]); $write("\n");
+ $write("Y.X.NARROW.B: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_y.narrow_x.mem[1*256+i]); $write("\n");
+ $write("Y.X.NARROW.C: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_y.narrow_x.mem[2*256+i]); $write("\n");
+ $write("Y.X.NARROW.D: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_y.narrow_x.mem[3*256+i]); $write("\n");
+ $write("Y.X.NARROW.E: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_y.narrow_x.mem[4*256+i]); $write("\n");
+ $write("Y.X.NARROW.COEFF: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_y.narrow_x.mem[5*256+i]); $write("\n");
+ $write("Y.X.NARROW.Q: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_y.narrow_x.mem[6*256+i]); $write("\n");
+ $write("Y.X.NARROW.EXT: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_y.narrow_x.mem[7*256+i]); $write("\n");
 //
 // Y.Y
 //
- $write(" "); for (i=0; i<64; i=i+1) $write("[ %3d ] ", i); $write("\n");
- $write("Y.Y.WIDE.A: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_y.gen_wide[0].wide_y.mem[0*256+i]); $write("\n");
- $write("Y.Y.WIDE.B: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_y.gen_wide[0].wide_y.mem[1*256+i]); $write("\n");
- $write("Y.Y.WIDE.C: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_y.gen_wide[0].wide_y.mem[2*256+i]); $write("\n");
- $write("Y.Y.WIDE.D: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_y.gen_wide[0].wide_y.mem[3*256+i]); $write("\n");
- $write("Y.Y.WIDE.E: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_y.gen_wide[0].wide_y.mem[4*256+i]); $write("\n");
- $write("Y.Y.WIDE.N: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_y.gen_wide[0].wide_y.mem[5*256+i]); $write("\n");
- $write("Y.Y.WIDE.L: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_y.gen_wide[0].wide_y.mem[6*256+i]); $write("\n");
- $write("Y.Y.WIDE.H: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_y.gen_wide[0].wide_y.mem[7*256+i]); $write("\n");
- $write(" "); for (i=0; i<64; i=i+1) $write(" ------ "); $write("\n");
- $write("Y.Y.NARROW.A: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_y.narrow_y.mem[0*256+i]); $write("\n");
- $write("Y.Y.NARROW.B: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_y.narrow_y.mem[1*256+i]); $write("\n");
- $write("Y.Y.NARROW.C: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_y.narrow_y.mem[2*256+i]); $write("\n");
- $write("Y.Y.NARROW.D: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_y.narrow_y.mem[3*256+i]); $write("\n");
- $write("Y.Y.NARROW.E: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_y.narrow_y.mem[4*256+i]); $write("\n");
- $write("Y.Y.NARROW.COEFF: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_y.narrow_y.mem[5*256+i]); $write("\n");
- $write("Y.Y.NARROW.Q: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_y.narrow_y.mem[6*256+i]); $write("\n");
- $write("Y.Y.NARROW.EXT: "); for (i=0; i<64; i=i+1) $write("0x%05x ", storage_block_y.narrow_y.mem[7*256+i]); $write("\n");
+ $write(" "); for (i=0; i<32; i=i+1) $write("[ %3d ] ", i); $write("\n");
+ $write("Y.Y.WIDE.A: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_y.gen_wide[0].wide_y.mem[0*256+i]); $write("\n");
+ $write("Y.Y.WIDE.B: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_y.gen_wide[0].wide_y.mem[1*256+i]); $write("\n");
+ $write("Y.Y.WIDE.C: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_y.gen_wide[0].wide_y.mem[2*256+i]); $write("\n");
+ $write("Y.Y.WIDE.D: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_y.gen_wide[0].wide_y.mem[3*256+i]); $write("\n");
+ $write("Y.Y.WIDE.E: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_y.gen_wide[0].wide_y.mem[4*256+i]); $write("\n");
+ $write("Y.Y.WIDE.N: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_y.gen_wide[0].wide_y.mem[5*256+i]); $write("\n");
+ $write("Y.Y.WIDE.L: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_y.gen_wide[0].wide_y.mem[6*256+i]); $write("\n");
+ $write("Y.Y.WIDE.H: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_y.gen_wide[0].wide_y.mem[7*256+i]); $write("\n");
+ $write(" "); for (i=0; i<32; i=i+1) $write("------- "); $write("\n");
+ $write("Y.Y.NARROW.A: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_y.narrow_y.mem[0*256+i]); $write("\n");
+ $write("Y.Y.NARROW.B: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_y.narrow_y.mem[1*256+i]); $write("\n");
+ $write("Y.Y.NARROW.C: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_y.narrow_y.mem[2*256+i]); $write("\n");
+ $write("Y.Y.NARROW.D: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_y.narrow_y.mem[3*256+i]); $write("\n");
+ $write("Y.Y.NARROW.E: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_y.narrow_y.mem[4*256+i]); $write("\n");
+ $write("Y.Y.NARROW.COEFF: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_y.narrow_y.mem[5*256+i]); $write("\n");
+ $write("Y.Y.NARROW.Q: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_y.narrow_y.mem[6*256+i]); $write("\n");
+ $write("Y.Y.NARROW.EXT: "); for (i=0; i<32; i=i+1) $write("0x%05x ", storage_block_y.narrow_y.mem[7*256+i]); $write("\n");
 //
 end

From git at cryptech.is Mon Jan 20 21:18:10 2020
From: git at cryptech.is (git at cryptech.is)
Date: Mon, 20 Jan 2020 21:18:10 +0000
Subject: [Cryptech-Commits] [user/shatov/modexpng] 09/21: Updated microcode source to match the changes made to general worker module.
In-Reply-To: <157955508198.72689.15683234965619703441@bikeshed.cryptech.is>
References: <157955508198.72689.15683234965619703441@bikeshed.cryptech.is>
Message-ID: <20200120211813.0FBC18BC2D0@bikeshed.cryptech.is>

This is an automated email from the git hooks/post-receive script.

meisterpaul1 at yandex.ru pushed a commit to branch master
in repository user/shatov/modexpng.

commit b590661dcc2da8b3fea52a587369eebbf9fd59b0
Author: Pavel V. Shatov (Meister)
AuthorDate: Mon Jan 20 23:55:35 2020 +0300

    Updated microcode source to match the changes made to general worker module.
---
 rtl/modexpng_uop_rom.v | 281 +++++++++++++++++++++++++------------------------
 1 file changed, 141 insertions(+), 140 deletions(-)

diff --git a/rtl/modexpng_uop_rom.v b/rtl/modexpng_uop_rom.v
index 7e199fe..3fdec00 100644
--- a/rtl/modexpng_uop_rom.v
+++ b/rtl/modexpng_uop_rom.v
@@ -50,150 +50,151 @@ module modexpng_uop_rom
 //
 // CRT mode
 //
- 7'd000: data <= {UOP_OPCODE_INPUT_TO_WIDE, UOP_CRT_X, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_N, BANK_WIDE_N, BANK_DNC }; //
- 7'd001: data <= {UOP_OPCODE_INPUT_TO_WIDE, UOP_CRT_Y, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_N, BANK_WIDE_N, BANK_DNC }; //
- 7'd002: data <= {UOP_OPCODE_INPUT_TO_WIDE, UOP_CRT_X, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_X, BANK_WIDE_A, BANK_DNC }; //
- 7'd003: data <= {UOP_OPCODE_INPUT_TO_WIDE, UOP_CRT_Y, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_Y, BANK_WIDE_A, BANK_DNC }; //
- 7'd004: data <= {UOP_OPCODE_INPUT_TO_WIDE, UOP_CRT_X, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_M, BANK_WIDE_E, BANK_DNC }; //
- 7'd005: data <= {UOP_OPCODE_INPUT_TO_WIDE, UOP_CRT_Y, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_M, BANK_WIDE_E, BANK_DNC }; //
- //
- 7'd006: data <= {UOP_OPCODE_INPUT_TO_NARROW, UOP_CRT_X, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_N_COEFF, BANK_DNC, BANK_NARROW_COEFF}; //
- 7'd007: data <= {UOP_OPCODE_INPUT_TO_NARROW, UOP_CRT_Y, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_N_COEFF, BANK_DNC, BANK_NARROW_COEFF}; //
- 7'd008: data <= {UOP_OPCODE_INPUT_TO_NARROW, UOP_CRT_X, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_N_FACTOR, BANK_DNC, BANK_NARROW_A }; //
- 7'd009: data <= {UOP_OPCODE_INPUT_TO_NARROW, UOP_CRT_Y, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_N_FACTOR, BANK_DNC, BANK_NARROW_A }; //
- 7'd010: data <= {UOP_OPCODE_INPUT_TO_NARROW, UOP_CRT_X, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_M, BANK_DNC, BANK_NARROW_E }; //
- 7'd011: data <= {UOP_OPCODE_INPUT_TO_NARROW, UOP_CRT_Y, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_M, BANK_DNC, BANK_NARROW_E }; //
- //
- 7'd012: data <= {UOP_OPCODE_MODULAR_MULTIPLY, UOP_CRT_DNC, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_11, BANK_WIDE_A, BANK_NARROW_A, BANK_WIDE_B, BANK_NARROW_B }; //
- 7'd013: data <= {UOP_OPCODE_MODULAR_MULTIPLY, UOP_CRT_DNC, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_11, BANK_WIDE_B, BANK_NARROW_B, BANK_WIDE_C, BANK_NARROW_C }; //
- 7'd014: data <= {UOP_OPCODE_MODULAR_MULTIPLY, UOP_CRT_DNC, UOP_NPQ_N, UOP_AUX_2, UOP_LADDER_11, BANK_WIDE_C, BANK_DNC, BANK_WIDE_D, BANK_NARROW_D }; //
- //
- 7'd015: data <= {UOP_OPCODE_PROPAGATE_CARRIES, UOP_CRT_DNC, UOP_NPQ_N, UOP_AUX_DNC, UOP_LADDER_DNC, BANK_DNC, BANK_NARROW_D, BANK_DNC, BANK_NARROW_D }; //
- //
- 7'd016: data <= {UOP_OPCODE_OUTPUT_FROM_NARROW, UOP_CRT_X, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_NARROW_D, BANK_DNC, BANK_OUT_XM }; //
- 7'd017: data <= {UOP_OPCODE_OUTPUT_FROM_NARROW, UOP_CRT_Y, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_NARROW_D, BANK_DNC, BANK_OUT_YM }; //
- //
- 7'd018: data <= {UOP_OPCODE_MODULAR_MULTIPLY, UOP_CRT_DNC, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_11, BANK_WIDE_E, BANK_NARROW_B, BANK_WIDE_C, BANK_NARROW_C }; //
- //
- 7'd019: data <= {UOP_OPCODE_PROPAGATE_CARRIES, UOP_CRT_DNC, UOP_NPQ_N, UOP_AUX_DNC, UOP_LADDER_DNC, BANK_DNC, BANK_NARROW_C, BANK_DNC, BANK_NARROW_C }; //
- //
- 7'd020: data <= {UOP_OPCODE_COPY_CRT_Y2X, UOP_CRT_DNC, UOP_NPQ_N, UOP_AUX_DNC, UOP_LADDER_DNC, BANK_WIDE_C, BANK_NARROW_C, BANK_WIDE_C, BANK_NARROW_C }; //
- //
- 7'd021: data <= {UOP_OPCODE_INPUT_TO_WIDE, UOP_CRT_X, UOP_NPQ_PQ, UOP_AUX_2, UOP_LADDER_DNC, BANK_DNC, BANK_IN_2_P, BANK_WIDE_N, BANK_DNC }; //
- 7'd022: data <= {UOP_OPCODE_INPUT_TO_WIDE, UOP_CRT_Y, UOP_NPQ_PQ, UOP_AUX_2, UOP_LADDER_DNC, BANK_DNC, BANK_IN_2_Q, BANK_WIDE_N, BANK_DNC }; //
- 7'd023: data <= {UOP_OPCODE_INPUT_TO_WIDE, UOP_CRT_X, UOP_NPQ_PQ, UOP_AUX_2, UOP_LADDER_DNC, BANK_DNC, BANK_IN_2_P_FACTOR, BANK_WIDE_A, BANK_DNC }; //
- 7'd024: data <= {UOP_OPCODE_INPUT_TO_WIDE, UOP_CRT_Y, UOP_NPQ_PQ, UOP_AUX_2, UOP_LADDER_DNC, BANK_DNC, BANK_IN_2_Q_FACTOR, BANK_WIDE_A, BANK_DNC }; //
- 7'd025: data <= {UOP_OPCODE_INPUT_TO_WIDE, UOP_CRT_X, UOP_NPQ_PQ, UOP_AUX_2, UOP_LADDER_DNC, BANK_DNC, BANK_IN_2_QINV, BANK_WIDE_E, BANK_DNC }; //
- //
- 7'd026: data <= {UOP_OPCODE_INPUT_TO_NARROW, UOP_CRT_X, UOP_NPQ_PQ, UOP_AUX_2, UOP_LADDER_DNC, BANK_DNC, BANK_IN_2_P_COEFF, BANK_DNC, BANK_NARROW_COEFF}; //
- 7'd027: data <= {UOP_OPCODE_INPUT_TO_NARROW, UOP_CRT_Y, UOP_NPQ_PQ, UOP_AUX_2, UOP_LADDER_DNC, BANK_DNC, BANK_IN_2_Q_COEFF, BANK_DNC, BANK_NARROW_COEFF}; //
- 7'd028: data <= {UOP_OPCODE_INPUT_TO_NARROW, UOP_CRT_X, UOP_NPQ_PQ, UOP_AUX_2, UOP_LADDER_DNC, BANK_DNC, BANK_IN_2_P_FACTOR, BANK_DNC, BANK_NARROW_A }; //
- 7'd029: data <= {UOP_OPCODE_INPUT_TO_NARROW, UOP_CRT_Y, UOP_NPQ_PQ, UOP_AUX_2, UOP_LADDER_DNC, BANK_DNC, BANK_IN_2_Q_FACTOR, BANK_DNC, BANK_NARROW_A }; //
- 7'd030: data <= {UOP_OPCODE_INPUT_TO_NARROW, UOP_CRT_X, UOP_NPQ_PQ, UOP_AUX_2, UOP_LADDER_DNC, BANK_DNC, BANK_IN_2_QINV, BANK_DNC, BANK_NARROW_E }; //
- //
- 7'd031: data <= {UOP_OPCODE_MODULAR_REDUCE_INIT, UOP_CRT_DNC, UOP_NPQ_DNC, UOP_AUX_DNC, UOP_LADDER_DNC, BANK_DNC, BANK_NARROW_C, BANK_DNC, BANK_DNC }; //
- //
- 7'd032: data <= {UOP_OPCODE_MODULAR_REDUCE_PROC, UOP_CRT_DNC, UOP_NPQ_PQ, UOP_AUX_DNC, UOP_LADDER_DNC, BANK_DNC, BANK_DNC, BANK_WIDE_D, BANK_NARROW_D }; //
- //
- 7'd033: data <= {UOP_OPCODE_MODULAR_MULTIPLY, UOP_CRT_DNC, UOP_NPQ_PQ, UOP_AUX_1, UOP_LADDER_11, BANK_WIDE_D, BANK_NARROW_A, BANK_WIDE_C, BANK_NARROW_C }; //
- 7'd034: data <= {UOP_OPCODE_MODULAR_MULTIPLY, UOP_CRT_DNC, UOP_NPQ_PQ, UOP_AUX_1, UOP_LADDER_11, BANK_WIDE_C, BANK_NARROW_A, BANK_WIDE_D, BANK_NARROW_D }; //
- 7'd035: data <= {UOP_OPCODE_MODULAR_MULTIPLY, UOP_CRT_DNC, UOP_NPQ_PQ, UOP_AUX_2, UOP_LADDER_11, BANK_WIDE_A, BANK_DNC, BANK_WIDE_C, BANK_NARROW_C }; //
- //
- 7'd036: data <= {UOP_OPCODE_COPY_LADDERS_X2Y, UOP_CRT_DNC, UOP_NPQ_PQ, UOP_AUX_DNC, UOP_LADDER_DNC, BANK_WIDE_D, BANK_NARROW_D, BANK_WIDE_C, BANK_NARROW_C }; //
- //
- 7'd037: data <= {UOP_OPCODE_LADDER_INIT, UOP_CRT_DNC, UOP_NPQ_DNC, UOP_AUX_DNC, UOP_LADDER_DNC, BANK_DNC, BANK_DNC, BANK_DNC, BANK_DNC }; //
- 7'd038: data <= {UOP_OPCODE_MODULAR_MULTIPLY, UOP_CRT_DNC, UOP_NPQ_PQ, UOP_AUX_1, UOP_LADDER_PQ, BANK_WIDE_C, BANK_NARROW_C, BANK_WIDE_C, BANK_NARROW_C }; //
- 7'd039: data <= {UOP_OPCODE_LADDER_STEP, UOP_CRT_DNC, UOP_NPQ_DNC, UOP_AUX_DNC, UOP_LADDER_DNC, BANK_DNC, BANK_DNC, BANK_DNC, BANK_DNC }; //
- //
- 7'd040: data <= {UOP_OPCODE_MODULAR_MULTIPLY, UOP_CRT_DNC, UOP_NPQ_PQ, UOP_AUX_2, UOP_LADDER_11, BANK_WIDE_C, BANK_DNC, BANK_WIDE_D, BANK_NARROW_D }; //
- //
- 7'd041: data <= {UOP_OPCODE_PROPAGATE_CARRIES, UOP_CRT_DNC, UOP_NPQ_PQ, UOP_AUX_DNC, UOP_LADDER_DNC, BANK_DNC, BANK_NARROW_D, BANK_DNC, BANK_NARROW_D }; //
- //
- 7'd042: data <= {UOP_OPCODE_CROSS_LADDERS_X2Y, UOP_CRT_DNC, UOP_NPQ_PQ, UOP_AUX_DNC, UOP_LADDER_DNC, BANK_WIDE_D, BANK_NARROW_D, BANK_WIDE_D, BANK_NARROW_D }; //
- //
- 7'd043: data <= {UOP_OPCODE_MODULAR_SUBTRACT, UOP_CRT_DNC, UOP_NPQ_PQ, UOP_AUX_DNC, UOP_LADDER_DNC, BANK_DNC, BANK_NARROW_D, BANK_WIDE_C, BANK_NARROW_C }; //
- //
- 7'd044: data <= {UOP_OPCODE_MODULAR_MULTIPLY, UOP_CRT_DNC, UOP_NPQ_PQ, UOP_AUX_1, UOP_LADDER_11, BANK_WIDE_C, BANK_NARROW_E, BANK_WIDE_C, BANK_NARROW_C }; //
- 7'd045: data <= {UOP_OPCODE_MODULAR_MULTIPLY, UOP_CRT_DNC, UOP_NPQ_PQ, UOP_AUX_1, UOP_LADDER_11, BANK_WIDE_C, BANK_NARROW_A, BANK_WIDE_C, BANK_NARROW_C }; //
- //
- 7'd046: data <= {UOP_OPCODE_INPUT_TO_WIDE, UOP_CRT_X, UOP_NPQ_PQ, UOP_AUX_2, UOP_LADDER_DNC, BANK_DNC, BANK_IN_2_Q, BANK_WIDE_E, BANK_DNC }; //
- //
- 7'd047: data <= {UOP_OPCODE_INPUT_TO_NARROW, UOP_CRT_X, UOP_NPQ_PQ, UOP_AUX_2, UOP_LADDER_DNC, BANK_DNC, BANK_IN_2_Q, BANK_DNC, BANK_NARROW_E }; //
- //
- 7'd048: data <= {UOP_OPCODE_REGULAR_MULTIPLY, UOP_CRT_DNC, UOP_NPQ_PQ, UOP_AUX_1, UOP_LADDER_11, BANK_WIDE_E, BANK_NARROW_C, BANK_DNC, BANK_DNC }; //
- //
- 7'd049: data <= {UOP_OPCODE_MERGE_LH, UOP_CRT_DNC, UOP_NPQ_DNC, UOP_AUX_DNC, UOP_LADDER_DNC, BANK_DNC, BANK_DNC, BANK_DNC, BANK_NARROW_A }; //
- //
- 7'd050: data <= {UOP_OPCODE_PROPAGATE_CARRIES, UOP_CRT_DNC, UOP_NPQ_N, UOP_AUX_DNC, UOP_LADDER_DNC, BANK_DNC, BANK_NARROW_A, BANK_DNC, BANK_NARROW_A }; //
- //
- 7'd051: data <= {UOP_OPCODE_COPY_CRT_Y2X, UOP_CRT_DNC, UOP_NPQ_PQ, UOP_AUX_DNC, UOP_LADDER_DNC, BANK_WIDE_D, BANK_NARROW_D, BANK_WIDE_D, BANK_NARROW_D }; //
- //
- 7'd052: data <= {UOP_OPCODE_REGULAR_ADD_UNEVEN, UOP_CRT_DNC, UOP_NPQ_N, UOP_AUX_DNC, UOP_LADDER_DNC, BANK_NARROW_D, BANK_NARROW_A, BANK_DNC, BANK_NARROW_C }; //
- //
- 7'd053: data <= {UOP_OPCODE_INPUT_TO_WIDE, UOP_CRT_X, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_N, BANK_WIDE_N, BANK_DNC }; //
- 7'd054: data <= {UOP_OPCODE_INPUT_TO_WIDE, UOP_CRT_Y, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_N, BANK_WIDE_N, BANK_DNC }; //
- //
- 7'd055: data <= {UOP_OPCODE_INPUT_TO_NARROW, UOP_CRT_X, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_N_COEFF, BANK_DNC, BANK_NARROW_COEFF}; //
- 7'd056: data <= {UOP_OPCODE_INPUT_TO_NARROW, UOP_CRT_Y, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_N_COEFF, BANK_DNC, BANK_NARROW_COEFF}; //
- //
- 7'd057: data <= {UOP_OPCODE_MODULAR_MULTIPLY, UOP_CRT_DNC, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_00, BANK_WIDE_B, BANK_NARROW_C, BANK_WIDE_A, BANK_NARROW_A }; //
- //
- 7'd058: data <= {UOP_OPCODE_PROPAGATE_CARRIES, UOP_CRT_DNC, UOP_NPQ_N, UOP_AUX_DNC, UOP_LADDER_DNC, BANK_DNC, BANK_NARROW_A, BANK_DNC, BANK_NARROW_A }; //
- //
- 7'd059: data <= {UOP_OPCODE_OUTPUT_FROM_NARROW, UOP_CRT_X, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_NARROW_A, BANK_DNC, BANK_OUT_S }; //
+ 7'd000: data <= {UOP_OPCODE_INPUT_TO_WIDE, UOP_CRT_X, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_N, BANK_WIDE_N, BANK_DNC }; //
+ 7'd001: data <= {UOP_OPCODE_INPUT_TO_WIDE, UOP_CRT_Y, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_N, BANK_WIDE_N, BANK_DNC }; //
+ 7'd002: data <= {UOP_OPCODE_INPUT_TO_WIDE, UOP_CRT_X, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_X, BANK_WIDE_A, BANK_DNC }; //
+ 7'd003: data <= {UOP_OPCODE_INPUT_TO_WIDE, UOP_CRT_Y, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_Y, BANK_WIDE_A, BANK_DNC }; //
+ 7'd004: data <= {UOP_OPCODE_INPUT_TO_WIDE, UOP_CRT_X, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_M, BANK_WIDE_E, BANK_DNC }; //
+ 7'd005: data <= {UOP_OPCODE_INPUT_TO_WIDE, UOP_CRT_Y, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_M, BANK_WIDE_E, BANK_DNC }; //
+ //
+ 7'd006: data <= {UOP_OPCODE_INPUT_TO_NARROW, UOP_CRT_X, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_N_COEFF, BANK_DNC, BANK_NARROW_COEFF}; //
+ 7'd007: data <= {UOP_OPCODE_INPUT_TO_NARROW, UOP_CRT_Y, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_N_COEFF, BANK_DNC, BANK_NARROW_COEFF}; //
+ 7'd008: data <= {UOP_OPCODE_INPUT_TO_NARROW, UOP_CRT_X, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_N_FACTOR, BANK_DNC, BANK_NARROW_A }; //
+ 7'd009: data <= {UOP_OPCODE_INPUT_TO_NARROW, UOP_CRT_Y, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_N_FACTOR, BANK_DNC, BANK_NARROW_A }; //
+ 7'd010: data <= {UOP_OPCODE_INPUT_TO_NARROW, UOP_CRT_X, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_M, BANK_DNC, BANK_NARROW_E }; //
+ 7'd011: data <= {UOP_OPCODE_INPUT_TO_NARROW, UOP_CRT_Y, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_M, BANK_DNC, BANK_NARROW_E }; //
+ //
+ 7'd012: data <= {UOP_OPCODE_MODULAR_MULTIPLY, UOP_CRT_DNC, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_11, BANK_WIDE_A, BANK_NARROW_A, BANK_WIDE_B, BANK_NARROW_B }; //
+ 7'd013: data <= {UOP_OPCODE_MODULAR_MULTIPLY, UOP_CRT_DNC, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_11, BANK_WIDE_B, BANK_NARROW_B, BANK_WIDE_C, BANK_NARROW_C }; //
+ 7'd014: data <= {UOP_OPCODE_MODULAR_MULTIPLY, UOP_CRT_DNC, UOP_NPQ_N, UOP_AUX_2, UOP_LADDER_11, BANK_WIDE_C, BANK_DNC, BANK_WIDE_D, BANK_NARROW_D }; //
+ //
+ 7'd015: data <= {UOP_OPCODE_PROPAGATE_CARRIES, UOP_CRT_DNC, UOP_NPQ_N, UOP_AUX_DNC, UOP_LADDER_DNC, BANK_DNC, BANK_NARROW_D, BANK_DNC, BANK_NARROW_D }; //
+ //
+ 7'd016: data <= {UOP_OPCODE_OUTPUT_FROM_NARROW, UOP_CRT_X, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_NARROW_D, BANK_DNC, BANK_OUT_XM }; //
+ 7'd017: data <= {UOP_OPCODE_OUTPUT_FROM_NARROW, UOP_CRT_Y, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_NARROW_D, BANK_DNC, BANK_OUT_YM }; //
+ //
+ 7'd018: data <= {UOP_OPCODE_MODULAR_MULTIPLY, UOP_CRT_DNC, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_11, BANK_WIDE_E, BANK_NARROW_B, BANK_WIDE_C, BANK_NARROW_C }; //
+ //
+ 7'd019: data <= {UOP_OPCODE_PROPAGATE_CARRIES, UOP_CRT_DNC, UOP_NPQ_N, UOP_AUX_DNC, UOP_LADDER_DNC, BANK_DNC, BANK_NARROW_C, BANK_DNC, BANK_NARROW_C }; //
+ //
+ 7'd020: data <= {UOP_OPCODE_COPY_CRT_Y2X, UOP_CRT_DNC, UOP_NPQ_N, UOP_AUX_DNC, UOP_LADDER_DNC, BANK_WIDE_C, BANK_NARROW_C, BANK_WIDE_C, BANK_NARROW_C }; //
+ //
+ 7'd021: data <= {UOP_OPCODE_INPUT_TO_WIDE, UOP_CRT_X, UOP_NPQ_PQ, UOP_AUX_2, UOP_LADDER_DNC, BANK_DNC, BANK_IN_2_P, BANK_WIDE_N, BANK_DNC }; //
+ 7'd022: data <= {UOP_OPCODE_INPUT_TO_WIDE, UOP_CRT_Y, UOP_NPQ_PQ, UOP_AUX_2, UOP_LADDER_DNC, BANK_DNC, BANK_IN_2_Q, BANK_WIDE_N, BANK_DNC }; //
+ 7'd023: data <= {UOP_OPCODE_INPUT_TO_WIDE, UOP_CRT_X, UOP_NPQ_PQ, UOP_AUX_2, UOP_LADDER_DNC, BANK_DNC, BANK_IN_2_P_FACTOR, BANK_WIDE_A, BANK_DNC }; //
+ 7'd024: data <= {UOP_OPCODE_INPUT_TO_WIDE, UOP_CRT_Y, UOP_NPQ_PQ, UOP_AUX_2, UOP_LADDER_DNC, BANK_DNC, BANK_IN_2_Q_FACTOR, BANK_WIDE_A, BANK_DNC }; //
+ 7'd025: data <= {UOP_OPCODE_INPUT_TO_WIDE, UOP_CRT_X, UOP_NPQ_PQ, UOP_AUX_2, UOP_LADDER_DNC, BANK_DNC, BANK_IN_2_QINV, BANK_WIDE_E, BANK_DNC }; //
+ //
+ 7'd026: data <= {UOP_OPCODE_INPUT_TO_NARROW, UOP_CRT_X, UOP_NPQ_PQ, UOP_AUX_2, UOP_LADDER_DNC, BANK_DNC, BANK_IN_2_P_COEFF, BANK_DNC, BANK_NARROW_COEFF}; //
+ 7'd027: data <= {UOP_OPCODE_INPUT_TO_NARROW, UOP_CRT_Y, UOP_NPQ_PQ, UOP_AUX_2, UOP_LADDER_DNC, BANK_DNC, BANK_IN_2_Q_COEFF, BANK_DNC, BANK_NARROW_COEFF}; //
+ 7'd028: data <= {UOP_OPCODE_INPUT_TO_NARROW, UOP_CRT_X, UOP_NPQ_PQ, UOP_AUX_2, UOP_LADDER_DNC, BANK_DNC, BANK_IN_2_P_FACTOR, BANK_DNC, BANK_NARROW_A }; //
+ 7'd029: data <= {UOP_OPCODE_INPUT_TO_NARROW, UOP_CRT_Y, UOP_NPQ_PQ, UOP_AUX_2, UOP_LADDER_DNC, BANK_DNC, BANK_IN_2_Q_FACTOR, BANK_DNC, BANK_NARROW_A }; //
+ 7'd030: data <= {UOP_OPCODE_INPUT_TO_NARROW, UOP_CRT_X, UOP_NPQ_PQ, UOP_AUX_2, UOP_LADDER_DNC, BANK_DNC, BANK_IN_2_QINV, BANK_DNC, BANK_NARROW_E }; //
+ //
+ 7'd031: data <= {UOP_OPCODE_MODULAR_REDUCE_INIT, UOP_CRT_DNC, UOP_NPQ_DNC, UOP_AUX_DNC, UOP_LADDER_DNC, BANK_DNC, BANK_NARROW_C, BANK_DNC, BANK_DNC }; //
+ //
+ 7'd032: data <= {UOP_OPCODE_MODULAR_REDUCE_PROC, UOP_CRT_DNC, UOP_NPQ_PQ, UOP_AUX_DNC, UOP_LADDER_DNC, BANK_DNC, BANK_DNC, BANK_WIDE_D, BANK_NARROW_D }; //
+ //
+ 7'd033: data <= {UOP_OPCODE_MODULAR_MULTIPLY, UOP_CRT_DNC, UOP_NPQ_PQ, UOP_AUX_1, UOP_LADDER_11, BANK_WIDE_D, BANK_NARROW_A, BANK_WIDE_C, BANK_NARROW_C }; //
+ 7'd034: data <= {UOP_OPCODE_MODULAR_MULTIPLY, UOP_CRT_DNC, UOP_NPQ_PQ, UOP_AUX_1, UOP_LADDER_11, BANK_WIDE_C, BANK_NARROW_A, BANK_WIDE_D, BANK_NARROW_D }; //
+ 7'd035: data <= {UOP_OPCODE_MODULAR_MULTIPLY, UOP_CRT_DNC, UOP_NPQ_PQ, UOP_AUX_2, UOP_LADDER_11, BANK_WIDE_A, BANK_DNC, BANK_WIDE_C, BANK_NARROW_C }; //
+ //
+ 7'd036: data <= {UOP_OPCODE_COPY_LADDERS_X2Y, UOP_CRT_DNC, UOP_NPQ_PQ, UOP_AUX_DNC, UOP_LADDER_DNC, BANK_WIDE_D, BANK_NARROW_D, BANK_WIDE_C, BANK_NARROW_C }; //
+ //
+ 7'd037: data <= {UOP_OPCODE_LADDER_INIT, UOP_CRT_DNC, UOP_NPQ_DNC, UOP_AUX_DNC, UOP_LADDER_DNC, BANK_DNC, BANK_DNC, BANK_DNC, BANK_DNC }; //
+ 7'd038: data <= {UOP_OPCODE_MODULAR_MULTIPLY, UOP_CRT_DNC, UOP_NPQ_PQ, UOP_AUX_1, UOP_LADDER_PQ, BANK_WIDE_C, BANK_NARROW_C, BANK_WIDE_C, BANK_NARROW_C }; //
+ 7'd039: data <= {UOP_OPCODE_LADDER_STEP, UOP_CRT_DNC, UOP_NPQ_DNC, UOP_AUX_DNC, UOP_LADDER_DNC, BANK_DNC, BANK_DNC, BANK_DNC, BANK_DNC }; //
+ //
+ 7'd040: data <= {UOP_OPCODE_MODULAR_MULTIPLY, UOP_CRT_DNC, UOP_NPQ_PQ, UOP_AUX_2, UOP_LADDER_11, BANK_WIDE_C, BANK_DNC, BANK_WIDE_D, BANK_NARROW_D }; //
+ //
+ 7'd041: data <= {UOP_OPCODE_PROPAGATE_CARRIES, UOP_CRT_DNC, UOP_NPQ_PQ, UOP_AUX_DNC, UOP_LADDER_DNC, BANK_DNC, BANK_NARROW_D, BANK_DNC, BANK_NARROW_D }; //
+ //
+ 7'd042: data <= {UOP_OPCODE_CROSS_LADDERS_X2Y, UOP_CRT_DNC, UOP_NPQ_PQ, UOP_AUX_DNC, UOP_LADDER_DNC, BANK_WIDE_D, BANK_NARROW_D, BANK_WIDE_D, BANK_NARROW_D }; //
+ //
+ 7'd043: data <= {UOP_OPCODE_MODULAR_SUBTRACT_X, UOP_CRT_DNC, UOP_NPQ_PQ, UOP_AUX_DNC, UOP_LADDER_DNC, BANK_DNC, BANK_NARROW_D, BANK_DNC, BANK_NARROW_C }; //
+ 7'd044: data <= {UOP_OPCODE_MODULAR_SUBTRACT_Y, UOP_CRT_DNC, UOP_NPQ_PQ, UOP_AUX_DNC, UOP_LADDER_DNC, BANK_DNC, BANK_NARROW_C, BANK_WIDE_C, BANK_DNC }; //
+ 7'd045: data <= {UOP_OPCODE_MODULAR_SUBTRACT_Z, UOP_CRT_DNC, UOP_NPQ_PQ, UOP_AUX_DNC, UOP_LADDER_DNC, BANK_WIDE_C, BANK_NARROW_C, BANK_WIDE_C, BANK_NARROW_C }; //
+ //
+ 7'd046: data <= {UOP_OPCODE_MODULAR_MULTIPLY, UOP_CRT_DNC, UOP_NPQ_PQ, UOP_AUX_1, UOP_LADDER_11, BANK_WIDE_C, BANK_NARROW_E, BANK_WIDE_C, BANK_NARROW_C }; //
+ 7'd047: data <= {UOP_OPCODE_MODULAR_MULTIPLY, UOP_CRT_DNC, UOP_NPQ_PQ, UOP_AUX_1, UOP_LADDER_11, BANK_WIDE_C, BANK_NARROW_A, BANK_WIDE_C, BANK_NARROW_C }; //
+ //
+ 7'd048: data <= {UOP_OPCODE_INPUT_TO_WIDE, UOP_CRT_X, UOP_NPQ_PQ, UOP_AUX_2, UOP_LADDER_DNC, BANK_DNC, BANK_IN_2_Q, BANK_WIDE_E, BANK_DNC }; //
+ //
+ 7'd049: data <= {UOP_OPCODE_INPUT_TO_NARROW, UOP_CRT_X, UOP_NPQ_PQ, UOP_AUX_2, UOP_LADDER_DNC, BANK_DNC, BANK_IN_2_Q, BANK_DNC, BANK_NARROW_E }; //
+ //
+ 7'd050: data <= {UOP_OPCODE_REGULAR_MULTIPLY, UOP_CRT_DNC, UOP_NPQ_PQ, UOP_AUX_1, UOP_LADDER_11, BANK_WIDE_E, BANK_NARROW_C, BANK_DNC, BANK_DNC }; //
+ //
+ 7'd051: data <= {UOP_OPCODE_MERGE_LH, UOP_CRT_DNC, UOP_NPQ_DNC, UOP_AUX_DNC, UOP_LADDER_DNC, BANK_DNC, BANK_DNC, BANK_DNC, BANK_NARROW_A }; //
+ //
+ 7'd052: data <= {UOP_OPCODE_PROPAGATE_CARRIES, UOP_CRT_DNC, UOP_NPQ_N, UOP_AUX_DNC, UOP_LADDER_DNC, BANK_DNC, BANK_NARROW_A, BANK_DNC, BANK_NARROW_A }; //
+ //
+ 7'd053: data <= {UOP_OPCODE_COPY_CRT_Y2X, UOP_CRT_DNC, UOP_NPQ_PQ, UOP_AUX_DNC, UOP_LADDER_DNC, BANK_WIDE_D, BANK_NARROW_D, BANK_WIDE_D, BANK_NARROW_D }; //
+ //
+ 7'd054: data <= {UOP_OPCODE_REGULAR_ADD_UNEVEN, UOP_CRT_DNC, UOP_NPQ_N, UOP_AUX_DNC, UOP_LADDER_DNC, BANK_WIDE_D, BANK_NARROW_A, BANK_DNC, BANK_NARROW_C }; //
+ //
+ 7'd055: data <= {UOP_OPCODE_INPUT_TO_WIDE, UOP_CRT_X, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_N, BANK_WIDE_N, BANK_DNC }; //
+ 7'd056: data <= {UOP_OPCODE_INPUT_TO_WIDE, UOP_CRT_Y, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_N, BANK_WIDE_N, BANK_DNC }; //
+ //
+ 7'd057: data <= {UOP_OPCODE_INPUT_TO_NARROW, UOP_CRT_X, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_N_COEFF, BANK_DNC, BANK_NARROW_COEFF}; //
+ 7'd058: data <= {UOP_OPCODE_INPUT_TO_NARROW, UOP_CRT_Y, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC,
BANK_IN_1_N_COEFF, BANK_DNC, BANK_NARROW_COEFF}; // + // + 7'd059: data <= {UOP_OPCODE_MODULAR_MULTIPLY, UOP_CRT_DNC, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_00, BANK_WIDE_B, BANK_NARROW_C, BANK_WIDE_A, BANK_NARROW_A }; // + // + 7'd060: data <= {UOP_OPCODE_PROPAGATE_CARRIES, UOP_CRT_DNC, UOP_NPQ_N, UOP_AUX_DNC, UOP_LADDER_DNC, BANK_DNC, BANK_NARROW_A, BANK_DNC, BANK_NARROW_A }; // + // + 7'd061: data <= {UOP_OPCODE_OUTPUT_FROM_NARROW, UOP_CRT_X, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_NARROW_A, BANK_DNC, BANK_OUT_S }; // // // Non-CRT Mode (i.e. only when "D" is known) // - 7'd064: data <= {UOP_OPCODE_INPUT_TO_WIDE, UOP_CRT_X, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_N, BANK_WIDE_N, BANK_DNC }; // - 7'd065: data <= {UOP_OPCODE_INPUT_TO_WIDE, UOP_CRT_Y, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_N, BANK_WIDE_N, BANK_DNC }; // - 7'd066: data <= {UOP_OPCODE_INPUT_TO_WIDE, UOP_CRT_X, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_X, BANK_WIDE_A, BANK_DNC }; // - 7'd067: data <= {UOP_OPCODE_INPUT_TO_WIDE, UOP_CRT_Y, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_Y, BANK_WIDE_A, BANK_DNC }; // - 7'd068: data <= {UOP_OPCODE_INPUT_TO_WIDE, UOP_CRT_X, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_M, BANK_WIDE_E, BANK_DNC }; // - 7'd069: data <= {UOP_OPCODE_INPUT_TO_WIDE, UOP_CRT_Y, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_M, BANK_WIDE_E, BANK_DNC }; // - // - 7'd070: data <= {UOP_OPCODE_INPUT_TO_NARROW, UOP_CRT_X, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_N_COEFF, BANK_DNC, BANK_NARROW_COEFF}; // - 7'd071: data <= {UOP_OPCODE_INPUT_TO_NARROW, UOP_CRT_Y, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_N_COEFF, BANK_DNC, BANK_NARROW_COEFF}; // - 7'd072: data <= {UOP_OPCODE_INPUT_TO_NARROW, UOP_CRT_X, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_N_FACTOR, BANK_DNC, BANK_NARROW_A }; // - 7'd073: data <= {UOP_OPCODE_INPUT_TO_NARROW, UOP_CRT_Y, 
UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_N_FACTOR, BANK_DNC, BANK_NARROW_A }; // - 7'd074: data <= {UOP_OPCODE_INPUT_TO_NARROW, UOP_CRT_X, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_M, BANK_DNC, BANK_NARROW_E }; // - 7'd075: data <= {UOP_OPCODE_INPUT_TO_NARROW, UOP_CRT_Y, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_M, BANK_DNC, BANK_NARROW_E }; // - // - 7'd076: data <= {UOP_OPCODE_MODULAR_MULTIPLY, UOP_CRT_DNC, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_11, BANK_WIDE_A, BANK_NARROW_A, BANK_WIDE_B, BANK_NARROW_B }; // - 7'd077: data <= {UOP_OPCODE_MODULAR_MULTIPLY, UOP_CRT_DNC, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_11, BANK_WIDE_B, BANK_NARROW_B, BANK_WIDE_C, BANK_NARROW_C }; // - 7'd078: data <= {UOP_OPCODE_MODULAR_MULTIPLY, UOP_CRT_DNC, UOP_NPQ_N, UOP_AUX_2, UOP_LADDER_11, BANK_WIDE_C, BANK_DNC, BANK_WIDE_D, BANK_NARROW_D }; // - // - 7'd079: data <= {UOP_OPCODE_PROPAGATE_CARRIES, UOP_CRT_DNC, UOP_NPQ_N, UOP_AUX_DNC, UOP_LADDER_DNC, BANK_DNC, BANK_NARROW_D, BANK_DNC, BANK_NARROW_D }; // - - 7'd080: data <= {UOP_OPCODE_OUTPUT_FROM_NARROW, UOP_CRT_X, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_NARROW_D, BANK_DNC, BANK_OUT_XM }; // - 7'd081: data <= {UOP_OPCODE_OUTPUT_FROM_NARROW, UOP_CRT_Y, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_NARROW_D, BANK_DNC, BANK_OUT_YM }; // - - 7'd082: data <= {UOP_OPCODE_MODULAR_MULTIPLY, UOP_CRT_DNC, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_11, BANK_WIDE_E, BANK_NARROW_B, BANK_WIDE_C, BANK_NARROW_C }; // - - 7'd083: data <= {UOP_OPCODE_INPUT_TO_WIDE, UOP_CRT_X, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_N_FACTOR, BANK_WIDE_A, BANK_DNC }; // - 7'd084: data <= {UOP_OPCODE_INPUT_TO_WIDE, UOP_CRT_Y, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_N_FACTOR, BANK_WIDE_A, BANK_DNC }; // - - 7'd085: data <= {UOP_OPCODE_MODULAR_MULTIPLY, UOP_CRT_DNC, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_11, BANK_WIDE_C, BANK_NARROW_A, BANK_WIDE_D, BANK_NARROW_D }; // - 7'd086: data <= 
{UOP_OPCODE_MODULAR_MULTIPLY, UOP_CRT_DNC, UOP_NPQ_N, UOP_AUX_2, UOP_LADDER_11, BANK_WIDE_A, BANK_DNC, BANK_WIDE_C, BANK_NARROW_C }; // - - 7'd087: data <= {UOP_OPCODE_COPY_LADDERS_X2Y, UOP_CRT_DNC, UOP_NPQ_N, UOP_AUX_DNC, UOP_LADDER_DNC, BANK_WIDE_D, BANK_NARROW_D, BANK_WIDE_C, BANK_NARROW_C }; // - - 7'd088: data <= {UOP_OPCODE_LADDER_INIT, UOP_CRT_DNC, UOP_NPQ_DNC, UOP_AUX_DNC, UOP_LADDER_DNC, BANK_DNC, BANK_DNC, BANK_DNC, BANK_DNC }; // - 7'd089: data <= {UOP_OPCODE_MODULAR_MULTIPLY, UOP_CRT_DNC, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_D, BANK_WIDE_C, BANK_NARROW_C, BANK_WIDE_C, BANK_NARROW_C }; // - 7'd090: data <= {UOP_OPCODE_LADDER_STEP, UOP_CRT_DNC, UOP_NPQ_DNC, UOP_AUX_DNC, UOP_LADDER_DNC, BANK_DNC, BANK_DNC, BANK_DNC, BANK_DNC }; // - - 7'd091: data <= {UOP_OPCODE_CROSS_LADDERS_X2Y, UOP_CRT_DNC, UOP_NPQ_N, UOP_AUX_DNC, UOP_LADDER_DNC, BANK_WIDE_B, BANK_NARROW_B, BANK_WIDE_B, BANK_NARROW_B }; // - - 7'd092: data <= {UOP_OPCODE_MODULAR_MULTIPLY, UOP_CRT_DNC, UOP_NPQ_N, UOP_AUX_2, UOP_LADDER_11, BANK_WIDE_C, BANK_DNC, BANK_WIDE_D, BANK_NARROW_D }; // - 7'd093: data <= {UOP_OPCODE_MODULAR_MULTIPLY, UOP_CRT_DNC, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_00, BANK_WIDE_B, BANK_NARROW_D, BANK_WIDE_A, BANK_NARROW_A }; // - // - 7'd094: data <= {UOP_OPCODE_PROPAGATE_CARRIES, UOP_CRT_DNC, UOP_NPQ_N, UOP_AUX_DNC, UOP_LADDER_DNC, BANK_DNC, BANK_NARROW_A, BANK_DNC, BANK_NARROW_A }; // - // - 7'd095: data <= {UOP_OPCODE_OUTPUT_FROM_NARROW, UOP_CRT_Y, UOP_NPQ_N, UOP_AUX_2, UOP_LADDER_DNC, BANK_DNC, BANK_NARROW_A, BANK_DNC, BANK_OUT_S }; // - - default: data <= {UOP_OPCODE_STOP, UOP_CRT_DNC, UOP_NPQ_DNC, UOP_AUX_DNC, UOP_LADDER_DNC, BANK_DNC, BANK_DNC, BANK_DNC, BANK_DNC }; // + 7'd064: data <= {UOP_OPCODE_INPUT_TO_WIDE, UOP_CRT_X, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_N, BANK_WIDE_N, BANK_DNC }; // + 7'd065: data <= {UOP_OPCODE_INPUT_TO_WIDE, UOP_CRT_Y, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_N, BANK_WIDE_N, BANK_DNC }; // + 7'd066: data <= 
{UOP_OPCODE_INPUT_TO_WIDE, UOP_CRT_X, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_X, BANK_WIDE_A, BANK_DNC }; // + 7'd067: data <= {UOP_OPCODE_INPUT_TO_WIDE, UOP_CRT_Y, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_Y, BANK_WIDE_A, BANK_DNC }; // + 7'd068: data <= {UOP_OPCODE_INPUT_TO_WIDE, UOP_CRT_X, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_M, BANK_WIDE_E, BANK_DNC }; // + 7'd069: data <= {UOP_OPCODE_INPUT_TO_WIDE, UOP_CRT_Y, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_M, BANK_WIDE_E, BANK_DNC }; // + // + 7'd070: data <= {UOP_OPCODE_INPUT_TO_NARROW, UOP_CRT_X, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_N_COEFF, BANK_DNC, BANK_NARROW_COEFF}; // + 7'd071: data <= {UOP_OPCODE_INPUT_TO_NARROW, UOP_CRT_Y, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_N_COEFF, BANK_DNC, BANK_NARROW_COEFF}; // + 7'd072: data <= {UOP_OPCODE_INPUT_TO_NARROW, UOP_CRT_X, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_N_FACTOR, BANK_DNC, BANK_NARROW_A }; // + 7'd073: data <= {UOP_OPCODE_INPUT_TO_NARROW, UOP_CRT_Y, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_N_FACTOR, BANK_DNC, BANK_NARROW_A }; // + 7'd074: data <= {UOP_OPCODE_INPUT_TO_NARROW, UOP_CRT_X, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_M, BANK_DNC, BANK_NARROW_E }; // + 7'd075: data <= {UOP_OPCODE_INPUT_TO_NARROW, UOP_CRT_Y, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_M, BANK_DNC, BANK_NARROW_E }; // + // + 7'd076: data <= {UOP_OPCODE_MODULAR_MULTIPLY, UOP_CRT_DNC, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_11, BANK_WIDE_A, BANK_NARROW_A, BANK_WIDE_B, BANK_NARROW_B }; // + 7'd077: data <= {UOP_OPCODE_MODULAR_MULTIPLY, UOP_CRT_DNC, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_11, BANK_WIDE_B, BANK_NARROW_B, BANK_WIDE_C, BANK_NARROW_C }; // + 7'd078: data <= {UOP_OPCODE_MODULAR_MULTIPLY, UOP_CRT_DNC, UOP_NPQ_N, UOP_AUX_2, UOP_LADDER_11, BANK_WIDE_C, BANK_DNC, BANK_WIDE_D, BANK_NARROW_D }; // + // + 7'd079: data <= 
{UOP_OPCODE_PROPAGATE_CARRIES, UOP_CRT_DNC, UOP_NPQ_N, UOP_AUX_DNC, UOP_LADDER_DNC, BANK_DNC, BANK_NARROW_D, BANK_DNC, BANK_NARROW_D }; // + // + 7'd080: data <= {UOP_OPCODE_OUTPUT_FROM_NARROW, UOP_CRT_X, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_NARROW_D, BANK_DNC, BANK_OUT_XM }; // + 7'd081: data <= {UOP_OPCODE_OUTPUT_FROM_NARROW, UOP_CRT_Y, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_NARROW_D, BANK_DNC, BANK_OUT_YM }; // + // + 7'd082: data <= {UOP_OPCODE_MODULAR_MULTIPLY, UOP_CRT_DNC, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_11, BANK_WIDE_E, BANK_NARROW_B, BANK_WIDE_C, BANK_NARROW_C }; // + // + 7'd083: data <= {UOP_OPCODE_INPUT_TO_WIDE, UOP_CRT_X, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_N_FACTOR, BANK_WIDE_A, BANK_DNC }; // + 7'd084: data <= {UOP_OPCODE_INPUT_TO_WIDE, UOP_CRT_Y, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_DNC, BANK_DNC, BANK_IN_1_N_FACTOR, BANK_WIDE_A, BANK_DNC }; // + // + 7'd085: data <= {UOP_OPCODE_MODULAR_MULTIPLY, UOP_CRT_DNC, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_11, BANK_WIDE_C, BANK_NARROW_A, BANK_WIDE_D, BANK_NARROW_D }; // + 7'd086: data <= {UOP_OPCODE_MODULAR_MULTIPLY, UOP_CRT_DNC, UOP_NPQ_N, UOP_AUX_2, UOP_LADDER_11, BANK_WIDE_A, BANK_DNC, BANK_WIDE_C, BANK_NARROW_C }; // + // + 7'd087: data <= {UOP_OPCODE_COPY_LADDERS_X2Y, UOP_CRT_DNC, UOP_NPQ_N, UOP_AUX_DNC, UOP_LADDER_DNC, BANK_WIDE_D, BANK_NARROW_D, BANK_WIDE_C, BANK_NARROW_C }; // + // + 7'd088: data <= {UOP_OPCODE_LADDER_INIT, UOP_CRT_DNC, UOP_NPQ_DNC, UOP_AUX_DNC, UOP_LADDER_DNC, BANK_DNC, BANK_DNC, BANK_DNC, BANK_DNC }; // + 7'd089: data <= {UOP_OPCODE_MODULAR_MULTIPLY, UOP_CRT_DNC, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_D, BANK_WIDE_C, BANK_NARROW_C, BANK_WIDE_C, BANK_NARROW_C }; // + 7'd090: data <= {UOP_OPCODE_LADDER_STEP, UOP_CRT_DNC, UOP_NPQ_DNC, UOP_AUX_DNC, UOP_LADDER_DNC, BANK_DNC, BANK_DNC, BANK_DNC, BANK_DNC }; // + // + 7'd091: data <= {UOP_OPCODE_CROSS_LADDERS_X2Y, UOP_CRT_DNC, UOP_NPQ_N, UOP_AUX_DNC, UOP_LADDER_DNC, BANK_WIDE_B, BANK_NARROW_B, 
BANK_WIDE_B, BANK_NARROW_B }; // + // + 7'd092: data <= {UOP_OPCODE_MODULAR_MULTIPLY, UOP_CRT_DNC, UOP_NPQ_N, UOP_AUX_2, UOP_LADDER_11, BANK_WIDE_C, BANK_DNC, BANK_WIDE_D, BANK_NARROW_D }; // + 7'd093: data <= {UOP_OPCODE_MODULAR_MULTIPLY, UOP_CRT_DNC, UOP_NPQ_N, UOP_AUX_1, UOP_LADDER_00, BANK_WIDE_B, BANK_NARROW_D, BANK_WIDE_A, BANK_NARROW_A }; // + // + 7'd094: data <= {UOP_OPCODE_PROPAGATE_CARRIES, UOP_CRT_DNC, UOP_NPQ_N, UOP_AUX_DNC, UOP_LADDER_DNC, BANK_DNC, BANK_NARROW_A, BANK_DNC, BANK_NARROW_A }; // + // + 7'd095: data <= {UOP_OPCODE_OUTPUT_FROM_NARROW, UOP_CRT_Y, UOP_NPQ_N, UOP_AUX_2, UOP_LADDER_DNC, BANK_DNC, BANK_NARROW_A, BANK_DNC, BANK_OUT_S }; // + // + default: data <= {UOP_OPCODE_STOP, UOP_CRT_DNC, UOP_NPQ_DNC, UOP_AUX_DNC, UOP_LADDER_DNC, BANK_DNC, BANK_DNC, BANK_DNC, BANK_DNC }; // // - - + endcase endmodule From git at cryptech.is Mon Jan 20 21:18:11 2020 From: git at cryptech.is (git at cryptech.is) Date: Mon, 20 Jan 2020 21:18:11 +0000 Subject: [Cryptech-Commits] [user/shatov/modexpng] 10/21: Updated uOP engine to match the changes made to the general worker module (modular subtraction was split into three micro-operations instead of one). In-Reply-To: <157955508198.72689.15683234965619703441@bikeshed.cryptech.is> References: <157955508198.72689.15683234965619703441@bikeshed.cryptech.is> Message-ID: <20200120211814.242598BC2CE@bikeshed.cryptech.is> This is an automated email from the git hooks/post-receive script. meisterpaul1 at yandex.ru pushed a commit to branch master in repository user/shatov/modexpng. commit b8c0536da225a146eece91c17082e279243380dc Author: Pavel V. Shatov (Meister) AuthorDate: Mon Jan 20 23:56:15 2020 +0300 Updated uOP engine to match the changes made to the general worker module (modular subtraction was split into three micro-operations instead of one). 
--- rtl/modexpng_uop_engine.v | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/rtl/modexpng_uop_engine.v b/rtl/modexpng_uop_engine.v index 2046db7..e5e8daf 100644 --- a/rtl/modexpng_uop_engine.v +++ b/rtl/modexpng_uop_engine.v @@ -336,7 +336,9 @@ module modexpng_uop_engine wire uop_opcode_is_wrk = (uop_data_opcode_dec == UOP_OPCODE_COPY_CRT_Y2X ) || (uop_data_opcode_dec == UOP_OPCODE_COPY_LADDERS_X2Y ) || (uop_data_opcode_dec == UOP_OPCODE_CROSS_LADDERS_X2Y ) || - (uop_data_opcode_dec == UOP_OPCODE_MODULAR_SUBTRACT ) || + (uop_data_opcode_dec == UOP_OPCODE_MODULAR_SUBTRACT_X ) || + (uop_data_opcode_dec == UOP_OPCODE_MODULAR_SUBTRACT_Y ) || + (uop_data_opcode_dec == UOP_OPCODE_MODULAR_SUBTRACT_Z ) || (uop_data_opcode_dec == UOP_OPCODE_MODULAR_REDUCE_INIT ) || (uop_data_opcode_dec == UOP_OPCODE_PROPAGATE_CARRIES ) || (uop_data_opcode_dec == UOP_OPCODE_MERGE_LH ) || @@ -535,8 +537,10 @@ module modexpng_uop_engine update_rdct_params(uop_data_sel_wide_out_dec, uop_data_sel_narrow_out_dec); end // - UOP_OPCODE_MODULAR_SUBTRACT: - update_wrk_params(BANK_DNC, uop_data_sel_narrow_in_dec, uop_data_sel_wide_out_dec, uop_data_sel_narrow_out_dec, uop_data_opcode_dec); + UOP_OPCODE_MODULAR_SUBTRACT_X, + UOP_OPCODE_MODULAR_SUBTRACT_Y, + UOP_OPCODE_MODULAR_SUBTRACT_Z: + update_wrk_params(uop_data_sel_wide_in_dec, uop_data_sel_narrow_in_dec, uop_data_sel_wide_out_dec, uop_data_sel_narrow_out_dec, uop_data_opcode_dec); // UOP_OPCODE_MODULAR_REDUCE_INIT: update_wrk_params(BANK_DNC, uop_data_sel_narrow_in_dec, BANK_DNC, BANK_DNC, uop_data_opcode_dec); @@ -627,7 +631,9 @@ module modexpng_uop_engine update_rdct_length(word_index_last_mux); end // - UOP_OPCODE_MODULAR_SUBTRACT: + UOP_OPCODE_MODULAR_SUBTRACT_X, + UOP_OPCODE_MODULAR_SUBTRACT_Y, + UOP_OPCODE_MODULAR_SUBTRACT_Z: update_wrk_length(word_index_last_mux, OP_ADDR_DNC); // UOP_OPCODE_MODULAR_REDUCE_INIT: From git at cryptech.is Mon Jan 20 21:18:12 2020 From: git at cryptech.is (git at cryptech.is) 
Date: Mon, 20 Jan 2020 21:18:12 +0000 Subject: [Cryptech-Commits] [user/shatov/modexpng] 11/21: For the new general worker module to work we need dynamic switching of DSP OPMODE, ALUMODE and CARRYINSEL ports, thus more defined constants. In-Reply-To: <157955508198.72689.15683234965619703441@bikeshed.cryptech.is> References: <157955508198.72689.15683234965619703441@bikeshed.cryptech.is> Message-ID: <20200120211820.0CA4B8BC2E1@bikeshed.cryptech.is> This is an automated email from the git hooks/post-receive script. meisterpaul1 at yandex.ru pushed a commit to branch master in repository user/shatov/modexpng. commit 64928838b16fe4c7bb6855d57e5695876314e286 Author: Pavel V. Shatov (Meister) AuthorDate: Mon Jan 20 23:57:42 2020 +0300 For the new general worker module to work we need dynamic switching of DSP OPMODE, ALUMODE and CARRYINSEL ports, thus more defined constants. --- rtl/modexpng_dsp48e1.vh | 51 +++++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 43 insertions(+), 8 deletions(-) diff --git a/rtl/modexpng_dsp48e1.vh b/rtl/modexpng_dsp48e1.vh index 68c5335..8140917 100644 --- a/rtl/modexpng_dsp48e1.vh +++ b/rtl/modexpng_dsp48e1.vh @@ -30,12 +30,47 @@ // //====================================================================== -localparam DSP48E1_A_W = 30; -localparam DSP48E1_B_W = 18; -localparam DSP48E1_C_W = 48; -localparam DSP48E1_D_W = 25; -localparam DSP48E1_P_W = 48; -localparam DSP48E1_INMODE_W = 5; -localparam DSP48E1_OPMODE_W = 7; -localparam DSP48E1_ALUMODE_W = 4; +localparam DSP48E1_A_W = 30; +localparam DSP48E1_B_W = 18; +localparam DSP48E1_C_W = 48; +localparam DSP48E1_D_W = 25; +localparam DSP48E1_P_W = 48; +localparam DSP48E1_INMODE_W = 5; +localparam DSP48E1_OPMODE_W = 7; +localparam DSP48E1_ALUMODE_W = 4; +localparam DSP48E1_CARRYINSEL_W = 3; +localparam DSP48E1_CARRYOUT_W = 4; +localparam DSP48E1_OPMODE_X_DNC = 2'bXX; +localparam DSP48E1_OPMODE_X_0 = 2'b00; +localparam DSP48E1_OPMODE_X_AB = 2'b11; + +localparam DSP48E1_OPMODE_Y_DNC 
= 2'bXX; +localparam DSP48E1_OPMODE_Y_0 = 2'b00; +localparam DSP48E1_OPMODE_Y_C = 2'b11; + +localparam DSP48E1_OPMODE_Z_DNC = 3'bXXX; +localparam DSP48E1_OPMODE_Z_0 = 3'b000; +localparam DSP48E1_OPMODE_Z_P17 = 3'b110; +localparam DSP48E1_OPMODE_Z_PCIN17 = 3'b101; +localparam DSP48E1_OPMODE_Z_P = 3'b010; +localparam DSP48E1_OPMODE_Z_C = 3'b011; + +localparam DSP48E1_OPMODE_DNC = {DSP48E1_OPMODE_Z_DNC, DSP48E1_OPMODE_Y_DNC, DSP48E1_OPMODE_X_DNC}; + +localparam DSP48E1_OPMODE_Z0_YC_XAB = {DSP48E1_OPMODE_Z_0, DSP48E1_OPMODE_Y_C, DSP48E1_OPMODE_X_AB}; +localparam DSP48E1_OPMODE_ZP17_YC_XAB = {DSP48E1_OPMODE_Z_P17, DSP48E1_OPMODE_Y_C, DSP48E1_OPMODE_X_AB}; +localparam DSP48E1_OPMODE_Z0_Y0_XAB = {DSP48E1_OPMODE_Z_0, DSP48E1_OPMODE_Y_0, DSP48E1_OPMODE_X_AB}; +localparam DSP48E1_OPMODE_ZPCIN17_YC_XAB = {DSP48E1_OPMODE_Z_PCIN17, DSP48E1_OPMODE_Y_C, DSP48E1_OPMODE_X_AB}; +localparam DSP48E1_OPMODE_Z0_YC_X0 = {DSP48E1_OPMODE_Z_0, DSP48E1_OPMODE_Y_C, DSP48E1_OPMODE_X_0}; +localparam DSP48E1_OPMODE_ZP17_YC_X0 = {DSP48E1_OPMODE_Z_P17, DSP48E1_OPMODE_Y_C, DSP48E1_OPMODE_X_0}; +localparam DSP48E1_OPMODE_ZP_YC_X0 = {DSP48E1_OPMODE_Z_P, DSP48E1_OPMODE_Y_C, DSP48E1_OPMODE_X_0}; +localparam DSP48E1_OPMODE_ZC_Y0_XAB = {DSP48E1_OPMODE_Z_C, DSP48E1_OPMODE_Y_0, DSP48E1_OPMODE_X_AB}; + +localparam DSP48E1_CARRYINSEL_DNC = 3'bXXX; +localparam DSP48E1_CARRYINSEL_CARRYIN = 3'b000; +localparam DSP48E1_CARRYINSEL_CARRYCASCOUT = 3'b100; + +localparam DSP48E1_ALUMODE_DNC = 4'bXXXX; +localparam DSP48E1_ALUMODE_Z_PLUS_X_AND_Y_AND_CIN = 4'b0000; +localparam DSP48E1_ALUMODE_Z_MINUS_X_AND_Y_AND_CIN = 4'b0011; From git at cryptech.is Mon Jan 20 21:18:13 2020 From: git at cryptech.is (git at cryptech.is) Date: Mon, 20 Jan 2020 21:18:13 +0000 Subject: [Cryptech-Commits] [user/shatov/modexpng] 12/21: Tiny cosmetic typo fix ("dst" -> "dsp") In-Reply-To: <157955508198.72689.15683234965619703441@bikeshed.cryptech.is> References: <157955508198.72689.15683234965619703441@bikeshed.cryptech.is> Message-ID: 
<20200120211820.BF1868BAF16@bikeshed.cryptech.is> This is an automated email from the git hooks/post-receive script. meisterpaul1 at yandex.ru pushed a commit to branch master in repository user/shatov/modexpng. commit 8f7829ce5e37728ddc51b4aaa735b9e04a5eca01 Author: Pavel V. Shatov (Meister) AuthorDate: Tue Jan 21 00:02:25 2020 +0300 Tiny cosmetic typo fix ("dst" -> "dsp") --- rtl/modexpng_general_worker.v | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/rtl/modexpng_general_worker.v b/rtl/modexpng_general_worker.v index 6618b5f..b85f189 100644 --- a/rtl/modexpng_general_worker.v +++ b/rtl/modexpng_general_worker.v @@ -30,7 +30,7 @@ // //====================================================================== -module modexpng_general_worker_new +module modexpng_general_worker ( clk, rst_n, ena, rdy, @@ -1067,7 +1067,7 @@ module modexpng_general_worker_new // // DSP Slices // - `MODEXPNG_DSP_SLICE_ADDSUB dst_inst_x_x + `MODEXPNG_DSP_SLICE_ADDSUB dsp_inst_x_x ( .clk (clk), .ce_abc (dsp_ce_x), @@ -1084,7 +1084,7 @@ module modexpng_general_worker_new .carry_out (dsp_carry_out_x) ); - `MODEXPNG_DSP_SLICE_ADDSUB dst_inst_y_x + `MODEXPNG_DSP_SLICE_ADDSUB dsp_inst_y_x ( .clk (clk), .ce_abc (dsp_ce_x), @@ -1101,7 +1101,7 @@ module modexpng_general_worker_new .carry_out () ); - `MODEXPNG_DSP_SLICE_ADDSUB dst_inst_x_y + `MODEXPNG_DSP_SLICE_ADDSUB dsp_inst_x_y ( .clk (clk), .ce_abc (dsp_ce_y), @@ -1118,7 +1118,7 @@ module modexpng_general_worker_new .carry_out (dsp_carry_out_y) ); - `MODEXPNG_DSP_SLICE_ADDSUB dst_inst_y_y + `MODEXPNG_DSP_SLICE_ADDSUB dsp_inst_y_y ( .clk (clk), .ce_abc (dsp_ce_y), From git at cryptech.is Mon Jan 20 21:18:14 2020 From: git at cryptech.is (git at cryptech.is) Date: Mon, 20 Jan 2020 21:18:14 +0000 Subject: [Cryptech-Commits] [user/shatov/modexpng] 13/21: Added more meaningful constants to avoid certain hardcoded numbers. 
In-Reply-To: <157955508198.72689.15683234965619703441@bikeshed.cryptech.is> References: <157955508198.72689.15683234965619703441@bikeshed.cryptech.is> Message-ID: <20200120211828.417378BC2FA@bikeshed.cryptech.is> This is an automated email from the git hooks/post-receive script. meisterpaul1 at yandex.ru pushed a commit to branch master in repository user/shatov/modexpng. commit 2345e42241948889ddf46074415c3377553f2027 Author: Pavel V. Shatov (Meister) AuthorDate: Tue Jan 21 00:03:52 2020 +0300 Added more meaningful constants to avoid certain hardcoded numbers. --- rtl/modexpng_parameters.vh | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/rtl/modexpng_parameters.vh b/rtl/modexpng_parameters.vh index 1718e00..77230fe 100644 --- a/rtl/modexpng_parameters.vh +++ b/rtl/modexpng_parameters.vh @@ -55,8 +55,8 @@ localparam BUS_OP_ADDR_W = cryptech_clog2(MAX_OP_W / BUS_DATA_W); localparam BIT_INDEX_W = cryptech_clog2(MAX_OP_W); localparam BANK_ADDR_W = 3; localparam OP_ADDR_W = cryptech_clog2(MAX_OP_W / WORD_W); -localparam COL_INDEX_W = OP_ADDR_W - cryptech_clog2(NUM_MULTS); localparam MAC_INDEX_W = cryptech_clog2(NUM_MULTS); +localparam COL_INDEX_W = OP_ADDR_W - MAC_INDEX_W; localparam CARRY_W = WORD_EXT_W - WORD_W; localparam WORD_MUX_W = cryptech_clog2(WORD_W); @@ -140,6 +140,7 @@ localparam [OP_ADDR_W-1:0] OP_ADDR_EXT_COEFF = 0; localparam [OP_ADDR_W-1:0] OP_ADDR_EXT_Q = 1; localparam [OP_ADDR_W-1:0] OP_ADDR_ZERO = {OP_ADDR_W{1'b0}}; localparam [OP_ADDR_W-1:0] OP_ADDR_ONE = {{(OP_ADDR_W-1){1'b0}}, 1'b1}; +localparam [OP_ADDR_W-1:0] OP_ADDR_TWO = {OP_ADDR_ONE[OP_ADDR_W-2:0], 1'b0}; localparam [OP_ADDR_W-1:0] OP_ADDR_DNC = {OP_ADDR_W{1'bX}}; // @@ -170,3 +171,10 @@ localparam [MAC_INDEX_W-1:0] MAC_INDEX_DNC = {MAC_INDEX_W{1'bX}}; // Multiplier Bitmap Values // localparam [NUM_MULTS-1:0] MULT_BITMAP_ZEROES = {NUM_MULTS{1'b0}}; + + +// +// Column Index Values +// +localparam [COL_INDEX_W-1:0] COL_INDEX_ZERO = {COL_INDEX_W{1'b0}}; +localparam 
[COL_INDEX_W-1:0] COL_INDEX_ONE = {{(COL_INDEX_W-1){1'b0}}, 1'b1}; From git at cryptech.is Mon Jan 20 21:18:15 2020 From: git at cryptech.is (git at cryptech.is) Date: Mon, 20 Jan 2020 21:18:15 +0000 Subject: [Cryptech-Commits] [user/shatov/modexpng] 14/21: Refactored modular reductor module. In-Reply-To: <157955508198.72689.15683234965619703441@bikeshed.cryptech.is> References: <157955508198.72689.15683234965619703441@bikeshed.cryptech.is> Message-ID: <20200120211828.9C3CB8BC2FC@bikeshed.cryptech.is> This is an automated email from the git hooks/post-receive script. meisterpaul1 at yandex.ru pushed a commit to branch master in repository user/shatov/modexpng. commit c5516257cb32d5608e028a36d48f8d24a6446e48 Author: Pavel V. Shatov (Meister) AuthorDate: Tue Jan 21 00:05:18 2020 +0300 Refactored modular reductor module. --- rtl/modexpng_reductor.v | 388 +++++++++++++++++++++++++++++++++--------------- 1 file changed, 266 insertions(+), 122 deletions(-) diff --git a/rtl/modexpng_reductor.v b/rtl/modexpng_reductor.v index 7404eba..1e6dcc1 100644 --- a/rtl/modexpng_reductor.v +++ b/rtl/modexpng_reductor.v @@ -46,6 +46,8 @@ module modexpng_reductor // Headers // `include "modexpng_parameters.vh" + `include "modexpng_dsp48e1.vh" + `include "modexpng_dsp_slice_primitives.vh" // @@ -97,7 +99,7 @@ module modexpng_reductor reg [ WORD_EXT_W -1:0] narrow_x_dout; reg [ WORD_EXT_W -1:0] narrow_y_dout; reg narrow_xy_valid = 1'b0; - + // // Mapping @@ -162,7 +164,7 @@ module modexpng_reductor input [ WORD_EXT_W -1:0] dout_x; input [ WORD_EXT_W -1:0] dout_y; _update_rdct_narrow(bank, addr, dout_x, dout_y, 1'b1); - endtask + endtask task clear_rdct_wide; _update_rdct_wide(BANK_DNC, OP_ADDR_DNC, WORD_EXT_DNC, WORD_EXT_DNC, 1'b0); @@ -181,179 +183,321 @@ module modexpng_reductor always @(posedge clk) // - {rd_wide_x_din_aux_pipe, rd_wide_y_din_aux_pipe} <= - {rd_wide_x_din_aux, rd_wide_y_din_aux } ; + {rd_wide_y_din_aux_pipe, rd_wide_x_din_aux_pipe} <= {rd_wide_y_din_aux, 
rd_wide_x_din_aux}; + + // + // Counter + // + integer i; // // Delay rcmb_final_* to match rd_wide_* // - reg rcmb_xy_valid_dly1_x = 1'b0; - reg rcmb_xy_valid_dly2_x = 1'b0; - reg rcmb_xy_valid_dly3_x = 1'b0; - reg rcmb_xy_valid_dly4_x = 1'b0; - - reg [BANK_ADDR_W -1:0] rcmb_xy_bank_dly1_x; - reg [BANK_ADDR_W -1:0] rcmb_xy_bank_dly2_x; - reg [BANK_ADDR_W -1:0] rcmb_xy_bank_dly3_x; - reg [BANK_ADDR_W -1:0] rcmb_xy_bank_dly4_x; - - reg [ OP_ADDR_W -1:0] rcmb_xy_addr_dly1_x; - reg [ OP_ADDR_W -1:0] rcmb_xy_addr_dly2_x; - reg [ OP_ADDR_W -1:0] rcmb_xy_addr_dly3_x; - reg [ OP_ADDR_W -1:0] rcmb_xy_addr_dly4_x; - - reg [ WORD_EXT_W -1:0] rcmb_x_dout_dly1_x; - reg [ WORD_EXT_W -1:0] rcmb_x_dout_dly2_x; - reg [ WORD_EXT_W -1:0] rcmb_x_dout_dly3_x; - reg [ WORD_EXT_W -1:0] rcmb_x_dout_dly4_x; - - reg [ WORD_EXT_W -1:0] rcmb_y_dout_dly1_x; - reg [ WORD_EXT_W -1:0] rcmb_y_dout_dly2_x; - reg [ WORD_EXT_W -1:0] rcmb_y_dout_dly3_x; - reg [ WORD_EXT_W -1:0] rcmb_y_dout_dly4_x; + reg rcmb_xy_valid_dly[1:6]; + reg [BANK_ADDR_W -1:0] rcmb_xy_bank_dly [1:6]; + reg [ OP_ADDR_W -1:0] rcmb_xy_addr_dly [1:6]; + reg [ WORD_EXT_W -1:0] rcmb_x_dout_dly [1:4]; + reg [ WORD_EXT_W -1:0] rcmb_y_dout_dly [1:4]; + + initial for (i=1; i<=6; i=i+1) rcmb_xy_valid_dly[i] = 1'b0; always @(posedge clk or negedge rst_n) // - if (!rst_n) {rcmb_xy_valid_dly4_x, rcmb_xy_valid_dly3_x, rcmb_xy_valid_dly2_x, rcmb_xy_valid_dly1_x} <= 4'b0000; - else {rcmb_xy_valid_dly4_x, rcmb_xy_valid_dly3_x, rcmb_xy_valid_dly2_x, rcmb_xy_valid_dly1_x} <= - {rcmb_xy_valid_dly3_x, rcmb_xy_valid_dly2_x, rcmb_xy_valid_dly1_x, rcmb_final_xy_valid } ; + if (!rst_n) for (i=1; i<=6; i=i+1) rcmb_xy_valid_dly[i] <= 1'b0; + else begin + rcmb_xy_valid_dly[1] <= rcmb_final_xy_valid; + for (i=2; i<=6; i=i+1) rcmb_xy_valid_dly[i] <= rcmb_xy_valid_dly[i-1]; + end always @(posedge clk) begin // - if (rcmb_final_xy_valid) {rcmb_xy_bank_dly1_x, rcmb_xy_addr_dly1_x, rcmb_x_dout_dly1_x, rcmb_y_dout_dly1_x} <= - {rcmb_final_xy_bank, 
rcmb_final_xy_addr, rcmb_final_x_din, rcmb_final_y_din } ; - if (rcmb_xy_valid_dly1_x) {rcmb_xy_bank_dly2_x, rcmb_xy_addr_dly2_x, rcmb_x_dout_dly2_x, rcmb_y_dout_dly2_x} <= - {rcmb_xy_bank_dly1_x, rcmb_xy_addr_dly1_x, rcmb_x_dout_dly1_x, rcmb_y_dout_dly1_x} ; - if (rcmb_xy_valid_dly2_x) {rcmb_xy_bank_dly3_x, rcmb_xy_addr_dly3_x, rcmb_x_dout_dly3_x, rcmb_y_dout_dly3_x} <= - {rcmb_xy_bank_dly2_x, rcmb_xy_addr_dly2_x, rcmb_x_dout_dly2_x, rcmb_y_dout_dly2_x} ; - if (rcmb_xy_valid_dly3_x) {rcmb_xy_bank_dly4_x, rcmb_xy_addr_dly4_x, rcmb_x_dout_dly4_x, rcmb_y_dout_dly4_x} <= - {rcmb_xy_bank_dly3_x, rcmb_xy_addr_dly3_x, rcmb_x_dout_dly3_x, rcmb_y_dout_dly3_x} ; + {rcmb_xy_bank_dly[1], rcmb_xy_addr_dly[1], rcmb_x_dout_dly[1], rcmb_y_dout_dly[1]} <= {rcmb_final_xy_bank, rcmb_final_xy_addr, rcmb_final_x_din, rcmb_final_y_din }; + for (i=2; i<=6; i=i+1) {rcmb_xy_bank_dly[i], rcmb_xy_addr_dly[i] } <= {rcmb_xy_bank_dly[i-1], rcmb_xy_addr_dly[i-1] }; + for (i=2; i<=4; i=i+1) { rcmb_x_dout_dly[i], rcmb_y_dout_dly[i]} <= { rcmb_x_dout_dly[i-1], rcmb_y_dout_dly[i-1]}; // end - + // - // LSB Carry Logic + // Internal Busy Flag Logic // - reg [ CARRY_W -1:0] rcmb_x_lsb_carry; - reg [ CARRY_W -1:0] rcmb_y_lsb_carry; - reg [ WORD_W -1:0] rcmb_x_lsb_dummy; - reg [ WORD_W -1:0] rcmb_y_lsb_dummy; - wire [WORD_EXT_W -1:0] rcmb_x_lsb_carry_ext = {WORD_ZERO, rcmb_x_lsb_carry}; - wire [WORD_EXT_W -1:0] rcmb_y_lsb_carry_ext = {WORD_ZERO, rcmb_y_lsb_carry}; + reg busy_next = 1'b0; + reg [4:0] busy_now_shreg = 5'b00000; + wire busy_now = busy_now_shreg[4]; - task calc_rcmb_xy_lsb_carry; - begin - {rcmb_x_lsb_carry, rcmb_x_lsb_dummy} <= rcmb_x_dout_dly4_x + rd_wide_x_din_aux_pipe + rcmb_x_lsb_carry_ext; - {rcmb_y_lsb_carry, rcmb_y_lsb_dummy} <= rcmb_y_dout_dly4_x + rd_wide_y_din_aux_pipe + rcmb_y_lsb_carry_ext; + always @(posedge clk or negedge rst_n) + // + if (!rst_n) busy_now_shreg <= 5'b00000; + else begin + if (rdy && ena) busy_now_shreg <= 5'b11111; + else busy_now_shreg <= 
{busy_now_shreg[3:0], busy_next}; end - endtask + always @(posedge clk or negedge rst_n) + // + if (!rst_n) busy_next <= 1'b0; + else begin + if (rdy && ena) busy_next <= 1'b1; + if (!rdy && rcmb_xy_valid_dly[4] && (rcmb_xy_bank_dly[4] == BANK_RCMB_EXT)) busy_next <= 1'b0; + end + // - // LSB Carry Computation + // Ready Flag Logic // - always @(posedge clk) + reg rdy_reg = 1'b1; + + assign rdy = rdy_reg; + + always @(posedge clk or negedge rst_n) // - if (ena) begin - // - rcmb_x_lsb_carry <= CARRY_ZERO; - rcmb_y_lsb_carry <= CARRY_ZERO; - // - end else if (rcmb_xy_valid_dly4_x) - // - case (rcmb_xy_bank_dly4_x) - BANK_RCMB_ML: calc_rcmb_xy_lsb_carry; - BANK_RCMB_MH: if (rcmb_xy_addr_dly4_x == OP_ADDR_ZERO) calc_rcmb_xy_lsb_carry; - endcase + if (!rst_n) rdy_reg <= 1'b1; + else begin + if (rdy && ena) rdy_reg <= 1'b0; + if (!rdy && !busy_now) rdy_reg <= 1'b1; + end + + + // + // Pipelined Flags + // + reg rcmb_xy_addr_dly3_is_zero; + reg rcmb_xy_addr_dly3_is_one; + reg rcmb_xy_addr_dly3_gt_one; + reg rcmb_xy_addr_dly5_is_one; + reg rcmb_xy_addr_dly5_gt_one; + reg rcmb_xy_addr_dly6_is_zero; + + always @(posedge clk) begin + rcmb_xy_addr_dly3_is_zero <= rcmb_xy_addr_dly[2] == OP_ADDR_ZERO; + rcmb_xy_addr_dly3_is_one <= rcmb_xy_addr_dly[2] == OP_ADDR_ONE; + rcmb_xy_addr_dly3_gt_one <= rcmb_xy_addr_dly[2] > OP_ADDR_ONE; + rcmb_xy_addr_dly5_is_one <= rcmb_xy_addr_dly[4] == OP_ADDR_ONE; + rcmb_xy_addr_dly5_gt_one <= rcmb_xy_addr_dly[4] > OP_ADDR_ONE; + rcmb_xy_addr_dly6_is_zero <= rcmb_xy_addr_dly[5] == OP_ADDR_ZERO; + end + - // - // MSB Sum Logic + // LSB Math // - wire [WORD_EXT_W -1:0] sum_rdct_x = rcmb_x_dout_dly4_x + rd_wide_x_din_aux_pipe; - wire [WORD_EXT_W -1:0] sum_rdct_y = rcmb_y_dout_dly4_x + rd_wide_y_din_aux_pipe; + reg lsb_ce = 1'b0; + reg lsb_ce_dly = 1'b0; + reg [DSP48E1_OPMODE_W -1:0] lsb_opmode; - wire [WORD_EXT_W -1:0] sum_rdct_x_carry = sum_rdct_x + rcmb_x_lsb_carry_ext; - wire [WORD_EXT_W -1:0] sum_rdct_y_carry = sum_rdct_y + rcmb_y_lsb_carry_ext; 
+ wire [DSP48E1_P_W -1:0] lsb_px; + wire [DSP48E1_P_W -1:0] lsb_py; + wire [DSP48E1_C_W -1:0] lsb_ax = {{(DSP48E1_C_W-(WORD_EXT_W+1)){1'b0}}, rcmb_x_dout_dly[4][WORD_EXT_W-1:WORD_W], 1'b1, rcmb_x_dout_dly[4][WORD_W-1:0]}; + wire [DSP48E1_C_W -1:0] lsb_ay = {{(DSP48E1_C_W-(WORD_EXT_W+1)){1'b0}}, rcmb_y_dout_dly[4][WORD_EXT_W-1:WORD_W], 1'b1, rcmb_y_dout_dly[4][WORD_W-1:0]}; + wire [DSP48E1_C_W -1:0] lsb_bx = {{(DSP48E1_C_W-(WORD_EXT_W+1)){1'b0}}, rd_wide_x_din_aux_pipe[WORD_EXT_W-1:WORD_W], 1'b0, rd_wide_x_din_aux_pipe[WORD_W-1:0]}; + wire [DSP48E1_C_W -1:0] lsb_by = {{(DSP48E1_C_W-(WORD_EXT_W+1)){1'b0}}, rd_wide_y_din_aux_pipe[WORD_EXT_W-1:WORD_W], 1'b0, rd_wide_y_din_aux_pipe[WORD_W-1:0]}; + + wire [DSP48E1_P_W -1:0] lsb2msb_px_casc; + wire [DSP48E1_P_W -1:0] lsb2msb_py_casc; + + `MODEXPNG_DSP_SLICE_ADDSUB dsp_lsb_x + ( + .clk (clk), + .ce_abc (lsb_ce), + .ce_p (lsb_ce_dly), + .ce_ctrl (lsb_ce), + .x (lsb_ax), + .y (lsb_bx), + .p (lsb_px), + .op_mode (lsb_opmode), + .alu_mode (DSP48E1_ALUMODE_Z_PLUS_X_AND_Y_AND_CIN), + .carry_in_sel (DSP48E1_CARRYINSEL_CARRYIN), + .casc_p_in (), + .casc_p_out (lsb2msb_px_casc), + .carry_out () + ); + `MODEXPNG_DSP_SLICE_ADDSUB dsp_lsb_y + ( + .clk (clk), + .ce_abc (lsb_ce), + .ce_p (lsb_ce_dly), + .ce_ctrl (lsb_ce), + .x (lsb_ay), + .y (lsb_by), + .p (lsb_py), + .op_mode (lsb_opmode), + .alu_mode (DSP48E1_ALUMODE_Z_PLUS_X_AND_Y_AND_CIN), + .carry_in_sel (DSP48E1_CARRYINSEL_CARRYIN), + .casc_p_in (), + .casc_p_out (lsb2msb_py_casc), + .carry_out () + ); - // - // MSB Sum Computation - // always @(posedge clk or negedge rst_n) // - if (!rst_n) begin - clear_rdct_wide; - clear_rdct_narrow; - end else begin - // - clear_rdct_wide; - clear_rdct_narrow; - // - if (rcmb_xy_valid_dly4_x) + if (!rst_n) lsb_ce <= 1'b0; + else begin + lsb_ce <= 1'b0; + if (rcmb_xy_valid_dly[3]) // - case (rcmb_xy_bank_dly4_x) - - BANK_RCMB_MH: - if (rcmb_xy_addr_dly4_x == OP_ADDR_ONE) begin - set_rdct_wide (sel_wide_out, OP_ADDR_ZERO, sum_rdct_x_carry, 
sum_rdct_y_carry); - set_rdct_narrow(sel_narrow_out, OP_ADDR_ZERO, sum_rdct_x_carry, sum_rdct_y_carry); - end else if (rcmb_xy_addr_dly4_x > OP_ADDR_ONE) begin - set_rdct_wide (sel_wide_out, rcmb_xy_addr_dly4_x - 1'b1, sum_rdct_x, sum_rdct_y); - set_rdct_narrow(sel_narrow_out, rcmb_xy_addr_dly4_x - 1'b1, sum_rdct_x, sum_rdct_y); - end - - BANK_RCMB_EXT: begin - set_rdct_wide (sel_wide_out, word_index_last, rcmb_x_dout_dly4_x, rcmb_y_dout_dly4_x); - set_rdct_narrow(sel_narrow_out, word_index_last, rcmb_x_dout_dly4_x, rcmb_y_dout_dly4_x); - end - + case (rcmb_xy_bank_dly[3]) + BANK_RCMB_ML: lsb_ce <= 1'b1; + BANK_RCMB_MH: if (rcmb_xy_addr_dly3_is_zero) lsb_ce <= 1'b1; endcase // end + always @(posedge clk) begin + // + //lsb_opmode <= DSP48E1_OPMODE_DNC; + // + if (rcmb_xy_valid_dly[3]) + // + case (rcmb_xy_bank_dly[3]) + BANK_RCMB_ML: if (rcmb_xy_addr_dly3_is_zero) lsb_opmode <= DSP48E1_OPMODE_Z0_YC_XAB; + else lsb_opmode <= DSP48E1_OPMODE_ZP17_YC_XAB; + BANK_RCMB_MH: if (rcmb_xy_addr_dly3_is_zero) lsb_opmode <= DSP48E1_OPMODE_ZP17_YC_XAB; + endcase + // + end + + always @(posedge clk or negedge rst_n) + // + if (!rst_n) lsb_ce_dly <= 1'b0; + else lsb_ce_dly <= lsb_ce; + // - // Internal Busy Flag Logic + // MSB Math // - reg busy_next = 1'b0; - reg [2:0] busy_now_shreg = 3'b000; - wire busy_now = busy_now_shreg[2]; + reg msb_ce = 1'b0; + reg msb_ce_dly1 = 1'b0; + reg msb_ce_dly2 = 1'b0; + reg [DSP48E1_OPMODE_W -1:0] msb_opmode; + + wire [DSP48E1_P_W -1:0] msb_px; + wire [DSP48E1_P_W -1:0] msb_py; + wire [DSP48E1_C_W -1:0] msb_ax = {{(DSP48E1_C_W-WORD_EXT_W){1'b0}}, rcmb_x_dout_dly[4][WORD_EXT_W-1:WORD_W], rcmb_x_dout_dly[4][WORD_W-1:0]}; + wire [DSP48E1_C_W -1:0] msb_ay = {{(DSP48E1_C_W-WORD_EXT_W){1'b0}}, rcmb_y_dout_dly[4][WORD_EXT_W-1:WORD_W], rcmb_y_dout_dly[4][WORD_W-1:0]}; + wire [DSP48E1_C_W -1:0] msb_bx = {{(DSP48E1_C_W-WORD_EXT_W){1'b0}}, rd_wide_x_din_aux_pipe[WORD_EXT_W-1:WORD_W], rd_wide_x_din_aux_pipe[WORD_W-1:0]}; + wire [DSP48E1_C_W -1:0] msb_by = 
{{(DSP48E1_C_W-WORD_EXT_W){1'b0}}, rd_wide_y_din_aux_pipe[WORD_EXT_W-1:WORD_W], rd_wide_y_din_aux_pipe[WORD_W-1:0]}; + + `MODEXPNG_DSP_SLICE_ADDSUB dsp_msb_x + ( + .clk (clk), + .ce_abc (msb_ce), + .ce_p (msb_ce_dly1), + .ce_ctrl (msb_ce), + .x (msb_ax), + .y (msb_bx), + .p (msb_px), + .op_mode (msb_opmode), + .alu_mode (DSP48E1_ALUMODE_Z_PLUS_X_AND_Y_AND_CIN), + .carry_in_sel (DSP48E1_CARRYINSEL_CARRYIN), + .casc_p_in (lsb2msb_px_casc), + .casc_p_out (), + .carry_out () + ); + + `MODEXPNG_DSP_SLICE_ADDSUB dsp_msb_y + ( + .clk (clk), + .ce_abc (msb_ce), + .ce_p (msb_ce_dly1), + .ce_ctrl (msb_ce), + .x (msb_ay), + .y (msb_by), + .p (msb_py), + .op_mode (msb_opmode), + .alu_mode (DSP48E1_ALUMODE_Z_PLUS_X_AND_Y_AND_CIN), + .carry_in_sel (DSP48E1_CARRYINSEL_CARRYIN), + .casc_p_in (lsb2msb_py_casc), + .casc_p_out (), + .carry_out () + ); + always @(posedge clk or negedge rst_n) // - if (!rst_n) busy_now_shreg <= 3'b000; + if (!rst_n) msb_ce <= 1'b0; else begin - if (rdy && ena) busy_now_shreg <= 3'b111; - else busy_now_shreg <= {busy_now_shreg[1:0], busy_next}; - end - + msb_ce <= 1'b0; + if (rcmb_xy_valid_dly[3]) + // + case (rcmb_xy_bank_dly[3]) + BANK_RCMB_MH: if (!rcmb_xy_addr_dly3_is_zero) msb_ce <= 1'b1; + BANK_RCMB_EXT: msb_ce <= 1'b1; + endcase + // + end + + always @(posedge clk) begin + // + msb_opmode <= DSP48E1_OPMODE_DNC; + // + if (rcmb_xy_valid_dly[3]) + // + case (rcmb_xy_bank_dly[3]) + BANK_RCMB_MH: if (rcmb_xy_addr_dly3_is_one) msb_opmode <= DSP48E1_OPMODE_ZPCIN17_YC_XAB; + else if (rcmb_xy_addr_dly3_gt_one) msb_opmode <= DSP48E1_OPMODE_Z0_YC_XAB; + BANK_RCMB_EXT: msb_opmode <= DSP48E1_OPMODE_Z0_Y0_XAB; + endcase + // + end + always @(posedge clk or negedge rst_n) // - if (!rst_n) busy_next <= 1'b0; - else begin - if (rdy && ena) busy_next <= 1'b1; - if (!rdy && rcmb_xy_valid_dly4_x && (rcmb_xy_bank_dly4_x == BANK_RCMB_EXT)) busy_next <= 1'b0; - end + if (!rst_n) {msb_ce_dly2, msb_ce_dly1} <= {2'b00}; + else {msb_ce_dly2, msb_ce_dly1} <= {msb_ce_dly1, 
msb_ce}; // - // Ready Flag Logic + // Output Logic // - reg rdy_reg = 1'b1; + reg [OP_ADDR_W -1:0] wide_xy_addr_next; + reg [OP_ADDR_W -1:0] narrow_xy_addr_next; - assign rdy = rdy_reg; + always @(posedge clk) + // + if (msb_ce_dly1) + // + case (rcmb_xy_bank_dly[5]) + + BANK_RCMB_MH: + if (rcmb_xy_addr_dly5_is_one) + {wide_xy_addr_next, narrow_xy_addr_next} <= {OP_ADDR_ZERO, OP_ADDR_ZERO}; + else if (rcmb_xy_addr_dly5_gt_one) + {wide_xy_addr_next, narrow_xy_addr_next} <= {rcmb_xy_addr_dly[5] - 1'b1, rcmb_xy_addr_dly[5] - 1'b1}; + + BANK_RCMB_EXT: + {wide_xy_addr_next, narrow_xy_addr_next} <= {word_index_last, word_index_last}; + + endcase always @(posedge clk or negedge rst_n) // - if (!rst_n) rdy_reg <= 1'b1; - else begin - if (rdy && ena) rdy_reg <= 1'b0; - if (!rdy && !busy_now) rdy_reg <= 1'b1; - end - + if (!rst_n) begin + clear_rdct_wide; + clear_rdct_narrow; + end else begin + // + clear_rdct_wide; + clear_rdct_narrow; + // + if (msb_ce_dly2) + // + case (rcmb_xy_bank_dly[6]) + // + BANK_RCMB_MH: if (!rcmb_xy_addr_dly6_is_zero) begin + set_rdct_wide (sel_wide_out, wide_xy_addr_next, msb_px[WORD_EXT_W-1:0], msb_py[WORD_EXT_W-1:0]); + set_rdct_narrow(sel_narrow_out, narrow_xy_addr_next, msb_px[WORD_EXT_W-1:0], msb_py[WORD_EXT_W-1:0]); + end + // + BANK_RCMB_EXT: begin + set_rdct_wide (sel_wide_out, wide_xy_addr_next, msb_px[WORD_EXT_W-1:0], msb_py[WORD_EXT_W-1:0]); + set_rdct_narrow(sel_narrow_out, narrow_xy_addr_next, msb_px[WORD_EXT_W-1:0], msb_py[WORD_EXT_W-1:0]); + end + // + endcase + // + end endmodule From git at cryptech.is Mon Jan 20 21:18:16 2020 From: git at cryptech.is (git at cryptech.is) Date: Mon, 20 Jan 2020 21:18:16 +0000 Subject: [Cryptech-Commits] [user/shatov/modexpng] 15/21: Renumbered micro-operations. 
In-Reply-To: <157955508198.72689.15683234965619703441@bikeshed.cryptech.is>
References: <157955508198.72689.15683234965619703441@bikeshed.cryptech.is>
Message-ID: <20200120211835.116518BC2FA@bikeshed.cryptech.is>

This is an automated email from the git hooks/post-receive script.

meisterpaul1 at yandex.ru pushed a commit to branch master in repository user/shatov/modexpng.

commit 04bc457e32f578f611d26753bc731122cb5dd87e
Author: Pavel V. Shatov (Meister)
AuthorDate: Tue Jan 21 00:06:25 2020 +0300

    Renumbered micro-operations.
---
 rtl/modexpng_microcode.vh | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/rtl/modexpng_microcode.vh b/rtl/modexpng_microcode.vh
index 15e4607..d117933 100644
--- a/rtl/modexpng_microcode.vh
+++ b/rtl/modexpng_microcode.vh
@@ -83,15 +83,18 @@ localparam [UOP_OPCODE_W -1:0] UOP_OPCODE_MODULAR_MULTIPLY = 5'd07;
  * AUX = AUX_2 forces B input to 1 (AUX_1 reads from source NARROW as usual)
  * LADDER specifies Montgomery ladder mode
  */
-localparam [UOP_OPCODE_W -1:0] UOP_OPCODE_MODULAR_SUBTRACT = 5'd08;
+
+localparam [UOP_OPCODE_W -1:0] UOP_OPCODE_MODULAR_SUBTRACT_X = 5'd08;
+localparam [UOP_OPCODE_W -1:0] UOP_OPCODE_MODULAR_SUBTRACT_Y = 5'd09;
+localparam [UOP_OPCODE_W -1:0] UOP_OPCODE_MODULAR_SUBTRACT_Z = 5'd10;
 /* CRT is don't care
  * NPQ specifies the width of the operand
  * AUX is don't care
  * LADDER is don't care
  */
-localparam [UOP_OPCODE_W -1:0] UOP_OPCODE_MODULAR_REDUCE_INIT = 5'd09;
-localparam [UOP_OPCODE_W -1:0] UOP_OPCODE_MODULAR_REDUCE_PROC = 5'd10;
+localparam [UOP_OPCODE_W -1:0] UOP_OPCODE_MODULAR_REDUCE_INIT = 5'd11;
+localparam [UOP_OPCODE_W -1:0] UOP_OPCODE_MODULAR_REDUCE_PROC = 5'd12;
 /* CRT
  * NPQ
  * AUX
@@ -99,7 +102,7 @@ localparam [UOP_OPCODE_W -1:0] UOP_OPCODE_MODULAR_REDUCE_PROC = 5'd10;
  */
-localparam [UOP_OPCODE_W -1:0] UOP_OPCODE_PROPAGATE_CARRIES = 5'd11;
+localparam [UOP_OPCODE_W -1:0] UOP_OPCODE_PROPAGATE_CARRIES = 5'd13;
 /* CRT is don't care
  * NPQ specifies the width of the operand
  * AUX is don't care
@@ -107,20 +110,20 @@ localparam [UOP_OPCODE_W -1:0] UOP_OPCODE_PROPAGATE_CARRIES = 5'd11;
  * source and destination WIDE are don't care
  */
-localparam [UOP_OPCODE_W -1:0] UOP_OPCODE_MERGE_LH = 5'd12;
+localparam [UOP_OPCODE_W -1:0] UOP_OPCODE_MERGE_LH = 5'd14;
 /* */
-localparam [UOP_OPCODE_W -1:0] UOP_OPCODE_REGULAR_MULTIPLY = 5'd13;
+localparam [UOP_OPCODE_W -1:0] UOP_OPCODE_REGULAR_MULTIPLY = 5'd15;
 /* */
-localparam [UOP_OPCODE_W -1:0] UOP_OPCODE_REGULAR_ADD_UNEVEN = 5'd14;
+localparam [UOP_OPCODE_W -1:0] UOP_OPCODE_REGULAR_ADD_UNEVEN = 5'd16;
 /* */
-localparam [UOP_OPCODE_W -1:0] UOP_OPCODE_LADDER_INIT = 5'd15;
-localparam [UOP_OPCODE_W -1:0] UOP_OPCODE_LADDER_STEP = 5'd16;
+localparam [UOP_OPCODE_W -1:0] UOP_OPCODE_LADDER_INIT = 5'd17;
+localparam [UOP_OPCODE_W -1:0] UOP_OPCODE_LADDER_STEP = 5'd18;
 /* CRT is don't care
  * NPQ is don't care
  * AUX is don't care

From git at cryptech.is Mon Jan 20 21:18:17 2020
From: git at cryptech.is (git at cryptech.is)
Date: Mon, 20 Jan 2020 21:18:17 +0000
Subject: [Cryptech-Commits] [user/shatov/modexpng] 16/21: The I/O manager has to work in sync with the general worker module. Made the necessary changes to make it work after the general worker update. Also moved debug simulation-time code into a separate file.
In-Reply-To: <157955508198.72689.15683234965619703441@bikeshed.cryptech.is>
References: <157955508198.72689.15683234965619703441@bikeshed.cryptech.is>
Message-ID: <20200120211840.2EEB68BC5E4@bikeshed.cryptech.is>

This is an automated email from the git hooks/post-receive script.

meisterpaul1 at yandex.ru pushed a commit to branch master in repository user/shatov/modexpng.

commit 76f89d63c8f06bb61ea7db42eddecc3498c5c73e
Author: Pavel V. Shatov (Meister)
AuthorDate: Tue Jan 21 00:07:36 2020 +0300

    The I/O manager has to work in sync with the general worker module. Made the necessary changes to make it work after the general worker update. Also moved debug simulation-time code into a separate file.
--- rtl/modexpng_io_manager.v | 364 ++++++++++++++++++++++++--------------- rtl/modexpng_io_manager_debug.vh | 41 +++++ 2 files changed, 266 insertions(+), 139 deletions(-) diff --git a/rtl/modexpng_io_manager.v b/rtl/modexpng_io_manager.v index a5cd1db..466f1ea 100644 --- a/rtl/modexpng_io_manager.v +++ b/rtl/modexpng_io_manager.v @@ -173,19 +173,26 @@ module modexpng_io_manager // // FSM Declaration // - localparam [2:0] IO_FSM_STATE_IDLE = 3'b000; - localparam [2:0] IO_FSM_STATE_LATENCY_PRE1 = 3'b001; - localparam [2:0] IO_FSM_STATE_LATENCY_PRE2 = 3'b010; - localparam [2:0] IO_FSM_STATE_BUSY = 3'b011; - localparam [2:0] IO_FSM_STATE_EXTRA = 3'b100; - localparam [2:0] IO_FSM_STATE_LATENCY_POST1 = 3'b101; - localparam [2:0] IO_FSM_STATE_LATENCY_POST2 = 3'b110; - localparam [2:0] IO_FSM_STATE_STOP = 3'b111; - - reg [2:0] io_fsm_state = IO_FSM_STATE_IDLE; - reg [2:0] io_fsm_state_next; - + localparam [3:0] IO_FSM_STATE_IDLE_X = 4'h0; + localparam [3:0] IO_FSM_STATE_LATENCY_PRE1_X = 4'h1; + localparam [3:0] IO_FSM_STATE_LATENCY_PRE2_X = 4'h2; + localparam [3:0] IO_FSM_STATE_LATENCY_PRE3_X = 4'h3; + localparam [3:0] IO_FSM_STATE_LATENCY_PRE4_X = 4'h4; + localparam [3:0] IO_FSM_STATE_BUSY1_X = 4'hA; + localparam [3:0] IO_FSM_STATE_BUSY2_X = 4'hB; + localparam [3:0] IO_FSM_STATE_EXTRA1_X = 4'hC; + localparam [3:0] IO_FSM_STATE_EXTRA2_X = 4'hD; + localparam [3:0] IO_FSM_STATE_LATENCY_POST1_X = 4'h5; + localparam [3:0] IO_FSM_STATE_LATENCY_POST2_X = 4'h6; + localparam [3:0] IO_FSM_STATE_LATENCY_POST3_X = 4'h7; + localparam [3:0] IO_FSM_STATE_LATENCY_POST4_X = 4'h8; + localparam [3:0] IO_FSM_STATE_STOP_X = 4'hF; + + reg [3:0] io_fsm_state = IO_FSM_STATE_IDLE_X; + reg [3:0] io_fsm_state_next; + wire [3:0] io_fsm_state_after_busy; + // // Control Signals // @@ -280,17 +287,32 @@ module modexpng_io_manager // Delays // reg [OP_ADDR_W -1:0] in_1_addr_op_dly1; - reg [OP_ADDR_W -1:0] in_1_addr_op_dly2; reg [OP_ADDR_W -1:0] in_2_addr_op_dly1; - reg [OP_ADDR_W -1:0] 
in_2_addr_op_dly2; - reg [OP_ADDR_W -1:0] dummy_addr_op_dly1; - reg [OP_ADDR_W -1:0] dummy_addr_op_dly2; + reg [OP_ADDR_W -1:0] dummy_addr_op_dly1; + + reg [WORD_W -1:0] io_in_1_din_dly1; + reg [WORD_W -1:0] io_in_2_din_dly1; + + reg [WORD_W -1:0] wrk_narrow_x_din_x_lsb_dly1; + reg [WORD_W -1:0] wrk_narrow_y_din_x_lsb_dly1; + reg [WORD_W -1:0] wrk_narrow_x_din_y_lsb_dly1; + reg [WORD_W -1:0] wrk_narrow_y_din_y_lsb_dly1; + always @(posedge clk) begin // - {in_1_addr_op_dly2, in_1_addr_op_dly1} <= {in_1_addr_op_dly1, in_1_addr_op}; - {in_2_addr_op_dly2, in_2_addr_op_dly1} <= {in_2_addr_op_dly1, in_2_addr_op}; - {dummy_addr_op_dly2, dummy_addr_op_dly1} <= {dummy_addr_op_dly1, dummy_addr_op}; + {in_1_addr_op_dly1} <= {in_1_addr_op}; + {in_2_addr_op_dly1} <= {in_2_addr_op}; + // + {io_in_1_din_dly1} <= {io_in_1_din}; + {io_in_2_din_dly1} <= {io_in_2_din}; + // + {dummy_addr_op_dly1} <= {dummy_addr_op}; + // + {wrk_narrow_x_din_x_lsb_dly1} <= {wrk_narrow_x_din_x_lsb}; + {wrk_narrow_y_din_x_lsb_dly1} <= {wrk_narrow_y_din_x_lsb}; + {wrk_narrow_x_din_y_lsb_dly1} <= {wrk_narrow_x_din_y_lsb}; + {wrk_narrow_y_din_y_lsb_dly1} <= {wrk_narrow_y_din_y_lsb}; // end @@ -318,12 +340,7 @@ module modexpng_io_manager wire sel_aux_is_1 = sel_aux == UOP_AUX_1; wire sel_aux_is_2 = sel_aux == UOP_AUX_2; - - wire in_1_addr_op_next_is_last; - wire in_2_addr_op_next_is_last; - wire in_2_addr_op_next_is_one; - wire dummy_addr_op_next_is_last; - + // // Ladder Init/Step Logic @@ -346,7 +363,7 @@ module modexpng_io_manager always @(posedge clk) // - if (io_fsm_state_next == IO_FSM_STATE_LATENCY_PRE1) begin + if (io_fsm_state_next == IO_FSM_STATE_LATENCY_PRE1_X) begin // if (opcode_is_ladder_init) begin ladder_index <= ladder_steps; @@ -366,40 +383,46 @@ module modexpng_io_manager // // Ladder Mux // - reg ladder_dpq_mux; + reg ladder_dpq_mux_dly1; + reg ladder_dpq_mux_dly2; + wire ladder_dpq_mux = ladder_dpq_mux_dly2; - always @(io_in_2_din, ladder_index_lsb) + always @(io_in_2_din_dly1, 
ladder_index_lsb) // case(ladder_index_lsb) - 4'b0000: ladder_dpq_mux = io_in_2_din[ 0]; - 4'b0001: ladder_dpq_mux = io_in_2_din[ 1]; - 4'b0010: ladder_dpq_mux = io_in_2_din[ 2]; - 4'b0011: ladder_dpq_mux = io_in_2_din[ 3]; - 4'b0100: ladder_dpq_mux = io_in_2_din[ 4]; - 4'b0101: ladder_dpq_mux = io_in_2_din[ 5]; - 4'b0110: ladder_dpq_mux = io_in_2_din[ 6]; - 4'b0111: ladder_dpq_mux = io_in_2_din[ 7]; - 4'b1000: ladder_dpq_mux = io_in_2_din[ 8]; - 4'b1001: ladder_dpq_mux = io_in_2_din[ 9]; - 4'b1010: ladder_dpq_mux = io_in_2_din[10]; - 4'b1011: ladder_dpq_mux = io_in_2_din[11]; - 4'b1100: ladder_dpq_mux = io_in_2_din[12]; - 4'b1101: ladder_dpq_mux = io_in_2_din[13]; - 4'b1110: ladder_dpq_mux = io_in_2_din[14]; - 4'b1111: ladder_dpq_mux = io_in_2_din[15]; + 4'b0000: ladder_dpq_mux_dly1 = io_in_2_din_dly1[ 0]; + 4'b0001: ladder_dpq_mux_dly1 = io_in_2_din_dly1[ 1]; + 4'b0010: ladder_dpq_mux_dly1 = io_in_2_din_dly1[ 2]; + 4'b0011: ladder_dpq_mux_dly1 = io_in_2_din_dly1[ 3]; + 4'b0100: ladder_dpq_mux_dly1 = io_in_2_din_dly1[ 4]; + 4'b0101: ladder_dpq_mux_dly1 = io_in_2_din_dly1[ 5]; + 4'b0110: ladder_dpq_mux_dly1 = io_in_2_din_dly1[ 6]; + 4'b0111: ladder_dpq_mux_dly1 = io_in_2_din_dly1[ 7]; + 4'b1000: ladder_dpq_mux_dly1 = io_in_2_din_dly1[ 8]; + 4'b1001: ladder_dpq_mux_dly1 = io_in_2_din_dly1[ 9]; + 4'b1010: ladder_dpq_mux_dly1 = io_in_2_din_dly1[10]; + 4'b1011: ladder_dpq_mux_dly1 = io_in_2_din_dly1[11]; + 4'b1100: ladder_dpq_mux_dly1 = io_in_2_din_dly1[12]; + 4'b1101: ladder_dpq_mux_dly1 = io_in_2_din_dly1[13]; + 4'b1110: ladder_dpq_mux_dly1 = io_in_2_din_dly1[14]; + 4'b1111: ladder_dpq_mux_dly1 = io_in_2_din_dly1[15]; endcase - + + always @(posedge clk) + // + ladder_dpq_mux_dly2 <= ladder_dpq_mux_dly1; + always @(posedge clk) // case (io_fsm_state) // - IO_FSM_STATE_BUSY: + IO_FSM_STATE_BUSY1_X: if (opcode_is_ladder) ladder_d_r <= ladder_dpq_mux; // - IO_FSM_STATE_LATENCY_POST1: + IO_FSM_STATE_LATENCY_POST1_X: if (opcode_is_ladder) ladder_p_r <= ladder_dpq_mux; // - 
IO_FSM_STATE_LATENCY_POST2: + IO_FSM_STATE_LATENCY_POST3_X: if (opcode_is_ladder) ladder_q_r <= ladder_dpq_mux; // endcase @@ -415,14 +438,14 @@ module modexpng_io_manager in_2_en <= 1'b0; end else case (io_fsm_state_next) // - IO_FSM_STATE_LATENCY_PRE1, - IO_FSM_STATE_LATENCY_PRE2, - IO_FSM_STATE_BUSY: begin + IO_FSM_STATE_LATENCY_PRE1_X, + IO_FSM_STATE_LATENCY_PRE3_X, + IO_FSM_STATE_BUSY1_X: begin in_1_en <= opcode_is_input && sel_aux_is_1; in_2_en <= (opcode_is_input && sel_aux_is_2) || opcode_is_ladder; end // - IO_FSM_STATE_EXTRA: begin + IO_FSM_STATE_EXTRA1_X: begin in_1_en <= opcode_is_input && sel_aux_is_1 && sel_in_needs_extra; in_2_en <= opcode_is_input && sel_aux_is_2 && sel_in_needs_extra; end @@ -434,6 +457,7 @@ module modexpng_io_manager // endcase + // // Destination Enable Logic // @@ -450,9 +474,9 @@ module modexpng_io_manager // end else case (io_fsm_state) // - IO_FSM_STATE_BUSY, - IO_FSM_STATE_EXTRA, - IO_FSM_STATE_LATENCY_POST1: begin + IO_FSM_STATE_BUSY1_X, + IO_FSM_STATE_EXTRA1_X, + IO_FSM_STATE_LATENCY_POST1_X: begin // wide_xy_ena_x <= opcode_is_input_wide && sel_crt_is_x; wide_xy_ena_y <= opcode_is_input_wide && sel_crt_is_y; @@ -463,7 +487,7 @@ module modexpng_io_manager // end // - IO_FSM_STATE_LATENCY_POST2: begin + IO_FSM_STATE_LATENCY_POST3_X: begin // wide_xy_ena_x <= 1'b0; wide_xy_ena_y <= 1'b0; @@ -491,10 +515,23 @@ module modexpng_io_manager // // Output Data Logic // - wire [WORD_EXT_W -1:0] io_in_dout_mux = {{(WORD_EXT_W-WORD_W){1'b0}}, sel_aux_is_1 ? io_in_1_din : io_in_2_din}; + reg [ WORD_W -1:0] io_in_dout_mux_dly2; + wire [WORD_EXT_W -1:0] io_in_dout_mux = {{(WORD_EXT_W-WORD_W){1'b0}}, io_in_dout_mux_dly2}; - wire [WORD_W -1:0] wrk_narrow_xy_din_x_mux_lsb = sel_aux == UOP_AUX_1 ? wrk_narrow_x_din_x_lsb : wrk_narrow_y_din_x_lsb; - wire [WORD_W -1:0] wrk_narrow_xy_din_y_mux_lsb = sel_aux == UOP_AUX_1 ? 
wrk_narrow_x_din_y_lsb : wrk_narrow_y_din_y_lsb; + reg [WORD_W -1:0] wrk_narrow_din_x_lsb_mux_dly2; + reg [WORD_W -1:0] wrk_narrow_din_y_lsb_mux_dly2; + + wire [WORD_W -1:0] wrk_narrow_din_x_lsb_mux = wrk_narrow_din_x_lsb_mux_dly2; + wire [WORD_W -1:0] wrk_narrow_din_y_lsb_mux = wrk_narrow_din_y_lsb_mux_dly2; + + always @(posedge clk) begin + // + io_in_dout_mux_dly2 <= sel_aux_is_1 ? io_in_1_din_dly1 : io_in_2_din_dly1; + // + wrk_narrow_din_x_lsb_mux_dly2 = sel_aux == UOP_AUX_1 ? wrk_narrow_x_din_x_lsb_dly1 : wrk_narrow_y_din_x_lsb_dly1; + wrk_narrow_din_y_lsb_mux_dly2 = sel_aux == UOP_AUX_1 ? wrk_narrow_x_din_y_lsb_dly1 : wrk_narrow_y_din_y_lsb_dly1; + // + end always @(posedge clk) begin // @@ -511,25 +548,25 @@ module modexpng_io_manager // case (io_fsm_state) // - IO_FSM_STATE_BUSY, - IO_FSM_STATE_EXTRA, - IO_FSM_STATE_LATENCY_POST1: begin + IO_FSM_STATE_BUSY1_X, + IO_FSM_STATE_EXTRA1_X, + IO_FSM_STATE_LATENCY_POST1_X: begin // if (opcode_is_input_wide && sel_crt_is_x) {wide_x_din_x, wide_y_din_x} <= {2{io_in_dout_mux}}; if (opcode_is_input_wide && sel_crt_is_y) {wide_x_din_y, wide_y_din_y} <= {2{io_in_dout_mux}}; if (opcode_is_input_narrow && sel_crt_is_x) {narrow_x_din_x, narrow_y_din_x} <= {2{io_in_dout_mux}}; if (opcode_is_input_narrow && sel_crt_is_y) {narrow_x_din_y, narrow_y_din_y} <= {2{io_in_dout_mux}}; // - if (opcode_is_output) out_dout <= sel_crt_is_x ? wrk_narrow_xy_din_x_mux_lsb : wrk_narrow_xy_din_y_mux_lsb; + if (opcode_is_output) out_dout <= sel_crt_is_x ? wrk_narrow_din_x_lsb_mux : wrk_narrow_din_y_lsb_mux; // end // - IO_FSM_STATE_LATENCY_POST2: begin + IO_FSM_STATE_LATENCY_POST3_X: begin // if (opcode_is_input_narrow && sel_crt_is_x && sel_in_needs_extra) {narrow_x_din_x, narrow_y_din_x} <= {2{io_in_dout_mux}}; if (opcode_is_input_narrow && sel_crt_is_y && sel_in_needs_extra) {narrow_x_din_y, narrow_y_din_y} <= {2{io_in_dout_mux}}; // - if (opcode_is_output) out_dout <= sel_crt_is_x ? 
wrk_narrow_xy_din_x_mux_lsb : wrk_narrow_xy_din_y_mux_lsb; + if (opcode_is_output) out_dout <= sel_crt_is_x ? wrk_narrow_din_x_lsb_mux : wrk_narrow_din_y_lsb_mux; // end // @@ -541,8 +578,26 @@ module modexpng_io_manager // // Destination Address Logic // - wire [OP_ADDR_W -1:0] in_addr_op_dly2_mux = - sel_aux_is_1 ? in_1_addr_op_dly2 : in_2_addr_op_dly2; + reg [OP_ADDR_W -1:0] in_addr_op_dly2_mux; + reg [OP_ADDR_W -1:0] in_addr_op_dly3_mux; + reg [OP_ADDR_W -1:0] in_addr_op_dly4_mux; + wire [OP_ADDR_W -1:0] in_addr_op_mux = in_addr_op_dly4_mux; + + reg [OP_ADDR_W -1:0] dummy_addr_op_dly2; + reg [OP_ADDR_W -1:0] dummy_addr_op_dly3; + reg [OP_ADDR_W -1:0] dummy_addr_op_dly4; + + always @(posedge clk) begin + // + in_addr_op_dly2_mux <= sel_aux_is_1 ? in_1_addr_op_dly1 : in_2_addr_op_dly1; + in_addr_op_dly3_mux <= in_addr_op_dly2_mux; + in_addr_op_dly4_mux <= in_addr_op_dly3_mux; + // + dummy_addr_op_dly2 <= dummy_addr_op_dly1; + dummy_addr_op_dly3 <= dummy_addr_op_dly2; + dummy_addr_op_dly4 <= dummy_addr_op_dly3; + // + end always @(posedge clk) begin // @@ -554,27 +609,27 @@ module modexpng_io_manager // case (io_fsm_state) // - IO_FSM_STATE_BUSY, - IO_FSM_STATE_EXTRA, - IO_FSM_STATE_LATENCY_POST1: begin - if (opcode_is_input_wide && sel_crt_is_x) {wide_xy_bank_x, wide_xy_addr_x } <= {sel_out, in_addr_op_dly2_mux}; - if (opcode_is_input_wide && sel_crt_is_y) {wide_xy_bank_y, wide_xy_addr_y } <= {sel_out, in_addr_op_dly2_mux}; - if (opcode_is_input_narrow && sel_crt_is_x) {narrow_xy_bank_x, narrow_xy_addr_x} <= {sel_out, in_addr_op_dly2_mux}; - if (opcode_is_input_narrow && sel_crt_is_y) {narrow_xy_bank_y, narrow_xy_addr_y} <= {sel_out, in_addr_op_dly2_mux}; - if (opcode_is_output ) {out_addr_bank, out_addr_op} <= {sel_out, dummy_addr_op_dly2}; + IO_FSM_STATE_BUSY1_X, + IO_FSM_STATE_EXTRA1_X, + IO_FSM_STATE_LATENCY_POST1_X: begin + if (opcode_is_input_wide && sel_crt_is_x) {wide_xy_bank_x, wide_xy_addr_x } <= {sel_out, in_addr_op_mux }; + if (opcode_is_input_wide && 
sel_crt_is_y) {wide_xy_bank_y, wide_xy_addr_y } <= {sel_out, in_addr_op_mux }; + if (opcode_is_input_narrow && sel_crt_is_x) {narrow_xy_bank_x, narrow_xy_addr_x} <= {sel_out, in_addr_op_mux }; + if (opcode_is_input_narrow && sel_crt_is_y) {narrow_xy_bank_y, narrow_xy_addr_y} <= {sel_out, in_addr_op_mux }; + if (opcode_is_output ) {out_addr_bank, out_addr_op} <= {sel_out, dummy_addr_op_dly4}; end // - IO_FSM_STATE_LATENCY_POST2: begin + IO_FSM_STATE_LATENCY_POST3_X: begin if (opcode_is_input_narrow && sel_crt_is_x && sel_in_needs_extra) {narrow_xy_bank_x, narrow_xy_addr_x} <= {BANK_NARROW_EXT, OP_ADDR_EXT_COEFF }; if (opcode_is_input_narrow && sel_crt_is_y && sel_in_needs_extra) {narrow_xy_bank_y, narrow_xy_addr_y} <= {BANK_NARROW_EXT, OP_ADDR_EXT_COEFF }; - if (opcode_is_output ) {out_addr_bank, out_addr_op } <= {sel_out, dummy_addr_op_dly2}; + if (opcode_is_output ) {out_addr_bank, out_addr_op } <= {sel_out, dummy_addr_op_dly4}; end // endcase // end - + // // Source Address Logic // @@ -586,42 +641,70 @@ module modexpng_io_manager wire [OP_ADDR_W -1:0] in_2_addr_op_next = in_2_addr_next[OP_ADDR_W -1:0]; wire [OP_ADDR_W -1:0] dummy_addr_op_next = dummy_addr_next; - assign in_1_addr_op_next_is_last = in_1_addr_op_next == word_index_last; - assign in_2_addr_op_next_is_last = in_2_addr_op_next == word_index_last; -// assign in_2_addr_op_next_is_one = in_2_addr_op_next == OP_ADDR_ONE; - assign dummy_addr_op_next_is_last = dummy_addr_op_next == word_index_last; - + reg in_1_addr_op_is_last = 1'b0; + reg in_2_addr_op_is_last = 1'b0; + reg dummy_addr_op_is_last = 1'b0; + always @(posedge clk) begin // {in_1_addr_bank, in_1_addr_op } <= {BANK_DNC, OP_ADDR_DNC}; {in_2_addr_bank, in_2_addr_op } <= {BANK_DNC, OP_ADDR_DNC}; { dummy_addr_op} <= { OP_ADDR_DNC}; // - in_1_addr_next <= {BANK_DNC, OP_ADDR_DNC}; - in_2_addr_next <= {BANK_DNC, OP_ADDR_DNC}; - dummy_addr_next <= { OP_ADDR_DNC}; - // case (io_fsm_state_next) // - IO_FSM_STATE_LATENCY_PRE1: begin + 
IO_FSM_STATE_LATENCY_PRE1_X: begin // {in_1_addr_bank, in_1_addr_op } <= {sel_in, OP_ADDR_ZERO}; if (!opcode_is_ladder) {in_2_addr_bank, in_2_addr_op } <= {sel_in, OP_ADDR_ZERO}; - else {in_2_addr_bank, in_2_addr_op } <= {BANK_DNC, OP_ADDR_DNC}; + else {in_2_addr_bank, in_2_addr_op } <= {BANK_DNC, OP_ADDR_DNC }; { dummy_addr_op} <= { OP_ADDR_ZERO}; - // - in_1_addr_next <= {sel_in, OP_ADDR_ONE}; - in_2_addr_next <= {sel_in, OP_ADDR_ONE}; - dummy_addr_next <= { OP_ADDR_ONE}; - // + // end // - IO_FSM_STATE_LATENCY_PRE2: begin + IO_FSM_STATE_LATENCY_PRE3_X: begin // {in_1_addr_bank, in_1_addr_op } <= in_1_addr_next; if (!opcode_is_ladder) {in_2_addr_bank, in_2_addr_op } <= in_2_addr_next; else {in_2_addr_bank, in_2_addr_op } <= {BANK_IN_2_D, ladder_index_msb}; { dummy_addr_op} <= dummy_addr_next; + // + end + // + IO_FSM_STATE_BUSY1_X: begin + // + {in_1_addr_bank, in_1_addr_op } <= in_1_addr_next; + {in_2_addr_bank, in_2_addr_op } <= in_2_addr_next; + { dummy_addr_op} <= dummy_addr_next; + // + end + // + IO_FSM_STATE_EXTRA1_X: + // + if (opcode_is_input && sel_in_needs_extra) begin + // + if (sel_aux_is_1) {in_1_addr_bank, in_1_addr_op} <= in_1_addr_next; + if (sel_aux_is_2) {in_2_addr_bank, in_2_addr_op} <= in_2_addr_next; + // + end + // + endcase + // + end + + always @(posedge clk) + // + case (io_fsm_state_next) + // + IO_FSM_STATE_LATENCY_PRE1_X: begin + // + in_1_addr_next <= {sel_in, OP_ADDR_ONE}; + in_2_addr_next <= {sel_in, OP_ADDR_ONE}; + dummy_addr_next <= { OP_ADDR_ONE}; + // + end + // + IO_FSM_STATE_LATENCY_PRE3_X: begin // in_1_addr_next <= in_1_addr_next + 1'b1; if (!opcode_is_ladder) in_2_addr_next <= in_2_addr_next + 1'b1; @@ -630,11 +713,7 @@ module modexpng_io_manager // end // - IO_FSM_STATE_BUSY: begin - // - {in_1_addr_bank, in_1_addr_op } <= in_1_addr_next; - {in_2_addr_bank, in_2_addr_op } <= in_2_addr_next; - { dummy_addr_op} <= dummy_addr_next; + IO_FSM_STATE_BUSY1_X: begin // in_1_addr_next <= in_1_addr_next + 1'b1; if 
(!opcode_is_ladder) in_2_addr_next <= in_2_addr_next + 1'b1; @@ -643,33 +722,42 @@ module modexpng_io_manager // end // - IO_FSM_STATE_EXTRA: + IO_FSM_STATE_EXTRA1_X: // if (opcode_is_input && sel_in_needs_extra) begin // - if (sel_aux_is_1) begin - {in_1_addr_bank, in_1_addr_op} <= in_1_addr_next; - in_1_addr_next <= in_1_addr_next + 1'b1; - end - // - if (sel_aux_is_2) begin - {in_2_addr_bank, in_2_addr_op} <= in_2_addr_next; - in_2_addr_next <= in_2_addr_next + 1'b1; - end + if (sel_aux_is_1) in_1_addr_next <= in_1_addr_next + 1'b1; + if (sel_aux_is_2) in_2_addr_next <= in_2_addr_next + 1'b1; // - end + end + // + endcase + + always @(posedge clk) begin + // + in_1_addr_op_is_last <= 1'b0; + in_2_addr_op_is_last <= 1'b0; + dummy_addr_op_is_last <= 1'b0; + // + case (io_fsm_state_next) + // + IO_FSM_STATE_BUSY1_X: begin + in_1_addr_op_is_last <= in_1_addr_op_next == word_index_last; + in_2_addr_op_is_last <= in_2_addr_op_next == word_index_last; + dummy_addr_op_is_last <= dummy_addr_op_next == word_index_last; + end // endcase // end - + // // FSM Process // always @(posedge clk or negedge rst_n) // - if (!rst_n) io_fsm_state <= IO_FSM_STATE_IDLE; + if (!rst_n) io_fsm_state <= IO_FSM_STATE_IDLE_X; else io_fsm_state <= io_fsm_state_next; @@ -682,13 +770,13 @@ module modexpng_io_manager // io_fsm_done <= 1'b0; // - if (io_fsm_state == IO_FSM_STATE_BUSY) begin + if (io_fsm_state == IO_FSM_STATE_BUSY1_X) begin // if (opcode_is_input) begin - if (sel_aux_is_1 && in_1_addr_op_next_is_last) io_fsm_done <= 1'b1; - if (sel_aux_is_2 && in_2_addr_op_next_is_last) io_fsm_done <= 1'b1; + if (sel_aux_is_1 && in_1_addr_op_is_last) io_fsm_done <= 1'b1; + if (sel_aux_is_2 && in_2_addr_op_is_last) io_fsm_done <= 1'b1; end else if (opcode_is_output || opcode_is_ladder) begin - if (dummy_addr_op_next_is_last) io_fsm_done <= 1'b1; + if (dummy_addr_op_is_last) io_fsm_done <= 1'b1; end // end @@ -699,19 +787,26 @@ module modexpng_io_manager // // FSM Transition Logic // - wire [2:0] 
io_fsm_state_after_busy = opcode_is_input ? IO_FSM_STATE_EXTRA : IO_FSM_STATE_LATENCY_POST1; + assign io_fsm_state_after_busy = opcode_is_input ? IO_FSM_STATE_EXTRA1_X : IO_FSM_STATE_LATENCY_POST1_X; always @* begin // case (io_fsm_state) - IO_FSM_STATE_IDLE: io_fsm_state_next = ena ? IO_FSM_STATE_LATENCY_PRE1 : IO_FSM_STATE_IDLE ; - IO_FSM_STATE_LATENCY_PRE1: io_fsm_state_next = IO_FSM_STATE_LATENCY_PRE2 ; - IO_FSM_STATE_LATENCY_PRE2: io_fsm_state_next = IO_FSM_STATE_BUSY ; - IO_FSM_STATE_BUSY: io_fsm_state_next = io_fsm_done ? io_fsm_state_after_busy : IO_FSM_STATE_BUSY ; - IO_FSM_STATE_EXTRA: io_fsm_state_next = IO_FSM_STATE_LATENCY_POST1 ; - IO_FSM_STATE_LATENCY_POST1: io_fsm_state_next = IO_FSM_STATE_LATENCY_POST2 ; - IO_FSM_STATE_LATENCY_POST2: io_fsm_state_next = IO_FSM_STATE_STOP ; - IO_FSM_STATE_STOP: io_fsm_state_next = IO_FSM_STATE_IDLE ; + IO_FSM_STATE_IDLE_X: io_fsm_state_next = ena ? IO_FSM_STATE_LATENCY_PRE1_X : IO_FSM_STATE_IDLE_X ; + IO_FSM_STATE_LATENCY_PRE1_X: io_fsm_state_next = IO_FSM_STATE_LATENCY_PRE2_X ; + IO_FSM_STATE_LATENCY_PRE2_X: io_fsm_state_next = IO_FSM_STATE_LATENCY_PRE3_X ; + IO_FSM_STATE_LATENCY_PRE3_X: io_fsm_state_next = IO_FSM_STATE_LATENCY_PRE4_X ; + IO_FSM_STATE_LATENCY_PRE4_X: io_fsm_state_next = IO_FSM_STATE_BUSY1_X ; + IO_FSM_STATE_BUSY1_X: io_fsm_state_next = IO_FSM_STATE_BUSY2_X ; + IO_FSM_STATE_BUSY2_X: io_fsm_state_next = io_fsm_done ? 
io_fsm_state_after_busy : IO_FSM_STATE_BUSY1_X; + IO_FSM_STATE_EXTRA1_X: io_fsm_state_next = IO_FSM_STATE_EXTRA2_X ; + IO_FSM_STATE_EXTRA2_X: io_fsm_state_next = IO_FSM_STATE_LATENCY_POST1_X ; + IO_FSM_STATE_LATENCY_POST1_X: io_fsm_state_next = IO_FSM_STATE_LATENCY_POST2_X ; + IO_FSM_STATE_LATENCY_POST2_X: io_fsm_state_next = IO_FSM_STATE_LATENCY_POST3_X ; + IO_FSM_STATE_LATENCY_POST3_X: io_fsm_state_next = IO_FSM_STATE_LATENCY_POST4_X ; + IO_FSM_STATE_LATENCY_POST4_X: io_fsm_state_next = IO_FSM_STATE_STOP_X ; + IO_FSM_STATE_STOP_X: io_fsm_state_next = IO_FSM_STATE_IDLE_X ; + default: io_fsm_state_next = IO_FSM_STATE_IDLE_X ; endcase // end @@ -728,25 +823,16 @@ module modexpng_io_manager // if (!rst_n) rdy_reg <= 1'b1; else case (io_fsm_state) - IO_FSM_STATE_IDLE: rdy_reg <= ~ena; - IO_FSM_STATE_STOP: rdy_reg <= 1'b1; + IO_FSM_STATE_IDLE_X: rdy_reg <= ~ena; + IO_FSM_STATE_STOP_X: rdy_reg <= 1'b1; endcase // - // BEGIN DEBUG + // Optional Debug Facility // `ifdef MODEXPNG_ENABLE_DEBUG - always @(posedge clk) - // - if (io_fsm_state == IO_FSM_STATE_STOP) begin - if (opcode_is_ladder_init) begin - $display("[step] | D | P | Q"); - $display("-------+---+---+---"); - end else if (opcode_is_ladder_step) - $display("[%4d] | %d | %d | %d", ladder_index, ladder_d_r, ladder_p_r, ladder_q_r); - end - // + `include "modexpng_io_manager_debug.vh" `endif diff --git a/rtl/modexpng_io_manager_debug.vh b/rtl/modexpng_io_manager_debug.vh new file mode 100644 index 0000000..593005b --- /dev/null +++ b/rtl/modexpng_io_manager_debug.vh @@ -0,0 +1,41 @@ +//====================================================================== +// +// Copyright (c) 2019, NORDUnet A/S All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions +// are met: +// - Redistributions of source code must retain the above copyright +// notice, this list of conditions and the following disclaimer. 
+// +// - Redistributions in binary form must reproduce the above copyright +// notice, this list of conditions and the following disclaimer in the +// documentation and/or other materials provided with the distribution. +// +// - Neither the name of the NORDUnet nor the names of its contributors may +// be used to endorse or promote products derived from this software +// without specific prior written permission. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS +// IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED +// TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A +// PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +// HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED +// TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR +// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF +// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING +// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS +// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +// +//====================================================================== + +always @(posedge clk) + // + if (io_fsm_state == IO_FSM_STATE_STOP_X) begin + if (opcode_is_ladder_init) begin + $display("[step] | D | P | Q"); + $display("-------+---+---+---"); + end else if (opcode_is_ladder_step) + $display("[%4d] | %d | %d | %d", ladder_index, ladder_d_r, ladder_p_r, ladder_q_r); + end From git at cryptech.is Mon Jan 20 21:18:18 2020 From: git at cryptech.is (git at cryptech.is) Date: Mon, 20 Jan 2020 21:18:18 +0000 Subject: [Cryptech-Commits] [user/shatov/modexpng] 17/21: Update DSP wrapper instance names. 
In-Reply-To: <157955508198.72689.15683234965619703441@bikeshed.cryptech.is> References: <157955508198.72689.15683234965619703441@bikeshed.cryptech.is> Message-ID: <20200120211843.A74AC8BC60B@bikeshed.cryptech.is> This is an automated email from the git hooks/post-receive script. meisterpaul1 at yandex.ru pushed a commit to branch master in repository user/shatov/modexpng. commit e68e11a1e47b09d4cca6fb6d5ecac0feb95b0ba8 Author: Pavel V. Shatov (Meister) AuthorDate: Tue Jan 21 00:10:03 2020 +0300 Update DSP wrapper instance names. --- rtl/modexpng_dsp_array_block.v | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/rtl/modexpng_dsp_array_block.v b/rtl/modexpng_dsp_array_block.v index 89110f0..daee814 100644 --- a/rtl/modexpng_dsp_array_block.v +++ b/rtl/modexpng_dsp_array_block.v @@ -40,7 +40,7 @@ module modexpng_dsp_array_block `include "modexpng_parameters.vh" `include "modexpng_dsp48e1.vh" - `include "modexpng_dsp_slice_primitive.vh" + `include "modexpng_dsp_slice_primitives.vh" input clk; @@ -77,7 +77,7 @@ module modexpng_dsp_array_block // begin : gen_DSP48E1 // - `MODEXPNG_DSP_SLICE # + `MODEXPNG_DSP_SLICE_MULT # ( .AB_INPUT("DIRECT"), .B_REG(2) @@ -109,7 +109,7 @@ module modexpng_dsp_array_block .casc_b_out (casc_b[z]) ); // - `MODEXPNG_DSP_SLICE # + `MODEXPNG_DSP_SLICE_MULT # ( .AB_INPUT("CASCADE"), .B_REG(1) @@ -145,7 +145,7 @@ module modexpng_dsp_array_block // endgenerate - `MODEXPNG_DSP_SLICE # + `MODEXPNG_DSP_SLICE_MULT # ( .AB_INPUT("DIRECT"), .B_REG(2) From git at cryptech.is Mon Jan 20 21:18:19 2020 From: git at cryptech.is (git at cryptech.is) Date: Mon, 20 Jan 2020 21:18:19 +0000 Subject: [Cryptech-Commits] [user/shatov/modexpng] 18/21: Refactored MMM recombinator module, accomodated the changes in DSP slice wrapper names. 
In-Reply-To: <157955508198.72689.15683234965619703441@bikeshed.cryptech.is> References: <157955508198.72689.15683234965619703441@bikeshed.cryptech.is> Message-ID: <20200120211845.3A15F8BC5E6@bikeshed.cryptech.is> This is an automated email from the git hooks/post-receive script. meisterpaul1 at yandex.ru pushed a commit to branch master in repository user/shatov/modexpng. commit c6029af4482192c3e25ef0f6561bfbefa76e75ec Author: Pavel V. Shatov (Meister) AuthorDate: Tue Jan 21 00:11:42 2020 +0300 Refactored MMM recombinator module, accomodated the changes in DSP slice wrapper names. --- rtl/modexpng_recombinator_block.v | 1244 ++++++++++++++++++------------------- rtl/modexpng_recombinator_cell.v | 94 ++- 2 files changed, 676 insertions(+), 662 deletions(-) diff --git a/rtl/modexpng_recombinator_block.v b/rtl/modexpng_recombinator_block.v index 077ae47..f6e23e5 100644 --- a/rtl/modexpng_recombinator_block.v +++ b/rtl/modexpng_recombinator_block.v @@ -53,58 +53,58 @@ module modexpng_recombinator_block `include "modexpng_mmm_dual_fsm.vh" - input clk; - input rst_n; - input ena; - output rdy; - input [MMM_FSM_STATE_W-1:0] fsm_state_next; - input [7:0] word_index_last; - input dsp_xy_ce_p; - input [9*47-1:0] dsp_x_p; - input [9*47-1:0] dsp_y_p; - input [ 4:0] col_index; - input [ 4:0] col_index_last; + input clk; + input rst_n; + input ena; + output rdy; + input [MMM_FSM_STATE_W -1:0] fsm_state_next; + input [ OP_ADDR_W -1:0] word_index_last; + input dsp_xy_ce_p; + input [ MAC_W * NUM_MULTS_AUX -1:0] dsp_x_p; + input [ MAC_W * NUM_MULTS_AUX -1:0] dsp_y_p; + input [ COL_INDEX_W -1:0] col_index; + input [ COL_INDEX_W -1:0] col_index_last; - input [ BANK_ADDR_W -1:0] rd_narrow_xy_bank; - input [ 7:0] rd_narrow_xy_addr; + input [ BANK_ADDR_W -1:0] rd_narrow_xy_bank; + input [ OP_ADDR_W -1:0] rd_narrow_xy_addr; - output [ BANK_ADDR_W -1:0] rcmb_wide_xy_bank; - output [ 7:0] rcmb_wide_xy_addr; - output [ 17:0] rcmb_wide_x_dout; - output [ 17:0] rcmb_wide_y_dout; - output 
rcmb_wide_xy_valid; + output [ BANK_ADDR_W -1:0] rcmb_wide_xy_bank; + output [ OP_ADDR_W -1:0] rcmb_wide_xy_addr; + output [ WORD_EXT_W -1:0] rcmb_wide_x_dout; + output [ WORD_EXT_W -1:0] rcmb_wide_y_dout; + output rcmb_wide_xy_valid; - output [ BANK_ADDR_W -1:0] rcmb_narrow_xy_bank; - output [ 7:0] rcmb_narrow_xy_addr; - output [ 17:0] rcmb_narrow_x_dout; - output [ 17:0] rcmb_narrow_y_dout; - output rcmb_narrow_xy_valid; + output [ BANK_ADDR_W -1:0] rcmb_narrow_xy_bank; + output [ OP_ADDR_W -1:0] rcmb_narrow_xy_addr; + output [ WORD_EXT_W -1:0] rcmb_narrow_x_dout; + output [ WORD_EXT_W -1:0] rcmb_narrow_y_dout; + output rcmb_narrow_xy_valid; - output [ BANK_ADDR_W -1:0] rdct_narrow_xy_bank; - output [ 7:0] rdct_narrow_xy_addr; - output [ 17:0] rdct_narrow_x_dout; - output [ 17:0] rdct_narrow_y_dout; - output rdct_narrow_xy_valid; + output [ BANK_ADDR_W -1:0] rdct_narrow_xy_bank; + output [ OP_ADDR_W -1:0] rdct_narrow_xy_addr; + output [ WORD_EXT_W -1:0] rdct_narrow_x_dout; + output [ WORD_EXT_W -1:0] rdct_narrow_y_dout; + output rdct_narrow_xy_valid; // // Latches // - reg [1*47-1:0] dsp_x_p_latch[0:8]; - reg [1*47-1:0] dsp_y_p_latch[0:8]; + reg [MAC_W-1:0] dsp_x_p_latch[0:NUM_MULTS_AUX-1]; + reg [MAC_W-1:0] dsp_y_p_latch[0:NUM_MULTS_AUX-1]; // // Mapping // - wire [46:0] dsp_x_p_split[0:8]; - wire [46:0] dsp_y_p_split[0:8]; + wire [MAC_W-1:0] dsp_x_p_split[0:NUM_MULTS_AUX-1]; + wire [MAC_W-1:0] dsp_y_p_split[0:NUM_MULTS_AUX-1]; genvar z; - generate for (z=0; z<(NUM_MULTS+1); z=z+1) + generate for (z=0; z {z[13:0], y[15:0], x[15:0]} + // + wire [WORD_W -3:0] din_z = din[3 * WORD_W -3 : 2 * WORD_W]; // [45:32] + wire [WORD_W -1:0] din_y = din[2 * WORD_W -1 : WORD_W]; // [31:16] + wire [WORD_W -1:0] din_x = din[ WORD_W -1 : 0]; // [15: 0] + + + // + // Delayed Clock Enable + // + reg ce_dly = 1'b0; + always @(posedge clk) ce_dly <= ce; + + + // + // DSP Slice Buses + // + wire [DSP48E1_A_W-1:0] a_int; + wire [DSP48E1_B_W-1:0] b_int; + wire [DSP48E1_C_W-1:0] c_int; 
+ wire [DSP48E1_P_W-1:0] p_int; - reg [WORD_W -2:0] z; - reg [WORD_W :0] y; - reg [WORD_W +1:0] x; - - assign dout = x[WORD_W-1:0]; + assign {a_int, b_int} = {{(DSP48E1_C_W-WORD_W){1'b0}}, cin}; + assign {c_int} = {din_z, 1'b0, din_y, 1'b1, din_x}; + - wire [WORD_W -2:0] din_z = din[3*WORD_W -2 :2*WORD_W]; // [46:32] - wire [WORD_W -1:0] din_y = din[2*WORD_W -1 : WORD_W]; // [31:16] - wire [WORD_W -1:0] din_x = din[ WORD_W -1 : 0]; // [15: 0] + // + // Combinational OPMODE Switch + // + reg [DSP48E1_OPMODE_W-1:0] opmode; - always @(posedge clk) + always @(clr, cry) // - if (ce) begin - z <= din_z; - y <= clr ? {1'b0, din_y} : {1'b0, din_y} + {2'b00, z}; - x <= clr ? {2'b00, din_x} : {2'b00, din_x} + {1'b0, y} + {WORD_ZERO, x[WORD_EXT_W-1:WORD_W]}; - end + casez ({clr, cry}) // clr has priority over cry! + 2'b1?: opmode = DSP48E1_OPMODE_Z0_YC_X0; + 2'b00: opmode = DSP48E1_OPMODE_ZP17_YC_X0; + 2'b01: opmode = DSP48E1_OPMODE_ZP17_YC_XAB; + endcase + + + // + // DSP Slice Instance + // + `MODEXPNG_DSP_SLICE_ADDSUB dsp_inst + ( + .clk (clk), + .ce_abc (ce), + .ce_p (ce_dly), + .ce_ctrl (ce), + .x ({a_int, b_int}), + .y (c_int), + .p (p_int), + .op_mode (opmode), + .alu_mode (DSP48E1_ALUMODE_Z_PLUS_X_AND_Y_AND_CIN), + .carry_in_sel (DSP48E1_CARRYINSEL_CARRYIN), + .casc_p_in (), + .casc_p_out (), + .carry_out () + ); + + + // + // Output Mapping + // + assign dout = {p_int[WORD_W-1:0]}; + assign dout_ext = {p_int[WORD_W+1], dout}; + endmodule From git at cryptech.is Mon Jan 20 21:18:20 2020 From: git at cryptech.is (git at cryptech.is) Date: Mon, 20 Jan 2020 21:18:20 +0000 Subject: [Cryptech-Commits] [user/shatov/modexpng] 19/21: Refactored the MMM module, now uses meaningful constant names from the include file, not hardcoded widths. 
In-Reply-To: <157955508198.72689.15683234965619703441@bikeshed.cryptech.is> References: <157955508198.72689.15683234965619703441@bikeshed.cryptech.is> Message-ID: <20200120211852.B4AAA8BAF17@bikeshed.cryptech.is> This is an automated email from the git hooks/post-receive script. meisterpaul1 at yandex.ru pushed a commit to branch master in repository user/shatov/modexpng. commit b0cbd33df04f024a7dea928756f4937b79c91631 Author: Pavel V. Shatov (Meister) AuthorDate: Tue Jan 21 00:13:33 2020 +0300 Refactored the MMM module, now uses meaningful constant names from the include file, not hardcoded widths. --- rtl/modexpng_mmm_dual.v | 733 +++++++++++++++++++++++++----------------------- 1 file changed, 380 insertions(+), 353 deletions(-) diff --git a/rtl/modexpng_mmm_dual.v b/rtl/modexpng_mmm_dual.v index 8d8b83d..bb1a55c 100644 --- a/rtl/modexpng_mmm_dual.v +++ b/rtl/modexpng_mmm_dual.v @@ -94,72 +94,73 @@ module modexpng_mmm_dual // // Ports // - input clk; - input rst_n; + input clk; + input rst_n; - input ena; - output rdy; + input ena; + output rdy; - input ladder_mode; - input [7:0] word_index_last; - input [7:0] word_index_last_minus1; - input force_unity_b; - input only_reduce; - input just_multiply; + input ladder_mode; + input [ OP_ADDR_W -1:0] word_index_last; + input [ OP_ADDR_W -1:0] word_index_last_minus1; + input force_unity_b; + input only_reduce; + input just_multiply; - input [BANK_ADDR_W-1:0] sel_wide_in; - input [BANK_ADDR_W-1:0] sel_narrow_in; + input [BANK_ADDR_W -1:0] sel_wide_in; + input [BANK_ADDR_W -1:0] sel_narrow_in; - output rd_wide_xy_ena; - output rd_wide_xy_ena_aux; - output [ BANK_ADDR_W -1:0] rd_wide_xy_bank; - output [ BANK_ADDR_W -1:0] rd_wide_xy_bank_aux; - output [ 8*NUM_MULTS/2-1:0] rd_wide_xy_addr; - output [ 8-1:0] rd_wide_xy_addr_aux; - input [18*NUM_MULTS/2-1:0] rd_wide_x_din; - input [18*NUM_MULTS/2-1:0] rd_wide_y_din; - input [ 18-1:0] rd_wide_x_din_aux; - input [ 18-1:0] rd_wide_y_din_aux; + output rd_wide_xy_ena; + output 
rd_wide_xy_ena_aux; + output [BANK_ADDR_W -1:0] rd_wide_xy_bank; + output [BANK_ADDR_W -1:0] rd_wide_xy_bank_aux; + + output [ OP_ADDR_W * NUM_MULTS_HALF -1:0] rd_wide_xy_addr; + output [ OP_ADDR_W -1:0] rd_wide_xy_addr_aux; + input [ WORD_EXT_W * NUM_MULTS_HALF -1:0] rd_wide_x_din; + input [ WORD_EXT_W * NUM_MULTS_HALF -1:0] rd_wide_y_din; + input [ WORD_EXT_W -1:0] rd_wide_x_din_aux; + input [ WORD_EXT_W -1:0] rd_wide_y_din_aux; - output rd_narrow_xy_ena; - output [ BANK_ADDR_W -1:0] rd_narrow_xy_bank; - output [ 7:0] rd_narrow_xy_addr; - input [18-1:0] rd_narrow_x_din; - input [18-1:0] rd_narrow_y_din; + output rd_narrow_xy_ena; + output [BANK_ADDR_W -1:0] rd_narrow_xy_bank; + output [ OP_ADDR_W -1:0] rd_narrow_xy_addr; + input [ WORD_EXT_W -1:0] rd_narrow_x_din; + input [ WORD_EXT_W -1:0] rd_narrow_y_din; - output [BANK_ADDR_W -1:0] rcmb_wide_xy_bank; - output [ 7:0] rcmb_wide_xy_addr; - output [17:0] rcmb_wide_x_dout; - output [17:0] rcmb_wide_y_dout; - output rcmb_wide_xy_valid; + output [BANK_ADDR_W -1:0] rcmb_wide_xy_bank; + output [ OP_ADDR_W -1:0] rcmb_wide_xy_addr; + output [ WORD_EXT_W -1:0] rcmb_wide_x_dout; + output [ WORD_EXT_W -1:0] rcmb_wide_y_dout; + output rcmb_wide_xy_valid; - output [BANK_ADDR_W -1:0] rcmb_narrow_xy_bank; - output [ 7:0] rcmb_narrow_xy_addr; - output [17:0] rcmb_narrow_x_dout; - output [17:0] rcmb_narrow_y_dout; - output rcmb_narrow_xy_valid; + output [BANK_ADDR_W -1:0] rcmb_narrow_xy_bank; + output [ OP_ADDR_W -1:0] rcmb_narrow_xy_addr; + output [ WORD_EXT_W -1:0] rcmb_narrow_x_dout; + output [ WORD_EXT_W -1:0] rcmb_narrow_y_dout; + output rcmb_narrow_xy_valid; - output [BANK_ADDR_W -1:0] rcmb_xy_bank; - output [ 7:0] rcmb_xy_addr; - output [17:0] rcmb_x_dout; - output [17:0] rcmb_y_dout; - output rcmb_xy_valid; + output [BANK_ADDR_W -1:0] rcmb_xy_bank; + output [ OP_ADDR_W -1:0] rcmb_xy_addr; + output [ WORD_EXT_W -1:0] rcmb_x_dout; + output [ WORD_EXT_W -1:0] rcmb_y_dout; + output rcmb_xy_valid; - output rdct_ena; - input 
rdct_rdy; + output rdct_ena; + input rdct_rdy; // // FSM Declaration // - reg [MMM_FSM_STATE_W-1:0] fsm_state = MMM_FSM_STATE_IDLE; - reg [MMM_FSM_STATE_W-1:0] fsm_state_next; + reg [MMM_FSM_STATE_W -1:0] fsm_state = MMM_FSM_STATE_IDLE; + reg [MMM_FSM_STATE_W -1:0] fsm_state_next; - wire [MMM_FSM_STATE_W-1:0] fsm_state_after_idle; - wire [MMM_FSM_STATE_W-1:0] fsm_state_after_mult_square; - wire [MMM_FSM_STATE_W-1:0] fsm_state_after_mult_triangle; - wire [MMM_FSM_STATE_W-1:0] fsm_state_after_mult_rectangle; - wire [MMM_FSM_STATE_W-1:0] fsm_state_after_square_holdoff; + wire [MMM_FSM_STATE_W -1:0] fsm_state_after_idle; + wire [MMM_FSM_STATE_W -1:0] fsm_state_after_mult_square; + wire [MMM_FSM_STATE_W -1:0] fsm_state_after_mult_triangle; + wire [MMM_FSM_STATE_W -1:0] fsm_state_after_mult_rectangle; + wire [MMM_FSM_STATE_W -1:0] fsm_state_after_square_holdoff; // @@ -174,48 +175,55 @@ module modexpng_mmm_dual // // Storage Control Interface // - reg wide_xy_ena = 1'b0; - reg wide_xy_ena_aux = 1'b0; - reg [ BANK_ADDR_W -1:0] wide_xy_bank; - reg [ BANK_ADDR_W -1:0] wide_xy_bank_aux; - reg [ 8-1:0] wide_xy_addr[0:3]; - reg [ 8-1:0] wide_xy_addr_aux; + reg wide_xy_ena = 1'b0; + reg wide_xy_ena_aux = 1'b0; + reg [BANK_ADDR_W -1:0] wide_xy_bank; + reg [BANK_ADDR_W -1:0] wide_xy_bank_aux; + reg [ OP_ADDR_W -1:0] wide_xy_addr[0:NUM_MULTS_HALF-1]; + reg [ OP_ADDR_W -1:0] wide_xy_addr_aux; + + reg narrow_xy_ena = 1'b0; + reg [BANK_ADDR_W -1:0] narrow_xy_bank; + reg [ OP_ADDR_W -1:0] narrow_xy_addr; + reg [ OP_ADDR_W -1:0] narrow_xy_addr_dly; + wire [ OP_ADDR_W -1:0] narrow_xy_addr_inc = narrow_xy_addr + 1'b1; - reg narrow_xy_ena = 1'b0; - reg [ BANK_ADDR_W -1:0] narrow_xy_bank; - reg [ 7:0] narrow_xy_addr; - reg [ 7:0] narrow_xy_addr_dly; - assign rd_wide_xy_ena = wide_xy_ena; + // + // Outmap Port Mapping + // + assign rd_wide_xy_ena = wide_xy_ena; assign rd_wide_xy_ena_aux = wide_xy_ena_aux; - assign rd_wide_xy_bank = wide_xy_bank; + assign rd_wide_xy_bank = wide_xy_bank; 
assign rd_wide_xy_bank_aux = wide_xy_bank_aux; assign rd_wide_xy_addr_aux = wide_xy_addr_aux; - assign rd_narrow_xy_ena = narrow_xy_ena; - assign rd_narrow_xy_bank = narrow_xy_bank; - assign rd_narrow_xy_addr = narrow_xy_addr; + assign rd_narrow_xy_ena = narrow_xy_ena; + assign rd_narrow_xy_bank = narrow_xy_bank; + assign rd_narrow_xy_addr = narrow_xy_addr; genvar z; - generate for (z=0; z<(NUM_MULTS/2); z=z+1) + generate for (z=0; z 8'd0) - wide_xy_addr_next = wide_xy_addr_current - 1'b1; - else - wide_xy_addr_next = wide_xy_addr_last; - end + function [OP_ADDR_W-1:0] wide_xy_addr_next; + input [OP_ADDR_W-1:0] wide_xy_addr_current; + input [OP_ADDR_W-1:0] wide_xy_addr_last; + if (wide_xy_addr_current > OP_ADDR_ZERO) wide_xy_addr_next = wide_xy_addr_current - 1'b1; + else wide_xy_addr_next = wide_xy_addr_last; endfunction integer j; @@ -459,128 +460,143 @@ module modexpng_mmm_dual // // Wide Address // - for (j=0; j<(NUM_MULTS/2); j=j+1) + for (j=0; j References: <157955508198.72689.15683234965619703441@bikeshed.cryptech.is> Message-ID: <20200120211853.2DE848BAF0F@bikeshed.cryptech.is> This is an automated email from the git hooks/post-receive script. meisterpaul1 at yandex.ru pushed a commit to branch master in repository user/shatov/modexpng. commit c4bee71625c4fc9f15fdd8c6ca6de98fb6131bab Author: Pavel V. Shatov (Meister) AuthorDate: Tue Jan 21 00:16:39 2020 +0300 Cosmetic change to easily switch tests on/off. 
--- bench/tb_core_full_512.v | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/bench/tb_core_full_512.v b/bench/tb_core_full_512.v index c3a62ab..f17b56c 100644 --- a/bench/tb_core_full_512.v +++ b/bench/tb_core_full_512.v @@ -273,7 +273,7 @@ module tb_core_full_512; sync_clk_bus; // switch to slow bus clock core_set_input; // write to core input banks - + /*//*/ sync_clk; // switch to fast core clock core_set_crt_mode(1); // enable CRT signing core_pulse_next; // assert 'next' bit for one cycle @@ -282,7 +282,7 @@ module tb_core_full_512; sync_clk_bus; // switch to slow bus clock core_get_output; // read from core output banks core_verify_output; // check, whether core output matches precomputed known good refrence values - + /*//*/ sync_clk; // switch to fast core clock core_set_crt_mode(0); // disable CRT signing core_pulse_next; // assert 'next' bit for one cycle @@ -291,6 +291,7 @@ module tb_core_full_512; sync_clk_bus; // switch to slow bus clock core_get_output; // read from core output banks core_verify_output; // check, whether core output matches precomputed known good refrence values + /*//*/ end endtask @@ -458,7 +459,6 @@ module tb_core_full_512; // endtask - // // _bus_drive() From git at cryptech.is Mon Jan 20 21:18:22 2020 From: git at cryptech.is (git at cryptech.is) Date: Mon, 20 Jan 2020 21:18:22 +0000 Subject: [Cryptech-Commits] [user/shatov/modexpng] 21/21: Bump version number. In-Reply-To: <157955508198.72689.15683234965619703441@bikeshed.cryptech.is> References: <157955508198.72689.15683234965619703441@bikeshed.cryptech.is> Message-ID: <20200120211856.C7B858BC24E@bikeshed.cryptech.is> This is an automated email from the git hooks/post-receive script. meisterpaul1 at yandex.ru pushed a commit to branch master in repository user/shatov/modexpng. commit 97d5b79954e5b8cab118c6a469b38835450e73a8 Author: Pavel V. Shatov (Meister) AuthorDate: Tue Jan 21 00:17:22 2020 +0300 Bump version number. 
--- rtl/modexpng_wrapper.v | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/rtl/modexpng_wrapper.v b/rtl/modexpng_wrapper.v index 0af6c32..53e3ede 100644 --- a/rtl/modexpng_wrapper.v +++ b/rtl/modexpng_wrapper.v @@ -95,9 +95,9 @@ module modexpng_wrapper // // Default Values // - `define MODEXPNG_DEFAULT_NAME0 32'h6D6F6465 - `define MODEXPNG_DEFAULT_NAME1 32'h78706E67 - `define MODEXPNG_DEFAULT_VERSION 32'h302E3130 + `define MODEXPNG_DEFAULT_NAME0 32'h6D6F6465 // "mode" + `define MODEXPNG_DEFAULT_NAME1 32'h78706E67 // "xpng" + `define MODEXPNG_DEFAULT_VERSION 32'h302E3230 // "0.20" `define MODEXPNG_DEFAULT_CONTROL 1'b0 `define MODEXPNG_DEFAULT_MODE 1'b0 @@ -118,9 +118,9 @@ module modexpng_wrapper // // Register Values // - localparam CORE_NAME0 = `MODEXPNG_DEFAULT_NAME0; // "mode" - localparam CORE_NAME1 = `MODEXPNG_DEFAULT_NAME1; // "xpng" - localparam CORE_VERSION = `MODEXPNG_DEFAULT_VERSION; // "0.10" + localparam CORE_NAME0 = `MODEXPNG_DEFAULT_NAME0; + localparam CORE_NAME1 = `MODEXPNG_DEFAULT_NAME1; + localparam CORE_VERSION = `MODEXPNG_DEFAULT_VERSION; // From git at cryptech.is Tue Jan 21 08:42:47 2020 From: git at cryptech.is (git at cryptech.is) Date: Tue, 21 Jan 2020 08:42:47 +0000 Subject: [Cryptech-Commits] [sw/stm32] branch fmc_clk updated (4ac1beb -> 723766c) Message-ID: <157959616715.90321.5362674346714113817@bikeshed.cryptech.is> This is an automated email from the git hooks/post-receive script. meisterpaul1 at yandex.ru pushed a change to branch fmc_clk in repository sw/stm32. from 4ac1beb Rebase branch 'fmc_clk' of git.cryptech.is:sw/stm32 from master new fbce363 Updated FMC initialization code to match changes in FMC arbiter. new 723766c New FMC settings for STM32. The 2 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. 
Summary of changes: stm-fmc.c | 51 +++++++++++++++++++++++++++------------------------ 1 file changed, 27 insertions(+), 24 deletions(-) From git at cryptech.is Tue Jan 21 08:42:48 2020 From: git at cryptech.is (git at cryptech.is) Date: Tue, 21 Jan 2020 08:42:48 +0000 Subject: [Cryptech-Commits] [sw/stm32] 01/02: Updated FMC initialization code to match changes in FMC arbiter. In-Reply-To: <157959616715.90321.5362674346714113817@bikeshed.cryptech.is> References: <157959616715.90321.5362674346714113817@bikeshed.cryptech.is> Message-ID: <20200121084247.AAFBB8BAF0F@bikeshed.cryptech.is> This is an automated email from the git hooks/post-receive script. meisterpaul1 at yandex.ru pushed a commit to branch fmc_clk in repository sw/stm32. commit fbce36388c455b9633a25eb4ce872940ae6f72fa Author: Pavel V. Shatov (Meister) AuthorDate: Thu Jan 31 21:36:40 2019 +0300 Updated FMC initialization code to match changes in FMC arbiter. --- stm-fmc.c | 47 +++++++++++++++++++++++++---------------------- 1 file changed, 25 insertions(+), 22 deletions(-) diff --git a/stm-fmc.c b/stm-fmc.c index 58df6fe..62b6433 100644 --- a/stm-fmc.c +++ b/stm-fmc.c @@ -62,7 +62,7 @@ void fmc_init(void) */ GPIO_InitStruct.Pin = GPIO_PIN_6; GPIO_InitStruct.Mode = GPIO_MODE_INPUT; - GPIO_InitStruct.Pull = GPIO_PULLUP; + GPIO_InitStruct.Pull = GPIO_PULLDOWN; HAL_GPIO_Init(GPIOD, &GPIO_InitStruct); fmc_af_gpio(GPIOE, GPIO_PIN_2 @@ -152,10 +152,10 @@ void fmc_init(void) // not needed, since nwait will be polled manually fmc_timing.BusTurnAroundDuration = 0; - // use smallest allowed divisor for best performance - fmc_timing.CLKDivision = 2; + // use 45 MHz to match what FMC arbiter in the FPGA expects + fmc_timing.CLKDivision = 4; - // use min suitable for fastest transfer + // use 4 to match what FMC arbiter in the FPGA expects fmc_timing.DataLatency = 4; // don't care in sync mode @@ -164,22 +164,25 @@ void fmc_init(void) // initialize fmc HAL_SRAM_Init(&_fmc_fpga_inst, &fmc_timing, NULL); - // STM32 only 
enables FMC clock right before the very first read/write - // access. FPGA takes certain time (<= 100 us) to lock its PLL to this frequency, - // so a certain number of initial FMC transactions may be missed. One read transaction - // takes ~0.1 us (9 ticks @ 90 MHz), so doing 1000 dummy reads will make sure, that FPGA - // has already locked its PLL and is ready. Another way around is to repeatedly read - // some register that is guaranteed to have known value until reading starts returning - // correct data. - - // to prevent compiler from optimizing this away, we pretent we're calculating sum - int cyc; - uint32_t sum; - volatile uint32_t part; - - for (cyc=0; cyc<1000; cyc++) - { - part = *(__IO uint32_t *)FMC_FPGA_BASE_ADDR; - sum += part; - } + // STM32 only enables FMC clock right before the very first read/write + // access. FPGA takes certain time (<= 100 us) to lock its PLL to this frequency, + // so a certain number of initial FMC transactions may be missed. One read transaction + // takes ~0.22 us (10 ticks @ 45 MHz), so doing ~500 dummy reads will make sure, that FPGA + // has already locked its PLL and is ready. Another way around is to repeatedly read + // some register that is guaranteed to have known value until reading starts returning + // correct data. + + // to prevent compiler from optimizing this away, we pretent we're calculating sum + int cyc; + uint32_t sum = 0; + volatile uint32_t part; + + for (cyc=0; cyc<500; cyc++) + { + part = *(__IO uint32_t *)FMC_FPGA_BASE_ADDR; + sum += part; + } + + // dummy write to read-only address to not let the compiler remove the above loop + *(__IO uint32_t *)FMC_FPGA_BASE_ADDR = sum; } From git at cryptech.is Tue Jan 21 08:42:49 2020 From: git at cryptech.is (git at cryptech.is) Date: Tue, 21 Jan 2020 08:42:49 +0000 Subject: [Cryptech-Commits] [sw/stm32] 02/02: New FMC settings for STM32. 
In-Reply-To: <157959616715.90321.5362674346714113817@bikeshed.cryptech.is> References: <157959616715.90321.5362674346714113817@bikeshed.cryptech.is> Message-ID: <20200121084248.4B24C8BAF10@bikeshed.cryptech.is> This is an automated email from the git hooks/post-receive script. meisterpaul1 at yandex.ru pushed a commit to branch fmc_clk in repository sw/stm32. commit 723766c08127d3f489d90ea94bc090d9481bbebb Author: Pavel V. Shatov (Meister) AuthorDate: Tue Jan 21 11:41:07 2020 +0300 New FMC settings for STM32. --- stm-fmc.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/stm-fmc.c b/stm-fmc.c index 62b6433..b5e79e8 100644 --- a/stm-fmc.c +++ b/stm-fmc.c @@ -149,14 +149,14 @@ void fmc_init(void) // don't care in sync mode fmc_timing.DataSetupTime = 255; - // not needed, since nwait will be polled manually + // not needed fmc_timing.BusTurnAroundDuration = 0; // use 45 MHz to match what FMC arbiter in the FPGA expects - fmc_timing.CLKDivision = 4; + fmc_timing.CLKDivision = 4; // 180/4 - // use 4 to match what FMC arbiter in the FPGA expects - fmc_timing.DataLatency = 4; + // use 6 to match what FMC arbiter in the FPGA expects + fmc_timing.DataLatency = 6; // don't care in sync mode fmc_timing.AccessMode = FMC_ACCESS_MODE_A; From git at cryptech.is Tue Jan 21 14:59:18 2020 From: git at cryptech.is (git at cryptech.is) Date: Tue, 21 Jan 2020 14:59:18 +0000 Subject: [Cryptech-Commits] [user/ln5/stm32-avalanche-noise] 01/02: Add version and application info to ELF file In-Reply-To: <157961875773.4592.333335421808050787@bikeshed.cryptech.is> References: <157961875773.4592.333335421808050787@bikeshed.cryptech.is> Message-ID: <20200121145918.22EF38BAF0F@bikeshed.cryptech.is> This is an automated email from the git hooks/post-receive script. linus at nordberg.se pushed a commit to branch ln/devel in repository user/ln5/stm32-avalanche-noise. 
commit 49d39287252bf0bd2b3fd75e86d07616b56c7fd2 Author: Linus Nordberg AuthorDate: Thu Dec 19 12:51:30 2019 +0100 Add version and application info to ELF file --- src/cc20rng/Makefile | 18 +++++++++++++++++- src/entropy/Makefile | 18 +++++++++++++++++- 2 files changed, 34 insertions(+), 2 deletions(-) diff --git a/src/cc20rng/Makefile b/src/cc20rng/Makefile index b3627b6..c52bc52 100644 --- a/src/cc20rng/Makefile +++ b/src/cc20rng/Makefile @@ -1,3 +1,19 @@ +# Application number and version info are included in the ELF file +# symbol table through two global, asbolute symbols. Use nm(1) on an +# ELF file to figure things out. Example: +# nm cc20rng.elf | awk '/VERSION/{print $1}' + +# Application number: +# - cc20rng: 01 +# - entropy: 02 +APPLICATION = 0x00000001 + +# Version info is four octects, from most significant to least: +# - Major version, two bytes: 00..FFFF +# - Minor version, one byte: 00..FF +# - Patch level, one byte: 00..FF +VERSION = 0x00010000 + # put your *.o targets here, make should handle the rest! SRCS = main.c stm_init.c system_stm32f4xx.c stm32f4xx_it.c stm32f4xx_hal_msp.c cc20_prng.c @@ -23,7 +39,7 @@ lib: proj: $(PROJ_NAME).elf $(PROJ_NAME).elf: $(SRCS) - $(CC) $(CFLAGS) $^ -o $@ -L$(STD_PERIPH_LIB) -lstmf4 -L$(LDSCRIPT_INC) -T$(MCU_LINKSCRIPT) -g + $(CC) $(CFLAGS) $^ -o $@ -L$(STD_PERIPH_LIB) -lstmf4 -L$(LDSCRIPT_INC) -T$(MCU_LINKSCRIPT) -g -Wl,--defsym=APPLICATION=$(APPLICATION) -Wl,--defsym=VERSION=$(VERSION) $(OBJCOPY) -O ihex $(PROJ_NAME).elf $(PROJ_NAME).hex $(OBJCOPY) -O binary $(PROJ_NAME).elf $(PROJ_NAME).bin $(OBJDUMP) -St $(PROJ_NAME).elf >$(PROJ_NAME).lst diff --git a/src/entropy/Makefile b/src/entropy/Makefile index 68723b8..0285460 100644 --- a/src/entropy/Makefile +++ b/src/entropy/Makefile @@ -1,3 +1,19 @@ +# Application number and version info are included in the ELF file +# symbol table through two global, asbolute symbols. Use nm(1) on an +# ELF file to figure things out. 
Example: +# nm entropy.elf | awk '/VERSION/{print $1}' + +# Application number: +# - cc20rng: 01 +# - entropy: 02 +APPLICATION = 0x00000002 + +# Version info is four octects, from most significant to least: +# - Major version, two bytes: 00..FFFF +# - Minor version, one byte: 00..FF +# - Patch level, one byte: 00..FF +VERSION = 0x00010000 + # put your *.o targets here, make should handle the rest! SRCS = main.c stm_init.c system_stm32f4xx.c stm32f4xx_it.c stm32f4xx_hal_msp.c @@ -23,7 +39,7 @@ lib: proj: $(PROJ_NAME).elf $(PROJ_NAME).elf: $(SRCS) - $(CC) $(CFLAGS) $^ -o $@ -L$(STD_PERIPH_LIB) -lstmf4 -L$(LDSCRIPT_INC) -T$(MCU_LINKSCRIPT) -g + $(CC) $(CFLAGS) $^ -o $@ -L$(STD_PERIPH_LIB) -lstmf4 -L$(LDSCRIPT_INC) -T$(MCU_LINKSCRIPT) -g -Wl,--defsym=APPLICATION=$(APPLICATION) -Wl,--defsym=VERSION=$(VERSION) $(OBJCOPY) -O ihex $(PROJ_NAME).elf $(PROJ_NAME).hex $(OBJCOPY) -O binary $(PROJ_NAME).elf $(PROJ_NAME).bin $(OBJDUMP) -St $(PROJ_NAME).elf >$(PROJ_NAME).lst From git at cryptech.is Tue Jan 21 14:59:17 2020 From: git at cryptech.is (git at cryptech.is) Date: Tue, 21 Jan 2020 14:59:17 +0000 Subject: [Cryptech-Commits] [user/ln5/stm32-avalanche-noise] branch ln/devel updated (6f4f2af -> e775954) Message-ID: <157961875773.4592.333335421808050787@bikeshed.cryptech.is> This is an automated email from the git hooks/post-receive script. linus at nordberg.se pushed a change to branch ln/devel in repository user/ln5/stm32-avalanche-noise. from 6f4f2af Don't find | rm for target 'clean' new 49d3928 Add version and application info to ELF file new e775954 Update comments for later revisions The 2 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. 
Summary of changes:
 common.mk            |  4 ++--
 src/cc20rng/Makefile | 18 +++++++++++++++++-
 src/entropy/Makefile | 18 +++++++++++++++++-
 3 files changed, 36 insertions(+), 4 deletions(-)

From git at cryptech.is Tue Jan 21 14:59:19 2020
From: git at cryptech.is (git at cryptech.is)
Date: Tue, 21 Jan 2020 14:59:19 +0000
Subject: [Cryptech-Commits] [user/ln5/stm32-avalanche-noise] 02/02: Update comments for later revisions
In-Reply-To: <157961875773.4592.333335421808050787@bikeshed.cryptech.is>
References: <157961875773.4592.333335421808050787@bikeshed.cryptech.is>
Message-ID: <20200121145918.CEA2E8BAF10@bikeshed.cryptech.is>

This is an automated email from the git hooks/post-receive script.

linus at nordberg.se pushed a commit to branch ln/devel
in repository user/ln5/stm32-avalanche-noise.

commit e775954924a390e2e8c9fa5e4c0d16f96e9c6fe5
Author: Linus Nordberg
AuthorDate: Tue Jan 21 15:51:33 2020 +0100

    Update comments for later revisions
---
 common.mk | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/common.mk b/common.mk
index 9006feb..0d4ecdd 100644
--- a/common.mk
+++ b/common.mk
@@ -35,12 +35,12 @@ OPENOCD_PROC_FILE ?= stm32f4discovery.cfg
 
 # MCU selection parameters
 #
-# For the rev08 or rev09 board with an STM32F401RBT6, set -DSTM32F401xC
+# For rev08 and later boards with an STM32F401RBT6, set -DSTM32F401xC
 # For Nucleo board with an STM32F401RET6, set -DSTM32F401xE
 #
 STDPERIPH_SETTINGS ?= -DUSE_STDPERIPH_DRIVER -DSTM32F4XX -DSTM32F401xC
 
 #
-# For the rev08 or rev09 board (hardware compatible), use rev08.ld.
+# For rev08 and later boards (hardware compatible), use rev08.ld.
 # For the Nucleo board, use nucleo.ld.
 MCU_LINKSCRIPT ?= rev08.ld

From git at cryptech.is Thu Jan 23 09:01:31 2020
From: git at cryptech.is (git at cryptech.is)
Date: Thu, 23 Jan 2020 09:01:31 +0000
Subject: [Cryptech-Commits] [core/platform/alpha] branch fmc_clk_core created (now c85809c)
Message-ID: <157977009118.5102.9555264409384673553@bikeshed.cryptech.is>

This is an automated email from the git hooks/post-receive script.

meisterpaul1 at yandex.ru pushed a change to branch fmc_clk_core
in repository core/platform/alpha.

      at c85809c  Out of curiosity I tried compiling the bitstream with Vivado. These constraints may come handy if you're brave enough to try this at home.

This branch includes the following new commits:

     new 18c0a8e  New Alpha platform with three clocks: * 45 MHz (aka "io_clk") is the I/O clock for the FMC bus * 90 MHz (aka "sys_clk") is the system clock for all the cores * 180 MHz (aka "core_clk") is the high-speed clock for high-performance cores
     new 923e7d3  Bumped version number.
     new 7bf84ba  Testbench for the new clock manager.
     new d6f47a4  Tweak the Makefile to match the new Alpha platform.
     new fbf287f  This commit turns off the "equivalent_register_removal" setting for XST.
     new 3535924  Changed FMC I/O frequency to 45 MHz in the UCF file to match hardware and also updated offset constraint values accordingly.
     new c85809c  Out of curiosity I tried compiling the bitstream with Vivado. These constraints may come handy if you're brave enough to try this at home.

The 7 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails. The revisions
listed as "add" were already present in the repository and have only
been added to this reference.
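
The clock rates in the commit list above, and the dummy-read counts in the stm-fmc.c comments earlier in this archive, can be cross-checked with a little integer arithmetic. What follows is a minimal sketch in C, not code from any of the repositories. It assumes the 180 MHz bus clock implied by the "180/4" comment and the "<= 100 us" PLL lock time quoted in stm-fmc.c; fmc_clk_mhz and min_dummy_reads are hypothetical helper names.

```c
/* Assumption: STM32 bus clock in MHz, taken from the "180/4" comment
 * next to fmc_timing.CLKDivision in stm-fmc.c. */
#define HCLK_MHZ 180u

/* Hypothetical helper: FMC clock in MHz for a given CLKDivision value. */
static unsigned fmc_clk_mhz(unsigned clk_division)
{
    return HCLK_MHZ / clk_division;
}

/* Hypothetical helper: smallest number of back-to-back reads whose total
 * duration is at least lock_us microseconds, when one read takes
 * ticks_per_read clock ticks at clk_mhz MHz. */
static unsigned min_dummy_reads(unsigned lock_us, unsigned clk_mhz,
                                unsigned ticks_per_read)
{
    unsigned ticks_needed = lock_us * clk_mhz;  /* us * (ticks per us) */

    /* Round up, so a partial read still counts as a full one. */
    return (ticks_needed + ticks_per_read - 1u) / ticks_per_read;
}
```

Under these assumptions, fmc_clk_mhz(4) gives 45, which matches the "io_clk" rate; sys_clk and core_clk are then 2x and 4x that value. min_dummy_reads(100, 45, 10) gives 450, so the 500 reads in the updated loop leave some margin, while the earlier figure of 1000 reads corresponds to min_dummy_reads(100, 90, 9).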
From git at cryptech.is Thu Jan 23 09:01:32 2020 From: git at cryptech.is (git at cryptech.is) Date: Thu, 23 Jan 2020 09:01:32 +0000 Subject: [Cryptech-Commits] [core/platform/alpha] 01/07: New Alpha platform with three clocks: * 45 MHz (aka "io_clk") is the I/O clock for the FMC bus * 90 MHz (aka "sys_clk") is the system clock for all the cores * 180 MHz (aka "core_clk") is the high-speed clock for high-performance cores In-Reply-To: <157977009118.5102.9555264409384673553@bikeshed.cryptech.is> References: <157977009118.5102.9555264409384673553@bikeshed.cryptech.is> Message-ID: <20200123090132.175F48BAF0F@bikeshed.cryptech.is> This is an automated email from the git hooks/post-receive script. meisterpaul1 at yandex.ru pushed a commit to branch fmc_clk_core in repository core/platform/alpha. commit 18c0a8e73bacd2ccbb9089d7aa220290e95dcb76 Author: Pavel V. Shatov (Meister) AuthorDate: Tue Jan 21 15:36:05 2020 +0300 New Alpha platform with three clocks: * 45 MHz (aka "io_clk") is the I/O clock for the FMC bus * 90 MHz (aka "sys_clk") is the system clock for all the cores * 180 MHz (aka "core_clk") is the high-speed clock for high-performance cores --- rtl/alpha_clkmgr.v | 182 ++++++++++++++++----------------- rtl/alpha_fmc_top.v | 272 ++++++++++++++++++++++++------------------------- rtl/clkmgr_mmcm.v | 241 ++++++++++++++++++++----------------------- rtl/clkmgr_mmcm_ctrl.v | 117 +++++++++++++++++++++ rtl/clkmgr_reset_gen.v | 69 +++++++++++++ 5 files changed, 518 insertions(+), 363 deletions(-) diff --git a/rtl/alpha_clkmgr.v b/rtl/alpha_clkmgr.v index 5c4099e..0f5f912 100644 --- a/rtl/alpha_clkmgr.v +++ b/rtl/alpha_clkmgr.v @@ -7,7 +7,7 @@ // // // Author: Pavel Shatov -// Copyright (c) 2016, 2018 NORDUnet A/S All rights reserved. +// Copyright (c) 2016, 2018-2019 NORDUnet A/S All rights reserved. 
// // Redistribution and use in source and binary forms, with or without // modification, are permitted provided that the following conditions @@ -38,101 +38,101 @@ //====================================================================== module alpha_clkmgr - ( - input wire fmc_clk, // signal from clock pin - - output wire sys_clk, // buffered system clock output - output wire sys_rst_n // system reset output (async set, sync clear, active-low) - ); - - - // - // Settings - // - - /* - * fmc_clk is 90 MHz, sys_clk is also 90 MHz routed through an MMCM. - * - * VCO frequency is 1080 MHz. - * - */ - localparam CLK_OUT_MUL = 12.0; - localparam CLK_OUT_DIV = 12.0; - localparam CLK_OUT_PHI = 45.0; +( + input wire fmc_clk, // signal from clock pin + output wire io_clk, // buffered i/o clock + output wire sys_clk, // buffered system clock output + output wire sys_rst_n, // system reset output (async set, sync clear, active-low) + output wire core_clk // buffered high speed core clock +); - // - // Wrapper for Xilinx-specific MMCM (Mixed Mode Clock Manager) primitive. - // - - wire gclk_int; // buffered input clock - wire mmcm_reset; // reset input - wire mmcm_locked; // output clock valid - wire gclk_missing; // input clock stopped - - clkmgr_mmcm # - ( - .CLK_OUT_MUL (CLK_OUT_MUL), // 2..64 - .CLK_OUT_DIV (CLK_OUT_DIV), // 1..128 - .CLK_OUT_PHI (CLK_OUT_PHI) // 0.0..360.0 - ) - mmcm - ( - .gclk_in (fmc_clk), - .reset_in (mmcm_reset), - - .gclk_out (gclk_int), - .gclk_missing_out (gclk_missing), - - .clk_out (sys_clk), - .clk_valid_out (mmcm_locked) - ); - - - - // - // MMCM Reset Logic - // - - /* MMCM should be reset on power-up and when the input clock is stopped. - * Note that MMCM requires active-high reset, so the shift register is - * preloaded with 1's and then gradually filled with 0's. 
- */ - - reg [15: 0] mmcm_rst_shreg = {16{1'b1}}; // 16-bit shift register - - always @(posedge gclk_int or posedge gclk_missing) - // - if (gclk_missing == 1'b1) - mmcm_rst_shreg <= {16{1'b1}}; - else - mmcm_rst_shreg <= {mmcm_rst_shreg[14:0], 1'b0}; - - assign mmcm_reset = mmcm_rst_shreg[15]; - - - // - // System Reset Logic - // - - /* System reset is asserted for 16 cycles whenever MMCM aquires lock. Note - * that system reset is active-low, so the shift register is preloaded with - * 0's and gradually filled with 1's. - */ - - reg [15: 0] sys_rst_shreg = {16{1'b0}}; // 16-bit shift register - - always @(posedge sys_clk or posedge gclk_missing or negedge mmcm_locked) - // - if ((gclk_missing == 1'b1) || (mmcm_locked == 1'b0)) - sys_rst_shreg <= {16{1'b0}}; - else if (mmcm_locked == 1'b1) - sys_rst_shreg <= {sys_rst_shreg[14:0], 1'b1}; - - assign sys_rst_n = sys_rst_shreg[15]; + + // + // Parameters + // + parameter integer CLK_CORE_MULT = 4; + + + // + // STARTUPE2 + // + wire cfg_mclk; // 65 MHz (+/- 50%) internal oscillator + wire cfg_eos; // end-of-startup flag + + STARTUPE2 # + ( + .SIM_CCLK_FREQ (0.0), // config clock frequency for simulation + .PROG_USR ("FALSE") // only used with encrypted bitstreams + ) + STARTUPE2_inst + ( + .CLK (1'b0), // no external clock + .CFGCLK (), // config clock (unused) + .CFGMCLK (cfg_mclk), // config clock + .EOS (cfg_eos), // end-of-startup flag + + .USRCCLKO (1'b0), // custom clock for configuration memory access (unused) + .USRCCLKTS (1'b0), // UG470 recommends this to be held low + .USRDONEO (1'b1), // custom value to drive onto DONE pin (unused) + .USRDONETS (1'b1), // tri-state DONE pin (unused) + + .GSR (1'b0), // UG470 recommends hardwiring this low + .GTS (1'b0), // UG470 recommends this to be tied low + + .PREQ (), // unused when PROG_USR is disabled + .PACK (1'b0), // only used when PROG_USR is enabled + + .KEYCLEARB (1'b1) // unused + ); + + + // + // Wrapper for Xilinx-specific MMCM (Mixed Mode Clock Manager) 
primitive. + // + wire mmcm_reset; // reset input + wire mmcm_locked; // output clock valid + + clkmgr_mmcm # + ( + .CLK_CORE_MULT(CLK_CORE_MULT) + ) + mmcm_inst + ( + .fmc_clk_in (fmc_clk), + .rst_in (mmcm_reset), + + .io_clk_out (io_clk), + .sys_clk_out (sys_clk), + .core_clk_out (core_clk), + .locked_out (mmcm_locked) + ); + + // + // MMCM Controller + // + clkmgr_mmcm_ctrl mmcm_ctrl_inst + ( + .clk_in (cfg_mclk), + .reset_n_in (cfg_eos), + .locked_in (mmcm_locked), + .reset_out (mmcm_reset) + ); + + + // + // System Reset Logic + // + clkmgr_reset_gen #(.SHREG_WIDTH(16)) reset_gen_inst + ( + .clk_in (sys_clk), + .locked_in (mmcm_locked), + .reset_n_out (sys_rst_n) + ); endmodule + //====================================================================== // EOF alpha_clkmgr.v diff --git a/rtl/alpha_fmc_top.v b/rtl/alpha_fmc_top.v index a07beee..09229b5 100644 --- a/rtl/alpha_fmc_top.v +++ b/rtl/alpha_fmc_top.v @@ -8,7 +8,7 @@ // // // Author: Pavel Shatov -// Copyright (c) 2016, 2018 NORDUnet A/S All rights reserved. +// Copyright (c) 2016, 2018-2019 NORDUnet A/S All rights reserved. 
// // Redistribution and use in source and binary forms, with or without // modification, are permitted provided that the following conditions @@ -38,147 +38,139 @@ // //====================================================================== -`timescale 1ns / 1ps - module alpha_fmc_top - ( - input wire gclk_pin, // 50 MHz - - input wire ct_noise, // cryptech avalanche noise circuit - - input wire fmc_clk, // clock - input wire [23: 0] fmc_a, // address - inout wire [31: 0] fmc_d, // data - input wire fmc_ne1, // chip select - input wire fmc_noe, // output enable - input wire fmc_nwe, // write enable - input wire fmc_nl, // latch enable - output wire fmc_nwait,// wait - - output wire mkm_sclk, - output wire mkm_cs_n, - input wire mkm_do, - output wire mkm_di, - - output wire [3: 0] led_pins // {red, yellow, green, blue} - ); - - - //---------------------------------------------------------------- - // Clock Manager - // - // Clock manager is used to buffer FMC_CLK and implement reset logic. - // ---------------------------------------------------------------- - wire sys_clk; // system clock (90 MHz) - wire sys_rst_n; // active-low reset - - alpha_clkmgr clkmgr - ( - .fmc_clk (fmc_clk), - - .sys_clk (sys_clk), - .sys_rst_n (sys_rst_n) - ); - - - - //---------------------------------------------------------------- - // FMC Arbiter - // - // FMC arbiter handles FMC accesses. - //---------------------------------------------------------------- - - wire [23: 0] sys_fmc_addr; // address - wire sys_fmc_wren; // write enable - wire sys_fmc_rden; // read enable - wire [31: 0] sys_fmc_dout; // data output (from STM32 to FPGA) - wire [31: 0] sys_fmc_din; // data input (from FPGA to STM32) - - fmc_arbiter # - ( - .NUM_ADDR_BITS(24) // change to 26 when Alpha is alive! 
- ) - fmc - ( - .fmc_a(fmc_a), - .fmc_d(fmc_d), - .fmc_ne1(fmc_ne1), - .fmc_nl(fmc_nl), - .fmc_nwe(fmc_nwe), - .fmc_noe(fmc_noe), - .fmc_nwait(fmc_nwait), - - .sys_clk(sys_clk), - - .sys_addr(sys_fmc_addr), - .sys_wr_en(sys_fmc_wren), - .sys_rd_en(sys_fmc_rden), - .sys_data_out(sys_fmc_dout), - .sys_data_in(sys_fmc_din) - ); - - - //---------------------------------------------------------------- - // LED Driver - // - // A simple utility LED driver that turns on the Alpha - // board LED when the FMC interface is active. - //---------------------------------------------------------------- - fmc_indicator led - ( - .sys_clk(sys_clk), - .sys_rst_n(sys_rst_n), - .fmc_active(sys_fmc_wren | sys_fmc_rden), - .led_out(led_pins[0]) - ); - - - //---------------------------------------------------------------- - // Core Selector - // - // This multiplexer is used to map different types of cores, such as - // hashes, RNGs and ciphers to different regions (segments) of memory. - //---------------------------------------------------------------- - - // A note on byte-swapping: - // STM32 is little-endian, while the register interface here is - // big-endian. The software reads and writes 32-bit integer values, - // which means transmitting the least significant byte first. Up to - // now, we've been doing byte-swapping in software, which is - // inefficient, especially for bulk data transfer. So now we're doing - // the byte-swapping in hardware. 
- - wire [31:0] write_data; - assign write_data = {sys_fmc_dout[7:0], sys_fmc_dout[15:8], sys_fmc_dout[23:16], sys_fmc_dout[31:24]}; - - wire [31 : 0] read_data; - assign sys_fmc_din = {read_data[7:0], read_data[15:8], read_data[23:16], read_data[31:24]}; - - core_selector cores - ( - .sys_clk(sys_clk), - .sys_rst_n(sys_rst_n), - - .sys_fmc_addr(sys_fmc_addr), - .sys_fmc_wr(sys_fmc_wren), - .sys_fmc_rd(sys_fmc_rden), - .sys_write_data(write_data), - .sys_read_data(read_data), - - .noise(ct_noise), - - .mkm_sclk(mkm_sclk), - .mkm_cs_n(mkm_cs_n), - .mkm_do(mkm_do), - .mkm_di(mkm_di) - ); - - - // - // Dummy assignment to bypass unconnected outpins pins check in BitGen - // +( +// input wire gclk_pin, // 50 MHz + + input wire ct_noise, // cryptech avalanche noise circuit + + input wire fmc_clk, // clock + input wire [23: 0] fmc_a, // address + inout wire [31: 0] fmc_d, // data + input wire fmc_ne1, // chip select + input wire fmc_noe, // output enable + input wire fmc_nwe, // write enable + input wire fmc_nl, // latch enable + output wire fmc_nwait,// wait + + output wire mkm_sclk, + output wire mkm_cs_n, + input wire mkm_do, + output wire mkm_di, + + output wire [3: 0] led_pins // {red, yellow, green, blue} +); + + + //---------------------------------------------------------------- + // Clock Manager + // + // Clock manager is used to buffer FMC_CLK and implement reset logic. + // ---------------------------------------------------------------- + wire io_clk; // FMC I/O clock (45 MHz) + wire sys_clk; // system clock (90 MHz) + wire sys_rst_n; // active-low system reset + wire core_clk; // high-speed core clock (45*N MHz) + + alpha_clkmgr #(.CLK_CORE_MULT(4)) clkmgr + ( + .fmc_clk (fmc_clk), + + .io_clk (io_clk), + .sys_clk (sys_clk), + .sys_rst_n (sys_rst_n), + .core_clk (core_clk) + ); + + + + //---------------------------------------------------------------- + // FMC Arbiter + // + // FMC arbiter handles FMC accesses. 
+ //---------------------------------------------------------------- + + wire [23: 0] sys_fmc_addr; // address + wire sys_fmc_wren; // write enable + wire sys_fmc_rden; // read enable + wire [31: 0] sys_fmc_dout; // data output (from STM32 to FPGA) + wire [31: 0] sys_fmc_din; // data input (from FPGA to STM32) + + fmc_arbiter # + ( + .NUM_ADDR_BITS(24) // change to 26 when Alpha is alive! + ) + fmc + ( + .fmc_a(fmc_a), + .fmc_d(fmc_d), + .fmc_ne1(fmc_ne1), + .fmc_nl(fmc_nl), + .fmc_nwe(fmc_nwe), + .fmc_noe(fmc_noe), + .fmc_nwait(fmc_nwait), + + .sys_clk(sys_clk), + .io_clk(io_clk), + + .sys_addr(sys_fmc_addr), + .sys_wr_en(sys_fmc_wren), + .sys_rd_en(sys_fmc_rden), + .sys_data_out(sys_fmc_dout), + .sys_data_in(sys_fmc_din) + ); + + + //---------------------------------------------------------------- + // LED Driver + // + // A simple utility LED driver that turns on the Alpha + // board LED when the FMC interface is active. + //---------------------------------------------------------------- + fmc_indicator led + ( + .sys_clk(sys_clk), + .sys_rst_n(sys_rst_n), + .fmc_active(sys_fmc_wren | sys_fmc_rden), + .led_out(led_pins[0]) + ); + + + //---------------------------------------------------------------- + // Core Selector + // + // This multiplexer is used to map different types of cores, such as + // hashes, RNGs and ciphers to different regions (segments) of memory. 
+ //---------------------------------------------------------------- + + core_selector cores + ( + .sys_clk(sys_clk), + .sys_rst_n(sys_rst_n), + .core_clk(core_clk), + + .sys_fmc_addr(sys_fmc_addr), + .sys_fmc_wr(sys_fmc_wren), + .sys_fmc_rd(sys_fmc_rden), + .sys_write_data(sys_fmc_dout), + .sys_read_data(sys_fmc_din), + .sys_error(), + + .noise(ct_noise), + + .mkm_sclk(mkm_sclk), + .mkm_cs_n(mkm_cs_n), + .mkm_do(mkm_do), + .mkm_di(mkm_di), + + .debug() + ); + - assign led_pins[3:1] = 3'b000; + // + // Dummy assignment to get past the unconnected outpins pins check in BitGen + // + assign led_pins[3:1] = 3'b000; endmodule diff --git a/rtl/clkmgr_mmcm.v b/rtl/clkmgr_mmcm.v index 03b0747..afb5716 100644 --- a/rtl/clkmgr_mmcm.v +++ b/rtl/clkmgr_mmcm.v @@ -1,12 +1,12 @@ //====================================================================== // // clkmgr_mmcm.v -// --------------- +// ------------- // Xilinx MMCM primitive wrapper to avoid using Clocking Wizard IP core. // // // Author: Pavel Shatov -// Copyright (c) 2016, 2018, NORDUnet A/S All rights reserved. +// Copyright (c) 2019, NORDUnet A/S All rights reserved. 
// // Redistribution and use in source and binary forms, with or without // modification, are permitted provided that the following conditions @@ -37,144 +37,121 @@ //====================================================================== module clkmgr_mmcm - ( - input wire gclk_in, - input wire reset_in, - - output wire gclk_out, - output wire gclk_missing_out, +( + input fmc_clk_in, + input rst_in, + + output io_clk_out, + output sys_clk_out, + output core_clk_out, + output locked_out +); - output wire clk_out, - output wire clk_valid_out - ); - - - // - // Parameters - // - parameter CLK_OUT_MUL = 12.0; // multiply factor for output clock frequency (2..64) - parameter CLK_OUT_DIV = 12.0; // divide factor for output clock frequency (1..128) - parameter CLK_OUT_PHI = 45.0; // clock phase shift (0.0..360.0) - - - // - // IBUFG - // - - (* BUFFER_TYPE="NONE" *) - wire gclk_in_ibufg; - - IBUFG IBUFG_gclk - ( - .I (gclk_in), - .O (gclk_in_ibufg) - ); - - - // - // MMCME2_ADV - // - wire mmcm_clkout0; - wire mmcm_locked; - wire mmcm_clkfbout; - wire mmcm_clkfbout_bufg; - - MMCME2_ADV # - ( - .CLKIN1_PERIOD (11.111), - .REF_JITTER1 (0.010), - - .STARTUP_WAIT ("FALSE"), - .BANDWIDTH ("OPTIMIZED"), - .COMPENSATION ("ZHOLD"), - - .DIVCLK_DIVIDE (1), - - .CLKFBOUT_MULT_F (CLK_OUT_MUL), - .CLKFBOUT_PHASE (0.000), - .CLKFBOUT_USE_FINE_PS ("FALSE"), - - .CLKOUT0_DIVIDE_F (CLK_OUT_DIV), - .CLKOUT0_PHASE (CLK_OUT_PHI), - .CLKOUT0_USE_FINE_PS ("FALSE"), - .CLKOUT0_DUTY_CYCLE (0.500), - - .CLKOUT4_CASCADE ("FALSE") - ) - MMCME2_ADV_inst - ( - .CLKIN1 (gclk_in_ibufg), - .CLKIN2 (1'b0), - .CLKINSEL (1'b1), + parameter integer CLK_CORE_MULT = 4; - .CLKFBIN (mmcm_clkfbout_bufg), - .CLKFBOUT (mmcm_clkfbout), - .CLKFBOUTB (), - - .CLKINSTOPPED (gclk_missing_out), - .CLKFBSTOPPED (), + function integer lookup_vco_freq; + input integer core_mult; + case (core_mult) + 2: lookup_vco_freq = 990; + 3: lookup_vco_freq = 945; + 4: lookup_vco_freq = 1080; + 5: lookup_vco_freq = 900; + 6: 
lookup_vco_freq = 1080; + 7: lookup_vco_freq = 945; + 8: lookup_vco_freq = 1080; + endcase + endfunction + + localparam real FMC_CLK_PERIOD_F = 22.222; + localparam real FB_PHASE_F = 45.0; + localparam real SYS_FREQ_F = 90.0; + + localparam integer VCO_FREQ = lookup_vco_freq(CLK_CORE_MULT); + localparam integer IO_FREQ = 45; + localparam integer CORE_FREQ = CLK_CORE_MULT * IO_FREQ; - .CLKOUT0 (mmcm_clkout0), - .CLKOUT0B (), - - .CLKOUT1 (), - .CLKOUT1B (), - - .CLKOUT2 (), - .CLKOUT2B (), - - .CLKOUT3 (), - .CLKOUT3B (), - - .CLKOUT4 (), - .CLKOUT5 (), - .CLKOUT6 (), - - .DCLK (1'b0), - .DEN (1'b0), - .DWE (1'b0), - .DADDR (7'd0), - .DI (16'h0000), - .DO (), - .DRDY (), - - .PSCLK (1'b0), - .PSEN (1'b0), - .PSINCDEC (1'b0), - .PSDONE (), - - .LOCKED (mmcm_locked), - .PWRDWN (1'b0), - .RST (reset_in) - ); + localparam real VCO_FREQ_F = 1.0 * VCO_FREQ; + localparam real FMC_FREQ_F = 1.0 * IO_FREQ; + + localparam real FB_MULT_F = VCO_FREQ_F / FMC_FREQ_F; + localparam real CLK0_DIV_F = VCO_FREQ_F / SYS_FREQ_F; + + localparam integer CLK1_DIV = VCO_FREQ / IO_FREQ; + localparam integer CLK2_DIV = VCO_FREQ / CORE_FREQ; + wire fmc_clk_ibufg; + IBUFG IBUFG_fmc_clk + ( + .I(fmc_clk_in), + .O(fmc_clk_ibufg) + ); - // - // Mapping - // - assign gclk_out = gclk_in_ibufg; - assign clk_valid_out = mmcm_locked; - - - // - // BUFGs - // - BUFG BUFG_gclk - ( - .I (mmcm_clkout0), - .O (clk_out) + wire mmcm_clkout0; + wire mmcm_clkout1; + wire mmcm_clkout2; + wire mmcm_clkfbin; + wire mmcm_clkfbout; + MMCME2_BASE # + ( + .CLKIN1_PERIOD (FMC_CLK_PERIOD_F), + .CLKFBOUT_MULT_F (FB_MULT_F), + .CLKFBOUT_PHASE (FB_PHASE_F), + .CLKOUT0_DIVIDE_F (CLK0_DIV_F), + .CLKOUT1_DIVIDE (CLK1_DIV), + .CLKOUT2_DIVIDE (CLK2_DIV) + ) + MMCME2_BASE_io_clk + ( + .CLKIN1 (fmc_clk_ibufg), + .RST (rst_in), + .PWRDWN (1'b0), + + .LOCKED (locked_out), + + + .CLKFBOUT (mmcm_clkfbout), + .CLKFBOUTB (), + .CLKFBIN (mmcm_clkfbin), + + .CLKOUT0 (mmcm_clkout0), + .CLKOUT0B (), + .CLKOUT1 (mmcm_clkout1), + .CLKOUT1B 
(), + .CLKOUT2 (mmcm_clkout2), + .CLKOUT2B (), + .CLKOUT3 (), + .CLKOUT3B (), + .CLKOUT4 (), + .CLKOUT5 (), + .CLKOUT6 () ); - - BUFG BUFG_feedback - ( - .I (mmcm_clkfbout), - .O (mmcm_clkfbout_bufg) - ); - - + + BUFG BUFG_clkfb + ( + .I(mmcm_clkfbout), + .O(mmcm_clkfbin) + ); + + BUFG BUFG_clk_io + ( + .I(mmcm_clkout0), + .O(sys_clk_out) + ); + + BUFG BUFG_clk_sys + ( + .I(mmcm_clkout1), + .O(io_clk_out) + ); + + BUFG BUFG_clk_core + ( + .I(mmcm_clkout2), + .O(core_clk_out) + ); endmodule //====================================================================== -// EOF clkmgr_mmcm.v +// EOF clkmgr_mmcm_io.v //====================================================================== diff --git a/rtl/clkmgr_mmcm_ctrl.v b/rtl/clkmgr_mmcm_ctrl.v new file mode 100644 index 0000000..6d22291 --- /dev/null +++ b/rtl/clkmgr_mmcm_ctrl.v @@ -0,0 +1,117 @@ +//====================================================================== +// +// clkmgr_mmcm_ctrl.v +// ------------------ +// PLL reset generator and lock monitor. +// +// +// Author: Pavel Shatov +// Copyright (c) 2019, NORDUnet A/S All rights reserved. +// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions +// are met: +// - Redistributions of source code must retain the above copyright +// notice, this list of conditions and the following disclaimer. +// +// - Redistributions in binary form must reproduce the above copyright +// notice, this list of conditions and the following disclaimer in the +// documentation and/or other materials provided with the distribution. +// +// - Neither the name of the NORDUnet nor the names of its contributors may +// be used to endorse or promote products derived from this software +// without specific prior written permission. 
+// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS +// IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED +// TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A +// PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +// HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED +// TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR +// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF +// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING +// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS +// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +// +//====================================================================== + +module clkmgr_mmcm_ctrl +( + input clk_in, + input reset_n_in, + input locked_in, + output reset_out +); + + // + // Timer Settings + // + localparam [13:0] tmr_zero = {14{1'b0}}; + localparam [13:0] tmr_last = {14{1'b1}}; + reg [13:0] tmr = tmr_zero; + wire [13:0] tmr_next = tmr + 1'b1; + wire [ 9:0] tmr_msb = tmr[13:4]; + + wire tmr_about_to_wrap = tmr == tmr_last; + wire tmr_msb_is_zero = tmr_msb == 10'd0; + + // + // Lock Counting and Monitoring + // + reg [1:0] pll_seen_locks = 2'd0; + wire [1:0] pll_seen_locks_next = (pll_seen_locks < 2'd3) ? 
pll_seen_locks + 1'b1 : pll_seen_locks; + + (* SHREG_EXTRACT="NO" *) + (* EQUIVALENT_REGISTER_REMOVAL="NO" *) + reg [3:0] pll_stable_lock_shreg = 4'h0; + wire pll_stable_lock = pll_stable_lock_shreg[3]; + + wire pll_needs_reset = (pll_seen_locks != 2'd1) || !pll_stable_lock; + + // + // Output Register + // + reg reset_out_r = 1'b1; + assign reset_out = reset_out_r; + + // + // Timer Increment/Wrap Logic + // + always @(posedge clk_in or negedge reset_n_in) + // + if (!reset_n_in) tmr <= tmr_zero; + else begin + if (!tmr_about_to_wrap) tmr <= tmr_next; + else if (pll_needs_reset) tmr <= tmr_zero; + end + + // + // Lock Counter + // + always @(posedge locked_in or posedge reset_out) + // + if (reset_out) pll_seen_locks <= 2'd0; + else pll_seen_locks <= pll_seen_locks_next; + + // + // Lock Monitor + // + always @(posedge clk_in or negedge locked_in) + // + if (!locked_in) pll_stable_lock_shreg <= 4'h0; + else pll_stable_lock_shreg <= {pll_stable_lock_shreg[2:0], 1'b1}; + + // + // Reset Generator + // + always @(posedge clk_in or negedge reset_n_in) + // + if (!reset_n_in) reset_out_r <= 1'b1; + else reset_out_r <= tmr_msb_is_zero; + +endmodule + +//====================================================================== +// EOF clkmgr_mmcm_ctrl.v +//====================================================================== diff --git a/rtl/clkmgr_reset_gen.v b/rtl/clkmgr_reset_gen.v new file mode 100644 index 0000000..9cc4401 --- /dev/null +++ b/rtl/clkmgr_reset_gen.v @@ -0,0 +1,69 @@ +//====================================================================== +// +// clkmgr_reset_gen.v +// ------------------ +// System reset generator. +// +// +// Author: Pavel Shatov +// Copyright (c) 2019, NORDUnet A/S All rights reserved. 
+// +// Redistribution and use in source and binary forms, with or without +// modification, are permitted provided that the following conditions +// are met: +// - Redistributions of source code must retain the above copyright +// notice, this list of conditions and the following disclaimer. +// +// - Redistributions in binary form must reproduce the above copyright +// notice, this list of conditions and the following disclaimer in the +// documentation and/or other materials provided with the distribution. +// +// - Neither the name of the NORDUnet nor the names of its contributors may +// be used to endorse or promote products derived from this software +// without specific prior written permission. +// +// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS +// IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED +// TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A +// PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT +// HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, +// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED +// TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR +// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF +// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING +// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS +// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+// +//====================================================================== + +module clkmgr_reset_gen +( + input clk_in, + input locked_in, + output reset_n_out +); + + // + // Parameter + // + parameter integer SHREG_WIDTH = 16; + + // + // Shift Register + // + (* SHREG_EXTRACT="NO" *) + (* EQUIVALENT_REGISTER_REMOVAL="NO" *) + reg [SHREG_WIDTH-1:0] sys_rst_shreg = {SHREG_WIDTH{1'b0}}; + + always @(posedge clk_in or negedge locked_in) + // + if (!locked_in) sys_rst_shreg <= {SHREG_WIDTH{1'b0}}; + else sys_rst_shreg <= {sys_rst_shreg[SHREG_WIDTH-2:0], 1'b1}; + + assign reset_n_out = sys_rst_shreg[SHREG_WIDTH-1]; + +endmodule + +//====================================================================== +// EOF clkmgr_reset_gen.v +//====================================================================== From git at cryptech.is Thu Jan 23 09:01:33 2020 From: git at cryptech.is (git at cryptech.is) Date: Thu, 23 Jan 2020 09:01:33 +0000 Subject: [Cryptech-Commits] [core/platform/alpha] 02/07: Bumped version number. In-Reply-To: <157977009118.5102.9555264409384673553@bikeshed.cryptech.is> References: <157977009118.5102.9555264409384673553@bikeshed.cryptech.is> Message-ID: <20200123090132.B02478BAF10@bikeshed.cryptech.is> This is an automated email from the git hooks/post-receive script. meisterpaul1 at yandex.ru pushed a commit to branch fmc_clk_core in repository core/platform/alpha. commit 923e7d3b3cf7dd065c87fb33cab1f130bbf56206 Author: Pavel V. Shatov (Meister) AuthorDate: Tue Jan 21 15:41:14 2020 +0300 Bumped version number. --- rtl/alpha_regs.v | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rtl/alpha_regs.v b/rtl/alpha_regs.v index ce17863..31cfbbf 100644 --- a/rtl/alpha_regs.v +++ b/rtl/alpha_regs.v @@ -68,7 +68,7 @@ module board_regs // Core ID constants. 
localparam CORE_NAME0 = 32'h414c5048; // "ALPH" localparam CORE_NAME1 = 32'h41202020; // "A " - localparam CORE_VERSION = 32'h302e3230; // "0.20" + localparam CORE_VERSION = 32'h302e3330; // "0.30" From git at cryptech.is Thu Jan 23 09:01:34 2020 From: git at cryptech.is (git at cryptech.is) Date: Thu, 23 Jan 2020 09:01:34 +0000 Subject: [Cryptech-Commits] [core/platform/alpha] 03/07: Testbench for the new clock manager. In-Reply-To: <157977009118.5102.9555264409384673553@bikeshed.cryptech.is> References: <157977009118.5102.9555264409384673553@bikeshed.cryptech.is> Message-ID: <20200123090133.229C58BAF0E@bikeshed.cryptech.is> This is an automated email from the git hooks/post-receive script. meisterpaul1 at yandex.ru pushed a commit to branch fmc_clk_core in repository core/platform/alpha. commit 7bf84ba5628678abc9d9ffa58c639eca7f3e095b Author: Pavel V. Shatov (Meister) AuthorDate: Tue Jan 21 15:43:40 2020 +0300 Testbench for the new clock manager. --- rtl/bench/tb_clkmgr.v | 38 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 38 insertions(+) diff --git a/rtl/bench/tb_clkmgr.v b/rtl/bench/tb_clkmgr.v new file mode 100644 index 0000000..a987e1c --- /dev/null +++ b/rtl/bench/tb_clkmgr.v @@ -0,0 +1,38 @@ +`timescale 1ns / 1ps + +module tb_clkmgr; + + // Input + reg fmc_clk = 1'b0; + + // Clock + always #11.111 fmc_clk = ~fmc_clk; + + // Outputs + wire io_clk; + wire sys_clk; + wire sys_rst_n; + wire core_clk; + + // Internals + reg fmc_clk_inhibit = 1'b0; + + // UUT + alpha_clkmgr uut + ( + .fmc_clk(fmc_clk & ~fmc_clk_inhibit), + .io_clk(io_clk), + .sys_clk(sys_clk), + .sys_rst_n(sys_rst_n), + .core_clk(core_clk) + ); + + // Script + initial begin + #399000; + fmc_clk_inhibit = 1'b1; + #1000; + fmc_clk_inhibit = 1'b0; + end + +endmodule From git at cryptech.is Thu Jan 23 09:01:35 2020 From: git at cryptech.is (git at cryptech.is) Date: Thu, 23 Jan 2020 09:01:35 +0000 Subject: [Cryptech-Commits] [core/platform/alpha] 04/07: Tweak the Makefile to match the new 
Alpha platform. In-Reply-To: <157977009118.5102.9555264409384673553@bikeshed.cryptech.is> References: <157977009118.5102.9555264409384673553@bikeshed.cryptech.is> Message-ID: <20200123090133.89F7A8BAF11@bikeshed.cryptech.is> This is an automated email from the git hooks/post-receive script. meisterpaul1 at yandex.ru pushed a commit to branch fmc_clk_core in repository core/platform/alpha. commit d6f47a422d1e41edb7b25b83e81931a19efd7ff0 Author: Pavel V. Shatov (Meister) AuthorDate: Tue Jan 21 15:45:18 2020 +0300 Tweak the Makefile to match the new Alpha platform. --- build/Makefile | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/build/Makefile b/build/Makefile index ca5d735..218e444 100644 --- a/build/Makefile +++ b/build/Makefile @@ -20,7 +20,10 @@ ucf ?= ../ucf/$(project).ucf # Verilog include directories, if needed -vlgincdir = $(CORE_TREE)/lib/lowlevel $(CORE_TREE)/math/ecdsalib/rtl/microcode +vlgincdir = \ + $(CORE_TREE)/lib/lowlevel \ + $(CORE_TREE)/math/ecdsalib/rtl/microcode \ + $(CORE_TREE)/lib/util all: $(project).bit @@ -47,7 +50,10 @@ vfiles = \ $(CORE_TREE)/platform/alpha/rtl/alpha_regs.v \ $(CORE_TREE)/platform/alpha/rtl/alpha_clkmgr.v \ $(CORE_TREE)/platform/alpha/rtl/clkmgr_mmcm.v \ + $(CORE_TREE)/platform/alpha/rtl/clkmgr_mmcm_ctrl.v \ ./core_selector.v \ + $(CORE_TREE)/platform/alpha/rtl/clkmgr_reset_gen.v \ + $(CORE_TREE)/platform/common/extra/reset_replicator.v \ $(CORE_TREE)/comm/fmc/src/rtl/fmc_arbiter.v \ $(CORE_TREE)/comm/fmc/src/rtl/fmc_d_phy.v \ $(CORE_TREE)/comm/fmc/src/rtl/fmc_indicator.v \ From git at cryptech.is Thu Jan 23 09:01:36 2020 From: git at cryptech.is (git at cryptech.is) Date: Thu, 23 Jan 2020 09:01:36 +0000 Subject: [Cryptech-Commits] [core/platform/alpha] 05/07: This commit turns off the "equivalent_register_removal" setting for XST. 
In-Reply-To: <157977009118.5102.9555264409384673553@bikeshed.cryptech.is> References: <157977009118.5102.9555264409384673553@bikeshed.cryptech.is> Message-ID: <20200123090134.072C68BAF0F@bikeshed.cryptech.is> This is an automated email from the git hooks/post-receive script. meisterpaul1 at yandex.ru pushed a commit to branch fmc_clk_core in repository core/platform/alpha. commit fbf287f57879678fe8cf4a74e07e72ca5c7b153b Author: Pavel V. Shatov (Meister) AuthorDate: Tue Jan 21 15:46:06 2020 +0300 This commit turns off the "equivalent_register_removal" setting for XST. Okay, here's the story. The Xilinx synthesis tool ("XST") is smart in the sense that it detects all the registers with equivalent behaviour and then removes all of them but one, and connects all loads to this one flip-flop. This works fine most of the time and usually even saves some resources, but for our particular design it was starting to cause just too many problems. The reason is that ModExp* cores exploit the parallel nature of an FPGA; for example, ModExpNG instantiates four copies of the modular multiplier internally. Those multipliers all operate the same way (but on different data, of course), so all their internal signals such as, say, clock enables and word counters are the same. XST happily throws away all the internals from three multipliers, leaves only one instance of control signals and then the map and place&route tools start struggling for hours fusing this all together. Turning off equivalent register removal entirely leads to excessive resource consumption, so the optimal solution would be to selectively turn it off only for those tricky places where several copies of control signals are actually required to meet timing. The problem is that according to Xilinx' docs (UG687 v14.5, p. 363) the "equivalent_register_removal = no" inline constraint can be applied to entire modules, not only individual registers, but I was unable to get this to work; XST seems to just ignore it.
This may have been fixed in Vivado though, haven't tried yet. Another potential solution is to prepend every register declaration inside the modular multiplier with this constraint, but that would look just ugly. One trick I've seen somewhere is to `define a new 'keep_equivalent_reg' "keyword" to be '"equivalent_register_removal = no" reg' and tweak register declarations accordingly; that seems to look somewhat less ugly, don't know. Yet another way around might be to use the "max_fanout" constraint instead. Say there are eight DSP slices per multiplier (thirty-two DSP slices total, since there are four multiplier instances). In theory we can constrain their clock enable fanout to not exceed 8. The problem is that XST will first throw away three of the clock enables, and then gradually add them back to limit each clock enable fanout to 8. This way there's no guarantee that the first clock enable will be routed to all eight DSP slices in the first multiplier; it can be routed to DSP slices in the three remaining multipliers as well, since XST will just try to limit the fanout. It's difficult to predict how the place&route tools will handle this. Anyways, the current slice consumption with 2x ModExpA7 and 1x ModExpNG is ~40%, and the timing situation is very good (the very first phase of place and route already has zero setup time violations, yay!). With global equivalent register removal turned on, utilization drops to ~35%, but timing is impossible to meet even on the highest map and place&route effort setting. I believe the best way forward is to just keep global removal disabled for now. We may revisit this in the future, say, if we decide to generate a custom dedicated RSA-only signer bitstream with as many core instances as possible. Then every register will count, but I suspect we won't get away with just re-enabling global equivalent register removal alone; some floorplanning will likely be required too, at the least.
--- build/xilinx.opt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/build/xilinx.opt b/build/xilinx.opt index 1ac8957..91ce9fa 100644 --- a/build/xilinx.opt +++ b/build/xilinx.opt @@ -43,5 +43,5 @@ -use_sync_set Auto -use_sync_reset Auto -iob auto --equivalent_register_removal YES +-equivalent_register_removal NO -slice_utilization_ratio_maxmargin 5 From git at cryptech.is Thu Jan 23 09:01:37 2020 From: git at cryptech.is (git at cryptech.is) Date: Thu, 23 Jan 2020 09:01:37 +0000 Subject: [Cryptech-Commits] [core/platform/alpha] 06/07: Changed FMC I/O frequency to 45 MHz in the UCF file to match hardware and also updated offset constraint values accordingly. In-Reply-To: <157977009118.5102.9555264409384673553@bikeshed.cryptech.is> References: <157977009118.5102.9555264409384673553@bikeshed.cryptech.is> Message-ID: <20200123090136.C086E8BAF1E@bikeshed.cryptech.is> This is an automated email from the git hooks/post-receive script. meisterpaul1 at yandex.ru pushed a commit to branch fmc_clk_core in repository core/platform/alpha. commit 35359243a63cac4a9e8cce6bd718f17756ce8a98 Author: Pavel V. Shatov (Meister) AuthorDate: Thu Jan 23 11:16:12 2020 +0300 Changed FMC I/O frequency to 45 MHz in the UCF file to match hardware and also updated offset constraint values accordingly. Another important change is that the core selector is now multicycle. The I/O logic runs at 45 MHz, while all the cores and the core selector run at 90 MHz. The job of the core selector is to distribute the write data (STM32 -> FPGA) to the particular core being addressed and to select the read data (FPGA -> STM32) from the core being addressed. Timing issues arising from the distribution of write commands are mitigated by replication of data and control signals. Each core has its individual set of chip select, address and write data registers; they can be placed somewhere in between the core itself and the selector, and thus the critical timing paths become half as long.
Readback logic has a huge multiplexor that selects the read data from the core being addressed and then forwards it into the FMC bus arbiter. Since the FMC arbiter operates at 45 MHz (half the speed of the readback multiplexor), it makes sense to give the multiplexor two 90 MHz clock cycles to select the value, since the arbiter waits one 45 MHz clock cycle before sampling the readback data. This is achieved by applying a FROM-TO constraint. Note the two gotchas here (took me some time to figure out): 1) you attach the TNM or TNM_NET... well, okay, there's a third potential gotcha here, since only one of the two works with I/O pads, read the CDG for more details, but luckily, it's not our case, phew. So, you attach a TNM_NET to a net, and ISE will follow the net and attach the TNM to **the next register being driven by this net**. This is somewhat non-obvious: say, you have a flip-flop called 'something_dout<0>' and its output net is naturally also called 'something_dout<0>'; attaching TNM_NET to this net doesn't apply the TNM to the flip-flop, it instead follows the net and attaches the TNM to all the load flip-flops on the net. For FROM-TO to work as expected we have to apply TNM_NET to all the output "read data" nets of all the cores, so that they can be traced into the multiplexor. Note that for a multicycle multiplexor you actually need two FROM-TO constraints: one for the data and one for the control signal. 2) Applying FROM-TO affects both the setup and the hold check in ISE. This is different from Vivado, where you have to individually specify the setup and the hold checks.
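As a rough sanity check of the numbers in the message above (a sketch only, not part of the commit): two 90 MHz core clock cycles add up to exactly one 45 MHz FMC clock period, which is why a FROM-TO constraint of twice the core clock period matches what the arbiter actually needs. The helper names period_ns and multicycle_budget_ns are made up for illustration and do not exist in the repository; the sketch also assumes the clock manager output referenced by the constraint is the 90 MHz core clock.

```c
#include <assert.h>
#include <stdio.h>

/* Clock period in nanoseconds for a frequency given in MHz. */
double period_ns(double freq_mhz)
{
    assert(freq_mhz > 0.0);
    return 1000.0 / freq_mhz;
}

/* Time budget of a path that is allowed `cycles` periods of the given clock. */
double multicycle_budget_ns(double freq_mhz, unsigned cycles)
{
    return period_ns(freq_mhz) * (double)cycles;
}

/* Print the I/O clock period next to the two-cycle core clock budget. */
void print_readback_budget(void)
{
    printf("FMC I/O period (45 MHz):      %.3f ns\n", period_ns(45.0));
    printf("core period (90 MHz):         %.3f ns\n", period_ns(90.0));
    printf("readback mux budget (2 clks): %.3f ns\n",
           multicycle_budget_ns(90.0, 2));
}
```

Calling print_readback_budget() from a small main() prints the three values; the first and the last should agree, and the 45 MHz period should match the 22.222 ns PERIOD value used for fmc_clk in the UCF file.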
--- ucf/alpha_fmc.ucf | 66 ++++++++++++++++++++++++++++++++++++------------------- 1 file changed, 43 insertions(+), 23 deletions(-) diff --git a/ucf/alpha_fmc.ucf b/ucf/alpha_fmc.ucf index 7926762..908f9fe 100644 --- a/ucf/alpha_fmc.ucf +++ b/ucf/alpha_fmc.ucf @@ -46,31 +46,28 @@ #------------------------------------------------------------------------------- -# FMC_CLK Timing (can be up to 90 MHz) +# FMC_CLK Timing #------------------------------------------------------------------------------- NET "fmc_clk" TNM_NET = TNM_fmc_clk; -# 90 MHz -#TIMESPEC TS_fmc_clk = PERIOD TNM_fmc_clk 11.111 ns HIGH 50%; - -# 60 MHz -TIMESPEC TS_fmc_clk = PERIOD TNM_fmc_clk 16.667 ns HIGH 50%; +# 45 MHz +TIMESPEC TS_fmc_clk = PERIOD TNM_fmc_clk 22.222 ns HIGH 50%; #------------------------------------------------------------------------------- # FPGA Pinout #------------------------------------------------------------------------------- # -NET "led_pins<0>" LOC = "U3"; -NET "led_pins<1>" LOC = "T1"; -NET "led_pins<2>" LOC = "W22"; -NET "led_pins<3>" LOC = "AA20"; +NET "led_pins<0>" LOC = "U3"; +NET "led_pins<1>" LOC = "T1"; +NET "led_pins<2>" LOC = "W22"; +NET "led_pins<3>" LOC = "AA20"; # -NET "led_pins<*>" IOSTANDARD = "LVCMOS33"; -NET "led_pins<*>" SLEW = SLOW; -NET "led_pins<*>" DRIVE = 8; +NET "led_pins<*>" IOSTANDARD = "LVCMOS33"; +NET "led_pins<*>" SLEW = SLOW; +NET "led_pins<*>" DRIVE = 8; # -NET "gclk_pin" LOC = "D17" | IOSTANDARD = "LVCMOS33" ; +#NET "gclk_pin" LOC = "D17" | IOSTANDARD = "LVCMOS33" ; # NET "fmc_clk" LOC = "W11" | IOSTANDARD = "LVCMOS33" ; NET "fmc_ne1" LOC = "V5" | IOSTANDARD = "LVCMOS33" ; @@ -155,34 +152,57 @@ NET "mkm_do" LOC = "Y1" | IOSTANDARD = "LVCMOS33" ; # MKM_FPGA_MISO # Control signals NE1, NL and NWE all have different timing values. # Instead of writing individual constraints for every control signal, the most # strict constraint is applied to all control signals. 
This should not cause -# any P&R issues, since Spartan-6 (and Artix-7) can handle 90 MHz easily. +# any P&R issues, since Spartan-6 (and Artix-7) can handle 45 MHz very easily. # # NOE signal is not constrained, since it drives "T" input of IOBUF primitive. # -# Data and Address buses also have different timings, with Data bus timing being +# Data and Address buses also have different timings, with Address bus timing being # more strict. # -# Oh, and the datasheet doesn't explicitly specify hold time for the data bus. +# Oh, and the datasheet doesn't explicitly specify hold time for the data bus. But that +# is not a problem, since it's apparently impossible to specify both OFFSET OUT BEFORE +# and AFTER at the same time :) # +# "ideal" values +#TIMEGRP "TNM_FMC_IN_DATA" OFFSET = IN 8.5 ns VALID 19.5 ns BEFORE "fmc_clk" RISING ; +#TIMEGRP "TNM_FMC_IN_ADDR" OFFSET = IN 11.0 ns VALID 11.0 ns BEFORE "fmc_clk" RISING ; +#TIMEGRP "TNM_FMC_IN_CONTROL" OFFSET = IN 10.5 ns VALID 15.5 ns BEFORE "fmc_clk" RISING ; + +# more realistic ("strict") values with 1 ns subtracted from both sides of the +# valid window +TIMEGRP "TNM_FMC_IN_DATA" OFFSET = IN 7.5 ns VALID 17.5 ns BEFORE "fmc_clk" RISING ; +TIMEGRP "TNM_FMC_IN_ADDR" OFFSET = IN 10.0 ns VALID 9.0 ns BEFORE "fmc_clk" RISING ; +TIMEGRP "TNM_FMC_IN_CONTROL" OFFSET = IN 9.5 ns VALID 13.5 ns BEFORE "fmc_clk" RISING ; + NET "fmc_d<*>" TNM = "TNM_FMC_IN_DATA" ; NET "fmc_a<*>" TNM = "TNM_FMC_IN_ADDR" ; - +# NET "fmc_ne1" TNM = "TNM_FMC_IN_CONTROL" ; NET "fmc_nl" TNM = "TNM_FMC_IN_CONTROL" ; NET "fmc_nwe" TNM = "TNM_FMC_IN_CONTROL" ; -TIMEGRP "TNM_FMC_IN_DATA" OFFSET = IN 3.0 ns VALID 8.5 ns BEFORE "fmc_clk" RISING ; -TIMEGRP "TNM_FMC_IN_ADDR" OFFSET = IN 5.5 ns VALID 5.5 ns BEFORE "fmc_clk" RISING ; -TIMEGRP "TNM_FMC_IN_CONTROL" OFFSET = IN 5.0 ns VALID 10.0 ns BEFORE "fmc_clk" RISING ; - #------------------------------------------------------------------------------- # FMC Output Timing 
#------------------------------------------------------------------------------- +TIMEGRP "TNM_FMC_OUT_DATA" OFFSET = OUT 6.0 ns BEFORE "fmc_clk" FALLING; #+1ns + NET "fmc_d<*>" TNM = "TNM_FMC_OUT_DATA" ; -TIMEGRP "TNM_FMC_OUT_DATA" OFFSET = OUT 6.0 ns AFTER "fmc_clk" RISING; + +#------------------------------------------------------------------------------- +# Core Selector Multi-Cycle Readback Mux +#------------------------------------------------------------------------------- +NET "cores/reg_read_data_*<*>" TNM_NET = TNM_CORE_SELECTOR_MUX_INPUT ; +NET "cores/addr_core_num_dly2<*>" TNM_NET = TNM_CORE_SELECTOR_MUX_CONTROL ; +NET "cores/pipe_read_data_*<*>" TNM_NET = TNM_CORE_SELECTOR_MUX_OUTPUT ; + +TIMEGRP "TNM_CORE_SELECTOR_MUX_SRC" = "TNM_CORE_SELECTOR_MUX_INPUT" "TNM_CORE_SELECTOR_MUX_CONTROL" ; +TIMEGRP "TNM_CORE_SELECTOR_MUX_DST" = "TNM_CORE_SELECTOR_MUX_OUTPUT" ; + +TIMESPEC TS_core_selector = FROM TNM_CORE_SELECTOR_MUX_SRC TO TNM_CORE_SELECTOR_MUX_DST TS_clkmgr_mmcm_inst_mmcm_clkout0 * 2 ; + #====================================================================== From git at cryptech.is Thu Jan 23 09:01:38 2020 From: git at cryptech.is (git at cryptech.is) Date: Thu, 23 Jan 2020 09:01:38 +0000 Subject: [Cryptech-Commits] [core/platform/alpha] 07/07: Out of curiosity I tried compiling the bitstream with Vivado. These constraints may come handy if you're brave enough to try this at home. In-Reply-To: <157977009118.5102.9555264409384673553@bikeshed.cryptech.is> References: <157977009118.5102.9555264409384673553@bikeshed.cryptech.is> Message-ID: <20200123090137.F09F38BC252@bikeshed.cryptech.is> This is an automated email from the git hooks/post-receive script. meisterpaul1 at yandex.ru pushed a commit to branch fmc_clk_core in repository core/platform/alpha. commit c85809c98cdba10afe45a959bc9da2a4089cff82 Author: Pavel V. Shatov (Meister) AuthorDate: Thu Jan 23 11:57:57 2020 +0300 Out of curiosity I tried compiling the bitstream with Vivado. 
These constraints may come handy if you're brave enough to try this at home. --- xdc/alpha_fmc_clocks.xdc | 1 + xdc/alpha_fmc_pinout.xdc | 100 +++++++++++++++++++++++++++++++++++++++++++++++ xdc/alpha_fmc_timing.xdc | 1 + 3 files changed, 102 insertions(+) diff --git a/xdc/alpha_fmc_clocks.xdc b/xdc/alpha_fmc_clocks.xdc new file mode 100644 index 0000000..4184ff4 --- /dev/null +++ b/xdc/alpha_fmc_clocks.xdc @@ -0,0 +1 @@ +create_clock -period 22.222 -name clk_fmc -waveform {0.000 11.111} [get_ports fmc_clk] diff --git a/xdc/alpha_fmc_pinout.xdc b/xdc/alpha_fmc_pinout.xdc new file mode 100644 index 0000000..9f4e08a --- /dev/null +++ b/xdc/alpha_fmc_pinout.xdc @@ -0,0 +1,100 @@ +set_property PACKAGE_PIN "U3" [get_ports {led_pins[0]}] +set_property PACKAGE_PIN "T1" [get_ports {led_pins[1]}] +set_property PACKAGE_PIN "W22" [get_ports {led_pins[2]}] +set_property PACKAGE_PIN "AA20" [get_ports {led_pins[3]}] + +set_property PACKAGE_PIN "W11" [get_ports {fmc_clk }] +set_property PACKAGE_PIN "V5" [get_ports {fmc_ne1 }] +set_property PACKAGE_PIN "W16" [get_ports {fmc_noe }] +set_property PACKAGE_PIN "AA6" [get_ports {fmc_nwe }] +set_property PACKAGE_PIN "W17" [get_ports {fmc_nl }] +set_property PACKAGE_PIN "Y6" [get_ports {fmc_nwait }] + +set_property PACKAGE_PIN "Y17" [get_ports {fmc_a[0]}] +set_property PACKAGE_PIN "AB16" [get_ports {fmc_a[1]}] +set_property PACKAGE_PIN "AA16" [get_ports {fmc_a[2]}] +set_property PACKAGE_PIN "Y16" [get_ports {fmc_a[3]}] +set_property PACKAGE_PIN "AB17" [get_ports {fmc_a[4]}] +set_property PACKAGE_PIN "AA13" [get_ports {fmc_a[5]}] +set_property PACKAGE_PIN "AB13" [get_ports {fmc_a[6]}] +set_property PACKAGE_PIN "AA15" [get_ports {fmc_a[7]}] +set_property PACKAGE_PIN "AB15" [get_ports {fmc_a[8]}] +set_property PACKAGE_PIN "Y13" [get_ports {fmc_a[9]}] +set_property PACKAGE_PIN "AA14" [get_ports {fmc_a[10]}] +set_property PACKAGE_PIN "Y14" [get_ports {fmc_a[11]}] +set_property PACKAGE_PIN "AB10" [get_ports {fmc_a[12]}] +set_property 
PACKAGE_PIN "V2" [get_ports {fmc_a[13]}] +set_property PACKAGE_PIN "AB12" [get_ports {fmc_a[14]}] +set_property PACKAGE_PIN "AB8" [get_ports {fmc_a[15]}] +set_property PACKAGE_PIN "AA9" [get_ports {fmc_a[16]}] +set_property PACKAGE_PIN "AA8" [get_ports {fmc_a[17]}] +set_property PACKAGE_PIN "Y7" [get_ports {fmc_a[18]}] +set_property PACKAGE_PIN "AB21" [get_ports {fmc_a[19]}] +set_property PACKAGE_PIN "AB22" [get_ports {fmc_a[20]}] +set_property PACKAGE_PIN "AB20" [get_ports {fmc_a[21]}] +set_property PACKAGE_PIN "Y21" [get_ports {fmc_a[22]}] +set_property PACKAGE_PIN "Y22" [get_ports {fmc_a[23]}] +#set_property PACKAGE_PIN "AB18" [get_ports {fmc_a[24]}] +#set_property PACKAGE_PIN "AA19" [get_ports {fmc_a[25]}] + +set_property PACKAGE_PIN "AB7" [get_ports {fmc_d[0]}] +set_property PACKAGE_PIN "AB6" [get_ports {fmc_d[1]}] +set_property PACKAGE_PIN "U1" [get_ports {fmc_d[2]}] +set_property PACKAGE_PIN "U2" [get_ports {fmc_d[3]}] +set_property PACKAGE_PIN "AB11" [get_ports {fmc_d[4]}] +set_property PACKAGE_PIN "AA11" [get_ports {fmc_d[5]}] +set_property PACKAGE_PIN "Y11" [get_ports {fmc_d[6]}] +set_property PACKAGE_PIN "Y12" [get_ports {fmc_d[7]}] +set_property PACKAGE_PIN "Y18" [get_ports {fmc_d[8]}] +set_property PACKAGE_PIN "AA21" [get_ports {fmc_d[9]}] +set_property PACKAGE_PIN "W20" [get_ports {fmc_d[10]}] +set_property PACKAGE_PIN "N15" [get_ports {fmc_d[11]}] +set_property PACKAGE_PIN "U20" [get_ports {fmc_d[12]}] +set_property PACKAGE_PIN "AA1" [get_ports {fmc_d[13]}] +set_property PACKAGE_PIN "AB1" [get_ports {fmc_d[14]}] +set_property PACKAGE_PIN "AB2" [get_ports {fmc_d[15]}] +set_property PACKAGE_PIN "AB3" [get_ports {fmc_d[16]}] +set_property PACKAGE_PIN "Y3" [get_ports {fmc_d[17]}] +set_property PACKAGE_PIN "AA3" [get_ports {fmc_d[18]}] +set_property PACKAGE_PIN "AA5" [get_ports {fmc_d[19]}] +set_property PACKAGE_PIN "AB5" [get_ports {fmc_d[20]}] +set_property PACKAGE_PIN "Y4" [get_ports {fmc_d[21]}] +set_property PACKAGE_PIN "AA4" [get_ports {fmc_d[22]}] 
+set_property PACKAGE_PIN "V4" [get_ports {fmc_d[23]}] +set_property PACKAGE_PIN "W10" [get_ports {fmc_d[24]}] +set_property PACKAGE_PIN "R4" [get_ports {fmc_d[25]}] +set_property PACKAGE_PIN "W12" [get_ports {fmc_d[26]}] +set_property PACKAGE_PIN "W14" [get_ports {fmc_d[27]}] +set_property PACKAGE_PIN "V20" [get_ports {fmc_d[28]}] +set_property PACKAGE_PIN "V18" [get_ports {fmc_d[29]}] +set_property PACKAGE_PIN "R21" [get_ports {fmc_d[30]}] +set_property PACKAGE_PIN "P21" [get_ports {fmc_d[31]}] + +set_property PACKAGE_PIN "W19" [get_ports {ct_noise}] + +set_property PACKAGE_PIN "V3" [get_ports {mkm_sclk}] +set_property PACKAGE_PIN "W2" [get_ports {mkm_cs_n}] +set_property PACKAGE_PIN "W1" [get_ports {mkm_di}] +set_property PACKAGE_PIN "Y1" [get_ports {mkm_do}] + + +set_property IOSTANDARD "LVCMOS33" [get_ports {led_*}] +set_property IOSTANDARD "LVCMOS33" [get_ports {fmc_*}] +set_property IOSTANDARD "LVCMOS33" [get_ports {ct_noise}] +set_property IOSTANDARD "LVCMOS33" [get_ports {mkm_*}] + + +set_property DRIVE 8 [get_ports {led_pins[*]}] +set_property DRIVE 8 [get_ports {fmc_nwait}] +set_property DRIVE 8 [get_ports {fmc_d[*]}] +set_property DRIVE 8 [get_ports {mkm_sclk}] +set_property DRIVE 8 [get_ports {mkm_cs_n}] +set_property DRIVE 8 [get_ports {mkm_di}] + + +set_property SLEW "SLOW" [get_ports {led_pins[*]}] +set_property SLEW "FAST" [get_ports {fmc_nwait}] +set_property SLEW "FAST" [get_ports {fmc_d[*]}] +set_property SLEW "SLOW" [get_ports {mkm_sclk}] +set_property SLEW "SLOW" [get_ports {mkm_cs_n}] +set_property SLEW "SLOW" [get_ports {mkm_di}] diff --git a/xdc/alpha_fmc_timing.xdc b/xdc/alpha_fmc_timing.xdc new file mode 100644 index 0000000..e311270 --- /dev/null +++ b/xdc/alpha_fmc_timing.xdc @@ -0,0 +1 @@ +create_clock -period 22.222 -name clk_fmc -waveform {0.000 11.111} [get_ports clk_fmc] From git at cryptech.is Sun Jan 26 09:37:06 2020 From: git at cryptech.is (git at cryptech.is) Date: Sun, 26 Jan 2020 09:37:06 +0000 Subject: [Cryptech-Commits] 
[user/ln5/stm32-avalanche-noise] branch master updated (15f056c -> be1d685) Message-ID: <158003142612.3121.15153406251089907045@bikeshed.cryptech.is> This is an automated email from the git hooks/post-receive script. linus at nordberg.se pushed a change to branch master in repository user/ln5/stm32-avalanche-noise. from 15f056c rev11 hardware add 6f4f2af Don't find | rm for target 'clean' add 49d3928 Add version and application info to ELF file add e775954 Update comments for later revisions new 41eb060 [cc20rng] Code formatting changes new 76a6b63 [cc20rng] Revamping the ChaCha20 seeding new ee79942 Merge branch 'ln/cc20rng-revamp' into ln/devel new abfbb7b [entropy] Raise USB baud rate to match application 'cc20rng' new be1d685 Revert "Add version and application info to ELF file" The 5 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. 
Summary of changes: common.mk | 4 +- src/cc20rng/.clang-format | 4 ++ src/cc20rng/Makefile | 5 +- src/cc20rng/cc20_prng.c | 147 ++++++++++++++++++++++--------------------- src/cc20rng/cc20_prng.h | 20 +++--- src/cc20rng/main.c | 151 +++++++++++++++++++++------------------------ src/cc20rng/main.h | 4 -- src/cc20rng/stm32f4xx_it.h | 2 - src/entropy/Makefile | 1 - src/entropy/stm_init.c | 2 +- 10 files changed, 170 insertions(+), 170 deletions(-) create mode 100644 src/cc20rng/.clang-format delete mode 100644 src/cc20rng/main.h From git at cryptech.is Sun Jan 26 09:37:07 2020 From: git at cryptech.is (git at cryptech.is) Date: Sun, 26 Jan 2020 09:37:07 +0000 Subject: [Cryptech-Commits] [user/ln5/stm32-avalanche-noise] 01/05: [cc20rng] Code formatting changes In-Reply-To: <158003142612.3121.15153406251089907045@bikeshed.cryptech.is> References: <158003142612.3121.15153406251089907045@bikeshed.cryptech.is> Message-ID: <20200126093706.BAB328BAF0F@bikeshed.cryptech.is> This is an automated email from the git hooks/post-receive script. linus at nordberg.se pushed a commit to branch master in repository user/ln5/stm32-avalanche-noise. commit 41eb060367b91415aadea26f63efc3db8fdbc92b Author: Linus Nordberg AuthorDate: Wed Dec 18 23:35:46 2019 +0100 [cc20rng] Code formatting changes Keep indentation level 2 in main.c to minimise changes. 
--- src/cc20rng/.clang-format | 4 ++ src/cc20rng/Makefile | 4 ++ src/cc20rng/cc20_prng.c | 87 +++++++++++++++++----------------- src/cc20rng/cc20_prng.h | 9 ++-- src/cc20rng/main.c | 118 +++++++++++++++++++++------------------------- 5 files changed, 111 insertions(+), 111 deletions(-) diff --git a/src/cc20rng/.clang-format b/src/cc20rng/.clang-format new file mode 100644 index 0000000..8367454 --- /dev/null +++ b/src/cc20rng/.clang-format @@ -0,0 +1,4 @@ +IndentWidth: 4 +BreakBeforeBraces: Custom +BraceWrapping: + AfterFunction: true diff --git a/src/cc20rng/Makefile b/src/cc20rng/Makefile index 2873505..6c274be 100644 --- a/src/cc20rng/Makefile +++ b/src/cc20rng/Makefile @@ -29,6 +29,10 @@ $(PROJ_NAME).elf: $(SRCS) $(OBJDUMP) -St $(PROJ_NAME).elf >$(PROJ_NAME).lst $(SIZE) $(PROJ_NAME).elf +c-format: + clang-format-4.0 -i cc20_prng.[ch] + clang-format-4.0 -style="{IndentWidth: 2}" -i main.c + clean: find ./ -name '*~' | xargs rm -f rm -f *.o diff --git a/src/cc20rng/cc20_prng.c b/src/cc20rng/cc20_prng.c index 3d52fc6..50739bd 100644 --- a/src/cc20rng/cc20_prng.c +++ b/src/cc20rng/cc20_prng.c @@ -44,12 +44,13 @@ void _dump(struct cc20_state *cc, char *str); #endif -inline uint32_t rotl32 (uint32_t x, uint32_t n) +inline uint32_t rotl32(uint32_t x, uint32_t n) { return (x << n) | (x >> (32 - n)); } -inline void _qr(struct cc20_state *cc, uint32_t a, uint32_t b, uint32_t c, uint32_t d) +inline void _qr(struct cc20_state *cc, uint32_t a, uint32_t b, uint32_t c, + uint32_t d) { cc->i[a] += cc->i[b]; cc->i[d] ^= cc->i[a]; @@ -72,11 +73,12 @@ void chacha20_prng_reseed(struct cc20_state *cc, uint32_t *entropy) { uint32_t i = 256 / 8 / 4; while (i--) { - cc->i[i] = entropy[i]; + cc->i[i] = entropy[i]; } } -void chacha20_prng_block(struct cc20_state *cc, uint32_t block_counter, struct cc20_state *out) +void chacha20_prng_block(struct cc20_state *cc, uint32_t block_counter, + struct cc20_state *out) { uint32_t i; @@ -88,22 +90,22 @@ void chacha20_prng_block(struct cc20_state 
*cc, uint32_t block_counter, struct c cc->i[12] = block_counter; for (i = 4; i < CHACHA20_NUM_WORDS; i++) { - out->i[i] = cc->i[i]; + out->i[i] = cc->i[i]; } for (i = 10; i; i--) { - _qr(out, 0, 4, 8,12); - _qr(out, 1, 5, 9,13); - _qr(out, 2, 6,10,14); - _qr(out, 3, 7,11,15); - _qr(out, 0, 5,10,15); - _qr(out, 1, 6,11,12); - _qr(out, 2, 7, 8,13); - _qr(out, 3, 4, 9,14); + _qr(out, 0, 4, 8, 12); + _qr(out, 1, 5, 9, 13); + _qr(out, 2, 6, 10, 14); + _qr(out, 3, 7, 11, 15); + _qr(out, 0, 5, 10, 15); + _qr(out, 1, 6, 11, 12); + _qr(out, 2, 7, 8, 13); + _qr(out, 3, 4, 9, 14); } for (i = 0; i < CHACHA20_NUM_WORDS; i++) { - out->i[i] += cc->i[i]; + out->i[i] += cc->i[i]; } } @@ -119,11 +121,11 @@ int chacha20_prng_self_test1() 0x00000001, 0x09000000, 0x4a000000, 0x00000000, }}; struct cc20_state expected = { - {0xe4e7f110, 0x15593bd1, 0x1fdd0f50, 0xc47120a3, - 0xc7f4d1c7, 0x0368c033, 0x9aaa2204, 0x4e6cd4c3, - 0x466482d2, 0x09aa9f07, 0x05d7c214, 0xa2028bd9, - 0xd19c12b5, 0xb94e16de, 0xe883d0cb, 0x4e3c50a2, - }}; + {0xe4e7f110, 0x15593bd1, 0x1fdd0f50, 0xc47120a3, + 0xc7f4d1c7, 0x0368c033, 0x9aaa2204, 0x4e6cd4c3, + 0x466482d2, 0x09aa9f07, 0x05d7c214, 0xa2028bd9, + 0xd19c12b5, 0xb94e16de, 0xe883d0cb, 0x4e3c50a2, + }}; uint32_t i; struct cc20_state out; @@ -137,7 +139,8 @@ int chacha20_prng_self_test1() #endif for (i = 0; i < CHACHA20_NUM_WORDS; i++) { - if (out.i[i] != expected.i[i]) return 0; + if (out.i[i] != expected.i[i]) + return 0; } return 1; @@ -149,23 +152,23 @@ int chacha20_prng_self_test2() * https://tools.ietf.org/html/rfc7539#section-2.4.2 */ struct cc20_state test = { - {0x61707865, 0x3320646e, 0x79622d32, 0x6b206574, - 0x03020100, 0x07060504, 0x0b0a0908, 0x0f0e0d0c, - 0x13121110, 0x17161514, 0x1b1a1918, 0x1f1e1d1c, - 0x00000001, 0x00000000, 0x4a000000, 0x00000000, - }}; + {0x61707865, 0x3320646e, 0x79622d32, 0x6b206574, + 0x03020100, 0x07060504, 0x0b0a0908, 0x0f0e0d0c, + 0x13121110, 0x17161514, 0x1b1a1918, 0x1f1e1d1c, + 0x00000001, 0x00000000, 0x4a000000, 
0x00000000, + }}; struct cc20_state expected1 = { - {0xf3514f22, 0xe1d91b40, 0x6f27de2f, 0xed1d63b8, - 0x821f138c, 0xe2062c3d, 0xecca4f7e, 0x78cff39e, - 0xa30a3b8a, 0x920a6072, 0xcd7479b5, 0x34932bed, - 0x40ba4c79, 0xcd343ec6, 0x4c2c21ea, 0xb7417df0, - }}; + {0xf3514f22, 0xe1d91b40, 0x6f27de2f, 0xed1d63b8, + 0x821f138c, 0xe2062c3d, 0xecca4f7e, 0x78cff39e, + 0xa30a3b8a, 0x920a6072, 0xcd7479b5, 0x34932bed, + 0x40ba4c79, 0xcd343ec6, 0x4c2c21ea, 0xb7417df0, + }}; struct cc20_state expected2 = { - {0x9f74a669, 0x410f633f, 0x28feca22, 0x7ec44dec, - 0x6d34d426, 0x738cb970, 0x3ac5e9f3, 0x45590cc4, - 0xda6e8b39, 0x892c831a, 0xcdea67c1, 0x2b7e1d90, - 0x037463f3, 0xa11a2073, 0xe8bcfb88, 0xedc49139, - }}; + {0x9f74a669, 0x410f633f, 0x28feca22, 0x7ec44dec, + 0x6d34d426, 0x738cb970, 0x3ac5e9f3, 0x45590cc4, + 0xda6e8b39, 0x892c831a, 0xcdea67c1, 0x2b7e1d90, + 0x037463f3, 0xa11a2073, 0xe8bcfb88, 0xedc49139, + }}; struct cc20_state out; uint32_t i; @@ -178,7 +181,8 @@ int chacha20_prng_self_test2() _dump(&out, "First block"); #endif for (i = 0; i < CHACHA20_NUM_WORDS; i++) { - if (out.i[i] != expected1.i[i]) return 0; + if (out.i[i] != expected1.i[i]) + return 0; } chacha20_prng_block(&test, 2, &out); @@ -186,7 +190,8 @@ int chacha20_prng_self_test2() _dump(&out, "Second block"); #endif for (i = 0; i < CHACHA20_NUM_WORDS; i++) { - if (out.i[i] != expected2.i[i]) return 0; + if (out.i[i] != expected2.i[i]) + return 0; } return 1; @@ -201,8 +206,8 @@ void _dump(struct cc20_state *cc, char *str) printf("%s\n", str); for (i = 0; i < 4; i++) { - printf("%02i %08x %08x %08x %08x\n", i * 4, cc->i[i * 4 + 0], - cc->i[i * 4 + 1], cc->i[i * 4 + 2], cc->i[i * 4 + 3]); + printf("%02i %08x %08x %08x %08x\n", i * 4, cc->i[i * 4 + 0], + cc->i[i * 4 + 1], cc->i[i * 4 + 2], cc->i[i * 4 + 3]); } printf("\n"); } @@ -213,7 +218,5 @@ void _dump(struct cc20_state *cc, char *str) */ int chacha20_prng_self_test() { - return \ - chacha20_prng_self_test1() && \ - chacha20_prng_self_test2(); + return 
chacha20_prng_self_test1() && chacha20_prng_self_test2(); } diff --git a/src/cc20rng/cc20_prng.h b/src/cc20rng/cc20_prng.h index 6940f53..7b597d0 100644 --- a/src/cc20rng/cc20_prng.h +++ b/src/cc20rng/cc20_prng.h @@ -3,16 +3,17 @@ #include -#define CHACHA20_MAX_BLOCK_COUNTER 0xffffffff -#define CHACHA20_NUM_WORDS 16 -#define CHACHA20_BLOCK_SIZE (CHACHA20_NUM_WORDS * 4) +#define CHACHA20_MAX_BLOCK_COUNTER 0xffffffff +#define CHACHA20_NUM_WORDS 16 +#define CHACHA20_BLOCK_SIZE (CHACHA20_NUM_WORDS * 4) struct cc20_state { uint32_t i[CHACHA20_NUM_WORDS]; }; extern void chacha20_prng_reseed(struct cc20_state *cc, uint32_t *entropy); -extern void chacha20_prng_block(struct cc20_state *cc, uint32_t block_counter, struct cc20_state *out); +extern void chacha20_prng_block(struct cc20_state *cc, uint32_t block_counter, + struct cc20_state *out); extern int chacha20_prng_self_test(); #endif /* __STM32_CHACHA20_H */ diff --git a/src/cc20rng/main.c b/src/cc20rng/main.c index e478409..88b5144 100644 --- a/src/cc20rng/main.c +++ b/src/cc20rng/main.c @@ -31,12 +31,12 @@ */ #include +#include "cc20_prng.h" #include "main.h" #include "stm_init.h" -#include "cc20_prng.h" -#define UART_RANDOM_BYTES_PER_CHUNK 8 -#define RESEED_BLOCKS CHACHA20_MAX_BLOCK_COUNTER +#define UART_RANDOM_BYTES_PER_CHUNK 8 +#define RESEED_BLOCKS CHACHA20_MAX_BLOCK_COUNTER extern DMA_HandleTypeDef hdma_tim; @@ -44,19 +44,22 @@ static UART_HandleTypeDef *huart; static __IO ITStatus UartReady = RESET; static union { - uint8_t rnd[257]; /* 256 bytes + 1 for use in the POST */ + uint8_t rnd[257]; /* 256 bytes + 1 for use in the POST */ uint32_t rnd32[64]; } buf; -/* First DMA value (DMA_counters[0]) is unreliable, leftover in DMA FIFO perhaps? */ -#define FIRST_DMA_IDX_USED 3 +/* First DMA value (DMA_counters[0]) is unreliable, leftover in DMA FIFO + * perhaps? 
*/ +#define FIRST_DMA_IDX_USED 3 /* * Number of counters used to produce 8 bits of entropy is: - * 8 * 4 - four flanks are used to produce two (hopefully) uncorrelated bits (a and b) + * 8 * 4 - four flanks are used to produce two (hopefully) uncorrelated bits + * (a and b) * * 2 - von Neumann will on average discard 1/2 of the bits 'a' and 'b' */ -#define DMA_COUNTERS_NUM ((UART_RANDOM_BYTES_PER_CHUNK * 8 * 4 * 2) + FIRST_DMA_IDX_USED + 1) +#define DMA_COUNTERS_NUM \ + ((UART_RANDOM_BYTES_PER_CHUNK * 8 * 4 * 2) + FIRST_DMA_IDX_USED + 1) struct DMA_params { volatile uint32_t buf0[DMA_COUNTERS_NUM]; volatile uint32_t buf1[DMA_COUNTERS_NUM]; @@ -64,28 +67,23 @@ struct DMA_params { }; static struct DMA_params DMA = { - {}, - {}, - 0, + {}, {}, 0, }; - /* The main work horse functions */ static void get_entropy32(uint32_t num_bytes, uint32_t buf_idx); /* Various support functions */ static inline uint32_t get_one_bit(void) __attribute__((__always_inline__)); static volatile uint32_t *restart_DMA(void); static inline volatile uint32_t *get_DMA_read_buf(void); -static inline uint32_t safe_get_counter(volatile uint32_t *dmabuf, const uint32_t dmabuf_idx); +static inline uint32_t safe_get_counter(volatile uint32_t *dmabuf, + const uint32_t dmabuf_idx); static void check_uart_rx(UART_HandleTypeDef *this); static void cc_reseed(struct cc20_state *cc); void Error_Handler(void); - -int -main() -{ +int main() { uint32_t i, timeout, block_counter = 0; struct cc20_state cc, out; HAL_StatusTypeDef res; @@ -97,10 +95,10 @@ main() DMA.buf1[i] = 0xffff0100 + i; } - stm_init((uint32_t *) &DMA.buf0, DMA_COUNTERS_NUM); + stm_init((uint32_t *)&DMA.buf0, DMA_COUNTERS_NUM); - if (! chacha20_prng_self_test()) { - Error_Handler(); + if (!chacha20_prng_self_test()) { + Error_Handler(); } /* Ensure there is actual Timer IC counters in both DMA buffers. 
*/ @@ -110,29 +108,25 @@ main() huart = &huart1; /* Toggle GREEN LED to show we've initialized */ - { - for (i = 0; i < 10; i++) { - HAL_GPIO_TogglePin(LED_PORT, LED_GREEN); - HAL_Delay(125); - } + for (i = 0; i < 10; i++) { + HAL_GPIO_TogglePin(LED_PORT, LED_GREEN); + HAL_Delay(125); } /* Generate initial block of random data directly into buf */ cc_reseed(&cc); block_counter = RESEED_BLOCKS; - chacha20_prng_block(&cc, block_counter--, (struct cc20_state *) buf.rnd32); + chacha20_prng_block(&cc, block_counter--, (struct cc20_state *)buf.rnd32); - /* - * Main loop - */ + /* Main loop */ while (1) { - if (! (block_counter % 1000)) { + if (!(block_counter % 1000)) { HAL_GPIO_TogglePin(LED_PORT, LED_YELLOW); } - if (! block_counter) { - cc_reseed(&cc); - block_counter = RESEED_BLOCKS; + if (!block_counter) { + cc_reseed(&cc); + block_counter = RESEED_BLOCKS; } /* Send buf on UART (non blocking interrupt driven send). */ @@ -143,12 +137,14 @@ main() chacha20_prng_block(&cc, block_counter--, &out); /* Copying using a loop is faster than memcpy on STM32 */ for (i = 0; i < CHACHA20_NUM_WORDS; i++) { - buf.rnd32[i] = out.i[i]; + buf.rnd32[i] = out.i[i]; } if (res == HAL_OK) { timeout = 0xffff; - while (UartReady != SET && timeout) { timeout--; } + while (UartReady != SET && timeout) { + timeout--; + } } if (UartReady != SET) { @@ -164,22 +160,19 @@ main() } } - /** * @brief Reseed chacha20 state with hardware generated entropy. 
* @param cc: ChaCha20 state * @retval None */ -static void -cc_reseed(struct cc20_state *cc) -{ - HAL_GPIO_WritePin(LED_PORT, LED_BLUE, GPIO_PIN_SET); +static void cc_reseed(struct cc20_state *cc) { + HAL_GPIO_WritePin(LED_PORT, LED_BLUE, GPIO_PIN_SET); - get_entropy32(CHACHA20_BLOCK_SIZE / 4, 0); - restart_DMA(); - chacha20_prng_reseed(cc, (uint32_t *) &buf); + get_entropy32(CHACHA20_BLOCK_SIZE / 4, 0); + restart_DMA(); + chacha20_prng_reseed(cc, (uint32_t *)&buf); - HAL_GPIO_WritePin(LED_PORT, LED_BLUE, GPIO_PIN_RESET); + HAL_GPIO_WritePin(LED_PORT, LED_BLUE, GPIO_PIN_RESET); } /** @@ -188,16 +181,14 @@ cc_reseed(struct cc20_state *cc) * @param start: Start index value into buf.rnd32. * @retval None */ -static inline void get_entropy32(uint32_t count, const uint32_t start) -{ +static inline void get_entropy32(uint32_t count, const uint32_t start) { uint32_t i, bits, buf_idx; buf_idx = start; do { bits = 0; - /* Get 32 bits of entropy. - */ + /* Get 32 bits of entropy. */ for (i = 32; i; i--) { bits <<= 1; bits += get_one_bit(); @@ -213,8 +204,7 @@ static inline void get_entropy32(uint32_t count, const uint32_t start) * @param None * @retval One bit, in the LSB of an uint32_t since this is a 32 bit MCU. */ -static inline uint32_t get_one_bit() -{ +static inline uint32_t get_one_bit() { register uint32_t a, b, temp; /* Start at end of buffer so restart_DMA() is called. */ static uint32_t dmabuf_idx = DMA_COUNTERS_NUM - 1; @@ -264,8 +254,7 @@ static inline uint32_t get_one_bit() * @param None * @retval Pointer to buffer currently being read from. */ -static inline volatile uint32_t *get_DMA_read_buf(void) -{ +static inline volatile uint32_t *get_DMA_read_buf(void) { return DMA.write_buf ? DMA.buf0 : DMA.buf1; } @@ -274,8 +263,7 @@ static inline volatile uint32_t *get_DMA_read_buf(void) * @param None * @retval Pointer to buffer currently being written to. 
*/ -static inline volatile uint32_t *get_DMA_write_buf(void) -{ +static inline volatile uint32_t *get_DMA_write_buf(void) { return DMA.write_buf ? DMA.buf1 : DMA.buf0; } @@ -284,16 +272,17 @@ static inline volatile uint32_t *get_DMA_write_buf(void) * @param None * @retval Pointer to buffer full of Timer IC values ready to be consumed. */ -static volatile uint32_t *restart_DMA(void) -{ +static volatile uint32_t *restart_DMA(void) { /* Wait for transfer complete flag to become SET. Trying to change the * M0AR register while the DMA is running is a no-no. */ - while(__HAL_DMA_GET_FLAG(&hdma_tim, __HAL_DMA_GET_TC_FLAG_INDEX(&hdma_tim)) == RESET) { ; } + while (__HAL_DMA_GET_FLAG(&hdma_tim, + __HAL_DMA_GET_TC_FLAG_INDEX(&hdma_tim)) == RESET) + ; /* Switch buffer being written to */ DMA.write_buf ^= 1; - hdma_tim.Instance->M0AR = (uint32_t) get_DMA_write_buf(); + hdma_tim.Instance->M0AR = (uint32_t)get_DMA_write_buf(); /* Start at 0 to help manual inspection */ TIM2->CNT = 0; @@ -311,7 +300,8 @@ static volatile uint32_t *restart_DMA(void) * @param dmabuf_idx: Word index into `dmabuf'. * @retval One Timer IC counter value. */ -static inline uint32_t safe_get_counter(volatile uint32_t *dmabuf, const uint32_t dmabuf_idx) { +static inline uint32_t safe_get_counter(volatile uint32_t *dmabuf, + const uint32_t dmabuf_idx) { register uint32_t a; /* Prevent re-use of values. DMA stored values are <= 0xffff. */ do { @@ -322,8 +312,7 @@ static inline uint32_t safe_get_counter(volatile uint32_t *dmabuf, const uint32_ } /* UART transmit complete callback */ -void HAL_UART_TxCpltCallback(UART_HandleTypeDef *UH) -{ +void HAL_UART_TxCpltCallback(UART_HandleTypeDef *UH) { if ((UH->Instance == USART1 && huart->Instance == USART1) || (UH->Instance == USART2 && huart->Instance == USART2)) { /* Signal UART transmit complete to the code in the main loop. 
*/ @@ -350,10 +339,9 @@ static void check_uart_rx(UART_HandleTypeDef *this) { * @param None * @retval None */ - -void Error_Handler(void) -{ +void Error_Handler(void) { /* Turn on RED LED and then loop indefinitely */ HAL_GPIO_WritePin(LED_PORT, LED_RED, GPIO_PIN_SET); - while(1) { ; } + while (1) + ; } From git at cryptech.is Sun Jan 26 09:37:08 2020 From: git at cryptech.is (git at cryptech.is) Date: Sun, 26 Jan 2020 09:37:08 +0000 Subject: [Cryptech-Commits] [user/ln5/stm32-avalanche-noise] 02/05: [cc20rng] Revamping the ChaCha20 seeding In-Reply-To: <158003142612.3121.15153406251089907045@bikeshed.cryptech.is> References: <158003142612.3121.15153406251089907045@bikeshed.cryptech.is> Message-ID: <20200126093707.556F28BAF10@bikeshed.cryptech.is> This is an automated email from the git hooks/post-receive script. linus at nordberg.se pushed a commit to branch master in repository user/ln5/stm32-avalanche-noise. commit 76a6b631f4bd6866622f537870bc145c935bef40 Author: Linus Nordberg AuthorDate: Wed Dec 18 23:36:25 2019 +0100 [cc20rng] Revamping the ChaCha20 seeding - chacha20_prng_block() uses counter in the state struct - chacha20_setup() replaces chacha20_prng_reseed() and fills the whole state struct, fixing a bug where only half of the key was being set; as a result of 'counter' being set, a state struct filled with entropy from the TRNG makes reseeding occur after a random number of rounds instead of after a fixed 2^32-1 rounds - decrementing of the block counter is done in chacha20_prng_block() - chacha output is copied to buf _after_ the interrupt driven transmission of buf to UART has finished, to stop the race between reading and refilling of buf --- src/cc20rng/cc20_prng.c | 124 +++++++++++++++++++++++---------------------- src/cc20rng/cc20_prng.h | 21 +++++--- src/cc20rng/main.c | 55 ++++++++++---------- src/cc20rng/main.h | 4 -- src/cc20rng/stm32f4xx_it.h | 2 - 5 files changed, 104 insertions(+), 102 deletions(-) diff --git a/src/cc20rng/cc20_prng.c 
b/src/cc20rng/cc20_prng.c index 50739bd..36c06a4 100644 --- a/src/cc20rng/cc20_prng.c +++ b/src/cc20rng/cc20_prng.c @@ -41,16 +41,16 @@ #define CHACHA20_CONSTANT3 0x6b206574 #ifdef CHACHA20_PRNG_DEBUG -void _dump(struct cc20_state *cc, char *str); +static void _dump(struct cc20_state *cc, char *str); #endif -inline uint32_t rotl32(uint32_t x, uint32_t n) +static inline uint32_t rotl32(uint32_t x, uint32_t n) { return (x << n) | (x >> (32 - n)); } -inline void _qr(struct cc20_state *cc, uint32_t a, uint32_t b, uint32_t c, - uint32_t d) +static inline void _qr(struct cc20_state *cc, uint32_t a, uint32_t b, + uint32_t c, uint32_t d) { cc->i[a] += cc->i[b]; cc->i[d] ^= cc->i[a]; @@ -69,44 +69,40 @@ inline void _qr(struct cc20_state *cc, uint32_t a, uint32_t b, uint32_t c, cc->i[b] = rotl32(cc->i[b], 7); } -void chacha20_prng_reseed(struct cc20_state *cc, uint32_t *entropy) +static void chacha20_setup(struct cc20_state *state, + const struct cc20_state *src) { - uint32_t i = 256 / 8 / 4; - while (i--) { - cc->i[i] = entropy[i]; - } + state->s.constant[0] = CHACHA20_CONSTANT0; + state->s.constant[1] = CHACHA20_CONSTANT1; + state->s.constant[2] = CHACHA20_CONSTANT2; + state->s.constant[3] = CHACHA20_CONSTANT3; + + for (int i = 4; i < CHACHA20_BLOCK_SIZE_WORDS; i++) + state->i[i] = src->i[i]; } -void chacha20_prng_block(struct cc20_state *cc, uint32_t block_counter, - struct cc20_state *out) +void chacha20_prng_block(struct cc20_state *cc, uint8_t *out) { uint32_t i; + struct cc20_state *ws = (struct cc20_state *)out; - out->i[0] = CHACHA20_CONSTANT0; - out->i[1] = CHACHA20_CONSTANT1; - out->i[2] = CHACHA20_CONSTANT2; - out->i[3] = CHACHA20_CONSTANT3; - - cc->i[12] = block_counter; - - for (i = 4; i < CHACHA20_NUM_WORDS; i++) { - out->i[i] = cc->i[i]; - } + chacha20_setup(ws, cc); for (i = 10; i; i--) { - _qr(out, 0, 4, 8, 12); - _qr(out, 1, 5, 9, 13); - _qr(out, 2, 6, 10, 14); - _qr(out, 3, 7, 11, 15); - _qr(out, 0, 5, 10, 15); - _qr(out, 1, 6, 11, 12); - _qr(out, 2, 7, 
8, 13); - _qr(out, 3, 4, 9, 14); + _qr(ws, 0, 4, 8, 12); + _qr(ws, 1, 5, 9, 13); + _qr(ws, 2, 6, 10, 14); + _qr(ws, 3, 7, 11, 15); + _qr(ws, 0, 5, 10, 15); + _qr(ws, 1, 6, 11, 12); + _qr(ws, 2, 7, 8, 13); + _qr(ws, 3, 4, 9, 14); } - for (i = 0; i < CHACHA20_NUM_WORDS; i++) { - out->i[i] += cc->i[i]; - } + for (i = 0; i < CHACHA20_BLOCK_SIZE_WORDS; i++) + ws->i[i] += cc->i[i]; + + cc->s.counter--; } int chacha20_prng_self_test1() @@ -115,16 +111,18 @@ int chacha20_prng_self_test1() * https://tools.ietf.org/html/rfc7539#section-2.3.2 */ struct cc20_state test = { - {0x61707865, 0x3320646e, 0x79622d32, 0x6b206574, - 0x03020100, 0x07060504, 0x0b0a0908, 0x0f0e0d0c, - 0x13121110, 0x17161514, 0x1b1a1918, 0x1f1e1d1c, - 0x00000001, 0x09000000, 0x4a000000, 0x00000000, + .i = { + 0x61707865, 0x3320646e, 0x79622d32, 0x6b206574, + 0x03020100, 0x07060504, 0x0b0a0908, 0x0f0e0d0c, + 0x13121110, 0x17161514, 0x1b1a1918, 0x1f1e1d1c, + 0x00000001, 0x09000000, 0x4a000000, 0x00000000, }}; struct cc20_state expected = { - {0xe4e7f110, 0x15593bd1, 0x1fdd0f50, 0xc47120a3, - 0xc7f4d1c7, 0x0368c033, 0x9aaa2204, 0x4e6cd4c3, - 0x466482d2, 0x09aa9f07, 0x05d7c214, 0xa2028bd9, - 0xd19c12b5, 0xb94e16de, 0xe883d0cb, 0x4e3c50a2, + .i = { + 0xe4e7f110, 0x15593bd1, 0x1fdd0f50, 0xc47120a3, + 0xc7f4d1c7, 0x0368c033, 0x9aaa2204, 0x4e6cd4c3, + 0x466482d2, 0x09aa9f07, 0x05d7c214, 0xa2028bd9, + 0xd19c12b5, 0xb94e16de, 0xe883d0cb, 0x4e3c50a2, }}; uint32_t i; struct cc20_state out; @@ -133,12 +131,13 @@ int chacha20_prng_self_test1() _dump(&test, "Test vector from RFC7539, section 2.3.2. Input:"); #endif - chacha20_prng_block(&test, 1, &out); + test.s.counter = 1; /* nop */ + chacha20_prng_block(&test, (uint8_t *)&out); #ifdef CHACHA20_PRNG_DEBUG _dump(&out, "Test vector from RFC7539, section 2.3.2. 
Output:"); #endif - for (i = 0; i < CHACHA20_NUM_WORDS; i++) { + for (i = 0; i < CHACHA20_BLOCK_SIZE_WORDS; i++) { if (out.i[i] != expected.i[i]) return 0; } @@ -152,22 +151,25 @@ int chacha20_prng_self_test2() * https://tools.ietf.org/html/rfc7539#section-2.4.2 */ struct cc20_state test = { - {0x61707865, 0x3320646e, 0x79622d32, 0x6b206574, - 0x03020100, 0x07060504, 0x0b0a0908, 0x0f0e0d0c, - 0x13121110, 0x17161514, 0x1b1a1918, 0x1f1e1d1c, - 0x00000001, 0x00000000, 0x4a000000, 0x00000000, + .i = { + 0x61707865, 0x3320646e, 0x79622d32, 0x6b206574, + 0x03020100, 0x07060504, 0x0b0a0908, 0x0f0e0d0c, + 0x13121110, 0x17161514, 0x1b1a1918, 0x1f1e1d1c, + 0x00000001, 0x00000000, 0x4a000000, 0x00000000, }}; struct cc20_state expected1 = { - {0xf3514f22, 0xe1d91b40, 0x6f27de2f, 0xed1d63b8, - 0x821f138c, 0xe2062c3d, 0xecca4f7e, 0x78cff39e, - 0xa30a3b8a, 0x920a6072, 0xcd7479b5, 0x34932bed, - 0x40ba4c79, 0xcd343ec6, 0x4c2c21ea, 0xb7417df0, + .i = { + 0xf3514f22, 0xe1d91b40, 0x6f27de2f, 0xed1d63b8, + 0x821f138c, 0xe2062c3d, 0xecca4f7e, 0x78cff39e, + 0xa30a3b8a, 0x920a6072, 0xcd7479b5, 0x34932bed, + 0x40ba4c79, 0xcd343ec6, 0x4c2c21ea, 0xb7417df0, }}; struct cc20_state expected2 = { - {0x9f74a669, 0x410f633f, 0x28feca22, 0x7ec44dec, - 0x6d34d426, 0x738cb970, 0x3ac5e9f3, 0x45590cc4, - 0xda6e8b39, 0x892c831a, 0xcdea67c1, 0x2b7e1d90, - 0x037463f3, 0xa11a2073, 0xe8bcfb88, 0xedc49139, + .i = { + 0x9f74a669, 0x410f633f, 0x28feca22, 0x7ec44dec, + 0x6d34d426, 0x738cb970, 0x3ac5e9f3, 0x45590cc4, + 0xda6e8b39, 0x892c831a, 0xcdea67c1, 0x2b7e1d90, + 0x037463f3, 0xa11a2073, 0xe8bcfb88, 0xedc49139, }}; struct cc20_state out; uint32_t i; @@ -176,20 +178,22 @@ int chacha20_prng_self_test2() _dump(&test, "Test vector from RFC7539, section 2.4.2. 
Input:"); #endif - chacha20_prng_block(&test, 1, &out); + test.s.counter = 1; /* nop */ + chacha20_prng_block(&test, (uint8_t *)&out); #ifdef CHACHA20_PRNG_DEBUG _dump(&out, "First block"); #endif - for (i = 0; i < CHACHA20_NUM_WORDS; i++) { + for (i = 0; i < CHACHA20_BLOCK_SIZE_WORDS; i++) { if (out.i[i] != expected1.i[i]) return 0; } - chacha20_prng_block(&test, 2, &out); + test.s.counter = 2; + chacha20_prng_block(&test, (uint8_t *)&out); #ifdef CHACHA20_PRNG_DEBUG _dump(&out, "Second block"); #endif - for (i = 0; i < CHACHA20_NUM_WORDS; i++) { + for (i = 0; i < CHACHA20_BLOCK_SIZE_WORDS; i++) { if (out.i[i] != expected2.i[i]) return 0; } @@ -213,8 +217,8 @@ void _dump(struct cc20_state *cc, char *str) } #endif -/* Test vector from RFC, used as simple power-on self-test of ability to compute - * a block correctly. +/* Test vector from RFC7539, used as simple power-on self-test of + * ability to compute a block correctly. */ int chacha20_prng_self_test() { diff --git a/src/cc20rng/cc20_prng.h b/src/cc20rng/cc20_prng.h index 7b597d0..08f78d7 100644 --- a/src/cc20rng/cc20_prng.h +++ b/src/cc20rng/cc20_prng.h @@ -3,17 +3,22 @@ #include -#define CHACHA20_MAX_BLOCK_COUNTER 0xffffffff -#define CHACHA20_NUM_WORDS 16 -#define CHACHA20_BLOCK_SIZE (CHACHA20_NUM_WORDS * 4) +#define CHACHA20_BLOCK_SIZE_WORDS 16 +#define CHACHA20_BLOCK_SIZE (CHACHA20_BLOCK_SIZE_WORDS * 4) struct cc20_state { - uint32_t i[CHACHA20_NUM_WORDS]; + union { + struct { + uint32_t constant[4]; + uint32_t key[8]; + uint32_t counter; + uint32_t nonce[3]; + } s; + uint32_t i[CHACHA20_BLOCK_SIZE_WORDS]; + }; }; -extern void chacha20_prng_reseed(struct cc20_state *cc, uint32_t *entropy); -extern void chacha20_prng_block(struct cc20_state *cc, uint32_t block_counter, - struct cc20_state *out); -extern int chacha20_prng_self_test(); +void chacha20_prng_block(struct cc20_state *cc, uint8_t *out); +int chacha20_prng_self_test(); #endif /* __STM32_CHACHA20_H */ diff --git a/src/cc20rng/main.c 
b/src/cc20rng/main.c index 88b5144..cc063f4 100644 --- a/src/cc20rng/main.c +++ b/src/cc20rng/main.c @@ -1,5 +1,6 @@ /* * Copyright (c) 2014, 2015, 2016 NORDUnet A/S + * Copyright (c) 2019 Sunet * All rights reserved. * * Redistribution and use in source and binary forms, with or @@ -32,7 +33,6 @@ #include #include "cc20_prng.h" -#include "main.h" #include "stm_init.h" #define UART_RANDOM_BYTES_PER_CHUNK 8 @@ -79,13 +79,14 @@ static inline volatile uint32_t *get_DMA_read_buf(void); static inline uint32_t safe_get_counter(volatile uint32_t *dmabuf, const uint32_t dmabuf_idx); static void check_uart_rx(UART_HandleTypeDef *this); -static void cc_reseed(struct cc20_state *cc); +static uint32_t cc_reseed(struct cc20_state *cc); void Error_Handler(void); int main() { - uint32_t i, timeout, block_counter = 0; - struct cc20_state cc, out; + uint32_t i, timeout, block_counter; + struct cc20_state cc_state = {0}; + uint8_t cc_result[CHACHA20_BLOCK_SIZE]; HAL_StatusTypeDef res; /* Initialize buffers */ @@ -113,40 +114,32 @@ int main() { HAL_Delay(125); } - /* Generate initial block of random data directly into buf */ - cc_reseed(&cc); - block_counter = RESEED_BLOCKS; - chacha20_prng_block(&cc, block_counter--, (struct cc20_state *)buf.rnd32); + /* Generate initial block of ChaCha20 output directly into buf. */ + block_counter = cc_reseed(&cc_state); + chacha20_prng_block(&cc_state, buf.rnd); + block_counter--; /* Main loop */ while (1) { - if (!(block_counter % 1000)) { + if (!(block_counter % 1000)) HAL_GPIO_TogglePin(LED_PORT, LED_YELLOW); - } - - if (!block_counter) { - cc_reseed(&cc); - block_counter = RESEED_BLOCKS; - } /* Send buf on UART (non blocking interrupt driven send). 
*/ UartReady = RESET; - res = HAL_UART_Transmit_IT(huart, &buf.rnd[0], CHACHA20_BLOCK_SIZE); + res = HAL_UART_Transmit_IT(huart, buf.rnd, CHACHA20_BLOCK_SIZE); /* Generate next block while this block is being transmitted */ - chacha20_prng_block(&cc, block_counter--, &out); - /* Copying using a loop is faster than memcpy on STM32 */ - for (i = 0; i < CHACHA20_NUM_WORDS; i++) { - buf.rnd32[i] = out.i[i]; - } + if (!block_counter) + block_counter = cc_reseed(&cc_state); + chacha20_prng_block(&cc_state, cc_result); + block_counter--; + /* Wait for transfer to complete. */ if (res == HAL_OK) { timeout = 0xffff; - while (UartReady != SET && timeout) { + while (UartReady != SET && timeout) timeout--; - } } - if (UartReady != SET) { /* Failed to send, turn on RED LED for one second */ HAL_GPIO_WritePin(LED_PORT, LED_RED, GPIO_PIN_SET); @@ -154,6 +147,10 @@ int main() { HAL_GPIO_WritePin(LED_PORT, LED_RED, GPIO_PIN_RESET); } + /* Fill buffer with ChaCha20 output. */ + for (i = 0; i < CHACHA20_BLOCK_SIZE; i++) + buf.rnd[i] = cc_result[i]; + /* Check for UART change request */ check_uart_rx(&huart1); check_uart_rx(&huart2); @@ -163,16 +160,18 @@ int main() { /** * @brief Reseed chacha20 state with hardware generated entropy. 
* @param cc: ChaCha20 state - * @retval None + * @retval ChaCha20 block counter */ -static void cc_reseed(struct cc20_state *cc) { +static uint32_t cc_reseed(struct cc20_state *cc) { HAL_GPIO_WritePin(LED_PORT, LED_BLUE, GPIO_PIN_SET); - get_entropy32(CHACHA20_BLOCK_SIZE / 4, 0); + get_entropy32(CHACHA20_BLOCK_SIZE_WORDS, 0); restart_DMA(); - chacha20_prng_reseed(cc, (uint32_t *)&buf); + memcpy(cc, buf.rnd, CHACHA20_BLOCK_SIZE); HAL_GPIO_WritePin(LED_PORT, LED_BLUE, GPIO_PIN_RESET); + + return cc->s.counter; } /** diff --git a/src/cc20rng/main.h b/src/cc20rng/main.h deleted file mode 100644 index 902ecf4..0000000 --- a/src/cc20rng/main.h +++ /dev/null @@ -1,4 +0,0 @@ -#ifndef __MAIN_H -#define __MAIN_H - -#endif /* __MAIN_H */ diff --git a/src/cc20rng/stm32f4xx_it.h b/src/cc20rng/stm32f4xx_it.h index 04c5a36..6edba3c 100644 --- a/src/cc20rng/stm32f4xx_it.h +++ b/src/cc20rng/stm32f4xx_it.h @@ -44,8 +44,6 @@ #endif /* Includes ------------------------------------------------------------------*/ -#include "main.h" - /* Exported types ------------------------------------------------------------*/ /* Exported constants --------------------------------------------------------*/ /* Exported macro ------------------------------------------------------------*/ From git at cryptech.is Sun Jan 26 09:37:09 2020 From: git at cryptech.is (git at cryptech.is) Date: Sun, 26 Jan 2020 09:37:09 +0000 Subject: [Cryptech-Commits] [user/ln5/stm32-avalanche-noise] 03/05: Merge branch 'ln/cc20rng-revamp' into ln/devel In-Reply-To: <158003142612.3121.15153406251089907045@bikeshed.cryptech.is> References: <158003142612.3121.15153406251089907045@bikeshed.cryptech.is> Message-ID: <20200126093707.E77598BAF0D@bikeshed.cryptech.is> This is an automated email from the git hooks/post-receive script. linus at nordberg.se pushed a commit to branch master in repository user/ln5/stm32-avalanche-noise. 
commit ee799425c42bbd46dddb506f9133091874a74175 Merge: e775954 76a6b63 Author: Linus Nordberg AuthorDate: Tue Jan 21 16:11:38 2020 +0100 Merge branch 'ln/cc20rng-revamp' into ln/devel src/cc20rng/.clang-format | 4 ++ src/cc20rng/Makefile | 4 ++ src/cc20rng/cc20_prng.c | 147 ++++++++++++++++++++++--------------------- src/cc20rng/cc20_prng.h | 20 +++--- src/cc20rng/main.c | 151 +++++++++++++++++++++------------------------ src/cc20rng/main.h | 4 -- src/cc20rng/stm32f4xx_it.h | 2 - 7 files changed, 167 insertions(+), 165 deletions(-) diff --cc src/cc20rng/Makefile index c52bc52,6c274be..bc5fce5 --- a/src/cc20rng/Makefile +++ b/src/cc20rng/Makefile @@@ -45,7 -29,12 +45,11 @@@ $(PROJ_NAME).elf: $(SRCS $(OBJDUMP) -St $(PROJ_NAME).elf >$(PROJ_NAME).lst $(SIZE) $(PROJ_NAME).elf + c-format: + clang-format-4.0 -i cc20_prng.[ch] + clang-format-4.0 -style="{IndentWidth: 2}" -i main.c + clean: - find ./ -name '*~' | xargs rm -f rm -f *.o rm -f $(PROJ_NAME).elf rm -f $(PROJ_NAME).hex From git at cryptech.is Sun Jan 26 09:37:10 2020 From: git at cryptech.is (git at cryptech.is) Date: Sun, 26 Jan 2020 09:37:10 +0000 Subject: [Cryptech-Commits] [user/ln5/stm32-avalanche-noise] 04/05: [entropy] Raise USB baud rate to match application 'cc20rng' In-Reply-To: <158003142612.3121.15153406251089907045@bikeshed.cryptech.is> References: <158003142612.3121.15153406251089907045@bikeshed.cryptech.is> Message-ID: <20200126093708.61E9B8BAF0F@bikeshed.cryptech.is> This is an automated email from the git hooks/post-receive script. linus at nordberg.se pushed a commit to branch master in repository user/ln5/stm32-avalanche-noise. commit abfbb7b7a3c1e14873c898082acd2b22db23e9a9 Author: Linus Nordberg AuthorDate: Thu Jan 23 13:09:53 2020 +0100 [entropy] Raise USB baud rate to match application 'cc20rng' Using different baud rates for the two otherwise compatible applications risk confusing users of both applications. 
An especially unlucky case is setting host side to 460800 while
reading from a board set to 921600 -- the stream will look random
while it's actually pretty bad.

Raising 'entropy' rather than lowering 'cc20rng' is motivated by the
fact that cc20rng saturates 460800 baud(*) with its ~75 kB/s output.

(*) 460800 baud should be 57.6 kB/s or 51.2 kB/s, depending on
whether or not the stop bit is counted towards the baud rate. I
would suppose it is.
---
 src/entropy/stm_init.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/entropy/stm_init.c b/src/entropy/stm_init.c
index 5f69760..fc8b84f 100644
--- a/src/entropy/stm_init.c
+++ b/src/entropy/stm_init.c
@@ -50,7 +50,7 @@
 /* Private typedef -----------------------------------------------------------*/
 /* Private define ------------------------------------------------------------*/
-#define UART1_BAUD_RATE 460800
+#define UART1_BAUD_RATE 921600
 #define UART2_BAUD_RATE 115200

From git at cryptech.is Sun Jan 26 09:37:11 2020
From: git at cryptech.is (git at cryptech.is)
Date: Sun, 26 Jan 2020 09:37:11 +0000
Subject: [Cryptech-Commits] [user/ln5/stm32-avalanche-noise] 05/05: Revert "Add version and application info to ELF file"
In-Reply-To: <158003142612.3121.15153406251089907045@bikeshed.cryptech.is>
References: <158003142612.3121.15153406251089907045@bikeshed.cryptech.is>
Message-ID: <20200126093708.E6B4B8BAF11@bikeshed.cryptech.is>

This is an automated email from the git hooks/post-receive script.

linus at nordberg.se pushed a commit to branch master
in repository user/ln5/stm32-avalanche-noise.

commit be1d685c0d9d5535c181d0ab7220f621609abde4
Author: Linus Nordberg
AuthorDate: Thu Jan 23 13:23:53 2020 +0100

Revert "Add version and application info to ELF file"

Adding symbols to an ELF file doesn't solve the problem we're trying
to solve. We want to be able to identify which application is running
on a board when we don't know what ELF file has been loaded in flash.
A checksum over the contents of the flash memory seems less dumb.

This reverts commit 49d39287252bf0bd2b3fd75e86d07616b56c7fd2.
---
 src/cc20rng/Makefile | 18 +-----------------
 src/entropy/Makefile | 18 +-----------------
 2 files changed, 2 insertions(+), 34 deletions(-)

diff --git a/src/cc20rng/Makefile b/src/cc20rng/Makefile
index bc5fce5..debf547 100644
--- a/src/cc20rng/Makefile
+++ b/src/cc20rng/Makefile
@@ -1,19 +1,3 @@
-# Application number and version info are included in the ELF file
-# symbol table through two global, asbolute symbols. Use nm(1) on an
-# ELF file to figure things out. Example:
-# nm cc20rng.elf | awk '/VERSION/{print $1}'
-
-# Application number:
-# - cc20rng: 01
-# - entropy: 02
-APPLICATION = 0x00000001
-
-# Version info is four octects, from most significant to least:
-# - Major version, two bytes: 00..FFFF
-# - Minor version, one byte: 00..FF
-# - Patch level, one byte: 00..FF
-VERSION = 0x00010000
-
 # put your *.o targets here, make should handle the rest!
 SRCS = main.c stm_init.c system_stm32f4xx.c stm32f4xx_it.c stm32f4xx_hal_msp.c cc20_prng.c
@@ -39,7 +23,7 @@ lib:
 proj: $(PROJ_NAME).elf
 $(PROJ_NAME).elf: $(SRCS)
- $(CC) $(CFLAGS) $^ -o $@ -L$(STD_PERIPH_LIB) -lstmf4 -L$(LDSCRIPT_INC) -T$(MCU_LINKSCRIPT) -g -Wl,--defsym=APPLICATION=$(APPLICATION) -Wl,--defsym=VERSION=$(VERSION)
+ $(CC) $(CFLAGS) $^ -o $@ -L$(STD_PERIPH_LIB) -lstmf4 -L$(LDSCRIPT_INC) -T$(MCU_LINKSCRIPT) -g
 $(OBJCOPY) -O ihex $(PROJ_NAME).elf $(PROJ_NAME).hex
 $(OBJCOPY) -O binary $(PROJ_NAME).elf $(PROJ_NAME).bin
 $(OBJDUMP) -St $(PROJ_NAME).elf >$(PROJ_NAME).lst
diff --git a/src/entropy/Makefile b/src/entropy/Makefile
index 0285460..68723b8 100644
--- a/src/entropy/Makefile
+++ b/src/entropy/Makefile
@@ -1,19 +1,3 @@
-# Application number and version info are included in the ELF file
-# symbol table through two global, asbolute symbols. Use nm(1) on an
-# ELF file to figure things out.
Example: -# nm entropy.elf | awk '/VERSION/{print $1}' - -# Application number: -# - cc20rng: 01 -# - entropy: 02 -APPLICATION = 0x00000002 - -# Version info is four octects, from most significant to least: -# - Major version, two bytes: 00..FFFF -# - Minor version, one byte: 00..FF -# - Patch level, one byte: 00..FF -VERSION = 0x00010000 - # put your *.o targets here, make should handle the rest! SRCS = main.c stm_init.c system_stm32f4xx.c stm32f4xx_it.c stm32f4xx_hal_msp.c @@ -39,7 +23,7 @@ lib: proj: $(PROJ_NAME).elf $(PROJ_NAME).elf: $(SRCS) - $(CC) $(CFLAGS) $^ -o $@ -L$(STD_PERIPH_LIB) -lstmf4 -L$(LDSCRIPT_INC) -T$(MCU_LINKSCRIPT) -g -Wl,--defsym=APPLICATION=$(APPLICATION) -Wl,--defsym=VERSION=$(VERSION) + $(CC) $(CFLAGS) $^ -o $@ -L$(STD_PERIPH_LIB) -lstmf4 -L$(LDSCRIPT_INC) -T$(MCU_LINKSCRIPT) -g $(OBJCOPY) -O ihex $(PROJ_NAME).elf $(PROJ_NAME).hex $(OBJCOPY) -O binary $(PROJ_NAME).elf $(PROJ_NAME).bin $(OBJDUMP) -St $(PROJ_NAME).elf >$(PROJ_NAME).lst From git at cryptech.is Sun Jan 26 09:37:27 2020 From: git at cryptech.is (git at cryptech.is) Date: Sun, 26 Jan 2020 09:37:27 +0000 Subject: [Cryptech-Commits] [user/ln5/stm32-avalanche-noise] annotated tag base-1.0.0 created (now 4e12187) Message-ID: <158003144777.3428.3465653588713231726@bikeshed.cryptech.is> This is an automated email from the git hooks/post-receive script. linus at nordberg.se pushed a change to annotated tag base-1.0.0 in repository user/ln5/stm32-avalanche-noise. 
at 4e12187 (tag) tagging be1d685c0d9d5535c181d0ab7220f621609abde4 (commit) by Linus Nordberg on Sun Jan 26 09:57:48 2020 +0100 - Log ----------------------------------------------------------------- base-1.0.0 -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEjEzVEQlemC6w77+iHovzSSMpEmUFAl4tVJkACgkQHovzSSMp EmWJ1xAAv8Y2tqY/m0WXWnIh+mkxwExGnjKx6t0Om5FEeVOfAAitQDOSfCvqZXTS 3ZopmV7TfOBSeujdfYj9XqAvBfeksA5ifZQjo1+rjRuloAsX1q0X7YL7t3dkxpTB ojbR4+nvdv8b4xRZqZeaHm6q+PJwqA3GlAUUMeWEbHYyp4DbfKA+1OFvHe6o8/Fn TNtVw67OpX6vSBOvQCM3zD9U1ypn1uqxLTWOSAT5B9S6nfCEQMY7vRLaS2oxYYt8 tsqI2ZKl5at0Iz/i9Gc5qSsM2wbVHKOHBFPCXNpBxgOFKbRazbjZjVps0X0pHY6u 54RKWamp8C8+fwjQ5wZhZQaVLIpKyWbQyfc3BRB9zziMBuskcKGvI1n8Bi0+PO+o o/5VplaM66Y+UyffJtJ1r/xK4l01T6zkbFUdObQ9IBjxfTlplZF0fv1WDn0Uf5wr dqJQBGkuLWlLb6kte1ArhKx4GAvWXgHFWJEexOKGhwiY1FASGNyikSROLdIHWczy Gh2Dp3KqHHrNHNsRpHKMNv8HYOdazS4bas2zvaeI4FSyv3y0gmz2ytQOC+E0NnaP +8oI00pxHQyuo5nW5j5dYgLGMU+GzSjW+KljjCKaLsVJQ/vM4+wcK8TdJtGUPwGY ct40XZlX2Obpp+JL+CS4WcsBy2YXJGM5cyd+xWELQwBNJkwZ7WM= =GqUZ -----END PGP SIGNATURE----- ----------------------------------------------------------------------- No new revisions were added by this update. From git at cryptech.is Sun Jan 26 09:37:28 2020 From: git at cryptech.is (git at cryptech.is) Date: Sun, 26 Jan 2020 09:37:28 +0000 Subject: [Cryptech-Commits] [user/ln5/stm32-avalanche-noise] annotated tag cc20rng-1.0.0 created (now c714954) Message-ID: <158003144779.3428.12048488907002538842@bikeshed.cryptech.is> This is an automated email from the git hooks/post-receive script. linus at nordberg.se pushed a change to annotated tag cc20rng-1.0.0 in repository user/ln5/stm32-avalanche-noise. 
at c714954 (tag) tagging be1d685c0d9d5535c181d0ab7220f621609abde4 (commit) by Linus Nordberg on Sun Jan 26 09:58:33 2020 +0100 - Log ----------------------------------------------------------------- cc20rng-1.0.0 -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEjEzVEQlemC6w77+iHovzSSMpEmUFAl4tVL4ACgkQHovzSSMp EmUGXw/7BcCiw44V18hmC1e4s8PNlf0ZOyYs7b+un5okR++QiGT7OntAl97YfKjq bxZmW3/8Km7ufM0nQQnLfXjxIISP/yYozjxuHsZicsg8qXTnOFPJfea3wVMhiB63 Qqch7IhrPJ4a6GHvh3XXYey1m9oJVOr3n6BXpbCTiL1x/vAPrMXVrDwzLSZMi2e9 vUNnmM+WSi/xo1Lf8jbu5ZbGKnl1wrrdKRdtxv+DTmw42MIhusmreTGNFXooF1j8 XV8KFumXCjG+FE/Wv9SmsL2CWOmPnrBWnwTWyRdQM5NjIIRjy/4QdNhzaMsyHLnJ uHy7f8T/X0gyiEQC3YO2tMA2bkjR15RjKReAt87/cfMuJ3xkFiABlTj2loWtix35 bzUf/1Zja+RGh9Li6sxSHC/IkYtPV7GM1XDzVo1AWsFEfDhrfFLBJK9SlomXnIF1 9tD1enKyXL9OPGbIQGi3EWfCwiX+DnIWvhnYO2r49iNVGQAiDixuVofkn0ahNUcy 4dVhaeJ6RnRhZJLYb9DU0vjQkSSHx85yJzl1LBGHkKSCzuRa4dlvv6bwGUnLIpZo /MwxEszr5KgDZLVxuLf03l2qKdsuTMfWS4nOg7OTTHI8CAtSyfFjkTpOcd6Gx9/x QB/HTGsamKMXo8sYFn9lfsxvGhWQCWID4XX2J8O98zsPZdKQXOQ= =tm6i -----END PGP SIGNATURE----- ----------------------------------------------------------------------- No new revisions were added by this update.