[Cryptech-Commits] [core/pkey/ecdhp256] 01/01: Initial commit of P-256 point multiplier suitable for ECDH.

git at cryptech.is git at cryptech.is
Tue Apr 17 12:06:41 UTC 2018


This is an automated email from the git hooks/post-receive script.

meisterpaul1 at yandex.ru pushed a commit to branch master
in repository core/pkey/ecdhp256.

commit 4e0581c98e289e79af09d95b747f9932a14c89fd
Author: Pavel V. Shatov (Meister) <meisterpaul1 at yandex.ru>
AuthorDate: Tue Apr 17 15:05:22 2018 +0300

    Initial commit of P-256 point multiplier suitable for ECDH.
---
 README.md                             |  98 ++++
 bench/tb_point_multiplier_256.v       | 341 ++++++++++++++
 rtl/curve/point_dbl_add_256.v         | 864 ++++++++++++++++++++++++++++++++++
 rtl/curve/point_mul_256.v             | 849 +++++++++++++++++++++++++++++++++
 rtl/ecdhp256.v                        | 192 ++++++++
 rtl/ecdhp256_wrapper.v                | 177 +++++++
 stm32_driver/ecdhp256_driver_sample.c | 261 ++++++++++
 7 files changed, 2782 insertions(+)

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..755f822
--- /dev/null
+++ b/README.md
@@ -0,0 +1,98 @@
+# ecdhp256
+
+## Core Description
+
+This core implements a scalar point multiplier for the NIST P-256 curve. It can be used to generate public keys, as part of the ECDSA signing operation, and for ECDH key exchange.
+
+## API Specification
+
+The core interface is similar to that of other Cryptech cores. The FMC memory map is as follows:
+
+`0x0000 | NAME0`  
+`0x0004 | NAME1`  
+`0x0008 | VERSION`  
+
+`0x0020 | CONTROL`  
+`0x0024 | STATUS`  
+
+`0x0100 | K0`  
+`0x0104 | K1`  
+`...`  
+`0x011C | K7`  
+
+`0x0120 | XIN0`  
+`0x0124 | XIN1`  
+`...`  
+`0x013C | XIN7`
+  
+`0x0140 | YIN0`  
+`0x0144 | YIN1`  
+`...`  
+`0x015C | YIN7`  
+
+`0x0160 | XOUT0`  
+`0x0164 | XOUT1`  
+`...`  
+`0x017C | XOUT7`  
+
+`0x0180 | YOUT0`  
+`0x0184 | YOUT1`  
+`...`  
+`0x019C | YOUT7`  
+
+The core has the following registers:
+
+ * **NAME0**, **NAME1**
+Read-only core name ("ecdhp256").
+
+ * **VERSION**
+Read-only core version, currently "0.10".
+
+ * **CONTROL**
+Control register bits:
+[31:2] Don't care, always read as 0
+[1] "next" control bit
+[0] Don't care, always read as 0
+The core starts a multiplication when the "next" control bit changes from 0 to 1. Because the start is edge-triggered, the core performs only one multiplication while the bit stays set, and then stops. To start another operation, the bit must first be cleared and then set to 1 again.
+
+ * **STATUS**
+Read-only status register bits:
+[31:2] Don't care, always read as 0
+[1] "valid" status bit
+[0] "ready" status bit (always read as 1)
+The "valid" status bit is cleared as soon as the core starts an operation, and is set once the multiplication is complete. Note that, unlike some other Cryptech cores, this core doesn't need any special initialization, so the "ready" status bit is simply hardwired to always read as 1. This keeps the general core interface consistent.
+
+ * **K0**-**K7**
+Buffer for the 256-bit multiplication factor (multiplier) K. The core will compute R(XOUT, YOUT) = K * P(XIN, YIN). K0 is the least significant 32-bit word of K, i.e. bits [31:0], while K7 is the most significant 32-bit word of K, i.e. bits [255:224].
+
+ * **XIN0**-**XIN7**, **YIN0**-**YIN7**
+Writeable buffers for the 256-bit coordinates X and Y of the input multiplicand P(XIN, YIN). Values should be in affine coordinates. XIN0 and YIN0 contain the least significant 32-bit words, i.e. bits [31:0], while XIN7 and YIN7 contain the most significant 32-bit words, i.e. bits [255:224]. Fill the buffers with coordinates of the base point during public key generation and during multiplication by the per-message (random) number. Fill the buffers with coordinates of Bob's public key to [...]
+
+ * **XOUT0**-**XOUT7**, **YOUT0**-**YOUT7**
+Read-only buffers for the 256-bit coordinates X and Y of the product R(XOUT, YOUT). Values are returned in affine coordinates. XOUT0 and YOUT0 contain the least significant 32-bit words, i.e. bits [31:0], while XOUT7 and YOUT7 contain the most significant 32-bit words, i.e. bits [255:224].
+
+## Implementation Details
+
+The top-level core module contains block memory buffers for the input and output operands and the point multiplier, which reads from the input buffers and writes to the output buffers.
+
+The point multiplier itself consists of the following:
+ * Buffers for storage of temporary values
+ * Configurable "worker" unit
+ * Microprograms for the worker unit
+ * Multi-word mover unit
+ * Modular inversion unit
+
+The "worker" unit can execute five basic operations:
+ * comparison
+ * copying
+ * modular addition
+ * modular subtraction
+ * modular multiplication
+
+There are two primary microprograms that the worker runs: curve point doubling and addition of a curve point to the base point. These microprograms use Jacobian projective coordinates, so one more microprogram converts the product into affine coordinates with the help of the modular inversion unit.
+
+Note that the core is supplemented by a reference model written in C, which has extensive comments describing tricky corners of the underlying math.
+
+## Vendor-specific Primitives
+
+The Cryptech Alpha platform is based on the Xilinx Artix-7 200T FPGA, so this core takes advantage of Xilinx-specific DSP slices to carry out math-intensive operations. All vendor-specific math primitives are placed under /rtl/lowlevel/artix7. The core also offers generic replacements under /rtl/lowlevel/generic, which can be used for simulation with third-party tools that are not aware of Xilinx-specific primitives. Selection of vendor/generic primitives is done in ecdsa_lowlevel_settings.v, when port [...]
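
The register description in the README above can be exercised from C roughly as follows. This is a minimal sketch and not the `stm32_driver/ecdhp256_driver_sample.c` shipped in this commit. The names `fake_core`, `bus_read`, `bus_write`, `put_operand`, `get_operand`, `point_multiply` and `ecdhp256_fake_roundtrip` are hypothetical. The register window is simulated by an array whose "core" only echoes the input point, which is correct for K = 1 and nothing else. Offsets, bit positions, word order and the clear-then-set handling of the "next" bit follow the README text.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Offsets and bit positions copied from the README memory map. */
enum {
    REG_CONTROL = 0x0020, REG_STATUS = 0x0024,
    REG_K0 = 0x0100, REG_XIN0 = 0x0120, REG_YIN0 = 0x0140,
    REG_XOUT0 = 0x0160, REG_YOUT0 = 0x0180, REG_END = 0x01A0,
    CONTROL_NEXT = 1 << 1, STATUS_VALID = 1 << 1, STATUS_READY = 1 << 0
};

/* HYPOTHETICAL stand-in for the FMC window; not part of the commit. */
typedef struct { uint32_t regs[REG_END / 4]; uint32_t prev_control; } fake_core;

static uint32_t bus_read(fake_core *c, uint32_t off)
{
    assert(off < REG_END && off % 4 == 0);
    return c->regs[off / 4];
}

static void bus_write(fake_core *c, uint32_t off, uint32_t val)
{
    assert(off < REG_END && off % 4 == 0);
    if (off == REG_STATUS || off >= REG_XOUT0)
        return;                                /* read-only registers */
    if (off == REG_CONTROL) {
        val &= CONTROL_NEXT;                   /* all other bits read as 0 */
        if (val && !c->prev_control) {
            /* Rising edge of "next". The fake does NO curve math: it only
             * echoes the input point, which equals K * P only for K = 1. */
            memcpy(&c->regs[REG_XOUT0 / 4], &c->regs[REG_XIN0 / 4], 32);
            memcpy(&c->regs[REG_YOUT0 / 4], &c->regs[REG_YIN0 / 4], 32);
            c->regs[REG_STATUS / 4] = STATUS_READY | STATUS_VALID;
        }
        c->prev_control = val;
    }
    c->regs[off / 4] = val;
}

/* Store a 256-bit big-endian byte string; word 0 receives bits [31:0]. */
static void put_operand(fake_core *c, uint32_t base, const uint8_t be[32])
{
    for (uint32_t w = 0; w < 8; w++) {
        const uint8_t *p = &be[28 - 4 * w];
        bus_write(c, base + 4 * w, ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                                   ((uint32_t)p[2] << 8) | p[3]);
    }
}

/* Inverse of put_operand: rebuild the big-endian byte string from 8 words. */
static void get_operand(fake_core *c, uint32_t base, uint8_t be[32])
{
    for (uint32_t w = 0; w < 8; w++) {
        uint32_t v = bus_read(c, base + 4 * w);
        uint8_t *p = &be[28 - 4 * w];
        p[0] = (uint8_t)(v >> 24); p[1] = (uint8_t)(v >> 16);
        p[2] = (uint8_t)(v >> 8);  p[3] = (uint8_t)v;
    }
}

/* Returns 0 on success, -1 if "valid" never appears within max_polls reads. */
static int point_multiply(fake_core *c, const uint8_t k[32], const uint8_t x[32],
                          const uint8_t y[32], uint8_t xo[32], uint8_t yo[32],
                          unsigned max_polls)
{
    put_operand(c, REG_K0, k);
    put_operand(c, REG_XIN0, x);
    put_operand(c, REG_YIN0, y);
    bus_write(c, REG_CONTROL, 0);              /* clear "next" first ... */
    bus_write(c, REG_CONTROL, CONTROL_NEXT);   /* ... then create the 0 -> 1 edge */
    while (max_polls--) {
        if (bus_read(c, REG_STATUS) & STATUS_VALID) {
            get_operand(c, REG_XOUT0, xo);
            get_operand(c, REG_YOUT0, yo);
            return 0;
        }
    }
    return -1;
}

/* Usage sketch with K = 1, so the echoing fake agrees with 1 * P = P.
 * Returns 1 when word order and handshake behave as described. */
int ecdhp256_fake_roundtrip(void)
{
    fake_core c;
    uint8_t k[32] = {0}, x[32], y[32], xo[32], yo[32];
    memset(&c, 0, sizeof c);
    c.regs[REG_STATUS / 4] = STATUS_READY;     /* "ready" is hardwired to 1 */
    k[31] = 1;
    for (int i = 0; i < 32; i++) { x[i] = (uint8_t)i; y[i] = (uint8_t)(0xA0 + i); }
    if (point_multiply(&c, k, x, y, xo, yo, 16) != 0) return 0;
    return bus_read(&c, REG_K0) == 1u && bus_read(&c, REG_XIN0) == 0x1C1D1E1Fu &&
           bus_read(&c, REG_XIN0 + 28) == 0x00010203u &&
           memcmp(xo, x, 32) == 0 && memcmp(yo, y, 32) == 0;
}
```

On real hardware `bus_read` and `bus_write` would presumably be replaced by the platform's FMC accessors, and the poll limit would have to be chosen to cover the duration of a full multiplication; both points are assumptions rather than statements about the shipped driver.
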
diff --git a/bench/tb_point_multiplier_256.v b/bench/tb_point_multiplier_256.v
new file mode 100644
index 0000000..3647a6a
--- /dev/null
+++ b/bench/tb_point_multiplier_256.v
@@ -0,0 +1,341 @@
+//------------------------------------------------------------------------------
+//
+// tb_point_multiplier_256.v
+// -----------------------------------------------------------------------------
+// Testbench for P-256 point scalar multiplier.
+//
+// Authors: Pavel Shatov
+//
+// Copyright (c) 2018, NORDUnet A/S
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are met:
+//
+// - Redistributions of source code must retain the above copyright notice,
+//   this list of conditions and the following disclaimer.
+//
+// - Redistributions in binary form must reproduce the above copyright notice,
+//   this list of conditions and the following disclaimer in the documentation
+//   and/or other materials provided with the distribution.
+//
+// - Neither the name of the NORDUnet nor the names of its contributors may be
+//   used to endorse or promote products derived from this software without
+//   specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+// POSSIBILITY OF SUCH DAMAGE.
+//
+//------------------------------------------------------------------------------
+
+//------------------------------------------------------------------------------
+`timescale 1ns / 1ps
+//------------------------------------------------------------------------------
+
+module tb_point_multiplier_256;
+
+
+		//
+		// Test Vectors
+		//
+	`include "../../../user/shatov/ecdh_fpga_model/test_vectors/ecdh_test_vectors.v"
+			
+		
+				
+		//
+		// Core Parameters
+		//
+	localparam	WORD_COUNTER_WIDTH	=  3;
+	localparam	OPERAND_NUM_WORDS		=  8;
+	
+
+		//
+		// Clock (100 MHz)
+		//
+	reg clk = 1'b0;
+	always #5 clk = ~clk;
+
+	
+		//
+		// Inputs, Outputs
+		//
+	reg	rst_n;
+	reg	ena;
+	wire	rdy;
+
+	
+		//
+		// Buffers (K, PX, PY, QX, QY)
+		//
+	wire	[WORD_COUNTER_WIDTH-1:0]	core_k_addr;
+	wire	[WORD_COUNTER_WIDTH-1:0]	core_px_addr;
+	wire	[WORD_COUNTER_WIDTH-1:0]	core_py_addr;
+	wire	[WORD_COUNTER_WIDTH-1:0]	core_qx_addr;
+	wire	[WORD_COUNTER_WIDTH-1:0]	core_qy_addr;
+	
+	wire										core_qx_wren;
+	wire										core_qy_wren;
+	
+	wire	[                32-1:0]	core_k_data;
+	wire	[                32-1:0]	core_px_data;
+	wire	[                32-1:0]	core_py_data;
+	wire	[                32-1:0]	core_qx_data;
+	wire	[                32-1:0]	core_qy_data;
+	
+	reg	[WORD_COUNTER_WIDTH-1:0]	tb_k_addr;
+	reg	[WORD_COUNTER_WIDTH-1:0]	tb_pxy_addr;
+	reg	[WORD_COUNTER_WIDTH-1:0]	tb_qxy_addr;
+	
+	reg										tb_k_wren;
+	reg										tb_pxy_wren;
+	
+	reg	[                  31:0]	tb_k_data;
+	reg	[                  31:0]	tb_px_data;
+	reg	[                  31:0]	tb_py_data;
+	wire	[                  31:0]	tb_qx_data;
+	wire	[                  31:0]	tb_qy_data;
+	
+	bram_1rw_1ro_readfirst # (.MEM_WIDTH(32), .MEM_ADDR_BITS(WORD_COUNTER_WIDTH))
+	bram_k
+	(	.clk(clk),
+		.a_addr(tb_k_addr), .a_wr(tb_k_wren), .a_in(tb_k_data), .a_out(),
+		.b_addr(core_k_addr), .b_out(core_k_data)
+	);
+	
+	bram_1rw_1ro_readfirst # (.MEM_WIDTH(32), .MEM_ADDR_BITS(WORD_COUNTER_WIDTH))
+	bram_px
+	(	.clk(clk),
+		.a_addr(tb_pxy_addr), .a_wr(tb_pxy_wren), .a_in(tb_px_data), .a_out(),
+		.b_addr(core_px_addr), .b_out(core_px_data)
+	);
+	
+	bram_1rw_1ro_readfirst # (.MEM_WIDTH(32), .MEM_ADDR_BITS(WORD_COUNTER_WIDTH))
+	bram_py
+	(	.clk(clk),
+		.a_addr(tb_pxy_addr), .a_wr(tb_pxy_wren), .a_in(tb_py_data), .a_out(),
+		.b_addr(core_py_addr), .b_out(core_py_data)
+	);
+
+	bram_1rw_1ro_readfirst # (.MEM_WIDTH(32), .MEM_ADDR_BITS(WORD_COUNTER_WIDTH))
+	bram_qx
+	(	.clk(clk),
+		.a_addr(core_qx_addr), .a_wr(core_qx_wren), .a_in(core_qx_data), .a_out(),
+		.b_addr(tb_qxy_addr), .b_out(tb_qx_data)
+	);
+	
+	bram_1rw_1ro_readfirst # (.MEM_WIDTH(32), .MEM_ADDR_BITS(WORD_COUNTER_WIDTH))
+	bram_qy
+	(	.clk(clk),
+		.a_addr(core_qy_addr), .a_wr(core_qy_wren), .a_in(core_qy_data), .a_out(),
+		.b_addr(tb_qxy_addr), .b_out(tb_qy_data)
+	);
+	
+	
+		//
+		// UUT
+		//
+	point_mul_256 uut
+	(
+		.clk			(clk),
+		.rst_n		(rst_n),
+		
+		.ena			(ena),
+		.rdy			(rdy),
+		
+		.k_addr		(core_k_addr),
+		.qx_addr		(core_px_addr),
+		.qy_addr		(core_py_addr),
+		.rx_addr		(core_qx_addr),
+		.ry_addr		(core_qy_addr),
+		
+		.rx_wren		(core_qx_wren),
+		.ry_wren		(core_qy_wren),
+		
+		.k_din		(core_k_data),
+		.qx_din		(core_px_data),
+		.qy_din		(core_py_data),
+		.rx_dout		(core_qx_data),
+		.ry_dout		(core_qy_data)
+	);
+		
+		
+		//
+		// Testbench Routine
+		//
+	reg ok = 1;
+	initial begin
+		
+			/* initialize control inputs */
+		rst_n		= 0;
+		ena		= 0;
+		
+			/* wait for some time */
+		#200;
+		
+			/* de-assert reset */
+		rst_n		= 1;
+		
+			/* wait for some time */
+		#100;		
+		
+			/* run tests */
+
+		$display(" 1. H = 2 * G...");
+		test_point_multiplier(256'd2, P_256_G_X, P_256_G_Y, P_256_H_X, P_256_H_Y);
+
+		$display(" 2. H = (n + 2) * G...");
+		test_point_multiplier(P_256_N + 256'd2, P_256_G_X, P_256_G_Y, P_256_H_X, P_256_H_Y);
+
+		$display(" 3. QA = dA * G...");
+		test_point_multiplier(P_256_DA, P_256_G_X, P_256_G_Y, P_256_QA_X, P_256_QA_Y);
+
+		$display(" 4. QB = dB * G...");
+		test_point_multiplier(P_256_DB, P_256_G_X, P_256_G_Y, P_256_QB_X, P_256_QB_Y);
+	
+		$display(" 5. S = dB * QA...");
+		test_point_multiplier(P_256_DB, P_256_QA_X, P_256_QA_Y, P_256_S_X, P_256_S_Y);
+
+		$display(" 6. S = dA * QB...");
+		test_point_multiplier(P_256_DA, P_256_QB_X, P_256_QB_Y, P_256_S_X, P_256_S_Y);
+		
+		$display(" 7. QA2 = 2 * QA...");
+		test_point_multiplier(256'd2, P_256_QA_X, P_256_QA_Y, P_256_QA2_X, P_256_QA2_Y);
+
+		$display(" 8. QA2 = (n + 2) * QA...");
+		test_point_multiplier(P_256_N + 256'd2, P_256_QA_X, P_256_QA_Y, P_256_QA2_X, P_256_QA2_Y);
+
+		$display(" 9. QB2 = 2 * QB...");
+		test_point_multiplier(256'd2, P_256_QB_X, P_256_QB_Y, P_256_QB2_X, P_256_QB2_Y);
+
+		$display("10. QB2 = (n + 2) * QB...");
+		test_point_multiplier(P_256_N + 256'd2, P_256_QB_X, P_256_QB_Y, P_256_QB2_X, P_256_QB2_Y);
+
+		
+			/* print result */
+		if (ok)	$display("tb_point_multiplier_256: SUCCESS");
+		else		$display("tb_point_multiplier_256: FAILURE");
+		//
+		//$finish;
+		//
+	end
+	
+	
+		//
+		// Test Task
+		//	
+	reg		q_ok;
+	
+	integer	w;
+
+	task test_point_multiplier;
+	
+		input	[255:0]	k;
+		input	[255:0]	px;
+		input	[255:0]	py;
+		input [255:0]	qx;
+		input	[255:0]	qy;
+		
+		reg	[255:0]	k_shreg;
+		reg	[255:0]	px_shreg;
+		reg	[255:0]	py_shreg;
+		reg	[255:0]	qx_shreg;
+		reg	[255:0]	qy_shreg;
+		
+		begin
+		
+				/* start filling memories */
+			tb_k_wren = 1;
+			tb_pxy_wren = 1;
+			
+				/* initialize shift registers */
+			k_shreg = k;
+			px_shreg = px;
+			py_shreg = py;
+			
+				/* write all the words */
+			for (w=0; w<OPERAND_NUM_WORDS; w=w+1) begin
+				
+					/* set addresses */
+				tb_k_addr = w[WORD_COUNTER_WIDTH-1:0];
+				tb_pxy_addr = w[WORD_COUNTER_WIDTH-1:0];
+				
+					/* set data words */
+				tb_k_data	= k_shreg[31:0];
+				tb_px_data	= px_shreg[31:0];
+				tb_py_data	= py_shreg[31:0];
+				
+					/* shift inputs */
+				k_shreg = {{32{1'bX}}, k_shreg[255:32]};
+				px_shreg = {{32{1'bX}}, px_shreg[255:32]};
+				py_shreg = {{32{1'bX}}, py_shreg[255:32]};
+				
+					/* wait for 1 clock tick */
+				#10;
+				
+			end
+			
+				/* wipe addresses */
+			tb_k_addr = {WORD_COUNTER_WIDTH{1'bX}};
+			tb_pxy_addr = {WORD_COUNTER_WIDTH{1'bX}};
+			
+				/* wipe data words */
+			tb_k_data = {32{1'bX}};
+			tb_px_data = {32{1'bX}};
+			tb_py_data = {32{1'bX}};
+			
+				/* stop filling memories */
+			tb_k_wren = 0;
+			tb_pxy_wren = 0;
+			
+				/* start operation */
+			ena = 1;
+			
+				/* clear flag */
+			#10 ena = 0;
+			
+				/* wait for operation to complete */
+			while (!rdy) #10;
+			
+				/* read result */
+			for (w=0; w<OPERAND_NUM_WORDS; w=w+1) begin
+				
+					/* set address */
+				tb_qxy_addr = w[WORD_COUNTER_WIDTH-1:0];
+				
+					/* wait for 1 clock tick */
+				#10;
+				
+					/* store data word */
+				qx_shreg = {tb_qx_data, qx_shreg[255:32]};
+				qy_shreg = {tb_qy_data, qy_shreg[255:32]};
+
+			end
+			
+				/* compare */
+			q_ok =	(qx_shreg == qx) &&
+						(qy_shreg == qy);
+
+				/* display results */
+			$display("test_point_multiplier(): %s", q_ok ? "OK" : "ERROR");
+			
+				/* update global flag */
+			ok = ok && q_ok;
+		
+		end
+		
+	endtask
+	
+endmodule
+
+
+//------------------------------------------------------------------------------
+// End-of-File
+//------------------------------------------------------------------------------
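
Tests 2, 8 and 10 in the testbench above pass the multiplier n + 2 and expect the same point as for the multiplier 2. Assuming `P_256_N` is the group order n (an inference from the test labels, since the test vector file is not part of this message), this holds because n * P is the point at infinity for every point P on P-256, so (n + 2) * P = 2 * P. The following sketch illustrates the same identity on a toy curve, y^2 = x^3 + 2x + 2 over GF(17) with G = (5, 1) of order 19. All names (`toy_point`, `toy_add`, `toy_mul`, `toy_wraps_around`) are hypothetical. The code uses plain affine formulas with a left-to-right double-and-add loop, not the Jacobian microprograms of the core.

```c
#include <assert.h>

/* Toy curve y^2 = x^3 + 2x + 2 over GF(17); G = (5, 1) has order 19. */
enum { TOY_P = 17, TOY_A = 2, TOY_N = 19 };

typedef struct { int x, y, inf; } toy_point;   /* inf != 0: point at infinity */

static const toy_point TOY_G = {5, 1, 0};

static int mod_p(int v) { v %= TOY_P; return v < 0 ? v + TOY_P : v; }

/* Modular inverse via Fermat's little theorem: v^(p-2) mod p. */
static int inv_p(int v)
{
    int r = 1;
    v = mod_p(v);
    for (int i = 0; i < TOY_P - 2; i++) r = mod_p(r * v);
    return r;
}

/* Affine group law, including the corner cases O + Q, P + (-P) and P + P. */
static toy_point toy_add(toy_point a, toy_point b)
{
    toy_point r = {0, 0, 1};
    int s;
    if (a.inf) return b;
    if (b.inf) return a;
    if (a.x == b.x) {
        if (mod_p(a.y + b.y) == 0) return r;                  /* P + (-P) = O */
        s = mod_p((3 * a.x * a.x + TOY_A) * inv_p(2 * a.y));  /* tangent slope */
    } else {
        s = mod_p((b.y - a.y) * inv_p(b.x - a.x));            /* chord slope */
    }
    r.x = mod_p(s * s - a.x - b.x);
    r.y = mod_p(s * (a.x - r.x) - a.y);
    r.inf = 0;
    return r;
}

/* Left-to-right double-and-add scalar multiplication. */
static toy_point toy_mul(unsigned k, toy_point p)
{
    toy_point r = {0, 0, 1};
    for (int bit = 31; bit >= 0; bit--) {
        r = toy_add(r, r);
        if ((k >> bit) & 1u) r = toy_add(r, p);
    }
    return r;
}

/* Returns 1 if (n + k) * G equals k * G on the toy curve. */
int toy_wraps_around(unsigned k)
{
    toy_point a = toy_mul(TOY_N + k, TOY_G), b = toy_mul(k, TOY_G);
    if (a.inf || b.inf) return a.inf && b.inf;
    return a.x == b.x && a.y == b.y;
}
```

Calling `toy_wraps_around(2)` mirrors the "H = (n + 2) * G" test, and `toy_mul(TOY_N, TOY_G)` returns the point at infinity. A multiplier that is not reduced modulo n also forces the adder through the P + P and P + (-P) cases near the end of the loop, which may be part of why such vectors make useful corner-case tests.
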
diff --git a/rtl/curve/point_dbl_add_256.v b/rtl/curve/point_dbl_add_256.v
new file mode 100644
index 0000000..56c88e7
--- /dev/null
+++ b/rtl/curve/point_dbl_add_256.v
@@ -0,0 +1,864 @@
+//------------------------------------------------------------------------------
+//
+// point_dbl_add_256.v
+// -----------------------------------------------------------------------------
+// Elliptic curve point adder and doubler.
+//
+// Authors: Pavel Shatov
+//
+// Copyright (c) 2018, NORDUnet A/S
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are met:
+//
+// - Redistributions of source code must retain the above copyright notice,
+//   this list of conditions and the following disclaimer.
+//
+// - Redistributions in binary form must reproduce the above copyright notice,
+//   this list of conditions and the following disclaimer in the documentation
+//   and/or other materials provided with the distribution.
+//
+// - Neither the name of the NORDUnet nor the names of its contributors may be
+//   used to endorse or promote products derived from this software without
+//   specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+// POSSIBILITY OF SUCH DAMAGE.
+//
+//------------------------------------------------------------------------------
+
+module point_dbl_add_256
+  (
+   clk, rst_n,
+   ena, rdy,
+   uop_addr, uop,
+	gx_addr, gy_addr, hx_addr, hy_addr,
+   px_addr, py_addr, pz_addr, rx_addr, ry_addr, rz_addr, q_addr, v_addr,
+   rx_wren, ry_wren, rz_wren,
+	gx_din, gy_din, hx_din, hy_din,
+   px_din, py_din, pz_din,
+   rx_din, ry_din, rz_din,
+   rx_dout, ry_dout, rz_dout, q_din, v_din
+   );
+
+
+   //
+   // Microcode
+   //
+`include "../../../../math/ecdsalib/rtl/curve/uop_ecdsa.v"
+
+
+   //
+   // Constants
+   //
+   localparam	WORD_COUNTER_WIDTH	= 3;	// 0 .. 7
+   localparam	OPERAND_NUM_WORDS	= 8;	// 8 * 32 = 256
+
+
+   //
+   // Ports
+   //
+   input	wire		clk;		// system clock
+   input	wire		rst_n;		// active-low async reset
+
+   input	wire		ena;		// enable input
+   output	wire 		rdy;		// ready output
+
+   output	reg [ 6-1: 0] 	uop_addr;
+   input	wire [20-1: 0]	uop;
+
+   output	reg [WORD_COUNTER_WIDTH-1:0] gx_addr;
+   output	reg [WORD_COUNTER_WIDTH-1:0] gy_addr;
+   output	reg [WORD_COUNTER_WIDTH-1:0] hx_addr;
+   output	reg [WORD_COUNTER_WIDTH-1:0] hy_addr;
+   output	reg [WORD_COUNTER_WIDTH-1:0] px_addr;
+   output	reg [WORD_COUNTER_WIDTH-1:0] py_addr;
+   output	reg [WORD_COUNTER_WIDTH-1:0] pz_addr;
+   output	reg [WORD_COUNTER_WIDTH-1:0] rx_addr;
+   output	reg [WORD_COUNTER_WIDTH-1:0] ry_addr;
+   output	reg [WORD_COUNTER_WIDTH-1:0] rz_addr;
+   output	reg [WORD_COUNTER_WIDTH-1:0] v_addr;
+   output	wire [WORD_COUNTER_WIDTH-1:0] q_addr;
+
+   output	wire 			      rx_wren;
+   output	wire 			      ry_wren;
+   output	wire 			      rz_wren;
+
+   input		wire [                32-1:0] gx_din;
+   input		wire [                32-1:0] gy_din;
+   input		wire [                32-1:0] hx_din;
+   input		wire [                32-1:0] hy_din;
+   input		wire [                32-1:0] px_din;
+   input		wire [                32-1:0] py_din;
+   input		wire [                32-1:0] pz_din;
+   input		wire [                32-1:0] rx_din;
+   input		wire [                32-1:0] ry_din;
+   input		wire [                32-1:0] rz_din;
+   output	wire [                32-1:0] 	      rx_dout;
+   output	wire [                32-1:0] 	      ry_dout;
+   output	wire [                32-1:0] 	      rz_dout;
+   input		wire [                32-1:0] q_din;
+   input		wire [                32-1:0] v_din;
+
+
+   //
+   // Microcode
+   //
+   wire [ 4: 0] 				      uop_opcode	= uop[19:15];
+   wire [ 4: 0] 				      uop_src_a	= uop[14:10];
+   wire [ 4: 0] 				      uop_src_b	= uop[ 9: 5];
+   wire [ 2: 0] 				      uop_dst		= uop[ 4: 2];
+   wire [ 1: 0] 				      uop_exec		= uop[ 1: 0];
+
+
+   //
+   // Multi-Word Comparator
+   //
+   wire 					      mw_cmp_ena;
+   wire 					      mw_cmp_rdy;
+
+   wire 					      mw_cmp_out_l;
+   wire 					      mw_cmp_out_e;
+   wire 					      mw_cmp_out_g;
+
+   wire [WORD_COUNTER_WIDTH-1:0] 		      mw_cmp_addr_xy;
+
+   wire [                32-1:0] 		      mw_cmp_din_x;
+   wire [                32-1:0] 		      mw_cmp_din_y;
+
+   // flags
+   reg 						      flag_pz_is_zero;
+   reg 						      flag_t1_is_zero;
+   reg 						      flag_t2_is_zero;
+
+   mw_comparator #
+     (
+      .WORD_COUNTER_WIDTH	(WORD_COUNTER_WIDTH),
+      .OPERAND_NUM_WORDS	(OPERAND_NUM_WORDS)
+      )
+   mw_comparator_inst
+     (
+      .clk			(clk),
+      .rst_n		(rst_n),
+
+      .ena			(mw_cmp_ena),
+      .rdy			(mw_cmp_rdy),
+
+      .xy_addr		(mw_cmp_addr_xy),
+      .x_din		(mw_cmp_din_x),
+      .y_din		(mw_cmp_din_y),
+
+      .cmp_l		(mw_cmp_out_l),
+      .cmp_e		(mw_cmp_out_e),
+      .cmp_g		(mw_cmp_out_g)
+      );
+
+
+   //
+   // Modular Adder
+   //
+   wire 					      mod_add_ena;
+   wire 					      mod_add_rdy;
+
+   wire [WORD_COUNTER_WIDTH-1:0] 		      mod_add_addr_ab;
+   wire [WORD_COUNTER_WIDTH-1:0] 		      mod_add_addr_n;
+   wire [WORD_COUNTER_WIDTH-1:0] 		      mod_add_addr_s;
+   wire 					      mod_add_wren_s;
+
+   wire [                32-1:0] 		      mod_add_din_a;
+   wire [                32-1:0] 		      mod_add_din_b;
+   wire [                32-1:0] 		      mod_add_din_n;
+   wire [                32-1:0] 		      mod_add_dout_s;
+
+   assign mod_add_din_n = q_din;
+
+   modular_adder #
+     (
+      .WORD_COUNTER_WIDTH	(WORD_COUNTER_WIDTH),
+      .OPERAND_NUM_WORDS	(OPERAND_NUM_WORDS)
+      )
+   modular_adder_inst
+     (
+      .clk			(clk),
+      .rst_n		(rst_n),
+
+      .ena			(mod_add_ena),
+      .rdy			(mod_add_rdy),
+
+      .ab_addr		(mod_add_addr_ab),
+      .n_addr		(mod_add_addr_n),
+      .s_addr		(mod_add_addr_s),
+      .s_wren		(mod_add_wren_s),
+
+      .a_din		(mod_add_din_a),
+      .b_din		(mod_add_din_b),
+      .n_din		(mod_add_din_n),
+      .s_dout		(mod_add_dout_s)
+      );
+
+
+   //
+   // Modular Subtractor
+   //
+   wire 					      mod_sub_ena;
+   wire 					      mod_sub_rdy;
+
+   wire [WORD_COUNTER_WIDTH-1:0] 		      mod_sub_addr_ab;
+   wire [WORD_COUNTER_WIDTH-1:0] 		      mod_sub_addr_n;
+   wire [WORD_COUNTER_WIDTH-1:0] 		      mod_sub_addr_d;
+   wire 					      mod_sub_wren_d;
+
+   wire [                32-1:0] 		      mod_sub_din_a;
+   wire [                32-1:0] 		      mod_sub_din_b;
+   wire [                32-1:0] 		      mod_sub_din_n;
+   wire [                32-1:0] 		      mod_sub_dout_d;
+
+   assign mod_sub_din_n = q_din;
+
+   modular_subtractor #
+     (
+      .WORD_COUNTER_WIDTH	(WORD_COUNTER_WIDTH),
+      .OPERAND_NUM_WORDS	(OPERAND_NUM_WORDS)
+      )
+   modular_subtractor_inst
+     (
+      .clk			(clk),
+      .rst_n		(rst_n),
+
+      .ena			(mod_sub_ena),
+      .rdy			(mod_sub_rdy),
+
+      .ab_addr		(mod_sub_addr_ab),
+      .n_addr		(mod_sub_addr_n),
+      .d_addr		(mod_sub_addr_d),
+      .d_wren		(mod_sub_wren_d),
+
+      .a_din		(mod_sub_din_a),
+      .b_din		(mod_sub_din_b),
+      .n_din		(mod_sub_din_n),
+      .d_dout		(mod_sub_dout_d)
+      );
+
+
+   //
+   // Modular Multiplier
+   //
+   wire 					      mod_mul_ena;
+   wire 					      mod_mul_rdy;
+
+   wire [WORD_COUNTER_WIDTH-1:0] 		      mod_mul_addr_a;
+   wire [WORD_COUNTER_WIDTH-1:0] 		      mod_mul_addr_b;
+   wire [WORD_COUNTER_WIDTH-1:0] 		      mod_mul_addr_n;
+   wire [WORD_COUNTER_WIDTH-1:0] 		      mod_mul_addr_p;
+   wire 					      mod_mul_wren_p;
+
+   wire [                32-1:0] 		      mod_mul_din_a;
+   wire [                32-1:0] 		      mod_mul_din_b;
+   wire [                32-1:0] 		      mod_mul_din_n;
+   wire [                32-1:0] 		      mod_mul_dout_p;
+
+   assign mod_mul_din_n = q_din;
+
+   modular_multiplier_256 modular_multiplier_inst
+     (
+      .clk		(clk),
+      .rst_n	(rst_n),
+
+      .ena		(mod_mul_ena),
+      .rdy		(mod_mul_rdy),
+
+      .a_addr	(mod_mul_addr_a),
+      .b_addr	(mod_mul_addr_b),
+      .n_addr	(mod_mul_addr_n),
+      .p_addr	(mod_mul_addr_p),
+      .p_wren	(mod_mul_wren_p),
+
+      .a_din	(mod_mul_din_a),
+      .b_din	(mod_mul_din_b),
+      .n_din	(mod_mul_din_n),
+      .p_dout	(mod_mul_dout_p)
+      );
+
+
+   //
+   // Multi-Word Data Mover
+   //
+   wire 					      mw_mov_ena;
+   wire 					      mw_mov_rdy;
+
+   wire [WORD_COUNTER_WIDTH-1:0] 		      mw_mov_addr_x;
+   wire [WORD_COUNTER_WIDTH-1:0] 		      mw_mov_addr_y;
+   wire 					      mw_mov_wren_y;
+
+   wire [                32-1:0] 		      mw_mov_din_x;
+   wire [                32-1:0] 		      mw_mov_dout_y;
+
+   mw_mover #
+     (
+      .WORD_COUNTER_WIDTH	(WORD_COUNTER_WIDTH),
+      .OPERAND_NUM_WORDS	(OPERAND_NUM_WORDS)
+
+      )
+   mw_mover_inst
+     (
+      .clk		(clk),
+      .rst_n	(rst_n),
+
+      .ena		(mw_mov_ena),
+      .rdy		(mw_mov_rdy),
+
+      .x_addr	(mw_mov_addr_x),
+      .y_addr	(mw_mov_addr_y),
+      .y_wren	(mw_mov_wren_y),
+
+      .x_din	(mw_mov_din_x),
+      .y_dout	(mw_mov_dout_y)
+      );
+
+
+   //
+   // ROMs
+   //
+   reg [WORD_COUNTER_WIDTH-1:0] 		      brom_one_addr;
+   //reg	[WORD_COUNTER_WIDTH-1:0]	brom_zero_addr;
+   reg [WORD_COUNTER_WIDTH-1:0] 		      brom_delta_addr;
+
+   wire [                32-1:0] 		      brom_one_dout;
+   wire [                32-1:0] 		      brom_zero_dout;
+   wire [                32-1:0] 		      brom_delta_dout;
+
+   (* ROM_STYLE="BLOCK" *) brom_p256_one brom_one_inst
+     (.clk(clk), .b_addr(brom_one_addr), .b_out(brom_one_dout));
+
+   brom_p256_zero brom_zero_inst
+     (.b_out(brom_zero_dout));
+
+   (* ROM_STYLE="BLOCK" *) brom_p256_delta brom_delta_inst
+     (.clk(clk), .b_addr(brom_delta_addr), .b_out(brom_delta_dout));
+
+
+   //
+   // Temporary Variables
+   //
+   reg [WORD_COUNTER_WIDTH-1:0] 		      bram_t1_wr_addr;
+   reg [WORD_COUNTER_WIDTH-1:0] 		      bram_t2_wr_addr;
+   reg [WORD_COUNTER_WIDTH-1:0] 		      bram_t3_wr_addr;
+   reg [WORD_COUNTER_WIDTH-1:0] 		      bram_t4_wr_addr;
+
+   reg [WORD_COUNTER_WIDTH-1:0] 		      bram_t1_rd_addr;
+   reg [WORD_COUNTER_WIDTH-1:0] 		      bram_t2_rd_addr;
+   reg [WORD_COUNTER_WIDTH-1:0] 		      bram_t3_rd_addr;
+   reg [WORD_COUNTER_WIDTH-1:0] 		      bram_t4_rd_addr;
+
+   wire 					      bram_t1_wr_en;
+   wire 					      bram_t2_wr_en;
+   wire 					      bram_t3_wr_en;
+   wire 					      bram_t4_wr_en;
+
+   wire [                32-1:0] 		      bram_t1_wr_data;
+   wire [                32-1:0] 		      bram_t2_wr_data;
+   wire [                32-1:0] 		      bram_t3_wr_data;
+   wire [                32-1:0] 		      bram_t4_wr_data;
+
+   wire [                32-1:0] 		      bram_t1_rd_data;
+   wire [                32-1:0] 		      bram_t2_rd_data;
+   wire [                32-1:0] 		      bram_t3_rd_data;
+   wire [                32-1:0] 		      bram_t4_rd_data;
+
+   bram_1rw_1ro_readfirst #
+     (	.MEM_WIDTH(32), .MEM_ADDR_BITS(WORD_COUNTER_WIDTH)
+	)
+   bram_t1
+     (	.clk		(clk),
+	.a_addr(bram_t1_wr_addr), .a_wr(bram_t1_wr_en), .a_in(bram_t1_wr_data), .a_out(),
+	.b_addr(bram_t1_rd_addr),                                               .b_out(bram_t1_rd_data)
+	);
+
+   bram_1rw_1ro_readfirst #
+     (	.MEM_WIDTH(32), .MEM_ADDR_BITS(WORD_COUNTER_WIDTH)
+	)
+   bram_t2
+     (	.clk		(clk),
+	.a_addr(bram_t2_wr_addr), .a_wr(bram_t2_wr_en), .a_in(bram_t2_wr_data), .a_out(),
+	.b_addr(bram_t2_rd_addr),                                               .b_out(bram_t2_rd_data)
+	);
+
+   bram_1rw_1ro_readfirst #
+     (	.MEM_WIDTH(32), .MEM_ADDR_BITS(WORD_COUNTER_WIDTH)
+	)
+   bram_t3
+     (	.clk		(clk),
+	.a_addr(bram_t3_wr_addr), .a_wr(bram_t3_wr_en), .a_in(bram_t3_wr_data), .a_out(),
+	.b_addr(bram_t3_rd_addr),                                               .b_out(bram_t3_rd_data)
+	);
+
+   bram_1rw_1ro_readfirst #
+     (	.MEM_WIDTH(32), .MEM_ADDR_BITS(WORD_COUNTER_WIDTH)
+	)
+   bram_t4
+     (	.clk		(clk),
+	.a_addr(bram_t4_wr_addr), .a_wr(bram_t4_wr_en), .a_in(bram_t4_wr_data), .a_out(),
+	.b_addr(bram_t4_rd_addr),                                               .b_out(bram_t4_rd_data)
+	);
+
+
+   //
+   // uOP Trigger Logic
+   //
+   reg 						      uop_trig;
+   always @(posedge clk or negedge rst_n)
+     //
+     if (rst_n == 1'b0)	uop_trig <= 1'b0;
+     else						uop_trig <= (fsm_state == FSM_STATE_FETCH) ? 1'b1 : 1'b0;
+
+
+   //
+   // FSM
+   //
+   localparam	[ 1: 0]	FSM_STATE_STALL	= 2'b00;
+   localparam	[ 1: 0]	FSM_STATE_FETCH	= 2'b01;
+   localparam	[ 1: 0]	FSM_STATE_EXECUTE	= 2'b10;
+
+   reg [ 1: 0] 					      fsm_state		= FSM_STATE_STALL;
+   wire [ 1: 0] 				      fsm_state_next	= (uop_opcode == OPCODE_RDY) ? FSM_STATE_STALL : FSM_STATE_FETCH;
+
+
+   //
+   // FSM Transition Logic
+   //
+   reg 						      uop_done;
+
+   always @(posedge clk or negedge rst_n)
+     //
+     if (rst_n == 1'b0)		fsm_state <= FSM_STATE_STALL;
+     else case (fsm_state)
+	    FSM_STATE_STALL:		fsm_state <= ena ? FSM_STATE_FETCH : FSM_STATE_STALL;
+	    FSM_STATE_FETCH:		fsm_state <= FSM_STATE_EXECUTE;
+	    FSM_STATE_EXECUTE:	fsm_state <= (!uop_trig && uop_done) ? fsm_state_next : FSM_STATE_EXECUTE;
+	    default:					fsm_state <= FSM_STATE_STALL;
+	  endcase
+
+
+   //
+   // uOP Address Increment Logic
+   //
+   always @(posedge clk)
+     //
+     if (fsm_state == FSM_STATE_STALL)
+       uop_addr <= 6'd0;
+     else if (fsm_state == FSM_STATE_EXECUTE)
+       if (!uop_trig && uop_done)
+	 uop_addr <= (uop_opcode == OPCODE_RDY) ? 6'd0 : uop_addr + 1'b1;
+
+
+   //
+   // uOP Completion Logic
+   //
+   always @(*)
+   //
+   case (uop_opcode)
+     OPCODE_CMP:	uop_done = mw_cmp_rdy;
+     OPCODE_MOV:	uop_done = mw_mov_rdy;
+     OPCODE_ADD:	uop_done = mod_add_rdy;
+     OPCODE_SUB:	uop_done = mod_sub_rdy;
+     OPCODE_MUL:	uop_done = mod_mul_rdy;
+     OPCODE_RDY:	uop_done = 1'b1;
+     default:		uop_done = 1'b0;
+   endcase
+
+
+   //
+   // Helper Modules Enable Logic
+   //
+   assign mw_cmp_ena		= uop_opcode[0] & uop_trig;
+   assign mw_mov_ena		= uop_opcode[1] & uop_trig;
+   assign mod_add_ena	= uop_opcode[2] & uop_trig;
+   assign mod_sub_ena	= uop_opcode[3] & uop_trig;
+   assign mod_mul_ena	= uop_opcode[4] & uop_trig;
+
+
+   //
+   // uOP Source Value Decoding Logic
+   //
+   reg [31: 0] 					      uop_src_a_value;
+
+   always @(*)
+				   //
+     case (uop_src_a)
+       UOP_SRC_PX:		uop_src_a_value = px_din;
+       UOP_SRC_PY:		uop_src_a_value = py_din;
+       UOP_SRC_PZ:		uop_src_a_value = pz_din;
+
+       UOP_SRC_RX:		uop_src_a_value = rx_din;
+       UOP_SRC_RY:		uop_src_a_value = ry_din;
+       UOP_SRC_RZ:		uop_src_a_value = rz_din;
+
+       UOP_SRC_T1:		uop_src_a_value = bram_t1_rd_data;
+       UOP_SRC_T2:		uop_src_a_value = bram_t2_rd_data;
+       UOP_SRC_T3:		uop_src_a_value = bram_t3_rd_data;
+       UOP_SRC_T4:		uop_src_a_value = bram_t4_rd_data;
+
+       UOP_SRC_ONE:	uop_src_a_value = brom_one_dout;
+       UOP_SRC_ZERO:	uop_src_a_value = brom_zero_dout;
+       UOP_SRC_DELTA:	uop_src_a_value = brom_delta_dout;
+
+       UOP_SRC_G_X:	uop_src_a_value = gx_din;
+       UOP_SRC_G_Y:	uop_src_a_value = gy_din;
+
+       UOP_SRC_H_X:	uop_src_a_value = hx_din;
+       UOP_SRC_H_Y:	uop_src_a_value = hy_din;
+
+       UOP_SRC_V:		uop_src_a_value = v_din;
+
+       default:			uop_src_a_value = {32{1'bX}};
+     endcase
+
+
+   assign mw_cmp_din_x  = uop_src_a_value;
+   assign mw_mov_din_x  = uop_src_a_value;
+   assign mod_add_din_a = uop_src_a_value;
+   assign mod_sub_din_a = uop_src_a_value;
+   assign mod_mul_din_a = uop_src_a_value;
+
+   reg [31: 0] 					      uop_src_b_value;
+
+   always @(*)
+						  //
+     case (uop_src_b)
+       UOP_SRC_PX:		uop_src_b_value = px_din;
+       UOP_SRC_PY:		uop_src_b_value = py_din;
+       UOP_SRC_PZ:		uop_src_b_value = pz_din;
+
+       UOP_SRC_RX:		uop_src_b_value = rx_din;
+       UOP_SRC_RY:		uop_src_b_value = ry_din;
+       UOP_SRC_RZ:		uop_src_b_value = rz_din;
+
+       UOP_SRC_T1:		uop_src_b_value = bram_t1_rd_data;
+       UOP_SRC_T2:		uop_src_b_value = bram_t2_rd_data;
+       UOP_SRC_T3:		uop_src_b_value = bram_t3_rd_data;
+       UOP_SRC_T4:		uop_src_b_value = bram_t4_rd_data;
+
+       UOP_SRC_ONE:	uop_src_b_value = brom_one_dout;
+       UOP_SRC_ZERO:	uop_src_b_value = brom_zero_dout;
+       UOP_SRC_DELTA:	uop_src_b_value = brom_delta_dout;
+
+       UOP_SRC_G_X:	uop_src_b_value = gx_din;
+       UOP_SRC_G_Y:	uop_src_b_value = gy_din;
+
+       UOP_SRC_H_X:	uop_src_b_value = hx_din;
+       UOP_SRC_H_Y:	uop_src_b_value = hy_din;
+
+       UOP_SRC_V:		uop_src_b_value = v_din;
+
+       default:			uop_src_b_value = {32{1'bX}};
+     endcase
+
+   assign mw_cmp_din_y  = uop_src_b_value;
+   assign mod_add_din_b = uop_src_b_value;
+   assign mod_sub_din_b = uop_src_b_value;
+   assign mod_mul_din_b = uop_src_b_value;
+
+
+   //
+   // uOP Source & Destination Address Decoding Logic
+   //
+   reg [WORD_COUNTER_WIDTH-1:0] 		      uop_src_a_addr;
+   reg [WORD_COUNTER_WIDTH-1:0] 		      uop_src_b_addr;
+   reg [WORD_COUNTER_WIDTH-1:0] 		      uop_dst_addr;
+   reg [WORD_COUNTER_WIDTH-1:0] 		      uop_q_addr;
+
+   assign q_addr = uop_q_addr;
+
+   always @(*)
+						  //
+     case (uop_opcode)
+       //
+       OPCODE_CMP:	begin
+	  uop_src_a_addr = mw_cmp_addr_xy;
+	  uop_src_b_addr = mw_cmp_addr_xy;
+	  uop_dst_addr	= {WORD_COUNTER_WIDTH{1'bX}};
+	  uop_q_addr		= {WORD_COUNTER_WIDTH{1'bX}};
+       end
+       //
+       OPCODE_MOV:	begin
+	  uop_src_a_addr = mw_mov_addr_x;
+	  uop_src_b_addr = {WORD_COUNTER_WIDTH{1'bX}};
+	  uop_dst_addr	= mw_mov_addr_y;
+	  uop_q_addr		= {WORD_COUNTER_WIDTH{1'bX}};
+       end
+       //
+       OPCODE_ADD:	begin
+	  uop_src_a_addr = mod_add_addr_ab;
+	  uop_src_b_addr = mod_add_addr_ab;
+	  uop_dst_addr	= mod_add_addr_s;
+	  uop_q_addr		= mod_add_addr_n;
+       end
+       //
+       OPCODE_SUB:	begin
+	  uop_src_a_addr = mod_sub_addr_ab;
+	  uop_src_b_addr = mod_sub_addr_ab;
+	  uop_dst_addr	= mod_sub_addr_d;
+	  uop_q_addr		= mod_sub_addr_n;
+       end
+       //
+       OPCODE_MUL:	begin
+	  uop_src_a_addr = mod_mul_addr_a;
+	  uop_src_b_addr = mod_mul_addr_b;
+	  uop_dst_addr	= mod_mul_addr_p;
+	  uop_q_addr		= mod_mul_addr_n;
+       end
+       //
+       default: begin
+	  uop_src_a_addr = {WORD_COUNTER_WIDTH{1'bX}};
+	  uop_src_b_addr = {WORD_COUNTER_WIDTH{1'bX}};
+	  uop_dst_addr	= {WORD_COUNTER_WIDTH{1'bX}};
+	  uop_q_addr		= {WORD_COUNTER_WIDTH{1'bX}};
+       end
+       //
+     endcase
+
+
+   //
+   // uOP Conditional Execution Logic
+   //
+   reg	uop_exec_effective;
+
+   always @(*)
+			   //
+     case (uop_exec)
+       UOP_EXEC_ALWAYS:		uop_exec_effective = 1'b1;
+       UOP_EXEC_PZT1T2_0XX:	uop_exec_effective =  flag_pz_is_zero;
+       UOP_EXEC_PZT1T2_100:	uop_exec_effective = !flag_pz_is_zero && flag_t1_is_zero &&  flag_t2_is_zero;
+       UOP_EXEC_PZT1T2_101:	uop_exec_effective = !flag_pz_is_zero && flag_t1_is_zero && !flag_t2_is_zero;
+     endcase
+
+
+   //
+   // uOP Destination Store Logic
+   //
+   reg	uop_dst_wren;
+
+   always @(*)
+						     //
+     case (uop_opcode)
+       //
+       OPCODE_MOV:	uop_dst_wren = mw_mov_wren_y & uop_exec_effective;
+       OPCODE_ADD:	uop_dst_wren = mod_add_wren_s;
+       OPCODE_SUB:	uop_dst_wren = mod_sub_wren_d;
+       OPCODE_MUL:	uop_dst_wren = mod_mul_wren_p;
+       default:		uop_dst_wren = 1'b0;
+       //
+     endcase
+
+
+   always @(*) begin
+      //
+      // P(X,Y,Z) read addresses (source A takes priority over source B)
+      //
+      if      (uop_src_a == UOP_SRC_PX) px_addr = uop_src_a_addr;
+      else if (uop_src_b == UOP_SRC_PX) px_addr = uop_src_b_addr;
+      else                              px_addr = {WORD_COUNTER_WIDTH{1'bX}};
+      //
+      if      (uop_src_a == UOP_SRC_PY) py_addr = uop_src_a_addr;
+      else if (uop_src_b == UOP_SRC_PY) py_addr = uop_src_b_addr;
+      else                              py_addr = {WORD_COUNTER_WIDTH{1'bX}};
+      //
+      if      (uop_src_a == UOP_SRC_PZ) pz_addr = uop_src_a_addr;
+      else if (uop_src_b == UOP_SRC_PZ) pz_addr = uop_src_b_addr;
+      else                              pz_addr = {WORD_COUNTER_WIDTH{1'bX}};
+      //
+      //
+      //
+      if      (uop_src_a == UOP_SRC_ONE)   brom_one_addr = uop_src_a_addr;
+      else if (uop_src_b == UOP_SRC_ONE)   brom_one_addr = uop_src_b_addr;
+      else                                 brom_one_addr = {WORD_COUNTER_WIDTH{1'bX}};
+      //
+      //if      (uop_src_a == UOP_SRC_ZERO)  brom_zero_addr = uop_src_a_addr;
+      //else if (uop_src_b == UOP_SRC_ZERO)  brom_zero_addr = uop_src_b_addr;
+      //else                                 brom_zero_addr = {WORD_COUNTER_WIDTH{1'bX}};
+      //
+      if      (uop_src_a == UOP_SRC_DELTA) brom_delta_addr = uop_src_a_addr;
+      else if (uop_src_b == UOP_SRC_DELTA) brom_delta_addr = uop_src_b_addr;
+      else                                 brom_delta_addr = {WORD_COUNTER_WIDTH{1'bX}};
+      //
+      //
+      //
+      if      (uop_src_a == UOP_SRC_G_X) gx_addr = uop_src_a_addr;
+      else if (uop_src_b == UOP_SRC_G_X) gx_addr = uop_src_b_addr;
+      else                               gx_addr = {WORD_COUNTER_WIDTH{1'bX}};
+      //
+      if      (uop_src_a == UOP_SRC_G_Y) gy_addr = uop_src_a_addr;
+      else if (uop_src_b == UOP_SRC_G_Y) gy_addr = uop_src_b_addr;
+      else                               gy_addr = {WORD_COUNTER_WIDTH{1'bX}};
+      //
+      //
+      //
+      if      (uop_src_a == UOP_SRC_H_X) hx_addr = uop_src_a_addr;
+      else if (uop_src_b == UOP_SRC_H_X) hx_addr = uop_src_b_addr;
+      else                               hx_addr = {WORD_COUNTER_WIDTH{1'bX}};
+      //
+      if      (uop_src_a == UOP_SRC_H_Y) hy_addr = uop_src_a_addr;
+      else if (uop_src_b == UOP_SRC_H_Y) hy_addr = uop_src_b_addr;
+      else                               hy_addr = {WORD_COUNTER_WIDTH{1'bX}};
+      //
+      //
+      //
+      if      (uop_src_a == UOP_SRC_V) v_addr = uop_src_a_addr;
+      else if (uop_src_b == UOP_SRC_V) v_addr = uop_src_b_addr;
+      else                             v_addr = {WORD_COUNTER_WIDTH{1'bX}};
+      //
+      // T1..T4 read addresses (source A takes priority over source B)
+      //
+      if      (uop_src_a == UOP_SRC_T1) bram_t1_rd_addr = uop_src_a_addr;
+      else if (uop_src_b == UOP_SRC_T1) bram_t1_rd_addr = uop_src_b_addr;
+      else                              bram_t1_rd_addr = {WORD_COUNTER_WIDTH{1'bX}};
+      //
+      if      (uop_src_a == UOP_SRC_T2) bram_t2_rd_addr = uop_src_a_addr;
+      else if (uop_src_b == UOP_SRC_T2) bram_t2_rd_addr = uop_src_b_addr;
+      else                              bram_t2_rd_addr = {WORD_COUNTER_WIDTH{1'bX}};
+      //
+      if      (uop_src_a == UOP_SRC_T3) bram_t3_rd_addr = uop_src_a_addr;
+      else if (uop_src_b == UOP_SRC_T3) bram_t3_rd_addr = uop_src_b_addr;
+      else                              bram_t3_rd_addr = {WORD_COUNTER_WIDTH{1'bX}};
+      //
+      if      (uop_src_a == UOP_SRC_T4) bram_t4_rd_addr = uop_src_a_addr;
+      else if (uop_src_b == UOP_SRC_T4) bram_t4_rd_addr = uop_src_b_addr;
+      else                              bram_t4_rd_addr = {WORD_COUNTER_WIDTH{1'bX}};
+      //
+      // T1..T4 write addresses
+      //
+      if (uop_dst == UOP_DST_T1) bram_t1_wr_addr = uop_dst_addr;
+      else                       bram_t1_wr_addr = {WORD_COUNTER_WIDTH{1'bX}};
+      //
+      if (uop_dst == UOP_DST_T2) bram_t2_wr_addr = uop_dst_addr;
+      else                       bram_t2_wr_addr = {WORD_COUNTER_WIDTH{1'bX}};
+      //
+      if (uop_dst == UOP_DST_T3) bram_t3_wr_addr = uop_dst_addr;
+      else                       bram_t3_wr_addr = {WORD_COUNTER_WIDTH{1'bX}};
+      //
+      if (uop_dst == UOP_DST_T4) bram_t4_wr_addr = uop_dst_addr;
+      else                       bram_t4_wr_addr = {WORD_COUNTER_WIDTH{1'bX}};
+      //
+      // R(X,Y,Z) addresses: an active write takes priority over reads
+      //
+      if ((uop_dst == UOP_DST_RX) && (uop_dst_wren))	rx_addr = uop_dst_addr;
+      else begin
+	 if      (uop_src_a == UOP_SRC_RX) 				rx_addr = uop_src_a_addr;
+	 else if (uop_src_b == UOP_SRC_RX) 				rx_addr = uop_src_b_addr;
+	 else                              				rx_addr = {WORD_COUNTER_WIDTH{1'bX}};
+      end
+      //
+      if ((uop_dst == UOP_DST_RY) && (uop_dst_wren))	ry_addr = uop_dst_addr;
+      else begin
+	 if      (uop_src_a == UOP_SRC_RY) 				ry_addr = uop_src_a_addr;
+	 else if (uop_src_b == UOP_SRC_RY) 				ry_addr = uop_src_b_addr;
+	 else                              				ry_addr = {WORD_COUNTER_WIDTH{1'bX}};
+      end
+      //
+      if ((uop_dst == UOP_DST_RZ) && (uop_dst_wren))	rz_addr = uop_dst_addr;
+      else begin
+	 if      (uop_src_a == UOP_SRC_RZ) 				rz_addr = uop_src_a_addr;
+	 else if (uop_src_b == UOP_SRC_RZ) 				rz_addr = uop_src_b_addr;
+	 else                              				rz_addr = {WORD_COUNTER_WIDTH{1'bX}};
+      end
+      //
+   end
+
+
+   assign rx_wren = uop_dst_wren && (uop_dst == UOP_DST_RX);
+   assign ry_wren = uop_dst_wren && (uop_dst == UOP_DST_RY);
+   assign rz_wren = uop_dst_wren && (uop_dst == UOP_DST_RZ);
+
+   assign bram_t1_wr_en = uop_dst_wren && (uop_dst == UOP_DST_T1);
+   assign bram_t2_wr_en = uop_dst_wren && (uop_dst == UOP_DST_T2);
+   assign bram_t3_wr_en = uop_dst_wren && (uop_dst == UOP_DST_T3);
+   assign bram_t4_wr_en = uop_dst_wren && (uop_dst == UOP_DST_T4);
+
+
+
+   //
+   // Destination Value Selector
+   //
+   reg	[31: 0]	uop_dst_value;
+
+   always @(*)
+			  //
+     case (uop_opcode)
+
+       OPCODE_MOV:	uop_dst_value = mw_mov_dout_y;
+       OPCODE_ADD:	uop_dst_value = mod_add_dout_s;
+       OPCODE_SUB:	uop_dst_value = mod_sub_dout_d;
+       OPCODE_MUL:	uop_dst_value = mod_mul_dout_p;
+
+       default:		uop_dst_value = {32{1'bX}};
+
+     endcase
+
+   assign rx_dout = uop_dst_value;
+   assign ry_dout = uop_dst_value;
+   assign rz_dout = uop_dst_value;
+
+   assign bram_t1_wr_data = uop_dst_value;
+   assign bram_t2_wr_data = uop_dst_value;
+   assign bram_t3_wr_data = uop_dst_value;
+   assign bram_t4_wr_data = uop_dst_value;
+
+
+   //
+   // Latch Comparison Flags
+   //
+   always @(posedge clk)
+     //
+     if (	(fsm_state  == FSM_STATE_EXECUTE) &&
+		(uop_opcode == OPCODE_CMP)        &&
+		(uop_done && !uop_trig) ) begin
+
+	if ( (uop_src_a == UOP_SRC_PZ) && (uop_src_b == UOP_SRC_ZERO) )
+	  flag_pz_is_zero <= !mw_cmp_out_l && mw_cmp_out_e && !mw_cmp_out_g;
+
+	if ( (uop_src_a == UOP_SRC_T1) && (uop_src_b == UOP_SRC_ZERO) )
+	  flag_t1_is_zero <= !mw_cmp_out_l && mw_cmp_out_e && !mw_cmp_out_g;
+
+	if ( (uop_src_a == UOP_SRC_T2) && (uop_src_b == UOP_SRC_ZERO) )
+	  flag_t2_is_zero <= !mw_cmp_out_l && mw_cmp_out_e && !mw_cmp_out_g;
+
+     end
+
+
+   //
+   // Ready Flag Logic
+   //
+   reg rdy_reg = 1'b1;
+   assign rdy = rdy_reg;
+
+   always @(posedge clk or negedge rst_n)
+     //
+     if (rst_n == 1'b0) rdy_reg <= 1'b1;
+     else begin
+
+	/* clear flag */
+	if (fsm_state == FSM_STATE_STALL)
+	  if (ena) rdy_reg <= 1'b0;
+
+	/* set flag */
+	if ((fsm_state == FSM_STATE_EXECUTE) && !uop_trig && uop_done)
+	  if (uop_opcode == OPCODE_RDY) rdy_reg <= 1'b1;
+
+     end
+
+
+endmodule
+
+
+//------------------------------------------------------------------------------
+// End-of-File
+//------------------------------------------------------------------------------
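
The conditional-execution and store-gating logic in point_dbl_add_256.v above can be summarized with a small behavioral model. This is only a sketch in Python, not part of the commit: the enum values are placeholders (the real UOP_EXEC_* encodings are defined elsewhere in the file), and the helper names are hypothetical. Judging by the expressions in the RTL, the three digits in PZT1T2_xyz appear to say whether PZ, T1 and T2 are nonzero.

```python
from enum import Enum


class UopExec(Enum):
    # Names mirror the UOP_EXEC_* constants; numeric values are placeholders.
    ALWAYS = 0
    PZT1T2_0XX = 1
    PZT1T2_100 = 2
    PZT1T2_101 = 3


def uop_exec_effective(mode, pz_is_zero, t1_is_zero, t2_is_zero):
    """Mirror of the combinational 'uop_exec_effective' case statement."""
    if mode is UopExec.ALWAYS:
        return True
    if mode is UopExec.PZT1T2_0XX:
        return pz_is_zero
    if mode is UopExec.PZT1T2_100:
        return (not pz_is_zero) and t1_is_zero and t2_is_zero
    if mode is UopExec.PZT1T2_101:
        return (not pz_is_zero) and t1_is_zero and (not t2_is_zero)
    raise ValueError(mode)


def mov_dst_wren(mode, mw_mov_wren_y, flags):
    """Store gating for OPCODE_MOV: the mover's write strobe ANDed with the
    conditional-execution result. 'flags' is (pz_is_zero, t1_is_zero, t2_is_zero)."""
    return mw_mov_wren_y and uop_exec_effective(mode, *flags)
```

In the RTL only the OPCODE_MOV write enable is ANDed with uop_exec_effective; ADD, SUB and MUL pass their own write strobes through unchanged. For any combination of the three flags, at most one of the three conditional modes evaluates to true.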
diff --git a/rtl/curve/point_mul_256.v b/rtl/curve/point_mul_256.v
new file mode 100644
index 0000000..6c24d65
--- /dev/null
+++ b/rtl/curve/point_mul_256.v
@@ -0,0 +1,849 @@
+//------------------------------------------------------------------------------
+//
+// point_mul_256.v
+// -----------------------------------------------------------------------------
+// Elliptic curve point scalar multiplier.
+//
+// Authors: Pavel Shatov
+//
+// Copyright (c) 2018, NORDUnet A/S
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are met:
+//
+// - Redistributions of source code must retain the above copyright notice,
+//   this list of conditions and the following disclaimer.
+//
+// - Redistributions in binary form must reproduce the above copyright notice,
+//   this list of conditions and the following disclaimer in the documentation
+//   and/or other materials provided with the distribution.
+//
+// - Neither the name of the NORDUnet nor the names of its contributors may be
+//   used to endorse or promote products derived from this software without
+//   specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+// POSSIBILITY OF SUCH DAMAGE.
+//
+//------------------------------------------------------------------------------
+
+module point_mul_256
+  (
+   clk, rst_n,
+   ena, rdy,
+   k_addr, qx_addr, qy_addr, rx_addr, ry_addr,
+   rx_wren, ry_wren,
+   k_din, qx_din, qy_din,
+   rx_dout, ry_dout
+   );
+
+
+   //
+   // Constants
+   //
+   localparam	WORD_COUNTER_WIDTH	= 3;	// 0 .. 7
+   localparam	OPERAND_NUM_WORDS	= 8;	// 8 * 32 = 256
+
+
+   //
+   // Ports
+   //
+   input	wire		clk;		// system clock
+   input	wire		rst_n;		// active-low async reset
+
+   input	wire		ena;		// enable input
+   output	wire		rdy;		// ready output
+
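+   output	wire [ 2: 0] 	k_addr;		// scalar K word address
+   output	wire [ 2: 0] 	qx_addr;	// input point X word address
+   output	wire [ 2: 0] 	qy_addr;	// input point Y word address
+   output	wire [ 2: 0] 	rx_addr;	// result X word address
+   output	wire [ 2: 0] 	ry_addr;	// result Y word address
+
+   output	wire 		rx_wren;	// result X write enable
+   output	wire 		ry_wren;	// result Y write enable
+
+   input	wire [31: 0]	k_din;		// scalar K word
+   input	wire [31: 0]	qx_din;		// input point X word
+   input	wire [31: 0]	qy_din;		// input point Y word
+
+   output	wire [31: 0]	rx_dout;	// result X word
+   output	wire [31: 0]	ry_dout;	// result Y word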
+
+
+   //
+   // Temporary Variables
+   //
+   reg [ 2: 0] 			     bram_tx_wr_addr;
+   reg [ 2: 0] 			     bram_ty_wr_addr;
+   reg [ 2: 0] 			     bram_tz_wr_addr;
+
+   reg [ 2: 0] 			     bram_rx_wr_addr;
+   reg [ 2: 0] 			     bram_ry_wr_addr;
+   reg [ 2: 0] 			     bram_rz_wr_addr;
+   wire [ 2: 0] 		     bram_rz1_wr_addr;
+
+	wire [ 2: 0]					bram_hx_wr_addr;
+	wire [ 2: 0]					bram_hy_wr_addr;
+
+   reg [ 2: 0] 			     bram_tx_rd_addr;
+   reg [ 2: 0] 			     bram_ty_rd_addr;
+   reg [ 2: 0] 			     bram_tz_rd_addr;
+
+   reg [ 2: 0] 			     bram_rx_rd_addr;
+   reg [ 2: 0] 			     bram_ry_rd_addr;
+   reg [ 2: 0] 			     bram_rz_rd_addr;
+   wire [ 2: 0] 		     bram_rz1_rd_addr;
+
+	wire [ 2: 0]				bram_hx_rd_addr;
+	wire [ 2: 0]				bram_hy_rd_addr;
+
+   reg 				     bram_tx_wr_en;
+   reg 				     bram_ty_wr_en;
+   reg 				     bram_tz_wr_en;
+
+   reg 				     bram_rx_wr_en;
+   reg 				     bram_ry_wr_en;
+   reg 				     bram_rz_wr_en;
+   wire 			     bram_rz1_wr_en;
+
+	wire					bram_hx_wr_en;
+	wire					bram_hy_wr_en;
+
+   wire [31: 0] 		     bram_tx_rd_data;
+   wire [31: 0] 		     bram_ty_rd_data;
+   wire [31: 0] 		     bram_tz_rd_data;
+
+   wire [31: 0] 		     bram_rx_rd_data;
+   wire [31: 0] 		     bram_ry_rd_data;
+   wire [31: 0] 		     bram_rz_rd_data;
+   wire [31: 0] 		     bram_rz1_rd_data;
+
+	wire [31: 0]				bram_hx_rd_data;
+	wire [31: 0]				bram_hy_rd_data;
+
+   reg [31: 0] 			     bram_tx_wr_data_in;
+   reg [31: 0] 			     bram_ty_wr_data_in;
+   reg [31: 0] 			     bram_tz_wr_data_in;
+
+   reg [31: 0] 			     bram_rx_wr_data_in;
+   reg [31: 0] 			     bram_ry_wr_data_in;
+   reg [31: 0] 			     bram_rz_wr_data_in;
+   wire [31: 0] 		     bram_rz1_wr_data_in;
+
+	wire [31: 0]				bram_hx_wr_data_in;
+	wire [31: 0]				bram_hy_wr_data_in;
+
+   wire [31: 0] 		     bram_tx_wr_data_out;
+   wire [31: 0] 		     bram_ty_wr_data_out;
+   wire [31: 0] 		     bram_tz_wr_data_out;
+
+   wire [31: 0] 		     bram_rx_wr_data_out;
+   wire [31: 0] 		     bram_ry_wr_data_out;
+   wire [31: 0] 		     bram_rz_wr_data_out;
+
+   bram_1rw_1ro_readfirst # (.MEM_WIDTH(32), .MEM_ADDR_BITS(3))
+   bram_tx (.clk(clk),
+	    .a_addr(bram_tx_wr_addr), .a_wr(bram_tx_wr_en), .a_in(bram_tx_wr_data_in), .a_out(bram_tx_wr_data_out),
+	    .b_addr(bram_tx_rd_addr),                                                  .b_out(bram_tx_rd_data));
+
+   bram_1rw_1ro_readfirst # (.MEM_WIDTH(32), .MEM_ADDR_BITS(3))
+   bram_ty (.clk(clk),
+	    .a_addr(bram_ty_wr_addr), .a_wr(bram_ty_wr_en), .a_in(bram_ty_wr_data_in), .a_out(bram_ty_wr_data_out),
+	    .b_addr(bram_ty_rd_addr),                                                  .b_out(bram_ty_rd_data));
+
+   bram_1rw_1ro_readfirst # (.MEM_WIDTH(32), .MEM_ADDR_BITS(3))
+   bram_tz (.clk(clk),
+	    .a_addr(bram_tz_wr_addr), .a_wr(bram_tz_wr_en), .a_in(bram_tz_wr_data_in), .a_out(bram_tz_wr_data_out),
+	    .b_addr(bram_tz_rd_addr),                                                  .b_out(bram_tz_rd_data));
+
+   bram_1rw_1ro_readfirst # (.MEM_WIDTH(32), .MEM_ADDR_BITS(3))
+   bram_rx (.clk(clk),
+	    .a_addr(bram_rx_wr_addr), .a_wr(bram_rx_wr_en), .a_in(bram_rx_wr_data_in), .a_out(bram_rx_wr_data_out),
+	    .b_addr(bram_rx_rd_addr),                                                  .b_out(bram_rx_rd_data));
+
+   bram_1rw_1ro_readfirst # (.MEM_WIDTH(32), .MEM_ADDR_BITS(3))
+   bram_ry (.clk(clk),
+	    .a_addr(bram_ry_wr_addr), .a_wr(bram_ry_wr_en), .a_in(bram_ry_wr_data_in), .a_out(bram_ry_wr_data_out),
+	    .b_addr(bram_ry_rd_addr),                                                  .b_out(bram_ry_rd_data));
+
+   bram_1rw_1ro_readfirst # (.MEM_WIDTH(32), .MEM_ADDR_BITS(3))
+   bram_rz (.clk(clk),
+	    .a_addr(bram_rz_wr_addr), .a_wr(bram_rz_wr_en), .a_in(bram_rz_wr_data_in), .a_out(bram_rz_wr_data_out),
+	    .b_addr(bram_rz_rd_addr),                                                  .b_out(bram_rz_rd_data));
+
+   bram_1rw_1ro_readfirst # (.MEM_WIDTH(32), .MEM_ADDR_BITS(3))
+   bram_rz1 (.clk(clk),
+	     .a_addr(bram_rz1_wr_addr), .a_wr(bram_rz1_wr_en), .a_in(bram_rz1_wr_data_in), .a_out(),
+	     .b_addr(bram_rz1_rd_addr),                                                    .b_out(bram_rz1_rd_data));
+
+   bram_1rw_1ro_readfirst # (.MEM_WIDTH(32), .MEM_ADDR_BITS(3))
+   bram_hx (.clk(clk),
+	     .a_addr(bram_hx_wr_addr), .a_wr(bram_hx_wr_en), .a_in(bram_hx_wr_data_in), .a_out(),
+	     .b_addr(bram_hx_rd_addr),                                                  .b_out(bram_hx_rd_data));
+
+   bram_1rw_1ro_readfirst # (.MEM_WIDTH(32), .MEM_ADDR_BITS(3))
+   bram_hy (.clk(clk),
+	     .a_addr(bram_hy_wr_addr), .a_wr(bram_hy_wr_en), .a_in(bram_hy_wr_data_in), .a_out(),
+	     .b_addr(bram_hy_rd_addr),                                                  .b_out(bram_hy_rd_data));
+
+
+   //
+   // FSM
+   //
+   localparam	[ 4: 0]	FSM_STATE_IDLE								= 5'd00;
+
+   localparam	[ 4: 0]	FSM_STATE_PRECOMPUTE_PREPARE_TRIG	= {1'b0, 4'd01};
+   localparam	[ 4: 0]	FSM_STATE_PRECOMPUTE_PREPARE_WAIT	= {1'b0, 4'd02};
+   localparam	[ 4: 0]	FSM_STATE_PRECOMPUTE_DOUBLE_TRIG		= {1'b0, 4'd03};
+   localparam	[ 4: 0]	FSM_STATE_PRECOMPUTE_DOUBLE_WAIT		= {1'b0, 4'd04};
+   localparam	[ 4: 0]	FSM_STATE_PRECOMPUTE_COPY_TRIG		= {1'b0, 4'd05};
+   localparam	[ 4: 0]	FSM_STATE_PRECOMPUTE_COPY_WAIT		= {1'b0, 4'd06};
+   localparam	[ 4: 0]	FSM_STATE_PRECOMPUTE_INVERT_TRIG		= {1'b0, 4'd07};
+   localparam	[ 4: 0]	FSM_STATE_PRECOMPUTE_INVERT_WAIT		= {1'b0, 4'd08};
+   localparam	[ 4: 0]	FSM_STATE_PRECOMPUTE_CONVERT_TRIG	= {1'b0, 4'd09};
+   localparam	[ 4: 0]	FSM_STATE_PRECOMPUTE_CONVERT_WAIT	= {1'b0, 4'd10};
+	
+   localparam	[ 4: 0]	FSM_STATE_MULTIPLY_PREPARE_TRIG		= {1'b1, 4'd01};
+   localparam	[ 4: 0]	FSM_STATE_MULTIPLY_PREPARE_WAIT		= {1'b1, 4'd02};
+   localparam	[ 4: 0]	FSM_STATE_MULTIPLY_DOUBLE_TRIG		= {1'b1, 4'd03};
+   localparam	[ 4: 0]	FSM_STATE_MULTIPLY_DOUBLE_WAIT		= {1'b1, 4'd04};
+   localparam	[ 4: 0]	FSM_STATE_MULTIPLY_ADD_TRIG			= {1'b1, 4'd05};
+   localparam	[ 4: 0]	FSM_STATE_MULTIPLY_ADD_WAIT			= {1'b1, 4'd06};
+   localparam	[ 4: 0]	FSM_STATE_MULTIPLY_COPY_TRIG			= {1'b1, 4'd07};
+   localparam	[ 4: 0]	FSM_STATE_MULTIPLY_COPY_WAIT			= {1'b1, 4'd08};
+   localparam	[ 4: 0]	FSM_STATE_MULTIPLY_INVERT_TRIG		= {1'b1, 4'd09};
+   localparam	[ 4: 0]	FSM_STATE_MULTIPLY_INVERT_WAIT		= {1'b1, 4'd10};
+   localparam	[ 4: 0]	FSM_STATE_MULTIPLY_CONVERT_TRIG		= {1'b1, 4'd11};
+   localparam	[ 4: 0]	FSM_STATE_MULTIPLY_CONVERT_WAIT		= {1'b1, 4'd12};
+
+   localparam	[ 4: 0]	FSM_STATE_DONE								= 5'd31;
+
+   reg [4:0] 			     fsm_state = FSM_STATE_IDLE;
+
+
+   //
+   // Round Counter
+   //
+   reg [ 7: 0] 			     bit_counter;
+   wire [ 7: 0] 		     bit_counter_max = 8'd255;
+   wire [ 7: 0] 		     bit_counter_zero = 8'd0;
+   wire [ 7: 0] 		     bit_counter_next =
+				     (bit_counter < bit_counter_max) ? bit_counter + 1'b1 : bit_counter_zero;
+
+
+   //
+   // Round Completion
+   //
+   wire [ 4: 0] 		     fsm_state_round_next = (bit_counter < bit_counter_max) ?
+				     FSM_STATE_MULTIPLY_DOUBLE_TRIG : FSM_STATE_MULTIPLY_INVERT_TRIG;
+
+
+   //
+   // OP Trigger Logic
+   //
+   reg 				     op_trig;
+   wire 			     op_done;
+
+   always @(posedge clk or negedge rst_n)
+     //
+     if (rst_n == 1'b0)	op_trig <= 1'b0;
+     else case (fsm_state)
+			FSM_STATE_PRECOMPUTE_PREPARE_TRIG,
+			FSM_STATE_PRECOMPUTE_DOUBLE_TRIG,
+			FSM_STATE_PRECOMPUTE_CONVERT_TRIG,
+			FSM_STATE_MULTIPLY_PREPARE_TRIG,
+			FSM_STATE_MULTIPLY_DOUBLE_TRIG,
+			FSM_STATE_MULTIPLY_ADD_TRIG,
+			FSM_STATE_MULTIPLY_CONVERT_TRIG:	op_trig <= 1'b1;
+			default:										op_trig <= 1'b0;
+		endcase
+
+   //
+   // Microprograms
+   //
+   wire [ 5: 0] 		     op_rom_addr;
+   wire [19: 0] 		     op_rom_init_data;
+   wire [19: 0] 		     op_rom_dbl_data;
+   wire [19: 0] 		     op_rom_add_data;
+   wire [19: 0] 		     op_rom_conv_data;
+	wire [19: 0] 		     op_rom_init_ecdh_data;
+   reg [19: 0] 			     op_rom_mux_data;
+
+   (* RAM_STYLE="BLOCK" *)
+   uop_init_rom op_rom_init
+     (
+      .clk	(clk),
+      .addr	(op_rom_addr),
+      .data	(op_rom_init_data)
+      );
+
+   (* RAM_STYLE="BLOCK" *)
+   uop_dbl_rom op_rom_dbl
+     (
+      .clk	(clk),
+      .addr	(op_rom_addr),
+      .data	(op_rom_dbl_data)
+      );
+
+   (* RAM_STYLE="BLOCK" *)
+   uop_add_rom op_rom_add
+     (
+      .clk	(clk),
+      .addr	(op_rom_addr),
+      .data	(op_rom_add_data)
+      );
+
+   (* RAM_STYLE="BLOCK" *)
+   uop_conv_rom op_rom_conv
+     (
+      .clk	(clk),
+      .addr	(op_rom_addr),
+      .data	(op_rom_conv_data)
+      );
+
+   (* RAM_STYLE="BLOCK" *)
+   uop_init_rom_ecdh op_rom_conv_ecdh	// ECDH init microprogram (drives op_rom_init_ecdh_data despite the "conv" instance name)
+     (
+      .clk	(clk),
+      .addr	(op_rom_addr),
+      .data	(op_rom_init_ecdh_data)
+      );
+
+   always @(*)
+   //
+   case (fsm_state)
+	  FSM_STATE_PRECOMPUTE_PREPARE_WAIT:	op_rom_mux_data = op_rom_init_ecdh_data;
+     FSM_STATE_MULTIPLY_PREPARE_WAIT:		op_rom_mux_data = op_rom_init_data;
+     FSM_STATE_PRECOMPUTE_DOUBLE_WAIT,
+	  FSM_STATE_MULTIPLY_DOUBLE_WAIT:		op_rom_mux_data = op_rom_dbl_data;
+     FSM_STATE_MULTIPLY_ADD_WAIT:			op_rom_mux_data = op_rom_add_data;
+	  FSM_STATE_PRECOMPUTE_CONVERT_WAIT,
+     FSM_STATE_MULTIPLY_CONVERT_WAIT:		op_rom_mux_data = op_rom_conv_data;
+     default:										op_rom_mux_data = {20{1'bX}};
+   endcase
+
+
+
+   //
+   // Modulus
+   //
+   reg [ 2: 0] 			     rom_q_addr;
+   wire [31: 0] 		     rom_q_data;
+
+   brom_p256_q rom_q
+     (
+      .clk	(clk),
+      .b_addr	(rom_q_addr),
+      .b_out	(rom_q_data)
+      );
+
+
+   //
+   // Worker
+   //
+   wire [ 2: 0] 		     worker_addr_px;
+   wire [ 2: 0] 		     worker_addr_py;
+   wire [ 2: 0] 		     worker_addr_pz;
+
+   wire [ 2: 0] 		     worker_addr_rx;
+   wire [ 2: 0] 		     worker_addr_ry;
+   wire [ 2: 0] 		     worker_addr_rz;
+
+   wire [ 2: 0] 		     worker_addr_q;
+
+   wire 			     worker_wren_rx;
+   wire 			     worker_wren_ry;
+   wire 			     worker_wren_rz;
+
+   reg [31: 0] 			     worker_din_px;
+   reg [31: 0] 			     worker_din_py;
+   reg [31: 0] 			     worker_din_pz;
+
+   reg [31: 0] 			     worker_din_rx;
+   reg [31: 0] 			     worker_din_ry;
+   reg [31: 0] 			     worker_din_rz;
+
+   wire [31: 0] 		     worker_dout_rx;
+   wire [31: 0] 		     worker_dout_ry;
+   wire [31: 0] 		     worker_dout_rz;
+
+   point_dbl_add_256 worker
+     (
+      .clk		(clk),
+      .rst_n		(rst_n),
+
+      .ena		(op_trig),
+      .rdy		(op_done),
+
+      .uop_addr		(op_rom_addr),
+      .uop		(op_rom_mux_data),
+
+		.gx_addr		(qx_addr),
+		.gy_addr		(qy_addr),
+		
+		.hx_addr		(bram_hx_rd_addr),
+		.hy_addr		(bram_hy_rd_addr),
+
+      .px_addr		(worker_addr_px),
+      .py_addr		(worker_addr_py),
+      .pz_addr		(worker_addr_pz),
+
+      .rx_addr		(worker_addr_rx),
+      .ry_addr		(worker_addr_ry),
+      .rz_addr		(worker_addr_rz),
+
+      .q_addr		(worker_addr_q),
+
+      .v_addr		(bram_rz1_rd_addr),
+
+      .rx_wren		(worker_wren_rx),
+      .ry_wren		(worker_wren_ry),
+      .rz_wren		(worker_wren_rz),
+
+		.gx_din		(qx_din),
+		.gy_din		(qy_din),
+		
+		.hx_din		(bram_hx_rd_data),
+		.hy_din		(bram_hy_rd_data),
+
+      .px_din		(worker_din_px),
+      .py_din		(worker_din_py),
+      .pz_din		(worker_din_pz),
+
+      .rx_din		(worker_din_rx),
+      .ry_din		(worker_din_ry),
+      .rz_din		(worker_din_rz),
+
+      .rx_dout		(worker_dout_rx),
+      .ry_dout		(worker_dout_ry),
+      .rz_dout		(worker_dout_rz),
+
+      .q_din		(rom_q_data),
+
+      .v_din		(bram_rz1_rd_data)
+      );
+
+
+   //
+   // Mover
+   //
+   reg 				     move_trig;
+   wire 			     move_done;
+
+   wire [ 2: 0] 		     mover_addr_x;
+   wire [ 2: 0] 		     mover_addr_y;
+
+   wire 			     mover_wren_y;
+
+   always @(posedge clk or negedge rst_n)
+     //
+     if (rst_n == 1'b0)	move_trig <= 1'b0;
+     else case (fsm_state)
+			FSM_STATE_PRECOMPUTE_COPY_TRIG,
+			FSM_STATE_MULTIPLY_COPY_TRIG:			move_trig <= 1'b1;
+			default: 									move_trig <= 1'b0;
+	  endcase
+
+   mw_mover #
+     (
+      .WORD_COUNTER_WIDTH	(3),
+      .OPERAND_NUM_WORDS	(8)
+      )
+   mover
+     (
+      .clk		(clk),
+      .rst_n	(rst_n),
+
+      .ena		(move_trig),
+      .rdy		(move_done),
+
+      .x_addr	(mover_addr_x),
+      .y_addr	(mover_addr_y),
+      .y_wren	(mover_wren_y),
+
+      .x_din	({32{1'bX}}),
+      .y_dout	()
+      );
+
+
+   //
+   // Invertor
+   //
+   reg 				     invert_trig;
+   wire 			     invert_done;
+
+   wire [ 2: 0] 		     invertor_addr_a;
+   wire [ 2: 0] 		     invertor_addr_q;
+
+   always @(posedge clk or negedge rst_n)
+     //
+     if (rst_n == 1'b0)	invert_trig <= 1'b0;
+     else case (fsm_state)
+			FSM_STATE_PRECOMPUTE_INVERT_TRIG,
+			FSM_STATE_MULTIPLY_INVERT_TRIG:	invert_trig <= 1'b1;
+			default:										invert_trig <= 1'b0;
+		endcase
+
+   modular_invertor #
+     (
+      .MAX_OPERAND_WIDTH(256)
+      )
+   invertor
+     (
+      .clk			(clk),
+      .rst_n		(rst_n),
+
+      .ena			(invert_trig),
+      .rdy			(invert_done),
+
+      .a_addr		(invertor_addr_a),
+      .q_addr		(invertor_addr_q),
+      .a1_addr		(bram_rz1_wr_addr),
+      .a1_wren		(bram_rz1_wr_en),
+
+      .a_din		(bram_rz_rd_data),
+      .q_din		(rom_q_data),
+      .a1_dout		(bram_rz1_wr_data_in)
+      );
+
+
+   //
+   // FSM Transition Logic
+   //
+   always @(posedge clk or negedge rst_n)
+     //
+     if (rst_n == 1'b0)			fsm_state <= FSM_STATE_IDLE;
+     else case (fsm_state)
+
+	    FSM_STATE_IDLE:		fsm_state <= ena ? FSM_STATE_PRECOMPUTE_PREPARE_TRIG : FSM_STATE_IDLE;
+
+		 FSM_STATE_PRECOMPUTE_PREPARE_TRIG: 	fsm_state <= FSM_STATE_PRECOMPUTE_PREPARE_WAIT;
+		 FSM_STATE_PRECOMPUTE_PREPARE_WAIT:		fsm_state <= (!op_trig && op_done) ? FSM_STATE_PRECOMPUTE_DOUBLE_TRIG : FSM_STATE_PRECOMPUTE_PREPARE_WAIT;
+		 
+		 FSM_STATE_PRECOMPUTE_DOUBLE_TRIG: 		fsm_state <= FSM_STATE_PRECOMPUTE_DOUBLE_WAIT;
+		 FSM_STATE_PRECOMPUTE_DOUBLE_WAIT:		fsm_state <= (!op_trig && op_done) ? FSM_STATE_PRECOMPUTE_COPY_TRIG : FSM_STATE_PRECOMPUTE_DOUBLE_WAIT;
+		 
+		 FSM_STATE_PRECOMPUTE_COPY_TRIG:		fsm_state <= FSM_STATE_PRECOMPUTE_COPY_WAIT; 
+		 FSM_STATE_PRECOMPUTE_COPY_WAIT:		fsm_state <= (!move_trig && move_done) ? FSM_STATE_PRECOMPUTE_INVERT_TRIG : FSM_STATE_PRECOMPUTE_COPY_WAIT;
+		 
+		 FSM_STATE_PRECOMPUTE_INVERT_TRIG:		fsm_state <= FSM_STATE_PRECOMPUTE_INVERT_WAIT;
+		 FSM_STATE_PRECOMPUTE_INVERT_WAIT:		fsm_state <= (!invert_trig && invert_done) ? FSM_STATE_PRECOMPUTE_CONVERT_TRIG : FSM_STATE_PRECOMPUTE_INVERT_WAIT;
+		 
+		 FSM_STATE_PRECOMPUTE_CONVERT_TRIG:		fsm_state <= FSM_STATE_PRECOMPUTE_CONVERT_WAIT; 
+		 FSM_STATE_PRECOMPUTE_CONVERT_WAIT:		fsm_state <= (!op_trig && op_done) ? FSM_STATE_MULTIPLY_PREPARE_TRIG : FSM_STATE_PRECOMPUTE_CONVERT_WAIT;
+
+	    FSM_STATE_MULTIPLY_PREPARE_TRIG:	fsm_state <= FSM_STATE_MULTIPLY_PREPARE_WAIT;
+	    FSM_STATE_MULTIPLY_PREPARE_WAIT:	fsm_state <= (!op_trig && op_done) ? FSM_STATE_MULTIPLY_DOUBLE_TRIG : FSM_STATE_MULTIPLY_PREPARE_WAIT;
+
+	    FSM_STATE_MULTIPLY_DOUBLE_TRIG:	fsm_state <= FSM_STATE_MULTIPLY_DOUBLE_WAIT;
+	    FSM_STATE_MULTIPLY_DOUBLE_WAIT:	fsm_state <= (!op_trig && op_done) ? FSM_STATE_MULTIPLY_ADD_TRIG : FSM_STATE_MULTIPLY_DOUBLE_WAIT;
+
+	    FSM_STATE_MULTIPLY_ADD_TRIG:		fsm_state <= FSM_STATE_MULTIPLY_ADD_WAIT;
+	    FSM_STATE_MULTIPLY_ADD_WAIT:		fsm_state <= (!op_trig && op_done) ? FSM_STATE_MULTIPLY_COPY_TRIG : FSM_STATE_MULTIPLY_ADD_WAIT;
+
+	    FSM_STATE_MULTIPLY_COPY_TRIG:	fsm_state <= FSM_STATE_MULTIPLY_COPY_WAIT;
+	    FSM_STATE_MULTIPLY_COPY_WAIT:	fsm_state <= (!move_trig && move_done) ? fsm_state_round_next : FSM_STATE_MULTIPLY_COPY_WAIT;
+
+	    FSM_STATE_MULTIPLY_INVERT_TRIG:	fsm_state <= FSM_STATE_MULTIPLY_INVERT_WAIT;
+	    FSM_STATE_MULTIPLY_INVERT_WAIT:	fsm_state <= (!invert_trig && invert_done) ? FSM_STATE_MULTIPLY_CONVERT_TRIG : FSM_STATE_MULTIPLY_INVERT_WAIT;
+
+	    FSM_STATE_MULTIPLY_CONVERT_TRIG:	fsm_state <= FSM_STATE_MULTIPLY_CONVERT_WAIT;
+	    FSM_STATE_MULTIPLY_CONVERT_WAIT:	fsm_state <= (!op_trig && op_done) ? FSM_STATE_DONE : FSM_STATE_MULTIPLY_CONVERT_WAIT;
+
+	    FSM_STATE_DONE:		fsm_state <= FSM_STATE_IDLE;
+
+	    default:			fsm_state <= FSM_STATE_IDLE;
+
+	  endcase
+
+
+   //
+   // Bit Counter Increment
+   //
+   always @(posedge clk) begin
+      //
+      if ((fsm_state == FSM_STATE_MULTIPLY_PREPARE_WAIT) && !op_trig && op_done)
+	bit_counter <= bit_counter_zero;
+      //
+      if ((fsm_state == FSM_STATE_MULTIPLY_COPY_WAIT) && !move_trig && move_done)
+	bit_counter <= bit_counter_next;
+      //
+   end
+
+
+   //
+   // K Latch Logic
+   //
+   reg	[ 2: 0]	k_addr_reg;
+   reg [31: 0] 	k_din_reg;
+
+   assign k_addr = k_addr_reg;
+
+   always @(posedge clk) begin
+      //
+      if (fsm_state == FSM_STATE_MULTIPLY_DOUBLE_TRIG)
+	k_addr_reg <= 3'd7 - bit_counter[7:5];	// words of K are scanned from index 7 down to 0
+      //
+      if (fsm_state == FSM_STATE_MULTIPLY_ADD_TRIG)
+	k_din_reg <= (bit_counter[4:0] == 5'd0) ? k_din : {k_din_reg[30:0], 1'bX};	// load a new word every 32 bits, otherwise shift left
+      //
+   end
+
+
+
+   //
+   // Copy Inhibit Logic (in MULTIPLY_COPY_WAIT, writes are suppressed while k_din_reg[31] is set)
+   //
+   wire	move_inhibit = k_din_reg[31];
+
+   reg copy_t2r_int;
+
+	always @*
+		//
+		case (fsm_state)
+			FSM_STATE_PRECOMPUTE_COPY_WAIT:	copy_t2r_int = 1'b1;
+			FSM_STATE_MULTIPLY_COPY_WAIT:		copy_t2r_int = mover_wren_y & ~move_inhibit;
+			default:									copy_t2r_int = 1'b0;
+		endcase
+
+
+   always @(*) begin
+      //
+      // Q
+      //
+      case (fsm_state)
+	FSM_STATE_PRECOMPUTE_DOUBLE_WAIT,
+	FSM_STATE_MULTIPLY_DOUBLE_WAIT:	rom_q_addr = worker_addr_q;
+	FSM_STATE_MULTIPLY_ADD_WAIT:		rom_q_addr = worker_addr_q;
+	FSM_STATE_PRECOMPUTE_INVERT_WAIT,
+	FSM_STATE_MULTIPLY_INVERT_WAIT:	rom_q_addr = invertor_addr_q;
+	FSM_STATE_PRECOMPUTE_CONVERT_WAIT,
+	FSM_STATE_MULTIPLY_CONVERT_WAIT:	rom_q_addr = worker_addr_q;
+	default:						rom_q_addr = worker_addr_q;
+      endcase
+
+      //
+      // R(X,Y,Z)
+      //
+	case (fsm_state)
+		//
+	FSM_STATE_PRECOMPUTE_PREPARE_WAIT,
+	FSM_STATE_MULTIPLY_PREPARE_WAIT: begin
+	   //
+	   bram_rx_rd_addr    <= {3{1'bX}};      bram_ry_rd_addr    <= {3{1'bX}};      bram_rz_rd_addr    <= {3{1'bX}};
+	   bram_rx_wr_addr    <= worker_addr_rx; bram_ry_wr_addr    <= worker_addr_ry; bram_rz_wr_addr    <= worker_addr_rz;
+	   bram_rx_wr_en      <= worker_wren_rx; bram_ry_wr_en      <= worker_wren_ry; bram_rz_wr_en      <= worker_wren_rz;
+	   bram_rx_wr_data_in <= worker_dout_rx; bram_ry_wr_data_in <= worker_dout_ry; bram_rz_wr_data_in <= worker_dout_rz;
+	   //
+	end
+	//
+	FSM_STATE_PRECOMPUTE_DOUBLE_WAIT,
+	FSM_STATE_MULTIPLY_DOUBLE_WAIT: begin
+	   //
+	   bram_rx_rd_addr    <= worker_addr_px; bram_ry_rd_addr    <= worker_addr_py; bram_rz_rd_addr    <= worker_addr_pz;
+	   bram_rx_wr_addr    <= {3{1'bX}};      bram_ry_wr_addr    <= {3{1'bX}};      bram_rz_wr_addr    <= {3{1'bX}};
+	   bram_rx_wr_en      <= 1'b0;           bram_ry_wr_en      <= 1'b0;           bram_rz_wr_en      <= 1'b0;
+	   bram_rx_wr_data_in <= {32{1'bX}};     bram_ry_wr_data_in <= {32{1'bX}};     bram_rz_wr_data_in <= {32{1'bX}};
+	   //
+	end
+	//
+	FSM_STATE_MULTIPLY_ADD_WAIT: begin
+	   //
+	   bram_rx_rd_addr    <= {3{1'bX}};      bram_ry_rd_addr    <= {3{1'bX}};      bram_rz_rd_addr    <= {3{1'bX}};
+	   bram_rx_wr_addr    <= worker_addr_rx; bram_ry_wr_addr    <= worker_addr_ry; bram_rz_wr_addr    <= worker_addr_rz;
+	   bram_rx_wr_en      <= worker_wren_rx; bram_ry_wr_en      <= worker_wren_ry; bram_rz_wr_en      <= worker_wren_rz;
+	   bram_rx_wr_data_in <= worker_dout_rx; bram_ry_wr_data_in <= worker_dout_ry; bram_rz_wr_data_in <= worker_dout_rz;
+	   //
+	end
+	//
+	FSM_STATE_PRECOMPUTE_COPY_WAIT,
+	FSM_STATE_MULTIPLY_COPY_WAIT: begin
+	   //
+	   bram_rx_rd_addr    <= {3{1'bX}};       bram_ry_rd_addr    <= {3{1'bX}};       bram_rz_rd_addr    <= {3{1'bX}};
+	   bram_rx_wr_addr    <= mover_addr_y;    bram_ry_wr_addr    <= mover_addr_y;    bram_rz_wr_addr    <= mover_addr_y;
+	   bram_rx_wr_en      <= copy_t2r_int;    bram_ry_wr_en      <= copy_t2r_int;    bram_rz_wr_en      <= copy_t2r_int;
+	   bram_rx_wr_data_in <= bram_tx_rd_data; bram_ry_wr_data_in <= bram_ty_rd_data; bram_rz_wr_data_in <= bram_tz_rd_data;
+	   //
+	end
+	//
+	FSM_STATE_PRECOMPUTE_INVERT_WAIT,
+	FSM_STATE_MULTIPLY_INVERT_WAIT: begin
+	   //
+	   bram_rx_rd_addr    <= {3{1'bX}};  bram_ry_rd_addr    <= {3{1'bX}};  bram_rz_rd_addr    <= invertor_addr_a;
+	   bram_rx_wr_addr    <= {3{1'bX}};  bram_ry_wr_addr    <= {3{1'bX}};  bram_rz_wr_addr    <= {3{1'bX}};
+	   bram_rx_wr_en      <= 1'b0;       bram_ry_wr_en      <= 1'b0;       bram_rz_wr_en      <= 1'b0;
+	   bram_rx_wr_data_in <= {32{1'bX}}; bram_ry_wr_data_in <= {32{1'bX}}; bram_rz_wr_data_in <= {32{1'bX}};
+	   //
+	end
+	//
+	FSM_STATE_PRECOMPUTE_CONVERT_WAIT,
+	FSM_STATE_MULTIPLY_CONVERT_WAIT: begin
+	   //
+	   bram_rx_rd_addr    <= worker_addr_px; bram_ry_rd_addr    <= worker_addr_py; bram_rz_rd_addr    <= worker_addr_pz;
+	   bram_rx_wr_addr    <= {3{1'bX}};      bram_ry_wr_addr    <= {3{1'bX}};      bram_rz_wr_addr    <= {3{1'bX}};
+	   bram_rx_wr_en      <= 1'b0;           bram_ry_wr_en      <= 1'b0;           bram_rz_wr_en      <= 1'b0;
+	   bram_rx_wr_data_in <= {32{1'bX}};     bram_ry_wr_data_in <= {32{1'bX}};     bram_rz_wr_data_in <= {32{1'bX}};
+	   //
+	end
+
+	//
+	default: begin
+	   //
+	   bram_rx_rd_addr    <= {3{1'bX}};  bram_ry_rd_addr    <= {3{1'bX}};  bram_rz_rd_addr    <= {3{1'bX}};
+	   bram_rx_wr_addr    <= {3{1'bX}};  bram_ry_wr_addr    <= {3{1'bX}};  bram_rz_wr_addr    <= {3{1'bX}};
+	   bram_rx_wr_en      <= 1'b0;       bram_ry_wr_en      <= 1'b0;       bram_rz_wr_en      <= 1'b0;
+	   bram_rx_wr_data_in <= {32{1'bX}}; bram_ry_wr_data_in <= {32{1'bX}}; bram_rz_wr_data_in <= {32{1'bX}};
+	   //
+	end
+	//
+   endcase
+      //
+      // T(X,Y,Z)
+      //
+   case (fsm_state)
+	//
+	FSM_STATE_PRECOMPUTE_DOUBLE_WAIT,
+	FSM_STATE_MULTIPLY_DOUBLE_WAIT: begin
+	   //
+	   bram_tx_rd_addr    <= {3{1'bX}};      bram_ty_rd_addr    <= {3{1'bX}};      bram_tz_rd_addr    <= {3{1'bX}};
+	   bram_tx_wr_addr    <= worker_addr_rx; bram_ty_wr_addr    <= worker_addr_ry; bram_tz_wr_addr    <= worker_addr_rz;
+	   bram_tx_wr_en      <= worker_wren_rx; bram_ty_wr_en      <= worker_wren_ry; bram_tz_wr_en      <= worker_wren_rz;
+	   bram_tx_wr_data_in <= worker_dout_rx; bram_ty_wr_data_in <= worker_dout_ry; bram_tz_wr_data_in <= worker_dout_rz;
+	   //
+	end
+	//
+	FSM_STATE_MULTIPLY_ADD_WAIT: begin
+	   //
+	   bram_tx_rd_addr    <= worker_addr_px; bram_ty_rd_addr    <= worker_addr_py; bram_tz_rd_addr    <= worker_addr_pz;
+	   bram_tx_wr_addr    <= {3{1'bX}};      bram_ty_wr_addr    <= {3{1'bX}};      bram_tz_wr_addr    <= {3{1'bX}};
+	   bram_tx_wr_en      <= 1'b0;           bram_ty_wr_en      <= 1'b0;           bram_tz_wr_en      <= 1'b0;
+	   bram_tx_wr_data_in <= {32{1'bX}};     bram_ty_wr_data_in <= {32{1'bX}};     bram_tz_wr_data_in <= {32{1'bX}};
+	   //
+	end
+	//
+	FSM_STATE_PRECOMPUTE_COPY_WAIT,
+	FSM_STATE_MULTIPLY_COPY_WAIT: begin
+	   //
+	   bram_tx_rd_addr    <= mover_addr_x; bram_ty_rd_addr    <= mover_addr_x; bram_tz_rd_addr    <= mover_addr_x;
+	   bram_tx_wr_addr    <= {3{1'bX}};    bram_ty_wr_addr    <= {3{1'bX}};    bram_tz_wr_addr    <= {3{1'bX}};
+	   bram_tx_wr_en      <= 1'b0;         bram_ty_wr_en      <= 1'b0;         bram_tz_wr_en      <= 1'b0;
+	   bram_tx_wr_data_in <= {32{1'bX}};   bram_ty_wr_data_in <= {32{1'bX}};   bram_tz_wr_data_in <= {32{1'bX}};
+	   //
+	end
+
+	//
+	default: begin
+	   //
+	   bram_tx_rd_addr    <= {3{1'bX}};  bram_ty_rd_addr    <= {3{1'bX}};  bram_tz_rd_addr    <= {3{1'bX}};
+	   bram_tx_wr_addr    <= {3{1'bX}};  bram_ty_wr_addr    <= {3{1'bX}};  bram_tz_wr_addr    <= {3{1'bX}};
+	   bram_tx_wr_en      <= 1'b0;       bram_ty_wr_en      <= 1'b0;       bram_tz_wr_en      <= 1'b0;
+	   bram_tx_wr_data_in <= {32{1'bX}}; bram_ty_wr_data_in <= {32{1'bX}}; bram_tz_wr_data_in <= {32{1'bX}};
+	   //
+	end
+	//
+      endcase
+      //
+      // Worker
+      //
+      case (fsm_state)
+	//
+	FSM_STATE_PRECOMPUTE_DOUBLE_WAIT,
+	FSM_STATE_MULTIPLY_DOUBLE_WAIT: begin
+	   //
+	   worker_din_px <= bram_rx_rd_data;     worker_din_py <= bram_ry_rd_data;     worker_din_pz <= bram_rz_rd_data;
+	   worker_din_rx <= bram_tx_wr_data_out; worker_din_ry <= bram_ty_wr_data_out; worker_din_rz <= bram_tz_wr_data_out;
+	   //
+	end
+	//
+	FSM_STATE_MULTIPLY_ADD_WAIT: begin
+	   //
+	   worker_din_px <= bram_tx_rd_data;     worker_din_py <= bram_ty_rd_data;     worker_din_pz <= bram_tz_rd_data;
+	   worker_din_rx <= bram_rx_wr_data_out; worker_din_ry <= bram_ry_wr_data_out; worker_din_rz <= bram_rz_wr_data_out;
+	   //
+	end
+	//
+	FSM_STATE_PRECOMPUTE_CONVERT_WAIT,
+	FSM_STATE_MULTIPLY_CONVERT_WAIT: begin
+	   //
+	   worker_din_px <= bram_rx_rd_data; worker_din_py <= bram_ry_rd_data; worker_din_pz <= bram_rz_rd_data;
+	   worker_din_rx <= {32{1'bX}};      worker_din_ry <= {32{1'bX}};      worker_din_rz <= {32{1'bX}};
+	   //
+	end
+	//
+	default: begin
+	   //
+	   worker_din_px <= {32{1'bX}}; worker_din_py <= {32{1'bX}}; worker_din_pz <= {32{1'bX}};
+	   worker_din_rx <= {32{1'bX}}; worker_din_ry <= {32{1'bX}}; worker_din_rz <= {32{1'bX}};
+	   //
+	end
+	//
+      endcase
+      //
+   end
+
+
+	//
+	// Internal Mapping
+	//
+	assign bram_hx_wr_en = worker_wren_rx && (fsm_state == FSM_STATE_PRECOMPUTE_CONVERT_WAIT);
+	assign bram_hy_wr_en = worker_wren_ry && (fsm_state == FSM_STATE_PRECOMPUTE_CONVERT_WAIT);
+
+   assign bram_hx_wr_data_in = worker_dout_rx;
+   assign bram_hy_wr_data_in = worker_dout_ry;
+
+   assign bram_hx_wr_addr = worker_addr_rx;
+   assign bram_hy_wr_addr = worker_addr_ry;
+	
+	
+
+   //
+   // Output Mapping
+   //	
+   assign	rx_wren = worker_wren_rx && (fsm_state == FSM_STATE_MULTIPLY_CONVERT_WAIT);
+   assign	ry_wren = worker_wren_ry && (fsm_state == FSM_STATE_MULTIPLY_CONVERT_WAIT);
+
+   assign	rx_dout = worker_dout_rx;
+   assign	ry_dout = worker_dout_ry;
+
+   assign	rx_addr = worker_addr_rx;
+   assign	ry_addr = worker_addr_ry;
+
+
+   //
+   // Ready Flag Logic
+   //
+   reg rdy_reg = 1'b1;
+   assign rdy = rdy_reg;
+
+   always @(posedge clk or negedge rst_n)
+
+     if (rst_n == 1'b0) rdy_reg <= 1'b1;
+     else begin
+
+	/* clear flag */
+	if ((fsm_state == FSM_STATE_IDLE) && ena)
+	  rdy_reg <= 1'b0;
+
+	/* set flag */
+	if (fsm_state == FSM_STATE_DONE)
+	  rdy_reg <= 1'b1;
+
+     end
+
+
+endmodule
+
+
+//------------------------------------------------------------------------------
+// End-of-File
+//------------------------------------------------------------------------------
diff --git a/rtl/ecdhp256.v b/rtl/ecdhp256.v
new file mode 100644
index 0000000..ec2dbbe
--- /dev/null
+++ b/rtl/ecdhp256.v
@@ -0,0 +1,192 @@
+//======================================================================
+//
+// Copyright (c) 2018, NORDUnet A/S All rights reserved.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions
+// are met:
+// - Redistributions of source code must retain the above copyright
+//   notice, this list of conditions and the following disclaimer.
+//
+// - Redistributions in binary form must reproduce the above copyright
+//   notice, this list of conditions and the following disclaimer in the
+//   documentation and/or other materials provided with the distribution.
+//
+// - Neither the name of the NORDUnet nor the names of its contributors may
+//   be used to endorse or promote products derived from this software
+//   without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
+// IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
+// TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
+// PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+// HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
+// TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+//======================================================================
+
+`timescale 1ns / 1ps
+
+module ecdhp256
+  (
+   input wire 	      clk,
+   input wire 	      rst_n,
+
+   input wire 	      next,
+   output wire 	      valid,
+
+   input wire 	      bus_cs,
+   input wire 	      bus_we,
+   input wire [ 5:0]  bus_addr,
+   input wire [31:0]  bus_data_wr,
+   output wire [31:0] bus_data_rd
+   );
+
+
+   //
+   // Memory Banks
+   //
+   localparam [2:0] BUS_ADDR_BANK_K			= 3'd0;
+   localparam [2:0] BUS_ADDR_BANK_IN_X		= 3'd1;
+   localparam [2:0] BUS_ADDR_BANK_IN_Y		= 3'd2;
+   localparam [2:0] BUS_ADDR_BANK_OUT_X	= 3'd3;
+   localparam [2:0] BUS_ADDR_BANK_OUT_Y	= 3'd4;
+
+   wire [2:0] 	      bus_addr_upper = bus_addr[5:3];
+   wire [2:0] 	      bus_addr_lower = bus_addr[2:0];
+
+
+   //
+   // Memories
+   //
+
+   wire [31:0] 	      user_rw_k_bram_out;
+   wire [31:0] 	      user_rw_in_x_bram_out;
+   wire [31:0] 	      user_rw_in_y_bram_out;
+   wire [31:0] 	      user_ro_out_x_bram_out;
+   wire [31:0] 	      user_ro_out_y_bram_out;
+
+   wire [ 2:0] 	      core_ro_k_bram_addr;
+   wire [ 2:0] 	      core_ro_in_x_bram_addr;
+   wire [ 2:0] 	      core_ro_in_y_bram_addr;
+   wire [ 2:0] 	      core_rw_out_x_bram_addr;
+   wire [ 2:0] 	      core_rw_out_y_bram_addr;
+
+   wire 	      core_rw_out_x_bram_wren;
+   wire 	      core_rw_out_y_bram_wren;
+
+   wire [31:0] 	      core_ro_k_bram_dout;
+   wire [31:0] 	      core_ro_in_x_bram_dout;
+   wire [31:0] 	      core_ro_in_y_bram_dout;
+   wire [31:0] 	      core_rw_out_x_bram_din;
+   wire [31:0] 	      core_rw_out_y_bram_din;
+
+
+   bram_1rw_1ro_readfirst #
+     (	.MEM_WIDTH(32), .MEM_ADDR_BITS(3)
+	)
+   bram_k
+     (	.clk(clk),
+	.a_addr(bus_addr_lower), .a_out(user_rw_k_bram_out), .a_wr(bus_cs && bus_we && (bus_addr_upper == BUS_ADDR_BANK_K)), .a_in(bus_data_wr),
+	.b_addr(core_ro_k_bram_addr), .b_out(core_ro_k_bram_dout)
+	);
+
+   bram_1rw_1ro_readfirst #
+     (	.MEM_WIDTH(32), .MEM_ADDR_BITS(3)
+	)
+   bram_in_x
+     (	.clk(clk),
+	.a_addr(bus_addr_lower), .a_out(user_rw_in_x_bram_out), .a_wr(bus_cs && bus_we && (bus_addr_upper == BUS_ADDR_BANK_IN_X)), .a_in(bus_data_wr),
+	.b_addr(core_ro_in_x_bram_addr), .b_out(core_ro_in_x_bram_dout)
+	);
+
+   bram_1rw_1ro_readfirst #
+     (	.MEM_WIDTH(32), .MEM_ADDR_BITS(3)
+	)
+   bram_in_y
+     (	.clk(clk),
+	.a_addr(bus_addr_lower), .a_out(user_rw_in_y_bram_out), .a_wr(bus_cs && bus_we && (bus_addr_upper == BUS_ADDR_BANK_IN_Y)), .a_in(bus_data_wr),
+	.b_addr(core_ro_in_y_bram_addr), .b_out(core_ro_in_y_bram_dout)
+	);
+
+   bram_1rw_1ro_readfirst #
+     (	.MEM_WIDTH(32), .MEM_ADDR_BITS(3)
+	)
+   bram_out_x
+     (	.clk(clk),
+	.a_addr(core_rw_out_x_bram_addr), .a_out(), .a_wr(core_rw_out_x_bram_wren), .a_in(core_rw_out_x_bram_din),
+	.b_addr(bus_addr_lower),      .b_out(user_ro_out_x_bram_out)
+	);
+
+   bram_1rw_1ro_readfirst #
+     (	.MEM_WIDTH(32), .MEM_ADDR_BITS(3)
+	)
+   bram_out_y
+     (	.clk(clk),
+	.a_addr(core_rw_out_y_bram_addr), .a_out(), .a_wr(core_rw_out_y_bram_wren), .a_in(core_rw_out_y_bram_din),
+	.b_addr(bus_addr_lower),      .b_out(user_ro_out_y_bram_out)
+	);
+
+
+   //
+   // Curve Point Multiplier
+   //
+   reg		next_dly;
+
+   always @(posedge clk) next_dly <= next;
+
+   wire		next_trig = next && !next_dly;
+
+   point_mul_256 point_multiplier_p256
+     (
+      .clk	(clk),
+      .rst_n	(rst_n),
+
+      .ena	(next_trig),
+      .rdy	(valid),
+
+      .k_addr	(core_ro_k_bram_addr),
+      .qx_addr	(core_ro_in_x_bram_addr),
+      .qy_addr	(core_ro_in_y_bram_addr),
+      .rx_addr	(core_rw_out_x_bram_addr),
+      .ry_addr	(core_rw_out_y_bram_addr),
+
+      .rx_wren	(core_rw_out_x_bram_wren),
+      .ry_wren	(core_rw_out_y_bram_wren),
+
+      .k_din	(core_ro_k_bram_dout),
+      .qx_din	(core_ro_in_x_bram_dout),
+      .qy_din	(core_ro_in_y_bram_dout),
+      .rx_dout	(core_rw_out_x_bram_din),
+      .ry_dout	(core_rw_out_y_bram_din)
+      );
+
+   //
+   // Output Selector
+   //
+   reg [2:0] 	      bus_addr_upper_prev;
+   always @(posedge clk) bus_addr_upper_prev <= bus_addr_upper;
+
+   reg [31: 0] 	      bus_data_rd_mux;
+   assign bus_data_rd = bus_data_rd_mux;
+
+   always @(*)
+     //
+     case (bus_addr_upper_prev)
+       //
+       BUS_ADDR_BANK_K: bus_data_rd_mux = user_rw_k_bram_out;
+       BUS_ADDR_BANK_IN_X: bus_data_rd_mux = user_rw_in_x_bram_out;
+       BUS_ADDR_BANK_IN_Y: bus_data_rd_mux = user_rw_in_y_bram_out;
+       BUS_ADDR_BANK_OUT_X: bus_data_rd_mux = user_ro_out_x_bram_out;
+       BUS_ADDR_BANK_OUT_Y: bus_data_rd_mux = user_ro_out_y_bram_out;
+       //
+       default:         bus_data_rd_mux = {32{1'b0}};
+       //
+     endcase
+
+endmodule
diff --git a/rtl/ecdhp256_wrapper.v b/rtl/ecdhp256_wrapper.v
new file mode 100644
index 0000000..add1325
--- /dev/null
+++ b/rtl/ecdhp256_wrapper.v
@@ -0,0 +1,177 @@
+//======================================================================
+//
+// Copyright (c) 2018, NORDUnet A/S All rights reserved.
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions
+// are met:
+// - Redistributions of source code must retain the above copyright
+//   notice, this list of conditions and the following disclaimer.
+//
+// - Redistributions in binary form must reproduce the above copyright
+//   notice, this list of conditions and the following disclaimer in the
+//   documentation and/or other materials provided with the distribution.
+//
+// - Neither the name of the NORDUnet nor the names of its contributors may
+//   be used to endorse or promote products derived from this software
+//   without specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
+// IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
+// TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
+// PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+// HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
+// TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+// PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+// LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+// NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+// SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+//
+//======================================================================
+
+module ecdhp256_wrapper
+  (
+   input wire 	       clk,
+   input wire 	       reset_n,
+
+   input wire 	       cs,
+   input wire 	       we,
+
+   input wire [6: 0]   address,
+   input wire [31: 0]  write_data,
+   output wire [31: 0] read_data
+   );
+
+
+   //
+   // Address Decoder
+   //
+   localparam ADDR_MSB_REGS = 1'b0;
+   localparam ADDR_MSB_CORE = 1'b1;
+
+   wire [0:0] 	       addr_msb = address[6];
+   wire [5:0] 	       addr_lsb = address[5:0];
+
+
+   //
+   // Output Mux
+   //
+   wire [31: 0]        read_data_regs;
+   wire [31: 0]        read_data_core;
+
+
+   //
+   // Registers
+   //
+   localparam ADDR_NAME0        = 5'h00;
+   localparam ADDR_NAME1        = 5'h01;
+   localparam ADDR_VERSION      = 5'h02;
+
+   localparam ADDR_CONTROL      = 5'h08;               // {next, init}
+   localparam ADDR_STATUS       = 5'h09;               // {valid, ready}
+   localparam ADDR_DUMMY        = 5'h0F;               // don't care
+
+   // localparam CONTROL_INIT_BIT  = 0; -- not used
+   localparam CONTROL_NEXT_BIT  = 1;
+
+   // localparam STATUS_READY_BIT  = 0; -- hardcoded to always read 1
+   localparam STATUS_VALID_BIT  = 1;
+
+   localparam CORE_NAME0        = 32'h65_63_64_68; // "ecdh"
+   localparam CORE_NAME1        = 32'h70_32_35_36; // "p256"
+   localparam CORE_VERSION      = 32'h30_2E_31_30; // "0.10"
+
+
+   //
+   // Registers
+   //
+   reg 		       reg_control;
+   reg [31:0] 	       reg_dummy;
+
+
+   //
+   // Wires
+   //
+   wire 	       reg_status;
+
+
+   //
+   // ECDHP256
+   //
+   ecdhp256 ecdhp256_inst
+     (
+      .clk                      (clk),
+      .rst_n                    (reset_n),
+
+      .next                     (reg_control),
+      .valid                    (reg_status),
+
+      .bus_cs                   (cs && (addr_msb == ADDR_MSB_CORE)),
+      .bus_we                   (we),
+      .bus_addr                 (addr_lsb),
+      .bus_data_wr              (write_data),
+      .bus_data_rd              (read_data_core)
+      );
+
+
+   //
+   // Read Latch
+   //
+   reg [31: 0] 	       tmp_read_data;
+
+
+   //
+   // Read/Write Interface
+   //
+   always @(posedge clk)
+     //
+     if (!reset_n) begin
+        //
+        reg_control <= 1'b0;
+        //
+     end else if (cs && (addr_msb == ADDR_MSB_REGS)) begin
+        //
+        if (we) begin
+           //
+           // Write Handler
+           //
+           case (addr_lsb)
+             //
+             ADDR_CONTROL: reg_control <= write_data[CONTROL_NEXT_BIT];
+	     ADDR_DUMMY:   reg_dummy   <= write_data;
+             //
+           endcase
+           //
+        end else begin
+           //
+           // Read Handler
+           //
+           case (addr_lsb)
+             //
+             ADDR_NAME0:        tmp_read_data <= CORE_NAME0;
+             ADDR_NAME1:        tmp_read_data <= CORE_NAME1;
+             ADDR_VERSION:      tmp_read_data <= CORE_VERSION;
+             ADDR_CONTROL:      tmp_read_data <= {{30{1'b0}}, reg_control, 1'b0};
+             ADDR_STATUS:       tmp_read_data <= {{30{1'b0}}, reg_status,  1'b1};
+	     ADDR_DUMMY:        tmp_read_data <= reg_dummy;
+             //
+             default:           tmp_read_data <= 32'h00000000;
+             //
+           endcase
+           //
+        end
+        //
+     end
+
+
+   //
+   // Register / Core Memory Selector
+   //
+   reg addr_msb_last;
+   always @(posedge clk) addr_msb_last <= addr_msb;
+
+   assign read_data = (addr_msb_last == ADDR_MSB_REGS) ? tmp_read_data : read_data_core;
+
+
+endmodule
diff --git a/stm32_driver/ecdhp256_driver_sample.c b/stm32_driver/ecdhp256_driver_sample.c
new file mode 100644
index 0000000..acf9c0c
--- /dev/null
+++ b/stm32_driver/ecdhp256_driver_sample.c
@@ -0,0 +1,261 @@
+//
+// simple driver to test "ecdhp256" core in hardware
+//
+
+//
+// note that the test program needs a custom bitstream in which
+// the core is located at offset 0 (without the core selector)
+//
+
+// stm32 headers
+#include "stm-init.h"
+#include "stm-led.h"
+#include "stm-fmc.h"
+
+// locations of core registers
+#define CORE_ADDR_NAME0			(0x00 << 2)
+#define CORE_ADDR_NAME1			(0x01 << 2)
+#define CORE_ADDR_VERSION		(0x02 << 2)
+#define CORE_ADDR_CONTROL		(0x08 << 2)
+#define CORE_ADDR_STATUS		(0x09 << 2)
+
+// locations of data buffers
+#define CORE_ADDR_BUF_K			(0x40 << 2)
+#define CORE_ADDR_BUF_XIN		(0x48 << 2)
+#define CORE_ADDR_BUF_YIN		(0x50 << 2)
+#define CORE_ADDR_BUF_XOUT	(0x58 << 2)
+#define CORE_ADDR_BUF_YOUT	(0x60 << 2)
+
+// bit maps
+#define CORE_CONTROL_BIT_NEXT		0x00000002
+#define CORE_STATUS_BIT_READY		0x00000002	// bit 1 of STATUS, which the wrapper calls 'valid'
+
+// curve selection
+#define USE_CURVE			1
+
+#include "../../../user/shatov/ecdh_fpga_model/ecdh_fpga_model.h"
+#include "../../../user/shatov/ecdh_fpga_model/test_vectors/ecdh_test_vectors.h"
+
+#define BUF_NUM_WORDS		(OPERAND_WIDTH / (sizeof(uint32_t) << 3))	// 8
+
+//
+// test vectors
+//
+static const uint32_t p256_da[BUF_NUM_WORDS] = P_256_DA;
+static const uint32_t p256_db[BUF_NUM_WORDS] = P_256_DB;
+
+static const uint32_t p256_gx[BUF_NUM_WORDS] = P_256_G_X;
+static const uint32_t p256_gy[BUF_NUM_WORDS] = P_256_G_Y;
+
+static const uint32_t p256_qax[BUF_NUM_WORDS] = P_256_QA_X;
+static const uint32_t p256_qay[BUF_NUM_WORDS] = P_256_QA_Y;
+
+static const uint32_t p256_qbx[BUF_NUM_WORDS] = P_256_QB_X;
+static const uint32_t p256_qby[BUF_NUM_WORDS] = P_256_QB_Y;
+
+static const uint32_t p256_qa2x[BUF_NUM_WORDS] = P_256_QA2_X;
+static const uint32_t p256_qa2y[BUF_NUM_WORDS] = P_256_QA2_Y;
+
+static const uint32_t p256_qb2x[BUF_NUM_WORDS] = P_256_QB2_X;
+static const uint32_t p256_qb2y[BUF_NUM_WORDS] = P_256_QB2_Y;
+
+static const uint32_t p256_sx[BUF_NUM_WORDS] = P_256_S_X;
+static const uint32_t p256_sy[BUF_NUM_WORDS] = P_256_S_Y;
+
+static const uint32_t p256_0[BUF_NUM_WORDS] = P_256_ZERO;
+static const uint32_t p256_1[BUF_NUM_WORDS] = P_256_ONE;
+
+static const uint32_t p256_hx[BUF_NUM_WORDS] = P_256_H_X;
+static const uint32_t p256_hy[BUF_NUM_WORDS] = P_256_H_Y;
+
+static const uint32_t p256_n[BUF_NUM_WORDS] = P_256_N;
+
+static uint32_t p256_2[BUF_NUM_WORDS];	// 2
+static uint32_t p256_n1[BUF_NUM_WORDS];	// n + 1
+static uint32_t p256_n2[BUF_NUM_WORDS];	// n + 2
+
+//
+// prototypes
+//
+void toggle_yellow_led(void);
+int test_p256_multiplier(const uint32_t *k,
+	const uint32_t *xin, const uint32_t *yin,
+	const uint32_t *xout, const uint32_t *yout);
+
+//
+// test routine
+//
+int main()
+{
+  int ok;
+
+  stm_init();
+  fmc_init();
+
+  led_on(LED_GREEN);
+  led_off(LED_RED);
+
+  led_off(LED_YELLOW);
+  led_off(LED_BLUE);
+
+  uint32_t core_name0;
+  uint32_t core_name1;
+
+  fmc_read_32(CORE_ADDR_NAME0, &core_name0);
+  fmc_read_32(CORE_ADDR_NAME1, &core_name1);
+
+  // "ecdh", "p256"
+  if ((core_name0 != 0x65636468) || (core_name1 != 0x70323536)) {
+    led_off(LED_GREEN);
+    led_on(LED_RED);
+    while (1);
+  }
+
+	// prepare more numbers
+	size_t w;
+	for (w=0; w<BUF_NUM_WORDS; w++)
+	{	p256_2[w]  = p256_0[w];	// p256_2 = p256_0 = 0
+		p256_n1[w] = p256_n[w];	// p256_n1 = p256_n = N
+		p256_n2[w] = p256_n[w];	// p256_n2 = p256_n = N
+	}
+	
+		// note that we can safely cheat and compute n+1 and n+2 by
+		// just adding 1 and 2 to the least significant word of n: that
+		// word is 0xfc632551, so the addition cannot overflow and we
+		// don't need to take care of carry propagation
+	p256_2[BUF_NUM_WORDS-1]  += 2;	// p256_2 = 2
+	p256_n1[BUF_NUM_WORDS-1] += 1;	// p256_n1 = N + 1
+	p256_n2[BUF_NUM_WORDS-1] += 2;	// p256_n2 = N + 2
+	
+	
+  // repeat forever
+  while (1)
+    {
+      ok = 1;
+			
+			/* 1. QA = dA * G */
+			/* 2. QB = dB * G */
+      ok = ok && test_p256_multiplier(p256_da, p256_gx, p256_gy, p256_qax, p256_qay);
+			ok = ok && test_p256_multiplier(p256_db, p256_gx, p256_gy, p256_qbx, p256_qby);
+
+			/* 3. S = dA * QB */
+			/* 4. S = dB * QA */
+      ok = ok && test_p256_multiplier(p256_da, p256_qbx, p256_qby, p256_sx, p256_sy);
+			ok = ok && test_p256_multiplier(p256_db, p256_qax, p256_qay, p256_sx, p256_sy);
+      
+			/* 5. O = 0 * QA */
+			/* 6. O = 0 * QB */
+      ok = ok && test_p256_multiplier(p256_0, p256_qax, p256_qay, p256_0, p256_0);
+			ok = ok && test_p256_multiplier(p256_0, p256_qbx, p256_qby, p256_0, p256_0);
+
+			/* 7. QA = 1 * QA */
+			/* 8. QB = 1 * QB */
+      ok = ok && test_p256_multiplier(p256_1, p256_qax, p256_qay, p256_qax, p256_qay);
+			ok = ok && test_p256_multiplier(p256_1, p256_qbx, p256_qby, p256_qbx, p256_qby);
+
+			/* 9. O = n * G */
+      ok = ok && test_p256_multiplier(p256_n,  p256_gx, p256_gy, p256_0, p256_0);
+
+			/* 10. G = (n + 1) * G */
+      ok = ok && test_p256_multiplier(p256_n1,  p256_gx, p256_gy, p256_gx, p256_gy);
+
+			/* 11. H = 2       * G */
+			/* 12. H = (n + 2) * G */
+      ok = ok && test_p256_multiplier(p256_2,  p256_gx, p256_gy, p256_hx, p256_hy);
+			ok = ok && test_p256_multiplier(p256_n2, p256_gx, p256_gy, p256_hx, p256_hy);
+
+			/* 13. QA2 = 2       * QA */
+			/* 14. QA2 = (n + 2) * QA */
+      ok = ok && test_p256_multiplier(p256_2,  p256_qax, p256_qay, p256_qa2x, p256_qa2y);
+			ok = ok && test_p256_multiplier(p256_n2, p256_qax, p256_qay, p256_qa2x, p256_qa2y);
+
+			/* 15. QB2 = 2       * QB */
+			/* 16. QB2 = (n + 2) * QB */
+      ok = ok && test_p256_multiplier(p256_2,  p256_qbx, p256_qby, p256_qb2x, p256_qb2y);
+			ok = ok && test_p256_multiplier(p256_n2, p256_qbx, p256_qby, p256_qb2x, p256_qb2y);
+
+				// check
+      if (!ok) {
+				led_off(LED_GREEN);
+				led_on(LED_RED);
+			}
+
+      toggle_yellow_led();
+    }
+}
+
+
+//
+// this routine uses the hardware multiplier to obtain R(rx, ry), the scalar
+// multiple k * P of the point P(xin, yin); rx and ry are then compared to the
+// values xout and yout (the correct result, known in advance)
+//
+int test_p256_multiplier(const uint32_t *k,
+	const uint32_t *xin, const uint32_t *yin,
+	const uint32_t *xout, const uint32_t *yout)
+{
+  int i, num_cyc;
+  uint32_t reg_control, reg_status;
+  uint32_t k_word, qx_word, qy_word;
+
+  // fill k
+  for (i=0; i<BUF_NUM_WORDS; i++) {
+    k_word = k[i];
+    fmc_write_32(CORE_ADDR_BUF_K + ((BUF_NUM_WORDS - (i + 1)) * sizeof(uint32_t)), &k_word);
+  }
+
+	// fill xin, yin
+	for (i=0; i<BUF_NUM_WORDS; i++) {
+    qx_word = xin[i];
+		qy_word = yin[i];
+    fmc_write_32(CORE_ADDR_BUF_XIN + ((BUF_NUM_WORDS - (i + 1)) * sizeof(uint32_t)), &qx_word);
+		fmc_write_32(CORE_ADDR_BUF_YIN + ((BUF_NUM_WORDS - (i + 1)) * sizeof(uint32_t)), &qy_word);
+		
+		fmc_read_32(CORE_ADDR_BUF_XIN + ((BUF_NUM_WORDS - (i + 1)) * sizeof(uint32_t)), &qx_word);
+		fmc_read_32(CORE_ADDR_BUF_YIN + ((BUF_NUM_WORDS - (i + 1)) * sizeof(uint32_t)), &qy_word);
+  }
+		
+  // clear 'next' control bit, then set 'next' control bit again to trigger new operation
+  reg_control = 0;
+  fmc_write_32(CORE_ADDR_CONTROL, &reg_control);
+  reg_control = CORE_CONTROL_BIT_NEXT;
+  fmc_write_32(CORE_ADDR_CONTROL, &reg_control);
+
+  // wait for 'ready' status bit to be set
+  num_cyc = 0;
+  do {
+    num_cyc++;
+    fmc_read_32(CORE_ADDR_STATUS, &reg_status);
+  }
+  while (!(reg_status & CORE_STATUS_BIT_READY));
+
+  // read back x and y word-by-word, then compare to the reference values
+  for (i=0; i<BUF_NUM_WORDS; i++) {
+    fmc_read_32(CORE_ADDR_BUF_XOUT + (i * sizeof(uint32_t)), &qx_word);
+    fmc_read_32(CORE_ADDR_BUF_YOUT + (i * sizeof(uint32_t)), &qy_word);
+
+    if ((qx_word != xout[BUF_NUM_WORDS - (i + 1)])) return 0;
+    if ((qy_word != yout[BUF_NUM_WORDS - (i + 1)])) return 0;
+  }
+
+  // everything went just fine
+  return 1;
+}
+
+//
+// toggle the yellow led to indicate that we're not stuck somewhere
+//
+void toggle_yellow_led(void)
+{
+  static int led_state = 0;
+
+  led_state = !led_state;
+
+  if (led_state) led_on(LED_YELLOW);
+  else           led_off(LED_YELLOW);
+}
+
+
+//
+// end of file
+//


