[Cryptech-Commits] [core/pkey/ecdsa256] 02/04: Added README.md with core description, API details, etc Added previously forgotten generic replacements for vendor-specific primitives Minor clean up of comments Slightly reduced power consumption

git at cryptech.is git at cryptech.is
Wed Mar 8 03:14:50 UTC 2017
Previous message (by thread): [Cryptech-Commits] [core/pkey/ecdsa256] 01/04: Initial commit of base point multiplier core for ECDSA curve P-256.
Next message (by thread): [Cryptech-Commits] [core/pkey/ecdsa256] 03/04: Various clean-ups
This is an automated email from the git hooks/post-receive script.

sra at hactrn.net pushed a commit to branch master
in repository core/pkey/ecdsa256.

commit a66be3237f5e9f4b6144cec093b047acfd70ffc6
Author: Pavel V. Shatov (Meister) <meisterpaul1 at yandex.ru>
AuthorDate: Sun Dec 4 23:07:58 2016 +0300

    Added README.md with core description, API details, etc
    Added previously forgotten generic replacements for vendor-specific primitives
    Minor clean up of comments
    Slightly reduced power consumption
---
 README.md                                          |  83 +++++++++++++
 rtl/ecdsa256.v                                     |   2 +-
 rtl/ecdsa256_wrapper.v                             |   6 +-
 rtl/lowlevel/artix7/dsp48e1_wrapper.v              |   2 +-
 rtl/lowlevel/artix7/mac16_artix7.v                 |   2 +-
 .../mac16_artix7.v => generic/adder32_generic.v}   |  61 +++-------
 .../mac16_artix7.v => generic/adder47_generic.v}   |  54 +++------
 .../mac16_artix7.v => generic/mac16_generic.v}     |  48 +++-----
 .../subtractor32_generic.v}                        | 133 +++++++++------------
 9 files changed, 193 insertions(+), 198 deletions(-)

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..2ff17ae
--- /dev/null
+++ b/README.md
@@ -0,0 +1,83 @@
+# ecdsa256
+
+## Core Description
+
+This core implements the scalar base point multiplier for ECDSA curve P-256. It can be used to generate public keys and as part of the signing operation.
+
+## API Specification
+
+The core interface is similar to that of other Cryptech cores. The FMC memory map is as follows:
+
+`0x0000 | NAME0`
+`0x0004 | NAME1`
+`0x0008 | VERSION`
+
+`0x0020 | CONTROL`
+`0x0024 | STATUS`
+
+`0x0080 | K0`
+`0x0084 | K1`
+`...`
+`0x009C | K7`
+`0x00A0 | X0`
+`0x00A4 | X1`
+`...`
+`0x00BC | X7`
+`0x00C0 | Y0`
+`0x00C4 | Y1`
+`...`
+`0x00DC | Y7`
+
+The core has the following registers:
+
+ * **NAME0**, **NAME1**
+Read-only core name ("ecdsa256").
+
+ * **VERSION**
+Read-only core version, currently "0.11".
+
+ * **CONTROL**
+Control register bits:
+[31:2] Don't care, always read as 0
+[1] "next" control bit
+[0] Don't care, always read as 0
+The core starts a multiplication when the "next" control bit changes from 0 to 1. Because the start is edge-triggered, setting the bit makes the core perform exactly one multiplication and then stop. To start another operation, the bit must first be cleared and then set to 1 again.
+
+ * **STATUS**
+Read-only status register bits:
+[31:2] Don't care, always read as 0
+[1] "valid" status bit
+[0] "ready" status bit (always read as 1)
+The "valid" status bit is cleared as soon as the core starts an operation and is set once the multiplication is complete. Note that, unlike some other Cryptech cores, this core doesn't need any special initialization, so the "ready" status bit is simply hardwired to always read as 1. This keeps the general core interface consistent.
+
+ * **K0**-**K7**
+Buffer for the 256-bit multiplication factor (multiplier) K. The core will compute Q = K * G (the base point G is the multiplicand). K0 is the least significant 32-bit word of K, i.e. bits [31:0], while K7 is the most significant 32-bit word of K, i.e. bits [255:224].
+
+ * **X0**-**X7**, **Y0**-**Y7**
+Buffers for the 256-bit coordinates X and Y of the product Q = K * G. Values are returned in affine coordinates. X0 and Y0 contain the least significant 32-bit words, i.e. bits [31:0], while X7 and Y7 contain the most significant 32-bit words, i.e. bits [255:224].
+
+## Implementation Details
+
+The top-level core module contains block memory buffers for input and output operands and the base point multiplier, which reads from the input buffer and writes to the output buffers.
+
+The base point multiplier itself consists of the following:
+ * Buffers for storage of temporary values
+ * Configurable "worker" unit
+ * Microprograms for the worker unit
+ * Multi-word mover unit
+ * Modular inversion unit
+
+The "worker" unit can execute five basic operations:
+ * comparison
+ * copying
+ * modular addition
+ * modular subtraction
+ * modular multiplication
+ 
+The worker runs two primary microprograms: curve point doubling and addition of a curve point to the base point. These microprograms use Jacobian projective coordinates, so one more microprogram converts the product into affine coordinates with the help of the modular inversion unit.
+
+Note that the core is supplemented by a reference model written in C, which has extensive comments describing tricky corners of the underlying math.
+
+## Vendor-specific Primitives
+
+The Cryptech Alpha platform is based on the Xilinx Artix-7 200T FPGA, so this core takes advantage of Xilinx-specific DSP slices to carry out math-intensive operations. All vendor-specific math primitives are placed under /rtl/lowlevel/artix7. The core also offers generic replacements under /rtl/lowlevel/generic, which can be used for simulation with third-party tools that are not aware of Xilinx-specific primitives. Selection of vendor/generic primitives is done in ecdsa_lowlevel_settings.v, when port [...]
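Reviewer's note on the README's "Implementation Details": the flow it describes (double the accumulator, add the base point, then convert from Jacobian to affine coordinates with one modular inversion) can be modeled in software roughly as below. This is only a sketch under stated assumptions, not the core's microprograms and not the C reference model: the function names are hypothetical, the bit scan is plain left-to-right double-and-add, the inversion uses Fermat's little theorem instead of a dedicated unit, and nothing here is constant-time. The curve constants are the standard NIST P-256 domain parameters.

```python
# NIST P-256 domain parameters; the curve is y^2 = x^3 - 3x + b (mod P).
P = 2**256 - 2**224 + 2**192 + 2**96 - 1
N = 0xFFFFFFFF00000000FFFFFFFFFFFFFFFFBCE6FAADA7179E84F3B9CAC2FC632551
GX = 0x6B17D1F2E12C4247F8BCE6E563A440F277037D812DEB33A0F4A13945D898C296
GY = 0x4FE342E2FE1A7F9B8EE7EB4A7C0F9E162BCE33576B315ECECBB6406837BF51F5

INFINITY = (1, 1, 0)  # Jacobian point at infinity is marked by Z == 0


def double_jacobian(pt):
    """Double a Jacobian point (X, Y, Z)."""
    x, y, z = pt
    if z == 0 or y == 0:
        return INFINITY
    zz = z * z % P
    m = 3 * (x - zz) * (x + zz) % P      # 3*X^2 + a*Z^4 with a = -3
    s = 4 * x * y * y % P
    x3 = (m * m - 2 * s) % P
    y3 = (m * (s - x3) - 8 * pow(y, 4, P)) % P
    z3 = 2 * y * z % P
    return (x3, y3, z3)


def add_base_point(pt):
    """Add the affine base point G to a Jacobian point (mixed addition)."""
    x1, y1, z1 = pt
    if z1 == 0:
        return (GX, GY, 1)
    zz = z1 * z1 % P
    h = (GX * zz - x1) % P
    r = (GY * zz * z1 - y1) % P
    if h == 0:                            # same x: either pt == G or pt == -G
        return double_jacobian((GX, GY, 1)) if r == 0 else INFINITY
    hh = h * h % P
    hhh = hh * h % P
    v = x1 * hh % P
    x3 = (r * r - hhh - 2 * v) % P
    y3 = (r * (v - x3) - y1 * hhh) % P
    z3 = z1 * h % P
    return (x3, y3, z3)


def to_affine(pt):
    """Convert Jacobian (X, Y, Z) to affine (X/Z^2, Y/Z^3); None for infinity."""
    x, y, z = pt
    if z == 0:
        return None
    z_inv = pow(z, P - 2, P)              # software stand-in for the inversion unit
    z_inv2 = z_inv * z_inv % P
    return (x * z_inv2 % P, y * z_inv2 * z_inv % P)


def base_point_multiply(k):
    """Return the affine coordinates of Q = k * G for a 256-bit scalar k."""
    acc = INFINITY
    for bit in range(255, -1, -1):        # most significant bit first
        acc = double_jacobian(acc)
        if (k >> bit) & 1:
            acc = add_base_point(acc)
    return to_affine(acc)
```

A quick sanity check for a model like this is that `base_point_multiply(1)` returns `(GX, GY)` and that `base_point_multiply(N - 1)` returns the negated base point `(GX, P - GY)`. Returning `None` for the point at infinity is this sketch's own convention; the README does not say what the core outputs in that case.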
diff --git a/rtl/ecdsa256.v b/rtl/ecdsa256.v
index 86e22e5..1e712bf 100644
--- a/rtl/ecdsa256.v
+++ b/rtl/ecdsa256.v
@@ -108,7 +108,7 @@ module ecdsa256
 
 
    //
-   // Montgomery Coefficient Calculator
+   // Curve Base Point Multiplier
    //
 	reg  next_dly;
 	
diff --git a/rtl/ecdsa256_wrapper.v b/rtl/ecdsa256_wrapper.v
index 74f2cbe..c6e93ea 100644
--- a/rtl/ecdsa256_wrapper.v
+++ b/rtl/ecdsa256_wrapper.v
@@ -75,12 +75,12 @@ module ecdsa256_wrapper
 // localparam CONTROL_INIT_BIT  = 0; -- not used
    localparam CONTROL_NEXT_BIT  = 1;
 
-   localparam STATUS_READY_BIT  = 0;
-// localparam STATUS_VALID_BIT  = 1; -- hardcoded to always read 1
+// localparam STATUS_READY_BIT  = 0; -- hardcoded to always read 1
+   localparam STATUS_VALID_BIT  = 1;
 
    localparam CORE_NAME0        = 32'h65636473; // "ecds"
    localparam CORE_NAME1        = 32'h61323536; // "a256"
-   localparam CORE_VERSION      = 32'h302E3130; // "0.10"
+   localparam CORE_VERSION      = 32'h302E3131; // "0.11"
 
 
    //
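Reviewer's note: with the constants in this wrapper and the memory map from the README, the host-side sequence for one multiplication can be sketched as follows. Everything about the bus is an assumption: `bus` is any object with `read(addr)` and `write(addr, value)` methods for 32-bit words, and `FakeBus` is a hypothetical in-memory stand-in that exists only so the sketch can be run without hardware. The offsets, bit positions and name/version words are the ones shown in this commit.

```python
import struct

# Offsets from the README memory map; bit positions from ecdsa256_wrapper.v.
ADDR_NAME0, ADDR_NAME1, ADDR_VERSION = 0x0000, 0x0004, 0x0008
ADDR_CONTROL, ADDR_STATUS = 0x0020, 0x0024
ADDR_K, ADDR_X, ADDR_Y = 0x0080, 0x00A0, 0x00C0
CONTROL_NEXT = 1 << 1
STATUS_VALID = 1 << 1


def to_words(value):
    """Split a 256-bit integer into eight 32-bit words, least significant first."""
    return [(value >> (32 * i)) & 0xFFFFFFFF for i in range(8)]


def from_words(words):
    """Inverse of to_words()."""
    return sum(word << (32 * i) for i, word in enumerate(words))


def read_identity(bus):
    """Return (name, version) decoded from NAME0, NAME1 and VERSION."""
    raw = struct.pack(">III", bus.read(ADDR_NAME0), bus.read(ADDR_NAME1),
                      bus.read(ADDR_VERSION))
    return raw[:8].decode("ascii"), raw[8:].decode("ascii")


def multiply(bus, k, max_polls=1000000):
    """Load K, pulse "next", wait for "valid", then read back X and Y."""
    for i, word in enumerate(to_words(k)):
        bus.write(ADDR_K + 4 * i, word)
    bus.write(ADDR_CONTROL, 0)              # make sure "next" is low first
    bus.write(ADDR_CONTROL, CONTROL_NEXT)   # the 0 -> 1 edge starts the core
    for _ in range(max_polls):
        if bus.read(ADDR_STATUS) & STATUS_VALID:
            break
    else:
        raise TimeoutError("core did not report valid")
    x = from_words([bus.read(ADDR_X + 4 * i) for i in range(8)])
    y = from_words([bus.read(ADDR_Y + 4 * i) for i in range(8)])
    bus.write(ADDR_CONTROL, 0)              # leave "next" low for the next call
    return x, y


class FakeBus:
    """Hypothetical stand-in for the core; compute(k) replaces the multiplier."""

    def __init__(self, compute):
        self.compute = compute
        self.regs = {ADDR_NAME0: 0x65636473, ADDR_NAME1: 0x61323536,
                     ADDR_VERSION: 0x302E3131, ADDR_CONTROL: 0,
                     ADDR_STATUS: 0x1}      # "ready" always reads as 1

    def read(self, addr):
        return self.regs.get(addr, 0)

    def write(self, addr, value):
        if addr == ADDR_CONTROL:
            was_set = self.regs[ADDR_CONTROL] & CONTROL_NEXT
            self.regs[ADDR_CONTROL] = value & CONTROL_NEXT
            if (value & CONTROL_NEXT) and not was_set:
                self._run()
        elif ADDR_K <= addr < ADDR_K + 32:
            self.regs[addr] = value & 0xFFFFFFFF
        # all other registers are treated as read-only in this model

    def _run(self):
        k = from_words([self.regs.get(ADDR_K + 4 * i, 0) for i in range(8)])
        x, y = self.compute(k)
        for i, (xw, yw) in enumerate(zip(to_words(x), to_words(y))):
            self.regs[ADDR_X + 4 * i] = xw
            self.regs[ADDR_Y + 4 * i] = yw
        self.regs[ADDR_STATUS] = 0x3        # ready | valid
```

For example, `read_identity(FakeBus(lambda k: (0, 0)))` decodes the three identification words to `("ecdsa256", "0.11")`. The fake finishes instantly, so it does not model the interval during which "valid" reads as 0 on real hardware.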
diff --git a/rtl/lowlevel/artix7/dsp48e1_wrapper.v b/rtl/lowlevel/artix7/dsp48e1_wrapper.v
index 9f29ac1..11a21bc 100644
--- a/rtl/lowlevel/artix7/dsp48e1_wrapper.v
+++ b/rtl/lowlevel/artix7/dsp48e1_wrapper.v
@@ -63,7 +63,7 @@ module dsp48e1_wrapper
 		.AREG						(0),
 		.BREG						(0),
 		.CREG						(0),
-		.DREG						(1),
+		.DREG						(0),
 		.MREG						(0),
 		.PREG						(1),
 		.ADREG					(0),
diff --git a/rtl/lowlevel/artix7/mac16_artix7.v b/rtl/lowlevel/artix7/mac16_artix7.v
index 09a2413..63b74ab 100644
--- a/rtl/lowlevel/artix7/mac16_artix7.v
+++ b/rtl/lowlevel/artix7/mac16_artix7.v
@@ -2,7 +2,7 @@
 //
 // mac16_artix7.v
 // -----------------------------------------------------------------------------
-// Hardware (Artix-7 DSP48E1) 16-bit multiplier and 48-bit accumulator.
+// Hardware (Artix-7 DSP48E1) 16-bit multiplier and 47-bit accumulator.
 //
 // Authors: Pavel Shatov
 //
diff --git a/rtl/lowlevel/artix7/mac16_artix7.v b/rtl/lowlevel/generic/adder32_generic.v
similarity index 70%
copy from rtl/lowlevel/artix7/mac16_artix7.v
copy to rtl/lowlevel/generic/adder32_generic.v
index 09a2413..b9c94aa 100644
--- a/rtl/lowlevel/artix7/mac16_artix7.v
+++ b/rtl/lowlevel/generic/adder32_generic.v
@@ -1,8 +1,8 @@
 //------------------------------------------------------------------------------
 //
-// mac16_artix7.v
+// adder32_generic.v
 // -----------------------------------------------------------------------------
-// Hardware (Artix-7 DSP48E1) 16-bit multiplier and 48-bit accumulator.
+// Generic 32-bit adder.
 //
 // Authors: Pavel Shatov
 //
@@ -36,55 +36,32 @@
 //
 //------------------------------------------------------------------------------
 
-module mac16_artix7
+module adder32_generic
 	(
 		input					clk,		// clock
-		input					clr,		// clear accumulator (active-high)
-		input					ce,		// enable clock (active-high)
-		input		[15: 0]	a,			// operand input
-		input		[15: 0]	b,			// operand input
-		output	[46: 0]	s			// sum output
+		input		[31: 0]	a,			// operand input
+		input		[31: 0]	b,			// operand input
+		output	[31: 0]	s,			// sum output
+		input					c_in,		// carry input
+		output				c_out		// carry output
 	);
+	
+		//
+		// Sum
+		//
+	reg	[32: 0]	s_int;
 	
-			
-		//
-		// DSP48E1 Slice
-		//
-		
-		/* Operation Mode */
-	wire	[ 3: 0]	dsp48e1_alumode	= 4'b0000;
-	wire	[ 6: 0]	dsp48e1_opmode		= {2'b01, clr, 4'b0101};
-		
-		/* Internal Product */
-	wire	[47: 0]	p_int;
-
-	dsp48e1_wrapper dsp_adder
-	(
-		.clk			(clk),
-		
-		.ce			(ce),
-		
-		.carry		(1'b0),
+	always @(posedge clk)
+		s_int <= {1'b0, a} + {1'b0, b} + {{32{1'b0}}, c_in};
 		
-		.alumode		(dsp48e1_alumode),
-		.opmode		(dsp48e1_opmode),
-		
-		.a				({{14{1'b0}}, a}),
-		.b				({{ 2{1'b0}}, b}),
-		.c				({48{1'b0}}),
-		
-		.p				(p_int)
-	);
-
 		//
-		// Output Mapping
+		// Output
 		//
-	assign s = p_int[46:0];
-	
-
+	assign s = s_int[31:0];
+	assign c_out = s_int[32];
+		
 endmodule
 
-
 //------------------------------------------------------------------------------
 // End-of-File
 //------------------------------------------------------------------------------
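Reviewer's note: a behavioral model of `adder32_generic` is handy when checking simulation results by hand. The sketch below mirrors the 33-bit `s_int` sum and shows one plausible use, rippling the carry across the eight 32-bit words of a 256-bit operand. How the core actually sequences its adders is not visible in this commit, so `add_multiword` is an illustration only, and the model ignores the one-cycle register delay of the Verilog module.

```python
MASK32 = 0xFFFFFFFF


def adder32(a, b, c_in):
    """Model of adder32_generic: return (s, c_out) for 32-bit a, b and 1-bit c_in."""
    total = (a & MASK32) + (b & MASK32) + (c_in & 1)   # 33-bit sum, like s_int
    return total & MASK32, total >> 32


def add_multiword(a_words, b_words):
    """Add two word lists (least significant word first), chaining the carry."""
    carry = 0
    out = []
    for a, b in zip(a_words, b_words):
        s, carry = adder32(a, b, carry)
        out.append(s)
    return out, carry
```

With all-ones inputs and a carry in, `adder32(0xFFFFFFFF, 0xFFFFFFFF, 1)` yields `(0xFFFFFFFF, 1)`, which is the largest value the 33-bit intermediate can hold.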
diff --git a/rtl/lowlevel/artix7/mac16_artix7.v b/rtl/lowlevel/generic/adder47_generic.v
similarity index 71%
copy from rtl/lowlevel/artix7/mac16_artix7.v
copy to rtl/lowlevel/generic/adder47_generic.v
index 09a2413..f472061 100644
--- a/rtl/lowlevel/artix7/mac16_artix7.v
+++ b/rtl/lowlevel/generic/adder47_generic.v
@@ -1,8 +1,8 @@
 //------------------------------------------------------------------------------
 //
-// mac16_artix7.v
+// adder47_generic.v
 // -----------------------------------------------------------------------------
-// Hardware (Artix-7 DSP48E1) 16-bit multiplier and 48-bit accumulator.
+// Generic 47-bit adder.
 //
 // Authors: Pavel Shatov
 //
@@ -36,55 +36,29 @@
 //
 //------------------------------------------------------------------------------
 
-module mac16_artix7
+module adder47_generic
 	(
 		input					clk,		// clock
-		input					clr,		// clear accumulator (active-high)
-		input					ce,		// enable clock (active-high)
-		input		[15: 0]	a,			// operand input
-		input		[15: 0]	b,			// operand input
+		input		[46: 0]	a,			// operand input
+		input		[46: 0]	b,			// operand input
 		output	[46: 0]	s			// sum output
 	);
 	
-			
-		//
-		// DSP48E1 Slice
-		//
-		
-		/* Operation Mode */
-	wire	[ 3: 0]	dsp48e1_alumode	= 4'b0000;
-	wire	[ 6: 0]	dsp48e1_opmode		= {2'b01, clr, 4'b0101};
-		
-		/* Internal Product */
-	wire	[47: 0]	p_int;
-
-	dsp48e1_wrapper dsp_adder
-	(
-		.clk			(clk),
-		
-		.ce			(ce),
-		
-		.carry		(1'b0),
-		
-		.alumode		(dsp48e1_alumode),
-		.opmode		(dsp48e1_opmode),
-		
-		.a				({{14{1'b0}}, a}),
-		.b				({{ 2{1'b0}}, b}),
-		.c				({48{1'b0}}),
+		//
+		// Sum
+		//
+	reg	[46: 0]	s_int;
+	
+	always @(posedge clk)
+		s_int <= a + b;
 		
-		.p				(p_int)
-	);
-
 		//
-		// Output Mapping
+		// Output
 		//
-	assign s = p_int[46:0];
-	
+	assign s = s_int;
 
 endmodule
 
-
 //------------------------------------------------------------------------------
 // End-of-File
 //------------------------------------------------------------------------------
diff --git a/rtl/lowlevel/artix7/mac16_artix7.v b/rtl/lowlevel/generic/mac16_generic.v
similarity index 77%
copy from rtl/lowlevel/artix7/mac16_artix7.v
copy to rtl/lowlevel/generic/mac16_generic.v
index 09a2413..dc95645 100644
--- a/rtl/lowlevel/artix7/mac16_artix7.v
+++ b/rtl/lowlevel/generic/mac16_generic.v
@@ -1,8 +1,8 @@
 //------------------------------------------------------------------------------
 //
-// mac16_artix7.v
+// mac16_generic.v
 // -----------------------------------------------------------------------------
-// Hardware (Artix-7 DSP48E1) 16-bit multiplier and 48-bit accumulator.
+// Generic 16-bit multiplier and 47-bit accumulator.
 //
 // Authors: Pavel Shatov
 //
@@ -36,7 +36,7 @@
 //
 //------------------------------------------------------------------------------
 
-module mac16_artix7
+module mac16_generic
 	(
 		input					clk,		// clock
 		input					clr,		// clear accumulator (active-high)
@@ -46,41 +46,25 @@ module mac16_artix7
 		output	[46: 0]	s			// sum output
 	);
 	
-			
 		//
-		// DSP48E1 Slice
+		// Multiplier
 		//
+	wire	[31: 0]	p = {{16{1'b0}}, a} * {{16{1'b0}}, b};
+	wire	[46: 0]	p_ext = {{15{1'b0}}, p};
 		
-		/* Operation Mode */
-	wire	[ 3: 0]	dsp48e1_alumode	= 4'b0000;
-	wire	[ 6: 0]	dsp48e1_opmode		= {2'b01, clr, 4'b0101};
-		
-		/* Internal Product */
-	wire	[47: 0]	p_int;
-
-	dsp48e1_wrapper dsp_adder
-	(
-		.clk			(clk),
-		
-		.ce			(ce),
-		
-		.carry		(1'b0),
-		
-		.alumode		(dsp48e1_alumode),
-		.opmode		(dsp48e1_opmode),
-		
-		.a				({{14{1'b0}}, a}),
-		.b				({{ 2{1'b0}}, b}),
-		.c				({48{1'b0}}),
-		
-		.p				(p_int)
-	);
-
 		//
-		// Output Mapping
+		// Accumulator
 		//
-	assign s = p_int[46:0];
+	reg	[46: 0]	s_int;
 	
+	always @(posedge clk)
+		//
+		if (ce) s_int <= clr ? p_ext : p_ext + s_int;
+		
+		//
+		// Output
+		//
+	assign s = s_int;
 
 endmodule
 
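Reviewer's note: the generic MAC above is small enough to model cycle by cycle, which also makes it easy to see how much headroom a 47-bit accumulator leaves over a 32-bit product. The class and helper below are hypothetical names for this sketch. The Verilog register has no reset, so starting the model at 0 is an assumption; in the hardware the first product would be loaded with `clr` asserted.

```python
class Mac16:
    """Cycle-level model of mac16_generic: 16x16 multiply, 47-bit accumulate."""

    WIDTH = 47

    def __init__(self):
        self.s = 0   # assumption: the real register is undefined until clr is used

    def clock(self, a, b, clr=0, ce=1):
        """Advance one clock edge and return the new accumulator value."""
        if ce:
            p = (a & 0xFFFF) * (b & 0xFFFF)            # 32-bit product
            total = p if clr else p + self.s           # clr loads, otherwise adds
            self.s = total & ((1 << self.WIDTH) - 1)   # wraps like a 47-bit register
        return self.s


def max_products_without_wrap(width=47, operand_bits=16):
    """How many worst-case products fit in the accumulator before it can wrap."""
    worst = ((1 << operand_bits) - 1) ** 2
    return ((1 << width) - 1) // worst
```

Since each product is below 2^32, at least 2^15 of them can be summed in 47 bits without wrapping; `max_products_without_wrap()` computes the exact bound.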
diff --git a/rtl/lowlevel/artix7/mac16_artix7.v b/rtl/lowlevel/generic/subtractor32_generic.v
similarity index 69%
copy from rtl/lowlevel/artix7/mac16_artix7.v
copy to rtl/lowlevel/generic/subtractor32_generic.v
index 09a2413..46aefe8 100644
--- a/rtl/lowlevel/artix7/mac16_artix7.v
+++ b/rtl/lowlevel/generic/subtractor32_generic.v
@@ -1,90 +1,67 @@
-//------------------------------------------------------------------------------
-//
-// mac16_artix7.v
-// -----------------------------------------------------------------------------
-// Hardware (Artix-7 DSP48E1) 16-bit multiplier and 48-bit accumulator.
-//
-// Authors: Pavel Shatov
-//
-// Copyright (c) 2016, NORDUnet A/S
-//
-// Redistribution and use in source and binary forms, with or without
-// modification, are permitted provided that the following conditions are met:
-//
-// - Redistributions of source code must retain the above copyright notice,
-//   this list of conditions and the following disclaimer.
-//
-// - Redistributions in binary form must reproduce the above copyright notice,
-//   this list of conditions and the following disclaimer in the documentation
-//   and/or other materials provided with the distribution.
-//
-// - Neither the name of the NORDUnet nor the names of its contributors may be
-//   used to endorse or promote products derived from this software without
-//   specific prior written permission.
-//
-// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
-// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
-// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
-// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
-// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
-// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
-// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
-// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
-// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
-// POSSIBILITY OF SUCH DAMAGE.
-//
+//------------------------------------------------------------------------------
+//
+// subtractor32_generic.v
+// -----------------------------------------------------------------------------
+// Generic 32-bit subtractor.
+//
+// Authors: Pavel Shatov
+//
+// Copyright (c) 2016, NORDUnet A/S
+//
+// Redistribution and use in source and binary forms, with or without
+// modification, are permitted provided that the following conditions are met:
+//
+// - Redistributions of source code must retain the above copyright notice,
+//   this list of conditions and the following disclaimer.
+//
+// - Redistributions in binary form must reproduce the above copyright notice,
+//   this list of conditions and the following disclaimer in the documentation
+//   and/or other materials provided with the distribution.
+//
+// - Neither the name of the NORDUnet nor the names of its contributors may be
+//   used to endorse or promote products derived from this software without
+//   specific prior written permission.
+//
+// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+// AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+// IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+// ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+// LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+// CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+// SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+// INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+// CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+// ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+// POSSIBILITY OF SUCH DAMAGE.
+//
 //------------------------------------------------------------------------------
 
-module mac16_artix7
+module subtractor32_generic
 	(
-		input					clk,		// clock
-		input					clr,		// clear accumulator (active-high)
-		input					ce,		// enable clock (active-high)
-		input		[15: 0]	a,			// operand input
-		input		[15: 0]	b,			// operand input
-		output	[46: 0]	s			// sum output
+		input					clk,
+		input		[31: 0]	a,
+		input		[31: 0]	b,
+		output	[31: 0]	d,
+		input					b_in,
+		output				b_out	
 	);
-	
-			
-		//
-		// DSP48E1 Slice
-		//
-		
-		/* Operation Mode */
-	wire	[ 3: 0]	dsp48e1_alumode	= 4'b0000;
-	wire	[ 6: 0]	dsp48e1_opmode		= {2'b01, clr, 4'b0101};
-		
-		/* Internal Product */
-	wire	[47: 0]	p_int;
 
-	dsp48e1_wrapper dsp_adder
-	(
-		.clk			(clk),
-		
-		.ce			(ce),
-		
-		.carry		(1'b0),
-		
-		.alumode		(dsp48e1_alumode),
-		.opmode		(dsp48e1_opmode),
-		
-		.a				({{14{1'b0}}, a}),
-		.b				({{ 2{1'b0}}, b}),
-		.c				({48{1'b0}}),
+		//
+		// Difference
+		//
+	reg	[32: 0]	d_int;
+	
+	always @(posedge clk)
+		d_int <= {1'b0, a} - {1'b0, b} - {{32{1'b0}}, b_in};
 		
-		.p				(p_int)
-	);
-
 		//
-		// Output Mapping
+		// Output
 		//
-	assign s = p_int[46:0];
-	
+	assign d = d_int[31:0];
+	assign b_out = d_int[32];
 
 endmodule
 
-
-//------------------------------------------------------------------------------
-// End-of-File
+//------------------------------------------------------------------------------
+// End-of-File
 //------------------------------------------------------------------------------
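Reviewer's note: the subtractor's `b_out` is bit 32 of the 33-bit difference, so it is 1 exactly when the subtraction went negative. The sketch below models that and shows one common way a final borrow is used for modular subtraction: subtract word by word and add the modulus back if the result wrapped. This is an illustration under assumptions, not a description of the core's worker unit. The helper names are hypothetical, the correction step uses plain integers for brevity, and the one-cycle register delay is ignored.

```python
MASK32 = 0xFFFFFFFF


def subtractor32(a, b, b_in):
    """Model of subtractor32_generic: return (d, b_out) for a - b - b_in."""
    diff = (a & MASK32) - (b & MASK32) - (b_in & 1)
    return diff & MASK32, 1 if diff < 0 else 0         # b_out mirrors d_int[32]


def sub_multiword(a_words, b_words):
    """Subtract word lists (least significant word first), chaining the borrow."""
    borrow = 0
    out = []
    for a, b in zip(a_words, b_words):
        d, borrow = subtractor32(a, b, borrow)
        out.append(d)
    return out, borrow


def to_words(value, count=8):
    """Split an integer into `count` 32-bit words, least significant first."""
    return [(value >> (32 * i)) & MASK32 for i in range(count)]


def from_words(words):
    """Inverse of to_words()."""
    return sum(word << (32 * i) for i, word in enumerate(words))


def mod_sub(a, b, p):
    """(a - b) mod p for 0 <= a, b < p < 2**256, using the final borrow."""
    diff_words, borrow = sub_multiword(to_words(a), to_words(b))
    diff = from_words(diff_words)
    if borrow:                                         # a < b: wrapped modulo 2**256
        diff = (diff + p) & (2**256 - 1)
    return diff
```

For instance, `subtractor32(0, 0, 1)` gives `(0xFFFFFFFF, 1)`: the difference wraps and the borrow propagates to the next word.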