[Cryptech Tech] [Cryptech-Commits] [user/sra/aes-keywrap] 01/01: Initial commit of AES Key Wrap implementation.

Rob Austein sra at hactrn.net
Mon May 4 13:37:27 UTC 2015


At Mon, 04 May 2015 14:21:28 +0200, Joachim Strömbergson wrote:
> 
> Lack of communication. But I did create a ticket for keywrap about two
> months ago.

Saw that, after the fact, my bad.  But I needed a change from PKCS #11
in any case.

> The number of 64-bit blocks part of it scares me since the code in RFC
> 3394 (and the code) seems to suggest that you need to allocate n blocks
> of data in order to process a message of n blocks. This would require
> the HW to have an upper limit of how big the message (key) we could
> wrap. 4096 bits etc.
> 
> The second, SW targeted version in RFC 3394 actually looks better from a
> HW point of view too.
> 
> But I need to look at RFC 5649 and your models.

I did not look closely at the register-based description once I
figured out that it was not the one I wanted for software.

The software version needs a chunk of memory big enough to hold the
ciphertext, which will always be a bit larger than the plaintext due
to the initialization vector and rounding the plaintext up to the next
block boundary.  One can move stuff around to encrypt in place, but
there's no simple way to avoid the buffer.
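
To put a number on "a bit larger", here is a minimal sketch of the
buffer size calculation, assuming the RFC 5649 padded variant.  The
function name wrapped_length is made up for illustration and is not
taken from the committed code.

```python
def wrapped_length(plaintext_len):
    """Bytes of ciphertext for an RFC 5649 wrap of plaintext_len bytes.

    Hypothetical helper for illustration only.
    """
    if plaintext_len < 1:
        raise ValueError("plaintext must be at least one byte")
    # Round the plaintext up to a whole number of 64-bit blocks.
    n = (plaintext_len + 7) // 8
    # One extra 64-bit block holds the initialization vector.
    return 8 * (n + 1)
```

So a 20-byte key needs a 32-byte buffer, and anything from 1 to 8
bytes needs 16.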

In a hardware version I guess one could unroll the j loop into a
series of six stages.  One could almost do the i loop in parallel,
except that (I think) one has to serialize the AES operations due to
the feedback cycle taking place in the A register.

In case it's not obvious: the specification uses more variable names
than are really necessary, for clarity.  I don't disagree with their
choice of tutorial method, but one has to squint a bit to see the
mapping between all these registers if one is trying to do it in
minimal memory.  If you think of A as P[0], it all comes out right: P,
R, and C are all really the same buffer, and Q is also the same
buffer, just offset by 8 bytes to leave room for A.
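
A rough sketch of that single-buffer view, following the index-based
loop in RFC 3394.  This is not the committed code: wrap_in_place is a
hypothetical name, and encrypt_block stands in for whatever does
AES-ECB encryption of one 16-byte block under the KEK.

```python
def wrap_in_place(buf, encrypt_block):
    """RFC 3394 style wrap over one buffer (illustrative sketch).

    buf is a bytearray of 8*(n+1) bytes: buf[0:8] is the A register,
    preloaded with the IV, and buf[8:] holds the n plaintext blocks.
    encrypt_block is an assumed callable mapping 16 bytes to 16 bytes.
    On return buf holds the ciphertext.
    """
    if len(buf) % 8 != 0 or len(buf) < 24:
        raise ValueError("need the IV plus at least two 64-bit blocks")
    n = len(buf) // 8 - 1
    for j in range(6):
        for i in range(1, n + 1):
            lo, hi = 8 * i, 8 * i + 8
            # Each call consumes the A left by the previous call,
            # so the block cipher operations run one after another.
            b = encrypt_block(bytes(buf[0:8]) + bytes(buf[lo:hi]))
            t = n * j + i
            a = int.from_bytes(b[:8], "big") ^ t
            buf[0:8] = a.to_bytes(8, "big")   # A = MSB(64, B) ^ t
            buf[lo:hi] = b[8:]                # R[i] = LSB(64, B)
```

The n = 1 case of RFC 5649 is a single ECB encryption and is
deliberately left out of this sketch.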

Do be careful of byte swapping: the code you cited looks like it will
only work on a big-endian architecture; another example I saw looked
like it would only work on little-endian.  By preference, I tend to
write byte-order-sensitive code using shift and mask, which might cost
a few extra instruction cycles but is order-independent (and
compiler-independent, which several of the usual byte twiddling hacks
are not).  The byte-order-sensitive bits in this algorithm are the m
and t values; see the _array version of the Python code or the C code.
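
For illustration, a shift-and-mask sketch of those two spots.  The
helper names are invented here and are not the ones in the models.
The 0xA65959A6 prefix and the big-endian 32-bit length are from
RFC 5649.  Python integers have no host byte order, so this mainly
shows the shape that carries over to C.

```python
def store_be32(buf, offset, value):
    """Write a 32-bit value at buf[offset:offset+4], high byte first."""
    for k in range(4):
        buf[offset + k] = (value >> (24 - 8 * k)) & 0xFF

def make_aiv(m):
    """RFC 5649 alternative IV: fixed prefix, then length m in bytes."""
    aiv = bytearray(8)
    store_be32(aiv, 0, 0xA65959A6)
    store_be32(aiv, 4, m)
    return aiv

def xor_t_into_a(buf, t):
    """XOR the counter t into buf[0:8], treated as a big-endian A."""
    for k in range(8):
        # Byte 7 takes the low 8 bits of t, byte 0 the high 8 bits.
        buf[7 - k] ^= (t >> (8 * k)) & 0xFF
```

Nothing here depends on how the host lays out a 32-bit or 64-bit
integer in memory, which is the point.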

> Let's start with these. Really, really good to have these test vectors
> and your models. If there isn't yet a way to make the models dump
> internal values during processing I would appreciate having it.
> 
> That would include what is returned from AES for a given block, the
> state of A, i, j, n, C, P, LSB, MSB operands.

Not currently present but should be easy to add.

