Moving key unwrap out from under the keystore lock produces less surprising behavior: a bitstream with eight ModExp cores and four AES cores now gets its best throughput with four parallel clients, which is sane, presumably because unwrap needs an AES core and there are only four of them. So that's one mystery solved. Timing and profiling tests are still running, so no new numbers yet.
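
For anyone who wants the shape of the change without reading the diff, here is a rough Python sketch of the pattern, not the actual code. Only the count of four AES cores comes from the note above. The `Keystore` class, its method names, the semaphore standing in for the AES cores, and the byte-reversal standing in for unwrap are all made up for illustration.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

AES_CORES = 4  # core count from the note above; everything else is hypothetical


class Keystore:
    """Toy keystore: a dict of wrapped blobs guarded by a single lock."""

    def __init__(self, wrapped_keys):
        self._lock = threading.Lock()
        self._wrapped = dict(wrapped_keys)
        # Semaphore stands in for the pool of AES cores.
        self._aes_cores = threading.BoundedSemaphore(AES_CORES)
        self._stats_lock = threading.Lock()
        self._active = 0
        self.peak_unwraps = 0  # most unwraps ever in flight at once

    def _unwrap(self, blob):
        # Placeholder for the slow AES unwrap; reversing bytes is NOT real crypto.
        with self._aes_cores:
            with self._stats_lock:
                self._active += 1
                self.peak_unwraps = max(self.peak_unwraps, self._active)
            time.sleep(0.01)
            with self._stats_lock:
                self._active -= 1
            return blob[::-1]

    def fetch_locked(self, name):
        # Old shape: unwrap runs while the keystore lock is held,
        # so clients are serialized no matter how many cores exist.
        with self._lock:
            return self._unwrap(self._wrapped[name])

    def fetch_unlocked(self, name):
        # New shape: hold the lock only long enough to grab the blob,
        # then unwrap outside it so clients can use the cores in parallel.
        with self._lock:
            blob = self._wrapped[name]
        return self._unwrap(blob)


def run_clients(fetch, names, clients):
    """Fetch every named key using `clients` parallel worker threads."""
    with ThreadPoolExecutor(max_workers=clients) as pool:
        return list(pool.map(fetch, names))
```

In this toy model, `fetch_locked` never has more than one unwrap in flight, while `fetch_unlocked` is capped by the number of AES cores instead of by the lock, which is the behavior the note describes.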