Cryptographic hash functions play a critical role in computer security by providing a one-way transformation of sensitive data. Many information-security applications rely on hash functions, in particular digital signatures, message authentication codes, and other forms of authentication. The calculation of hash functions such as SHA512, SHA256 and MD5 is a natural playground for Custom Compute. This is where the ISA flexibility of RISC-V, strengthened by the Zk extension, together with the ability to merge inherently sequential bit manipulations into custom instructions, helps to improve performance.
SHA512 hash function
SHA512 belongs to the ‘SHA-2’ family designed by the United States National Security Agency. Compliance of SHA-2 implementations with FIPS standards is validated through the Cryptographic Module Validation Program (CMVP), jointly run by the National Institute of Standards and Technology and the Communications Security Establishment.
The high-level diagram of the SHA512 algorithm is given below. Since the algorithm operates on 1024-bit blocks of data, the input message is first formatted and padded to a multiple of 1024 bits. Each 1024-bit block is then processed sequentially by a chain of 80 “rounds”. Each “round” relies heavily on additions and cyclic bit shifts (rotations) applied to the input data block, an initialization vector and a set of “round constants”.
The output of each sequence of 80 “rounds” is a 512-bit hash that serves as an initialization vector for the next input data block, or as a final result if the last data block has been processed.
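The padding, the 80 rounds and the chaining of the 512-bit state from block to block can be sketched in a few lines of Python. This is a minimal illustrative model, not the benchmark code used in this study: the helper names (`pad`, `compress`, `sha512`) are chosen here for illustration, and the round constants and initial state are derived from the fractional parts of the cube and square roots of the first primes, as FIPS 180-4 defines them, rather than being listed explicitly.

```python
import hashlib
from math import isqrt

MASK = (1 << 64) - 1  # all arithmetic is modulo 2**64

def icbrt(n):
    """Floor of the cube root of a non-negative integer (binary search)."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

def first_primes(count):
    primes, n = [], 2
    while len(primes) < count:
        if all(n % p for p in primes):
            primes.append(n)
        n += 1
    return primes

PRIMES = first_primes(80)
# Round constants: first 64 fractional bits of the cube roots of the first 80 primes.
K = [icbrt(p << 192) & MASK for p in PRIMES]
# Initial state: first 64 fractional bits of the square roots of the first 8 primes.
H0 = [isqrt(p << 128) & MASK for p in PRIMES[:8]]

def rotr(x, n):
    return ((x >> n) | (x << (64 - n))) & MASK

def pad(message):
    """Append 0x80, zero bytes and the 128-bit message length (in bits)."""
    zeros = (-(len(message) + 1 + 16)) % 128
    return message + b"\x80" + b"\x00" * zeros + (len(message) * 8).to_bytes(16, "big")

def compress(state, block):
    """Process one 1024-bit block with 80 rounds; returns the new 512-bit state."""
    w = [int.from_bytes(block[i:i + 8], "big") for i in range(0, 128, 8)]
    for t in range(16, 80):  # message schedule
        s0 = rotr(w[t - 15], 1) ^ rotr(w[t - 15], 8) ^ (w[t - 15] >> 7)
        s1 = rotr(w[t - 2], 19) ^ rotr(w[t - 2], 61) ^ (w[t - 2] >> 6)
        w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = state
    for t in range(80):  # each round depends on the previous one
        sum1 = rotr(e, 14) ^ rotr(e, 18) ^ rotr(e, 41)
        ch = (e & f) ^ (~e & g)
        t1 = (h + sum1 + ch + K[t] + w[t]) & MASK
        sum0 = rotr(a, 28) ^ rotr(a, 34) ^ rotr(a, 39)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (sum0 + maj) & MASK
        a, b, c, d, e, f, g, h = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
    return [(s + v) & MASK for s, v in zip(state, (a, b, c, d, e, f, g, h))]

def sha512(message):
    state = list(H0)
    padded = pad(message)
    for off in range(0, len(padded), 128):
        # The state produced by one block is the starting vector of the next.
        state = compress(state, padded[off:off + 128])
    return b"".join(v.to_bytes(8, "big") for v in state)

# Cross-check against the standard library implementation.
assert sha512(b"abc") == hashlib.sha512(b"abc").digest()
```

The round loop makes the data dependency visible: every iteration consumes the eight working variables written by the previous one, so the rounds cannot be run side by side.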
SHA512 remains a workhorse for many security protocols, including TLS and SSL, PGP, SSH, S/MIME, and IPsec. Algorithmically, however, it is inherently sequential: each stage needs the result of the previous one, which prevents parallel computation.
RISC-V & Zk for scalar cryptography
RISC-V’s instruction set architecture (ISA) is designed to be modular, allowing for the addition of various application-specific extensions. The scalar cryptography extension (February 2022) defines the ‘Zk’ family of instruction subsets targeting the AES, SM3, SM4, SHA256 and SHA512 algorithms.
Moreover, the RISC-V standard allows application-specific custom instructions to be added on top of the baseline ISA and any ratified optional extensions, to meet specific requirements on core performance, memory footprint or power consumption. In this study we show the benefits of high-level synthesis and a retargetable LLVM compiler, both enabled by Codasip Studio tools, for optimizing embedded core performance when running SHA512.
Codasip Studio, CodAL, Zk and how it helps
Specifically for the SHA512 hashing algorithm, the RISC-V Zk extension contains a subset of six instructions, listed below. The whole subset has been implemented and included in the ISA of the Codasip L31 core. On top of that, two custom instructions, “sha512_ch” and “sha512_maj”, have been considered. Each performs a specific bitwise manipulation of three input operands that occurs frequently in SHA512 “rounds”.
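For reference, the two three-operand bit manipulations can be written out as follows. This sketch assumes that the custom instructions implement the standard Ch ("choose") and Maj ("majority") functions of FIPS 180-4, which is what their names suggest; the Python function names simply mirror the instruction names.

```python
MASK64 = (1 << 64) - 1

def sha512_ch(x, y, z):
    """Choose: each bit of x selects the corresponding bit of y (1) or z (0)."""
    return ((x & y) ^ (~x & z)) & MASK64

def sha512_maj(x, y, z):
    """Majority: each result bit is the value held by at least two inputs."""
    return (x & y) ^ (x & z) ^ (y & z)

# With x all ones Ch returns y, with x all zeros it returns z.
assert sha512_ch(MASK64, 0x1234, 0xABCD) == 0x1234
assert sha512_ch(0, 0x1234, 0xABCD) == 0xABCD
```

Each function costs several basic logic instructions per call on a baseline ISA, and both are evaluated once per round, which is why they are attractive candidates for single custom instructions.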
The Codasip L31 core, the Zk subset and the custom instructions have been described in CodAL, a high-level language for processor description. Unlike in conventional HDLs, each instruction in CodAL is described once, in a compact form, and the same instruction semantics are used by both the LLVM compiler and the Codasip high-level-synthesis tools. Once the CodAL description of the custom instructions is complete, Codasip Studio generates the synthesizable RTL and an SDK containing a compiler, a debugger, instruction-accurate and cycle-accurate simulators, and profilers that are already aware of the new instructions. Below is an example implementation of the custom “sha512_ch” instruction in the form of a CodAL element:
The instruction body contains three sections: assembly, binary and semantics. The first two define the instruction syntax and map the operands, opcodes and immediates to the instruction binary. The semantics section contains a sequence of C-like statements that describe the instruction behavior. In this example, three operands are read from the register file, the result is calculated according to a specific bit manipulation pattern, and it is then written to a destination register.
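The read, compute and write-back sequence of the semantics section can be mimicked with a small Python model. This is not CodAL and not the actual instruction definition: the function names, the register numbers and the assumption of 32-bit registers (an RV32 core) are illustrative only, and the bit pattern assumes the standard SHA-2 Ch function.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit general-purpose registers (RV32)
MASK64 = (1 << 64) - 1

def exec_sha512_ch(regs, rd, rs1, rs2, rs3):
    """Model of the semantics: read three operands, compute, write the result."""
    x, y, z = regs[rs1], regs[rs2], regs[rs3]
    result = ((x & y) ^ (~x & z)) & MASK32
    if rd != 0:  # register x0 is hard-wired to zero in RISC-V
        regs[rd] = result

def ch64_on_rv32(x, y, z):
    """A 64-bit Ch built from two 32-bit invocations (low and high halves)."""
    regs = [0] * 32
    regs[10], regs[11] = x & MASK32, x >> 32
    regs[12], regs[13] = y & MASK32, y >> 32
    regs[14], regs[15] = z & MASK32, z >> 32
    exec_sha512_ch(regs, 5, 10, 12, 14)  # low halves
    exec_sha512_ch(regs, 6, 11, 13, 15)  # high halves
    return (regs[6] << 32) | regs[5]
```

Because the operation is purely bitwise, the two halves of a 64-bit SHA512 word can be processed independently, so the same 32-bit instruction serves both.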
Because their semantic patterns are simple, both custom instructions are recognized by the compiler and used automatically at high optimization levels (-O2 and above):
The instructions from the ratified Zk extension have more complex semantics, so they are called directly from software through the generated intrinsics.
The two custom instructions and the Zk subset for SHA512 together required just 150 lines of CodAL code. Such a compact description, alongside the automatically generated SDK, significantly shortens the design turnaround and potentially the time-to-market of the end product.
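To give a feel for why these semantics are harder for a compiler to match, the sketch below shows how the 64-bit Sum0 function of SHA512 can be computed on a 32-bit core from two calls to one two-operand operation, in the style of the ratified `sha512sum0r` instruction. The Python names are illustrative and are not the names of the generated intrinsics; the operand order is an assumption of this sketch and should be checked against the RISC-V scalar cryptography specification.

```python
MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1

def rotr64(x, n):
    return ((x >> n) | (x << (64 - n))) & MASK64

def sum0_64(x):
    """Reference 64-bit Sum0 from FIPS 180-4: ROTR28 ^ ROTR34 ^ ROTR39."""
    return rotr64(x, 28) ^ rotr64(x, 34) ^ rotr64(x, 39)

def sum0_half(a, b):
    """One 32-bit half of Sum0 computed from the two 32-bit halves of the input.

    Called as (low, high) it yields the low half of the result,
    called as (high, low) it yields the high half.
    """
    return ((a << 25) ^ (a << 30) ^ (a >> 28) ^
            (b >> 7) ^ (b >> 2) ^ (b << 4)) & MASK32

def sum0_on_rv32(x):
    lo, hi = x & MASK32, x >> 32
    return (sum0_half(hi, lo) << 32) | sum0_half(lo, hi)

assert sum0_on_rv32(0x0123456789ABCDEF) == sum0_64(0x0123456789ABCDEF)
```

Six shifts and five XORs across two registers collapse into one instruction per half, a pattern that is much less likely to be recognized automatically from plain C source than the three-operand Ch and Maj expressions.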
PPA effect: boost SHA512 performance
The SHA512 performance gain brought by the RISC-V Zk subset and the custom instructions has been evaluated on a SHA512 benchmark that is included in the RISC-V Crypto GitHub repository. The benchmark has been compiled with three different SDKs:
- “Reference” SDK with the standard (RV32IMCB) ISA
- The reference ISA plus the ratified Zk subset for SHA512
- The previous configuration plus the two custom instructions
The resulting executables have then been profiled with the generated cycle-accurate profilers to obtain detailed information on the clock cycles the processor spends in each software routine, as well as the PPA data.
The diagram above shows that the Zk instruction subset significantly improves SHA512 performance, reducing the number of clock cycles by a factor of ~1.89 and the code size by ~9.7% at the cost of just 0.8% of added silicon area. Custom instructions on top of Zk bring further acceleration, pushing the performance gain above 2x and the code size reduction to ~10.2%. The total area increase is ~1.6% of the initial L31 area, which can be considered a reasonable price for a two-fold performance improvement.
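The quoted figures can be put on a common scale with a little arithmetic. The sketch below converts a speedup factor into the fraction of clock cycles saved and relates the speedup to the added area. The helper names are illustrative, and 2.0 is used as a lower bound for the "above 2x" result, which is an assumption of this sketch rather than a measured value.

```python
def cycle_reduction(speedup):
    """Fraction of clock cycles saved for a given speedup factor."""
    return 1.0 - 1.0 / speedup

def gain_per_area(speedup, area_increase):
    """Speedup divided by relative area (1.0 = the initial core area)."""
    return speedup / (1.0 + area_increase)

# Zk subset only: ~1.89x fewer cycles for 0.8% more area.
print(f"Zk: {cycle_reduction(1.89):.1%} fewer cycles, "
      f"{gain_per_area(1.89, 0.008):.3f}x performance per unit area")

# Zk plus custom instructions: at least 2x for ~1.6% more area.
print(f"Zk + custom: at least {cycle_reduction(2.0):.1%} fewer cycles, "
      f"{gain_per_area(2.0, 0.016):.3f}x performance per unit area")
```

A factor of ~1.89 corresponds to roughly 47% fewer clock cycles, and in both configurations the area cost barely dents the performance-per-area ratio.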
When it comes to a specific application, a general-purpose CPU can never be an ideal solution. Sometimes even a seemingly simple instruction can prove to be a game-changer, and chip designers should take advantage of these benefits to end up with high-quality products. Custom Compute, facilitated by Codasip Studio, helps to leverage the flexibility of the RISC-V instruction set, reduces design time through a compact CodAL-based processor description, and equips end chip users with a rich set of automatically generated software kits.
There is a rich set of ratified RISC-V extensions, some of which you may not even find in off-the-shelf CPUs. Designers can nevertheless integrate them, and go even further, thanks to Codasip’s offering. This study demonstrates the implementation of eight additional instructions: six from the ratified Zk subset and two custom ones based on specific algorithm needs. This leads to a twofold performance improvement in SHA512 hash calculation as well as a code size reduction of more than 10%. Notably, SHA512 and cryptography are not the sole beneficiaries of hardware and software co-design.