Another RISC-V Summit is behind us. It was a very well-attended event with many exciting talks and companies highlighting their products at the exhibition. One of the main themes was, once again, customization. Many people and companies, including Meta in their keynote, insisted on the importance of customization and how this key aspect of the RISC-V architecture enables innovation and differentiation. Because RISC-V allows customization, we can respond to the new demands placed on new architectures. The benefits of customization in the form of new instructions or microarchitecture tweaks are well known, and evidence can be found across the industry, from mobile to automotive and HPC.
As champions of Custom Compute, we at Codasip constantly look for new ways to improve PPAC (power, performance, area, and cost). Customization is our thing. Our combination of customizable RISC-V processor IP and design and customization automation tools has been deployed and used for almost a decade, with clear evidence of benefits.
There are cases, though, where you don’t want to freeze the choice of customization at tape-out time. For example, when the processor is part of a device with a long lifetime, it is hard to project all possible ISA extensions or other customizations. Or perhaps you would like to use the same silicon for multiple applications that need different acceleration features. How do we address such cases?
In this blog post, I share a solution, along with some preliminary results that show the potential of this new approach.
How to customize a processor after tape-out?
We all know that once we tape out a processor or any other device, we cannot change it anymore. So, how can we customize it after tape-out? The solution is to prepare the processor with a special functional unit built on eFPGA (embedded FPGA) technology. That’s the essence of the idea of customization in the field. eFPGA allows you to change the logic even after tape-out. You can take any RTL that represents new instructions, synthesize it, and upload the resulting bitstream to the eFPGA. For example, you can do that on a device that has been in the field for five years but requires fixes or improvements because a new standard has been deployed.
Together with our partner Menta, expert provider of eFPGA technology, we can do that. Instead of generating a “normal” CPU with the Codasip flow, starting from a description in CodAL high-level language (as seen on the left-hand side of the following figure), we insert a Menta eFPGA block in the CPU, as explained in the next section. Once the chip is produced, the programming of new instructions happens as described in the right part of the figure. Codasip Studio tools generate the RTL for these instructions, described in CodAL, and Menta Origami Programmer then generates a bitstream that is ready to program the chip. On top of that, Codasip Studio also generates a new SDK with full support of the new instructions.
Preparing the processor with eFPGA
The design of the processor with eFPGA is not complicated. We just need to prepare it in the right way.
- First, we need to enhance the decoder and allocate some opcode space for the new instructions that should be put into the eFPGA. Depending on the size of the eFPGA, the allocation might be smaller or bigger.
- The next step is to add the eFPGA as an execution unit with variable latency: it might produce the result immediately (for example, when the new instruction is purely combinational logic), or it might take multiple cycles (for example, when the new instruction is sequential logic).
- The inputs to the eFPGA unit are the opcode of the instruction to execute and its operands (register or immediate values). The output is the result (there can be one or more). There are also control signals that are used for pipeline control and hazard handling.
- Finally, we need to route some status and control pins of the eFPGA to the top level, so the new bitstream can be downloaded to the eFPGA.
The following figure shows the overall processor structure that we designed, in CodAL, as a proof of concept.
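To make the first step more concrete, here is a minimal Python sketch of how a decoder could recognize instructions that belong to the eFPGA. It is only an illustration, not the CodAL description we used. It assumes that the eFPGA instructions are placed in the RISC-V custom-0 major opcode (0b0001011) and use the standard R-type field layout; this post does not say which opcode slot the proof of concept actually uses, and the function names are hypothetical.

```python
# Illustrative sketch only: route instructions in a reserved opcode slot
# to the eFPGA execution unit.
# ASSUMPTION: the eFPGA instructions use the RISC-V "custom-0" major opcode
# and the standard R-type layout. All function names are hypothetical.

CUSTOM0_OPCODE = 0b0001011  # custom-0 slot set aside for custom extensions


def encode_rtype(opcode, rd, funct3, rs1, rs2, funct7):
    """Pack R-type fields into a 32-bit instruction word."""
    return (
        (funct7 & 0x7F) << 25
        | (rs2 & 0x1F) << 20
        | (rs1 & 0x1F) << 15
        | (funct3 & 0x7) << 12
        | (rd & 0x1F) << 7
        | (opcode & 0x7F)
    )


def decode(instr):
    """Split a 32-bit R-type instruction word into its fields."""
    return {
        "opcode": instr & 0x7F,
        "rd": (instr >> 7) & 0x1F,
        "funct3": (instr >> 12) & 0x7,
        "rs1": (instr >> 15) & 0x1F,
        "rs2": (instr >> 20) & 0x1F,
        "funct7": (instr >> 25) & 0x7F,
    }


def goes_to_efpga(instr):
    """True if the decoder should dispatch this instruction to the eFPGA."""
    return decode(instr)["opcode"] == CUSTOM0_OPCODE


if __name__ == "__main__":
    custom = encode_rtype(CUSTOM0_OPCODE, rd=5, funct3=1, rs1=6, rs2=7, funct7=0)
    add_x5_x6_x7 = 0x007302B3  # standard "add x5, x6, x7"
    print(goes_to_efpga(custom), goes_to_efpga(add_x5_x6_x7))
```

In this sketch, funct3 and funct7 would select which of the eFPGA instructions to run, so a larger eFPGA could simply be given more of those encodings.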
Zoom on the eFPGA
The following figure focuses on the eFPGA block itself. The inputs are on the left side: the opcode of the instruction, the operands s1 and s2, and a control signal that informs the eFPGA that the main pipeline is able to accept the results. The bus at the top represents a set of input/output pins that are used for bitstream uploading and similar tasks. The outputs are on the right side: the resulting value and a control signal that informs the pipeline that a multicycle instruction is still being executed. Note that this is only one of many possible implementations. Designers may come up with different solutions if needed, such as multiple outputs or even access to the load/store unit, and they just have to describe the interface in CodAL.
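The interface described above can be approximated with a small cycle-level model. The Python sketch below is a toy, not our RTL or CodAL code. The class name, the `busy` and `pipeline_ready` signal names, the two example operations, and their latencies are all assumptions made for illustration.

```python
# Toy cycle-level model of a variable-latency execution unit.
# ASSUMPTION: signal names, operations, and latencies are hypothetical.

MASK32 = 0xFFFFFFFF


class EfpgaUnitModel:
    def __init__(self, ops):
        # ops maps opcode -> (extra_cycles, function of (s1, s2))
        self.ops = ops
        self.remaining = 0   # cycles left before the result is valid
        self.result = None   # pending result, if any

    def issue(self, opcode, s1, s2):
        """Present opcode and operands s1, s2 to the unit."""
        extra_cycles, fn = self.ops[opcode]
        self.result = fn(s1, s2) & MASK32
        self.remaining = extra_cycles

    @property
    def busy(self):
        """Control output: a multicycle instruction is still executing."""
        return self.remaining > 0

    def tick(self, pipeline_ready):
        """Advance one clock cycle; return the result once it is accepted."""
        if self.remaining > 0:
            self.remaining -= 1
            return None
        if self.result is not None and pipeline_ready:
            out, self.result = self.result, None
            return out
        return None


def execute(unit, opcode, s1, s2):
    """Issue one instruction and count cycles until the result is accepted."""
    unit.issue(opcode, s1, s2)
    cycles = 0
    while True:
        cycles += 1
        out = unit.tick(pipeline_ready=True)
        if out is not None:
            return out, cycles


# Two hypothetical instructions: a combinational one and a 3-extra-cycle one.
EXAMPLE_OPS = {
    0: (0, lambda a, b: bin(a & b).count("1")),  # popcount of (s1 AND s2)
    1: (3, lambda a, b: a * b),                  # slow multiply
}

if __name__ == "__main__":
    unit = EfpgaUnitModel(EXAMPLE_OPS)
    print(execute(unit, 0, 0b1100, 0b1010))
    print(execute(unit, 1, 6, 7))
```

The result is handed over only when the unit is no longer busy and the pipeline signals that it is ready, which mirrors the two control signals in the figure.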
Together with Menta, we created a new RISC-V processor in CodAL. It is a 5-stage, in-order processor with eFPGA in it. As a proof of concept, we created a handful of new instructions to go into the eFPGA. The following code snippets show some aspects of the eFPGA module with two instructions.
What about the SDK?
Having hardware ready is one thing, but we still need a C/C++ compiler and the other SDK tools to program such a processor. Using Codasip Studio, we can describe new instructions, get the RTL for them (which we can then synthesize for the eFPGA), and get the SDK from the same description. In other words, the SDK is automatically generated based on the description of the new instructions. Software developers then enjoy the fact that the new instructions are automatically used by the new C/C++ compiler without needing to go into the world of intrinsics and inline assembly (unless they want to). The same applies to the debugger and other parts of the SDK. If you are interested in debuggers or compilers, I would encourage you to read my previous two blogs on the topic.
Conclusion
Custom Compute is the way to go, no doubt. Let’s make it available to everyone in all possible forms. With Codasip Studio, designers can enjoy full automation of the customization, at either the ISA or the microarchitecture level. When combined with eFPGA technology, such as that provided by Menta, you reach an additional level of customization: customization in the field. You can update or upgrade the processor ISA even after tape-out. That is especially important for devices with a long lifetime, when the required ISA extensions are not yet clear at the time the processor is designed. Of course, ISA extensions are not the only customization that can be done with eFPGA. eFPGA can hold any logic, so why not store important data or anything else you need?