Defining the software/hardware interface: A new paradigm enabled by Codasip Studio Fusion

by Keith Graham

Before there was a mainstream open-standard Instruction Set Architecture (ISA) like RISC-V, a computer processor’s software/hardware interface was generally defined by processor architects. Decisions about the instruction set, multi-issue, out-of-order execution, speculation, branch prediction, or multi-core were made to accelerate either general-purpose computing or a class of computing such as Digital Signal Processing (DSP). The optimal solutions required a deep understanding of the processor’s micro-architecture to maximize frequency and Instructions Per Cycle (IPC) and to minimize data, control, and structural hazards.

These decisions, combined with improvements in semiconductor processing, have given us the processor revolution of the last thirty to forty years. As Dennard scaling and Moore’s Law have broken down, continuing to realize the computational improvements needed to solve tomorrow’s challenges requires a new paradigm: a shift from the processor architect to the software engineer.

Defining the software/hardware interface to accelerate the processor’s hardware

As processors have become ubiquitous, their usage has become quite broad, ranging from the dedicated tasks of embedded controllers to the real-time processing of autonomous vehicles to Artificial Intelligence (AI) training. Each of these applications executes different algorithms requiring different data widths and computational operations. A general set of instructions intended to optimize all applications would add power and area to a small, dedicated controller and limit the performance of dedicated high-performance computing. The number of algorithms to serve is vast.

As software engineers develop their algorithms, defining the software/hardware interface themselves can significantly improve performance and reduce energy consumption. General Matrix Multiplication (GEMM) does the heavy lifting in many AI and scientific computing applications. A new instruction that accelerates 8-bit matrix multiplies for a class of AI classification can perform four multiplies and the accumulation in a single cycle, an eightfold acceleration compared to a general-purpose 32-bit processor. Completing work in a shorter period allows a processor to enter a low-energy mode sooner, reducing overall energy consumption.
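To make the idea of "four multiplies and an accumulate" concrete, here is a minimal functional sketch in Python. It is not Codasip code and says nothing about cycle counts: the names `pack4`, `mac4_int8`, `dot_scalar`, and `dot_packed` are hypothetical, and the lane order (lane 0 in the low byte) and 32-bit wrapping accumulator are assumptions made for illustration.

```python
def to_int8(byte):
    """Interpret an unsigned byte (0..255) as a signed 8-bit value."""
    return byte - 256 if byte >= 128 else byte


def pack4(values):
    """Pack up to four signed 8-bit values into one 32-bit word.

    Lane 0 goes in the low byte (an assumption); missing lanes stay zero.
    """
    word = 0
    for lane, value in enumerate(values):
        word |= (value & 0xFF) << (8 * lane)
    return word


def mac4_int8(acc, a_word, b_word):
    """Hypothetical instruction model: four 8-bit multiplies plus accumulate."""
    for lane in range(4):
        a = to_int8((a_word >> (8 * lane)) & 0xFF)
        b = to_int8((b_word >> (8 * lane)) & 0xFF)
        acc += a * b
    acc &= 0xFFFFFFFF  # wrap like a 32-bit register (assumed width)
    return acc - (1 << 32) if acc >= (1 << 31) else acc


def dot_scalar(a, b):
    """Baseline: one multiply-accumulate per element."""
    acc = 0
    for x, y in zip(a, b):
        acc += x * y
    return acc


def dot_packed(a, b):
    """Same dot product, issuing one modelled mac4 operation per four elements."""
    acc = 0
    for i in range(0, len(a), 4):
        acc = mac4_int8(acc, pack4(a[i:i + 4]), pack4(b[i:i + 4]))
    return acc
```

For an eight-element dot product, `dot_packed` issues two modelled operations where `dot_scalar` performs eight separate multiply-accumulates, and both return the same result.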

Defining the software/hardware interface to match the software flow

A solution is optimized when both the hardware and the software can be developed in the minimum amount of time while maximizing performance. As the software engineer develops the algorithm, organizing data to suit the hardware may not be optimal for the software developer.

By defining instructions that match the natural organization of the application data, the software engineer can develop code more quickly and with fewer errors. The new software/hardware interface benefits not only the software engineer but also the application.

With a new post-increment load instruction that, in a single cycle, can load unaligned data and pack four 8-bit values correctly into a Single Instruction Multiple Data (SIMD) data structure, the program matches the natural flow of the samples, reducing the software effort as well as improving performance. The ailw_row.psti and ailw_col.psti instructions optimize the GEMM data structure usage and computation, enabling a single 8-bit MatMul computation per clock cycle.
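The exact semantics of ailw_row.psti and ailw_col.psti are not spelled out here, so the following Python sketch only illustrates the general idea of a post-increment packing load. Everything in it is an assumption: the helper `load4_postinc` gathers four bytes from any byte address with a given stride, packs them into one 32-bit word (lane 0 in the low byte), and returns the advanced address; a stride of 1 stands in for a row load and a stride of one matrix row stands in for a column load of a row-major matrix.

```python
def load4_postinc(mem, addr, stride):
    """Hypothetical post-increment load: pack four bytes, then advance addr.

    mem is a bytes-like object; addr may be any (unaligned) byte address.
    Returns (packed 32-bit word, next address).
    """
    word = 0
    for lane in range(4):
        word |= mem[addr + lane * stride] << (8 * lane)
    return word, addr + 4 * stride


def dot4_int8(a_word, b_word):
    """Sum of the four signed 8-bit lane products of two packed words."""
    total = 0
    for lane in range(4):
        a = (a_word >> (8 * lane)) & 0xFF
        b = (b_word >> (8 * lane)) & 0xFF
        a = a - 256 if a >= 128 else a  # reinterpret as signed 8-bit
        b = b - 256 if b >= 128 else b
        total += a * b
    return total


def matmul_int8(mem, a_base, b_base, n):
    """C = A x B for n x n signed 8-bit row-major matrices stored in mem.

    Assumes n is a multiple of 4. The inner loop contains no index
    arithmetic: each load returns the pointer for the next iteration.
    """
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            a_ptr = a_base + i * n  # walks along row i of A
            b_ptr = b_base + j      # walks down column j of B
            acc = 0
            for _ in range(n // 4):
                a_word, a_ptr = load4_postinc(mem, a_ptr, 1)
                b_word, b_ptr = load4_postinc(mem, b_ptr, n)
                acc += dot4_int8(a_word, b_word)
            c[i][j] = acc
    return c
```

The point of the sketch is the shape of the inner loop: the software walks rows of A and columns of B in their natural order, and the pointer update and byte packing are folded into the load itself instead of being written out by the programmer.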

Enabling the software/hardware interface for software engineers

This paradigm shift towards software engineering requires tools that abstract the processor’s microarchitecture so that these new interfaces can be defined without an understanding of the processor’s underlying architecture. One example of this abstraction is Bounded Customization in Codasip Studio Fusion. New instructions can be added through a framework that looks very much like C or C++. These instructions are inserted directly into the processor’s architecture for optimal performance, area, and energy.

Adding instructions through this framework is the programmer-visible realization of the software/hardware interface. Equally or more importantly, the user of the processor must be confident that it will return the desired results. The Bounded Customization framework ensures that the base core’s operation is not affected by the new instructions: Codasip Studio prevents the software engineer from changing the base core design files. In combination with a pre-verified core, this means that reverification of the register file, the base execution units, the pipeline, etc. is not required. To verify the new instructions, Codasip Studio Fusion provides verification tools such as a Random Instruction Generator and an RTL-plus-reference-model comparison via a verification framework.
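The general principle behind random testing against a reference model can be sketched in a few lines of Python. This is not the Codasip tooling or its API: `reference_mac4`, `candidate_mac4`, and `run_random_tests` are hypothetical names, and the candidate is simply a second, independently written model standing in for what would in practice be the simulated RTL of the new instruction.

```python
import random
import struct


def wrap32(value):
    """Wrap an integer to the signed 32-bit range."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def reference_mac4(acc, a_word, b_word):
    """Reference model: written for clarity, one lane at a time."""
    for lane in range(4):
        a = (a_word >> (8 * lane)) & 0xFF
        b = (b_word >> (8 * lane)) & 0xFF
        a = a - 256 if a >= 128 else a
        b = b - 256 if b >= 128 else b
        acc += a * b
    return wrap32(acc)


def candidate_mac4(acc, a_word, b_word):
    """Stand-in for the implementation under test (hypothetical).

    Decodes the four signed lanes with struct instead of shifts and masks.
    """
    a_lanes = struct.unpack("<4b", struct.pack("<I", a_word))
    b_lanes = struct.unpack("<4b", struct.pack("<I", b_word))
    return wrap32(acc + sum(a * b for a, b in zip(a_lanes, b_lanes)))


def run_random_tests(count, seed=0):
    """Drive both models with random operands and collect any mismatches."""
    rng = random.Random(seed)  # fixed seed keeps failures reproducible
    mismatches = []
    for _ in range(count):
        acc = rng.randrange(-(1 << 31), 1 << 31)
        a_word = rng.getrandbits(32)
        b_word = rng.getrandbits(32)
        if reference_mac4(acc, a_word, b_word) != candidate_mac4(acc, a_word, b_word):
            mismatches.append((acc, a_word, b_word))
    return mismatches
```

An empty list from `run_random_tests` means the two models agreed on every generated stimulus; any tuple it returns is a reproducible failing input to debug. Random stimulus is usually complemented by directed corner cases, such as all lanes at -128.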

Conclusion

As we move into an era of specialized processing, the general-purpose processing of closed ISAs will be replaced by processors whose software/hardware interface is defined by a partnership between software and hardware engineering. The number of different applications far exceeds the number of traditional processor providers. A new generation of processor companies and designers will be solving tomorrow’s computational challenges using open ISAs like RISC-V, with software engineering playing a vital architectural role.
