Processor design automation to drive innovation and foster differentiation
July 7, 2022
With semiconductor scaling no longer being an option in most situations, optimization means customizing the processor for your specific application. With the right approach and right tools, processor design automation can enable innovation and differentiation. One way of achieving this is to create an application-specific processor by owning the design. To do this efficiently, manual efforts should be reduced to the minimum. Let’s see, in this blog post, how processor design automation can drive innovation and foster differentiation.
The semiconductor industry is facing scaling limitations (if you haven’t yet, read our white paper on semiconductor scaling) for new applications that require efficient execution of data processing algorithms, such as vision, voice and vibration applications. In this context, the only way forward to differentiate is architectural innovation.
The ideal baseline for differentiating is the RISC-V ISA. Free, open, and modular, it allows custom extensions to create a unique processor tailored for specific needs and applications. Most of the time, there is no need to create an entirely new product from scratch. Customizing an existing commercial RISC-V processor is the most efficient way to design a new product with optimal features and PPA.
This approach, which is getting more and more attention, brings new opportunities for software and hardware developers, with complete design freedom. These new opportunities also come with efforts that have not been experienced before. Indeed, any modification to the processor architecture must be reflected in both hardware and software, and be verified. To minimize efforts, make the best use of resources and reduce time to market, processor design automation is key.
Customization requires the right tools and needs to be considered from the beginning. Codasip RISC-V cores are all designed in CodAL, with customization in mind, so they can be modified seamlessly. Based on C, CodAL is an architecture description language very close to standard programming languages, easy to adopt for processor design automation. This ownership gives you design freedom while keeping control of costs and resources.
Design freedom can start with architecture exploration. Codasip Studio with CodAL generates a Software Development Kit that includes all the tools software programmers will need. Profiling benchmarks, getting performance statistics, making some changes in the design, and seeing the results in just a few minutes: this is all possible with Codasip Studio. But that’s not all.
Studio and CodAL generate everything needed to be ready for production. Indeed, customization is not just about modifying the RTL. It also includes generating all tools required to design a quality core that can be monetized. Codasip customization solutions take care of this. Customers modify the core as needed in CodAL, the rest is automated.
This unique description language allows the automated generation of the hardware and software tools that are required. With a single, unified toolchain, our customers automatically get the RTL, simulators, testbenches, the verification environment, and a customized compiler that understands their custom hardware and how to take advantage of it. They create a unique product with tools that simplify processor design and verification for all developers.
Processor design automation with Codasip solutions is something we will talk about extensively at DAC 2022 in July, the Design Automation Conference held in San Francisco, California. If you would like to know more about it, visit us at our booth or book a meeting with us.
May 23, 2022
RISC-V’s open Instruction Set Architecture (ISA) has spurred the innovation of free software tools and application software. Many of these software developments are software “islands” that must be combined through scripts. With different tools from different sources, continual interoperability is at risk and there is a support cost of monitoring and updating their interoperability. The alternative is a single unified toolchain. The benefit of the unified approach is twofold. The first benefit is the obvious one: toolchain upgrades are validated before release, which ensures a functional toolchain and reduces or eliminates the need to monitor different tools and update the scripts that connect them. The second benefit is not as obvious but may be the most important: through a single unified toolchain, researchers benefit from enhanced tools.
The Codasip University Program makes available Codasip Studio, a single unified toolchain that spans the processor architecture’s ISA specification, development of the processor’s hardware architecture, and software development tools, through to outputting RTL such as Verilog for FPGA and ASIC synthesis. The University Program has been developed to spur innovation in research and student curriculum. Let’s see, in this blog post, how Codasip Studio will benefit researchers and engineering students.
Codasip Studio outputs a Hardware Development Kit (HDK) and a Software Development Kit (SDK). The HDK includes all the tools and tasks to specify a processor’s Instruction Accurate (IA) model and implement it as a Cycle Accurate (CA) model. These models are specified using CodAL, a C-like processor description language that enables a high-level description of both the IA and CA models. The SDK includes all the tools required for software application development, from the assembler and linker to the C compiler and C libraries, through to the software simulator.
Because these integrated development tools are aware of each other’s status, only the required tasks are built, as in a makefile software build. For example, if you modify a file that changes the IA model and is shared with the CA model, all the tasks for both the IA and CA model will be marked as not built. Upon requesting a higher-level task to be built, all lower dependent tasks that are not built will be built first. A good example is the SDK (IA) task. If this task is built, all dependent tasks such as Model Compilation, Assembler, Disassembler, Profiler, Simulator, Debugger, C/C++ compiler, and SDK libraries will be built if necessary. No script development or toolchain maintenance is required.
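The makefile-like behavior described above can be illustrated with a small Python sketch. This is not how Codasip Studio is implemented: the task names come from this post, but the dependency edges and the `build` and `invalidate` helpers are assumptions made for illustration only.

```python
from typing import Dict, List, Set

# Hypothetical task graph: each task lists the tasks it depends on.
# Task names follow the post; the edges are assumed for illustration.
DEPENDENCIES: Dict[str, List[str]] = {
    "Model Compilation": [],
    "Assembler": ["Model Compilation"],
    "Disassembler": ["Model Compilation"],
    "Simulator": ["Model Compilation"],
    "C/C++ compiler": ["Assembler"],
    "SDK (IA)": ["Assembler", "Disassembler", "Simulator", "C/C++ compiler"],
}


def build(task: str, built: Set[str], log: List[str]) -> None:
    """Build `task`, first building any dependency that is not yet built."""
    if task in built:
        return
    for dep in DEPENDENCIES[task]:
        build(dep, built, log)
    built.add(task)
    log.append(task)


def invalidate(task: str, built: Set[str]) -> None:
    """Mark `task` and every task that depends on it as not built."""
    built.discard(task)
    for other, deps in DEPENDENCIES.items():
        if task in deps:
            invalidate(other, built)


built: Set[str] = set()
first: List[str] = []
build("SDK (IA)", built, first)    # first request builds every task

invalidate("Assembler", built)     # pretend a source of the assembler changed
second: List[str] = []
build("SDK (IA)", built, second)   # only the affected tasks are rebuilt
print(second)
```

In this sketch, the second request rebuilds only the assembler and the tasks that sit above it, while the disassembler and simulator are left alone.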
How can Codasip Studio’s unified toolchain enhance research? Tightly coupling application integration with program acceleration into a processor’s core is a new research domain. To make engineering decisions, data is required. Codasip Studio’s integrated Profiler can analyze a software application to determine where clock cycles are spent, enabling researchers to focus where new instructions can result in program acceleration through cycle count reduction.
As in this example, the profiler annotates the C-program to highlight where clock cycles are spent, and the associated assembly sequence. Researchers can then collapse the original sequence into a single new instruction, with the objective of not lengthening the clock period.
Creating new instructions for acceleration remains in the hypothetical sphere until it can be incorporated into useful applications. The input to the Codasip Studio compiler is the processor’s IA model. Analyzing data from the profiler, researchers add a new instruction to the processor’s ISA, and upon rebuilding the SDK (IA), the compiler will be aware of the new instruction and use it in subsequent program builds.
From the program disassembly above, the newly added instruction has been incorporated into the compiler to replace the original two RISC-V instructions. Application acceleration has been achieved through cycle count reduction. The program cycle count can also be verified through running the updated program through the Profiler.
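The effect on cycle count can be sketched in a few lines of Python. The per-instruction cycle costs, the `mac` mnemonic, and the instruction trace below are assumptions for illustration; they are not taken from the Codasip Profiler. The sketch also ignores data dependencies, which a real compiler has to check before replacing a pair of instructions.

```python
from typing import Dict, List, Tuple

# Assumed cycle costs; "mac" stands for a hypothetical fused instruction.
CYCLES: Dict[str, int] = {"lw": 2, "mul": 1, "add": 1, "mac": 1}


def cycle_count(trace: List[str]) -> int:
    """Sum the assumed cycle cost of every instruction in the trace."""
    return sum(CYCLES[op] for op in trace)


def fuse(trace: List[str], pair: Tuple[str, str], fused: str) -> List[str]:
    """Replace each adjacent occurrence of `pair` with the single op `fused`."""
    out: List[str] = []
    i = 0
    while i < len(trace):
        if i + 1 < len(trace) and (trace[i], trace[i + 1]) == pair:
            out.append(fused)
            i += 2
        else:
            out.append(trace[i])
            i += 1
    return out


before = ["lw", "lw", "mul", "add"] * 100       # a made-up inner loop
after = fuse(before, ("mul", "add"), "mac")
speedup = cycle_count(before) / cycle_count(after)
print(cycle_count(before), cycle_count(after), speedup)
```

With these assumed costs the loop drops from 600 to 500 cycles, a 1.2× speedup, provided the new instruction does not lengthen the clock period.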
The entire process from initial application profiling, minimizing the instructions, to implementing a new instruction into the IA and CA models, and verification of program cycle count reduction can be achieved in an hour or two. The unified toolchain enables a very tight loop from data to concept to experimentation. With these short development cycles, researchers can easily experiment to find the optimal solution.
Get started with the University Program to explore Codasip Studio’s unified toolchain and how it can benefit your research in Program Security, Functional Safety, Artificial Intelligence, Real-Time Embedded Systems and other Domain Specific Architectures.
May 2, 2022
Because processor Instruction Set Architectures (ISAs) such as Arm and x86 are closed and access to processor Intellectual Property (IP) is limited, university professors have often confined their research to two main spheres: optimizing software algorithms and external hardware. Where these two spheres overlap, trade-offs are made to optimize the solution. Without access to processor IP, university researchers have not been able to consider optimizing the processor itself, so processor architecture optimization has conventionally been excluded from research. Coprocessors or external accelerators can be explored, but they are limited and costly in solving tomorrow’s technological challenges in Processor Security, Functional Safety, Intelligent Memories, and Artificial Intelligence.
We launched the Codasip University Program in March 2022 to support you, engineering professors and students, and advance technology that will solve tomorrow’s technological challenges. Because of the challenges to Moore’s Law and Dennard scaling, computer architects have developed solutions through integrating multiple homogeneous and heterogeneous cores. Tightly coupling application acceleration and application-specific requirements into the processor core is a new research domain to solve tomorrow’s computational needs. Let’s see, in this blog post, how you can get a jump start on this opportunity with our program.
Conventional research has been limited to software algorithms and external hardware resources due to fixed and closed processor architectures. Unfortunately, an important component of the research equation – the processor – has been left out.
Let’s start with an example. Sequential memory elements such as register files and pipeline registers are not commonly protected against single bit upsets that may occur via an alpha particle or a security attack. To protect against these upsets, external processor monitors or a 1 out of 2 voting strategy can be considered, but these greatly increase design and validation complexity and come at increased cost.
Optimizing the processor architecture itself is missing from the above solutions. Three elements are now available to you through the Codasip University Program to break through this barrier and to include tightly coupling the application’s requirements into the processor.
With access to RISC-V cores and Codasip Studio, you, university researchers and students, can now explore new processor architectures that integrate application-specific features and acceleration – and ultimately become tomorrow’s solutions and engineers.
With RISC-V IP and Codasip Studio, resources can now be brought into the processor for optimization and solution trade-offs can occur between all three spheres.
Continuing our single bit upset example, can we solve this fault by integrating a solution into the processor to protect its memory and register bits?
Processor architecture optimization involves two key concepts: tightly coupling application-specific functionality into the processor and enhancing the processor performance through cycle-count reduction.
Using Codasip Studio and RISC-V cores, you can add Hamming encoding to write to the register file and decoding upon reads. The register file is now protected through two-bit error detection and single-bit error correction (ECC) by developing functions in the processor’s Cycle Accurate (CA) model using CodAL.
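To make the idea concrete, here is a minimal Python sketch of an extended Hamming code (single-error correction, double-error detection) applied to a 32-bit register value. It is a behavioral illustration only, not CodAL and not Codasip’s implementation; the function names and the bit layout (overall parity at index 0, Hamming parity bits at power-of-two positions) are assumptions.

```python
from typing import List, Optional, Tuple


def _is_pow2(x: int) -> bool:
    return x & (x - 1) == 0  # valid for x >= 1


def encode(data: int, width: int = 32) -> List[int]:
    """Return a SEC-DED codeword as a list of bits (index 0 = overall parity)."""
    code = [0]  # index 0 is reserved for the overall parity bit
    bit, pos = 0, 1
    while bit < width:
        if _is_pow2(pos):
            code.append(0)                  # placeholder for a parity bit
        else:
            code.append((data >> bit) & 1)  # next data bit
            bit += 1
        pos += 1
    p = 1
    while p < len(code):
        # Parity bit p covers every position whose index has bit p set.
        parity = 0
        for i in range(1, len(code)):
            if i & p:
                parity ^= code[i]
        code[p] = parity
        p <<= 1
    code[0] = sum(code) % 2  # make the parity of the whole codeword even
    return code


def decode(code: List[int]) -> Tuple[Optional[int], str]:
    """Return (data, status); status is 'ok', 'corrected' or 'double-error'."""
    code = list(code)
    syndrome = 0
    for i in range(1, len(code)):
        if code[i]:
            syndrome ^= i
    overall = sum(code) % 2
    if overall == 1:
        if syndrome >= len(code):
            return None, "uncorrectable"
        code[syndrome] ^= 1  # single-bit upset: the syndrome is its position
        status = "corrected"
    elif syndrome != 0:
        return None, "double-error"  # two bits flipped: detect, do not correct
    else:
        status = "ok"
    data, bit = 0, 0
    for i in range(1, len(code)):
        if not _is_pow2(i):
            data |= code[i] << bit
            bit += 1
    return data, status
```

In this sketch, a register write would store `encode(value)` and a register read would call `decode`, so a 32-bit register grows to 39 stored bits. A `double-error` status is the point at which hardware would raise an exception instead of returning data.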
CodAL is an architectural high-level description language that describes the processor’s ISA (Instruction Accurate (IA) model) and the hardware implementation (Cycle Accurate (CA) models). Pipeline registers can be protected with parity to provide real-time bit error detection. When a fault is detected, the parity checker can assert a processor exception for handling. ECC and parity can be extended to either the L1 or L2 caches.
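Parity protection for a pipeline register can be sketched in the same behavioral style. The `ParityError` class below stands in for the processor exception mentioned above, and `latch` and `read` are hypothetical helper names; values are assumed to be unsigned.

```python
from typing import Tuple


class ParityError(Exception):
    """Stands in for the processor exception raised on a detected fault."""


def even_parity(value: int) -> int:
    """Return 1 if `value` has an odd number of set bits, else 0."""
    return bin(value).count("1") & 1


def latch(value: int) -> Tuple[int, int]:
    """Store a value in a pipeline register together with its parity bit."""
    return value, even_parity(value)


def read(stored: Tuple[int, int]) -> int:
    """Return the stored value, raising if a single-bit upset is detected."""
    value, parity = stored
    if even_parity(value) != parity:
        raise ParityError("pipeline register parity mismatch")
    return value
```

Unlike the Hamming scheme, a single parity bit can only detect an odd number of flipped bits and cannot correct them, which is why the fault is handed to an exception handler.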
For a set of applications, would integrating the processor into solving single bit upsets reduce design complexity, development and validation time, as well as solution cost? Applications can be accelerated by reducing program clock cycles assuming the clock frequency remains constant. Using Codasip Studio’s profiler, you can analyze the most common sequence of operations to replace two or more RISC-V instructions with a single new instruction. Using CodAL to update both the IA and CA models, this new instruction becomes available to the application developer through Codasip Studio’s assembler and C-compiler.
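Finding a candidate sequence to fuse can be approximated by counting adjacent instruction pairs in an execution trace. The sketch below is only an illustration of that idea in Python; the trace is made up, and `most_common_pairs` is a hypothetical helper, not part of Codasip Studio’s profiler.

```python
from collections import Counter
from typing import List, Tuple


def most_common_pairs(trace: List[str], top: int = 3) -> List[Tuple[Tuple[str, str], int]]:
    """Count adjacent instruction pairs and return the `top` most frequent."""
    pairs = Counter(zip(trace, trace[1:]))
    return pairs.most_common(top)


# Made-up trace: a hot multiply/add loop followed by a few loads and stores.
trace = ["mul", "add"] * 40 + ["lw", "sw"] * 5
print(most_common_pairs(trace, top=2))
```

Here the pair (`mul`, `add`) dominates, which would make it the first candidate for a single new instruction. A real analysis would also check that the result of the first instruction actually feeds the second.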
Empowering processor architecture optimization enables you to imagine new avenues of research that were not feasible before. Here are just three possibilities…
Get started with the Codasip University Program to explore new processor architecture optimizations through integrating application-specific functionality and acceleration.
March 16, 2022
Keith Graham has been appointed to lead the new Codasip University Program. From helping tomorrow’s processor experts to developing the technologies that will solve tomorrow’s technical challenges, and accelerating innovation, we asked Keith what it is all about. Keith explains how the University Program will help today’s engineering students become the next generation of processor engineers our industry needs.
Becoming the Head of Codasip’s University Program is my dream job. The technological challenges of tomorrow are yet to be solved and the next generation of processor engineers will need innovative, best-in-class IP and technology to achieve this. Before joining Codasip, I was already convinced by the benefits of customizing RISC-V processors using Codasip’s unique technology. Having many years of experience developing courses for the University of Colorado, it felt obvious to me that Codasip and universities could do great things together.
In the thirty-seven years since graduating from Penn State, I have been a hardware design engineer, worked in start-ups, sold semiconductors, owned a small business, and been a senior instructor at the University of Colorado at Boulder. It is time for me to give back to the next generation.
In the 1980s, it was not difficult to find a company that was developing a custom processor, but that era ended due to the need to standardize software. The number of mainstream processors narrowed to around 6 in the 1990s. Now, the open architecture of RISC-V solves the issue of standardized software while enabling processor customization.
To solve tomorrow’s technology challenges in security, artificial intelligence, and many other domain specific applications, we need a new generation of processor engineers.
We are at the start of a new golden age of processor designs. Through the University Program, we will be making available innovative curriculum material, supporting research faculty, and creating an ecosystem to spur innovation and product development.
The Codasip University program helps universities develop the theory and the design skills that companies developing tomorrow’s SoCs will need. Together with our technology partners we provide engineering students and researchers with the support they need for their research projects.
Students and researchers will be provided with computer engineering curriculums, assignments, materials, and industry-grade tools.
By partnering with universities, we create a Design for Differentiation Ecosystem that will encourage sharing of knowledge, experiences, ideas and designs. Universities will have access to FAQs, knowledge boards, a design database to share solutions, and will be able to participate in community activities such as design contests.
It is essential to provide students with access to CodAL and Studio. This unique technology will enable them to focus on becoming innovative processor designers. CodAL, our patented architecture description language, is more efficient and less error prone compared to using a less abstracted language like Verilog. Perfect for students.
With Studio, we want to provide the ideal processor design automation platform that will help future SoC designers build their ideas into something that could become a commercial product.
Interested in the Codasip University Program? Learn more on our website and get in touch with us.
February 26, 2021
CodAL, standing for Codasip Architectural Language, is central to developing a processor core using Codasip Studio. The language has a C-like syntax and it merges good practices and code constructs from conventional programming languages and hardware description languages. It has been developed from the outset to describe all aspects of a processor including both the instruction set architecture (ISA) and microarchitecture.
Each processor description includes four elements in its description:
The architectural or instruction accurate (IA) model contains the instruction set, architectural resources, and the semantics. The micro-architectural or cycle accurate (CA) model contains the instruction set, architectural resources and the micro-architectural implementation.
CodAL description is object-oriented, meaning that an object can be instantiated into a more complex object, the complex object into an even more complex one, etc. CodAL allows information to be passed through the object hierarchy without having to use complex function calls.
The CodAL element is a common example of an object, and in the following example we show an element describing a multiply-accumulate instruction.
The use statement describes the resources that are used by the instruction – a destination register (dst) and two source registers (src1, src2). Next, the assembly statement describes the assembler mnemonic (mac) and its arguments. The binary statement describes the binary representation of the instruction and arguments. Finally, the semantics statement describes the multiply-accumulate operation.
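The four parts of such an element (resources, assembly, binary, semantics) can be mirrored in a plain Python model. This is not CodAL syntax. The encoding is an assumption: it uses the RISC-V R-type layout with the custom-0 major opcode and arbitrarily chosen `funct3`/`funct7` values. The semantics (`dst = dst + src1 * src2`, wrapped to 32 bits) is likewise an assumed reading of “multiply-accumulate”.

```python
from typing import List

OPCODE_CUSTOM0 = 0b0001011  # RISC-V custom-0 major opcode
FUNCT3 = 0b000              # assumed value for this sketch
FUNCT7 = 0b0000000          # assumed value for this sketch


def assemble_mac(dst: int, src1: int, src2: int) -> int:
    """Binary: R-type layout funct7 | rs2 | rs1 | funct3 | rd | opcode."""
    for reg in (dst, src1, src2):
        if not 0 <= reg < 32:
            raise ValueError("register index out of range")
    return ((FUNCT7 << 25) | (src2 << 20) | (src1 << 15)
            | (FUNCT3 << 12) | (dst << 7) | OPCODE_CUSTOM0)


def disassemble_mac(word: int) -> str:
    """Assembly: mnemonic followed by dst, src1, src2."""
    dst = (word >> 7) & 0x1F
    src1 = (word >> 15) & 0x1F
    src2 = (word >> 20) & 0x1F
    return f"mac x{dst}, x{src1}, x{src2}"


def execute_mac(regs: List[int], dst: int, src1: int, src2: int) -> None:
    """Semantics: accumulate the product into dst, wrapping to 32 bits."""
    regs[dst] = (regs[dst] + regs[src1] * regs[src2]) & 0xFFFFFFFF


regs = [0] * 32
regs[10], regs[11], regs[12] = 5, 3, 4
word = assemble_mac(10, 11, 12)
execute_mac(regs, 10, 11, 12)
print(hex(word), disassemble_mac(word), regs[10])
```

The point of a single description is that the assembler, disassembler and simulator views above are all derived from one source rather than maintained separately.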
The CodAL description is used by the Codasip Studio toolset to generate an SDK and an HDK. For example, the element description would be used when generating the instruction set simulator (ISS), assembler, disassembler, C/C++ compiler, and debugger for the processor core.
The CA description would take advantage of the instruction set and resources descriptions used for the IA models. In addition, the CA description would specify microarchitectural features such as the pipeline length.
The cycle-accurate parts of the CodAL description would be used for generating the cycle-accurate simulator, RTL, testbench, and UVM environment. In this way CodAL is the single source for all aspects of the processor hardware and software. In contrast, some other processor development tools require two languages to describe the processor core. The methodology also enables powerful verification of the generated RTL against a golden reference model in the generated UVM environment.
February 11, 2021
Using the open RISC-V ISA is a great starting point for creating a domain-specific processor that combines application-specific capabilities and access to portable software. But how do you create an optimized ISA, profiling software and experimenting with adding/removing instructions, in a smart and easy way? In other words, how do you customize an existing RISC-V processor efficiently?
The industry will generally offer you two approaches. Either you do it manually or you automate the process as much as possible.
The old-fashioned way to modify the instruction set would be to:
This requires an extensive amount of manual work with associated technical risks. The resulting SDK will almost certainly need to make any custom instructions available as intrinsics or as inline assembler code. The alternative of modifying and verifying the compiler is costly in effort, but the end result is much better for the software developers. Similarly, if a processor is extended, traditionally it would be necessary to modify the microarchitecture by editing the RTL and then verify it against the ISS as the golden reference.
In contrast, describing the ISA in a processor description language like CodAL significantly improves the efficiency of this process using design automation tools. Codasip Studio can automatically generate both the ISS and a new compiler for the modified ISA, making the processor customization process more straightforward. But the CodAL processor description language is not just limited to creating instruction-accurate descriptions – it can also be used to describe microarchitecture (cycle-accurate). The consistency of the two descriptions can be checked within the Studio environment using static analysis.
A far easier approach is to start not just with the RISC-V ISA, but with a complete RISC-V processor core design described in CodAL.
Codasip’s RISC-V processors are all designed in Codasip Studio using the CodAL language. The range of cores spans simple 32-bit embedded cores to 64-bit Linux-capable application processors with multi-core capabilities. Thus, you can choose a processor that meets known baseline requirements such as pipeline depth and/or OS support, and then focus on creating custom extensions to improve performance. Starting with an already-proven processor design means that the microarchitecture for any new instructions is incremental, saving time and significantly reducing risk.
Codasip Studio can be used to generate the HDK including the RTL, a testbench, EDA scripts, and a UVM environment. The UVM environment enables the all-important verification of the new processor RTL against its golden ISS reference. The generated UVM environment includes default cover points and assertions for key areas of functionality such as register files, bus protocols, memories, and caches. The third-party RTL simulator can measure the functional and RTL code coverage achieved.
After extending the microarchitecture, Codasip Studio profiler provides coverage analysis tools to assess code coverage of the CodAL, including line, condition, and expression coverage. In order to achieve acceptable code coverage, Codasip Studio provides random assembler generators to exercise the code very thoroughly. In difficult corner cases, it may be necessary to write directed tests.
All in all, a Codasip RISC-V processor licensed as CodAL source code can be efficiently modified in the Codasip Studio environment and thoroughly verified. This is a cost-effective and highly efficient approach to creating domain-specific processors.
Learn more in our white paper “Creating Domain-Specific Processors with RISC-V custom ISA instructions”.
January 29, 2021
If you are going to create a domain-specific processor, one of the key activities is to choose an instruction set architecture (ISA) that matches your software needs. So where do you start?
Some companies have created their instruction sets from scratch, but if you have such an ISA, a penalty may be the costs of porting software. Today, the RISC-V open ISA can provide you with an excellent starting point and a software ecosystem. Depending on what you need, there are several obvious starting points. In case of a 32-bit processor, if you start with RISC‑V, the base ISA (RV32I) is just 47 instructions. Using this base set is easier than creating proprietary instructions with similar functionality, as well as meaning that software is already available from the RISC-V ecosystem.
|Starting point|Number of instructions|
|---|---|
|RV32I|47|
|RV32IMC|101|
|RV32GC|164|
Many use cases require multiplication, suggesting that the [M] extension would be useful, and it is sensible to take advantage of the 16-bit compressed [C] instructions for code density, so it is commonplace to use the RV32IMC set, which amounts to 101 instructions. Using RISC-V as a starting point will ensure that it is straightforward to use common software such as an RTOS or protocol stack. If you additionally require floating point computation, then the RV32GC (G=IMAFD) instructions may be appropriate, additionally including atomic [A], single-precision floating point [F], and double-precision floating point [D] extensions. Even RV32GC only has 164 instructions.
The RISC-V ISA is designed in a modular way that allows processor designers to add not only any of the standard extensions, but also to create their own custom instructions while keeping full RISC-V-compliance. The standard extensions are a convenient option thanks to being readily available; however, some may substantially increase the instruction set complexity. For example, the complete set of packed SIMD extensions [P] adds 331 additional instructions. In many cases, sufficient gains for a particular application can be made with custom instructions with potentially a lower overhead in silicon area and power.
Having chosen the starting point for your domain-specific processor, it is then necessary to work out what special instructions are needed to meet your computational requirements. This requires a careful analysis of the software that you need to run on your processor core. A profiling tool allows computational hotspots to be identified. Once such hotspots are known, a designer can create custom instructions to address them. Therefore, a designer can iterate by experimenting with adding or deleting instructions, then profiling the software again and assessing whether the changes have achieved their objectives.
However, while this iterative process is logical, how would you actually do it in practice? You might be able to access an open-source instruction set simulator and toolchains such as GNU or LLVM, but modifying these by hand is something for toolchain specialists and is time-consuming.
The alternative is to describe the instruction set using a processor description language. In Codasip Studio, an instruction accurate (IA) model of a processor can be created using the CodAL processor description language. An SDK including compiler, instruction set simulator (ISS), debugger, and profiler can be automatically generated from the IA description.
By describing the ISA at a high level and automatically generating the SDK, it is possible to rapidly iterate experiments in extending the instruction set. In this way, it is possible to choose a well-optimized ISA for a domain-specific processor, sometimes known as an Application Specific Instruction Processor (ASIP). Generating the SDK automatically is not only faster, but less prone to errors than manual changes, meaning that the design process is cheaper and more predictable, avoiding unnecessary risk and roadmap disruptions.
July 22, 2020
In the last three months, Codasip’s RISC-V processor offering has expanded considerably. For some years, Codasip has supplied Bk3 and Bk5 RISC-V cores aimed at low- to medium-complexity embedded applications. But recently four additional cores have joined the Codasip RISC-V offering.
Three of the cores, the SweRV Core™ EH1, EH2 and EL2, were designed by Western Digital and were open-sourced through CHIPS Alliance. These 32-bit cores are mainly aimed at high-performance embedded applications and complement the existing 32-bit Bk3 and Bk5 cores. The EH1 offers outstanding embedded performance due to its superscalar, dual-issue architecture. Even more performance is delivered by the EH2, which provides two hardware threads (harts). The EL2 core is more compact and is a single-issue core.
The RTL for all three SweRV Cores is available on GitHub free of license fees. However, RTL alone is not sufficient to use a SweRV Core in an SoC design. Firstly, a complete software toolchain is needed to allow embedded software to be developed. Secondly, a comprehensive EDA design flow needs to exist to undertake simulation, static analysis, and synthesis of the core’s RTL. It is important that the core can be easily integrated with peripherals, memories, and buses in order to implement a sub-system. EDA design flows need to keep up with revisions in both the processor IP and the EDA tools.
In December 2019, Western Digital and Codasip announced that they were cooperating to enable the deployment of open-source SweRV Cores in production silicon. Codasip’s SweRV Core Support Package (SSP) provides all of the components necessary to design, implement, test, and write software for a SweRV Core-based system-on-chip, including but not limited to verification testbenches and intellectual property, reference scripts for leading EDA flows, models for simulation and emulation, and software development tools.
The Support Package is available in a Free version consisting of open-source components and mainly aimed at academic use, and in a Pro version aimed at commercial SoC design using commercial EDA tools. The SweRV Core Support Package for EH1 was released in April and support for EH2 and EL2 was added in June. In addition, Codasip offers services for customizing SweRV Cores.
Although the semiconductor industry regularly talks of comparing processor cores in terms of performance versus complexity or in terms of PPA (performance, power, area), both performance and complexity have different aspects. Many Systems-on-Chip (SoCs) use multiple cores and face different requirements for different functions. For example, a core for a subsystem such as Wi-Fi will have quite different needs to one running a feature-rich OS such as Linux.
The Codasip Bk7 RISC-V core, announced yesterday, is Codasip’s first application processor. Like all previous Bk core designs, it has been designed in the Codasip Studio processor design system. This means that its architecture can be readily modified to create application-specific processors. It has all the features needed for running embedded Linux and will be the cornerstone for further application processor developments. The Bk7 core is a 64-bit core with a 7-stage pipeline and memory management unit (MMU). Future versions of the Bk7 will support symmetric and heterogeneous multi-processing.
In future posts we will be looking into some of the different facets of processor performance and complexity in order to see how the expanded Codasip offering can be applied to varying applications. We will also provide more detailed information on the Bk7 processor core.
May 6, 2020
Last month was the 55th anniversary of Gordon Moore’s famous paper Cramming more components onto integrated circuits. He took a long-term view of the trends in integrated circuits being implemented using successively smaller feature sizes in silicon. Since that paper, integrated circuit developers have been relying on three of his predictions:
These predictions have largely held true for almost half a century, enabling successive generations of processors to achieve higher computational performance through greater processor complexity and higher clock speeds. These improvements were mainly delivered through general-purpose processors implemented in new technology nodes.
From about 2005, the improvements in clock frequency began to level off, leading to a levelling-off of single thread performance. Since then, using multiple cores on a single die has become commonplace, but again these cores were mainly general-purpose ones, whether application processors or MCUs.
If you are designing a chip with some performance challenges, do you simply follow Moore’s law and move into a smaller silicon geometry and use general-purpose processor cores? That could be a costly approach since mask-making costs are higher in small geometries. Also, you may not achieve your performance in the most efficient way.
Many embedded applications involve cryptography, DSP, encoding/decoding or machine learning. Each of these operations typically runs inefficiently on a general-purpose core. For example, Galois fields are commonly used in cryptography, but multiplication operations take many clock cycles.
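To see why Galois field multiplication is expensive on a general-purpose core, consider the usual shift-and-XOR routine for GF(2^8). The Python sketch below uses the AES reduction polynomial (0x11B) as an assumed example field; each multiplication loops over all eight bits with a conditional XOR and a conditional reduction, which translates into dozens of ordinary instructions.

```python
def gf256_mul(a: int, b: int, poly: int = 0x11B) -> int:
    """Multiply two elements of GF(2^8) by shift-and-XOR with reduction."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a    # addition in GF(2^n) is XOR
        a <<= 1
        if a & 0x100:
            a ^= poly      # reduce modulo the field polynomial
        b >>= 1
    return result


print(hex(gf256_mul(0x57, 0x83)))  # 0xc1, the worked example in FIPS-197
```

A custom instruction that performs this whole loop in hardware is the kind of targeted extension the rest of this post argues for.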
So, what can be done to deal with computational bottlenecks? In extreme cases, like dealing with real time video data, it may make sense to create dedicated computational units to handle a narrow range of computationally intensive operations. However, this may not be the best trade-off.
It will often be desirable to run both computationally intensive operations and other embedded functions on the same processor core. Ideally, the most silicon-efficient processor implementation will be one that is tuned for your particular application.
This can be achieved by creating a processor core that has custom instructions targeted to address the bottlenecks. Adding custom instructions does not have to be expensive in silicon resources. For example, Microsemi created custom DSP instructions for their RISC-V-based audio processor products. Their custom Bk RISC-V processor delivered 4.24× the performance of the original RV32IMC core but required only a 48% increase in silicon area. Furthermore, the code size shrank to 43% of the original size [1].
So how can you efficiently implement custom instructions? And equally importantly, how do you verify that they are implemented correctly? We will be visiting these topics in future posts.
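A quick back-of-the-envelope calculation, written in Python, shows what these figures mean for silicon efficiency. Only the three input numbers come from the cited result; the derived performance-per-area ratio is our own arithmetic.

```python
speedup = 4.24        # performance relative to the original RV32IMC core
area_ratio = 1.48     # a 48% increase in silicon area
code_ratio = 0.43     # code size relative to the original

perf_per_area = speedup / area_ratio   # about 2.86x performance per unit area
code_saving = 1.0 - code_ratio         # 57% less code
print(round(perf_per_area, 2), round(code_saving, 2))
```

In other words, under these numbers each unit of silicon area delivers almost three times the performance of the baseline core.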
1 Dan Ganousis & Vijay Subramaniam, Implementing RISC-V for IoT Applications, Design Automation Conference 2017