We’re exhibiting at Embedded World 2023. Book a meeting with us.

What is an ASIP?


April 16, 2021

ASIP stands for “application-specific instruction-set processor” and simply means a processor which has been designed to be optimal for a particular application or domain.

General-purpose vs. domain-specific or application-specific processors

Most processor cores to date have been general-purpose, which means that they have been designed to handle a wide range of applications with good average performance. This may mean that if you have some special computationally intensive algorithm, such as audio processing, you may need a high-performance core (for example with a SIMD unit, or zero overhead loops) or a high clock frequency to achieve your needs. This may result in exceeding your silicon or power budget.

An alternative is to create an ASIP which has a specialized architecture, optimized to efficiently achieve the required performance for the audio processing. The ASIP would not usually be designed to optimally handle more generic operations such as those needed for an operating system. Instead, if an OS is needed, you would probably run it on a separate general-purpose core which would not be constrained by the need to run audio processing algorithms. Thus, the ASIP design is optimized for performance with just enough flexibility to meet its use case.

ASIPs have been the subject of research in universities around the world and have been applied to a number of domains such as audio signal processing, image sensors, and baseband signal processing. Codasip founder and President Karel Masařík with CTO Zdeněk Přikryl researched design automation for ASIPs at the TU Brno. To read more on their research, check the papers listed at the end of this article.

ASIPs to address semiconductor scaling issues

ASIPs or domain-specific processors are likely to be used more widely in the future due to semiconductor scaling issues. For decades, developers of SoCs have relied on Moore’s law and Dennard Scaling to get more and more performance and circuit density through successively finer silicon geometries.

While this scaling applied, it was generally reasonable to use general-purpose processor cores and to rely on new generations of silicon technology to deliver the performance required with an acceptable power consumption. However, this scaling has been known to have broken down, with smaller increments in performance improvement and leakage current worsening power consumption. Because of this, the semiconductor industry must change, and a new approach to processing is needed.

To date, this industry challenge has been tackled by incorporating different types of general-purpose cores on a single SoC. For example, mobile phone SoCs have combined application processors, GPUs, DSPs, and MCUs, but none of these classes of processor have application-specific instruction sets.

With new products demanding new algorithms for artificial intelligence, advanced graphics and advanced security, more specialized hardware accelerators are needed and are being developed. Such accelerators are designed to handle computationally demanding algorithms efficiently. Each accelerator will need an optimized instruction set and microarchitecture – in other words, an ASIP.

Domain-specific accelerators. Source Codasip.

ASIP and software development

Domain-specific processors (Application-specific processors, ASIP… you will notice we use these terms interchangeably) combine hardware specialization with flexibility through software programmability.

One challenge of creating custom hardware is ensuring the needs of software developers are met too. Techniques such as intrinsic instructions allow direct access to the instruction set from C, but reduce the flexibility of the code. They may still be a viable answer when complex functions are encapsulated in a single instruction.

But better still would be a top-down approach. Instead, a compiler created for your domain-specific accelerator, automatically targets the custom instructions you create. This, along with an instruction set simulator and profiler, accelerates the rate at which you can try different design iterations to converge on an optimal solution.

Codasip Studio provides a complete solution for this process with the ability to generate ISA and compiler from a simple instruction level model.

Design automation and verification of application-specific processors

With the need to develop many and varied accelerators, it is not efficient to develop the instruction sets and microarchitecture manually. Application software can be analyzed by profiling and the instruction set tuned to those needs. A processor design automation toolset like Codasip Studio can be then used to generate a software toolchain and an instruction set simulator.

Specifically, Codasip Studio automates the generation of a C/C++ compiler that is fully aware of the instruction set and can infer specific instructions automatically. It is important to have the C/C++ compiler in the loop, to see the impact of the application-specific instructions. The same applies for instruction set simulators, debuggers, profilers, and other tools in SDK that are automatically generated. Codasip Studio can be also used to generate hardware design including RTL, testbenches, and a UVM environment.

Did you know?

Codasip was originally founded to create co-design tools for ASIPs, hence the name “Co-dASIP”. Codasip Studio has subsequently been used for creating more general-purpose cores such as RISC-V embedded and application cores too.

Processor design does not end with generating the RTL code – the dominant part of the design cycle is verification and it needs to be rigorous. As Philippe Luc explained to Semiconductor Engineering, RTL verification is multi-layered and complex.

Codasip Studio can automate key parts of the verification process in order to help the full flow of ASIP design being quicker. Among other things, Codasip Studio provides automated coverage points for the optimised instructions. The provided UVM environment makes it easy to run programs (including the new instructions) on both model and RTL with result comparison. A constrained random program generator is provided to help closing coverage and ensure quicker verification for our customers.

To learn more about the power of Codasip Studio combined with RISC-V, read our white paper on “Creating Domain-Specific Processors with custom RISC-V ISA instructions”.

Codasip research papers on design automation for ASIP

  • MASAŘÍK Karel, UML in design of ASIP, IFAC Proceedings Volumes 39(17):209-214, September 2006.
  • ZACHARIÁŠOVÁ Marcela, PŘIKRYL Zdeněk, HRUŠKA Tomáš and KOTÁSEK Zdeněk. Automated Functional Verification of Application Specific Instruction-set Processors. IFIP Advances in Information and Communication Technology, vol. 4, no. 403, pp. 128-138. ISSN 1868-4238.
  • PŘIKRYL Zdeněk. Fast Simulation of Pipeline in ASIP simulators. In: 15th International Workshop on Microprocessor Test and Verification. Austin: IEEE Computer Society, 2014, pp. 1-6. ISBN 978-0-7695-4000-9.
  • HUSÁR Adam, PŘIKRYL Zdeněk, DOLÍHAL Luděk, MASAŘÍK Karel and HRUŠKA Tomáš. ASIP Design with Automatic C/C++ Compiler Generation. Haifa, 2013.
Roddy Urquhart

Roddy Urquhart

Related Posts

Check out the latest news, posts, papers, videos, and more!

Why Codasip Cares About Processor Verification – and Why you Should too

February 28, 2022
By Philippe Luc

Why and How to Customize a Processor

October 1, 2021
By Roddy Urquhart

What does RISC-V stand for?

March 17, 2021
By Roddy Urquhart

Customizing an Existing RISC-V Processor


February 11, 2021

Using the open RISC-V ISA is a great starting point for creating a domain-specific processor that combines application-specific capabilities and access to portable software. But how do you create an optimized ISA, profiling software and experimenting with adding/removing instructions, in a smart and easy way? In other words, how do you customize an existing RISC-V processor efficiently?

How to modify the instruction set?

The industry will generally offer you two approaches. Either you do it manually or you automate the process as much as possible.

The manual approach

The old-fashioned way to modify the instruction set would be to:

  1. Modify the instruction set simulator (ISS) to change the ISA.
  2. Update the SDK to reflect the new set of target instructions.

This requires an extensive amount of manual work with associated technical risks. The resulting SDK will almost certainly need to make any custom instructions available as intrinsics or as inline assembler code. The alternative of modifying and verifying the compiler is costly in effort, but the end result is much better for the software developers. Similarly, if a processor is extended, traditionally it would be necessary to modify the microarchitecture by editing the RTL and then verify it against the ISS as the golden reference.

The automated approach

In contrast, describing the ISA in a processor description language like CodAL significantly improves the efficiency of this process using design automation tools. Codasip Studio can automatically generate both the ISS and a new compiler for the modified ISA, making the processor customization process more straightforward. But the CodAL processor description language is not just limited to creating instruction-accurate descriptions – it can also be used to describe microarchitecture (cycle-accurate). The consistency of the two descriptions can be checked within the Studio environment using static analysis.

A far easier approach is to not just to start with the RISC-V ISA, but with a complete RISC-V processor core design described in CodAL.

Automating the customization of a RISC-V CPU with Codasip Studio

Codasip’s RISC-V processors are all designed in Codasip Studio using the CodAL language. The range of cores spans simple 32-bit embedded cores to 64-bit Linux-capable application processors with multi-core capabilities. Thus, you can choose a processor that meets known baseline requirements such as pipeline depth and/or OS support, and then focus on creating custom extensions to improve performance. Starting with an already-proven processor design means that the microarchitecture for any new instructions is incremental, saving time and significantly reducing risk.

Codasip Studio verification flow diagram

Codasip Studio can be used to generate the HDK including the RTL, a testbench, EDA scripts, and a UVM environment. The UVM environment enables the all-important verification of the new processor RTL against its golden ISS reference. The generated UVM environment includes default cover points and assertions for key areas of functionality such as register files, bus protocols, memories, and caches. The third-party RTL simulator can measure the functional and RTL code coverage achieved.

After extending the microarchitecture, Codasip Studio profiler provides coverage analysis tools to assess code coverage of the CodAL, including line, condition, and expression coverage. In order to achieve acceptable code coverage, Codasip Studio provides random assembler generators to exercise the code very thoroughly. In difficult corner cases, it may be necessary to write directed tests.

Diagram that shows the Codasip Studio flow

All-in-all a Codasip RISC-V processor licensed as CodAL source code can be efficiently modified in the Codasip Studio environment and thoroughly verified. This is a cost-effective and super-efficient approach to creating domain-specific processors.

Learn more in our white paper “Creating Domain-Specific Processors with RISC-V custom ISA instructions”.

Roddy Urquhart

Roddy Urquhart

Related Posts

Check out the latest news, posts, papers, videos, and more!

design automation to drive innovation and differentiation

July 7, 2022
By Lauranne Choquin

Single unified toolchain empowering processor research

May 23, 2022
By Keith Graham

Design for differentiation: architecture licenses for custom compute in RISC‑V

May 16, 2022
By Rupert Baines

How to Choose an Architecture for a Domain-Specific Processor


January 29, 2021

If you are going to create a domain-specific processor, one of the key activities is to choose an instruction set architecture (ISA) that matches your software needs. So where do you start?

Some companies have created their instruction sets from scratch, but if you have such an ISA, a penalty may be the costs of porting software. Today, the RISC-V open ISA can provide you with an excellent starting point and a software ecosystem. Depending on what you need, there are several obvious starting points. In case of a 32-bit processor, if you start with RISC‑V, the base ISA (RV32I) is just 47 instructions. Using this base set is easier than creating proprietary instructions with similar functionality, as well as meaning that software is already available from the RISC-V ecosystem.

Starting point Number of instructions
RV32I 47
RV32IMC 101
RV32GC 164

Many use cases require multiplication suggesting that [M] extensions would be useful, and it is sensible to take advantage of the 16-bit compressed [C] instructions for code density, so it is commonplace to use the RV32IMC set which amount to 101 instructions. Using RISC-V as a starting point will ensure that it is straightforward to use common software such as an RTOS or protocol stack. If you additionally require floating point computation, then the RV32GC (G=IMAFD) instructions may be appropriate, additionally including atomic [A], single-precision floating point [F], and double-precision floating point [D] extensions. Even RV32GC only has 164 instructions.

The RISC-V ISA is designed in a modular way that allows processor designers to add not only any of the standard extensions, but also to create their own custom instructions while keeping full RISC-V-compliance. The standard extensions are a convenient option thanks to being readily available; however, some may substantially increase the instruction set complexity. For example, the complete set of packed SIMD extensions [P] adds 331 additional instructions. In many cases, sufficient gains for a particular application can be made with custom instructions with potentially a lower overhead in silicon area and power.

Having chosen the starting point for your domain-specific processor, it is then necessary to work out what special instructions are needed to meet your computational requirements. This requires a careful analysis of the software that you need to run on your processor core. A profiling tool allows computational hotspots to be identified. Once such hotspots are known, a designer can create custom instructions to address them. Therefore, a designer can iterate by experimenting with adding or deleting instructions, then profiling the software again and assessing whether the changes have achieved their objectives.

However, while this iterative process is logical, how would you actually do it in practice? You might be able to access an open-source instruction set simulator and toolchains such as GNU or LLVM, but modifying these by hand is something for toolchain specialists and is time-consuming.

The alternative is to describe the instruction set using a processor description language. In Codasip Studio, an instruction accurate (IA) model of a processor can be created using the CodAL processor description language. An SDK including compiler, instruction set simulator (ISS), debugger, and profiler can be automatically generated from the IA description.

By describing the ISA at a high level and automatically generating the SDK, it is possible to rapidly iterate experiments in extending the instruction set. In this way, it is possible to choose a well-optimized ISA for a domain-specific processor, sometimes known as an Application Specific Instruction Processor (ASIP). Generating the SDK automatically is not only faster, but less prone to errors than manual changes, meaning that the design process is cheaper and more predictable, avoiding unnecessary risk and roadmap disruptions.

Roddy Urquhart

Roddy Urquhart

Related Posts

Check out the latest news, posts, papers, videos, and more!

design automation to drive innovation and differentiation

July 7, 2022
By Lauranne Choquin

Single unified toolchain empowering processor research

May 23, 2022
By Keith Graham

Design for differentiation: architecture licenses for custom compute in RISC‑V

May 16, 2022
By Rupert Baines

When Considering Processor PPA, Don’t Forget the Instruction Memory


November 12, 2020

The area of any part of a processor design contributes both to the silicon cost and to the power consumption. A simplistic following of the “A” in a processor IP vendor’s PPA numbers can be misleading. A processor is never used in isolation but is part of a subsystem additionally including instruction memory, data memory, and peripherals. In most cases, instruction memory will be dominant and the processor area is much less important.

What impacts the size of the instruction memory in a processor?

The size of the instruction memory will be influenced by the target instruction set, the compiler and the compiler switches used. In the case of RISC-V, the choice of optional standard extensions and custom extensions can greatly influence the codesize.

Instruction set and processor memory: the example of Microsemi

To illustrate this, the following table shows the effect of adding extensions to both core and codesize.

ISA Core size (kgates) Increase over base Codesize (kbytes) Decrease over base
RV32I (base) 16.0 × 1.0 232 × 1.0
RV32IM 26.2 × 1.6 148 × 1.6
RV32IM+DSP 38.7 × 2.4 64 × 3.6

Source: Implementing RISC-V for IoT applications, Dan Ganousis & Vijay Subramaniam, DAC 2017

In this example, Microsemi used a Codasip RISC-V L31 processor to implement an audio processing application. Starting with just the 32-bit base instruction set, they had an unacceptably high codesize and cycle count. Some improvement was achieved by adding multiplication [M] extensions, but the breakthrough was using custom DSP instructions. These led to a 3.6× reduction in codesize at the price of a 2.4× increase in core size compared with the base core. With instruction memory dominating the area, this was a good trade-off; furthermore, the performance goals were readily achieved.

Compilers, complex switches and PPA

With typical vendor PPA data, synthetic benchmarks such as CoreMark/MHz are often quoted with a complex set of compiler switches – this is something we discussed in our article dedicated to processor performance. But in practice, embedded software is probably going to be compiled using common switches such as ‑Os or ‑O3.

Consider compiling the CoreMark benchmark with different switches using the common GCC compiler. In this case, the target was a Codasip RV32IMC RISC-V core with a 3-stage pipeline. 

The chart below shows CoreMark/MHz and codesize measures for different compiler settings. The last example is one that is typical of vendor performance data where many switches are used for CoreMark (CM = “-O3 -flto -fno-common -funroll-loops -finline-functions -falign-functions=16 -falign-jumps=8 -falign-loops=8 -finline-limit=1000 -fno-if-conversion2 -fselective-scheduling -fno-tree-dominator-opts -fno-reg-struct-return -fno-rename-registers –param case-values-threshold=8 -fno-crossjumping -freorder-blocks-and-partition -fno-tree-loop-if-convert -fno-tree-sink -fgcse-sm -fgcse-las -fno-strict-overflow”).

In this example, the CoreMark/MHz score grows as the switches change from left to right. However, it is interesting to note that the most complex set of switches increases the codesize by 40 % over ‘‑O3’ while the performance only improves by 14 %.

Not every example will behave in this way, but compiler switches influence both performance and codesize. It is important to be realistic about what compiler switches you would like to use, and to ensure that the switches for any performance benchmark data matches those you would use for assessing codesize.

The impact of PPA on your processor choice

PPA numbers will inevitably be used to compare processors. However, these indicators need context to be representative of your actual needs, and not be misleading. Some key considerations, including OS support and ISA choices, will also influence your processor choice. To find out more, read our white paper on “What you should consider when choosing a processor IP core”.

Roddy Urquhart

Roddy Urquhart

Related Posts

Check out the latest news, posts, papers, videos, and more!

Building the highway to automotive innovation

May 5, 2022
By Jamie Broome

Why and How to Customize a Processor

October 1, 2021
By Roddy Urquhart

What is the difference between processor configuration and customization?

August 12, 2021
By Roddy Urquhart

What is needed to support an operating system?


October 15, 2020

For each embedded product, software developers need to consider whether they need an operating system; and if so, what type of embedded OS. Operating systems vary considerably, from real-time operating systems with a very small memory footprint to general-purpose OSes such as Linux with a rich set of features.

Which type of OS is typically found on an embedded system?

Choosing a proper type of operating system for your product – and consequently working out the required features of the embedded processor – depends significantly on whether you face a hard real-time requirement. Safety-critical and industrial systems such as an anti-lock braking system or motor control will have hard maximum response times. At the other end of the spectrum, consumer systems such as audio or gaming devices may be able to tolerate buffering, as long as the average performance is adequate. Such systems are said to have soft real-time requirements.

Bare metal

A hard real-time requirement can be achieved by writing so-called bare-metal software that directly controls the underlying hardware. Bare-metal programming is typically used when the processor resources are very limited, the software is simple enough, and/or the real-time requirements are so tight that introduction of a further abstraction layer would complicate meeting these hard real-time requirements. The disadvantage to this approach is that such bare-metal software needs to be written as a single task (plus interrupt routines), making it difficult for programmers to maintain the software as its complexity grows.

Real-time Operating Systems

When dealing with more complex embedded software, it is often advantageous to employ a Real-Time Operating System (RTOS). It allows the programmer to split the embedded software into multiple threads whose execution is managed by the small, low-overhead “kernel” of the RTOS. The use of the multi-threaded paradigm enables developers to create and maintain more complex software while still allowing for sufficient reactivity.

RTOSes typically operate with a concept of “priority” assigned to individual threads. The RTOS can then “pre-empt” (temporarily halt) lower-priority threads in favour of those with higher priority, so that the required real-time constraints can be met. The use of an RTOS often becomes necessary when adopting complex libraries or protocol stacks (such as TCP/IP or Bluetooth) as this third-party software normally consists of multiple threads already.

The embedded processor requirements of a simple RTOS, such as FreeRTOS or Zephyr, are truly modest. It is sufficient to have a RISC-V processor with just machine mode (M) and a timer peripheral. However, rigorous software development is needed as machine mode offers unconstrained access to all memory and peripherals with associated risks. Extra protection is possible through a specialized RTOS such as those developed for functional safety, like SAFERTOS, or for security.

If a processor core supports both machine (M) and user (U) privilege modes and has physical memory protection (PMP), it is possible to establish separation between trusted code (with unconstrained access) and other application code. With PMP, the trusted code sets up rules for each portion of the application code, saying which parts of memory (or peripherals) it is allowed to access. PMP can for instance be used to prevent third-party code from interfering with the data of the rest of the application, or to detect stack overflows. Employing PMP therefore increases the safety and security of a system, but at the cost of additional hardware required for its support.

We also discuss embedded OS support in this video!

Rich operating systems

For applications requiring a more advanced user interface, sophisticated I/O and networking, such as in set-top boxes or entertainment systems, an RTOS is likely to be too simplistic. The same applies if there are complex computations, requirements for a full process isolation and multitasking, filesystem & storage support, or a full separation of application code from hardware via device drivers. Systems like these generally have soft real-time requirements and can be best served by a general-purpose rich operating system such as Linux. As mentioned in our blog post dedicated to processor complexity, Linux requires multiple RISC-V privilege modes – machine, supervisor, and user modes (M, S, U) – as well as a memory management unit (MMU) for virtual-to-physical address translation. Also, the memory footprint of such a system is significantly larger compared to a simple RTOS.

Finally, for embedded systems that require both hard real-time responses and features of a rich OS like Linux, it is common to design them with two communicating processor subsystems, one supporting an RTOS and the other running Linux.

The OS support will impact your processor choice

Choosing the appropriate embedded OS for your product and identifying the features required for your embedded processor depends heavily on the type of real-time requirements you face. Together with processor performance and complexity, among other key considerations, the OS support you need should be taken into account when choosing a processor. To find out more, read our white paper on “What you should consider when choosing a processor IP core”.

Roddy Urquhart

Roddy Urquhart

Related Posts

Check out the latest news, posts, papers, videos, and more!

5 things I will remember from the 2022 RISC-V Summit

December 22, 2022
By Lauranne Choquin

How to reduce the risk when making the shift to RISC-V

November 17, 2022
By Lauranne Choquin

DAC 2022 – Is it too risky not to adopt RISC-V?

July 18, 2022
By Brett Cline

What is processor core complexity?


September 10, 2020

The more complex a processor core, the larger the area and power consumption. But increasing complexity is not a single dimension as processors can be more complex in different ways. In selecting a processor IP core, it is important to choose the right sort of processor complexity for your project.

What defines the complexity of a processor?

There are different ways of thinking about processor complexity. Word length, execution units, privilege modes, virtual memory and security features are important considerations that will make your processor core more complex. It is important to understand what you really need for your project.

Word length

Generally, the smaller the word length, the smaller the core and the lower the power, however this is not always the case. An 8-bit core, such as the 8051, is comparable in gate count to the smallest 32-bit cores, but power consumption is usually worse. An 8-bit core requires more memory accesses due to less computation per clock cycle requiring more cycles. The net impact is that it requires more power to complete a computation.

Execution units

Processor cores vary considerably in the complexity of their execution units. The simplest are basic single ALUs requiring many common operations to be implemented by the simple instructions – for example using shift and add to implement a multiplication. It is therefore commonplace for cores to have a hardware multiplier and divider. In the event of needing good floating-point performance, adding a hardware Floating Point Unit (FPU) will provide significantly better performance. This option is available for Codasip’s Low-Power (L) and High-Performance (H) Embedded RISC-V processor cores but at the price of roughly doubling the core size.

Superscalar architectures with instruction-level parallelism

So far, we have assumed a single computational thread and scalar processing units which execute one instruction at a time. Superscalar architectures have instruction-level parallelism able to fetch multiple instructions and dispatch them to different execution units. A dual-issue core processing one thread can theoretically have up to double the performance of a single-issue core. However, a thread can stall making both execution units temporarily inactive. If there are two hardware threads (harts), then if one thread stalls, the other can continue execution.

Processors can vary considerably in pipeline depth and there is a direct relationship between this depth and latency. Some applications can tolerate high latency, with the consequence being slower response to interrupts, in return for high clock frequencies and throughput. Other applications require rapid responses to interrupts so need shorter pipelines.

We also discuss processor complexity in this video!

Privilege modes

Another area of complexity is privilege modes. The more modes, the more complex the core logic. Many embedded applications run in machine mode, which means that the code has full access to the core – like root privilege in Linux. Such code must be completely trusted to avoid negative consequences. In more sophisticated applications, a range of privileges such as machine, supervisor and user may be offered. Normal applications will run in user mode with the greatest amount of protection and some software requiring greater privilege will use supervisor mode.

Virtual memory

Virtual memory also requires additional processor resources such as a memory management unit (MMU) and translation lookaside buffer (TLB) to handle translating virtual memory addresses to physical addresses. This brings additional costs in terms of area and power dissipation without improving processor throughput. Nevertheless, virtual memory is necessary for using rich operating systems such as Linux which enable more complex software to be used.

So, when choosing a processor core, work out what sort of execution units, memory management, privilege and security you need. That combination will determine the complexity of the core.

Consider processor complexity when choosing a core – but not only that!

So, when choosing a processor core, work out what sort of execution units, memory management, privilege and security you need. That combination will determine the complexity of the core. But that’s not all. If PPA numbers are typically considered when looking at the wide choice of processor IP cores on the market, that’s not enough. Processor complexity is one element, but processor performance, software requirements and the ISA, among others, are key considerations to investigate. We cover these in our white paper “What you should consider when choosing a processor IP core”.

Roddy Urquhart

Roddy Urquhart

Related Posts

Check out the latest news, posts, papers, videos, and more!

5 things I will remember from the 2022 RISC-V Summit

December 22, 2022
By Lauranne Choquin

How to reduce the risk when making the shift to RISC-V

November 17, 2022
By Lauranne Choquin

DAC 2022 – Is it too risky not to adopt RISC-V?

July 18, 2022
By Brett Cline

Understanding the Performance of Processor IP Cores


August 20, 2020

Looking at any processor IP, you will find that their vendors emphasize PPA (performance, power & area) numbers. In theory, they should provide a level playing field for comparing different processor IP cores, but in reality, the situation is more complex. Let us consider processor performance.

What does processor performance mean?

The first thing to think about is what aspect of performance you care about. Do you care more about the absolute throughput that you want (performance per second), or the performance per MHz? In an application such as machine vision, which is continuously running and requiring the use of complex algorithms, it is likely that you will care about the absolute throughput. However, if you have a wireless sensor node with a low duty cycle, when the node wakes up, you will want it to be active for as few clock cycles as possible. This means you will care about how much computation you achieve per MHz.

About 40 years ago, computers were compared on the basis of MIPS (millions of instructions per second) although the problem is – what is an instruction? Instructions vary considerably in complexity and from one architecture to another, thus an operation will generally require less cycles in a CISC processor than a RISC one. MIPS were only helpful when comparing products with similar architectures and were called “meaningless indices of performance” by some!

Another thing to think about is the type of computation that you expect to care most about. Is it integer operations – and if so, which ones – or, say, floating-point computations? In the past, MFLOPS (million floating point operations per second) was a popular measure. But again, what is an ‘operation’?

Popular synthetic benchmarks

Today, synthetic benchmarks are universally used with processor IP cores. They have the following characteristics:

  • They are relatively small and portable.
  • They are representative of commonly used relevant applications.
  • They are reproducible and transparent.
  • They can be applied to a range of processors fairly.
  • They express the benchmark result as a single number.

Dhrystone

A benchmark that has been popular for the last 36 years is the Dhrystone benchmark. Its name is a play on words comparing it with the once-popular Whetstone benchmark. While Whetstone focused on floating point operations, Dhrystone focused on integer and string operations. The Dhrystone benchmark results are generally quoted as DMIPS (the Dhrystone score divided by that of a nominally 1 MIPS machine). The benchmark has been criticized because modern compilers can optimize away parts of the work, meaning that it partly tests compiler rather than processor performance.

For floating point, Whetstone is rarely used at present and it is more likely that LINPACK would be used. LINPACK involves LU decomposition of a matrix using floating point numbers. The result is expressed in MFLOPS.

CoreMark

Another popular synthetic benchmark for embedded applications has been EEMBC’s CoreMark® which aims to undertake operations that are representative of embedded integer processing needs. These include list processing, matrix operations, finite state machines, and CRC.

Find more details and some tips to measure processor performance according to your needs in this video!

Assessing performance when choosing a processor

There are various benchmark systems out there, each suited for measuring a slightly different type of performance. So how do you assess performance when choosing processor IP for your project?

If your embedded software has similar operations to a synthetic benchmark, then that benchmark may give you useful initial guidance quickly and simply. However, normally such benchmarks are quoted per MHz, for example CoreMark/MHz. The per MHz figure is normally a good indication for a low-power application where you are looking for good results per cycle. However, if you are looking for high absolute performance, this may be misleading. Instead you should consider, say, the CoreMarks achievable at your target clock frequency.

If your main issue is floating-point performance, bear in mind that DMIPS and CoreMark are integer benchmarks. You would be better comparing cores on the basis of a floating-point benchmark such as LINPACK.

Ultimately, it always makes sense to invest the time in running realistic software on a processor core to assess whether the core gives you the performance you need. If you are looking at RISC-V, then profiling your software to understand where the computational bottlenecks are can also lead to assessing whether adding custom instructions can give you improvements in performance.

It is not just about processor performance and scores

In this article we have looked at processor performance, but that is only one aspect of PPA and one factor to consider when choosing a processor. PPA numbers are always about balance and all of them matter when choosing an IP for a project, among other key considerations. The ISA, processor complexity, processor memory or even the licensing model will impact your choice. Find out more in our white paper “What you should consider when choosing a processor IP core“.

Roddy Urquhart

Roddy Urquhart

Related Posts

Check out the latest news, posts, papers, videos, and more!

5 things I will remember from the 2022 RISC-V Summit

December 22, 2022
By Lauranne Choquin

How to reduce the risk when making the shift to RISC-V

November 17, 2022
By Lauranne Choquin

DAC 2022 – Is it too risky not to adopt RISC-V?

July 18, 2022
By Brett Cline

Codasip’s Expanding RISC-V Offering


July 22, 2020

Watch as video

In the last three months, Codasip’s RISC-V processor offering has expanded considerably. For some years, Codasip has supplied Bk3 and Bk5 RISC-V cores aimed at low- to medium-complexity embedded applications. But recently four additional cores have joined the Codasip RISC-V offering.

Three of the cores, the SweRV Core™ EH1, EH2 and EL2, were designed by Western Digital and were open-sourced through CHIPS Alliance. These 32-bit cores are mainly aimed at high-performance embedded applications and complement the existing 32-bit Bk3 and Bk-5 cores. The EH1 offers outstanding embedded performance due to its superscalar, dual issue architecture. Even more performance is delivered by the EH2 which provides two hardware threads (harts). The EL2 core is more compact and is a single-issue core.

The RTL for all three SweRV Cores is available on GitHub free of license fees. However, RTL alone is not sufficient to use a SweRV Core in an SoC design. Firstly, a complete software toolchain is needed to allow embedded software to be developed. Secondly, a comprehensive EDA design flow needs to exist to undertake simulation, static analysis, and synthesis of the core’s RTL. It is important that the core can be easily integrated with peripherals, memories, and buses in order to implement a sub-system. EDA design flows need to keep up with revisions in both the processor IP and the EDA tools.

In December 2019, Western Digital and Codasip announced that they were cooperating to enable the deployment of open-source SweRV Cores in production silicon. Codasip’s SweRV Core Support Package (SSP) provides all of the components necessary to design, implement, test, and write software for a SweRV Core-based system-on-chip, including but not limited to verification testbenches and intellectual property, reference scripts for leading EDA flows, models for simulation and emulation, and software development tools.

The Support Package is available in a Free version consisting of open-source components and mainly aimed at academic use, and in a Pro version aimed at commercial SoC design using commercial EDA tools. The SweRV Core Support Package for EH1 was released in April and support for EH2 and EL2 was added in June. In addition, Codasip offers services for customizing SweRV Cores.

Although the semiconductor industry regularly talks of comparing processor cores in terms of performance versus complexity or in terms of PPA (performance, power, area), both performance and complexity have different aspects. Many Systems-on-Chip (SoCs) use multiple cores and face different requirements for different functions. For example, a core for a subsystem such as Wi-Fi will have quite different needs to one running a feature-rich OS such as Linux.

The Codasip Bk7 RISC-V core, announced yesterday, is Codasip’s first application processor. Like all previous Bk core designs, it has been designed in the Codasip Studio processor design system. This means that its architecture can be readily modified to create application-specific processors.  It has all the features needed for running embedded Linux and will be the cornerstone for further application processor developments. The Bk7 core is a 64-bit core with a 7-stage pipeline and memory management unit (MMU). Future versions of the Bk7 will support symmetric and heterogenous multi-processing.

In future posts we will be looking into some of the different facets of processor performance and complexity in order to see how the expanded Codasip offering can be applied to varying applications. We will also provide more detailed information on the Bk7 processor core.

 

Roddy Urquhart

Roddy Urquhart

Related Posts

Check out the latest news, posts, papers, videos, and more!

5 things I will remember from the 2022 RISC-V Summit

December 22, 2022
By Lauranne Choquin

How to reduce the risk when making the shift to RISC-V

November 17, 2022
By Lauranne Choquin

DAC 2022 – Is it too risky not to adopt RISC-V?

July 18, 2022
By Brett Cline

What is SweRV Core EH2?


June 4, 2020

In mid-May, CHIPS Alliance announced the open sourcing of the SweRV Core™ EH2 and SweRV Core EL2 designed by Western Digital. These cores, as well as the earlier EH1, are now supported by Codasip’s SweRV Core Support Package which provides all of the components necessary to design, implement, test, and write software for a SweRV Core-based system-on-chip. But what is SweRV Core EH2?

The SweRV Core EH1 was the first to be released through CHIPS Alliance and was a core aimed at high-end embedded applications including Western Digital’s flash controllers and SSDs. The core is a dual issue, superscalar, high-performance core with 9 pipeline stages. The EH2 is an exciting further development aimed at delivering even more performance for IoT, artificial intelligence and data-intensive embedded applications.

The EH2 breaks new ground for RISC-V cores by being the world’s first dual threaded, commercial RISC-V core. Its 9-stage pipeline is based on that of the EH1, but it contains additional resources to support dual threading. There are doubled RISC-V general purpose registers, control and status registers, fetch buffers, instruction buffers, and other logic. By adding these incremental resources, the EH2 allows the host software to effectively ‘see two cores’. The simulated performance of the dual threaded EH2 core is an outstanding 6.3 CoreMark/MHz.

The SweRV Core EH2 supports machine mode only, meaning that it is aimed at applications using real-time operating systems or bare metal software. It has four 64-bit AXI4 bus interfaces for instruction fetch, load/stores, debug, and for accessing optional closely coupled memories. There is an optional instruction cache with either parity-based or ECC-based error protection, an optional instruction closely coupled memory (ICCM), and an optional data closely coupled memory (DCCM) which use ECC-based error protection.

The open-source RTL for the SweRV Core EH2 is available from CHIPS Alliance. However, considerably more than just RTL is needed to integrate the core into a system-on-chip (SoC) and to develop the associated firmware. A considerable amount of effort will be needed to create the EDA flows in particular and to maintain them over multiple RTL and EDA software versions.

The SweRV Core Support Package (SCSP) contains everything needed to deploy a Western Digital SweRV™ EH2 core in an integrated circuit, providing support for both EDA tool flows and embedded software development. SCSP saves the considerable effort that would be needed to set up EDA flows for the EH2 core from scratch. The SweRV Core Support Package for EH2 is available in both basic Free and Pro versions.

The Free version consists of open-source deliverables and infrastructure for using open-source EDA tools and an SDK. Users can access an internet forum for support.

The Pro version combines open-source and commercial deliverables. It provides flows, examples and models for using commercial EDA tools. This version includes professional support by e-mail and telephone.

Roddy Urquhart

Roddy Urquhart

Related Posts

Check out the latest news, posts, papers, videos, and more!

How to reduce the risk when making the shift to RISC-V

November 17, 2022
By Lauranne Choquin

Codasip’s Expanding RISC-V Offering

July 22, 2020
By Roddy Urquhart

Vidtoo Technology Licenses Codasip’s Bk3 RISC‑V Processor for High‑Performance Computing SoC

December 17, 2018