
Does ISA ownership matter? A Tale of Three ISAs


December 22, 2020

An instruction set architecture (ISA) is crucial to the development of processors and their software ecosystems. In the last half century, the majority of ISAs have been owned by single companies, whether product companies using them in their own chips and systems or processor IP companies licensing their processors to chip developers. Does ISA ownership matter? Let’s consider three proprietary ISAs and their history.

The first example is the Alpha ISA, developed by Digital Equipment Corporation (DEC) for its workstations and servers and released in 1992. In the mid-1990s, it was considered a worthy competitor to the SPARC and MIPS RISC architectures. However, ownership of the ISA transferred to Compaq when DEC was acquired in 1998. Compaq in turn sold the rights to the Alpha ISA to Intel in 2001, and in the same year Compaq itself was acquired by Hewlett-Packard. The last Alpha-based products were released in 2004; the ISA was effectively killed by a series of acquisitions.

The second example is MIPS. MIPS Technologies was spun out of Silicon Graphics as an independent IP company in 1998. For some years it enjoyed success, particularly at the higher end of the processor IP market, and in 2009 it became only the second architecture to which Android was ported. However, with a declining share price, MIPS sold 498 patents to AST and agreed to an acquisition by Imagination Technologies in 2013. After Canyon Bridge acquired Imagination, MIPS was spun out again, ending up, after a series of transactions, as part of Wave Computing. As an artificial intelligence silicon provider, Wave is a potential competitor to some MIPS licensees.

Wave tried to encourage adoption of the MIPS ISA in competition with RISC-V through its MIPS Open Initiative in late 2018. However, the licensing terms contained some onerous conditions relating to patents. In late 2019, Wave suddenly shut down the program with zero notice. The important lesson is that even if an ISA is open, its future is not secure if it is commercially owned. Seven years of ownership changes have seen MIPS’ market share spiral downwards.

The third example is Arm, the biggest processor IP company of them all. Arm has long been seen not only as a big, successful IP company, but as one offering “Swiss neutrality” in the semiconductor industry. Arm was quite distinct from both semiconductor companies and EDA companies. As such, it enjoyed a position of trust from its licensees because it had no conflict of interest. With its acquisition by SoftBank in 2016, Arm lost control over its destiny, even though SoftBank was not competing with its licensees. With the planned acquisition of Arm by NVIDIA, announced in September 2020, Arm will lose its neutrality completely. Because NVIDIA is a semiconductor company, there is a conflict of interest between Arm’s owner and its licensees, meaning Arm can no longer be trusted in the same way.

As can be seen from the ‘Tale of three ISAs’, the ownership of an ISA matters a lot, regardless of whether the ISA is commercially licensed or open. Acquisitions can lead to the disappearance of an ISA through merging of product lines or through making licensing difficult. Another motive for taking over a company can even be to kill off a competing product line, which in the case of an ISA could catastrophically impact licensees.

ISA ownership is one of the key issues that the developers of RISC-V have thought about. By transferring ownership of the ISA to RISC-V International, the original developers have assured its longevity. Longevity is assured by the independent ownership of the ISA, by licensees having a choice of IP vendors supporting the same open standard, and by the fact that, once ratified, the ISA is frozen, assuring software developers that their code will be able to run on suitable cores indefinitely.

Roddy Urquhart


Open Source vs Commercial RISC-V Licensing Models


November 26, 2020

Everybody is familiar with commercial licensing from traditional processor IP vendors such as Arm, Cadence, and Synopsys. But in discussions of the RISC-V open instruction set architecture (ISA), there is widespread confusion of terminology, with RISC-V often described as “open source”. Some have even accused vendors of commercial RISC-V IP, such as Codasip or Andes, of not being in the spirit of RISC-V. But what is the reality? What does the RISC-V licensing model for processor IP look like?

What do open standard, open source, and commercial mean?

Let’s look at definitions briefly. An open standard like C, Verilog or HTTP is defined by a document that is maintained by an independent organization. Thus, C is maintained by ISO, Verilog by IEEE, and HTTP by IETF. These organizations maintain the technical standards using a set of impartial rules. Such open standards are generally freely accessible.

With open source, the source code for a software package, or the hardware description language source for a hardware block, is made available under a license. Open-source licenses vary from restrictive copyleft licenses, such as the GPL, to permissive ones, such as Apache. An open-source license defines rights for using, studying, modifying, and distributing the code. A copyleft license requires that any modifications be open-sourced, while a permissive license does not.

RISC-V is an open standard, and the ISA does not define any microarchitecture or business model. Therefore, a RISC-V microarchitecture can be licensed either under a commercial IP license or under an open-source one. Nothing is prescribed.

If we think of a classic commercial processor IP license, you are generally paying for:

  • The right to use the vendor’s ISA
  • The right to use the vendor’s microarchitecture
  • A warranty
  • Vendor commitment to fix errors
  • Indemnification

In practice, the warranty is usually time-bound, and the indemnification is limited. However, for the licensee, the vendor has some commitments to fix a design if bugs are found, which is valuable particularly on a tight schedule. If a licensee is accused of patent infringement, intellectual property indemnification means that the vendor will either defend the accusation or settle it on behalf of the licensee.

Traditional architecture and IP licensing model vs. RISC-V licensing model

Classic IP vendors have jealously guarded their own ISAs as well as their microarchitectures. A normal license bundles use of the ISA with the microarchitecture and grants no rights to modify the deliverables. Very rarely, such vendors have offered an architectural license enabling the licensee to use the ISA with their own microarchitecture, but such licenses have commanded substantial fees. One reason RISC-V is so disruptive is that, with a free and open ISA, one of the most valuable deliverables carries no license fee.

Is RISC-V free? Open-source and commercial RISC-V based IP cores

Given that RISC-V prescribes neither the microarchitecture nor how it is licensed, there are both commercially licensed and open-source RISC-V IP cores. With an open-source license, you pay no license fee for the microarchitecture, but you also do not get all the benefits of a commercial license. Generally, deliverables have no warranty and are accepted “as is”. Similarly, there is no indemnification of the kind that exists with a commercial license. If bugs are found, either the licensee or the open-source community needs to fix them.

With commercially licensed RISC-V cores, the only fees are associated with the microarchitecture as the RISC-V ISA is licensed free of charge. With this license, you get the warranty, indemnification and bug fixing commitments normally associated with a commercial license.

What is the right licensing model for RISC-V?

We often get the question “Is RISC-V really open source?”. RISC-V is an open standard that allows companies to create RISC-V microarchitectures. Companies can then license the IP as either open-source or commercial. Which is the right choice for RISC-V? Both commercial licenses and open source licenses have advantages and disadvantages. You need to weigh up what is best for your design project.

At Codasip, we offer commercial RISC-V IP licenses and Codasip Studio technology that enables our customers to modify both the microarchitecture and the architecture.

In the past, commercial and open source licenses were seen as bitter competitors. However, in the software world, companies such as Microsoft have embraced both models. Microsoft offers commercial licenses, supports open source projects, and has cloud-based business models. Commercial and open-source RISC-V licenses can co-exist and complement each other.

Roddy Urquhart


When Considering Processor PPA, Don’t Forget the Instruction Memory


November 12, 2020

The area of any part of a processor design contributes both to the silicon cost and to the power consumption. However, simplistically following the “A” in a processor IP vendor’s PPA numbers can be misleading. A processor is never used in isolation but is part of a subsystem that additionally includes instruction memory, data memory, and peripherals. In most cases, the instruction memory will dominate the area, making the processor area much less important.

What impacts the size of the instruction memory in a processor?

The size of the instruction memory will be influenced by the target instruction set, the compiler and the compiler switches used. In the case of RISC-V, the choice of optional standard extensions and custom extensions can greatly influence the codesize.

Instruction set and processor memory: the example of Microsemi

To illustrate this, the following table shows the effect of adding instruction set extensions on both core size and codesize.

ISA            Core size (kgates)   Increase over base   Codesize (kbytes)   Decrease over base
RV32I (base)   16.0                 × 1.0                232                 × 1.0
RV32IM         26.2                 × 1.6                148                 × 1.6
RV32IM+DSP     38.7                 × 2.4                64                  × 3.6

Source: Implementing RISC-V for IoT applications, Dan Ganousis & Vijay Subramaniam, DAC 2017

In this example, Microsemi used a Codasip RISC-V L31 processor to implement an audio processing application. Starting with just the 32-bit base instruction set, they had an unacceptably high codesize and cycle count. Some improvement was achieved by adding multiplication [M] extensions, but the breakthrough was using custom DSP instructions. These led to a 3.6× reduction in codesize at the price of a 2.4× increase in core size compared with the base core. With instruction memory dominating the area, this was a good trade-off; furthermore, the performance goals were readily achieved.
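
To make the mechanism concrete, here is a hedged sketch (not Microsemi’s actual code) of the kind of inner loop involved – a multiply-accumulate (MAC) filter kernel – and how the ISA choice changes what the compiler emits:

```c
#include <stdint.h>

/* Illustrative audio-filter kernel: one multiply-accumulate per tap. */
int32_t fir(const int16_t *x, const int16_t *h, int taps)
{
    int32_t acc = 0;
    for (int i = 0; i < taps; i++)
        acc += (int32_t)x[i] * h[i];   /* the hot multiply-accumulate */
    return acc;
}

/*
 * Compiled with -march=rv32i, each multiply becomes a call to a
 * software multiply routine: many instructions and a large codesize.
 * With -march=rv32im it becomes a single 'mul'. A custom DSP MAC
 * instruction can fold the multiply and the add into one opcode,
 * shrinking the loop body to a few instructions.
 */
```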

Compilers, complex switches and PPA

With typical vendor PPA data, synthetic benchmarks such as CoreMark/MHz are often quoted with a complex set of compiler switches – this is something we discussed in our article dedicated to processor performance. But in practice, embedded software is probably going to be compiled using common switches such as ‑Os or ‑O3.

Consider compiling the CoreMark benchmark with different switches using the common GCC compiler. In this case, the target was a Codasip RV32IMC RISC-V core with a 3-stage pipeline. 

The chart below shows CoreMark/MHz and codesize measures for different compiler settings. The last example is typical of vendor performance data where many switches are used for CoreMark (CM = “-O3 -flto -fno-common -funroll-loops -finline-functions -falign-functions=16 -falign-jumps=8 -falign-loops=8 -finline-limit=1000 -fno-if-conversion2 -fselective-scheduling -fno-tree-dominator-opts -fno-reg-struct-return -fno-rename-registers --param case-values-threshold=8 -fno-crossjumping -freorder-blocks-and-partition -fno-tree-loop-if-convert -fno-tree-sink -fgcse-sm -fgcse-las -fno-strict-overflow”).

In this example, the CoreMark/MHz score grows as the switches change from left to right. However, it is interesting to note that the most complex set of switches increases the codesize by 40% over ‑O3 while improving performance by only 14%.

Not every example will behave this way, but compiler switches influence both performance and codesize. It is important to be realistic about which compiler switches you would actually use, and to ensure that the switches behind any performance benchmark data match those you would use when assessing codesize.

The impact of PPA on your processor choice

PPA numbers will inevitably be used to compare processors. However, these indicators need context to be representative of your actual needs, and not be misleading. Some key considerations, including OS support and ISA choices, will also influence your processor choice. To find out more, read our white paper on “What you should consider when choosing a processor IP core”.

Roddy Urquhart


What is needed to support an operating system?


October 15, 2020

For each embedded product, software developers need to consider whether they need an operating system and, if so, what type of embedded OS. Operating systems vary considerably, from real-time operating systems with a very small memory footprint to general-purpose OSes such as Linux with a rich set of features.

Which type of OS is typically found on an embedded system?

Choosing the right type of operating system for your product – and consequently working out the required features of the embedded processor – depends significantly on whether you face hard real-time requirements. Safety-critical and industrial systems such as anti-lock braking or motor control have hard maximum response times. At the other end of the spectrum, consumer systems such as audio or gaming devices may be able to tolerate buffering, as long as the average performance is adequate. Such systems are said to have soft real-time requirements.

Bare metal

Hard real-time requirements can be met by writing so-called bare-metal software that directly controls the underlying hardware. Bare-metal programming is typically used when the processor resources are very limited, the software is simple enough, and/or the real-time requirements are so tight that introducing a further abstraction layer would complicate meeting them. The disadvantage of this approach is that bare-metal software needs to be written as a single task (plus interrupt routines), making it difficult for programmers to maintain the software as its complexity grows.
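
A minimal sketch of that structure, with hypothetical helper names, looks like this:

```c
#include <stdint.h>

/* Bare-metal "super-loop": the whole application is one task plus
 * interrupt routines, with no OS underneath. */
static volatile uint32_t tick;      /* shared with the interrupt handler */

void timer_isr(void)                /* hooked into the vector table elsewhere */
{
    tick++;                         /* keep ISRs short and deterministic */
}

static void poll_inputs(void)    { /* read hardware registers here */ }
static void update_outputs(void) { /* drive actuators here */ }

int main(void)
{
    for (;;) {                      /* the single task */
        poll_inputs();
        update_outputs();
    }
}
```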

Real-time Operating Systems

When dealing with more complex embedded software, it is often advantageous to employ a Real-Time Operating System (RTOS). It allows the programmer to split the embedded software into multiple threads whose execution is managed by the small, low-overhead “kernel” of the RTOS. The use of the multi-threaded paradigm enables developers to create and maintain more complex software while still allowing for sufficient reactivity.

RTOSes typically operate with a concept of “priority” assigned to individual threads. The RTOS can then “pre-empt” (temporarily halt) lower-priority threads in favour of those with higher priority, so that the required real-time constraints can be met. The use of an RTOS often becomes necessary when adopting complex libraries or protocol stacks (such as TCP/IP or Bluetooth) as this third-party software normally consists of multiple threads already.

The embedded processor requirements of a simple RTOS, such as FreeRTOS or Zephyr, are truly modest. It is sufficient to have a RISC-V processor with just machine mode (M) and a timer peripheral. However, rigorous software development is needed as machine mode offers unconstrained access to all memory and peripherals with associated risks. Extra protection is possible through a specialized RTOS such as those developed for functional safety, like SAFERTOS, or for security.
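
As a minimal sketch of the multi-threaded paradigm, assuming a working FreeRTOS port for the target core (task bodies, stack depths, and priorities are hypothetical):

```c
#include "FreeRTOS.h"
#include "task.h"

/* High-priority thread with a tight deadline. */
static void control_task(void *params)
{
    (void)params;
    for (;;) {
        /* read sensors, update actuators ... */
        vTaskDelay(pdMS_TO_TICKS(1));       /* run every millisecond */
    }
}

/* Best-effort, lower-priority thread. */
static void logging_task(void *params)
{
    (void)params;
    for (;;) {
        /* format and ship log records ... */
        vTaskDelay(pdMS_TO_TICKS(100));
    }
}

int main(void)
{
    xTaskCreate(control_task, "ctrl", 256, NULL, 3, NULL);  /* higher priority */
    xTaskCreate(logging_task, "log",  256, NULL, 1, NULL);  /* pre-empted by "ctrl" */
    vTaskStartScheduler();              /* kernel takes over from here */
    for (;;) { }                        /* reached only if the scheduler fails */
}
```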

If a processor core supports both machine (M) and user (U) privilege modes and has physical memory protection (PMP), it is possible to establish separation between trusted code (with unconstrained access) and other application code. With PMP, the trusted code sets up rules for each portion of the application code, saying which parts of memory (or peripherals) it is allowed to access. PMP can for instance be used to prevent third-party code from interfering with the data of the rest of the application, or to detect stack overflows. Employing PMP therefore increases the safety and security of a system, but at the cost of additional hardware required for its support.
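
A minimal sketch of the idea, run from machine mode: PMP entry 0 grants read/write (but not execute) access to one power-of-two-aligned region. The function name is hypothetical; the encoding follows the RISC-V privileged specification.

```c
#include <stdint.h>

#define PMP_R     (1u << 0)   /* read permission */
#define PMP_W     (1u << 1)   /* write permission */
#define PMP_X     (1u << 2)   /* execute permission */
#define PMP_NAPOT (3u << 3)   /* naturally aligned power-of-two region */

/* Allow R/W access to [base, base+size) via PMP entry 0.
 * 'size' must be a power of two and 'base' aligned to it. */
static void pmp_allow_region(uintptr_t base, uintptr_t size)
{
    /* NAPOT encoding: the region size is folded into the low address bits. */
    uintptr_t addr = (base | (size / 2 - 1)) >> 2;

    asm volatile("csrw pmpaddr0, %0" :: "r"(addr));
    /* Entry 0 occupies the lowest byte of pmpcfg0; this sketch leaves
     * the other entries cleared. */
    asm volatile("csrw pmpcfg0, %0" :: "r"((uintptr_t)(PMP_R | PMP_W | PMP_NAPOT)));
}
```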

We also discuss embedded OS support in this video!

Rich operating systems

For applications requiring a more advanced user interface, sophisticated I/O, and networking – as in set-top boxes or entertainment systems – an RTOS is likely to be too simplistic. The same applies if there are complex computations, requirements for full process isolation and multitasking, filesystem and storage support, or a full separation of application code from hardware via device drivers. Systems like these generally have soft real-time requirements and are best served by a general-purpose rich operating system such as Linux. As mentioned in our blog post dedicated to processor complexity, Linux requires multiple RISC-V privilege modes – machine, supervisor, and user (M, S, U) – as well as a memory management unit (MMU) for virtual-to-physical address translation. The memory footprint of such a system is also significantly larger than that of a simple RTOS.

Finally, for embedded systems that require both hard real-time responses and features of a rich OS like Linux, it is common to design them with two communicating processor subsystems, one supporting an RTOS and the other running Linux.

The OS support will impact your processor choice

Choosing the appropriate embedded OS for your product and identifying the features required for your embedded processor depends heavily on the type of real-time requirements you face. Together with processor performance and complexity, among other key considerations, the OS support you need should be taken into account when choosing a processor. To find out more, read our white paper on “What you should consider when choosing a processor IP core”.

Roddy Urquhart


What is processor core complexity?


September 10, 2020

The more complex a processor core, the larger the area and power consumption. But increasing complexity is not a single dimension as processors can be more complex in different ways. In selecting a processor IP core, it is important to choose the right sort of processor complexity for your project.

What defines the complexity of a processor?

There are different ways of thinking about processor complexity. Word length, execution units, privilege modes, virtual memory and security features are important considerations that will make your processor core more complex. It is important to understand what you really need for your project.

Word length

Generally, the smaller the word length, the smaller the core and the lower the power; however, this is not always the case. An 8-bit core, such as the 8051, is comparable in gate count to the smallest 32-bit cores, but its power consumption is usually worse. An 8-bit core performs less computation per clock cycle, so it needs more cycles and more memory accesses to complete a given computation. The net effect is that it needs more energy to complete that computation.
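
A hedged illustration of why: a single 32-bit addition in C.

```c
#include <stdint.h>

/* On a 32-bit core this compiles to one 'add' instruction. An 8-bit
 * core such as the 8051 must chain four 8-bit add-with-carry
 * operations, plus the moves to pass each byte through its
 * accumulator: more instructions, more cycles, more memory traffic. */
uint32_t add32(uint32_t a, uint32_t b)
{
    return a + b;
}
```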

Execution units

Processor cores vary considerably in the complexity of their execution units. The simplest have a single basic ALU, requiring many common operations to be built from simple instructions – for example, implementing multiplication with shifts and adds. It is therefore commonplace for cores to include a hardware multiplier and divider. Where good floating-point performance is needed, adding a hardware floating point unit (FPU) provides significantly better results. This option is available for Codasip’s Low-Power (L) and High-Performance (H) Embedded RISC-V processor cores, but at the price of roughly doubling the core size.
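
As a sketch of what “shift and add” means in practice, this is roughly the routine a core without a hardware multiplier runs (in software or microcode) for every integer multiplication:

```c
#include <stdint.h>

/* Shift-and-add multiplication: up to 32 loop iterations versus a
 * single hardware 'mul' instruction. */
uint32_t mul_shift_add(uint32_t a, uint32_t b)
{
    uint32_t result = 0;
    while (b != 0) {
        if (b & 1)        /* low bit of b set: add the shifted a */
            result += a;
        a <<= 1;          /* move a to the next bit position */
        b >>= 1;
    }
    return result;
}
```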

Superscalar architectures with instruction-level parallelism

So far, we have assumed a single computational thread and scalar processing units that execute one instruction at a time. Superscalar architectures exploit instruction-level parallelism, fetching multiple instructions and dispatching them to different execution units. A dual-issue core processing one thread can theoretically deliver up to double the performance of a single-issue core. However, a thread can stall, leaving both execution units temporarily idle. If there are two hardware threads (harts), then when one thread stalls, the other can continue execution.

Processors can vary considerably in pipeline depth and there is a direct relationship between this depth and latency. Some applications can tolerate high latency, with the consequence being slower response to interrupts, in return for high clock frequencies and throughput. Other applications require rapid responses to interrupts so need shorter pipelines.

We also discuss processor complexity in this video!

Privilege modes

Another area of complexity is privilege modes. The more modes, the more complex the core logic. Many embedded applications run in machine mode, which means that the code has full access to the core – like root privilege in Linux. Such code must be completely trusted to avoid negative consequences. In more sophisticated applications, a range of privileges such as machine, supervisor and user may be offered. Normal applications will run in user mode with the greatest amount of protection and some software requiring greater privilege will use supervisor mode.

Virtual memory

Virtual memory also requires additional processor resources such as a memory management unit (MMU) and translation lookaside buffer (TLB) to handle translating virtual memory addresses to physical addresses. This brings additional costs in terms of area and power dissipation without improving processor throughput. Nevertheless, virtual memory is necessary for using rich operating systems such as Linux which enable more complex software to be used.
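
To show what that extra hardware does, here is a simplified sketch of the RISC-V Sv32 translation an MMU performs (and a TLB caches): a two-level page-table walk. It ignores permission and A/D-bit checks, assumes physical addresses fit in 32 bits, and read_pte is a hypothetical physical-memory read.

```c
#include <stdint.h>

extern uint32_t read_pte(uint32_t phys_addr);   /* hypothetical memory read */

#define PTE_V (1u << 0)   /* entry is valid */
#define PTE_R (1u << 1)   /* readable: marks a leaf entry */
#define PTE_X (1u << 3)   /* executable: also marks a leaf */

/* Translate a virtual address given the root table's physical page
 * number (held in the satp CSR). Returns 0 to signal a page fault. */
uint32_t sv32_translate(uint32_t root_ppn, uint32_t va)
{
    uint32_t table = root_ppn << 12;                  /* level-1 table base */
    uint32_t vpn[2] = { (va >> 12) & 0x3ff, va >> 22 };

    for (int level = 1; level >= 0; level--) {
        uint32_t pte = read_pte(table + vpn[level] * 4);
        if (!(pte & PTE_V))
            return 0;                                 /* fault: invalid entry */
        if (pte & (PTE_R | PTE_X)) {                  /* leaf: translation found */
            uint32_t mask = (level == 1) ? 0x3fffff   /* 4 MiB superpage */
                                         : 0xfff;     /* 4 KiB page */
            return (((pte >> 10) << 12) & ~mask) | (va & mask);
        }
        table = (pte >> 10) << 12;                    /* descend to next level */
    }
    return 0;                                         /* fault: no leaf found */
}
```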


Consider processor complexity when choosing a core – but not only that!

So, when choosing a processor core, work out what sort of execution units, memory management, privilege, and security you need. That combination will determine the complexity of the core. But that is not all. PPA numbers are typically the first thing considered when surveying the wide choice of processor IP cores on the market, but they are not enough. Processor complexity is one element; processor performance, software requirements, and the ISA, among others, are key considerations to investigate. We cover these in our white paper “What you should consider when choosing a processor IP core”.

Roddy Urquhart


Understanding the Performance of Processor IP Cores


August 20, 2020

Looking at any processor IP, you will find that their vendors emphasize PPA (performance, power & area) numbers. In theory, they should provide a level playing field for comparing different processor IP cores, but in reality, the situation is more complex. Let us consider processor performance.

What does processor performance mean?

The first thing to think about is which aspect of performance you care about. Do you care more about absolute throughput (performance per second) or about performance per MHz? In an application such as machine vision, which runs continuously and uses complex algorithms, you will probably care about absolute throughput. However, if you have a wireless sensor node with a low duty cycle, you will want the node, when it wakes up, to be active for as few clock cycles as possible. This means you will care about how much computation you achieve per MHz.

About 40 years ago, computers were compared on the basis of MIPS (millions of instructions per second). The problem is: what is an instruction? Instructions vary considerably in complexity from one architecture to another, so an operation will generally require fewer instructions on a CISC processor than on a RISC one. MIPS figures were only helpful when comparing products with similar architectures and were dubbed “meaningless indices of performance” by some!

Another thing to think about is the type of computation that you expect to care most about. Is it integer operations – and if so, which ones – or, say, floating-point computations? In the past, MFLOPS (million floating point operations per second) was a popular measure. But again, what is an ‘operation’?

Popular synthetic benchmarks

Today, synthetic benchmarks are universally used with processor IP cores. They have the following characteristics:

  • They are relatively small and portable.
  • They are representative of commonly used relevant applications.
  • They are reproducible and transparent.
  • They can be applied to a range of processors fairly.
  • They express the benchmark result as a single number.

Dhrystone

A benchmark that has been popular for the last 36 years is Dhrystone. Its name is a play on words, comparing it with the once-popular Whetstone benchmark. While Whetstone focused on floating-point operations, Dhrystone focuses on integer and string operations. Dhrystone results are generally quoted as DMIPS: the Dhrystone score divided by that of a nominally 1 MIPS machine. The benchmark has been criticized because modern compilers can optimize away parts of the work, meaning that it partly tests compiler rather than processor performance.
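
The normalization itself is simple. The nominal 1 MIPS reference machine is the VAX 11/780, which executes 1757 Dhrystones per second, so:

```c
/* DMIPS: Dhrystone throughput normalized to the VAX 11/780,
 * the machine historically rated at 1 MIPS (1757 Dhrystones/s). */
double dmips(double dhrystones_per_second)
{
    return dhrystones_per_second / 1757.0;
}

/* The per-clock figure vendors usually quote: */
double dmips_per_mhz(double dhrystones_per_second, double clock_mhz)
{
    return dmips(dhrystones_per_second) / clock_mhz;
}
```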

For floating point, Whetstone is rarely used at present and it is more likely that LINPACK would be used. LINPACK involves LU decomposition of a matrix using floating point numbers. The result is expressed in MFLOPS.

CoreMark

Another popular synthetic benchmark for embedded applications has been EEMBC’s CoreMark® which aims to undertake operations that are representative of embedded integer processing needs. These include list processing, matrix operations, finite state machines, and CRC.

Find more details and some tips to measure processor performance according to your needs in this video!

Assessing performance when choosing a processor

There are various benchmark systems out there, each suited for measuring a slightly different type of performance. So how do you assess performance when choosing processor IP for your project?

If your embedded software performs operations similar to a synthetic benchmark, that benchmark may give you useful initial guidance quickly and simply. However, such benchmarks are normally quoted per MHz, for example CoreMark/MHz. The per-MHz figure is a good indication for a low-power application where you are looking for good results per cycle. However, if you are looking for high absolute performance, it may be misleading. Instead, you should consider, say, the CoreMark score achievable at your target clock frequency.
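
A quick worked example, with hypothetical cores and numbers, shows how the two views can disagree:

```c
#include <stdio.h>

int main(void)
{
    /* Hypothetical numbers: per-MHz score times clock frequency. */
    double core_a = 3.0 * 400.0;   /* 3.0 CoreMark/MHz at 400 MHz = 1200 */
    double core_b = 2.0 * 900.0;   /* 2.0 CoreMark/MHz at 900 MHz = 1800 */

    /* Core A wins per MHz; core B wins on absolute throughput. */
    printf("A: %.0f CoreMark, B: %.0f CoreMark\n", core_a, core_b);
    return 0;
}
```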

If your main concern is floating-point performance, bear in mind that DMIPS and CoreMark are integer benchmarks. You would do better to compare cores on the basis of a floating-point benchmark such as LINPACK.

Ultimately, it always makes sense to invest the time in running realistic software on a processor core to assess whether the core gives you the performance you need. If you are looking at RISC-V, then profiling your software to understand where the computational bottlenecks are can also lead to assessing whether adding custom instructions can give you improvements in performance.

It is not just about processor performance and scores

In this article we have looked at processor performance, but that is only one aspect of PPA and one factor to consider when choosing a processor. PPA numbers are always about balance, and all of them matter when choosing IP for a project, among other key considerations. The ISA, processor complexity, processor memory, and even the licensing model will impact your choice. Find out more in our white paper “What you should consider when choosing a processor IP core”.

Roddy Urquhart
