
Building a Swiss cheese model approach for processor verification


April 29, 2022

Processors all have high quality requirements, and their reliability is the main concern of processor verification teams. Providing best-in-class quality products requires a strategic, diligent, and thorough approach. Processor verification therefore plays a major role, and it takes a combination of all industry-standard techniques – as in a Swiss cheese model.

The need for a strong, layered processor verification strategy

You’ve heard me say this before: processor verification is a subtle art. We need to take uncertainty into account, which means widening the scope of our verification while optimizing resources. On one hand, we want to find all critical bugs before final production; on the other hand, we must have an efficient verification strategy to meet time-to-market requirements. Smart processor verification means finding meaningful bugs as efficiently and as early as possible during the development of the product. One way of achieving this is to combine all industry-standard verification techniques. It is by creating redundancy that we find all critical bugs.

There are different types of bugs, and each bug has a complexity – or bug score – that depends on the number and types of events required to trigger it. Some might be found with coverage, others with formal proofs, and so on. Imagine the Swiss cheese model applied to processor verification. Each slice of cheese is a verification technique with specific strengths for catching certain categories of bugs. The risk of a bug escaping and making it into the end product is mitigated by the different layers and types of verification stacked behind each other.

In a Swiss cheese model applied to processor verification, the principle is similar to the aviation industry: if there is a direct path going through all the slices, then there is a risk of a plane crash. That is why the aviation industry is strict about procedures, checklists, and redundant systems. The objective is to add more slices and reduce the size of the holes in each slice so that, in the end, no hole goes all the way through and we deliver a quality processor.

Source: Codasip

Swiss cheese model principles applied to processor verification

By using several slices of cheese, or verification methods:

  • We create redundancy to ensure continuity if one of the layers fails.
  • When bugs are found during development, it indicates that there was a hole in several slices. We can then improve several verification methods and reduce the size of the holes in each slice. In doing that, we increase our chances of hitting bugs, from silly bugs to corner cases and from simple to complex bugs.
  • We maximize the potential of each technique.

A hole in a slice is a hole in the verification methodology. The more holes, and the bigger the holes, the more bugs can escape. If the same area of the design (overlapping holes between cheese slices) is not covered and tested by any of the verification techniques, then a bug there will make it through and end up in the final deliverables.

A good verification methodology must present as few holes as possible, as small as possible, on each slice. A solid strategy, experience, and efficient communication are important factors to deliver quality products.

When we find a bug, or a hole in a slice, during verification, we always fix it and check other slices for similar holes. Every slice should find the holes in the previous one and address them before progressing. Sanity checks are an efficient way to achieve this, for example by comparing our design with industry standard models such as Spike or Imperas.
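
One common form of sanity check is to run the design and a golden model in lock-step and compare architectural state after every retired instruction. The sketch below illustrates the idea in Python; the `dut` and `ref_model` objects and their `step()` interface are hypothetical placeholders, not the actual Spike or Imperas APIs.

```python
# A minimal sketch of a lock-step sanity check against a reference model.
# The `dut` and `ref_model` objects and their step() interface are
# hypothetical placeholders, not the real Spike or Imperas APIs.

def compare_against_reference(dut, ref_model, max_instructions=100_000):
    """Retire instructions on both sides and flag the first divergence."""
    for i in range(max_instructions):
        dut_state = dut.step()        # architectural state after one retirement
        ref_state = ref_model.step()  # same state, on the golden model

        for field in ("pc", "gpr", "csr"):
            if dut_state[field] != ref_state[field]:
                raise AssertionError(
                    f"Mismatch on {field} after {i} instructions: "
                    f"DUT={dut_state[field]} REF={ref_state[field]}"
                )
```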

Source: Codasip

In the Swiss cheese model applied to processor verification, if one technique is strengthened – an improved testbench, new assertions, etc. – the bug is found and fixed before the product goes into production. All processor verification techniques are important and it is the combination of all of them that makes each of them more efficient.

A single verification technique cannot do everything by itself; it is the action of all of them together that improves the overall quality of the verification and of the processor design. There can be unexpected changes or factors during the development of a product, external events that can impact the efficiency of a technique: for example, a change in the design that is not communicated to the verification team, or a difficult Friday afternoon leading to human mistakes. These factors can increase the size of a hole in a slice, hence the importance of having more than one – and the importance of keeping engineering specifications up to date and communicating regularly between designers and verification engineers. Code reviews conducted by other team members are one efficient way to achieve this, and that is what we do at Codasip.

At Codasip, we use verification technology and techniques that allow us to create redundancy, preventing holes from going all the way through the pile of cheese slices, and to deliver best-in-class RISC-V processors.

Philippe Luc



Measuring the complexity of processor bugs to improve testbench quality


April 4, 2022

I am often asked the question “When is the processor verification done?”, or in other words, “How do I measure the efficiency of my testbench and how can I be confident in the quality of the verification?”. There is no easy answer. There are several common indicators used in the industry, such as coverage and bug curves. While they are absolutely necessary, they are not enough to reach the highest possible quality. Indeed, such indicators do not really reveal the ability of a verification methodology to find the last bugs. With experience, I learned that measuring the complexity of processor bugs is an excellent indicator to use throughout the development of the project.

What defines the complexity of a processor bug and how to measure it?

Experience taught me that we can define the complexity of a bug by counting the number of independent events or conditions that are required to hit the bug.

What do we consider an event?

Let’s take a simple example. A typical bug is found in the caches, when the design fails to handle a required hazard. Data corruption can occur when:

  1. A cache line at address @A is Valid and Dirty in the cache.
  2. A load at address @B causes an eviction of line @A.
  3. Another load at address @A starts.
  4. The external write bus is slower than the read, so the load @A completes before the end of the eviction.

External memory returns the previous data because the most recent data from the eviction got lost, causing data corruption.
In this example, 4 events – or conditions – are required to hit the bug. These 4 events give the bug a score of 4, or in other words a complexity of 4.
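
To make the scoring concrete, here is a minimal sketch, in Python, of how a bug database entry could record the trigger conditions and derive the score. The class and field names are illustrative assumptions, not taken from any particular tool.

```python
from dataclasses import dataclass, field

@dataclass
class BugRecord:
    """One bug database entry: the independent conditions needed to hit it."""
    title: str
    events: list = field(default_factory=list)

    @property
    def complexity(self) -> int:
        # The bug score is simply the number of independent trigger conditions.
        return len(self.events)

# The cache hazard example above scores 4.
cache_bug = BugRecord(
    title="Data loss when an eviction races with a reload of the same line",
    events=[
        "line @A valid and dirty in the cache",
        "load @B evicts line @A",
        "another load to @A starts",
        "write bus slower than read: load @A completes before the eviction",
    ],
)
assert cache_bug.complexity == 4
```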

Classifying processor bugs

To measure the complexity of a bug, we can come up with a classification that will be used by the entire processor verification team. In a previous blog post, we discussed 4 types of bugs and explained how we use these categories to improve the quality of our testbench and verification. Let’s go one step further and combine this method with bug complexity.

An easy bug requires between 1 and 3 events to be triggered: the first simple test that exercises the feature fails. A corner case is going to need 4 or more events.

Going back to our example above, we have a bug with a score of 4. If one of the four conditions is not present, then the bug is not hit.

A constrained-random testbench will need several features to be able to hit the example above. The sequence of addresses should be smart enough to reuse addresses from previous requests, and the delays on the external buses should be sufficiently atypical to produce fast reads and slow enough writes.
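
As an illustration, the sketch below shows those two features, address reuse and skewed bus delays, in Python. The class, parameter names, and delay values are assumptions made up for the example, not part of any particular testbench.

```python
import random

class AddressSequencer:
    """Generate addresses that occasionally revisit recently used cache lines."""

    def __init__(self, reuse_probability=0.4, history_size=8):
        self.reuse_probability = reuse_probability
        self.history_size = history_size
        self.history = []

    def next_address(self) -> int:
        # Replaying a recent address makes evictions and reloads of the same
        # line (events 1-3 of the example) far more likely.
        if self.history and random.random() < self.reuse_probability:
            addr = random.choice(self.history)
        else:
            addr = random.randrange(0, 1 << 32, 64)  # new 64-byte-aligned line
        self.history = (self.history + [addr])[-self.history_size:]
        return addr

def bus_delays():
    # Fast reads and occasionally very slow writes (event 4 of the example).
    read_delay = random.randint(1, 4)
    write_delay = random.choice([2, 4, 8, 64])  # rare, atypically long writes
    return read_delay, write_delay
```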

A hidden case will need even more events. Perhaps a more subtle bug has the same conditions as our example, but it only happens when an ECC error is discovered on the cache, at the exact same time as an interrupt happens, and only when the core finishes an FPU operation that results in a divide-by-zero error. With typical random testbenches, the probability to have all these conditions together is extremely low, making it a “hidden” bug.

Making these hidden bugs more reachable in the testbench improves the quality of verification: it turns hidden cases into corner cases.
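
A back-of-the-envelope calculation shows what this means in practice. The numbers below are invented purely for illustration: if each of four independent rare conditions can be biased to occur more often, their joint occurrence moves from effectively unreachable to something a soak run can hit.

```python
# Invented numbers, purely for illustration. Four independent rare conditions
# each occurring once every 10,000 cycles coincide about once every 1e16
# cycles - effectively never in simulation. Biasing each one up to once every
# 100 cycles brings the combination within reach of a nightly soak run.

hidden = (1 / 10_000) ** 4   # ~1e-16 per cycle: a hidden case
corner = (1 / 100) ** 4      # ~1e-8  per cycle: now a reachable corner case

print(f"hidden case, joint probability per cycle: {hidden:.0e}")
print(f"biased case, joint probability per cycle: {corner:.0e}")
```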

Analyzing the complexity of a bug helps improve processor quality

This classification does not have any limit. Experience has shown me that a testbench capable of finding bugs with a score of 8 or 9 is a strong simulation testbench and is key to delivering quality RTL. From what I have seen, today the most advanced simulation testbenches can find bugs with a complexity level up to 10.  Fortunately, the use of formal verification makes it much easier to find bugs that have an even higher complexity, paving the way to even better design, and giving clues about what to improve in simulation.

Using bug complexity to improve the quality of a verification testbench

This classification and methodology are useful only if they are applied from the moment verification starts and throughout the project development, for 2 reasons:

  1. Bugs must be fixed as they are discovered. Leaving a level 2 or 3 bug unfixed means that a lot of failures happen when launching large soak testing. Statistically, a similar bug (from the same squadron) that requires more events could go unnoticed.
  2. Bug complexity is used to improve and measure the quality of a testbench. As the level of complexity matches the number of events required to trigger the bug, the higher the complexity score, the harder the testbench is stressing the design. Keeping track of and analyzing the events that triggered a bug is very useful for understanding how to tune random constraints or how to create a new functional coverage point, as in the sketch below.
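
As a sketch of point 2, here is one simple way a team could log bugs together with their complexity scores and watch how the scores evolve. The data and field names are invented for illustration.

```python
from collections import Counter

# Invented bug log: (week the bug was found, complexity score).
bug_log = [(1, 2), (2, 3), (4, 4), (7, 5), (9, 5), (12, 7), (15, 8)]

histogram = Counter(score for _, score in bug_log)
for score in sorted(histogram):
    print(f"complexity {score}: {'#' * histogram[score]}")

# A testbench that still finds only low-score bugs late in the project is not
# stressing the design enough; scores rising over time suggest the random
# constraints and functional coverage points are being tuned in the right
# direction.
```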

Finally, by combining this approach with our methodology that consists of hunting bugs flying in squadrons, we ensure high-quality verification that helps us be confident that we are going beyond the verification sign-off criteria.

Philippe Luc



Improve Your Verification Methodology: Hunt Bugs Flying in Squadrons


March 14, 2022

Creating a quality RISC-V processor requires a verification methodology that enforces the highest standards. In this article, Philippe Luc, Director of Verification at Codasip, explains the methodology that is adopted at Codasip to bring processor verification to the next level.

After analyzing bugs on several generations of CPUs, I came to the conclusion that “bugs fly in squadrons”. In other words, when a bug is found in a given area of the design, the probability that there are other bugs with similar conditions, in the same area of the design, is quite high.

Processor bugs don’t fly alone

Finding a CPU bug is always satisfying; however, it should not be an end in itself. If we consider that bugs do not fly alone but rather fly in groups – or squadrons – finding one bug should be a hint for the processor verification team to search for more of them in the same area.

Here is a scenario. A random test found a bug after thousands of hours of testing. We could ask ourselves: how did it find this bug? The answer is likely to be a combination of events that had not been encountered before. Another question could be: why did the random test find this bug? It would most likely be due to an external modification: a change in a test parameter, an RTL modification, or a simulator modification, for example.

With this new, rare bug found, we know that we have a more capable testbench that can now test a new area of the design. However, we also learn that, before the testbench was improved, that area of the design was not being stressed. If we consider that bugs fly in squadrons, it means we have a new area of the design to explore further to find more bugs. How are we going to improve our verification methodology?

Using smart-random testing improves your verification

To improve our testbench and hit these bugs, we can add checkers and assertions, and we can add tests. Let’s focus on testing.

To enlarge the scope so that we are confident we will hit these bugs, we use smart-random testing. Reproducing the bug with a directed testing approach hits only that exact bug. However, we said that bugs fly in groups, and the probability that there are other bugs in the same area, with similar conditions, is high. The idea is therefore to enlarge our scope. Pure random testing will not be as useful in this case either, because we already have an idea of what we want to target, following the squadron pattern.

Let’s assume that the bug was found on a particular RISC-V instruction. Can we improve our testing by increasing the probability of having this instruction tested? At first glance, probably, because statistically you get more failures exposing the same bug. However, most bugs are found with a combination of rare events: a stalled pipeline, a full FIFO, or some other microarchitectural implementation detail. Standard testbenches can easily tune the probability of an instruction by simply changing a test parameter. But making a FIFO full is not directly accessible from a test parameter; it is a combination of other independent parameters (such as delays) that makes the FIFO full more often.

Using smart-random testing in our verification methodology allows us to be both targeted and broad enough to efficiently find more bugs in this newly discovered area. It consists in tuning the test so that the other events that trigger the bug are activated more often. In other words, it means adjusting several parameters of the test, not just one. It may seem more time consuming, but this methodology is really efficient at improving the quality of our testing.
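
Here is a minimal sketch of that idea in Python: several independent knobs are biased together so that the microarchitectural state behind the bug, a full FIFO in this example, occurs much more often. The knob names and values are assumptions made up for the example.

```python
import random

DEFAULT_KNOBS = {
    "mem_read_delay_max": 4,
    "mem_write_delay_max": 4,
    "interrupt_rate": 0.001,
    "branch_density": 0.10,
}

# Tuned after a bug was found around the write/fill FIFO: no single knob says
# "fill the FIFO", but together these make that state much more frequent.
SQUADRON_KNOBS = {
    "mem_read_delay_max": 2,    # faster reads...
    "mem_write_delay_max": 32,  # ...much slower writes, so the FIFO backs up
    "interrupt_rate": 0.01,     # more asynchronous events in the window
    "branch_density": 0.25,     # more pipeline restarts around the FIFO
}

def generate_test(knobs, length=1_000, seed=None):
    """Produce one randomized test configuration from a set of knobs."""
    rng = random.Random(seed)
    return {
        "seed": seed,
        "knobs": dict(knobs),
        "read_delays": [rng.randint(1, knobs["mem_read_delay_max"])
                        for _ in range(length)],
        "write_delays": [rng.randint(1, knobs["mem_write_delay_max"])
                         for _ in range(length)],
    }

# Keep the broad soak running with DEFAULT_KNOBS, and launch a targeted
# campaign with SQUADRON_KNOBS in the area where the first bug was found.
```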

Improving testbenches by following bug squadrons, and killing each of them during product development, is key. This is exactly what the Codasip verification teams do to offer best-in-class quality RISC-V processors to our customers.

Philippe Luc



4 Types of CPU Bug You Should Pay Attention To


March 7, 2022

Philippe Luc, Director of Verification at Codasip, shares his view on what bugs verification engineers should pay attention to.

Did you know that between 1,000 and 2,000 bugs can appear during the design of a complex processor core? Really, a thousand bugs? Well, that is what experience has shown us. And not all bugs are born equal: their importance and consequences can vary significantly. Let’s go through 4 categories of CPU bugs, how to find them, and what the consequences would be for the user if we did not find them.

Type 1: the processor bug that verification engineers can easily find

“Oh, I forgot the semicolon”. Yes, that is one bug. Very easy to detect, it is typically one you find directly at compile time. Apart from having your eyes wide-open, there is nothing else to do to avoid these.

“Oh, it turns out that a part of the specification has not been implemented”. That is another easy CPU bug for you to find with any decent testbench – provided that an explicit test exists. In this scenario, the first simple test exercising the feature will fail. What does your processor verification team need to do? Make sure you have exhaustive tests. The design team, on the other hand, needs to make an effort to carefully read the specifications, and follow any changes in the specification during the development.

In other words, the easy bug is one that is found simply by running a test that exercises the feature. Its (bad) behavior is systematic, not a timing condition. Being exhaustive in your verification is the key to finding such CPU bugs. Code coverage will help you but is definitely not enough. If a feature is not coded in the RTL, how can coverage report that it is missing? A code review – with the specification at hand – definitely helps.

Type 2: the corner case that verification teams like to find

A corner case CPU bug is more complex to find and requires a powerful testbench. The simple test cases that exercise the feature pass correctly, even with random delays. Quite often, you find these bugs when asynchronous events join the party. For example, an interrupt arriving just between 2 instructions, at a precise timing. Or a cache line that gets evicted just when the store buffer wants to merge into it. To reach these bugs, you need a testbench that juggles instructions, parameters, and delays so that all the possible interleavings of instructions and events are exercised. Obviously, a good checker should spot any deviation from what is expected.
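
As a small illustration of that juggling, the sketch below randomizes the cycle at which an interrupt fires within a short instruction window, so that over many runs every relative timing between the interrupt and the surrounding instructions is eventually exercised. The function and parameter names are hypothetical.

```python
import random

def interrupt_schedules(window_cycles=20, runs=1_000, seed=0):
    """Spread interrupt arrival over every cycle of a short window."""
    rng = random.Random(seed)
    schedules = []
    for _ in range(runs):
        schedules.append({
            # Anywhere in the window, including right between two instructions.
            "irq_cycle": rng.randrange(window_cycles),
            "irq_kind": rng.choice(["timer", "external", "software"]),
        })
    return schedules
```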

Does code coverage help in that case? Unfortunately not. Simply because the condition of the bug is a combination of several events that are already covered individually. Here, condition coverage or branch coverage might be helpful. But it is painful to analyze and it is rarely beneficial in the end.

Type 3: The hidden CPU bug found by accident – or by customers

The hidden bugs are found by customers (which is bad), or by chance (internally, before release). In both cases, it means that the verification methodology was not able to find them.

If you use different testbenches or environments, you could find other cases just because the stimuli are different. Fair enough. Then, what do we mean by “found by chance”? Here we reach the limit of the random testbench methodology.

With random stimuli, the testbench usually generates the “same” thing. If you roll a die to get a random number, there is very little chance of getting the number 6 ten times in a row: one chance in about 60 million, to be precise. With a RISC-V CPU that has 100 different instructions, an (equiprobable) random instruction generator has only 1 chance in 10²⁰ of generating the same instruction 10 times in a row; 10²⁰ is just over twice the number of different positions of a Rubik’s Cube… Yet on a 10-stage pipeline processor, it is not unreasonable to want to test it with the same instruction present in every pipeline stage. Good luck if you don’t tune your random constraints…
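
To illustrate what tuning those constraints can look like, here is a minimal sketch of an instruction stream generator that occasionally re-issues the previous instruction. The instruction names and the repeat probability are made-up assumptions, not values from a real generator.

```python
import random

INSTRUCTIONS = [f"insn_{i}" for i in range(100)]  # stand-in for ~100 opcodes

def equiprobable_stream(n):
    """Baseline: every instruction drawn independently and uniformly."""
    return [random.choice(INSTRUCTIONS) for _ in range(n)]

def repeat_biased_stream(n, repeat_probability=0.3):
    """Sometimes re-issue the previous instruction instead of drawing fresh."""
    seq = [random.choice(INSTRUCTIONS)]
    for _ in range(n - 1):
        if random.random() < repeat_probability:
            seq.append(seq[-1])
        else:
            seq.append(random.choice(INSTRUCTIONS))
    return seq

# With repeat_probability = 0.3, a run of 10 identical instructions appears
# with probability ~0.3**9 (about 2 in 100,000 starting points), instead of
# essentially never with the equiprobable generator.
```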

Type 4: The silly bug that would not happen in real life

You can take looking for corner cases and hidden cases too far and end up creating tests that are simply too silly.

Changing the endianness back and forth every cycle while connecting the debugger is probably something that will never, ever happen on a consumer product. If the consequences of a CPU bug are never visible to a customer, then it is not really a bug. If you deliberately unplug your USB stick while you copy a file, and the file is corrupted, I consider this not a bug. If some operation causes the USB controller to hang, then yes, that is a bug.

Beware of extending the scope of the verification too far. When silly cases are found, you are probably investing engineering effort in the wrong place.

There are different verification techniques you can apply to efficiently find CPU bugs before your customers do. At Codasip, we use multiple component testbenches, various random test generators, random irritators, and several other techniques to verify our products. As the project evolves, we develop these techniques to have a robust verification methodology. Learn more in our blog post where we explain how we continuously improve our verification methodology.

Philippe Luc



Why Codasip Cares About Processor Verification – and Why you Should too


February 28, 2022

Finding a hardware bug in silicon has consequences. The severity of these consequences for the end user can depend on the use case. For the product manufacturer, fixing a bug once a design is in mass-production can incur a significant cost. Investing in processor verification is therefore fundamental to ensure quality. This is something we care passionately about at Codasip, here is why you should too.

Luckily for the semiconductor industry, there are statistically more bugs in software than in hardware, and in processors in particular. However, software can easily be upgraded over the air, directly in the end products used by consumers. With hardware, on the other hand, this is not as straightforward, and a hardware issue can have severe consequences. The quality of our deliverables, which will end up in real silicon, seriously matters.

Processors all have different quality requirements

Processors are ubiquitous. They control the flash memory in your laptop, the braking system of your car or the chip on your credit card. These CPUs have different performance requirements but also different security and safety requirements. In other words, different quality requirements.

Is it a major issue if the Wi-Fi chip in your laptop misses a few frames? The Wi-Fi protocol retransmits the packet and it goes largely unnoticed. If your laptop’s SSD controller drops a few packets and corrupts the document you have been working on all day, it will be a serious disruption to your work: there may be some shouting, but you will recover. It’s a bug that you might be able to accept.

Other hardware failures have much more severe consequences: What if your car’s braking system fails because of a hardware issue? Or the fly-by-wire communication in a plane fails? Or what if a satellite falls to earth because its orbit control fails? Some bugs and hardware failures are simply not acceptable.

Processor quality and therefore its reliability is the main concern of processor verification teams. And processor verification is a subtle art.

The subtle art of processor verification

Processor verification requires strategy, diligence and completeness.

Verifying a processor means taking uncertainty into account. What software will run on the end product? What will be the use cases? What asynchronous events could occur? These unknowns mean significantly opening the verification scope. However, it is impossible to cover the entire processor state space, and it is not something to aim for.

Processor quality must be ensured while making the best use of time and resources. At the end of the day, the ROI must be positive. Nobody wants to find costly bugs after the product release, and nobody wants to delay a project because of an inefficient verification strategy. Doing smart processor verification means finding relevant bugs efficiently and as early as possible in the product development.

In other words, processor verification must:

  • Be done during processor development, in collaboration with design teams. Verifying a processor design once it is finalized is not enough. Verification must drive the design, to some extent.
  • Be a combination of all industry standard techniques. There are different types of bugs of different levels of complexity that you might find using random testing, code coverage, formal proofs, power-aware simulation, etc. Using multiple techniques allows you to maximize the potential of each of them (what we could also call the Swiss cheese model) and efficiently cover as many corner cases as possible.
  • Be an ongoing science. What works is a combination of techniques that evolve as processors become more complex. We develop different approaches as we learn from previous bugs and designs to refine our verification methodology and offer best-in-class quality IP.

Processor quality is fundamental. The art of verifying a processor is a subtle one that is evolving as the industry is changing and new requirements arise. At Codasip, we put in place verification methodologies that allow us to deliver high-quality RISC-V customizable processors. With Codasip Studio and associated tools, we provide our customers with the best technology that helps them follow up and verify their specific processor customization.

Philippe Luc



Codasip and Avery Partner to Improve Regression Test Methodology of RISC-V Processors


November 8, 2017

Brno, Czech Republic – November 8th 2017 – Codasip, the leading supplier of RISC-V® embedded CPU cores, today announced its partnership with Avery Design Systems, the provider of cutting-edge verification intellectual property (VIP) solutions for SoC and IP companies.

Codasip develops licensable RISC-V processors, the Berkelium (Bk) series, via a unique customization tool called Codasip Studio, allowing for fast configuration and optimization of the cores. Studio enables a practically endless number of RISC-V variants, which places extensive demands on verification.

“With the flexibility of Codasip Studio, extensive verification becomes essential, and we are constantly on the lookout for innovative VIP solutions that will make a part of the verification process faster, easier, or more reliable,” says Marcela Zachariášová, the VP of Verification at Codasip. “Avery Design Systems offer some very useful features.”

Specifically, Codasip employs the Avery VIP fault injection feature, which introduces random or precisely-planned faults into the communication lines between the processor and the surrounding components. This allows simulation of unexpected corner cases. Such stress testing is vital to ensure that the processors are robust and reliable even when faults occur.

“We need to ensure that all variants of our RISC-V processors handle error scenarios correctly and can respond to any type of error from the surrounding components without crashing or freezing. Avery’s fault injection helps us analyze such scenarios in our cores,” explains Mrs. Zachariášová.

Codasip, both on its own and with the assistance of industry alliances, has introduced innovations in the field of verification, achieving best-in-class results in verification automation and acceleration. The recent introduction of Avery’s fault injection technology has helped to further improve Codasip’s regression methodology.

“We have partnered with Avery because of unique benefits their fault injection technology brings,” concludes Mrs. Zachariášová.

About Codasip

Codasip delivers leading-edge processor IP and high-level design tools that provide ASIC designers with all the advantages of an open standard, such as the RISC-V ISA, along with the unique ability to automatically optimize the processor IP. As a founding member of the RISC-V Foundation and a long-term supplier of LLVM and GNU based processor solutions, Codasip is committed to open standards for embedded processors.

Formed in 2006 and headquartered in Brno, Czech Republic, Codasip currently has offices in the US and Europe, with representatives in Asia and Israel.

For more information about Codasip’s products and services, visit codasip.com.

About Avery

Founded in 1999, Avery Design Systems, Inc. enables system and SoC design teams to achieve dramatic functional verification productivity improvements through formal analysis applications for RTL and gate-level X verification, and robust verification IPs for PCI Express, USB, AMBA, UFS, MIPI, DDR/LPDDR, HBM, HMC, ONFI/Toggle, NVM Express, SCSI Express, SATA Express, eMMC, SD/SDIO, Unipro, CSI/DSI, Soundwire, and CAN FD standards.

Avery is headquartered in Tewksbury, Massachusetts, and operates an R&D center in Taipei, Taiwan. Avery’s products are directly marketed and distributed in the US, Europe, Japan, Korea, and Taiwan.

For further information about Avery Design Systems, visit www.avery-design.com.

