Measuring the complexity of processor bugs to improve testbench quality


April 4, 2022

I am often asked “When is processor verification done?” or, in other words, “How do I measure the efficiency of my testbench, and how can I be confident in the quality of the verification?”. There is no easy answer. There are several common indicators used in the industry, such as coverage and the bug curve. While they are absolutely necessary, they are not enough to reach the highest possible quality. Indeed, such indicators do not really reveal the ability of a verification methodology to find the last bugs. With experience, I have learned that measuring the complexity of processor bugs is an excellent indicator to use throughout the development of the project.

What defines the complexity of a processor bug and how to measure it?

Experience taught me that we can define the complexity of a bug by counting the number of independent events or conditions that are required to hit the bug.

What do we consider an event?

Let’s take a simple example. A typical bug is found in the caches when a required hazard is missing. Data corruption can occur when:

  1. A cache line at address @A is Valid and Dirty in the cache.
  2. A load at address @B causes an eviction of line @A.
  3. Another load at address @A starts.
  4. The external write bus is slower than the read, so the load @A completes before the end of the eviction.

External memory returns the previous data because the most recent data from the eviction got lost, causing data corruption.
In this example, 4 events – or conditions – are required to hit the bug. These 4 events give the bug a score of 4, or in other words a complexity of 4.
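
To make the scenario concrete, here is a minimal sketch, in Python, of a toy single-entry write-back cache with exactly this missing hazard. It is purely illustrative: the class, addresses, and values are invented for this post, not taken from any real design or testbench. The stale value is returned only when all 4 conditions line up.

```python
class ToyCache:
    """Toy single-entry write-back cache with a deliberately missing address hazard."""

    def __init__(self):
        self.memory = {"A": 0, "B": 0}   # backing store
        self.line = None                 # resident line: (addr, data, dirty)
        self.writeback_queue = []        # evictions still in flight on the write bus

    def drain_writebacks(self):
        for addr, data in self.writeback_queue:
            self.memory[addr] = data
        self.writeback_queue.clear()

    def store(self, addr, data):
        self.load(addr)                  # allocate the line
        self.line = (addr, data, True)   # condition 1: line valid and dirty

    def load(self, addr, slow_writeback=True):
        if self.line and self.line[0] == addr:
            return self.line[1]          # hit
        if self.line:
            vaddr, vdata, vdirty = self.line
            if vdirty:
                self.writeback_queue.append((vaddr, vdata))   # condition 2: eviction starts
        if not slow_writeback:
            self.drain_writebacks()      # write-back wins the race: no bug
        # Conditions 3 and 4: the reload is never checked against the in-flight
        # write-back (the missing hazard), so stale data is fetched from memory.
        data = self.memory[addr]
        self.line = (addr, data, False)
        return data

c = ToyCache()
c.store("A", 42)       # 1. line @A is Valid and Dirty
c.load("B")            # 2. load @B evicts @A; the write-back is slow (condition 4)
print(c.load("A"))     # 3. another load @A completes first: prints 0 (stale), not 42
# With c.load("B", slow_writeback=False) the eviction completes first and the reload returns 42.
```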

Classifying processor bugs

To measure the complexity of a bug, we can come up with a classification that will be used by the entire processor verification team. In a previous blog post, we discussed 4 types of bugs and explained how we use these categories to improve the quality of our testbench and verification. Let’s go one step further and combine this method with bug complexity.

An easy bug requires between 1 and 3 events to be triggered: the first simple test that exercises the feature fails. A corner case is going to need 4 or more events.

Going back to our example above, we have a bug with a score of 4. If one of the four conditions is not present, then the bug is not hit.

A constrained random testbench will need several features to be able to hit the example above: the sequence of addresses should be smart enough to reuse addresses from previous requests, and the delays on external buses should be atypical enough to produce fast reads and sufficiently slow writes.
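
As an illustration only, here is what those two features could look like in a simple stimulus generator. The probabilities, window size, and helper names are invented for this sketch; a real constrained random testbench would express them as randomization constraints.

```python
import random

recent_addresses = []   # addresses seen in earlier requests

def next_address(reuse_probability=0.4):
    """Reuse a recently seen address often enough to provoke evict/reload races."""
    if recent_addresses and random.random() < reuse_probability:
        return random.choice(recent_addresses)
    addr = random.randrange(0, 2**32, 64)        # new cache-line-aligned address
    recent_addresses.append(addr)
    if len(recent_addresses) > 16:
        recent_addresses.pop(0)                  # keep only a small reuse window
    return addr

def bus_delays():
    """Skewed delays: reads stay fast, write-backs are occasionally very slow."""
    read_delay = random.randint(0, 3)
    write_delay = random.choice([1, 2, 4, 8, 32])   # heavy tail for slow writes
    return read_delay, write_delay
```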

A hidden case will need even more events. Perhaps a more subtle bug has the same conditions as our example, but it only happens when an ECC error is discovered in the cache at the exact same time as an interrupt arrives, and only when the core finishes an FPU operation that results in a divide-by-zero error. With typical random testbenches, the probability of having all these conditions together is extremely low, making it a “hidden” bug.

Making these hidden bugs reachable in the testbench improves the quality of verification: it turns hidden cases into corner cases.
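
A back-of-the-envelope calculation shows why this matters. The per-cycle probabilities below are invented for illustration; the point is only that the joint probability of independent rare conditions multiplies, so biasing each condition individually pays off multiplicatively.

```python
from math import prod

# Seven independent conditions that must coincide to hit a hypothetical hidden bug.
default_probs = [1e-2, 1e-2, 1e-3, 1e-3, 1e-2, 1e-3, 1e-4]   # plain random stimulus
tuned_probs   = [1e-1, 1e-1, 1e-2, 1e-2, 1e-1, 1e-2, 1e-3]   # each condition biased ~10x

print(f"default joint probability per cycle: {prod(default_probs):.0e}")   # 1e-19
print(f"tuned joint probability per cycle:   {prod(tuned_probs):.0e}")     # 1e-12
# Biasing 7 conditions by 10x each makes the combination ten million times more likely.
```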

Analyzing the complexity of a bug helps improve processor quality

This classification has no upper limit. Experience has shown me that a testbench capable of finding bugs with a score of 8 or 9 is a strong simulation testbench and is key to delivering quality RTL. From what I have seen, today the most advanced simulation testbenches can find bugs with a complexity level of up to 10. Fortunately, the use of formal verification makes it much easier to find bugs of even higher complexity, paving the way to even better design and giving clues about what to improve in simulation.

Using bug complexity to improve the quality of a verification testbench

This classification and methodology are useful only if they are applied from the moment verification starts and throughout the project's development, for 2 reasons:

  1. Bugs must be fixed as they are discovered. Leaving a level 2 or 3 bug unfixed means that a lot of failures happen when launching large soak testing. Statistically, a similar bug (from the same squadron) that requires more events could go unnoticed.
  2. Bug complexity is used to improve and measure the quality of a testbench. Since the complexity level corresponds to the number of events required to trigger the bug, the higher the complexity scores of the bugs being found, the harder the testbench is stressing the design. Keeping track of and analyzing the events that triggered each bug is very useful for understanding how to tune random constraints or to create a new functional coverage point, as sketched below.
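
Here is one possible shape for that bookkeeping, with made-up bug IDs and condition names. The complexity score is simply the number of recorded conditions, and each recorded condition set is a natural candidate for a new functional coverage point.

```python
# Hypothetical bug log: each entry lists the independent conditions needed to hit the bug.
bug_log = [
    {"id": "CACHE-101", "conditions": ["line dirty", "evicting load", "reload of same line",
                                       "write-back slower than read"]},
    {"id": "LSU-042",   "conditions": ["store buffer full", "unaligned store"]},
]

for bug in bug_log:
    bug["complexity"] = len(bug["conditions"])          # score = number of conditions

print("highest complexity found so far:", max(b["complexity"] for b in bug_log))   # 4
# Each condition set is also a candidate cover point: cover the conjunction so that
# future regressions hit the same area deliberately rather than by luck.
```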

Finally, by combining this approach with our methodology of hunting bugs that fly in squadrons, we ensure high-quality verification that helps us be confident that we are going beyond verification sign-off criteria.

Philippe Luc


Improve Your Verification Methodology: Hunt Bugs Flying in Squadrons


March 14, 2022

Creating a quality RISC-V processor requires a verification methodology that enforces the highest standards. In this article, Philippe Luc, Director of Verification at Codasip, explains the methodology that is adopted at Codasip to bring processor verification to the next level.

After analyzing bugs on several generations of CPUs, I came to the conclusion that “bugs fly in squadrons”. In other words, when a bug is found in a given area of the design, the probability that there are other bugs with similar conditions, in the same area of the design, is quite high.

Processor bugs don’t fly alone

Finding a CPU bug is always satisfying; however, it should not be an end in itself. If we consider that bugs do not fly alone but rather fly in groups – or squadrons – finding one bug should be a hint for the processor verification team to search for more of them in the same area.

Here is a scenario. A random test found a bug after thousands of hours of testing. We could ask ourselves: How did it find this bug? The answer is likely to be a combination of events that had not been encountered before. Another question could be: Why did the random test only find this bug now? It would most likely be due to an external modification: a change in a test parameter, an RTL modification, or a simulator modification, for example.

With this new, rare bug found, we know that we have a more capable testbench that can now test a new area of the design. However, we also learn that, before the testbench was improved, that area of the design was not being stressed. If we consider that bugs fly in squadrons, it means we have a new area of the design to explore further to find more bugs. How are we going to improve our verification methodology?

Using smart-random testing improves your verification

To improve our testbench and hit these bugs, we can add checkers and assertions, and we can add tests. Let’s focus on testing.

To enlarge the scope so that we are confident we will hit these bugs, we use smart-random testing. When reproducing this bug with a directed testing approach, only the exact same bug is hit. However, we said that bugs fly in groups and that the probability that there are other bugs in the same area, with similar conditions, is high. The idea is then to enlarge our scope. Pure random testing will not be as useful in this case, because we already have an idea of what we want to target, following the squadron pattern.

Let’s assume that the bug was found on a particular RISC-V instruction. Can we improve our testing by increasing the probability of having this instruction tested? At first glance, probably, because statistically you get more failures exposing the same bug. However, most bugs are found with a combination of rare events: a stalled pipeline, a full FIFO, or some other microarchitectural implementation detail. Standard testbenches can easily tune the probability of an instruction by simply changing a test parameter. But making a FIFO full is not directly accessible from a test parameter; it is a combination of other, independent parameters (such as delays) that makes the FIFO full more often.
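
The toy simulation below illustrates that last point with invented numbers: there is no "make the FIFO full" knob, only push-side and pop-side stall probabilities, yet biasing those two independent knobs changes how often the FIFO is actually full by orders of magnitude.

```python
import random

def full_fraction(cycles, push_stall_prob, pop_stall_prob, depth=8):
    """Fraction of cycles an 8-deep FIFO spends completely full."""
    occupancy, full_cycles = 0, 0
    for _ in range(cycles):
        if occupancy < depth and random.random() > push_stall_prob:
            occupancy += 1                    # upstream delivers an entry this cycle
        if occupancy > 0 and random.random() > pop_stall_prob:
            occupancy -= 1                    # downstream accepts an entry this cycle
        full_cycles += (occupancy == depth)
    return full_cycles / cycles

# Default-ish delays: the consumer is faster than the producer, so the FIFO is almost never full.
print(full_fraction(100_000, push_stall_prob=0.5, pop_stall_prob=0.2))
# Biased delays: a busy producer plus a slow consumer keeps the FIFO full for much of the run.
print(full_fraction(100_000, push_stall_prob=0.1, pop_stall_prob=0.7))
```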

Using smart-random testing in our verification methodology allows us to be both targeted and broad enough to efficiently find more bugs in this newly discovered area. It consists of tuning the test to activate, more often, the other events that trigger the bug. In other words, it means adjusting several parameters of the test, not just one. It may seem more time-consuming, but this methodology is really efficient in terms of improving the quality of our testing.

Improving testbenches by following bug squadrons, and killing each squadron during product development, is key. This is exactly what the Codasip verification teams do to offer best-in-class quality RISC-V processors to our customers.

Philippe Luc


4 Types of CPU Bug You Should Pay Attention To


March 7, 2022

Philippe Luc, Director of Verification at Codasip, shares his view on what bugs verification engineers should pay attention to.

Did you know that between 1,000 and 2,000 bugs can appear during the design of a complex processor core? Really, a thousand bugs? Well, that’s what experience showed us. And not all bugs are born equal: their importance and consequences can vary significantly. Let’s go through 4 categories of CPU bugs, how to find them, and what the consequences would be for the user if we did not find them.

Type 1: the processor bug that verification engineers can easily find

“Oh, I forgot the semicolon”. Yes, that is one bug. Very easy to detect, it is typically one you find directly at compile time. Apart from keeping your eyes wide open, there is nothing else to do to avoid these.

“Oh, it turns out that a part of the specification has not been implemented”. That is another easy CPU bug for you to find with any decent testbench – provided that an explicit test exists. In this scenario, the first simple test exercising the feature will fail. What does your processor verification team need to do? Make sure you have exhaustive tests. The design team, on the other hand, needs to make an effort to carefully read the specifications, and follow any changes in the specification during the development.

In other words, the easy bug is one that is found simply by running a test that exercises the feature. Its (bad) behavior is systematic, not a timing condition. Being exhaustive in your verification is the key to finding such CPU bugs. Code coverage will help you but is definitely not enough. If a feature is not coded in the RTL, how can coverage report that it is missing? A code review – with the specification at hand – definitely helps.

Type 2: the corner case that verification teams like to find

A corner case CPU bug is more complex to find and requires a powerful testbench. The simple test cases that exercise the feature pass correctly, even with random delays. Quite often, you find these bugs when asynchronous events join the party: for example, an interrupt arriving just between 2 instructions, at a precise timing, or a cache line being evicted just as the store buffer wants to merge into it. To reach these bugs, you need a testbench that juggles the instructions, the parameters, and the delays so that all possible interleavings of instructions and events are exercised. Obviously, a good checker should spot any deviation from what is expected.
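
One way to produce those interleavings, shown here as a rough sketch with hypothetical testbench hooks (tb.run, interrupt_at, and evict_line_at are placeholders, not a real API), is to sweep the asynchronous events over every cycle offset of a short instruction window rather than firing them at one "typical" time.

```python
from itertools import product
import random

WINDOW = 8    # cycles spanned by a short instruction sequence under test

def corner_case_schedules():
    """Yield (interrupt_cycle, eviction_cycle) pairs covering every pairing within the window."""
    schedules = list(product(range(WINDOW + 1), repeat=2))
    random.shuffle(schedules)      # exhaustive set of offsets, visited in random order
    yield from schedules

# for irq_cycle, evict_cycle in corner_case_schedules():
#     tb.run(program, interrupt_at=irq_cycle, evict_line_at=evict_cycle)   # hypothetical hooks
```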

Does code coverage help in that case? Unfortunately not. Simply because the condition of the bug is a combination of several events that are already covered individually. Here, condition coverage or branch coverage might be helpful. But it is painful to analyze and it is rarely beneficial in the end.

Type 3: The hidden CPU bug found by accident – or by customers

The hidden bugs are found by customers (which is bad), or by chance (internally, before release). In both cases, it means that the verification methodology was not able to find them.

If you use different testbenches or environments, you could find other cases just because the stimuli are different. Fair enough. Then, what do we mean by “found by chance”? Here we reach the limit of the random testbench methodology.

With random stimuli, the testbench usually generates the “same” thing. If you roll a die to get a random number, the chances of getting the number 6 ten times in a row are very small: one chance in about 60 million, to be accurate. With a RISC-V CPU that has 100 different instructions, an (equiprobable) random instruction generator has only 1 chance in 10²⁰ of generating the same instruction 10 times in a row, which is just over twice the number of different positions of a Rubik’s Cube… Yet on a 10-stage pipeline processor, it is not unreasonable to test it with the same instruction present in every pipeline stage. Good luck if you don’t tune your random constraints…
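
The arithmetic is easy to check (plain probability, nothing processor-specific):

```python
p_die  = (1 / 6) ** 10        # ten 6s in a row with a fair die
p_insn = (1 / 100) ** 10      # the same instruction 10 times in a row, uniform over 100 instructions

print(f"1 chance in {1 / p_die:,.0f}")   # 1 chance in 60,466,176
print(f"1 chance in {1 / p_insn:.0e}")   # 1 chance in 1e+20
```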

Type 4: The silly bug that would not happen in real life

You can take looking for corner cases and hidden cases too far and end up creating tests that are simply too silly.

Changing the endianness back and forth every cycle while connecting the debugger is probably something that will never ever happen on a consumer product. If the consequences of a CPU bug are never visible to a customer, then it is not really a bug. If you deliberately unplug your USB stick while you copy a file, and the file is corrupted, I consider this not a bug. If some operation causes the USB controller to hang, then yes, that is a bug.

Beware of extending the scope of verification too far. When silly cases are found, you are probably investing engineering effort in the wrong place.

There are different verification techniques you can apply to efficiently find CPU bugs before your customers do. At Codasip, we use multiple component testbenches, various random test generators, random irritators, and several other techniques to verify our products. As the project evolves, we develop these techniques to have a robust verification methodology. Learn more in our blog post where we explain how we continuously improve our verification methodology.

Philippe Luc
