Creating specialized architectures with processor design automation
August 29, 2022
With semiconductor scaling slowing down if not failing, SoC designers are challenged to find ways of meeting the demand for greater computational performance. In their 2018 Turing lecture, Hennessy & Patterson pointed out that new methods are needed to work around failing scaling and predicted ‘A Golden Age for Computer Architecture’. A key approach in addressing this challenge is to innovate architecturally and to create more specialized processing units – domain-specific processors and accelerators.
If you haven’t already, I recommend reading our white paper on semiconductor scaling and what is next for processors.
Specialized processor cores are, by definition, going to vary a lot depending on their workload. Some may be readily developed by customizing existing RISC-V processor cores. However, that approach will not work in every instance and sometimes developing a novel architecture may be necessary. In such cases it will be necessary to explore the instruction set architecture (ISA) and microarchitecture to find a good design solution.
Traditionally custom cores were developed by manually creating an instruction set simulator (ISS), software toolchain, and RTL. This process can be time-consuming and error prone. The alternative is to describe the processor core in a high-level language and to use processor design automation to generate the ISS, software toolchain, RTL and verification environment.
This is exactly what Codasip offers. Codasip Studio is our unique processor design automation toolset. It has been applied to RISC, DSP, and VLIW designs, and we use it for developing our own RISC-V cores. Processors are described using the CodAL architectural language which covers both instruction accurate (IA) and cycle accurate (CA) descriptions.
The first stage in developing a core optimized to a particular application is to define the instruction set. If a RISC-V standard wordlength such as 32 bits or 64 bits is acceptable, then the corresponding base integer instruction set can be a good starting point to save time. However, if a different wordlength such as 8 or 16 bits is chosen, then RISC-V cannot be directly used. Instead, instructions based on the smaller wordlength should be developed, as Google did when developing their 8-bit tensor processing unit (TPU).
Once an initial instruction set is defined, an iterative exploration phase with the SDK in the loop can be undertaken. Rather than using an open-source toolchain or instruction set simulator such as GCC, LLVM or Spike, we recommend describing the instruction set, resources and semantics using our CodAL processor description language. Once the instruction set is modelled, the software toolchain (or SDK) can be automatically generated using the Codasip Studio toolset. The SDK can be used to profile real application software and any hotspots in the software can be identified.
Once hotspots are known, the instruction set can be adjusted to better meet the needs of the computational workload. The updated description in CodAL can be used to generate an SDK for further profiling. This can be repeated through further iterations until the ISA is finalized.
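As a rough illustration of the profiling step, the sketch below counts how often each instruction mnemonic appears in an execution trace and reports the most frequent ones as candidate hotspots. The trace data and the `find_hotspots` helper are hypothetical and are not part of Codasip Studio; a real profiler reports considerably more detail.

```python
from collections import Counter

def find_hotspots(trace, top_n=3):
    """Return the top_n most frequent mnemonics with their share of the trace."""
    counts = Counter(trace)
    total = len(trace)
    return [(mnemonic, count / total) for mnemonic, count in counts.most_common(top_n)]

# Hypothetical trace of executed instruction mnemonics (illustrative data only).
trace = ["lw", "mul", "add", "lw", "mul", "add", "sw", "lw", "mul", "add", "bne", "addi"]

hotspots = find_hotspots(trace)
for mnemonic, share in hotspots:
    print(f"{mnemonic}: {share:.0%} of executed instructions")
```

In a trace like this one, a recurring load, multiply, add pattern might suggest evaluating a fused instruction in the next ISA iteration, and then profiling again.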
With a stable instruction set, the next phase is to develop the microarchitecture using CodAL. Decisions on forms of parallelism (such as multi-issue) or pipeline length or privilege modes need to be taken. This leads to a new iterative phase with the HDK in the loop.
CodAL is used to define a cycle accurate description of the core using the existing instruction set and resources. This in turn can be used to generate a CA simulator, RTL and testbench. The performance of the design can be assessed using the CA simulator, RTL simulation or an FPGA prototype. The RTL can be analyzed for silicon area and timing. If the microarchitecture is not meeting its targets, it can be modified and the HDK regenerated.
The final and essential stage is to verify the final RTL against an IA golden reference. Codasip Studio can generate the UVM verification environment for this. Additionally, Codasip Studio has tools to help with functional coverage and to generate random assembler programs. Users should also use their own directed tests and apply multiple verification strategies.
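The idea of checking RTL against a golden reference can be sketched as a comparison of two execution traces. This is only a toy model: the trace format (program counter, destination register, value written) and the `first_mismatch` helper are assumptions for illustration, not the generated UVM environment.

```python
def first_mismatch(reference_trace, dut_trace):
    """Return the index of the first retired instruction that differs, or None if the traces match."""
    for index, (ref, dut) in enumerate(zip(reference_trace, dut_trace)):
        if ref != dut:
            return index
    # One trace ended early: report the first index missing from the shorter one.
    if len(reference_trace) != len(dut_trace):
        return min(len(reference_trace), len(dut_trace))
    return None

# Each entry: (pc, destination register, value written). Illustrative data only.
golden = [(0x0, "x1", 5), (0x4, "x2", 7), (0x8, "x3", 12)]
rtl = [(0x0, "x1", 5), (0x4, "x2", 7), (0x8, "x3", 13)]

print(first_mismatch(golden, rtl))  # 2
```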
April 29, 2022
Processors all have high quality requirements and their reliability is the main concern of processor verification teams. Providing best-in-class quality products requires a strategic, diligent and thorough approach. Processor verification therefore plays a major role and it takes a combination of all industry standard techniques – like in a Swiss cheese model.
You’ve heard me say this before: processor verification is a subtle art. We need to take into account uncertainty, which means opening the scope of our verification while optimizing resources. On one hand, we want to find all critical bugs before final production, and on the other hand we must have an efficient verification strategy to fulfill time to market requirements. Producing smart processor verification means finding meaningful bugs as efficiently and as early as possible during the development of the product. One way of achieving this consists in combining all industry standard verification techniques. It is by creating redundancy that we find all critical bugs.
There are different types of bugs and each bug has a complexity – or bug score – that depends on the number of events and types of events required to trigger the bugs. Some might be found with coverage, others with formal proofs, etc. Imagine the Swiss cheese model applied to processor verification. Each slice of cheese is a verification technique which has some specific strengths to catch some categories of bugs. The risk of a bug escaping and making it into the end product is mitigated by the different layers and types of verification which are layered behind each other.
In a Swiss cheese model applied to processor verification, the principle is similar to the aviation industry: if there is a direct path going through all the slices, then there is a risk of plane crash. That is why the aviation industry is strict about procedures, checklists, and redundant systems. The objective is to add more slices and reduce the size of the holes on a slice so that in the end, there is no hole going through, and we deliver a quality processor.
By using several slices of cheese, or verification methods:
A hole in a slice is a hole in the verification methodology. The more holes, and the bigger the holes, the more bugs can escape. If the same area of the design (overlapping holes between cheese slices) is not covered and tested by any of the verification techniques, then the bug will make it through and end up in the final deliverables.
A good verification methodology must present as few holes as possible, as small as possible, on each slice. A solid strategy, experience, and efficient communication are important factors to deliver quality products.
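A back-of-the-envelope way to see why stacking slices helps: if each technique misses a given class of bug with some probability, and the misses are assumed independent, the chance of the bug escaping every slice is the product of those probabilities. The miss rates below are invented for illustration.

```python
from math import prod

def escape_probability(miss_rates):
    """Probability that a bug slips past every layer, assuming layers miss independently."""
    return prod(miss_rates)

# Hypothetical per-technique miss rates for one class of bug (illustrative numbers).
layers = {
    "directed tests": 0.5,
    "constrained random": 0.2,
    "formal proofs": 0.1,
    "code review": 0.4,
}

single = escape_probability([layers["constrained random"]])
combined = escape_probability(layers.values())
print(f"one slice: {single:.3f}, four slices: {combined:.3f}")
```

The independence assumption is the optimistic case. When holes overlap, that is, when several techniques share the same blind spot, the real escape probability is higher than this product.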
When we find a bug, or a hole in a slice, during verification, we always fix it and check other slices for similar holes. Every slice should find the holes in the previous one and address them before progressing. Sanity checks are an efficient way to achieve this, for example by comparing our design with industry standard models such as Spike or Imperas.
In the Swiss cheese model applied to processor verification, if one technique is strengthened – an improved testbench, new assertions, etc. – the bug is found and fixed before the product goes into production. All processor verification techniques are important and it is the combination of all of them that makes each of them more efficient.
A single verification technique cannot do everything by itself; it is the action of all of them that improves the overall quality of the verification and processor design. There can be unexpected changes or factors during the development of a product, external actions that can impact the efficiency of a technique: for example, a change in the design not communicated to the verification team, or a difficult Friday afternoon leading to human mistakes. These factors can increase the size of a hole in a slice, hence the importance of having more than one slice – and the importance of keeping engineering specifications up to date and communicating regularly between designers and verification engineers. Code reviews conducted by other team members are one efficient way to achieve this, and that is what we do at Codasip.
At Codasip, we use verification technology and techniques that allow us to create redundancy, preventing holes from going through the pile of cheese slices, and to deliver best-in-class RISC-V processors.
April 4, 2022
I am often asked the question “When is the processor verification done?” or in other words “how do I measure the efficiency of my testbench and how can I be confident in the quality of the verification?”. There is no easy answer. There are several common indicators used in the industry such as coverage and bug curve. While they are absolutely necessary, these are not enough to reach the highest possible quality. Indeed, such indicators do not really unveil the ability of verification methodologies to find the last bugs. With experience, I learned that measuring the complexity of processor bugs is an excellent indicator to use throughout the development of the project.
Experience taught me that we can define the complexity of a bug by counting the number of independent events or conditions that are required to hit the bug.
Let’s take a simple example. A typical bug is found in the caches, when a required hazard is missing. Data corruption can occur when:
External memory returns the previous data because the most recent data from the eviction got lost, causing data corruption.
In this example, 4 events – or conditions – are required to hit the bug. These 4 events give the bug a score of 4, or in other words a complexity of 4.
To measure the complexity of a bug, we can come up with a classification that will be used by the entire processor verification team. In a previous blog post, we discussed 4 types of bugs and explained how we use these categories to improve the quality of our testbench and verification. Let’s go one step further and combine this method with bug complexity.
An easy bug can require between 1 and 3 events to be triggered. The first simple test fails. A corner case is going to need 4 or more events.
Going back to our example above, we have a bug with a score of 4. If one of the four conditions is not present, then the bug is not hit.
A constrained random testbench will need several features to be able to hit the example above. The sequence of addresses should be smart enough to reuse previous addresses from previous requests, delays on external buses should be sufficiently atypical to have fast Reads and slow-enough Writes.
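One of those features, address reuse, can be sketched as follows. The generator, its parameters, and the 64-byte line size are assumptions made for illustration; they do not describe a specific testbench.

```python
import random

def address_stream(n, reuse_prob, rng, history=8):
    """Yield n addresses; with probability reuse_prob, repeat one of the last `history` addresses."""
    recent = []
    for _ in range(n):
        if recent and rng.random() < reuse_prob:
            addr = rng.choice(recent)
        else:
            # New line-aligned address (64-byte cache lines assumed).
            addr = rng.randrange(0, 2**32, 64)
        recent.append(addr)
        recent = recent[-history:]
        yield addr

def reuse_fraction(addrs):
    """Fraction of requests that target an address already seen earlier in the stream."""
    seen, repeats = set(), 0
    for addr in addrs:
        repeats += addr in seen
        seen.add(addr)
    return repeats / len(addrs)

plain = list(address_stream(1000, 0.0, random.Random(1)))
smart = list(address_stream(1000, 0.5, random.Random(1)))
print(f"plain random: {reuse_fraction(plain):.2f}, with reuse: {reuse_fraction(smart):.2f}")
```

With purely random 32-bit addresses, two requests almost never touch the same line, so hazards between them are almost never exercised. Biasing the generator towards recent addresses makes such collisions routine.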
A hidden case will need even more events. Perhaps a more subtle bug has the same conditions as our example, but it only happens when an ECC error is discovered on the cache, at the exact same time as an interrupt happens, and only when the core finishes an FPU operation that results in a divide-by-zero error. With typical random testbenches, the probability to have all these conditions together is extremely low, making it a “hidden” bug.
Making these hidden bugs more reachable in the testbench is improving the quality of verification. It consists in making hidden cases become corner cases.
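The scoring and classification can be captured in a few lines. The thresholds for easy bugs (1 to 3 events) and corner cases (4 or more) come from the text above; the `hidden_threshold` of 7 is an assumption, chosen because the hidden example adds three conditions to the four of the cache example. The event names are placeholders.

```python
def classify_bug(events, hidden_threshold=7):
    """Return (score, category) for a bug described by its independent trigger events.

    The score is the number of events. hidden_threshold is an illustrative
    assumption, not a fixed industry value.
    """
    score = len(events)
    if score < 1:
        raise ValueError("a bug needs at least one triggering event")
    if score <= 3:
        category = "easy"
    elif score < hidden_threshold:
        category = "corner case"
    else:
        category = "hidden"
    return score, category

# Placeholder names standing in for the four conditions of the cache example.
cache_bug = ["condition A", "condition B", "condition C", "condition D"]
# The more subtle variant adds three further conditions.
subtle_bug = cache_bug + ["ECC error on the cache", "simultaneous interrupt", "FPU divide-by-zero"]

print(classify_bug(cache_bug))   # (4, 'corner case')
print(classify_bug(subtle_bug))  # (7, 'hidden')
```

In practice the boundary between corner case and hidden depends on what the testbench can reach, so the threshold should move upwards as the testbench improves.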
This classification does not have any limit. Experience has shown me that a testbench capable of finding bugs with a score of 8 or 9 is a strong simulation testbench and is key to delivering quality RTL. From what I have seen, today the most advanced simulation testbenches can find bugs with a complexity level up to 10. Fortunately, the use of formal verification makes it much easier to find bugs that have an even higher complexity, paving the way to even better design, and giving clues about what to improve in simulation.
This classification and methodology is useful only if it is used from the moment verification starts and throughout the project development, for 2 reasons:
Finally, by combining this approach with our methodology that consists of hunting bugs flying in squadrons, we ensure high-level quality verification that helps us be confident that we are going beyond verification sign-off criteria.
March 14, 2022
Creating a quality RISC-V processor requires a verification methodology that enforces the highest standards. In this article, Philippe Luc, Director of Verification at Codasip, explains the methodology that is adopted at Codasip to bring processor verification to the next level.
After analyzing bugs on several generations of CPUs, I came to the conclusion that “bugs fly in squadrons”. In other words, when a bug is found in a given area of the design, the probability that there are other bugs with similar conditions, in the same area of the design, is quite high.
Finding a CPU bug is always satisfying, however it should not be an end in itself. If we consider that bugs do not fly alone but rather fly in groups – or squadrons – finding one bug should be a hint for the processor verification team to search for more of them, in the same area.
Here is a scenario. A random test found a bug after thousands of hours of testing. We could ask ourselves: How did it find this bug? The answer is likely to be a combination of events that had not been encountered before. Another question could be: Why did the random test find this bug? It would most likely be due to an external modification: a change in parameter in the test, an RTL modification, or a simulator modification for example.
With this new, rare bug found, we know that we have a more performant testbench that can now test a new area of the design. However we also learn that, before the testbench got improved, that area of the design was not stressed. If we consider that bugs fly in squadrons, it means we have a new area of the design to further explore to find more bugs. How are we going to improve our verification methodology?
To improve our testbench and hit these bugs, we can add checkers and assertions, and we can add tests. Let’s focus on testing.
To enlarge the scope so that we are confident we will hit these bugs, we use smart-random testing. When reproducing this bug with a directed testing approach, only the exact same bug is hit. However, we said that bugs fly in groups and the probability that there are other bugs in the same area, with similar conditions, is high. The idea is then to enlarge our scope. Random testing will not be as useful in this case, because we have an idea of what we want to target, following the squadron pattern.
Let’s assume that the bug was found on a particular RISC-V instruction. Can we improve our testing by increasing the probability of having this instruction tested? At first glance, probably, because statistically you get more failures exposing the same bug. However, most bugs are found with a combination of rare events: a stalled pipeline, a full FIFO, or some other microarchitectural implementation details. Standard testbenches can easily tune the probability of an instruction by simply changing a test parameter. But making a FIFO full is not directly accessible from the test parameter. It is a combination of other independent parameters (such as delays) that make the FIFO full more often.
Using smart-random testing in our verification methodology allows us to be both targeted and broad enough to efficiently find more bugs in this newly discovered area. It consists in tuning the test to activate more often the other events that trigger the bug. In other words, it means adjusting several parameters of the test, and not just one. It may seem more time consuming, but this methodology is really efficient in terms of improving the quality of our testing.
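To make the FIFO example concrete, here is a toy simulation showing how two independent test parameters together control how often a FIFO is full. The model is deliberately simplistic and all names and probabilities are hypothetical: each cycle a producer pushes with probability `push_prob`, then a consumer pops with probability `pop_prob`.

```python
import random

def fifo_full_fraction(push_prob, pop_prob, depth=4, cycles=10_000, seed=0):
    """Simulate a FIFO and return the fraction of cycles that end with it full."""
    rng = random.Random(seed)
    level, full_cycles = 0, 0
    for _ in range(cycles):
        if level < depth and rng.random() < push_prob:
            level += 1  # producer pushes an entry
        if level > 0 and rng.random() < pop_prob:
            level -= 1  # consumer drains an entry
        full_cycles += level == depth
    return full_cycles / cycles

baseline = fifo_full_fraction(push_prob=0.5, pop_prob=0.5)
tuned = fifo_full_fraction(push_prob=0.9, pop_prob=0.2)
print(f"baseline: FIFO full {baseline:.0%} of cycles, tuned: {tuned:.0%}")
```

Neither parameter is "make the FIFO full", but speeding up the producer and slowing down the consumer together turns a rare state into the common one, which is where the neighbouring bugs of the squadron are likely to be hiding.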
Improving testbenches by following bug squadrons, and killing each of them during the product development is key. This is exactly what the Codasip verification teams do to offer best-in-class quality RISC-V processors to our customers.
March 7, 2022
Philippe Luc, Director of Verification at Codasip, shares his view on what bugs verification engineers should pay attention to.
Did you know that between 1,000 and 2,000 bugs can appear during the design of a complex processor core? Really, a thousand bugs? Well, that’s what experience showed us. And not all bugs were born equal: their importance and consequences can vary significantly. Let’s go through 4 categories of CPU bugs, how to find them, and what the consequences would be for the user if we did not find them.
“Oh, I forgot the semicolon”. Yes, that is one bug. Very easy to detect, it is typically one you find directly at compile time. Apart from having your eyes wide-open, there is nothing else to do to avoid these.
“Oh, it turns out that a part of the specification has not been implemented”. That is another easy CPU bug for you to find with any decent testbench – provided that an explicit test exists. In this scenario, the first simple test exercising the feature will fail. What does your processor verification team need to do? Make sure you have exhaustive tests. The design team, on the other hand, needs to make an effort to carefully read the specifications, and follow any changes in the specification during the development.
In other words, the easy bug is one that is found simply by running a test that exercises the feature. Its (bad) behavior is systematic, not a timing condition. Being exhaustive in your verification is the key to finding such CPU bugs. Code coverage will help you but is definitely not enough. If a feature is not coded in the RTL, how can coverage report that it is missing? A code review – with the specification at hand – definitely helps.
A corner case CPU bug is more complex to find and requires a powerful testbench. The simple test cases that exercise the feature are correctly passing, even with random delays. Quite often, you find these bugs when asynchronous events join the party. For example, an interrupt arriving just between 2 instructions, at a precise timing. Or a line in the cache got evicted just when the store buffer wants to merge into it. To reach these bugs, you need a testbench that juggles with the instructions, the parameters and the delays so that all the possible interleavings of instructions and events have been exercised. Obviously, a good checker should spot any deviation from what is expected.
Does code coverage help in that case? Unfortunately not. Simply because the condition of the bug is a combination of several events that are already covered individually. Here, condition coverage or branch coverage might be helpful. But it is painful to analyze and it is rarely beneficial in the end.
The hidden bugs are found by customers (which is bad), or by chance (internally, before release). In both cases, it means that the verification methodology was not able to find them.
If you use different testbenches or environments, you could find other cases just because the stimuli are different. Fair enough. Then, what do we mean by “found by chance”? Here comes the limit of random testbench methodology.
With random stimuli, the testbench usually generates the “same” thing. If you roll a die to get a random number, there is very little chance of getting the number 6 ten times in a row. One chance in 60 million, to be accurate. With a RISC-V CPU that has 100 different instructions, an (equiprobable) random instruction generator has only 1 chance in 10²⁰ of generating a given instruction 10 times in a row. About twice the number of different positions of a Rubik’s Cube… On a 10-stage pipeline processor, it is not unreasonable to test it with the same instruction present on all pipeline stages. Good luck if you don’t tune your random constraints…
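The arithmetic behind these figures is easy to check. The sketch assumes uniform, independent draws; the helper name is ours.

```python
from fractions import Fraction

def same_outcome_run(outcomes, run_length):
    """Probability that one given outcome appears run_length times in a row."""
    return Fraction(1, outcomes) ** run_length

dice = same_outcome_run(6, 10)
instr = same_outcome_run(100, 10)

print(dice.denominator)             # 60466176, i.e. about 1 in 60 million
print(instr.denominator == 10**20)  # True

# Number of reachable positions of a 3x3x3 Rubik's Cube.
RUBIK_POSITIONS = 43_252_003_274_489_856_000
print(round(10**20 / RUBIK_POSITIONS, 1))  # 2.3
```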
You can take looking for corner cases and hidden cases too far and end up creating tests that are simply too silly.
Changing the endianness back and forth every cycle while connecting the debugger is probably something that will never ever happen on a consumer product. If the consequences of a CPU bug are never visible to a customer, then it is not really a bug. If you deliberately unplug your USB stick while you copy a file, and the file is corrupted, I consider this not a bug. If some operation causes the USB controller to hang, then yes, that is a bug.
Beware of extending the scope of the verification. When silly cases are found, then you are probably investing engineering effort in the wrong place.
There are different verification techniques you can apply to efficiently find CPU bugs before your customers do. At Codasip, we use multiple component testbenches, various random test generators, random irritators, and several other techniques to verify our products. As the project evolves, we develop these techniques to have a robust verification methodology. Learn more in our blog post where we explain how we continuously improve our verification methodology.
February 28, 2022
Finding a hardware bug in silicon has consequences. The severity of these consequences for the end user can depend on the use case. For the product manufacturer, fixing a bug once a design is in mass-production can incur a significant cost. Investing in processor verification is therefore fundamental to ensure quality. This is something we care passionately about at Codasip, and here is why you should too.
Luckily for the semiconductor industry, there are statistically more bugs in software than in hardware, and in processors in particular. However, software can easily be upgraded over the air, directly in the end-products used by consumers. With hardware, on the other hand, this is not as straightforward and a hardware issue can have severe consequences. The quality of our deliverables, which will end up in real silicon, seriously matters.
Processors are ubiquitous. They control the flash memory in your laptop, the braking system of your car or the chip on your credit card. These CPUs have different performance requirements but also different security and safety requirements. In other words, different quality requirements.
Is it a major issue if the Wi-Fi chip in your laptop is missing a few frames? The Wi-Fi protocol retransmits the packet and it goes largely unnoticed. What if your laptop’s SSD controller drops a few packets and corrupts the document you have been working on all day? It will be a serious disruption to your work, there may be some shouting, but you will recover. It’s a bug that you might be able to accept.
Other hardware failures have much more severe consequences: What if your car’s braking system fails because of a hardware issue? Or the fly-by-wire communication in a plane fails? Or what if a satellite falls to earth because its orbit control fails? Some bugs and hardware failures are simply not acceptable.
Processor quality and therefore its reliability is the main concern of processor verification teams. And processor verification is a subtle art.
Processor verification requires strategy, diligence and completeness.
Verifying a processor means taking uncertainty into account. What software will run on the end product? What will be the use cases? What asynchronous events could occur? These unknowns mean significantly opening the verification scope. However, it is impossible to cover the entire processor state space, and it is not something to aim for.
Processor quality must be ensured while making the best use of time and resources. At the end of the day, the ROI must be positive. Nobody wants to find costly bugs after the product release, and nobody wants to delay a project because of an inefficient verification strategy. Doing smart processor verification means finding relevant bugs efficiently and as early as possible in the product development.
In other words, processor verification must:
Processor quality is fundamental. The art of verifying a processor is a subtle one that is evolving as the industry is changing and new requirements arise. At Codasip, we put in place verification methodologies that allow us to deliver high-quality RISC-V customizable processors. With Codasip Studio and associated tools, we provide our customers with the best technology that helps them follow up and verify their specific processor customization.
October 1, 2021
Processor customization is one approach to optimizing a processor IP core to handle a specific workload. The idea is to take an existing core that could partially meet your requirements and use it as a starting point for your optimized processor. Now, why and how to customize a processor?
Before we start, let’s make sure we are all on the same page. Processor configuration and processor customization are two different things. Configuring a processor means setting the options made available by your IP vendor (cache size, MMU support, etc.). Customizing a processor means adding or changing something that requires more invasive changes such as changing the ISA, writing new instructions. In this blog post we focus on processor customization.
Customizing an existing processor is particularly relevant when you create a product that must be performant, area efficient, and energy efficient at the same time. Whether you are designing a processor for an autonomous vehicle that requires both vector instructions and low-power features, or a processor for computational storage with real-time requirements and power and area constraints, you need an optimized and specialized core.
Processor customization allows you to bring in a single processor IP all the architecture extensions you need, standard or custom, that would have been available in either multiple IPs on the SoC or in one big, energy intensive IP. Optimizing an existing processor for your unique needs has significant advantages:
Now, one may think that this is not so easy. How reliable is the verification of a custom processor? Differentiation is becoming more difficult, time consuming, and sometimes more expensive. The success of processor customization relies on two things:
Remember: the RISC-V Instruction Set Architecture (ISA) was created with customization in mind. If you want to create a custom processor, starting from an existing RISC-V processor is ideal.
You can add optional standard extensions and non-standard custom extensions on top of the base instruction set to tailor your processor for a given application.
For a robust customization process that ensures quality in the design and confidence in the verification, automation is key.
With Codasip you can license RISC-V processors:
CodAL is used to design Codasip RISC-V processors and generate the SDK and HDK. You can then edit the CodAL source code to create your own custom extensions and modify other architectural features as needed.
Microsemi opted for this approach as they wanted to replace a proprietary embedded core with a RISC-V one. Check this great processor customization use case with Codasip IP and technology!
The legacy approach to adding new instructions to a core is based on manual editing. Adding custom instructions must be reflected in the following areas:
With the software toolchain, intrinsics can be created so that the new instructions are used by the compiler, but this also means that the application code needs updating. However, modifying the existing ISS and RTL are both potential sources of errors. Lastly, if the verification environment needs changing, this is a further area for problems. Verifying these manual changes is a big challenge and adds risk to the design project.
Some vendors offer partially automated solutions, but by not covering all aspects of processor customization they still leave room for error due to the manual changes.
In contrast, with Codasip the changes are only made to the CodAL source code. The LLVM toolchain is automatically generated with support for the new instructions. Similarly, the ISS and RTL are generated to include the custom instructions and can be checked using the updated UVM environment. This approach not only saves time, but is a more robust customization process.
As differentiation is becoming more difficult, time consuming and sometimes more expensive with traditional processor design, customizing a processor so that it will meet your unique requirements is the key. Creating an application-specific processor efficiently, without compromising PPA, requires an open-source architecture and tools to automate the design and verification process. Find out more in our white paper on “Creating Domain-Specific Processors using custom RISC-V ISA instructions”.
August 12, 2021
For many years, people have been talking about configuring processor IP cores, but especially with growing interest in the open RISC-V ISA, there is much more talk about customization. So, processor configuration vs. customization: what is the difference?
A simple analogy is to think of ordering a pizza. With most pizzerias, you have standard bases and a choice of toppings from a limited list. You can configure the pizza to the sort of taste you would like based on the standard set of options available.
Processor IP vendors have typically offered some standard options to their customers, such as optional caches, tightly coupled memories, and on-chip debug, so that they could combine them and provide the customers with suitable configurations for their needs. While doing so, the core itself remains the same, or has very limited variations. Certainly, the instruction set, register set, and pipeline would remain the same, and only optional blocks such as caches are allowed to vary.
Today, many users are demanding greater specialization and variability in their processor cores. This may be to achieve enhanced performance while keeping down silicon area and power consumption. There may be a number of ways in which this can be achieved, for example, by creating custom instructions optimized to the target application, adding extra ports and registers. Such changes fundamentally alter the processor core itself.
Returning to the pizza analogy, customization is as if a private chef had an underlying pizza base recipe but was willing not only to let you provide alternative toppings, but also to modify the pizza base, with alternatives to the standard flour, oil, and yeast ingredients used. This is quite a good reason why you would want to customize a processor, isn’t it!
And this is exactly what RISC-V allows you to do. You can customize an existing RISC-V processor to meet your specific requirements by adding optional standard extensions and non-standard custom extensions.
Although some proprietary IP suppliers allow their cores to be extended, the greatest customization opportunity lies with RISC-V. The ISA was conceived from the outset to support custom instructions. Codasip RISC-V processors were developed using the CodAL architecture description language and are readily customized using Codasip Studio. For more information on how custom instructions can be used to create domain-specific processors, download our white paper.
May 21, 2021
For about fifty years, IC designers have been relying on different types of semiconductor scaling to achieve gains in performance. Best known is Moore’s Law which predicted that the number of transistors in a given silicon area and clock frequency would double every two years. This was combined with Dennard scaling which predicted that with silicon geometries and supply voltages shrinking, the power density would remain the same from generation to generation, meaning that power would remain proportional to silicon area. Combining these effects, the industry became used to processor performance per watt doubling approximately every 18 months. With successively smaller geometries, designers could use similar processor architectures but rely on more transistors and higher clock frequencies to deliver improved performance.
48 Years of Microprocessor Trend Data. Source K. Rupp.
Since about 2005, we have seen the breakdown of these predictions. Firstly, Dennard scaling ended with leakage current rather than transistor switching being the dominant component of chip power consumption. Increased power consumption means that a chip is at the risk of thermal runaway. This has also led to maximum clock frequencies levelling out over the last decade.
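The effect can be illustrated with the textbook dynamic power model P = C·V²·f. The sketch below is a simplification that ignores leakage entirely: it compares ideal Dennard scaling by a factor k (capacitance and voltage shrink by k, frequency rises by k, area shrinks by k²) with the case where the supply voltage can no longer be reduced. Normalized units and function names are assumptions for illustration.

```python
def dynamic_power(cap, voltage, freq):
    """Textbook dynamic power model: P = C * V^2 * f."""
    return cap * voltage**2 * freq

def power_density_after_scaling(k, scale_voltage=True):
    """Power density relative to the previous generation after shrinking dimensions by k."""
    cap, voltage, freq, area = 1.0, 1.0, 1.0, 1.0  # normalized starting point
    cap /= k
    freq *= k
    area /= k**2
    if scale_voltage:
        voltage /= k  # ideal Dennard scaling
    return dynamic_power(cap, voltage, freq) / area

print(power_density_after_scaling(2))                       # 1.0: density unchanged
print(power_density_after_scaling(2, scale_voltage=False))  # 4.0: density grows by k^2
```

Once voltage stops scaling, every shrink packs more power into the same area, which is why clock frequencies levelled out rather than continuing to rise.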
Secondly, the improvements in transistor density have fallen short of Moore’s Law. It has been estimated that by 2019, actual improvements were 15× lower than predicted by Moore in 1975. Additionally, Moore predicted that improvements in transistor density would be accompanied by the same cost. This part of his prediction has been contradicted by the exponential increases in building wafer fabs for newer geometries. It has been estimated that only Intel, Samsung, and TSMC can afford to manufacture in the next generation of process nodes.
With the old certainties of scaling silicon geometries gone forever, the industry is already changing. As shown in the chart above, the number of cores has been increasing and complex SoCs, such as mobile phone processors, will combine application processors, GPUs, DSPs, and microcontrollers in different subsystems.
However, in a post-Dennard, post-Moore world, further processor specialization will be needed to achieve performance improvements. Emerging applications such as artificial intelligence are demanding heavy computational performance that cannot be met by conventional architectures. The good news is that for a fixed task or limited range of tasks, energy scaling works better than for a wide range of tasks. This inevitably leads to creating special purpose, domain-specific accelerators.
This is a great opportunity for the industry.
A domain-specific accelerator (DSA) is a processor or set of processors that are optimized to perform a narrow range of computations. They are tailored to meet the needs of the algorithms required for their domain. For example, for audio processing, a processor might have a set of instructions to optimally implement algorithms for echo-cancelling. In another example, an AI accelerator might have an array of elements including multiply-accumulate functionality in order to efficiently undertake matrix operations.
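To show why multiply-accumulate (MAC) is the natural building block here, the sketch below expresses a matrix product purely in terms of a `mac` step. It is a sequential software model written for illustration; a hardware array would perform many of these steps in parallel.

```python
def mac(acc, a, b):
    """One multiply-accumulate step: acc + a * b."""
    return acc + a * b

def matmul_mac(A, B):
    """Multiply matrices A (n x m) and B (m x p) using only MAC steps."""
    rows, inner, cols = len(A), len(B), len(B[0])
    C = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0
            for k in range(inner):
                acc = mac(acc, A[i][k], B[k][j])
            C[i][j] = acc
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul_mac(A, B))  # [[19, 22], [43, 50]]
```

Each output element is an independent chain of MAC steps, so an array of MAC elements can compute many of them at once.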
Accelerators should also match their wordlength to the needs of their domain. The optimal wordlength might not match common ones (like 32 bits or 64 bits) encountered with general-purpose cores. Widely used formats, such as IEEE 754 floating point, may be overkill in a domain-specific accelerator.
Also, accelerators can vary considerably in their specialization. While some domain-specific cores may be similar to or derived from an existing embedded core, others might have limited programmability and seem closer to hardwired logic. More specialized cores will be more efficient in terms of silicon area and power consumption.
With many and varied DSAs, the challenge will be how to define them efficiently and cost-effectively.