The Cache Coherency TrekApp provides an automated solution for one of the toughest challenges in system verification: ensuring that multi-processor designs with multi-level cache structures remain consistent even under high system stress. This verification is essential for any engineer building a custom multi-processor solution, modifying a commercially available IP product, or adding coherent elements such as a DSP or GPU to an IP product. Engineers who use a commercial solution unmodified may still find the Cache Coherency TrekApp valuable for confirming that the supplier's IP has no lingering cache-related bugs.
Cache coherency has several aspects that make it difficult to verify. First, different caches in different clusters and at different levels (L1, L2, and L3) may have different cache line widths and different address maps, so data that fits within a single line in one cache may cross a line boundary in another. Second, all caches must be updated properly to preserve data ordering. For example, two CPUs may pass data as follows:
- CPU0 writes data A to a specific memory location
- CPU0 writes flag B to a different memory location
- CPU1 monitors flag B and sees that it is set
- CPU1 reads data A and resets flag B
In this scenario, CPU1 must get the updated value of data A as written by CPU0, not a stale copy of an older value from its cache or from a higher-level (L2 or L3) cache.
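The scenario above is the classic message-passing pattern. The following minimal C11 sketch shows what the software on the two CPUs might look like, with two POSIX threads standing in for CPU0 and CPU1. The names `data_a`, `flag_b`, and `run_message_pass` are hypothetical and are not part of any TrekApp API. The release store and acquire load express the ordering the program requires; it is the cache coherency hardware that must actually deliver CPU0's value of data A to CPU1, which is exactly what a coherency test checks.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared locations for "data A" and "flag B" in the scenario. */
static uint32_t data_a;      /* plain (non-atomic) payload */
static atomic_int flag_b;    /* synchronization flag */

/* CPU0's role: write data A, then publish it by setting flag B.
   The release store orders the data write before the flag write. */
static void *producer(void *arg) {
    data_a = *(const uint32_t *)arg;
    atomic_store_explicit(&flag_b, 1, memory_order_release);
    return NULL;
}

/* CPU1's role: wait until flag B is set, read data A, then reset flag B.
   The acquire load guarantees the later read of data A sees CPU0's write. */
static void *consumer(void *arg) {
    while (atomic_load_explicit(&flag_b, memory_order_acquire) == 0) {
        /* spin until the producer sets the flag */
    }
    *(uint32_t *)arg = data_a;
    atomic_store_explicit(&flag_b, 0, memory_order_relaxed);
    return NULL;
}

/* Runs one producer/consumer round on two threads and returns the
   value of data A that the consumer (CPU1) observed. */
uint32_t run_message_pass(uint32_t value) {
    pthread_t t0, t1;
    uint32_t observed = 0;

    data_a = 0;
    atomic_store(&flag_b, 0);
    pthread_create(&t1, NULL, consumer, &observed);
    pthread_create(&t0, NULL, producer, &value);
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    return observed;
}
```

Compile with `-pthread`. A stress test along these lines would repeat `run_message_pass` many times with different values and flag any round in which the returned value differs from the value written; on a correctly functioning coherent system, that mismatch should never occur.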
The third major issue in cache coherency verification is non-determinism. Fully verifying all the individual cache policies is very hard because it is impossible to predict the exact transitions that will occur, especially under heavy system load. Breker's approach is to use Trek family technology to generate complex test cases in which multiple processors read from and write to multiple memories in a manner expressly designed to stress the caches and the cache algorithms. Aspects of these test cases include:
- Exercising single-processor and multi-processor cache state transitions
- Crossing cache line boundaries with misaligned multi-byte operations
- Forcing cache line evictions at multiple levels (to L2, L3, and main memory)
- Creating different page tables with various security and coherency rules
- Accessing all memory types to stress different timing characteristics
- Exercising processor instructions for different data sizes and burst lengths
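Two of these aspects, line-boundary crossing and forced evictions, come down to simple address arithmetic. The sketch below is a hypothetical illustration in C, not Breker's implementation. `crosses_line` and `lines_touched` check whether an access spans more than one line for a given line width, so the same access can be evaluated against caches with different widths, and `same_set_addresses` generates addresses that collide in a single cache set. It assumes a plain set-associative cache indexed by `(addr / line_bytes) % num_sets`; real designs may hash index bits or index by virtual address.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if an access of `size` bytes (size >= 1) starting at `addr` spans
   more than one cache line of `line_bytes` bytes. */
bool crosses_line(uintptr_t addr, size_t size, size_t line_bytes) {
    uintptr_t first = addr / line_bytes;
    uintptr_t last = (addr + size - 1) / line_bytes;
    return first != last;
}

/* Number of distinct cache lines touched by the same access. */
size_t lines_touched(uintptr_t addr, size_t size, size_t line_bytes) {
    uintptr_t first = addr / line_bytes;
    uintptr_t last = (addr + size - 1) / line_bytes;
    return (size_t)(last - first + 1);
}

/* Fills `out` with `count` addresses that all map to the same set, assuming
   the set index is (addr / line_bytes) % num_sets. Consecutive addresses are
   num_sets * line_bytes apart, so they differ only in their tag bits. */
void same_set_addresses(uintptr_t base, size_t num_sets, size_t line_bytes,
                        uintptr_t *out, size_t count) {
    uintptr_t stride = (uintptr_t)num_sets * line_bytes;
    for (size_t i = 0; i < count; i++) {
        out[i] = base + i * stride;
    }
}
```

For example, a 16-byte access at 0x1038 crosses a line boundary with 64-byte lines but not with 128-byte lines. Touching `ways + 1` of the addresses from `same_set_addresses` in a `ways`-way cache guarantees at least one eviction from that set, whatever the replacement policy.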
The generated test cases can be run on any verification platform, from simulation and acceleration to emulation and FPGA prototyping, as well as on actual silicon in the lab. Features of the Cache Coherency TrekApp such as aggressive memory management and built-in inter-processor communication make it well suited to generating multi-threaded cache coherency tests. The following screenshot from the runtime TrekBox display shows an actual design with eight processors running TrekApp-generated test cases in parallel; the test cases include many varieties of memory reads and writes as well as cache snoops across the multi-level cache architecture.
The following graphs show the tremendous increase in cache verification coverage provided by the test cases from the Cache Coherency TrekApp. The left-hand side shows the results from a set of hand-written tests for an actual design. Three coherency-related metrics are displayed, and they show that the design is not being well stressed. The right-hand side shows the results from a group of TrekApp test cases, which cover much more of the verification space and exercise the design more heavily throughout it.
To help you achieve similar results, the Cache Coherency TrekApp includes a graph-based scenario model designed to generate cache-related test cases. Training and documentation are available to help you configure the model for your specific system (number of processors, size and configuration of caches, multi-level cache structure, etc.). You do not need to be or become an expert in either scenario models or cache coherency to achieve a very high level of coverage with minimal investment.