Catapult lets designers explore many different implementations of a C++ or SystemC design in a short period of time. Because new implementations can be created so rapidly, verification becomes a key part of the process.
Catapult provides the SCVerify environment, which allows one to verify the HLS-synthesized RTL code in the same testbench used to verify the C++ or SystemC model. Therefore, the testbench environment is a critical part of the overall HLS design experience for generating correct-by-construction RTL.
There are a few key items to consider when developing a testbench for a SystemC design that will be synthesized to RTL by Catapult. The testbench must be able to interact with the RTL to verify its functionality and performance.
1) The testbench should validate the “correctness of the design”:
As the source is modified, you need to know that you haven't changed the functionality of the design.
2) The testbench should be able to handle the synchronization, or handshaking, when interfacing with the resultant RTL model:
The RTL may not be able to accept a continuous stream of data; it may need to accept the data in bursts, or it may need large gaps between data samples. The testbench and RTL therefore need some form of handshaking to exchange the data correctly.
3) The testbench should not limit the performance of the resulting RTL model:
The testbench should be “elastic” enough to process, accept, and/or provide the data to and from the RTL so that the actual performance is limited by the RTL and not the testbench.
A recent project highlighted the need for proper synchronization and elasticity in the testbench. Catapult provides a library of Modular IO elements that supplies the needed synchronization; we refer to these as the P2P (point-to-point) I/O.
The initial SystemC design used signals to connect the testbench to the block to be synthesized, which was sufficient for the SystemC simulation. The code snippet below shows the sc_module declaration of tx_sys, the portion of the testbench where tx_sys is instantiated, and the sc_signals that connect tx_sys to the rest of the testbench.
However, sc_signals and sc_in/sc_out ports provide no synchronization between the two sides, and this causes the RTL simulation in SCVerify to fail. The waveforms of tx_data feeding tx_sys show that the testbench produces a new input every clock cycle.
The tx_sys code, on the other hand, can process a value on tx_data only once every 8 clock cycles. The snippet of code that reads tx_data shows why: once a value is read, each bit of that 8-bit value is stored in an array that is mapped to a memory.
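The data loss can be sketched with a cycle-level model in plain C++ (not SystemC, and not the original tx_sys code). It assumes, as described above, a testbench that writes a new value every clock and a consumer that reads once and is then busy for 8 clocks; the function name `sample_without_handshake` and the use of the cycle count as the data value are illustrative only.

```cpp
#include <cassert>
#include <vector>

// Hypothetical cycle-level model (plain C++, not SystemC): the testbench
// writes a new value to a plain signal every clock, but the consumer can only
// read the signal once every `busy_cycles` clocks. With no handshake, every
// value driven while the consumer is busy is overwritten and lost.
std::vector<int> sample_without_handshake(int num_cycles, int busy_cycles) {
    assert(busy_cycles > 0);
    std::vector<int> captured;
    int tx_data = 0;     // models a signal: holds only the most recent value
    int busy_until = 0;  // first cycle on which the consumer can read again
    for (int cycle = 0; cycle < num_cycles; ++cycle) {
        tx_data = cycle;  // testbench drives a new input every clock
        if (cycle >= busy_until) {
            captured.push_back(tx_data);       // consumer reads one value...
            busy_until = cycle + busy_cycles;  // ...then is busy storing its bits
        }
    }
    return captured;
}
```

Calling `sample_without_handshake(32, 8)` captures only the values 0, 8, 16, and 24; the other 28 inputs are overwritten before the consumer reads them.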
Adding P2P Interfaces for Synchronization
To provide the missing synchronization, P2P interfaces were introduced into the design and the testbench. These interfaces are not difficult to add: you change the port declarations in the sc_module and the "interconnect" between the modules in the testbench.
In the waveforms below, the synchronization between tx_sys and the testbench can be seen in the ready (rdy) and valid (vld) signals that the P2P interfaces add. These signals provide the handshaking the testbench needs to supply the input data stream (tx_data_i_dat) to the RTL model, so the testbench provides data only when the RTL is able to use (consume) it. A new input now appears on tx_data every nine clock cycles.
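The handshake rule itself can be sketched in plain C++ as a cycle-level model (not the Catapult ccs_p2p implementation): data moves only on a clock where vld and rdy are both high, and the producer holds its value until then. The `Transfer` struct, `run_handshake`, and the assumption that the consumer is ready once every `consumer_interval` clocks are hypothetical.

```cpp
#include <cassert>
#include <vector>

struct Transfer {
    int cycle;  // clock cycle on which vld && rdy were both high
    int value;  // data accepted on that cycle
};

// Hypothetical valid/ready model: the producer always has its next value
// pending (vld high) and holds it until accepted; the consumer raises rdy
// only when it can take new data. Values advance only on a transfer, so
// nothing is skipped no matter how slow the consumer is.
std::vector<Transfer> run_handshake(int num_values, int consumer_interval) {
    assert(consumer_interval > 0);
    std::vector<Transfer> log;
    int next_value = 0;  // value the producer is currently offering
    int ready_at = 0;    // cycle on which the consumer is next ready
    for (int cycle = 0; next_value < num_values; ++cycle) {
        const bool vld = true;                 // testbench has the next input ready
        const bool rdy = (cycle >= ready_at);  // consumer ready only when idle
        if (vld && rdy) {
            log.push_back({cycle, next_value});
            ++next_value;                          // producer advances only on a transfer
            ready_at = cycle + consumer_interval;  // consumer busy until then
        }
    }
    return log;
}
```

With `run_handshake(4, 9)`, the four values 0 to 3 are accepted on cycles 0, 9, 18, and 27: one input every nine clocks, and no value is skipped.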
The testbench and tx_sys now appear to be "talking" to each other properly on the input side. Turning to the outputs: the code that generates the tx_i and tx_q outputs is expected to produce an output every clock cycle. The output data has already been stored in an array (memory), and the code reads it out to the output ports.
The waveforms show, however, that these outputs "stutter" rather than appearing every clock cycle. The stuttering occurs because the testbench cannot consume the data as fast as tx_sys produces it.
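One way to see why a direct handshake stutters: assume, as a simplification, that the producer can offer new data every P cycles, the consumer can accept data every C cycles, and there is no buffering between them, so each side waits for a transfer before its next opportunity. Transfers are then spaced max(P, C) cycles apart, and N samples take

$$T_N = 1 + (N - 1)\,\max(P, C)$$

clock cycles. For tx_sys, which can produce a sample every cycle (P = 1), outputs arrive every cycle only if the testbench also accepts one every cycle (C = 1). With a hypothetical C = 2, for example, 8 samples take 15 cycles instead of 8, which shows up as stuttering on tx_i and tx_q.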
Adding P2P FIFOs as an Interconnect
To eliminate the stuttering, FIFOs are used to connect tx_sys to the testbench, adding elasticity between the production and consumption of data. The code snippet below shows the addition of the FIFOs. The template parameters after the data type are:
- Depth: because the FIFO is in the testbench and is not synthesized, the depth can be quite large.
- The other two parameters specify the sense (active edge or level) of the clock and reset.
The final waveforms, after the FIFOs were added, show the actual performance of the tx_sys RTL model: the testbench no longer limits performance, and tx_sys outputs a sample every clock cycle.
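The effect of the FIFO can be sketched with a plain C++ cycle model (hypothetical names, not Catapult's FIFO channel). It assumes tx_sys can emit one sample per clock whenever it is not back-pressured and that the testbench reads one sample every `consumer_interval` clocks; the article does not show the testbench's actual consumption pattern, so the numbers below are illustrative.

```cpp
#include <cassert>
#include <deque>
#include <vector>

// Hypothetical cycle model: tx_sys emits one sample per clock unless
// back-pressured; the testbench consumes one sample every
// `consumer_interval` clocks. fifo_depth == 0 models a direct handshake
// connection; fifo_depth > 0 inserts a FIFO of that depth in between.
// Returns the clock cycle on which tx_sys emitted each sample.
std::vector<int> producer_output_cycles(int num_samples, int consumer_interval,
                                        int fifo_depth) {
    assert(num_samples >= 0 && consumer_interval > 0 && fifo_depth >= 0);
    std::vector<int> emitted;   // emission cycle of each sample
    std::deque<int> fifo;       // sample indices waiting for the testbench
    int consumed = 0;
    int consumer_ready_at = 0;  // next cycle on which the testbench can read
    for (int cycle = 0; consumed < num_samples; ++cycle) {
        const bool consumer_ready = cycle >= consumer_ready_at;
        const int produced = static_cast<int>(emitted.size());
        if (fifo_depth == 0) {
            // Direct connection: a sample moves only when both sides are ready.
            if (consumer_ready && produced < num_samples) {
                emitted.push_back(cycle);
                ++consumed;
                consumer_ready_at = cycle + consumer_interval;
            }
            continue;
        }
        // FIFO connection: the testbench drains at its own pace...
        if (consumer_ready && !fifo.empty()) {
            fifo.pop_front();
            ++consumed;
            consumer_ready_at = cycle + consumer_interval;
        }
        // ...while tx_sys keeps writing as long as the FIFO has space.
        if (produced < num_samples && static_cast<int>(fifo.size()) < fifo_depth) {
            fifo.push_back(produced);
            emitted.push_back(cycle);
        }
    }
    return emitted;
}
```

For 8 samples and a testbench that reads every second clock, a direct connection (`fifo_depth` of 0) emits on cycles 0, 2, ..., 14, a depth-1 FIFO still stutters (last output on cycle 13), and a depth-8 FIFO lets tx_sys emit on cycles 0 through 7 with no gaps. Because the testbench FIFO is never synthesized, making it deep enough to hold a whole burst costs only simulation memory.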
Summary
SystemC designs require some effort to create a testbench that can interface to the resulting synthesized RTL model. Adding P2P interfaces and FIFOs lets the designer use SCVerify and enables the existing testbench to simulate the performance of both the original SystemC model and the synthesized RTL model. More information on the Modular IO and P2P interfaces can be found in the Catapult C Synthesis SystemC User's Guide and in the ccs_p2p.h header file in the Catapult installation directory ($MGC_HOME/shared/include/ccs_p2p directory).
Written by Wayne Marking, Field Applications Engineer at Calypto Design Systems