Data Preparation
We begin by collecting 1,189 high-quality RTL code repositories from DeepCircuitX, all of which can be processed by the synthesis tool and contain no syntax errors.
We use Synopsys Design Compiler with a 12nm Process Design Kit (PDK) to synthesize the RTL code repositories into post-mapped (PM) netlists, each accompanied by a detailed synthesis report. To expand the dataset, every module is treated as the top module in its own synthesis flow, which yields a total of 4,450 netlists. After synthesis, we use Cadence Innovus to carry out the floorplanning and placement steps of the ASIC physical design flow, producing a placed netlist and a physical design report. Precise timing information can be extracted after placement because the distances between ports and cells are then fixed.
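The "every module as a top module" step can be illustrated with a minimal Python sketch. It is not the actual pipeline: the regex-based module scan, the helper names `find_modules` and `make_synthesis_jobs`, and the `<module>_pm.v` output naming are all assumptions made for illustration, and the synthesis tool invocation itself is left out.

```python
import re
from typing import Dict, List

# Captures the name that follows the `module` keyword. The word boundary
# keeps `endmodule` from matching.
_MODULE_RE = re.compile(r"\bmodule\s+([A-Za-z_][A-Za-z0-9_$]*)")
# Line comments and block comments, removed before scanning.
_COMMENT_RE = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)


def find_modules(verilog_source: str) -> List[str]:
    """Return the module names declared in a Verilog source string, in order."""
    stripped = _COMMENT_RE.sub("", verilog_source)
    return _MODULE_RE.findall(stripped)


def make_synthesis_jobs(repo: Dict[str, str]) -> List[dict]:
    """Build one synthesis job per module; `repo` maps file name to contents.

    Each job names one module as the top and lists every source file in the
    repository. The netlist file name is a hypothetical convention.
    """
    jobs = []
    for fname in sorted(repo):
        for name in find_modules(repo[fname]):
            jobs.append(
                {"top": name, "sources": sorted(repo), "netlist": f"{name}_pm.v"}
            )
    return jobs


# Toy repository with two modules, one of which instantiates the other.
demo_repo = {
    "alu.v": "module alu(input a, b, output y); assign y = a & b; endmodule",
    "core.v": "// module ghost\nmodule core(input a, b, output y);\n"
              "  alu u0(a, b, y);\nendmodule",
}
jobs = make_synthesis_jobs(demo_repo)
print([job["top"] for job in jobs])  # prints ['alu', 'core']
```

Under this scheme a repository with two modules yields two netlists, which is how a collection of repositories can produce a larger number of synthesized designs. A production flow would rely on the synthesis tool's own parser rather than a regex.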
To support AI4EDA solutions, we construct a dataset of graphs represented in PyTorch Geometric format. These graphs are derived from the PM netlists and And-Inverter Graphs (AIGs). We also provide sub-AIGs for model training, which are randomly extracted sub-circuits with 500 to 5,000 nodes. In total, 83,155 sub-circuits and their corresponding graph representations are generated for model training.
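The text does not specify how the sub-circuits are sampled, so the following Python sketch shows only one plausible approach: pick a random root node and collect its fan-in cone breadth-first, up to a size cap. The function names, the fan-in dictionary layout, and the toy size bounds are assumptions for illustration. Edge inversion attributes, which a real AIG carries, are omitted.

```python
import random
from collections import deque
from typing import Dict, List, Sequence


def fanin_cone(fanins: Dict[int, Sequence[int]], root: int, max_nodes: int) -> List[int]:
    """Collect up to `max_nodes` nodes by walking backwards from `root`."""
    seen = {root}
    order = [root]
    queue = deque([root])
    while queue and len(order) < max_nodes:
        node = queue.popleft()
        for pred in fanins[node]:
            if pred not in seen and len(order) < max_nodes:
                seen.add(pred)
                order.append(pred)
                queue.append(pred)
    return order


def to_edge_index(fanins: Dict[int, Sequence[int]], nodes: List[int]) -> List[List[int]]:
    """Relabel nodes to 0..n-1 and return edges as [sources, targets]."""
    local = {n: i for i, n in enumerate(nodes)}
    src, dst = [], []
    for n in nodes:
        for pred in fanins[n]:
            if pred in local:  # keep only edges fully inside the sub-circuit
                src.append(local[pred])
                dst.append(local[n])
    return [src, dst]


def sample_subcircuits(fanins, attempts, min_nodes, max_nodes, seed=0):
    """Try `attempts` random roots; keep cones with at least `min_nodes` nodes."""
    rng = random.Random(seed)
    candidates = sorted(fanins)
    samples = []
    for _ in range(attempts):
        nodes = fanin_cone(fanins, rng.choice(candidates), max_nodes)
        if len(nodes) >= min_nodes:
            samples.append({"nodes": nodes,
                            "num_nodes": len(nodes),
                            "edge_index": to_edge_index(fanins, nodes)})
    return samples


# Toy AIG: nodes 0 and 1 are primary inputs; every other node is an AND gate
# with two fan-ins. The bounds 5..8 stand in for the 500 to 5,000 range.
toy_aig = {0: (), 1: ()}
for i in range(2, 20):
    toy_aig[i] = (i - 1, i - 2)

samples = sample_subcircuits(toy_aig, attempts=10, min_nodes=5, max_nodes=8)
print(len(samples), [s["num_nodes"] for s in samples])
```

The `[sources, targets]` layout matches the shape PyTorch Geometric expects for `edge_index`, so each sample could be converted to a tensor and wrapped in a `torch_geometric.data.Data` object together with per-node features.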