
Data Preparation



We begin by collecting 1,189 high-quality RTL code repositories from DeepCircuitX, each of which can be processed by synthesis tools and is free of syntax errors.

We employ Synopsys Design Compiler with a 12nm Process Design Kit (PDK) to synthesize the RTL code repositories into post-mapped (PM) netlists, along with detailed synthesis reports. To expand the dataset, each module is treated as the top module of its own synthesis flow, yielding a total of 4,450 netlists. Following synthesis, we use Cadence Innovus to carry out the floorplanning and placement steps of the ASIC physical design flow, producing a placed netlist and a physical design report. After placement, precise timing information can be extracted, since the distances between ports and cells are then established.
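The per-module synthesis runs can be sketched as a small script generator: for every module, emit one Design Compiler Tcl script that treats that module as the top. This is a minimal illustration, not the authors' actual flow; the file paths, report names, and the `make_dc_scripts` helper are hypothetical, and a real flow would also configure `link_library`/`target_library` from the 12nm PDK.

```python
from pathlib import Path

# Hypothetical template for one dc_shell run; library setup for the
# 12nm PDK is omitted. {top} is the module synthesized as top.
DC_TEMPLATE = """\
read_verilog {sources}
current_design {top}
link
compile_ultra
report_timing > reports/{top}_timing.rpt
report_area   > reports/{top}_area.rpt
write -format verilog -hierarchy -output netlists/{top}_pm.v
"""

def make_dc_scripts(modules, sources, out_dir="dc_scripts"):
    """Write one synthesis script per module; return the script paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for top in modules:
        script = DC_TEMPLATE.format(top=top, sources=" ".join(sources))
        path = out / f"run_{top}.tcl"
        path.write_text(script)
        paths.append(path)
    return paths
```

Each generated script would then be passed to `dc_shell -f run_<module>.tcl`, producing one PM netlist and one report set per module.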

To support AI4EDA solutions, we construct a dataset of graphs represented in PyTorch Geometric, derived from the PM netlists and AIGs. We also provide sub-AIGs for model training: randomly extracted sub-circuits with 500–5,000 nodes. In total, 83,155 sub-circuits and their corresponding graph representations are generated for model training.
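The sub-circuit extraction step can be sketched as a randomized breadth-first growth from a random seed node until the sample reaches a target size in the 500–5,000 range. This is an assumed implementation (the paper does not specify the sampling algorithm), using only the standard library and treating the AIG as an undirected adjacency map:

```python
import random
from collections import deque

def extract_sub_aig(adj, target_min=500, target_max=5000, seed=0):
    """Randomly grow a connected sub-circuit from an AIG.

    adj: dict mapping node id -> list of neighbour node ids
         (fanin and fanout edges, treated as undirected).
    Returns a set of node ids with target_min <= size <= target_max,
    or None if the region around the start node is too small.
    """
    rng = random.Random(seed)
    target = rng.randint(target_min, target_max)
    start = rng.choice(list(adj))
    seen = {start}
    queue = deque([start])
    while queue and len(seen) < target:
        node = queue.popleft()
        neighbours = list(adj[node])
        rng.shuffle(neighbours)  # randomise growth direction
        for nb in neighbours:
            if nb not in seen:
                seen.add(nb)
                queue.append(nb)
                if len(seen) >= target:
                    break
    return seen if len(seen) >= target_min else None
```

The resulting node sets would then be converted into `torch_geometric.data.Data` objects (node features plus the induced edge list) for model training.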