
Datasets of Large Circuit Model

The team is composed of members from CURE Lab at The Chinese University of Hong Kong, under the supervision of Qiang Xu.

DeepCircuitX provides a holistic, multilevel resource that spans RTL code at the repository, file, module, and block levels, together with corresponding annotations. This structure enables more fine-grained training and evaluation of large language models (LLMs) on RTL-specific tasks.

Citation

If you use the DeepCircuitX dataset in your research, please cite the original paper:

@misc{li2025deepcircuitxcomprehensiverepositoryleveldataset,
      title={DeepCircuitX: A Comprehensive Repository-Level Dataset for RTL Code Understanding, Generation, and PPA Analysis}, 
      author={Zeju Li and Changran Xu and Zhengyuan Shi and Zedong Peng and Yi Liu and Yunhao Zhou and Lingfeng Zhou and Chengyu Ma and Jianyuan Zhong and Xi Wang and Jieru Zhao and Zhufei Chu and Xiaoyan Yang and Qiang Xu},
      year={2025},
      eprint={2502.18297},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2502.18297}, 
}

ForgeEDA is an open-source, multifaceted dataset comprising 1,189 practical circuit designs across six categories, including Processor, Arithmetic, Encoder/Decoder, Interface, and Controller. ForgeEDA provides diverse circuit representations, such as Register Transfer Level (RTL) code, post-mapping (PM) netlists, And-Inverter Graphs (AIGs), and placed netlists, enabling comprehensive analysis and development.

Citation

@article{shi2025forgeeda,
  title={ForgeEDA: A Comprehensive Multimodal Dataset for Advancing EDA},
  author={Shi, Zhengyuan and Li, Zeju and Ma, Chengyu and Zhou, Yunhao and Zheng, Ziyang and Liu, Jiawei and Pan, Hongyang and Zhou, Lingfeng and Li, Kezhi and Zhu, Jiaying and others},
  journal={arXiv preprint arXiv:2505.02016},
  year={2025}
}

ForgeHLS is a large-scale, open-source dataset explicitly designed for ML-driven high-level synthesis (HLS) research. It comprises over 400,000 diverse designs generated from 536 kernels covering a broad range of application domains. For each kernel, pragmas (loop unrolling, pipelining, array partitioning) are inserted systematically and automatically, and the resulting design space is explored extensively using Bayesian optimization.
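To make the idea of a per-kernel pragma design space concrete, here is a minimal sketch. It is a hypothetical illustration, not the ForgeHLS generation pipeline. The knob values (`UNROLL_FACTORS`, `PIPELINE_II`, `PARTITION_FACTORS`), the helper names, and the array name `A` are all assumptions, and the pragma strings are modeled on AMD Vitis HLS syntax. The sketch shows how a few independent knobs multiply into many design variants of a single kernel.

```python
from itertools import product

# Hypothetical knobs for a kernel with one loop and one array (assumed values).
UNROLL_FACTORS = [1, 2, 4]       # 1 means "no unroll pragma"
PIPELINE_II = [None, 1]          # None means "no pipeline pragma"
PARTITION_FACTORS = [1, 2, 4]    # 1 means "no array partitioning"


def pragma_lines(unroll, ii, partition, array="A"):
    """Render one configuration as Vitis HLS-style pragma strings (assumed syntax)."""
    lines = []
    if unroll > 1:
        lines.append(f"#pragma HLS unroll factor={unroll}")
    if ii is not None:
        lines.append(f"#pragma HLS pipeline II={ii}")
    if partition > 1:
        lines.append(
            f"#pragma HLS array_partition variable={array} "
            f"type=cyclic factor={partition} dim=1"
        )
    return lines


def enumerate_design_space():
    """Yield every (configuration, pragma lines) pair in the full Cartesian product."""
    for unroll, ii, partition in product(UNROLL_FACTORS, PIPELINE_II, PARTITION_FACTORS):
        yield (unroll, ii, partition), pragma_lines(unroll, ii, partition)


if __name__ == "__main__":
    space = list(enumerate_design_space())
    print(len(space))  # 3 * 2 * 3 = 18 variants for this toy kernel
    for config, lines in space[:3]:
        print(config, lines)
```

Exhaustive enumeration like this quickly becomes infeasible once a kernel has many loops and arrays. That growth is why a search strategy such as Bayesian optimization, which samples promising configurations instead of visiting all of them, is useful for exploring the space.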

Citation

@article{
}

For more details, please see the introduction to these datasets.

Contact

lizeju0727@gmail.com (DeepCircuitX)
zyshi21@cse.cuhk.edu.hk (ForgeEDA)
zedongpeng1@gmail.com (ForgeHLS)
