Datasets of Large Circuit Model
The team is composed of members from the CURE Lab at The Chinese University of Hong Kong, under the supervision of Qiang Xu.
DeepCircuitX (ICLAD 2025) provides a holistic, multilevel resource that spans repository-, file-, module-, and block-level RTL code and the corresponding annotations. The dataset enables more nuanced training and evaluation of large language models (LLMs) for RTL-specific tasks.
Citation
If you use the DeepCircuitX dataset in your research, please cite the original paper:
@misc{li2025deepcircuitxcomprehensiverepositoryleveldataset,
title={DeepCircuitX: A Comprehensive Repository-Level Dataset for RTL Code Understanding, Generation, and PPA Analysis},
author={Zeju Li and Changran Xu and Zhengyuan Shi and Zedong Peng and Yi Liu and Yunhao Zhou and Lingfeng Zhou and Chengyu Ma and Jianyuan Zhong and Xi Wang and Jieru Zhao and Zhufei Chu and Xiaoyan Yang and Qiang Xu},
year={2025},
eprint={2502.18297},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2502.18297},
}
ForgeEDA (ISEDA 2025) is an open-source, multifaceted dataset comprising 1,189 practical circuit designs across 6 categories, including Processor, Arithmetic, Encoder/Decoder, Interface, and Controller. ForgeEDA includes diverse circuit representations such as Register Transfer Level (RTL) code, Post-mapping (PM) netlists, And-Inverter Graphs (AIGs), and placed netlists, enabling comprehensive analysis and development.
Citation
@article{shi2025forgeeda,
title={ForgeEDA: A Comprehensive Multimodal Dataset for Advancing EDA},
author={Shi, Zhengyuan and Li, Zeju and Ma, Chengyu and Zhou, Yunhao and Zheng, Ziyang and Liu, Jiawei and Pan, Hongyang and Zhou, Lingfeng and Li, Kezhi and Zhu, Jiaying and others},
journal={arXiv preprint arXiv:2505.02016},
year={2025}
}
ForgeHLS is a large-scale, open-source dataset explicitly designed for ML-driven HLS research. ForgeHLS comprises over 400,000 diverse designs generated from 536 kernels covering a broad range of application domains. Each kernel includes systematically automated pragma insertions (loop unrolling, pipelining, array partitioning), combined with extensive design space exploration using Bayesian optimization.
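To give a rough sense of how pragma insertion turns one kernel into many design points, here is a minimal Python sketch. It is not the ForgeHLS generation flow: the option lists, the function names `enumerate_design_points` and `render_pragmas`, the array name `A`, and the Vitis-HLS-style pragma strings are all assumptions made for illustration. It also enumerates the space exhaustively, whereas the dataset description mentions Bayesian optimization for exploration.

```python
from itertools import product

# Hypothetical option lists; the values actually used by ForgeHLS are not stated here.
UNROLL_FACTORS = [1, 2, 4]
PIPELINE_CHOICES = [False, True]
PARTITION_FACTORS = [1, 2, 4]


def enumerate_design_points():
    """Return every combination of the three pragma choices as a dict."""
    return [
        {"unroll": u, "pipeline": p, "partition": f}
        for u, p, f in product(UNROLL_FACTORS, PIPELINE_CHOICES, PARTITION_FACTORS)
    ]


def render_pragmas(point, array_name="A"):
    """Render one design point as pragma lines; a factor of 1 emits no pragma."""
    lines = []
    if point["pipeline"]:
        lines.append("#pragma HLS pipeline II=1")
    if point["unroll"] > 1:
        lines.append(f"#pragma HLS unroll factor={point['unroll']}")
    if point["partition"] > 1:
        lines.append(
            f"#pragma HLS array_partition variable={array_name} "
            f"type=cyclic factor={point['partition']} dim=1"
        )
    return lines


points = enumerate_design_points()
print(len(points))  # 3 * 2 * 3 = 18 combinations
for line in render_pragmas(points[-1]):
    print(line)
```

Even with three small option lists, a single loop and a single array yield 18 variants, which suggests why a few hundred kernels can expand into a very large number of designs and why a guided search is attractive compared with full enumeration.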
Citation
@article{peng2025forgehls,
title={ForgeHLS: A Large-Scale, Open-Source Dataset for High-Level Synthesis},
author={Peng, Zedong and Li, Zeju and Gao, Mingzhe and Xu, Qiang and Zhang, Chen and Zhao, Jieru},
journal={arXiv preprint arXiv:2507.03255},
year={2025}
}
For more details, please visit the introduction to these datasets.
Contact
lizeju0727@gmail.com (DeepCircuitX)
zyshi21@cse.cuhk.edu.hk (ForgeEDA)
zedongpeng1@gmail.com (ForgeHLS)