# Datasets of Large Circuit Model

*<mark style="color:purple;">**The team is composed of members from CURE Lab at The Chinese University of Hong Kong, under the supervision of Qiang Xu.**</mark>*

**DeepCircuitX (ICLAD 2025) is a holistic, multilevel resource that spans repository-, file-, module-, and block-level RTL code with corresponding annotations. The dataset enables more nuanced training and evaluation of large language models (LLMs) on RTL-specific tasks.**

### Citation

If you use the DeepCircuitX dataset in your research, please cite the original paper:

```
@misc{li2025deepcircuitxcomprehensiverepositoryleveldataset,
      title={DeepCircuitX: A Comprehensive Repository-Level Dataset for RTL Code Understanding, Generation, and PPA Analysis}, 
      author={Zeju Li and Changran Xu and Zhengyuan Shi and Zedong Peng and Yi Liu and Yunhao Zhou and Lingfeng Zhou and Chengyu Ma and Jianyuan Zhong and Xi Wang and Jieru Zhao and Zhufei Chu and Xiaoyan Yang and Qiang Xu},
      year={2025},
      eprint={2502.18297},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2502.18297}, 
}
```

**ForgeEDA (ISEDA 2025) is an open-source, multifaceted dataset comprising 1,189 practical circuit designs across 6 categories, including Processor, Arithmetic, Encoder/Decoder, Interface, and Controller. ForgeEDA includes diverse circuit representations such as Register Transfer Level (RTL) code, post-mapping (PM) netlists, And-Inverter Graphs (AIGs), and placed netlists, enabling comprehensive analysis and development.**

### Citation

```
@article{shi2025forgeeda,
  title={ForgeEDA: A Comprehensive Multimodal Dataset for Advancing EDA},
  author={Shi, Zhengyuan and Li, Zeju and Ma, Chengyu and Zhou, Yunhao and Zheng, Ziyang and Liu, Jiawei and Pan, Hongyang and Zhou, Lingfeng and Li, Kezhi and Zhu, Jiaying and others},
  journal={arXiv preprint arXiv:2505.02016},
  year={2025}
}
```

**ForgeHLS is a large-scale, open-source dataset explicitly designed for ML-driven high-level synthesis (HLS) research. It comprises over 400,000 diverse designs generated from 536 kernels covering a broad range of application domains. Each kernel is expanded through systematic, automated pragma insertion (loop unrolling, pipelining, array partitioning), combined with extensive design space exploration using Bayesian optimization.**
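To give a feel for how pragma insertion turns one kernel into many designs, here is a minimal sketch that enumerates combinations of the three pragma types named above. It is not the ForgeHLS generation flow: the knob values, the function names, and the array name `buf` are hypothetical, the pragma strings assume Vitis-HLS-style syntax, and the search is a plain exhaustive enumeration rather than Bayesian optimization.

```python
from itertools import product

# Hypothetical pragma knobs for a single loop; the values are illustrative only
# and are not taken from the ForgeHLS dataset.
UNROLL_FACTORS = [1, 2, 4, 8]
PIPELINE_OPTIONS = [False, True]
PARTITION_FACTORS = [1, 2, 4]


def enumerate_configs():
    """Return every combination of the three pragma knobs as a dict."""
    return [
        {"unroll": u, "pipeline": p, "partition": a}
        for u, p, a in product(UNROLL_FACTORS, PIPELINE_OPTIONS, PARTITION_FACTORS)
    ]


def render_pragmas(config, array_name="buf"):
    """Render one configuration as pragma strings (assumed Vitis-HLS-style syntax)."""
    lines = []
    if config["pipeline"]:
        lines.append("#pragma HLS PIPELINE")
    if config["unroll"] > 1:
        lines.append(f"#pragma HLS UNROLL factor={config['unroll']}")
    if config["partition"] > 1:
        lines.append(
            f"#pragma HLS ARRAY_PARTITION variable={array_name} "
            f"cyclic factor={config['partition']}"
        )
    return lines


configs = enumerate_configs()
print(len(configs))  # 4 * 2 * 3 = 24 design points for this one loop
```

Even this toy space yields 24 design points for a single loop, which suggests why a guided search becomes attractive once a kernel has several loops and arrays.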

### Citation

```
@article{peng2025forgehls,
  title={ForgeHLS: A Large-Scale, Open-Source Dataset for High-Level Synthesis},
  author={Peng, Zedong and Li, Zeju and Gao, Mingzhe and Xu, Qiang and Zhang, Chen and Zhao, Jieru},
  journal={arXiv preprint arXiv:2507.03255},
  year={2025}
}
```

For more details, please see the introduction to these datasets.

### Contact

***<lizeju0727@gmail.com> (DeepCircuitX)***\
***<zyshi21@cse.cuhk.edu.hk> (ForgeEDA)***\
***<zedongpeng1@gmail.com> (ForgeHLS)***

