Introduction
DeepCircuitX is a holistic, repository-level dataset curated to address limitations in existing datasets. It provides data and annotations across multiple levels:
Chip Level: 109 repositories, 5,508 files.
IP Level: 225 repositories, 12,961 files.
Module Level: 2,383 repositories, 38,692 files.
RISC-V: 2,078 repositories, 98,450 files.
Key Features:
Multi-level source RTL code: repository, file, module, and block.
Multi-level annotations by GPT-4o: repository, file, module, and block.
Includes synthesized netlists, PPA metrics, and layout designs.
Benchmarks for RTL understanding, generation, and completion.
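| Level | Functional categories | Repositories | RTL files |
| --- | --- | --- | --- |
| Chip | 17 | 109 | 5,508 |
| IP | 3 | 225 | 12,961 |
| Module | 57 | 2,383 | 38,692 |
| RISC-V | - | 2,078 | 98,450 |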
This table summarizes the number of functional categories, repositories, and RTL files across different levels of DeepCircuitX, including Chip, IP, Module, and RISC-V levels.
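To get a feel for the overall scale, the per-level counts above can be totaled. The following is a minimal sketch that hard-codes the figures from this page; `LEVEL_STATS` and `totals` are illustrative names, not part of any DeepCircuitX tooling. It also assumes the levels do not share repositories or files, which this page does not state.

```python
# Per-level counts copied from the summary on this page.
# LEVEL_STATS and totals() are hypothetical names, not a DeepCircuitX API.
LEVEL_STATS = {
    "Chip": {"repositories": 109, "files": 5_508},
    "IP": {"repositories": 225, "files": 12_961},
    "Module": {"repositories": 2_383, "files": 38_692},
    "RISC-V": {"repositories": 2_078, "files": 98_450},
}


def totals(stats):
    """Sum each count across all levels (assumes the levels do not overlap)."""
    result = {}
    for counts in stats.values():
        for key, value in counts.items():
            result[key] = result.get(key, 0) + value
    return result


if __name__ == "__main__":
    for key, value in totals(LEVEL_STATS).items():
        print(f"{key}: {value:,}")
```

If a repository appears under more than one level, these sums overcount, so they are best read as an upper bound.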
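| Level | Module annotations | Block annotations | Repository annotations |
| --- | --- | --- | --- |
| Chip | 5,471 | 36,955 | 84 |
| IP | 12,863 | 20,101 | 183 |
| Module | 28,901 | - | 1,389 |
| RISC-V | 2,116 | - | 560 |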
The table illustrates the number of annotations at the module, block, and repository levels for various RTL categories.
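| Task | | | | | Total |
| --- | --- | --- | --- | --- | --- |
| RTL Code Understanding | 6,386 | 14,499 | 1,348 | 3,922 | 26,155 |
| RTL Code Completion | 6,178 | 14,131 | 1,312 | 3,822 | 25,443 |
| RTL Code Generation | 6,479 | 16,511 | 1,393 | 3,950 | 28,333 |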
This table displays the data distribution for code understanding, completion, and generation tasks across different RTL categories.
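Each task row lists four per-category counts followed by a fifth number that appears to be their sum. The sketch below checks that reading, using figures copied from this page; `TASK_COUNTS` and `check_totals` are illustrative names, and the four categories are left unnamed because their column headers are not shown here.

```python
# Task counts copied from the table on this page.
# Each entry is (per-category counts, listed total). The category names are
# not given on the page, so the four counts are kept as an unnamed list.
# TASK_COUNTS and check_totals() are hypothetical names for illustration.
TASK_COUNTS = {
    "RTL Code Understanding": ([6_386, 14_499, 1_348, 3_922], 26_155),
    "RTL Code Completion": ([6_178, 14_131, 1_312, 3_822], 25_443),
    "RTL Code Generation": ([6_479, 16_511, 1_393, 3_950], 28_333),
}


def check_totals(table):
    """Return the tasks whose listed total differs from the sum of its parts."""
    return [task for task, (parts, total) in table.items() if sum(parts) != total]


if __name__ == "__main__":
    mismatches = check_totals(TASK_COUNTS)
    if mismatches:
        print(f"mismatched totals: {mismatches}")
    else:
        print("all listed totals match the sum of their categories")
```

The same pattern is a cheap consistency check to run whenever the statistics are updated.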