RTL code annotations by GPT
Last updated
To construct the RTL-language dataset, we organize the data into four distinct levels: repository, file, module, and block. A detailed example is shown in the figure.
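As a minimal sketch of this four-level organization (the class and field names below are our own illustration; the dataset only defines the levels conceptually):

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical containers for the four organizational levels:
# repository -> file -> module -> block.
@dataclass
class Block:
    text: str  # e.g. an always block or assign statement

@dataclass
class Module:
    name: str
    blocks: List[Block] = field(default_factory=list)

@dataclass
class File:
    path: str
    modules: List[Module] = field(default_factory=list)

@dataclass
class Repository:
    name: str
    files: List[File] = field(default_factory=list)

# Example: one repository containing one file with one module and one block.
repo = Repository(
    "Polar-Decoder",
    [File("pe_1.v", [Module("pe_1", [Block("assign b = a;")])])],
)
print(repo.files[0].modules[0].name)
```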
We employ a Chain of Thought (CoT) approach for RTL code annotation, leveraging GPT-4 and Claude to generate detailed comments, descriptions, and question-answer pairs.
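The chained prompting idea can be sketched as follows. This is a rough illustration only: the prompt wording and the `build_cot_prompts` helper are our own assumptions, not the authors' actual pipeline, and no model API is called here.

```python
def build_cot_prompts(module_name, verilog_src):
    """Build a three-stage chain-of-thought prompt sequence for one module.

    Each stage's (hypothetical) model output would feed the next stage:
    line comments -> prose description -> question-answer pairs.
    """
    stages = []
    stages.append((
        "intermediate_comment",
        f"Add line-level comments explaining the RTL code below.\n\n{verilog_src}"
    ))
    stages.append((
        "description",
        f"Using the commented code from the previous step, write a prose "
        f"description of what module `{module_name}` does."
    ))
    stages.append((
        "qa_pairs",
        f"From the description of `{module_name}`, generate question-answer "
        f"pairs covering its ports, timing, and behavior."
    ))
    return stages

# Example: build the prompt chain for a trivial pass-through module.
prompts = build_cot_prompts(
    "pe_1",
    "module pe_1(input a, output b); assign b = a; endmodule",
)
for stage, _ in prompts:
    print(stage)
```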
| Category | Module | Block  | Repository |
|----------|--------|--------|------------|
| Chip     | 5,471  | 36,955 | 84         |
| IP       | 12,863 | 20,101 | 183        |
| Module   | 28,901 | -      | 1,389      |
| RISC-V   | 2,116  | -      | 560        |
The table illustrates the number of annotations at the module, block, and repository levels for various RTL categories.
The annotation download URL:
The annotation test case download URL:
An example of our data structure:
chip/Communications_Processor/Design-of-reduced-latency-and-increased-throughput-Polar-Decoder
The structure of the annotated Verilog code in the 'Design-of-reduced-latency-and-increased-throughput-Polar-Decoder' project is as follows. The `design_files` folder contains individual Verilog files, with `pe_1` serving as an example. Each module's source code (e.g., `pe_1.v`) is accompanied by various annotation files, such as intermediate comments, specifications, and a textual description (`pe_1.txt`). These annotations are organized into subdirectories such as `intermediate_comment` and `spec`. This structure enables detailed documentation and analysis of the Verilog code for modules across the project.
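The layout above can be indexed programmatically. The sketch below builds a sample tree in a temporary directory and collects each module's annotation files; the folder and file names follow the example (`design_files`, `intermediate_comment`, `spec`, `pe_1.v`, `pe_1.txt`), but the exact contents of each subdirectory are our own assumption.

```python
import os
import tempfile

# Hypothetical layout mirroring the project structure described above.
LAYOUT = [
    "design_files/pe_1/pe_1.v",
    "design_files/pe_1/pe_1.txt",
    "design_files/pe_1/intermediate_comment/pe_1_comment.v",
    "design_files/pe_1/spec/pe_1_spec.md",
]

def collect_annotations(root):
    """Map each module directory under design_files/ to its files."""
    modules = {}
    base = os.path.join(root, "design_files")
    for module in sorted(os.listdir(base)):
        mod_dir = os.path.join(base, module)
        files = []
        for dirpath, _, filenames in os.walk(mod_dir):
            for name in filenames:
                rel = os.path.relpath(os.path.join(dirpath, name), mod_dir)
                files.append(rel)
        modules[module] = sorted(files)
    return modules

# Build the sample tree in a temp directory, then index it.
with tempfile.TemporaryDirectory() as root:
    for rel in LAYOUT:
        path = os.path.join(root, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "w").close()
    index = collect_annotations(root)
    print(index["pe_1"])
```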