RTL code annotations by GPT

To construct the RTL-language dataset, we organize the data into four distinct levels: repository, file, module, and block. A detailed example is shown in the figure.

We employ a Chain of Thought (CoT) approach for RTL code annotation, leveraging GPT-4 and Claude to generate detailed comments, descriptions, and question-answer pairs.

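| RTL Category | Module-Level Annotations | Block-Level Annotations | Repository-Level Annotations |
|--------------|--------------------------|-------------------------|------------------------------|
| Chip         | 5,471                    | 36,955                  | 84                           |
| IP           | 12,863                   | 20,101                  | 183                          |
| Module       | 28,901                   | -                       | 1,389                        |
| RISC-V       | 2,116                    | -                       | 560                          |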

The table illustrates the number of annotations at the module, block, and repository levels for various RTL categories.

Annotation download URL:

All RTL code and the corresponding annotations at each level

If you run into problems with this link, such as 'cannot unzip', 'cannot download', or 'the link is not valid', try this alternative link: https://huggingface.co/datasets/zeju-0727/DeepCirCuitX_Dataset

Annotation test case download URL:

One complete example of our annotation data (with RTL code)

An example of our data structure:

chip/Communications_Processor/Design-of-reduced-latency-and-increased-throughput-Polar-Decoder

design_files
├── design_files/pe_1         // pe_1 in the original code is a Verilog file
│   ├── design_files/pe_1/intermediate_comment
│   │   ├── design_files/pe_1/intermediate_comment/pe_1_QA.json
│   │   ├── design_files/pe_1/intermediate_comment/pe_1_module.json
│   │   └── design_files/pe_1/intermediate_comment/pe_1_spec.json
│   ├── design_files/pe_1/pe_1.txt  // Module-level comment
│   ├── design_files/pe_1/spec
│   │   └── design_files/pe_1/spec/spec.txt   // File-level specification annotation
│   └── design_files/pe_1/pe_1.v  // File-level code
├── design_files/pe_2   ...
├── design_files/t_to_s_or_s_to_t   ...
├── design_files/sign_processing_unit   ...
├── design_files/half_adder_subtractor   ...
├── design_files/pe_1_modified_merge   ...
├── design_files/comparator_module   ...
├── design_files/full_adder_subtractor   ...
└── design_files/merged_pe_2   ...
TestBench_Files ...

Design-of-reduced-latency-and-increased-throughput-Polar-Decoder.txt  // Repo-level comment

The structure above shows the annotated Verilog code in the 'Design-of-reduced-latency-and-increased-throughput-Polar-Decoder' project.

The design_files folder contains individual Verilog files, with pe_1 serving as an example. Each module's source code (e.g., pe_1.v) is accompanied by various annotation files, such as intermediate comments, specifications, and a textual description (pe_1.txt).

These annotations are organized into subdirectories like intermediate_comment and spec. This structure enables detailed documentation and analysis of the Verilog code for various modules across the project.
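The layout described above can be read programmatically. The following is a minimal Python sketch, assuming every module directory follows the same pattern as pe_1 (a `<name>.v` source file, a `<name>.txt` description, `spec/spec.txt`, and three JSON files under `intermediate_comment`). The function names `load_module_annotations` and `load_design_files` are hypothetical helpers, not part of the dataset, and the JSON schema is not documented here, so parsed objects are returned unchanged.

```python
import json
from pathlib import Path


def load_module_annotations(module_dir):
    """Collect the code and annotations for one module directory.

    Assumes the pe_1-style layout shown above. Any file that is
    missing is reported as None rather than raising an error.
    """
    module_dir = Path(module_dir)
    name = module_dir.name  # e.g. "pe_1"

    def read_text(path):
        return path.read_text(encoding="utf-8") if path.is_file() else None

    def read_json(path):
        if not path.is_file():
            return None
        with path.open(encoding="utf-8") as handle:
            return json.load(handle)

    intermediate = module_dir / "intermediate_comment"
    return {
        "name": name,
        "code": read_text(module_dir / f"{name}.v"),
        "description": read_text(module_dir / f"{name}.txt"),
        "spec": read_text(module_dir / "spec" / "spec.txt"),
        "qa": read_json(intermediate / f"{name}_QA.json"),
        "module_json": read_json(intermediate / f"{name}_module.json"),
        "spec_json": read_json(intermediate / f"{name}_spec.json"),
    }


def load_design_files(design_root):
    """Load every module directory found directly under design_files."""
    design_root = Path(design_root)
    return [
        load_module_annotations(child)
        for child in sorted(design_root.iterdir())
        if child.is_dir()
    ]
```

For example, `load_design_files("design_files")` run from inside the project folder would return one dictionary per module directory (pe_1, pe_2, and so on), with `None` in place of any file that a given module does not provide.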

Illustration of the dataset repository structure with multi-level annotations
