LCM-Team

RTL code annotations by GPT


Last updated 5 months ago

To construct the RTL-language dataset, we organize the data into four distinct levels: repository, file, module, and block. A detailed example is shown in the figure.

We employ a Chain of Thought (CoT) approach for RTL code annotation, leveraging GPT-4 and Claude to generate detailed comments, descriptions, and question-answer pairs.

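| RTL Category | Module-Level Annotations | Block-Level Annotations | Repository-Level Annotations |
|--------------|--------------------------|-------------------------|------------------------------|
| Chip         | 5,471                    | 36,955                  | 84                           |
| IP           | 12,863                   | 20,101                  | 183                          |
| Module       | 28,901                   | -                       | 1,389                        |
| RISC-V       | 2,116                    | -                       | 560                          |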

The table illustrates the number of annotations at the module, block, and repository levels for various RTL categories.
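For readers who want the counts in machine-readable form, the sketch below restates the table as a Python dictionary and sums each annotation level. The dictionary name and the helper `totals_by_level` are hypothetical, and treating the "-" entries as "no count reported" (contributing zero to the sum) is an assumption, not something the dataset page states.

```python
# Annotation counts copied from the table above.
# None stands for the "-" cells; reading them as "no count reported"
# is an assumption made for this sketch.
ANNOTATION_COUNTS = {
    "Chip": {"module": 5471, "block": 36955, "repository": 84},
    "IP": {"module": 12863, "block": 20101, "repository": 183},
    "Module": {"module": 28901, "block": None, "repository": 1389},
    "RISC-V": {"module": 2116, "block": None, "repository": 560},
}


def totals_by_level(counts):
    """Sum the counts per annotation level, skipping unreported cells."""
    totals = {}
    for per_level in counts.values():
        for level, value in per_level.items():
            totals[level] = totals.get(level, 0) + (value or 0)
    return totals


if __name__ == "__main__":
    print(totals_by_level(ANNOTATION_COUNTS))
```

Under that assumption, the sums are 49,351 module-level, 57,056 block-level, and 2,216 repository-level annotations.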

The annotation download URL:

The annotation test case download URL:

One example of our data structure:

chip/Communications_Processor/Design-of-reduced-latency-and-increased-throughput-Polar-Decoder

design_files
├── design_files/pe_1         //pe_1 in original code is a Verilog file
│   ├── design_files/pe_1/intermediate_comment
│   │   ├── design_files/pe_1/intermediate_comment/pe_1_QA.json
│   │   ├── design_files/pe_1/intermediate_comment/pe_1_module.json
│   │   └── design_files/pe_1/intermediate_comment/pe_1_spec.json
│   ├── design_files/pe_1/pe_1.txt  // Module-level comment
│   ├── design_files/pe_1/spec
│   │   └── design_files/pe_1/spec/spec.txt   // file-level specification annotation
│   └── design_files/pe_1/pe_1.v  // file-level code
├── design_files/pe_2   ...
├── design_files/t_to_s_or_s_to_t   ...
├── design_files/sign_processing_unit   ...
├── design_files/half_adder_subtractor   ...
├── design_files/pe_1_modified_merge   ...
├── design_files/comparator_module   ...
├── design_files/full_adder_subtractor   ...
└── design_files/merged_pe_2   ...
TestBench_Files ...

Design-of-reduced-latency-and-increased-throughput-Polar-Decoder.txt  // Repo-level comment

The structure for the annotated Verilog code in the 'Design-of-reduced-latency-and-increased-throughput-Polar-Decoder' project.

The design_files folder contains individual Verilog files, with pe_1 serving as an example. Each module's source code (e.g., pe_1.v) is accompanied by various annotation files, such as intermediate comments, specifications, and a textual description (pe_1.txt).

These annotations are organized into subdirectories like intermediate_comment and spec. This structure enables detailed documentation and analysis of the Verilog code for various modules across the project.
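The layout described above can be traversed programmatically. The following is a minimal sketch, assuming every module directory follows the pe_1 pattern shown in the tree and that the repository-level comment file sits in the repository root and is named after it. The function names and dictionary keys are hypothetical. No JSON schema is assumed, so the JSON files are only loaded, not interpreted.

```python
import json
from pathlib import Path


def collect_module_annotations(module_dir: Path) -> dict:
    """Map annotation kinds to files for one module directory (e.g. design_files/pe_1)."""
    name = module_dir.name
    intermediate = module_dir / "intermediate_comment"
    candidates = {
        "code": module_dir / f"{name}.v",              # file-level code
        "module_comment": module_dir / f"{name}.txt",  # module-level comment
        "spec": module_dir / "spec" / "spec.txt",      # file-level specification
        "qa_json": intermediate / f"{name}_QA.json",
        "module_json": intermediate / f"{name}_module.json",
        "spec_json": intermediate / f"{name}_spec.json",
    }
    # Keep only files that exist; a module may lack some annotation kinds.
    return {kind: path for kind, path in candidates.items() if path.is_file()}


def index_repository(repo_root: Path) -> dict:
    """Index all modules under design_files plus the repo-level comment, if present."""
    design_root = repo_root / "design_files"
    modules = {
        d.name: collect_module_annotations(d)
        for d in sorted(design_root.iterdir())
        if d.is_dir()
    }
    repo_comment = repo_root / f"{repo_root.name}.txt"
    return {
        "repo_comment": repo_comment if repo_comment.is_file() else None,
        "modules": modules,
    }


def load_json(path: Path):
    """Load one intermediate annotation file without assuming its schema."""
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)
```

Calling `index_repository(Path("Design-of-reduced-latency-and-increased-throughput-Polar-Decoder"))` on an extracted copy of the example would return one entry per module directory, each mapping annotation kinds to the files found on disk.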

deepcircuitX-annotation-data.zip (Google Docs): all the RTL code and the corresponding annotations at each level
Communications_Processor.zip (Google Docs): one complete case of our annotation data (with RTL code)
Illustration of the dataset repository structure with multi-level annotations