Introduction

Last updated 3 months ago

Overview

DeepCircuitX is a holistic, repository-level dataset curated to address limitations in existing datasets. It provides data and annotations across multiple levels:

Chip Level: 109 repositories, 5,508 files.
IP Level: 225 repositories, 12,961 files.
Module Level: 2,383 repositories, 38,692 files.
RISC-V: 2,078 repositories, 98,450 files.

Key Features:

Multi-level source RTL code: repository, file, module, and block.
Multi-level annotations by GPT-4o: repository, file, module, and block.
Includes synthesized netlists, PPA metrics, and layout designs.
Benchmarks for RTL understanding, generation, and completion.

Table 1: Dataset Summary of DeepCircuitX

| Level | Functional Categories | Number of Repositories | Number of RTL Files |
|---|---|---|---|
| Chip | | 109 | 5,508 |
| IP | | 225 | 12,961 |
| Module | | 2,383 | 38,692 |
| RISC-V | | 2,078 | 98,450 |

This table summarizes the number of functional categories, repositories, and RTL files across different levels of DeepCircuitX, including Chip, IP, Module, and RISC-V levels.
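As a quick illustration of the scale these figures imply, the following minimal sketch totals the repository and file counts from Table 1 and derives an average number of RTL files per repository for each level. The dictionary layout and the `summarize` helper are hypothetical names chosen for this example. The totals also assume that the four levels do not share repositories, which the source does not state.

```python
# Repository and RTL file counts per level, copied from Table 1.
# Assumption: the levels are disjoint, so summing them is meaningful.
LEVELS = {
    "Chip": (109, 5_508),
    "IP": (225, 12_961),
    "Module": (2_383, 38_692),
    "RISC-V": (2_078, 98_450),
}


def summarize(levels):
    """Return (total_repos, total_files, files_per_repo_by_level)."""
    total_repos = sum(repos for repos, _ in levels.values())
    total_files = sum(files for _, files in levels.values())
    files_per_repo = {
        name: files / repos for name, (repos, files) in levels.items()
    }
    return total_repos, total_files, files_per_repo


total_repos, total_files, files_per_repo = summarize(LEVELS)
print(f"Total repositories: {total_repos}")
print(f"Total RTL files: {total_files}")
for name, ratio in files_per_repo.items():
    print(f"{name}: {ratio:.1f} files per repository")
```

The per-level ratio is only a rough indicator of repository size; it says nothing about file length or module count.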

Table 2: Overview of Annotations in DeepCircuitX

RTL Category

Module-Level Annotations

Block-Level Annotations

Repository-Level Annotations

Chip

5,471

36,955

12,863

20,101

183

Module

28,901

1,389

RISC-V

2,116

560

The table illustrates the number of annotations at the module, block, and repository levels for various RTL categories.

Table 3: Dataset Counts for RTL Code Tasks

Task

Module

RISC-V

Chip

Total

RTL Code Understanding

6,386

14,499

1,348

3,922

26,155

RTL Code Completion

6,178

14,131

1,312

3,822

25,443

RTL Code Generation

6,479

16,511

1,393

3,950

28,333

This table displays the data distribution for code understanding, completion, and generation tasks across different RTL categories.
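Each row of Table 3 lists four per-category counts followed by a total, so the rows can be checked for internal consistency. The sketch below is one way to do that; it keeps the values in the order they appear in the table and does not assign them to specific categories. The `ROWS` structure and the `check_totals` helper are hypothetical names used only for this example.

```python
# Table 3 rows in source order: four per-category counts, then the total.
ROWS = {
    "RTL Code Understanding": [6_386, 14_499, 1_348, 3_922, 26_155],
    "RTL Code Completion": [6_178, 14_131, 1_312, 3_822, 25_443],
    "RTL Code Generation": [6_479, 16_511, 1_393, 3_950, 28_333],
}


def check_totals(rows):
    """Map each task to True if its counts sum to the stated total."""
    return {
        task: sum(values[:-1]) == values[-1]
        for task, values in rows.items()
    }


result = check_totals(ROWS)
for task, ok in result.items():
    print(f"{task}: {'consistent' if ok else 'mismatch'}")
```

All three rows sum to their stated totals.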
