RTL-Language Data for LLM Finetuning
Example code-completion record for LLM training:
{
"instruction": "Given the detailed specifications of a module, generate the corresponding Verilog code.",
"input": "Module-level description:\nThe dout_monitor module tracks the rising edge of a sample clock signal to increment an address register. Implementation entails two primary blocks: one capturing the previous state of the sample clock using a temporary register, and another using this captured state to detect rising edges and conditionally increment the address, which is reset to 448 on a low reset signal.\nModule head:\n\nmodule dout_monitor(\ninput clk,rst_n,\noutput reg [`ADDRESS_WIDTH-1:0] addr,\ninput [`DATA_WIDTH-1:0] d_ch0,\ninput [`DATA_WIDTH-1:0] d_ch1,\ninput sample_clk\n)",
"output": "`include \"defines.v\"\nmodule dout_monitor(\ninput clk,rst_n,\noutput reg [`ADDRESS_WIDTH-1:0] addr,\ninput [`DATA_WIDTH-1:0] d_ch0,\ninput [`DATA_WIDTH-1:0] d_ch1,\ninput sample_clk\n);\nreg sample_clk_t;\nalways @(posedge clk)\nbegin\n\tsample_clk_t<=sample_clk;\nend\nalways @(posedge clk or negedge rst_n)\nbegin\n\tif (!rst_n) begin\n\t\taddr<=10'd448;\n\tend else begin\n\t\tif (sample_clk_t == 1'b0 && sample_clk == 1'b1)\n\t\tbegin\n\t\t\taddr <= addr + 10'd1;\n\t\tend\n\tend\nend\nendmodule"
}

The code completion dataset download URL:
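Each training record pairs an `instruction` and `input` with a Verilog `output`, as in the example above. Below is a minimal sketch of one way to turn such a record into a supervised-finetuning prompt; the `### Instruction:`/`### Input:`/`### Response:` template is an assumption for illustration, not the dataset authors' official format.

```python
# Hedged sketch: building a finetuning prompt from one dataset record.
# Field names ("instruction", "input", "output") come from the example
# record above; the prompt template itself is a common convention and
# an assumption here.
RECORD = {
    "instruction": "Given the detailed specifications of a module, "
                   "generate the corresponding Verilog code.",
    "input": "Module-level description:\nThe dout_monitor module tracks "
             "the rising edge of a sample clock signal ...",
    "output": "`include \"defines.v\"\nmodule dout_monitor(\n...\nendmodule",
}

def build_prompt(record: dict) -> str:
    """Concatenate instruction and input into a single model prompt."""
    return (
        f"### Instruction:\n{record['instruction']}\n\n"
        f"### Input:\n{record['input']}\n\n"
        f"### Response:\n"
    )

prompt = build_prompt(RECORD)
target = RECORD["output"]  # the Verilog module is the supervision target
```

The same pattern applies to the generation and understanding records, with only the instruction and output fields changing.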
Example code-generation record for LLM training:
The code generation dataset download URL:
Example code-understanding record for LLM training:
The code understanding dataset download URL:
Testing Dataset & Benchmark:
Dataset Counts for RTL Code Tasks

| Task | Counts by RTL category | Total |
| --- | --- | --- |
| RTL Code Understanding | 6,386 / 14,499 / 1,348 / 3,922 | 26,155 |
| RTL Code Completion | 6,178 / 14,131 / 1,312 / 3,822 | 25,443 |
| RTL Code Generation | 6,479 / 16,511 / 1,393 / 3,950 | 28,333 |
This table displays the data distribution for code understanding, completion, and generation tasks across different RTL categories.
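As a quick consistency check, each task's total equals the sum of its four per-category counts. A small sketch using the numbers from the table above:

```python
# The per-task counts from the table above: four category counts plus
# the stated total. The category names are not reproduced here.
COUNTS = {
    "RTL Code Understanding": ([6386, 14499, 1348, 3922], 26155),
    "RTL Code Completion":    ([6178, 14131, 1312, 3822], 25443),
    "RTL Code Generation":    ([6479, 16511, 1393, 3950], 28333),
}

# Each stated total is the sum of its category counts.
totals_consistent = all(sum(cats) == total for cats, total in COUNTS.values())

# Records across all three tasks combined.
grand_total = sum(total for _, total in COUNTS.values())  # 79,931
```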
RTL Code Understanding
This task evaluates the model’s ability to interpret and describe RTL code. Given a module’s RTL code as input, the model generates a detailed yet concise description covering key aspects such as the module’s purpose, input/output signals, internal logic, and overall behavior. This task is crucial for assessing the model’s ability to produce human-readable explanations for code analysis and documentation.
RTL Code Completion
In this task, the model is provided with partial RTL code (typically the module header with input/output ports and parameters). The goal is for the model to complete the code by generating the missing internal logic, control structures, and signal definitions. This task mirrors the autocompletion functionality found in modern code editors and evaluates the model’s ability to infer and generate code from context.
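Completion examples of this shape can be derived from full modules by splitting at the end of the port list. The sketch below splits on the first `");"`, which is an assumption that holds for simple headers like the `dout_monitor` example; real preprocessing would need a proper Verilog parser to handle comments and nested parentheses.

```python
# Hedged sketch: deriving a (header prompt, body target) pair for the
# code-completion task by splitting a full module at the end of its
# port list. Splitting on the first ");" is a simplifying assumption.
FULL_MODULE = """module dout_monitor(
input clk,rst_n,
input sample_clk
);
reg sample_clk_t;
endmodule"""

def split_header_body(src: str) -> tuple[str, str]:
    """Return (module header incl. port list, remaining module body)."""
    head, sep, body = src.partition(");")
    return head + sep, body.lstrip("\n")

header, body = split_header_body(FULL_MODULE)
# header ends at the closing ");" of the port list;
# body contains the internal logic through "endmodule".
```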
RTL Code Generation
In the RTL code generation task, the model is tasked with producing a full implementation of RTL code based on a high-level description and specified input and output parameters. The goal is to generate a fully functional Verilog module that adheres to the provided specifications. This task assesses the model’s ability to translate design requirements into precise RTL implementations, which is critical for automating the hardware design process.
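Assessing generated modules usually relies on simulation or formal equivalence checking; as a much weaker first pass, one can at least verify basic structure. The sketch below is an assumption-laden sanity check, not the benchmark's actual evaluation method.

```python
import re

# Hedged sketch: minimal structural checks on generated Verilog. This is
# far weaker than functional verification (simulation or equivalence
# checking), which this kind of benchmark would typically rely on.
def basic_checks(code: str, expected_name: str) -> bool:
    """True if code declares a module with the expected name and
    terminates with 'endmodule'."""
    decl = re.search(r"\bmodule\s+(\w+)", code)
    return bool(
        decl
        and decl.group(1) == expected_name
        and code.rstrip().endswith("endmodule")
    )

ok = basic_checks("module dout_monitor(\ninput clk\n);\nendmodule",
                  "dout_monitor")
```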