# LLM Finetune Results

### Code Understanding with LLM

The experimental results, shown in Table VI, demonstrate the performance of various base Large Language Models (LLMs) and fine-tuned LLMs on our RTL code understanding benchmark, using the evaluation metrics BLEU-4, METEOR, ROUGE-1, ROUGE-2, and ROUGE-L, which offer valuable insights into the quality of the generated descriptions in terms of surface-level linguistic similarity.
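To illustrate what these surface-level similarity metrics measure, a minimal sketch of ROUGE-L is shown below. This is a simplified illustration (whitespace tokenization, F1 over the longest common subsequence), not the exact implementation used in our evaluation:

```python
def lcs_len(a, b):
    # Dynamic-programming longest common subsequence length.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l_f1(candidate, reference):
    """Simplified ROUGE-L F1 over whitespace-separated tokens."""
    c, r = candidate.split(), reference.split()
    lcs = lcs_len(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)
```

For example, comparing a generated description "adds two inputs" against a reference "adds the two inputs" yields an LCS of three tokens, so precision is 1.0, recall is 0.75, and the F1 is about 0.86.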

<figure><img src="https://204291402-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FqpjfvyQt0RAeOzMVWp4g%2Fuploads%2FR97wPXGMsCLTtY8W6UEJ%2Fimage.png?alt=media&#x26;token=43b7a8fa-8dd5-4e19-aced-64a56499584e" alt=""><figcaption></figcaption></figure>

Initially, the original versions of the LLMs, such as CodeLlama, CodeT5+, CodeGen2, and DeepSeek, exhibit relatively low performance across most metrics.

After fine-tuning on our dataset, every large model demonstrates significantly better performance across the BLEU-4, METEOR, ROUGE-1, ROUGE-2, and ROUGE-L metrics compared to its original, non-fine-tuned counterpart. This highlights the effectiveness of our dataset. Moreover, models of various sizes, such as the 220M CodeT5 as well as larger 7B and 16B models, all show substantial improvements after fine-tuning. This indicates that our dataset is well-suited for models of different scales, providing strong adaptability and generalization.

### Code Generation and Completion with LLM

In RTL code completion and generation, evaluating model performance is critical to advancing intelligent programming tools. The Pass@k metric serves as a pivotal measure in this domain, quantifying the accuracy of code generation models by assessing their ability to produce a valid solution within the top k predictions. Specifically, Pass@k evaluates whether a correct code snippet appears among the model's top k outputs, thereby providing insight into both the effectiveness and reliability of the model's predictions.
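Pass@k is commonly computed with the unbiased estimator introduced for the HumanEval benchmark (Chen et al., 2021). A short sketch, assuming n samples are generated per problem and c of them pass the functional check:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    samples drawn (without replacement) from n generations passes,
    given that c of the n generations are correct."""
    if n - c < k:
        # Every size-k subset must contain at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For instance, with n = 2 samples of which c = 1 passes, pass@1 is 0.5; the per-problem values are then averaged over the benchmark.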

<figure><img src="https://204291402-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FqpjfvyQt0RAeOzMVWp4g%2Fuploads%2FwGrHTnIZK7DfzlHaPusO%2Fimage.png?alt=media&#x26;token=53fd304f-9485-4227-ad22-dc85a94473b6" alt=""><figcaption></figcaption></figure>

Table VII compares the performance of both original and fine-tuned LLMs on RTL code completion and generation tasks, focusing on Pass@1 and Pass@5 on two evaluation benchmarks, RTLLM and VerilogEval. As baselines, the original versions of CodeLlama, CodeT5+, CodeGen2, and CodeGen2.5 exhibit negligible performance, with most Pass@k scores at or near 0%.

Notably, every model fine-tuned with our dataset significantly outperforms its original, non-fine-tuned counterpart, demonstrating the effectiveness of our data. Additionally, models of different scales, such as the 220M CodeT5 and the 7B models, show substantial improvements after fine-tuning. This highlights the adaptability and generalization capability of our dataset across various model sizes. Moreover, we include CodeV (QW-7B) as an additional baseline, which achieves 14.80% Pass@1 and Pass@5 on RTLLM, and 4.5% on VerilogEval. Although CodeV has undergone prior fine-tuning for general-purpose code generation, its performance remains lower than that of our fine-tuned CodeGen2.5 (7B).

These findings highlight the effectiveness of our dataset in enhancing LLMs’ capability to generate syntactically and functionally accurate RTL code.

