Pragma Insertion Dataset for LLM Fine-tuning
Our dataset contains a large number of C++ kernels without HLS pragmas, along with corresponding pragma-annotated designs. It is not difficult to envision using this HLS data to train an LLM-based automatic pragma inserter. Each training example follows the instruction-tuning format:
{
  "instruction": "optimize for low resource usage.",
  "input": "<original code>",
  "output": "<code with HLS pragmas achieving low resource usage>"
}
Of the LLM Synthetic Code designs, we use 90% as the training set and reserve the remaining 10% as the test set. We retain only the Pareto-optimal points in the resulting two-dimensional latency-resource design space.
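As a concrete illustration, Pareto filtering in a two-dimensional latency-resource plane reduces to a standard dominance sweep. The following is a minimal Python sketch under the assumption that each design is a `(latency, ARU)` pair with lower being better on both axes; it is not the project's actual tooling.

```python
# Minimal sketch of 2-D Pareto filtering over (latency, ARU) pairs,
# assuming lower is better on both axes. Not the project's actual tooling.
def pareto_frontier(designs):
    """Return the non-dominated (latency, aru) points."""
    # Sweep in order of increasing latency (ties broken by ARU).
    ordered = sorted(designs, key=lambda d: (d[0], d[1]))
    frontier, best_aru = [], float("inf")
    for latency, aru in ordered:
        # Keep a point only if it beats every faster design on resources.
        if aru < best_aru:
            frontier.append((latency, aru))
            best_aru = aru
    return frontier
```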
We then divide the designs on the Pareto curve into three categories: high, medium, and low resource usage. The top third of the points by ARU is classified as high, the middle third as medium, and the remaining third as low. Next, we construct a training dataset for each of the three categories, as sketched after this paragraph. In this way, the trained pragma inserter can generate Pareto-optimal pragma designs for varying resource budgets.
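A sketch of the tertile split and record construction follows. The helper names, the `pragma_code` mapping, and the high/medium instruction strings are illustrative assumptions, not the exact dataset schema.

```python
# Hedged sketch: split a Pareto frontier into ARU tertiles and emit
# instruction-tuning records. Field names and instructions are assumptions.
INSTRUCTIONS = {
    "high": "optimize for high resource usage.",
    "medium": "optimize for medium resource usage.",
    "low": "optimize for low resource usage.",
}

def build_records(frontier, source_code, pragma_code):
    """frontier: (latency, aru) Pareto points for one kernel;
    pragma_code: hypothetical map from a point to its pragma-annotated code."""
    ordered = sorted(frontier, key=lambda d: d[1], reverse=True)  # high ARU first
    n, records = len(ordered), []
    for i, point in enumerate(ordered):
        # Top third by ARU -> high, middle third -> medium, rest -> low.
        tier = "high" if 3 * i < n else "medium" if 3 * i < 2 * n else "low"
        records.append({
            "instruction": INSTRUCTIONS[tier],
            "input": source_code,
            "output": pragma_code[point],
        })
    return records
```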
