QoR Dataset for LLM Fine-tuning

We cleaned our extensive dataset of HLS designs and compiled from it a diverse set of entries suitable for fine-tuning LLMs on QoR prediction tasks. We then randomly split the data, allocating 90% to the training set and 10% to the testing set.
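A minimal sketch of this split, assuming the cleaned entries are stored as a single JSON list (the file names and the fixed seed are illustrative, not part of the released dataset):

```python
import json
import random

# Load the cleaned HLS design entries (file name is illustrative).
with open("hls_qor_dataset.json") as f:
    entries = json.load(f)

# Shuffle and split 90% / 10% into training and testing sets.
random.seed(42)
random.shuffle(entries)
cut = int(0.9 * len(entries))
train_set, test_set = entries[:cut], entries[cut:]

with open("train.json", "w") as f:
    json.dump(train_set, f, indent=2)
with open("test.json", "w") as f:
    json.dump(test_set, f, indent=2)
```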

Next, we constructed input-output pairs in the following format:

 
```
{
  "input": "code with HLS pragmas",
  "output": {"lut": lut, "dsp": dsp, ...}
}
```

This setup enables the LLMs to directly predict QoR metrics.
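A sketch of how such pairs could be assembled into a JSONL file for fine-tuning; the record field names ("source", "report") and the extra metrics ("ff", "bram") are assumptions for illustration, not the dataset's actual schema:

```python
import json

def make_pair(record):
    """Convert one raw HLS record into a fine-tuning input-output pair.

    Assumes the record stores the pragma-annotated source under "source"
    and the post-synthesis QoR numbers under "report" (hypothetical keys).
    """
    report = record["report"]
    return {
        "input": record["source"],  # code with HLS pragmas
        "output": {
            "lut": report["lut"],
            "dsp": report["dsp"],
            "ff": report["ff"],
            "bram": report["bram"],
        },
    }

# Emit one pair per line (JSONL), a format most fine-tuning tools accept.
with open("train.json") as f:
    records = json.load(f)
with open("train_pairs.jsonl", "w") as out:
    for record in records:
        out.write(json.dumps(make_pair(record)) + "\n")
```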

The base model exhibits high QoR prediction errors on this dataset. After fine-tuning on the relevant HLS data, the LLMs generalize much better, demonstrating strong performance on our heterogeneous dataset.
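One common way to quantify these prediction errors is a per-metric mean absolute percentage error; the sketch below assumes the model's text responses have already been parsed back into dictionaries of QoR values:

```python
def mape(predictions, references, metric):
    """Mean absolute percentage error for one QoR metric (e.g. "lut").

    Both arguments are lists of dicts such as {"lut": 1200, "dsp": 8, ...};
    reference values of zero are skipped to avoid division by zero.
    """
    errors = [
        abs(pred[metric] - ref[metric]) / ref[metric]
        for pred, ref in zip(predictions, references)
        if ref[metric] != 0
    ]
    return 100.0 * sum(errors) / len(errors)

# Example usage: compare the base model against the fine-tuned model.
# base_preds, tuned_preds, refs = ...  # parsed outputs and ground truth
# for m in ("lut", "dsp"):
#     print(m, mape(base_preds, refs, m), mape(tuned_preds, refs, m))
```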
