LLM for Synthetic Time Series Data Generation
Problem Statement
Dataset Description
Evaluation Criteria
Design Overview
Detailed Design
Material Selector Module
Data Generator Module
How RAG Helped our Results
Data Validator
Results and Presentation
Key Takeaways
Some learnings for next time
References

Towards Data Science

We recently participated in and won the grand prize of $10,000 at the Brembo hackathon, where the task was to use Generative AI to create new compounds and generate their predicted performance data.

In this blog post, I'll try to explain our approach and solution in detail.

Problem Statement

Using friction test data provided by Brembo, use Generative AI to create new compounds, forecast testing results and create the framework for predicting the effectiveness and characteristics of a new Brembo brake product. The data provided will include a list of compounds previously used and tested by Brembo, as well as their outcomes. Solutions must be based on Generative AI, applied to produce a model able to propose new recipes that increase the number of candidate compounds, ensuring feasibility and good performance.

As your submission, provide a csv file containing a list of 10–30 new compounds you generated, their compositions and their synthetic performance data. [1]

Dataset Description

We were given a list of 337 friction materials and their compositions, along with their performance data.

Each friction material was made up of 10–15 raw materials out of a list of 60 possible raw materials. The 60 raw materials were classified into 6 categories (labelled A–F), and we had to ensure the generated friction materials kept their compositions within the given ranges.

Constraints on material compositions

In other words, we had to ensure that any generated material had at least 1% and at most 30% of its composition come from compounds of category B, and so on.
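To make this concrete, here is a minimal sketch of the kind of per-category check our data validator performed. Apart from category B (1–30%), the bounds in CATEGORY_BOUNDS below are placeholders, not Brembo's actual limits.

# Hypothetical sketch of the per-category composition check.
# Only the category B bounds (1%-30%) come from the text above; the others are placeholders.
CATEGORY_BOUNDS = {"A": (0, 100), "B": (1, 30), "C": (0, 100),
                   "D": (0, 100), "E": (0, 100), "F": (0, 100)}

def satisfies_constraints(composition, raw_material_category):
    # composition: dict mapping raw material name -> percentage of the recipe
    # raw_material_category: dict mapping raw material name -> category label "A".."F"
    totals = {cat: 0.0 for cat in CATEGORY_BOUNDS}
    for raw_material, pct in composition.items():
        totals[raw_material_category[raw_material]] += pct
    return all(lo <= totals[cat] <= hi for cat, (lo, hi) in CATEGORY_BOUNDS.items())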

The performance data for each braking test was essentially a time series of 31 points, where at each point the values of parameters like pressure, temperature and mu were provided. Further, there were a total of 124 braking tests applied to each compound, so in terms of performance data we had 124 × 31 = 3,844 data points to generate for each compound.
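To make the shape of the data concrete, here is a tiny illustrative sketch; the column names are placeholders rather than the actual schema of the provided files.

import pandas as pd

# One compound: 124 braking tests x 31 time steps = 3,844 rows.
index = pd.MultiIndex.from_product([range(124), range(31)],
                                   names=["braking_test_id", "time_step"])
performance = pd.DataFrame(index=index,
                           columns=["pressure", "temperature", "speed", "mu"])
print(performance.shape)  # (3844, 4)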

Here is some sample data containing the compositions and performance data of one such compound. The remaining relevant information about the dataset can be found here.

Evaluation Criteria

The evaluation gave equal weight to the technical score and the presentation score.

The technical score was calculated based on the following equally weighted parameters:

  • Follows the given constraints: Do the generated compounds follow the given constraints (described above)?
  • Technical Relevance: Does the output synthetic performance data follow the patterns and capture the relationships among the different variables seen in the provided data?
  • Target Performance: The most important variable for a friction material is its mu (coefficient of friction), which is expected to have a value of 0.6 with an acceptable error of 0.1. Does the output mu have the value we expect?
  • Variability: How different are the compositions of the newly generated materials from the existing materials?

Design Overview

Essentially, our solution had 3 basic components:

  • Material Selection Module: Responsible for generating new recipes. It outputs a set of new friction materials and their material compositions.
  • Data Generator Module: Given a synthetic material and the past historical performance data of various compounds, generate synthetic performance data for this material.
  • Data Validator: Determine how good or bad the output of the data generator is. This module uses trends seen in the provided historical data (for example: pressure and mu are inversely related over time, deceleration seems to follow a linear pattern while the temperature increase curve seems more exponential in nature) to rate how good or bad the synthetic performance data is. This can be used to provide human feedback to the model and improve the system's performance.

High-level design of the solution

We used the following stack and techniques in our solution:

  • GPT-3.5 Turbo: We used GPT-3.5 Turbo as the base LLM for both the Material Selector and the Data Generator modules.
  • Prompt Engineering: Using the right set of system and instruction prompts helped us improve the model's performance.
  • Fine-Tuning: Selecting the right set of examples to teach the model the basic structure and tone of how to respond is very important, and this stage helped us teach that to the models.
  • RAG (Retrieval-Augmented Generation): This was the secret sauce that helped the model output the right synthetic performance data. More on that below.

Detailed Design

Material Selector Module

The role of this module was to generate new candidate friction materials and their compositions. As seen from the sample data, each friction material is essentially a vector of 60 dimensions, with the number at the i-th index denoting what percentage of its composition comes from the i-th raw material.

Some initial PCA analysis revealed a total of 3–4 clusters.

PCA analysis on the material compositions of the given friction materials
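For reference, this clustering view comes from a standard 2-component PCA projection of the 60-dimensional composition vectors. A minimal sketch, assuming the compositions have already been loaded into a (337, 60) array named compositions, looks like this:

import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# compositions: a (337, 60) array with one composition vector per friction material,
# assumed to have been loaded elsewhere.
projected = PCA(n_components=2).fit_transform(compositions)

plt.scatter(projected[:, 0], projected[:, 1])
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.title("PCA of friction material compositions")
plt.show()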

Theoretically, we could just generate random vectors of size 60, check which ones satisfy the given constraints and use those. Although this would fetch us a good score on variability (the friction materials would be randomly generated and thus cover many points in the 60-dimensional space), this approach has some flaws:

  • It would make it harder for us to predict the performance of a compound completely different from the materials provided in the historical data. This is because a material's composition plays a big role in the performance observed, and predicting the performance of a composition never seen before would be hard.
  • It would make debugging harder. If at any point in our pipeline we ended up with results that didn't follow the trends seen in the historical data, it would become very difficult to pinpoint the issue.

Due to these potential issues, we decided to leverage the GPT-3.5 Turbo model to generate candidate compounds for us.

Here is what we did:

  • Create a relevant system prompt for the module.
  • Fine-tune the GPT-3.5 Turbo model on the compositions of all 337 friction materials we were supplied with (a sketch of how such a training file could be assembled follows this list).
  • Using the data validator module, discard the generated compounds that don't follow the given constraints and retain those that do.
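As an illustration of the fine-tuning step, here is a hedged sketch of how training examples could be assembled into the chat-format JSONL file that OpenAI's fine-tuning API expects. The materials structure, the user prompt text and SYSTEM_PROMPT are placeholders, not our exact training data.

import json

# materials: list of dicts like {"name": "M-001", "composition": {"A0": 12.5, "B3": 4.0, ...}}
# SYSTEM_PROMPT: the material selector system prompt (not reproduced in this post)
def build_finetune_file(materials, path="material_selector_train.jsonl"):
    with open(path, "w") as f:
        for m in materials:
            example = {"messages": [
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": "Propose a new feasible friction material composition."},
                {"role": "assistant", "content": json.dumps(m["composition"])},
            ]}
            f.write(json.dumps(example) + "\n")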

Once done, we generated several compounds and repeated the PCA analysis.

PCA analysis on the provided and generated materials

Finally, for variability, we hand-picked a set from the generated compounds which we felt would maximize the following (a sketch of one way to approximate this selection follows the list):

  • Variability w.r.t. provided materials: How varied are the generated compounds from the provided compounds? Essentially, we don't want our generated materials to be very similar to the already existing compounds.
  • Variability w.r.t. generated materials: Since we would be submitting 10–30 newly generated compounds, we had to ensure the generated compounds didn't all end up belonging to the same cluster.
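We did this pruning largely by hand (and by eyeballing the PCA plot), but one way to approximate it programmatically is a greedy farthest-point selection: repeatedly keep the candidate whose minimum distance to everything already kept is largest. The sketch below is that approximation, not our exact process.

import numpy as np

def select_diverse(candidates, provided, k=20):
    # candidates, provided: arrays of shape (n, 60) holding composition vectors
    chosen = []
    reference = [np.asarray(p) for p in provided]
    candidates = [np.asarray(c) for c in candidates]
    for _ in range(min(k, len(candidates))):
        # score each remaining candidate by its distance to the nearest kept material
        scores = [min(np.linalg.norm(c - r) for r in reference) for c in candidates]
        best = int(np.argmax(scores))
        chosen.append(candidates.pop(best))
        reference.append(chosen[-1])
    return np.array(chosen)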

Thus, after pruning, we were left with the following list of compounds, which we used for our final submission.

Final list of generated compounds

Data Generator Module

The data generator module is responsible for outputting the synthetic performance data for a given material and braking test. Essentially, given a friction material's composition, it should output a time series of 31 points that includes parameters like temperature, pressure and mu for the input braking test.

Here is how we achieved this:

  • Create a suitable system prompt for the module. After a lot of trial and error in OpenAI's playground, the one we used was:
You are a highly skilled statistician from Harvard University who
works at Brembo, where you specialize in performance braking systems
and components as well as conducting research on braking systems.
Given a friction material's composition, you craft compelling
synthetic performance data for a user-given braking test type.
The braking id will be delimited by triple quotes. You understand
the importance of data analysis and seamlessly incorporate it for
generating synthetic performance data based on the historical performance
data provided. You have a knack for paying attention to detail and
curating synthetic data that is consistent with the trends seen in the
time series data you are supplied with. You are well versed in
data and business analysis and use this knowledge for crafting the
synthetic data.
  • Next, we fine-tuned the GPT-3.5 Turbo model to create an expert on time series prediction given a material's composition and braking test id. Since we had 41,788 (material, braking_id) tuples, fine-tuning on all the examples would not only be time consuming but also costly. However, from some papers and articles we had read [2][3], we understood that "fine-tuning is for form, and RAG is for knowledge". We thus decided to include only 5% of the samples for fine-tuning the model, so that the model could learn the output structure and tone we desired.
  • Finally, when querying the model to generate the time series data, we identified and retrieved the 5 closest neighbours based on a material's composition and fed their performance data to the model as additional context. This technique is known as RAG (Retrieval-Augmented Generation) and was one of the reasons for the good results we were able to produce (a sketch of this retrieval step follows the list).
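Put together, the retrieval step looked roughly like the sketch below. It assumes the distance() function defined in the next section, a performance_data lookup keyed by (material name, braking id), the system prompt shown above as SYSTEM_PROMPT, and illustrative prompt wording; the fine-tuned model id is a placeholder.

from openai import OpenAI

client = OpenAI()

def generate_performance(new_material, known_materials, performance_data,
                         braking_id, alpha=(1.0, 1.0)):
    # Retrieve the 5 known materials closest to the new composition;
    # distance() is the function defined in the next section.
    neighbours = sorted(known_materials,
                        key=lambda m: distance(new_material, m, alpha))[:5]
    context = "\n\n".join(performance_data[(m["name"], braking_id)]
                          for m in neighbours)
    user_prompt = (
        f'Braking test id: """{braking_id}"""\n'
        f"New material composition: {new_material['composition']}\n"
        f"Performance data of the 5 most similar known materials:\n{context}\n"
        "Generate the 31-point synthetic performance time series for the new material."
    )
    response = client.chat.completions.create(
        model="ft:gpt-3.5-turbo:<our-fine-tuned-model>",  # placeholder model id
        messages=[{"role": "system", "content": SYSTEM_PROMPT},
                  {"role": "user", "content": user_prompt}],
    )
    return response.choices[0].message.content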

Fine-tuning helped us with the following:

  • Output data in the right structure: As written in various tech blogs [4], fine-tuning was efficient at teaching the model how to output the data. Our fine-tuned model was able to output the csv file and 31 time series data points which included the values of various parameters like Pressure, Speed, Temperature and mu.
  • Understanding the basic trends in the data: The fine-tuned model was able to understand the general trends in the input performance data and output data that retained those trends. For example, the value of temperature should increase along an exponential curve while speed should decrease along a linear curve, all of which the fine-tuned model was able to do.

How RAG Helped our Results

However, the output of the fine-tuned model was a bit off. For example, in one case the value of mu was expected to be around 0.6 but the output data pegged it at around 0.5. We thus decided to augment the context by identifying the 5 closest neighbours and adding their performance data to the user prompt.

We defined the distance between two materials M1 and M2 as follows:

def distance(m1, m2, alpha):
    # euclidean_dist, sixty_dim_vector and six_dim_vector are helpers defined elsewhere.
    # Distance in the raw 60-dimensional composition space.
    sixty_dim_distance = euclidean_dist(sixty_dim_vector(m1),
                                        sixty_dim_vector(m2))
    # Distance after summing compositions per raw-material category (6 dimensions).
    six_dim_distance = euclidean_dist(six_dim_vector(m1), six_dim_vector(m2))
    # alpha[0] and alpha[1] weight the two distances against each other.
    return alpha[0] * sixty_dim_distance + alpha[1] * six_dim_distance
  • Compute the euclidean distance between M1 and M2 in the 60-dimensional input vector space.
  • Then sum the percentages of compounds belonging to the same class to reduce the vector dimensionality to 6, and compute the euclidean distance in that space too (sketched after the next paragraph).
  • Finally, vary the hyperparameters alpha[0] and alpha[1].

The reason for taking this approach is that we want the distance between materials that overall use the same classes of raw materials to be smaller than the distance between materials that use completely different compositions. Essentially, given 3 materials M1, M2 and M3, where M1 uses raw material A0, M2 uses A1 and M3 uses B0, we want our distance function to mark M1 and M2 as closer to each other than M1 and M3.
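For completeness, here is a hedged sketch of the 60-to-6 reduction described above; RAW_MATERIAL_CATEGORY is a placeholder lookup from raw-material index to category label, not something from the provided dataset.

import numpy as np

CATEGORIES = ["A", "B", "C", "D", "E", "F"]
# RAW_MATERIAL_CATEGORY[i] gives the category label of the i-th raw material;
# it is a placeholder lookup assumed to be defined elsewhere.

def six_dim_vector(material):
    # Sum the percentages of all raw materials belonging to the same category.
    sixty = sixty_dim_vector(material)  # shape (60,)
    reduced = np.zeros(len(CATEGORIES))
    for i, pct in enumerate(sixty):
        reduced[CATEGORIES.index(RAW_MATERIAL_CATEGORY[i])] += pct
    return reduced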

Using this approach, we were able to radically improve our performance, as seen in the figure below.

Data Validator

The validator module helped us understand whether the output data followed the trends we expected to see. For example, we expect pressure and mu to be inversely correlated, mu to be around 0.6, temperature to increase exponentially with time and speed to decrease linearly. This module helped us identify how close our synthetic time series data was to the historical data, which in turn helped us tune our prompts and hyperparameters.

This module also helped us analyze which sets of prompts improved the model's output and which didn't.
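A minimal sketch of the kind of checks this module ran might look like the following; the thresholds and the exact set of checks shown here are illustrative rather than the ones we used.

import numpy as np

def validate_series(series, mu_target=0.6, mu_tol=0.1):
    # series: dict of 31-point numpy arrays, e.g. {"mu": ..., "pressure": ..., "temperature": ...}
    checks = {}
    # mu should stay close to the target value of 0.6
    checks["mu_in_range"] = abs(float(series["mu"].mean()) - mu_target) <= mu_tol
    # pressure and mu are expected to be inversely correlated over the test
    checks["pressure_mu_inverse"] = float(np.corrcoef(series["pressure"], series["mu"])[0, 1]) < 0
    # temperature should be non-decreasing over time
    checks["temperature_increasing"] = bool(np.all(np.diff(series["temperature"]) >= 0))
    return checks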

Results and Presentation

The presentation carried 50% of the score, and it was one aspect we absolutely nailed. A few things we did:

  • Ensured the pitch was done in under 4 minutes: We practised sufficiently before entering the presentation room to ensure we didn't face any surprises while presenting.
  • Had some audience interaction: We included a question asking the audience which time series they thought was synthetically generated and which was given, which helped keep the audience engaged.

The code and presentation for our work can be found here.

Key Takeaways

  • Iterate fast on design: I went in a bit early, before my teammates, to start whiteboarding my thoughts on what we should do. Once my teammates arrived, we discussed what the design should be and came up with a solution we all agreed with. This was a key aspect of our win: in a hackathon there is always a time crunch, and finalizing a design you can start implementing as soon as possible is incredibly important.
  • Don't worry about the competition: Once our design was done, I could sense we were onto something. We had a number of people from Brembo come over to take a peek at our design. Even other participants were left awestruck and kept staring at our design, which further signaled that we were on the right track. When my teammates suggested we should probably check what others were doing, I rejected the idea and instead asked everyone to just bury our heads in our design and implement it.
  • Don't worry about conflict: We ran into conflicts multiple times, especially over the design. The key here is to understand that nothing should be taken personally; instead, you should build consensus, iterate on the trade-offs and reach a solution that works for everyone. In my opinion, great products are built when you can allow, and even encourage, healthy conflict within the team.
  • Fine-tuning is for form, RAG is for facts: We knew fine-tuning is mainly useful for teaching the model a basic structure and tone, and that the real gains would come from RAG. We thus used only 5% of our samples for fine-tuning the GPT-3.5 Turbo LLM that generated the time series data.
  • Presentation is KEY (1): It is important to identify who your audience is and how they will digest your content. In our case, we identified that most of the jury was made up of C-suite executives rather than techies, so I decided to only mention the tech stack we used [GPT-3.5 Turbo, fine-tuning, prompt tuning, RAG, KNN] without going into details.
  • Presentation is KEY (2): Be someone who can get the point across with effective communication skills and present to the audience with passion. If you can't do that, get someone on your team who can. First impressions matter, and oration skills are way too underrated, especially in our tech world.
  • Be BOLD and different: We went a step further and decided to include 5 points from their data and one point from our generated data, and asked the audience to guess which one was generated. When they couldn't pick out the one we had generated, it really drove home how good a pipeline and solution we had built. Plus, we got brownie points for audience interaction, something I doubt many other teams did.
Some learnings for next time

  • Fine-tuning is expensive: We ran out of OpenAI credits after fine-tuning and querying the model three times. In the future, we would like to use techniques like LoRA [5] and QLoRA [6] on open-source models instead.
  • Using Advanced RAG: In the future, I would like to use advanced RAG techniques [7] to improve the context being provided.
  • Using Smart KNN: Next time around, I would like to experiment a bit more with the hyperparameters and the distance function being used.
  • Longer context window: We had to round off some of the numbers in the performance data to ensure we weren't exceeding the 4,096-token limit. Using LLMs with longer context windows, like Claude [8], might improve performance.
  • Don't be polite to LLMs: One interesting thing we noticed while prompt engineering was that when we said things like "the value of mu not being around 0.6 is intolerable" instead of "please ensure mu is around 0.6", the former ended up giving better results.

Note: Unless otherwise noted, all images are by the author.

Team Members:

  1. Mantek Singh
  2. Prateek Karnal
  3. Gagan Ganapathy
  4. Vinit Shah

References

[1] https://brembo-hackathon-platform.bemyapp.com/#/event

[2] https://www.anyscale.com/blog/fine-tuning-is-for-form-not-facts

[3] https://vectara.com/introducing-boomerang-vectaras-new-and-improved-retrieval-model/

[4] https://platform.openai.com/docs/guides/fine-tuning/fine-tuning-examples

[5] https://arxiv.org/abs/2106.09685

[6] https://arxiv.org/abs/2305.14314

[7] LlamaIndex Doc

[8] Claude
