
The Complexities and Challenges of Integrating LLMs into Applications

Planning to integrate an LLM service into your code? Here are some of the common challenges you should expect when doing so.


Large Language Models (LLMs) existed before OpenAI’s ChatGPT and the GPT API were released. But thanks to OpenAI’s efforts, GPT is now easily accessible to developers and non-developers alike. This launch has undoubtedly played a big role in the recent resurgence of AI.

It is remarkable how quickly OpenAI’s GPT API was embraced, within just six months of its launch. Virtually every SaaS service has incorporated it in some way to increase its users’ productivity.

Nevertheless, only those who have done the design and integration work of such APIs genuinely understand the complexities and new challenges that arise from it.

Over the past few months, I have implemented several features that utilize OpenAI’s GPT API. Throughout this process, I have faced several challenges that seem common to anyone using the GPT API or any other LLM API. By listing them here, I hope to help engineering teams properly prepare for and design their LLM-based features.

Let’s take a look at some of the typical obstacles.

Contextual Memory and Context Limitations

This is probably the most common challenge of all. The context for the LLM input is limited. Just recently, OpenAI released context support for 16K tokens, and in GPT-4 the context limit can reach 32K, which is a good couple of pages (for example, if you want the LLM to work on a large document spanning a few pages). But there are many cases where you need more than that, especially when working with numerous documents, each tens of pages long (imagine a legal-tech company that needs to process tens of legal documents to extract answers using an LLM).

There are different techniques to overcome this challenge, and more are emerging, but it may mean you have to implement one or more of these techniques yourself. Yet another load of work to implement, test, and maintain.
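
For instance, one common technique is to split a long document into chunks that fit within the context window, query each chunk separately, and then ask the model to combine the partial answers. Here is a minimal sketch of that idea; `call_llm` is a hypothetical stand-in for whatever LLM client you actually use, and the character-based splitting is deliberately naive:

```python
# Sketch: map-reduce style question answering over a document that exceeds
# the model's context window. `call_llm` is a hypothetical stand-in for your
# actual LLM client call (e.g. an OpenAI chat completion).

def call_llm(prompt: str) -> str:
    raise NotImplementedError("wrap your LLM provider's API here")

def split_into_chunks(text: str, max_chars: int = 12_000) -> list[str]:
    # Naive character-based splitting; production code would split on token
    # counts and on sentence/paragraph boundaries instead.
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def answer_over_long_document(document: str, question: str) -> str:
    # Map step: ask the question against each chunk separately.
    partial_answers = [
        call_llm(f"Context:\n{chunk}\n\nQuestion: {question}\nAnswer briefly.")
        for chunk in split_into_chunks(document)
    ]
    # Reduce step: combine the partial answers into a final one.
    combined = "\n".join(partial_answers)
    return call_llm(
        "Here are partial answers extracted from different parts of a document:\n"
        f"{combined}\n\nCombine them into one final answer to: {question}"
    )
```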

Data Enrichment

Your LLM-based features likely take some form of proprietary data as input. Whether you are passing user data as part of the context or using other collected data or documents that you store, you need a simple mechanism that abstracts the calls for fetching data from the various data sources you own.
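
A thin abstraction over your data sources keeps the enrichment logic out of the prompt-building code. A minimal sketch, where `CrmClient` and `DocumentStore` are hypothetical placeholders for whatever systems you own:

```python
# Sketch: abstracting data fetching behind one interface so prompt
# construction doesn't care where the data comes from. CrmClient and
# DocumentStore are hypothetical placeholders for your own systems.
from typing import Protocol


class DataSource(Protocol):
    def fetch(self, key: str) -> str: ...


class CrmClient:
    def fetch(self, key: str) -> str:
        # e.g. call your CRM's API and serialize the client record
        return f"<client record for {key}>"


class DocumentStore:
    def fetch(self, key: str) -> str:
        # e.g. load a stored document from blob storage or a database
        return f"<document {key}>"


SOURCES: dict[str, DataSource] = {"crm": CrmClient(), "docs": DocumentStore()}


def enrich(source: str, key: str) -> str:
    """Single entry point the prompt-building code calls for any data."""
    return SOURCES[source].fetch(key)
```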

Templating

The prompt you submit to the LLM will contain hard-coded text along with data from other sources. This means you will create a static template and dynamically fill in the blanks at run-time with the data that should be part of the prompt. In other words, you will create templates for your prompts, and likely have more than one.

This means you should use some sort of templating framework, because you probably don’t want your code to look like a bunch of string concatenations.

This is not a big challenge, but another task that should be considered.
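
For example, Python’s built-in string.Template (or a fuller engine such as Jinja2) keeps the static prompt text separate from the run-time values. A minimal sketch, with made-up field names:

```python
# Sketch: a prompt template with named placeholders, filled in at run-time.
from string import Template

SUMMARY_PROMPT = Template(
    "You are a legal assistant.\n"
    "Client details:\n$client_details\n\n"
    "Case description:\n$case_description\n\n"
    "Question: $question\n"
    "Answer using only the information above."
)

prompt = SUMMARY_PROMPT.substitute(
    client_details="Acme Corp, incorporated 2001, ...",
    case_description="Contract dispute over delivery terms ...",
    question="What precedents are most relevant here?",
)
```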

Testing and Fine-tuning

Getting the LLM to reach a satisfactory level of accuracy requires a lot of testing (sometimes it is just prompt engineering with a lot of trial and error) and fine-tuning based on user feedback.

There are, of course, also tests that run as part of the CI to assert that all the integrations work properly, but that is not the real challenge.

When I say testing, I am talking about running the prompt repeatedly in a sandbox to fine-tune the results for accuracy.

For testing, you would want a way for the testing engineer to change the templates, enrich them with the required data, and execute the prompt against the LLM to check that we are getting what we wanted. How do you set up such a testing framework?

In addition, we need to continuously fine-tune the LLM model by getting feedback from our users on the LLM outputs. How do we set up such a process?
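
One possible shape for such a sandbox is a small harness that runs each template against a set of recorded inputs and lets a human (or a crude scoring rule) grade the outputs. A minimal sketch, again assuming a hypothetical `call_llm` wrapper and invented test data:

```python
# Sketch: a tiny evaluation harness for prompt templates. Each test case
# carries the data used to fill the template plus a crude expectation;
# anything not passing goes to a human for review. `call_llm` is a
# hypothetical wrapper around your LLM provider's API.
from string import Template

def call_llm(prompt: str) -> str:
    raise NotImplementedError("wrap your LLM provider's API here")

TEST_CASES = [
    {
        "fields": {"question": "Is the contract enforceable?", "case_description": "..."},
        "must_contain": "enforceable",  # crude automatic check; humans grade the rest
    },
]

def run_suite(template: Template, cases: list[dict]) -> None:
    for i, case in enumerate(cases):
        output = call_llm(template.safe_substitute(**case["fields"]))
        passed = case["must_contain"].lower() in output.lower()
        print(f"case {i}: {'PASS' if passed else 'REVIEW'}\n{output}\n")
```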

Caching

LLM models, such as OpenAI’s GPT, have a parameter to control the randomness of answers, allowing the AI to be more creative. Yet if you are handling requests at a large scale, you will incur high charges on the API calls, you may hit rate limits, and your app performance might degrade. If some inputs to the LLM repeat themselves across different calls, you may consider caching the answer. For example, say you handle hundreds of thousands of calls to your LLM-based feature. If all of those calls trigger an API call to the LLM provider, costs will be very high. But if inputs repeat themselves (which can happen when you use templates and feed them with specific user fields), there is a high likelihood that you can save some of the pre-processed LLM output and serve it from the cache.

The challenge here is building a caching mechanism for that. It is not hard to implement; it just adds another layer and moving part that needs to be maintained and done properly.
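
Here is a minimal sketch of such a cache, keyed on a hash of the fully rendered prompt; it assumes a deterministic temperature setting (cached answers only make sense when a repeated input should produce the same output) and uses an in-memory dict where production code would likely use Redis or a database with a TTL:

```python
# Sketch: caching LLM responses keyed by a hash of the rendered prompt.
# `call_llm` is a hypothetical wrapper around your LLM provider's API.
import hashlib

_cache: dict[str, str] = {}

def call_llm(prompt: str) -> str:
    raise NotImplementedError("wrap your LLM provider's API here")

def cached_call(prompt: str) -> str:
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    if key not in _cache:
        _cache[key] = call_llm(prompt)  # only pay for the API call on a miss
    return _cache[key]
```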

Security and Compliance

Security and privacy are perhaps the most difficult parts of this process: how do we make sure that the process we create doesn’t cause data leakage, and how do we make sure that no PII is revealed?

In addition, you will want to audit all of your actions so that they can be examined to make sure no data leak or privacy policy infringement has happened.

This is a common challenge for any software company that relies on third-party services, and it must be addressed here as well.

Observability

As with any external API you use, you must monitor its performance. Are there any errors? How long does processing take? Are we exceeding, or about to exceed, the API’s rate limits or thresholds?

In addition, you will want to log all calls, not only for security audit purposes but also to help you fine-tune your LLM workflows or prompts by grading the outputs.
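
In practice this often means wrapping every LLM call with timing, error handling, and logging, so that latency, failures, and the prompt/output pairs can be examined later. A minimal sketch using Python’s standard logging module, with the usual hypothetical `call_llm` stand-in:

```python
# Sketch: wrapping LLM calls with basic observability (latency, errors,
# prompt/output sizes) using the standard logging module. `call_llm` is a
# hypothetical wrapper around your LLM provider's API.
import logging
import time

logger = logging.getLogger("llm")

def call_llm(prompt: str) -> str:
    raise NotImplementedError("wrap your LLM provider's API here")

def observed_call(prompt: str) -> str:
    start = time.monotonic()
    try:
        output = call_llm(prompt)
    except Exception:
        logger.exception("LLM call failed (prompt=%d chars)", len(prompt))
        raise
    elapsed = time.monotonic() - start
    # Log enough to audit and to grade outputs later; redact PII as needed.
    logger.info("LLM call ok in %.2fs (prompt=%d chars, output=%d chars)",
                elapsed, len(prompt), len(output))
    return output
```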

Workflow Management

Let’s say we develop legal-tech software that lawyers use to increase productivity. In our example, we have an LLM-based feature that takes a client’s details from a CRM system and the general description of the case being worked on, and provides an answer to the lawyer’s query based on legal precedents.

Let’s see what needs to be done to accomplish that:

  1. Look up all the client’s details based on a given client ID.
  2. Look up all the details of the current case being worked on.
  3. Extract the relevant information from the current case using the LLM, based on the lawyer’s query.
  4. Combine all of the above information into a predefined query template.
  5. Enrich the context with the relevant legal cases (recall the Contextual Memory challenge).
  6. Have the LLM find the legal precedents that best match the current case, the client, and the lawyer’s query.

Now, imagine that you have two or more features with such workflows, and then try to imagine what your code looks like after you implement them. I bet that just thinking about the work to be done here makes you shift uncomfortably in your chair.
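
To make it concrete, here is roughly how the six steps above might look as a single workflow function. Every helper in this sketch is a hypothetical stub standing in for your own CRM access, templating, retrieval, and LLM-calling code:

```python
# Sketch: the legal-tech workflow from the numbered steps above, expressed as
# one function. All helpers below are hypothetical placeholder stubs.

def get_client(client_id: str) -> str: ...                     # 1. client details from the CRM
def get_case(case_id: str) -> str: ...                         # 2. current case details
def extract_relevant_info(case: str, query: str) -> str: ...   # 3. LLM-based extraction
def render_query(client: str, info: str, query: str) -> str: ...  # 4. fill the query template
def enrich_with_precedents(prompt: str, query: str) -> str: ...   # 5. add precedent context
def call_llm(prompt: str) -> str: ...                          # 6. the actual LLM call


def answer_lawyer_query(client_id: str, case_id: str, query: str) -> str:
    client = get_client(client_id)
    info = extract_relevant_info(get_case(case_id), query)
    prompt = enrich_with_precedents(render_query(client, info, query), query)
    return call_llm(prompt)
```

Even in this stripped-down form, it is easy to see how quickly orchestration logic accumulates once a second or third feature needs its own chain of steps.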

For your code to be maintainable and readable, you will need to implement various layers of abstraction, and perhaps consider adopting or implementing some kind of workflow management framework if you foresee more workflows in the future.

And finally, this example brings us to the next challenge:

Strong Code Coupling

Now that you are aware of all the above challenges and the complexities that arise from them, you may start to see that some of the tasks that need to be done should not be the developer’s responsibility.

Specifically, all the tasks related to building workflows, testing, fine-tuning, and monitoring the results and external API usage could be done by someone more dedicated to those tasks, whose expertise is not building software. Let’s call this persona the LLM engineer.

There is no reason why the LLM workflows, testing, fine-tuning, and so on should fall under the software developer’s responsibility; software developers are experts at building software. At the same time, LLM engineers should be experts at building and fine-tuning LLM workflows, not at building software.

But with the current frameworks, LLM workflow management is coupled into the codebase. Whoever builds these workflows must have the expertise of both a software developer and an LLM engineer.

There are ways to do the decoupling, such as creating a dedicated micro-service that handles all the workflows, but that is yet another challenge that needs to be handled.
