SLM series - Finastra: Counting clever to cost an AI model
This is a guest post for the Computer Weekly Developer Network written by Adam Lieberman in his role as chief AI officer at Finastra.
Finastra is a global financial software company that provides applications and marketplaces for banks, credit unions and savings institutions. Finastra’s software helps financial institutions with lending, payments, treasury, capital markets and universal banking.
Lieberman writes in full as follows…
Whether choosing an LLM or an SLM, the fundamentals of selecting models must take precedence. An LLM trained on poor-quality data, for example, will perform worse than an SLM trained on high-quality data, which is why it’s crucial to experiment with different offerings before committing to one.
Many of today’s models are openly available, allowing users to experiment and test use cases. If performance requirements are met by an SLM, there’s no need to fork out for an LLM. Starting small is always a good idea, as it leaves room to upgrade to an LLM if needed.
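The “start small” rule above can be sketched as a simple selection loop: evaluate each candidate on the same held-out task and keep the smallest model that clears the bar. The model names, parameter counts, accuracy figures and threshold below are all hypothetical placeholders, not real benchmark results.

```python
# Illustrative sketch: pick the smallest candidate model that meets a
# performance bar. Names, sizes and scores are made-up placeholders.

def pick_smallest_sufficient_model(candidates, threshold):
    """candidates: list of (name, parameter_count, eval_accuracy) tuples."""
    # Try models from smallest to largest; stop at the first that passes.
    for name, params, accuracy in sorted(candidates, key=lambda c: c[1]):
        if accuracy >= threshold:
            return name
    return None  # no candidate met the bar

candidates = [
    ("slm-3b", 3e9, 0.91),    # hypothetical small model
    ("slm-7b", 7e9, 0.94),
    ("llm-70b", 70e9, 0.96),  # hypothetical large model
]
print(pick_smallest_sufficient_model(candidates, threshold=0.90))  # slm-3b
```

If the smallest model already meets the requirement, the larger (and costlier) ones never need to be deployed.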
Sort out, scale up
A key consideration for selecting either option is whether the solution will need to scale up, or if the use case is specific enough to be contained. For example, a customer services chatbot will likely not need to produce wildly different responses from one month to the next. To ensure the right choice is made, a robust discovery phase in which the evolution of the use case is fully considered should be prioritised. This will surface the technical and financial constraints that ultimately determine whether an SLM or an LLM is the right choice.
SLMs are also considerably less expensive to train and maintain than LLMs, making them accessible to smaller enterprises and to specific departments within larger organisations.
Offering faster response times and lower latency, SLMs can be ideal for applications requiring real-time processing, such as chatbots and interactive systems. SLMs can be tailored for specific use cases, allowing organisations to achieve high performance with less training data. This adaptability makes them suitable for niche applications like sentiment analysis, language translation and customer service automation.
SLMs can be easily fine-tuned with domain-specific data, allowing them to adapt quickly to changing requirements or new information. This agility facilitates rapid development cycles and ensures that the models remain effective over time.
What is ‘adoptable’ AI?
In terms of deeper analysis, chatbots underpinned by SLMs, often paired with sentiment analysis, are quickly emerging as the most readily adoptable AI-powered applications. Chatbots can be used internally for knowledge discovery and Q&A tasks, or externally for customer services use cases. These are often domain-specific so will require some level of fine-tuning, but once they are up and running, they will quickly transform workflows and increase efficiency.
SLMs can also be used for language translation services, helping to bridge linguistic gaps during international communications and interactions; as well as for sentiment analysis, for example gauging public opinion and customer sentiment and feedback, crucial for adjusting marketing strategies and improving product offerings.
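To make the sentiment analysis task concrete, here is a deliberately tiny lexicon-based scorer. A real deployment would use a fine-tuned SLM rather than word lists; the lexicon and scoring rule below are purely illustrative.

```python
# Toy sentiment scorer to illustrate the task an SLM would perform.
# The word lists and scoring rule are illustrative, not a real model.

POSITIVE = {"great", "love", "excellent", "helpful", "fast"}
NEGATIVE = {"slow", "broken", "terrible", "confusing", "bug"}

def sentiment(text: str) -> str:
    words = text.lower().split()
    # Net count of positive vs negative words decides the label.
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("Love the new release, support was fast"))    # positive
print(sentiment("The update is slow and the app is broken"))  # negative
```

An SLM fine-tuned on labelled customer feedback performs the same text-in, label-out mapping, but learns the signal from data instead of a hand-built word list.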
A particularly useful architecture built from SLMs is the mixture-of-experts (MoE) model, such as Mistral’s Mixtral 8x7B.
As the name suggests, the model combines eight expert networks of roughly seven billion parameters each. In essence, this collection of experts works together and can often outperform larger models. Although it may seem that running eight SLMs would incur similar costs to running an LLM, Mixtral 8x7B routes each input through only two of the experts at a time.
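The routing idea can be shown in a minimal sketch: a gate scores every expert for a given input, but only the top two actually run, and their outputs are blended by the gate’s weights. The experts here are toy functions and the gate scores are made up; a real MoE learns both.

```python
# Minimal sketch of mixture-of-experts top-2 routing, in the spirit of
# Mixtral 8x7B. Experts are toy functions; gate scores are illustrative.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, experts, gate_logits, k=2):
    probs = softmax(gate_logits)
    # Keep only the k highest-scoring experts for this input.
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    # Weighted blend of the selected experts; the other six never run.
    return sum(probs[i] / norm * experts[i](x) for i in top)

experts = [lambda x, s=s: s * x for s in range(1, 9)]   # eight toy "experts"
gate_logits = [0.1, 2.0, 0.3, 0.2, 1.5, 0.0, 0.4, 0.1]  # per-input gate scores
print(moe_forward(10.0, experts, gate_logits))
```

Because only two of the eight experts execute per input, the compute cost per request is far closer to a single mid-sized model than to eight of them.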
A use case like clustering documents, such as grouping customer support tickets by topic and assigning a priority level to each, is well served by an SLM. However, for more intricate tasks, such as parsing HR documents for niche information or building a more advanced classification engine for documents and files across systems, an LLM is the more appropriate choice. This is because the context window (the amount of text the model can take into account alongside a user’s prompt) provided by SLMs is generally much smaller.
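The ticket-grouping task can be sketched with simple bag-of-words cosine similarity standing in for the embeddings an SLM would supply. The tickets, the greedy grouping rule and the similarity threshold below are all illustrative choices, not a production pipeline.

```python
# Illustrative sketch: group support tickets by topic using bag-of-words
# cosine similarity as a stand-in for SLM embeddings. All values are toy.
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def group_tickets(tickets, threshold=0.15):
    groups = []  # each group is a list of (ticket_text, word_bag) pairs
    for text in tickets:
        bag = Counter(text.lower().split())
        for group in groups:
            # Compare against the group's first ticket (crude centroid).
            if cosine(bag, group[0][1]) >= threshold:
                group.append((text, bag))
                break
        else:
            groups.append([(text, bag)])
    return [[t for t, _ in g] for g in groups]

tickets = [
    "cannot login to my account",
    "login page shows an error",
    "invoice total is wrong",
]
print(group_tickets(tickets))
```

Swapping the word bags for embeddings from a small model keeps the same structure while capturing meaning beyond exact word overlap.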
Fancy footwork on footprints
SLMs can be deployed on-premises or in private cloud environments, which enhances data security and privacy. This is particularly advantageous for industries like finance and healthcare that handle sensitive information. Their smaller footprint reduces the risk of data leaks compared to larger models that may operate in less secure environments.
While small language models offer distinct advantages such as cost-effectiveness and efficiency for specific tasks, their limitations – particularly regarding generalisation, complexity handling, data quality dependence, scalability, customisation challenges, evaluation difficulties and text fluency – should be carefully considered by organisations looking to leverage AI technology effectively.
With a smaller knowledge base, SLMs are far more likely than LLMs to produce inaccurate guesses at the answers needed. This immediately rules them out of more sensitive applications, such as medical diagnostics, engineering and financial services use cases – which are also still at fairly early stages of development.
Perhaps the most exciting field of AI today, with many emerging use cases, is that of multimodal models in which diverse types of data, such as text, images and video can be processed simultaneously, moving us closer to mimicking the capabilities of a human brain.
Currently, SLMs are not powerful enough for the more advanced multimodal use cases, such as video generation, as their “brains” are not big enough to handle such complex tasks. LLMs will therefore be at the forefront of AI-led innovation, but SLMs will likely deliver the most immediate business value.