SLM series - DataStax: Inside the ‘moving parts’ of AI
This is a guest post for the Computer Weekly Developer Network written by Dom Couldwell in his capacity as head of field engineering EMEA, DataStax.
Couldwell writes in full as follows…
Building a generative AI app requires a lot of new moving parts, from the Large Language Models (LLMs) that create content responses through to the vector search, data management and database installs that add relevant data for models to use.
It was taken for granted that LLMs were necessary at the heart of these systems, and that the bigger the model, the better the responses you could generate. That approach demanded more money and more training data.
The launch of DeepSeek blew that argument out of the water.
Now, a different language model could deliver the same level of results at far lower cost, both computationally and financially. While DeepSeek has security problems and trust hurdles to overcome, it encouraged a lot of developers to reconsider their approach to AI models.
Small Language Models (SLMs) are trained using smaller and more specialised data sets and they are designed to cover a specific task. They cost less to create in the first place and require fewer resources to operate. This means that companies can consider building their own SLMs, whereas an LLM is out of reach for all companies except those with the very deepest pockets.
The SLM option
The advent of SLMs means that you now have more options to consider in how you design your application. The first services based on LLMs leveraged their initial training data but found it difficult to keep the model up to date with the latest information.
In response, developers could improve the relevancy of responses by using vector data stores and retrieval augmented generation, or RAG, to provide more semantically similar material that the LLM can use in putting together a response. This got around the cost of retraining an LLM.
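To make that retrieval step concrete, here is a minimal sketch. The embed() function and the in-memory document list are stand-ins for whatever embedding model and vector store you actually use; only the shape of the flow (embed, find similar material, build the prompt) carries over to a real system.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in embedding: a deterministic pseudo-random vector per text.

    A real system would call an embedding model here; this exists only so
    the sketch runs end to end."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(16)

# Toy in-memory 'vector store': the documents you want the model grounded in.
documents = [
    "Refunds are accepted within 30 days of purchase.",
    "Standard shipping takes three to five working days.",
    "All laptops carry a two-year warranty.",
]
doc_vectors = [embed(d) for d in documents]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most semantically similar to the query."""
    q = embed(query)
    scores = [float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
              for v in doc_vectors]
    ranked = sorted(zip(scores, documents), reverse=True)
    return [doc for _, doc in ranked[:k]]

def build_prompt(query: str) -> str:
    """Assemble a prompt that grounds the model in the retrieved context."""
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("How long do refunds take?"))
```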
For SLMs, retraining is much cheaper, so it becomes a realistic option for keeping the model up to date. At the same time, you can use SLMs with RAG as a complementary approach. Depending on the data you need, you can even set up pipelines to stream data in, convert it to vectors and then feed it into the SLM for more personalised and timely results.
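A pipeline of that shape might look something like the sketch below. The event source, the in-memory VectorStore and the slm_generate() call are all hypothetical placeholders for whichever streaming platform, vector database and model serving layer you actually run; the point is simply that fresh data keeps flowing into the store that grounds the SLM.

```python
from dataclasses import dataclass, field

@dataclass
class VectorStore:
    """Minimal in-memory stand-in for a real vector database."""
    rows: list = field(default_factory=list)  # (text, vector) pairs

    def upsert(self, text: str, vector: list[float]) -> None:
        self.rows.append((text, vector))

def embed(text: str) -> list[float]:
    # Placeholder: a real pipeline would call an embedding model here.
    return [float(ord(c)) for c in text[:8]]

def slm_generate(prompt: str) -> str:
    # Placeholder for a call to your own small language model.
    return f"[SLM answer grounded in]: {prompt[:60]}..."

def ingest(events: list[str], store: VectorStore) -> None:
    """Stream new records in, convert them to vectors, keep the store fresh."""
    for event in events:
        store.upsert(event, embed(event))

store = VectorStore()
ingest(["Order 1234 shipped today", "Customer A upgraded to premium"], store)

# At query time, recent context is pulled from the store and fed to the SLM.
context = "; ".join(text for text, _ in store.rows)
print(slm_generate(f"Context: {context}\nQuestion: what changed for customer A?"))
```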
For organisations that need to retain full control over their environments, setting up your own SLM is now a valid option. LLMs created by other organisations will always be beholden to their licensing – even Meta’s Llama is not fully open source and there are restrictions on how it can be used – so creating your own SLM keeps that control in your own hands.
Agentic AI & a hybrid approach
Alongside SLMs as technical options, there is also the ‘next big thing’ of Agentic AI to consider.
Agentic AI involves turning a workflow around a specific goal into a set of processes and then using one or more AI agents as components to fulfil that requirement with less direct human interaction.
Where full LLMs might be overkill for components in an Agentic deployment, SLMs could be used to cover those specific tasks. Rather than trying to use the same set of components to fulfil the whole process, applying a best-of-breed approach and using specific SLMs for specific steps can be a better response.
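In practice, that best-of-breed approach often looks like a simple router that sends each step of the workflow to the cheapest model that can handle it. The model identifiers and the call_model() helper below are hypothetical; in a real deployment each entry would point at a deployed SLM or LLM endpoint.

```python
# A minimal sketch of best-of-breed routing in an agentic workflow.
ROUTES = {
    "classify_ticket": "support-slm",   # small model tuned for one task
    "extract_invoice": "finance-slm",   # another specialised SLM
    "draft_response": "general-llm",    # fall back to a large model
}

def call_model(model_name: str, task_input: str) -> str:
    # Placeholder for the actual API call to the chosen model.
    return f"{model_name} handled: {task_input}"

def run_step(task_type: str, task_input: str) -> str:
    """Pick the cheapest capable model for this step of the workflow."""
    model = ROUTES.get(task_type, "general-llm")
    return call_model(model, task_input)

print(run_step("classify_ticket", "My order arrived damaged"))
```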
In reality, a hybrid approach will probably win out, with SLMs and LLMs used alongside each other to deliver the most relevant results at a given cost and compute level. To achieve this, we will need to look at how we define the “best” results in a given process. Generative AI services are ‘non-deterministic’ in that you will get a differently worded response every time you ask, even when the right data is included in that response.
Developer implications
What will this mean for developers in practice?
Testing SLMs alongside LLMs and other generative AI components will be a challenge. Getting the most accurate insight into how things perform involves running tests where one component – this SLM or that LLM, say – is switched out while all the other components, data sets and weights remain the same. This ‘compare and contrast’ approach will be time-consuming if you have to rebuild your application every time.
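Stripped to its essentials, that compare-and-contrast loop can be as simple as the sketch below: the evaluation set and prompt template stay fixed, and only the model under test is swapped. The small_model() and large_model() callables are hypothetical stand-ins for real SLM and LLM clients, and the pass/fail check is deliberately crude.

```python
# Hold the data set and prompt template constant; swap only the model.
EVAL_SET = [
    {"question": "What is our refund window?", "expected": "30 days"},
    {"question": "Do we ship to Ireland?", "expected": "yes"},
]

PROMPT_TEMPLATE = "Answer briefly.\nContext: {context}\nQuestion: {question}"

def evaluate(generate, context: str) -> float:
    """Run the fixed evaluation set through one candidate model."""
    hits = 0
    for case in EVAL_SET:
        prompt = PROMPT_TEMPLATE.format(context=context, question=case["question"])
        answer = generate(prompt)
        hits += case["expected"].lower() in answer.lower()
    return hits / len(EVAL_SET)

def small_model(prompt: str) -> str:   # stand-in for an SLM endpoint
    return "Refunds are accepted within 30 days; yes, we ship to Ireland."

def large_model(prompt: str) -> str:   # stand-in for an LLM endpoint
    return "Yes, our policy allows returns within 30 days and covers Irish delivery."

context = "Refund window: 30 days. Shipping: UK and Ireland."
for name, fn in {"candidate SLM": small_model, "candidate LLM": large_model}.items():
    print(name, evaluate(fn, context))
```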
In practice, developers will look at the frameworks they have in place for implementing generative AI applications. A good framework makes it easier to plug in different components for running tests, as well as taking care of the integrations between components over time.
Open source projects like Langflow connect generative AI application components to each other using their APIs, so swapping out one LLM or SLM for another involves pointing to a different set of APIs rather than rebuilding the application completely. As a benefit, this makes it easier for developers to concentrate on the business logic side, rather than maintaining integrations over time.
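The same principle applies even without a framework, provided the models sit behind a common interface. The sketch below is not a Langflow example; it simply assumes both candidate models are served behind OpenAI-compatible endpoints, with hypothetical URLs and model names, so that swapping the model is a configuration change rather than a rebuild.

```python
from openai import OpenAI

# Assumption: both candidates expose OpenAI-compatible chat endpoints.
# The URLs, key and model names are hypothetical placeholders.
CANDIDATES = {
    "hosted-llm": {"base_url": "https://llm.example.com/v1", "model": "big-general-model"},
    "in-house-slm": {"base_url": "http://localhost:8000/v1", "model": "support-slm"},
}

def ask(candidate: str, question: str) -> str:
    """Send the same question to whichever model the configuration names."""
    cfg = CANDIDATES[candidate]
    client = OpenAI(base_url=cfg["base_url"], api_key="not-a-real-key")
    response = client.chat.completions.create(
        model=cfg["model"],
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

# The business logic stays identical whichever model answers.
print(ask("in-house-slm", "Summarise our returns policy in one sentence."))
```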