The Rise of Local LLMs – what to look out for in 2024

During the holidays, as I took a short break from building AI at Zefort, I found myself reflecting on the whirlwind year that was 2023 in the world of Large Language Models (LLMs) – the underlying technology that enables generative AI solutions, such as ChatGPT.

As we now set our sights on the year ahead, I want to share some thoughts that I think will shed light on an important trajectory in 2024, especially for anyone interested in privacy and customized AI solutions.

A year ago, ChatGPT had just been launched and we were busy unpacking its implications. In March 2023, OpenAI introduced GPT-4 to the public, and it seemed certain that the AI revolution would manifest in ever larger models, guarded behind APIs and veiled in growing secrecy, leading to a monopolization of AI by a few big industry players.

However, a look back at 2023 reveals a different story. The real revolution did not come so much from ever larger, more cognitively powerful models. Instead, various labs released LLMs openly every month, sometimes even more often. The tide had unexpectedly turned towards openness.

When OpenAI launched GPT-3 in 2020, they set a precedent by not releasing the actual model for download, as was customary at the time, but rather making it available only to vetted users through an API, in the name of responsibility and safety. Ever since, big tech companies have been less willing to make generative models openly available. Meanwhile, many have criticized this approach, arguing that AI safety research is hindered when models cannot be downloaded and studied directly.

In February of 2023, in the middle of the ChatGPT-focused frenzy, Meta released a model called LLaMA. It was offered for download only to researchers, presumably in an attempt to strike a balance between openness and safety concerns. Nevertheless, within days it was leaked onto the public internet, and this became the first catalyst of the open LLM revolution.

At 65 billion parameters in its largest version, LLaMA was considerably smaller than several previously released open LLMs, yet its performance was commendable. It outperformed many much larger models, such as the open BLOOM and the closed GPT-3 models. Its smaller size was precisely what turned out to be a game changer, because it made the model much more accessible and practical to use, with the smallest version having only 7B parameters. LLaMA was swiftly followed by the Alpaca model from Stanford University and Vicuna from UC Berkeley, both versions of LLaMA fine-tuned to follow instructions and hold dialogue. LLaMA became the foundational building block for a wide range of derivative models, sparking a community-led counter-offensive to ChatGPT.

While the LLaMA model was out in the wild and ‘open’ in the sense that it could now be downloaded and run locally, it was still licensed only for research purposes. In May and June, MosaicML took open LLMs a step further by releasing the MPT models. Like the smaller versions of LLaMA, they came in accessible 7B and 30B sizes, but they were also licensed for commercial use. Shortly after, the Falcon models by the Technology Innovation Institute of the UAE followed their lead. These models could now be run locally not only by researchers and hobbyists, but also by businesses.

These releases were shortly followed by LLaMA-2 in July, which became another turning point: it demonstrated capabilities in the same league as GPT-3.5, the model behind ChatGPT, and this time Meta chose to allow commercial use. By this point, it was becoming clear that choosing to run a local LLM, for instance for the sake of privacy or customizability, did not have to mean a big trade-off in quality. While GPT-4 remained the apex model, more capable at tasks involving complex reasoning, LLaMA-2 turned out to be sufficient for many practical applications, such as answering questions based on document sources. It also supports a broad set of languages, which currently sets it apart from most other open models.

In September, a new kid on the block, the French start-up Mistral AI, released a 7B model that was able to outperform larger models, including the 13B version of LLaMA-2. By the end of the year, they surprised again with the release of the Mixtral 8x7B model (roughly 47B parameters in total), open for commercial use. Despite its medium size, it was able to beat GPT-3.5 and rival GPT-4 on a range of evaluation benchmarks. Mixtral supports 5 major European languages, which is more than most models, but still fewer than LLaMA covers.

Alongside all these model releases, and many more that I have not covered, there have been tremendous community efforts to make LLMs more accessible and practical to tinker with. This push to democratize LLMs includes methods for model compression and efficient fine-tuning, allowing for inference and adaptation even on consumer-grade hardware. On the data front, community efforts such as OpenAssistant and ShareGPT have arisen to crowd-source datasets for instruction following, while recent research has shown that highly competitive instruction-following models can also be trained with very small datasets (e.g., Meta’s LIMA) or synthetic data generated by other LLMs (e.g., Microsoft’s Orca-2).
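To make the point about compression and consumer-grade hardware concrete, here is a rough back-of-the-envelope sketch in Python. It estimates only the memory needed to hold a model's weights at a given precision, as parameters times bits per weight divided by eight. The function and variable names are my own illustration, and the estimate deliberately ignores activations, the attention cache and the small overhead that real quantization formats add, so treat the numbers as lower bounds rather than measurements.

```python
def weight_memory_gib(n_params: float, bits_per_weight: int) -> float:
    """Approximate memory needed to store the weights alone, in GiB.

    Assumption: every weight uses exactly `bits_per_weight` bits; real
    quantization formats add a little overhead for scaling factors.
    """
    total_bytes = n_params * bits_per_weight / 8
    return total_bytes / 2**30


# Model sizes mentioned in this article (parameter counts are nominal).
MODEL_SIZES = {"7B": 7e9, "30B": 30e9, "65B": 65e9}

# 16-bit is a common uncompressed format; 8 and 4 bits are typical
# targets for compressed (quantized) weights.
PRECISIONS = (16, 8, 4)


def footprint_table() -> dict:
    """Return {model name: {bits per weight: GiB rounded to one decimal}}."""
    return {
        name: {bits: round(weight_memory_gib(n, bits), 1) for bits in PRECISIONS}
        for name, n in MODEL_SIZES.items()
    }


if __name__ == "__main__":
    for name, row in footprint_table().items():
        cells = ", ".join(f"{bits}-bit: {gib} GiB" for bits, gib in row.items())
        print(f"{name}: {cells}")
```

Under these assumptions, a 7B model shrinks from about 13 GiB at 16 bits to a little over 3 GiB at 4 bits, which is why such models fit on an ordinary laptop or a single consumer GPU, while a 65B model remains a much heavier lift even when compressed.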

This indicates that neither enormous amounts of data nor compute will give the big players in the LLM field any guarantee of maintaining dominance. As a Googler’s leaked internal memo put it: “We have no moat, and neither does OpenAI”. The memo added that open source “has been quietly eating our lunch”, referring to the fact that the open community iterates much faster than a single organization with a huge monolithic model can. Nor do I think that growing model sizes alone will take us to the next level, as that has seldom been the case in the history of NLP; innovation in model architectures is generally also needed.

So, what does this mean for individuals and businesses at large?

Simply put, the accelerating community development and increasing ease of running capable LLMs locally present exciting opportunities for highly privacy-sensitive and domain-specific applications. It opens up use cases that the likes of OpenAI, Google or Anthropic cannot serve with their current business models. At Zefort, we are particularly excited about having these powerful tools at our disposal as we continue to explore document understanding and question answering for contracts with an emphasis on privacy, verifiability and ease of use in order to empower the user with AI.

Looking ahead to 2024, I, for one, am eagerly waiting to see what the year will bring, be it the release of LLaMA-3, other models, or other kinds of innovation. While OpenAI is doing their best to build an ecosystem and community around custom GPTs and the accompanying GPT store (think apps running on ChatGPT, i.e., hosted by OpenAI and behind a subscription), there is a significant segment they cannot serve. The feverish development in all directions will most likely not diminish, and we can expect to be surprised. My bet is that we will see yet more freedom in building powerful AI locally, which can connect into an emerging Web of AIs. These AIs can each be more specialized and learn to leverage each other for specific purposes. Local LLMs offer hope that the revolution will not be centralized.


Samuel Rönnqvist is the Machine Learning Lead at Zefort, where he heads AI research and development related to document understanding for contracts. He holds a PhD in natural language processing and has been active as a researcher in the field for more than a decade. He has worked on deep learning since an early stage and pioneered its use in financial stability monitoring through text mining of news. In 2018, he started working on neural language generation at the University of Turku, Finland, and Goethe University Frankfurt, Germany, collaborating on controlled news generation with the STT News Agency through the Google News Initiative, as well as working on a project that produced the first fully generated, published scientific book in collaboration with Springer Publishing.
