VentureBeat and other experts have argued that open-source large language models (LLMs) may have a more powerful impact on generative AI in the enterprise.
More powerful, that is, than closed models, like the ones behind OpenAI’s popular ChatGPT, or those from competitor Anthropic.
But that’s been hard to prove when you consider examples of actual deployments. While there’s a ton of experimentation, or proofs of concept, going on with open-source models, relatively few established companies have announced publicly that they have deployed open-source models in real business applications.
So we decided to contact the major open-source LLM providers to find examples of actual deployments by enterprise companies. We reached out to Meta and Mistral AI, two of the major providers of open-source models, and to IBM, Hugging Face, Dell, Databricks, AWS and Microsoft, all of which have agreements to distribute open-source models.
From interviews with these companies, it turns out that several initial public examples exist (we found 16 namable cases, see list below), but it’s still very early. Industry observers say the number of cases will pick up strongly later this year.
Delays to the open-source LLM feedback loop
One reason is that open source was slow off the starting block. Meta released the first major open-source model, Llama, in February 2023, three months after OpenAI released its ChatGPT model publicly in November 2022. And Mistral AI released Mixtral, the top-performing open-source LLM according to many benchmarks, in December 2023, just one month ago.
So it follows that examples of deployment are only now emerging. Open-source advocates agree there are many more examples of closed-model deployments, but it’s only a matter of time before open-source catches up with the closed-source models.
There are some limitations with the open-source models in circulation today. Amjad Masad, CEO of software tool startup Replit, kicked off a popular Twitter thread about how the feedback loop isn’t working properly because you can’t contribute easily to model development.
But it’s also true that people may have underestimated how much experimentation would happen with open-source models. Open-source developers have created thousands of derivatives of models like Llama, including, increasingly, by mixing models, and they are steadily achieving parity with, or even superiority over, closed models on certain metrics (see examples like FinGPT, BioBERT, Defog SQLCoder, and Phind).
Large public models by themselves have “little to no value” for enterprise
Matt Baker, SVP of AI Strategy at Dell, which has partnered with Meta to help bring Llama 2 open-source AI to enterprise users, is blunt about the closed-model limitations: “Large public models on their own have little to no value to offer private companies,” Baker said. He said they’ve become bloated by trying to offer a very generally competent model, but they don’t allow enterprise users to access their own data easily. About 95 percent of the AI work performed by organizations, Baker estimates, is on the workflow needed to infuse the models with that data through techniques like retrieval augmented generation (RAG). And even then, RAG isn’t always reliable. “A lot of customers are asking themselves: Wait a second, why am I paying for a super large model that knows very little about my business? Couldn’t I just use one of these open-source models, and by the way, maybe use a much smaller, open-source model for that (information retrieval) workflow?”
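The RAG workflow Baker describes can be illustrated with a minimal sketch: find the company documents most relevant to a question, then place them in the prompt sent to a model. Everything here is an assumption made for illustration, not any vendor's implementation. The `embed` function is a toy word-count stand-in for a real embedding model, and `DOCS`, `retrieve`, and `build_prompt` are hypothetical names.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding' (an assumption; a real system would call an embedding model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, documents, k=2):
    """Assemble the prompt that would be sent to an LLM (the model call itself is omitted)."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# Hypothetical private company data the public model has never seen.
DOCS = [
    "Refunds are processed within 14 days of a return request.",
    "The warranty covers hardware defects for two years.",
    "Support is available by phone on weekdays.",
]

print(build_prompt("how long do refunds take", DOCS, k=1))
```

Because the retrieval step supplies the business knowledge, the model that reads the prompt can be a smaller one, which is the point Baker's customers are raising.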
Many enterprise companies are building, and experimenting with, open source-based customer support and code generation applications to interact with their own custom code, which sometimes is not understandable to the general closed-model LLMs built by OpenAI or Anthropic, Baker said. Those companies have prioritized Python and other popular cloud languages at the expense of supporting legacy enterprise code.
Other reasons why open-source LLM deployments are slow off the start line
Hugging Face is arguably the biggest provider of open-source LLM infrastructure, and hundreds of thousands of developers have been downloading LLMs and other open-source tools, including libraries and frameworks like LangChain and LlamaIndex, to cook up their own applications. Andrew Jardine, an exec at Hugging Face responsible for advising companies looking to use open-source LLMs, said that enterprise companies take a while to move forward with LLM applications because they know they first need to consider implications for data privacy, customer experience, and ethics. Companies typically start with use cases they can use internally with their own employees, and deploy those only after doing a proof-of-concept. And only then do most companies start looking at external use cases, where again they go through a proof-of-concept stage. Only at the end of 2023, he says, were OpenAI’s closed-model deployments emerging in bigger numbers, and so he expects open-source deployments to emerge this year.
Still, others say that enterprise companies should stay away from open source because it can be too much work. Calling an API from OpenAI, which also provides on-demand cloud services and indemnification, is much easier than working through the headaches of support, licensing and other governance challenges that come with open source, they say. Also, GPT models do reasonably well across languages, while open-source LLMs are hit and miss.
The dichotomy between open versus closed models is increasingly a false one, Hugging Face’s Jardine said: “The reality is, most people are going to be using both open and closed.” He mentioned a big pharma company he talked with recently that was using a closed LLM for its internal chatbot, but using Llama within the same use case to do things like flag messages that contained personally identifiable information. It did this because open source gave the company more control over the data. The company was concerned that if closed-model LLMs interacted with sensitive data, that data could be sent back to the closed-model provider, Jardine said.
Reasons open source will catch up
Other model changes, including around cost, and specialization, are happening so quickly that most companies will want to be able to switch between different open and closed models as they see fit, and realize that relying on only one model leaves them open to risk. For example, a company’s customers could be impacted negatively, Jardine said, if a model provider suddenly updated a model unexpectedly, or worse, failed to update a model to stay up with the times. Companies often choose the open source route, he said, when they’re concerned about controlling access to their data, but also when they want more control over the fine-tuning of a model for specialized purposes. “You can do fine-tuning of the model using your own data to make it a better fit for you,” Jardine said.
We found several companies, like Intuit and Perplexity, which, like the pharma company mentioned above, want to use multiple models in a single application so that they can pick and choose LLMs that are advantageous for specific sub-tasks. These companies have built generative AI “orchestration layers” to do this autonomously, by calling the best model for the task that is being accomplished, be it open or closed.
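As a rough illustration of what such an orchestration layer does, the sketch below maps each sub-task to a model and dispatches the prompt accordingly. It is a simplified assumption, not the design used by Intuit, Perplexity, or any other company named here. The model names, the `ROUTES` table, and the stub functions standing in for real model calls are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Model:
    name: str
    is_open: bool                 # True for a self-hosted open-source model
    call: Callable[[str], str]    # function that sends a prompt and returns text

def make_stub(name):
    """Stand-in for a real model client (assumption: no network calls are made here)."""
    return lambda prompt: f"[{name}] {prompt}"

# Hypothetical registry: one self-hosted open model, one closed model behind an API.
REGISTRY = {
    "open-small": Model("open-small", True, make_stub("open-small")),
    "closed-large": Model("closed-large", False, make_stub("closed-large")),
}

# Hypothetical routing table: sensitive or narrow sub-tasks go to the open model.
ROUTES = {
    "pii_screening": "open-small",
    "summarization": "open-small",
    "general_chat": "closed-large",
}

def route(task, prompt, default="closed-large"):
    """Pick the model registered for this sub-task, falling back to a default."""
    model = REGISTRY[ROUTES.get(task, default)]
    return model.call(prompt)

print(route("pii_screening", "Check this message for personal data."))
print(route("general_chat", "Draft a reply to the customer."))
```

Keeping the routing table separate from the model clients is what lets a company swap one model for another, open or closed, without touching the application code that calls `route`.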
Also, while it can be more cumbersome initially to deploy an open-source model, if you are running a model at scale, you can save money with open-source models, especially if you have access to your own infrastructure. “In the long term, I think it’s likely that open source will be more cost effective, simply because you’re not paying for this additional cost of IP and development,” Jardine said.
He said he’s aware of several global pharma and other tech companies deploying open-source models in applications, but they are doing so quietly. Closed-model companies Anthropic and OpenAI have marketing teams that write up and publicly trumpet case studies, whereas open source has no one vendor tracking deployments like that.
We learned of several enterprise companies experimenting extensively with open-source LLMs, and it’s only a matter of time before they have deployed LLMs. For example, the automotive company Edmunds and European airline EasyJet are leveraging Databricks’ lakehouse platform (which now includes Dolly, a way to support open-source LLMs) to experiment and build open-source LLM-driven applications.
Other challenges with defining open-source deployment examples
Even defining bona fide enterprise open-source examples here is tricky. An explosion of developers and start-ups are building any number of applications based on open-source LLMs, but we wanted to find examples of established companies using them for clearly useful projects. For our purposes, we defined an enterprise company as having at least 100 employees.
Also, the examples we looked for are enterprise companies that are primarily “end users” of the LLM technology, not suppliers of it. Even this can get murky. Another challenge is how to define open source. Meta’s Llama, one of the more popular open-source LLMs, had a restricted open-source license: Only its model weights were leaked online, for example. Meta did not release other aspects, such as data sources, training code, or fine-tuning methods. Purists argue that for this and other reasons, Llama should not be considered proper open source. (Meta released Llama 2 in July, which opened it up for commercial license, instead of just research, but it still has some restrictions.)
And then there are examples like Writer, which has developed its own family of LLMs, called Palmyra, to power an application that helps people generate content quickly and creatively. It has enterprise customers like Accenture, Vanguard, Hubspot and Pinterest. While Writer has open sourced two of those models, its main Large Palmyra model remains closed, and is the default used by those enterprise customers, so these aren’t examples of open-source usage.
With all those caveats, below we provide the list of examples we were able to find through our reporting. We’re certain there are more out there. Many companies just don’t want to talk publicly about what they’re doing with open-source LLMs or otherwise. An explosion of new open-source LLMs geared for enterprise have emerged from startups in recent months, including those from Deci and Together’s Redpajama. Even Microsoft, Amazon’s AWS, and Google have gotten into the supply game, and consultants like McKinsey leverage open LLMs in part to build apps for customers, so it’s nearly impossible to track the universe of enterprise usage. Many enterprises force providers to sign non-disclosure agreements. That said, we’ll add to this list if we hear of more as a result of this story.
VMware deployed the Hugging Face StarCoder model, which helps make developers more efficient by helping them generate code. VMware wanted to self-host the model, instead of using an external system like Microsoft-owned GitHub’s Copilot, likely because VMware was sensitive about its code base and didn’t want to provide Microsoft access to it.
Brave, the security-focused web browser startup, seeks to differentiate itself around privacy and has deployed a conversational assistant called Leo. Leo previously leveraged Llama 2, but yesterday Brave announced Leo now defaults to open-source model Mixtral 8x7B from Mistral AI. (Again, we’re including this as a bona fide example because Brave has more than 100 employees.)
The children-friendly mobile phone company, which emphasizes safety and security, uses a suite of open-source models from Hugging Face to add a security layer to screen messages that children send and receive. This ensures no inappropriate content is being used in interactions with people they don’t know.
Wells Fargo has deployed open-source LLMs, including Meta’s Llama 2 model, for some internal uses, Wells Fargo CIO Chintan Mehta mentioned in an interview with me at VentureBeat’s AI Impact Tour event in SF, where we focus on examples of generative AI being put to work.
IBM is a provider of generative AI applications that use its own open-source LLMs named Granite, and which also leverage Hugging Face open-source LLMs. However, it wouldn’t be fair to exclude IBM from this list of bonafide users that have deployed applications. Its 285,000 employees rely on the company’s AskHR app, which answers questions employees have on all sorts of HR matters, and is built on IBM’s Watson Orchestration application, which leverages open-source LLMs.
And just last week, IBM announced its new internal consulting product, Consulting Advantage, which leverages open-source LLMs driven by Llama 2. This includes a “Library of Assistants,” powered by IBM’s watsonx platform, which assists IBM’s 160,000 consultants in designing complex services for clients.
Finally, IBM’s thousands of marketing employees also use IBM’s open-source LLM-driven marketing application to generate content, Matt Candy, IBM Consulting’s global managing partner for generative AI, said in an interview with VentureBeat. While the application was in proof-of-concept last year, it has been rolling into deployment for specific units across marketing, he said. The application uses Adobe Firefly for image generation but augments that “with LLMs that we are training and tuning to become a brand brain,” Candy said. The app understands IBM’s persona guidelines, the brand’s tone of voice, campaign guidelines, and then creates derivatives of the content for sub-brands and the different countries IBM operates in, he said.
IBM also yesterday announced a deal to provide the Recording Academy, owner of the Grammy Awards, with a service called AI stories, which leverages Llama 2 running on IBM’s watsonx.ai studio, to help the organization generate custom AI-generated insights and content. The service has vectorized data from relevant datasets around artists and their work so that the LLM can retrieve it through a RAG database. Fans will then be able to interact with the content.
IBM helps all of these organizations generate spoken voice commentary, as well as find video highlights, of relevant sports events using open-source LLMs, IBM’s Candy said. The IBM technology helps these sports event companies call out key things like player facial gestures and crowd noise to create an excitement index over the course of a competition.
Perplexity, a hot startup that is taking on Google search by using LLMs to reinvent the search experience, has only 50 employees, but it just raised $74 million and feels almost inevitably on its way to 100. While it does not meet our definition of enterprise, it’s interesting enough to merit a mention. When a user poses a question to Perplexity, its engine uses about six steps to formulate a response, and multiple LLMs are used in the process. Perplexity uses its own custom-built open-source LLMs as a default for the second-to-last step, said employee Dmitry Shevelenko. That step is the one that summarizes the material of the article or source that Perplexity has found as responsive to the user’s question. Perplexity built its models on top of Mistral and Llama models, and used AWS Bedrock for fine-tuning.
Using Llama was critical, Shevelenko said, because it helps Perplexity own its own destiny. Investing in fine-tuning models on OpenAI models isn’t worth it because you don’t own the result, he said. Notably, Perplexity has also agreed to power Rabbit’s new pocket-sized AI gadget R1, and so Rabbit will also be effectively using open-source LLMs via Perplexity’s API.
CyberAgent, a Japanese digital advertising company, uses open-source LLMs provided by Dell software to power OpenCALM (Open CyberAgent Language Models), a general-purpose Japanese language model that can be fine-tuned to suit users’ needs.
Intuit, provider of software like TurboTax, Quickbooks, and Mailchimp, was early to build its own LLMs, and leverages open-source models in the mix of LLMs driving its Intuit Assist feature, which helps users with things like customer support, analysis and task completion jobs. In interviews with VB about the company’s GenOS platform, Intuit exec Ashok Srivastava said its internal LLMs were built on open source and trained on Intuit’s own data.
Walmart, the retail giant, has built dozens of conversational AI applications, including a chatbot that a million Walmart associates interact with for customer care. Desirée Gosby, vice president of emerging technology at Walmart Global Tech, told VentureBeat the company uses GPT-4 and other LLMs, so as to “not unnecessarily lock ourselves in.” Walmart’s efforts began, Gosby said, by using Google’s BERT open-source models, which were released in 2018.
Shopify Sidekick is an AI-powered tool that utilizes Llama 2 to help small business owners automate various tasks for managing their commerce sites, such as generating product descriptions, responding to customer inquiries, and creating marketing content.
This U.S.-based talent matching start-up uses a chatbot built on Llama that interacts like a human recruiter, helping businesses find and hire top AI and data talent from a pool of high-quality profiles in Africa across various industries.
Niantic, the creator of Pokemon Go, launched a new feature called Peridot, which uses Llama 2 to generate environment-specific reactions and animations for the pet characters in the game.