The State of Open Source AI
How did Google's leaked document "We Have No Moat, and Neither Does OpenAI" hold up, and what is the current state of open-source LLMs?
The year is almost over, and that’s always a great time to look back. As many of us have remarked, time zips by in the world of AI: a week can feel like months, a year like decades, given the insane pace of innovation. With the meteoric rise of ChatGPT, the biggest tech companies have taken a deep dive into AI. In an unprecedented way, people are wary of the tech even before fully grasping what it's capable of, a sentiment fuelled in no small part by the company at the forefront, OpenAI. But there are other takes too, even within the world of Big Tech and Silicon Valley. Yann LeCun, Meta’s Chief AI Scientist, stands out with his recent tweet advocating for the future dominance of open-source models over proprietary giants like GPT-4.
But let's not get ahead of ourselves. While open source may not be overtaking the leading proprietary models overnight, it's certainly nipping at their heels, outperforming them in specific tasks or subsets. So, let's dive into the 'We have no moat, and neither does OpenAI' document. It lays out several compelling reasons and examples demonstrating the rapid advancements of open source compared to proprietary models. We'll explore these examples and assess how they've aged, looking into scalable personal AI, responsible release, multimodality, and LLMs on a phone—buzzwords that are quickly becoming part of our everyday lexicon in the AI community.
Scalable personal AI, running strong models on consumer hardware, multimodality... these aren't just futuristic concepts; they're here, evolving in exciting and sometimes unexpected ways. So, strap in as we take a look back—and forward—at the wild ride that is AI development.
A Look Back at Google’s Leaked Document
The “We Have No Moat, And Neither Does OpenAI” document, leaked in May of 2023, provides a retrospective look at AI perspectives and lets us evaluate how those early views have stood the test of time. It's a nod to the open-source community, examining the predictions and assertions made "back then."
In essence, the document challenges the idea that the realm of Generative AI is the exclusive domain of big tech companies. It highlights the significant contributions of developers, startups, and academic institutions.
From the individual efforts of contributors like https://huggingface.co/TheBloke, who has made numerous models accessible, largely self-financed through Patreon, to innovative start-ups like Mistral in France building open-source models, and universities pushing the envelope with groundbreaking research - it's a collective push forward.
Open-Source is Speeding Ahead: The document underscores that open-source isn't just on the heels of big tech; it's leading the charge, rapidly solving complex problems. This trend demonstrates that a collaborative and open approach can often surpass the more guarded, proprietary methods.
Practical AI is Key: It emphasizes making AI accessible and useful for everyone. Envision having AI on your phone or customizable tools on your laptop - the document champions practical, everyday AI applications that fit into our lives.
Collaboration is King: It's clear about the future of AI being inherently collaborative. It encourages a shift from a "my tech" to an "our tech" mindset, advocating for a more inclusive and cooperative approach to AI development.
Frontiers Back in May: Where Are They Now?
In the document, a few frontiers and exciting examples were mentioned, and we can dive into those to see how the field has or hasn’t shifted.
These were the examples mentioned:
Scalable Personal AI: You can finetune a personalized AI on your laptop in an evening.
Responsible Release: This one isn’t “solved” so much as “obviated”. There are entire websites full of models with no restrictions whatsoever, and text is not far behind.
Multimodality: The current multimodal ScienceQA SOTA was trained in an hour.
LLMs on a Phone: People are running foundation models on a Pixel 6 at 5 tokens / sec.
And let’s see where they are now:
Scalable personal AI:
There are quite a few reasons why this has become significantly more accessible. First of all, small models have become a lot better, and they’ve become significantly easier to run. Both Mistral 7B and Llama 2 (at various sizes) have been released, and they perform significantly better than the models of similar size that were runnable on a good consumer GPU (think Nvidia RTX 3060 or better) in March of last year.
The “7B” stands for the number of parameters, in other words, the size of the model. 7B means 7 billion parameters, a size that can be run on fairly basic computer hardware at decent speeds. Generally, more is better. Models of 7B and smaller are generally referred to as small models. GPT-3, by comparison, has 175 billion parameters.
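As a back-of-the-envelope illustration (these numbers ignore activations and runtime overhead, so treat them as rough lower bounds), you can estimate the memory needed just to load a model's weights from its parameter count and the precision those weights are stored at:

```python
# Rough memory estimate for loading a model's weights:
# parameter count * bytes per parameter. Activations, KV cache
# and framework overhead are deliberately left out of this sketch.

def model_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate weight memory in gigabytes."""
    return params_billions * 1e9 * bytes_per_param / 1e9

# A 7B model at 16-bit (2 bytes) vs 4-bit (0.5 bytes) precision:
print(model_memory_gb(7, 2.0))   # 14.0 GB: high-end GPU territory
print(model_memory_gb(7, 0.5))   # 3.5 GB: fits on modest hardware
```

This is why quantization matters so much for running models locally: dropping from 16-bit to 4-bit weights turns a 14 GB load into something a mid-range GPU, or even a laptop, can hold.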
Alpaca 13B was, for a little while, the strongest model at its size, and although it’s a little hard to quantify, Mistral 7B is about as close to GPT-4 in quality as Alpaca 13B is to Mistral 7B. In other words, a model that’s significantly smaller has become significantly better.
Running the model has become easy now too - you can download LM Studio https://lmstudio.ai/ and you can quickly load up a model on your own PC without having to write code. It might take a little trial and error to get the right settings going that work well with the model you’re trying to run, but with some searching around on the internet, you can figure that out.
This alone is an insane improvement, and there’s more. Specifically when it comes to finetuning, some notable developments have come out, one of which is QLoRA, whose paper was released in May. This method quantizes the frozen base model to 4-bit precision and trains only small adapter layers on top, drastically reducing the GPU memory needed and making finetuning significantly more accessible.
This means you can add knowledge on a specific niche, steer a model to respond a certain way (act like a chatbot, stick to a certain writing style), extract information, follow instructions better.
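To get a feel for why this kind of finetuning is so cheap, here is a toy sketch of the low-rank adapter idea behind LoRA (which QLoRA builds on by also quantizing the frozen base weights). All sizes here are made up for illustration; this is not the actual library code:

```python
import numpy as np

rng = np.random.default_rng(0)

d = 1024  # hidden size of a toy layer
r = 8     # LoRA rank: only the rank-r matrices are trained

W = rng.standard_normal((d, d))          # frozen base weights (never updated)
A = rng.standard_normal((d, r)) * 0.01   # trainable down-projection
B = np.zeros((r, d))                     # trainable up-projection (starts at 0,
                                         # so training begins at the base model)

def forward(x: np.ndarray) -> np.ndarray:
    # Base layer output plus the low-rank correction (A @ B).
    return x @ W + x @ A @ B

full_params = W.size
lora_params = A.size + B.size
print(f"trainable fraction: {lora_params / full_params:.4%}")  # well under 2%
```

Because only the tiny A and B matrices need gradients and optimizer state, the memory bill for training shrinks dramatically, which is exactly what puts finetuning within reach of consumer GPUs.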
Earlier this month, a user on the /r/LocalLlama subreddit described finetuning a model on their own WhatsApp messages, making a language model write responses as themselves on WhatsApp, which is one of many examples of this moving forward day by day.
Responsible Release
Can’t say there has been a lot of improvement here. One notable example is, again, Mistral. They added a “safe_mode” option allowing you to switch between a censored/uncensored version. A good option to have when you’d use one of these models for customer facing applications, but this definitely hasn’t been solved yet. Even if base models politely decline certain requests, a finetune on specific datasets will take off the guardrails. Any model with a Dolphin finetune is likely to comply with just about any request, whether dangerous, immoral, NSFW or anything else.
We’ll see how this will develop, but this is definitely not solved and something I’ll touch on some other time.
Multimodality
There have been some really awesome developments here too. You can load up a model like BakLLaVA, toss in an image, and ask questions about what the model “sees”. LM Studio recently added support for these vision models, so you can use them quite easily, just like you’re used to with GPT-4V, but on your own PC.
For open-source voice generation, XTTS v2 was just released. It can mimic a voice given just seconds of audio (although that won’t make the quality amazing), and while it’s not quite at ElevenLabs’ level, it’s getting better as we speak (literally and figuratively).
Local multimodal models can, and likely will, change the way we interact with tech in our daily lives. With vision, a model can navigate phone UIs and take actions for us.
While this is all very exciting, we do need to educate ourselves fast on the opportunities for misuse. The Nigerian-prince scam is known to all of us, but hyper-personal, LLM-powered scams will be extremely cheap and likely more effective at scale than anything we’ve seen before. This could range from scams where the audio mimics the voice of a loved one to messages crafted by a language model that does a great job of copying someone you know, whether it’s a colleague or a friend. If something sounds sketchy or seems a little unlikely, contact whoever may or may not be impersonated directly yourself.
LLMs on a Phone
Back to the topic at hand. While small models have gotten better and it’s possible to get decent performance from a small model on a flagship phone, there haven’t been developments you can directly take advantage of right now. In defence of open source, closed-source proprietary models aren’t making public strides here either. Google seems to be closest, running their Nano model on a Pixel 8 to provide summaries in the recorder app and message suggestions within WhatsApp (no clue why they chose a Meta product).
The quality of the tiny Phi-2 model from Microsoft is promising and shows a clear example of how good input results in good output, as they mostly used data from textbooks to train their model. We’ll definitely see exciting things happening in 2024!
Mistral - a French Start-up Making Waves
I’d like to pick out one example: the company making the biggest waves in the world of open-source LLMs right now is Mistral. Their small 7B model is punching well above its weight, beating much larger models across a range of tasks. And while it absolutely stands out, the French start-up has made an even more impressive model.
It’s the first of its kind to be released open-source: Mixtral 8x7B. It is what is called a “MoE” model, which stands for Mixture of Experts. Think of it as 8 smaller models plus a router that picks which of the smaller models will provide the best result for each token. Although we never got confirmation, it is rumoured that GPT-4 is set up the same way, but with much larger models.
In short, it runs at the speed of a much smaller model, while providing the performance/quality of a larger one.
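To make the routing idea concrete, here is a toy sketch of top-k expert routing in plain NumPy. The dimensions are made up, each "expert" is reduced to a single weight matrix, and a real MoE layer sits inside a transformer block, but the core logic looks roughly like this:

```python
import numpy as np

rng = np.random.default_rng(0)

d, n_experts, top_k = 16, 8, 2   # toy sizes; Mixtral reportedly uses 2 of 8 experts

# Each "expert" is just a small feed-forward weight matrix in this sketch.
experts = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(n_experts)]
router = rng.standard_normal((d, n_experts)) / np.sqrt(d)

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route one token vector to its top-k experts and mix their outputs."""
    logits = x @ router
    top = np.argsort(logits)[-top_k:]     # indices of the k highest-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()              # softmax over just the chosen experts
    # Only top_k of the n_experts matrices are touched per token, which is
    # why inference runs at the speed of a much smaller dense model.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(d)
out = moe_layer(token)
print(out.shape)  # (16,)
```

All 8 experts still have to sit in memory, which is why the model is cheap to run per token but not cheap to load.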
This model is still quite big; you need a strong computer to run it. Think of a really good, modern gaming PC with a high-end graphics card (RTX 4070 or better).
Now that it has been shared with the public, people can experiment with it: figuring out how this model can be quantized (made smaller at the cost of some quality, so it can be run with less memory), finetuned, and prompted, and what works and what doesn’t.
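Quantization itself is conceptually simple. Here is a toy round-to-nearest 4-bit quantizer in NumPy; real schemes (GPTQ, the GGUF formats, and so on) are more sophisticated, with per-block scales and outlier handling, but the core trade-off looks like this:

```python
import numpy as np

def quantize_int4(w: np.ndarray):
    """Symmetric round-to-nearest quantization to 4-bit integers (-8..7)."""
    scale = np.abs(w).max() / 7.0
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(1000).astype(np.float32)  # stand-in for a weight tensor

q, scale = quantize_int4(w)
w_hat = dequantize(q, scale)

# 4 bits instead of 32: roughly 8x smaller, at the cost of rounding error
# bounded by half the quantization step.
print(np.abs(w - w_hat).max())
```

The community experimentation is largely about finding how aggressively a given model can be squeezed this way before the quality loss becomes noticeable.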
Above is a great visualization based on the ELO rating of chatbots over time.
The Elo rating is determined in the Chatbot Arena. Simply put, you chat with two models at the same time (picked at random) and decide which response you like best to determine the winner. Given all kinds of problems with LLM benchmarks, this is one of the fairest comparisons we have.
GPT-4 has clearly been king for the whole year, but it’s great to see the open-source Llama 2 and Mistral models move up the leaderboard, competing with Big Tech and Silicon Valley.
Looking Forward
Mistral and other players are not just participating; they're leading the charge, making AI more personal, responsible, and accessible than ever before, giving us alternatives and freeing us from relying solely on Big Tech. Scalable, personal, and most importantly private AI won’t be stopped. The rapid progress is not just encouraging; it's exhilarating, and it doesn't seem to be slowing down anytime soon.
Multimodality is part of this shift, changing how we interact with our technology on a fundamental level. Being able to handle complex tasks directly on our devices, without sending data back and forth to centralized servers, offers a more optimistic future.
The best is indeed yet to come, and the excitement is palpable. As we look forward to these developments, let's keep in mind the broader picture. The future of AI is about more than just technological breakthroughs; it's about making sure these advances benefit everyone. It's about steering this powerful tool so that it enriches our lives, respects our privacy, and remains under our control.
Let's move into this future with a sense of responsibility and anticipation. The potential is limitless, and as a community, we have the opportunity to shape it wisely.