Update Apr 12, 2023: We have released Dolly 2.0, licensed for both research and commercial use.

We show that anyone can take a dated off-the-shelf open source large language model (LLM) and give it magical ChatGPT-like instruction-following ability by training it in 30 minutes on one machine, using high-quality training data. Surprisingly, instruction-following does not seem to require the latest or largest models: our model is only 6 billion parameters, compared to 175 billion for GPT-3. We open source the code for our model (Dolly) and show how it can be re-created on Databricks. We believe models like Dolly will help democratize LLMs, transforming them from something very few companies can afford into a commodity every company can own and customize to improve their products.

ChatGPT, a proprietary instruction-following model, was released in November 2022 and took the world by storm. The model was trained on trillions of words from the web, requiring massive numbers of GPUs to develop. This quickly led to Google and other companies releasing their own proprietary instruction-following models. In February 2023, Meta released the weights for a set of high-quality (but not instruction-following) language models called LLaMA to academic researchers, trained for over 80,000 GPU-hours each. Then, in March, Stanford built the Alpaca model, which was based on LLaMA but tuned on a small dataset of 50,000 human-like questions and answers that, surprisingly, made it exhibit ChatGPT-like interactivity.

Today we are introducing Dolly, a cheap-to-build LLM that exhibits a surprising degree of the instruction-following capabilities of ChatGPT. Whereas the work from the Alpaca team showed that state-of-the-art models could be coaxed into high-quality instruction-following behavior, we find that even years-old open source models with much earlier architectures exhibit striking behaviors when fine-tuned on a small corpus of instruction training data. Dolly works by taking an existing open source 6 billion parameter model from EleutherAI and modifying it ever so slightly to elicit instruction-following capabilities such as brainstorming and text generation not present in the original model, using data from Alpaca.

This suggests that much of the qualitative gain in state-of-the-art models like ChatGPT may owe to focused corpora of instruction-following training data, rather than larger or better-tuned base models. The model underlying Dolly has only 6 billion parameters, compared to 175 billion in GPT-3, and is two years old, making it particularly surprising that it works so well.
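The recipe described above, taking an off-the-shelf model and fine-tuning it on a small corpus of Alpaca-style instruction data, begins by rendering each record into a training prompt. Here is a minimal sketch of that data-preparation step, assuming the Alpaca JSON schema (`instruction`, optional `input`, `output` fields) and Alpaca's published prompt template; the function name `format_record` is our own illustration, not the released Dolly training code:

```python
# Sketch of the data-preparation step for Alpaca-style instruction tuning.
# Assumes records follow the Alpaca schema: {"instruction", "input", "output"}.
# The real training code is open-sourced separately; this only illustrates
# how one instruction/response pair becomes a single training string.

PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. Write a response that appropriately "
    "completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n{output}"
)

PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n{output}"
)

def format_record(record: dict) -> str:
    """Render one Alpaca-style record into a single training string."""
    # Records without further context use the shorter template;
    # str.format silently ignores the unused "input" key in that case.
    template = PROMPT_WITH_INPUT if record.get("input") else PROMPT_NO_INPUT
    return template.format(**record)

example = {
    "instruction": "Name three primary colors.",
    "input": "",
    "output": "Red, yellow, and blue.",
}
print(format_record(example))
```

From there, the fine-tune itself is a conventional causal-language-modeling run over the formatted strings (for example, with the Hugging Face `transformers` Trainer on EleutherAI's 6 billion parameter checkpoint), which is what makes the whole process cheap enough to complete in about 30 minutes on one machine.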