A Dialogue Model for Academic Research – The Berkeley Artificial Intelligence Research Blog

By neub9

In this post, we introduce Koala, a chatbot trained by fine-tuning Meta’s LLaMA on dialogue data gathered from the web. The post describes our dataset curation and training process and presents the results of a user study comparing our model to ChatGPT and Stanford’s Alpaca. The results show that Koala can respond effectively to a variety of user queries: its responses are often preferred over Alpaca’s and are at least tied with ChatGPT’s in over half of the cases.

These findings contribute to the discussion of how large closed-source models perform relative to smaller public models. They suggest that models small enough to run locally can capture much of the performance of their larger counterparts if trained on carefully sourced data. This in turn suggests that curating high-quality datasets could do more to enable safer, more factual, and more capable models than simply increasing the size of existing systems.

It is important to note that Koala is a research prototype with major shortcomings in content, safety, and reliability. It should not be used outside of research.

The post also outlines how Koala differs from notable existing models and details the dataset composition, including how dialogue data was gathered and curated from the web and from public datasets.

The Koala model is implemented with JAX/Flax in EasyLM, an open-source framework that makes it easy to pre-train, fine-tune, serve, and evaluate various large language models. The model was trained on a single Nvidia DGX server with 8 A100 GPUs, and training for 2 epochs took 6 hours.
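As a back-of-the-envelope reading of those figures, the run amounts to a fixed number of GPU-hours. The sketch below only does that arithmetic; the function name is hypothetical, and it assumes the 6 hours covers both epochs in total and that all 8 GPUs were in use throughout.

```python
def gpu_hours(num_gpus: int, wall_clock_hours: float) -> float:
    """Total accelerator time: number of GPUs times wall-clock hours."""
    return num_gpus * wall_clock_hours


# Figures quoted in the post: 8 A100 GPUs, 6 hours, 2 epochs.
# Assumption: the 6 hours is the total for both epochs.
total = gpu_hours(num_gpus=8, wall_clock_hours=6)
per_epoch = total / 2

print(f"total GPU-hours: {total}")
print(f"GPU-hours per epoch: {per_epoch}")
```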

The preliminary evaluation compared two models: Koala-Distill, which uses only distillation data, and Koala-All, which uses all of the data, including both distillation and open-source data. The aim was to evaluate how the distillation and open-source datasets each influence the final performance of the models.
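A preference-based comparison of this kind can be summarized by tallying pairwise judgments. The sketch below is illustrative only: the labels `win`, `tie`, and `loss`, the function name, and the sample ratings are all assumptions made up for the example, not the study's protocol or data.

```python
from collections import Counter


def preference_rates(judgments):
    """Return the fraction of 'win', 'tie', and 'loss' labels in judgments."""
    total = len(judgments)
    if total == 0:
        raise ValueError("no judgments to tally")
    counts = Counter(judgments)
    return {label: counts[label] / total for label in ("win", "tie", "loss")}


# Made-up ratings for illustration; NOT results from the Koala study.
sample = ["win", "tie", "loss", "win", "tie", "loss", "loss", "win"]
rates = preference_rates(sample)

# "At least tied" means the model's response was preferred or rated equal.
at_least_tied = rates["win"] + rates["tie"]
print(rates)
print(f"at least tied: {at_least_tied:.1%}")
```

A statement such as "at least tied in over half of the cases" corresponds to `at_least_tied > 0.5` in this framing.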
