Autolabel: Open-source library to label all your NLP datasets


June 14, 2023

Refuel Team

Introducing Autolabel: Label your data 25-100x faster with LLMs

Today, we’re excited to announce Autolabel, an open-source Python library to label NLP datasets with any LLM of your choice. 

The library supports common NLP tasks such as classification, named entity recognition and question answering, and popular LLM providers such as OpenAI, Anthropic and HuggingFace. 

We’ve benchmarked the performance of the library across a variety of open source and proprietary datasets, and are able to achieve human-level label quality at 25-100x the speed (full technical report available here). If you have any questions, come chat with our team on Discord here.


For the longest time, ML teams have been bottlenecked by access to clean, labeled data. Data labeling is a manual, time-consuming task, and there just aren’t enough annotators and experts to create the large, clean and diverse datasets that ML models require. We spoke with dozens of ML teams that were either spending half their time just labeling data (and consequently not training or improving models) or were blocked waiting for external annotators to get back to them with clean, labeled data.

LLMs are an incredibly powerful piece of technology. They can write poems, correct your code and pass the bar exam and the SAT, and the research community is noticing that they’re also very capable at labeling datasets. LLMs do, however, come with a few challenges: hallucinations can severely degrade the quality of training datasets, the most powerful LLMs can be quite costly at scale, and it is tricky to “teach” LLMs about your specific problem and domain.

In the last few months, we’ve experimented with many LLMs and prompting techniques to understand how to label data with maximum accuracy and minimal human effort. The result is Autolabel: a Python library that uses LLMs to label NLP datasets with human-level accuracy, significantly faster than manual annotation.

Autolabel: important features and use cases

Autolabel provides out-of-the-box support for NLP tasks such as classification, named entity recognition, entity matching and question answering. It supports proprietary LLM providers such as OpenAI, Anthropic and Google PaLM, as well as open-source and private models through HuggingFace. You can try prompting strategies such as few-shot and chain-of-thought prompting, along with techniques for estimating label confidence, simply by updating a configuration file. Teams are able to start labeling data in a few minutes, rather than writing complicated guidelines and waiting weeks for external teams to annotate their data.
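To make the idea of label confidence concrete, here is a minimal sketch of one common approach: averaging the log-probabilities an LLM assigns to the tokens of its answer, then sending low-confidence labels to a human. This is plain Python and not Autolabel's API; the function names, the 0.8 threshold and the sample log-probabilities are all illustrative assumptions.

```python
import math


def label_confidence(token_logprobs):
    """Geometric-mean token probability of the generated label, in (0, 1]."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def route(label, token_logprobs, threshold=0.8):
    """Keep confident labels; flag the rest for human review.

    The threshold is a hypothetical value you would tune on a small
    hand-labeled sample of your own data.
    """
    conf = label_confidence(token_logprobs)
    return {"label": label, "confidence": conf, "needs_review": conf < threshold}


# Hypothetical log-probabilities, as a provider might return them per token.
confident = route("toxic", [-0.05, -0.02])
uncertain = route("not toxic", [-0.9, -0.7])
```

With a scheme like this, only the uncertain slice of the dataset needs human attention, which is where most of the time savings would come from.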

A before-after comparison diagram. The "before" panel depicts that for one task, it would take four weeks for humans to label 10,000 examples. The "after" panel depicts that it would take 30 minutes to auto-label over 100,000 examples.

This means that whether you want to use GPT-4 to classify legal documents, or try out an in-house LLM to detect medical entities in sensitive healthcare data, Autolabel can help. Our early users include companies in fintech, HR and e-commerce, and they’re seeing dramatic speed-ups in getting their data labeled.

All it takes is for you to describe your labeling guidelines in natural language, provide a few helpful examples and let LLMs do the rest. It’s best to understand the library with a real dataset and labeling task, so let’s check out a simple example.

A simple example: toxic comment classification

Let’s imagine you work on the content moderation team at a social media company and want to train a classifier that determines whether a user comment is toxic or not toxic. You might already have written guidelines for what constitutes a toxic comment.

Before Refuel, the way to create a high-quality training dataset would be to collect a few thousand examples and get them labeled by a team of annotators. This would take a few weeks: annotators have to get familiar with the guidelines, go through a few iterations from smaller to larger datasets, and so on.

Instead, you could use the Autolabel library. Here’s a simple guide that shows how to label this dataset in just a few minutes; you can follow along with each step in code yourself.
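As a rough illustration of what "guidelines plus a few examples" means in practice, the sketch below assembles a few-shot prompt for the toxic comment task and validates the model's answer against the allowed labels. It is plain Python and deliberately does not call Autolabel or any LLM provider; the guideline text, the example comments, the prompt layout and the helper names are all assumptions made for illustration.

```python
# Labeling guidelines written in natural language (illustrative wording).
GUIDELINES = (
    "You are an expert at content moderation. "
    "Label the comment as 'toxic' or 'not toxic'."
)
LABELS = ["toxic", "not toxic"]

# A few hand-labeled examples shown to the model (made up for this sketch).
FEW_SHOT_EXAMPLES = [
    ("You are an idiot and nobody wants you here.", "toxic"),
    ("Thanks for the detailed explanation, this helped a lot.", "not toxic"),
]


def build_prompt(comment):
    """Combine guidelines, labeled examples and the new comment into one prompt."""
    lines = [GUIDELINES, f"Allowed labels: {', '.join(LABELS)}", ""]
    for text, label in FEW_SHOT_EXAMPLES:
        lines += [f"Comment: {text}", f"Label: {label}", ""]
    lines += [f"Comment: {comment}", "Label:"]
    return "\n".join(lines)


def parse_label(response):
    """Map raw LLM text to an allowed label, or None if it matches none."""
    cleaned = response.strip().lower().rstrip(".")
    return cleaned if cleaned in LABELS else None


prompt = build_prompt("Have a great day!")
```

The string in `prompt` is what you would send to the LLM of your choice. Rejecting any answer outside `LABELS` (here by returning `None`) is one simple guard against hallucinated labels leaking into a training set.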

Compare multiple weeks of labeling by a team of human annotators with directing an LLM to do the labeling for you as a copilot: Autolabel makes this process 25-100x faster.

What’s next? 

In the next few months, we are going to be adding a host of new integrations and capabilities to Autolabel. Here is just a sample of what’s coming:

  • More LLM providers and LLMs
  • More labeling tasks such as entailment, summarization, etc. 
  • More input data types and improved robustness of LLM outputs
  • New prompting techniques emerging from research (such as Tree of Thought)
  • Workflows for experimenting with multiple LLMs and prompts easily

A more detailed roadmap is available here – if you have thoughts or suggestions, we’d love to hear your feedback. Join our community or open an issue on GitHub to share your thoughts.

Get started today

Autolabel is available to try now, and getting started is as easy as running `pip install refuel-autolabel`. Please reach out to us on Discord if you have any questions, give us a star on GitHub and, if you’re interested, contribute to the library. We look forward to hearing from you!