Refuel LLM-2

The world’s best large language model for unsexy data tasks in the enterprise

Refuel LLM is trained on a curated dataset of 50B+ tokens across 2750+ data labeling, cleaning and enrichment tasks from domains such as finance, HR, law and e-commerce.

“We’re amazed at how well Refuel’s LLMs were able to learn the nuances of our business identity data”
Ken Chew
Data Science Lead, Middesk

Why choose Refuel LLM

Best LLM in the world for unsexy data tasks

On a benchmark of 30 data labeling and enrichment tasks, RefuelLLM-2 (83.82%) outperforms all current state-of-the-art LLMs, including GPT-4-Turbo (80.88%), Claude-3-Opus (79.19%), and Gemini-1.5-Pro (74.59%).
RefuelLLM-2-small (79.67%) outperforms all comparable LLMs, including Claude-3-Sonnet (70.99%), Claude-3-Haiku (69.23%), and GPT-3.5-Turbo (68.13%).
Figure: Performance of Refuel LLM-2 on a set of data labeling and enrichment tasks, compared with current state-of-the-art LLMs.

Extreme performance in verticals such as financial services, HR, and e-commerce

Across verticals where output quality really matters, Refuel LLM-2 delivers higher accuracy than current state-of-the-art LLMs, at less than 1/10th the size.

Never worry about LLM hallucinations

Every single output from Refuel’s LLMs is accompanied by a calibrated confidence score, which estimates the model’s inherent level of confidence in generating the answer. Retry, ensemble or filter outputs based on confidence scores to improve reliability and reduce hallucinations (read our report on confidence scores).
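
The retry-and-filter idea above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Refuel's API: `label_fn` is a hypothetical callable that returns a `(label, confidence)` pair for a piece of text, and the `0.8` threshold and three-attempt limit are arbitrary example values.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Labeled:
    text: str
    label: Optional[str]  # None when no attempt reached the threshold
    confidence: float     # highest confidence seen across attempts
    attempts: int         # number of calls made for this input


def label_with_retries(
    texts: List[str],
    label_fn: Callable[[str], Tuple[str, float]],  # hypothetical model call
    threshold: float = 0.8,   # assumed cutoff; tune per task
    max_attempts: int = 3,    # assumed retry budget
) -> List[Labeled]:
    """Retry low-confidence outputs, then filter out those still below threshold."""
    results = []
    for text in texts:
        best_label, best_conf, attempts = None, 0.0, 0
        while attempts < max_attempts:
            label, conf = label_fn(text)
            attempts += 1
            # Keep the most confident answer seen so far.
            if conf > best_conf:
                best_label, best_conf = label, conf
            if best_conf >= threshold:
                break
        # Filter: only accept a label that cleared the threshold.
        accepted = best_label if best_conf >= threshold else None
        results.append(Labeled(text, accepted, best_conf, attempts))
    return results
```

Items that come back with `label=None` could then be routed to human review or to a second model, which is one way to act on the confidence score rather than trusting every output equally.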

Make the best LLM even better for your specific tasks with fine-tuning

While Refuel LLM-2 provides superhuman performance out of the box, it's also adaptable to your domain and tasks. With fewer than 200 data points and 15 minutes of training, you can fine-tune Refuel LLM-2 further to achieve near-perfect performance, all within fully managed infrastructure provided by Refuel Cloud.

Try out Refuel LLM yourself

