Find the latest Refuel news, research, demos, and engineering updates here.
In this post, we examine different techniques for estimating the confidence of LLM-generated labels, and demonstrate how to leverage these estimates to automatically reject low-confidence labels and ensemble LLMs optimally.
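One common technique for the kind of confidence estimation described above is to score a label by its tokens' log-probabilities and reject labels below a threshold. The sketch below is a minimal illustration of that general idea, not Refuel's actual method; the function names and the example log-probabilities are hypothetical.

```python
import math

def label_confidence(token_logprobs):
    """Geometric mean of token probabilities: exp of the mean log-prob."""
    return math.exp(sum(token_logprobs) / len(token_logprobs))

def filter_labels(labeled, threshold=0.9):
    """Split (label, token_logprobs) pairs into accepted and rejected
    lists based on whether their confidence meets the threshold."""
    accepted, rejected = [], []
    for label, logprobs in labeled:
        conf = label_confidence(logprobs)
        (accepted if conf >= threshold else rejected).append((label, conf))
    return accepted, rejected

# Illustrative per-token log-probs, as an LLM API might return them.
labeled = [
    ("positive", [-0.01, -0.02]),  # near-certain tokens -> high confidence
    ("negative", [-0.9, -1.1]),    # uncertain tokens -> low confidence
]
accepted, rejected = filter_labels(labeled, threshold=0.9)
```

Here the first label's confidence is exp(-0.015) ≈ 0.985 and is kept, while the second's is exp(-1.0) ≈ 0.368 and is rejected for human review.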
In this report, we compare the latest models from OpenAI against their previous versions on a data labeling benchmark, and find that gpt-3.5-turbo performs worse on 6 of 8 datasets, while gpt-4 performance remains the same.
In this report, we show that LLMs can label datasets 20x faster and 7x cheaper, at the same or better quality, compared to skilled human annotators.
We are entering the era of AI abundance. Foundation models' broad capabilities to understand language and vision are a huge unlock for building AI applications across every vertical - from healthcare to education to supply chains.
Today, we’re excited to announce Autolabel, an open-source Python library to label NLP datasets with any LLM of your choice.