ChatGPT Outperforms Crowd-Workers for Text Annotation Tasks
Published on Sat Sep 02 2023 by Dustin Van Tate Testa
A recent preprint titled "ChatGPT Outperforms Crowd-Workers for Text-Annotation Tasks" highlights the potential of large language models (LLMs) for text annotation. The researchers compared the performance of ChatGPT, OpenAI's widely used LLM-based chatbot, with that of crowd-workers on Amazon Mechanical Turk (MTurk) across several annotation tasks: relevance, stance, topic, and frame detection.
The study used four datasets comprising 6,183 tweets and news articles in total. ChatGPT's zero-shot accuracy (that is, with no labeled examples in the prompt and no additional training) exceeded that of the crowd-workers by an average of about 25 percentage points across the four datasets. ChatGPT's intercoder agreement, which measures how consistently the same items receive the same label (across annotators or, for ChatGPT, across repeated runs), was also higher than that of both crowd-workers and trained annotators on every task.
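To make the two metrics concrete, here is a minimal sketch, assuming accuracy means agreement with reference labels from trained annotators and intercoder agreement means simple percentage agreement between two coders (or two runs of the same model). The function names and the toy labels are hypothetical and are not data from the study; the paper's exact definitions may differ in detail.

```python
from typing import Sequence


def share_matching(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Fraction of positions where two equal-length label sequences agree."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    # Assumed definition: share of items matching the trained annotators' labels.
    return share_matching(predicted, gold)


def intercoder_agreement(run_1: Sequence[str], run_2: Sequence[str]) -> float:
    # Assumed definition: simple percentage agreement between two coders or runs.
    return share_matching(run_1, run_2)


# Hypothetical labels for a binary relevance task (illustrative only).
gold = ["relevant", "irrelevant", "relevant", "relevant"]
run_1 = ["relevant", "irrelevant", "irrelevant", "relevant"]
run_2 = ["relevant", "relevant", "irrelevant", "irrelevant"]

print(accuracy(run_1, gold))
print(intercoder_agreement(run_1, run_2))
```

Here the first run matches the reference labels on three of four items (accuracy 0.75), while the two runs agree with each other on only two of four (agreement 0.5). The two measures are independent: a model can be consistent without being correct, which is why the study reports both.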
ChatGPT was also far cheaper than MTurk. The per-annotation cost of using ChatGPT was approximately $0.003, about one-thirtieth of the MTurk cost. This cost advantage, combined with ChatGPT's stronger performance, could substantially change how researchers annotate data and challenge the business model of platforms such as MTurk.
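A back-of-the-envelope comparison shows what the ratio means at the scale of this study. This sketch uses only the figures quoted above and assumes each of the 6,183 items is labeled exactly once; the MTurk price is inferred from the "about thirty times" ratio rather than taken from the paper, and the constant and function names are hypothetical.

```python
# Approximate per-annotation cost quoted for ChatGPT, in USD.
CHATGPT_COST_PER_ANNOTATION = 0.003
# Approximate ratio quoted in the text ("about thirty times cheaper").
MTURK_TO_CHATGPT_RATIO = 30
# Tweets and news articles across the study's four datasets.
N_ITEMS = 6_183


def total_cost(n_items: int, cost_per_annotation: float) -> float:
    """Cost of labeling n_items once at a fixed per-annotation price."""
    return n_items * cost_per_annotation


chatgpt_total = total_cost(N_ITEMS, CHATGPT_COST_PER_ANNOTATION)
# Assumption: implied MTurk price = ChatGPT price x ratio (not a reported figure).
mturk_total = total_cost(N_ITEMS, CHATGPT_COST_PER_ANNOTATION * MTURK_TO_CHATGPT_RATIO)

print(f"ChatGPT: ${chatgpt_total:.2f}")
print(f"MTurk (implied): ${mturk_total:.2f}")
```

Under these assumptions, a single pass over the sample costs about $18.55 with ChatGPT versus an implied $556.47 on MTurk. Real totals would scale with the number of annotations collected per item.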
The findings suggest that ChatGPT could be an efficient and accurate tool for many text classification tasks. On the relevance task, for example, its accuracy was 70% for content moderation tweets, 81% for content moderation news articles, 83% for US Congress tweets, and 59% for content moderation tweets from 2023. The study also found that a lower temperature value, which reduces the randomness of ChatGPT's output, improved both intercoder agreement and accuracy.
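Why a lower temperature makes outputs more consistent can be illustrated with the standard temperature-scaled softmax, in which scores are divided by the temperature before being converted to probabilities. This is a generic sketch, not ChatGPT's internal sampling code (which is not public); the logits and the temperature of 0.2 are hypothetical values chosen for illustration.

```python
import math
from typing import Sequence


def softmax_with_temperature(logits: Sequence[float], temperature: float) -> list[float]:
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores a model might assign to three candidate labels.
logits = [2.0, 1.0, 0.5]
probs_default = softmax_with_temperature(logits, 1.0)
probs_low = softmax_with_temperature(logits, 0.2)

print([round(p, 3) for p in probs_default])
print([round(p, 3) for p in probs_low])
```

With these scores, the top label's probability rises from about 0.63 at temperature 1.0 to about 0.99 at 0.2, so repeated runs are far more likely to pick the same label, which is consistent with the higher intercoder agreement the study reports at lower temperatures.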
The research also points to further uses of LLMs in other contexts and languages. Future studies could examine LLM performance across multiple languages, test few-shot learning (supplying a handful of labeled examples in the prompt), build semi-automated data-labeling systems, and compare different types of LLMs. Overall, the study suggests that LLMs such as ChatGPT could transform text-annotation procedures and make text classification considerably more efficient.