Snoopy: An Online Interface for Exploring the Effect of Pretraining Term Frequencies on Few-Shot LM Performance
Current evaluation schemes for large language models often fail to consider how overlap between the pretraining corpus and test data affects reported model performance. Snoopy is an online interface that allows researchers to study this effect in few-shot learning settings. Our demo provides term frequency statistics for the Pile, an 800GB pretraining corpus, accompanied by the precomputed performance of EleutherAI/GPT models on more than 20 NLP benchmarks, including numerical, commonsense reasoning, natural language understanding, and question-answering tasks. Snoopy allows a user to interactively align specific terms in test instances with their frequencies in the Pile, enabling exploratory analysis of how term frequency relates to model accuracy, relationships that are hard to discover through automated means. A user can inspect correlations across model sizes and numbers of in-context examples and visualize the results over multiple (potentially aggregated) datasets. Using Snoopy, we show that a researcher can quickly replicate prior analyses of numerical tasks while also carrying out much more expansive exploration that was previously challenging. Snoopy is available at https://nlp.ics.uci.edu/snoopy.
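To make the kind of analysis Snoopy supports more concrete, the minimal sketch below shows one way a researcher might relate pretraining term frequency to per-instance few-shot correctness. This is not Snoopy's implementation; the `term_frequency` and `correct` arrays are hypothetical placeholders standing in for statistics a user would read off the interface.

```python
# Minimal sketch (not Snoopy's implementation): relate pretraining term
# frequency to few-shot correctness, in the spirit of the exploratory
# analyses the demo supports.
import numpy as np
from scipy.stats import spearmanr

# Hypothetical per-instance data: frequency of a key term in the pretraining
# corpus and whether the model answered the instance correctly.
term_frequency = np.array([120, 5400, 310, 98000, 45, 12000, 760, 230000])
correct = np.array([0, 1, 0, 1, 0, 1, 1, 1])

# Rank correlation between log-frequency and correctness.
rho, p_value = spearmanr(np.log10(term_frequency), correct)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")

# Accuracy within log-frequency quartiles, mirroring the binned views a user
# can explore in the interface.
log_freq = np.log10(term_frequency)
edges = np.quantile(log_freq, [0.25, 0.5, 0.75])
bin_ids = np.digitize(log_freq, edges)
for b in range(4):
    mask = bin_ids == b
    if mask.any():
        print(f"bin {b}: n={mask.sum()}, accuracy={correct[mask].mean():.2f}")
```

In the interface itself, these quantities are precomputed for the Pile and the EleutherAI/GPT models, so a user can browse such trends interactively rather than scripting them by hand.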