Metadata from 4,336 arxiv papers on LLM benchmarking and post-training

This dataset contains metadata for 4,336 papers posted to arxiv.org between 12/23/2021 and 1/7/2025 that have been classified by three LLMs to determine each paper's subject matter. The CSV includes the following columns:

Title
Abstract
Authors
Date of publication
arXiv category tags
Paper URL

The dataset also contains the category assigned by each of Gemini 1.5 Pro, Llama 3.1 405b, and Qwen 2.5 72b, as well as the "winning" category, i.e., the one that appears in at least two of the three classifier columns. The possible categories offered to the models were:

Mathematical abilities
Reasoning abilities (non-domain specific)
Coding
Model interpretability
Personality and emotions
Scientific/medical knowledge
General knowledge abilities
Domain-specific knowledge and use cases
Safety, ethics, bias, and behavioral alignment
Factuality and hallucinations
Adaptability and generalization
Recommendation systems
Language, semantics, multilingual capabilities and translation
Art, creativity, and aesthetics
Model architecture, performance, hardware and efficiency (non-domain specific)
Multi-modal capabilities (e.g., vision and text combined) (non-domain specific)
Autonomous agents (non-domain specific)
Contextualization, knowledge graphs and retrieval augmented generation (non-domain specific)
Other
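
The "winning" category rule described above (a label that appears in at least two of the three classifier columns) can be sketched as a simple majority vote. This is a minimal illustration, not the authors' code: the function name winning_category is hypothetical, and returning None when all three models disagree is an assumption, since the description does not say how that case is handled.

```python
from collections import Counter
from typing import Optional, Sequence


def winning_category(labels: Sequence[str]) -> Optional[str]:
    """Return the label chosen by at least two classifiers.

    Returns None when no label reaches two votes (an assumption; the
    dataset description does not specify this case).
    """
    if not labels:
        return None
    # most_common(1) yields the single (label, vote_count) pair with the most votes
    label, votes = Counter(labels).most_common(1)[0]
    return label if votes >= 2 else None


# Two of three hypothetical model outputs agree, so "Coding" wins.
example = winning_category(["Coding", "Coding", "Mathematical abilities"])
```

With three classifiers, at most one label can reach two votes, so no tie-breaking between winners is needed.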

Papers that were classified as "Domain-specific knowledge and use cases" were further classified using the same models as belonging to one of the following domains:

Science
Computing and cybersecurity
Law and politics
Finance and economics
Engineering
Manufacturing
Medicine
Education
Psychology
Shopping and consumption
Art and aesthetics
Transportation
Other
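
The two-stage scheme above (a general category first, then a domain for papers in the domain-specific class) suggests one way to tally domains from the CSV. The sketch below rests on assumptions: the column names winning_category and winning_domain are hypothetical, as the actual CSV headers are not listed here, and the sample rows are made up purely for illustration.

```python
import csv
import io
from collections import Counter

DOMAIN_CLASS = "Domain-specific knowledge and use cases"


def count_domains(csv_file, category_col="winning_category", domain_col="winning_domain"):
    """Tally domains for rows whose category is the domain-specific class.

    Both column names are hypothetical defaults; pass the real headers.
    """
    counts = Counter()
    for row in csv.DictReader(csv_file):
        if row.get(category_col) != DOMAIN_CLASS:
            continue  # only domain-specific papers received a second-stage label
        domain = (row.get(domain_col) or "").strip()
        if domain:
            counts[domain] += 1
    return counts


# Made-up rows for illustration only; not taken from the dataset.
sample = io.StringIO(
    "title,winning_category,winning_domain\n"
    "Paper A,Domain-specific knowledge and use cases,Medicine\n"
    "Paper B,Coding,\n"
    "Paper C,Domain-specific knowledge and use cases,Medicine\n"
    "Paper D,Domain-specific knowledge and use cases,Education\n"
)
domain_counts = count_domains(sample)
```

To run this against the downloaded file, open it with open(path, newline="", encoding="utf-8") and pass the file object along with the real column names.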

Download

Type: CSV
Size: 8 MB
Created by: Jameson Orvis, Andrew Thompson
Date added: 2/4/2025
Date modified: 2/4/2025
Used in: Building Heideggerian AI