- 10,700 articles from the front page of the Times. Used in: Pedagogy of the Press. Size: 180 MB
- All the News 2.0: 2.7 million news articles and essays from 27 American publications. Size: 3.1 GB compressed, 8.8 GB uncompressed
- All the News 1.0: 204,000 news articles and essays. Used in: The Production of "Space". Size: 1.5 GB
- 1,000,000 Bandcamp sales. Used in: The Chaos Bazaar. Size: 76.9 MB compressed, 301 MB uncompressed
- Metadata from 4,336 arXiv papers on LLM benchmarking and post-training. Used in: Building Heideggerian AI. Size: 8 MB
- 1,069,972 Bandcamp items. Used in: The Map and the Category. Size: 942 MB compressed, 4.5 GB uncompressed
- r/Braincels, 10/21/2017 – 5/3/2019. Used in: Objects of Desire. Size: 1.1 GB
- Acoustic and meta features of albums and songs on the Billboard 200. Used in: It Goes On. Size: 117 MB
- Bodies and Spaces on Google Scholar. Used in: The Production of "Space". Size: 58 MB
- 18,000 cable news transcripts. Used in: Phantom Threads. Size: 870 MB
- 450 responses to prompts for public figure quotes from 10 large language models. Size: 2.1 MB
- Film reviews and TV reviews: 208,000 critic reviews and 10.7 million user reviews. Used in: Faucets of Despair: A conversation with A.S. Hamrah. Size: 1.4 GB
- Metadata from 218,000 PornHub videos, Jan. 2008 – Dec. 2018. Used in: Every Story is an Epstein Story. Size: 314 MB
- 20,783 Pitchfork reviews. Used in: Scored and Arranged. Size: 89.9 MB
- 28,092 tech publication reviews. Used in: The New Pornographers. Size: 212 MB
- 76,822 Product Hunt products. Used in: The Gamer and the Nihilist. Size: 242 MB
- Trump’s tweets. Used in: America's Most Wanted. Size: ~9 MB
- r/TheRedPill, 10/25/2012 – 5/3/2019. Used in: Objects of Desire. Size: 1.1 GB
- Metadata from 10,375 YouTube tech review and unboxing videos. Used in: The New Pornographers. Size: 116 MB
- Think tank mentions in nine publications, 2016 – 2018. Used in: The Center of the Center. Size: 14 MB
