
Components is a research project that assembles, investigates, and editorializes large datasets. We make most of the data collected for our research freely available on our Datasets page.

You can sign up for our newsletter here, and you can also join our Discord server. For any correspondence, contact mail@components.one.


  • Andrew Thompson

  • Kyle Paoletta

  • Jules Becker

  • Claire Peters

Press and studies

Classification of human- and AI-generated texts for different languages and domains - International Journal of Speech Technology, 12/4/2024

The United States of Automobiles - Business Insider, 10/20/2024

Robinhood, Reddit, and the news: The impact of traditional and social media on retail investor trading - Journal of Financial Markets, 7/9/2024

News Media Framing of Suicide Circumstances and Gender: Mixed Methods Analysis - JMIR Mental Health, 7/3/2024

Living in the Hub: A Platform Study of Desire Semantics - Knowledge@UChicago, 5/13/2024

Mirroring the inequalities of mainstream music platforms: popularity, revenue, and monetization strategies on Bandcamp - International Journal of Cultural Policy, 5/13/2024

A Dataset for The Study of Online Radicalization Through Incel Forum Archives - Journal of Quantitative Description: Digital Media, 4/12/2024

The Many Lives of ‘Sounds of North American Frogs’ - Atlas Obscura, 1/23/2024

American journalism sounds much more Democratic than Republican - The Economist, 12/14/2023

Why did the metaverse die? Because Silicon Valley doesn’t understand the concept of fun - Fast Company, 10/15/2023

How Bandcamp makes more money than Spotify - Fast Company, 10/5/2023

Text-to-Image Generation Tools: A Survey and NSFW Content Analysis - Companion Proceedings of the Brazilian Symposium on Multimedia and the Web, 10/23/2023

Billboard 200: The Lessons of Musical Success in the US - Music & Science, 7/12/2023

Moral Framing of Mental Health Discourse and Its Relationship to Stigma: A Comparison of Social Media and News - Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, 4/23/2023

Goal Driven Discovery of Distributional Differences via Language Descriptions - Advances in Neural Information Processing Systems 36, 2/23/2023

Using word embedding models to capture changing media discourses - Journal of Computational Social Science, 9/16/2022

COVID-19 and Changing Values - Values for a Post-Pandemic Future, 9/14/2022

Bandcamp und Epic Games: Ein halbes Jahr später – kritische Betrachtung - DJ Lab, 9/2/2022

Lumen: A Machine Learning Framework to Expose Influence Cues in Text - Frontiers in Computer Science, 8/3/2021

Gender bias recognition in political news articles - Machine Learning with Applications, 6/15/2022

The Future of Streaming Services May Be In The Past - The New Inquiry, 6/2/2022

Mining for Fake News - Advanced Information Networking and Applications, 3/31/2022

What Is the Future of Digital Music After Bandcamp & Epic Games Acquisition? - Remezcla, 3/30/2022

What does Epic Games buying Bandcamp mean for DIY music? - Resident Advisor, 3/16/2022

The Best Online Articles of 2021 - Ted Gioia, 12/16/2021

L'insatiable appétit des "Tech review" pour le design pornographique - Mais où va le Web?, 11/24/2021

Cultural cartography with word embeddings - Poetics, October 2021

What Spotify Follower Ratio Tells Us About Artist Growth and Fan Engagement - Chartmetric, 6/1/2021

Tous les genres musicaux et leur répartition sur la planète, dans une carte interactive - tsugi, 1/22/2021

Disinformation: analysis and identification - Computational and Mathematical Organization Theory, 6/18/2021

Bandcamp a créé une carte interactive regroupant tous les genres musicaux de chaque ville - Trax, 1/25/2021

The disturbing belly of the 'step' porn trend - Mashable, 8/10/2020

Questa estensione vuole fottere l’algoritmo dei suggeriti di Pornhub - Vice Italy, 12/20/2019

Building a Topic Modeling Pipeline with spaCy and Gensim - Towards Data Science, 9/17/2019

Trump et Twitter, c'est 24 heures sur 24 - Le Soir, 1/17/2019

GraphBTM: Graph Enhanced Autoencoded Variational Inference for Biterm Topic Model - Conference on Empirical Methods in Natural Language Processing, 1/1/2018