The Map and the Category

This is the first of two projects exploring Bandcamp data. The second, which examines sales on the site, is available here.

Music genres — whatever their worth, lack of worth, qualified usefulness, or total uselessness — originate from one of two places: from above or from below. For the most part, categorization is imposed at the end of the supply chain, where once upon a time, the sorting happened literally and physically as albums were binned behind plastic labels affixed with "Alternative" and "R&B" and the various crude metacategories that formed the vocabulary of Tower Records et al. Between those metacategories, the music press sometimes filled the gaps, the language of early Pitchfork betraying a certain coinage-frenzy. In the streaming era, the genres are called cluster 589 and cluster 1333, the results of machines calculating cosine similarities between thousands of variables that float alongside one another in high-dimensional vector space. These outputs are then squinted at by a Spotify employee who, examining the contents of cluster 1092, hears a bunch of Marathi and electronic drums and declares the nebula "Indian pop."

Bandcamp, having embraced its ad hoc function as Not Spotify, imposes no such restrictions on the ability of artists to call their music whatever they want. And because the platform, to our knowledge, conducts no back-end acoustic analysis of the music, these genre tags (ignoring label pages and users who follow one another) constitute the most elemental way that one piece of music relates to another: the hands-off opposite of sending users down taste tunnels of its own machine-derived inferences, à la Discover Weekly. From a perspective of musical production, discovery, and culture, the results of this genre auteurship are somewhere between totally neutral and whatever a less cringe term for emancipatory would be. If you're here, on this page, on this website, we assume you already understand why this is and why algorithmic cultural determinism is a bad thing, and we don't want to waste your time by explaining it to you again.

The map is an attempt to represent the difference in musical cultures from city to city based on the genre tags attached to artists' albums and individually sold tracks. And for the purposes of the project, the unruly nature of Bandcamp's genre tags makes that task more ambiguous.

As with nearly all natural language processing work, data must be pre-processed and cleaned in order for it to take on any semblance of coherence. Pre-processing Bandcamp's genre tags requires more discretion than the same step does on most other datasets, because it threatens to once again usurp the agency of the artist by imposing order from above.

In most cases, pre-processing these genre tags is relatively inconsequential. The stray period of "electronic." can be uncontroversially removed to form "electronic," "hiphop" can be corrected to "hip-hop," and so on. Further, words can be lower-cased without incident, various non-alphanumeric characters can be removed, and spaces can be added between characters in order to facilitate analysis without much loss of meaning.
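The uncontroversial steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the cleaning code used for the map: the function name `normalize_tag` and the `CORRECTIONS` table are hypothetical, and the only correction taken from the text is "hiphop" to "hip-hop."

```python
import re

# Hypothetical hand-maintained corrections. Only "hiphop" -> "hip-hop"
# comes from the text; a real table would be far longer.
CORRECTIONS = {"hiphop": "hip-hop"}


def normalize_tag(tag: str) -> str:
    """Lower-case a tag, drop stray punctuation, and apply known corrections."""
    tag = tag.lower().strip()
    # Keep word characters, whitespace, ampersands, and hyphens; drop the rest.
    tag = re.sub(r"[^\w\s&-]", "", tag)
    # Collapse runs of whitespace into single spaces.
    tag = re.sub(r"\s+", " ", tag).strip()
    return CORRECTIONS.get(tag, tag)


if __name__ == "__main__":
    for raw in ["electronic.", "HipHop", "  Psy   Trance "]:
        print(repr(raw), "->", repr(normalize_tag(raw)))
```

Rules like these only handle the mechanical cases. Deciding whether two cleaned tags mean the same thing is a judgment call that no regular expression makes for you.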

But what about "hip hop (real shit)"? Or "psychedelic trance," which linearly co-occurs with "psy trance," a more commonly used tag for what is basically the same type of music...or is it not the same type of music? Ideally, cleaning reduces redundancy and noise and allows the tags truly particular to cities (i.e., the genres that tend to appear in a given city more than they appear in other cities on average) to shine through.
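One way to make "appears in a given city more than in other cities on average" concrete is to compare a tag's share of a city's tags with its mean share everywhere else. The sketch below assumes that definition, which the text does not spell out. The function names and the toy data are invented for illustration and are not the map's actual scoring or data.

```python
from collections import Counter


def tag_shares(tags):
    """Return each tag's fraction of all tags attached to one city."""
    counts = Counter(tags)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: count / total for tag, count in counts.items()}


def distinctiveness(city_tags, city):
    """Rank a city's tags by (share in city) minus (mean share in other cities)."""
    shares = {name: tag_shares(tags) for name, tags in city_tags.items()}
    others = [name for name in shares if name != city]
    if not others:
        raise ValueError("need at least one other city to compare against")
    scores = {}
    for tag, share in shares[city].items():
        mean_elsewhere = sum(shares[o].get(tag, 0.0) for o in others) / len(others)
        scores[tag] = share - mean_elsewhere
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy data, made up purely to exercise the function.
    toy = {
        "Buenos Aires": ["tango", "tango", "rock", "electronic"],
        "Melbourne": ["garage", "garage", "rock", "electronic"],
        "Seoul": ["k-pop", "k-pop", "rock", "electronic"],
    }
    print(distinctiveness(toy, "Buenos Aires"))
```

Under this scoring, redundant spellings of one genre split its share across several tags and dilute each one, which is why consolidation matters for what surfaces as distinctive.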

Take, for example, Tbilisi, Georgia, whose most distinctly Tbilisian genres include "depressive suicidal black metal." This genre exists in a cluster that also includes other notably Tbilisian genres, like its abbreviation, "dsbm," as well as "depressive black metal," "suicidal black metal," and just "black metal." Are these actually distinct? Can one be suicidal and not depressed? If you're depressed and not suicidal, does your black metal sound different? Is there meaning in using the abbreviation versus its full category? Because each of the genre tags is a product of the artists themselves, they can say as much about how artists view themselves and their music as they do about the music itself. The pre-processing is vital to the attempt to make sense of any data, including Bandcamp's genres, but in this case, it is tied to a pulley whose other end is tethered to expressions that emerge in self-categorization.

In addition to these more overtly musical considerations, geographic tags were removed from every city in which they appeared: "Richmond" was removed from Richmond, VA; "Berlin" from Berlin; and so on, along with colloquialisms or variations like "philly" or "richmond va." Country names were also removed from countries in which they appeared ("Iceland" from Iceland, etc.), but country adjectives remained (e.g., "Icelandic"). Partly this was because "Icelandic" or "French" can refer to a supercategory of music, and partly it was because these country adjectives were not perfunctorily included on every single page: every album in Philadelphia had "Philadelphia," but not every album in Japan had "Japanese." By this rubric, for example, "Germany" was allowed to stay in Mexico City, and "Brooklyn" could stay in New York.
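The geographic rule amounts to a per-place blocklist: a tag is dropped only when it names the place it is filed under. A minimal sketch follows, assuming each city comes with a hand-built set of its own names and nicknames. The function `strip_local_geo_tags` and the alias sets are hypothetical, not the project's actual lists.

```python
def strip_local_geo_tags(tags, local_names):
    """Drop tags that name the page's own city or country; keep everything else."""
    blocked = {name.lower() for name in local_names}
    return [tag for tag in tags if tag.lower() not in blocked]


if __name__ == "__main__":
    # Hypothetical alias set for Philadelphia.
    philly_names = {"philadelphia", "philly"}
    print(strip_local_geo_tags(["Philly", "punk", "philadelphia", "germany"], philly_names))

    # Country name removed, country adjective kept.
    print(strip_local_geo_tags(["iceland", "icelandic", "ambient"], {"iceland"}))
```

Because matching is exact, any nickname missing from the alias set passes straight through, so the rule is only as complete as the list behind it.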

As with the formation of the genres themselves, the rules to clean up their naming conventions follow a mixture of logic and arbitrariness. The attempt to perfectly consolidate the tags is the same Borgesian folly that inspires attempts to categorize anything in the first place: music, reptiles, neurological states. And yet, as imperfect as any taxonomical system can be, when done with care and respect, it more likely yields something rather than nothing. The map is something because of the alert of recognition one experiences when looking at it, that the differences in genres particular to cities often feel meaningful rather than meaningless. Yes, "tango" is predictably more particular to Buenos Aires than to Melbourne, Australia, where "garage" ranks among the most Melbournian genres. Likewise Seoul, where "K-pop" takes the exact place you'd think it does. The way the map correlates with one's own instincts inspires trust towards its less predictable revelations, like the prominence of "spoken word" in Raleigh or the supremacy of "podcasts" in Toronto.

The way to circumvent this messiness would be to avoid Bandcamp altogether: Spotify makes its genre tags available through an API, and assuming the Bandcamp artists offer their music on both platforms, one could make a map with the same network visualizations using Spotify's crisply normalized categories instead. Doing so would have allowed us to skip what ultimately amounted to months of sometimes grueling data cleaning that has never felt complete. In the final days of compiling the map, we discovered that the location-oriented genre tag "RVA" continued to linger in Richmond, eliciting a depleted "fuck it" from those involved. It's not perfect, because it can't be. But given most platforms' preoccupations with chasing holy grails of immaculate categorization, that's probably for the best.