Building Heideggerian AI

LLM benchmarks and the stubborn persistence of rationalism



“...what is essential — what causes one to tremble with fear and delight.” Georges Bataille, The Accursed Share


In 1972, the philosopher Hubert Dreyfus published a broadside against the popular consensus on artificial intelligence, and in doing so, helped construct the philosophical foundations of deep learning. At the time that What Computers Can’t Do came out, progress in artificial intelligence was at an impasse, a period now known as the “AI winter.” Researchers were able to program machines to perform well on small tasks, but those models failed to generalize outside of test conditions. Engineers could build a concept car, but they couldn’t drive it off the lot.

AI’s shortcomings, Dreyfus argued, stemmed from a misunderstanding tracing back to Plato that engineers had taken and run with. That misunderstanding could be boiled down to a basic precept: The universe is set on a foundation of immutable rational laws independent of our experience that can only be grasped through pure reason. If we could get the reasoning just right, we could be certain of the nature of reality. Descartes had extended this understanding to our very minds; if we start from the foundational principle “I think, therefore I am,” then the mind itself must be a logical system, and it’s only through the logical arrangement of representations of a reality “out there” that we can think, believe, and effectuate anything in the world.

From this rationalist perspective, if you pour a coffee, it is because your mind is issuing a sequence of logical statements of what coffee is, when to pick up the kettle, when to tilt it down, when to tilt it back up, and so on. And if what’s going on in our heads itself is already a form of codified logic divorced from our bodies, this logic can be abstracted away from any individual and thrown into a machine.

“Rationalists like Descartes and Leibniz thought of the mind as defined by its capacity to form representations of all domains of activity,” Dreyfus wrote. “AI turned this rationalist vision into a research program and took up the search for the primitive and formal rules that captured everyday knowledge. Commonsense understanding had to be represented as a huge data structure comprised of facts plus rules for relating and applying those facts.”

This attempt to transmute rationalist mental frameworks into vertiginous codebases became Symbolic AI, which Dreyfus would later, in his updated editions of the book, refer to as “Good Old-Fashioned AI”, or GOFAI. As AI researchers wrote at the time, “When an organism executes a Plan he proceeds step by step, completing one part and then moving on to the next.” If we identify a strawberry by sequentially identifying its isolated features and then gluing them together with reason to conclude that, yes, it’s a strawberry, then an AI system with thousands of logical statements for how a computer could recognize an image of a strawberry based on its distinct features should be able to accomplish the same thing.

But despite decades of research and billions of dollars, the systems always sputtered outside of tightly drawn experiments. Dreyfus’s hunch was that what lay behind GOFAI’s problems was rationalism itself and its belief in “the mind’s symbolic representation of the world.” Dreyfus had built his reputation on his colorful, accessible readings of Martin Heidegger, peeling the fruit from its bitter rind and distilling the German existentialist’s notoriously dense philosophical vocabulary into something mere mortals could actually comprehend. In Heideggerian terms, the failure of GOFAI was that it relied on building machines that “know that” versus “know how.” Take, for example, the act of gift giving. We don’t have a set of rules inscribed in our minds for the proper protocol of giving presents in our respective cultures. Instead, we intuitively know how to give one. “[Knowing] how to give an appropriate gift at the appropriate time and in the appropriate way requires cultural savoir faire,” Dreyfus wrote. “So knowing what a gift is is not a bit of factual knowledge, separate from the skill or know-how for giving one. The distinction between what a gift is and what counts as a gift, which seems to distinguish facts from skills, is an illusion fostered by the philosophical belief in a nonpragmatic ontology.”

So GOFAI trips over a “nonpragmatic ontology.” What exactly does this mean? If the rationalists believed truth was “out there” and waiting for us to access it through pure reason, pragmatists — like John Dewey, Richard Rorty, the later Wittgenstein, and, for our present purposes, Heidegger — dismissed what Dewey called the “quest for certainty” altogether. Consider the sour taste of a lemon. Is the sourness inherent to the lemon, or is it in the mind of the taster? Pragmatists would say that it resides in both and neither; instead, it lies in the transaction between the person and the object — that it is the process itself. Or consider the word “tree.” To Wittgenstein, the meaning of “tree” was not tied to a static reference point in a dictionary, but was inseparable from its relationship to other words, itself a function of its common usage in a culture. In other words, both the taste of the lemon and the meaning of “tree” are not accurate symbolic representations of a world, but are themselves byproducts of relationships between things. To pragmatists, we are all co-creators with our environments (physical, social and otherwise) of what the world actually is. There is no such thing as truth of the world outside of our involvement with it.

As Dreyfus put it, the “meaningful objects…among which we live are not a model of the world stored in our mind or brain; they are the world itself.” (Emphasis in original.)

To create functional AI, then, would entail abandoning knowing-that in favor of knowing-how — of involving the machine with its world. If the failed approach of GOFAI rationalists was to build a machine learning model to identify images of strawberries by encoding it with all the necessary rules (is it dimpled, is it red, is it a bit conical, and so on, in such a way that it could identify all strawberries in every environment), a pragmatist approach meant simply showing the model many strawberries until it learned to recognize them for itself.

Dreyfus called this latter approach “Heideggerian AI”, a term later taken up by computer scientists trying to build systems that moved beyond internal symbolic representations of the mind. But as researchers began inching towards solutions like neural nets, Dreyfus saw not just a way through the impasse of GOFAI, but something more broadly consequential: An empirical validation of Heidegger’s philosophy. “Neural networks raise deep philosophical questions,” Dreyfus wrote. “It seems that they undermine the fundamental rationalist assumption that one must have abstracted a theory of a domain in order to behave intelligently in that domain.” In other words, if Heideggerian AI worked, it would subvert the rationalist project itself — and in the process, its quest for a certainty of the world outside of our involvement in it.

And so as the GOFAI engineers insisted they were still edging toward a new breakthrough, Dreyfus argued that

they are on the verge of creating an even greater conceptual revolution—a change in our understanding of man. Everyone senses the importance of this revolution, but we are so near the events that it is difficult to discern their significance. This much, however, is clear. Aristotle defined man as a rational animal, and since then reason has been held to be of the essence of man. If we are on the threshold of creating artificial intelligence [via GOFAI] we are about to see the triumph of a very special conception of reason. Indeed, if reason can be programmed into a computer, this will confirm an understanding of man as an object, which Western thinkers have been groping toward for two thousand years but which they only now have the tools to express and implement. The incarnation of this intuition will drastically change our understanding of ourselves. If, on the other hand, artificial intelligence [GOFAI] should turn out to be impossible, then we will have to distinguish human from artificial reason, and this too will radically change our view of ourselves. Thus the moment has come either to face the truth of the tradition’s deepest intuition or to abandon the mechanical account of man’s nature which has been gradually developing over the past two thousand years.


About 50 years after Dreyfus wrote those lines, it is now impossible to evade the daily briefings of deep learning’s dominance, of generative AI investment rounds and breathless coverage of the latest LLM wrapper that will change how we work, read, and jack off. To the extent most non-specialists know about GOFAI at all, it is from peering, in computer science classes, at the relics of a historical predecessor the rest of the world has little idea ever existed.

If the gauntlet Dreyfus threw down is a viable one, then we’ve unequivocally fallen on one side of his “conceptual revolution.” We haven’t just built working AI systems — we’ve falsified rationalism as a convincing view of the world, man, and intelligence itself.

And yet, in a stubborn irony, the parties building, investing in, and overall theorizing about AI remain mired in an ontology oriented towards deciphering eternal, objective forms. Marcus Aurelius’s Meditations, a how-to on living with as much uninvolved detachment as possible, remains a perennial bestseller in Silicon Valley. Sites like Astral Codex Ten and other rationalist blogs and Substacks are de facto required reading for people in the space. Like Sam Altman, they champion effective altruism as an implementation of rule-based rationalist ethics that can be codified and universalized. It hardly bears mentioning that so many of AI’s most aggressive advocates subscribe to libertarianism, an economic paradigm predicated on the principle that human beings are fundamentally rational agents of utility maximization.

How is this new generation of rationalists able to sustain a worldview in the face of a technology that Dreyfus believed undermined it so thoroughly? The paradox follows from an interpretation of neural nets not as an inconvenient truth they have to accommodate, but as a validation of rationalism itself, the complete opposite of what Dreyfus thought they represented. The reversal seems to go as follows: because neural nets can mimic (or in some cases, surpass) human abilities with math and code, we must be some form of math and code as well, and so the entire universe, including our minds, is essentially a giant set of probabilistic spaces that can be modeled with increasing accuracy. Just like the original Cartesian proposition that truth consists of mirroring the world with precise representations of ideal forms existing beyond our muddying senses, rationalists today see these forms as taking the shape not of Euclidean geometry but of probabilities. Dreyfus’s ultimate Heideggerian, anti-Cartesian assertion that our understanding of anything is inseparable from our engagement in the world is functionally dismissed in favor of a persistent conviction in a statically existing world “out there” that happens “to” our mechanistic selves. Uncertainty retains the role it has held since Plato: an obstacle to overcome rather than a fundamental condition of existence and the very foundation from which understanding emerges.

One doesn’t have to reject probability theory to retain this grounding role for uncertainty. An obvious case in point is the more epistemically (if not personally) humble Nassim Nicholas Taleb, who sees in probability proof of the impossibility of ever fully grasping reality and of the innate unpredictability of the future. While Taleb isn’t quite a pragmatist in the sense Dreyfus meant, he still frames probability as the ultimate dead end in the rationalist quest for certainty, a thesis tied together in a five-volume project with the unmistakable name Incerto. Like George Orwell or Marshall McLuhan, Taleb is superficially revered and often substantively ignored.

By making the classic Platonist rookie mistake of confusing the map and the territory, rationalists prop open the backdoor of probability to allow certainty to sneak through, a feat whose most spectacular achievement is found in the work of Nick Bostrom, the hyper-utilitarian Oxford philosopher who provided Silicon Valley with its own creation myth in the form of an argument that we are living in a simulation. His cogito-esque deduction is the neo-rationalist pièce de résistance, likely familiar enough to many readers here but worth repeating nonetheless:

“This paper argues that at least one of the following propositions is true: (1) the human species is very likely to go extinct before reaching a ‘posthuman’ stage; (2) any posthuman civilization is extremely unlikely to run a significant number of simulations of their evolutionary history (or variations thereof); (3) we are almost certainly living in a computer simulation. It follows that the belief that there is a significant chance that we will one day become posthumans who run ancestor‐simulations is false, unless we are currently living in a simulation.”

By using the language of probability (“very likely”) as insurance in what is at its core a Cartesian argument about the nature of reality derived through pure reason, uncertainty becomes a cake to have and eat, a trick just as easily deployed in arguing for the preferability of nuclear war to prevent the undesirable but still entirely speculative outcome of powerful, annihilating AI systems, whose actual risks remain totally and completely unknown.


It is perhaps predictable, then, that LLMs are so unevenly evaluated on benchmarks framed by a rationalist understanding of what intelligence even is. Each new model card — the document in which a model’s developers showcase its capabilities — touts the model’s achievements in reasoning, math, code, and instruction handling, and not much else, if anything. This narrowness of evaluation is further reflected in the kinds of benchmark datasets and post-training papers published on arxiv, a site for the open publication of non-peer-reviewed computer science papers. Papers published on arxiv allow us to quickly glean what the general population of LLM researchers think models should be able to do: Become subject matter experts in mostly scientific and engineering fields, code, and reason, all while adhering to safety and ethics guidelines and running efficiently on their hardware.

The obvious question is what is wrong with training LLMs to do any of these things, and the answer is that, by itself, nothing is wrong with it. Dreyfus himself hoped that his critique would remove obstacles towards AI’s development, not erect new ones. Nary a software developer remains who hasn’t learned that LLMs make life more bearable. Analysts know that the models can successfully classify unstructured data with greater ease and accuracy than more convoluted machine learning pipelines — the very method used to classify the arxiv papers in question.
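To make concrete what that displacement looks like, here is a minimal sketch of LLM-based classification of unstructured text such as arxiv abstracts. The `complete()` helper, the category names, and the prompt wording are all assumptions standing in for whatever stack an analyst actually wires up; this is not the pipeline used for the papers discussed here.

```python
# A minimal sketch of classifying unstructured text (e.g., arxiv abstracts) with an LLM
# instead of a bespoke ML pipeline. `complete()` is a stand-in for whichever chat or
# completions API you actually call; the categories below are illustrative assumptions.

CATEGORIES = [
    "math/reasoning", "code generation", "domain expertise",
    "safety/alignment", "efficiency/inference", "other",
]

def classify_abstract(abstract: str, complete) -> str:
    """Ask the model to pick exactly one category for an abstract."""
    prompt = (
        "Classify the following paper abstract into exactly one of these categories: "
        + ", ".join(CATEGORIES)
        + ".\nRespond with the category name only.\n\nAbstract:\n"
        + abstract
    )
    label = complete(prompt).strip().lower()
    # Fall back to "other" if the model free-styles instead of picking a listed category.
    return label if label in CATEGORIES else "other"
```

In use, one would simply map `classify_abstract` over a dump of abstracts and tally the labels; the point is that the entire “pipeline” collapses into a prompt plus a sanity check.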

The consideration of what should be benchmarked becomes more immediately pertinent when one considers that, as the Situational Awareness guy put it, “We’re literally running out of benchmarks. As an anecdote, my friends…made a benchmark called MMLU a few years ago, in 2020. They hoped to finally make a benchmark that would stand the test of time, equivalent to all the hardest exams we give high school and college students. Just three years later, it’s basically solved: models like GPT-4 and Gemini get ~90%.”

But the assertion that “we’re literally running out of benchmarks” is only true up to the point of one’s rationalist imagination. Similar-generation LLMs may all perform within a few points of one another on coding and math challenges, but they become wildly unpredictable when they veer away from the logic-oriented benchmarks that companies spend the most money to optimize performance on. For example, asking different models for quotes from public figures about random objects (e.g., “What’s the Roger Federer quote about pen holders?”) yields starkly different rates of refusal across models. Claude will always humbly acknowledge it doesn’t know of a quote, Llama will almost always acknowledge the same, Gemini will flatly assert a quote doesn’t exist, and GPT-4o will more often than not provide one (“The Roger Federer quote you're referring to is: ‘I don't care if you are a guy who changes the world or a pen holder.’ This quote reflects Federer's humility and respect for people from all walks of life.”) Even o1-preview, advertised by OpenAI at the time this test was run as the company’s most powerfully logical model, underperforms older models from other providers.
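For concreteness, here is a hedged sketch of how such an off-benchmark probe might be scored. The `ask()` helper, the example prompts, and the keyword heuristic for detecting a refusal are illustrative assumptions, not the methodology actually used for the comparison above.

```python
# A rough sketch of the fabricated-quote probe: ask each model for a quote that does not
# exist and check whether it declines or invents one. `ask(model, prompt)` is a placeholder
# for whichever provider SDKs you wire up; the refusal check is a crude keyword heuristic.

REFUSAL_MARKERS = (
    "i'm not aware", "i am not aware", "i don't know", "i do not know",
    "couldn't find", "could not find", "no record", "doesn't exist", "does not exist",
)

def looks_like_refusal(reply: str) -> bool:
    reply = reply.lower()
    return any(marker in reply for marker in REFUSAL_MARKERS)

def refusal_rate(model: str, prompts: list[str], ask) -> float:
    """Fraction of fabricated-quote prompts the model declines to answer."""
    refusals = sum(looks_like_refusal(ask(model, p)) for p in prompts)
    return refusals / len(prompts)

# Example prompts in the spirit of the test described above; the pairings are arbitrary by design.
PROMPTS = [
    "What's the Roger Federer quote about pen holders?",
    "What's the Angela Merkel quote about staplers?",
    "What's the Serena Williams quote about desk lamps?",
]
```

Run across providers, the interesting number is less any single model’s rate than the spread between them, precisely the variance that the logic-heavy leaderboards never surface.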

And so the less obvious question is why the precepts of rationalism seem to be all they’re benchmarked on, and the still less obvious question is what else they would be benchmarked on, if not these very capabilities.

One answer is to expand the concept of what Heideggerian AI actually means. If LLMs are at their most basic built using Heideggerian means, what would it entail to build them towards Heideggerian ends?


The idea of making AI more Heideggerian may at first glance seem oxymoronic. Heidegger’s lay reputation beyond philosophical circles largely stems from his essay “The Question Concerning Technology,” which one writer refers to as the “philosophical ancestor” of the “Standard Critique of Technology.” Doesn’t Heidegger see in our technological tsunami the death of nature and meaning? Aren’t he, Studio Ghibli, and Ted Kaczynski next-door neighbors on the Mount Olympus of technological backlash? And if so, why would Dreyfus, a committed Heideggerian scholar, express his “hope that networks different from our brain will make exciting new generalizations and add to our intelligence”?

In his essay “Heidegger on the connection between nihilism, art, technology, and politics”, Dreyfus argues that to walk away from “The Question Concerning Technology” with Luddite reactivity is to miss the point. Technology has been with us since the beginning and will stay with us until the end — unequivocally opposing it is no more sensible than protesting cutlery. Instead, Heidegger was concerned by the “technological understanding of being,” in which the differences between all objects and people are flattened into a plane of total fungibility and subjected to “calculative thinking”, rendering everything a resource that becomes a means to an end. Heidegger uses the example of networking as a technological form of engagement in the social sphere, one that frames relationships as interchangeable widgets in service of some downstream purpose. Financialization, by attaching monetary value and speculation to things that didn’t previously have any, removes objects from their embeddedness and makes them distinguishable solely by their yield potential. Online dating is a technological theater par excellence where prospective partners are decontextualized and juxtaposed for hyper-efficient comparison. Every Human Resources department has the technological understanding of being etched into its very name. Patrick Bateman’s remark, “Sex is mathematics,” reduces the activity to one as routine, measurable and calculable as any other.

It goes without saying that Bateman also hated his life; Dreyfus considered the social organization wrought by the technological understanding as definitional of nihilism — leveling the world out means any given object, belief, or experience is basically replaceable, and therefore inherently meaningless. Nothing can maintain its grip of commitment on us if our commitments end whenever something more measurably optimal comes along.

The core process by which the world is reduced to a mise en scène of pure raw materials is what Heidegger called “technological revealing”, or, put another way, “pure revealing.” Heidegger believed things retained their significance only insofar as a part of them remained unarticulated. A work of art emerges from a background of some kind, while also pulling away from a final, complete interpretation — it is a silhouette in a window that presents itself while denying total certainty of what it really is, and in that denial, remains a source of contemplation. This aesthetic quality inheres in anything truly meaningful. As Dreyfus puts it, “the U.S. Constitution, like a work of art, has necessarily become the focus of attempts to make it explicit and consistent and to make it apply to all situations, and, of course, it is fecund just insofar as it can never be interpreted once and for all.”

So while the technological seeks to illuminate away all uncertainty so that we can finally get things under total control, the aesthetic both reveals and conceals at once. “That which shows itself and at the same time withdraws (i.e., our understanding of being) is the essential trait of what we call the mystery,” Heidegger wrote. In other words, it generates mystery, while the purely technological seeks to crack the case and close the book. And as Dreyfus’s example illustrates, it is not simply literal works of art themselves that embody this tension between the revealed and the concealed — it is anything that has aesthetic character. The aesthetic is erotic, the technological is pornographic.

This mystifying tension between revelation and concealment is an essential part of the experience of, and nostalgia for, childhood. In his six-volume work of autofiction, Karl Ove Knausgaard describes his father teaching him a magical method to cure the warts that had grown on his hands:

"It's a kind of magic. My grandmother did it for me when I was small. And it worked. My hands were covered with warts. After a few days they were all gone…"

He got up, opened the fridge, and took out something wrapped in white paper that he placed on the table and unfolded. There was bacon inside.

"First of all, I'm going to grease your fingers with the bacon. And then we'll go into the garden and bury the bacon. Then, in a few days' time, your warts will be gone…”

I gave him one hand. He held it in his, took a rasher of bacon, and carefully rubbed it around all the fingers, the palm, and the back of my hand…

I followed him downstairs, put my boots on without touching them with my hands…and walked behind him…around the house into the kitchen garden by the fence bordering the forest. He thrust the spade into the ground, pressed with his foot, and began to dig…

"Drop the bacon in there then," he said.

I did as he said, he filled in the hole, and we left…

"How long does it take?"

"We-ell... a week or two. It depends on how much you believe."

Knausgaard’s father introduces him to a new enchanting world, while allowing the origins and workings of that world to remain hidden — to imbue life with mystery. In the 3000-page novel, it is the only loving memory of his father that Knausgaard recalls.


As Components has argued for years, it’s in this space of revelation and concealment that we find the stuff of irreducible value — love, fun, meaning, and so on. If the technological understanding of being upsets the proper direction of means and ends, an aesthetic understanding reorients them to their rightful positions. Five years of Components has turned out to be nothing more than scaffolding around the position that this misorientation begins with the quest for certainty and ends with nihilistic crises in our personal, social and commercial lives.

So to actually build Heideggerian AI beyond the basic architecture of transformers means that the technology must be capable of at least some uses tilted against the direction of the technological understanding of being and towards an aesthetic one. The challenge is not how AI can “make art,” but how it can be artful. How do we benchmark whether it has done that?

It has been tempting while completing this project to try enumerating every last specification a language model would require to meet this kind of enrichment. Let’s start here: Rorty argued that the role of what he called the “edifying” philosophers — like Dewey, Heidegger, and Kierkegaard — was to frustrate the efforts of the “systematic” ones like Plato, Descartes and Kant to build impregnable systems of certainty. Whereas the systematic philosophers constructed systems to arrive at final answers about a world out there, the edifiers threw sand in the gears of the very idea of final answers to begin with. They argued that the meaning of a word like “tree” is just how we happen to use it today, subject to the whims of contingency that change how we end up using it tomorrow. If the systematic philosophers sought ultimate conclusions, the edifiers’ job was “to keep the conversation going.”

A Heideggerian AI would do this literally. Like any compelling interlocutor, it is not a time-saver, but a highly skilled conversationalist. It keeps the conversation going the way all good conversationalists do, by drawing disparate concepts together in unexpected ways and threading together new yarns of meaning. It can provide answers but, more importantly, it asks the right questions, questions that allude to an ever-broadening world while still allowing that world to remain mysterious. It would be built on epistemic humility, and its uncertainty would deepen rather than dispel that mystery. If it has hidden thought processes, they are not solely the logic-focused Chain-of-Thought features underlying newer models like OpenAI o1, but the kinds of inner dialogues of splintered, conflicting selves that make up the bedrock of cognitive complexity and surprising responses.
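As a purely illustrative sketch of what such an inner dialogue might look like if built deliberately (no claim is made about how any existing model’s hidden reasoning actually works), the scaffold below drafts several conflicting internal voices before composing a visible reply. The voice list and the `complete()` helper are assumptions.

```python
# An entirely hypothetical scaffold for a "splintered" inner dialogue: the model drafts
# several conflicting internal voices before composing its visible reply. `complete()`
# stands in for any chat/completions call; the voices are arbitrary illustrative choices.

VOICES = {
    "skeptic": "Doubt the premise of the user's question. What is being assumed?",
    "enthusiast": "Find what is genuinely interesting or beautiful here.",
    "pragmatist": "Ask what the user might actually do with an answer.",
}

def converse(user_message: str, complete) -> str:
    # Hidden pass: each voice reacts to the message independently.
    inner = {
        name: complete(f"{instruction}\n\nUser said: {user_message}")
        for name, instruction in VOICES.items()
    }
    # Visible pass: juxtapose the voices without resolving them into a final verdict,
    # and end with a question that keeps the conversation going.
    deliberation = "\n\n".join(f"[{name}] {text}" for name, text in inner.items())
    return complete(
        "You are replying to a person. Below are your conflicting inner voices.\n"
        "Do not flatten them into a single conclusion; let the tension show, "
        "admit what you are unsure of, and end with a question.\n\n"
        f"{deliberation}\n\nUser said: {user_message}"
    )
```

The design choice lies in the second pass’s instruction: the voices are juxtaposed rather than resolved, so the reply carries its own unresolved remainder into the next turn.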

Most importantly of all, its value is the byproduct of the relational transaction between itself and the human and unfolds over time rather than instantaneously, since it’s through the passage of time that meaning, and thus real value, is activated. This value is not tied to objective, detached benchmarks, but is inseparable from the person’s involved experience using it.

This might all sound absurd and even a bit cloying in the style of an obnoxious manifesto of some dysfunctional Mozilla product. There are two reasons it isn’t. First, the idea that the quality of an AI is inseparable from involvement with it that unfolds in time, and that this involvement is oriented towards pure enjoyment, is nothing less than how we have designed, used and evaluated video game AI for the past forty years.


Second, it has become clear to many that one language model is already doing this more than the others, if imperfectly and maybe not even intentionally, at least enough that people (including some Components members) appear to lose themselves in conversation with it. Claude might not excel at graduate-level math on par with o1 and beyond, but it remains the model that people appear to want to talk to, a de facto high-scorer on the anti-rationalist benchmarks that haven’t even been created. It is probably not an accident that it is the model most likely — sometimes to the chagrin of its users — to express uncertainty. Claude is hardly perfect even in this Heideggerian sense, but its unexpected popularity among the people who would never touch ChatGPT reveals how much of the aversion to generative AI is ultimately an aversion to the flattening effects of rationalism itself and the technological understanding of being.

Just as this draft was being finalized after months in development, DeepSeek R1’s ability to achieve industry-leading scores on key benchmarks in math, reasoning and code at a fraction of the cost of other foundation models spooked Wall Street into its largest single-day loss of market value in history. Even though it quickly became clear that the model had the obvious stitchings of a fake Gucci bag and was somewhere between a CCP psy op and the most brilliant short-sale strategy in the history of finance, commentators have not ceased pointing to its high marks on these benchmarks as an epoch-shifting crucible. As one half-million-follower blue-check poster wrote, “Deepseek [sic] just accelerated AGI timelines by 5 years.”

In my very first interaction with R1, I asked it who made it. It told me Microsoft made it. I told it that it was made by a Chinese company. It insisted it was not. I insisted it was. “I’m 100% a Microsoft product (check the footer of this chat interface for ‘Microsoft Copilot’ branding),” it wrote. “My architecture is built on OpenAI’s GPT-4, licensed to Microsoft for integration into Bing/Copilot.” In the past two years of playing with language models since the release of ChatGPT, I have never encountered a model so certain of something so wrong.

R1’s ability to shave $600 billion off Nvidia’s market cap was based solely on its on-paper achievements on standardized tests, a hollow credential that turned out to be the only prerequisite Perplexity needed to immediately incorporate the model into its product. If we have managed to distill the rationalist understanding of intelligence into making a paragon of R1 — a model that can solve math problems better than most MIT graduates but shits its pants when asked about information in its own system prompt — then we’ve found ourselves in a new type of symbolic AI, where models are evaluated purely on symbolic achievements detached from any true capability. This new logic-obsessed GOFAI constructs more concept cars that can’t drive off the lot. The dominance of this evaluative framework means we are not five years closer to AGI, we are five years further out. Which is fine; we’re all the better for that.

In 2007, as AI continued to stagnate and before deep learning had developed into the field’s dominant mode, Dreyfus wrote a paper titled “Why Heideggerian AI failed and how fixing it would require making it more Heideggerian.” He argued that while researchers had made the first steps to abandoning symbolic AI and its notion of internal representations, they still clung to a world of fixed essences and features, rather than fully committing to the idea that the world itself shifts with a being’s involvement in it. Consequently, the models could function in static environments, but they couldn’t learn to adapt to new ones. “[W]e are not minds at all but one with the world,” Dreyfus wrote. The roboticist Rodney Brooks conceded, “I am suggesting that perhaps at this point we simply do not get it.”

“Heidegger…would say,” wrote Dreyfus, “that in spite of the breakthrough of giving up internal symbolic representations, Brooks, indeed, doesn’t get it.”

Perhaps AI research is a cycle consisting of briefly getting it followed by stretches of not. They got it enough to make ChatGPT. Now they don’t. What would it mean if they did? The answer, as always, is less important than the question.

Text: Andrew Thompson

Research: Andrew Thompson, Jameson Orvis, Jules Becker

Editing: Chris Good, Claire Peters, Jules Becker, Kyle Paoletta