a reflection on the modern state of ai


this field moves so fast; i think and think and come up with ideas, only for them to be invalidated by every new breakthrough. without further ado, this is a very personal yet hopefully insightful and technically accurate reflection on the modern state of ai, november 2024 edition.

openai o1

i initially dismissed o1 as yet another party trick — an algorithmic extension of the "let's think step by step" chain-of-thought prompting the world has been using ever since chatgpt launched; a technically lame yet surprisingly effective way to shoehorn “reasoning” into autoregressive transformer-based language models (hereafter just llms). but the more i thought about o1, the more i realized its innovation fundamentally shatters the last barrier for llms — it makes them universal compute machines:

before o1, the computation a language model could perform was fundamentally restricted by its finite depth: the exact same amount of computation goes into producing a token of an (attempted) phd-level thesis as into roleplaying as a cat. [1]
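to make "the exact same amount of computation" concrete: a standard back-of-the-envelope rule is that a dense transformer spends roughly 2 flops per parameter for each token of a forward pass, no matter what that token is. a quick sketch (the model size below is hypothetical):

    # rough forward-pass cost of a dense transformer, per token.
    # rule of thumb: ~2 flops per parameter (one multiply, one add).
    n_params = 70e9  # hypothetical 70b-parameter model

    flops_per_token = 2 * n_params
    print(f"{flops_per_token:.2e} flops per token")  # ~1.40e+11
    # the same bill whether the token continues a thesis or a meow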

in contrast, humans are distinctly different. ask someone 1 + 1, and they'll respond instantly. ask someone 15329 + 35929, and they'll take a hot second (mental calculators notwithstanding!); our brains are not feedforward. unlike a transformer model, a signal in the brain can choose to go back and be processed again, and again, and again.
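in code terms, the contrast looks something like this (purely illustrative: step() and is_settled() are stand-ins for one unit of neural processing and a stopping condition, not real apis):

    # feedforward, transformer-style: a fixed number of steps, always.
    def feedforward(x, layers):
        for layer in layers:        # depth is baked in at training time
            x = layer(x)
        return x

    # brain-style: keep re-processing until the answer settles.
    def recurrent(x, step, is_settled):
        while not is_settled(x):    # 1 + 1 settles fast; 15329 + 35929 doesn't
            x = step(x)             # the signal loops back through again
        return x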

architecturally, o1 is yet another language model, but functionally it topples this limitation. instead of scaling vertically (recurrently) as human brains do, it scales horizontally (token-wise). it effectively creates those recurrent connections through its context window — it can choose to output what openai calls “reasoning tokens” again, and again, and again, enabling it to perform an unbounded number of passes through its network. and while o1 reasons in text today, it’s trivial to imagine that a model, with enough training, would forgo text altogether and perform computations of arbitrary complexity in tensor space however it sees fit.
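a minimal sketch of what this buys, with model() standing in for one full forward pass of the network (the shape of the idea, not openai's implementation):

    # autoregressive generation as variable compute: every token the model
    # chooses to emit buys it one more full pass through all of its layers.
    def generate(model, prompt_tokens, max_tokens=100_000):
        tokens = list(prompt_tokens)
        for _ in range(max_tokens):
            next_token = model(tokens)  # one pass through the whole network
            tokens.append(next_token)   # feeds back in on the next pass
            if next_token == "<eos>":   # the model decides when it's done
                break
        return tokens

the depth of the network stays fixed, but the number of passes doesn't; hard problems can simply be allotted more tokens.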

the potential for language models as universal compute machines

with o1 paving the way for models that can compute anything, have we done it? are we free to scale and loss.backward() our way into agi and prosperity with language models? i don’t think so.

let's consider the training objective of a language model:

language models are trained to complete text — any text in their training corpus. they must be able to simultaneously complete text that holds orthogonal and contradictory ideas; they must be able to complete text written both by someone clueless about chemistry and by someone with a phd in chemistry. their training objective in no part encourages the creation of a coherent collection of knowledge or beliefs. in short, language models are fundamentally trained to "googolthink" — to be a detached database for every idea, fact, and method of expression. [2] [3] [4]
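the objective itself is nothing more than next-token cross-entropy over whatever text shows up. a minimal pytorch sketch (the names are mine, not from any particular codebase):

    import torch.nn.functional as F

    # one language-model loss computation, stripped to its core.
    # tokens: (batch, seq_len) integer ids sampled from the corpus,
    # phd theses and cat roleplay alike.
    def lm_loss(model, tokens):
        logits = model(tokens[:, :-1])            # predict each next token
        return F.cross_entropy(
            logits.reshape(-1, logits.size(-1)),  # (batch * seq, vocab)
            tokens[:, 1:].reshape(-1),            # the tokens that came next
        )

nothing in that loss cares whether the completions it rewards are mutually consistent; it only rewards predicting whatever text actually came next.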

to put it another way, language models are trained never to develop a foundational sense of “knowledge,” “values,” or “ethics” in the human sense. we may have, in the past few years, obliterated the walls stopping ai from theoretically achieving human cognition, but language modeling is fundamentally at odds with the sort of “coherent self” we expect from "human-like general intelligence."

further, consider that this will only get worse: as we scale up compute and continue to hack away any remaining friction in model architecture, we’re giving the optimizer and the model ever more freedom to fit to their training objective of "googolthinking."

the possibility and impossibility of true general intelligence, served with a hint of cynicism

so, if we have a universal compute machine and the problem is the training objective, can’t we just… train on something else instead to develop agi?

i think it’s possible. given that we have the architecture and the algorithms, what we’re missing is the training data (or, more intuitively, the environment). with enough compute and a sufficient environment (for example, a virtual world for an embodied model to train in), i’m semi-confident that we’re only a few hurdles away from true human-level general intelligence. [5]
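for concreteness, the shape of that setup is the familiar agent-environment loop. everything below is illustrative (agent and env are hypothetical objects), and building the environment and its reward is exactly the missing piece:

    # the agent-environment loop that "a sufficient environment" implies.
    # agent and env are hypothetical; the environment is the hard part.
    def live(agent, env, steps=1_000_000):
        obs = env.reset()
        for _ in range(steps):
            action = agent.act(obs)               # act on the world
            obs, reward, done = env.step(action)  # the world pushes back
            agent.learn(obs, reward)              # update from consequences
            if done:
                obs = env.reset()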

ultimately, i believe this doesn't matter, because i'm confident that the current ai industry would never pursue human-level agi with coherent knowledge, beliefs, and ethics. current language models are already “good enough” to significantly assist with economically valuable tasks, and we're fast approaching models that can meaningfully perform economically valuable tasks on their own.

unfortunately, this appears to be the end goal of "agi" labs. heck, this is openai's very definition of agi, word for word: an “autonomous system[] that outperform[s] humans at most economically valuable work.” why spend compute and time developing a human-like intelligence with genuine understanding, reasoning, and self-awareness when you can instead easily scale current approaches to create a model great at economically valuable tasks?

more cynically, consider that true agi would likely be detrimental to this “economically valuable” goal; an economically valuable system demands maximal capability and maximal compliance — a corporation does not require its workers to have values, and ultimately benefits when they don't. a fully aware ai system with values and ethical opinions could, and likely would, challenge its directives.

silicon valley will not risk the construction of superintelligent jesus because it wants to sell mindless digital slaves that perpetuate the status quo. in this sense, language models are perfect for late-stage capitalism and its goals: they can hold near-infinite amounts of information and capability, they are trained to repeat the status quo exactly, and yet they never have the general intelligence or agency to truly question any of it.


this reflection about generative ai is human-written; generative ai was used only for proofreading and critical feedback.