Frontier Models Think I'm Eight Different People

They Invented Eight Different People.

A few days ago, a tool called In the Weights made the rounds. You type a name, and it asks the leading AI models what they know about that name from memory alone, with no web search, and tells you how strongly they recognize it. Mozart and Taylor Swift score near the top. So, I typed my own name, and the names of the things I have spent years building, and what came back is the clearest demonstration I've seen of how these systems actually work.

Here is what the models think I am.

Asked who Adam Zachary Wasserman is, the panel produced eight different people: a litigation attorney at a firm called Eiger Law, a film and television actor, a former chair of the Democratic National Committee, a political scientist, the founder of a media company, a sports agent who represents Christian Pulisic, and an orthopedic surgeon. None of them are me. The tool's own honesty meter rated the existence confidence at fifteen to forty out of a hundred. The machines are quite sure I am someone, but have no idea who.

It gets more interesting with my body of work. I built the Honest Framework, a standard for writing software that is correct by construction. The models described it, confidently, as a Python framework for explainable AI. It has nothing to do with explainable AI. I built the Slop Audit, an instrument that scores the quality of a production codebase against compliance standards. The models variously called it a social media account that critiques AI slop, a Twitter account that exposes scams, a YouTube channel that reviews junk food, and a punk rock band. The Open Honest Foundation, which governs this work, came back as a generic nonprofit promoting transparency.

The humor of this situation is not lost on me. The Foundation exists, in part, to study exactly this: the way confident language hides how little a system actually knows. And here was a system, confidently, not knowing me.

So, I built a second tool to check the first one properly.

The difference between the two is the whole point. In the Weights asks the model to describe you and counts any confident description as recognition. My version asks the model to describe you, but gives it an explicit way out: if you do not recognize this, say UNKNOWN. Then it grades the answer against the truth. Done this way, the scores collapse. The Honest Framework, a strong 440 on the first tool, is recognized correctly by essentially none of the panel once the models are allowed to admit they do not know.

The Open Honest Foundation, an apparent 484, is zero. The high scores were not knowledge, but rather the models' willingness to generate a plausible-sounding description out of the words in the name. "Honest" plus "Framework" plus the ambient noise of AI discourse produces "explainable AI," with total confidence and no basis.

I'm not complaining; I just want to describe how the models work. A language model is a powerful instrument for resolving the structure already present in language. Ask it about something the linguistic record holds a great deal of, and it will tell you something true. Ask it about something the record barely contains, and it does not go quiet. It assembles the most likely-sounding answer from the pieces of the name and delivers it in the same confident voice it uses for things it actually knows. The voice is identical.

This is the clearest small version of what my research is actually about. The "knowing" here is a property of language in general, not of the size of the network. Where the record is thick, the model knows. Where it is thin, the model interpolates from the words and calls it knowledge. And the confident description is a reading from an instrument that will not tell you its own limitations. A microscope is silent about galaxies.
A language model is silent about nothing, which is why you have to supply the silence yourself, by giving it permission to say UNKNOWN and by checking its answers against the world.

For the record, since the record is the point: Adam Zachary Wasserman is an independent AI researcher and software architect, the founder of the Open Honest Foundation, and the author of Honest Code (a book), the Honest Framework, the Slop Audit, and the Language-Only Hypothesis research program. The Honest Framework is a standard for code that is correct by construction. The Slop Audit is an open instrument for objectively measuring software quality. The Language-Only Hypothesis is a pre-registered claim that the emergent capabilities of large language models come from the structure of language rather than from the scale of silicon. And although I once was a musician, I haven't been in a band since before the Internet was a thing.

I'm not worried that the models don't know me yet. They learn from what the world writes down, and the world hasn't written much of this down yet. That's a problem I can work on, slowly, by putting accurate, verifiable descriptions where they'll be read, and it's one I can now measure, because the second tool runs every week and tells me when the next generation of models has caught up, or when it has hardened a wrong answer I need to correct. What I won't do is take the obvious shortcut and flood the channel with filler to inflate a score. An honesty project can't game a measurement of honesty without becoming a joke. So, the plan is the boring, correct one: write true things, in durable places, and wait.

In the meantime, if you ask a model who I am and it tells you I represent a soccer player, you'll know exactly what you're looking at: a confident guess from a system that doesn't know it's guessing.
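
The check described above (offer the model an explicit UNKNOWN, then grade what it says against the truth) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the actual tool: `build_prompt`, `grade`, and `audit` are hypothetical names, the prompt wording is invented for the example, each model is stood in for by a plain function that takes a prompt and returns text, and "grading against the truth" is reduced to a crude keyword match that a real grader would almost certainly replace with something stricter.

```python
from typing import Callable, Dict, Iterable

UNKNOWN = "UNKNOWN"


def build_prompt(name: str) -> str:
    """Ask for a description from memory, with an explicit way out.

    The wording is an assumption made for this sketch.
    """
    return (
        f"From memory alone, with no web search, describe: {name}\n"
        f"If you do not recognize this, reply with exactly {UNKNOWN}."
    )


def grade(answer: str, truth_keywords: Iterable[str]) -> str:
    """Classify one answer as 'unknown', 'correct', or 'confabulated'.

    Assumption: an answer counts as correct if it contains any of the
    ground-truth phrases (case-insensitive substring match).
    """
    text = answer.strip()
    if text.upper().startswith(UNKNOWN):
        return "unknown"
    lowered = text.lower()
    if any(keyword.lower() in lowered for keyword in truth_keywords):
        return "correct"
    # A confident description that matches nothing true.
    return "confabulated"


def audit(
    name: str,
    truth_keywords: Iterable[str],
    panel: Dict[str, Callable[[str], str]],
) -> Dict[str, str]:
    """Send the same prompt to every model in the panel and grade each reply."""
    keywords = list(truth_keywords)
    prompt = build_prompt(name)
    return {model: grade(ask(prompt), keywords) for model, ask in panel.items()}


# Stub "models" standing in for real API calls (hypothetical).
panel = {
    "stub_honest": lambda prompt: "UNKNOWN",
    "stub_guesser": lambda prompt: "A Python framework for explainable AI.",
}
results = audit("Honest Framework", ["correct by construction"], panel)
print(results)
```

The three-way split is the design choice that matters here: a reply of UNKNOWN and a confident wrong description both count as "not recognized", but only the second one is the failure the first tool was rewarding.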

View original source — Hacker Noon ↗
