“Every neural network we looked at, we would find a dedicated neuron for Donald Trump. That was the only person who had always had a dedicated neuron.” — Chris Olah, Anthropic’s head of mechanistic interpretability (the effort to make sense of these neural nets after they have been trained), from the Lex Fridman podcast
This is a purely emergent phenomenon, not designed in, and it’s part of a broader resonant homology across neural networks, biological and artificial.
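To make the idea concrete, here is a minimal toy sketch of the simplest kind of probe one might run: feed a network inputs that do and don't contain some concept, and look for the unit whose activation best separates the two sets. Everything here is invented for illustration (the tiny random network, the synthetic "concept" direction); real mechanistic interpretability work on large language models is far more involved.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a trained network: one hidden layer of 64 "neurons".
# (Weights are random here; in real work you would load a trained model.)
W = rng.normal(size=(32, 64))

def hidden_activations(x):
    """ReLU activations of the hidden layer for a batch of input vectors."""
    return np.maximum(0.0, x @ W)

# Two made-up input sets: inputs that "contain the concept" and inputs
# that don't. Here the concept is just a fixed direction in input space,
# added to the positive examples.
concept = rng.normal(size=32)
pos = rng.normal(size=(200, 32)) + concept
neg = rng.normal(size=(200, 32))

# For each hidden unit, compare mean activation on concept vs. non-concept
# inputs; the unit with the largest gap is our candidate "dedicated neuron".
gap = hidden_activations(pos).mean(axis=0) - hidden_activations(neg).mean(axis=0)
best = int(np.argmax(gap))
print(f"candidate dedicated neuron: unit {best}, activation gap {gap[best]:.2f}")
```

In practice a concept is rarely captured so cleanly by a single unit, which is part of why the field has moved toward richer tools than per-neuron probes; the toy version above is only meant to show what "a dedicated neuron for X" operationally means.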
Chris: “This, actually, is indeed a really remarkable and exciting thing, where the same elements, the same features and circuits, form again and again. You can look at every vision model, and you’ll find curve detectors, and you’ll find high-low-frequency detectors. And in fact, there’s some reason to think that the same things form across biological neural networks and artificial neural networks. So, a famous example is vision models in the early layers. They have Gabor [edge-detecting] filters, and Gabor filters are something that neuroscientists are interested in and have thought a lot about. We find curve detectors in these models. Curve detectors are also found in monkeys. We discover these high-low-frequency detectors, and then some follow-up work went and discovered them in rats or mice. So, they were found first in artificial neural networks and then found in biological neural networks.” — from the Lex Fridman pod, and it’s quite interesting from this point onward
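For readers who haven't met them: a Gabor filter is just a sinusoid windowed by a Gaussian, and an odd-phase one responds strongly to edges at its preferred orientation. Below is a small self-contained sketch; the helper name and all parameter values are mine, chosen only to make the orientation selectivity visible.

```python
import numpy as np

def gabor_kernel(size=15, theta=0.0, sigma=3.0, wavelength=6.0,
                 psi=-np.pi / 2, gamma=0.5):
    """Real-valued Gabor kernel: a cosine grating under a Gaussian window.

    theta      orientation of the filter (radians)
    sigma      width of the Gaussian envelope
    wavelength period of the sinusoidal carrier
    psi        carrier phase; the default odd phase makes an edge detector
    gamma      aspect ratio of the envelope
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Rotate coordinates so the filter prefers edges at angle theta.
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + (gamma * yr) ** 2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * xr / wavelength + psi)
    return envelope * carrier

# A vertical edge drives the vertically tuned filter (theta=0) strongly and
# the horizontally tuned one (theta=pi/2) not at all -- the same selectivity
# found in early vision-model layers and in V1 simple cells.
edge = np.zeros((15, 15))
edge[:, 8:] = 1.0  # synthetic vertical edge
for theta in (0.0, np.pi / 2):
    response = np.sum(gabor_kernel(theta=theta) * edge)
    print(f"theta={theta:.2f} rad -> response {response:.2f}")
```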
This field of study has fascinated me since my first exposure to neural networks in 1989 (when I started a PhD in EE to study them). How fascinating that artificial neural nets recapitulate some of the developmental processes and resulting structures seen in our sensory cortex!
But the biological analogy also carries over to the problem of interpretability. The complex artifacts created by an iterative algorithm — whether brain or LLM — are inherently inscrutable. I first wrote about this in the MIT Tech Review in 2006, concluding: “If we artificially evolve a smart AI, it will be an alien intelligence defined by its sensory interfaces, and understanding its inner workings may require as much effort as we are now expending to explain the human brain.”
So I respect both the difficulty of Mechinterp and its appeal: unweaving the beauty of transcendence.
Chris concludes: “Biology has these simple rules, and it gives rise to all the life and ecosystems that we see around us. All the beauty of nature, that all just comes from evolution and from something very simple in evolution. And similarly, I think that neural networks build, create enormous complexity and beauty inside and structure inside themselves that people generally don’t look at and don’t try to understand because it’s hard to understand. But I think that there is an incredibly rich structure to be discovered inside neural networks, a lot of very deep beauty if we’re just willing to take the time to go and see it and understand it.”
Tags: Donald Trump Neuron Mechinterp Mechanistic Interpretability Chris Olah Anthropic