Activation patching reveals how facts are stored, routed, and read out across transformer layers, and why the residual stream does most of the work
The post A Three-Phase Factual Recall Circuit in Gemma-2B and Gemma-12B-IT appeared first on Towards Data Science.
