What happens when you stop concatenating and start decomposing: a new way to think about attention.
The post Mechanistic View of Transformers: Patterns, Messages, Residual Stream… and LSTMs appeared first on Towards Data Science.
What happens when you stop concatenating and start decomposing: a new way to think about attention.
The post Mechanistic View of Transformers: Patterns, Messages, Residual Stream… and LSTMs appeared first on Towards Data Science.