When Are Two Networks the Same? Tensor Similarity for Mechanistic Interpretability

Zac Boring, May 31, 2026

We've found a method that tells you how functionally similar two neural networks are across ALL inputs, computed solely from the weights (i.e., no data), using a principled generalization of cosine similarity. There's only one catch: you have to use a tensor network. We've already shown that tensor-transformer variants are performant (this isn't a novel claim; see prior papers for MLPs and attention), so here we're focusing on the interpretability advances.

Linear Algebra Applies to Tensors

A tensor n…
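This excerpt doesn't include the post's actual definition, so what follows is only a minimal sketch of the general idea under our own assumptions. We treat a single bilinear layer y = P((Wx) ⊙ (Vx)) as the "tensor network" and take the similarity to be the cosine similarity of its weight tensors under the Frobenius inner product. All function names here (`bilinear_forward`, `sym_inner`, `cosine_factored`, etc.) are hypothetical. The sketch shows two properties that make this kind of score weight-only and function-level. First, the inner product can be contracted directly from the factor matrices without ever materializing the full tensor. Second, symmetrizing over the two input indices makes reparametrizations that leave the function unchanged (permuting hidden units, swapping W and V) score exactly 1.

```python
import numpy as np


def bilinear_forward(P, W, V, x):
    """Hypothetical bilinear layer: y = P @ ((W @ x) * (V @ x))."""
    return P @ ((W @ x) * (V @ x))


def bilinear_tensor(P, W, V):
    """Materialize B[k, i, j] = sum_h P[k, h] * W[h, i] * V[h, j]."""
    return np.einsum("kh,hi,hj->kij", P, W, V)


def symmetrize(B):
    """x^T B_k x only depends on the part of B_k symmetric in (i, j)."""
    return 0.5 * (B + B.transpose(0, 2, 1))


def cosine_dense(B1, B2):
    """Cosine similarity of the symmetrized tensors (materializes both)."""
    S1, S2 = symmetrize(B1), symmetrize(B2)
    return float(np.sum(S1 * S2) / (np.linalg.norm(S1) * np.linalg.norm(S2)))


def sym_inner(net1, net2):
    """<Sym B1, Sym B2>_F contracted over hidden indices; B is never formed."""
    P1, W1, V1 = net1
    P2, W2, V2 = net2
    out = P1.T @ P2                      # contract the shared output index k
    same = (W1 @ W2.T) * (V1 @ V2.T)     # pair i with i and j with j
    swap = (W1 @ V2.T) * (V1 @ W2.T)     # pair i with j (transposed term)
    return 0.5 * float(np.sum(out * (same + swap)))


def cosine_factored(net1, net2):
    """Cosine similarity computed only from the factor matrices."""
    return sym_inner(net1, net2) / np.sqrt(sym_inner(net1, net1) * sym_inner(net2, net2))


rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 5, 8, 3
shapes = [(d_out, d_hidden), (d_hidden, d_in), (d_hidden, d_in)]
net_a = tuple(rng.normal(size=s) for s in shapes)

# Same function, different weights: permute hidden units and swap W with V.
perm = rng.permutation(d_hidden)
P, W, V = net_a
net_b = (P[:, perm], V[perm], W[perm])

# An unrelated network for comparison.
net_c = tuple(rng.normal(size=m.shape) for m in net_a)

x = rng.normal(size=d_in)
print(np.allclose(bilinear_forward(*net_a, x), bilinear_forward(*net_b, x)))  # True
print(cosine_factored(net_a, net_b))  # ~1.0 (up to float rounding)
print(cosine_factored(net_a, net_c))
print(cosine_dense(bilinear_tensor(*net_a), bilinear_tensor(*net_c)))
```

The factored route costs roughly O(h1 · h2 · (d_in + d_out)) instead of building two d_out × d_in × d_in tensors, which is the main reason a tensor-network form matters for doing this at scale. Whether plain Frobenius cosine similarity is the "principled generalization" the post has in mind is our assumption, not something this excerpt confirms.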

By Logan Riggs

Read the full article at LessWrong AI →