Blog

  • This is a Mechanistic Interpretability Technical Portfolio Project and is not peer-reviewed. TL;DR: Names can bias how language models process information, but we don’t fully understand the mechanisms behind this. This work investigates how a recognized named entity, ‘Mary Mallon’ (aka Typhoid Mary), influences the way a language model (Llama 3.1 8B) handles in-context evidence in a…