A complete end-to-end pipeline for LLM interpretability with sparse autoencoders (SAEs) using Llama 3.2, written in pure PyTorch and fully reproducible.
pytorch feature-extraction open-research sparse-autoencoder llama3 llm-interpretability feature-steering
Updated Mar 23, 2025 - Python