Stop Misusing t-SNE and UMAP for Visual Analytics
Hyeon Jeon, Jeongin Park, Sungbok Shin, Jinwook Seo
Stop inferring inter-cluster relationships from t-SNE/UMAP plots. Use them only for within-cluster exploration. For cluster comparisons, show distances computed in the original data space directly, or use MDS-based methods that aim to preserve global structure.
Practitioners routinely use t-SNE and UMAP projections to compare cluster distances, even though these algorithms distort inter-cluster relationships. At the cluster level, the projections misreport what's far apart and what's close.
Method: The authors reviewed 136 papers and found widespread misuse: analysts treat projected distances as ground truth for cluster similarity. They interviewed researchers who admitted they knew the projections were unreliable but used them anyway because "everyone does." The core issue: t-SNE and UMAP optimize for local neighborhood preservation, not global structure. When you see two clusters far apart in a projection, that distance is meaningless—it's an artifact of the algorithm's cost function, not your data.
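One way to make the point concrete is to measure cluster separation in the original space and check whether a projection keeps the same ordering. The sketch below is illustrative only and is not the authors' method. It assumes every point has a cluster label, that Euclidean distance between cluster centroids is an acceptable proxy for cluster similarity, and it uses hypothetical helper names (`centroids`, `centroid_distances`, `rank_disagreements`). The `projected` coordinates are hand-made to mimic a distorted layout; they are not the output of t-SNE or UMAP.

```python
from itertools import combinations
from math import dist
from statistics import fmean


def centroids(points, labels):
    """Return {label: centroid} for points grouped by cluster label."""
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    return {
        label: tuple(fmean(axis) for axis in zip(*members))
        for label, members in groups.items()
    }


def centroid_distances(points, labels):
    """Euclidean distance between every pair of cluster centroids."""
    cents = centroids(points, labels)
    return {
        (a, b): dist(cents[a], cents[b])
        for a, b in combinations(sorted(cents), 2)
    }


def rank_disagreements(original, projected, labels):
    """List pairs of cluster pairs whose distance ordering flips
    between the original space and the projection."""
    d_orig = centroid_distances(original, labels)
    d_proj = centroid_distances(projected, labels)
    flipped = []
    for p, q in combinations(sorted(d_orig), 2):
        # A negative product means the two spaces disagree on which
        # cluster pair is farther apart.
        if (d_orig[p] - d_orig[q]) * (d_proj[p] - d_proj[q]) < 0:
            flipped.append((p, q))
    return flipped


# Toy data (assumed for illustration): A and B are neighbors, C is far away.
labels = ["A", "A", "B", "B", "C", "C"]
original = [(0, 0, 0), (0, 2, 0), (1, 0, 0), (1, 2, 0), (10, 0, 0), (10, 2, 0)]
# Hand-made 2D layout that places C between A and B.
projected = [(0, 0), (0, 1), (8, 0), (8, 1), (3, 0), (3, 1)]

for pair, d in centroid_distances(original, labels).items():
    print(pair, round(d, 2))
print("flipped orderings:", rank_disagreements(original, projected, labels))
```

Any non-empty list of flipped orderings is a sign that the plot should not be read for cluster-to-cluster comparisons. Centroid distance is only one possible choice; other inter-cluster measures could be swapped in without changing the structure of the check.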
Caveats: The paper focuses on academic misuse; production dashboards may already use better methods, but the interviews suggest even experts fall into this trap.
Reflections: What alternative projection methods preserve both local and global structure without sacrificing interpretability? · How can visualization tools programmatically warn users when they're misinterpreting projections? · Do practitioners misuse other dimensionality reduction techniques (PCA, autoencoders) in similar ways?