A Rigorous Behavior Assessment of CNNs Using a Data-Domain Sampling Regime
Shuning Jiang, Wei-Lun Chao, Daniel Haehn, Hanspeter Pfister, Jian Chen
Stop assuming your vision model generalizes to new chart styles. Test it explicitly across distribution shifts before deployment. If you're building chart-reading AI for dashboards, prioritize training data diversity over model complexity—distribution coverage matters more than architecture depth.
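A concrete way to act on this is to audit a trained chart-reading model against several style-shifted test sets before it ships. The sketch below is a minimal illustration of that audit, assuming a PyTorch regression model that maps chart images to ratio values; the function name `error_by_style`, the dummy model, and the random data are hypothetical stand-ins, not the paper's released code.

```python
# Hypothetical sketch: measure per-style error of a chart-reading model before
# deployment. The model, loaders, and threshold here are illustrative assumptions.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def error_by_style(model: nn.Module, style_loaders: dict[str, DataLoader]) -> dict[str, float]:
    """Mean absolute error of predicted ratios, computed separately per chart style."""
    model.eval()
    results = {}
    with torch.no_grad():
        for style, loader in style_loaders.items():
            abs_err, n = 0.0, 0
            for images, ratios in loader:
                preds = model(images).squeeze(-1)
                abs_err += (preds - ratios).abs().sum().item()
                n += ratios.numel()
            results[style] = abs_err / max(n, 1)
    return results

if __name__ == "__main__":
    # Dummy stand-ins: a tiny linear regressor and random "charts" for two styles.
    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 1))
    loaders = {
        style: DataLoader(
            TensorDataset(torch.rand(32, 3, 64, 64), torch.rand(32)),
            batch_size=16,
        )
        for style in ("in_distribution", "shifted_style")
    }
    per_style = error_by_style(model, loaders)
    baseline = per_style["in_distribution"]
    for style, err in per_style.items():
        flag = "  <-- investigate before deployment" if err > 1.5 * baseline else ""
        print(f"{style:>16}: MAE = {err:.3f}{flag}")
```

In practice the loaders would hold real dashboard screenshots grouped by rendering style, and the degradation threshold would be set from the accuracy the application can tolerate.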
CNNs are increasingly used to extract insights from data visualizations, but it remains unclear whether they actually perceive charts the way humans do or merely exploit superficial patterns in their training data.
Method: The authors trained 800 CNN models on bar-chart ratio-estimation tasks and compared them against 113 human participants across 16 million trials. They tested three dimensions: sensitivity to shifts between training and test distributions, stability under limited training samples, and expertise relative to humans. The finding: CNNs outperform humans in ratio-estimation accuracy when trained on diverse bar-chart distributions, but their accuracy collapses when test data diverges from the training distribution, whereas humans maintain consistent performance across distribution shifts.
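To make the data-domain sampling idea concrete, the sketch below generates bar-chart ratio stimuli where training and test bar heights are drawn from different ranges, producing a controlled distribution shift of the kind the study probes. All image sizes, height ranges, and function names are illustrative assumptions, not the authors' exact stimulus protocol.

```python
# Illustrative sketch of shifted bar-ratio stimuli: train and test heights come
# from different ranges. Parameters are assumptions, not the paper's protocol.
import numpy as np

def make_bar_chart(h1: float, h2: float, size: int = 64) -> np.ndarray:
    """Rasterize two bars (heights in [0, 1]) into a grayscale image."""
    img = np.ones((size, size), dtype=np.float32)  # white background
    for x0, h in ((size // 5, h1), (3 * size // 5, h2)):
        px = int(round(h * (size - 2)))
        img[size - 1 - px : size - 1, x0 : x0 + size // 5] = 0.0  # black bar
    return img

def sample_split(n: int, low: float, high: float, rng: np.random.Generator):
    """Sample n chart/ratio pairs with bar heights drawn uniformly from [low, high]."""
    heights = rng.uniform(low, high, size=(n, 2))
    images = np.stack([make_bar_chart(h1, h2) for h1, h2 in heights])
    ratios = heights.min(axis=1) / heights.max(axis=1)  # shorter bar / taller bar
    return images, ratios

rng = np.random.default_rng(0)
train_x, train_y = sample_split(1024, low=0.2, high=1.0, rng=rng)   # broad training range
shift_x, shift_y = sample_split(256, low=0.05, high=0.25, rng=rng)  # shifted test range
print(train_x.shape, shift_x.shape, round(float(shift_y.mean()), 3))
```

A model trained only on the broad range and evaluated on the shifted range gives a cheap proxy for the train-test divergence the paper studies, though the real experiments vary far more than bar heights.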
Caveats: The study focuses solely on bar-chart ratio estimation; it does not address other chart types or visual-reasoning tasks such as trend detection.
Reflections: Do CNNs trained on diverse chart distributions develop human-like perceptual invariances, or do they rely on fundamentally different visual features? · Can we design training regimes that make CNNs robust to distribution shifts without sacrificing accuracy on in-distribution data? · What's the minimum training diversity needed for CNNs to match human-level generalization across chart styles?