On Evaluating Cognitive Capabilities in Machines (and Other "Alien" Intelligences)
Introduction
Reproduced with permission of author.
Mitchell argues that evaluating AI cognitive capabilities requires borrowing rigorous experimental methods from developmental and comparative psychology, rather than relying on benchmark scores that can mask shallow pattern-matching. She proposes six guiding principles, including skepticism about surface performance, testing for robustness across contexts, and embracing negative results, to assess whether AI systems possess genuine understanding or merely exploit statistical shortcuts in their training data.