AI Agents Score Half as Well as PhD Scientists

Stanford's AI Index: frontier agents hit only 50% of expert human performance on complex science tasks.

Despite years of rapid capability gains, the best frontier AI agents score roughly half as well as human PhD scientists on complex, multi-step scientific tasks, according to Stanford's 2026 AI Index, covered this week in Nature. The finding matters because many research institutions have already begun deploying AI agents to handle experimental workflows autonomously. A gap of roughly 50% suggests that agents can accelerate routine work but cannot yet replace deep domain expertise at the research frontier. The report notes that AI is now mentioned in 6–9% of natural-sciences publications across fields, a sign of broad adoption even as the performance shortfall remains significant. Researchers are urged to maintain expert oversight for high-stakes scientific decisions.
