LLMs · Medium impact · For Dev · arXiv Agents · June 11, 2026

Automated reproducibility assessments in the social and behavioral sciences using large language models

Large language models can automate reproducibility assessments in social and behavioral sciences, matching or exceeding human performance in reproducing study conclusions and effect sizes.
Signal strength: 3.4/5 · arXiv Agents

What happened

Researchers demonstrated that LLMs can reanalyze published social and behavioral science studies, recovering reported effect sizes and assessing whether the original conclusions hold. The models recovered effect sizes (within a tolerance) in 41% of cases and reached 96% agreement on qualitative conclusions, outperforming human reanalysis on both metrics.
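
To make the two metrics concrete, here is a minimal sketch of how an effect-size recovery rate and a conclusion agreement rate could be computed. The summary does not state the tolerance or the scoring rules, so the relative tolerance of 5%, the `Reanalysis` record, the conclusion labels, and the toy values below are all illustrative assumptions, not the paper's method or data.

```python
from dataclasses import dataclass


@dataclass
class Reanalysis:
    """One study's original vs. reproduced result (hypothetical record layout)."""
    original_effect: float
    reproduced_effect: float
    original_conclusion: str
    reproduced_conclusion: str


def effect_recovered(r: Reanalysis, rel_tol: float = 0.05) -> bool:
    """True if the reproduced effect is within rel_tol of the original.

    The relative-tolerance definition and the 0.05 default are assumptions;
    the source only says "within a tolerance".
    """
    denom = abs(r.original_effect)
    if denom == 0:
        # A relative error is undefined for a zero effect; require an exact match.
        return r.reproduced_effect == 0
    return abs(r.reproduced_effect - r.original_effect) / denom <= rel_tol


def summarize(results: list[Reanalysis], rel_tol: float = 0.05) -> dict[str, float]:
    """Share of studies with a recovered effect size and with a matching conclusion."""
    if not results:
        raise ValueError("need at least one reanalysis result")
    n = len(results)
    recovered = sum(effect_recovered(r, rel_tol) for r in results)
    agreed = sum(r.original_conclusion == r.reproduced_conclusion for r in results)
    return {
        "effect_recovery_rate": recovered / n,
        "conclusion_agreement_rate": agreed / n,
    }


if __name__ == "__main__":
    # Toy values for illustration only; they are not results from the paper.
    toy = [
        Reanalysis(0.40, 0.41, "supported", "supported"),
        Reanalysis(0.20, 0.35, "supported", "supported"),
        Reanalysis(-0.10, -0.10, "not supported", "not supported"),
        Reanalysis(0.50, 0.10, "supported", "not supported"),
    ]
    print(summarize(toy))
```

Scoring the two metrics separately shows how they can diverge: a reanalysis can reach the same qualitative conclusion while missing the exact effect size, which is consistent with the gap between the 41% and 96% figures reported above.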

Why it matters

The result suggests that LLMs could make reproducibility assessments cheap enough to run at scale. That could change how empirical research in the social sciences is audited and verified, lowering the resources each assessment requires and increasing transparency.
