The Graveyard
Scientific theories, paradigms, and widely-held beliefs, noted upon their passing. Survived, in each case, by the evidence.
Phylogenetic Generalised Least Squares regression proposed that evolutionary associations between traits could be estimated reliably across species while accounting for shared ancestry, offering comparative biologists a principled and statistically defensible framework for their analyses. It was widely adopted across ecology and evolutionary biology, becoming a standard tool in the assessment of trait coevolution and the construction of adaptive hypotheses. For several decades it occupied a position of considerable methodological authority, appearing in thousands of comparative studies and forming the backbone of graduate training in the field. Its decline began as researchers examined the sensitivity of the method's conclusions to the assignment of variables to the dependent and independent positions — a choice that, in a genuinely robust method, ought not to determine the outcome. The terminal finding demonstrated that reversing the dependent and independent variables in a substantial proportion of published PGLS analyses yielded inconsistent or contradictory conclusions, revealing that the method had been bearing a causal interpretive weight it was not constructed to support.
The associations PGLS identified were real enough; the causal directions it appeared to endorse were a different matter entirely.
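One route by which the choice of dependent variable can matter is sketched below in Python. It assumes a PGLS setup in which Pagel's lambda is estimated by maximum likelihood on the regression residuals, which is a common configuration but not one the account above specifies. The toy clade covariance matrix, the simulated traits, and the helper names `lambda_cov`, `gls_fit`, and `pgls` are all hypothetical. With the covariance held fixed, the t-statistic is the same in both directions; once lambda is re-estimated for each direction, the two fits need no longer agree.

```python
import numpy as np

def lambda_cov(C, lam):
    """Pagel's lambda transform: scale off-diagonal covariances by lam."""
    V = lam * C
    np.fill_diagonal(V, np.diag(C))
    return V

def gls_fit(y, x, V):
    """GLS of y on x (with intercept) under covariance shape V.
    Returns the slope, its t-statistic, and the profile log-likelihood
    (up to an additive constant)."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    Vi = np.linalg.inv(V)
    XtVi = X.T @ Vi
    cov_unscaled = np.linalg.inv(XtVi @ X)
    beta = cov_unscaled @ (XtVi @ y)
    resid = y - X @ beta
    rss = resid @ Vi @ resid
    sigma2 = rss / (n - 2)
    t = beta[1] / np.sqrt(sigma2 * cov_unscaled[1, 1])
    _, logdet = np.linalg.slogdet(V)
    loglik = -0.5 * (n * np.log(rss / n) + logdet)
    return beta[1], t, loglik

def pgls(y, x, C, grid=np.linspace(0.0, 1.0, 101)):
    """Grid-search lambda by maximum likelihood, then report the fit."""
    fits = [(gls_fit(y, x, lambda_cov(C, lam)), lam) for lam in grid]
    (slope, t, _), lam = max(fits, key=lambda f: f[0][2])
    return lam, slope, t

# Hypothetical toy phylogeny: 4 clades of 10 species, with a shared
# within-clade covariance of 0.8 and unit variance for every species.
n_clades, per_clade = 4, 10
C = np.kron(np.eye(n_clades), np.full((per_clade, per_clade), 0.8))
np.fill_diagonal(C, 1.0)

rng = np.random.default_rng(0)
n = n_clades * per_clade
x = np.linalg.cholesky(C) @ rng.standard_normal(n)  # phylogenetically structured trait
y = 0.3 * x + rng.standard_normal(n)                # plus unstructured noise

print("y ~ x: lambda, slope, t =", pgls(y, x, C))
print("x ~ y: lambda, slope, t =", pgls(x, y, C))
```

Comparing the two printed lines shows whether the estimated lambda, and with it the t-statistic, shifts when the variables are swapped. The simulated numbers illustrate the mechanics only and say nothing about any published analysis.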
Swinnerton-Dyer's conjecture on R-equivalence triviality for cubic surfaces proposed, in 1981, that R-equivalence classes on cubic surfaces over p-adic fields could be understood and bounded in a systematic way — with certain exceptional cases left unresolved and explicitly acknowledged. The conjecture addressed a question posed by Manin in 1972 concerning the structure of rational points on algebraic varieties, and it occupied a productive position at the intersection of arithmetic geometry and the theory of algebraic groups for over four decades. It was widely studied and served as a reference point for researchers working on rationality problems and equivalence relations on varieties over local fields. Its decline began as the tools of arithmetic geometry matured sufficiently to approach the three special cases of cubic surfaces over p-adic fields that the original 1981 work had been unable to bound. The terminal event arrived when those three exceptional cases were resolved, with R-equivalence proved trivial in each, and Manin's 1972 question answered in full — leaving the conjecture's original scope of exceptions substantially and irrecoverably narrowed.
The conjecture's open cases have been closed, and the questions it was designed to hold open are no longer open.
The AI-Driven Diagnostic Acceleration Hypothesis held that artificial intelligence prioritization of chest X-ray worklists would meaningfully shorten the time between imaging and confirmed lung cancer diagnosis. It was adopted with considerable institutional enthusiasm, positioned as a practical bridge between the promise of machine learning and the urgent clinical reality of delayed cancer detection. Radiology departments, health systems, and procurement bodies treated the hypothesis as a reliable foundation for investment in AI triage tooling. Its decline began as randomized evidence, rather than observational data, was brought to bear on the core claim. A large UK-based randomized controlled trial found that AI-driven prioritization did not produce a statistically significant reduction in time to CT or to confirmed lung cancer diagnosis when measured against standard clinical workflow.
The bottleneck in lung cancer diagnosis, it appears, was not the order in which images were read.
The Adaptive Transcript Diversity Hypothesis held that the remarkable variety of RNA transcripts produced from single genomic loci — through alternative transcription initiation, alternative splicing, and alternative polyadenylation — represented a functional advantage, shaped and maintained by natural selection. It proposed that organisms benefited from this molecular versatility, and that transcript diversity was, in the main, a feature rather than a flaw. The hypothesis attracted considerable attention across molecular biology and genomics, informing interpretations of transcriptome complexity as a mark of biological sophistication. Its decline began as population-genetic analyses revealed a troubling inverse relationship: transcript diversity was consistently higher in species with smaller effective population sizes, precisely the condition under which natural selection is weakest and genetic drift most permissive. The terminal finding came with evidence that large-population species, in which purifying selection operates with greater efficiency, suppressed transcript diversity — indicating that much of what had been interpreted as adaptive was instead tolerated noise, persisting only where selection lacked the power to remove it.
The diversity was real; the adaptation was not.
The Persuasive Power of Political Microtargeting held that digitally targeted political advertisements, delivered to precisely segmented voter profiles, could measurably shift electoral knowledge, attitudes, turnout, and participation in favour of the sponsoring campaign. It was adopted with considerable enthusiasm by campaign strategists, political consultants, and technology platforms, and it underpinned the allocation of billions of dollars in electoral advertising expenditure across multiple election cycles. Its authority rested substantially on observational data and smaller-scale studies suggesting that message-audience fit amplified persuasive effect. Its decline set in as pre-registered field experiments started returning null results at scale, eroding confidence in causal claims that had long been assumed rather than demonstrated. The terminal event was a field experiment conducted across a sample of sixty thousand participants, which found no detectable effect of targeted political advertisements on voter knowledge, polarization, turnout, or any measured form of electoral participation.
The expenditure it justified has not been recovered, and the electoral outcomes it promised to deliver have not been attributed to it by any methodology that has survived scrutiny.
Cross-Domain Mapping as Universal Creativity Enhancer held that the deliberate application of structural analogies drawn from unrelated fields constituted a generalisable mechanism for improving creative output — one that operated consistently across any sufficiently complex reasoning system. It was adopted with particular enthusiasm in the emerging field of artificial intelligence research, where it informed prompt engineering strategies, training augmentation approaches, and theoretical frameworks for machine ideation. For several decades, the assumption that human and machine creativity shared a common underlying architecture lent the theory considerable institutional momentum. Its decline began when controlled intervention studies found that cross-domain mapping exercises produced robust, statistically significant gains in human creative performance while yielding no equivalent effect in large language models subjected to analogous conditions. The terminal finding was the confirmation, across multiple independent evaluations, that the mechanisms responsible for human creative benefit from cross-domain exposure did not transfer to language model architectures, rendering the universalist claim empirically untenable.
The mechanisms of human creativity and those of large language models are not, on the available evidence, the same mechanisms.
Manin's conjecture on R-equivalence for diagonal cubic surfaces proposed, in 1972, that the structure of rational points on such surfaces could be meaningfully organised through the equivalence relation connecting points by chains of rational curves. It held a prominent position in the arithmetic geometry of cubic surfaces, offering a framework through which the distribution and connectivity of rational points might be systematically understood. The conjecture attracted sustained attention across several decades, informing investigations into rationality, weak approximation, and the broader classification of algebraic varieties over local and global fields. Its decline began as computational and theoretical tools advanced sufficiently to probe specific diagonal cubics directly over non-archimedean fields. The terminal event was a proof that the diagonal cubic surface defined by X³+Y³+Z³+ζ₃T³=0 over ℚ₂(ζ₃) carries only trivial R-equivalence, resolving Manin's original 1972 question and materially constraining the domain in which non-trivial R-equivalence on cubic surfaces could be expected to operate.
The question Manin posed in 1972 has received a definitive answer, and the answer is that the non-trivial case does not arise here.
The hypothesis that artificial intelligence could outperform human expert consensus in mammographic screening enjoyed a decade of considerable institutional momentum. It proposed that automated systems, trained on large imaging datasets, would surpass the diagnostic accuracy achieved by two independent radiologists reviewing the same scan, a standard known as double reading. The claim attracted substantial investment from technology developers and health systems alike, and it was advanced in numerous single-centre studies and early-phase trials that reported promising sensitivity and specificity figures. Its decline began as larger, more rigorously controlled evaluations were commissioned, and the results proved less consistent than the earlier literature had suggested. The terminal event was a large-scale noninferiority trial in which AI-based triage and decision support in mammography failed to demonstrate noninferiority to double reading, undercutting the central claim that automated systems could match, let alone surpass, human expert consensus in this setting.
The hypothesis that AI outperforms human expert consensus in mammographic screening is not supported by the available noninferiority evidence, and the field has adjusted its expectations accordingly.
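For readers unfamiliar with the logic of a noninferiority comparison, a minimal Python sketch follows. It assumes two independent arms, a binary outcome, and a Wald confidence interval, none of which is known to match the trial's actual design. The function name `noninferiority`, the margin, and every count are hypothetical illustrations, not trial data. The new method is declared noninferior only if the lower confidence bound on the difference in proportions stays above the negative margin.

```python
from math import sqrt
from statistics import NormalDist

def noninferiority(success_new, n_new, success_ref, n_ref, margin, alpha=0.025):
    """One-sided Wald test of noninferiority for a difference in proportions.

    Returns (difference, lower confidence bound, verdict). The verdict is
    True only when the lower bound exceeds -margin.
    """
    p_new = success_new / n_new
    p_ref = success_ref / n_ref
    diff = p_new - p_ref
    se = sqrt(p_new * (1 - p_new) / n_new + p_ref * (1 - p_ref) / n_ref)
    z = NormalDist().inv_cdf(1 - alpha)
    lower = diff - z * se
    return diff, lower, lower > -margin

# Hypothetical counts: the same observed rates (85% vs 86%) at two sample sizes,
# with a hypothetical margin of 5 percentage points.
large = noninferiority(850, 1000, 860, 1000, margin=0.05)
small = noninferiority(255, 300, 258, 300, margin=0.05)
print("n=1000 per arm:", large)
print("n=300 per arm: ", small)
```

In this sketch the same one-point deficit passes at the larger sample size and fails at the smaller one. A failure to demonstrate noninferiority is therefore a statement about what the evidence could establish, which is distinct from a demonstration of inferiority.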
The Spillover Rate Hypothesis held that the frequency with which a pathogen crossed from an animal reservoir into a human host was the most reliable indicator of that pathogen's capacity to establish sustained human-to-human transmission. It was widely adopted in zoonotic risk assessment frameworks, informing surveillance priorities, resource allocation, and pandemic preparedness modelling across public health institutions. The hypothesis offered an appealingly measurable proxy for a phenomenon that resisted direct observation, and for a period it structured much of the field's thinking about which pathogens warranted the closest attention. Its decline began as Bayesian analytical methods were applied to the accumulated spillover record, and the relationship between frequency and host jump risk proved considerably weaker than the framework had assumed. The terminal finding established that pathogen novelty — the degree to which a pathogen was immunologically and biologically unfamiliar to the prospective host population — was the stronger and more consistent predictor of successful host jump.
The field has reoriented its risk models around pathogen novelty, and the surveillance priorities shaped by that reorientation are already materially different from those the Spillover Rate Hypothesis would have produced.