The Graveyard
Scientific theories, paradigms, and widely held beliefs, noted upon their passing. Survived, in each case, by the evidence.
The belief that early surgical intervention improved long-term outcomes in asymptomatic aortic stenosis held a prominent position in cardiovascular medicine for several decades. It proposed that prophylactic valve replacement, performed before the onset of symptoms, would spare patients the risks of decompensation and produce measurably superior survival. The theory was adopted by surgical teams and embedded in clinical reasoning across major cardiac centres, informing referral thresholds and operative timing in patients who, by definition, felt well. Its decline began as longer-horizon outcome data accumulated and the assumed survival benefit proved more difficult to demonstrate than early studies had suggested. A ten-year follow-up study comparing early surgery against conservative management found that the two approaches produced comparable long-term outcomes, removing the principal justification for routine prophylactic intervention in this population.
The long-term outcomes of watchful waiting in asymptomatic aortic stenosis are now considered equivalent to those of early surgery, and clinical practice has begun to reflect this finding.
Phylogenetic Generalised Least Squares regression proposed that evolutionary associations between traits could be estimated reliably across species while accounting for shared ancestry, offering comparative biologists a principled and statistically defensible framework for their analyses. It was widely adopted across ecology and evolutionary biology, becoming a standard tool in the assessment of trait coevolution and the construction of adaptive hypotheses. For several decades it occupied a position of considerable methodological authority, appearing in thousands of comparative studies and forming the backbone of graduate training in the field. Its decline began as researchers examined the sensitivity of the method's conclusions to the assignment of variables to the dependent and independent positions — a choice that, in a genuinely robust method, ought not to determine the outcome. The terminal finding demonstrated that reversing the dependent and independent variables in a substantial proportion of published PGLS analyses yielded inconsistent or contradictory conclusions, revealing that the method had been bearing a causal interpretive weight it was not constructed to support.
The associations PGLS identified were real enough; the causal directions it appeared to endorse were a different matter entirely.
Swinnerton-Dyer's conjecture on R-equivalence triviality for cubic surfaces proposed, in 1981, that R-equivalence classes on cubic surfaces over p-adic fields could be understood and bounded in a systematic way — with certain exceptional cases left unresolved and explicitly acknowledged. The conjecture addressed a question posed by Manin in 1972 concerning the structure of rational points on algebraic varieties, and it occupied a productive position at the intersection of arithmetic geometry and the theory of algebraic groups for over four decades. It was widely studied and served as a reference point for researchers working on rationality problems and equivalence relations on varieties over local fields. Its decline began as the tools of arithmetic geometry matured sufficiently to approach the three special cases of cubic surfaces over p-adic fields that the original 1981 work had been unable to bound. The terminal event arrived when those three exceptional cases were resolved, with R-equivalence proved trivial in each, and Manin's 1972 question answered in full — leaving none of the conjecture's original exceptions standing.
The conjecture's open cases have been closed, and the questions it was designed to hold open are no longer open.
The interfacial-only mechanism of CO₂ electrocatalysis held that the decisive chemistry of electrochemical CO₂ reduction occurred exclusively at the solid-liquid boundary between electrode and electrolyte. It proposed that reactant activation, electron transfer, and product formation were all phenomena of the electrode surface, and it organised decades of catalyst design, mechanistic modelling, and materials engineering around this premise. The framework was widely taught in electrochemistry curricula and served as the foundational logic for the development of copper, gold, and other transition-metal electrocatalysts optimised for surface-site density and interfacial geometry. Its decline began with accumulating evidence that reactive oxygen species and radical intermediates were detectable in the bulk aqueous phase during electrocatalytic operation, at distances inconsistent with a purely interfacial process. The terminal finding demonstrated that bulk water radicals form in solution and activate CO₂ and related reactants before those species reach the electrode surface, establishing that the reduction proceeds via parallel pathways of which the interface is only one.
The electrode surface remains a consequential site of chemistry; it is no longer considered the only one.
The step-growth mechanism, as applied to on-surface poly(para-phenylene) synthesis, held for several decades that extended conjugated polymer chains could be assembled on solid substrates through the sequential coupling of discrete monomer units. It was adopted widely within the surface-assisted synthesis community, most prominently through Ullmann coupling protocols, and it underpinned a generation of research into atomically precise carbon-based nanostructures. Its central limitation — a practical elongation ceiling of approximately 100 nanometres, imposed by the statistical constraints inherent to step-growth kinetics and the diffusion dynamics of surface-bound intermediates — was long noted but not considered fatal. Its decline began as demand grew for polymer chains at the micrometre scale, lengths that the step-growth mechanism could not, by its own architecture, reliably deliver. The terminal event arrived with the demonstration that on-surface radical ring-opening polymerization, operating through a chain-growth mode, produced poly(para-phenylene) chains of micrometre-range length, rendering the elongation ceiling not a practical inconvenience but a mark of the mechanism's obsolescence.
The ~100 nm ceiling was not a failure of execution but a consequence of mechanism, and the mechanism has now been replaced.
The AI-Driven Diagnostic Acceleration Hypothesis held that artificial intelligence prioritization of chest X-ray worklists would meaningfully shorten the time between imaging and confirmed lung cancer diagnosis. It was adopted with considerable institutional enthusiasm, positioned as a practical bridge between the promise of machine learning and the urgent clinical reality of delayed cancer detection. Radiology departments, health systems, and procurement bodies treated the hypothesis as a reliable foundation for investment in AI triage tooling. Its decline began as randomized evidence, rather than observational data, was brought to bear on the core claim. A large UK-based randomized controlled trial found that AI-driven prioritization did not produce a statistically significant reduction in time to CT or to confirmed lung cancer diagnosis when measured against standard clinical workflow.
The bottleneck in lung cancer diagnosis, it appears, was not the order in which images were read.
The Adaptive Transcript Diversity Hypothesis held that the remarkable variety of RNA transcripts produced from single genomic loci — through alternative transcription initiation, alternative splicing, and alternative polyadenylation — represented a functional advantage, shaped and maintained by natural selection. It proposed that organisms benefited from this molecular versatility, and that transcript diversity was, in the main, a feature rather than a flaw. The hypothesis attracted considerable attention across molecular biology and genomics, informing interpretations of transcriptome complexity as a mark of biological sophistication. Its decline began as population-genetic analyses revealed a troubling inverse relationship: transcript diversity was consistently higher in species with smaller effective population sizes, precisely the condition under which natural selection is weakest and genetic drift most permissive. The terminal finding came with evidence that large-population species, in which purifying selection operates with greater efficiency, suppressed transcript diversity — indicating that much of what had been interpreted as adaptive was instead tolerated noise, persisting only where selection lacked the power to remove it.
The diversity was real; the adaptation was not.
Artificial Neural Networks As Brain-Aligned Models proposed that deep neural networks trained on visual tasks served as meaningful computational analogues of the primate visual hierarchy, with unit-level activations corresponding in principled ways to biological neural responses. The framework was widely adopted across cognitive neuroscience and AI research, lending convolutional networks a dual status: as engineering tools and as explanatory models of how the brain processes visual information. For over a decade, the correspondence between ANN representations and primate cortical responses was treated as both a validation of the networks and an illumination of the brain. Its decline began as alignment metrics were subjected to more rigorous bidirectional scrutiny, and the question shifted from whether ANNs predicted brain responses to whether brain responses predicted ANNs in return. The terminal event arrived when reverse predictivity analysis revealed substantial misalignment between ANN units and primate brain responses, dissolving the assumption of faithful computational equivalence that had sustained the framework since its inception.
The networks remain capable; they are not, on current evidence, considered faithful models of the biological systems they were long held to illuminate.
The Opportunistic Megafaunal Exploitation Hypothesis held that early hominins engaged with large-bodied prey — including proboscideans — not as a deliberate or sustained practice, but as an incidental supplement to a broader, largely scavenging or small-game subsistence strategy. It was widely adopted across paleoanthropological and archaeological literature as the prevailing framework for interpreting faunal assemblages associated with early Homo, and it shaped several decades of thinking about the cognitive and social capacities of Acheulean-period hominins. Its decline began as taphonomic methods improved and the resolution of zooarchaeological analysis increased, allowing researchers to distinguish more precisely between opportunistic carcass access and deliberate, repeated, high-investment butchery. The terminal event arrived with the publication of direct evidence of systematic proboscidean butchery at Olduvai Gorge, which demonstrated that megafaunal exploitation by early hominins was a sustained and deliberate survival strategy rather than an occasional fallback behaviour, temporally coinciding with the emergence of the Acheulean.
The evidence now supports a model of early hominin subsistence in which large-game exploitation was a structured, recurring practice, not a fortunate accident.
The Persuasive Power of Political Microtargeting held that digitally targeted political advertisements, delivered to precisely segmented voter profiles, could measurably shift electoral knowledge, attitudes, turnout, and participation in favour of the sponsoring campaign. It was adopted with considerable enthusiasm by campaign strategists, political consultants, and technology platforms, and it underpinned the allocation of billions of dollars in electoral advertising expenditure across multiple election cycles. Its authority rested substantially on observational data and smaller-scale studies suggesting that message-audience fit amplified persuasive effect. Its decline began as pre-registered field experiments began returning null results at scale, eroding confidence in the causal claims that had long been assumed rather than demonstrated. The terminal event was a field experiment conducted across a sample of sixty thousand participants, which found no detectable effect of targeted political advertisements on voter knowledge, polarization, turnout, or any measured form of electoral participation.
The expenditure it justified has not been recovered, and the electoral outcomes it promised to deliver have not been attributed to it by any methodology that has survived scrutiny.
Cross-Domain Mapping as Universal Creativity Enhancer held that the deliberate application of structural analogies drawn from unrelated fields constituted a generalisable mechanism for improving creative output — one that operated consistently across any sufficiently complex reasoning system. It was adopted with particular enthusiasm in artificial intelligence research, where it informed prompt engineering strategies, training augmentation approaches, and theoretical frameworks for machine ideation. For several decades, the assumption that human and machine creativity shared a common underlying architecture lent the theory considerable institutional momentum. Its decline began when controlled intervention studies found that cross-domain mapping exercises produced robust, statistically significant gains in human creative performance while yielding no equivalent effect in large language models subjected to analogous conditions. The terminal finding was the confirmation, across multiple independent evaluations, that the mechanisms responsible for human creative benefit from cross-domain exposure did not transfer to language model architectures, rendering the universalist claim empirically untenable.
The mechanisms of human creativity and those of large language models are not, on the available evidence, the same mechanisms.
Manin's conjecture on R-equivalence for diagonal cubic surfaces proposed, in 1972, that the structure of rational points on such surfaces could be meaningfully organised through the equivalence relation connecting points by chains of rational curves. It held a prominent position in the arithmetic geometry of cubic surfaces, offering a framework through which the distribution and connectivity of rational points might be systematically understood. The conjecture attracted sustained attention across several decades, informing investigations into rationality, weak approximation, and the broader classification of algebraic varieties over local and global fields. Its decline began as computational and theoretical tools advanced sufficiently to probe specific diagonal cubics directly over non-archimedean fields. The terminal event was a proof that the diagonal cubic surface defined by X³+Y³+Z³+ζ₃T³=0 over ℚ₂(ζ₃) carries only trivial R-equivalence, resolving Manin's original 1972 question and materially constraining the domain in which non-trivial R-equivalence on cubic surfaces could be expected to operate.
The question Manin posed in 1972 has received a definitive answer, and the answer is that the non-trivial case does not arise here.
The nucleobase delivery hypothesis, in its most contested form, proposed that carbonaceous asteroids served as vectors for the prebiotic organic compounds necessary for life's emergence on early Earth — a claim that placed the origin of life's chemical precursors firmly beyond the terrestrial atmosphere. For over a century it occupied an uncertain position in the literature: plausible enough to attract serious investigation, unverified enough to remain a hypothesis in the strictest sense. It informed decades of work in prebiotic chemistry, exoplanet habitability modelling, and the interpretation of carbonaceous chondrite composition. Its decline as a speculative framework began not with refutation but with confirmation: the return of samples from asteroid Ryugu by the Hayabusa2 mission provided direct detection of all five canonical nucleobases — adenine, guanine, cytosine, thymine, and uracil — in extraterrestrial material of demonstrably pristine origin. The hypothesis did not fail. It was, in the most terminal sense available to science, proven.
The question of whether asteroids could have delivered nucleobases to early Earth is no longer a question.
The characterization of excitons as inherently massive composite quasiparticles — bound electron-hole pairs carrying a finite effective mass derived from their constituent charges — held a foundational position in condensed matter physics for nearly a century. It underpinned the theoretical treatment of optical absorption, photoluminescence, and semiconductor band-gap engineering across an enormous range of materials and device architectures. The framework was widely taught at graduate level and embedded in the standard formalism of solid-state physics without serious challenge. Its decline began with the emergence of two-dimensional materials as experimental platforms, where the extreme confinement of electronic states and the enhanced strength of light-matter coupling produced conditions the original characterization had not been designed to accommodate. Experimental evidence subsequently demonstrated that, under such conditions, light-matter interactions could transform excitons from massive composite quasiparticles into massless collective modes exhibiting photon-like dispersions — a result that directly refuted the universality of the massive characterization.
The massive quasiparticle description remains valid across a substantial domain of materials and conditions; it is the claim to universality that has not survived.
The hypothesis that artificial intelligence could outperform human expert consensus in mammographic screening enjoyed a decade of considerable institutional momentum. It proposed that automated systems, trained on large imaging datasets, would surpass the diagnostic accuracy achieved by two independent radiologists reviewing the same scan — a standard known as double reading. The claim attracted substantial investment from technology developers and health systems alike, and it was advanced in numerous single-centre studies and early-phase trials that reported promising sensitivity and specificity figures. Its decline began as larger, more rigorously controlled evaluations were commissioned, and the results proved less consistent than the earlier literature had suggested. The terminal event was a large-scale noninferiority trial in which AI-based triage and decision support in mammography failed to demonstrate noninferiority to double reading, directly refuting the central claim that automated systems could replace human expert consensus in this setting.
The hypothesis that AI outperforms human expert consensus in mammographic screening is not supported by the available noninferiority evidence, and the field has adjusted its expectations accordingly.
Anatomically guided coronary artery bypass grafting in valve surgery patients held that the visible degree of coronary stenosis, as assessed by angiographic anatomy alone, was sufficient to determine which vessels warranted bypass at the time of concurrent valve surgery. For decades, it served as the operative standard, guiding surgical teams through combined procedures on the basis of luminal narrowing and lesion morphology without recourse to functional interrogation of the coronary circulation. The approach was embedded in institutional protocols and surgical training programmes across cardiothoracic centres worldwide, and its authority derived in large part from the absence of a well-powered alternative. Its decline began as fractional flow reserve measurement matured as a tool in the percutaneous intervention literature, raising questions about whether anatomical appearance reliably predicted the haemodynamic significance of intermediate lesions. The terminal event arrived when a randomised controlled trial demonstrated that physiologically guided, FFR-directed bypass grafting reduced perioperative complications compared with anatomy-only guidance, directly refuting the adequacy of the anatomical standard in this patient population.
The functional significance of a coronary lesion and its angiographic appearance are not, it is now established, the same measurement.
The Spillover Rate Hypothesis held that the frequency with which a pathogen crossed from an animal reservoir into a human host was the most reliable indicator of that pathogen's capacity to establish sustained human-to-human transmission. It was widely adopted in zoonotic risk assessment frameworks, informing surveillance priorities, resource allocation, and pandemic preparedness modelling across public health institutions. The hypothesis offered an appealingly measurable proxy for a phenomenon that resisted direct observation, and for a period it structured much of the field's thinking about which pathogens warranted the closest attention. Its decline began as Bayesian analytical methods were applied to the accumulated spillover record, and the relationship between frequency and host jump risk proved considerably weaker than the framework had assumed. The terminal finding established that pathogen novelty — the degree to which a pathogen was immunologically and biologically unfamiliar to the prospective host population — was the stronger and more consistent predictor of successful host jump.
The field has reoriented its risk models around pathogen novelty, and the surveillance priorities shaped by that reorientation are already materially different from those the Spillover Rate Hypothesis would have produced.
The Hippocampal Time Cell Theory, in its mixed space-time encoding formulation, proposed that a dedicated class of hippocampal neurons encoded elapsed time independently of spatial position, producing firing fields that shifted systematically across a temporal dimension. It held that the hippocampus was not merely a mapper of space but an active recorder of when, bridging location and duration within a unified neural substrate. The theory was widely adopted across memory research, temporal cognition, and episodic memory modelling, lending experimental support to the broader hypothesis that the hippocampus constructs an integrated spatiotemporal context for experience. Its decline began when investigators applied a velocity-integration line attractor model to the same datasets and found that the reported mixed selectivity and firing field shifts were reproduced in full without invoking any dedicated temporal encoding mechanism. The terminal demonstration was the model's capacity to account for the phenomenon entirely through velocity integration, rendering the time-cell construct explanatorily redundant.
The firing fields were real; the temporal interpretation of them was not.
The two-state allosteric model of aspartate transcarbamoylase proposed that the enzyme existed in two discrete quaternary conformations — a tense, low-activity state and a relaxed, high-activity state — and that cooperative substrate binding arose from a concerted shift between them. For more than five decades it served as the textbook illustration of allosteric regulation, adorning lecture slides in biochemistry courses worldwide and providing a satisfyingly binary framework for one of the most studied enzymes in metabolic regulation. Its decline began as cryo-EM, small-angle X-ray scattering, and modern crystallographic methods revealed that ATCase occupied not two states but a continuum of conformations, populating intermediate geometries that the classical model had no mechanism to accommodate. The terminal evidence was unambiguous: the enzyme's regulatory behaviour could be explained without recourse to a discrete T-to-R switch, and the two-state description was retired as a mechanistic account.
The field has moved from a light switch to a dimmer, and the data do not permit a return.
The belief that increased fluid intake prevented the recurrence of urinary stones was, for more than two decades, among the most confidently dispensed pieces of medical advice in nephrology and urology. It underpinned clinical guidelines worldwide, featured in virtually every patient discharge leaflet following a first stone event, and was regarded as so self-evidently sound that it largely escaped rigorous interrogation. Its logic was hydraulic and appealing: dilute the urine, reduce crystallisation, prevent stones. A randomised clinical trial testing a behavioural intervention designed to promote sustained increases in fluid intake found no reduction in recurrent stone events, striking directly at the guideline-level assumption that had sustained the recommendation for a generation.
The advice was easy to give, easy to remember, and, by the available evidence, not effective at the thing it was supposed to do.