Scientific Reasoning Exercises
The scientific method, applied to everyday claims about how the world works.
Scientific reasoning is the discipline of forming and evaluating beliefs about how the world works using evidence, controlled comparison, and explicit acknowledgment of uncertainty. It is not the exclusive property of scientists; it is a general intellectual style that applies to medical decisions, policy debates, business strategy, and everyday questions like whether a new diet is actually working. The exercises in this category train the moves that distinguish disciplined empirical thinking from confident-sounding speculation: identifying what would falsify a claim, distinguishing correlation from causation, recognizing confounders, and evaluating sample size and selection effects.
The single most important move in scientific reasoning is asking what evidence would change your mind. Most people hold beliefs without being able to answer this question, which means the beliefs are not really empirical — they are commitments dressed up as conclusions. The exercises explicitly train the falsifiability question (Popper's contribution) along with its more modern probabilistic relatives: how much would this evidence shift my confidence, what is the strongest alternative explanation, where would I look to find disconfirming cases.
Beginner exercises focus on basic experimental design — independent and dependent variables, controls, blinding. Intermediate exercises introduce confounders, selection effects, and the difference between correlational and experimental evidence. Advanced exercises cover meta-analysis, replication, and the structural problems that have emerged in the recent reproducibility literature.
Why this skill matters
Most public discourse about empirical questions is technically poor. Studies summarized in headlines almost always overstate certainty; correlational findings are routinely reported as causal; selection effects in voluntary samples are usually ignored. People who can read past the headline and ask the structural questions — how was the sample selected, what was the control, was it blinded, has it been replicated — make systematically better empirical judgments. The skill compounds across decades of news consumption, health choices, and policy beliefs.
The exercises also have direct professional value. Anyone who reads research as part of their work — clinicians, product managers using A/B testing, analysts, consultants — benefits from internalizing the structural questions. The cost of a missed confounder in a real research project can be enormous; the cost of practicing on these exercises is small. Studies of scientific-reasoning instruction (Lehrer & Schauble; Klahr) show measurable transfer from focused practice to real-world judgment, particularly when the exercises emphasize explicit articulation of hypotheses and alternatives.
Common pitfalls
The reasoning errors these exercises specifically train against.
Treating evidence as proof
Even strong evidence shifts probability rather than proving claims absolutely. The discipline is updating confidence proportionally — moderate evidence supports moderate confidence, not certainty. Many science-communication failures come from presenting probabilistic findings as definitive.
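Proportional updating has a simple arithmetic form: multiply the prior odds by the strength of the evidence (the likelihood ratio). The sketch below uses an assumed, illustrative likelihood ratio of 3 — moderate evidence — to show that a 50% prior moves to 75%, not to certainty.

```python
def update_probability(prior_prob, likelihood_ratio):
    """Bayesian update in odds form: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior_prob / (1 - prior_prob)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# Moderate evidence (assumed LR = 3) moves a 50% prior to 75% --
# supported, not proven.
print(update_probability(0.5, 3.0))  # -> 0.75
```

The same function shows why repeated moderate evidence compounds: applying it twice (LR = 3 each time) takes 50% to 90%, which is how confidence legitimately grows across a body of studies.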
Ignoring base rates
A test that is 95% accurate sounds reliable, but its meaning depends on the prior probability of what it tests for. For rare conditions, even highly accurate tests produce mostly false positives. Base-rate neglect is one of the most common errors in interpreting empirical claims.
Confusing absence of evidence with evidence of absence
If a study did not find an effect, that may mean the effect is real but small, or that the study lacked power, or that the effect is genuinely absent. Distinguishing these requires looking at sample size and effect size, not just the headline conclusion.
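Statistical power makes this concrete: a simulation can show how often a study misses an effect that is genuinely there. The scenario below is an assumed illustration — a real but small effect (0.3 standard deviations) studied with only 20 subjects per group — using a normal approximation to the t cutoff.

```python
import math
import random
import statistics

def estimated_power(n_per_group, effect_size, trials=2000, seed=1):
    """Fraction of simulated two-group studies that reach |t| > 1.96
    when a true effect of the given size exists (normal approximation)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        b = [rng.gauss(effect_size, 1.0) for _ in range(n_per_group)]
        se = math.sqrt(statistics.variance(a) / n_per_group
                       + statistics.variance(b) / n_per_group)
        t = (statistics.mean(b) - statistics.mean(a)) / se
        if abs(t) > 1.96:
            hits += 1
    return hits / trials

# A real effect, but an underpowered study: most runs find "no significant effect".
print(f"Estimated power at n=20, d=0.3: {estimated_power(20, 0.3):.0%}")
```

With these assumed numbers the study detects the effect only around one time in six, so a null result says almost nothing about whether the effect exists.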
Overweighting single studies
Individual studies, even well-designed ones, frequently fail to replicate. The credibility of an empirical claim depends on the body of evidence, not on the most recent or most newsworthy study. The exercises explicitly train you to look for replication patterns.
How the exercises are structured
Each exercise presents a research scenario, study design, or empirical claim and asks a structural question: what is the dependent variable, what would falsify this hypothesis, which alternative explanation is strongest, what is the most likely confounder. The wrong answers reflect common misreadings — mistaking correlation for causation, ignoring selection bias, accepting an underpowered study at face value.
We rotate across domains deliberately. Medical examples train the skills that matter for personal health decisions; psychology examples train the skills relevant to most social-science journalism; physical-science examples train the methodological clarity that transfers to engineering and operations. The skill is generalizing across domains, not memorizing the conventions of one.
Where this skill applies
- Reading medical research and headlines. Most consumer health journalism is technically misleading even when not deliberately so. The exercises build the habit of reading past the headline to the methods.
- Running and interpreting A/B tests. Product managers, marketers, and engineers running experiments on real users routinely commit the same errors these exercises train against — peeking at results, ignoring variance, drawing causal conclusions from underpowered tests. Practiced scientific reasoning produces more reliable experimental decisions.
- Policy literacy. Most policy debates pivot on contested empirical claims. Knowing the structural questions to ask separates productive disagreement from talking past each other.
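The "peeking" error in A/B testing can be demonstrated with a simulation. The setup below is an assumed illustration: an A/A test (no real difference between groups) checked for significance after every batch of users, stopping at the first nominally significant result. Repeated looks inflate the false-positive rate well above the nominal 5%.

```python
import math
import random

def peeking_false_positive_rate(batches=10, batch_size=100, trials=2000, seed=7):
    """Fraction of A/A tests (no true difference) declared 'significant'
    when checked after every batch and stopped at the first |z| > 1.96."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(trials):
        a_sum = a_sq = b_sum = b_sq = 0.0
        n = 0
        for _ in range(batches):
            for _ in range(batch_size):
                x, y = rng.gauss(0, 1), rng.gauss(0, 1)
                a_sum += x; a_sq += x * x
                b_sum += y; b_sq += y * y
            n += batch_size
            var_a = a_sq / n - (a_sum / n) ** 2
            var_b = b_sq / n - (b_sum / n) ** 2
            z = (a_sum / n - b_sum / n) / math.sqrt(var_a / n + var_b / n)
            if abs(z) > 1.96:  # "significant" at this peek -- stop and ship
                false_positives += 1
                break
    return false_positives / trials

print(f"False-positive rate with 10 peeks: {peeking_false_positive_rate():.0%}")
```

With ten interim looks the error rate is typically in the 15–20% range rather than 5% — which is why disciplined experimenters fix the sample size in advance or use sequential-testing corrections.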
Frequently asked questions
Do I need to know statistics to do these exercises?
No. The exercises focus on the conceptual logic of scientific reasoning — design, controls, alternatives, replication — not on calculating statistics. The Probability & Statistics category covers the quantitative side. Most learners benefit from doing both, but you can start here without statistical background.
How is this different from probability and statistics?
Scientific reasoning is about the structural design and interpretation of empirical claims — how was this evidence produced, what are the alternative explanations, how should I update my belief. Probability and statistics is about the mathematical machinery for quantifying uncertainty. They are complementary: scientific reasoning frames the question, statistics answers it precisely.
What about social science and psychology research?
These fields face larger replication challenges than the physical sciences, which makes scientific reasoning especially important. The exercises include scenarios from psychology and sociology and explicitly cover the structural problems — small samples, selection effects, p-hacking — that produce unreliable findings in those fields.
Should I be skeptical of all scientific claims?
Calibrated, not skeptical. Some claims are extremely well-supported (germ theory, evolution, gravity); others are tentative (most single studies in social science). The discipline is matching your confidence to the strength of evidence, not defaulting to either trust or skepticism.
Further reading
Primary sources and reputable references for the concepts covered above.
- The Logic of Scientific Discovery, Karl Popper — Routledge
The foundational text on falsifiability as the demarcation criterion for empirical claims.
- Stanford Encyclopedia of Philosophy: Scientific Method — Stanford University
Comprehensive scholarly survey of philosophy of science from Aristotle through modern Bayesian approaches.
- Statistics Done Wrong, Alex Reinhart — No Starch Press
An accessible guide, freely available online, to the most common statistical errors in published research.
- The Book of Why, Judea Pearl & Dana Mackenzie — Basic Books
On causal inference — the modern framework for distinguishing correlation from causation.