Scientific Reasoning

Experimental Design Analysis

Tackle advanced challenges in experimental design by analyzing blinding procedures, operationalization decisions, ecological validity, randomization failures, and the replication crisis through detailed real-world research scenarios. You will build the ability to spot subtle methodological weaknesses that can invalidate even well-intentioned, well-funded studies and to evaluate whether a study's conclusions actually follow from its design.

Advanced | 20 min | Scientific Reasoning

Context

Why this exercise

Experimental design is the discipline of arranging an investigation so that the conclusion is licensed by the data rather than smuggled in through some other route. The advanced level is not about memorizing study types but about reasoning about which threats to validity are present in a given design, which confounders are controlled and which are not, and what kind of conclusion the design can and cannot support. This exercise drills the analytic moves that biostatisticians and methodologists use: identifying internal and external validity threats, recognizing the limits of natural experiments and quasi-experimental designs, and matching design strength to claim strength.

Before you start

The intellectual foundation for modern experimental design comes from Ronald Fisher's work on agricultural experiments at Rothamsted (1920s-1930s), which introduced randomization, replication, and factorial designs. Donald Campbell and Julian Stanley's 1963 monograph 'Experimental and Quasi-Experimental Designs for Research' identified the major threats to internal validity (whether the observed effect is really caused by the manipulated variable) and external validity (whether the effect generalizes beyond the studied conditions). The threats they catalogued (history, maturation, testing, instrumentation, regression to the mean, selection bias, attrition, and various interaction effects) remain the standard checklist for evaluating any experimental claim. Judea Pearl's causal inference framework adds graphical tools, directed acyclic graphs (DAGs), for working out which adjustments and which designs can identify a given causal effect.

Randomized controlled trials remain the gold standard because random assignment breaks, in expectation, the link between treatment and every confounder, measured or unmeasured, which controls threats that observational designs cannot address. But RCTs are not always possible (ethical, practical, or cost constraints) and not always ideal (artificial conditions can limit external validity). Quasi-experimental designs (interrupted time series, regression discontinuity, difference-in-differences, instrumental variables) let researchers approximate causal inference from observational data under specific assumptions. Natural experiments exploit random or as-if-random assignment that arose outside the researcher's control (a policy change, a natural disaster, a lottery). Each of these designs has specific strengths and specific failure modes, and the advanced skill is matching the design to the claim and recognizing when the claim outruns what the design can support.
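To make the confounding argument concrete, here is a minimal simulation sketch. Everything in it is an illustrative assumption rather than part of the exercise: the function name simulate_assignment, the hypothetical confounder "severity", the effect sizes, and the uptake probabilities. It compares the naive treated-minus-control difference when patients self-select into treatment against the same difference under a coin-flip assignment.

```python
import random
from statistics import mean


def simulate_assignment(n=20000, true_effect=2.0, seed=0):
    """Return (observational_estimate, randomized_estimate) of a treatment effect.

    Hypothetical setup: 'severity' is a confounder that worsens the outcome
    and, in the observational world only, also drives who gets treated.
    """
    rng = random.Random(seed)
    obs_treated, obs_control = [], []
    rct_treated, rct_control = [], []

    for _ in range(n):
        severity = rng.gauss(0, 1)   # confounder (assumed, not from the text)
        noise = rng.gauss(0, 1)

        # Observational world: sicker patients are more likely to be treated.
        takes = rng.random() < (0.8 if severity > 0 else 0.2)
        outcome = (true_effect if takes else 0.0) - 3.0 * severity + noise
        (obs_treated if takes else obs_control).append(outcome)

        # Randomized world: a coin flip decides, independent of severity.
        assigned = rng.random() < 0.5
        outcome = (true_effect if assigned else 0.0) - 3.0 * severity + noise
        (rct_treated if assigned else rct_control).append(outcome)

    observational = mean(obs_treated) - mean(obs_control)
    randomized = mean(rct_treated) - mean(rct_control)
    return observational, randomized


if __name__ == "__main__":
    obs, rct = simulate_assignment()
    print(f"true effect: 2.0  observational: {obs:.2f}  randomized: {rct:.2f}")
```

Under these assumptions the randomized estimate should land close to the true effect, while the self-selected comparison is pulled well below it because the treated group is sicker on average. The sketch only illustrates the mechanism; it says nothing about how large such bias is in any real study.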

Several specific advanced traps deserve named attention. The intention-to-treat principle says that participants should be analyzed in the groups they were assigned to, even if they did not comply with the intervention, because dropping non-compliers reintroduces selection bias. Per-protocol analysis (only analyzing those who completed the intervention as intended) often shows larger effects, but it forfeits the protection of randomization because completers can differ systematically from those who drop out. Subgroup analyses are notoriously underpowered and exposed to multiple-comparison problems: with 20 independent subgroup tests at a 0.05 significance level, the chance of at least one false positive is about 64%. Mechanistic claims (X causes Y because of pathway Z) require evidence specifically about the pathway, not just about the input-output relationship. As you work the scenarios, practice identifying the specific validity threats present in each design, matching the claim's confidence to the design's strength, and recognizing when the proposed conclusion outruns the warrant. For broader treatment of how this connects to scientific reasoning generally, see Scientific Thinking.
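The contrast between intention-to-treat and per-protocol analysis can be sketched with a small simulation. All specifics here are assumptions made for illustration: the function name simulate_trial, the hypothetical "frailty" variable, the dropout rule, and the choice to give the treatment no true effect at all so that any apparent benefit is pure selection bias.

```python
import random
from statistics import mean


def simulate_trial(n=20000, seed=1):
    """Return (itt_estimate, per_protocol_estimate) for a treatment with NO true effect.

    Hypothetical setup: frail patients have worse outcomes and, if assigned
    to treatment, are the ones who fail to complete it.
    """
    rng = random.Random(seed)
    assigned_treatment, assigned_control, completers = [], [], []

    for _ in range(n):
        frailty = rng.gauss(0, 1)             # assumed prognostic factor
        outcome = -frailty + rng.gauss(0, 1)  # outcome ignores treatment entirely

        if rng.random() < 0.5:                # randomized to treatment
            assigned_treatment.append(outcome)
            if frailty < 0.5:                 # frailer patients drop out
                completers.append(outcome)
        else:                                 # randomized to control
            assigned_control.append(outcome)

    # Intention-to-treat: compare groups exactly as randomized.
    itt = mean(assigned_treatment) - mean(assigned_control)
    # Per-protocol: keep only treatment completers, compare with controls.
    per_protocol = mean(completers) - mean(assigned_control)
    return itt, per_protocol


if __name__ == "__main__":
    itt, pp = simulate_trial()
    print(f"true effect: 0.0  ITT: {itt:.2f}  per-protocol: {pp:.2f}")
```

Under these assumptions the intention-to-treat estimate should sit near zero, while the per-protocol estimate shows an apparent benefit that comes only from having filtered out the frailer patients in one arm. Real trials are messier (controls can also deviate from protocol, and dropout is rarely a clean threshold), so treat this as a picture of the mechanism rather than a model of any particular study.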

Question 1 of 6 (17% complete)

A medical school is testing whether a new minimally invasive surgical technique for knee cartilage repair reduces recovery time compared to the standard open procedure. Patients are randomly assigned to receive either the new (n = 85) or standard (n = 87) surgery. Surgeons obviously know which technique they perform. The physical therapists evaluating recovery milestones (range of motion, weight-bearing ability, return to activity) also know each patient's surgical group. A biostatistician reviewing the protocol says the study needs partial blinding. What specific bias concerns her most, and what is the best feasible solution?