The counterfactual is an estimate of what would have happened in the absence of the program, and for suitable programs this can be a key element of the evaluation design.
Using a counterfactual is the most rigorous approach in the right circumstances and can provide strong evidence for program outcomes. The right circumstances are when it is feasible and ethical to have some form of valid comparison, and when it can provide results in time to be used. For these reasons the counterfactual should be considered during program design where possible, exploring options at the outset for creating a control group or a credible comparison group.
Designs that use a counterfactual rely on a control group or a comparison group. They include various experimental and quasi-experimental options, a wide range of which are described at the Better Evaluation link.
Some examples of methods for constructing a control group or comparison group are described below:
- Randomised controlled trials (RCTs) assign potential participants (or communities, or households) at random either to receive the program or to be in a control group (which receives no program or the usual program), and the outcomes of the groups are then compared. Random assignment reduces the risk of selection bias, though other threats to validity still need to be considered.
- Quasi-experimental design includes a comparison group, but one that is constructed administratively (such as a wait list) rather than randomly assigned. For example, with matched comparisons, participants (individuals, organisations or communities) are each matched with a non-participant on variables that are thought to have influenced program participation. Another option is regression discontinuity design, which uses an administrative cut-off point related to program entry to 'construct' a comparison group.
- Statistically created counterfactual – a statistical model, such as a regression analysis, is used to develop an estimate of what would have happened in the absence of a program. This can be used when the program is already at scale – for example, an outcome evaluation of the privatisation of national water supply services. Another option is propensity score matching (a form of matched comparisons), which creates comparable groups based on a statistical analysis of the factors that influenced people's propensity to participate in the program.
- Logically constructed counterfactual – in some cases it is credible to use baseline (pre-program) data as an estimate of the counterfactual. For example, where a water pump has been installed, it might be reasonable to measure the outcome by comparing the time spent collecting water before the program (from a distant pump) with the time spent after it, as there is no credible reason that the time taken would have decreased without the program. Process tracing can support this analysis at each step of the theory of change.
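
To make the first two approaches above more concrete, the following is a minimal sketch in Python rather than a recommended analysis. The function names (`assign_randomly`, `difference_in_means`, `matched_difference`), the single matching variable and all of the figures are hypothetical and chosen only for illustration. A real evaluation would match on several variables, test for statistical significance and draw on specialist advice.

```python
import random
from statistics import mean


def assign_randomly(unit_ids, seed=None):
    """Randomly split units into 'program' and 'control' groups of
    (as near as possible) equal size. Hypothetical helper for illustration."""
    rng = random.Random(seed)
    shuffled = list(unit_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        uid: ("program" if position < half else "control")
        for position, uid in enumerate(shuffled)
    }


def difference_in_means(outcomes, assignment):
    """Estimate the program effect as mean(program) minus mean(control).
    Under random assignment, the control group mean stands in for the
    counterfactual."""
    program = [outcomes[u] for u, group in assignment.items() if group == "program"]
    control = [outcomes[u] for u, group in assignment.items() if group == "control"]
    return mean(program) - mean(control)


def matched_difference(participants, non_participants):
    """Matched comparison on a single variable (an assumption made for brevity).
    Each argument is a list of (matching_variable, outcome) pairs. Each
    participant is paired with the non-participant whose matching variable is
    closest, and the outcome differences are averaged."""
    differences = []
    for value, outcome in participants:
        match = min(non_participants, key=lambda other: abs(other[0] - value))
        differences.append(outcome - match[1])
    return mean(differences)


if __name__ == "__main__":
    # Randomised design: assign ten hypothetical households.
    groups = assign_randomly(range(10), seed=1)
    print(sorted(u for u, g in groups.items() if g == "program"))

    # Matched design: (age, outcome score) pairs, all invented.
    participants = [(30, 20.0), (50, 30.0)]
    non_participants = [(31, 15.0), (49, 26.0), (70, 40.0)]

    naive = mean(o for _, o in participants) - mean(o for _, o in non_participants)
    print(naive)                                           # -2.0
    print(matched_difference(participants, non_participants))  # 4.5
```

In the invented matched example, the unmatched comparison suggests the program made things worse, because the non-participant group includes an older person with a high score. Comparing each participant with a similar non-participant reverses the sign, which is the kind of selection effect that a comparison group is meant to control for.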
Each method has advantages and disadvantages, and each rests on a different set of assumptions about program design and implementation. The choice of method should depend on factors such as what is most appropriate for the program, the availability of data, ethical considerations, and the time required for results to be available.
Investigating the counterfactual is always worthwhile, as it provides a stronger body of evidence for program outcomes. However, developing a credible counterfactual can be challenging in practice, so it is important to have access to appropriate technical advice.
In situations of rapid and unpredictable change, it might not be possible to construct a credible counterfactual. It might be possible to build a strong, empirical case that a program produced certain outcomes, but not to be sure about what would have happened if the program had not been implemented. In these cases an outcome evaluation can focus on the other two elements of causal analysis – the factual and ruling out other influences using the alternative explanations approach.
An outcome evaluation of a cross-government executive development program could not use a randomised control group, because randomly assigning people to a control group – or even to the program itself – was impossible. Nor could the evaluation use a comparison group, because the nature of the program was such that those accepted into it were systematically different to those who were not. Instead, the evaluation used other strategies for causal explanation, including attribution by beneficiaries, temporality and specificity (changes were in the specific areas addressed by the program). [Davidson, 2006, cited in Introduction to Impact Evaluation]