Counterfactual approach in project evaluation

The counterfactual is an estimate of what would have happened without the program. For some programs, it can be a key element of the evaluation design.

Using a counterfactual approach is the most rigorous option in the right circumstances and can provide strong evidence for program outcomes. The right circumstances are when:

it is feasible and ethical to have some form of valid comparison, and
it can provide results in time to be used.

For the above reasons, the counterfactual should be considered during program design where possible. You can explore options at the outset for creating a control group or a credible comparison group.

Designs that use a counterfactual approach use a control or comparison group and include various types of experimental and quasi-experimental options.

Some examples of methods for constructing a control or comparison group include:

Randomised controlled trials (RCTs) assign potential participants (or communities, or households) at random either to receive the program or to be in a control group that does not receive it. The outcomes of the groups are then compared. Random assignment reduces the risk of selection bias, but other threats to validity still need to be considered.
Quasi-experimental design includes a comparison group, but one that is built administratively (such as a wait list) rather than randomly assigned. For example, participants (individuals, organisations or communities) are each matched with a non-participant on variables that are thought to have influenced program participation, and the outcomes of the two groups are then compared. Another option is regression discontinuity design, which uses an administrative cut-off point related to program entry to 'construct' a comparison group.
Statistically created counterfactual – a statistical model, such as a regression analysis, is used to develop an estimate of what would have happened in the absence of a program. This can be used when the program is already at scale – for example, an outcome evaluation of the privatisation of national water supply services. Another option is propensity score matching (a form of matched comparisons), which creates comparable groups based on a statistical analysis of the factors that influenced people's propensity to participate in the program.
Logically constructed counterfactual – it may be credible to use baseline data (pre-program) as an estimate of the counterfactual. For example, where a water pump has been installed, it might be reasonable to measure the outcome by comparing time spent getting water from a distant pump before and after the program, as there is no credible reason that the time taken would have decreased without the program. Process tracing can support this analysis at each step of the theory of change.
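
As a rough illustration of two of the methods above, the following Python sketch shows random assignment with a simple difference-in-means estimate (the RCT case) and a before-and-after comparison (the logically constructed case). All function names and figures are hypothetical and are not drawn from any real evaluation. A real design would also need sample size planning, uncertainty estimates and checks on other threats to validity.

```python
import random
from statistics import mean


def random_assign(unit_ids, seed=0):
    """Randomly split units into a program group and a control group.

    The seed is fixed only so the illustration is repeatable.
    """
    rng = random.Random(seed)
    shuffled = list(unit_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return set(shuffled[:half]), set(shuffled[half:])


def difference_in_means(outcomes, program, control):
    """Estimated effect: mean outcome in the program group minus the control group."""
    program_mean = mean(outcomes[u] for u in program)
    control_mean = mean(outcomes[u] for u in control)
    return program_mean - control_mean


def before_after_change(before, after):
    """Change from baseline, treating the baseline as the counterfactual.

    Only credible when there is no plausible reason the outcome
    would have changed without the program.
    """
    return mean(after) - mean(before)
```

For example, with made-up figures for minutes spent collecting water per day, `before_after_change([60, 50, 70], [20, 10, 30])` returns `-40`, a 40-minute reduction that is attributed to the program only because the baseline is assumed to be a fair estimate of what would have happened otherwise.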

Each method has advantages and disadvantages, and each rests on a different set of assumptions about program design and implementation. The method chosen should depend on a number of factors, such as:

what is most appropriate for the program
availability of data
ethical considerations
time required for results to be available.

Where it is feasible, investigating the counterfactual is worthwhile, as it provides a stronger body of evidence for program outcomes. However, you'll need appropriate technical advice, as developing a credible counterfactual can be challenging in practice.

In situations of rapid and unpredictable change, it might not be possible to construct a credible counterfactual. It might be possible to build a strong, empirical case that a program produced certain outcomes, but not to be sure about what would have happened without the program. In these cases, an outcome evaluation can focus on the other two elements of causal analysis – the factual and ruling out other influences using the alternative explanations approach.

Example

An outcome evaluation of a cross-government executive development program could not use a randomised control group, because randomly assigning people to be in a control group – or even participate in the program – was impossible. The evaluation also couldn't use a comparison group, because the nature of the program meant that those accepted into it were systematically different to those who were not. Instead, the evaluation used other strategies for causal explanation, including attribution by beneficiaries, temporality and specificity (changes were in the specific areas addressed by the program) (Davidson, 2006, cited in Introduction to Impact Evaluation).