Segmenting: What works for whom

15 May 2017
Alex Gyani & Simon Raadsma


Tags: machine learning, segmentation, trial

Since 2013, the Behavioural Insights Unit (BIU) has been running Randomised Controlled Trials (RCTs) to test whether our ideas result in behaviour change in the real world. RCTs are the gold standard of evidence. They allow us to compare the effect of an intervention with what would have happened had we changed nothing.


However, RCTs often identify what works on average for a group of people without drilling down into what works specifically for whom. When we display results as a simple bar graph, we hide a lot of complexity. While this makes the information easy to understand, to create effective policies we need to know whether interventions work for different groups of people. Segmentation can help with this. Segmentation is the process of dividing people into subgroups based on defined characteristics, such as demographics or service use. While understanding what works for whom is helpful, doing segmentation well can be tricky.

In this blog, we will discuss two segmenting methods:

  • Looking at specific and pre-defined groups of people in a larger sample
  • Using machine learning algorithms

Other methods exist, but these two are the most relevant for the BIU.

Looking at specific and pre-defined groups of people in a larger sample

One way of doing segmentation is to include a broad group of people in a trial and look at whether the intervention has different effects on different people. In 2014, the BIU worked with St Vincent’s Hospital to encourage more people to show up to their outpatient appointments (PDF, 507 KB). The hospital was already sending reminder texts to its patients; however, even with these reminders, 14 per cent of people still missed their appointments. Each missed appointment cost the hospital $125, which added up to hundreds of thousands of dollars a year.

A lot of different people have hospital appointments. Within this sample, we wanted to look at whether there was a specific group of people that we should focus our attention on. Using available data, we found that:

  • people with frequent appointments were more likely to attend their appointments than those with infrequent appointments
  • people with morning appointments were more likely to attend than those with appointments in the afternoon.

So we decided to take a closer look at which messages worked for people with infrequent, morning appointments compared with people with frequent, afternoon appointments. We found that the Avoided Loss Message (“If you attend the hospital will not lose the $125 we lose when a patient does not turn up”) was effective for people with infrequent appointments.

The criteria we use to put people into groups are often arbitrary. For the trial above, we defined someone as a frequent attender based on whether they appeared in our trial more or less often than the median. We could instead have divided people into groups using the mean, or split them into those with one appointment per month, two appointments per month and so on. There are hundreds of ways you could investigate the effect of text messages on frequent attenders.
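To make this concrete, here is a minimal sketch of how the choice of cut-off changes who counts as a "frequent attender". The appointment counts and the `split_frequent` helper are hypothetical illustrations, not data or code from the trial.

```python
from statistics import mean, median

# Hypothetical appointment counts per patient (not data from the trial).
appointments = [1, 1, 2, 2, 3, 4, 6, 9, 15, 30]


def split_frequent(counts, cutoff):
    """Label a patient as frequent (True) if their count is above the cutoff."""
    return [count > cutoff for count in counts]


median_cut = median(appointments)  # 3.5
mean_cut = mean(appointments)      # 7.3

frequent_by_median = split_frequent(appointments, median_cut)
frequent_by_mean = split_frequent(appointments, mean_cut)

print(sum(frequent_by_median))  # 5 patients are "frequent" under the median rule
print(sum(frequent_by_mean))    # 3 patients are "frequent" under the mean rule
```

With the same ten patients, two equally defensible rules put two of them in different groups, so any comparison between "frequent" and "infrequent" attenders depends on which rule was chosen.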

Each of these hundreds of ways could give you different results. Testing all the possible combinations of groups and reporting only the significant ones is very bad practice (see this blog for more information). The best practice is to make a plan of which groups you are interested in before running your analysis, and stick to it. However, when this plan feels arbitrary, you might think to yourself: “surely there must be a more robust way of doing this?” Let’s look at some innovative new options.
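A rough way to see why testing many groups is risky is to work out how likely it is that at least one test comes up "significant" by chance alone. The sketch below assumes independent tests, no real effect in any group and a 5 per cent significance threshold. Subgroup tests on the same data are not truly independent, so treat the numbers as indicative only.

```python
def chance_of_false_positive(n_tests, alpha=0.05):
    """Probability of at least one false positive across n independent tests."""
    return 1 - (1 - alpha) ** n_tests


for n_tests in (1, 5, 20, 100):
    print(n_tests, round(chance_of_false_positive(n_tests), 3))
# 1 0.05
# 5 0.226
# 20 0.642
# 100 0.994
```

Under these assumptions, twenty subgroup comparisons already give roughly a two-in-three chance of at least one spurious "significant" result.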

Using machine learning algorithms

New methods are being developed to understand whether interventions work on different groups of people. We have recently been exploring machine learning techniques to segment groups. These use algorithms to decide which characteristics best predict how people will respond to an intervention. If you are interested in the details, read more here. In short, these techniques produce a decision tree, which can be used to work out what works for whom.
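The core idea can be sketched in a few lines. The example below is a simplified illustration, not the method we used: it searches for the single split that best separates people by how they respond to a message, which is one level of such a tree. The data, the `make_rows` helper and the field names are all hypothetical, and real methods of this kind add safeguards against overfitting that this sketch leaves out.

```python
def make_rows(age, appointments, treated, n_missed, n_total):
    """Build n_total hypothetical patient records, n_missed of which missed."""
    return [
        {"age": age, "appointments": appointments,
         "treated": treated, "missed": int(i < n_missed)}
        for i in range(n_total)
    ]


def effect(rows):
    """Missed-appointment rate of treated minus control; None if an arm is empty."""
    treated = [r["missed"] for r in rows if r["treated"]]
    control = [r["missed"] for r in rows if not r["treated"]]
    if not treated or not control:
        return None
    return sum(treated) / len(treated) - sum(control) / len(control)


def best_split(rows, features):
    """Find the feature and threshold whose two sides differ most in effect."""
    best = None
    for feature in features:
        values = sorted({r[feature] for r in rows})
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = effect([r for r in rows if r[feature] < threshold])
            right = effect([r for r in rows if r[feature] >= threshold])
            if left is None or right is None:
                continue
            gap = abs(left - right)
            if best is None or gap > best["gap"]:
                best = {"feature": feature, "threshold": threshold,
                        "left": left, "right": right, "gap": gap}
    return best


# Hypothetical data: the message helps infrequent attenders, harms frequent ones.
rows = (
    make_rows(30, 2, True, 1, 10) + make_rows(30, 2, False, 3, 10)
    + make_rows(60, 3, True, 2, 10) + make_rows(60, 3, False, 3, 10)
    + make_rows(30, 12, True, 4, 10) + make_rows(30, 12, False, 2, 10)
    + make_rows(60, 10, True, 4, 10) + make_rows(60, 10, False, 2, 10)
)

best = best_split(rows, ["age", "appointments"])
print(best["feature"], best["threshold"])  # appointments 6.5
```

On this made-up data the search picks the number of appointments rather than age, because that is where the effect of the message changes direction. A full tree repeats the same search within each resulting group.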

We applied these machine learning techniques to data from one of the trials we ran at St Vincent’s Hospital. The trial tested which version of a reminder text message was most effective (eight different reminder text messages were developed using different behavioural insights principles).

The machine learning spits out a decision tree (see below). It should be read from top to bottom, with the first number showing the likely impact for a specific group[1] and the second number showing the proportion of the sample made up by that group.

[Figure: decision tree produced by the machine learning analysis. Image not available.]

The node at the top of the tree shows that the best performing message (Avoided Losses to Hospital) reduces the likelihood of not showing up by 1.7 percentage points. It also shows that patients with eight or more appointments are 4.8 percentage points more likely to miss their appointment if they receive this message, compared to people who did not receive the message (the 18 per cent indicates the proportion of the sample that this group makes up). And if we look only at people under 38 with fewer than eight appointments, there is a large drop in non-attendance if they received the avoided losses to patient message compared to other messages[2].
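The counts in footnote [2] let us check the size of this drop. The sketch below works out the two non-attendance rates and their difference, plus a naive 95 per cent interval using the standard formula for two independent proportions. The interval is our own illustration and is an assumption-laden one: it ignores the fact that the algorithm chose this group after looking at the data, so it overstates how certain we can be.

```python
from math import sqrt

# Counts taken from footnote [2].
missed_message, total_message = 19, 192
missed_other, total_other = 245, 1345

rate_message = missed_message / total_message
rate_other = missed_other / total_other
difference = rate_message - rate_other

# Naive standard error for a difference between two independent proportions.
standard_error = sqrt(
    rate_message * (1 - rate_message) / total_message
    + rate_other * (1 - rate_other) / total_other
)
low = difference - 1.96 * standard_error
high = difference + 1.96 * standard_error

print(round(rate_message, 3))  # 0.099
print(round(rate_other, 3))    # 0.182
print(round(difference, 3))    # -0.083
print(round(low, 3), round(high, 3))
```

The difference of about 8.3 percentage points is close to the 8 percentage point example used in footnote [1].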

But what does this mean in real terms? Whenever we report the results from BI trials, we state that even a reduction of half a percentage point could mean thousands more people are turning up for their appointments when an intervention is scaled up. However, when you start to segment populations into smaller and smaller groups, two things happen. First, we can be less certain that the results are not spurious. Second, the group of people that the message can be scaled to gets smaller. There are still benefits, but they are likely to be in the hundreds, rather than the thousands.
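The arithmetic behind "hundreds, rather than thousands" can be sketched as follows. All of the numbers here are hypothetical and chosen only to illustrate the scale: the yearly appointment total, the segment's share and the effect sizes are assumptions, not figures from the trial.

```python
def extra_attendances(appointments, share_of_sample, reduction_pp):
    """Extra attended appointments when non-attendance falls by reduction_pp
    percentage points in a segment covering share_of_sample of appointments."""
    return appointments * share_of_sample * reduction_pp / 100


YEARLY_APPOINTMENTS = 400_000  # hypothetical figure for a scaled-up service

# A small average effect applied to everyone.
whole_population = extra_attendances(YEARLY_APPOINTMENTS, 1.0, 0.5)

# A larger effect applied only to a segment covering 2 per cent of people.
small_segment = extra_attendances(YEARLY_APPOINTMENTS, 0.02, 3.0)

print(round(whole_population))  # 2000
print(round(small_segment))     # 240
```

Even with an effect six times larger, the small segment yields far fewer extra attendances because so few people are in it.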

In short, we need to be cautious about how we interpret these results. It is possible that the results that we got were spurious and won’t be replicable. How we use these results requires judgement and preferably input from those who will be scaling up or refining the intervention.

Turning insight into action

By carefully using data (be it quantitative or qualitative) to understand an issue, we can use it to improve our interventions. However, to do this we need to know what works for whom, and knowing this is tricky. It is important that we don’t forget that while interventions work for most people, often they don’t work for everyone. Working out who would really benefit from your intervention beforehand is the best way of making your interventions as effective as possible.


[1] Negative numbers, such as -0.08, reflect a reduction in missed appointments of 8 percentage points. Positive numbers, such as 0.048, represent an increase in missed appointments of 4.8 percentage points.

[2] In the group of 1,537 people who had fewer than 7.5 appointments and were under 38, 19 of the 192 who received this message failed to show, compared to 245 of the 1,345 who received the other messages.