Testing, disproportionate returns and how to find them

“Hey, do you want some chocolate ice-cream?”
“Maybe after dinner tonight, but it’s 7am right now, so that’s a terrible idea”.

From “A vs. B”, to “everyone’s a winner”, this short series looks at what we’ve been doing wrong for so long, and how to start doing it right.

As a discipline, AB testing has been in use for well over a decade. Even personally, I’m in my sixth year of test development, having led multiple teams – that’s a long time to have been doing it wrong.

Let me explain

The premise of an AB test is that you have a variation of your thing (heading, banner, email, payment flow, etc.), that you think is going to be better than what you have today. So, you try out both, realise you were right, and spend the next week revelling in your genius. Or, you have more than one super-awesome idea, in which case you run an ABn test and give them all a shot.

At some point, we became savvy enough to question what exactly in our grand idea was really helping us, which gave Multivariate Tests (MVTs) their purpose in life. By testing variations of each component of our change, we could work out which bits were doing well, and which ones needed to be thrown out of the window at passers-by.

For example, the size of your button might not make much of a difference, but its colour and text might. After this revelation, though, we abandoned trying to improve the testing process for several years. These practices – AB, ABn and MVT testing – along with serving targeted content (often referred to as personalisation or segmentation), have formed the basis of all testing activities for as long as I’ve been doing it, and I’m sure you’ve all been there too.

So why is it so wrong?

While the process works, it can be hugely inefficient. Imagine running a test that daytime users respond to very well and evening users respond to terribly. If your test doesn’t pick up that variable, the two effects cancel each other out and the net result could be flat or even negative, despite a huge lift being there for the taking.

That kind of failure is exactly why our usual method of single-variable testing with no contextual consideration is extremely flawed.
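To put some numbers on that, here is a minimal sketch in Python. Every figure in it is invented purely for illustration, and the segment names and helper functions are hypothetical rather than taken from any real testing tool. It compares the lift you would report from the pooled result with the lift inside each time-of-day segment.

```python
# Hypothetical results, invented for illustration only.
# Each arm is (visitors, conversions); A is the control, B is the variation.
results = {
    "daytime": {"A": (5000, 250), "B": (5000, 350)},
    "evening": {"A": (5000, 250), "B": (5000, 150)},
}


def rate(visitors, conversions):
    """Conversion rate for one arm."""
    return conversions / visitors


def relative_lift(control, variation):
    """Relative change in conversion rate of the variation over the control."""
    return (rate(*variation) - rate(*control)) / rate(*control)


def pooled(results, arm):
    """Add up visitors and conversions for one arm across every segment."""
    visitors = sum(segment[arm][0] for segment in results.values())
    conversions = sum(segment[arm][1] for segment in results.values())
    return visitors, conversions


# What a context-blind AB test would report.
overall_lift = relative_lift(pooled(results, "A"), pooled(results, "B"))

# What the same data says once it is split by time of day.
segment_lifts = {
    name: relative_lift(segment["A"], segment["B"])
    for name, segment in results.items()
}

print(f"overall: {overall_lift:+.0%}")
for name, lift in segment_lifts.items():
    print(f"{name}: {lift:+.0%}")
```

With these made-up numbers the pooled test shows no lift at all, while the variation is 40% better for daytime users and 40% worse for evening users. Real data will rarely cancel out this neatly, but the mechanism is the same.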

Disproportionate returns…

To identify these previously hidden opportunities for disproportionate returns, we need a new approach. We need to reframe our question – moving from “What’s the best piece of content/functionality?” to “What’s the best piece of content/functionality for each user?”.

And, the better we know each user, the better equipped we should be to answer this question well. The alternative is no different to believing that all kids probably like milkshakes, only to find that the one you offer some to is Oreo intolerant.

…And how to find them

The theory is great, and whilst an MVT would tell us exactly what to show our people, how do we find out who to show it to? And when? Do we consider how they got to the page? Segmentation has been our only tool to achieve this in the past, but we’ve been limited (understandably) by our tools not being designed to generate thousands of tests for each idea we have.

Now, however, there is a new kid on the testing block. Slightly more advanced querying, or “Machine Learning” as it’s marketed, is the advancement we’ve needed to finally start testing well.

The key to all this is to accept that most ideas hold some value – we just need the right tools and the insight to know when micro-segments are performing well. Once armed with both, we can automatically push traffic to these segments for as long as appropriate – to both realise disproportionate returns and circumvent the need for the manual work that’s rendered this an impractical way of working to date.
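As a rough idea of what “automatically pushing traffic” could look like, here is a minimal sketch of a per-segment epsilon-greedy allocator. Epsilon-greedy is my choice for illustration only; it is one of the simplest approaches of this kind and not necessarily what any particular tool uses. The class name, the segment labels and the conversion rates in the simulation are all hypothetical.

```python
import random
from collections import defaultdict


class SegmentedEpsilonGreedy:
    """Illustrative per-segment epsilon-greedy allocator (hypothetical)."""

    def __init__(self, variants, epsilon=0.1, seed=None):
        self.variants = list(variants)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # stats[segment][variant] = [times shown, conversions]
        self.stats = defaultdict(lambda: {v: [0, 0] for v in self.variants})

    def _rate(self, segment, variant):
        shows, conversions = self.stats[segment][variant]
        return conversions / shows if shows else 0.0

    def choose(self, segment):
        # Keep exploring a little, so a segment can change its mind over time.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.variants)
        # Otherwise show whatever is currently converting best for this segment.
        return max(self.variants, key=lambda v: self._rate(segment, v))

    def record(self, segment, variant, converted):
        entry = self.stats[segment][variant]
        entry[0] += 1
        entry[1] += int(converted)


# Simulated traffic with made-up conversion rates per (segment, variant).
true_rates = {
    ("daytime", "A"): 0.05, ("daytime", "B"): 0.07,
    ("evening", "A"): 0.05, ("evening", "B"): 0.03,
}
allocator = SegmentedEpsilonGreedy(["A", "B"], epsilon=0.1, seed=42)
visitors = random.Random(0)
for _ in range(20000):
    segment = visitors.choice(["daytime", "evening"])
    variant = allocator.choose(segment)
    converted = visitors.random() < true_rates[(segment, variant)]
    allocator.record(segment, variant, converted)

for segment, per_variant in sorted(allocator.stats.items()):
    print(segment, per_variant)
```

The point of the design is that each segment keeps its own scoreboard, so a variation that wins in the daytime is free to lose in the evening without the two results cancelling out. A production system would need far more than this, such as handling uncertainty on small samples and deciding which segments to track in the first place.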

Through the next three posts in this series, we’ll look at how all this might work in practice, and also evaluate tools that claim to make it possible today.