A New Method to Measure the Results of your A/B Tests

Shiraz Kuwailid
6 min read · Jun 12, 2021

A/B testing is exciting, rewarding and profitable, but it’s also tough because more than half of your tests (usually) fail to deliver a winner. However, many tests do deliver other valuable insights.

So the question is: Is there any method to measure their results so that we capture ALL the positive effects of our A/B testing efforts? I think so.

The Win Rate

Let us start by defining our problem. An A/B test can have three types of results:

  1. Won
  2. Tie / No clear winner / Inconsistent result
  3. Lost

When we talk about the “Win rate”, we mean the percentage of your total tests that produce a winner.

A/B test win rate

What win rate should you expect?

Did you know that the majority of companies and organisations have a win rate of around 30%? We might encounter very few (if any) that consistently deliver a win rate of over 50%.

Those who A/B test frequently and manage to sustain a high win rate tend to have highly optimised sites, often after years of continuous optimisation. This means that they have an even harder time producing new winners, because they have already squeezed most of the potential out of their sites.

Well, that’s not great to hear, and it feels a bit off: around two-thirds of what we A/B test doesn’t end up being a winner.

Now we come to the next step in the argument. We have all heard that:

“The only losing test is the one you didn’t learn anything from.”

Even if a test delivers a negative result to begin with, it can deliver valuable new insights, inspiring new tests, product feature development or something else that ultimately delivers a positive outcome for the business.

The New Method to Measure A/B Tests

At its core, our problem is about having clear outcomes at both extremes: clear winners and clear losers. But then we have a lot of uncertainty in the middle. So if we could just make some of the uncertain results clearer, a lot could be gained. We can break it down into 5 categories.

  1. Direct — A test result that shows a direct winner. A clear positive effect that we can implement immediately. This is the 30% or so that we count as winners in my previous model.
  2. Delayed — A test that doesn’t deliver a winner on the first try, but gives us ideas that we implement in follow-up tests until we eventually get a winner. All tests on the way to that final win are no longer counted as losers but as “delayed winners”.
  3. Discovery — Here we have a test that led to an insight that we then translated into a related positive result. It doesn’t have to be a follow-up A/B test, but really anything that originated in our test that we could then apply to create a positive effect — any positive effect.
  4. Dead loser — This is a loser that we failed to “raise from the dead”. No matter how we try to tweak and do follow-up tests, we never reach a winner. Nor have we been able to generate any other valuable insights from the tests. The tests don’t deliver any meaningful results and we don’t know why.

The first four categories describe the direct, immediate results of our tests.

Great, now the percentage of positive results has increased above 30%!

The final category is about what happens next.

Many organisations have problems implementing the positive test results they have created. This could be due to ongoing promotions or to how the development and QA backlog is prioritised. It can also be a sign of testing the wrong things, because what you are testing is “not implementable”. Therefore, the last category is about tests that were winners but were not implemented.

  5. Deferred — Direct winners that have not been implemented within the measurement period.
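To make the model concrete, here is a minimal Python sketch of the five categories and the new win rate. The names are my own, and counting everything except a Dead loser as a positive outcome is an assumption of the sketch, not a fixed rule.

    # A minimal sketch of the five-category model. Names are hypothetical.
    from enum import Enum

    class Outcome(Enum):
        DIRECT = "Direct"          # clear winner, implemented immediately
        DELAYED = "Delayed"        # fed follow-up tests that eventually won
        DISCOVERY = "Discovery"    # produced an insight we applied elsewhere
        DEAD_LOSER = "Dead loser"  # no winner and no usable insight
        DEFERRED = "Deferred"      # direct winner, not yet implemented

    # Positive outcomes under the new model (assumption: Deferred counts too).
    POSITIVE = {Outcome.DIRECT, Outcome.DELAYED, Outcome.DISCOVERY, Outcome.DEFERRED}

    def win_rate(outcomes):
        """Share of tests with a positive outcome under the new model."""
        if not outcomes:
            return 0.0
        return sum(1 for o in outcomes if o in POSITIVE) / len(outcomes)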

How does it work in practice?

This may seem complicated, so I have illustrated it with some examples.

Let’s say you want to test the copy on a call-to-action (CTA) button on your page.

Today it says “Start subscription”. You look through your heatmaps, and your hypothesis is that there’s room for improvement. So you want to test the text “Subscribe Now”. Another variant could be “Subscribe Today”.

You test it and get 5% fewer conversions than the original.

Now you might hypothesise that the CTA’s wording was a little too direct, and that you should try the opposite. Maybe visitors need more information before they make a decision. So you test “See more” and “Learn more”.

You try it and get 5% more conversions.

With the old way of measuring, the result would have been:
Test 1 — Loser
Test 2 — Winner
Win rate = 50%

With the new way of measuring, the result is:
Test 1 — Delayed
Test 2 — Direct
Win rate = 100%
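Scored with the sketch from earlier (hypothetical data, same assumptions), the comparison looks like this:

    # The CTA example above, scored with the sketch (hypothetical data).
    cta_tests = [Outcome.DELAYED, Outcome.DIRECT]  # test 1, test 2

    # Old model: only the Direct winner counts.
    old_rate = sum(1 for o in cta_tests if o is Outcome.DIRECT) / len(cta_tests)
    print(f"Old win rate: {old_rate:.0%}")             # 50%

    # New model: the Delayed test is credited as well.
    print(f"New win rate: {win_rate(cta_tests):.0%}")  # 100%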

Inspired by your success, you try out more A/B tests. You might make the contact form a little simpler by removing the field for the customer’s phone number, or by making it optional.

Your hypothesis here would be:

One less thing to fill in => Higher conversion rate

You test it and see — no difference!

You then decide to keep the field: there is no difference in conversion rate, but it is valuable to capture the customer’s phone number if it doesn’t cost you anything. This is a test result of the “Discovery” type: no direct result in the test itself, but a subsequent positive effect as a consequence of it.

With the old way of measuring, the result would have been:
Test 1 — Loser
Win rate = 0%

With the new method of measuring, the result is:
Test 1 — Discovery
Win rate = 100%

What we have done with this new “method of measuring” is to recategorise some of our previous losers so that they now count as real wins, instead of simply being written off as losers.

New goals for your A/B testing projects

Now we have a new way of measuring. So the next step is to set new goals for our work. Previously our goal was: “Highest possible win rate”. It still is — but we have a new way of measuring our winners.

But I think it gets even more interesting if you reverse the funnel. The highest possible win rate is the same as the lowest possible loss rate, and now our only losers are the “Dead losers”.

Think about it. Does it make any difference if you work on:

Maximising the number of “Direct winners” (according to our old measuring method).
OR
Minimising the number of “Dead losers”, i.e. the tests from which we get nothing at all.

I’m pretty sure that testing efforts pursuing these two different goals will look entirely different.

So your new goals are now:

Objective 1: Minimise the proportion of “Dead losers”

And the second goal is about your “implementation rate”, i.e. how many of your direct wins are actually implemented.

Objective 2: Maximise the percentage of implemented direct wins
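Reusing the same sketch, both objectives reduce to simple ratios. The quarter of test results below is made up purely for illustration:

    # Both objectives as simple ratios, reusing the categories above.
    from collections import Counter

    # A made-up quarter of test results, purely for illustration.
    quarter = [Outcome.DIRECT, Outcome.DIRECT, Outcome.DELAYED,
               Outcome.DISCOVERY, Outcome.DEFERRED, Outcome.DEAD_LOSER]
    counts = Counter(quarter)

    # Objective 1: minimise the share of Dead losers.
    dead_loser_share = counts[Outcome.DEAD_LOSER] / len(quarter)

    # Objective 2: maximise the share of direct wins that were actually implemented
    # (a Deferred result is a direct win still waiting for implementation).
    direct_wins = counts[Outcome.DIRECT] + counts[Outcome.DEFERRED]
    implementation_rate = counts[Outcome.DIRECT] / direct_wins if direct_wins else 0.0

    print(f"Dead loser share: {dead_loser_share:.0%}")        # 17%
    print(f"Implementation rate: {implementation_rate:.0%}")  # 67%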

I hope that this new way of measuring and tracking your successes can lead to more successes and, above all, a clearer way of communicating in your organisation and a better understanding of the optimisation work.

Please feel free to contact me if you agree, or more interestingly — disagree — or just want to discuss this A/B test measurement model.
