A Statistical Significance Primer


About A/B Split-Testing.  Testing in a direct marketing environment is based upon principles which were originally developed by experimental social psychologists.

First you choose a single element to test. Then you select two representative samples of names or visitors (commonly called "cells") from a larger universe. You present both cells with exactly the same overall stimuli, but vary just one element of the stimuli presented to the second cell.

An Example. Suppose that you wanted to test a higher price against your existing permission email control offer. You would first have your online service bureau select two equally-sized, equally-representative test cells of names from the larger universe you want to email.

This is typically done using a method called "Nth" name testing, where, in the case of a 2-way split test, every other name is selected (sort of like how your Phys Ed teacher used to pick teams for gym class).
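As a rough illustration of the "Nth" name idea, here is a minimal Python sketch. The function name `nth_name_split` and the sample addresses are hypothetical, and the sketch assumes the list is not already sorted in a way that lines up with the alternating pattern (for example, a file that alternates between two sources).

```python
def nth_name_split(names, cells=2):
    """Deal names into `cells` groups, taking every Nth name for each group."""
    # names[i::cells] takes the name at position i, then every `cells`-th name after it.
    return [names[i::cells] for i in range(cells)]


# Hypothetical universe of six names, for illustration only.
universe = [
    "ann@example.com", "bob@example.com", "cat@example.com",
    "dan@example.com", "eve@example.com", "fay@example.com",
]

control_cell, test_cell = nth_name_split(universe)
print(control_cell)  # ['ann@example.com', 'cat@example.com', 'eve@example.com']
print(test_cell)     # ['bob@example.com', 'dan@example.com', 'fay@example.com']
```

With an odd number of names, the first cell ends up one name larger, which is negligible at realistic list sizes.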

To the baseline "Control" cell of names, you'd mail your existing email copy and/or HTML art, with a special URL and/or tag process to measure the click-through and conversion rate from this group.

To the Price Increase Test cell, you'd mail the exact same version, but with copy and art changes only where needed to promote the price increase. You'd also be careful to mail both test cells on the same day at the same time from the same server host.

What Does "Control" Mean?  By varying only a single test element and keeping all other components of the test the same, this process enables you to "control" for all other intervening variables. That's why the baseline cell in a test is called the "Control" cell.

This way, if there is a significant difference between the two groups' response rates, you can conclude that the difference was driven solely by the test element you varied (in this example, the price increase).

But to roll out, you need to be confident that this result is reliable and valid, meaning that if you repeated the test many times with the rest of the universe, you'd get a consistent difference between the two strategies (existing price vs. price increase). A result that meets this standard is commonly called "statistically significant".

Statistical Significance.  To be considered significant, the difference in response rate needs to meet a minimum threshold. The size of this threshold is based on a combination of the two cells' response rates, sample sizes and your desired level of confidence.
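The article does not say which calculation sets this threshold. One common choice is a pooled two-proportion z-test, sketched below in Python as an assumption rather than as the author's own method. The function names and the response counts are hypothetical.

```python
import math


def two_proportion_z_test(responses_a, size_a, responses_b, size_b):
    """Return (z, two-sided p-value) for the difference between two response rates."""
    rate_a = responses_a / size_a
    rate_b = responses_b / size_b
    # Pooled response rate under the assumption that both cells respond equally.
    pooled = (responses_a + responses_b) / (size_a + size_b)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / size_a + 1 / size_b))
    z = (rate_a - rate_b) / std_error
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def is_significant(p_value, confidence=0.95):
    """True if the difference clears the chosen level of confidence."""
    return p_value < 1 - confidence


# Hypothetical results: control pulls 2.0%, price increase pulls 1.5%,
# with 10,000 names mailed in each cell.
z, p_value = two_proportion_z_test(200, 10_000, 150, 10_000)
print(round(z, 2), round(p_value, 4))
print(is_significant(p_value, confidence=0.95))
```

All three inputs named in the text appear here: the two response rates and the sample sizes determine `z`, and the desired level of confidence determines how small the p-value must be.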

Level of confidence. Loosely speaking, the level of confidence is the number of times out of 100 that you could expect to see a difference in the same direction between the response rates of two test cells if you repeated the test. A 95% level is a common choice.

In our example, this means that the existing Control cell price would consistently pull better than the price increase. It does not guarantee how large the difference will be.

But Testing's Not Always Perfectly Scientific.  There are many reasons, however, why testing isn't always perfect. Sometimes you get a test result which doesn't hold up when you repeat the test.

This can be caused by a variety of factors, which are generally lumped into the category of "noise".  Sometimes the cause is a misunderstanding by the IT folks of what you're really trying to accomplish when you ask them to select and split the names.

Strategies to Minimize Your Risk.  If you're considering rolling out with a high-risk strategy:

1. Ramp Up Rollout Volume Slowly.  Roll out in several separate and increasingly large campaigns.

2. Repeat the Test.  In a few days you can confirm the original result. This is cheap insurance if you're paying a third party to let you mail to their opt-in permission email names.

A Final Note: "Beware of The Rollout Effect". When you roll out to a much larger universe, the difference in response rate compared to the old control is usually smaller than it was in the test. So we typically advise clients to calculate the new strategy's effect on their rollout P/L to see if it still pays for itself with only half the lift (a conservative scenario). This gives you a better idea of your potential downside.
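The half-lift check can be sketched in a few lines of Python. The article gives no P/L formula, so the model below is an assumption: it counts only the margin on extra orders minus any extra cost per name mailed. The function name, parameters and figures are all hypothetical.

```python
def incremental_rollout_profit(volume, control_rate, test_lift,
                               margin_per_order, extra_cost_per_name,
                               lift_retained=1.0):
    """Estimate the extra profit of the new strategy over the old control.

    test_lift is the relative lift seen in the test (0.20 means +20%).
    lift_retained is the share of that lift assumed to survive the rollout.
    """
    rollout_rate = control_rate * (1 + test_lift * lift_retained)
    extra_orders = volume * (rollout_rate - control_rate)
    return extra_orders * margin_per_order - volume * extra_cost_per_name


# Hypothetical rollout: 100,000 names, 2% control response, +20% lift in the test,
# $30 margin per order, and $0.05 extra cost per name for the new strategy.
full_lift = incremental_rollout_profit(100_000, 0.02, 0.20, 30.0, 0.05)
half_lift = incremental_rollout_profit(100_000, 0.02, 0.20, 30.0, 0.05,
                                       lift_retained=0.5)
print(round(full_lift), round(half_lift))  # 7000 1000
```

In this made-up case the new strategy still pays for itself at half the lift, though with a much thinner cushion. A negative half-lift figure would be a signal to ramp up more cautiously or to repeat the test first.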


About the Author.  Bill Baird is a subscription marketing consultant and trainer.  Baird Direct Marketing, Inc. is a full-service interactive direct response agency specializing in customer conversion and retention.