A/B Testing: How We Run and Interpret Experiments

A/B testing (also called split testing) is the process of showing two variants of a page or feature to randomly assigned groups of users and measuring which performs better on a chosen metric. Done correctly, it replaces opinion-driven design decisions with reliable evidence about what works for your specific audience.

How A/B Tests Work

Traffic is randomly split between variant A (the control, which is the existing version) and variant B (the challenger, which contains the change being tested). Both groups are measured on a success metric defined in advance (conversion rate, click rate, revenue per user). The test runs until it has collected the sample size planned before launch, and only then is the result checked for statistical significance.
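One common way to implement the random split is to hash each user's ID, so the same user always sees the same variant. The sketch below is illustrative only: the function name assign_variant, the experiment key, and the 50/50 default are assumptions for the example, not a description of any particular tool.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically assign a user to 'A' (control) or 'B' (challenger).

    Hypothetical helper: hashing the experiment name together with the user ID
    means a returning user stays in the same group, and separate experiments
    split traffic independently of each other.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    # Map the first 8 hex digits (32 bits) to a number in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return "B" if bucket < treatment_share else "A"


if __name__ == "__main__":
    for uid in ("user-1", "user-2", "user-3"):
        print(uid, assign_variant(uid, "new-checkout-button"))
```

Because the assignment depends only on the user ID and the experiment name, no lookup table is needed to keep a user's experience consistent across visits.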

Statistical Significance

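Statistical significance (typically p < 0.05, often described as 95% confidence) means that, if there were no real difference between the variants, a difference at least as large as the one observed would occur less than 5% of the time through random variation alone. It is not the probability that the result is a fluke. A test with insufficient traffic is underpowered: it will often fail to reach significance even when a real effect exists, and any difference it does show is unreliable however large it looks.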
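For a conversion-rate metric, one standard way to obtain a p-value is a two-sided, pooled two-proportion z-test. The sketch below uses only the Python standard library; the function name and the example counts are made up for illustration, and other tests (for example chi-squared or Bayesian methods) are equally valid choices.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, p_value) for the difference in conversion rate, B minus A."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Under the null hypothesis both variants share one conversion rate,
    # so the standard error is computed from the pooled rate.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("No variation in the data; the test is undefined.")
    z = (p_b - p_a) / se
    # Two-sided p-value: probability of a |z| at least this large by chance.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


if __name__ == "__main__":
    # Illustrative counts: 100/1000 conversions for A, 150/1000 for B.
    z, p = two_proportion_z_test(100, 1000, 150, 1000)
    print(f"z = {z:.2f}, p = {p:.4f}, significant at 0.05: {p < 0.05}")
```

The normal approximation behind this test assumes reasonably large samples in both groups; with only a handful of conversions an exact test is a safer choice.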

Common A/B Testing Mistakes

  • Stopping too early ("peeking"): Ending a test as soon as a difference appears, before the planned sample size is reached. Checking repeatedly and stopping at the first significant result dramatically inflates the false positive rate.
  • Testing too many things at once (multivariate confusion): Changing several things in a single variant makes it impossible to know which change drove the result.
  • Seasonal bias: Running a test over a period that is not representative (a bank holiday week, a major sale) produces results that may not hold under normal conditions.
  • Not measuring secondary metrics: A variant that increases sign-ups but decreases retention is not a win.
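The cost of peeking can be seen in a small simulation. The sketch below runs A/A tests, where both groups have the same true conversion rate, so every "significant" result is a false positive. It compares stopping at the first significant look with checking once at the end. All parameters (200 simulated tests, 10 looks, 100 users per group per look, a 10% conversion rate) are arbitrary assumptions chosen to keep the example fast.

```python
import random
from math import sqrt
from statistics import NormalDist


def is_significant(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Two-sided pooled z-test for a difference in conversion rates."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:  # no conversions (or only conversions) in both groups
        return False
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z))) < alpha


def simulate_peeking(n_tests=200, looks=10, users_per_look=100, rate=0.10, seed=0):
    """Return (peeking false positive rate, fixed-horizon false positive rate)."""
    rng = random.Random(seed)
    stopped_early = 0   # tests that looked significant at ANY look
    fixed_horizon = 0   # tests that looked significant at the FINAL look
    for _test in range(n_tests):
        conv_a = conv_b = n = 0
        any_look_significant = False
        final_significant = False
        for _look in range(looks):
            # Both groups draw from the same rate: there is no real effect.
            conv_a += sum(rng.random() < rate for _user in range(users_per_look))
            conv_b += sum(rng.random() < rate for _user in range(users_per_look))
            n += users_per_look
            final_significant = is_significant(conv_a, n, conv_b, n)
            any_look_significant = any_look_significant or final_significant
        stopped_early += any_look_significant
        fixed_horizon += final_significant
    return stopped_early / n_tests, fixed_horizon / n_tests


peeking_rate, fixed_rate = simulate_peeking()
print(f"Stop at first significant look: {peeking_rate:.1%} false positives")
print(f"Check once at the planned end:  {fixed_rate:.1%} false positives")
```

Because the final look is one of the looks, the peeking rate can never be lower than the fixed-horizon rate, and in practice it is usually several times higher than the nominal 5%.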

Our Approach

We define the hypothesis, success metric, and minimum detectable effect before starting any test. We use sample size calculators to determine required traffic. We report results with confidence intervals, not just point estimates.
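As a rough illustration of the last two steps, the sketch below estimates the sample size per variant for a conversion-rate test and computes a confidence interval for the observed difference. It is a simplified approximation, not the exact formula used by any particular calculator. The function names, the 80% power default, and the example numbers are assumptions made for the example.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, mde_abs, alpha=0.05, power=0.80):
    """Approximate users needed in EACH variant for a two-sided test.

    baseline_rate: current conversion rate, e.g. 0.10 for 10%
    mde_abs:       minimum detectable effect in absolute terms, e.g. 0.02
    """
    p1 = baseline_rate
    p2 = baseline_rate + mde_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / mde_abs ** 2)


def difference_confidence_interval(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Confidence interval for the conversion-rate difference, B minus A."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Unpooled standard error: each variant keeps its own observed rate.
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p_b - p_a
    return diff - z * se, diff + z * se


if __name__ == "__main__":
    n = sample_size_per_variant(baseline_rate=0.10, mde_abs=0.02)
    print(f"Users needed per variant: {n}")
    low, high = difference_confidence_interval(100, 1000, 150, 1000)
    print(f"Observed lift: 5.0 points, 95% CI: {low:.1%} to {high:.1%}")
```

Halving the minimum detectable effect roughly quadruples the required sample size, which is why the effect size needs to be agreed before the test starts rather than after the data comes in.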