A/B Test Significance Calculator

Statistical significance, confidence level, and winner determination.

How to use this calculator

  1. Enter visitors and conversions for variant A (control).
  2. Enter the same for variant B (treatment). Conversion can be any binary event: purchase, signup, click.
  3. The calculator runs a two-tailed z-test for two proportions and returns the confidence level.
  4. Look for confidence ≥ 95% before declaring a winner. Pre-commit to your sample size; do not stop early.

Calculation method

pA = convA / visA

pB = convB / visB

p_pool = (convA + convB) / (visA + visB)

SE = sqrt(p_pool x (1-p_pool) x (1/visA + 1/visB))

z = (pB - pA) / SE

p-value = 2 x (1 - normalCdf(|z|))

Confidence = (1 - p-value) x 100

The calculator uses an Abramowitz-Stegun rational approximation for the normal CDF (accurate to about 7 decimal places). For very small samples (fewer than 30 conversions per variant), consider Fisher's exact test instead.
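The formulas above can be sketched in Python as follows. This is a minimal illustration, not the calculator's actual source: the function and class names are hypothetical, and the choice of Abramowitz-Stegun formula 26.2.17 is an assumption, since the page does not say which variant of the approximation it uses.

```python
import math
from dataclasses import dataclass


def normal_cdf(x: float) -> float:
    """Standard normal CDF via Abramowitz-Stegun 26.2.17 (assumed variant)."""
    # Published coefficients for 26.2.17; absolute error is below 7.5e-8.
    p = 0.2316419
    b = (0.319381530, -0.356563782, 1.781477937, -1.821255978, 1.330274429)
    t = 1.0 / (1.0 + p * abs(x))
    pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    # Horner evaluation of b1*t + b2*t^2 + ... + b5*t^5.
    poly = t * (b[0] + t * (b[1] + t * (b[2] + t * (b[3] + t * b[4]))))
    upper = 1.0 - pdf * poly  # CDF evaluated at |x|
    return upper if x >= 0 else 1.0 - upper


@dataclass
class ZTestResult:
    z: float
    p_value: float
    confidence: float  # percent, i.e. (1 - p_value) * 100


def ab_significance(vis_a: int, conv_a: int, vis_b: int, conv_b: int) -> ZTestResult:
    """Two-tailed z-test for two proportions with a pooled standard error."""
    if vis_a <= 0 or vis_b <= 0:
        raise ValueError("visitor counts must be positive")
    if not (0 <= conv_a <= vis_a and 0 <= conv_b <= vis_b):
        raise ValueError("conversions must lie between 0 and visitors")

    p_a = conv_a / vis_a
    p_b = conv_b / vis_b
    p_pool = (conv_a + conv_b) / (vis_a + vis_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / vis_a + 1 / vis_b))
    if se == 0:
        # Nobody (or everybody) converted, so the test carries no evidence.
        return ZTestResult(z=0.0, p_value=1.0, confidence=0.0)

    z = (p_b - p_a) / se
    # Clamp at 1.0 to absorb the approximation error near z = 0.
    p_value = min(1.0, 2 * (1 - normal_cdf(abs(z))))
    return ZTestResult(z=z, p_value=p_value, confidence=(1 - p_value) * 100)


result = ab_significance(vis_a=1000, conv_a=100, vis_b=1000, conv_b=130)
print(f"z = {result.z:.3f}, p = {result.p_value:.4f}, confidence = {result.confidence:.1f}%")
```

For the sample inputs (10% vs 13% conversion on 1,000 visitors each), the confidence lands a little above the 95% threshold. The zero-variance guard is an added safety check, not something the page describes.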

Frequently Asked Questions

A result is statistically significant if the observed difference would be unlikely to occur by chance under the null hypothesis (that both variants perform identically). At the 95% confidence threshold, there is less than a 5% probability of seeing a difference at least this large if the variants were truly equal.
95% is the industry-default trade-off between false positives (calling a winner that is not real) and the sample size needed to reach adequate statistical power. 99% reduces false positives but requires much larger sample sizes, slowing iteration. For high-stakes decisions (pricing changes, redesigns), use 99%.
Rule of thumb: you need roughly 16 / (lift^2 x base_conversion) visitors per variant to detect a relative lift at 80% power and 95% significance, with lift and base_conversion expressed as fractions (10% = 0.10). For a 1% baseline conversion and a 10% relative lift target, that is 16 / (0.10^2 x 0.01) ≈ 160,000 visitors per variant.
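The rule of thumb is easy to turn into a helper. The sketch below uses hypothetical function names. The second function is an assumption about where the shorthand comes from: Lehr's rule, n ≈ 16 x p x (1 - p) / delta^2 with delta as the absolute difference, which reduces to the shorthand when the baseline rate is small.

```python
def visitors_per_variant(base_conversion: float, relative_lift: float) -> float:
    """Shorthand from the text: 16 / (lift^2 * base_conversion)."""
    if not 0 < base_conversion < 1 or relative_lift <= 0:
        raise ValueError("expected 0 < base_conversion < 1 and relative_lift > 0")
    return 16 / (relative_lift ** 2 * base_conversion)


def visitors_per_variant_lehr(base_conversion: float, relative_lift: float) -> float:
    """Lehr's rule with the (1 - p) term kept (assumed origin of the shorthand)."""
    if not 0 < base_conversion < 1 or relative_lift <= 0:
        raise ValueError("expected 0 < base_conversion < 1 and relative_lift > 0")
    delta = base_conversion * relative_lift  # absolute difference to detect
    return 16 * base_conversion * (1 - base_conversion) / delta ** 2


# 1% baseline conversion, 10% relative lift, both written as fractions.
print(round(visitors_per_variant(0.01, 0.10)))
print(round(visitors_per_variant_lehr(0.01, 0.10)))
```

The two estimates differ by the factor (1 - base_conversion), so they stay close at low baseline rates and drift apart as the baseline grows.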
Two-tailed tests check if variants differ in either direction (B is better OR worse). One-tailed tests check only one direction. Two-tailed is the safer default for A/B testing because it surfaces unexpected regressions. This calculator uses a two-tailed z-test.
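To make the difference concrete, here is a small sketch comparing the two p-values for the same z-score. The function names are hypothetical, and it uses Python's built-in `math.erfc` for the normal tail rather than the Abramowitz-Stegun approximation the calculator uses.

```python
import math


def one_tailed_p(z: float) -> float:
    """P(Z >= z): tests only the direction 'B is better than A'."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def two_tailed_p(z: float) -> float:
    """P(|Z| >= |z|): tests for a difference in either direction."""
    return math.erfc(abs(z) / math.sqrt(2))


z = 1.8
print(f"one-tailed p = {one_tailed_p(z):.4f}")
print(f"two-tailed p = {two_tailed_p(z):.4f}")
```

At z = 1.8 the one-tailed test falls below 0.05 while the two-tailed test does not, which is why the same data can look significant under one convention and not the other. For a negative z (B worse than A), the one-tailed p-value approaches 1, so a regression would go unflagged.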
Checking results repeatedly and stopping the first time confidence reaches 95% can inflate your false-positive rate from the nominal 5% to as high as 30%. Pre-commit to a sample size before starting; only declare significance once that target is reached. If you must monitor, use sequential testing methods designed for repeated looks.
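The effect of peeking can be checked with a small A/A simulation, where both variants share the same true conversion rate and every "significant" result is a false positive. This is an illustrative sketch: the function names, batch size, number of looks, and seed are arbitrary assumptions, and it uses `math.erfc` for the p-value rather than the calculator's approximation.

```python
import math
import random


def p_value(vis_a: int, conv_a: int, vis_b: int, conv_b: int) -> float:
    """Two-tailed pooled z-test p-value for two proportions."""
    p_pool = (conv_a + conv_b) / (vis_a + vis_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / vis_a + 1 / vis_b))
    if se == 0:
        return 1.0  # no variation, no evidence
    z = (conv_b / vis_b - conv_a / vis_a) / se
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_peeking(experiments=500, looks=10, batch=100, rate=0.10, seed=1):
    """Return (false-positive rate with peeking, rate with one final look)."""
    rng = random.Random(seed)
    peek_hits = 0
    final_hits = 0
    for _experiment in range(experiments):
        vis = conv_a = conv_b = 0
        flagged = False
        p = 1.0
        for _look in range(looks):
            # Both arms draw from the same true rate (an A/A test).
            vis += batch
            conv_a += sum(rng.random() < rate for _ in range(batch))
            conv_b += sum(rng.random() < rate for _ in range(batch))
            p = p_value(vis, conv_a, vis, conv_b)
            if p < 0.05:
                flagged = True  # a peeker would have stopped here
        peek_hits += flagged
        final_hits += p < 0.05  # significance at the pre-committed sample size
    return peek_hits / experiments, final_hits / experiments


peeking_rate, fixed_rate = simulate_peeking()
print(f"peeking: {peeking_rate:.1%}, fixed sample: {fixed_rate:.1%}")
```

The exact rates depend on the seed, the number of looks, and the batch size, but the peeking rate can never be lower than the fixed-sample rate because the final look is also one of the peeks.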

Business & SaaS Disclaimer

Statistical significance is necessary but not sufficient for business decisions. Consider effect size, segment heterogeneity, and long-term retention impact before rolling out a winner. SaaSCalcHub does not provide business or financial advice. Consult business advisors, CPAs, and consultants for your specific situation.

Last updated: May 26, 2026