Optimizely: Lessons learned from running 127,000 experiments.
Optimizely's recently published "The Evolution of Experimentation" benchmark study, based on an analysis of over 127,000 experiments, provides invaluable insights to inform your A/B testing and experimentation program.
These are some interesting insights from the study:
💡 88% of tests don't win (this is why it's SO important to test: our intuitions about what will succeed are often wrong)
💡 Only a third of experiments test more than one variation, but experiments with more variations are 3x as impactful (i.e., we should run more A/B/C/D tests when possible)
💡 Tests that make significant changes to the user experience (pricing, discounts, checkout flow, data collection, etc.) are more likely to win and deliver higher uplifts.
💡 Experiments that include targeting are 16% more likely to win when compared to untargeted experiments.
💡 The median company runs 34 experiments per year. The top 3% of companies run over 500. To be in the top 10%, you need to be running 200 experiments annually.
Other key findings
Experimentation Win Rates and Company Practices
About 12% of experiments win on their primary metric, while 88% do not.
The median company runs 34 experiments per year, with the top 3% conducting over 500 annually.
Companies are increasing their experimentation velocity by 20% year over year.
Most experiment uplifts decrease to 80% of their initial value after a year, except for revenue-related uplifts, which retain 91%.
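To make that decay concrete, here is a minimal sketch with hypothetical numbers (the 10% starting uplift is an assumption; only the retention rates come from the study):

```python
# Illustrative sketch: how a measured uplift erodes over a year at the
# retention rates cited in the study (80% for most metrics, 91% for revenue).
# The 10% starting uplift is a hypothetical example, not a figure from the report.

measured_uplift = 0.10      # uplift observed when the experiment concluded
general_retention = 0.80    # most uplifts retain ~80% of their value after a year
revenue_retention = 0.91    # revenue-related uplifts retain ~91%

print(f"General metric after a year: {measured_uplift * general_retention:.1%}")  # 8.0%
print(f"Revenue metric after a year: {measured_uplift * revenue_retention:.1%}")  # 9.1%
```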
Experimentation Evolution and Strategies
Companies are transitioning from client-side testing to more mature experimentation frameworks, with feature experimentation growing to 36% of all tests since 2016 (see the sketch below).
Experiments involving more complex changes and multiple variations are more successful.
Advanced analytics and integrated Customer Data Platforms (CDPs) significantly enhance experimentation success.
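To illustrate the shift from client-side testing to feature experimentation, here is a minimal server-side sketch; the bucketing helper and experiment key are hypothetical and not tied to any vendor's API:

```python
# Minimal sketch of feature experimentation: rather than editing the page in
# the browser (client-side testing), the application itself branches on a
# variation assignment. get_variation() and the experiment key are hypothetical.
import hashlib

def get_variation(user_id: str, experiment_key: str, variations: list[str]) -> str:
    """Deterministically bucket a user into a variation by hashing."""
    digest = hashlib.sha256(f"{experiment_key}:{user_id}".encode()).hexdigest()
    return variations[int(digest, 16) % len(variations)]

def checkout(user_id: str) -> str:
    variation = get_variation(user_id, "one_page_checkout", ["control", "treatment"])
    if variation == "treatment":
        return "render one-page checkout"   # feature under test
    return "render multi-step checkout"     # existing behaviour

print(checkout("user-42"))
```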
Industry and Metric Variations
Win rates and experiment success vary across industries, influenced by experimentation maturity and metric selection.
The choice of primary metrics for experiments differs by industry, reflecting varying goals and priorities.
Team Performance and Experiment Design
Experimentation teams tend to maintain consistent performance over three years. Improvement requires altering research, creativity, and development processes.
High-impact experiments often involve substantial changes and multiple variations.
Greater complexity in experiments, such as multiple change types, leads to higher returns.
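As a concrete illustration of analyzing an experiment with multiple variations, here is a minimal sketch using a chi-square test over hypothetical conversion counts; neither the data nor the SciPy-based approach comes from the report:

```python
# Minimal sketch: checking an A/B/C/D experiment for any difference in
# conversion rate across variations with a chi-square test of independence.
# All counts below are hypothetical.
from scipy.stats import chi2_contingency

# rows = variations A..D, columns = [conversions, non-conversions]
observed = [
    [120, 4880],   # A (control)
    [135, 4865],   # B
    [160, 4840],   # C
    [150, 4850],   # D
]

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("At least one variation converts differently from the others.")
else:
    print("No significant difference detected across variations.")
```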
Micro-Conversion and Personalization
Focusing on micro-conversions (like search rate and add-to-cart rate) can lead to a higher experiment impact than solely targeting revenue.
Personalized experiments targeting specific user segments are 41% more impactful than general ones.
Resource Allocation and Traffic Models
Effective resource allocation, including developer time, is crucial. The most productive setup is running one experiment per developer per two-week sprint.
Machine-learning models like Stats Accelerator and Multi-Armed Bandit, which dynamically allocate traffic, significantly enhance experiment outcomes compared to standard A/B tests.
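For readers unfamiliar with dynamic traffic allocation, here is a minimal Thompson-sampling bandit sketch for a conversion metric; it illustrates the general idea behind a Multi-Armed Bandit, not Optimizely's Stats Accelerator, and the simulated conversion rates are hypothetical:

```python
# Minimal Thompson-sampling bandit: each visitor is routed to the variation
# most likely to be best, based on Beta posteriors over conversion rates.
# The "true" rates below are hypothetical and would be unknown in practice.
import random

variations = ["A", "B", "C"]
true_rates = {"A": 0.10, "B": 0.12, "C": 0.09}
successes = {v: 0 for v in variations}
failures = {v: 0 for v in variations}

for _ in range(10_000):                               # each iteration = one visitor
    # Sample a plausible conversion rate for each variation from its posterior.
    sampled = {v: random.betavariate(successes[v] + 1, failures[v] + 1)
               for v in variations}
    chosen = max(sampled, key=sampled.get)            # send traffic to the best sample
    if random.random() < true_rates[chosen]:          # simulate the visitor's outcome
        successes[chosen] += 1
    else:
        failures[chosen] += 1

for v in variations:
    total = successes[v] + failures[v]
    print(f"{v}: {total} visitors, {successes[v]} conversions")
```

Over time, traffic concentrates on the best-performing variation, which is how bandit-style allocation reduces the cost of showing visitors a losing experience compared to a fixed 50/50 split.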
Successful experimentation in digital commerce hinges on advanced analytics, complex experiment designs, focus on micro-conversions, personalization, and efficient resource allocation. These insights can guide executives to foster a culture of innovation and optimize their digital strategies effectively.
Check out the report
Dive in to start reading The Evolution of Experimentation research from Optimizely.