Product Growth Experiments Part 3: Documentation
This is the third post about product growth experiments. This time we’ll cover the last part of the experimentation puzzle, and that’s documenting the results and scaling your experimentation environment. It might be the last part for now, but it’s definitely not the least important.
If you’ve landed here for the first time, I encourage you to read the first post of this series: Product Growth Experiments Part 1: Concept and Hypothesis.
The crucial role of documenting what you learn
Just as I mentioned in the second post on analyzing growth experiments, what isn't documented now will be lost next month. It's similar to a famous quote from Marty Cagan's book "Inspired": "If you're just using your engineers to code, you're only getting about half their value."
In experiments, if you’re not documenting what you’ve learned, you’re only getting 50% of the value from your experiments.
What’s the lost 50%?
Running experiments lets you minimize the risk of releasing something that will hurt your business or clutter the design without adding any value for the business or the user. With each hypothesis, we aim to improve the status quo. If it wins, great! If it's inconclusive or fails, we know we're not making things worse. That's the first 50%. Let's call it the "short-term benefit". Skipping the remaining 50% blocks you from increasing your experiments' success rate in the long term.
Experiments are the most effective tool for understanding the causality that drives your business. You learn which feature, design, wording, or flow really moves the needle. If you don't document what you've learned, you'll force your company, and yourself, to reinvent the wheel next month or quarter. Every. Single. Time. That's the lost 50%, and it wastes half of the effort you put into those weeks of developing and running the experiment.
What should you include in the documentation?
If you followed the first post while preparing the experiment, you already have most of the work done. This includes:
- The big hypothesis.
- The changelog.
- The screenshots.
- Defined success metrics and secondary KPIs/micro-conversions.
- The statistical analysis of the results and decisions, including screenshots of the results.
Now it's time to capture the additional things you've learned, such as why you observed the results you did and how the variant changed users' behavior.
Qualitative analysis of the results
The fastest and most efficient way is to watch users' screen recordings and heat maps, comparing behavior between the test and control groups. Some guidelines on what to look for:
- Are the key changes discoverable and prominent enough?
- Do users need more/less time to solve the task when compared to the control?
- Are there any design glitches or bugs that could affect the results?
- Do users behave oddly? For example, do they click on some buttons or elements a couple of times when once would be enough?
- Do users go back and forth often?
- Are there any significant changes to the user flow or their behavior?
Depending on the time available and the context of the experiment, the above checks can be done separately for mobile and desktop.
If you want to check how such analysis can be transferred into the next effective hypothesis and winning experiment, be sure to check this experiment’s case study.
Quantitative analysis of user behavior
When it comes to the quantitative analysis of the results, we already did the lion’s share of that in the previous post. Ideas and lessons learned from them should be documented here as hypotheses for follow-up experiments or quick fixes and improvements.
On top of that, it's always good to try to validate the qualitative observations you made earlier. If you noticed that a particular element got more clicks or use in the heatmaps or recordings, double-check it in the data so you don't draw conclusions from an unrepresentative sample.
Synthesizing two sources of information
At this point, you should already have a bunch of legit observations and thoughts on why the given variant won or lost. That knowledge should be put into the experiment report in the form of short and sweet bullet points. That’ll give anyone a better understanding of what works, what’s neutral, and what should be avoided in the future.
Following the practices described so far will set you up for getting the most out of your experiments.
Becoming an experiment documentation badass
If you’re aiming higher though, or if you’re a product leader, then there are a couple more things you could do to create a proper experimental environment within your company. The ideas below will be your tools to evaluate your overall optimization success rate, make product learnings more accessible, and engage your teams more.
When you’ve conducted several dozen experiments, then, even with proper reports, it’ll be hard to find specific insights quickly. That’s especially true given that various experiments can affect the same components of your product, e.g., navigation patterns, tooltips, social proof, etc.
A good way to solve this is to transform the list of reports into a simple database, where each report has a couple of predefined properties, for instance:
- device category (mobile/desktop/tablets)
- step of the funnel/app area
- link to visuals
- the metric you were optimizing for
- affected feature
- anything else that suits your company’s case
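As a rough sketch, such a database can be modeled as a list of typed records that you filter by property. The field names below mirror the list above but are assumptions for illustration, not a standard schema:

```python
from dataclasses import dataclass

# Hypothetical record structure for an experiment-report database;
# the property names are illustrative assumptions, not a standard.
@dataclass
class ExperimentReport:
    title: str
    device: str            # "mobile", "desktop", or "tablet"
    funnel_step: str       # e.g. "checkout", "onboarding"
    metric: str            # the metric the experiment optimized for
    feature: str           # affected feature or component
    outcome: str           # "win", "loss", or "inconclusive"
    visuals_url: str = ""  # link to screenshots

def find_reports(reports, **filters):
    """Return reports whose properties match all the given filters."""
    return [r for r in reports
            if all(getattr(r, k) == v for k, v in filters.items())]

reports = [
    ExperimentReport("Sticky CTA", "mobile", "checkout", "conversion", "cta", "win"),
    ExperimentReport("Social proof badge", "desktop", "checkout", "conversion", "badge", "loss"),
]

# e.g. all winning checkout experiments
checkout_wins = find_reports(reports, funnel_step="checkout", outcome="win")
```

In practice a Notion database, Airtable, or spreadsheet with the same columns does the job; the point is that each report becomes a filterable record rather than a free-form document.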
Back at volders.de, together with Alex Krieger, ex-CPO at Volders, and Jakub Linowski of GoodUI.org, we liked to base optimization ideas on first principles and behavioral patterns. Accordingly, we had an additional property for that, so we could check the effectiveness of different patterns. You can check the structure of this database in this Notion example.
When a database like this exists, it’s way simpler to find all of the knowledge you’ve gained about things like the checkout funnel, onboarding flow, etc.
The next tool could be an extension of that database or a single, simple spreadsheet. The aim here is to keep track of how effective your experimentation is. For example:
- What is the success rate?
- What is the average uplift we are generating?
- How long, on average, do we need to run the experiment to get significant results?
- How many experiments were inconclusive or canceled due to faulty tracking?
- What was the average uplift for a particular metric?
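The questions above reduce to a few aggregations over your experiment records. A minimal sketch, assuming each experiment is a record with hypothetical `outcome`, `uplift_pct`, and `days_run` fields:

```python
# Illustrative sample data; field names are assumptions for this sketch.
experiments = [
    {"outcome": "win",          "uplift_pct": 4.2,  "days_run": 21},
    {"outcome": "loss",         "uplift_pct": -1.5, "days_run": 14},
    {"outcome": "inconclusive", "uplift_pct": 0.3,  "days_run": 28},
    {"outcome": "win",          "uplift_pct": 2.1,  "days_run": 14},
]

# Success rate: wins as a share of experiments that reached a decision.
decided = [e for e in experiments if e["outcome"] in ("win", "loss")]
success_rate = sum(e["outcome"] == "win" for e in decided) / len(decided)

# Average uplift across winning experiments.
wins = [e for e in experiments if e["outcome"] == "win"]
avg_win_uplift = sum(e["uplift_pct"] for e in wins) / len(wins)

# Average runtime needed per experiment, in days.
avg_days = sum(e["days_run"] for e in experiments) / len(experiments)

# How many experiments never reached a decision.
inconclusive = sum(e["outcome"] == "inconclusive" for e in experiments)
```

A spreadsheet with one row per experiment and a handful of `COUNTIF`/`AVERAGEIF`-style formulas computes exactly the same numbers.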
You can check a sample spreadsheet I was using some time ago here.
If you want to engage your team or stakeholders more in the experimentation process, consider placing a bet on each experiment. Every person bets on whether a given experiment will win and what the expected uplift will be. The most accurate person wins something, and the entire team learns how to better evaluate possible uplift ranges.
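One simple way to score such a betting round: rank a correct win/loss call first, then break ties by distance to the actual uplift. A sketch with hypothetical per-person predictions (`predicts_win`, `expected_uplift_pct`):

```python
# Illustrative bets; names and fields are assumptions for this sketch.
bets = {
    "alice": {"predicts_win": True,  "expected_uplift_pct": 3.0},
    "bob":   {"predicts_win": False, "expected_uplift_pct": -1.0},
    "carol": {"predicts_win": True,  "expected_uplift_pct": 6.5},
}

def score_bets(bets, actual_win, actual_uplift_pct):
    """Rank bettors: correct win/loss call first, then closest uplift."""
    return sorted(
        bets,
        key=lambda name: (
            bets[name]["predicts_win"] != actual_win,  # wrong calls sort last
            abs(bets[name]["expected_uplift_pct"] - actual_uplift_pct),
        ),
    )

# Suppose the experiment won with a 4.2% uplift.
ranking = score_bets(bets, actual_win=True, actual_uplift_pct=4.2)
```

The exact scoring rule matters less than the ritual itself: writing predictions down before the results come in is what calibrates the team over time.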
Experimentation process summary
That wraps up the three-part experimentation series. We've covered how to create an experiment's concept and hypothesis, how to analyze the results, and why, without documentation, you lose half of the value of an experiment.
Since that's the last post on experimentation for some time, I'd love to hear about your experiences with experimentation and what information you found valuable in the last three posts. Be sure to leave a comment below!