Measuring your content with user data
I wanted to build on Keri’s recent insightful post on measuring content strategy, and talk about possible ways to measure the effectiveness of content changes on e-commerce sites. More specifically, how do you select the best content if you have a variety of different alternatives, each with its own group of fans who want to get it on the site right away? Since choosing the voice of a web site can feel like such an abstract, arbitrary decision, how can we apply methodologically robust research methods to help make these decisions?
First, I would define “effectiveness” in this context as the optimization of the following three concepts:
- Do users understand what you are trying to tell them and what action they should take to be successful in their task?
- Are you invoking the desired emotions with your content?
- Does the proposed content result in higher conversion rates than other alternatives?
It’s so important to combine the user perception data (the first two bullets) with business metrics (the last bullet). In my experience, the only way for user experience professionals to effect change is to show the positive impact these changes have on engagement and revenue metrics.
It seems to me that you will be well served by using the following three methodologies to measure the relative effectiveness of different versions of the same content. This is also a really nice way to progressively reduce the number of alternatives down to the best solution:
- Usability testing. Start with several different versions of the content (~10), along with the current version (if it exists). Ask users in a lab setting what they understand the content to mean, and for any other thoughts they have on the way it sounds. This should help narrow down the alternatives to 4-6 possibilities.
- Desirability testing. Use the Desirability method, but adjust it for use in large sample online surveys by turning it into a between-subjects experimental design. In the survey, users are asked to rate the content on different brand and design attributes. This way you can determine what emotional response the content elicits from users. You’d also be able to ask users which version of the content they’d prefer, and why. This method has the added benefit of a large sample, which lets you check whether the differences you see are statistically significant.
- A/B testing. Once you’ve narrowed the alternatives down to two or three, live A/B testing can help you determine which of the alternatives performs better from a revenue or engagement perspective, by looking at differences that can be attributed purely to content changes. This is obviously easiest when the content is directly related to a revenue-generating task, like the call to action on a checkout page. But it’s not just about revenue: there are great ways to measure engagement with the page, and those metrics are just as powerful.
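To make the A/B testing step a bit more concrete, here is a minimal sketch of one common way to check whether a difference in conversion rate between two content versions is likely to be real: a two-proportion z-test. This is my own illustration, not a prescribed method. The function name and the visitor and conversion counts are hypothetical, and the test assumes reasonably large samples and that each visitor saw only one version.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of versions A and B (illustrative helper).

    conv_a, conv_b: number of conversions for each version
    n_a, n_b: number of visitors shown each version
    Returns (z, p_value) for a two-sided test of "the rates are equal".
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b

    # Pooled rate under the assumption that both versions convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("No variation in the data; the test is undefined.")

    z = (rate_b - rate_a) / std_err

    # Standard normal CDF via the error function, then a two-sided p-value.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: current copy (A) vs. proposed copy (B).
    z, p = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=150, n_b=1000)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value (0.05 is a common, if somewhat arbitrary, threshold) suggests the difference is unlikely to be chance alone. It says nothing about why the version performed better, which is where the usability and desirability data come in.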
Now, I can see two issues that make this a pretty difficult task, and they are the reason why the above three methods should not be used in isolation. In combination, they help tell the whole story.
- It is difficult to know what users really read on a page. In the first two methods you pretty much have to show people what to read, which doesn’t happen when they visit your site organically with no one looking over their shoulder. This is why A/B testing is so important: it gives you a sense of how behavior actually changes based on content.
- It is difficult to isolate the effect of content changes from the other influencing factors on a page. This is the really difficult part. How do you know that conversion/engagement improved because of the content and not because of some other factor on the page, like visual design changes? That is why it is important to keep the rest of the page exactly the same, and also why usability and desirability testing are important to bring out the perceptual data from users.
And the biggest problem is of course that this is an idealistic approach. Finding the resources/time/money to do this for every content change is obviously not feasible. But for major changes to the site, this approach could be well worth the investment.
This is also by no means the only way to measure content effectiveness, but I think it’s a good approach that balances methodological rigor against the danger of overdoing it. I’d be curious if anyone has any thoughts or ideas on how to improve on this approach…



