Estimating Website Conversions: How Multiple Data Sources Reveal the Importance of Quality Over Quantity
For any organization in any industry, building a website these days is only half the battle — maybe less. It’s incredibly important to understand your site’s performance as well. Is it achieving your intended goal, whether it’s a purchase, a donation or just the simple act of providing contact information? There’s plenty of data available that can fill in the blanks, but it’s possible the usual approach leaves some empty.
Our theory is that incorporating more than just website-specific metrics can improve the ability to estimate conversion volume and gain a better understanding of your website’s overall performance. To explore this, we’ll use a regression-based approach with data from a website whose purpose is to facilitate charitable donations (although the concepts and methods are applicable to just about any site). All the data should be readily available via Google Analytics, internal sources and publicly available economic datasets. Beyond that, you just need to be able to click “export.” If by chance you don’t have a Google Analytics account for your website, visit Google.com/Analytics to learn more about starting one.
A note before we start
Before we explain the estimation process, keep in mind that the goal of this post is not to provide you with key metrics that, when improved, will increase your conversions. It’s possible to evaluate your site’s ability to facilitate conversions, just not solely through the methods explained here. This is because most webpage metrics provide insight into the quality of traffic reaching your site, but not necessarily into the site’s ability to bring about conversions.
In the case of our example, someone who is already interested in donating is inherently more likely to spend a greater amount of time on the website, visit more pages, return to the site at a later time and eventually donate. Seeing these metrics improve alongside increased conversion volume likely means your website attracts higher-quality traffic, but don’t assume a causal relationship exists. If done correctly, however, this method can provide powerful insight into the impact of a given marketing campaign, a major change to your website or other alterations to your business strategy or industry. OK, enough caveats. Let’s dig into the estimation process.
Begin with website-specific data
To kick things off, we’ll look at what’s possible using only website-specific data. For this example, we’ll include total users, returning users, bounce rate, average session duration, organic search traffic, page views and, of course, conversions — all of which should be available through Google Analytics. We chose these variables because they’re available on a daily basis and are used regularly to capture the majority of website activity that is likely to explain conversion fluctuations. Every website is different, however, so you may find that other metrics are of greater importance to your website and therefore should be included.
Using conversions as the dependent variable and the other metrics as the independent — or explanatory — variables, we can employ regression analysis and begin to get an idea of what helps shape conversion fluctuations. For the most part, the results suggest what you might expect: The bounce-rate variable has a negative relationship with conversions, while page views, organic search traffic, returning traffic and average session duration all display positive relationships. A slightly unexpected result of the first estimation is that total traffic appears to have a negative relationship with conversions. These results support the theory that website goals should focus on the quality of incoming traffic rather than quantity. You can see the relationships in Table 1.
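For readers who want to try this step themselves, here is a minimal sketch of the regression setup described above. The post doesn't depend on any particular software, so the use of NumPy's least-squares solver is our assumption, and every number below is synthetic: the variable names, the made-up data-generating process and the resulting coefficients are hypothetical and do not reproduce Table 1.

```python
import numpy as np

# Synthetic daily data. Every value is made up for illustration and
# has nothing to do with the results reported in this post.
rng = np.random.default_rng(0)
n_days = 365
total_users = rng.normal(1000, 150, n_days)
returning_users = 0.3 * total_users + rng.normal(0, 30, n_days)
bounce_rate = rng.uniform(0.3, 0.7, n_days)
page_views = 3.0 * total_users + rng.normal(0, 200, n_days)

# Assumed data-generating process, chosen only so the fit has a known answer.
conversions = (
    5.0
    - 0.02 * total_users
    + 0.15 * returning_users
    - 20.0 * bounce_rate
    + 0.004 * page_views
    + rng.normal(0, 2, n_days)
)

# Design matrix: a column of ones for the intercept, then one column
# per explanatory variable.
names = ["intercept", "total_users", "returning_users", "bounce_rate", "page_views"]
X = np.column_stack(
    [np.ones(n_days), total_users, returning_users, bounce_rate, page_views]
)

# Ordinary least squares: coefficients that minimize the squared error
# between predicted and actual conversions.
coefs, *_ = np.linalg.lstsq(X, conversions, rcond=None)
predicted = X @ coefs

for name, value in zip(names, coefs):
    print(f"{name:>16}: {value: .4f}")
```

With real data, you would replace the synthetic arrays with the daily columns exported from Google Analytics. A dedicated statistics package would also report standard errors and significance levels, which this sketch leaves out.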
These results also help illustrate the value of regression analysis over simple correlation analysis. The correlation coefficient for total traffic and conversions in this sample is 0.3, suggesting a slightly positive relationship. Making key business decisions based on this correlation alone could lead you astray. But through regression analysis, we can control for key variables that impact conversions and begin to get at the true relationship. As we continue to control for more variables, we will be able to determine just how robust these results really are.
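To make the difference between correlation and regression concrete, the sketch below builds a small synthetic dataset in which total traffic is positively correlated with conversions, yet its regression coefficient turns negative once returning traffic is controlled for. The data and coefficients are invented for illustration and are not the figures from our sample.

```python
import numpy as np

rng = np.random.default_rng(1)
n_days = 500

# Synthetic assumption: returning (higher-quality) traffic rises with
# total traffic, and conversions are driven by returning traffic.
total_users = rng.normal(1000, 150, n_days)
returning_users = 0.3 * total_users + rng.normal(0, 20, n_days)
conversions = (
    0.2 * returning_users - 0.03 * total_users + rng.normal(0, 3, n_days)
)

# Simple correlation ignores every other variable.
simple_corr = np.corrcoef(total_users, conversions)[0, 1]

# Regression estimates the total-traffic coefficient while holding
# returning traffic constant.
X = np.column_stack([np.ones(n_days), total_users, returning_users])
coefs, *_ = np.linalg.lstsq(X, conversions, rcond=None)

print(f"correlation(total_users, conversions): {simple_corr:.2f}")
print(f"regression coefficient on total_users: {coefs[1]:.4f}")
```

In this made-up example the correlation is positive only because total traffic moves together with returning traffic. Once the model accounts for returning traffic, additional low-quality visits are associated with fewer conversions.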
Next, incorporate internal data
With this simple model including only website-specific data, our average percent error between predicted values and actual values was 43.6%. Not a terrible start, but we can definitely get better. To take that next step forward, we can include variables from internal data sources to help capture business activities that might impact conversions. These variables could be flag variables (also called dummy or binary variables) that represent time periods, such as when campaigns or promotions were in place, when your company had a booth open at a trade show, when your website had technical issues or any other time-specific business activity. You can also include more traditional data, such as marketing spend, pricing or anything you believe might affect conversions. For this example, we’ll incorporate campaign data representing email, banner, social media and search engine campaigns. Depending on how you set up your campaigns, this data might be available in your Google Analytics account.
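As a rough sketch of how flag variables can be built from internal records, the snippet below marks each day with a 1 if a campaign was running and a 0 otherwise. The campaign names and date windows are hypothetical placeholders, not the campaigns analyzed in this post.

```python
from datetime import date, timedelta

# Hypothetical campaign windows (start and end dates are inclusive).
campaigns = {
    "email_campaign": (date(2015, 3, 1), date(2015, 3, 14)),
    "banner_campaign": (date(2015, 3, 10), date(2015, 4, 10)),
}


def campaign_flags(day, windows):
    """Return a 0/1 flag per campaign indicating whether it ran on `day`."""
    return {name: int(start <= day <= end) for name, (start, end) in windows.items()}


# One row per day, ready to be joined to the daily website metrics.
days = [date(2015, 3, 1) + timedelta(days=i) for i in range(45)]
rows = [{"date": d, **campaign_flags(d, campaigns)} for d in days]

for row in rows[8:11]:
    print(row)
```

Each flag column then enters the regression like any other explanatory variable, and its coefficient estimates the average change in daily conversions while that campaign was active.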
After controlling for these different campaigns, we find the coefficient for organic search traffic no longer suggests a statistically significant relationship with conversions, while the other website metrics maintained their relationships. We also find results that suggest the search engine and banner campaigns had a significant and positive impact on conversions, the social media campaign displayed a negative relationship with conversions, and the email campaign showed no signs of a significant impact. The negative relationship associated with the social media campaign is likely just picking up seasonal factors for which we haven’t controlled yet. As we add more relevant variables and improve the model, some of these relationships may not hold. You can find these relationships displayed in Table 2.
Finally, incorporate external factors
With the website-specific variables and some internal factors controlled for, we’ll incorporate a few external factors to capture macroeconomic conditions and seasonal effects. For seasonal effects, we included flag variables for month- and year-specific differences. We also included a variable to capture weekends because this involves daily data, and individuals could have a higher propensity to go through the donation process either on the weekends or during the work week. You should choose the seasonal variables to include based on the time specifics of your dataset and seasonal factors likely to impact your business.
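The seasonal flags described above can be derived directly from each day's date. The sketch below is one way to do it, under a few assumptions of ours: January serves as the baseline month, 2014 and 2015 are the flagged years (so any earlier year acts as the baseline), and Saturday and Sunday count as the weekend. One category is left out of each group because including every month or every year alongside an intercept would make the columns perfectly collinear.

```python
from datetime import date


def seasonal_flags(day, base_month=1, flagged_years=(2014, 2015)):
    """Build weekend, month and year flag variables for a single date.

    The baseline month and any year not listed in `flagged_years` get no
    column of their own; their effect is absorbed by the intercept.
    """
    # Monday is 0 and Sunday is 6, so 5 and 6 are the weekend.
    flags = {"weekend": int(day.weekday() >= 5)}
    for month in range(1, 13):
        if month != base_month:
            flags[f"month_{month:02d}"] = int(day.month == month)
    for year in flagged_years:
        flags[f"year_{year}"] = int(day.year == year)
    return flags


example = seasonal_flags(date(2015, 12, 5))
print({name: value for name, value in example.items() if value})
```

Which categories to treat as the baseline is a modeling choice. The coefficients on the remaining flags are then read relative to that baseline.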
Next, determine what macroeconomic and other external variables likely affect your industry and therefore should be controlled for. If you are in the auto industry, it might be important to include new-vehicle sales, used-car prices or metal prices. If your business is greatly affected by changes in weather, you might need to include that data. In the case of donations, an individual’s propensity to donate is likely impacted by their financial situation and outlook on how the economy will affect them moving forward. For this reason, we’ll include data on real disposable income from the Federal Reserve Bank of St. Louis and a Consumer Sentiment Index from the University of Michigan’s Surveys of Consumers.[i], [ii]
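Macroeconomic series like these are typically published monthly, while the website data is daily, so each day needs to pick up the value for its month. Here is a minimal sketch of that alignment. The index values are placeholders we made up, not published figures, and the column name is hypothetical.

```python
from datetime import date

# Placeholder monthly values keyed by (year, month). These are NOT real
# published figures; substitute the series you download.
monthly_index = {
    (2015, 1): 100.0,
    (2015, 2): 101.5,
}


def attach_monthly(rows, monthly, column):
    """Copy each month's value onto every daily row that falls in that month.

    Days whose month is missing from `monthly` get None, so gaps stay visible
    instead of being silently filled.
    """
    result = []
    for row in rows:
        key = (row["date"].year, row["date"].month)
        result.append({**row, column: monthly.get(key)})
    return result


daily_rows = [
    {"date": date(2015, 1, 30), "conversions": 12},
    {"date": date(2015, 2, 2), "conversions": 9},
    {"date": date(2015, 3, 1), "conversions": 7},
]
merged = attach_monthly(daily_rows, monthly_index, "macro_index")
for row in merged:
    print(row)
```

Because every day in a month shares the same value, a monthly series can only explain month-to-month movement in conversions, which is worth remembering when monthly flag variables are also in the model.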
After we incorporate the macroeconomic and seasonal variables, we begin to get a clearer picture of what might truly explain conversion volumes. The monthly variables helped capture the seasonal rise in donations that occurs toward the end of each year versus the slight drop-off in donations that occurs in the early months of each year. The yearly variables helped capture some of the impacts associated with an improving economy, as both the 2014 and 2015 variables displayed a positive and significant impact on conversion volume. Finally, the weekend variable helped capture the fall in donations that occurs on the weekends, as donors tend to complete the donation process during the work week. We didn’t find any significance involving the consumer sentiment and disposable income variables, likely due to the yearly and monthly variables picking up some of the effects associated with improving economic conditions.
And it all comes together
The addition of macroeconomic and seasonal data helped provide further evidence of the connection between website metrics and conversions, as the relationships found in the last iteration of the model held strong, with the exception of average session duration no longer showing a statistically significant impact. The consistent negative relationship between total traffic and conversions continues to provide evidence that increasing the quality of visitors should take precedence over increasing the sheer volume of visitors. You can see these relationships in Table 3.
With regard to the campaign variables, we found the search engine campaign no longer displayed a statistically significant relationship with conversions. This change likely occurred because the search engine campaign variable had been picking up seasonal effects before they were properly controlled for. The banner, social media and email campaigns, however, all displayed statistically significant and positive relationships with donations.
Through the addition of internal and macroeconomic/external variables, our model reduced the average percent error between model-predicted and actual values from 43.6% to 40.2%. Furthermore, for a conversion dataset that ranged from 0 to 445 with a standard deviation of 41 conversions, the average absolute error fell from 24.5 with just the website data to 22.3 with the internal and macroeconomic/external data in the model. It might not seem like much, but when trying to explain what drives conversions, a roughly 9% reduction in average absolute error can mean a lot. You can clearly see these improvements in a monthly transformation of the forecasted and actual values in Chart 1.
Beyond improvements realized by incorporating multiple data sources, this technique has provided us with insight into how changes in specific variables can help explain fluctuations in conversion volume. It’s now possible to use this information to help estimate the impact of marketing campaigns or key website changes moving forward, truly allowing data to drive marketing strategy.
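For reference, the two error measures and the size of the improvement can be computed as in the sketch below. One detail is our assumption: because daily conversions can be zero, the percent-error function skips days with zero actual conversions to avoid dividing by zero. The small example lists are made up; only the 24.5, 22.3, 43.6 and 40.2 figures come from the results above.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| across all days."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_percent_error(actual, predicted):
    """Average of |actual - predicted| / |actual|, expressed as a percent.

    Assumption: days with zero actual conversions are skipped, since the
    percent error is undefined for them.
    """
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("no days with nonzero actual conversions")
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in pairs) / len(pairs)


def relative_reduction(before, after):
    """Percent by which an error measure shrank."""
    return 100.0 * (before - after) / before


# Made-up example values, just to exercise the functions.
actual = [10, 20, 0, 40]
predicted = [12, 15, 3, 40]
print(mean_absolute_error(actual, predicted))
print(mean_absolute_percent_error(actual, predicted))

# Figures reported above: average absolute error and average percent error.
print(round(relative_reduction(24.5, 22.3), 1))
print(round(relative_reduction(43.6, 40.2), 1))
```

Measured this way, the drop in average absolute error from 24.5 to 22.3 is a reduction of about 9%, and the drop in average percent error from 43.6% to 40.2% is a reduction of about 8%.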
For updates on our blog and other information, follow us on Twitter: @misixanalytics
Data drives your business; data should drive your marketing. Find out how Misix helps businesses redefine their marketing in a data-driven world by visiting https://misix.com/process.
[i] Federal Reserve Bank of St. Louis, “Real Disposable Personal Income,” https://research.stlouisfed.org/fred2/series/DSPIC96.