The dangers of the “Google Analytics-powered Startup”

They did it again. With Google Analytics, Google has once again created a piece of free-ish enterprise software in a previously costly field (think Google Drive instead of Dropbox/Box or Google Docs instead of Microsoft Office) which has become so omnipresent that it’s hard to avoid. Not only is it free and easy to install and deploy, it’s also simple enough for CEOs to understand, and it has become the de facto standard for investor due diligence. But most of the time it’s completely wrong.

Let me start off by saying that I am not advocating that people stop using Google Analytics; in fact I use it avidly and consider myself a big fan. With this post, my mission is rather to provide a bit of perspective for the many current and prospective startup founders out there who base (or intend to base) their ‘data-driven work style’ on Google Analytics alone. In my world, Google Analytics should be seen as one advisor of many rather than ‘the one truth’, and here are a few reasons why:

1. Sampling makes most custom reports highly inaccurate

The free version of Google Analytics, which is by far the most popular due to the ~150,000 USD/year price tag of the Premium version, uses sampled data in order to limit the computing power needed to perform analysis and, to some extent, to make it difficult to track single users rather than performing purely quantitative analysis. It’s a bit like how Gallup can summarise Indonesians’ smartphone habits by calling 1,500 of them; it works fine if you’re looking for a general pattern, but it might skew the data if you’re looking for data about a tiny niche of smartphone users or if Gallup happened to call a disproportionate number of Nokia users that day.

Phew! Okay, what does that mean for startups relying only on Google Analytics? For the most generic analysis like Sessions and Pageviews it might be fine, but if you’re looking for very tiny subsets of data, things can get dangerous when you reach certain levels of traffic. This is because sampling applies to queries of more than 250,000 Sessions on a Google Analytics Property level. Here’s an example:

“Yearly overview of sales channels”

Let’s say that your Google Analytics Property includes your company’s e-commerce website and the attached blog, the latter of which receives most of your traffic. You’re looking to identify the most important revenue channels for the past year, a view with about 750,000 visits. In this case sampling would kick in and the report would be built from roughly a third of your total traffic, leaving the real data of the other 500,000 visits (66.7%) completely untouched. In reality this means that certain channels quite likely get too much revenue attributed, while others get way too little. It must really suck to be the channel manager for the latter.
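To make the example above a bit more tangible, here is a minimal simulation sketch. Everything in it is an assumption for illustration: the channel names, the session and order counts, the revenue range, and above all the idea that sampling is a uniform random draw of 250,000 sessions (Google's actual sampling method may well differ). It only shows how scaled-up sample totals can drift from the true totals, especially for small channels.

```python
import random

def build_sessions(seed=1):
    """Hypothetical property with 750,000 sessions, most of them non-buying blog traffic."""
    rng = random.Random(seed)
    # (channel, sessions, orders): all figures are made up for illustration
    mix = [("organic", 600_000, 3_000), ("paid", 120_000, 1_200), ("affiliate", 30_000, 60)]
    sessions = []
    for channel, session_count, order_count in mix:
        for i in range(session_count):
            revenue = rng.uniform(20, 200) if i < order_count else 0.0
            sessions.append((channel, revenue))
    return sessions

def true_revenue_by_channel(sessions):
    """Sum revenue per channel over every session (the unsampled truth)."""
    totals = {}
    for channel, revenue in sessions:
        totals[channel] = totals.get(channel, 0.0) + revenue
    return totals

def sampled_revenue_by_channel(sessions, sample_size, seed):
    """Estimate revenue per channel from a uniform random sample, scaled to the full population."""
    rng = random.Random(seed)
    sample = rng.sample(sessions, sample_size)
    scale = len(sessions) / sample_size
    totals = {}
    for channel, revenue in sample:
        totals[channel] = totals.get(channel, 0.0) + revenue * scale
    return totals

if __name__ == "__main__":
    sessions = build_sessions()
    truth = true_revenue_by_channel(sessions)
    for seed in (1, 2, 3):  # three different hypothetical "report runs"
        estimate = sampled_revenue_by_channel(sessions, 250_000, seed)
        for channel, actual in truth.items():
            error = (estimate[channel] - actual) / actual
            print(f"run {seed}  {channel:<9}  {error:+.1%}")
```

Running it a few times with different seeds should show the large channel staying fairly close to its true revenue, while the small affiliate channel swings noticeably in either direction.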

Google does a decent job of explaining the implications and extent of sampling in Google Analytics, but most people are not aware of its functional limitations. Here’s what I recommend:

Become aware. As long as you understand which reports are sampled and which are not, you can quickly figure out which data to trust completely and which to consider an indication only.

Narrow down the time cohorts. For most startups, the 250,000 sampling limit is only a problem when comparing quarterly, yearly etc. If this is the case, just narrow down the cohorts and compare the data in Excel afterwards.

Stick to the default reports. The issues outlined above typically hit harder when you rely on Google Analytics as a part of heavy and automated reports, such as a Data Warehouse, than if you just look at the online Analytics app.
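The ‘narrow down the time cohorts’ tip can be sketched in a few lines. This is only an illustration under stated assumptions: `split_date_range` is a hypothetical helper, not part of any Google library, the 250,000 threshold is the figure quoted in this post, and `daily_sessions` is a rough average you supply yourself. The year 2015 in the usage lines is an arbitrary example.

```python
from datetime import date, timedelta

SAMPLING_THRESHOLD = 250_000  # session limit quoted in this post

def split_date_range(start, end, daily_sessions, threshold=SAMPLING_THRESHOLD):
    """Split [start, end] into consecutive windows whose expected sessions stay within the threshold.

    daily_sessions is a rough average of sessions per day (an estimate, not measured here).
    """
    if daily_sessions <= 0:
        raise ValueError("daily_sessions must be positive")
    days_per_window = max(1, int(threshold // daily_sessions))
    windows = []
    window_start = start
    while window_start <= end:
        window_end = min(end, window_start + timedelta(days=days_per_window - 1))
        windows.append((window_start, window_end))
        window_start = window_end + timedelta(days=1)
    return windows

if __name__ == "__main__":
    # About 750,000 sessions per year is roughly 2,055 sessions per day.
    for window_start, window_end in split_date_range(date(2015, 1, 1), date(2015, 12, 31), 2_055):
        print(window_start, "to", window_end)
```

Each window can then be queried on its own and the results added up in a spreadsheet. Since real traffic is not evenly spread over the year, it is probably wise to pass a `daily_sessions` figure closer to your busiest days than to your average.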

2. There is almost always something completely wrong with the install

There is a common tendency in start-up culture, heavily backed by ‘tech savvy’ VCs and online marketing consultants, to consider quantitative data the best kind of data and to preach the benefits of data-driven performance loops where scheduled and ad-hoc reports based on a specific hypothesis lead to structured optimisation. And while this is great in theory (and sometimes in reality, too), it only really works when the data is correct. And it very rarely is.

While your average WordPress blog installation typically reports accurate data to Google Analytics, it all starts to get a bit more cluttered when your website grows in complexity, your site receives more traffic and you start relying more heavily on custom parameters and conversion tracking. In my own experience working with several large-scale clients, I have yet to see a completely reliable Google Analytics setup for a website with more than ~1 million monthly sessions. Here are a few examples of what can go wrong:

a) Traffic country attribution

Want to figure out the correlation between the marketing spend and the traffic in each of your company’s markets? That’s a great idea, but less so if your traffic is not tagged with the correct country. If your audience is corporate executives in MNCs, you might get hit by VPNs routing the traffic through the employer’s home country, and if you’ve got a complicated server setup, some of your traffic might not be attributed to a country at all and instead seemingly appear from a field in Ashburn, Virginia or the like.

b) Traffic conversion attribution

Want to figure out where your best leads or transactions come from? That’s a great idea, but before you go crazy with automated conversion reports based on specific channels, geographic locations or device types, remember that the data might not be accurate. As an example specific to e-commerce companies, some of your orders might not be recorded with the correct revenue type (gross or net? eh?), and internal or incorrect orders might carry a negative revenue figure which can hurt the aggregate channel/location/device performance. Mix multiple currencies into the pot and you’ve got a whole new level of headaches.
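One way to guard against the order problems described above is to clean exported order data before aggregating it. The sketch below is an assumption-heavy illustration: the order fields (`channel`, `revenue`, `currency`, `internal`), the `revenue_by_channel` helper and the exchange rates are all hypothetical, and the rates are not real market rates.

```python
from collections import defaultdict

# Hypothetical conversion rates to one reporting currency (USD); not real market rates.
RATES_TO_USD = {"USD": 1.0, "EUR": 1.1, "SGD": 0.75}

def revenue_by_channel(orders, rates=RATES_TO_USD):
    """Sum revenue per channel in USD, setting aside orders that would distort the totals."""
    totals = defaultdict(float)
    rejected = []
    for order in orders:
        rate = rates.get(order["currency"])
        # Set aside non-positive revenue, internal orders and unknown currencies.
        if order["revenue"] <= 0 or order.get("internal") or rate is None:
            rejected.append(order)
            continue
        totals[order["channel"]] += order["revenue"] * rate
    return dict(totals), rejected

if __name__ == "__main__":
    orders = [
        {"channel": "organic", "revenue": 100.0, "currency": "USD"},
        {"channel": "organic", "revenue": 200.0, "currency": "EUR"},
        {"channel": "paid", "revenue": -50.0, "currency": "USD"},  # incorrect order
        {"channel": "paid", "revenue": 80.0, "currency": "SGD", "internal": True},  # internal order
        {"channel": "paid", "revenue": 40.0, "currency": "SGD"},
    ]
    totals, rejected = revenue_by_channel(orders)
    print(totals)
    print(len(rejected), "orders set aside for manual review")
```

Returning the rejected orders instead of silently dropping them is a deliberate choice here: a sudden jump in their number is itself a hint that something in the tracking setup has broken.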

c) Traffic channel attribution

Most entry-level Google Analytics reporting starts with identifying which traffic channels perform well and matching that with cost figures from internal data sources. But what about all the data that just doesn’t get attributed correctly? Is an organic visit really an organic visit if it’s a brand-match visit from a user who just viewed an eDM from your company on their phone and then went back to their desktop? Is ‘not set’ traffic really people typing your URL into their browser, or might it be impossible-to-tag traffic from poorly coded mobile device browsers? And what about all that direct traffic? Good question.

As with the tips explained earlier in this post, the answer to the above questions is largely to manage your expectations and look at Google Analytics as an avenue of getting closer to the truth rather than the truth itself.

My point here is, to reiterate, not that start-ups should rely less on quantitative data from sources like Google Analytics. Rather, it’s about realising that insights come in many forms and resisting the urge to base all reporting and performance measurement on the channels that are the easiest to load into a spreadsheet.

The more you learn about the tool, the more you realise its strengths and weaknesses. That’s great when you use the tool yourself, but perhaps even more valuable when it gets used against you. If your manager bases your targets on Google Analytics, wouldn’t it be nice to be able to explain why the numbers don’t look as good as they ‘should’? I think so.

Further reading:

How to Solve Google Analytics Sampling: 8 Ways to Get More Data

Google: How sampling works

Google: About sampled data

Tableau: Sampled Data from Google Analytics

Do Things That Don’t Scale

