The ability to draw valid conclusions from any data-collecting tool obviously derives from that tool's thoroughness and accuracy. While there are technical limits to the real-world precision of online metrics (cookie-clearing by site visitors comes to mind), it is easy to shoot oneself in the foot by mis-installing or misconfiguring web analytics software.
This article is focused on the most common ways Google Analytics (GA) can be improperly set up, and tools and procedures to correct and preempt these mistakes, some of which are relevant to any other analytics suite. Indeed, while GA is easy to set up to collect basic data, more advanced features can be tricky to fully implement.
GA 101: accounts, trackers, domains
When debugging, rule out the easy stuff first, especially if you are not seeing any data at all. Before doing anything else, double-check that 1) the tracking code is indeed in your website's HTML source code, 2) you are using the right tracking code, 3) you are checking the right GA account in the application's settings, 4) GA is acknowledging that it is receiving data for that account, and 5) there are no "rogue" sites using your UA code out there.
These sound like obvious mistakes but they are common nonetheless, and easy to make if you are handling several Analytics accounts, or went through site migrations, domain name changes and other site "life events" that might have rocked the boat.
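The first two checks in the list above can be partly automated. Here is a minimal sketch, assuming you have the page's HTML source as a string and that property IDs follow the usual UA-XXXXX-Y shape; the function names, the sample page and the IDs are all made up for illustration.

```javascript
// Hypothetical helper: list every GA property ID (UA-XXXXX-Y) found in a page's HTML source.
function findTrackingIds(html) {
  const matches = html.match(/UA-\d{4,10}-\d{1,4}/g) || [];
  // Deduplicate while keeping first-seen order.
  return Array.from(new Set(matches));
}

// Compare what the page contains with the ID you expect to see in your GA account.
function checkTrackingId(html, expectedId) {
  const found = findTrackingIds(html);
  return {
    found: found,
    hasExpected: found.indexOf(expectedId) !== -1,
    unexpected: found.filter(function (id) { return id !== expectedId; })
  };
}

// Placeholder page source and IDs, for illustration only.
const samplePage =
  "<script>var _gaq = _gaq || [];" +
  "_gaq.push(['_setAccount', 'UA-12345-1']);" +
  "_gaq.push(['_trackPageview']);</script>";

console.log(checkTrackingId(samplePage, "UA-12345-1"));
console.log(checkTrackingId(samplePage, "UA-99999-2"));
```

An empty "found" list suggests the tracking code never made it into the page, while a non-empty "unexpected" list suggests the page reports to a different account than the one you are looking at.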
Cross-domain tracking is trickier than your typical single-domain setup, and is best done with several profiles and specialized filters. This site spells out steps to check such a case, including conversion tracking.
Are your analytics really broken?
Again in the spirit of avoiding a very time-consuming white whale chase, become familiar with explainable (and acceptable) sources of discrepancies between analytics solutions. Aim for functional, not "perfect." In Avinash Kaushik's words, data quality sucks so get over it. Of course this is not a call to settle for dysfunctional mediocrity, but the data-driven marketer needs to know when to keep pushing for better data, and when to accept what they have and move on.
Is Google Analytics full of fail?
It is easy to "blame the vendor," and sometimes your software provider is indeed guilty as charged. Before spending too many cycles on triple-checking your syntax, especially if you haven't changed your site recently, have a look at the official blog and the Google Analytics Status Dashboard. Google has admitted to some hiccups with its Analytics software, such as missing April 2011 data.
Conflicting Scripts and Apps
Occasional Google glitches aside, most of the time, when marketers are faced with dubious or missing data in their web analytics reports, it is their analytics implementation that's at fault rather than a bug in the analytics package itself. Many GA setups fail because of incomplete or buggy integration with the other software used to run a website, such as content management, ecommerce and payment systems:
- Tracking code conflict: make sure not to confuse GA code with Google Website Optimizer (GWO) code. GWO should use its own UA- tracker ID, and the function call to ga.js should appear just once on a page (after both GWO and GA tracking code, and yes, that snippet can be split from _gaq pushes). Also check that you don't have other scripts using variables named _gat or _gaq. Likewise, keeping the old log-based Urchin code alongside GA can cause problems.
- Obsolete CMS plugins that insert GA code for you can stop working, especially if they become out-of-sync with the underlying CMS (it's easy to upgrade a CMS while overlooking that plugins should be updated too). If you are using one, check they are up-to-date and read their changelogs, official blog or user community to see if other people are reporting problems similar to yours.
- If Google Analytics integration shows as disabled in your AdWords account, the likely reason is that you have not yet allowed data sharing between the two.
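The "ga.js should appear just once" rule and the leftover Urchin code mentioned in the list above are easy to check against a page's source. The following is a rough sketch rather than a full audit: the function name is hypothetical, and it only looks for the script file names in the raw HTML, so it will miss code injected at runtime.

```javascript
// Hypothetical audit of a page's HTML source for duplicate or legacy GA includes.
function auditGaIncludes(html) {
  const count = function (re) { return (html.match(re) || []).length; };
  const gaJsLoads = count(/google-analytics\.com\/ga\.js/g);
  const urchinLoads = count(/urchin\.js/g);
  const warnings = [];
  if (gaJsLoads === 0) warnings.push("ga.js is not referenced");
  if (gaJsLoads > 1) warnings.push("ga.js is referenced " + gaJsLoads + " times");
  if (urchinLoads > 0) warnings.push("legacy urchin.js is still present");
  return { gaJsLoads, urchinLoads, warnings };
}

// Illustrative page where a plugin and a template both inserted the loader.
const snippet =
  "<script>(function(){var ga=document.createElement('script');" +
  "ga.src='http://www.google-analytics.com/ga.js';" +
  "document.getElementsByTagName('head')[0].appendChild(ga);})();</script>";
const duplicatedPage =
  "<html><head>" + snippet + "</head><body>" + snippet + "</body></html>";

console.log(auditGaIncludes(duplicatedPage).warnings);
```

A duplicate usually means two mechanisms (say, a CMS plugin and a hand-edited template) are both inserting the code.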
Goals, funnels, and filters
If your goals are not being tracked (i.e. GA never reports any match) make sure your URLs are an exact match, or double-check your regular expressions, depending on how the goal rules have been written. Exact match is easier to use but less flexible and more brittle if you are going to change URLs or want to match a whole family of similar URLs.
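To see why exact match is brittle, here is a small simulation. It is a simplified stand-in for goal matching and not GA's actual implementation; the URLs, the rule objects and the matchesGoal function are invented for the example.

```javascript
// Simplified stand-in for goal matching; GA's own matching rules may differ in detail.
function matchesGoal(url, rule) {
  if (rule.type === "exact") return url === rule.value;
  if (rule.type === "regex") return new RegExp(rule.value).test(url);
  throw new Error("Unknown rule type: " + rule.type);
}

const exactRule = { type: "exact", value: "/checkout/thanks.html" };
// Tolerates a missing ".html" and an optional query string.
const regexRule = { type: "regex", value: "^/checkout/thanks(\\.html)?(\\?.*)?$" };

const visitedUrls = [
  "/checkout/thanks.html",
  "/checkout/thanks.html?order=42",
  "/checkout/Thanks.html",
  "/checkout/thanks"
];

visitedUrls.forEach(function (url) {
  console.log(url, {
    exact: matchesGoal(url, exactRule),
    regex: matchesGoal(url, regexRule)
  });
});
```

In this sketch the exact rule only fires on the first URL, while the regular expression also accepts the query-string and extension-less variants. Neither matches the capitalized "Thanks" variant, a reminder that casing matters too.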
Badly set up funnels can show incoherent data such as everyone leaving after an early step while your goal conversion does show conversion events further down the funnel. See this page on common funnel problems and solutions.
When running filters, check you understand their syntax. If you are using several filters, bear in mind they are executed one after the other and "feed" into each other. Using more than one Include filter can lead to data loss and should be done with caution.
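The data loss caused by stacking Include filters is easier to grasp with a toy model. The sketch below assumes filters run strictly in sequence, each one seeing only what the previous one let through; the hit objects, field names and applyFilters function are illustrative and not GA's internals.

```javascript
// Toy model: each filter receives only the hits that survived the previous one.
function applyFilters(hits, filters) {
  return filters.reduce(function (remaining, f) {
    return remaining.filter(function (hit) {
      const matched = f.pattern.test(hit[f.field]);
      return f.type === "include" ? matched : !matched;
    });
  }, hits);
}

const sampleHits = [
  { hostname: "www.example.com", path: "/blog/post-1" },
  { hostname: "shop.example.com", path: "/cart" },
  { hostname: "www.example.com", path: "/about" }
];

// Two Include filters on the same field: a hit must satisfy both, so nothing survives.
const twoIncludes = [
  { type: "include", field: "hostname", pattern: /^www\.example\.com$/ },
  { type: "include", field: "hostname", pattern: /^shop\.example\.com$/ }
];

// One Include filter with an alternation keeps both hostnames.
const oneInclude = [
  { type: "include", field: "hostname", pattern: /^(www|shop)\.example\.com$/ }
];

console.log(applyFilters(sampleHits, twoIncludes).length);
console.log(applyFilters(sampleHits, oneInclude).length);
```

The intent behind the two-filter version was probably "www or shop", but chained filters behave like "www and shop", which no single hit can satisfy.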
If you have already made sure your various traffic generation efforts (e.g. in email newsletters) embed the right campaign URL parameters, the next step is to verify that any redirects between the tagged link and the landing page work properly and keep those parameters intact. In larger teams it is advisable to use a centralized online document or spreadsheet to keep track of normalized campaign parameters.
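Both checks can be scripted once you know the tagged URL and the URL a visitor actually lands on. This sketch assumes the standard utm_source, utm_medium and utm_campaign parameters and treats all three as required, which is a choice made for the example; the URLs and function names are hypothetical.

```javascript
// Parameters this sketch treats as required on every tagged link.
const REQUIRED_PARAMS = ["utm_source", "utm_medium", "utm_campaign"];

// Which required campaign parameters are absent or empty on a URL?
function missingCampaignParams(url) {
  const params = new URL(url).searchParams;
  return REQUIRED_PARAMS.filter(function (name) { return !params.get(name); });
}

// Which parameters were present on the tagged link but changed or vanished after redirects?
function lostInRedirect(taggedUrl, landingUrl) {
  const before = new URL(taggedUrl).searchParams;
  const after = new URL(landingUrl).searchParams;
  return REQUIRED_PARAMS.filter(function (name) {
    return Boolean(before.get(name)) && before.get(name) !== after.get(name);
  });
}

// Illustrative URLs: the redirect to the "www" host drops the query string.
const taggedLink =
  "http://example.com/promo?utm_source=newsletter&utm_medium=email&utm_campaign=spring";
const landingPage = "http://www.example.com/promo/";

console.log(missingCampaignParams(taggedLink));
console.log(lostInRedirect(taggedLink, landingPage));
```

You would still need to obtain the final landing URL yourself, for example by clicking the link and copying the address bar.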
Site search, site overlay, site speed
Some GA features do not work if you rewrite URLs, so if you want to use site search or site overlay, make sure to use a separate profile from the one where you generate readable "fake" URLs. If your CMS allows it or if you can run custom server-side code (e.g. in PHP) you can also make GA believe you are using a search query parameter even if you are not.
Meanwhile, an easy oversight is that the newer site speed feature requires the addition of a line of code; it is not supported by the default code snippet.
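For reference, this is roughly what the extra line looks like in the async syntax. To the best of our knowledge the call is _trackPageLoadTime, but treat the method name as something to confirm against the current GA documentation; the UA-XXXXX-X ID is a placeholder.

```javascript
// Standard async queue; 'UA-XXXXX-X' is a placeholder property ID.
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-XXXXX-X']);
_gaq.push(['_trackPageview']);
// Extra line that opts the page in to site speed reporting (assumed method name).
_gaq.push(['_trackPageLoadTime']);
```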
Asynchronous tracking, ecommerce and custom variables
While CMS upgrades and migrations are a leading cause of analytics problems on accounts that used to work, moving to the newer async GA code has to be done with similar caution. Among specific problems with the latest generation of GA scripting:
- Stick to the exact spelling and casing of method names (e.g. _trackPageview) - they are case sensitive.
- Be careful to not have leading or trailing whitespace when you're pushing the tracking code.
- Pass along strings within quotes, but do not otherwise use quotes for other value types such as booleans.
- When coming from the older (synchronous) syntax, make sure you have converted everything, including ecommerce integration. Speaking of which, check that you do not have improperly escaped special characters or apostrophes getting in the way. See more on this topic.
- If you are using custom variables, verify that you are following these guidelines. Mixing page, session and visitor-level variables in the same slot is not recommended. Migration from the deprecated _setVar method to _setCustomVar should be done carefully. And while the dreaded "%20" bug was finally fixed in May 2011, this means filters need to be rewritten. Also, know that custom variables typically take longer to appear in GA than "regular" data - it can take up to 48 hours on very large sites.
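Several of the pitfalls listed above (casing, stray whitespace, quoted booleans) can be caught by a quick lint pass over the commands you push. The sketch below is illustrative only: lintCommand is a made-up helper, and KNOWN_METHODS is a short, incomplete list of ga.js method names chosen for the example.

```javascript
// Partial list of ga.js method names, for illustration; extend it to suit your setup.
const KNOWN_METHODS = [
  "_setAccount", "_trackPageview", "_trackEvent", "_setCustomVar",
  "_setDomainName", "_setAllowLinker", "_addTrans", "_addItem", "_trackTrans"
];

// Hypothetical linter for one _gaq command, e.g. ['_setAccount', 'UA-XXXXX-X'].
function lintCommand(command) {
  const problems = [];
  const rawName = String(command[0]);
  if (rawName !== rawName.trim()) {
    problems.push("whitespace around method name");
  }
  // Named trackers use a "name._method" prefix; only the method part is checked.
  const method = rawName.slice(rawName.lastIndexOf(".") + 1).trim();
  if (KNOWN_METHODS.indexOf(method) === -1) {
    const fix = KNOWN_METHODS.find(function (m) {
      return m.toLowerCase() === method.toLowerCase();
    });
    problems.push(fix ? "wrong casing, expected " + fix : "unknown method " + method);
  }
  command.slice(1).forEach(function (arg, i) {
    if (arg === "true" || arg === "false") {
      problems.push("argument " + (i + 1) + " is a quoted boolean");
    } else if (typeof arg === "string" && arg !== arg.trim()) {
      problems.push("whitespace around argument " + (i + 1));
    }
  });
  return problems;
}

console.log(lintCommand(["_trackPageView"]));
console.log(lintCommand(["_setAllowLinker", "true"]));
console.log(lintCommand(["_setAccount", " UA-XXXXX-X"]));
```

An empty result only means none of these particular mistakes were found. It does not prove the command is otherwise correct.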
Data sampling, and, dare we say, math
More complex queries involving custom reports or advanced segments do not retrieve exact data, but rather an approximation based on sampling. This is a feature, not a bug; otherwise some queries would be very slow and/or GA's price tag couldn't be $0. The user interface makes it pretty clear when sampling is at play - it will either mention sampling by name, or "fast access mode." As a side note, larger sites can also explicitly set up their tracker to use sampling - it's easy to have a look at your tracker code to verify whether yours is set up that way.
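Checking for tracker-side sampling boils down to looking for a _setSampleRate call in your tracking code. Here is a small sketch that does so on the snippet's source text; the findSampleRate helper and the sample snippets are invented for the example, and the pattern only covers the async _gaq.push form.

```javascript
// Hypothetical helper: return the configured sample rate (in percent), or null if none is set.
function findSampleRate(trackerSource) {
  const m = trackerSource.match(/_setSampleRate['"]\s*,\s*['"]?(\d+(?:\.\d+)?)/);
  return m ? Number(m[1]) : null;
}

// Illustrative snippets with placeholder IDs.
const sampledTracker =
  "_gaq.push(['_setAccount', 'UA-XXXXX-X']);" +
  "_gaq.push(['_setSampleRate', '80']);" +
  "_gaq.push(['_trackPageview']);";
const defaultTracker =
  "_gaq.push(['_setAccount', 'UA-XXXXX-X']);" +
  "_gaq.push(['_trackPageview']);";

console.log(findSampleRate(sampledTracker));
console.log(findSampleRate(defaultTracker));
```

A null result here means the snippet does not ask for sampling at collection time. Report-level sampling in the GA interface is a separate matter.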
And to fix what might look like issues coming out of the blue, such as apparently wild metric swings, it is also necessary to understand the math behind web analytics. As stated earlier, first you must find out whether it is your analytics that need to be fixed or the underlying facts being measured.
Auditing and support tools
Other troubleshooting resources include:
- ObservePoint's SiteAudit lets you run automated, recurring scans. [Edit: we are told EpikOne is no longer active, and while the SiteScan website still exists, it no longer processes submitted sites.]
- Analytics HealthCheck takes a different approach and looks at your GA data via the Export API to look for anomalies such as cross-domain problems.
- Regular expressions (regexp) can be very complex to write, and worse to troubleshoot. Typically they can be too broad and "greedy", matching more strings than their maker intended (on GA this can translate into too aggressive filters). These sites can help: RegEx ebook for GA, Regular-Expressions.info, RegExr, RegexPal, txt2re. However, no matter what, know that they are not for the faint of heart. Best trust your favorite Vulcan developer with them, unless you were looking for a new time-consuming hobby.
- Beyond the links included earlier in this article, Google has introduced an interactive help tool alongside the launch of GA v5.
- There used to be a forum just for troubleshooting within the official GA community but it was retired years ago. Most such discussion can now be found under "Set Up Your Tracking." As in all such communities, to improve your chances of getting a helpful answer, make sure to do your homework before asking any question.
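To illustrate the "too broad" regular expressions mentioned in the list above, compare a loosely written hostname pattern with a stricter one. The hostnames and patterns are made up for the example, and this uses JavaScript's regex engine, which may differ from GA's in some details.

```javascript
// Loose: the dot matches any character and nothing anchors the pattern.
const loosePattern = /example.com/;
// Strict: escaped dots, anchored at both ends, optional "www." prefix.
const strictPattern = /^(www\.)?example\.com$/;

const hostnames = [
  "example.com",
  "www.example.com",
  "example-com.net",
  "badexample.com",
  "example.com.evil.net"
];

hostnames.forEach(function (host) {
  console.log(host, {
    loose: loosePattern.test(host),
    strict: strictPattern.test(host)
  });
});
```

The loose pattern accepts all five hostnames, while the strict one accepts only the first two. In a filter, that difference decides whether traffic from unrelated hosts ends up in your reports.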
Limiting future breakage and discrepancies
To reduce the risk of disrupting existing GA code, try testing code tweaks on a separate test tracker first, thanks to GA's support for multiple trackers. This way you will not compromise your production data and you can iterate more freely on the test tracker until it works to your satisfaction.
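In the async syntax, a second tracker is addressed by prefixing method names with a tracker name. A minimal sketch follows, assuming a separate test property; both UA IDs are placeholders and the tracker name "test" is arbitrary.

```javascript
var _gaq = _gaq || [];
_gaq.push(
  // Production tracker (default, unnamed); placeholder ID.
  ['_setAccount', 'UA-XXXXX-1'],
  ['_trackPageview'],
  // Test tracker, addressed with the "test." prefix; placeholder ID.
  ['test._setAccount', 'UA-XXXXX-2'],
  ['test._trackPageview']
);
```

Experimental calls then go to the "test." tracker only, leaving the production property untouched until you are satisfied with the results.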
However, mistakes can still happen. To make your setup easier to audit, use the annotation feature to take note of GA implementation changes rather than just documenting changes with your website itself. Bigger sites are also hopefully committing code changes to a database or repository (ask your developers whether your organization is using something like Subversion or Git) which leaves another electronic trail to track down and possibly reverse analytics code issues. Either way, you need a system to help you assess whether changes reported by your analytics software are changes in the underlying real-world events, or changes in the measurements themselves.
If you are using a complex combination of tags from several vendors, "tag creep" can lead to a brittle setup that is more likely to break and harder to maintain. In that case consider a tag management solution or universal tag from vendors such as TagMan, BrightTag, UberTags, Tealium or Ensighten. (Clearly by the time online marketing became complex enough to justify tag consolidation software, the industry had already run out of good company names.) Read more on why a TMS may or may not be a good idea, and the obligatory Forrester whitepaper.
Finally, if you do extensive internal testing, you might want to block the IP addresses from which the test traffic is coming, and use conditional code to prevent GA from being loaded in test or staging environments. This is especially important if your development team is using browser emulation for regression testing, which can generate a high number of pageviews and events.
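The conditional loading can be as simple as a hostname whitelist. This is one possible sketch, assuming your production site lives on a known set of hostnames; the hostnames and the shouldLoadAnalytics function are placeholders.

```javascript
// Placeholder list of hostnames that should report to GA.
const PRODUCTION_HOSTS = ["www.example.com", "example.com"];

// Only production hostnames get the tracking code.
function shouldLoadAnalytics(hostname) {
  return PRODUCTION_HOSTS.indexOf(hostname.toLowerCase()) !== -1;
}

// In a browser, wrap the usual GA snippet in this guard.
// The typeof check keeps the sketch runnable outside a browser as well.
if (typeof window !== "undefined" && shouldLoadAnalytics(window.location.hostname)) {
  // The regular GA snippet would go here.
}

console.log(shouldLoadAnalytics("www.example.com"));
console.log(shouldLoadAnalytics("staging.example.com"));
console.log(shouldLoadAnalytics("localhost"));
```

A whitelist is used rather than a blacklist so that any new staging or test hostname stays untracked by default.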
Editorial note: we're trying occasional longer, more technical articles here on MarketingVox to help marketers with "rubber meets the road" operational issues that can involve your development and design teams, as well as the tools of trade. Please let us know how that works out for you and feel free to suggest topics you'd like to see treated in a similar fashion. Send your thoughts and any other feedback to olivier @ watershed-publishing.com