3 Easy Steps to Filtering out Spam in your Google Analytics

Google Analytics is the pulse of your online business. Without it, you know nothing about the traffic coming to your website, your visitors' demographics, or how long they spend with you. It is arguably one of the most important parts of running your website (and your small business), so how do you keep it clean and keep out all of the spam that skews your results? With a few simple filters and something Google Analytics (GA) calls segments, you can do it quickly and easily.

1. View nice and clean reports

A great thing to do first is clean up your existing reports through the use of "segments". Let's implement a new segment that shows only true visits to our website from verified hostnames. Start by opening up your GA account and navigating to "Network" within the "Technology" branch of the "Audience" tree. On this tab, select the "Hostname" dimension just below the graph.

This is a list of all the hostnames that GA is including in your reports, and what we want to do is identify the ones that are valid. What tends to happen is that your site starts receiving "Ghost Spam", which doesn't come from a bot visiting your site but from hits injected directly into your analytics. Since these hits don't originate from anywhere and never actually load your web pages, their hostname often shows up as (not set) in your reports.

An easy and recommended way to filter out this traffic is to simply find the hostnames that you recognize and make a regex list of them. With the example above, I only want to track the traffic that directly hits my website and its subdomains. So my regex list would look like this:

emeentsmedia\.com  

But if you wanted to include additional hostnames, you would separate them with the OR operator, which is written as a bar, |, like so:

emeentsmedia\.com|testthis\.com|anotherexample\.com  

In regex the dot is a special character, so precede it with a backslash, \. A hyphen only has special meaning inside square brackets, but escaping it does no harm. Don't include any spaces, and don't start or end your list with |. Lastly, because the pattern contains only our domain name, emeentsmedia\.com, it also matches any subdomains.
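If you'd rather not hand-type the escapes, here is a minimal Python sketch of the rules above. The helper names build_hostname_pattern and check_pattern are made up for illustration, and it assumes GA treats the pattern as a partial match, which is what re.search imitates.

```python
import re


def build_hostname_pattern(hostnames):
    """Join hostnames into one regex list, escaping special characters."""
    cleaned = [h.strip() for h in hostnames if h.strip()]
    return "|".join(re.escape(h) for h in cleaned)


def check_pattern(pattern):
    """Return the formatting problems described above (empty list means OK)."""
    problems = []
    if " " in pattern:
        problems.append("contains a space")
    if pattern.startswith("|") or pattern.endswith("|"):
        problems.append("starts or ends with |")
    return problems


pattern = build_hostname_pattern(
    ["emeentsmedia.com", "testthis.com", "anotherexample.com"]
)
print(pattern)
print(check_pattern(pattern))

# A hand-typed list with a stray space and a trailing bar gets flagged.
print(check_pattern("emeentsmedia\\.com| testthis\\.com|"))

# Assumption: partial matching, so a subdomain matches the bare domain.
print(bool(re.search(pattern, "blog.emeentsmedia.com")))
```

Because the match is partial, a hostname that merely contains your domain would also match, so keep an eye on the hostname report for lookalikes.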

Now that we have our list, let's head back to our "Audience" overview branch and create a new segment, which will filter out anything that isn't in our regex list. At the top of the overview, click + Add Segment, and in this new menu click + New Segment. Give your segment a name, then click "Conditions" under "Advanced". We now want to specify that we are including only hostnames that match our regex list:

Click save and your report should now be displaying only the genuine traffic that you've specified!

2. Preventing spam and ghost spam with filters

Using the same regex list you created above, now click on the filters option within the "View" column of your "Admin" panel. Create a new filter named "Include Valid Hostnames" by clicking the + Add Filter button at the top.

You want to define the "Filter Type" as "custom", check the "include" option, and then set the "Filter Field" to Hostname. In the "Filter Pattern" box paste in your regex list, and then verify the filter. If everything checks out, click save and you will no longer track ghost spam!
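To get a feel for what the include filter will keep before you save it, you can try the pattern on a few sample hits. This is only a rough sketch: the hit dictionaries and the function include_valid_hostnames are invented for illustration, and the partial matching done by re.search is an assumption about how GA applies the pattern.

```python
import re

# The single-domain regex list from step 1.
VALID_HOSTNAMES = re.compile(r"emeentsmedia\.com")


def include_valid_hostnames(hits):
    """Keep only hits whose hostname matches the regex list."""
    return [hit for hit in hits if VALID_HOSTNAMES.search(hit["hostname"])]


# Made-up sample data: two genuine hits and two ghost spam hits.
sample_hits = [
    {"hostname": "emeentsmedia.com", "page": "/"},
    {"hostname": "blog.emeentsmedia.com", "page": "/post"},
    {"hostname": "(not set)", "page": "/"},
    {"hostname": "spam-domain.example", "page": "/"},
]

kept = include_valid_hostnames(sample_hits)
for hit in kept:
    print(hit["hostname"])
print(len(sample_hits) - len(kept), "hits dropped")
```

Only the two hits on the real domain and its subdomain survive, which is the same result GA's "verify" step should preview for your own data.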

The next thing we want to do is exclude crawler referral spam. As ohow.co puts it:

Crawler spam is harder to detect since it uses a valid hostname, so you'll need a different filter with an expression that matches all known crawler spam.

Ohow has a really great set of expressions that I recommend you use to try and catch all known crawler spam. I've listed them below, but ohow keeps their list up to date, so check there often for new ones!

Returning to our filters, we want to add two new ones, each with one of the expressions below. Give them names that you'll recognize easily, set the "Filter Type" to "custom", check the "exclude" option, and set the "Filter Field" to "Campaign Source". Input the filter pattern, and click save.

# Expression 1
(best|dollar|success|top1)\-seo|(videos|buttons)\-for|anticrawler|^scripted\.|\-gratis|semalt|forum69|7make|sharebutton|ranksonic|sitevaluation|dailyrank|vitaly|profit\.xyz|rankings\-|dbutton|\-crew|uptime(bot|check|\.com)

# Expression 2
responsive\-test|torrent\-to|magnet\-to|dogsrun|tkpass|free\-video|keywords\-monitoring|pr\-cy\.ru|fix\-website  

These two expressions need to be in separate filters, as the limit per filter pattern is 255 characters. Once each expression is added, you're just about all set! There's one last thing we want to do to really lock down any spam from getting to us.
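If you edit these expressions or add new entries from ohow, it's worth checking that each one still fits the limit and still catches what you expect. Here is a small Python sketch of that check. The sample sources are picked purely for illustration, is_crawler_spam is a made-up helper, and partial matching via re.search is an assumption about how GA applies the pattern.

```python
import re

EXPRESSION_1 = (
    r"(best|dollar|success|top1)\-seo|(videos|buttons)\-for|anticrawler"
    r"|^scripted\.|\-gratis|semalt|forum69|7make|sharebutton|ranksonic"
    r"|sitevaluation|dailyrank|vitaly|profit\.xyz|rankings\-|dbutton"
    r"|\-crew|uptime(bot|check|\.com)"
)
EXPRESSION_2 = (
    r"responsive\-test|torrent\-to|magnet\-to|dogsrun|tkpass|free\-video"
    r"|keywords\-monitoring|pr\-cy\.ru|fix\-website"
)
FILTER_PATTERN_LIMIT = 255  # character limit per filter pattern


def is_crawler_spam(source):
    """True if the campaign source matches either exclude expression."""
    return any(
        re.search(expr, source) for expr in (EXPRESSION_1, EXPRESSION_2)
    )


# Each expression has to fit on its own; joined together they do not.
for name, expr in (("Expression 1", EXPRESSION_1), ("Expression 2", EXPRESSION_2)):
    print(name, len(expr), "characters, fits:", len(expr) <= FILTER_PATTERN_LIMIT)
combined = EXPRESSION_1 + "|" + EXPRESSION_2
print("Combined fits:", len(combined) <= FILTER_PATTERN_LIMIT)

# Illustrative sources only; the last one should pass through untouched.
for source in ["semalt.com", "buttons-for-website.com", "free-video-tool.info", "google"]:
    print(source, "->", "excluded" if is_crawler_spam(source) else "kept")
```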

3. Save the easiest for last

Analytics actually already comes equipped to filter out the most common bots and spiders, and the option is right in the view settings under the admin panel. Near the bottom of the "view settings" panel there is a checkbox, unchecked by default, that reads "exclude all hits from known bots and spiders". Check that puppy and click save. You're done!

With these three simple steps you can effectively prevent a majority of the spam, ghost spam, bots, and spiders that plague every analytics account. You'll get a more accurate read on your viewers' visits to your site, and will be able to make more informed decisions about how to improve your business.

If you have any questions, or want to leave your feedback, please do in the comments below! Thanks for visiting!

David Meents
