Crawler Spam Referrals: How to filter them out from Google Analytics

Written by Candice Underwood on 25th April 2016

(Last updated 10th July 2018)


Using Google Analytics to monitor your website’s success is a great way to see what’s working and gather vital information about your site visits, but more and more we are hearing about the trouble caused by spam referrals remaining in Google Analytics reports. So why is this a problem? When these spam referrals are included, Google Analytics shows us inaccurate data and skews our results, which prevents us from drawing solid conclusions about our website.

There are currently two types of spam referrals: Ghost Spam and Crawler Spam. Ghost Spam is the more common and manages to trick Google Analytics into logging a visit, even though it never actually visits your site. Then there’s Crawler Spam, which does visit your site. This means it can be a lot harder to identify, as it is using real data to target its destination (your website).

Last time we focussed on how to filter out Ghost Spam Referrals, but this time we will be looking at filtering out those referrals caused by Crawler Spam. So rather than including the right hostnames we will be excluding the spam domain names.

So let's look at how to get started.


Step 1: Identify the spam domain names

To do this, log into your Google Analytics account and follow these steps:

  1. In the left-hand navigation open ‘Acquisition’
  2. Select ‘All Traffic’ and then ‘Referrals’ beneath it
  3. In the main area of your Analytics account select ‘Hostname’ as your secondary dimension

This adds a Hostname column next to each referral source.

From here we can identify our crawler spam: these are the spam sources that appear against a valid hostname (e.g. www.liquidlight.co.uk). These are the non-ghost spam referrals that we need to exclude in our filter.
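As a rough illustration of that check, the sketch below labels rows copied from the Referrals report. The function name, the list of suspected spam sources and the sample rows are all hypothetical, and it assumes you have already decided which sources look like spam; it is not something Google Analytics does for you.

```python
def classify_referral(source, hostname, suspected_spam, valid_hostnames):
    """Label one row of the Referrals report (source + hostname)."""
    if hostname not in valid_hostnames:
        # No valid hostname recorded: treated here as the ghost spam case.
        return "ghost"
    if source in suspected_spam:
        # Spam source recorded against a valid hostname: it really visited.
        return "crawler"
    return "legitimate"


# Hypothetical sample data; replace with rows from your own report.
valid_hostnames = {"www.liquidlight.co.uk"}
suspected_spam = {"traffic2cash.xyz", "buttons-for-website.com"}
rows = [
    ("traffic2cash.xyz", "www.liquidlight.co.uk"),
    ("buttons-for-website.com", "www.liquidlight.co.uk"),
    ("ghost-spam.example", "(not set)"),
    ("google.com", "www.liquidlight.co.uk"),
]

# Keep only the sources that need to go into the exclude filter.
crawler_sources = [
    source
    for source, hostname in rows
    if classify_referral(source, hostname, suspected_spam, valid_hostnames) == "crawler"
]
print(crawler_sources)
```

The resulting list is what feeds into the regular expression in the next step.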


Step 2: Create your Regular Expression

Now that you can identify those spam domain names, you will need to create a regular expression in exactly the same way as before. For example:

 

traffic2cash\.xyz|buttons-for-website\.com

 

Again, we would recommend putting this together in TextEdit or Notepad so you can refer back to it at a later stage. Make sure you do not end the expression with a ‘|’: a trailing pipe leaves an empty alternative that can match anything, which will stop the expression from being effective.
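If the list of domains grows, it may be easier to generate the expression than to type it by hand. The sketch below is one way to do that, assuming the domains contain only letters, digits, hyphens and dots (so only the dots need escaping). The `build_pattern` helper is hypothetical, and Python’s `re` module is used only to illustrate the behaviour; Google Analytics applies its own regular expression engine.

```python
import re


def build_pattern(domains):
    """Join spam domains into one expression, escaping the dots."""
    cleaned = [d.strip() for d in domains if d.strip()]
    # Joining with "|" never leaves a trailing pipe, even for a single domain.
    return "|".join(d.replace(".", r"\.") for d in cleaned)


pattern = build_pattern(["traffic2cash.xyz", "buttons-for-website.com"])
print(pattern)

# A genuine referrer is not matched by the correct expression...
good_match = re.search(pattern, "google.com")

# ...but with a trailing "|" the empty alternative matches any text at all.
bad_match = re.search(pattern + "|", "google.com")
print(good_match is None, bad_match is not None)
```

With the two example domains, the generated expression is the same as the one shown above.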


Step 3: Set up your custom filter to exclude these spam domain names

Now we have created our regular expression it’s time to revisit our filters:

  1. Go back to your ‘Admin’ area on the top menu
  2. Under ‘Account’ click on ‘All Filters’
  3. Select the red ‘Add Filter’ button
  4. You will then need to set up a custom filter of type ‘Exclude’, using the regular expression from Step 2 as the filter pattern

The final step is to verify this filter to ensure any issues are flagged up before you save.
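Before saving, it can also help to sanity-check the expression yourself against a handful of referral sources. The following is a minimal sketch of what an exclude filter does, assuming a partial (search-style) match against the source name; the `apply_exclude_filter` helper and the sample sources are hypothetical, and this is not a replacement for the verification built into Google Analytics.

```python
import re


def apply_exclude_filter(sources, pattern):
    """Return the sources that survive an exclude filter using `pattern`."""
    compiled = re.compile(pattern)
    return [s for s in sources if not compiled.search(s)]


pattern = r"traffic2cash\.xyz|buttons-for-website\.com"

# Hypothetical mix of genuine and spam referral sources.
sample_sources = [
    "google.com",
    "traffic2cash.xyz",
    "buttons-for-website.com",
    "bing.com",
]

kept = apply_exclude_filter(sample_sources, pattern)
print(kept)
```

If a genuine source disappears from the result, or a spam domain survives, the expression needs another look before it goes live.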


And that’s it

Together, these two filters will strip out those spam referrals from any future reports generated. Unfortunately they cannot fix any past reports, but it’s a start to a spam-free future!

And that’s not to say it’s guaranteed. Filtering out spam is an ongoing process you need to keep on top of. As spam bots multiply and get smarter they will keep hitting your site from different places, so it’s important to make this a fixed part of your analytics process.

We’d recommend having a look every couple of weeks. That way it should be a quick five-minute job, rather than trawling through months of data!

This article was posted in SEO by Candice Underwood