Ghost Spam Referrals: How to filter them out from Google Analytics

Google Analytics can provide some fantastic statistics for your website and give you a great idea of who is visiting, where they come from and how they found you. However, one thing that keeps cropping up recently is the amount of spam referrals that remain in the reports, skewing the results and making it harder to pick out genuine referrals. This is a problem, as it’s limiting our understanding of the website’s success and stopping us from interpreting the results correctly.
There are currently two types of spam referrals in Google Analytics: Ghost Spam and Crawlers. It’s most likely that both are affecting your site, but they are dealt with in different ways, so it’s important to understand the difference.
Ghost Spam is the most common type of spam and manages to trick Google Analytics into logging a visit, even though it never actually visits your site. Crawlers, on the other hand, do visit your site and can be harder to identify, as they are using real data and have a fixed target.
In this article, we will be focussing on Ghost Spam, so here are some quick tips to keep on top of it:
Step 1: Identify your valid hostnames
When a visitor puts your URL into their browser, their device contacts your server and connects to your site. Google Analytics then logs the ‘hostname’ their device has requested; the ‘hostname’ being where they have arrived, so the domain part of the URL requested. For example, if you visited our website, the hostname logged would be ‘www.liquidlight.co.uk’.
Now, as we learnt earlier, ghost spam does not actually visit your site, so it will show alien hostnames in comparison to what you would expect. We can use this fact to our advantage by setting up a filter that defines the ‘hostnames’ we want Google Analytics to keep the data from. The good news is this should clear the majority of this ghost spam.
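To make the idea concrete, here is a minimal sketch of what an ‘include by hostname’ filter does. The sample hits and the `is_valid_hostname` helper are made up for illustration; the real filtering happens inside Google Analytics, not in your own code.

```python
# Hypothetical hits for illustration; real data lives inside Google Analytics.
hits = [
    {"hostname": "www.liquidlight.co.uk", "referrer": "google.com"},
    {"hostname": "(not set)", "referrer": "spam-referrer.example"},
    {"hostname": "some-unrelated-site.example", "referrer": "spam-referrer.example"},
]

# Assumed main domain, taken from the hostname example above.
VALID_DOMAIN = "liquidlight.co.uk"


def is_valid_hostname(hostname: str) -> bool:
    """Return True for the main domain or any of its sub-domains."""
    return hostname == VALID_DOMAIN or hostname.endswith("." + VALID_DOMAIN)


# Keep only the hits whose hostname belongs to the site.
kept = [hit for hit in hits if is_valid_hostname(hit["hostname"])]
print(kept)
```

Only the hit that arrived on the real site survives; the two hits with alien hostnames are dropped, whatever referrer they claim.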
So, to get started we need to identify the hostnames linked to your website. Start by logging into your Google Analytics account and follow these steps:
- Go to the Reporting tab at the top of the page
- Click on ‘Audience’ in the left-hand menu
- Select the arrow next to ‘Technology’ to expand and choose ‘Network’
- Finally, click ‘Hostname’ on the report
You will now see a full list of hostnames, including those the spammers use, so the next thing to do is pick out the genuine ones. These are things like ‘yourdomainname.com’ and ‘blog.yourdomainname.com’, plus any paid-for referral domain names.
Now you have your hostnames, you need to create a regular expression that looks like the following:
yourmaindomain\.com|paidforreferral\.com|validhostname\.com
We would suggest creating this in something like Notepad or TextEdit, so you can refer back to it later.
One thing to note is you don’t need to put in your sub-domains, as your main domain name will cover these. You should also make sure the expression doesn’t end with a |, which is a very common mistake when setting up these expressions: a trailing | adds an empty alternative that matches every hostname, so the spam would get through. You will then need to set up a new filter in your Google Analytics account using this regular expression.
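If you would rather not type the expression by hand, the sketch below builds it from a list of domains and checks the two points above. It assumes Google Analytics applies the pattern as a partial match, in the same way Python’s `re.search` does; the domain names are the placeholders from the example and `build_pattern` is a helper name we have made up.

```python
import re

# Placeholder domains from the example above; swap in your own.
valid_domains = ["yourmaindomain.com", "paidforreferral.com", "validhostname.com"]


def build_pattern(domains):
    """Escape each domain and join with |, leaving no trailing pipe."""
    return "|".join(re.escape(domain) for domain in domains)


pattern = build_pattern(valid_domains)
print(pattern)

# A partial match means sub-domains are covered by the main domain.
print(bool(re.search(pattern, "blog.yourmaindomain.com")))  # True
print(bool(re.search(pattern, "ghost-spam.example")))  # False

# A trailing | adds an empty alternative, which matches any hostname.
broken = pattern + "|"
print(bool(re.search(broken, "ghost-spam.example")))  # True
```

The last line shows why the trailing | matters: with it in place, even a spam hostname passes the filter.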
Step 2: Setting up your custom filter in Google Analytics
To get started with your custom filter, go to your Google Analytics account and click ‘Admin’ at the top of the page. You will then need to follow these steps:
- Under ‘Account’ select ‘All filters’
- Click on the red ‘Add Filter’ button
- You will then need to set up your filter as follows:
So, in short, you will need to ensure your filter type is ‘Include’, the filter field is ‘Hostname’, and you have pasted your regular expression from earlier into the ‘Filter Pattern’ field.
Once you have set this up we’d suggest verifying it, just to make sure it doesn’t throw up any errors. You can do this by clicking ‘verify’ at the bottom.
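As an extra sanity check before saving, you could also run your expression against the hostnames from your report on your own machine. This is only a rough sketch: the hostnames below are invented, `split_hostnames` is a hypothetical helper, and it again assumes partial matching as with Python’s `re.search`.

```python
import re

# The expression from Step 1, using the placeholder domains.
PATTERN = r"yourmaindomain\.com|paidforreferral\.com|validhostname\.com"

# Hypothetical hostnames copied from the Hostname report.
report_hostnames = [
    "www.yourmaindomain.com",
    "blog.yourmaindomain.com",
    "(not set)",
    "ghost-spam.example",
]


def split_hostnames(pattern, hostnames):
    """Split hostnames into those an Include filter keeps and those it drops."""
    compiled = re.compile(pattern)
    kept = [name for name in hostnames if compiled.search(name)]
    dropped = [name for name in hostnames if not compiled.search(name)]
    return kept, dropped


kept, dropped = split_hostnames(PATTERN, report_hostnames)
print("Kept:", kept)
print("Dropped:", dropped)
```

If a genuine hostname turns up in the dropped list, add its domain to the expression before you save the filter.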
Once you are happy, choose which views you would like this to apply to and save your changes. From that point on, any new ghost spam referrals should disappear from your reports.
The next step is to filter out your Crawler Spam, which you can find out more about in our article Crawler Spam Referrals: How to filter them out from Google Analytics.