Bot Filtering: Remove Invalid Traffic from Your Data

Written by: Stephen
Googlebot crawls the web 24/7, and likely loads your ads multiple times a day. You probably don't want to count impressions or clicks made by those bots, crawlers and spiders. Most ad servers, like AdGlare, apply bot filtering at engine level. Here in 2018, bot traffic accounts for more than 50% of all global internet traffic: an incredible number that we shouldn't neglect. Unless you're only running your own campaigns, it's imperative to filter out bot traffic to avoid skewed reports.

Although bot activity fluctuates from year to year, we can't deny the huge impact that bots and spiders have on our statistical data. Back in 2016, Incapsula released a great infographic to give us an update on where we're heading.

Bot traffic report 2016
Source: Incapsula

So, how does Bot Filtering work?

Genuine bots and crawlers tell us who they are via the User-Agent string that is passed along with each HTTP request. This string is matched against the IAB's list of known bots and spiders to determine whether we're dealing with human or non-human traffic. For example, Googlebot uses the following user agent string:
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
The Media Rating Council (MRC) has set a standard for the detection and filtering of invalid traffic (PDF here). If the ad serving engines receive a request from such a user agent, an advertisement will still be returned, but the impression or click will simply not be logged. AdGlare uses this method to make sure the page layout remains the same whether a bot or a human visits the page. This is important to let Google determine which content is above the fold - a significant factor in SEO.
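To make this concrete, here's a minimal sketch (in TypeScript, using Express) of what engine-level user-agent filtering could look like. The bot patterns, the /serve-ad route and the logImpression() helper are illustrative assumptions: the real IAB/ABC International Spiders & Bots List is licensed and far more extensive, and AdGlare's actual implementation may differ.

// bot-filter.ts: a minimal sketch of user-agent based bot filtering.
// The patterns below are a tiny illustrative subset, NOT the licensed
// IAB/ABC International Spiders & Bots List.
import express from "express";

const BOT_PATTERNS: RegExp[] = [
  /Googlebot/i,
  /bingbot/i,
  /DuckDuckBot/i,
  /crawler|spider|bot\b/i, // generic catch-all heuristic
];

function isKnownBot(userAgent: string | undefined): boolean {
  if (!userAgent) return true; // a missing User-Agent is suspicious in itself
  return BOT_PATTERNS.some((re) => re.test(userAgent));
}

const app = express();

// Hypothetical ad request endpoint. The ad is returned either way, so the
// page layout stays identical for bots and humans (above-the-fold detection).
app.get("/serve-ad", (req, res) => {
  if (!isKnownBot(req.header("user-agent"))) {
    logImpression(req); // only human traffic is counted
  }
  res.json({ html: "<div>...banner markup...</div>" }); // placeholder creative
});

function logImpression(req: express.Request): void {
  console.log(`impression from ${req.ip} at ${new Date().toISOString()}`);
}

app.listen(3000);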
Note that we're not speaking about "bad bot" filtering for automated ad exchanges or programmatic ad buying. We're speaking about detecting and filtering spiders and crawlers (as per the infographic above).

Why does this all matter?

In the online advertising industry, publishers get paid to show advertisers' ads. The publisher's inventory is bought to show these ads in order to attract potential buyers, who are humans, not bots. As most inventory is sold on a CPM basis, it doesn't make sense to serve half of the campaign to bots while getting paid for it. It's therefore common practice for advertisers to insist on bot filtering when closing a deal with a publisher or ad network.

How to enable Bot Filtering in AdGlare

It's highly recommended to enable bot filtering to minimize discrepancies with third-party ad servers, especially if you're a publisher. To do so, follow these steps:
1. Click Settings => Main Configuration from within your ad server portal.
2. Click the Engine Config tab.
3. Enable the Bot/Crawler Filter.

Bot Filtering in AdGlare


Filtering IP addresses from Malicious Networks

In addition to filtering invalid traffic from bots, you may also want to consider filtering requests made from known malicious networks. A quick search on Google can provide you with a list of IP addresses (usually in CIDR notation) from networks known to be infected with software that automatically crawls pages to artificially inflate impressions. AdGlare can filter those impressions and clicks at two levels:
  • Campaign level. See the Targeting Rules tab when editing a campaign.
  • Engine level. See the page Settings => Main Configuration.
Note that this filtering works slightly differently than described above. Instead of returning an ad, the engines will simply respond with 'no ads available' for requests made from those IP ranges. The end result is the same: these impressions and clicks are not logged whatsoever, keeping your statistical reports free of bot traffic.
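For the curious, here's a minimal sketch in TypeScript of how an IP address can be matched against a CIDR blocklist. The function names and the list itself are assumptions for illustration only; the ranges shown are reserved documentation networks (TEST-NET), not real malicious networks.

// ip-blocklist.ts: a minimal sketch of IPv4 CIDR blocklist matching.
// These example ranges are reserved TEST-NET blocks, not real bad actors.
const BLOCKED_CIDRS = ["203.0.113.0/24", "198.51.100.0/25"];

// Convert a dotted-quad IPv4 address to a 32-bit unsigned integer.
function ipToInt(ip: string): number {
  return ip.split(".").reduce((acc, octet) => (acc << 8) + Number(octet), 0) >>> 0;
}

// True if `ip` falls inside the CIDR block, e.g. "203.0.113.0/24".
function inCidr(ip: string, cidr: string): boolean {
  const [base, bits] = cidr.split("/");
  const mask = Number(bits) === 0 ? 0 : (~0 << (32 - Number(bits))) >>> 0;
  return (ipToInt(ip) & mask) === (ipToInt(base) & mask);
}

function isBlocked(ip: string): boolean {
  return BLOCKED_CIDRS.some((cidr) => inCidr(ip, cidr));
}

console.log(isBlocked("203.0.113.42")); // true  -> respond 'no ads available'
console.log(isBlocked("192.0.2.1"));    // false -> serve and log as usual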

That's not all. A few more things...

Now that you're on track to improve your CTR and the quality of your inventory, it's worth considering the following practices as well.
  • Lazy-Loading Ads. A banner is only loaded when it's scrolled into view, right in front of the visitor's eyes. Highly recommended, as it doesn't make sense to load ads below the fold that are never seen (see the sketch after this list).
  • Nofollow attributes. Adding rel="nofollow" to outbound links avoids passing link juice to low-authority domains. Buying links is a black hat SEO technique, penalized by Google.
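As a sketch of the lazy-loading idea above, here's what it could look like in the browser with the standard IntersectionObserver API (TypeScript). The data-ad-zone attribute and the loadAd() helper are hypothetical stand-ins for your ad server's actual tag.

// lazy-ads.ts: load an ad slot only when it is about to scroll into view.
// Hypothetical loadAd() stand-in; a real integration would inject the ad
// server's tag into the slot instead.
function loadAd(slot: HTMLElement): void {
  slot.innerHTML = `<!-- ad markup for zone ${slot.dataset.adZone} -->`;
}

const observer = new IntersectionObserver((entries) => {
  for (const entry of entries) {
    if (entry.isIntersecting) {
      loadAd(entry.target as HTMLElement);
      observer.unobserve(entry.target); // each slot loads only once
    }
  }
}, { rootMargin: "200px" }); // start fetching slightly before the slot appears

// Observe every placeholder marked with the (hypothetical) data-ad-zone attribute.
document.querySelectorAll<HTMLElement>("[data-ad-zone]").forEach((slot) => {
  observer.observe(slot);
});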

Weekend Special: Creating a Bot Trap!

Always wanted to test the claim that 50% of all traffic comes from bots? If you have some time to spare, you can create a simple bot trap. The idea? Create a small 10 x 10 pixel transparent image that links to a secret URL (i.e. your trap page). The image is invisible to humans, but bots will follow the URL and end up on your secret page. Simply log how many times your secret page is visited and compare that against the total number of visits. You'll have a good estimate of the amount of bot traffic that your website receives.
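Here's a minimal sketch of such a trap as a small Express (TypeScript) server. The /secret-trap URL, the in-memory counters and the /bot-stats route are all hypothetical: a real site would pick its own secret URL and hook the counts into its existing analytics.

// bot-trap.ts: a toy implementation of the bot trap described above.
import express from "express";

const app = express();
let totalVisits = 0;
let trapVisits = 0;

// Count every request as a page view (a real site would use its analytics).
app.use((req, res, next) => {
  totalVisits++;
  next();
});

// The trap page, linked only from an invisible 10x10 transparent image,
// e.g. <a href="/secret-trap"><img src="pixel.png" width="10" height="10"></a>.
app.get("/secret-trap", (req, res) => {
  trapVisits++;
  console.log(`trap hit by ${req.ip} (${req.header("user-agent")})`);
  res.status(404).send("Not found"); // give the bot nothing useful
});

// A rough estimate of the share of bot traffic on your site.
app.get("/bot-stats", (req, res) => {
  res.json({ totalVisits, trapVisits, botShare: trapVisits / Math.max(totalVisits, 1) });
});

app.listen(3000);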


Further Reading...
Google Bot Information: support.google.com

About AdGlare
AdGlare is an ad platform for advertisers, publishers and agencies. By employing 5 data centers worldwide to reduce network latency, its ad tags are among the fastest in the industry.