
[vc_row][vc_column width=”2/3″][vc_column_text]You can and should optimise your spider, because the spider results are the foundation for ALL further tests: garbage in = garbage out. Testing irrelevant or duplicate content is also wasteful – each page must first be loaded by our headless browser, with time allowed for all assets to load (image, CSS and video files are excluded). This can take as long as 10 seconds per page, which adds up quickly.

Note: as a subscriber you can audit the results of the SAME spider an unlimited number of times – at no additional cost. However, your account billing includes a URL limit that may need to be topped up with extra URLs if you exhaust it needlessly. So spiders are important!

Tips for Spider Optimisation

In this context, optimisation means generating a representative spider in the most efficient way – crawling a large, representative sample of your website content to get the most complete picture.

Avoid collecting multiple URLs for the same page, e.g. /productX/?match=black and /productX/?match=white are usually the same page with the same content. Also, unless older content is valuable to you, avoid spidering it – e.g. exclude /2009/, /archives/ etc.
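To illustrate the idea of excluding old or irrelevant sections, here is a minimal sketch of how a crawler might filter URLs against exclusion patterns. The pattern list and function names are hypothetical examples, not Verified Data's actual implementation:

```python
import re

# Hypothetical exclusion patterns for stale content – adjust to your own site.
EXCLUDE_PATTERNS = [re.compile(p) for p in (r"/2009/", r"/archives/")]

def should_crawl(url: str) -> bool:
    """Return False for any URL matching an exclusion pattern."""
    return not any(pattern.search(url) for pattern in EXCLUDE_PATTERNS)

print(should_crawl("https://example.com/archives/old-post"))  # False
print(should_crawl("https://example.com/productX/"))          # True
```

Excluding whole path segments like this keeps the crawl budget focused on current content rather than pages that no longer matter to you.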

QUERY PARAMETERS – IGNORE

The most common optimisation technique is to ignore URLs that differ only by their query parameter values. Any URL that differs only by the value of an ignored parameter is considered a duplicate page, and the spider will skip it.

By default the QUERY PARAMETERS – IGNORE field excludes the following from all crawls (i.e. you do not need to add them):

campaign
ccm_token
color
colour
dir
filter
filter-color
filter-colour
filter_color
filter_colour
height
mc_id
mkt_tok
notso
order
order-by
orderby
order_by
print
referrer
render
replytocom
size
sort
sort-by
sortby
sort_by
url
width
_ke
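To show how ignoring query parameters collapses duplicate URLs, here is a minimal sketch using a small subset of the list above. This is an illustration of the general technique, not Verified Data's actual implementation:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Illustrative subset of the default ignore list above.
IGNORED_PARAMS = {"color", "sort", "orderby", "campaign"}

def canonical(url: str) -> str:
    """Drop ignored query parameters so URL variants collapse to one page."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in IGNORED_PARAMS]
    return urlunsplit(parts._replace(query=urlencode(kept)))

a = canonical("https://example.com/productX/?match=black&sort=price")
b = canonical("https://example.com/productX/?match=black&sort=name")
print(a == b)  # True – both reduce to the same canonical URL
```

Because both URLs normalise to the same canonical form once the ignored parameter is stripped, the second one is treated as a duplicate and not crawled again.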

[/vc_column_text][/vc_column][vc_column width=”1/3″][/vc_column][/vc_row]