Synovate - The global market research company driven by curiosity


Website Stats And Fool’s Gold

SmartMarketer, October 2001

 

In my last few SmartMarketer columns I’ve addressed the need for, and the potential gains from, analyzing the behaviour of your website visitors. Just as retailers, for example, will generally be pleased to note an extended period of in-store time per shopper, so should most online operators be able to record and easily track time spent within a site – this is just one example of how offline store management can be transferred to online operations. Other insights to be gained from good website behaviour statistics include answers to the questions “what’s the most popular section of the site?”, “the least?”, “where do they enter and leave?”, “how do they get from part A to part B?” and so on.

 

But where there’s gold, there’s fool’s gold. Just because those numbers on the screen are there, and probably have been provided by a well-meaning developer, hoster, or webmaster, doesn’t mean they’re actually any good – chances are, they’re quite inaccurate and difficult to work with.

 

So let’s take a minute to understand how those numbers are usually generated – no, this isn’t a maths lesson – just a trip to follow a website page from birth to your screen. For starters, understand that every link you click on is a request to your ISP which says “please deliver me page X”. ISPs long ago realized that they could save money and time by storing copies of the most frequently requested pages, and so will often deliver their copy of a page rather than taking the extra time and bandwidth to get it from the original source. So long as the copy is current, this causes no problems for the end user. But it means that your request for this page, and the subsequent viewing of it, is missed by the logs of the server where the page originally resides. As the website manager, your server logs may tell you this page has been seen 1000 times – it could be 1200 once ISP “caching” is accounted for. Or it could be 800 – the other 200 views arising from search engine programmes which request pages from your server just like any other user. This is just the start of the problems.
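For the technically minded, here is a minimal sketch in Python of the search engine half of that problem. It assumes a simplified log of (page, user-agent) pairs and a short list of crawler name fragments; both are invented for illustration, and a real log and a real crawler list would be far longer. Note that nothing in a server log can recover the views served from an ISP's cache, so the sketch can only remove overcounting, not fix undercounting.

```python
# Hypothetical access-log records: (requested page, user-agent string).
# Real server logs carry more fields; these rows are made up for illustration.
LOG = [
    ("/products.html", "Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)"),
    ("/products.html", "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"),
    ("/products.html", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"),
    ("/index.html", "Scooter/3.2"),
    ("/index.html", "Mozilla/4.7 [en] (Win98; I)"),
]

# Assumed substrings that mark a request as coming from a search engine programme.
CRAWLER_MARKERS = ("bot", "crawler", "spider", "scooter", "slurp")


def is_crawler(user_agent: str) -> bool:
    """Return True if the user-agent string contains a known crawler marker."""
    ua = user_agent.lower()
    return any(marker in ua for marker in CRAWLER_MARKERS)


def count_views(log):
    """Count page views twice: every request, and requests not from crawlers."""
    raw = {}
    human = {}
    for page, user_agent in log:
        raw[page] = raw.get(page, 0) + 1
        if not is_crawler(user_agent):
            human[page] = human.get(page, 0) + 1
    return raw, human


raw_views, human_views = count_views(LOG)
print("All requests:      ", raw_views)
print("Excluding crawlers:", human_views)
```

In this made-up sample the raw count for each page is one higher than the count with crawler requests removed, which is the same effect as the 1000 versus 800 example above, on a smaller scale.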

 

Other problems?  Consider other places where a website page could be stored and viewed from without alerting your servers – try the memory within your own internet browser for starters.  Most of your recently viewed sites will be stored in here for rapid re-delivery, i.e. these sites will to varying degrees be delivered to you without going back to the source – this is why sites viewed by hitting the “back” button will come up faster than when originally viewed.  Other such memories (caches) can be found within company LANs as well.

 

In addition, many server-log based statistics packages find it difficult to distinguish between the individual frames on a framed website. I’ve seen many cases where a 3-frame webpage is reported as three distinct “pages” (triple overcounting anyone?).

 

So now what? Well, realize that most of your user statistics arise from the page-view information discussed above in combination with “unique user” data. In this way a good statistics package is able to calculate session-length, visitor frequency, website stickiness, navigation, visitor churn and so on. But most server-log statistics packages can only distinguish between “IP addresses”, i.e. where a request is coming from.  Instead of recording 1000 people, a server log may just record the one ISP those people are using.  Or, for example, the company firewall behind which 20 people may be viewing your site.
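The IP address problem can be shown with a small Python sketch. The records below are invented: each pairs the address a request came from with the person who actually made it, which is information a server log does not have. The addresses and names are assumptions chosen only to mirror the firewall example above.

```python
# Hypothetical page views: (source IP address, the person really behind it).
# A server log sees only the first column; the second is shown for comparison.
VIEWS = [
    ("203.0.113.10", "Anna"),   # three colleagues behind one company firewall
    ("203.0.113.10", "Ben"),
    ("203.0.113.10", "Carol"),
    ("198.51.100.7", "Dave"),   # one person on a home connection, two views
    ("198.51.100.7", "Dave"),
]


def unique_by_ip(views):
    """Count distinct IP addresses, as an IP-based statistics package would."""
    return len({ip for ip, _ in views})


def unique_people(views):
    """Count distinct people, which the log alone cannot do."""
    return len({person for _, person in views})


print("Unique visitors by IP address:", unique_by_ip(VIEWS))
print("Actual people:                ", unique_people(VIEWS))
```

Here an IP-based count reports two visitors where there were really four, and every figure built on "unique users" inherits that error.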

 

In a nutshell, server log statistics have to be viewed with a large boulder of salt. Some of their data can be quite accurate (e.g. referring URL), and some totally off (e.g. page views). In addition to this, the memory required to process the data can be prohibitive.

 

The answer lies in recording the traffic at the browser level, as opposed to examining activity at the server end. This entails including a small script within each of your website pages, which communicates back to a third-party server, essentially saying “this page has just been viewed by user number X” (the user being identified by cookies, generally the most accurate non-intrusive method around). This third-party server is owned and run by the company providing the script, which then analyses the data and presents it in a secure online environment for you to see whenever desired. The data is usually presented in a dynamic database-driven environment, meaning you can “mix and match” your stats with massive flexibility and speed – all of which serves you well in understanding your website visitors’ behaviours. It is also worth knowing that these companies use very clever programmes designed to ensure maximum data accuracy with minimum effect on the user experience.
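As a rough illustration of what the third-party server might do with those messages, here is a minimal Python sketch. It is not how any particular provider works. The record layout (cookie ID, page, time in seconds) and the function name `summarise` are assumptions made for this example only.

```python
from collections import defaultdict

# Hypothetical messages received from the page script:
# (cookie ID of the visitor, page viewed, time of view in seconds since midnight).
HITS = [
    ("user-17", "/index.html", 36000),
    ("user-17", "/products.html", 36045),
    ("user-17", "/contact.html", 36200),
    ("user-42", "/index.html", 36100),
]


def summarise(hits):
    """Group hits by cookie ID and derive simple per-visitor statistics."""
    by_user = defaultdict(list)
    for cookie_id, page, timestamp in hits:
        by_user[cookie_id].append((timestamp, page))

    summary = {}
    for cookie_id, visits in by_user.items():
        visits.sort()  # order each visitor's page views by time
        summary[cookie_id] = {
            "pages_viewed": len(visits),
            "entry_page": visits[0][1],
            "exit_page": visits[-1][1],
            # Time between first and last view; time spent on the
            # final page is unknown and is not included.
            "seconds_on_site": visits[-1][0] - visits[0][0],
        }
    return summary


for cookie_id, stats in summarise(HITS).items():
    print(cookie_id, stats)
```

Because each message carries a cookie ID rather than just an IP address, the same data can answer the entry page, exit page and time-on-site questions per visitor, even when many visitors share one ISP or firewall.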

 

This approach to website traffic measurement is still relatively unknown in NZ, although it’s used by most market leaders overseas and many major companies in this country. Companies which provide such services include Webtrends (their Webtrends Live option), Superstats, Hitbox, and RedSheriff – all worth a call.

Jonathan Dodd