Webalizer Web Statistics FAQ
- 1 What is The Webalizer?
- 1.1 How do I get my web stats?
- 1.2 What is the difference between "hits," "files," "pages," and "visits?"
- 1.3 What do these Webalizer reports look like?
- 1.4 Where does webalizer get its information?
- 1.5 Where can I get more information about The Webalizer and its uses?
What is The Webalizer?
The Webalizer is a web server log file analysis program. It produces highly detailed usage reports in HTML format for viewing with a standard web browser. Sonic.net provides this analysis to help our customers better understand their web and FTP traffic.
The Webalizer produces yearly, monthly, daily and hourly statistics. In the monthly reports, various statistics may be produced to show overall usage, usage by day and hour, usage by visiting sites, URLs, user agents (browsers), referrers, page and visit totals, entry and exit page totals, search string analysis, and much more.
Every day, the Webalizer program reads your access logs and produces a set of web pages with your site statistics.
How do I get my web stats?
Your web and FTP traffic analysis can be found at https://members.sonic.net/account/resource_usage/bandwidth_usage/ in the Sonic.net member tools. Click on the link labeled "Bandwidth Quota Usage & Webstats" to see a list of the domain names used by your account. If you have no multihomed domain names with us, only Sonic.net will appear. Click on the domain name you want to examine to see its Webalizer stats, starting with a 12-month summary of all traffic on that domain. Webalizer statistics started in mid-November 2001, so no information is available for earlier months. For a breakdown of traffic in a given month, click on the month name under "Summary by Month."
What is the difference between "hits," "files," "pages," and "visits?"
Here is a listing of terms used by The Webalizer, and how it defines each. Examining each statistic tells you something different about your site and its traffic. The makers of The Webalizer maintain their own list of definitions at http://mrunix.com/webalizer/webalizer_help.html
Hits
Any request made to the server that is logged is considered a 'hit'. The requests can be for anything: HTML pages, graphic images, audio files, CGI scripts, etc. Each valid line in the server log is counted as a hit, so this number represents the total number of requests made to the server during the specified report period.
Files
Some requests made to the server require that the server send something back to the requesting client, such as an HTML page or graphic image. When this happens, it is considered a 'file' and the files total is incremented. The relationship between 'hits' and 'files' can be thought of as 'incoming requests' and 'outgoing responses'.
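To make the hits/files distinction concrete, here is a minimal sketch that reduces each log line to its HTTP response code. It assumes, for illustration only, that a response counts as a 'file' when the status is 200 (content sent back); Webalizer's exact rule may include other codes. The function name `count_hits_and_files` is hypothetical.

```python
# Minimal sketch: every logged request is a hit; only responses that
# actually send content back count as files.
# Assumption (hypothetical rule): "content sent back" is approximated as status 200.

def count_hits_and_files(status_codes):
    """Return (hits, files) for a list of HTTP response codes."""
    hits = len(status_codes)                               # every valid logged request
    files = sum(1 for code in status_codes if code == 200)  # content was returned
    return hits, files


if __name__ == "__main__":
    # 200 = page or image sent; 304 = not modified (no body); 404 = not found
    codes = [200, 200, 304, 404, 200]
    print(count_hits_and_files(codes))  # → (5, 3)
```

This is why the files total is never larger than the hits total: a cached image answered with 304 or a missing page answered with 404 still costs a hit, but nothing is sent back.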
Pages
Pages are, well, pages! Generally, any HTML document, or anything that generates an HTML document, is considered a page. This does not include the other components of a document, such as graphic images or audio clips, so this number counts only the 'pages' requested, not the other 'stuff' in the page. What actually constitutes a 'page' can vary from server to server. The default is to treat anything with the extension '.htm', '.html' or '.cgi' as a page. Other extensions, such as '.shtml', '.php3' and '.pl', are treated as pages as well.
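Extension-based page detection can be sketched in a few lines. `PAGE_EXTENSIONS` simply mirrors the extensions named in this FAQ; the real list lives in the server's Webalizer configuration and may differ. The case-insensitive comparison and the name `is_page` are assumptions for illustration.

```python
# Minimal sketch of deciding whether a requested URL is a 'page'.
# PAGE_EXTENSIONS is an assumption taken from the extensions listed in this FAQ.
import posixpath
from urllib.parse import urlsplit

PAGE_EXTENSIONS = {".htm", ".html", ".cgi", ".shtml", ".php3", ".pl"}


def is_page(url):
    """Return True if the URL's path ends in one of the page extensions."""
    path = urlsplit(url).path               # drop any ?query or #fragment
    _, ext = posixpath.splitext(path)
    return ext.lower() in PAGE_EXTENSIONS   # case-insensitive (an assumption)


if __name__ == "__main__":
    for url in ["/index.html", "/logo.gif", "/cgi-bin/search.cgi?q=dsl", "/song.mp3"]:
        print(url, is_page(url))
```

Note that this sketch only looks at the extension, so a directory URL such as "/" is not counted as a page.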
Sites
Each request made to the server comes from a unique 'site', which can be referenced by a name or, ultimately, an IP address. The 'sites' number shows how many unique IP addresses made requests to the server during the reporting period. This is not the number of unique individual users (real people) who visited, which is impossible to determine using just logs and the HTTP protocol (although this number may be about as close as you will get).
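Counting sites amounts to counting distinct client addresses. The sketch below assumes the client field of each log line has already been extracted; `count_sites` is a hypothetical name, and the addresses are from the documentation-reserved ranges.

```python
# Minimal sketch: 'sites' is the number of distinct client addresses.

def count_sites(client_addresses):
    """Return how many unique sites (IP addresses or hostnames) made requests."""
    return len(set(client_addresses))


if __name__ == "__main__":
    clients = ["192.0.2.10", "192.0.2.10", "198.51.100.7", "192.0.2.10"]
    print(count_sites(clients))  # → 2
```

This also shows why sites are not people: several users sharing one address (for example, a household behind a single router, or an office behind a proxy) collapse into one site.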
Visits
Whenever a request is made to the server from a given IP address (site), the time elapsed since that address's previous request (if any) is calculated. If the difference is greater than a preconfigured 'visit timeout' value, or the address has never made a request before, it is considered a 'new visit', and the visits total is incremented (both for the site and the IP address). The default timeout value is 30 minutes, so if a user visits your site at 1:00 in the afternoon and then returns at 3:00, two visits are registered. Note: in the 'Top Sites' table, the visits total for 'Grouped' records should be discounted and treated instead as the minimum number of visits that came from that grouping. Note: visits occur only on page-type requests, that is, requests whose URL is one of the 'page' types defined with the Page Type option. Due to the limitations of the HTTP protocol, log rotations and other factors, this number should not be taken as absolutely accurate; rather, consider it a pretty close "guess".
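The visit-timeout rule can be sketched as follows. This is an illustration, not Webalizer's own code: it assumes requests arrive in time order as `(ip, timestamp_seconds, url)` tuples, and that the gap is measured from the address's previous *page* request. The extension list and the names `is_page` and `count_visits` are hypothetical.

```python
# Minimal sketch of the 'visit timeout' rule, under the assumptions above.
import posixpath
from urllib.parse import urlsplit

VISIT_TIMEOUT = 30 * 60  # default 30-minute timeout, in seconds
PAGE_EXTENSIONS = {".htm", ".html", ".cgi", ".shtml", ".php3", ".pl"}


def is_page(url):
    """True if the URL's path has one of the (assumed) page extensions."""
    ext = posixpath.splitext(urlsplit(url).path)[1].lower()
    return ext in PAGE_EXTENSIONS


def count_visits(requests, timeout=VISIT_TIMEOUT):
    """Return (total_visits, visits_per_ip) for time-ordered requests."""
    last_seen = {}  # ip -> timestamp of that ip's previous page request
    per_ip = {}
    for ip, ts, url in requests:
        if not is_page(url):
            continue  # visits are only counted on page-type requests
        previous = last_seen.get(ip)
        if previous is None or ts - previous > timeout:
            per_ip[ip] = per_ip.get(ip, 0) + 1  # first request, or gap too long
        last_seen[ip] = ts
    return sum(per_ip.values()), per_ip


if __name__ == "__main__":
    one_pm, three_pm = 13 * 3600, 15 * 3600
    log = [
        ("192.0.2.10", one_pm, "/index.html"),
        ("192.0.2.10", one_pm + 5, "/logo.gif"),      # not a page: ignored
        ("192.0.2.10", one_pm + 600, "/about.html"),  # 10 min later: same visit
        ("192.0.2.10", three_pm, "/index.html"),      # much later: new visit
    ]
    print(count_visits(log))  # → (2, {'192.0.2.10': 2})
```

This reproduces the 1:00/3:00 example: the two-hour gap exceeds the 30-minute timeout, so the return counts as a second visit, while the page viewed ten minutes in does not.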
KBytes
The KBytes (kilobytes) value shows the amount of data, in KB, sent out by the server during the specified reporting period. This value is generated directly from the log file. Note: Webalizer defines a kilobyte as 1024 bytes, not 1000.
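The 1024-byte kilobyte matters when comparing against other tools. This minimal sketch assumes the per-response byte counts have already been read from the log's size field; `total_kbytes` is a hypothetical name.

```python
# Minimal sketch: KBytes = total bytes sent / 1024 (Webalizer's kilobyte).

def total_kbytes(byte_counts):
    """Return total data sent, in kilobytes of 1024 bytes."""
    return sum(byte_counts) / 1024


if __name__ == "__main__":
    sizes = [512_000, 512_000, 24_576]  # bytes per response
    print(total_kbytes(sizes))  # → 1024.0
```

Those same 1,048,576 bytes would be reported as 1,048.576 KB by a tool that uses 1000-byte kilobytes, so small discrepancies between reports are expected.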
Top Entry and Exit Pages
The Top Entry and Exit Pages give a rough estimate of which URLs are used to enter your site and which pages are viewed last. Because of limitations in the HTTP protocol, log rotations, etc., these numbers should be considered a good "rough guess" of the actual numbers; however, they give a good indication of the overall trend in where users enter and exit your site.
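Entry and exit pages follow from visits. This sketch assumes page requests have already been grouped into visits (for example, with a visit-timeout rule) and takes the first page of each visit as its entry page and the last as its exit page. The input shape and the name `entry_exit_counts` are assumptions for illustration.

```python
# Minimal sketch: tally entry (first) and exit (last) pages per visit.
from collections import Counter


def entry_exit_counts(visits):
    """visits: list of lists of page URLs, one inner list per visit."""
    entries, exits = Counter(), Counter()
    for pages in visits:
        if pages:  # skip empty visits
            entries[pages[0]] += 1
            exits[pages[-1]] += 1
    return entries, exits


if __name__ == "__main__":
    visits = [
        ["/index.html", "/products.html", "/contact.html"],
        ["/products.html"],
        ["/index.html", "/about.html"],
    ]
    entries, exits = entry_exit_counts(visits)
    print(entries.most_common(1))  # → [('/index.html', 2)]
    print(exits.most_common())
```

A single-page visit counts as both an entry and an exit for that page, and any error in splitting visits (log rotation, shared addresses) carries straight through to these totals, which is why they are only a rough guess.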
What do these Webalizer reports look like?
See Sample Reports for a look at a sample report. This sample is hosted by the creators of The Webalizer.
Where does webalizer get its information?
It's pulled straight from the raw web logs (which you can view in /var/log/httpd/username/ on shell.sonic.net or ftp.sonic.net). The raw logs record a variety of information supplied to www.sonic.net when a browser requests a file from us, including the HTTP referrer (HTTP_REFERER). Each entry typically includes:
- source (IP or name)
- destination domain (e.g. sonic.net)
- HTTP request (normally a "GET" followed by the URL of the file in question)
- HTTP response code (normally 200 for a success)
- file size (in bytes)
- URL of the referring site (which is where Webalizer gets the search string)
- whatever the browser identifies itself as (e.g. "Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90)")
Take all that information for every request to your site, tally it up, and you get the Webalizer stats.
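As an illustration of how these fields are pulled out of a raw log line, here is a minimal parser. It assumes the logs use Apache's standard "combined" format; the actual layout of the Sonic.net logs (for example, where the destination domain is recorded) may differ. The search-string helper assumes the search engine puts the terms in a `q` query parameter. `parse_line`, `search_string` and the sample referrer URL are hypothetical.

```python
# Minimal sketch of parsing one access-log line, assuming Apache "combined" format.
import re
from urllib.parse import parse_qs, urlsplit

COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of log fields, or None if the line doesn't match."""
    m = COMBINED.match(line)
    if m is None:
        return None  # not a valid line, so it would not count as a hit
    rec = m.groupdict()
    rec["status"] = int(rec["status"])
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])  # "-" = no body
    return rec


def search_string(referrer, param="q"):
    """Return search terms from a referrer URL (assumes a 'q' parameter)."""
    values = parse_qs(urlsplit(referrer).query).get(param)
    return values[0] if values else None


if __name__ == "__main__":
    line = ('192.0.2.10 - - [10/Dec/2001:13:55:36 -0800] "GET /index.html HTTP/1.0" '
            '200 2326 "http://search.example.com/search?q=sonic+dsl" '
            '"Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90)"')
    rec = parse_line(line)
    print(rec["host"], rec["status"], rec["size"])  # → 192.0.2.10 200 2326
    print(search_string(rec["referrer"]))           # → sonic dsl
```

Each parsed record feeds the totals described earlier: every record is a hit, the status and size fields drive files and KBytes, the host drives sites and visits, and the referrer supplies the search strings.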
Where can I get more information about The Webalizer and its uses?
See the Webalizer home page for authoritative information about the workings of The Webalizer, straight from the source. You may also want to read the WebMonkey™ article "Troubles with Tracking," a realistic look at the capabilities and limitations of web site tracking statistics for business.