Analyze your server logs
This note explains how to use the Analytics CLI to monitor your RWSERVE logs.
The Analytics CLI reads, filters, coalesces, and summarizes RWSERVE network logs. Command-line parameters specify what to analyze, over what time period, and how the output should be formatted.
RWSERVE network logs are kept in the system journal and are accessed using the journalctl command. You need root privileges to use this tool.
What to examine
These are the parameters that limit what to examine.
- --since: Scan the log from this point in time
- --until: Scan the log up to this point in time
- --authority: Include only these authorities (a.k.a. hostnames)
- --messages: Include only these types of messages
The --since parameter may be specified using timestamps, simple dates, descriptive names, or relative points in time.
Timestamps are specified in "yyyy-mm-dd hh:mm:ss" format. Because a timestamp contains a space, it must be enclosed in quotation marks.
Simple dates are specified in yyyy-mm-dd format. Simple dates use an implied clock time of midnight.
Descriptive names are shortcuts, and include the names today and yesterday. Both of these have an implied time of midnight.
Relative points in time refer to a point that was some time ago. They use descriptive units such as hours, weeks, or months, always begin with a numeric quantity, and always end with the word ago. For example, 12 hours ago or 1 week ago.
The --until parameter may be specified using all of the same formats and values used with --since.
Additionally, the descriptive name now may be specified to include all logged entries up to the present second.
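Before scheduling a report, it can help to check that a --since or --until value matches one of the forms described above. The following is a minimal sketch, assuming only the formats listed in this note; the helper name classify_time_spec is hypothetical, and journalctl itself may accept additional forms that this sketch rejects.

```python
import re
from datetime import datetime

# Hypothetical helper mirroring the formats described in this note.
# journalctl itself may accept additional forms that this sketch rejects.
TIMESTAMP_RE = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")
SIMPLE_DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")
RELATIVE_RE = re.compile(r"\d+ [a-z]+ ago")   # e.g. "12 hours ago"
DESCRIPTIVE_NAMES = {"today", "yesterday"}     # implied time of midnight


def classify_time_spec(value: str, for_until: bool = False) -> str:
    """Classify a --since/--until value, raising ValueError if unrecognized."""
    if TIMESTAMP_RE.fullmatch(value):
        # strptime also rejects impossible values such as month 13
        datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
        return "timestamp"
    if SIMPLE_DATE_RE.fullmatch(value):
        datetime.strptime(value, "%Y-%m-%d")
        return "simple date"
    # "now" is only meaningful as an --until value
    if value in DESCRIPTIVE_NAMES or (for_until and value == "now"):
        return "descriptive name"
    if RELATIVE_RE.fullmatch(value):
        return "relative"
    raise ValueError(f"unrecognized time spec: {value!r}")
```

For example, classify_time_spec("12 hours ago") reports a relative point in time, while classify_time_spec("now") raises ValueError unless for_until=True is passed.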
The --authority parameter filters the log data so that only the specified hostname(s) are included. Use commas to separate multiple hostnames. When this parameter is omitted, all hostnames are included.
The --messages parameter limits which types of messages to include. This is a comma-separated list. The possible values are:
The default value is
When normal is specified, the four standard request/response log entries are included and treated as a single unit. The four standard entries are:
When abnormal is specified, all of the non-standard message types are included. Abnormal messages cannot be tabulated, tracked, traced, dumped, or used with the --thread or --campaign parameters. Typically, abnormal messages are simply counted and reported as "number of times". The only operation parameter that may be used in conjunction with abnormal messages is the --find parameter.
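When invoking the CLI from a script, building the argument list programmatically avoids shell-quoting mistakes with values such as "12 hours ago". The sketch below is a hypothetical helper (build_examine_args is not part of RWSERVE) that assembles the "what to examine" parameters and conservatively rejects operation parameters other than --find when abnormal messages are requested. It assumes formatting options such as --output remain allowed, as the abnormal-message cron example later in this note suggests.

```python
import shlex
from typing import Optional, Sequence

# Operation parameters that, per this note, cannot accompany abnormal messages.
# Formatting options such as --output are assumed to remain allowed.
OPERATION_FLAGS = {"--tracking", "--tabulate", "--resource", "--campaign",
                   "--thread", "--trace", "--dump"}


def build_examine_args(since: Optional[str] = None,
                       until: Optional[str] = None,
                       authorities: Sequence[str] = (),
                       messages: Sequence[str] = (),
                       extra: Sequence[str] = ()) -> list[str]:
    """Assemble an rwserve-analytics argument list (hypothetical helper)."""
    args = ["rwserve-analytics"]
    if since:
        args.append(f"--since={since}")
    if until:
        args.append(f"--until={until}")
    if authorities:
        args.append("--authority=" + ",".join(authorities))
    if messages:
        args.append("--messages=" + ",".join(messages))
    if "abnormal" in messages:
        clashes = [a for a in extra if a.split("=", 1)[0] in OPERATION_FLAGS]
        if clashes:
            raise ValueError(f"not allowed with abnormal messages: {clashes}")
    return args + list(extra)


# A list can be passed straight to subprocess without shell quoting;
# shlex.join shows the equivalent shell command. Running it needs root.
cmd = build_examine_args(since="12 hours ago", until="now",
                         messages=["abnormal"], extra=["--output=text"])
print(shlex.join(cmd))
```

Passing the list to subprocess.run (prefixed with "sudo" if the script is not already running as root) sidesteps the quoting rules for timestamps entirely.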
Types of operations
The Analytics CLI can perform these types of operations:
- tracking — summarizing visitor navigation between documents, showing "coming-from" and "going-to", which can help to understand what originally attracted visitors to each document, where they went next, and where they lost interest.
- tabulating — examining header patterns by stratifying their values, for example, showing how many times each content-type was requested, or showing how many times each status code occurred, or showing how many times each kind of browser or crawler accessed the website.
- targeting a resource — listing every access to a particular resource.
- monitoring a campaign — listing every resource request that is related to a marketing campaign.
- searching — searching for a particular string pattern across all logged messages.
- following a thread — showing a visitor's path and all the resources accessed during a single session.
- tracing an address — showing all visits from a particular IP address, with marked gaps in the timeline.
- dumping facets — emitting every logged piece of information captured about a particular request/response facet.
The --tracking parameter accepts a comma-separated list of the content-type (CT) mimetypes that should have detailed coming-from and going-to tracking.
The default, when not specified, is --tracking=text/html,application/xhtml+xml. Other values might include mimetypes such as application/json.
Use --tracking=all to track all mimetypes. Note, however, that any mimetype that doesn't send referer headers for hyperlinks cannot generate going-to information, so only coming-from information will appear in the report.
Use --tracking=none to omit coming-from and going-to tracking.
The --tabulate parameter accepts a comma-separated list of the logged items to stratify and tabulate.
The default, when not specified, is
Any request/response header may be tabulated. This is an open-ended parameter: if you create a plugin that conditionally sends a custom response header of clacks-overhead, and you configure the server to log that header, then you can tabulate it with --tabulate=clacks-overhead.
This parameter accepts abbreviated header names too. For example, if you've configured the server with:
then you can specify either
When tabulating cookies and query-strings, use square-bracket notation like:
Use --tabulate=none to omit all tabulation.
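Abbreviated names make --tabulate values compact but harder to read. The sketch below expands them back to full names; the mapping is inferred by pairing the weekly cron example's --tabulate=RF,RA,UACN,ST,CT,CL with its prose description, so treat it as an assumption rather than a complete list. Unknown items, such as a custom header, pass through unchanged.

```python
# Abbreviation -> full logged-item name, inferred by pairing the weekly cron
# example's --tabulate=RF,RA,UACN,ST,CT,CL with its prose description.
ABBREVIATIONS = {
    "RF": "referer",
    "RA": "remote-address",
    "UACN": "user-agent-common-name",
    "ST": "status",
    "CT": "content-type",
    "CL": "content-length",
}


def describe_tabulate(value: str) -> list[str]:
    """Expand a --tabulate value into readable names; unknown items pass through."""
    if value == "none":
        return []
    return [ABBREVIATIONS.get(item, item) for item in value.split(",")]


print(describe_tabulate("RF,RA,UACN,ST,CT,CL"))
print(describe_tabulate("ST,clacks-overhead"))
```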
The --resource parameter accepts a single document resource path (PA) specifying the directory and resource portion of a URL, without any query-string values.
This parameter lists all facets going through this resource path.
A facet is a single request/response. It is uniquely identified by the concatenation of the process-id, session-id, and request-response-id, and is formatted as PID-SID-RR.
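Facet-ids copied from one report are often fed into another, for example into --dump, or trimmed to a session spec for --thread. The sketch below splits a PID-SID-RR facet-id into its parts. FacetId and parse_facet_id are hypothetical names, the example id is made up, and the session_spec property assumes the --thread format is simply the first two parts of the facet-id.

```python
from typing import NamedTuple


class FacetId(NamedTuple):
    """A facet-id of the form PID-SID-RR (hypothetical helper type)."""
    pid: str  # process-id
    sid: str  # session-id
    rr: str   # request-response-id

    @property
    def session_spec(self) -> str:
        # Assumed --thread format: the process-id and session-id parts only
        return f"{self.pid}-{self.sid}"

    def __str__(self) -> str:
        return f"{self.pid}-{self.sid}-{self.rr}"


def parse_facet_id(text: str) -> FacetId:
    """Split PID-SID-RR into its three non-empty parts."""
    parts = text.split("-")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected PID-SID-RR, got {text!r}")
    return FacetId(*parts)


facet = parse_facet_id("1234-5-17")   # made-up id, as if copied from a report
dump_arg = f"--dump={facet}"
thread_arg = f"--thread={facet.session_spec}"
print(dump_arg, thread_arg)
```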
The --campaign parameter accepts a comma-separated list of query-string keys, as they appear in the URL's query string.
This parameter lists all facets accessed with any of these query-string keys.
For example, when using Urchin Tracking Module (UTM) style parameters, the parameter might be --campaign=utm_term,utm_source.
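To predict which requests a --campaign report will pick up, it can help to check a URL's query string for the configured keys. This is a minimal sketch using Python's urllib.parse; it assumes, based on the description above, that a facet matches when any listed key is present, whatever its value. The function name campaign_keys_present and the example URLs are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def campaign_keys_present(url: str, keys: list[str]) -> set[str]:
    """Return which of the given query-string keys appear in the URL."""
    # keep_blank_values so that "?utm_term=" still counts as present
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return {key for key in keys if key in query}


keys = ["utm_term", "utm_source"]
campaign_arg = "--campaign=" + ",".join(keys)
hit = "https://example.com/moon.html?utm_source=newsletter&utm_term=lunar"
miss = "https://example.com/moon.html"
print(campaign_arg, campaign_keys_present(hit, keys), campaign_keys_present(miss, keys))
```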
The --find parameter accepts a simple string value. This parameter lists all logged items whose value includes the given string. It may be used with both normal and abnormal message types.
For example, to find all request/response logged items having the word "moon", use --find=moon.
And to find all configuration messages having the word "invalid", use
The --thread parameter accepts a single session spec, consisting of a process-id plus a session-id, in the format PID-SID.
This parameter shows the visitor's path during that session, from its origin to its point of departure.
The --trace parameter accepts a single IPv4 address in dotted-decimal form.
This parameter shows all resources requested from that address. It also reports the duration of the time gaps between separate sessions, making return frequencies easier to understand.
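The sketch below illustrates two ideas behind --trace: validating the address before passing it to the CLI, and measuring the gap between one session's end and the next session's start. It uses Python's standard ipaddress module; trace_arg and session_gaps are hypothetical helpers, and this note does not say exactly how rwserve-analytics computes its gaps, so the session data here is made up.

```python
import ipaddress
from datetime import datetime, timedelta


def trace_arg(address: str) -> str:
    """Validate an IPv4 address before passing it to --trace."""
    # IPv4Address raises AddressValueError (a ValueError) for IPv6 or bad input
    ip = ipaddress.IPv4Address(address)
    return f"--trace={ip}"


def session_gaps(sessions: list[tuple[datetime, datetime]]) -> list[timedelta]:
    """Time between the end of one session and the start of the next."""
    ordered = sorted(sessions)  # sort by session start time
    return [nxt_start - prev_end
            for (_, prev_end), (nxt_start, _) in zip(ordered, ordered[1:])]


visits = [
    (datetime(2024, 5, 3, 14, 0), datetime(2024, 5, 3, 14, 5)),
    (datetime(2024, 5, 1, 9, 0), datetime(2024, 5, 1, 9, 20)),
]
print(trace_arg("203.0.113.7"), session_gaps(visits))
```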
The --dump parameter accepts a facet-id in the form PID-SID-RR. This parameter provides a detailed report of everything known about the specified request/response.
The --argv parameter may be added to any Analytics CLI invocation. It prints the value of each parameter used to generate the report.
Formatting the output
The Analytics CLI can format the output in three ways:
- --output=text creates the report using plain text.
- --output=html formats the output using HTML.
- --output=json produces a JSON object.
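JSON output is the natural choice when another program consumes the report. The sketch below runs the CLI with --output=json and parses standard output; run_json_report is a hypothetical wrapper, it must run with root privileges, and because this note does not describe the shape of the JSON object, callers should inspect it before relying on particular keys. The runner parameter exists only so the function can be exercised without the real CLI.

```python
import json
import subprocess
from typing import Any, Callable, Sequence


def run_json_report(args: Sequence[str],
                    runner: Callable[..., Any] = subprocess.run) -> Any:
    """Run rwserve-analytics with --output=json and parse its stdout.

    Needs root privileges. The JSON structure is not documented here,
    so inspect the returned object before depending on its keys.
    """
    cmd = ["rwserve-analytics", *args, "--output=json"]
    # check=True raises CalledProcessError if the CLI exits with an error
    completed = runner(cmd, capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)
```

A typical call would be run_json_report(["--since=yesterday", "--until=today", "--tabulate=ST"]), executed as root.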
Configuring periodic reports
Using cron is the standard way to produce periodic reports. Be sure to specify the root user as the process owner, because root privileges are needed to access the system journal.
If you choose to have the report sent by email and you want the output to be in HTML, place these settings at the top of your cron file:
By way of example, the following will configure cron to run four periodic reports:
- A daily tracking report each night at 1:00am.
- A weekly tabulation report each Monday morning at 2:00am with stratifications across referer, remote-address, user-agent-common-name, status, content-type, and content-length.
- A monthly campaign report at 3:00am on the first of each month for utm_term and utm_source.
- A report on abnormal messages every day at 6:00am and 6:00pm.
00 01 * * * root rwserve-analytics --since="yesterday" --until="today" --output=html --tabulate=none --tracking=text/html,application/json
00 02 * * 1 root rwserve-analytics --since="1 week ago" --until="today" --output=html --tracking=none --tabulate=RF,RA,UACN,ST,CT,CL
00 03 1 * * root rwserve-analytics --since="1 month ago" --until="today" --output=html --campaign=utm_term,utm_source
00 06,18 * * * root rwserve-analytics --since="12 hours ago" --until="now" --output=html --messages=abnormal
# - - - - - -
# | | | | | + user account name
# | | | | +----- day of week (0 - 6) (Sunday=0)
# | | | +------- month (1 - 12)
# | | +--------- day of month (1 - 31)
# | +----------- hour (0 - 23)
# +------------- min (0 - 59)
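A common cron mistake is swapping the day-of-month and month fields, which turns a monthly job into one that runs every day of a single month. The sketch below labels each time field of a schedule, matching the legend above, and builds a system crontab line. cron_entry and describe_schedule are hypothetical helpers. The user field applies to /etc/crontab and files in /etc/cron.d, not to per-user crontabs edited with crontab -e. shlex.join quotes with single quotes, which is equivalent to the double-quoted style above because cron hands the command to /bin/sh.

```python
import shlex

CRON_FIELDS = ("minute", "hour", "day of month", "month", "day of week")


def describe_schedule(schedule: str) -> dict[str, str]:
    """Label the five time fields of a cron schedule."""
    fields = schedule.split()
    if len(fields) != len(CRON_FIELDS):
        raise ValueError(f"expected 5 fields, got {len(fields)}")
    return dict(zip(CRON_FIELDS, fields))


def cron_entry(schedule: str, args: list[str], user: str = "root") -> str:
    """Build a system crontab line (with a user field) for rwserve-analytics."""
    times = " ".join(describe_schedule(schedule).values())  # also validates
    command = shlex.join(["rwserve-analytics", *args])
    return f"{times} {user} {command}"


monthly = cron_entry("00 03 1 * *", ["--since=1 month ago", "--until=today",
                                      "--output=html",
                                      "--campaign=utm_term,utm_source"])
print(describe_schedule("00 03 1 * *"))
print(monthly)
```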