Analyze your server logs

Analytics

Preliminaries

This note explains how to use the Analytics CLI to monitor your RWSERVE logs.

The Analytics CLI reads, filters, coalesces, and summarizes RWSERVE network logs. Command-line parameters specify what to analyze, over what time period, and how to format the output.

RWSERVE network logs are kept in the system journal and are accessed with the journalctl command. You need root privileges to use this tool.

What to examine

These are the parameters that limit what to examine.

--since — Scan the log from this point in time
--until — Scan the log up to this point in time
--authority — Include only these authorities (a.k.a. hostnames)
--messages — Include only these types of messages

--since

The --since parameter may be specified using timestamps, simple dates, descriptive names, or relative points in time.

Timestamps are specified using "yyyy-mm-dd hh:mm:ss" format. Timestamps must be enclosed in quotation marks.

Simple dates are specified using yyyy-mm-dd format. Simple dates use an implied clock time of midnight (00:00:00).

Descriptive names are shortcuts, and include the names today and yesterday. Both of these have an implied time of midnight (00:00:00).

Relative points in time refer to a point some time ago, expressed in descriptive units such as hours, days, weeks, or months. Relative points in time always begin with a numeric quantity and always end with the word ago, for example 12 hours ago or 1 week ago.

--until

The --until parameter may be specified using all of the same formats and values used with --since.

Additionally, the descriptive name now may be specified to include all logged entries up to the present second.
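The following Python sketch checks which of the formats above a --since or --until value uses. It is not part of the Analytics CLI: classify_time_spec is a hypothetical helper, and its patterns are assumptions drawn only from the descriptions in this note, so the real tool may accept additional units or forms.

```python
import re

# Patterns for the documented --since/--until formats (assumptions, see above).
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")
SIMPLE_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")
# Only the units named in the text; the real tool may accept others.
RELATIVE = re.compile(r"\d+ (hour|day|week|month)s? ago")
NAMES = {"today", "yesterday"}


def classify_time_spec(value: str, for_until: bool = False) -> str:
    """Return which documented format a --since/--until value uses."""
    if TIMESTAMP.fullmatch(value):
        return "timestamp"
    if SIMPLE_DATE.fullmatch(value):
        return "simple-date"  # implied clock time of 00:00:00
    if value in NAMES or (for_until and value == "now"):
        return "name"  # "now" is only documented for --until
    if RELATIVE.fullmatch(value):
        return "relative"
    raise ValueError(f"unrecognized time spec: {value!r}")
```

The value passed in is what the tool receives after the shell has removed the quotation marks, so "12 hours ago" is checked without them.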

--authority

The --authority parameter filters the log data so that only the specified hostname(s) are included. Use commas to separate multiple hostnames. If this parameter is omitted, all hostnames are included.
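As an illustration of these filtering semantics, here is a minimal Python sketch. parse_authority and include_entry are hypothetical helpers, and the case-insensitive comparison is an assumption (DNS hostnames are case-insensitive, but this note does not say how the tool compares them).

```python
from typing import Optional, Set


def parse_authority(arg: Optional[str]) -> Optional[Set[str]]:
    """Split a comma-separated --authority value into a set of hostnames.

    Returns None when the parameter is omitted, meaning "all hostnames".
    """
    if arg is None:
        return None
    return {host.strip().lower() for host in arg.split(",") if host.strip()}


def include_entry(hostname: str, allowed: Optional[Set[str]]) -> bool:
    """Decide whether a log entry for `hostname` passes the filter."""
    # Assumption: hostnames are compared case-insensitively.
    return allowed is None or hostname.lower() in allowed
```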

--messages

The --messages parameter limits which types of messages to include. It accepts a comma-separated list of these values:

all
normal
abnormal
policy
cluster
config
debug
error
network
nodejs
systemd

The default value is normal. When normal is specified, the four standard request/response log entries are included and treated as a single unit. The four standard entries are: request, staging, info and response.

When abnormal is specified, all of the non-standard message types are included. Abnormal messages cannot be tabulated, tracked, traced, or dumped, and cannot be used with the --thread or --campaign parameters. Typically, abnormal messages are simply counted and reported as "number of times". The only operation parameter that may be used with abnormal messages is --find.
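The rules above can be expressed as a small validation step. This Python sketch is hypothetical and is not how the Analytics CLI itself validates input. It assumes that "the only parameter" refers to the operation parameters described in the next section (the cron example at the end of this note combines abnormal with --since, --until, and --output), and it only checks operations that were requested explicitly, not defaults.

```python
VALID_MESSAGES = {
    "all", "normal", "abnormal", "policy", "cluster", "config",
    "debug", "error", "network", "nodejs", "systemd",
}
# Operation parameters described later in this note, without leading dashes.
OPERATIONS = {"tracking", "tabulate", "resource", "campaign",
              "find", "thread", "trace", "dump"}
ALLOWED_WITH_ABNORMAL = {"find"}


def check_messages(messages, operations=frozenset()):
    """Validate a --messages value against explicitly requested operations.

    Returns the parsed list of message types; raises ValueError on misuse.
    """
    kinds = [m.strip() for m in messages.split(",") if m.strip()]
    unknown = [k for k in kinds if k not in VALID_MESSAGES]
    if unknown:
        raise ValueError("unknown message type(s): " + ", ".join(unknown))
    if "abnormal" in kinds:
        clash = sorted((set(operations) & OPERATIONS) - ALLOWED_WITH_ABNORMAL)
        if clash:
            raise ValueError("abnormal messages cannot be used with: "
                             + ", ".join(clash))
    return kinds
```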

Types of operations

The Analytics CLI can perform these types of operations:

tracking — summarizing visitor navigation between documents, showing "coming-from" and "going-to", which can help to understand what originally attracted visitors to each document, where they went next, and where they lost interest.
tabulating — examining header patterns by stratifying their values, for example, showing how many times each content-type was requested, or showing how many times each status code occurred, or showing how many times each kind of browser or crawler accessed the website.
targeting a resource — listing every access to a particular resource.
monitoring a campaign — listing every resource request that is related to a marketing campaign.
searching — searching for a particular string pattern across all logged messages.
following a thread — showing a visitor's path and all the resources accessed during a single session.
tracing an address — showing all visits from a particular IP address, with marked gaps in the timeline.
dumping facets — emitting every logged piece of information captured about a particular request/response facet.

--tracking

The --tracking parameter accepts a comma-separated list of which content-type (CT) mimetypes should have detailed tracking of coming-from and going-to.

The default, when not specified, is --tracking=text/html,application/xhtml+xml. Other values might include mimetypes such as --tracking=application/pdf,application/json.

Use all to track all mimetypes. Note, however, that going-to information can only be generated for mimetypes whose hyperlinks cause the browser to send a referer header; for any other mimetype, only coming-from information will show in the report.

Use none to omit coming-from and going-to tracking.
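To make the coming-from and going-to idea concrete, here is a Python sketch of how such counts can be derived from referer headers. It is an illustration, not the tool's implementation, and it assumes each logged request has already been reduced to a (path, referer-path) pair. It also shows why going-to data depends on referer headers: a document's going-to entries exist only when later requests name it as their referer.

```python
from collections import Counter, defaultdict


def track(requests):
    """Summarize coming-from and going-to counts per document.

    `requests` is an iterable of (path, referer) pairs, where referer is the
    referring document's path, or None when no referer header was sent.
    Returns (coming_from, going_to), each mapping a path to a Counter.
    """
    coming_from = defaultdict(Counter)
    going_to = defaultdict(Counter)
    for path, referer in requests:
        coming_from[path][referer or "(direct)"] += 1
        if referer is not None:
            # A request naming `referer` is a step away from that document.
            going_to[referer][path] += 1
    return coming_from, going_to
```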

--tabulate

The --tabulate parameter accepts a comma-separated list of which logged items to stratify and tabulate.

The default, when not specified, is --tabulate=content-length,content-type,:status

Any request/response header may be tabulated. This is an open-ended parameter: if you create a plugin that conditionally sends a custom response header of clacks-overhead, and you configure the server to log that header, then you can tabulate clacks-overhead.

This parameter accepts abbreviated header names too. For example, if you've configured the server with:

remote-address  *abbr=RA
referer         *abbr=RF
:path           *abbr=PA

then you can specify either --tabulate=RA,RF,PA or --tabulate=remote-address,referer,:path

When tabulating cookies and query-strings, use square-bracket notation, for example --tabulate=parameter-map[utm_term] or --tabulate=PM[utm_term].

Specify none to omit all tabulation.
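The Python sketch below shows one way to read a --tabulate value and stratify logged values. parse_tabulate and tabulate are hypothetical helpers, the abbreviation table only mirrors the example configuration above plus the PM form shown for parameter-map (a real server's table depends on its own *abbr settings), and map-valued items are assumed to be nested dictionaries.

```python
import re
from collections import Counter

# Hypothetical abbreviation table based on the examples in this note.
ABBREVIATIONS = {"RA": "remote-address", "RF": "referer",
                 "PA": ":path", "PM": "parameter-map"}

BRACKET = re.compile(r"(?P<item>[^\[\]]+)\[(?P<key>[^\[\]]+)\]")


def parse_tabulate(spec, abbr=ABBREVIATIONS):
    """Turn a --tabulate value into a list of (item, key) pairs.

    key is None for plain headers and the bracketed name for map lookups
    such as PM[utm_term]. The value "none" yields an empty list.
    """
    if spec.strip() == "none":
        return []
    fields = []
    for raw in spec.split(","):
        raw = raw.strip()
        m = BRACKET.fullmatch(raw)
        item, key = (m["item"], m["key"]) if m else (raw, None)
        fields.append((abbr.get(item, item), key))
    return fields


def tabulate(entries, field):
    """Count how often each value of one (item, key) field occurs."""
    item, key = field
    counts = Counter()
    for entry in entries:
        value = entry.get(item)
        if key is not None:
            value = value.get(key) if isinstance(value, dict) else None
        if value is not None:
            counts[value] += 1
    return counts
```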

--resource

The --resource parameter accepts a single document resource path (PA) specifying the directory and resource portion of a URL, without any query-string values.

This parameter lists every facet that requested this resource path.

A facet is a single request/response. It is uniquely identified by the combination of process-id, session-id, and request-response-id, formatted as PID-SID-RR.
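Because facet-ids and session specs appear in several parameters (--dump, --thread), a small parser can help when post-processing reports. This Python sketch is hypothetical and assumes that each of the three parts is a decimal integer, which this note does not state.

```python
from typing import NamedTuple


class FacetId(NamedTuple):
    pid: int  # process-id
    sid: int  # session-id
    rr: int   # request-response-id

    @classmethod
    def parse(cls, text: str) -> "FacetId":
        """Parse a PID-SID-RR string (assumed to be decimal integers)."""
        parts = text.split("-")
        if len(parts) != 3 or not all(p.isdigit() for p in parts):
            raise ValueError(f"expected PID-SID-RR, got {text!r}")
        return cls(*(int(p) for p in parts))

    @property
    def session(self) -> str:
        """The PID-SID session spec accepted by --thread."""
        return f"{self.pid}-{self.sid}"

    def __str__(self) -> str:
        return f"{self.pid}-{self.sid}-{self.rr}"
```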

--campaign

The --campaign parameter accepts a comma-separated list of query-string keys, as they appear in the URL.

This parameter lists all facets whose URL includes any of these query-string keys.

For example, when using Urchin Tracking Module style parameters, the parameter might be --campaign=utm_source,utm_term,utm_content,utm_medium,utm_campaign.
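The matching rule can be sketched in Python with the standard urllib.parse module. matches_campaign is a hypothetical helper that only illustrates "any of these keys"; it assumes a key counts as present even when its value is empty.

```python
from urllib.parse import parse_qs, urlsplit


def matches_campaign(url: str, campaign_keys: str) -> bool:
    """True if the URL's query string contains any of the --campaign keys."""
    wanted = {k.strip() for k in campaign_keys.split(",") if k.strip()}
    # keep_blank_values: treat "?utm_term=" as containing utm_term.
    present = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return not wanted.isdisjoint(present)
```

For example, with the UTM keys above, "/shop/index.html?utm_source=newsletter&id=7" matches, while "/shop/index.html?id=7" does not.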

--find

The --find parameter accepts a simple string value. This parameter lists all logged items whose value includes the given string. It may be used with both normal and abnormal message types.

For example, to find all request/response logged items having the word "moon", use

--messages=normal --find=moon

And to find all configuration messages having the word "invalid", use

--messages=config --find=invalid

--thread

The --thread parameter accepts a single session spec, consisting of a process-id plus a session-id, in the format PID-SID.

This parameter will show the visitor's path during this session, from its origin to its point of departure.
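The idea of a thread can be illustrated with a short Python sketch that selects the facets of one PID-SID session and orders them. thread is a hypothetical helper, and it assumes that request-response-ids are decimal integers that increase in request order within a session.

```python
def thread(facets, session):
    """Return the paths visited in one session, in request order.

    `facets` maps facet-id strings (PID-SID-RR) to the requested path;
    `session` is a PID-SID spec as accepted by --thread.
    """
    steps = []
    for facet_id, path in facets.items():
        pid, sid, rr = facet_id.split("-")
        if f"{pid}-{sid}" == session:
            steps.append((int(rr), path))  # assumes RR grows per request
    return [path for _, path in sorted(steps)]
```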

--trace

The --trace parameter accepts a single IPv4 address, in the form xxx.xxx.xxx.xxx.

This parameter shows all resources requested from that address. It also reports the duration of the gaps between separate sessions, making it easier to see how often the visitor returns.
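The gap calculation can be sketched in Python as follows. session_gaps is hypothetical: it assumes the session boundaries for the address are already known as (start, end) datetime pairs, and it uses the standard ipaddress module to reject anything that is not a dotted IPv4 address.

```python
import ipaddress
from datetime import datetime, timedelta


def session_gaps(address, sessions):
    """Return the gaps between consecutive sessions from one IPv4 address.

    `sessions` is a list of (start, end) datetime pairs for that address.
    """
    ipaddress.IPv4Address(address)  # raises ValueError if not dotted IPv4
    ordered = sorted(sessions)
    # Gap = start of the later session minus end of the earlier one.
    return [later[0] - earlier[1] for earlier, later in zip(ordered, ordered[1:])]
```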

--dump

The --dump parameter accepts a facet-id, in the form PID-SID-RR. This parameter will provide a detailed report of everything known about the specified request/response.

--argv

The --argv parameter may be added to any Analytics CLI invocation. It prints the value of each parameter used to generate the report.

Formatting the output

The Analytics CLI can format the output in three ways:

--output=text creates the report as plain text.
--output=html formats the report as HTML.
--output=json produces a JSON object.
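JSON output is convenient for further processing. The Python sketch below runs the CLI and parses the result; run_json_report is a hypothetical wrapper, and the structure of the returned object is not described in this note, so the sketch only decodes it. The runner and root check are injectable so the function can be exercised without the real tool installed.

```python
import json
import os
import subprocess


def run_json_report(args, runner=subprocess.run,
                    is_root=lambda: os.geteuid() == 0):
    """Run rwserve-analytics with --output=json and return the parsed object.

    Root is checked first because the tool reads the journal via journalctl.
    """
    if not is_root():
        raise PermissionError("rwserve-analytics must be run as root")
    # Each list element is passed as one argument, so no shell quoting is used.
    cmd = ["rwserve-analytics", *args, "--output=json"]
    result = runner(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)
```

For example, run_json_report(["--since=1 week ago", "--until=today", "--tracking=none"]) passes "--since=1 week ago" as a single argument; the quotation marks needed on a shell command line are presumably unnecessary here because no shell is involved.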

Configuring periodic reports

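Cron is the standard way to produce periodic reports. Because root privileges are needed to access journalctl, specify root as the user account for each entry. The user field shown in the examples below is part of system crontabs such as /etc/crontab or files in /etc/cron.d; per-user crontabs edited with crontab -e do not have it.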

If you choose to have the report sent by email and you want the output to be in HTML, place these settings at the top of your cron file:

PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=admin@example.com
MAILFROM="hostname"
CONTENT_TYPE="text/html; charset=utf-8"

By way of example, the following will configure cron to run four periodic reports:

A daily tracking report each night at 1:00am.
A weekly tabulation report each Monday morning at 2:00am with stratifications across referer, remote-address, user-agent-common-name, status, content-type, and content-length.
A monthly campaign report on the first of each month at 3:00am for utm_term and utm_source.
A report on abnormal messages every day at 6:00am and 6:00pm.

00      01    *     *     *     root rwserve-analytics --since="yesterday"    --until="today" --output=html --tabulate=none --tracking=text/html,application/json
00      02    *     *     1     root rwserve-analytics --since="1 week ago"   --until="today" --output=html --tracking=none --tabulate=RF,RA,UACN,ST,CT,CL 
00      03    1     *     *     root rwserve-analytics --since="1 month ago"  --until="today" --output=html --campaign=utm_term,utm_source
00      06,18 *     *     *     root rwserve-analytics --since="12 hours ago" --until="now"   --output=html --messages=abnormal
# -     -     -     -     -      -
# |     |     |     |     |      + user account name
# |     |     |     |     +----- day of week (0 - 6) (Sunday=0)
# |     |     |     +------- month (1 - 12)
# |     |     +--------- day of month (1 - 31)
# |     +----------- hour (0 - 23)
# +------------- min (0 - 59)