Analyze your server logs
Analytics
Preliminaries
This note explains how to use the Analytics CLI to monitor your RWSERVE logs.
The Analytics CLI reads, filters, coalesces, and summarizes RWSERVE network logs. The tool uses a command line interface to specify what to analyze, over what time period, and using what type of output formatting.
RWSERVE network logs are kept in the system's journal and are accessed using the journalctl command. You need root privileges to use this tool.
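Because the tool depends on root access to the journal, a wrapper script can check for that up front. The following is a minimal sketch and not part of the Analytics CLI; the helper name require_root is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only when the current user is root (uid 0).
require_root() {
    [ "$(id -u)" -eq 0 ]
}

if require_root; then
    echo "running as root: journal access should be available"
else
    echo "not root: re-run as root" >&2
fi
```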
What to examine
These are the parameters that limit what to examine.
- --since: Scan the log from this point in time
- --until: Scan the log up to this point in time
- --authority: Include only these authorities (a.k.a. hostnames)
- --messages: Include only these types of messages
--since
The --since parameter may be specified using timestamps, simple dates, descriptive names, or relative points in time.
Timestamps are specified using "yyyy-mm-dd hh:mm:ss" format. Timestamps must be enclosed in quotation marks.
Simple dates are specified using yyyy-mm-dd format. Simple dates use an implied clock time of midnight (00:00:00).
Descriptive names are shortcuts, and include the names today and yesterday. Both of these have an implied time of midnight (00:00:00).
Relative points in time refer to a point that was some time ago. They use descriptive units such as hours, days, weeks, or months. Relative points in time always begin with a numeric quantity and always end with the word ago. For example, 12 hours ago or 1 week ago.
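The four --since formats can be compared side by side. The sketch below only prints candidate command lines and does not run them. The rwserve-analytics command name is taken from the cron examples later in this note, and the sample dates are placeholders chosen for illustration.

```shell
#!/bin/sh
# Print one candidate invocation per --since format (nothing is executed).
# Order: timestamp, simple date, descriptive name, relative point in time.
since_examples() {
    for since in "2024-03-01 08:30:00" "2024-03-01" "yesterday" "12 hours ago"; do
        # Quote the value so embedded spaces stay inside a single argument.
        printf 'rwserve-analytics --since="%s"\n' "$since"
    done
}

since_examples
```

Quoting every value, including the ones without spaces, keeps the invocations uniform and avoids surprises when a date is later swapped for a timestamp.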
--until
The --until parameter may be specified using all of the same formats and values used with --since.
Additionally, the descriptive name now may be specified to include all logged entries up to the present second.
--authority
The --authority parameter filters the log data so that only the specified hostname(s) are included. Use commas to separate hostnames. Omitting this parameter includes all hostnames.
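When the hostnames come from a script, they need to be joined with commas before being passed along. A small sketch, assuming placeholder hostnames under example.com and a hypothetical helper named join_authorities; the command line is printed, not executed.

```shell
#!/bin/sh
# Hypothetical helper: join all arguments with commas.
# Runs in a subshell so the IFS change does not leak into the caller.
join_authorities() (
    IFS=,
    printf '%s' "$*"    # "$*" joins the arguments using the first character of IFS
)

authorities=$(join_authorities www.example.com blog.example.com)
printf 'rwserve-analytics --authority=%s\n' "$authorities"
```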
--messages
The --messages parameter limits which types of messages to include. This is a comma-separated list. The possible values are:
- all
- normal
- abnormal
- policy
- cluster
- config
- debug
- error
- network
- nodejs
- systemd
The default value is normal. When normal is specified, the four standard request/response log entries are included and treated as a single unit. The four standard entries are: request, staging, info, and response.
When abnormal is specified, all of the non-standard message types are included. Abnormal messages cannot be tabulated, tracked, traced, dumped, or used with the thread or campaign parameters. Typically abnormal messages are simply counted and reported as "number of times". The only parameter that may be used in conjunction with abnormal messages is the --find parameter.
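A misspelled message type is easy to miss in a cron entry, so a script might check the list before invoking the tool. The sketch below is an illustrative pre-flight check based only on the values listed above; valid_messages is a hypothetical helper, not something the Analytics CLI provides.

```shell
#!/bin/sh
# Hypothetical pre-flight check for a --messages value.
# Returns 0 when every comma-separated item is one of the documented types.
valid_messages() {
    rest=$1
    [ -n "$rest" ] || return 1
    while [ -n "$rest" ]; do
        item=${rest%%,*}                 # text before the first comma
        case "$rest" in
            *,*) rest=${rest#*,} ;;      # drop the first item and its comma
            *)   rest= ;;                # that was the last item
        esac
        case "$item" in
            all|normal|abnormal|policy|cluster|config|debug|error|network|nodejs|systemd) ;;
            *) printf 'unknown message type: %s\n' "$item" >&2
               return 1 ;;
        esac
    done
}

valid_messages "config,error" && echo "ok: config,error"
valid_messages "config,eror" || echo "rejected: config,eror"
```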
Type of operations
The Analytics CLI can perform these types of operations:
- tracking — summarizing visitor navigation between documents, showing "coming-from" and "going-to", which can help to understand what originally attracted visitors to each document, where they went next, and where they lost interest.
- tabulating — examining header patterns by stratifying their values, for example, showing how many times each content-type was requested, or showing how many times each status code occurred, or showing how many times each kind of browser or crawler accessed the website.
- targeting a resource — listing every access to a particular resource.
- monitoring a campaign — listing every resource request that is related to a marketing campaign.
- searching — searching for a particular string pattern across all logged messages.
- following a thread — showing a visitor's path and all the resources accessed during a single session.
- tracing an address — showing all visits from a particular IP address, with marked gaps in the timeline.
- dumping facets — emitting every logged piece of information captured about a particular request/response facet.
--tracking
The --tracking parameter accepts a comma-separated list of which content-type (CT) mimetypes should have detailed tracking of coming-from and going-to.
The default, when not specified, is --tracking=text/html,application/xhtml+xml. Other values might include mimetypes such as --tracking=application/pdf,application/json.
Use all to track all mimetypes. Note that any mimetype that doesn't send referer headers for hyperlinks cannot generate going-to information, so only coming-from information will show in the report.
Use none to omit coming-from and going-to tracking.
--tabulate
The --tabulate parameter accepts a comma-separated list of which logged items to stratify and tabulate.
The default, when not specified, is --tabulate=content-length,content-type,:status
Any request/response header may be tabulated. This is an open-ended parameter. That is, if you create a plugin that conditionally sends a custom response header of clacks-overhead, and you configure the server to log that header, then you can tabulate clacks-overhead.
This parameter accepts abbreviated header names too. For example, if you've configured the server with:
remote-address *abbr=RA
referer *abbr=RF
:path *abbr=PA
then you can specify either --tabulate=RA,RF,PA or --tabulate=remote-address,referer,:path
When tabulating cookies and query-strings, use square-bracket notation like: --tabulate=parameter-map[utm_term] or --tabulate=PM[utm_term].
Specify none to omit all tabulation.
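One practical point about the square-bracket notation: most shells treat [ and ] as glob characters, so it is safer to quote the value when typing it on a command line or in a script. A minimal sketch that prints the resulting command line without executing it:

```shell
#!/bin/sh
# Single quotes stop the shell from treating [utm_term] as a glob pattern.
tabulate='parameter-map[utm_term]'
printf 'rwserve-analytics --tabulate=%s\n' "$tabulate"
```

Unquoted, the same argument usually passes through unchanged in sh or bash when no file name happens to match, but some shells (zsh, for instance) report an error for an unmatched pattern.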
--resource
The --resource parameter accepts a single document resource path (PA) specifying the directory and resource portion of a URL, without any query-string values.
This parameter will list all facets going through this resource path.
A facet is a single request/response. It is uniquely identified by the concatenation of the process-id, session-id, and request-response-id, and is formatted as PID-SID-RR.
--campaign
The --campaign parameter accepts a comma-separated list of query-string keys, as they appear in the URL's resource path.
This parameter will list all facets accessed with any of these query-string keys.
For example, when using Urchin Tracking Module style parameters, the parameter might be --campaign=utm_source,utm_term,utm_content,utm_medium,utm_campaign.
--find
The --find parameter accepts a simple string value. This parameter will list all logged items whose value includes the given string. This parameter may be used with both normal and abnormal message types.
For example, to find all request/response logged items having the word "moon", use
--messages=normal --find=moon
And to find all configuration messages having the word "invalid", use
--messages=config --find=invalid
--thread
The --thread parameter accepts a single session spec, consisting of a process-id plus a session-id, in the format PID-SID.
This parameter will show the visitor's path during this session, from its origin to its point of departure.
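Since a facet id (PID-SID-RR) is a session spec (PID-SID) with a request-response-id appended, a facet id taken from a report can be trimmed to obtain the value for --thread. The sketch below assumes that none of the three components contains a hyphen of its own; the id shown is a made-up value, and the command lines are printed, not executed.

```shell
#!/bin/sh
facet_id="4021-17-3"          # hypothetical PID-SID-RR value
thread_id=${facet_id%-*}      # strip the trailing "-RR", leaving PID-SID

printf 'rwserve-analytics --thread=%s\n' "$thread_id"
printf 'rwserve-analytics --dump=%s\n' "$facet_id"
```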
--trace
The --trace parameter accepts a single IPv4 address, in the form xxx.xxx.xxx.xxx.
This parameter will show all resources requested from that address. It reports the duration of the gaps between separate sessions, which makes return frequency easier to see.
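Because --trace expects exactly one dotted-quad IPv4 address, a script that takes the address from user input may want to check its shape first. This is an illustrative sketch; is_ipv4 is a hypothetical helper, and the address is a placeholder from a range reserved for documentation.

```shell
#!/bin/sh
# Hypothetical helper: succeed only for four dot-separated numbers in 0..255.
# The body runs in a subshell so the IFS change stays local.
is_ipv4() (
    case "$1" in
        ''|*[!0-9.]*|.*|*.|*..*) exit 1 ;;   # empty, bad characters, or empty octet
    esac
    IFS=.
    set -- $1                                # split the address on dots
    [ "$#" -eq 4 ] || exit 1
    for octet in "$@"; do
        [ "${#octet}" -le 3 ] && [ "$octet" -le 255 ] || exit 1
    done
)

addr="203.0.113.7"    # placeholder address
if is_ipv4 "$addr"; then
    printf 'rwserve-analytics --trace=%s\n' "$addr"
fi
```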
--dump
The --dump parameter accepts a facet-id, in the form PID-SID-RR. This parameter will provide a detailed report of everything known about the specified request/response.
--argv
The --argv parameter may be added to any Analytics CLI invocation. It will print the values for each of the parameters that were used in the generation of the report.
Formatting the output
The Analytics CLI can format the output in three ways:
- --output=text creates the report using plain text.
- --output=html formats the output using HTML.
- --output=json produces a JSON object.
Configuring periodic reports
Using cron is the standard way to produce periodic reports. Be sure to specify the root user as the process owner, because root privileges are needed to access the journalctl tool.
If you choose to have the report sent by email and you want the output to be in HTML, place these settings at the top of your cron file:
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=admin@example.com
MAILFROM="hostname"
CONTENT_TYPE="text/html; charset=utf-8"
By way of example, the following will configure cron to run four periodic reports:
- A daily tracking report each night at 1:00am.
- A weekly tabulation report each Monday morning at 2:00am with stratifications across referer, remote-address, user-agent-common-name, status, content-type, and content-length.
- A monthly campaign report the first of each month for utm_term and utm_source.
- A report on abnormal messages every day at 6:00am and 6:00pm.
00 01 * * * root rwserve-analytics --since="yesterday" --until="today" --output=html --tabulate=none --tracking=text/html,application/json
00 02 * * 1 root rwserve-analytics --since="1 week ago" --until="today" --output=html --tracking=none --tabulate=RF,RA,UACN,ST,CT,CL
00 03 1 * * root rwserve-analytics --since="1 month ago" --until="today" --output=html --campaign=utm_term,utm_source
00 06,18 * * * root rwserve-analytics --since="12 hours ago" --until="now" --output=html --messages=abnormal
# - - - - - -
# | | | | | +--- user account name
# | | | | +----- day of week (0 - 6) (Sunday=0)
# | | | +------- month (1 - 12)
# | | +--------- day of month (1 - 31)
# | +----------- hour (0 - 23)
# +------------- min (0 - 59)