Monitoring and capturing what's going on

Logging

Preliminaries

This note describes the options available for fine-grain control over the server's logging operations.

Webmasters need to know what's going on with the server in order to properly fine-tune its operation. One of the most important tools to help with this is the server's log.

Different stages of a website's operation need different information. During development, designers and authors need to know that embedded media and hyperlinked documents are being served without 404 status codes. During initial startup, webmasters need to know that access to the public is allowed, and that access to restricted content is prohibited. When the website begins to receive awareness, and usage patterns begin to be recognized, engineers need to fine tune caching and load balancing. And when the website has reached a steady state, someone needs to monitor what's going on and be alerted when bad actors are behaving maliciously.

The server's logging facility can help with all of these scenarios.

Motivation

The server sends messages to the system log with the intention that they will be read and analyzed by computers rather than people. Messages are designed to be both flexible and parsable. Anyone with basic knowledge of regular expressions and a scripting language should be able to filter the system log to answer these types of questions:

How many requests were successfully fulfilled yesterday?
How many megabytes of data were transferred over the network?
What was the average compression ratio for gzip responses?
Which IP addresses generated more than ten potentially malicious requests?
What is the percentage breakdown of user requests by browser type?
Are there any internal hyperlinks that no longer work?
Did any unexpected errors occur from the website's custom pages?

SystemD

When operated under control of the SystemD service manager, messages are routed to the system log. When running in direct foreground mode, messages are sent to the console via STDOUT (this is uncommon).

Configuration

The server's logging section consists of several subsections allowing for both high-level and fine-tuned configuration:

The message-type subsection comprises seven "on" or "off" switches to enable or disable logging at a high-level.

request comprises entries that define which HTTP/2 request headers should be logged.
staging comprises entries that define which of the server's work order variables should be logged.
information comprises entries that define which server information headers should be logged.
response comprises entries that define which HTTP/2 response headers should be logged.
policy comprises entries received from feature policy and content-security policies.
network enables/disables the logging of TLS network errors.
cluster enables/disables the logging of startup and shutdown of cluster worker threads.
debug enables/disables the logging of internal tracing messages.

Message type

The server emits 12 types of messages:

Request header entries provide details of each incoming HTTP/2 request.
Staging variable entries provide details of about the internal state of the server during the request-response cycle.
Information header entries are emitted for responses that contain any of the special server headers that begin with the letters "rw-".
Response header entries provide details of each outgoing HTTP/2 response.
Policy messages are sent by the browser back to the server in response to unmet policy conditions.
Network messages are emitted for TLS network errors such as: no shared cipher, unknown protocol, http request, version too low, wrong version number, and others.
Cluster messages are emitted when a worker thread is created or disconnected.
Debug messages are used for tracing the path of the request/response cycle through the software.
Config messages are emitted during server initialization to show its progress towards binding to the network interface, and to report any non-recoverable configuration mistakes.
Error messages are emitted any time an unexpected condition occurs, due to user configuration issues or server software bugs.
Nodejs messages are emitted any time an uncaught Node.js exception is thrown.
Systemd messages are emitted by the operating system when the server starts or stops.

The content and format of request, staging, information, response and policy messages are configurable. The choice of which header values and variables to be included is made in the logging section.

The network, cluster and debug messages may be enabled or disabled, but the individual messages themselves are non-configurable.

The config, error, nodejs and systemd messages cannot be disabled.

Log message structure

Here is the breakdown of a standard log entry. First there is a message type, as explained above. This is followed by a request-response (RR) identifier, which is a sequential number that is shared by the request/staging/information/response messages of a single RR. This is followed by a series of fields, either HTTP headers or work-order variables.

This is the regular expression for messages generated by the server, where the second parenthesis will capture the fields:

RR=(.*);( .*=.*;)*

When sent to the system log via the SystemD service manager, each message is stored with these additional OS-provided values: 1) timestamp, 2) hostname, 3) unit name, and 4) process identifier.

Filtering fields

Many HTTP headers are repetitious and of little value to typical website analysis. The server's configuration can specify exactly which fields to include in the logged message. Each of the four configurable subsections comprise a list of fields to include. When specified this way, only those fields are included; any other HTTP header names or work-order variables will be omitted.

Sometimes, a webmaster may not know which header fields to expect in a message. In that case it is possible to use the special field name 'default', in a configuration subsection, either by itself or immediately after the other field entries. The 'default' keyword is a catch-all fallback; it instructs the server to include any other HTTP header actually present in the logged message. This is a useful technique when setting up a web server the first time, and when learning about HTTP requests and responses. Later, it is more prudent to remove the 'default' catch-all and limit the logging to just what's needed.

Abbreviating field names

In the interest of conserving disk space, it is desirable to shorten fields which may be lengthy both because the HTTP header name itself may be long and because its associated value may be long.

Each configuration entry may include an *abbr attribute which will be substituted for the field name. Typically these abbreviations are 2, 3, or 4 capitalized letters. Be sure to choose unique values in order to retain the ability to parse messages using scripts.

When the *abbr attribute value is "none" or omitted, the full field name is used in the log message.

Here are some typical configuration entries showing the use of the *abbr attribute:

Request headers

Request header	abbreviation
:method	*abbr=ME
:scheme	*abbr=SC
:authority	*abbr=AU
:path	*abbr=PA
accept	*abbr=AC
access-control-request-headers	*abbr=ACRH
access-control-request-method	*abbr=ACRM
accept-encoding	*abbr=AE
accept-language	*abbr=AL
alpn	*abbr=ALPN
cache-control	*abbr=CC
content-encoding	*abbr=CE
content-length	*abbr=CL
cookie	*abbr=CO
content-range	*abbr=CR
content-type	*abbr=CT
if-match	*abbr=IM
if-modified-since	*abbr=IMS
if-none-match	*abbr=INM
if-unmodified-since	*abbr=IUS
link	*abbr=LK
origin	*abbr=OR
pragma	*abbr=PR
referer	*abbr=RF
user-agent	*abbr=UA
upgrade-insecure-requests	*abbr=UG

Staging variables

Staging variables	abbreviation
cookie-map	*abbr=CK
form-data-map	*abbr=FD
proxy-pocation	*abbr=PL
masked-replacement-path	*abbr=MRP
parameter-map	*abbr=PM
push-candidate	*abbr=PC
query-string	*abbr=QS
remote-address	*abbr=RA
resource-path	*abbr=RP
user-agent-common-name	*abbr=UCN
user-agent-groups	*abbr=UAG

Information headers

Information headers	abbreviation
rw-bad-range	*abbr=none
rw-blue-cache-add	*abbr=none
rw-blue-cache-dependency	*abbr=none
rw-blue-cache-hit	*abbr=none
rw-blue-cache-stale	*abbr=none
rw-blue-dir-deleted	*abbr=none
rw-blue-file-deleted	*abbr=none
rw-blue-invalid-option	*abbr=none
rw-blue-map-deleted	*abbr=none
rw-blue-unsupported-method	*abbr=none
rw-conditional-range-failed	*abbr=none
rw-content-decoding	*abbr=none
rw-content-decoding	*abbr=none
rw-content-length-limit	*abbr=none
rw-content-length-missing	*abbr=none
rw-content-range	*abbr=none
rw-content-type-mismatch	*abbr=none
rw-content-type-missing	*abbr=none
rw-content-type-not-supported	*abbr=none
rw-encoding-cache-add	*abbr=none
rw-encoding-cache-add	*abbr=none
rw-encoding-cache-hit	*abbr=none
rw-encoding-dir-deleted	*abbr=none
rw-encoding-file-deleted	*abbr=none
rw-forbidden	*abbr=none
rw-header-not-allowed	*abbr=none
rw-hostname	*abbr=none
rw-if-match	*abbr=none
rw-if-none-match	*abbr=none
rw-if-unmodified-since	*abbr=none
rw-ip-access	*abbr=none
rw-language-not-acceptable	*abbr=none
rw-language-not-configured	*abbr=none
rw-method-not-allowed	*abbr=none
rw-no-acceptable-type	*abbr=none
rw-no-cache-control	*abbr=none
rw-origin-not-allowed	*abbr=none
rw-path	*abbr=none
rw-range-units-not-supported	*abbr=none
rw-rbac-expired	*abbr=none
rw-rbac-forbidden	*abbr=none
rw-rbac-forged	*abbr=none
rw-rbac-internal-error	*abbr=none
rw-rbac-missing-credentials	*abbr=none
rw-rbac-no-matching-role	*abbr=none
rw-rbac-no-resource-rule	*abbr=none
rw-rbac-remote-address	*abbr=none
rw-rbac-renewal	*abbr=none
rw-rbac-unsupported-action	*abbr=none
rw-rbac-unsupported-content-type	*abbr=none
rw-rbac-unsupported-method	*abbr=none
rw-rbac-unsupported-scheme	*abbr=none
rw-referer-nopush	*abbr=none
rw-request-dir-deleted	*abbr=none
rw-request-file-deleted	*abbr=none
rw-request-file-not-deleted	*abbr=none
rw-resource-path-illegal	*abbr=none
rw-resource-path-length	*abbr=none
rw-router-missing-module	*abbr=none
rw-router-not-forwarded	*abbr=none
rw-user-agent-nopush	*abbr=none
rw-user-agent-noserve	*abbr=none

Response headers

Response headers	abbreviation
:status	*abbr=ST
access-control-allow-headers	*abbr=ACAH
access-control-expose-headers	*abbr=ACEH
access-control-allow-origin	*abbr=ACAO
access-control-allow-credentials	*abbr=ACAC
access-control-allow-methods	*abbr=ACAM
access-control-max-age	*abbr=ACMA
allow	*abbr=AL
cache-control	*abbr=CC
content-encoding	*abbr=CE
content-length	*abbr=CL
content-location	*abbr=CLO
content-language	*abbr=CLA
content-type	*abbr=CT
last-modified	*abbr=LM
etag	*abbr=ET
location	*abbr=LO
set-cookie	*abbr=SC
server	*abbr=SE
vary	*abbr=VA

Policy data

Policy data	abbreviation
type	*abbr=TYP
age	*abbr=AGE
blocked-uri	*abbr=B-URI
disposition	*abbr=DISP
document-uri	*abbr=D-URI
effective-directive	*abbr=E-DIR
elapsed-time	*abbr=ELAP
line-number	*abbr=LN
method	*abbr=ME
phase	*abbr=PH
protocol	*abbr=PROT
referrer	*abbr=RF
sampling-fraction	*abbr=S-FRAC
server-ip	*abbr=S-IP
source-file	*abbr=S-FIL
status	*abbr=ST
status-code	*abbr=ST
original-policy	*abbr=O-POL
url	*abbr=URL
user-agent	*abbr=UA
violated-directive	*abbr=V-DIR

Formatting fields

Some header values can be quite lengthy. These can be shortened using ellipses. Use the *format attribute to instruct the server how much to shorten the string and where to put the ellipsis. There are three meaningful definition patterns: 1) trim leading, 2) trim trailing, and 3) trim middle. Each definition consists of the number of leading characters to keep, followed by three FULL-STOPS, followed by the number of trailing characters to keep.

Given an example header value like this "AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz" the text to be sent to the log message is demonstrated here:

Definition pattern	attribute	log text
Trim leading	*format='...10'	...VvWwXxYyZz
Trim trailing	*format='10...'	AaBbCcDdEe...
Trim middle	*format='8...8'	AaBbCcDd...WwXxYyZz

Be sure to enclose the attribute value in APOSTROPHES.

Configuration snippet using the *format attribute

cookie     *abbr=CO *format='...10'
referer    *abbr=RF *format='10...'
user-agent *abbr=UA *format='8...8'

When no format attribute is provided, the default for most field names is *format='20...20'. The exceptions to this rule are the :path and referer fields, which by default use an implicit *format='100...100'. These two special fields retain more of the original string in order to provide better results with the Analytics CLI.

Coloring fields

The foreground and background color of each header value can be set by specifying *fg and *bg attributes. Eight colors are allowed: black, white, red, green, blue, cyan, magenta, and yellow.

Configuration snippet using the *fg and *bg attributes

cookie     *abbr=CO *fg=red   *bg=white
referer    *abbr=RF *fg=green *bg=black
user-agent *abbr=UA *fg=black *bg=cyan

Colors are normally suppressed by the journalctl utility. To enable colors, use the --output=cat switch, like this:

journalctl -u rwserve --output=cat

Orientation of log fields

Log messages are normally sent to the console as single horizontal lines. It is also possible to configure the server to create messages in a vertically stacked style. This is convenient during development and testing, to bring an extra bit of clarity to the output. Do this by adding an orientation vertical setting to the logging section.

EBNF

SP	::=	U+20
CR	::=	U+0D
ASTERISK	::=	U+2A
APOSTROPHE	::=	U+27
EQUALS-SIGN	::=	U+3D
QUESTION-MARK	::=	U+3F
LEFT-CURLY-BRACKET	::=	U+7B
RIGHT-CURLY-BRACKET	::=	U+7D
orientation-entry	::=	'orientation' SP ('vertical' \| 'horizontal')
http-header	::=	ALPHA* \| 'default'
policy-header	::=	ALPHA* \| 'default'
work-order-variable	::=	'cookieMap' \| 'formDataMap' \| 'proxyLocation' \| 'maskedReplacementPath' \| 'parameterMap' \| 'pushCandidate' \| 'queryString' \| 'remoteAddress' \| 'resourcePath' \| 'userAgentCommonName' \| 'userAgentGroups'
ansi-color	::=	('white' \| 'black' \| 'red' \| 'green' \| 'blue' \| 'cyan' \| 'magenta' \| 'yellow')
abbr-attr	::=	ASTERISK 'abbr' EQUALS-SIGN ALPHA*
abbr-fg	::=	ASTERISK 'fg' EQUALS-SIGN ansi-color
abbr-bg	::=	ASTERISK 'bg' EQUALS-SIGN ansi-color
leading-char-count	::=	DIGITS*
trailing-char-count	::=	DIGITS*
format-spec	::=	leading-char-count '...' trailing-char-count
format-attr	::=	ASTERISK 'format' EQUALS-SIGN APOSTROPHE format-spec APOSTROPHE
message-type-entry	::=	('request' \| 'staging' \| 'response' \| information' \| 'policy' \| 'network' \| 'cluster' \| 'debug') SP ('on' \| 'off') CR
request-entry	::=	http-header SP abbr-attribute* SP format-attr* SP fg-attr* SP bg-attr* CR
staging-entry	::=	work-order-variable SP abbr-attribute* SP format-attr* SP fg-attr* SP bg-attr* CR
information-entry	::=	http-header SP abbr-attribute* SP format-attr* SP fg-attr* SP bg-attr* CR
response-entry	::=	http-header SP abbr-attribute* SP format-attr* SP fg-attr* SP bg-attr* CR
policy-entry	::=	policy-header SP abbr-attribute* SP format-attr* SP fg-attr* SP bg-attr* CR
message-type-subsection	::=	'message-type' SP LEFT-CURLY-BRACKET CR message-type-entry* RIGHT-CURLY-BRACKET CR
request-subsection	::=	'request' SP LEFT-CURLY-BRACKET CR request-entry* RIGHT-CURLY-BRACKET CR
staging-subsection	::=	'staging' SP LEFT-CURLY-BRACKET CR staging-entry* RIGHT-CURLY-BRACKET CR
information-subsection	::=	'information' SP LEFT-CURLY-BRACKET CR information-entry* RIGHT-CURLY-BRACKET CR
response-subsection	::=	'response' SP LEFT-CURLY-BRACKET CR response-entry* RIGHT-CURLY-BRACKET CR
policy-subsection	::=	'policy' SP LEFT-CURLY-BRACKET CR policy-entry* RIGHT-CURLY-BRACKET CR
logging-subsection	::=	orientation-entry \| message-type-subsection \| request-subsection \| staging-subsection \| information-subsection \| response-subsection \| policy-subsection
logging-section	::=	'logging' SP LEFT-CURLY-BRACKET CR logging-subsection* RIGHT-CURLY-BRACKET CR

Cookbook

Example 1: Logging only config and error messages

Note: With these rules in place, important 'config' and 'error' messages are still sent to the system log.

server {
    logging {
        message-type {
            request         off
            staging         off
            information     off
            response        off
            network         off
            cluster         off
        }
    }
}

Example 2: Basic request/response logging

server {
    logging {
        message-type {
            request         on
            staging         off
            information     off
            response        on
            network         off
            cluster         off
        }
        request {
            :method           *abbr=ME
            :path             *abbr=PA
            cookie            *abbr=CO
            referer           *abbr=RF
        }
        response {
            :status           *abbr=ST
            content-length    *abbr=CL
            content-type      *abbr=CT
            set-cookie        *abbr=SC
        }
    }
}

Example 3: Debugging the website

server {
    logging {
        message-type {
            request         on
            staging         on
            information     off
            response        on
            network         on
            cluster         off
        }
        request {
            default               *format='64...'
        }
        staging {
            remote-address                *abbr=RA
            cookie-map                    *abbr=CM
            parameter-map                 *abbr=PM
            form-data-map                 *abbr=FDM
            user-agent-common-name        *abbr=UACN
            user-agent-groups             *abbr=UAG
            masked-replacement-path       *abbr=MRP
            resource-path                 *abbr=RP
            query-string                  *abbr=QS
            push-candidate                *abbr=PC
            proxy-location                *abbr=PL
            default
        }
        response {
            default               *format='64...'
        }
    }
}

Example 4: Monitoring the website

server {
    logging {
        message-type {
            request         on
            staging         off
            information     on
            response        on
            network         off
            cluster         off
        }
        request {
            :method           *abbr=ME
            :path             *abbr=PA
            cookie            *abbr=CO
            referer           *abbr=RF
        }
        information {
            default           *format='32...32'
        }
        response {
            :status           *abbr=ST
            content-length    *abbr=CL
            content-type      *abbr=CT
            set-cookie        *abbr=SC
        }
    }
}

Example 5: Vertical logging

server {
    logging {
        orientation vertical
        message-type {
            request         on
            staging         off
            information     on
            response        on
            network         off
            cluster         off
        }
        request {
            default           *format='32...32'
        }
        information {
            default           *format='32...32'
        }
        response {
            default           *format='32...32'
        }
    }
}

Review

Key points to remember:

The default settings for logging are only a starting point; adjustments should be made to match the operator's needs.
Request and response headers can be lengthy and repetitive; use of the 'default' entry can produce copious messages.
Staging variables are useful for debugging during development.
Information headers are useful during the website's steady-state to monitor for possible misuse.