Rewriting masked resources using replacement patterns

Resource Masks

Preliminaries

This note describes a way to rewrite resource request paths. Strings containing capture groups are used as request patterns that match incoming paths to rewrite rules. Replacement strings are used to specify how to reassemble the capture group values into masked replacement paths.

The URL transmitted from the browser to the server is used to uniquely identify a resource. When serving static files, this URL maps one-to-one with the server's file system, but when serving dynamic resources, the URL is instead treated as an instruction for how to obtain the data needed to fulfill the request. Oftentimes the path portion of the URL maps to generic instructions, and the query-string portion maps to the specific instructions — together they provide enough information to assemble the request.

When configuring the server, resource masks are specified by the webmaster as a way to rewrite the instructions in a dynamic resource path into a form suitable for fulfilling the request.

In addition to their utility in dynamic resource requests, resource masks can also be used to:

  • map SEO-friendly URLs into static filesystem paths; and
  • redirect requests to a different location.

The resource mask algorithm operates on pairs of request patterns and replacement strings. Both of these are specified using the same syntax.

Request patterns and replacement strings

Resource masks make use of named capture groups that are embedded into request patterns and replacement strings. A named capture group syntactically consists of a variable name enclosed in curly braces. Here are some request pattern examples:

/beachballs/size/{size}/color/{color}
/article/{id}/{seo}
/convert-blue/{name}?output={lang}
{path}.html

The resource mask module parses strings such as these into a sequence of literals and variables. When matching an incoming request's path to one of these strings, a grep-like search is conducted whereby the literal portions of the string must match exactly, while the variable portions of the string can match any sequence of one or more characters. (If you know grep, think of each variable as if it were the sequence .+) Here are example paths that would match the previous request patterns:

/beachballs/size/xxl/color/blue
/article/42/the-answer-to-everything
/convert-blue/deeply/nested/document?output=xml
/1972/06/17/watergate.html

The literal values in these strings are used during the pattern matching process, but are discarded once matching is complete. The variables are saved as capture groups. These are the internal results for the examples above:

[size:'xxl', color:'blue']
[id:'42', seo:'the-answer-to-everything']
[name:'deeply/nested/document', lang:'xml']
[path:'/1972/06/17/watergate']
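
The matching step can be sketched in Python as follows. This is only an illustration under stated assumptions, not the module's actual implementation: the function names compile_pattern and match_request are hypothetical, each variable is assumed to behave like a greedy regular-expression group that may span slashes (as the {name} example suggests), and the whole path is assumed to have to match.

```python
import re

# A variable is a name in curly braces; the name may use letters, digits and hyphens.
VARIABLE = re.compile(r"\{([A-Za-z0-9-]*)\}")

def compile_pattern(pattern):
    """Translate a request pattern into a regex plus the ordered variable names."""
    # re.split with one capture group alternates: literal, name, literal, name, ...
    parts = VARIABLE.split(pattern)
    regex = ""
    names = []
    for index, part in enumerate(parts):
        if index % 2 == 0:
            regex += re.escape(part)   # literal text must match exactly
        else:
            names.append(part)
            regex += "(.+)"            # variable: one or more characters (assumed greedy)
    return re.compile(regex), names

def match_request(pattern, path):
    """Return the capture groups as a dict, or None when the path does not match."""
    regex, names = compile_pattern(pattern)
    found = regex.fullmatch(path)
    if found is None:
        return None
    return dict(zip(names, found.groups()))

print(match_request("/article/{id}/{seo}", "/article/42/the-answer-to-everything"))
```

Unnamed regex groups are used here because variable names may contain hyphens, which Python does not accept in named groups.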

Continuing the example, these request patterns could be paired with the following replacement strings:

/swimming/kids/{size}-{color}.png
/hitchhikers-guide-to-the-galaxy/{id}.html
/{name}?option.emit={lang}
{path}.blue

Again, the resource mask module would parse these strings into a sequence of literals and variables, but this time both the literal values and the named variables are kept internally for use in the final masked replacement path:

['/swimming/kids/', '{size}', '-', '{color}', '.png']
['/hitchhikers-guide-to-the-galaxy/', '{id}', '.html']
['/', '{name}', '?option.emit=', '{lang}']
['{path}', '.blue']

The final masked replacement path is assembled from the two internal structures: capture group values obtained from the request pattern are plugged into the named variables parsed from the replacement string; the whole is assembled from the combination of replaced values and literals. The examples result in these final masked replacement paths:

/swimming/kids/xxl-blue.png
/hitchhikers-guide-to-the-galaxy/42.html
/deeply/nested/document?option.emit=xml
/1972/06/17/watergate.blue
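
The assembly step can be sketched along the same lines. The names parse_replacement and assemble are hypothetical, and leaving an unmatched variable untouched is an assumption made for this sketch rather than documented behavior of the module.

```python
import re

# Splitting on a captured group keeps the '{variable}' tokens in the result.
TOKEN = re.compile(r"(\{[A-Za-z0-9-]*\})")

def parse_replacement(replacement):
    """Split a replacement string into literal and '{variable}' tokens."""
    return [token for token in TOKEN.split(replacement) if token != ""]

def assemble(replacement, captures):
    """Plug capture group values into the variables of a replacement string."""
    pieces = []
    for token in parse_replacement(replacement):
        if TOKEN.fullmatch(token):
            name = token[1:-1]
            # Assumption: a variable with no captured value is left as-is.
            pieces.append(captures.get(name, token))
        else:
            pieces.append(token)       # literals are copied through unchanged
    return "".join(pieces)

print(parse_replacement("/swimming/kids/{size}-{color}.png"))
print(assemble("/swimming/kids/{size}-{color}.png", {"size": "xxl", "color": "blue"}))
```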

NOTE: The masked request patterns just described should not be confused with the path-patterns described and used in other modules. Those patterns use simple wildcards and globbing rules that are unrelated to this module.

Redirects

In addition to the above, the resource mask module can be used for HTTP redirects. The replacement algorithm is the same, but instead of processing the request directly, the server places the masked replacement path in a Location header and returns a 302 status code, instructing the browser to request that location. The redirect may target the same host as the original request, or an entirely different scheme, authority, and port.

Order of precedence

Resource mask processing occurs early in the request-response cycle, before the incoming request path is split into a resourcePath, queryString, and parameterMap. Because of this, any configuration that uses path-patterns should specify those patterns based on the masked replacement path resulting from this module, rather than from the original resource path specified by the user.
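
The ordering can be pictured with a small Python sketch that splits an already masked path. The function name split_masked_path is hypothetical, and the dictionary-of-lists returned by the standard library is only one plausible shape for a parameterMap; the server's internal representation is not described here.

```python
from urllib.parse import urlsplit, parse_qs

def split_masked_path(masked_path):
    """Split a masked replacement path into (resourcePath, queryString, parameterMap)."""
    parts = urlsplit(masked_path)
    resource_path = parts.path
    query_string = parts.query
    parameter_map = parse_qs(query_string)   # each name maps to a list of values
    return resource_path, query_string, parameter_map

# The split sees the masked path, never the original '/convert-blue/...' request.
print(split_masked_path("/deeply/nested/document?option.emit=xml"))
```

Any later configuration that inspects the path therefore sees /deeply/nested/document, which is why path-patterns elsewhere should be written against the masked form.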

Router versus resource masks

The process of interpreting the incoming URL as a set of instructions is sometimes referred to as routing in software development frameworks; in these notes, however, the term resource masks is used for this topic. This server uses the term Router for a different purpose.

Configuration

The server's resource-masks section is used to configure the module. It comprises a collection of entries, where each entry has, at a minimum, a pair of *pattern and *replacement attributes, both of which are made of literals and capture group variables.

Typically, a named variable will appear in both the request-pattern attribute and the replacement-string attribute, but this is not a requirement. In the examples above, the variable {seo} appears only in the request pattern, where it captures and discards a portion of the path that is not needed.

On the other hand, if a variable were to appear only in the replacement-string attribute, there would be no captured string to substitute for it. This type of misconfiguration can also occur when the webmaster spells the variable name inconsistently. In these cases, no replacement occurs and the variable name itself is passed through to the final masked replacement path.
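
A webmaster could catch this kind of misconfiguration before deployment with a small check like the one below. It is a hypothetical helper written for these notes, not a feature of the server; the name unbound_variables is an assumption.

```python
import re

VARIABLE = re.compile(r"\{([A-Za-z0-9-]*)\}")

def unbound_variables(pattern, replacement):
    """List variables used in the replacement that the pattern never captures."""
    captured = set(VARIABLE.findall(pattern))
    return [name for name in VARIABLE.findall(replacement) if name not in captured]

# 'colour' is spelled differently from the captured 'color', so it is reported.
print(unbound_variables("/beachballs/size/{size}/color/{color}",
                        "/swimming/kids/{size}-{colour}.png"))
```

Variables that appear only in the pattern, such as {seo}, are intentionally not reported, since capturing and discarding is a legitimate use.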

Each entry may also specify one or more of these "redirect" attributes:

  • A *scheme attribute which should have a value of either 'http' or 'https'.
  • An *authority attribute which should contain a DNS hostname.
  • A *port attribute which should contain a port number such as '80', '443', '8443', etc.

The presence of any of these three optional attributes always triggers a status code 302 with a location redirect. Named variables must not appear in these optional attributes.
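
The redirect decision can be sketched as follows. The function redirect_location and its parameters are hypothetical, and the fallback to the original request's scheme and host when an attribute is omitted is an assumption made for this sketch; the notes above do not state what the server does in that case.

```python
def redirect_location(masked_path, request_scheme, request_host,
                      scheme=None, authority=None, port=None):
    """Return (302, ('Location', url)) for a redirect entry, or None otherwise."""
    if scheme is None and authority is None and port is None:
        return None                      # no redirect attribute: rewrite only
    # Assumption: omitted attributes fall back to the original request's values.
    target = f"{scheme or request_scheme}://{authority or request_host}"
    if port is not None:
        target += f":{port}"
    return 302, ("Location", target + masked_path)

print(redirect_location("/index.html", "https", "example.com",
                        scheme="https", authority="example.com", port="6443"))
```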

Enabling the module

The resource-masks module must be on for any of these configured entries to be effective.

Placement

The resource-masks configuration section is subordinate to the request section. If values are placed in the host/request section they will be used in their entirety; if not, values in the server/request section will be used as a fallback.

EBNF

SP ::= U+20
CR ::= U+0D
ASTERISK ::= U+2A
HYPHEN ::= U+2D
APOSTROPHE ::= U+27
SOLIDUS ::= U+2F
COLON ::= U+3A
EQUALS-SIGN ::= U+3D
QUESTION-MARK ::= U+3F
LEFT-CURLY-BRACKET ::= U+7B
RIGHT-CURLY-BRACKET ::= U+7D
literal ::= (unrestricted characters)*
variable-name ::= (ALPHA | DIGIT | HYPHEN)*
capture-group-variable ::= LEFT-CURLY-BRACKET variable-name RIGHT-CURLY-BRACKET
mask-pattern ::= (literal | capture-group-variable)*
delimited-pattern ::= APOSTROPHE mask-pattern APOSTROPHE
pattern-attr ::= ASTERISK 'pattern' EQUALS-SIGN delimited-pattern
replacement-attr ::= ASTERISK 'replacement' EQUALS-SIGN delimited-pattern
scheme-attr ::= ASTERISK 'scheme' EQUALS-SIGN ('http' | 'https')
authority-attr ::= ASTERISK 'authority' EQUALS-SIGN hostname (COLON port)?
port-attr ::= ASTERISK 'port' EQUALS-SIGN port
resource-mask-entry ::= pattern-attr SP replacement-attr (SP scheme-attr)? (SP authority-attr)? (SP port-attr)? CR
resource-masks-section ::= 'resource-masks' SP LEFT-CURLY-BRACKET CR
resource-mask-entry*
RIGHT-CURLY-BRACKET CR
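
To make the entry syntax concrete, here is a rough Python sketch that reads one entry line into a dictionary. It is not the server's parser: the name parse_entry is hypothetical, and the sketch simply assumes that pattern and replacement values are apostrophe-delimited while the redirect attributes are bare words, as in the cookbook examples.

```python
import re

# '*name=' followed by either an apostrophe-delimited value or a bare word.
ATTRIBUTE = re.compile(r"\*([a-z]+)=(?:'([^']*)'|(\S+))")

def parse_entry(line):
    """Parse one resource-mask entry line into a {attribute: value} dict."""
    entry = {}
    for name, quoted, bare in ATTRIBUTE.findall(line):
        entry[name] = quoted or bare     # exactly one of the two groups is filled
    return entry

line = ("*pattern='/articles/{year}-{month}/{document}' "
        "*replacement='/archives/{year}/{document}' "
        "*scheme=https *authority=archives.example.com")
print(parse_entry(line))
```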

Cookbook

Example 1: Enabling the resource mask module

host {
    modules {
        resource-masks on
    }
}

Example 2: Rewriting the resource path

Configuration file:

host {
    request {
        resource-masks {
            *pattern='/beachballs/size/{size}/color/{color}' *replacement='/swimming/kids/{size}-{color}.png'
        }
    }
}

Example usage:

Request => https://example.com/beachballs/size/xxl/color/blue

Raw path => /beachballs/size/xxl/color/blue
Masked replacement path => /swimming/kids/xxl-blue.png

Example 3: Rewriting the query-strings

Configuration file:

host {
    request {
        resource-masks {
            *pattern='/convert-blue/{name}?output={lang}' *replacement='/{name}?option.emit={lang}'
        }
    }
}

Example usage:

Request => https://example.com/convert-blue/deeply/nested/document?output=xml

Raw path => /convert-blue/deeply/nested/document?output=xml
Masked replacement path => /deeply/nested/document?option.emit=xml
Resource path => /deeply/nested/document
Query string => option.emit=xml

Example 4: Redirecting selected pages to a new location

Configuration file:

host {
    request {
        resource-masks {
            *pattern='/articles/{year}-{month}/{document}' *replacement='/archives/{year}/{document}' *scheme=https *authority=archives.example.com
        }
    }
}

Example usage:

Request => https://example.com/articles/1995-Sep/america-online.html

Status code => 302
Response => location: https://archives.example.com/archives/1995/america-online.html

Example 5: Redirecting the entire website from 443 to 6443

Configuration file:

server {
    ip-address 10.20.30.40
    port 443
    ...
}
host {
    hostname example.com
    ...
    request {
        resource-masks {
            *pattern='{}' *replacement='{}' *scheme=https *authority=example.com *port=6443
        }
    }
}

Example usage:

Request => https://example.com/index.html

Status code => 302
Response => location: https://example.com:6443/index.html

Review

Key points to remember:

  • Resource masks make use of named capture groups that are embedded into request patterns and replacement strings.
  • Resource mask processing occurs before the request path is split into resourcePath, queryString, and parameterMap.
  • Configuration entries that specify a scheme, authority, or port return a Location header with status code 302.
