Rewriting masked resources using replacement patterns
Resource Masks
Preliminaries
This note describes a way to rewrite resource request paths. Strings containing capture groups are used as request patterns that match incoming paths to rewrite rules. Replacement strings are used to specify how to reassemble the capture group values into masked replacement paths.
The URL transmitted from the browser to the server is used to uniquely identify a resource. When serving static files, this URL maps one-to-one with the server's file system, but when serving dynamic resources, the URL is instead treated as an instruction for how to obtain the data needed to fulfill the request. Oftentimes the path portion of the URL maps to generic instructions, and the query-string portion maps to the specific instructions — together they provide enough information to assemble the request.
When configuring the server, resource masks are specified by the webmaster as a way to rewrite the instructions in a dynamic resource path into a form suitable for fulfilling the request.
In addition to their utility in dynamic resource requests, resource masks can also be used to:
- map SEO-friendly URLs into static filesystem paths; and
- redirect requests to a different location.
The resource mask algorithm operates on pairs of request patterns and replacement strings. Both of these are specified using the same syntax.
Request patterns and replacement strings
Resource masks make use of named capture groups that are embedded into request patterns and replacement strings. A named capture group syntactically consists of a variable name enclosed in curly braces. Here are some request pattern examples:
/beachballs/size/{size}/color/{color}
/article/{id}/{seo}
/convert-blue/{name}?output={lang}
{path}.html
The resource mask module parses strings such as these into a sequence of literals and variables. When matching an incoming request's path to one of these strings, a GREP-like search is conducted whereby the literal portions of the string must match exactly, while the variable portions of the string can match any sequence of one or more characters. (If you know GREP, then think of each variable as behaving much like the sequence .* except that it must match at least one character.) Here are example paths that would match the previous request patterns:
/beachballs/size/xxl/color/blue
/article/42/the-answer-to-everything
/convert-blue/deeply/nested/document?output=xml
/1972/06/17/watergate.html
The literal values in these strings are used during the pattern matching process, but are discarded once matching is complete. The variables are saved as capture groups. These are the internal results for the above examples:
[size:'xxl', color:'blue']
[id:'42', seo:'the-answer-to-everything']
[name:'deeply/nested/document', lang:'xml']
[path:'/1972/06/17/watergate']
Continuing the example, these request patterns could be paired with these replacement strings:
/swimming/kids/{size}-{color}.png
/hitchhikers-guide-to-the-galaxy/{id}.html
/{name}?option.emit={lang}
{path}.blue
Again, the resource mask module would parse these strings into a sequence of literals and variables, but this time both the literal values and the named variables are kept internally for use in the final masked replacement path:
['/swimming/kids/', '{size}', '-', '{color}', '.png']
['/hitchhikers-guide-to-the-galaxy/', '{id}', '.html']
['/', '{name}', '?option.emit=', '{lang}']
['{path}', '.blue']
The final masked replacement path is assembled from the two internal structures: capture group values obtained from the request pattern are plugged into the named variables parsed from the replacement string; the whole is assembled from the combination of replaced values and literals. The examples result in these final masked replacement paths:
/swimming/kids/xxl-blue.png
/hitchhikers-guide-to-the-galaxy/42.html
/deeply/nested/document?option.emit=xml
/1972/06/17/watergate.blue
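The matching and reassembly steps above can be sketched in a few lines of Python. This is only an illustration, not the module's actual implementation: the function names `parse`, `match`, and `rewrite` are hypothetical, variables are assumed to match greedily, and a variable with no captured value is assumed to be left in place with its braces.

```python
import re

# A capture group variable: a name (letters, digits, hyphens) in curly braces.
TOKEN = re.compile(r"\{([A-Za-z0-9-]*)\}")

def parse(mask):
    """Split a mask string into ('lit', text) and ('var', name) tokens."""
    tokens, pos = [], 0
    for m in TOKEN.finditer(mask):
        if m.start() > pos:
            tokens.append(("lit", mask[pos:m.start()]))
        tokens.append(("var", m.group(1)))
        pos = m.end()
    if pos < len(mask):
        tokens.append(("lit", mask[pos:]))
    return tokens

def match(pattern, path):
    """Return captured values keyed by variable name, or None on no match."""
    tokens = parse(pattern)
    names = [value for kind, value in tokens if kind == "var"]
    # Literals must match exactly; each variable matches one or more
    # characters (greedy matching is an assumption of this sketch).
    regex = "".join(re.escape(v) if k == "lit" else "(.+)" for k, v in tokens)
    m = re.fullmatch(regex, path)
    return dict(zip(names, m.groups())) if m else None

def rewrite(pattern, replacement, path):
    """Build the masked replacement path, or return None if nothing matches."""
    captures = match(pattern, path)
    if captures is None:
        return None
    parts = []
    for kind, value in parse(replacement):
        if kind == "lit":
            parts.append(value)
        else:
            # Assumption: an uncaptured variable passes through unchanged.
            parts.append(captures.get(value, "{" + value + "}"))
    return "".join(parts)

print(rewrite("/convert-blue/{name}?output={lang}",
              "/{name}?option.emit={lang}",
              "/convert-blue/deeply/nested/document?output=xml"))
# prints /deeply/nested/document?option.emit=xml
```

Positional regex groups are paired with a separate list of names because variable names may contain hyphens, which Python's named-group syntax does not allow.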
NOTE: The masked request patterns just described should not be confused with the path-patterns described and used in other modules. Those patterns use simple wildcards and globbing rules that are unrelated to this module.
Redirects
In addition to the above, the resource mask module can be used for HTTP redirects. The replacement algorithm is the same, but instead of processing the request directly, the masked replacement path is put in a location header, and a 302 response code is returned, instructing the browser to go to that web page. The redirection can be to the same host as the original, or it can be to an entirely different scheme, authority and port.
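As a rough illustration of the redirect step, the sketch below assembles a location header from a masked replacement path. The function name `build_redirect` and its parameters are hypothetical, and the fallback to the original request's scheme and host when an attribute is omitted is an assumption rather than documented behavior.

```python
def build_redirect(masked_path, request_scheme, request_host,
                   scheme=None, authority=None, port=None):
    """Return (status code, headers) for a redirect to the masked path."""
    # Assumption: attributes that are not configured fall back to the
    # values of the original request.
    target_scheme = scheme or request_scheme
    target_host = authority or request_host
    netloc = f"{target_host}:{port}" if port else target_host
    location = f"{target_scheme}://{netloc}{masked_path}"
    return 302, {"location": location}

status, headers = build_redirect("/index.html", "https", "example.com",
                                 scheme="https", authority="example.com",
                                 port="6443")
print(status, headers["location"])
# prints 302 https://example.com:6443/index.html
```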
Order of precedence
Resource mask processing occurs early in the request-response cycle, before the incoming request path is split into a resourcePath, queryString, and parameterMap. Because of this, any configuration that uses path-patterns should specify those patterns based on the masked replacement path resulting from this module, rather than from the original resource path specified by the user.
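To make the ordering concrete, the following sketch shows the kind of split that happens after masking. It is not the server's code: the function name `split_masked_path` is hypothetical, and representing the parameterMap as a dictionary of value lists is an assumption.

```python
from urllib.parse import parse_qs

def split_masked_path(masked_path):
    """Split a masked replacement path into its three downstream parts."""
    # Everything before the first '?' is the resource path; the rest is
    # the query string, which is then decoded into a parameter map.
    resource_path, _, query_string = masked_path.partition("?")
    parameter_map = parse_qs(query_string, keep_blank_values=True)
    return resource_path, query_string, parameter_map

print(split_masked_path("/deeply/nested/document?option.emit=xml"))
# prints ('/deeply/nested/document', 'option.emit=xml', {'option.emit': ['xml']})
```

Because the split operates on the masked path, a later path-pattern would see /deeply/nested/document here, not the /convert-blue/... path that the browser originally sent.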
Router versus resource masks
The process of interpreting the incoming URL as a set of instructions is sometimes referred to, in software development frameworks, as routing; however in these notes resource masks is the term applied to the current topic. This server uses the term Router for a different purpose.
Configuration
The server's resource-masks section is used to configure the module. It comprises a collection of entries, where each entry has, at a minimum, a pair of *pattern and *replacement attributes, both of which are made of literals and capture group variables.
Typically, a named variable will appear in both the *pattern attribute and the *replacement attribute, but this is not a requirement. In the above examples the variable {seo} is used only in the request pattern, to capture and discard a portion of the path that is not needed.
On the other hand, if a variable were to appear only in the replacement-string attribute, there would be no captured string to substitute for it. This type of misconfiguration can also occur when the webmaster spells the variable name inconsistently. In these cases, no replacement occurs and the variable name itself is passed through to the final masked replacement path.
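A configuration check along these lines can catch such mistakes before deployment. The helper below is a hypothetical sketch, not part of the server; the name `unbound_variables` is an assumption, and variable names are taken to be letters, digits, and hyphens.

```python
import re

# A capture group variable: a name (letters, digits, hyphens) in curly braces.
VARIABLE = re.compile(r"\{([A-Za-z0-9-]*)\}")

def unbound_variables(pattern, replacement):
    """List replacement variables that the request pattern never captures."""
    captured = set(VARIABLE.findall(pattern))
    return [name for name in VARIABLE.findall(replacement)
            if name not in captured]

# 'colour' is spelled inconsistently, so nothing would be substituted for it.
print(unbound_variables("/beachballs/size/{size}/color/{color}",
                        "/swimming/kids/{size}-{colour}.png"))
# prints ['colour']
```

A variable that appears only in the pattern, such as {seo}, is not reported, since capturing and discarding is a legitimate use.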
Each entry may also specify one or more of these "redirect" attributes:
- A *scheme attribute, which should have a value of either 'http' or 'https'.
- An *authority attribute, which should contain a DNS hostname.
- A *port attribute, which should contain a port number such as '80', '443', '8443', etc.
The presence of any of these three optional attributes always triggers a status code 302 with a location redirect. Named variables must not appear in these optional attributes.
Enabling the module
The resource-masks module must be on for any of these configured entries to be effective.
Placement
The resource-masks configuration section is subordinate to the request section. If values are placed in the host/request section they will be used in their entirety; if not, values in the server/request section will be used as a fallback.
EBNF
SP | ::= | U+20 |
CR | ::= | U+0D |
ASTERISK | ::= | U+2A |
HYPHEN | ::= | U+2D |
APOSTROPHE | ::= | U+27 |
SOLIDUS | ::= | U+2F |
COLON | ::= | U+3A |
EQUALS-SIGN | ::= | U+3D |
QUESTION-MARK | ::= | U+3F |
LEFT-CURLY-BRACKET | ::= | U+7B |
RIGHT-CURLY-BRACKET | ::= | U+7D |
literal | ::= | (unrestricted characters)* |
variable-name | ::= | (ALPHA | DIGIT | HYPHEN)* |
capture-group-variable | ::= | LEFT-CURLY-BRACKET variable-name RIGHT-CURLY-BRACKET |
mask-pattern | ::= | literal* | capture-group-variable* |
delimited-pattern | ::= | APOSTROPHE mask-pattern APOSTROPHE |
pattern-attr | ::= | ASTERISK 'pattern' EQUALS-SIGN delimited-pattern |
replacement-attr | ::= | ASTERISK 'replacement' EQUALS-SIGN delimited-pattern |
scheme-attr | ::= | ASTERISK 'scheme' EQUALS-SIGN ('http' | 'https') |
authority-attr | ::= | ASTERISK 'authority' EQUALS-SIGN hostname (COLON port)* |
resource-mask-entry | ::= | pattern-attr SP replacement-attr SP scheme-attr* SP authority-attr* CR |
resource-masks-section | ::= | 'resource-masks' SP LEFT-CURLY-BRACKET CR resource-mask-entry* RIGHT-CURLY-BRACKET CR |
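For readers who want to experiment with the entry syntax, here is a minimal sketch that reads one resource-mask-entry line into a dictionary. It is a loose approximation of the grammar, not the server's parser: the name `parse_entry` is hypothetical, and accepting attribute values both with and without apostrophes is an assumption based on the cookbook examples.

```python
import re

# An attribute: asterisk, name, equals sign, then either an
# apostrophe-delimited value or a bare value that runs to the next space.
ATTRIBUTE = re.compile(r"\*([a-z]+)=(?:'([^']*)'|(\S+))")

def parse_entry(line):
    """Parse one entry line into a dict of attribute names to values."""
    entry = {}
    for m in ATTRIBUTE.finditer(line):
        name, quoted, bare = m.groups()
        entry[name] = quoted if quoted is not None else bare
    # Every entry needs at least a pattern and a replacement.
    for required in ("pattern", "replacement"):
        if required not in entry:
            raise ValueError(f"missing *{required} attribute")
    return entry

line = ("*pattern='/articles/{year}-{month}/{document}' "
        "*replacement='/archives/{year}/{document}' "
        "*scheme=https *authority=archives.example.com")
print(parse_entry(line)["authority"])
# prints archives.example.com
```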
Cookbook
Example 1: Enabling the resource mask module
host {
    modules {
        resource-masks on
    }
}
Example 2: Rewriting the resource path
Configuration file:
host {
    request {
        resource-masks {
            *pattern='/beachballs/size/{size}/color/{color}' *replacement='/swimming/kids/{size}-{color}.png'
        }
    }
}
Example usage:
Request => https://example.com/beachballs/size/xxl/color/blue
Raw path => /beachballs/size/xxl/color/blue
Masked replacement path => /swimming/kids/xxl-blue.png
Example 3: Rewriting the query-strings
Configuration file:
host {
    request {
        resource-masks {
            *pattern='/convert-blue/{name}?output={lang}' *replacement='/{name}?option.emit={lang}'
        }
    }
}
Example usage:
Request => https://example.com/convert-blue/deeply/nested/document?output=xml
Raw path => /convert-blue/deeply/nested/document?output=xml
Masked replacement path => /deeply/nested/document?option.emit=xml
Resource path => /deeply/nested/document
Query string => option.emit=xml
Example 4: Redirecting selected pages to a new location
Configuration file:
host {
    request {
        resource-masks {
            *pattern='/articles/{year}-{month}/{document}' *replacement='/archives/{year}/{document}' *scheme=https *authority=archives.example.com
        }
    }
}
Example usage:
Request => https://example.com/articles/1995-Sep/america-online.html
Status code => 302
Response => location: https://archives.example.com/archives/1995/america-online.html
Example 5: Redirecting the entire website from 443 to 6443
Configuration file:
server {
    ip-address 10.20.30.40
    port 443
    ...
}
host {
    hostname example.com
    ...
    request {
        resource-masks {
            *pattern='{}' *replacement='{}' *scheme=https *authority=example.com *port=6443
        }
    }
}
Example usage:
Request => https://example.com/index.html
Status code => 302
Response => location: https://example.com:6443/index.html
Review
Key points to remember:
- Resource masks make use of named capture groups that are embedded into request patterns and replacement strings.
- Resource mask processing occurs before the request path is split into resourcePath, queryString, and parameterMap.
- Configuration rules that specify a scheme, authority, or port return a location header with status code 302.