Using MIME types to control the behavior associated with filename extensions
MIME Types
Preliminaries
This note provides information about MIME-type configuration: associating filename extensions; declaring character sets; declaring encoding algorithms; and declaring content negotiation.
The content of files varies greatly, and all parties need to be able to communicate precisely which rules to follow when reading and writing files. For a long time, software used standard naming conventions — based primarily on a filename's trailing extension — to communicate what rules to use to read and interpret a file's contents. This proved to be ambiguous and untenable, and the software industry has moved to the use of content-types to convey this information.
Content-types are also known as MIME types. They are described in IETF RFC 6838 Media Type Specifications and Registration Procedures. Each content type is composed of a media type and a subtype. Media types include: text, application, image, audio, video and multipart. Subtypes include values like: html, javascript, jpeg, mpeg, mp4 and so forth.
Mapping filename extensions to content-types is a required part of server configuration. When configuring the server, the content-types section is used to associate filename extensions with content-type HTTP headers.
In addition to filename extension mapping, the server uses MIME-types to configure three other rules: character sets, compression algorithms, and content negotiation.
Text media use various character sets to represent characters (the letters of the alphabet, numbers, punctuation, etc.). Communicating that information to all parties is essential for a correct reading of a text document, and is done using the charset parameter of the content-type header. For more about character set identifiers refer to IETF RFC 2978 IANA Charset Registration Procedures.
When a file's contents are compressed during transmission, the sender must know how to compress the outgoing bytes, and the receiver must know how to decompress the incoming bytes. The server is configured to handle this by associating content-encodings with MIME-types.
When a browser requests a file with a GET method, the server negotiates with the browser to determine which MIME-types are acceptable. The accept-types section is used to configure that; see the separate note about how that occurs. When a client uploads a file with a PUT method, the server can selectively decide what types of files are acceptable. For example, if a server is ready to handle TIFF images, it would signal that with an image/tiff entry in the accept-types section.
Configuration
The content-types configuration section is used to associate filename extensions with content-types. It comprises a collection of two-part entries: the left-hand side is the filename extension (without a leading dot), and the right-hand side is the MIME-type. Filename extensions are case-sensitive. When a requested file is found on the server, but no content-type is associated with its extension, no content-type header is sent with the response. This is consistent with HTTP guidance: RFC 9110 lets a sender omit the header when the media type is unknown.
The charset configuration section is used to declare character set identifiers for text and application media types. It comprises a collection of two-part entries: the left-hand side is the MIME-type, and the right-hand side is the character set identifier. When a requested file is served with a content-type header, any declared charset identifier is appended to that header. When configured in this fashion, there is no need to add an HTML meta tag or CSS @charset declaration to each file's inner contents. On the other hand, when no charset identifier is configured for a given MIME-type, the response headers omit the charset parameter completely.
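As a rough sketch of that rule, the following Python builds a content-type header value from a MIME-type. The CHARSETS dictionary and content_type_header function are hypothetical, and the "; charset=" formatting follows the usual HTTP parameter syntax rather than anything confirmed about this server's output.

```python
# Hypothetical in-memory form of a charset section:
# MIME-type -> character set identifier.
CHARSETS = {
    "text/html": "utf-8",
    "text/plain": "windows-1252",
}


def content_type_header(mime_type: str) -> str:
    """Append the declared charset, if any, to the content-type value."""
    charset = CHARSETS.get(mime_type)
    if charset is None:
        # Nothing declared for this MIME-type: omit the charset entirely.
        return mime_type
    return f"{mime_type}; charset={charset}"
```

With these entries, text/html yields "text/html; charset=utf-8", while image/png is returned unchanged.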
The content-encoding configuration section is used to define which compression algorithm to use based on media types. It comprises a collection of two-part entries: the left-hand side is the MIME-type, and the right-hand side is the compression algorithm.
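The following Python sketch shows one way such a mapping could drive compression. It makes several assumptions: CONTENT_ENCODING and encode_body are hypothetical names, a MIME-type with no entry is treated like none, and the check against the client's accept-encoding request header that a real server would perform is left out.

```python
import gzip
import zlib
from typing import Optional, Tuple

# Hypothetical in-memory form of a content-encoding section:
# MIME-type -> compression algorithm.
CONTENT_ENCODING = {
    "text/html": "gzip",
    "application/xml": "deflate",
    "image/png": "none",
}


def encode_body(mime_type: str, body: bytes) -> Tuple[Optional[str], bytes]:
    """Return (content-encoding value or None, possibly compressed body)."""
    # Assumption: a MIME-type with no entry is sent uncompressed.
    algorithm = CONTENT_ENCODING.get(mime_type, "none")
    if algorithm == "gzip":
        return "gzip", gzip.compress(body)
    if algorithm == "deflate":
        # HTTP "deflate" is the zlib container format, which zlib.compress emits.
        return "deflate", zlib.compress(body)
    return None, body
```

Formats that are already compressed, such as PNG or JPEG images, gain little from a second pass, which is why such types are typically mapped to none.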
The accept-types configuration section is used to declare which types can be negotiated. It comprises a collection of two-part entries: the left-hand side is the literal string mime-type, and the right-hand side is a MIME-type that adheres to the IETF RFC 6838 specifications.
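To illustrate how a declared list could take part in negotiation, here is a simplified Python sketch. The server's actual algorithm is covered in a separate note and may differ; ACCEPT_TYPES, matches, and is_negotiable are hypothetical names, and quality values (q=) in the accept header are ignored.

```python
# Hypothetical in-memory form of an accept-types section.
ACCEPT_TYPES = ["text/html", "application/json", "image/png"]


def matches(media_range: str, mime_type: str) -> bool:
    """True when a media range such as image/* or */* covers mime_type."""
    if media_range == "*/*":
        return True
    range_top, _, range_sub = media_range.partition("/")
    mime_top, _, mime_sub = mime_type.partition("/")
    if range_top != mime_top:
        return False
    return range_sub == "*" or range_sub == mime_sub


def is_negotiable(accept_header: str, mime_type: str) -> bool:
    """True when mime_type is declared AND covered by the accept header."""
    mime_type = mime_type.lower()
    if mime_type not in ACCEPT_TYPES:
        return False
    # Drop parameters such as ";q=0.9" and keep only the media ranges.
    ranges = [part.split(";", 1)[0].strip().lower()
              for part in accept_header.split(",")]
    return any(matches(r, mime_type) for r in ranges if r)
```

In this sketch a type must pass both checks: image/tiff is rejected even for a client that sends */*, because it has no entry in the declared list.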
Placement
The content-types, charset, content-encoding, and accept-types configuration sections may appear in either the server section or a host section. When values occur in both the server and host sections, they are merged according to the standard rules defined for the merge attribute.
EBNF
SP                       ::= U+20
CR                       ::= U+0D
SOLIDUS                  ::= U+2F
ASTERISK                 ::= U+2A
LEFT-CURLY-BRACKET       ::= U+7B
RIGHT-CURLY-BRACKET      ::= U+7D
media-type               ::= 'text' | 'application' | 'image' | 'audio' | 'video' | 'multipart'
subtype                  ::= (ALPHA | DIGIT | †)*
MIME-type                ::= media-type SOLIDUS subtype
filename-extension       ::= (ALPHA | DIGIT | ††)*
content-type-entry       ::= filename-extension SP MIME-type CR
content-types-section    ::= 'content-types' SP LEFT-CURLY-BRACKET CR content-type-entry* RIGHT-CURLY-BRACKET CR
charset-identifier       ::= (ALPHA | DIGIT | †††)*
charset-entry            ::= MIME-type SP charset-identifier CR
charset-section          ::= 'charset' SP LEFT-CURLY-BRACKET CR charset-entry* RIGHT-CURLY-BRACKET CR
compression-algorithm    ::= 'gzip' | 'deflate' | 'none'
content-encoding-entry   ::= MIME-type SP compression-algorithm CR
content-encoding-section ::= 'content-encoding' SP LEFT-CURLY-BRACKET CR content-encoding-entry* RIGHT-CURLY-BRACKET CR
accept-type-entry        ::= 'mime-type' SP* MIME-type CR
accept-types-section     ::= 'accept-types' SP LEFT-CURLY-BRACKET CR accept-type-entry* RIGHT-CURLY-BRACKET CR
† See section 4.2 of RFC 6838 for exact rules
†† Legal file system characters vary by platform
††† See IETF RFC 2978 for guidance
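As an informal companion to the grammar, the Python sketch below reads a content-types section into a dictionary. It is not the server's parser: parse_content_types and MEDIA_TYPES are hypothetical names, and the sketch is deliberately more lenient than the grammar about whitespace and line endings. It checks only the media-type portion of each MIME-type, leaving subtype validation (RFC 6838 section 4.2) aside.

```python
from typing import Dict

# The six media types named by the media-type production.
MEDIA_TYPES = {"text", "application", "image", "audio", "video", "multipart"}


def parse_content_types(text: str) -> Dict[str, str]:
    """Parse a 'content-types { ... }' block into {extension: MIME-type}."""
    # Leniency assumption: blank lines and surrounding whitespace are ignored.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) < 2 or lines[0] != "content-types {" or lines[-1] != "}":
        raise ValueError("expected 'content-types {' ... '}'")
    entries: Dict[str, str] = {}
    for line in lines[1:-1]:
        parts = line.split()
        if len(parts) != 2:
            raise ValueError(f"expected 'extension MIME-type', got: {line!r}")
        extension, mime_type = parts
        media_type, slash, subtype = mime_type.partition("/")
        if slash != "/" or media_type not in MEDIA_TYPES or not subtype:
            raise ValueError(f"not a valid MIME-type: {mime_type!r}")
        entries[extension] = mime_type
    return entries
```

The same shape would apply to the charset and content-encoding sections, with the MIME-type moving to the left-hand side of each entry.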
Cookbook
Example 1: Filename extensions associated with MIME-types
server {
    content-types {
        css text/css
        csv text/csv
        html text/html
        txt text/plain
        blue text/blue
        md text/markdown
        wiki text/x-mediawiki
        haml text/x-haml
        yaml text/vnd.yaml
        toml text/toml
        ini text/plain
        js application/javascript
        json application/json
        pdf application/pdf
        xhtml application/xhtml+xml
        xml application/xml
        plist application/xml
        gif image/gif
        jpg image/jpeg
        png image/png
        svg image/svg+xml
        webp image/webp
        ico image/x-icon
        mp3 audio/mpeg
        mp4 video/mp4
        weba audio/webm
        webm video/webm
    }
}
Example 2: Character set identifiers associated with MIME-types
server {
    response {
        charset {
            text/css utf-8
            text/html utf-8
            text/plain windows-1252
            application/xhtml+xml iso-8859-1
        }
    }
}
Example 3: content-encoding algorithms associated with MIME-types
server {
    content-encoding {
        text/css gzip
        text/csv gzip
        text/html gzip
        text/plain gzip
        text/yaml gzip
        application/javascript gzip
        application/json gzip
        application/pdf none
        application/xhtml+xml gzip
        application/xml deflate
        application/gzip none
        image/gif none
        image/jpeg none
        image/png none
        image/svg+xml gzip
        image/webp none
        image/x-icon none
        audio/mpeg none
        video/mp4 none
        audio/webm none
        video/webm none
    }
}
Example 4: MIME-types declared for browser accept-types negotiation
server {
    accept-types {
        mime-type text/html
        mime-type text/plain
        mime-type text/css
        mime-type application/xhtml+xml
        mime-type application/xml
        mime-type application/json
        mime-type application/javascript
        mime-type application/pdf
        mime-type image/png
        mime-type image/jpeg
        mime-type image/gif
        mime-type image/svg+xml
        mime-type image/webp
        mime-type audio/webm
        mime-type audio/ogg
        mime-type audio/mpeg
        mime-type audio/wav
        mime-type audio/flac
        mime-type video/webm
        mime-type video/ogg
        mime-type video/mp4
    }
}
Review
Key points to remember:
- The content-types section associates filename extensions with MIME-types.
- The charset section associates MIME-types with character set identifiers.
- The content-encoding section associates MIME-types with compression algorithms.
- The accept-types section lists MIME-types that may be served through negotiation with the browser.