Using MIME types to control the behavior associated with filename extensions

MIME Types

Preliminaries

This note provides information about MIME-type configuration: associating filename extensions; declaring character sets; declaring encoding algorithms; and declaring content negotiation.

The content of files varies greatly, and all parties need to be able to communicate precisely which rules to follow when reading and writing files. For a long time, software used standard naming conventions — based primarily on a filename's trailing extension — to communicate what rules to use to read and interpret a file's contents. This proved to be ambiguous and untenable, and the software industry has moved to the use of content-types to convey this information.

Content-types are also known as MIME types. They are described in IETF RFC 6838 Media Type Specifications and Registration Procedures. Each content type is composed of a media type and a subtype. Media types include: text, application, image, audio, video and multipart. Subtypes include values like: html, javascript, jpeg, mp3, mp4 and so forth.
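The two-part structure can be sketched in a few lines of Python. This is an illustration only, not the server's own code; the function name split_mime_type is hypothetical. It relies on the RFC 6838 rule that type and subtype names are case-insensitive.

```python
def split_mime_type(mime_type: str) -> tuple[str, str]:
    """Split 'type/subtype' into its two parts (hypothetical helper)."""
    media_type, separator, subtype = mime_type.strip().partition("/")
    if not separator or not media_type or not subtype:
        raise ValueError(f"not a type/subtype pair: {mime_type!r}")
    # Type and subtype names are case-insensitive, so normalize to lowercase.
    return media_type.lower(), subtype.lower()
```

For example, split_mime_type("image/svg+xml") yields the media type image and the subtype svg+xml.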

Mapping filename extensions to content-types is a required part of server configuration. The content-types section of the configuration associates filename extensions with the values sent in content-type HTTP headers.

In addition to filename extension mapping, the server uses MIME-types to configure three other rules: character sets, compression algorithms, and content negotiation.

Text media use various character sets to represent glyphs (the letters of the alphabet, numbers, punctuation, etc.). Communicating that information to all parties is essential for a correct reading of a text document, and is done using the charset attribute of the content-type header. For more about character set identifiers refer to IETF RFC 2978 IANA Charset Registration Procedures.

When a file's contents are compressed during transmission, the sender must know how to compress the outgoing bytes, and the receiver must know how to decompress the incoming bytes. The server is configured to handle this by associating content-encodings with MIME-types.

When a browser requests a file with a GET method, the server negotiates with it to determine which MIME-types are acceptable. The accept-types section configures this; a separate note describes how the negotiation occurs. When a file is uploaded with a PUT method, the server can decide selectively which types of files to accept. For example, a server that is ready to handle TIFF images would signal that with an image/tiff entry in the accept-types section.

Configuration

The content-types configuration section is used to associate filename extensions with content-types. It comprises a collection of two-part entries: the left-hand side is the filename extension (without a leading dot), and the right-hand side is the MIME-type. Filename extensions are case-sensitive. When a requested file is found on the server, but no content-type is associated with its extension, no content-type header is sent with the response, per HTTP official guidelines.
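The lookup described above might be sketched as follows. This is a minimal illustration, not the server's implementation; the CONTENT_TYPES dictionary and the function name content_type_for are assumptions made for the example.

```python
import os
from typing import Optional

# Hypothetical in-memory form of a content-types section.
CONTENT_TYPES = {"html": "text/html", "css": "text/css", "jpg": "image/jpeg"}


def content_type_for(path: str, content_types: dict = CONTENT_TYPES) -> Optional[str]:
    """Return the MIME-type for a path, or None when no header should be sent."""
    extension = os.path.splitext(path)[1][1:]  # drop the leading dot
    # Exact-match lookup: extensions are case-sensitive, so 'JPG' is not 'jpg'.
    return content_types.get(extension)
```

Under these assumptions, /index.html maps to text/html, while /photo.JPG and /README return None, which corresponds to sending no content-type header.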

The charset configuration section is used to declare character set identifiers for text and application media types. It comprises a collection of two-part entries: the left-hand side is the MIME-type, and the right-hand side is the character set identifier. When a requested file is served with a content-type header, any declared charset identifier is appended to that header. With charsets configured this way, there is no need to add an HTML meta tag or CSS @charset declaration inside each file. On the other hand, when no charset identifier is configured for a given MIME-type, the response headers omit the charset attribute completely.
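As a rough sketch of how the header value is assembled, assuming a hypothetical CHARSETS dictionary that mirrors a charset section (the function name is also illustrative):

```python
# Hypothetical in-memory form of a charset section.
CHARSETS = {"text/html": "utf-8", "text/plain": "windows-1252"}


def content_type_header(mime_type: str, charsets: dict = CHARSETS) -> str:
    """Build the content-type header value for a MIME-type."""
    charset = charsets.get(mime_type)
    if charset is None:
        # No charset configured: the attribute is omitted completely.
        return mime_type
    return f"{mime_type}; charset={charset}"
```

With this table, text/html produces "text/html; charset=utf-8", and image/png produces "image/png" with no charset attribute.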

The content-encoding configuration section is used to define which compression algorithm to use for each media type. It comprises a collection of two-part entries: the left-hand side is the MIME-type, and the right-hand side is the compression algorithm.
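The sketch below shows one way a MIME-type could select a compression algorithm, using Python's standard gzip and zlib modules. It is an illustration under stated assumptions: the CONTENT_ENCODINGS dictionary and the encode_body function are hypothetical, unlisted MIME-types are treated as none, and the check against the request's accept-encoding header that a real server would perform is left out.

```python
import gzip
import zlib

# Hypothetical in-memory form of a content-encoding section.
CONTENT_ENCODINGS = {
    "text/html": "gzip",
    "application/xml": "deflate",
    "image/png": "none",
}


def encode_body(mime_type: str, body: bytes, encodings: dict = CONTENT_ENCODINGS):
    """Return (payload, content-encoding value or None) for a response body."""
    algorithm = encodings.get(mime_type, "none")  # assumption: unlisted means none
    if algorithm == "gzip":
        return gzip.compress(body), "gzip"
    if algorithm == "deflate":
        # The HTTP 'deflate' coding is the zlib format, which zlib.compress emits.
        return zlib.compress(body), "deflate"
    return body, None
```

Formats that are already compressed, such as image/png, are passed through unchanged, matching the none entries in the cookbook.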

The accept-types configuration section is used to declare which types can be negotiated. It comprises a collection of two-part entries: the left-hand side is the literal string mime-type, and the right-hand side is a MIME-type that adheres to the IETF RFC 6838 specifications.
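For the upload case, the check against the configured list might look like the sketch below. The negotiation rules themselves are covered in a separate note, so this only illustrates a simple membership test; the ACCEPT_TYPES set and the function name is_acceptable_upload are assumptions.

```python
# Hypothetical in-memory form of an accept-types section.
ACCEPT_TYPES = {"text/html", "image/png", "image/tiff"}


def is_acceptable_upload(content_type_header: str, accept_types: set = ACCEPT_TYPES) -> bool:
    """Report whether an upload's content-type appears in accept-types."""
    # Drop parameters such as '; charset=utf-8' and normalize case.
    mime_type = content_type_header.split(";", 1)[0].strip().lower()
    return mime_type in accept_types
```

With image/tiff in the list, a TIFF upload passes the check, while a video/mp4 upload does not.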

Placement

The content-types, charset, content-encoding, and accept-types configuration sections may appear in either the server section or a host section. When values occur in both the server and host sections, they are merged according to the standard rules defined for the merge attribute.

EBNF

SP ::= U+20
CR ::= U+0D
SOLIDUS ::= U+2F
ASTERISK ::= U+2A
LEFT-CURLY-BRACKET ::= U+7B
RIGHT-CURLY-BRACKET ::= U+7D
media-type ::= 'text' | 'application' | 'image' | 'audio' | 'video' | 'multipart'
subtype ::= (ALPHA | DIGIT | †)*
MIME-type ::= media-type SOLIDUS subtype
filename-extension ::= (ALPHA | DIGIT | ††)*
content-type-entry ::= filename-extension SP MIME-type CR
content-types-section ::= 'content-types' SP LEFT-CURLY-BRACKET CR
content-type-entry*
RIGHT-CURLY-BRACKET CR
charset-identifier ::= (ALPHA | DIGIT | †††)*
charset-entry ::= MIME-type SP charset-identifier CR
charset-section ::= 'charset' SP LEFT-CURLY-BRACKET CR
charset-entry*
RIGHT-CURLY-BRACKET CR
compression-algorithm ::= 'gzip' | 'deflate' | 'none'
content-encoding-entry ::= MIME-type SP compression-algorithm CR
content-encoding-section ::= 'content-encoding' SP LEFT-CURLY-BRACKET CR
content-encoding-entry*
RIGHT-CURLY-BRACKET CR
accept-type-entry ::= 'mime-type' SP* MIME-type CR
accept-types-section ::= 'accept-types' SP LEFT-CURLY-BRACKET CR
accept-type-entry*
RIGHT-CURLY-BRACKET CR

† See section 4.2 of RFC 6838 for exact rules

†† Legal file system characters vary by platform

††† See IETF RFC 2978 for guidance
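All four sections share the same shape: a name, an opening bracket, two-part entries, and a closing bracket. A minimal parsing sketch follows. It is not the server's parser; the function name parse_section is hypothetical, nested sections such as server are not handled, and blank lines and indentation are tolerated as a convenience even though the grammar does not mention them.

```python
def parse_section(text: str) -> tuple[str, list[tuple[str, str]]]:
    """Parse one 'name { ... }' section into its name and two-part entries."""
    # splitlines() accepts CR, LF, and CRLF line endings.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) < 2 or not lines[0].endswith("{") or lines[-1] != "}":
        raise ValueError("expected 'name {' followed by entries and '}'")
    name = lines[0][:-1].strip()
    entries = []
    for line in lines[1:-1]:
        parts = line.split()
        if len(parts) != 2:
            raise ValueError(f"expected a two-part entry: {line!r}")
        entries.append((parts[0], parts[1]))
    return name, entries
```

Entries are returned as a list of pairs rather than a dictionary because every accept-types entry has the same left-hand side, the literal string mime-type.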

Cookbook

Example 1: Filename extensions associated with MIME-types

server {
    content-types {
        css text/css
        csv text/csv
        html text/html
        txt text/plain
        blue text/blue
        md text/markdown
        wiki text/x-mediawiki
        haml text/x-haml
        yaml text/vnd.yaml
        toml text/toml
        ini text/plain

        js application/javascript
        json application/json
        pdf application/pdf
        xhtml application/xhtml+xml
        xml application/xml
        plist application/xml

        gif image/gif
        jpg image/jpeg
        png image/png
        svg image/svg+xml
        webp image/webp
        ico image/x-icon

        mp3 audio/mpeg
        mp4 video/mp4
        weba audio/webm
        webm video/webm
    }
}

Example 2: Character set identifiers associated with MIME-types

server {
    response {
        charset {
            text/css utf-8
            text/html utf-8
            text/plain windows-1252
            application/xhtml+xml iso-8859-1
        }
    }
}

Example 3: content-encoding algorithms associated with MIME-types

server {
    content-encoding {
        text/css gzip
        text/csv gzip
        text/html gzip
        text/plain gzip
        text/yaml gzip

        application/javascript gzip
        application/json gzip
        application/pdf none
        application/xhtml+xml gzip
        application/xml deflate
        application/gzip none

        image/gif none
        image/jpeg none
        image/png none
        image/svg+xml gzip
        image/webp none
        image/x-icon none

        audio/mpeg none
        video/mp4 none
        audio/webm none
        video/webm none
    }
}

Example 4: MIME-types declared for browser accept-types negotiation

server {
    accept-types {
        mime-type text/html
        mime-type text/plain
        mime-type text/css
        mime-type application/xhtml+xml
        mime-type application/xml
        mime-type application/json
        mime-type application/javascript
        mime-type application/pdf
        mime-type image/png
        mime-type image/jpeg
        mime-type image/gif
        mime-type image/svg+xml
        mime-type image/webp
        mime-type audio/webm
        mime-type audio/ogg
        mime-type audio/mpeg
        mime-type audio/wav
        mime-type audio/flac
        mime-type video/webm
        mime-type video/ogg
        mime-type video/mp4
    }
}

Review

Key points to remember:

  • The content-types section associates filename extensions with MIME-types.
  • The charset section associates MIME-types with character set identifiers.
  • The content-encoding section associates MIME-types with compression algorithms.
  • The accept-types section lists MIME-types that may be served through content negotiation with the browser.
