Using MIME types to control the behavior associated with filename extensions

MIME Types

Preliminaries

This note provides information about MIME-type configuration: associating filename extensions with content-types; declaring character sets; declaring encoding algorithms; and declaring the types available for content negotiation.

The content of files varies greatly, and all parties need to be able to communicate precisely which rules to follow when reading and writing files. For a long time, software used standard naming conventions — based primarily on a filename's trailing extension — to communicate what rules to use to read and interpret a file's contents. This proved to be ambiguous and untenable, and the software industry has moved to the use of content-types to convey this information.

Content-types are also known as MIME types. They are described in IETF RFC 6838 Media Type Specifications and Registration Procedures. Each content type is composed of a media type and a subtype, separated by a solidus, as in text/html. Media types include: text, application, image, audio, video and multipart. Subtypes include values like: html, javascript, jpeg, mpeg, mp4 and so forth.
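The two-part structure can be illustrated with a short Python sketch. The function name split_mime_type is hypothetical, and the pattern follows the restricted-name rule from section 4.2 of RFC 6838; the server itself may validate more narrowly, for example against the six media types listed above.

```python
import re

# restricted-name from RFC 6838 section 4.2: a letter or digit,
# followed by up to 126 more characters from a limited set.
_RESTRICTED_NAME = r"[A-Za-z0-9][A-Za-z0-9!#$&\-^_.+]{0,126}"
_MIME_TYPE = re.compile("(" + _RESTRICTED_NAME + ")/(" + _RESTRICTED_NAME + ")")


def split_mime_type(value: str) -> tuple[str, str]:
    """Split 'type/subtype' into its two parts (hypothetical helper).

    Type and subtype names are case-insensitive, so both are lowercased.
    """
    match = _MIME_TYPE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"not a valid MIME type: {value!r}")
    return match.group(1).lower(), match.group(2).lower()
```

For example, split_mime_type("image/svg+xml") yields the pair ("image", "svg+xml"), while a value with no solidus raises ValueError.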

Mapping filename extensions to content-types is a required part of server configuration. When configuring the server, the content-types section is used to associate filename extensions with content-type HTTP headers.

In addition to filename extension mapping, the server uses MIME-types to configure three other rules: character sets, compression algorithms, and content negotiation.

Text media use various character sets to represent characters (the letters of the alphabet, numbers, punctuation, etc.). Communicating that information to all parties is essential for a correct reading of a text document, and is done using the charset parameter of the content-type header. For more about character set identifiers refer to IETF RFC 2978 IANA Charset Registration Procedures.

When a file's contents are compressed during transmission, the sender must know how to compress the outgoing bytes, and the receiver must know how to decompress the incoming bytes. The server is configured to handle this by associating content-encodings with MIME-types.

When a browser requests a file with a GET method, the server negotiates with the browser to determine which MIME-types are acceptable. The accept-types section is used to configure that; see the separate note about how that occurs. When a browser uploads a file with a PUT method, the server can selectively decide what types of files are acceptable. For example, if a server is ready to handle TIFF images, it would signal that with an image/tiff entry in the accept-types section.

Configuration

The content-types configuration section is used to associate filename extensions with content-types. It comprises a collection of two-part entries: the left-hand side is the filename extension (without a leading dot), and the right-hand side is the MIME-type. Filename extensions are case-sensitive, so jpg and JPG are distinct entries. When a requested file is found on the server but no content-type is associated with its extension, the response is sent without a content-type header, in keeping with HTTP guidance for payloads whose media type is unknown to the sender.
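The lookup described here can be sketched in Python. This is an illustration only, not the server's implementation: the CONTENT_TYPES table and both function names are hypothetical stand-ins for a parsed content-types section.

```python
import os.path
from typing import Optional

# Hypothetical in-memory form of a content-types section.
CONTENT_TYPES = {
    "html": "text/html",
    "css": "text/css",
    "jpg": "image/jpeg",
}


def content_type_for(path: str, table: dict[str, str] = CONTENT_TYPES) -> Optional[str]:
    """Return the configured MIME-type for a path, or None when unmapped."""
    _, ext = os.path.splitext(path)
    # Entries are keyed without the leading dot, and the lookup is
    # case-sensitive, so "JPG" does not match an entry for "jpg".
    return table.get(ext[1:]) if ext else None


def response_headers(path: str) -> dict[str, str]:
    """Build response headers, omitting content-type for unmapped extensions."""
    mime_type = content_type_for(path)
    return {"content-type": mime_type} if mime_type else {}
```

With this table, a request for /index.html carries a content-type of text/html, while /photo.JPG and /README produce no content-type header at all.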

The charset configuration section is used to declare character set identifiers for text and application media types. It comprises a collection of two-part entries: the left-hand side is the MIME-type, and the right-hand side is the character set identifier. When a requested file is served with a content-type header, any declared charset identifier is appended to that header. When configured in this fashion, there is no need to add an HTML meta tag or CSS @charset declaration to each file's inner contents. On the other hand, when no charset identifier is configured for a given MIME-type, the response headers omit the charset parameter completely.
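As a minimal Python sketch of that behavior, assuming a hypothetical CHARSETS table that mirrors a parsed charset section, the header value could be assembled like this:

```python
from typing import Optional

# Hypothetical in-memory form of a charset section.
CHARSETS = {
    "text/html": "utf-8",
    "text/plain": "windows-1252",
}


def content_type_header(mime_type: str, charsets: dict[str, str] = CHARSETS) -> str:
    """Return the content-type header value for a MIME-type."""
    charset: Optional[str] = charsets.get(mime_type)
    if charset is None:
        # No charset configured: the header carries the MIME-type alone.
        return mime_type
    return f"{mime_type}; charset={charset}"
```

Here text/html is sent as "text/html; charset=utf-8", and image/png, which has no entry, is sent as plain "image/png".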

The content-encoding configuration section is used to define which compression algorithm to use based on media types. It comprises a collection of two-part entries: the left-hand side is the MIME-type, and the right-hand side is the compression algorithm, which is one of gzip, deflate, or none.
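The effect of such a table can be sketched in Python using the standard gzip and zlib modules. The CONTENT_ENCODING table and the encode_body function are hypothetical, unmapped types are assumed to be sent uncompressed, and a real server would presumably also consult the browser's accept-encoding request header, which this sketch leaves out.

```python
import gzip
import zlib

# Hypothetical in-memory form of a content-encoding section.
CONTENT_ENCODING = {
    "text/html": "gzip",
    "application/xml": "deflate",
    "image/png": "none",
}


def encode_body(
    mime_type: str, body: bytes, table: dict[str, str] = CONTENT_ENCODING
) -> tuple[bytes, dict[str, str]]:
    """Compress a response body according to its MIME-type."""
    # Assumption: a MIME-type with no entry is treated like "none".
    algorithm = table.get(mime_type, "none")
    if algorithm == "gzip":
        return gzip.compress(body), {"content-encoding": "gzip"}
    if algorithm == "deflate":
        # HTTP "deflate" is the zlib container format, which zlib.compress emits.
        return zlib.compress(body), {"content-encoding": "deflate"}
    return body, {}
```

Already-compressed formats such as PNG gain little from a second pass, which is why entries like image/png are typically set to none.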

The accept-types configuration section is used to declare which types can be negotiated. It comprises a collection of two-part entries: the left-hand side is the literal string mime-type, and the right-hand side is a MIME-type that adheres to the IETF RFC 6838 specification.
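The server's actual negotiation procedure is covered in a separate note. As a generic illustration of how an HTTP accept header can be matched against a configured list, the following Python sketch may help; the ACCEPT_TYPES list and all function names are hypothetical, and the tie-breaking rule (first configured entry wins) is an assumption.

```python
from typing import Optional

# Hypothetical in-memory form of an accept-types section.
ACCEPT_TYPES = ["text/html", "application/json", "image/png"]


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Parse an accept header into (media-range, q) pairs."""
    ranges = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0  # q defaults to 1 when the parameter is absent
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((parts[0].lower(), q))
    return ranges


def _matches(media_range: str, mime_type: str) -> bool:
    if media_range == "*/*":
        return True
    range_type, _, range_sub = media_range.partition("/")
    mime_main, _, mime_sub = mime_type.partition("/")
    return range_type == mime_main and range_sub in ("*", mime_sub)


def negotiate(accept_header: str, offered: list[str] = ACCEPT_TYPES) -> Optional[str]:
    """Pick the offered type with the highest q, or None if none is acceptable."""
    ranges = parse_accept(accept_header)
    best, best_q = None, 0.0
    for mime_type in offered:
        # The most specific matching range decides the q for this type.
        candidates = [(r.count("*"), q) for r, q in ranges if _matches(r, mime_type)]
        if not candidates:
            continue
        _, q = min(candidates)
        if q > best_q:  # strict comparison: earlier entries win ties
            best, best_q = mime_type, q
    return best


def accepts_upload(content_type: str, offered: list[str] = ACCEPT_TYPES) -> bool:
    """Check a PUT request's content-type against the configured list."""
    return content_type.split(";")[0].strip().lower() in offered
```

A browser sending "application/json;q=0.9, text/html" would be served text/html under this sketch, and an upload declared as image/tiff would be refused until an image/tiff entry is added to the list.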

Placement

The content-types, charset, content-encoding, and accept-types configuration sections may appear in either the server section or a host section. When values occur in both the server and host sections, they are merged according to the standard rules defined for the merge attribute.

EBNF

SP	::=	U+20
CR	::=	U+0D
SOLIDUS	::=	U+2F
ASTERISK	::=	U+2A
LEFT-CURLY-BRACKET	::=	U+7B
RIGHT-CURLY-BRACKET	::=	U+7D
media-type	::=	'text' | 'application' | 'image' | 'audio' | 'video' | 'multipart'
subtype	::=	(ALPHA | DIGIT | ^†)*
MIME-type	::=	media-type SOLIDUS subtype
filename-extension	::=	(ALPHA | DIGIT | ^††)*
content-type-entry	::=	filename-extension SP MIME-type CR
content-types-section	::=	'content-types' SP LEFT-CURLY-BRACKET CR content-type-entry* RIGHT-CURLY-BRACKET CR
charset-identifier	::=	(ALPHA | DIGIT | ^†††)*
charset-entry	::=	MIME-type SP charset-identifier CR
charset-section	::=	'charset' SP LEFT-CURLY-BRACKET CR charset-entry* RIGHT-CURLY-BRACKET CR
compression-algorithm	::=	'gzip' | 'deflate' | 'none'
content-encoding-entry	::=	MIME-type SP compression-algorithm CR
content-encoding-section	::=	'content-encoding' SP LEFT-CURLY-BRACKET CR content-encoding-entry* RIGHT-CURLY-BRACKET CR
accept-type-entry	::=	'mime-type' SP* MIME-type CR
accept-types-section	::=	'accept-types' SP LEFT-CURLY-BRACKET CR accept-type-entry* RIGHT-CURLY-BRACKET CR

† See section 4.2 of RFC 6838 for exact rules

†† Legal file system characters vary by platform

††† See IETF RFC 2978 for guidance
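To make the grammar concrete, here is a small Python sketch that reads one content-types section into a dictionary. It is not the server's parser: the function name is hypothetical, and it is deliberately more lenient than the EBNF, accepting indentation, runs of spaces, and any common line ending, as the cookbook examples do.

```python
def parse_content_types(text: str) -> dict[str, str]:
    """Parse one content-types section into {extension: MIME-type}."""
    # Drop blank lines and surrounding whitespace; splitlines() accepts
    # CR, LF, and CRLF line endings alike.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or lines[0].split() != ["content-types", "{"]:
        raise ValueError("expected 'content-types {' on the first line")
    if lines[-1] != "}":
        raise ValueError("expected a closing '}' on the last line")
    table = {}
    for line in lines[1:-1]:
        fields = line.split()
        # Each entry is a filename extension followed by type/subtype.
        if len(fields) != 2 or "/" not in fields[1]:
            raise ValueError(f"malformed entry: {line!r}")
        extension, mime_type = fields
        table[extension] = mime_type
    return table
```

The charset and content-encoding sections share the same two-column shape, so a similar function with a different section keyword would cover them as well.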

Cookbook

Example 1: Filename extensions associated with MIME-types

server {
    content-types {
        css   text/css
        csv   text/csv
        html  text/html
        txt   text/plain
        blue  text/blue
        md    text/markdown
        wiki  text/x-mediawiki
        haml  text/x-haml
        yaml  text/vnd.yaml
        toml  text/toml
        ini   text/plain

        js    application/javascript
        json  application/json
        pdf   application/pdf
        xhtml application/xhtml+xml
        xml   application/xml
        plist application/xml
        
        gif   image/gif
        jpg   image/jpeg
        png   image/png
        svg   image/svg+xml
        webp  image/webp
        ico   image/x-icon
        
        mp3   audio/mpeg
        mp4   video/mp4
        weba  audio/webm
        webm  video/webm
    }
}

Example 2: Character set identifiers associated with MIME-types

server {
    response {
        charset {
            text/css               utf-8
            text/html              utf-8
            text/plain             windows-1252
            application/xhtml+xml  iso-8859-1
        }
    }
}

Example 3: content-encoding algorithms associated with MIME-types

server {
    content-encoding {
        text/css               gzip
        text/csv               gzip
        text/html              gzip
        text/plain             gzip
        text/vnd.yaml          gzip

        application/javascript gzip
        application/json       gzip
        application/pdf        none
        application/xhtml+xml  gzip
        application/xml        deflate
        application/gzip       none
        
        image/gif              none
        image/jpeg             none
        image/png              none
        image/svg+xml          gzip
        image/webp             none
        image/x-icon           none
        
        audio/mpeg             none
        video/mp4              none
        audio/webm             none
        video/webm             none
    }
}

Example 4: MIME-types declared for browser accept-types negotiation

server {
    accept-types {
        mime-type text/html
        mime-type text/plain
        mime-type text/css
        mime-type application/xhtml+xml
        mime-type application/xml
        mime-type application/json
        mime-type application/javascript
        mime-type application/pdf
        mime-type image/png
        mime-type image/jpeg
        mime-type image/gif
        mime-type image/svg+xml
        mime-type image/webp
        mime-type audio/webm
        mime-type audio/ogg
        mime-type audio/mpeg
        mime-type audio/wav
        mime-type audio/flac
        mime-type video/webm
        mime-type video/ogg
        mime-type video/mp4
    }
}

Review

Key points to remember:

The content-types section associates filename extensions with MIME-types.
The charset section associates MIME-types with character set identifiers.
The content-encoding section associates MIME-types with compression algorithms.
The accept-types section lists the MIME-types that the server is willing to negotiate with the browser.

In-depth reading

Content Types

Mapping filename extensions to MIME types

This note describes how to map filename extensions to MIME-types to declare which filename extensions are used by the server and what their inner contents contain.

content-type, media-type, MIME-type, filename extension, IETF RFC 6838

Charsets

Declaring which character set to use with documents, style sheets, and text files

This note describes how character set declarations can be added to response headers for documents, style sheets, and text files.

UTF-8, ISO-8859-1, Windows-1252, Shift_JIS, BIG5, EUC-KR, IETF RFC 2978, IETF RFC 6838

Content Encoding

Saving bandwidth and increasing throughput using compression

This note describes how to configure the server to compress files while in transit between the server and browser.

file compression, accept-encoding, content-encoding, gzip, deflate, Ethernet MTU