A media type (formerly known as a MIME type) is an identifier for file formats and format contents. Media types are used by various internet technologies such as e-mail and HTTP.
A media type consists of a type and a subtype. It can optionally contain a tree prefix, a suffix and one or more parameters. Media types follow this syntax:
type "/" [tree "."] subtype ["+" suffix]* [";" parameter]
For example the media type for JSON documents is:
application/json
It consists of the type application with the subtype json.
An HTML document with UTF-8 encoding can be expressed as:
text/html; charset=UTF-8
Here we have the type text, the subtype html and a parameter charset=UTF-8 indicating UTF-8 character encoding.
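To illustrate how a program might separate the media type from its parameters, here is a minimal sketch using Python's standard email.message module, which understands Content-Type style values. The function name read_content_type is made up for this example and is not part of any library.

```python
from email.message import Message
from typing import Optional, Tuple


def read_content_type(header_value: str) -> Tuple[str, Optional[str]]:
    """Return (media type, charset parameter) for a Content-Type value.

    Hypothetical helper for illustration only.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_type() returns "type/subtype" in lowercase, without parameters.
    # get_param() returns the parameter value as written, or None if it is absent.
    return msg.get_content_type(), msg.get_param("charset")


print(read_content_type("text/html; charset=UTF-8"))
```

For the example above, the first element is the media type text/html and the second is the charset parameter value.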
A suffix can be used to specify the underlying format of a media type. For example, SVG images use the media type:
image/svg+xml
The type is image, svg is the subtype and xml the suffix. The suffix tells us that the SVG file format is based on XML.
Note that subtypes can be organized in a hierarchical tree structure. For example, the binary format used by Apache Thrift uses the following media type:
application/vnd.apache.thrift.binary
vnd is a standardized prefix that tells us this is a vendor-specific media type.
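The syntax described above can be sketched as a small hand-written parser. This is a simplified illustration, not a complete implementation: it assumes that the first dot-separated facet of the subtype is the tree prefix, that the suffix follows the last "+", and that parameter values contain no ";" characters. The names MediaType and parse_media_type are made up for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class MediaType:
    type: str
    subtype: str
    tree: Optional[str] = None
    suffix: Optional[str] = None
    parameters: Dict[str, str] = field(default_factory=dict)


def parse_media_type(value: str) -> MediaType:
    """Split a media type string into its parts (simplified sketch)."""
    # Everything after the first ";" is a list of parameters.
    essence, *raw_params = [part.strip() for part in value.split(";")]
    main_type, _, rest = essence.partition("/")
    if not main_type or not rest:
        raise ValueError(f"not a media type: {value!r}")

    # The suffix follows the last "+", e.g. "svg+xml" has the suffix "xml".
    suffix = ""
    if "+" in rest:
        rest, _, suffix = rest.rpartition("+")

    # Assumption: a dot marks a tree prefix, e.g. "vnd" in "vnd.apache.thrift.binary".
    tree, dot, subtype = rest.partition(".")
    if not dot:
        tree, subtype = "", rest

    parameters = {}
    for raw in raw_params:
        name, _, val = raw.partition("=")
        if name:
            parameters[name.strip().lower()] = val.strip().strip('"')

    return MediaType(
        type=main_type.lower(),
        subtype=subtype.lower(),
        tree=tree.lower() or None,
        suffix=suffix.lower() or None,
        parameters=parameters,
    )


print(parse_media_type("image/svg+xml"))
print(parse_media_type("application/vnd.apache.thrift.binary"))
```

Type, subtype, tree and suffix are lowercased because media type names are case-insensitive. Parameter values are kept as written.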
The Content-Type header
With HTTP, any message that contains an entity-body should include a Content-Type header to define the media type of the body.
RFC 2616 (section 7.2.1) says:
Any HTTP/1.1 message containing an entity-body SHOULD include a Content-Type header field defining the media type of that body. If and only if the media type is not given by a Content-Type field, the recipient MAY attempt to guess the media type via inspection of its content and/or the name extension(s) of the URI used to identify the resource. If the media type remains unknown, the recipient SHOULD treat it as type "application/octet-stream".
The RFC allows clients to guess the media type if the Content-Type header is not present. However, guessing should be avoided.
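The rule quoted from the RFC can be sketched as a small lookup function. This is only one possible reading of it: the function name resolve_media_type and the allow_guessing flag are made up for this example, guessing is limited to the URI's name extension (via Python's standard mimetypes module), and guessing is switched off by default.

```python
import mimetypes
from typing import Mapping

DEFAULT_MEDIA_TYPE = "application/octet-stream"


def resolve_media_type(
    headers: Mapping[str, str],
    url_path: str,
    allow_guessing: bool = False,
) -> str:
    """Pick the media type for a message body (hypothetical helper)."""
    # Header names are case-insensitive, so normalise them before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    declared = normalized.get("content-type", "").strip()
    if declared:
        # A declared type always wins; parameters such as charset are dropped here.
        return declared.split(";", 1)[0].strip().lower()

    if allow_guessing:
        # Guess from the name extension only; the content is not inspected.
        guessed, _encoding = mimetypes.guess_type(url_path)
        if guessed:
            return guessed

    # Unknown media types are treated as arbitrary binary data.
    return DEFAULT_MEDIA_TYPE


print(resolve_media_type({"Content-Type": "text/html; charset=UTF-8"}, "/index.html"))
print(resolve_media_type({}, "/download"))
```

Keeping the guessing step behind an explicit flag means a missing header falls straight through to application/octet-stream unless the caller opts in.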
Guessing the media type of a piece of data is called content sniffing (or MIME sniffing). This practice was (and sometimes still is) used by web browsers and has led to multiple security vulnerabilities. To explicitly tell browsers not to guess certain media types, the following header can be added:
X-Content-Type-Options: nosniff
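On the server side, this simply means sending the header together with an explicit Content-Type. The following is a minimal sketch of a WSGI application in Python that does so. The application name app and the JSON body are made up for this example.

```python
def app(environ, start_response):
    """Minimal WSGI application (illustrative only)."""
    body = b'{"status": "ok"}'
    headers = [
        # Declare the media type explicitly ...
        ("Content-Type", "application/json"),
        # ... and ask the browser not to second-guess it.
        ("X-Content-Type-Options", "nosniff"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

To try it locally, the application can be served with make_server from Python's standard wsgiref.simple_server module.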
Note that the Content-Type header always contains the media type of the original resource, before any content encoding is applied. Content encoding (like gzip compression) is indicated by the Content-Encoding header.
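The difference between the two headers can be illustrated with a short Python sketch that compresses a JSON body. The function name build_compressed_response is made up for this example, and the headers are represented as a plain dictionary instead of a real HTTP response.

```python
import gzip
import json
from typing import Dict, Tuple


def build_compressed_response(payload: dict) -> Tuple[Dict[str, str], bytes]:
    """Return (headers, body) for a gzip-compressed JSON response (sketch)."""
    original = json.dumps(payload).encode("utf-8")
    compressed = gzip.compress(original)
    headers = {
        # Media type of the original resource, unchanged by compression.
        "Content-Type": "application/json",
        # Encoding that was applied on top of the original resource.
        "Content-Encoding": "gzip",
        "Content-Length": str(len(compressed)),
    }
    return headers, compressed


headers, body = build_compressed_response({"message": "hello"})
print(headers["Content-Type"], headers["Content-Encoding"])
```

A recipient first reverses the content encoding (here with gzip.decompress) and then interprets the result according to the Content-Type.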