Understanding the Structure of MPEG-DASH Manifests
MPEG-DASH (Dynamic Adaptive Streaming over HTTP) enables smooth video playback across a wide range of devices and network conditions. At the heart of this technology lies the manifest, the file that tells a player which streams exist and where to fetch them. This blog explores the structure of MPEG-DASH manifests, their key elements, and their role in adaptive streaming.
Sample MPEG-DASH Manifest (MPD) File
<?xml version="1.0" encoding="UTF-8"?>
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011"
     profiles="urn:mpeg:dash:profile:isoff-on-demand:2011"
     type="static"
     minBufferTime="PT1.5S"
     mediaPresentationDuration="PT600S"
     maxSegmentDuration="PT2S">
  <Period id="1" start="PT0S">
    <!-- Video: 180000 / 90000 = 2-second segments -->
    <AdaptationSet id="1" mimeType="video/mp4" codecs="avc1.4d401f" width="1920" height="1080" frameRate="30" startWithSAP="1">
      <SegmentTemplate timescale="90000" media="video_$Number$.mp4" initialization="video_init.mp4" duration="180000"/>
      <Representation id="1" bandwidth="5000000"/>
    </AdaptationSet>
    <!-- Audio: 96000 / 48000 = 2-second segments, within maxSegmentDuration -->
    <AdaptationSet id="2" mimeType="audio/mp4" codecs="mp4a.40.2" lang="en">
      <SegmentTemplate timescale="48000" media="audio_$Number$.mp4" initialization="audio_init.mp4" duration="96000"/>
      <!-- Representation ids must be unique within a Period -->
      <Representation id="2" bandwidth="128000"/>
    </AdaptationSet>
  </Period>
</MPD>
Breakdown of Each Tag and Branch:
1. MPD (Media Presentation Description)
- The root element of the DASH manifest file.
- Attributes:
  - xmlns: Namespace definition for the DASH XML schema.
  - profiles: Indicates the DASH profile used. In this case, it’s "isoff-on-demand", used for on-demand streaming. Strictly speaking, that profile expects each Representation to be a single indexed file; manifests with numbered SegmentTemplate segments like this sample are more commonly signalled with the isoff-live profile, which also works for static content.
  - type: Defines whether the stream is static (pre-recorded content) or dynamic (live content). Here, it’s static.
  - minBufferTime: Minimum buffer time that the player needs to start playing the stream, here PT1.5S (1.5 seconds).
  - mediaPresentationDuration: Total duration of the media presentation (PT600S, meaning 600 seconds or 10 minutes).
  - maxSegmentDuration: Specifies the maximum duration of any single segment (PT2S, meaning no segment is longer than 2 seconds).
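The duration attributes above use ISO 8601 notation, so a player has to turn strings like PT1.5S into seconds before it can do any arithmetic. The following is a minimal sketch, assuming only the time-only forms (hours, minutes, seconds) that appear in this manifest; the function name parse_mpd_duration is made up for illustration, and date components such as days are deliberately not handled.

```python
import re

# Matches time-only ISO 8601 durations such as PT1.5S, PT10M or PT1H2M3.5S.
# Date parts (years, months, days) are out of scope for this sketch.
_DURATION_RE = re.compile(
    r"^PT(?:(?P<hours>\d+(?:\.\d+)?)H)?"
    r"(?:(?P<minutes>\d+(?:\.\d+)?)M)?"
    r"(?:(?P<seconds>\d+(?:\.\d+)?)S)?$"
)


def parse_mpd_duration(value):
    """Convert a time-only ISO 8601 duration string into seconds."""
    match = _DURATION_RE.match(value)
    if match is None or value == "PT":
        raise ValueError(f"unsupported duration: {value!r}")
    parts = {
        name: float(text) if text else 0.0
        for name, text in match.groupdict().items()
    }
    return parts["hours"] * 3600 + parts["minutes"] * 60 + parts["seconds"]


if __name__ == "__main__":
    # The three duration attributes from the sample MPD element.
    for attribute, raw in [
        ("minBufferTime", "PT1.5S"),
        ("mediaPresentationDuration", "PT600S"),
        ("maxSegmentDuration", "PT2S"),
    ]:
        print(attribute, parse_mpd_duration(raw), "seconds")
```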
2. Period
- Represents a contiguous interval of the media timeline. A presentation can contain multiple periods (e.g., main content separated by ad breaks).
- Attributes:
  - id: Unique identifier of the period.
  - start: Specifies when this period starts (PT0S, meaning 0 seconds, i.e., at the beginning).
- Contents:
  - Contains one or more AdaptationSet elements that define different streams (e.g., video and audio).
3. AdaptationSet
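- Groups interchangeable representations of the same media component (e.g., the same video at different qualities). Alternatives that are not interchangeable, such as audio in different languages, normally go in separate adaptation sets.
- Attributes:
  - id: Unique identifier for the adaptation set.
  - mimeType: The MIME type of the content, e.g., "video/mp4" for video or "audio/mp4" for audio.
  - codecs: Specifies the codec used for encoding, e.g., "avc1.4d401f" for H.264 video or "mp4a.40.2" for AAC audio.
  - width and height: Resolution of the video in pixels (only in video adaptation sets).
  - frameRate: Frame rate of the video stream, e.g., 30 frames per second.
  - lang: Language of the content, e.g., "en" for the audio adaptation set.
  - startWithSAP: When greater than 0, signals that every segment starts with a Stream Access Point (SAP) of that type or lower. The player can therefore begin decoding at any segment boundary, which allows playback to start in the middle of the stream.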
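To see how these attributes are read in practice, here is a small sketch that walks the Period and AdaptationSet elements with Python's standard xml.etree.ElementTree module. The embedded manifest is a trimmed copy of the sample (SegmentTemplate omitted), and the helper name describe_adaptation_sets is an assumption made for this example, not part of any DASH library.

```python
import xml.etree.ElementTree as ET

# All MPD elements live in this namespace, so lookups must be qualified.
NS = {"dash": "urn:mpeg:dash:schema:mpd:2011"}

# Trimmed copy of the sample manifest, kept inline so the sketch is self-contained.
SAMPLE_MPD = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static">
  <Period id="1" start="PT0S">
    <AdaptationSet id="1" mimeType="video/mp4" codecs="avc1.4d401f"
                   width="1920" height="1080" frameRate="30" startWithSAP="1">
      <Representation id="1" bandwidth="5000000"/>
    </AdaptationSet>
    <AdaptationSet id="2" mimeType="audio/mp4" codecs="mp4a.40.2" lang="en">
      <Representation id="2" bandwidth="128000"/>
    </AdaptationSet>
  </Period>
</MPD>"""


def describe_adaptation_sets(mpd_xml):
    """Return one summary dict per AdaptationSet found in the manifest."""
    root = ET.fromstring(mpd_xml)
    summaries = []
    for period in root.findall("dash:Period", NS):
        for aset in period.findall("dash:AdaptationSet", NS):
            summaries.append({
                "period": period.get("id"),
                "id": aset.get("id"),
                "mimeType": aset.get("mimeType"),
                "codecs": aset.get("codecs"),
                "lang": aset.get("lang"),  # None when the attribute is absent
                "bandwidths": [
                    int(rep.get("bandwidth"))
                    for rep in aset.findall("dash:Representation", NS)
                ],
            })
    return summaries


if __name__ == "__main__":
    for summary in describe_adaptation_sets(SAMPLE_MPD):
        print(summary)
```

Forgetting the namespace mapping is a common stumbling block: an unqualified lookup such as root.findall("Period") returns nothing for a real MPD.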
4. SegmentTemplate
- Defines the template for locating and naming media segments.
- Attributes:
  - timescale: Time scale of the media, meaning how many time units there are per second (e.g., 90000 units per second for video or 48000 for audio).
  - media: URL pattern for media segments, where $Number$ will be replaced by the segment number (e.g., video_$Number$.mp4).
  - initialization: Specifies the initialization segment (e.g., video_init.mp4 or audio_init.mp4), which is needed before the media segments can be played.
  - duration: Duration of each segment in timescale units, e.g., 180000 for video, which means 2 seconds per video segment (180000 / 90000 timescale).
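The arithmetic behind SegmentTemplate can be sketched in a few lines. The example below uses the video values from the sample manifest and assumes numbering starts at 1, which is the default when no startNumber attribute is present. The function names are illustrative, and format tags such as $Number%05d$ are not handled.

```python
import math


def segment_duration_seconds(duration, timescale):
    """Segment length in seconds: duration is expressed in timescale units."""
    return duration / timescale


def segment_count(presentation_seconds, duration, timescale):
    """Number of segments needed to cover the whole presentation."""
    return math.ceil(presentation_seconds * timescale / duration)


def segment_urls(media_template, count, start_number=1):
    """Expand the $Number$ placeholder for each segment index."""
    return [
        media_template.replace("$Number$", str(number))
        for number in range(start_number, start_number + count)
    ]


if __name__ == "__main__":
    # Video values from the sample manifest: timescale=90000, duration=180000,
    # mediaPresentationDuration=PT600S.
    seconds = segment_duration_seconds(180000, 90000)
    count = segment_count(600, 180000, 90000)
    urls = segment_urls("video_$Number$.mp4", count)
    print(seconds, "seconds per segment")
    print(count, "segments, from", urls[0], "to", urls[-1])
```

With these inputs the player would request video_init.mp4 once and then the numbered media segments in order.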
5. Representation
- Describes a specific encoding of the media (e.g., a particular bitrate or resolution).
- Attributes:
  - id: Unique identifier for the representation.
  - bandwidth: Bitrate of the media stream in bits per second (e.g., 5000000 for video and 128000 for audio).
Key Points:
- MPD is the root element, defining the overall structure, content type, and duration.
- Period contains the media content, starting at a specific time.
- AdaptationSet groups interchangeable versions of one media component, such as a video stream in different qualities or an audio track in one language.
- SegmentTemplate describes how the media segments are structured, including where to find them and how they are named.
- Representation provides detailed information about each version of the media, like resolution and bitrate.
This structure allows for adaptive streaming, where the player can switch between different Representations based on available bandwidth and device capability.
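As a rough illustration of that switching logic, the sketch below picks the highest-bandwidth Representation that fits within a fraction of the measured throughput. This is a deliberately simple heuristic, not the algorithm of any particular player. The 1 Mbps and 2.5 Mbps entries and the 0.8 safety factor are hypothetical values added for the example; only the 5 Mbps entry comes from the sample manifest.

```python
def choose_representation(representations, measured_bps, safety_factor=0.8):
    """Pick the best representation whose bandwidth fits the throughput budget.

    Falls back to the lowest-bandwidth representation when nothing fits.
    """
    budget = measured_bps * safety_factor
    ordered = sorted(representations, key=lambda rep: rep["bandwidth"])
    affordable = [rep for rep in ordered if rep["bandwidth"] <= budget]
    return affordable[-1] if affordable else ordered[0]


if __name__ == "__main__":
    # Hypothetical bitrate ladder; only the 5 Mbps entry exists in the sample.
    ladder = [
        {"id": "low", "bandwidth": 1_000_000},
        {"id": "mid", "bandwidth": 2_500_000},
        {"id": "high", "bandwidth": 5_000_000},
    ]
    for throughput in (500_000, 4_000_000, 10_000_000):
        chosen = choose_representation(ladder, throughput)
        print(throughput, "bps ->", chosen["id"])
```

Real players usually combine throughput estimates with buffer level and screen size, so treat this as a starting point only.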
MPEG-DASH with DRM information
In an MPEG-DASH manifest, Digital Rights Management (DRM) is usually signaled with ContentProtection elements, which carry the default Key ID (cenc:default_KID) and DRM-specific information (such as Widevine, PlayReady, or FairPlay). Here’s how DRM key information and KID would look in the same MPEG-DASH manifest.
<?xml version="1.0" encoding="UTF-8"?>
<!-- The cenc: and mspr: prefixes must be declared before they are used -->
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011"
     xmlns:cenc="urn:mpeg:cenc:2013"
     xmlns:mspr="urn:microsoft:playready"
     profiles="urn:mpeg:dash:profile:isoff-on-demand:2011"
     type="static"
     minBufferTime="PT1.5S"
     mediaPresentationDuration="PT600S"
     maxSegmentDuration="PT2S">
  <Period id="1" start="PT0S">
    <AdaptationSet id="1" mimeType="video/mp4" codecs="avc1.4d401f" width="1920" height="1080" frameRate="30" startWithSAP="1">
      <!-- Content Protection for DRM -->
      <ContentProtection schemeIdUri="urn:uuid:edef8ba9-79d6-4ace-a3c8-27dcd51d21ed"
                         value="Widevine">
        <cenc:pssh>Base64EncodedPSSHData</cenc:pssh>
      </ContentProtection>
      <ContentProtection schemeIdUri="urn:uuid:9a04f079-9840-4286-ab92-e65be0885f95"
                         value="PlayReady">
        <mspr:pro>Base64EncodedPROData</mspr:pro>
      </ContentProtection>
      <ContentProtection schemeIdUri="urn:mpeg:dash:mp4protection:2011"
                         value="cenc"
                         cenc:default_KID="12345678-1234-5678-1234-567812345678"/>
      <SegmentTemplate timescale="90000" media="video_$Number$.mp4" initialization="video_init.mp4" duration="180000"/>
      <Representation id="1" bandwidth="5000000"/>
    </AdaptationSet>
    <AdaptationSet id="2" mimeType="audio/mp4" codecs="mp4a.40.2" lang="en">
      <!-- Content Protection for DRM -->
      <ContentProtection schemeIdUri="urn:uuid:edef8ba9-79d6-4ace-a3c8-27dcd51d21ed"
                         value="Widevine">
        <cenc:pssh>Base64EncodedPSSHData</cenc:pssh>
      </ContentProtection>
      <ContentProtection schemeIdUri="urn:uuid:9a04f079-9840-4286-ab92-e65be0885f95"
                         value="PlayReady">
        <mspr:pro>Base64EncodedPROData</mspr:pro>
      </ContentProtection>
      <ContentProtection schemeIdUri="urn:mpeg:dash:mp4protection:2011"
                         value="cenc"
                         cenc:default_KID="87654321-4321-8765-4321-876543218765"/>
      <!-- 96000 / 48000 = 2-second segments, within maxSegmentDuration -->
      <SegmentTemplate timescale="48000" media="audio_$Number$.mp4" initialization="audio_init.mp4" duration="96000"/>
      <!-- Representation ids must be unique within a Period -->
      <Representation id="2" bandwidth="128000"/>
    </AdaptationSet>
  </Period>
</MPD>
Explanation of DRM-Related Elements:
- ContentProtection Element: This element contains DRM information for each media stream (video/audio). Each DRM system has a unique identifier (UUID) and may include system-specific data. Multiple ContentProtection elements can be used to support multiple DRM systems.
  - schemeIdUri: Identifies the protection scheme or DRM system that applies to the adaptation set. Common values include:
    - Widevine: urn:uuid:edef8ba9-79d6-4ace-a3c8-27dcd51d21ed
    - PlayReady: urn:uuid:9a04f079-9840-4286-ab92-e65be0885f95
    - Common Encryption (CENC): urn:mpeg:dash:mp4protection:2011 (a generic URN rather than a DRM system UUID)
  - value: Human-readable name of the DRM system (e.g., Widevine or PlayReady) or, for the Common Encryption entry, the encryption scheme (cenc).
- cenc:default_KID (Key Identifier):
  - The KID (Key ID) identifies the decryption key for a specific piece of content. Each adaptation set (or representation) has an associated default_KID, a 128-bit value represented as a UUID.
  - Example: "12345678-1234-5678-1234-567812345678" is the Key ID for the video stream in this manifest. This Key ID links the encrypted content to its decryption key managed by the DRM system.
- cenc:pssh (Protection System Specific Header):
  - The pssh (Protection System Specific Header) contains DRM-specific initialization data in base64-encoded format. Its payload is specific to the DRM system (e.g., Widevine, PlayReady).
  - This data does not contain the content keys themselves. It carries the identifiers and instructions the DRM client needs to request a license, and the license then delivers the keys.
- mspr:pro (PlayReady Object):
  - The pro element is a PlayReady-specific object, encoded in base64 format, that contains the PlayReady header. It’s used to fetch the license for decrypting the PlayReady protected content.
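The elements described above can be pulled out of a manifest with the standard xml.etree.ElementTree module, as the sketch below shows. It assumes the cenc and mspr prefixes are bound to urn:mpeg:cenc:2013 and urn:microsoft:playready, which are the namespace URIs commonly used for them. The helper name read_content_protection and the SYSTEM_NAMES lookup table are made up for this example, and the pssh and pro payloads are the placeholders from the sample rather than real data.

```python
import xml.etree.ElementTree as ET

NS = {
    "dash": "urn:mpeg:dash:schema:mpd:2011",
    "cenc": "urn:mpeg:cenc:2013",
    "mspr": "urn:microsoft:playready",
}

# schemeIdUri values listed in this article, mapped to readable names.
SYSTEM_NAMES = {
    "urn:uuid:edef8ba9-79d6-4ace-a3c8-27dcd51d21ed": "Widevine",
    "urn:uuid:9a04f079-9840-4286-ab92-e65be0885f95": "PlayReady",
    "urn:mpeg:dash:mp4protection:2011": "Common Encryption",
}

# Video AdaptationSet from the sample, with the prefixes declared inline.
ADAPTATION_SET_XML = """<AdaptationSet xmlns="urn:mpeg:dash:schema:mpd:2011"
    xmlns:cenc="urn:mpeg:cenc:2013" xmlns:mspr="urn:microsoft:playready"
    id="1" mimeType="video/mp4">
  <ContentProtection schemeIdUri="urn:uuid:edef8ba9-79d6-4ace-a3c8-27dcd51d21ed" value="Widevine">
    <cenc:pssh>Base64EncodedPSSHData</cenc:pssh>
  </ContentProtection>
  <ContentProtection schemeIdUri="urn:uuid:9a04f079-9840-4286-ab92-e65be0885f95" value="PlayReady">
    <mspr:pro>Base64EncodedPROData</mspr:pro>
  </ContentProtection>
  <ContentProtection schemeIdUri="urn:mpeg:dash:mp4protection:2011" value="cenc"
      cenc:default_KID="12345678-1234-5678-1234-567812345678"/>
</AdaptationSet>"""


def read_content_protection(adaptation_set_xml):
    """Return one dict per ContentProtection child of the AdaptationSet."""
    aset = ET.fromstring(adaptation_set_xml)
    entries = []
    for cp in aset.findall("dash:ContentProtection", NS):
        scheme = cp.get("schemeIdUri", "").lower()
        entries.append({
            "system": SYSTEM_NAMES.get(scheme, "unknown"),
            # Namespaced attributes are keyed as {namespace-uri}localname.
            "default_KID": cp.get("{urn:mpeg:cenc:2013}default_KID"),
            "pssh": cp.findtext("cenc:pssh", default=None, namespaces=NS),
            "pro": cp.findtext("mspr:pro", default=None, namespaces=NS),
        })
    return entries


if __name__ == "__main__":
    for entry in read_content_protection(ADAPTATION_SET_XML):
        print(entry)
```

If the prefixes are not declared anywhere in the manifest, the XML is not well-formed and the parser rejects it, so the declarations on the root element matter.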
Key Points:
- schemeIdUri: Identifies the DRM system (e.g., Widevine, PlayReady).
- cenc:default_KID: The unique identifier (UUID) that points to the decryption key for that specific adaptation set (e.g., video or audio).
- cenc:pssh: DRM-specific initialization data (e.g., for Widevine).
- mspr:pro: PlayReady-specific DRM metadata.
The combination of these elements ensures that the encrypted video and audio segments are properly decrypted by the player’s DRM client during playback.
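Because a Key ID is a 128-bit value written in UUID form, tooling often needs it as raw bytes or as a plain hex string. A minimal sketch using Python's standard uuid module follows. The helper name kid_to_bytes is an assumption for this example, and the byte order shown is the straightforward big-endian one; some DRM systems store the Key ID differently inside their own headers, which is not covered here.

```python
import uuid


def kid_to_bytes(default_kid):
    """Convert a cenc:default_KID string in UUID form into its 16 raw bytes."""
    return uuid.UUID(default_kid).bytes


if __name__ == "__main__":
    # Key IDs from the video and audio adaptation sets in the sample manifest.
    for label, kid in [
        ("video", "12345678-1234-5678-1234-567812345678"),
        ("audio", "87654321-4321-8765-4321-876543218765"),
    ]:
        raw = kid_to_bytes(kid)
        print(label, len(raw) * 8, "bits", raw.hex())
```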