Standards for HTTP Adaptive Streaming

Session Notes

Session Type: Working group

Session Category: Open Media Developers

Session Leader: Silvia Pfeiffer (Google), Frank Galligan (Google)

Day: Sunday
Room: Faculty Commons
Time: 11:30am – 1:00pm

Description

One of the most popular features of modern Flash players is the ability to adapt the bitrate of the streamed video to the available bandwidth, also called HTTP adaptive streaming. HTML5 browsers (with the notable exception of Safari, which supports Apple's HTTP Live Streaming) do not yet support this feature because it has not yet been standardised in a codec-independent manner.

ISO/MPEG have developed the DASH specification, which can be applied in a codec-independent manner and may therefore be a good option for HTML5.

While there are several solutions for HTTP adaptive streaming of MPEG video, none has been released for WebM, nor has a standard been set that works across media formats.

Experiments have been run in several frameworks for WebM to see how it can work in comparison to MPEG. This session gives developers an opportunity to report on their experiences and to discuss how to move forward with standardisation across browsers and codecs.

Outcome

This session will give developers the opportunity to discuss their experiences with different approaches to HTTP adaptive streaming and to reach a conclusion as to which approach may be the best to propose for HTML5.


Notes

Frank Galligan (Google)
Aaron Colwell (Google)
Mark Watson (Netflix)

Mark: Introduction to DASH

  • MPEG is finalizing the DASH specification
  • DASH is a manifest format for describing all available streams for a media resource
  • DASH also defines file formats that are suitable for use with the MPEG-4 file format; transport streams work "magically"
  • there is also a common encryption scheme for MPEG-4 file format in DASH
  • defined profiles: 2 for mp4, 2 for transport streams
  • for transport streams there is an on-demand and a live profile:
    • the on-demand profile requires alignment of the files
    • the live profile requires chunking of the files
  • example manifest (a sketch follows after this list)
    • AdaptationSet: groups per resource or per track
    • Representation: different representations for same resource
    • muxed cases are difficult, in particular aligning the chunks
  • features:
    • multiple tracks / accessibility
    • trick modes (e.g. deal with fast forward by dropping to lower framerate)
    • 3D, multi-view, scalable video
    • protected content
  • implementation: Netflix is using this format
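
A minimal sketch of what such a manifest (an MPD) might look like, with one AdaptationSet per track and one Representation per available bitrate; element and attribute names follow the draft MPD schema from memory, so treat this as illustrative rather than schema-valid:

<MPD mediaPresentationDuration="PT120S">
  <Period>
    <AdaptationSet mimeType="video/mp4">
      <!-- one Representation per bitrate of the same resource -->
      <Representation id="video-low"  bandwidth="500000"  width="640"  height="360"/>
      <Representation id="video-high" bandwidth="2000000" width="1280" height="720"/>
    </AdaptationSet>
    <AdaptationSet mimeType="audio/mp4">
      <Representation id="audio" bandwidth="128000"/>
    </AdaptationSet>
  </Period>
</MPD>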

Comments:

  • the DASH file format is very complex; it could be simplified so that it does not require such a complicated parser
  • to implement it with Matroska, you will need to extract the Matroska cluster index
  • arranging the media nicely in chunks and indexing them (when authoring) really helps to simplify adaptive streaming
  • example bitstreams and implementations are in the process of being developed in MPEG

Aaron: Media Source API

  • implemented in Chrome for adaptive streaming, but may be applicable to other use cases
  • an extension to HTMLMediaElement that allows JS to pipe data into audio and video elements, similar to appendBytes in Flash
  • stream switching, buffering, delivery, manifest strategies should not be done by the browser - the browser just provides an API to dynamically create content
  • supports adaptive streaming, mashups, ad insertion, streaming services, DVR, video editors (constraint editing)
  • proposed IDL:
    • special URL that the browser advertises: webkitMediaSourceURL (media data comes from JS)
    • webkitSourceAppend() call with error handling
    • webkitSourceEndOfStream() with states to manage state handling
interface HTMLMediaElement : HTMLElement {
    ...
    // URL passed to src attribute to enable the media source logic.
    readonly attribute DOMString webkitMediaSourceURL;

    // Appends media to the source.
    void webkitSourceAppend(in Uint8Array data) raises (DOMException);

    // Signals the end of stream.
    const unsigned short EOS_NO_ERROR = 0; // End of stream reached w/o error.
    const unsigned short EOS_NETWORK_ERR = 1; // A network error triggered end of stream.
    const unsigned short EOS_DECODE_ERR = 2; // A decode error triggered end of stream.
    void webkitSourceEndOfStream(in unsigned short status) raises (DOMException);

    // Indicates the current state of the media source.
    const unsigned short SOURCE_CLOSED = 0;
    const unsigned short SOURCE_OPEN = 1;
    const unsigned short SOURCE_ENDED = 2;
    readonly attribute unsigned short webkitSourceState;
    ...
};
  • browser doesn't care how JS gets the seek point and the time stamp - it just allows the JS developer to deal with it
  • you would use JS to parse the Ogg packets
  • Draft spec available: http://html5-mediasource-api.googlecode.com/svn/trunk/draft-spec/mediasource-draft-spec.html
  • supports playback of a minimalist WebM byte stream (INFO, TRACKS, CLUSTER)
  • supports muxed & demuxed streams
  • the bulk of the code is checked into WebKit and Chromium, but #ifdef-ed out; about to add a command-line flag
  • two demos available: basic VOD & adaptive streaming
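
A minimal sketch of how a page might drive this API from JS, using only the members shown in the IDL above; the 'webkitsourceopen' event name is an assumption based on the linked draft spec, and movie.webm is a placeholder resource:

var video = document.querySelector('video');
// Special URL that routes media data through the JS-visible source API.
video.src = video.webkitMediaSourceURL;

video.addEventListener('webkitsourceopen', function () {
  var xhr = new XMLHttpRequest();
  xhr.open('GET', 'movie.webm', true);  // placeholder WebM resource
  xhr.responseType = 'arraybuffer';
  xhr.onload = function () {
    if (video.webkitSourceState === video.SOURCE_OPEN) {
      // Hand the whole byte stream (INFO, TRACKS, CLUSTERs) to the decoder.
      video.webkitSourceAppend(new Uint8Array(xhr.response));
      video.webkitSourceEndOfStream(video.EOS_NO_ERROR);
    }
  };
  xhr.send();
});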

Next steps:

  • make byte stream look more like WebM live streams, so JS handling becomes easier
  • implement frame size change & possibly Vorbis codebook change support
  • publish a byte stream spec for WebM
  • release command-line flag in Chrome 16 to encourage demos to iron out issues
  • find out how to support MP4 and Ogg with this API
  • for Ogg, providing an index to simplify seeking would help (otherwise it is essentially a Shoutcast stream)
  • start standardisation discussion at WHATWG

Have break-out group tomorrow about this!!!

Discussion:

  • have you considered ad blocking as an application? we expect there will be a jQuery equivalent for adaptive streaming
  • the problem with implementing adaptive streaming in the player is that it is hard to innovate; if we push it into JS, it is easier to improve and to add support; a more fundamental API for manipulating video is useful, with the browser just used as a playback engine
  • don't append individual encoded frames, but whole clusters, to deal with larger chunks and still keep it performant
  • how will this deal with offline viewing?

Frank: adaptive streaming demo

  • I have a manifest file in XML, but it could be in JSON
  • average bitrates for streams are being shown in video player
  • browser downloads and parses manifest, starts with lowest bitrate stream
  • the demo limits the bitrate to 1000 kbps artificially
  • JS gets datarate and decides to switch based on that
  • video chunks are 150 frames long (6.25 s), audio chunks 5 s
  • I'm not switching audio right now
  • the Media Source API works on muxed files, so you need unique track IDs
  • to start playback, you need to completely download the first chunk
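
A sketch of the switching logic described above (all names are illustrative, none are from the demo code): fetch each chunk with an XHR range request, measure the observed throughput, and pick the best stream that fits under it:

// Pick the highest-bitrate stream whose average bitrate fits under the
// measured download rate; streams is assumed sorted lowest-first.
function pickStream(streams, measuredKbps) {
  var best = streams[0];
  streams.forEach(function (s) {
    if (s.averageKbps <= measuredKbps && s.averageKbps > best.averageKbps)
      best = s;
  });
  return best;
}

// Fetch one chunk via an XHR range request and report the data rate.
function fetchChunk(url, firstByte, lastByte, onDone) {
  var xhr = new XMLHttpRequest();
  xhr.open('GET', url, true);
  xhr.setRequestHeader('Range', 'bytes=' + firstByte + '-' + lastByte);
  xhr.responseType = 'arraybuffer';
  var start = Date.now();
  xhr.onload = function () {
    var seconds = Math.max((Date.now() - start) / 1000, 0.001);
    var kbps = (xhr.response.byteLength * 8 / 1000) / seconds;
    onDone(new Uint8Array(xhr.response), kbps);
  };
  xhr.send();
}

// Example use: append a chunk, then possibly switch streams for the next one.
// fetchChunk(current.url, c.firstByte, c.lastByte, function (data, kbps) {
//   video.webkitSourceAppend(data);
//   current = pickStream(streams, kbps);
// });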

Discussion:

  • it would be nice to just have one request per adaptation and then just pull chunks off that same connection; right now every chunk requires another XHR range request
  • feels like you're starting to implement a Web browser in JS with XHRs; something with a lower latency might be useful
  • originally we wanted to have the browser just deal with the manifest, but we discovered that this approach is much simpler
  • seeking latency is determined by the time it takes to download a whole chunk
  • JS devs want early access to data; maybe we can extend XHR to get a stream object
  • the media.buffered property allows you to show the buffered time ranges, because the browser parses the appended buffer as soon as it receives it (see the snippet after this list)
  • demo doesn't deal well yet with buffer underrun - the time in the player controls increases, then jumps back
  • storing the buffers to disk would help with offline playback; maybe local storage can help, or the browser cache might solve it if there were a special media cache
  • it might be worth providing an API to the media cache that the JS can invalidate chunk-wise
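
For the media.buffered point above, a small snippet (assuming video is the media element from the earlier sketch); buffered is a standard TimeRanges object:

for (var i = 0; i < video.buffered.length; i++) {
  // Each entry is a contiguous buffered time range, in seconds.
  console.log(video.buffered.start(i) + 's - ' + video.buffered.end(i) + 's');
}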

Questions:

  • SourceAppend asks for byte ranges on each media resource - how does that work for live? you need to use chunked files for live and stop using byte range requests
  • can we do JSON for DASH?
  • Frank's manifest support is completely implemented in JS (see the sketch at the end of these notes):
    • Presentation - MediaInterval - MediaGroup - with list of Media elements, their header and index identified through byte ranges
    • it's possible to do it with chained WebM
  • how do we deal with errors, where the web author feeds in the wrong files? JS does not need to do any parsing - it just needs to pull in the right byte ranges and hand the chunks to the browser for decoding
  • error reporting on the media element would be similar to how it happens with real files; e.g. JS calls end-of-stream when it runs out of packets; if bad chunks are handed to the decoder, the decoder will deal with it; other than decode and network errors, there are none; we may need some further errors for the JS
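
For reference, a hypothetical JS rendering of the manifest structure Frank described (Presentation - MediaInterval - MediaGroup - Media); all field names are guesses for illustration and not part of any spec:

var manifest = {
  presentation: {
    intervals: [{                 // MediaInterval
      start: 0,
      duration: 120.5,            // seconds
      groups: [{                  // MediaGroup
        type: 'video',
        media: [{                 // one Media element per bitrate
          url: 'video_500k.webm',
          averageKbps: 500,
          // header and index identified through byte ranges, as in the notes
          header: { firstByte: 0,    lastByte: 4355 },
          index:  { firstByte: 4356, lastByte: 5119 }
        }]
      }]
    }]
  }
};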