Media Capabilities Detection
Overview
Motivation: Alex has worked for different media companies over the past 10 years. The same questions come up every time they enable advanced features (new codecs, HDR, etc.):
- What device can we actually deliver this to?
- How many devices support Dolby Atmos or HEVC?
For product planning: nobody goes and tests for features they don’t support yet, so you have to estimate ahead of time.
The closest thing we have: the HTML Media Capabilities API. It is fairly recent, not widely supported, and behaves inconsistently.
What we need to figure out doesn’t seem that complicated on paper:
- DRM
- CMAF support
- HDR
- multi-channel audio
Question: as an industry, can we come up with a cross-platform, universal standard for detecting media capabilities? Can progress be made so that it is more consistent: which API you call, what response you get.
If you look at the Media Capabilities API: it describes syntax, but not semantics. E.g. for HLG or dynamic HDR metadata: is it supported by the decoder? Is there a screen that supports it?
Supplemental codecs in the MIME type. E.g. the codec is HEVC and the supplemental codec is some Dolby-specific attribute.
It also depends on how features intersect: the Apple TV 4K doesn’t support HDR and 60fps at the same time, but it supports each of them separately.
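The intersection problem suggests querying the exact combination you intend to deliver rather than each feature on its own. A minimal TypeScript sketch, assuming a provider shaped like the Media Capabilities API’s decodingInfo(); the local type names, supportsCombination, the codec string and the fake device are illustrative assumptions, not something discussed in the session:

```typescript
// Local, simplified types modelled on the Media Capabilities API shapes.
// The names are illustrative; they are not the DOM typings themselves.
interface VideoQuery {
  contentType: string;
  width: number;
  height: number;
  bitrate: number;
  framerate: number;
  transferFunction?: "srgb" | "pq" | "hlg";
}

interface DecodingResult {
  supported: boolean;
  smooth: boolean;
  powerEfficient: boolean;
}

interface CapabilitiesProvider {
  decodingInfo(config: {
    type: "file" | "media-source";
    video: VideoQuery;
  }): Promise<DecodingResult>;
}

// Ask about the full combination in a single query. Checking HDR and 60fps
// separately and AND-ing the answers can produce a false positive.
async function supportsCombination(
  provider: CapabilitiesProvider,
  video: VideoQuery,
): Promise<boolean> {
  const result = await provider.decodingInfo({ type: "media-source", video });
  return result.supported;
}

// Hypothetical device, for illustration only: it handles HDR or a frame rate
// above 30fps, but not both at once.
const fakeDevice: CapabilitiesProvider = {
  async decodingInfo({ video }) {
    const hdr =
      video.transferFunction === "pq" || video.transferFunction === "hlg";
    const highFrameRate = video.framerate > 30;
    const ok = !(hdr && highFrameRate);
    return { supported: ok, smooth: ok, powerEfficient: ok };
  },
};

// Example query; the HEVC Main 10 codec string is just a sample value.
const base: VideoQuery = {
  contentType: 'video/mp4; codecs="hvc1.2.4.L153.B0"',
  width: 3840,
  height: 2160,
  bitrate: 20000000,
  framerate: 30,
};
```

In a browser, navigator.mediaCapabilities should be usable as the provider; the fake device only stands in for it so the sketch can run anywhere.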
The way Chrome has done it: based on historical data from previous playback sessions.
It’s fair to separate the question of performance from capability: Can I play this vs. How well will it perform?
HDR content or multi-channel audio: you have content available in SDR and HDR, with Atmos and stereo audio. A lot of players are not good at figuring out what they can play vs. what they should play.
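One way to make the “can play” vs. “should play” distinction concrete is to rank renditions by preference and only fall back to a merely playable one when nothing is both supported and smooth. A rough TypeScript sketch; the Candidate type, the preference numbers and chooseRendition are hypothetical names, and the result shape only mirrors the supported / smooth / powerEfficient flags of the Media Capabilities API:

```typescript
// Result shape modelled on what the Media Capabilities API reports per query.
interface DecodingResult {
  supported: boolean;
  smooth: boolean;
  powerEfficient: boolean;
}

// One rendition the service could deliver, plus the platform's answer for it.
interface Candidate {
  label: string;
  preference: number; // higher = what the service would rather deliver (hypothetical ranking)
  result: DecodingResult;
}

// "Can play" is result.supported; "should play" additionally requires smooth.
function chooseRendition(candidates: Candidate[]): Candidate | null {
  // Copy before sorting so the caller's array is left untouched.
  const byPreference = [...candidates].sort(
    (a, b) => b.preference - a.preference,
  );
  const smooth = byPreference.find(
    (c) => c.result.supported && c.result.smooth,
  );
  if (smooth) return smooth;
  // Nothing is smooth: fall back to the most preferred playable rendition.
  const playable = byPreference.find((c) => c.result.supported);
  return playable ?? null;
}

// Made-up answers for three renditions of the same title.
const ladder: Candidate[] = [
  {
    label: "HDR + Atmos",
    preference: 3,
    result: { supported: true, smooth: false, powerEfficient: false },
  },
  {
    label: "HDR + stereo",
    preference: 2,
    result: { supported: true, smooth: true, powerEfficient: true },
  },
  {
    label: "SDR + stereo",
    preference: 1,
    result: { supported: true, smooth: true, powerEfficient: true },
  },
];

const chosen = chooseRendition(ladder);
console.log(chosen ? chosen.label : "nothing playable"); // prints "HDR + stereo"
```

Whether smooth should outrank preference is a product decision; the sketch just makes that policy explicit instead of leaving it to each player.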
Risk: fingerprinting problem.
Proposed solution: Library (Alex Z)
- Implement on the Common Media Library
- The goal would be to have a cross-platform library that you can integrate into your apps.
- In the Apple ecosystem, “playable and smooth” is tested by being able to reproduce 2x the same stream. To have a 50%
Main Stakeholders:
- Device manufacturers & app development platform owners (Tizen, Apple, PS, Xbox, etc.). Probably CTA-WAVE is the right place.
In terms of the spec (especially in the browser), for dynamic environments it makes sense to have a listener service: capabilities change if you plug or unplug a device and gain or lose HDR or HDCP support. Similar to a key status change.
The idea behind media capabilities is that it’s invariant!
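For the display side of the listener idea, browsers already expose change events on media queries, so a capability re-check can be triggered when the screen’s dynamic range changes. A small TypeScript sketch under that assumption; MediaQueryLike and watchHdrDisplay are hypothetical names, and this covers only display dynamic range, not HDCP or audio outputs:

```typescript
// Minimal shape of the object returned by window.matchMedia(); declared
// locally so the sketch does not depend on DOM typings.
interface MediaQueryLike {
  matches: boolean;
  addEventListener(
    type: "change",
    listener: (event: { matches: boolean }) => void,
  ): void;
  removeEventListener(
    type: "change",
    listener: (event: { matches: boolean }) => void,
  ): void;
}

// Calls onChange once with the current state and again on every change.
// Returns a function that stops listening.
function watchHdrDisplay(
  query: MediaQueryLike,
  onChange: (hdr: boolean) => void,
): () => void {
  const listener = (event: { matches: boolean }) => onChange(event.matches);
  query.addEventListener("change", listener);
  onChange(query.matches);
  return () => query.removeEventListener("change", listener);
}
```

In a browser, the query object would come from window.matchMedia("(dynamic-range: high)"), and the callback would re-run whatever capability queries the player depends on.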
castLabs implementation: by now it has a lot of IFs. On Android, some odd chipsets need workarounds, for instance text matching to identify hardware vs. software decoders. There’s no competitive advantage between players in any of this.
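The kind of text matching mentioned here usually boils down to a prefix check on the decoder name. A hedged TypeScript sketch; the prefix list is an assumption based on commonly seen Android software decoder names and is not exhaustive, and classifyDecoderByName is a hypothetical helper, not taken from any player:

```typescript
// Name prefixes commonly seen on software decoders in Android's codec list.
// Assumption for illustration only; real devices need a longer, curated list.
const SOFTWARE_DECODER_PREFIXES = ["omx.google.", "c2.android."];

type DecoderKind = "software" | "hardware-or-vendor";

// Classifies a decoder purely by its name. A name that matches no known
// software prefix is only *presumed* to be a hardware or vendor decoder.
function classifyDecoderByName(name: string): DecoderKind {
  const lower = name.toLowerCase();
  const isSoftware = SOFTWARE_DECODER_PREFIXES.some((prefix) =>
    lower.startsWith(prefix),
  );
  return isSoftware ? "software" : "hardware-or-vendor";
}

console.log(classifyDecoderByName("OMX.google.h264.decoder")); // prints "software"
console.log(classifyDecoderByName("OMX.qcom.video.decoder.avc")); // prints "hardware-or-vendor"
```

On Android 10 (API 29) and later, MediaCodecInfo exposes isSoftwareOnly() and isHardwareAccelerated(), so name matching is mainly needed for older devices; a shared library is one place where such a list could be maintained once.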
Is this a W3C topic? CTA-WAVE?
Does it make sense to define something like “profile / level” from codecs for capabilities?
Is there a point where you stop using this capabilities API?
- There are very few things you can assume work. E.g. H.264
Some challenges:
- Some hardware decoders have a minimum resolution (seen at Apple): 360p video, for instance.
- Number of frames an Android decoder needs in its input buffer before it produces an output buffer.
- Can I have multiple decoders running at the same time?
- Can I switch between codecs seamlessly?
Where to have this discussion:
- Some of this could be discussed in W3C.
- SVTA & CML: an intermediary query API that integrates with Media Capabilities
- CTA-WAVE
Detection of connected peripherals:
- When the capabilities of the current window change (e.g. new display).
- Equivalent for audio: spatial support, HDMI with DD passthrough, etc.?
- Difficult to define where that API should sit.
Aggregation of information (Christian):
- In the past, two places would have been obvious for this:
  - Browser list
  - Web Platform Tests
CTA-WAVE is doing something similar for DASH.
The downside of doing it in a database is that media capabilities change, even depending on what the device is connected to (e.g. DD+ passthrough).
Thasso: As a community we should have a caniplay.tv
Next Steps / Action Items:
- Alex Z will talk to JT about the SVTA WG: getting the use cases documented, i.e. all the capabilities we care about.
- Beyond that: work with W3C on addressing any gaps.
- Then CTA-WAVE could provide guidelines on what should be supported.