[foms] WebM Manifest
philipj at opera.com
Thu Apr 7 08:34:31 PDT 2011
On Thu, 17 Mar 2011 21:10:46 +0100, Jeroen Wijering
<jeroen at longtailvideo.com> wrote:
> On Mar 17, 2011, at 1:47 PM, Philip Jägenstedt wrote:
>>> Next, the Stream API needs to be very strictly defined in terms of how
>>> provided A/V frames should be formatted, and how and when codec
>>> initialization data must be (re)sent. [...]
>>> allowing for much flexibility. At the same time, the amount of
>>> knowledge required for such an API would be so staggering (e.g. full
>>> understanding of video containers) that few people would be able to
>>> work with it.
>> I may very well be in need of education, but I don't see why that needs
>> to be the case.
>> Assume a manifest at its simplest is a list of URLs and switchover
>> times. If one has a "manifest API" that allows one to add URLs and
>> switchover times, then surely anything that can be done with a manifest
>> can be done with the API? If a manifest solution doesn't require
>> inspecting the data outside of the normal decoding, why would it be
>> necessary when one uses an API?
> When portraying a manifest as a "list of URLs", you are following an
> approach similar to Apple HLS, imposing two restrictions on your [...]
> 1. Interleaving. Only when audio and video are interleaved in fragments
> can you have a list of URLs. Your presentation is basically chopped up
> vertically (time) instead of horizontally (stream). In any case where
> you have more than one quality level of a certain track, there is data
> duplication. Sometimes (5 video qualities with the same audio, like
> Apple HLS) that might be acceptable. Sometimes (5 video qualities with 5
> audio languages) the amount of data simply explodes.
> 2. Initialization. Only when every fragment is self-initializing (every
> fragment contains all codec configuration) can you have a list of URLs.
> [...] any fragment and do random subsequent switching. Every container
> format has its peculiarities that make this amount of data non-trivial,
> e.g. Vorbis initialization requires a couple of kB.
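The data-duplication point in restriction 1 comes down to simple arithmetic. This Python sketch counts renditions and total stored bitrate in two cases: every fragment must carry both audio and video, or each track is stored once. The bitrate ladder and the 128 kbps audio figure are made-up illustrative numbers. The function names are hypothetical.

```python
def rendition_count(n_video, n_audio, interleaved):
    # Interleaved: one muxed rendition per (video, audio) pair.
    # Separate: one rendition per track.
    return n_video * n_audio if interleaved else n_video + n_audio


def stored_kbps(video_kbps, audio_kbps, interleaved):
    """Total bitrate that must be stored, summed over all renditions."""
    if interleaved:
        # Every audio track is muxed with every video quality, so both are duplicated.
        return sum(v + a for v in video_kbps for a in audio_kbps)
    # Each track is stored once and combined at playback time.
    return sum(video_kbps) + sum(audio_kbps)


VIDEO = [400, 800, 1200, 2000, 3000]  # hypothetical bitrate ladder, kbps
```

With these numbers, one audio track costs 8040 kbps interleaved versus 7528 kbps separate. Five audio languages of 128 kbps each cost 40200 versus 8040 kbps, which is exactly five times as much.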
> These restrictions can probably be worked around (A+V buffer,
> initialization segments), but they do complicate things, both for the
> [...] full video container knowledge is probably not required by
> [...] flexible approach than Apple HLS.
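The initialization-segment workaround changes what a plain URL list must express. Whenever playback switches to another track, that track's codec configuration has to be (re)sent before its next fragment. This minimal Python sketch shows the resulting request order. It assumes hypothetical `init_url` and `media_url` builders supplied by the site.

```python
def fetch_plan(tracks, init_url, media_url):
    """Order requests for playback of fragments that are NOT self-initializing.

    tracks[i] is the track chosen for fragment i; init_url(track) and
    media_url(track, i) are hypothetical site-specific URL builders.
    """
    plan = []
    current = None
    for i, track in enumerate(tracks):
        if track != current:
            # The fragments carry no codec configuration, so the new track's
            # initialization segment must be (re)sent before its first fragment.
            plan.append(init_url(track))
            current = track
        plan.append(media_url(track, i))
    return plan
```

In this sketch, switching back to a previously used track re-requests its initialization segment. A real player could cache those bytes, but they would typically still need to be fed to the decoder again.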
OK, so perhaps an API with the functionality we need would be a better
approach. This is similar to how <track> (captions) is handled: while
there exists a simple baseline format (WebVTT), scripts can do fancy stuff
using the API. For adaptive streaming, the specific advantages of bringing
scripts into the mix are that it allows experimentation with rate-switching
algorithms, and that it allows site-specific URL-pattern schemes for live
streaming, which do away with the need to ever re-fetch a manifest or to
have a complex manifest that gives the URL pattern declaratively.
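This Python sketch illustrates the two script-side advantages. It contains a throughput-based rate-switching rule and a site-specific URL pattern for live streaming. `URL_TEMPLATE`, the 0.8 safety factor and the fixed fragment duration are assumptions for illustration only. The point is that a script can compute the next live fragment URL from the clock instead of re-fetching a manifest.

```python
import math

# Hypothetical site-specific live URL scheme; not part of any spec.
URL_TEMPLATE = "https://example.com/live/{kbps}k/{index}.webm"


def pick_bitrate(available_kbps, measured_kbps, safety=0.8):
    # Choose the highest rendition that fits within a safety margin of the
    # measured throughput; fall back to the lowest rendition otherwise.
    budget = measured_kbps * safety
    candidates = [b for b in sorted(available_kbps) if b <= budget]
    return candidates[-1] if candidates else min(available_kbps)


def fragment_url(kbps, now_s, stream_start_s, fragment_s):
    # The live fragment index is derived from wall-clock time, so the
    # client never needs to re-fetch a manifest to learn new URLs.
    index = math.floor((now_s - stream_start_s) / fragment_s)
    return URL_TEMPLATE.format(kbps=kbps, index=index)
```

Because both functions live in script, a site can swap in a different switching heuristic or URL scheme without any change to a manifest format.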