Stream Recomposition Example

The Problem

An Annodex (or for that matter also an Ogg) stream is composed of several temporally interleaved bitstreams. These bitstreams produce data at their own rate - some very densly (such as vorbis) and some very sparsely (such as CMML). A stream recomposition is required when e.g. a temporal URI like example.anx?t=2000 is requested. Such a recomposition has to follow several requirements:

  • no decoding of the logical bitstreams is performed to avoid delays.
  • no changes to the pages, in particular to the granule positions are made also to avoid delays.
  • data/field changes occur only to the control section.
  • the data section is cut as little as necessary (possibly only at one location) to allow simple recombination of separate sections from caches.

You've got to be careful here to not destroy the caching capability of the stream as HTTP/1.1 caching is based on byte-ranges. So, when you cut a section out of the physical bitstream, you must retain a direct mapping between the query time and the byte range in the original file on the server. However, there is no problem with referencing multiple byte ranges in the original, as in any case would need to be done to serve a query containing multiple time ranges.

Issues

Essentially what you want is that: you first seek to the right point in the stream from which on to stream out, change the control section to adapt presentation time and granulebase for each bitstream, and stream out this new control section plus the data from the seek point onwards. There are some problems about this though:

  1. where do you seek to? In a correctly interleaved file, a sparse stream such as CMML may have a still active Ogg page quite some time ago (see the example below) and there are plenty of data pages of other bitstreams in between which are not relevant. Do you cut at that point in the "past" and take all pages from there or do you ignore this sparse stream for identifying the cut position?
  2. if you don't ignore sparse streams on seeking, what do you do with the irrelevant data pages of the other logical bitstreams in between that early point and the requested position? Do you include them in the stream or do you not?
  3. if you do not include in between data pages, how do you still keep the recomposed stream cachable?

Example

An example of a stream (not to scale ;)

                                       
              5           1200  1500   1600  1800  2000(remux point)
 CMML     ... E                                    :   ...
 Theora                    ... E|------------------:-E ...
 Vorbis              ... E|-----------E|----E|-----:-----E ...

 : = remux point
 | = start of packet
 E = end of packet (which will contain the granulepos for that packet)

We have this situation:

  • Display everything starting with a presentation time of t=2000
  • The last CMML packet that occurs before the presentation time is at t=5
  • The last Theora packet that occurs before the presentation time is at t=1500
  • The last Vorbis packet that occurs before the presentation time is at t=1800. The page previous to that is t=1600, and the page previous to that is at t=1200.

Proposal A

  • For each logical bitstream (in the fisbone), have a boolean flag that specifies "this bitstream is relevant when seeking"
  • For the entire physical stream (in the fishead), have a boolean flag that specifies "include intermittent data"

Proposal B

(note from S: this is the same as just doing the first point of proposal A - intermittent data is then always included)(yes, except that intervening data isn't included -- conrad)

  • An extra boolean flag per logical bitstream (in the fisbone). If the flag is set, this indicates that when remuxing you must include the _previous_ packet from this track (and from there, its preroll/keyframe), regardless of whether or not any packets post-presentation-time have a backwards dependency on i

This satisfies the Issues as follows:

  1. you don't cut at the point in the past. If a previous packet from one logical bitstream is required (eg. a cmml clip), you include just it -- there's no reason to include packets from other bitstreams, as they do not contribute to correct presentation at the presentation time.
  2. you don't include intervening data from other streams
  3. you don't include intervening data from other streams

Proposal C

  • Replace the "preroll" flag with a "remux behaviour" field, which can have the following values:
  • -1: include _all_ packets in this stream before the presentation time
  • 0: don't include anything in this stream before the presentation time
  • > 0: include this many packets before the presentation time. Note that if the old preroll value was to be used, this is equivalent to the old preroll value + 1. e.g. for Vorbis packets, which require a preroll of 2, this field should be set to 3, which indicates that the last 3 packets before t=2000 should be included.

Proposal D

(outcome of discussion between K, Oz & S)

The CMML stream is the real problem, as it is mapped as an instantaneous bitstream and therefore there is no such thing as "still active" in the binary bitstream. So, if ogg seeking has found the accurate page for the given seek time, this will not take into account the cmml track. In order to take that into account, the previous CMML packet has to be located. To achieve this, we could use the keyframe shift hack that was introduced to handle theora and twist it to also be used for CMML bitstreams by making it point back to the previous CMML packet's granule pos.

Here's the idea:

  • the granulepos is split into two at the middle, the first half references the granulepos of the previous CMML packet ('s first page), the second half provides the difference to the actual granulepos the current packet is supposed to be at, e.g.
    keyframe    keyframeshift    actual granulepos
       |-------|-------|
       |     0 |  1000 |        1000
       |-------|-------|
       |  1000 |   400 |        1400
       |-------|-------|
       |  1400 |   200 |        1600
       |-------|-------|
       |  1600 |   400 |        2000
       |-------|-------|
    
  • upon hitting one of the CMML pages, the granulepos is interpreted such that it's necessary to backtrack to the given "keyframe" (previous CMML packet) on seeking, in accordance with the theora granulepos spec
  • the only difference is that if you actually hit a CMML page occuring at the correct seek time, you do not need to seek back
  • and you also don't do this backtracking recursively, but only as a one-shot operation
  • the positive side effect is that we now don't need another flag because if we don't want to look back onto previous CMML packets, we just introduce a keyframe of 0
  • on remuxing none of the intermittent non-useful pages of other streams are included, i.e. the new stream ends up being a recomposition from the old stream from several bitranges

A problem that this specification creates is that it does not work with CMML that has several overlapping track because backtracking will only happen to the previous CMML packet, not the previous CMML packet of the same track. The source of the problem is that CMML has tracks that may overlap in time and are encoded within one logical bitstream. One possible, but ugly and overly complex solution is to point back to the earliest previous CMML packet on all tracks as keyframe. A different solution is to encode tracks as different logical bitstreams, which in turn means we can discard the track attribute from the clip tag. For the moment and CMML 2.0 we'll ignore this problem.

Caching Web Proxy

A bit more information about the Web caching implementation and why the last proposal does not produce any more problems (this was worked out between Rob Collins and Conrad Parker some time ago). The caching works on the idea that the Web client asks the Web server to return fro a temporal URI query for a resource the byte ranges this URI refers to. Then, the Web client can re-send his URI to the server, but instead of including a temporal query, it just asks for the base resource and the byte ranges. This enables a Web proxy to build up a cache for a resource that consists of several byte ranges that can be filled up over time through further URI requests.

For proposal D, this means having several non-consecutive byte ranges be the answer to a URI request does not create any new problems.