4 Network Working Group                                        S. Pfeiffer
5 Internet-Draft                                                 C. Parker
6 Intended status: Informational                                   Annodex
7 Expires: May 4, 2008                                       November 2007
8
9
10              The "skeleton" meta information track for Ogg
11                      draft-pfeiffer-oggskeleton-00
12
13 Status of this Memo
14
15    This document is an Internet-Draft and is subject to all provisions
16    of Section 3 of RFC 3667.  By submitting this Internet-Draft, each
17    author represents that any applicable patent or other IPR claims of
18    which he or she is aware have been or will be disclosed, and any of
19    which he or she becomes aware will be disclosed, in accordance with
20    RFC 3668.
21
22    Internet-Drafts are working documents of the Internet Engineering
23    Task Force (IETF), its areas, and its working groups.  Note that
24    other groups may also distribute working documents as Internet-
25    Drafts.
26
27    Internet-Drafts are draft documents valid for a maximum of six months
28    and may be updated, replaced, or obsoleted by other documents at any
29    time.  It is inappropriate to use Internet-Drafts as reference
30    material or to cite them other than as "work in progress."
31
32    The list of current Internet-Drafts can be accessed at
33    http://www.ietf.org/ietf/1id-abstracts.txt.
34
35    The list of Internet-Draft Shadow Directories can be accessed at
36    http://www.ietf.org/shadow.html.
37
38    This Internet-Draft will expire on May 4, 2008.
39
55 Pfeiffer & Parker          Expires May 4, 2008                  [Page 1]
56
57 Internet-Draft                  SKELETON                   November 2007
58
59
60 Abstract
61
62    This specification defines "Skeleton", a logical bitstream for the
63    Ogg encapsulation format version 0 [Ogg].  Skeleton is a header-style
64    bitstream that describes the content of the other logical bitstreams
65    encapsulated inside an Ogg container.  Its purpose is to remove
66    codec-specific information requirements from the multiplexing/
67    demultiplexing process.  It provides default structure and semantic
68    information to describe multitrack physical Ogg bitstreams.  There is
69    also a mechanism through which more information than the default can
70    be provided.
71
72    Please note that this document assumes that the reader understands
73    the Ogg encapsulation format version 0 [Ogg].  The specification of
74    Skeleton is not encumbered by patents.
75
76    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
77    "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
78    document are to be interpreted as described in RFC 2119 [rfc2119].
79
116 Table of Contents
117
118    1.  Features of Ogg and Skeleton
119    2.  The Ogg skeleton logical bitstream
120      2.1.  The format of the skeleton ident header
121      2.2.  The format of the skeleton secondary headers
122      2.3.  Media mapping of skeleton into Ogg
123    3.  Handling time in an Ogg format bitstream
124      3.1.  Conceptual overview
125      3.2.  Mapping a granule position to a time position
126      3.3.  Seeking into the bitstream
127      3.4.  Remultiplexing an Ogg bitstream using Skeleton
128    4.  Security considerations
129    5.  References
130    Appendix A.  Acknowledgments
131    Authors' Addresses
132    Intellectual Property and Copyright Statements
133
172 1.  Features of Ogg and Skeleton
173
174    Ogg is a container format for encapsulation of several tracks of
175    temporally interleaved bitstreams of time-continuous data.  It
176    enables encapsulation of any type of time-continuous data stream as
177    long as it is streamable.  Each track represents codec data for only
178    one type of time-continuous data stream.  Ogg is designed to be used
179    both as a persistent file format and as a streaming format to
180    exchange temporally addressable bitstreams.
181
182    Skeleton adds to Ogg a means to describe the codec tracks contained
183    inside Ogg.  It makes the reasonable assumption that each logical
184    bitstream has a regular data sampling rate (called the granulerate).
185    For variable sampling rate bitstreams, it assumes that a common
186    multiple of the sampling rates in use serves as the granulerate.
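
   As an illustration of the common-multiple assumption, the following
   Python sketch derives a granulerate for a bitstream that switches
   between two sampling rates.  The function name common_granulerate
   and the chosen rates are hypothetical; Skeleton does not prescribe
   how an encoder picks the value.

```python
from math import lcm  # available from Python 3.9


def common_granulerate(sampling_rates):
    """Smallest integer rate in Hz that is a multiple of every given rate."""
    return lcm(*sampling_rates)


rate = common_granulerate([44100, 48000])
print(rate)           # 7056000
print(rate // 44100)  # 160
print(rate // 48000)  # 147
```

   With this granulerate, one sample at 44100 Hz spans 160 granules and
   one sample at 48000 Hz spans 147 granules.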
187
188    Codec tracks generally contain the following information:
189
190    o  setup information for a codec
191
192    o  content data
193
194    The setup information is inserted at the start of a data bitstream
195    before any content data.  Skeleton pulls out the key information
196    about the codecs from their headers and puts it into a defined
197    location in a defined manner, such that no decoding of logical
198    bitstreams is required to find out about the tracks of content
199    encapsulated inside Ogg.
200
201    An Ogg physical bitstream with a Skeleton track has the following
202    mandatory order of Ogg pages:
203
204    1.  skeleton bos page.
205
206    2.  bos pages of the other logical bitstreams.
207
208    3.  secondary header pages of all logical bitstreams, including
209        fisbone.
210
211    4.  skeleton eos page.
212
213    5.  data and eos pages of logical bitstreams, excluding skeleton,
214        multiplexed in a time-synchronous fashion.
215
228 2.  The Ogg skeleton logical bitstream
229
230    The purpose of Ogg skeleton is to provide codec-specific knowledge
231    that allows parsing, demultiplexing and remultiplexing of Ogg
232    bitstreams without having to decode.
233
234    While the Ogg encapsulation format by itself is capable of
235    interleaving an unlimited number of time-continuous bitstreams, it is
236    not possible to identify the type of bitstreams (e.g. audio or video)
237    and their encoding format (e.g.  Vorbis or Speex or Theora) without
238    decoding at least the bos page of the logical bitstreams.  Also,
239    further general media type information such as the image dimensions
240    of a frame in a video bitstream or the language of a speech bitstream
241    may be provided in skeleton.  Another limitation of Ogg is that each
242    logical bitstream defines its own mapping of granule_position to
243    time, which is therefore also given in the skeleton.
244
245    This section specifies the content of the "skeleton" logical
246    bitstream and how it is mapped into Ogg. Knowledge of the Ogg
247    bitstream format as specified in the Ogg RFC [Ogg] is presumed.
248    Please also refer to that document for descriptions of the terms used
249    in this document.
250
251    The skeleton bitstream has the ability to generically describe Ogg
252    bitstreams that consist of one or more time-continuous data bitstreams
253    and one or more time-instantaneous data bitstreams concurrently
254    interleaved (in Ogg terms: multiplexed).  It does not describe
255    sequentially multiplexed Ogg bitstreams, but rather expects that a
256    sequentially multiplexed bitstream has its own skeleton logical
257    bitstream.
258
259    The skeleton logical bitstream provides the following functionality
260    on top of Ogg:
261
262    o  allows for the identification of the codec format and the content
263       type of encapsulated logical bitstreams without the need to decode
264       that bitstream's headers or data.
265
266    o  allows for extraction of a temporal interval of the Ogg physical
267       bitstream while retaining the original start time offset of that
268       interval.
269
270    o  allows for attachment of a real-world wall-clock time and a date
271       to the Ogg physical bitstream, thus e.g. retaining creation date/
272       time or first broadcast date/time.
273
274    o  allows for temporal offset operations into an Ogg physical
275       bitstream without a need to decode any data.
276
284    o  allows generally for handling of content without a need to decode
285       it, such as is necessary in a caching Web proxy.
286
287    o  allows for attachment of message header fields given as name-value
288       pairs that contain some sort of protocol messages about the
289       logical bitstream, e.g. the screen size for a video bitstream or
290       the number of channels for an audio bitstream.
291
292 2.1.  The format of the skeleton ident header
293
294    The skeleton logical bitstream starts with an ident header containing
295    information for the complete Ogg physical bitstream.  The ident
296    header has the following format:
297
298     0                   1                   2                   3
299     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
300    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
301    | Identifier 'fishead\0'                                        | 0-3
302    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
303    |                                                               | 4-7
304    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
305    | Version major                 | Version minor                 | 8-11
306    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
307    | Presentationtime numerator                                    | 12-15
308    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
309    |                                                               | 16-19
310    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
311    | Presentationtime denominator                                  | 20-23
312    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
313    |                                                               | 24-27
314    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
315    | Basetime numerator                                            | 28-31
316    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
317    |                                                               | 32-35
318    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
319    | Basetime denominator                                          | 36-39
320    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
321    |                                                               | 40-43
322    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
323    | UTC                                                           | 44-47
324    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
325    |                                                               | 48-51
326    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
327    |                                                               | 52-55
328    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
329    |                                                               | 56-59
330    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
331    |                                                               | 60-63
340    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
341
342    Fields longer than one Byte are encoded LSB (least significant
343    Byte) first, i.e. in little-endian order.
344
345    The fields in the skeleton ident header have the following meaning:
346
347    1.  Identifier: an 8 Byte field that identifies this bitstream as a
348        skeleton.  It contains the magic numbers:
349
350           0x66 'f'
351
352           0x69 'i'
353
354           0x73 's'
355
356           0x68 'h'
357
358           0x65 'e'
359
360           0x61 'a'
361
362           0x64 'd'
363
364           0x00 '\0'
365
366    2.  Version major: 2 Byte unsigned integer signifying the major
367        version number of the skeleton bitstream.  This document
368        specifies the major version 3.
369
370    3.  Version minor: 2 Byte unsigned integer signifying the minor
371        version number of the skeleton bitstream.  This document
372        specifies the minor version 0.
373
374    4.  Presentationtime numerator & denominator: 8 Byte signed integer
375        each.  Together they give the time at which to start presenting
376        the Ogg physical bitstream as a rational number.  The denominator
377        represents the temporal resolution at which the presentationtime
378        is given.  E.g. a numerator of 5 and a denominator of 1000 yield a
379        presentationtime of 0.005 sec.  This enables a very high temporal
380        resolution without having to store floating point numbers.  In a
381        newly created physical bitstream presentationtime and basetime
382        are the same.  When remultiplexing a subpart of the stream, this
383        number MUST be adapted to the requested start time offset of the
384        newly created stream.  Presentationtime must always be greater
385        than or equal to zero.
386
396    5.  Basetime numerator & denominator: 8 Byte signed integer each.
397        They represent together the basetime of the Ogg physical
398        bitstream given as a rational number like the presentationtime.
399        This number is fixed once the physical bitstream is created and
400        provides a mapping to time for the beginning of the physical
401        bitstream when it starts with a granule position of 0.
402
403    6.  UTC [ISO8601]: a 20 Byte string containing a UTC time in the form
404        of YYYYMMDDTHHMMSS.sssZ.  It associates a calendar date and a
405        wall-clock time with the basetime.  It is a sequence of 20 NUL
406        Bytes if not in use, making this ident packet and thus the bos
407        page of the skeleton bitstream constant length.
408
409    Please note: The possible temporal resolution of the presentation-
410    and basetime is on the order of 2^-63, as each denominator is a
411    signed 64 bit integer.  By comparison, the SMPTE time formats
412    [SMPTE] referred to in this document have resolutions between 1/24
413    and 1/60 sec, so the resolution is sufficient for all of them.  It
414    is also expected to accommodate any future needs of time resolution
415    for any other time format and time-continuously sampled data.
416
417    Please note further: A denominator of 0 in either presentationtime or
418    basetime is regarded as a special value and sets the respective time
419    to 0, no matter what the value of the numerator.
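
   The ident header layout above can be read with a fixed-size
   little-endian structure.  The following Python sketch is one
   possible reading, not a reference implementation: the names FISHEAD,
   rational and parse_fishead, the dictionary keys and the sample UTC
   value are all hypothetical.  It applies the two rules stated above,
   namely that a denominator of 0 sets the time to 0 and that the
   presentationtime is not negative.

```python
import struct
from fractions import Fraction

# Little-endian ("<") layout of the 64 Byte ident header:
# 8 Byte identifier, two 2 Byte version numbers, four 8 Byte signed
# integers and the 20 Byte UTC string.
FISHEAD = struct.Struct("<8sHHqqqq20s")


def rational(numerator, denominator):
    """A denominator of 0 sets the time to 0, whatever the numerator."""
    if denominator == 0:
        return Fraction(0)
    return Fraction(numerator, denominator)


def parse_fishead(packet):
    if len(packet) < FISHEAD.size:
        raise ValueError("ident header shorter than 64 Bytes")
    (magic, major, minor,
     p_num, p_den, b_num, b_den, utc) = FISHEAD.unpack_from(packet)
    if magic != b"fishead\0":
        raise ValueError("not a skeleton ident header")
    presentationtime = rational(p_num, p_den)
    if presentationtime < 0:
        raise ValueError("presentationtime must not be negative")
    return {
        "version": (major, minor),
        "presentationtime": presentationtime,
        "basetime": rational(b_num, b_den),
        # 20 NUL Bytes mean that the UTC field is not in use.
        "utc": None if utc == b"\0" * 20 else utc.decode("ascii"),
    }


# Hypothetical sample packet: version 3.0, presentationtime 5/1000,
# basetime with a denominator of 0.
packet = FISHEAD.pack(b"fishead\0", 3, 0, 5, 1000, 7, 0,
                      b"20071104T120000.000Z")
header = parse_fishead(packet)
print(header["presentationtime"])  # 1/200
print(header["basetime"])          # 0
```

   Keeping the times as exact fractions avoids the rounding that a
   conversion to floating point would introduce.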
420
421 2.2.  The format of the skeleton secondary headers
422
423    The skeleton secondary headers are a sequence of packets that each
424    contain information about one of the other logical bitstreams,
425    time-continuous or time-instantaneous, contained within the Ogg
426    physical bitstream.  A skeleton secondary header packet has the
427    following format:
428
452     0                   1                   2                   3
453     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1| Byte
454    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
455    | Identifier 'fisbone\0'                                        | 0-3
456    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
457    |                                                               | 4-7
458    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
459    | Offset to message header fields                               | 8-11
460    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
461    | Serial number                                                 | 12-15
462    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
463    | Number of header packets                                      | 16-19
464    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
465    | Granulerate numerator                                         | 20-23
466    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
467    |                                                               | 24-27
468    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
469    | Granulerate denominator                                       | 28-31
470    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
471    |                                                               | 32-35
472    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
473    | Startgranule                                                  | 36-39
474    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
475    |                                                               | 40-43
476    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
477    | Preroll                                                       | 44-47
478    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
479    | Granuleshift  | Padding/future use                            | 48-51
480    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
481    | Message header fields ...                                     | 52-
482    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
483
484
485    Fields longer than one Byte are encoded LSB (least significant
486    Byte) first, i.e. in little-endian order.
487
488    The fields in a skeleton secondary header packet have the following
489    meaning:
490
491    1.   Identifier: an 8 Byte field that identifies this packet as a
492         skeleton secondary header for identifying other logical
493         bitstreams.  It contains the magic numbers:
494
495            0x66 'f'
496
508            0x69 'i'
509
510            0x73 's'
511
512            0x62 'b'
513
514            0x6f 'o'
515
516            0x6e 'n'
517
518            0x65 'e'
519
520            0x00 '\0'
521
522    2.   Offset to message header fields: 4 Byte unsigned integer that
523         contains the number of Bytes used in this packet before the
524         message header fields.  For the version of the skeleton
525         bitstream described in this document this number is fixed to 44.
526         This field accommodates future changes to the skeleton bitstream:
527         a parser can still locate the message header fields even if
528         more fields get inserted before them.
529
530    3.   Serial number: 4 Byte unsigned integer containing the
531         bitstream_serial_number of the Ogg logical bitstream described
532         by this skeleton secondary header packet and thus connecting it
533         to the logical bitstream.
534
535    4.   Number of header packets: a 4 Byte unsigned integer that
536         contains the number of header packets of that particular logical
537         bitstream, counting the header packets on the bos page and on
538         the secondary header pages.
539
540    5.   Granulerate numerator & denominator: 8 Byte signed integer each.
541         They represent the temporal resolution of the logical bitstream
542         in Hz given as a rational number in the same way as the basetime
543         attribute above.
544
545    6.   Startgranule: 8 Byte signed integer that represents the granule
546         number with which this logical bitstream starts, which is
547         originally 0, but will be a positive offset when only a subpart
548         of the stream is requested.
549
550    7.   Preroll: 4 Byte unsigned integer that contains the number of
551         packets to pre-roll in order to decode a current packet
552         correctly.  This is for example the case with Ogg Vorbis, which
553         requires a pre-roll of 2 packets.
554
564    8.   Granuleshift: a 1 Byte unsigned integer describing whether to
565         partition the granule_position into two for that logical
566         bitstream, and how many of the lower bits to use for the
567         partitioning.  The upper bits signify a time-continuous granule
568         position for an independently decodable and presentable data
569         granule.  The lower bits are generally used to specify the
570         relative offset of dependent packets, such as predicted frames
571         of a video.  Hence these can be addressed, though not decoded
572         without tracing back to the last fully decodable data granule.
573         This is the case with Ogg Theora; the general procedure is given
574         in section 3.2.
575
576    9.   Padding/future use: 3 Bytes padding data that may be used for
577         future requirements and are mandated to zero in this revision.
578
579    10.  Message header fields: header fields, following the generic
580         Internet Message Format defined in RFC 2822 [Headers].  Each
581         header field consists of a name followed by a colon (":") and
582         the field value.  Field names are case-insensitive.  The field
583         value MAY be preceded by any amount of LWS, though a single SP
584         is preferred.  Header fields can be extended over multiple lines
585         by preceding each extra line with at least one SP or HT.
586
587    There is one mandatory message header field for all of the logical
588    bitstreams: the "Content-type" header field.  For an application that
589    is parsing the Ogg bitstream, this field contains the MIME type and
590    the character encoding of the data in the logical bitstream.  E.g.
591    for a bitstream containing Ogg Vorbis data the header field is
592    "Content-type: audio/x-vorbis".  The Content-type field MUST be the
593    first of the message header fields such that it can be found at a
594    fixed location in the skeleton fisbone packet.
595
596    As per RFC 2277 [I18N], message header fields are considered protocol
597    data, i.e. they are not expected to contain human-readable text, and
598    they MUST be entirely encoded in UTF-8.  In addition, the mandatory
599    header fields MUST be encoded in US-ASCII, and it is recommended to
600    also use US-ASCII code points as much as possible for the optional
601    header fields.
602
603    User defined optional message header fields MUST follow the naming
604    standard given in RFC2822.
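
   A secondary header packet can be read in the same manner as the
   ident header.  The Python sketch below is illustrative only; the
   names FISBONE, parse_message_headers and parse_fisbone and the
   sample values are hypothetical.  Two assumptions are made that the
   text above does not state.  First, the diagram places the message
   header fields at Byte 52 while the offset field is fixed to 44, so
   the sketch assumes that the offset is counted from the start of the
   offset field itself (8 + 44 = 52).  Second, header lines are
   assumed to end in CRLF as in RFC 2822, and folded lines are joined
   with a single space, which is a simplification of RFC 2822
   unfolding.

```python
import struct

# Fixed part of a fisbone packet: 52 Bytes, little-endian ("<"); the
# three trailing padding Bytes are skipped by "3x".
FISBONE = struct.Struct("<8sIIIqqqIB3x")


def parse_message_headers(raw):
    """Return header fields as a list of (lower-cased name, value) pairs."""
    fields = []
    for line in raw.decode("utf-8").split("\r\n"):
        if not line:
            continue
        if line[0] in " \t" and fields:
            # A line starting with SP or HT continues the previous field.
            name, value = fields[-1]
            fields[-1] = (name, value + " " + line.strip())
        else:
            name, _, value = line.partition(":")
            fields.append((name.strip().lower(), value.strip()))
    return fields


def parse_fisbone(packet):
    (magic, offset, serial, header_packets, rate_num, rate_den,
     startgranule, preroll, granuleshift) = FISBONE.unpack_from(packet)
    if magic != b"fisbone\0":
        raise ValueError("not a skeleton secondary header packet")
    # ASSUMPTION: the offset is counted from the start of the offset
    # field (Byte 8), so that the fixed value 44 points at Byte 52.
    fields = parse_message_headers(packet[8 + offset:])
    if not fields or fields[0][0] != "content-type":
        raise ValueError("Content-type must be the first header field")
    return {
        "serial": serial,
        "header_packets": header_packets,
        "granulerate": (rate_num, rate_den),
        "startgranule": startgranule,
        "preroll": preroll,
        "granuleshift": granuleshift,
        "fields": fields,
    }


# Hypothetical sample: a Vorbis track with serial number 1234.
packet = FISBONE.pack(b"fisbone\0", 44, 1234, 3, 44100, 1, 0, 2, 0)
packet += b"Content-type: audio/x-vorbis\r\n"
bone = parse_fisbone(packet)
print(bone["granulerate"])  # (44100, 1)
print(bone["fields"])       # [('content-type', 'audio/x-vorbis')]
```

   Field names are lower-cased on parsing because they are
   case-insensitive; the order of the fields is kept so that the
   Content-type rule can be checked.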
605
606 2.3.  Media mapping of skeleton into Ogg
607
608    The media mapping for skeleton into Ogg is as follows:
609
610    o  The skeleton ident (fishead) header is mapped into the skeleton
611       bos page.
612
620    o  The secondary header pages of a skeleton logical bitstream consist
621       of the fisbone header packets that each describe one particular
622       logical data bitstream within the Ogg physical bitstream.
623
624    o  There are no content pages or data packets.  As the skeleton eos
625       page is included before the first data page of any logical
626       bitstream, there actually cannot be any content data packets.
627
628    o  The skeleton eos page MUST contain one packet of length zero.
629
630    When using a skeleton logical bitstream in Ogg, a further restriction
631    on the order in which Ogg pages appear is introduced to allow for
632    easier identification:
633
634    1.  The skeleton bos page is the very first bos page.  This allows
635        its differentiation from other Ogg bitstreams that don't contain
636        a skeleton logical bitstream.
637
638    2.  The bos pages of the other logical bitstreams come next as is a
639        requirement of the Ogg bitstream format.
640
641    3.  The secondary header pages of all the logical bitstreams in the
642        Ogg physical bitstream come next, as is also a requirement of
643        Ogg. The skeleton secondary header pages are also included here.
644
645    4.  Before any data pages of any of the logical bitstreams appear in
646        the Ogg physical bitstream, the skeleton eos page MUST end the
647        skeleton logical bitstream.  This is necessary to end the control
648        section of the bitstream.  If an Ogg stream parser reaches the
649        skeleton eos page, it knows that it has received all the bos and
650        secondary header pages and can start setting up its decoding or
651        parsing environment.
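
   The ordering rules can be checked while reading pages.  The Python
   sketch below works on a hypothetical, simplified Page record rather
   than on real Ogg page headers; the record fields and the function
   name check_skeleton_order are assumptions made for illustration.
   It cannot tell header pages from data pages of the other
   bitstreams, since that requires the header packet counts from the
   fisbone packets, so it only verifies the rules that are visible
   from the bos and eos flags.

```python
from collections import namedtuple

# Hypothetical page record; a real demultiplexer would fill it from the
# Ogg page header flags and the packet data carried on the page.
Page = namedtuple("Page", "serial bos eos payload")


def check_skeleton_order(pages):
    """Return the skeleton serial number if the visible rules hold."""
    pages = list(pages)
    if (not pages or not pages[0].bos
            or not pages[0].payload.startswith(b"fishead\0")):
        raise ValueError("first page is not a skeleton bos page")
    skeleton = pages[0].serial
    seen_non_bos = False
    skeleton_ended = False
    for page in pages[1:]:
        if page.bos and seen_non_bos:
            raise ValueError("bos page after a non-bos page")
        if not page.bos:
            seen_non_bos = True
        if page.serial == skeleton:
            if skeleton_ended:
                raise ValueError("skeleton page after the skeleton eos page")
            skeleton_ended = page.eos
        elif page.eos and not skeleton_ended:
            raise ValueError("bitstream ended before the skeleton eos page")
    if not skeleton_ended:
        raise ValueError("skeleton eos page missing")
    return skeleton


# Hypothetical page sequence: skeleton (serial 1) and one other track.
pages = [
    Page(1, True, False, b"fishead\0" + bytes(56)),  # skeleton bos
    Page(2, True, False, b"track ident header"),     # other bos
    Page(1, False, False, b"fisbone\0"),             # secondary headers
    Page(2, False, False, b"track setup header"),
    Page(1, False, True, b""),                       # skeleton eos
    Page(2, False, False, b"data"),
    Page(2, False, True, b"data"),
]
print(check_skeleton_order(pages))  # 1
```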
652
676 3.  Handling time in an Ogg format bitstream
677
678    With time-continuous data inside Ogg, one needs to handle data at
679    four different levels:
680
681    o  at the Byte level, upon seeking.
682
683    o  at the packet level, upon encapsulating.
684
685    o  at the granule level, upon recomposing.
686
687    o  at the time level, upon displaying and addressing.
688
689    This section explains how they all fit together.
690
691 3.1.  Conceptual overview
692
693    Ogg bitstreams inherently represent one timeline only, where the
694    different logical bitstreams can be thought of as content tracks on
695    that timeline.  All of these tracks relate to the same timeline which
696    starts at a certain time point and ends when the last bitstream ends.
697
698    An example bitstream can be seen in the following figure.  It
699    consists of an Ogg bitstream that contains 4 media bitstreams.  The
700    picture is a conceptual representation of the time intervals covered
701    by the different logical bitstreams and the Ogg pages used to
702    encapsulate the data.  In the flat representation these are
703    multiplexed such that the data packets of each of these bitstreams
704    occur at the correct time.
705
732                              t_url
733                                |
734 t_0                            v                                      t_n
735 |------------------------------------------------------------------->|
736 ----------------------------------------------
737 |  |  |  |  |  |  |  |  |  |  |//|  |  |  |  |
738 ----------------------------------------------
739 audio bitstream 1
740         -------------------------------------------------------------
741         |     |     |     |/////|     |     |     |     |     |     |
742         -------------------------------------------------------------
743         video bitstream 1
744                  ----------------------------------------------------
745                  |  |  |  |  |//|  |  |  |  |  |  |  |  |  |  |  |  |
746                  ----------------------------------------------------
747                  audio bitstream 2
748                         -------------------------------
749                         |     |/////|     |     |     |
750                         -------------------------------
751                         video bitstream 2
752
753    The time point at which an Ogg bitstream starts (t_0 in the above
754    diagram) is called the "basetime" and represents the time in seconds
755    associated with the granule position of 0 on all logical bitstreams.
756    Typically, a newly created Ogg file starts all its logical bitstreams
757    at granule position 0, and a typical extract of an Ogg bitstream,
758    such as the one starting at t_url in the image above, starts each of
759    its logical bitstreams at a different granule position.  These
760    granule positions are stored in the "startgranule" field of the
761    skeleton secondary header packets.
762
763    The "basetime" of an Ogg bitstream may be 0, but it can also be any
764    positive time.  For example, in professional video production, the
765    first frame of video of a program normally refers to a SMPTE basetime
766    [SMPTE] of 01:00:00:00, not 00:00:00:00 (see also the temporal URI
767    addressing [timedURI] specification).  Applying such a practice to a
768    digital video resource requires a way to store that basetime with
769    the resource and to interpret it correctly when addressing offsets
770    such as t_url.  Skeleton provides such a mapping through the basetime
771    field in the skeleton ident header.
771    field in the skeleton ident header.
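
   As a worked example, the SMPTE basetime mentioned above can be
   turned into the numerator and denominator stored in the ident
   header.  The Python sketch below assumes a non-drop-frame timecode
   of the form HH:MM:SS:FF and an integer frame rate; the function
   name smpte_to_basetime is hypothetical.

```python
from fractions import Fraction


def smpte_to_basetime(timecode, fps):
    """Convert a non-drop-frame HH:MM:SS:FF timecode to (num, den)."""
    hours, minutes, seconds, frames = (int(p) for p in timecode.split(":"))
    if not 0 <= frames < fps:
        raise ValueError("frame number out of range for the frame rate")
    whole_seconds = (hours * 60 + minutes) * 60 + seconds
    time = Fraction(whole_seconds) + Fraction(frames, fps)
    return time.numerator, time.denominator


print(smpte_to_basetime("01:00:00:00", 25))  # (3600, 1)
print(smpte_to_basetime("01:00:00:12", 25))  # (90012, 25)
```

   A basetime of 01:00:00:00 is thus stored as 3600/1, i.e. 3600
   seconds.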
772
773    Also associated with the basetime is a calendar date [ISO8601] and
774    wall-clock time (a "UTC base") which represent a real-world time
775    giving some meaningful calendar date association to the content such
776    as the creation time or the first presentation time.  The UTC base is
777    specified in the UTC field of the skeleton ident header.
778
788 3.2.  Mapping a granule position to a time position
789
790    Each of the encapsulated data bitstreams has its own temporal
791    resolution at which it provides data to cover the given timeline.
792    This temporal resolution is usually given through the sampling rate
793    of the particular bitstream.  For example, a raw audio bitstream at
794    CD quality is sampled with a sampling rate of 44100 Hz.  A video
795    bitstream may be sampled with a frame rate of 25 frames per second.
796
797    This temporal resolution is called the "granulerate".  A granule is a
798    data element that is based on a regular data rate specific to the
799    content type, such as the frame rate for video or the sampling rate
800    for audio.  It even exists for bitstreams that are not sampled at a
801    regular rate - then it is the highest resolution of any of the used
802    sampling rates.  The granulerate is specified in the skeleton
803    secondary header packets for each logical bitstream.
804
805    Each of the bitstreams inserts data into the Ogg bitstream through
806    packets which have an associated temporal duration based on the
807    encoder packaging.  Packets are packaged into Ogg pages, which have a
808    granule position associated with them.  Not taking the special case
809    of a granuleshift into account, the granule position specifies the
810    number of granules that have been encapsulated since the implicit
811    start of the original bitstream up to and including the given Ogg
812    page.
813
814    The granule position together with the granulerate and granuleshift
815    information of the skeleton secondary header packets for the
816    particular logical bitstream are used for the calculation of the time
817    position for which a data packet of the logical bitstream completes
818    data.  A granule position of -1 indicates a special case and MUST NOT
819    be used for calculation of a mapping to time.
820
821    In principle, the granule position of an Ogg page divided by the
822    granulerate of this page's logical bitstream provides the time
823    position that is reached in that bitstream after decoding all data
824    packets finished on this page.  However, the granule_position field
825    in an Ogg page allows for a more finely-grained description of the
826    temporal position.  The following image explains the composition of
827    the granule_position field in an Ogg page:
828
829            granule_position
830            ------------------------------------------------
831            |  keyindex               |  keyoffset         |
832            ------------------------------------------------
833
834    The granuleshift field of the skeleton secondary header packets
835    describes how many of the granule_position's 64 bits are being used
844    for the keyoffset.  The keyoffset part of the granule_position is
845    commonly used when the logical bitstream consists of packets that can
846    only be fully decoded when referring back to a previous packet.  For
847    example, video streams often consist of inter and intra coded frames,
848    where the intra frames are fully decodable and the inter frames are
849    intermediate frames that require backtracking to the last intra frame
850    for accurate decoding.  Another example is a logical bitstream that
851    is mapped as instantaneous information (i.e. its granule position
852    represents the start time and the end time of the packet data), but
853    actually has a duration associated to it, which is provided through a
854    subsequent packet.  CMML is such an example.  The keyindex part of
855    the granule_position is then used to provide the temporal position of
856    the reference packet and the keyoffset part provides a counter for
857    the data in between.
858
859    The calculation of the temporal position of an Ogg page using
860    Skeleton is thus specified through the following formula:
861
862    t_page = basetime + ((keyindex + keyoffset) / granulerate)
863
864    The basetime provides the time offset used at the beginning of the
865    logical bitstream for the first data packet and thus MUST be added
866    for a correct calculation of the temporal position.
867
868    As an example, consider an audio bitstream that has a granulerate of
869    44100 (i.e. 44100 samples per 1 sec), a granuleshift of 0, and starts
870    at 4 sec.  When reaching a granule_position of 88200, this maps to a
871    time position of 6 seconds:
872
873    t_page = 4 + ((88200 + 0) / 44100) = 6
874
875    This signifies that, after decoding this page's packets, the decoder
876    has reached 2 seconds into the audio bitstream, which maps to a time
877    position of 6 seconds because of the basetime.
878
879    As another example, consider a video bitstream that has a granulerate
880    of 25 (i.e. 25 frames per 1 second), a granuleshift of 3 (because it
881    encodes - say - 7 partial frames between each fully encoded frame),
882    and starts at 0 sec.  When reaching a granule_position of 501, i.e. a
883    keyindex of 62 and a keyoffset of 5, this maps to a fully decodable
884    time position of 2.68 seconds:
885
886    t_page = 0 + ((62 + 5) / 25) = 2.68 sec
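
   The formula and the two examples above can be sketched in Python as
   follows.  This is a minimal illustration, not a normative algorithm:
   the function names are hypothetical, and the sketch assumes that the
   keyoffset occupies the low "granuleshift" bits of the
   granule_position and the keyindex the remaining high bits.

```python
from fractions import Fraction


def split_granule(granule_position, granuleshift):
    """Split a granule_position into (keyindex, keyoffset)."""
    if granule_position == -1:
        # -1 is the special case that MUST NOT be mapped to a time.
        raise ValueError("granule position -1 has no time mapping")
    keyoffset = granule_position & ((1 << granuleshift) - 1)  # low bits
    keyindex = granule_position >> granuleshift               # high bits
    return keyindex, keyoffset


def page_time(granule_position, granulerate, granuleshift=0, basetime=0):
    """t_page = basetime + ((keyindex + keyoffset) / granulerate)."""
    keyindex, keyoffset = split_granule(granule_position, granuleshift)
    return basetime + Fraction(keyindex + keyoffset) / granulerate


# Audio example: 44100 Hz, granuleshift 0, basetime 4 sec.
audio_t = page_time(88200, 44100, granuleshift=0, basetime=4)

# Video example: 25 fps, granuleshift 3, keyindex 62, keyoffset 5.
video_granule = (62 << 3) | 5
video_t = page_time(video_granule, 25, granuleshift=3)
```

   Exact fractions are used so that rates such as 30/1.001 do not
   accumulate rounding errors; float(video_t) yields the decimal value.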
887
888    The granulerate of a time-instantaneous bitstream such as a CMML
889    bitstream can be chosen arbitrarily by the bitstream multiplexer.
890    By default, a granulerate of 1000 is used, which is the resolution
891    of npt.  The resolution of all the time schemes is given as:
892
900    o  npt: 1000 (milliseconds)
901
902    o  smpte-24: 24 (24 fps)
903
904    o  smpte-24-drop: 24/1.001 = 23.976 (approx. as per SMPTE)
905
906    o  smpte-25: 25
907
908    o  smpte-30: 30
909
910    o  smpte-30-drop: 30/1.001 = 29.970 (approx. as per SMPTE)
911
912    o  smpte-50: 50
913
914    o  smpte-60: 60
915
916    o  smpte-60-drop: 60/1.001 = 59.940 (approx. as per SMPTE)
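
   The resolutions listed above could be held in a lookup table such as
   the one sketched below.  The table name and helper are hypothetical;
   keeping the drop-frame rates as exact ratios (nominal rate * 1000 /
   1001) rather than rounded decimals is a choice of this sketch.

```python
from fractions import Fraction

# Granules per second for each time scheme.  Drop-frame rates are kept as
# exact ratios: nominal rate / 1.001 == nominal rate * 1000 / 1001.
TIME_SCHEME_RATE = {
    "npt": Fraction(1000),
    "smpte-24": Fraction(24),
    "smpte-24-drop": Fraction(24000, 1001),
    "smpte-25": Fraction(25),
    "smpte-30": Fraction(30),
    "smpte-30-drop": Fraction(30000, 1001),
    "smpte-50": Fraction(50),
    "smpte-60": Fraction(60),
    "smpte-60-drop": Fraction(60000, 1001),
}


def approx_rate(scheme, digits=3):
    """Decimal approximation of a scheme's resolution."""
    return round(float(TIME_SCHEME_RATE[scheme]), digits)
```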
917
918    The granule position of the page finishing data of a time-
919    instantaneous bitstream packet MUST signify the start time of that
920    packet.  For example, a CMML bitstream with a granulerate of 1000, a
921    basetime of 0, and a clip that lasts from npt=12.020 till npt=15.0
922    will get a granule_position of 12020.  In contrast, the
923    granule_position of the page finishing data of e.g. an audio
924    bitstream with granulerate 44100, basetime 0 and containing data from
925    npt=12.020 to npt=15.0 will be 661500.
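
   The difference between the two cases can be sketched as follows.
   The helper name is hypothetical and a granuleshift of 0 is assumed;
   times are passed as strings or fractions to avoid binary floating
   point rounding.

```python
from fractions import Fraction


def granule_for_time(t, granulerate, basetime=0):
    """Granule count for time t (in seconds), assuming granuleshift 0."""
    return int((Fraction(t) - Fraction(basetime)) * granulerate)


clip_start, clip_end = Fraction("12.020"), Fraction("15.0")

# Time-instantaneous bitstream (CMML): the page carries the START time.
cmml_granule = granule_for_time(clip_start, 1000)

# Continuous bitstream (audio): the page carries the time its data ENDS.
audio_granule = granule_for_time(clip_end, 44100)
```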
926
927    A note about field overflows: an overflow of the granule position
928    field can destroy the temporal integrity of the Ogg physical
929    bitstream.  In this case, a multiplexer MUST end the Ogg physical
930    bitstream and start a new one, resetting the counter to 0 and
931    adjusting the basetime appropriately.  This is also called sequential
932    multiplexing in Ogg.  The same measure MUST be taken in case of an
933    overflow of the page_sequence_number on one of the logical
934    bitstreams.
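
   One possible way for a multiplexer to handle a granule position
   overflow is sketched below.  The function name and the upper limit
   are assumptions of this sketch: it treats the field as a signed
   64-bit value and only covers a granuleshift of 0.

```python
from fractions import Fraction

# Assumption: granule_position is treated as a signed 64-bit value (-1 is
# reserved), so the largest usable count is 2**63 - 1.
MAX_GRANULE = 2**63 - 1


def advance(basetime, granule_position, added, granulerate):
    """Advance a granuleshift-0 bitstream by `added` granules.

    Returns (basetime, granule_position, restarted).  If the counter would
    overflow, a new physical bitstream is started: the counter restarts
    from 0 and the basetime moves forward by the time already covered.
    """
    if granule_position + added <= MAX_GRANULE:
        return basetime, granule_position + added, False
    new_basetime = basetime + Fraction(granule_position) / granulerate
    return new_basetime, added, True
```

   The adjusted basetime keeps the timeline continuous: the first page
   of the new physical bitstream maps to the same time position it
   would have had without the restart.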
935
936 3.3.  Seeking into the bitstream
937
938    Seeking to a time offset inside an Ogg logical bitstream is a
939    fundamental activity frequently performed on media data.  Time inside
940    an Ogg with a Skeleton track is specified as a temporal offset from
941    the "beginning" of the stream, making use of the basetime field.
942    Time offsets can also be specified as calendar dates and times.  The
943    UTC base is then used as a basis for offsetting.
944
945    The basetime allows correct mapping of a temporal offset such as
946    a temporal URI to a Byte position in the stream.  In the above figure
947    take t_uri=npt:14.0 as the temporal offset addressed on a stream with
956    t_0=npt:5.0 as the basetime - this requires an offset into the stream
957    of only 9 sec to the appropriate granule position in each of the
958    bitstreams, in the figure marked through patterned pages.
959
960    The seeking action is performed on the interleaved bitstream, in
961    which the data packets occur in a temporally consecutive order based
962    on the time at which their data ends.  These times are represented in
963    the granule positions of the Ogg pages, which are only allowed to
964    monotonically increase within one logical bitstream.  This implies
965    that when having found an Ogg page with a granule position that maps
966    to a given seek time (i.e. covers the time or ends at it), the seek
967    has found the right location.  This applies over all logical
968    bitstreams.  In the above example, this means that the Byte position
969    of the first occurring page of the patterned pages has been found.
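
   The search for that page can be sketched as below.  The page tuples,
   serial numbers and granule positions are made-up illustration data,
   a granuleshift of 0 is assumed, and a linear scan is used for
   clarity; an implementation would more likely bisect over Byte
   offsets.

```python
from fractions import Fraction


def page_time(granule_position, granulerate, basetime):
    """Time position of a page, assuming a granuleshift of 0."""
    return basetime + Fraction(granule_position) / granulerate


def find_seek_page(pages, rates, basetime, seek_time):
    """First page in file order whose time position reaches seek_time.

    `pages` holds (byte_offset, serialno, granule_position) tuples in file
    order; `rates` maps each serialno to its granulerate.
    """
    for byte_offset, serialno, granule in pages:
        if granule == -1:
            continue  # special case: not usable for a time mapping
        if page_time(granule, rates[serialno], basetime) >= seek_time:
            return byte_offset, serialno
    return None


# Hypothetical file: video (serial 1, 25 fps), audio (serial 2, 44100 Hz).
rates = {1: 25, 2: 44100}
pages = [
    (0, 1, 200),         # video page ending at 5 + 8.0 = 13.0 sec
    (4096, 2, 374850),   # audio page ending at 5 + 8.5 = 13.5 sec
    (8192, 1, 225),      # video page ending at 5 + 9.0 = 14.0 sec
    (12288, 2, 396900),  # audio page ending at 5 + 9.0 = 14.0 sec
]
hit = find_seek_page(pages, rates, basetime=5, seek_time=14)
```

   With t_uri=npt:14.0 and a basetime of 5 sec, the scan stops at the
   video page at Byte offset 8192, which is 9 sec into the stream.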
970
971    There is a complication to the seeking: some logical bitstreams have
972    backwards dependencies in their data packets and these have to be
973    taken into account for seeking.  For example, a logical bitstream may
974    require several of its previous packets to allow a correct and
975    complete decoding of the actual packet that occurs at the seektime.
976    This is the case for Theora, which has to go back to the previous
977    keyframe when decoding from a time offset.  It is also the case for
978    Vorbis, which requires the previous 2 packets for accurate setup of
979    the frequency transform; Speex needs approximately 2 packets for
980    similar reasons.  Even instantaneous bitstreams such as CMML may
981    require going back to a previous packet to recover the last state
982    information - the currently active clip in the case of CMML.
983
984    Therefore, once seeking has located the correct Byte position that
985    refers to the given temporal offset, it MUST seek back.  For logical
986    bitstreams that have a non-zero "granuleshift" in the skeleton, it
987    MUST seek back to the Ogg page that has a "keyindex" granule
988    position.  For logical bitstreams that have a non-zero "preroll" in
989    the skeleton, it MUST seek back that many packets.  The earliest Byte
990    position that satisfies all these requirements is the correct seek
991    position.
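
   The seek-back rules can be sketched as follows.  This is a
   simplified model under stated assumptions: each page is assumed to
   hold exactly one packet, the "keyindex" page is taken to be the one
   whose keyoffset bits are all zero, and the function names and the
   dictionary layout are hypothetical.

```python
def seek_back_index(granules, hit, granuleshift=0, preroll=0):
    """Index of the page at which decoding has to start for one bitstream.

    `granules` lists the granule positions of the bitstream's pages in
    order; `hit` is the index of the page located by the seek.
    """
    candidates = [hit]
    if granuleshift:
        # Walk back to the page that carries the keyindex itself.
        keyindex = granules[hit] >> granuleshift
        k = hit
        while k > 0 and granules[k] != keyindex << granuleshift:
            k -= 1
        candidates.append(k)
    if preroll:
        # One packet per page is assumed, so step back `preroll` pages.
        candidates.append(max(0, hit - preroll))
    return min(candidates)


def seek_position(streams):
    """Earliest Byte offset that satisfies every bitstream's requirement."""
    return min(
        s["offsets"][seek_back_index(s["granules"], s["hit"],
                                     s["granuleshift"], s["preroll"])]
        for s in streams
    )


video = {"offsets": [1000, 3000, 5000, 7000],
         "granules": [(48 << 3) | 7, 56 << 3, (56 << 3) | 1, (56 << 3) | 2],
         "hit": 3, "granuleshift": 3, "preroll": 0}
audio = {"offsets": [500, 2000, 4000, 6000, 8000],
         "granules": [1024, 2048, 3072, 4096, 5120],
         "hit": 4, "granuleshift": 0, "preroll": 2}
start = seek_position([video, audio])
```

   Here the video bitstream has to restart at its keyframe page (Byte
   offset 3000) and the audio bitstream two packets earlier (Byte
   offset 4000), so the seek position is the earlier of the two.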
992
993    A player that presents from an offset MUST take into account that the
994    bitstream may contain some packets that are only there to allow
995    accurate decoding at the seek time.  When the backwards dependencies
996    are resolved for a specific logical bitstream, several non-relevant
997    Ogg pages may also end up in between.  These have
998    to be skipped by a player.  The time that a player MUST start
999    presenting from is given in the "presentationtime" in the skeleton
1000    ident header.
1001
1012 3.4.  Remultiplexing an Ogg bitstream using Skeleton
1013
1014    Ogg with a Skeleton track allows for the creation of mashups of a
1015    file without actual decoding and re-encoding.  A mashup in the sense
1016    used here is when a subpart of an Ogg physical bitstream is required,
1017    such as a temporal sub-interval from the whole file.  Skeleton allows
1018    the creation of the mashup bitstream through recomposition and
1019    remultiplexing.  There are several aims for performing the
1020    remultiplexing with as little effort and therefore as little delay as
1021    possible:
1022
1023    o  no decoding of the logical bitstreams is performed.
1024
1025    o  no changes to the pages, in particular to the granule positions
1026       are made.
1027
1028    o  changes occur only to the control section.
1029
1030    The fields of the skeleton track allow achievement of all these aims.
1031    Remultiplexing is essentially achieved by seeking to the position as
1032    described above and then including from each logical bitstream only
1033    the relevant Ogg pages into the new stream.  Changes to fields in the
1034    bitstream are restricted to the control section:
1035
1036    o  the "presentationtime" MUST be adjusted to the requested start
1037       time
1038
1039    o  the "startgranule" for each logical bitstream MUST be adjusted to
1040       the granule position at which each logical bitstream starts.  This
1041       is not the first granule position of the Ogg pages included into
1042       the bitstream, but rather the last one that did not get included,
1043       as it represents the start time of the bitstream.
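
   The two field adjustments listed above can be sketched as follows.
   The function names and the input layout are hypothetical; the sketch
   only computes the new header values and leaves the copying of the
   unchanged Ogg pages aside.

```python
def new_startgranule(granules, first_included, old_startgranule=0):
    """startgranule for one logical bitstream of the extract.

    `granules` lists the granule positions of the bitstream's pages in the
    source file; `first_included` is the index of the first page copied.
    """
    if first_included == 0:
        return old_startgranule          # nothing was cut off at the front
    return granules[first_included - 1]  # last page that was NOT included


def remux_fields(requested_start, streams):
    """Skeleton fields to rewrite; the Ogg pages are copied unchanged.

    `streams` maps serialno -> (granules, first_included).
    """
    return {
        "presentationtime": requested_start,
        "startgranule": {
            serialno: new_startgranule(granules, first)
            for serialno, (granules, first) in streams.items()
        },
    }


fields = remux_fields(14, {
    1: ([200, 225, 250], 1),           # video: pages from index 1 are kept
    2: ([352800, 374850, 396900], 2),  # audio: pages from index 2 are kept
})
```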
1044
1045    Everything else, and in particular the Ogg pages, stays the same.
1046    This is important also to allow caching of such files as is required
1047    for Web proxies and described in temporal URI addressing [timedURI].
1048
1068 4.  Security considerations
1069
1070    Ogg format bitstreams contain several multiplexed binary and non-
1071    binary data bitstreams.  There is no generic encryption or signing
1072    mechanism provided for the complete bitstream or any of its parts.
1073    As the format of the encapsulated media bitstreams is not prescribed
1074    and is identified through the "Content-type" Message header field in
1075    that bitstream's skeleton secondary header packet, it is possible to
1076    encrypt or sign that media bitstream and then mark it accordingly
1077    with a MIME type that signifies the encryption.  It is up to the
1078    applications that use this bitstream to provide an appropriate codec
1079    to handle such bitstreams.
1080
1081    As Ogg format bitstreams generally contain binary media bitstreams,
1082    it is possible to include executable content in them.  This can be an
1083    issue with applications that decode these bitstreams, especially when
1084    they are used in a network scenario.  Such applications MUST ensure
1085    correct handling of manipulated bitstreams, of buffer overflow and
1086    the like.
1087
1088
1089
1090
1091
1092
1093
1094
1095