RTP Payload Format for sub-codestream latency JPEG 2000 streaming

Sandflow Consulting LLC

San Mateo CA US pal@sandflow.com

University of New South Wales

Sydney AU d.taubman@unsw.edu.au

Applications and Real-Time Area
Audio/Video Transport Core Maintenance
JPEG 2000 J2K HTJ2K low latency scalable streaming

This RTP payload format defines the streaming of a video signal encoded as a sequence of JPEG 2000 codestreams. The format allows sub-codestream latency, such that the first RTP packet for a given codestream can be emitted before the entire codestream is available.

Introduction The real-time transport protocol (RTP), which is specified in , provides end-to-end network transport functions for transmitting real-time data, but does not define the characteristics of the data itself (the payload), which varies across applications and is defined in companion RTP payload format documents. This RTP payload format specifies the streaming of a video signal encoded as a sequence of JPEG 2000 codestreams (see for a primer on the structure of JPEG 2000 codestreams). In addition to supporting a variety of frame scanning techniques (progressive, interlaced and progressive segmented frame) and image characteristics, the payload format includes the following features specifically designed for streaming applications:

the payload format allows sub-codestream latency, such that the first RTP packet of a given codestream can be emitted before the entire codestream is available. Specifically, the payload format does not rely on the JPEG 2000 PLM and PLT marker segments for recovery after RTP Packet loss since these markers can only be written after the codestream is complete and are thus incompatible with sub-codestream latency. Instead, the payload format includes payload header fields (ORDH, ORDB, POS and PID) that indicate whether the RTP packet contains a resynchronization (resync) point and how a recipient can restart codestream processing from that resync point. This contrasts with , which also specifies an RTP payload format for JPEG 2000, but relies on codestream structures that cannot be emitted until the entire codestream is available.
as in , the payload header contains an extension (ESEQ) to the standard 16-bit RTP sequence number, enabling the payload format to accommodate high data rates without ambiguity. This is necessary as the standard sequence number will roll over very quickly for high data rates likely to be encountered in this application. For example, the standard sequence number will roll over in 0.5 seconds with a 1-Gbps video stream with RTP Packet sizes of at least 1000 octets, which can be a problem for detecting loss and out-of-order packets particularly in instances where the round-trip time is greater than the roll over period (0.5 seconds in this example).
the payload header optionally contains a temporal offset (PTSTAMP) relative to the first RTP Packet with the same value of RTP timestamp field (). The higher resolution of PTSTAMP compared to the timestamp allows receivers to recover the sender's clock more rapidly.
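The rollover figure quoted in the list above follows from simple arithmetic, sketched here in Python. The function name and its parameters are illustrative and not part of this specification; the 8 extra high-order bits used in the second call are an assumption made only for the example.

```python
def rollover_seconds(bit_rate_bps: float, packet_octets: int, seq_bits: int = 16) -> float:
    """Seconds until a seq_bits-wide packet counter wraps at a constant packet rate."""
    packets_per_second = bit_rate_bps / (packet_octets * 8)
    return (1 << seq_bits) / packets_per_second


# 1 Gbps stream carried in 1000-octet RTP packets, 16-bit sequence number only.
print(rollover_seconds(1e9, 1000))
# Same stream, assuming (for illustration only) 8 extra high-order bits from ESEQ.
print(rollover_seconds(1e9, 1000, seq_bits=16 + 8))
```

The first call gives roughly half a second, which is why a round-trip time above that value makes the 16-bit counter ambiguous on its own.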

Finally, the payload format also makes use of the unique scalability features of JPEG 2000 to allow a network agent or recipient to discard resolutions and/or quality layers merely by inspecting payload headers (QUAL and RES fields), without having to parse the underlying codestream.

Requirements Language The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 when, and only when, they appear in all capitals, as shown here.

Media format description

The following summarizes the structure of the JPEG 2000 codestream, which is specified in detail at . NOTE: as described at , a JPEG 2000 codestream allows capabilities defined in any part of the JPEG 2000 family of standards, including those specified in and .

JPEG 2000 represents an image as one or more components, e.g., R, G and B, each uniformly sampled on a common rectangular reference grid. An image can be further divided into contiguous rectangular tiles that are each independently coded and decoded.

JPEG 2000 codes each image as a standalone codestream. Each codestream consists of (i) marker segments, which contain coding parameters and metadata, and (ii) coded data. The codestream starts with an SOC marker segment and ends with an EOC marker segment. The main header of the codestream consists of marker segments between the SOC and first SOT marker segment and contains information that applies to the codestream in its entirety. It is generally impossible to decode a codestream without its main header. The rest of the codestream consists of additional marker segments (tile-part headers) interleaved with coded image data.

The coded image data ultimately consists of code-blocks, each containing coded samples belonging to a rectangular (spatial) region within one resolution level of one component. Code-blocks are further collected into precincts, each of which, accordingly, represents code-blocks belonging to a spatial region within one resolution level of one component.

The coded image data can be arranged into several progression orders, which dictate which aspect of the image appears first in the codestream (in terms of byte offset). The progression orders are parameterized according to:

Position (P): The first lines of the image come before the last lines of the image.
Component (C): The first component of the image comes before the last component of the image.
Resolution Layer (R): The information needed to reconstruct the lower spatial resolutions of the image comes before the information needed to reconstruct the higher spatial resolutions of the image.
Quality Layer (L): The information needed to reconstruct the most-significant bits of each sample comes before the information needed to reconstruct the least-significant bit of each sample.

For example, in the PRCL progression order, the information needed to reconstruct the first lines of the image comes before that needed to reconstruct the last lines of the image and, within a collection of lines, the information needed to reconstruct the lower spatial resolutions of the image comes before the information needed to reconstruct the higher spatial resolutions. This progression order is particularly useful for sub-frame latency operations.
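As a rough illustration of what a progression order means, the following Python sketch enumerates index tuples with the leftmost letter of the order varying slowest. It is a toy model only: the function name and the `sizes` mapping are hypothetical, and a real codestream has a number of precincts that varies per resolution level, which this sketch ignores.

```python
from itertools import product


def progression(order: str, sizes: dict) -> list:
    """List index dicts so that the leftmost letter of `order` varies slowest."""
    ranges = [range(sizes[axis]) for axis in order]
    return [dict(zip(order, combo)) for combo in product(*ranges)]


# Two positions, two resolutions, one component, one quality layer (toy values).
for entry in progression("PRCL", {"P": 2, "R": 2, "C": 1, "L": 1}):
    print(entry)
```

In this toy model, all data for position 0 is listed before any data for position 1, which is the property that lets a sender emit the top of the image before the bottom has been coded.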

Video signal description This RTP payload format supports three distinct video frame scanning techniques:

Progressive frame
Interlaced frame, where each frame consists of two fields. Field 1 occurs temporally before Field 2. The height in lines of each field is half the height of the image.
Progressive segmented frame (PsF), where each frame consists of two segments. Segment 1 contains the odd lines (1, 3, 5, 7,...) of a frame and Segment 2 contains the even lines (2, 4, 6, 8,...) of the same frame, where lines from the top of the frame to the bottom of the frame are numbered sequentially starting at 1.

All frames are scanned left to right, top to bottom.

Payload Format

General

Packetization of a sequence of JPEG 2000 codestreams (not to scale).

            |<Extended Header>|
            +-----+-----+-----+------------//--+-----+-----+---------
            | SOC | ... | SOD | .............. | EOC |  P  | SOC ...
            +-----+-----+-----+------------//--+-----+-----+---------
            |                     |      |     |           |
            +---------------------+------+-//--+-----------+---------
    Packets |        Main         | Body | ... |   Body    | Main ...
            +---------------------+------+-//--+-----------+---------

    SOC = Start of codestream marker
    SOD = Start of data marker
    EOC = End of codestream marker
    P   = (Optional) padding bytes

Each RTP packet, as specified at , is either a Main Packet or a Body Packet. A Main Packet consists of the following ordered sequence of structures concatenated without gaps:

the RTP Fixed Header;
a Main Packet Payload Header, as specified at ; and
the payload, which consists of a JPEG 2000 codestream fragment.

A Body Packet consists of the following ordered sequence of structures concatenated without gaps:

the RTP Fixed Header;
a Body Packet Payload Header, as specified at ; and
the payload, which consists of a JPEG 2000 codestream fragment.

When concatenated, the sequence of JPEG 2000 codestream fragments emitted by the sender MUST be a sequence of JPEG 2000 codestreams where two successive JPEG 2000 codestreams MAY be separated by one or more arbitrary padding bytes (see ). The JPEG 2000 codestreams MUST conform to . The padding bytes MUST be ignored by the recipient. NOTE: Padding bytes can be used to achieve constant bit rate transmission. A JPEG 2000 codestream fragment does not necessarily contain complete JPEG 2000 packets, as defined in . A JPEG 2000 codestream Extended Header consists of the bytes between, and including, the SOC marker and the first SOD marker. The payload of a Body Packet MUST NOT contain any bytes of the JPEG 2000 codestream Extended Header. The payload of a Main Packet MUST contain at least one byte of the JPEG 2000 codestream Extended Header and MAY contain bytes other than those of the JPEG 2000 codestream Extended Header. A payload MUST NOT contain bytes from more than one JPEG 2000 codestream.

RTP Fixed Header Usage The following RTP header fields have a specific meaning in the context of this payload format:

marker

1: The payload contains an EOC marker.
0: Otherwise

timestamp

The timestamp is the presentation time of the image to which the payload belongs. The timestamp clock rate is 90 kHz. The timestamp of successive progressive frames MUST advance at regular increments based on the instantaneous video frame rate. The timestamp of Field 1 of successive interlaced frames MUST advance at regular increments based on the instantaneous video frame rate, and the timestamp of Field 2 MUST be offset from the timestamp of Field 1 by one half of the instantaneous frame period. The timestamp of both segments of a progressive segmented frame MUST be equal. The timestamp of all RTP packets of a given image MUST be equal.
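The timestamp rules for interlaced frames can be sketched as follows, assuming a constant frame rate. The function name is hypothetical, and rounding to the nearest tick when the frame period is not a whole number of 90 kHz ticks is an assumption of this sketch, not a rule stated here.

```python
from fractions import Fraction

CLOCK_RATE = 90_000  # timestamp clock rate in Hz


def interlaced_timestamps(frame_rate, frames: int, start: int = 0) -> list:
    """Return (Field 1, Field 2) timestamp pairs for `frames` successive frames."""
    period = Fraction(CLOCK_RATE) / Fraction(frame_rate)  # ticks per frame
    pairs = []
    for i in range(frames):
        field1 = start + i * period
        field2 = field1 + period / 2  # Field 2 is offset by half a frame period
        # Rounding and 32-bit wrap are assumptions made for this sketch.
        pairs.append((round(field1) % 2**32, round(field2) % 2**32))
    return pairs


print(interlaced_timestamps(25, 2))
```

At 25 frames per second the frame period is 3600 ticks, so Field 2 trails Field 1 by 1800 ticks.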

sequence number

The low-order bits of the RTP sequence number. The higher order bits of the RTP sequence number are contained in the ESEQ field, which is specified at . The RTP sequence number is calculated as follows: ESEQ * 65536 + sequence number
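A minimal Python sketch of the formula above. The function names are illustrative, and the loss-counting helper is only one possible use of the combined value.

```python
def full_sequence_number(eseq: int, sequence_number: int) -> int:
    """Combine ESEQ (high-order bits) with the 16-bit RTP sequence number."""
    if not 0 <= sequence_number <= 0xFFFF:
        raise ValueError("sequence number must fit in 16 bits")
    if eseq < 0:
        raise ValueError("ESEQ must be non-negative")
    return eseq * 65536 + sequence_number


def packets_missing(previous: int, current: int) -> int:
    """Packets skipped between two in-order full sequence numbers."""
    return current - previous - 1


a = full_sequence_number(0, 65535)
b = full_sequence_number(1, 2)
print(a, b, packets_missing(a, b))
```

Because ESEQ increments when the 16-bit field wraps, the gap between `a` and `b` is counted correctly even though the low-order bits went from 65535 to 2.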

Main Packet Payload Header specifies the structure of the payload header. Fields are interpreted as unsigned binary integers in network order.

Structure of the Main Packet Payload Header

MH (Codestream Main Header Presence)

0: The RTP Packet is a Body Packet.
1: The RTP Packet is a Main Packet and the codestream has more than one Main Packet. The next RTP Packet is a Main Packet.
2: The RTP Packet is a Main Packet and the codestream has more than one Main Packet. The next RTP Packet is a Body Packet.
3: The RTP Packet is a Main Packet and the codestream has exactly one Main Packet.

TP (Image Type)

Indicates the scanning structure of the image to which the payload belongs.

0: Progressive frame.
1: Field 1 of an interlaced frame, where the first line of the field is the first line of the frame.
2: Field 2 of an interlaced frame, where the first line of the field is the second line of the frame.
3: Field 1 of an interlaced frame, where the first line of the field is the second line of the frame.
4: Field 2 of an interlaced frame, where the first line of the field is the first line of the frame.
5: Segment 1 of a progressive segmented frame, where the first line of the image is the first line of the frame.
6: Segment 2 of a progressive segmented frame, where the first line of the image is the second line of the frame.
7: Extension value. See and .

ORDH (Progression Order [Main Packet])

Specifies the progression order used by the codestream and whether resync points are signaled.

0: Resync points are not necessarily signaled. The progression order can vary over the codestream.
1: The progression order is LRCP for the entire codestream. The first resync point is specified in every Body Packet that contains one or more resync points.
2: The progression order is RLCP for the entire codestream. The first resync point is specified in every Body Packet that contains one or more resync points.
3: The progression order is RPCL for the entire codestream. The first resync point is specified in every Body Packet that contains one or more resync points.
4: The progression order is PCRL for the entire codestream. The first resync point is specified in every Body Packet that contains one or more resync points.
5: The progression order is CPRL for the entire codestream. The first resync point is specified in every Body Packet that contains one or more resync points.
6: The progression order is PRCL for the entire codestream. The first resync point is specified in every Body Packet that contains one or more resync points.
7: The progression order can vary over the codestream. The first resync point is specified in every Body Packet that contains one or more resync points.

ORDH MUST be 0 if the codestream consists of more than one tile. NOTE: Only ORDH = 4 and ORDH = 6 allow sub-codestream latency streaming. NOTE: Progression order PRCL is defined in . The other progression orders are specified in .

P (Precision Timestamp Presence)

0: PTSTAMP is not used.
1: PTSTAMP is used.

XTRAC (Extension Payload Length)

Length, in multiples of 4 bytes, of the XTRAB field.

PTSTAMP (Precision Timestamp)

PTSTAMP = (timestamp + TOFF) mod 4096, if P = 1 in the Main Packet of this codestream. TOFF is the transmission time of this RTP Packet, in the timebase of the timestamp clock and relative to the first packet with the same timestamp value. TOFF = 0 in the first RTP Packet with the same timestamp value. PTSTAMP = 0, if P = 0 in the Main Packet of this codestream. NOTE: As described at and , PTSTAMP is intended to improve clock recovery at the receiver and only applies when the transmission times of two consecutive RTP packets with identical timestamp fields differ by no more than about 45 ms (4095/90,000 s). addresses the general case when an RTP packet is transmitted at a time other than its nominal transmission time.
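The PTSTAMP definition can be sketched in Python as follows. The function names are hypothetical; `toff_delta` is only meaningful under the stated condition that consecutive packets with the same timestamp are sent no more than 4095 ticks apart.

```python
def ptstamp(timestamp: int, toff: int, p: int = 1) -> int:
    """PTSTAMP for a packet sent `toff` ticks after the first packet of the image."""
    if p == 0:
        return 0  # PTSTAMP is not used
    return (timestamp + toff) % 4096


def toff_delta(previous_ptstamp: int, current_ptstamp: int) -> int:
    """Ticks elapsed between two consecutive packets, assuming a gap of at most 4095."""
    return (current_ptstamp - previous_ptstamp) % 4096


first = ptstamp(90_000, 200)
second = ptstamp(90_000, 1000)
print(first, second, toff_delta(first, second))
```

The modulo in `toff_delta` absorbs the wrap of the 12-bit field, so the receiver recovers the 800-tick spacing without knowing TOFF itself.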

ESEQ (Extended Sequence Number)

The high-order bits of the RTP sequence number. specifies the low-order bits of the RTP sequence number and the formula to compute the RTP sequence number.

R (Codestream Main Header Reuse)

Determines whether Main Packet and codestream header information can be reused across codestreams.

1

All Main Packets in this stream, as identified by its SSRC value:

MUST have identical Main Packet Payload Headers, with the exception of their TP, MH, ESEQ and PTSTAMP fields;
MUST contain the same codestream main header information, with the exception of the SOT and COM marker segments, and any pointer marker segments; and
MUST NOT contain bytes other than Extended Header bytes.

0

Otherwise

S (Parameterized Colorspace Presence)

0

Component colorimetry is not specified, and left to the session or the application. PRIMS, TRANS, MAT and RANGE MUST be zero.

1

Component colorimetry is specified by the PRIMS, TRANS, MAT and RANGE fields. The codestream components MUST conform to one of the combinations at .

Mapping of codestream components to color channels

Combination name	Component 0	Component 1	Component 2	Component 3
Y	Y
YA	Y	A
RGB	R	G	B
RGBA	R	G	B	A
YCbCr	Y	C_B	C_R
YCbCrA	Y	C_B	C_R	A

The channel `A` is an opacity channel. The minimum sample value (0) indicates a completely transparent sample, and the maximum sample value (as determined by the bit depth of the codestream component) indicates a completely opaque sample. The opacity channel MUST map to a component with unsigned samples.

C (Code-block Caching Usage)

0: Code-block caching is not in use.
1: Code-block caching is in use. R MUST be equal to 1.

RSVD (Reserved)

Reserved value. See and .

RANGE (Video Full Range Usage)

Value of the VideoFullRangeFlag specified in

PRIMS (Color Primaries)

One of the ColourPrimaries values specified in

TRANS (Transfer Characteristics)

One of the TransferCharacteristics values specified in

MAT (Color Matrix Coefficients)

One of the MatrixCoefficients values specified in

XTRAB (Extension Payload)

Allows the contents of the Main Packet Payload Header to be extended in the future. See and .

Body Packet Payload Header specifies the structure of the Body Packet Payload Header. Fields are interpreted as unsigned binary integers in network order.

Structure of the Body Packet Payload Header

MH

See .

TP

See .

RES (Resolution Layers)

0: The payload can contribute to all resolution layers.
Otherwise: The payload contains at least one byte of one JPEG 2000 packet belonging to resolution level (N_L + RES - 7) but does not contain any byte of any JPEG 2000 packet belonging to lower resolution levels. N_L is the number of decomposition levels of the codestream.

ORDB (Progression Order [Body Packet])

0: No resync point is specified for the payload.
1: The payload contains a resync point.

ORDB MUST be 0 if the codestream consists of more than one tile.

QUAL (Quality Layers)

0: The payload can contribute to all quality layers.
Otherwise: The payload contributes only to quality layer index QUAL or above.

PTSTAMP

See .

ESEQ

See .

POS (Resync Point Offset)

Byte offset from the start of the payload to the first byte of the resync point belonging to the precinct identified by PID. POS MUST be 0 if ORDB = 0.

PID (Precinct Identifier)

Unique identifier of the precinct of the resync point. PID = c + s * num_components where:

c is the index (starting from 0) of the image component to which the precinct belongs;
s is a sequence number which identifies the precinct within its tile-component; and
num_components is the number of components of the codestream.

If PID is present, the payload MUST NOT contain codestream bytes from more than one precinct. PID MUST be 0 if ORDB = 0. NOTE: PID is identical to precinct identifier I specified in .
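The PID formula and its inverse can be sketched as follows. The function names are illustrative; the 2**24 bound in `can_be_resync_point` reflects the resync point definition given elsewhere in this document.

```python
def precinct_id(c: int, s: int, num_components: int) -> int:
    """PID = c + s * num_components."""
    if not 0 <= c < num_components:
        raise ValueError("component index out of range")
    if s < 0:
        raise ValueError("precinct sequence number must be non-negative")
    return c + s * num_components


def split_precinct_id(pid: int, num_components: int) -> tuple:
    """Recover (c, s) from a PID."""
    s, c = divmod(pid, num_components)
    return c, s


def can_be_resync_point(pid: int) -> bool:
    """A resync point requires PID < 2**24."""
    return 0 <= pid < 2**24


pid = precinct_id(c=2, s=5, num_components=3)
print(pid, split_precinct_id(pid, 3), can_be_resync_point(pid))
```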

JPEG 2000 codestream

General A JPEG 2000 codestream consists of the bytes between, and including, the SOC and EOC markers, as defined in . The JPEG 2000 codestream MAY include capabilities beyond those specified at , including those specified in and . NOTE: The Rsiz parameter and CAP marker segments of each JPEG 2000 codestream contain detailed information on the capabilities necessary to decode the codestream. NOTE: The caps media type parameter defined in allows applications to signal required device capabilities. NOTE: The block coder specified at improves throughput and reduces latency compared to the original arithmetic block coder defined in . For interlaced or progressive segmented frames, the height specified in the JPEG 2000 main header MUST be the height in lines of the field or the segment, respectively. If any decomposition level involves only horizontal decomposition then no decomposition level MUST involve only vertical decomposition; and conversely, if any decomposition level involves only vertical decomposition then no decomposition level MUST involve only horizontal decomposition.

Sender requirements

Main Packet

Only Main Packets MAY contain bytes of the JPEG 2000 codestream Extended Header. The sender MUST either emit a single Main Packet with MH = 3, or one or more Main Packets with MH = 1 followed by a single Main Packet with MH = 2. The Main Packet Payload Header fields MUST be identical in all Main Packets of a given codestream, with the exception of:

MH;
ESEQ; and
PTSTAMP.
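The rule on MH values for the Main Packets of one codestream can be checked with a short Python sketch. The function name is hypothetical, and the input is assumed to be the MH values of the Main Packets of a single codestream in transmission order.

```python
def valid_main_packet_run(mh_values: list) -> bool:
    """True if MH values are [3], or one or more 1s followed by a single 2."""
    if mh_values == [3]:
        return True
    return (
        len(mh_values) >= 2
        and mh_values[-1] == 2
        and all(value == 1 for value in mh_values[:-1])
    )


for run in ([3], [1, 2], [1, 1, 2], [2], [1, 3]):
    print(run, valid_main_packet_run(run))
```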

RTP Packet filtering

A network agent MAY strip out RTP Packets from a codestream that are of no interest to a particular client, e.g., based on a resolution or a spatial region of interest.

Resync point A resync point is the first byte of JPEG 2000 packet header data for a precinct and for which PID < 2²⁴. NOTE: Resync points cannot be specified if the codestream consists of more than one tile (ORDB and ORDH are both equal to zero). NOTE: A resync point can be used by a receiver to process a codestream even if earlier packets in the codestream have been corrupted, lost or deliberately discarded by a network agent. As a corollary, resync points can be used by a network agent to discard packets that are not relevant to a given rendering resolution or region of interest. Resync points play a role similar to pointer marker segments, albeit tailored for high bandwidth low latency streaming applications.

PTSTAMP field

A sender SHOULD set P = 1, but only if it can generate PTSTAMP accurately. PTSTAMP can be derived from the same clock that is used to produce the 32-bit timestamp field in the RTP fixed header. Specifically, a sender maintains, at least conceptually, a 32-bit counter that is incremented by a 90 kHz clock. The counter is sampled at the point in time when each RTP Packet is transmitted and the 12 LSBs of the sample are stored in the PTSTAMP field. If P = 1, then the transmission time TOFF (as defined at ) for two consecutive RTP packets with identical timestamp fields MUST NOT differ by more than 4095.

RES field A sender SHOULD set RES > 0 whenever possible. NOTE: While a sender can always safely set RES = 0, this makes it more difficult to discard packets based on resolution, as described at .

Extra information

The sender MUST set the value of XTRAC to 0. Future editions of this specification can permit other values.

Reserved values

The sender MUST set reserved values to 0. Future editions of this specification can specify other values such that these values can be ignored by receivers that conform to this specification.

Extension values A sender MUST NOT use an extension value.

Code-block caching This section applies only if C = 1. A sender can improve bandwidth efficiency by only occasionally transmitting code-blocks corresponding to static portions of the video and otherwise transmitting empty code-blocks. When C = 1, and as described at , a receiver maintains a simple cache of previously received code-blocks, which it uses to replace empty code-blocks. A sender alone determines which and when code-blocks are replaced with empty code-blocks. The sender cannot however determine with certainty the state of the receiver's cache: some code-blocks might have been lost in transit, the sender doesn't know exactly when the receiver started processing the stream, etc. A code-block is empty if:

it does not contribute code-bytes as specified in the parent JPEG 2000 packet header; or
it conforms to , contains an HT cleanup segment, and the first two bytes of its Magsgn byte-stream are between 0xFF80 and 0xFF8F.

NOTE: the last condition allows the encoder to insert padding bytes to achieve a constant bit rate even when a code-block does not contribute code-bytes, as suggested at , F.4.
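The two conditions for an empty code-block can be sketched as a predicate. All parameter names are hypothetical: the sketch assumes the decoder has already parsed the JPEG 2000 packet header and, for HT code-blocks, located the Magsgn byte-stream.

```python
def is_empty_code_block(
    contributes_bytes: bool,
    is_ht: bool = False,
    has_cleanup_segment: bool = False,
    magsgn: bytes = b"",
) -> bool:
    """Apply the two 'empty code-block' conditions to pre-parsed information."""
    if not contributes_bytes:
        return True  # first condition: no code-bytes in the packet header
    if is_ht and has_cleanup_segment and len(magsgn) >= 2:
        # second condition: first two Magsgn bytes lie in 0xFF80..0xFF8F
        return 0xFF80 <= int.from_bytes(magsgn[:2], "big") <= 0xFF8F
    return False


print(is_empty_code_block(False))
print(is_empty_code_block(True, True, True, bytes([0xFF, 0x85, 0x00])))
print(is_empty_code_block(True, True, True, bytes([0x12, 0x34])))
```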

Receiver

PTSTAMP Receivers can use PTSTAMP values to accelerate sender clock recovery since PTSTAMP typically updates more regularly than timestamp.

QUAL A receiver can discard packets where QUAL > N if it is interested in reconstructing an image that only incorporates quality layers N and below.

RES

The JPEG 2000 coding process decomposes an image using a sequence of discrete wavelet transform (DWT) stages.

Optional discarding of Body Packets based on the value of the RES field when decoding a reduced resolution image, in the case where N_L = 5 and all DWT stages consist of both horizontal and vertical transforms. The image has nominal width and height of W x H.

Decomposition level	Resolution level	Sub-bands	Keep all Body Packets with RES equal to or less than this value...	... to decode an image with at most these dimensions
1	5	HL1,LH1,HH1	7	W x H
2	4	HL2,LH2,HH2	6	(W/2) x (H/2)
3	3	HL3,LH3,HH3	5	(W/4) x (H/4)
4	2	HL4,LH4,HH4	4	(W/8) x (H/8)
5	1	HL5,LH5,HH5	3	(W/16) x (H/16)
5	0	LL5	2	(W/32) x (H/32)

illustrates the case where each DWT stage consists of both horizontal and vertical transforms, which is the only mode supported in . The first stage transforms the image into (i) the image at half-resolution (LL1 sub-bands) and (ii) residual high-frequency data (HH1, LH1, HL1 sub-bands). The second stage transforms the image at half-resolution (LL1 sub-bands) into the image at quarter resolution (LL2 sub-bands) and residual high-frequency data (HH2, LH2, HL2 sub-bands). This process is repeated N_L times, where N_L is the number of decomposition levels as defined in the COD and COC marker segments of the codestream.

The decoding process reconstructs the image by reversing the coding process, starting with the lowest resolution image stored in the codestream (LL_{N_L}). As a result, it is possible to reconstruct a lower resolution of the image by stopping the decoding process at a selected stage. For example, in order to reconstruct the image at quarter resolution (LL2), only sub-bands with index greater than 2, e.g., HL3, LH3, HH3, HL4, LH4, HH4, etc., are necessary. In other words, a receiver that wishes to reconstruct an image at quarter resolution could discard all packets where RES >= 6 since those packets can only contribute to HL1, LH1, HH1, HL2, LH2 and HH2 sub-bands.

In the case where all DWT stages consist of both horizontal and vertical transforms, the maximum decodable resolution is reduced by a factor of 2^{7 - N} if all Body Packets where RES > N are discarded.

Optional discarding of Body Packets based on the value of the RES field when decoding a reduced resolution image, in the case where N_L = 5 and some DWT stages consist of only horizontal transforms. The image has nominal width and height of W x H.

Decomposition level	Resolution level	Sub-bands	Keep all Body Packets with RES equal to or less than this value...	... to decode an image with at most these dimensions
1	5	HL1,LH1,HH1	7	W x H
2	4	HL2,LH2,HH2	6	(W/2) x (H/2)
3	3	HX3	5	(W/4) x (H/2)
4	2	HX4	4	(W/8) x (H/2)
5	1	HX5	3	(W/16) x (H/2)
5	0	LX5	2	(W/32) x (H/2)

illustrates the case where some DWT stages consist of only horizontal transforms, as specified at Annex F of . A receiver can therefore discard all Body Packets where RES is greater than some threshold value if it is interested in decoding an image with its resolution reduced by a factor determined by the threshold value, as illustrated in and .
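The discard rules for RES and QUAL can be combined into a simple Body Packet filter, sketched below in Python. The function and parameter names are hypothetical, and `res_threshold` applies only when every DWT stage consists of both horizontal and vertical transforms.

```python
def res_threshold(reduction_log2: int, n_l: int) -> int:
    """Largest RES to keep for a 2**reduction_log2 reduction in width and height."""
    if not 0 <= reduction_log2 <= n_l:
        raise ValueError("reduction must be between 0 and N_L decomposition levels")
    return 7 - reduction_log2


def keep_body_packet(res: int, qual: int, max_res: int, max_qual: int) -> bool:
    """Keep a Body Packet unless RES or QUAL exceeds the receiver's thresholds.

    RES = 0 and QUAL = 0 mean 'can contribute to all', so they are never discarded.
    """
    return res <= max_res and qual <= max_qual


# Quarter resolution (reduction by 2**2) with N_L = 5, quality layers 3 and below.
threshold = res_threshold(2, n_l=5)
print(threshold)
print(keep_body_packet(res=6, qual=0, max_res=threshold, max_qual=3))
print(keep_body_packet(res=0, qual=0, max_res=threshold, max_qual=3))
```

Because both tests read only payload header fields, a network agent can apply the same filter without parsing the codestream.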

Extra information

The receiver MUST accept values of XTRAC other than 0 and MUST ignore the value of XTRAB, whose length is given by XTRAC. Future editions of this specification can specify XTRAB contents such that this content can be ignored by receivers that conform to this specification.

Reserved values

The receiver MUST ignore reserved values.

Extension values The receiver MUST discard an RTP packet that contains any extension value.

Code-block caching This section applies only if C = 1. When C = 1, and as specified in , the sender can improve bandwidth efficiency by only occasionally transmitting code-blocks corresponding to static portions of the video and otherwise transmitting empty code-blocks, as defined at . When decoding a codestream, and for each code-block in the codestream:

if the code-block in the codestream is empty, the receiver MUST replace it with a matching code-block from the cache, if one exists; or
if the code-block in the codestream is not empty, the receiver MUST replace any matching code-block from the cache with the code-block in the codestream.

Two code-blocks are matching if the following characteristics are identical for both: spatial coordinates, resolution level, component, sub-band and value of the TP field of the parent RTP packet.
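The receiver-side cache behavior can be sketched as a dictionary keyed by the matching characteristics. The class and field names are hypothetical, code-block data is modeled as opaque bytes, and the caller is assumed to have already determined whether each code-block is empty.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockKey:
    """Characteristics that must be identical for two code-blocks to match."""
    x: int
    y: int
    resolution_level: int
    component: int
    subband: str
    tp: int  # TP field of the parent RTP packet


class CodeBlockCache:
    def __init__(self) -> None:
        self._blocks = {}

    def resolve(self, key: BlockKey, data: bytes, is_empty: bool) -> bytes:
        """Return the code-block data to decode, updating the cache as needed."""
        if is_empty:
            # Use the cached block if one matches, else keep the empty block.
            return self._blocks.get(key, data)
        self._blocks[key] = data  # a non-empty block replaces any cached match
        return data


cache = CodeBlockCache()
key = BlockKey(x=0, y=0, resolution_level=1, component=0, subband="HL", tp=0)
cache.resolve(key, b"\x01\x02", is_empty=False)
print(cache.resolve(key, b"", is_empty=True))
```

Including `tp` in the key keeps the two fields of an interlaced frame from overwriting each other's cached code-blocks.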

Media Type

General This RTP payload format is identified using the media type defined at , which is registered in accordance with and using the template of .

Definition

Type name

video

Subtype name

jpeg2000-scl

Required parameters

None

Optional parameters

pixel: Specifies the pixel format used by the video sequence. The parameter MUST be a URI-reference as specified in . If the parameter is a relative-ref as specified in , then it MUST be equal to one of the pixel formats specified in and the RTP header and payload MUST conform with the characteristics of that pixel format. If the parameter is not a relative-ref, the specification of the pixel format is left to the application that defined the URI. If the parameter is not specified, the pixel format is unspecified.
sample: Specifies the format of the samples in each component of the codestream. The parameter MUST be a URI-reference as specified in . If the parameter is a relative-ref as specified in , then it MUST be equal to one of the formats specified in and the stream MUST conform with the characteristics of that format. If the parameter is not a relative-ref, the specification of the sample format is left to the application that defined the URI. If the parameter is not specified, the sample format is unspecified.
width: Maximum width in pixels of each image. Integer between 0 and 4,294,967,295. The parameter MUST be a sequence of 1 or more digits. If the parameter is not specified, the maximum width is unspecified.
height: Maximum height in pixels of each image. Integer between 0 and 4,294,967,295. The parameter MUST be a sequence of 1 or more digits. If the parameter is not specified, the maximum height is unspecified.
signal: Specifies the sequence of image types. The parameter MUST be a URI-reference as specified in . If the parameter is a relative-ref as specified in , then it MUST be equal to one of the signal formats specified in and the image sequence MUST conform to that signal format. If the parameter is not a relative-ref, the specification of the signal format is left to the application that defined the URI. If the parameter is not specified, the stream consists of an arbitrary sequence of image types.
caps: The parameter contains a list of sets of constraints to which the stream conforms, with each set of constraints identified using an absolute-URI defined by an application. The parameter MUST conform to the uri-list syntax expressed using ABNF (): uri-list = absolute-URI *(";" absolute-URI) Each absolute-URI MUST NOT contain any ";" character. The application that defines the absolute-URI MUST associate it with a set of constraints to which the stream conforms. Such constraints can, for example, include the maximum height and width of images. If the parameter is not specified, constraints, beyond those specified in this document, are unspecified.
cache: The value of the parameter MUST be either false or true. If the parameter is true, the field C MAY be 0 or 1; otherwise the field C MUST be 0. If the parameter is not specified, then the parameter is equal to false.
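A few of the optional parameters above lend themselves to simple validation, sketched here in Python. The function names are hypothetical, the URI check is a loose approximation of absolute-URI (a scheme is present and there is no fragment) rather than a full grammar check, and the example URIs are placeholders.

```python
from urllib.parse import urlsplit

MAX_DIMENSION = 4_294_967_295


def parse_caps(value: str) -> list:
    """Split a caps value on ';' and loosely check each entry is an absolute URI."""
    uris = value.split(";")
    for uri in uris:
        if not urlsplit(uri).scheme or "#" in uri:
            raise ValueError(f"not an absolute-URI: {uri!r}")
    return uris


def parse_dimension(value: str) -> int:
    """Parse a width or height value: one or more digits, within the stated range."""
    if not (value.isascii() and value.isdigit()):
        raise ValueError("dimension must be a sequence of 1 or more digits")
    number = int(value)
    if number > MAX_DIMENSION:
        raise ValueError("dimension out of range")
    return number


def parse_cache(value=None) -> bool:
    """An absent cache parameter is equal to false."""
    if value is None:
        return False
    if value not in ("true", "false"):
        raise ValueError("cache must be either false or true")
    return value == "true"


print(parse_caps("urn:example:profile-a;https://example.com/profile-b"))
print(parse_dimension("1920"), parse_cache(), parse_cache("true"))
```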

Encoding considerations

This media type is framed and binary, see .

Security considerations

See .

Interoperability considerations

The RTP stream is a sequence of JPEG 2000 images. An implementation that conforms to the family of JPEG 2000 standards can decode and attempt to display each image.

Published specification

This document

Applications that use this media type

video streaming and communication

Person and email address to contact for further information

Pierre-Anthony Lemieux <pal@sandflow.com>

Intended usage

COMMON

Restrictions on Usage

This media type depends on RTP framing, and hence is only defined for use with RTP as specified at . Transport within other framing protocols is not defined at this time.

Author

Pierre-Anthony Lemieux

Change controller

IETF Audio/Video Transport Core Maintenance Working Group delegated from the IESG.

Mapping to the Session Description Protocol (SDP) The mapping of the payload format media type and its parameters to SDP, as specified in MUST be done according to .

IANA Considerations

This memo requests that IANA register the content type specified at .

Security considerations

RTP packets using the payload format specified in this document are subject to the security considerations discussed in , and in any applicable RTP profile such as , , , . However, as discusses, it is not an RTP payload format's responsibility to discuss or mandate what solutions are used to meet the basic security goals like confidentiality, integrity, and source authenticity for RTP in general. This responsibility lies with anyone using RTP in an application. They can find guidance on available security mechanisms and important considerations in . Applications SHOULD use one or more appropriate strong security mechanisms. The rest of this Security Considerations section discusses the security-impacting properties of the payload format itself.

This RTP payload format and its media decoder do not exhibit any significant non-uniformity in the receiver-side computational complexity for RTP Packet processing, and thus are unlikely to pose a denial-of-service threat due to the receipt of pathological data. Nor does the RTP payload format contain any active content.

Security considerations related to the JPEG 2000 codestream contained in the payload are discussed at .

References

Normative References

Recommendation ITU-T T.800, JPEG 2000 image coding system: Core coding system, ITU-T
Recommendation ITU-T T.801, JPEG 2000 image coding system: Extensions, ITU-T
Recommendation ITU-T T.814, JPEG 2000 image coding system: High-throughput JPEG 2000, ITU-T
Recommendation ITU-T H.273, Coding-independent code points for video signal type identification, ITU-T
JPEG 2000 image coding system: Interactivity tools, APIs and protocols, ITU-T

Informative References

Pixel formats

The following table defines pixel formats.

Defined pixel formats

NAME	SAMP	COMPS	TRANS	PRIMS	MAT	VFR	Mapping in
rgb444sdr	4:4:4	RGB	1	1	0	0, 1	RGB
rgb444wcg	4:4:4	RGB	1	9	0	0, 1	RGB
rgb444pq	4:4:4	RGB	16	9	0	0, 1	RGB
rgb444hlg	4:4:4	RGB	18	9	0	0, 1	RGB
ycbcr420sdr	4:2:0	YCbCr	1	1	1	0	YCbCr
ycbcr422sdr	4:2:2	YCbCr	1	1	1	0	YCbCr
ycbcr422wcg	4:2:2	YCbCr	1	9	9	0	YCbCr
ycbcr422pq	4:2:2	YCbCr	16	9	9	0	YCbCr
ycbcr422hlg	4:2:2	YCbCr	18	9	9	0	YCbCr

Each pixel format is characterized by the following:

NAME

Identifies the pixel format

COMPS

RGB: Each codestream contains exactly three components, associated with the R, G and B color channels, in order.
YCbCr: Each codestream contains exactly three components, associated with the Y, C_b and C_r color channels, in order.

SAMP

4:2:0: The C_b and C_r color channels are subsampled horizontally and vertically by 1/2.
4:2:2: The C_b and C_r color channels are subsampled horizontally by 1/2.
4:4:4: No color channels are sub-sampled.

TRANS

Identifies the transfer characteristics allowed by the pixel format, as defined at

PRIMS

Identifies the color primaries allowed by the pixel format, as defined at

MAT

Identifies the matrix coefficients allowed by the pixel format, as defined at

VFR

Identifies the values of the VideoFullRangeFlag allowed by the pixel format, as defined at
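
As an illustration only, the pixel format table can be transcribed into a lookup structure. The Python sketch below is not part of the payload format: the names `PixelFormat`, `PIXEL_FORMATS`, `is_allowed` and `subsampled_size` are hypothetical, and the size computation assumes a codestream with no image offset, so that subsampled dimensions are rounded up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PixelFormat:
    samp: str       # "4:4:4", "4:2:2" or "4:2:0"
    comps: str      # "RGB" or "YCbCr"
    trans: int      # allowed transfer characteristics
    prims: int      # allowed color primaries
    mat: int        # allowed matrix coefficients
    vfr: frozenset  # allowed VideoFullRangeFlag values


# Rows transcribed from the "Defined pixel formats" table.
PIXEL_FORMATS = {
    "rgb444sdr": PixelFormat("4:4:4", "RGB", 1, 1, 0, frozenset({0, 1})),
    "rgb444wcg": PixelFormat("4:4:4", "RGB", 1, 9, 0, frozenset({0, 1})),
    "rgb444pq": PixelFormat("4:4:4", "RGB", 16, 9, 0, frozenset({0, 1})),
    "rgb444hlg": PixelFormat("4:4:4", "RGB", 18, 9, 0, frozenset({0, 1})),
    "ycbcr420sdr": PixelFormat("4:2:0", "YCbCr", 1, 1, 1, frozenset({0})),
    "ycbcr422sdr": PixelFormat("4:2:2", "YCbCr", 1, 1, 1, frozenset({0})),
    "ycbcr422wcg": PixelFormat("4:2:2", "YCbCr", 1, 9, 9, frozenset({0})),
    "ycbcr422pq": PixelFormat("4:2:2", "YCbCr", 16, 9, 9, frozenset({0})),
    "ycbcr422hlg": PixelFormat("4:2:2", "YCbCr", 18, 9, 9, frozenset({0})),
}

# (horizontal, vertical) subsampling factors of the second and third
# components for each SAMP value.
_SUBSAMPLING = {"4:4:4": (1, 1), "4:2:2": (2, 1), "4:2:0": (2, 2)}


def is_allowed(name, trans, prims, mat, vfr):
    """Return True if the signaled values match the named pixel format."""
    fmt = PIXEL_FORMATS.get(name)
    if fmt is None:
        return False
    return (trans, prims, mat) == (fmt.trans, fmt.prims, fmt.mat) and vfr in fmt.vfr


def subsampled_size(name, width, height):
    """Size of the second and third components (assumes zero image offset)."""
    sx, sy = _SUBSAMPLING[PIXEL_FORMATS[name].samp]
    return (width + sx - 1) // sx, (height + sy - 1) // sy
```

For example, `is_allowed("ycbcr422pq", 16, 9, 9, 1)` is false because that pixel format only allows a VideoFullRangeFlag of 0.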

Signal formats

prog: The stream MUST only consist of a sequence of progressive frames.
psf: Progressive segmented frame (PsF) stream. The stream MUST only consist of an alternating sequence of first segment and second segment.
tff: Interlaced stream. The stream MUST only consist of an alternating sequence of first field and second field, where the first line of the first field is the first line of the frame.
bff: Interlaced stream. The stream MUST only consist of an alternating sequence of first field and second field, where the first line of the first field is the second line of the frame.
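
The constraints above can be sketched as a simple conformance check. This Python example is a hedged illustration, not a normative procedure: `conforms` and `frame_line` are hypothetical names, the unit labels are assumed to be supplied by the caller as strings, and the check verifies alternation only, without requiring that the sequence start with the first field or segment. The mapping of field lines beyond the first line assumes conventional line interleaving, which the definitions above do not state explicitly.

```python
# Unit kinds permitted for each signal format value.
_ALLOWED_UNITS = {
    "prog": ("frame",),
    "psf": ("first segment", "second segment"),
    "tff": ("first field", "second field"),
    "bff": ("first field", "second field"),
}


def conforms(signal, units):
    """Return True if the sequence of unit labels matches the signal format."""
    allowed = _ALLOWED_UNITS[signal]
    if any(unit not in allowed for unit in units):
        return False
    if len(allowed) == 1:
        # prog: every unit is a progressive frame.
        return True
    # psf, tff, bff: consecutive units must alternate.
    return all(a != b for a, b in zip(units, units[1:]))


def frame_line(signal, field, field_line):
    """Map a 0-based line of field 1 or 2 to a 0-based frame line.

    Assumes the two fields interleave line by line.
    """
    if signal not in ("tff", "bff"):
        raise ValueError("line mapping applies to interlaced streams only")
    if field not in (1, 2):
        raise ValueError("field must be 1 or 2")
    # tff: the first field starts on the first frame line (offset 0).
    # bff: the first field starts on the second frame line (offset 1).
    first_field_offset = 0 if signal == "tff" else 1
    offset = first_field_offset if field == 1 else 1 - first_field_offset
    return 2 * field_line + offset
```

With these assumptions, `frame_line("bff", 1, 0)` yields 1, that is, the first line of the first field is the second line of the frame.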

Sample formats

8: All components consist of unsigned 8-bit integer samples.
10: All components consist of unsigned 10-bit integer samples.
12: All components consist of unsigned 12-bit integer samples.
16: All components consist of unsigned 16-bit integer samples.
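
As a small worked example, each sample format implies a value range and a corresponding precision field in the codestream. The Python sketch below uses hypothetical names (`SAMPLE_FORMATS`, `sample_range`, `ssiz`). It relies on the SIZ marker segment convention of the JPEG 2000 core coding system, where the `Ssiz` value of an unsigned component is its bit depth minus one.

```python
# Bit depths permitted by the sample formats listed above.
SAMPLE_FORMATS = (8, 10, 12, 16)


def _check(depth):
    if depth not in SAMPLE_FORMATS:
        raise ValueError(f"unsupported sample format: {depth}")


def sample_range(depth):
    """Return the (minimum, maximum) value of an unsigned sample."""
    _check(depth)
    return 0, (1 << depth) - 1


def ssiz(depth):
    """Return the Ssiz value for an unsigned component of this depth."""
    _check(depth)
    # Most significant bit clear (unsigned); low 7 bits hold depth - 1.
    return depth - 1
```

For instance, the 10-bit sample format spans 0 to 1023 and corresponds to an `Ssiz` value of 9.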

Summary of Changes (Informative)

Introduction

This appendix summarizes substantive changes across revisions of this specification. This summary is informative and not intended to be exhaustive.

Changes from draft-ietf-avtcore-rtp-j2k-scl-00

Allowed multi-tile images in a single stream, in addition to allowing multi-tile images to be transmitted as multiple single-tile streams.
Fixed incorrect TRANS values.

Changes from draft-ietf-avtcore-rtp-j2k-scl-01

Removed signaling for the transmission of multi-tile images as multiple single-tile image streams (the tile media type parameter).

Changes from draft-ietf-avtcore-rtp-j2k-scl-02

Removed request for registration in the deprecated IANA registry for RTP Payload Format MIME types.