Omnidirectional Media Format (OMAF)
Ye-Kui Wang
Director, Technical Standards
December 10, 2017
A tutorial @ VCIP2017
Tutorial material available here: https://goo.gl/zFqqsC
2
Outline
Concepts: OMAF, 360° video, VR
OMAF: when, who, what, and architecture
OMAF coordinate system and key processing steps
Fisheye 360° video/image
Media storage and metadata signalling in ISOBMFF
Media encapsulation and metadata signalling in DASH
Media profiles and presentation profiles
HEVC omnidirectional video SEI messages
Viewport-dependent omnidirectional video processing
Acknowledgements
References
3
What is OMAF?
It is a systems standard
developed by MPEG
that defines a media format
that enables omnidirectional media applications,
focusing on 360° video, images, and audio, as well as
associated timed text.
4
What is 360° video?
[Figure: the unit sphere with yaw (α), pitch, and roll (γ) rotation axes; only rotational (3DOF) movement is supported]
The user's viewing perspective is from the center of the sphere looking outward towards the inside surface of the sphere.
Purely translational movement of the user would not result in different omnidirectional media being rendered to the user.
5
What is VR?
6
Achieving immersive VR is challenging
• Intuitive, minimal-latency interactions
• Precise, accurate on-device motion tracking
• Minimized system latency to remove perceptible lag
8
OMAF – when
• MPEG started looking at VR standardization and started the OMAF project in Oct. 2015
• Draft international standard (DIS): Apr. 2017 MPEG meeting output (N16824)
• Final draft international standard (FDIS): Oct. 2017 MPEG meeting output (N17235)
9
MPEG-I (ISO/IEC 23090): Coded Representation of Immersive Media
10
OMAF – who
11
OMAF – what
• Scope: 360° video, images, audio, and associated timed text, 3 DOF only
• Specifies
• A coordinate system
• that consists of a unit sphere and three coordinate axes, namely the x (back-to-front) axis, the y (lateral, side-to-side) axis, and the z (vertical, up) axis
12
OMAF architecture – projected omnidirectional video
[Figure: content production flows from acquisition (A/Ba/Bi) through image stitching, rotation, projection, and region-wise packing (D), to audio encoding (Ea), video encoding (Ev), and image encoding (Ei), then file/segment encapsulation (F/Fs) together with metadata, and delivery; the OMAF player reverses the chain via file/segment decapsulation (F’/F’s), audio/video/image decoding (E’a/E’v/E’i), audio rendering to loudspeakers/headphones, and image rendering to a display (A’a/A’i), with head/eye tracking feeding orientation/viewport metadata back into decapsulation and rendering]

Legend:
A: Real-world scene
B: Multiple-sensors-captured video or audio
D/D’: Projected/packed video
E/E’: Coded video or audio bitstream
F/F’: ISOBMFF file/segment
13
OMAF conformance points
[Figure: the OMAF conformance point is the output of the file decoder — an OMAF file parser extracts the video/image track(s) or image item from the file, a video or image decoder produces decoded picture(s), the decoded picture(s) are mapped onto the sphere, and, using orientation/viewport metadata and other media, the rendered viewport is presented in a synchronized and spatially-aligned manner]
14
OMAF coordinate system and key processing steps
15
OMAF coordinate system and key processing steps
The coordinate system
Projection and region-wise packing
− Concept
− Equirectangular projection
− Cubemap projection
− Region-wise packing and guard band
Basic OMAF video processing steps and order
Frame packing for stereoscopic 360o video/image
Rendering
16
The coordinate system

X: back-to-front
Y: lateral, side-to-side
Z: vertical, up

[Figure: the unit sphere with yaw rotation about Z, pitch about Y, and roll about X; a sphere location is identified by azimuth ϕ and elevation θ]

For a sphere location with sphere coordinates (ϕ, θ) on the local coordinate axes:

1. Find the 3D Cartesian coordinates (x1, y1, z1):

   x1 = cos θ cos ϕ,  y1 = cos θ sin ϕ,  z1 = sin θ

2. Apply the 3D Cartesian rotation to obtain the 3D Cartesian coordinates (x2, y2, z2) on the global coordinate axes:

   | x2 |   | cos β cos γ                       −cos β sin γ                       sin β         |   | x1 |
   | y2 | = | cos α sin γ + sin α sin β cos γ   cos α cos γ − sin α sin β sin γ    −sin α cos β  | × | y1 |
   | z2 |   | sin α sin γ − cos α sin β cos γ   sin α cos γ + cos α sin β sin γ    cos α cos β   |   | z1 |

   where α, β, and γ are the yaw, pitch, and roll rotation angles, respectively.

3. Convert to the sphere coordinates (ϕ′, θ′) on the global coordinate axes:

   ϕ′ = atan2(y2, x2) × 180° / π
   θ′ = sin⁻¹(z2) × 180° / π
18
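The three steps above can be sketched in Python (a minimal sketch; the function names are illustrative, input angles are in radians, and the rotation matrix is transcribed term-by-term from the slide):

```python
import math

def sphere_to_cartesian(azimuth, elevation):
    # Step 1: sphere coordinates (radians) -> 3D Cartesian unit vector.
    x1 = math.cos(elevation) * math.cos(azimuth)
    y1 = math.cos(elevation) * math.sin(azimuth)
    z1 = math.sin(elevation)
    return (x1, y1, z1)

def rotate(v, yaw, pitch, roll):
    # Step 2: apply the 3D Cartesian rotation (alpha = yaw, beta = pitch,
    # gamma = roll) to move from the local to the global coordinate axes.
    a, b, g = yaw, pitch, roll
    x1, y1, z1 = v
    x2 = math.cos(b)*math.cos(g)*x1 - math.cos(b)*math.sin(g)*y1 + math.sin(b)*z1
    y2 = ((math.cos(a)*math.sin(g) + math.sin(a)*math.sin(b)*math.cos(g))*x1
          + (math.cos(a)*math.cos(g) - math.sin(a)*math.sin(b)*math.sin(g))*y1
          - math.sin(a)*math.cos(b)*z1)
    z2 = ((math.sin(a)*math.sin(g) - math.cos(a)*math.sin(b)*math.cos(g))*x1
          + (math.sin(a)*math.cos(g) + math.cos(a)*math.sin(b)*math.sin(g))*y1
          + math.cos(a)*math.cos(b)*z1)
    return (x2, y2, z2)

def cartesian_to_sphere(v):
    # Step 3: back to sphere coordinates, in degrees.
    x2, y2, z2 = v
    return (math.degrees(math.atan2(y2, x2)), math.degrees(math.asin(z2)))
```

With zero rotation the round trip returns the original sphere location, and any rotation leaves the point on the unit sphere.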
Projection and region-wise packing: concept
• Projection and region-wise packing are the geometric operational processes used at the
content production side to generate 2D video pictures from the sphere signal
• And their inverse operations used in rendering
[Figure: projection and region-wise packing examples for monoscopic and stereoscopic content]
19
Projection
• Projection is a fundamental processing step in 360° video
• OMAF supports two projection types: equirectangular and cubemap
• Descriptions of more projection types can be found in JVET-H1004
20
Equirectangular projection (ERP)
[Figure: the sphere surface is mapped onto a single rectangular picture; azimuth ϕ and elevation θ vary linearly across the picture, with (ϕ, θ) = (0, 0) at the picture center]

Cubemap projection (CMP)
[Figure: the sphere is projected onto six square faces — PX Front, NX Back, PY Left, NY Right, PZ Top, NZ Bottom — arranged in a 3x2 layout, with some faces rotated to maximize face edge continuity]
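The ERP sample-position mapping can be sketched as follows (a sketch under the convention that azimuth decreases left-to-right and elevation decreases top-to-bottom across the picture, with the sphere point (0, 0) at the picture center; the function name is illustrative):

```python
def erp_sphere_to_sample(azimuth_deg, elevation_deg, pic_width, pic_height):
    # Map a sphere location (degrees) to a fractional (column, row) sample
    # position on the equirectangular picture.
    u = 0.5 - azimuth_deg / 360.0    # azimuth spans the full 360 degrees
    v = 0.5 - elevation_deg / 180.0  # elevation spans 180 degrees
    return (u * pic_width, v * pic_height)
```

For a 3840x1920 ERP picture, the sphere point (0, 0) lands at the picture center (1920, 960).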
22
Region-wise packing
Region-wise packing is an optional step after projection (content production side).
It enables manipulations (resize, reposition, rotation, and mirroring) of any rectangular region of the packed picture before encoding.
[Figure: a simple example]
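A toy sketch of the resize-and-reposition part of region-wise packing (nearest-neighbour resampling on a picture stored as a list of rows; transform_type 0, i.e., no rotation or mirroring, is assumed, and the dictionary keys mirror the RectRegionPacking field names):

```python
def pack_rect_region(proj, packed, r):
    # Copy the projected region into the packed region, resizing by
    # nearest-neighbour sampling.
    for py in range(r["packed_reg_height"]):
        for px in range(r["packed_reg_width"]):
            sx = r["proj_reg_left"] + px * r["proj_reg_width"] // r["packed_reg_width"]
            sy = r["proj_reg_top"] + py * r["proj_reg_height"] // r["packed_reg_height"]
            packed[r["packed_reg_top"] + py][r["packed_reg_left"] + px] = proj[sy][sx]
```

Downscaling a 4x4 projected region into a 2x2 packed region keeps every other sample.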
23
Guard band
Basically allows adding some additional pixels at geometric boundaries when generating the 2D pictures for encoding

Basic OMAF video processing steps and order – content production side
Stitching → Rotation → Projection → Frame packing → Region-wise packing → Encoding → Encapsulation
25
Basic OMAF video processing steps and order
- OMAF player side
Inverse of projection → Rotation → Rendering
26
Frame packing for stereoscopic 360° video/image
• OMAF supports the following three types of frame packing arrangement:
• Side-by-side
• Top-bottom
• Temporal interleaving
27
Side-by-side frame packing arrangement
[Figure: the interleaved colour component plane of a side-by-side packed decoded frame is rearranged into the sample planes of constituent frame 0 (left half) and constituent frame 1 (right half), each of which is then upconverted to the full resolution; source: H.265v2(14)_FD.4]
28
Top-bottom frame packing arrangement
[Figure: the interleaved colour component plane of a top-bottom packed decoded frame is rearranged into the sample planes of constituent frame 0 (top half) and constituent frame 1 (bottom half), each of which is then upconverted to the full resolution; source: H.265v2(14)_FD.7]
29
Temporal interleaving frame packing arrangement
[Figure: sequentially decoded frames 2N, 2N+1, 2N+2, 2N+3 carrying a temporal interleaving frame arrangement are decomposed in time into the colour component planes of constituent frames 0 (even positions) and constituent frames 1 (odd positions); source: H.265v2(14)_FD.9]
30
Rendering
• The rendering process typically involves generation of a viewport
• Using the rectilinear projection
[Figure: rectilinear projection of a sphere region onto a 2D viewport plane with local axes u and v, seen from the sphere center O]
• In implementations, the viewport can also be directly generated from the decoded picture
• Where the geometric processing steps like de-packing, inverse of projection, etc. are combined in an optimized manner
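A minimal sketch of the rectilinear step (assumptions: the viewport looks down the +X axis of the OMAF coordinate system, u and v are normalized viewport coordinates in [0, 1], and the function names and field-of-view parameters are illustrative):

```python
import math

def viewport_ray(u, v, hfov_deg, vfov_deg):
    # Unit-length ray direction for a viewport pixel under a pinhole
    # (rectilinear) camera model centred on the +X axis.
    y = (0.5 - u) * 2.0 * math.tan(math.radians(hfov_deg) / 2.0)  # +Y is left
    z = (0.5 - v) * 2.0 * math.tan(math.radians(vfov_deg) / 2.0)  # +Z is up
    n = math.sqrt(1.0 + y * y + z * z)
    return (1.0 / n, y / n, z / n)

def ray_to_sphere(ray):
    # Convert the ray to sphere coordinates (degrees) so the corresponding
    # sample of the projected picture can be fetched.
    x, y, z = ray
    return (math.degrees(math.atan2(y, x)), math.degrees(math.asin(z)))
```

The viewport center maps to the viewing direction itself; the left edge of a 90° horizontal field of view maps to azimuth 45°.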
31
Fisheye 360° video/image
32
Fisheye 360° video/image support in OMAF
• Fisheye is a special feature supported in OMAF
• It does not use projection or region-wise packing
33
OMAF architecture – fisheye omnidirectional video
[Figure: as in the projected-video architecture, but the stitching/rotation/projection/region-wise packing step is replaced by circular image mapping (D) on the content production side, and the OMAF player performs image stitching and rendering after decoding; here D/D’ denote fisheye video pictures, E/E’ coded video or audio bitstreams, and F/F’ ISOBMFF files/segments, with head/eye tracking feeding orientation/viewport metadata back into decapsulation and rendering]
34
Media storage and metadata signalling in ISOBMFF
35
File format basics
Why file formats?
ISOBMFF basics
Typical ISOBMFF box hierarchy
An example ISOBMFF file
36
Media file format basics
A video application always needs more than just the video bitstream.
37
Protocol stack of an HTTP adaptive streaming system
38
Why file formats?
A video application always needs more than just the video bitstream.
− Metadata, including timing information etc., to ease content exchange, editing, streaming, playback
operations like seeking, …
Lots of today’s video applications, e.g., all video streaming systems, are based on a file format.
One of the most widely used standard file formats is the ISO base media file format (ISOBMFF)
− ISO/IEC 14496-12
Each media codec typically has a codec-specific file format based on ISOBMFF, for carriage of
media coded using that codec in ISOBMFF
ISO/IEC 14496-15 includes
− AVC file format
− SVC file format
− MVC file format
− HEVC file format
− Layered HEVC file format
− File format for HEVC and layered HEVC tiled video
39
ISO base media file format (ISOBMFF) basics
Object-oriented files
Based on the data structure called box
− Box type, flags, version, length, box data
Separate media data and metadata
− Media data are the coded media (video, audio, …) data in access units or samples
− Metadata includes media type, codec, timestamps, sample size and location, random access
indications, …
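The box structure can be illustrated with a minimal top-level box walker (a sketch; it only reads the size/type headers and handles the 64-bit largesize and size-zero conventions):

```python
import struct

def walk_boxes(buf):
    """Yield (box_type, box_size, payload_offset) for each top-level box."""
    off = 0
    while off + 8 <= len(buf):
        size, box_type = struct.unpack_from(">I4s", buf, off)
        header = 8
        if size == 1:            # actual size is a 64-bit field after the type
            size = struct.unpack_from(">Q", buf, off + 8)[0]
            header = 16
        elif size == 0:          # box extends to the end of the file
            size = len(buf) - off
        yield box_type.decode("ascii"), size, off + header
        off += size
```

Running it over a buffer containing an 'ftyp' box followed by an 'mdat' box yields the two box types with their sizes and payload offsets.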
40
ISOBMFF – typical box hierarchy
41
ISOBMFF – some boxes
42
ISOBMFF – an example file
43
OMAF signalling in ISOBMFF
General rules for signalling of important information
Overall omnidirectional video indication
Signalling of projection format
Signalling of region-wise packing and guard bands
Signalling of rotation
Signalling of frame packing
Signalling of content coverage
Region-wise quality ranking
Signalling of fisheye video parameters
Storage and signalling of omnidirectional images
Storage and signalling of timed text
OMAF timed metadata
44
General rules for signalling of important information
Important video information: information that may be used for content selection,
e.g., selection of a video track or a part thereof for consumption
Important video information should be signalled in a manner that is easily
accessible, including a location that is easily found (e.g., in sample entry) and
easily parsed (e.g., using fixed length coding instead of entropy coding)
All pieces of important video information should be easily aggregated to be exposed to higher-level systems, e.g., to be aggregated and included in the MIME type ‘codecs’ parameter, for easy access and inclusion in a DASH media presentation description (MPD)
− MPD is also referred to as manifest
45
Overall omnidirectional video indication
To indicate whether the video in a file track is 360° video or traditional video
− This is a piece of important information
− By using a transformed sample entry type (‘resv’) and the restricted scheme type in the sample entry
− Advantage of this approach
− Legacy file parsers would just ignore a 360° video track instead of trying to request, download, or play it, which would result in a bad user experience
If 360° video, whether projected video or fisheye video, by the restricted scheme type
− ‘podv’: projected
− If yes, what type of projected video?
− ‘erpv’: Equirectangular projected video, with essentially no region-wise packing, and other constraints
− ‘ercm’: Packed equirectangular or cubemap projected video, with no restriction on rectangular region-wise packing, and other constraints
− ‘fodv’: fisheye
46
Signalling of projection format
To indicate which projection format is used
− This is also a piece of important information
By using the projection format box (‘prfr’) in the sample entry
The box contains the following structure:
aligned(8) class ProjectionFormatStruct() {
bit(3) reserved = 0;
unsigned int(5) projection_type;
}
projection_type | Omnidirectional projection
0 | Equirectangular projection
1 | Cubemap projection
47
Signalling of region-wise packing and guard bands
To indicate whether region-wise packing and/or guard bands are used
− Another piece of important information
By using the region-wise packing box (‘rwpk’) in the sample entry
The box contains the following structure:
− RegionWisePackingStruct()
48
Signalling of region-wise packing and guard bands
aligned(8) class RegionWisePackingStruct() {
    unsigned int(1) constituent_picture_matching_flag;
    bit(7) reserved = 0;
    unsigned int(8) num_regions;
    unsigned int(32) proj_picture_width;
    unsigned int(32) proj_picture_height;
    unsigned int(16) packed_picture_width;
    unsigned int(16) packed_picture_height;
    for (i = 0; i < num_regions; i++) {
        bit(3) reserved = 0;
        unsigned int(1) guard_band_flag[i];
        unsigned int(4) packing_type[i];
        if (packing_type[i] == 0) {
            RectRegionPacking(i);
            if (guard_band_flag[i])
                GuardBand(i);
        }
    }
}

aligned(8) class RectRegionPacking(i) {
    unsigned int(32) proj_reg_width[i];
    unsigned int(32) proj_reg_height[i];
    unsigned int(32) proj_reg_top[i];
    unsigned int(32) proj_reg_left[i];
    unsigned int(3) transform_type[i];
    bit(5) reserved = 0;
    unsigned int(16) packed_reg_width[i];
    unsigned int(16) packed_reg_height[i];
    unsigned int(16) packed_reg_top[i];
    unsigned int(16) packed_reg_left[i];
}

aligned(8) class GuardBand(i) {
    unsigned int(8) left_gb_width[i];
    unsigned int(8) right_gb_width[i];
    unsigned int(8) top_gb_height[i];
    unsigned int(8) bottom_gb_height[i];
    unsigned int(1) gb_not_used_for_pred_flag[i];
    for (j = 0; j < 4; j++)
        unsigned int(3) gb_type[i][j];
    bit(3) reserved = 0;
}
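For illustration, the fixed-length head of RegionWisePackingStruct can be decoded with a few struct reads (a sketch; the function name is illustrative and per-region RectRegionPacking/GuardBand parsing is omitted):

```python
import struct

def parse_rwpk_head(buf, off=0):
    # First byte: 1-bit constituent_picture_matching_flag + 7 reserved bits.
    flags, num_regions = struct.unpack_from(">BB", buf, off)
    proj_w, proj_h = struct.unpack_from(">II", buf, off + 2)
    packed_w, packed_h = struct.unpack_from(">HH", buf, off + 10)
    return {
        "constituent_picture_matching_flag": flags >> 7,
        "num_regions": num_regions,
        "proj_picture_width": proj_w,
        "proj_picture_height": proj_h,
        "packed_picture_width": packed_w,
        "packed_picture_height": packed_h,
    }
```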
49
Signalling of rotation
To indicate whether and how much rotation is used
− Another piece of important information
By using the rotation box (‘rotn’) in the sample entry
The box contains the following structure:
50
Signalling of frame packing
To indicate whether frame packing is in use, and if yes, what types of frame packing arrangement
− Another piece of important information
By using the stereo video box (‘stvi’) in the sample entry (existing, defined in ISOBMFF, with new
amendment)
The box has the following syntax:
51
Signalling of content coverage
To indicate which area(s) on the sphere are covered by the content
− Another piece of important information, which can be used, e.g., by the OMAF player to choose the right
track(s) that cover the viewport the user is looking at.
By using the coverage information box (‘covi’) in the sample entry
The box contains the following structure:
aligned(8) class ContentCoverageStruct() {
    unsigned int(8) coverage_shape_type;
    unsigned int(8) num_regions;
    unsigned int(1) view_idc_presence_flag;
    if (view_idc_presence_flag == 0) {
        unsigned int(2) default_view_idc;
        bit(5) reserved = 0;
    } else
        bit(7) reserved = 0;
    for (i = 0; i < num_regions; i++) {
        if (view_idc_presence_flag == 1) {
            unsigned int(2) view_idc[i];
            bit(6) reserved = 0;
        }
        SphereRegionStruct(1);
    }
}

aligned(8) SphereRegionStruct(range_included_flag) {
    signed int(32) center_azimuth;
    signed int(32) center_elevation;
    signed int(32) center_tilt;
    if (range_included_flag) {
        unsigned int(32) azimuth_range;
        unsigned int(32) elevation_range;
    }
    unsigned int(1) interpolate;
    bit(7) reserved = 0;
}
52
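For example, an OMAF player could use the signalled coverage to test whether the current viewing direction falls inside a region (a simplified sketch for the shape bounded by two azimuth circles and two elevation circles, with center_tilt assumed 0; all angles in degrees):

```python
def in_sphere_region(az, el, center_azimuth, center_elevation,
                     azimuth_range, elevation_range):
    # Wrap the azimuth difference into (-180, 180] before comparing.
    daz = (az - center_azimuth + 180.0) % 360.0 - 180.0
    return (abs(daz) <= azimuth_range / 2.0
            and abs(el - center_elevation) <= elevation_range / 2.0)
```

The wrap-around handling matters near azimuth ±180°, where a region centred at −170° also covers +170°.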
Sphere region shape types
53
Signalling of region-wise quality ranking
To indicate relative quality of regions on the sphere or on the 2D picture domain
− Another piece of important information
− Can be used by the OMAF player to choose the track(s)
− That cover the viewport the user is currently looking at, and
− That have the highest relative quality among viewports of the same projected picture
− This is particularly useful in viewport-dependent 360° video streaming based on multiple alternative representations
− Multiple streams are encoded; within each, one particular viewport is of high quality and all other areas are of lower quality
− In streaming, when the user turns their head to a different viewport, stream switching occurs
− The goal is to minimize bandwidth and maximize the quality of the viewport being viewed
By using the sphere region quality ranking box (‘srqr’) or the 2D region quality ranking box
(‘2dqr’) in the sample entry
54
Sphere region quality ranking box
aligned(8) class SphereRegionQualityRankingBox extends FullBox('srqr', 0, 0) {
unsigned int(8) region_definition_type;
unsigned int(8) num_regions;
unsigned int(1) remaining_area_flag;
unsigned int(1) view_idc_presence_flag;
unsigned int(1) quality_ranking_local_flag;
unsigned int(4) quality_type;
bit(1) reserved = 0;
if (view_idc_presence_flag == 0) {
unsigned int(2) default_view_idc;
bit(6) reserved = 0;
}
for (i = 0; i < num_regions; i++) {
unsigned int(8) quality_ranking;
if (view_idc_presence_flag == 1) {
unsigned int(2) view_idc;
bit(6) reserved = 0;
}
if (quality_type == 1) {
unsigned int(16) orig_width;
unsigned int(16) orig_height;
}
if ((i < (num_regions − 1)) || (remaining_area_flag == 0))
SphereRegionStruct(1);
}
}
55
2D region quality ranking box
aligned(8) class 2DRegionQualityRankingBox extends FullBox('2dqr', 0, 0) {
unsigned int(8) num_regions;
unsigned int(1) remaining_area_flag;
unsigned int(1) view_idc_presence_flag;
unsigned int(1) quality_ranking_local_flag;
unsigned int(4) quality_type;
bit(1) reserved = 0;
if (view_idc_presence_flag == 0) {
unsigned int(2) default_view_idc;
bit(6) reserved = 0;
}
for (i = 0; i < num_regions; i++) {
unsigned int(8) quality_ranking;
if (view_idc_presence_flag == 1) {
unsigned int(2) view_idc;
bit(6) reserved = 0;
}
if (quality_type == 1) {
unsigned int(16) orig_width;
unsigned int(16) orig_height;
}
if ((i < (num_regions - 1)) || (remaining_area_flag == 0)) {
unsigned int(16) left_offset;
unsigned int(16) top_offset;
unsigned int(16) region_width;
unsigned int(16) region_height;
}
}
}
56
Signalling of fisheye video parameters
To signal fisheye video parameters that can be used to select the desired track(s) and for
rendering
− Another piece of important information
By using the fisheye omnidirectional video box (‘fodv’) in the sample entry
The box contains the following structures:
− Mandatory: FisheyeVideoEssentialInfoStruct(), essential parameters for enabling stitching and rendering at
the OMAF player
− Optional: FisheyeVideoSupplementalInfoStruct(), supplemental parameters for enhanced rendering and
delivery
− They signal
− View dimension information
− Region information of circular images in the coded picture
− Field of view and camera parameters of fisheye lens
− Lens distortion correction (LDC) parameters with local variation of FOV
− Lens shading compensation (LSC) parameters with RGB gains
− Deadzone information
57
Storage and signalling of omnidirectional images
Omnidirectional images are stored in a file as image items per ISO/IEC 23008-12 (High Efficiency Image File Format, HEIF)
Similar information as for omnidirectional video is stored in item properties
− Projection format item property (‘prfr’)
− Region-wise packing item property (‘rwpk’)
− Rotation item property (‘rotn’)
− Frame packing item property (‘stvi’)
− Essential fisheye image item property (‘fovi’)
− Supplemental fisheye image item property (‘fvsi’)
− Coverage information item property (‘covi’)
− Initial viewing orientation item property (‘iivo’)
58
Storage and signalling of timed text
Timed text is used for providing subtitles and closed captions for omnidirectional
video.
In OMAF, the timed text may be either
− Fixed-positioned: not moving as the user’s viewing orientation moves, or
− Always-visible: always visible to the user irrespective of the user’s viewing orientation
Fixed-positioned timed text should be used for text that is specific to a particular object; when that object is not visible, the timed text is not rendered
Always-visible timed text should be used for text that is global to the entire omnidirectional video
59
Storage and signalling of timed text
Two timed text format options
− TTML profiles for Internet media subtitles and captions 1.0
(IMSC1)
− WebVTT
A timed text configuration box (‘otcf’) was designed for both
formats. It contains
− The mode: fixed-positioned or always-visible
− Information for determining the position of the rendering plane in
the 3D space
− A sphere location, the line segment between which and the sphere
center is orthogonal to the rendering plane
− Depth of the rendering plane relative to the sphere center
The size of the rectangle on the rendering plane for the
timed text is signalled as part of the IMSC1/WebVTT track
60
OMAF timed metadata
Timed metadata are contained in their own tracks (separate from the media
tracks)
A timed metadata track is linked to media tracks by a 'cdsc' track reference
OMAF includes the designs of three types of timed metadata tracks
− Initial viewing orientation
− Recommended viewport
− Timed text sphere location metadata
They are about sphere regions or a sphere location (a point on the sphere)
They all use the same sample entry syntax and the same base sample syntax
− The syntaxes were designed in a manner that they can be used to efficiently represent both
sphere regions and sphere locations
61
OMAF timed metadata sample entry and sample syntaxes
64
Recommended viewport
Identified by the sample entry type ‘rcvp’
Sphere region shape type 0
Two specified recommended viewport types
Recommended viewport type | Description
0 | A recommended viewport per the director's cut, i.e., a viewport suggested according to the creative intent of the content author or content provider
1 | A recommended viewport selected based on measurements of viewing statistics
2..239 | Reserved (for use by future extensions of ISO/IEC 23090-2)
65
Timed text sphere location metadata
Signals the following information (which is also
signalled in the timed text configuration box) in
a timely dynamic fashion:
− Information for determining the position of the
rendering plane in the 3D space
− A sphere location, the line segment between which and
the sphere center is orthogonal to the rendering plane
− Depth of the rendering plane relative to the sphere
center
66
Media encapsulation and metadata
signalling in DASH
DASH basics
OMAF DASH delivery architecture and procedure
OMAF signalling in DASH
67
DASH basics
A simple example DASH streaming procedure
Why adaptive streaming over HTTP?
Scalability and cost: leveraging HTTP caches
DASH data model
Example DASH Representation and Segments for ISOBMFF
68
A simple example DASH streaming procedure
1) The client gets the MPD.
2) The client requests the desired representation(s), one segment (or a part thereof) at a time,
• based on information in the MPD and the client's local information, e.g., network bandwidth, decoding/display capabilities, and/or user preference.
69
Why adaptive streaming over HTTP?
Basic Approach: Adapt Video to Web rather than Changing the Web
70
Scalability and cost: leveraging HTTP caches
71
DASH Data Model
• MPD: provides information to a client on where and when to find the data that composes the A/V experience
• HTTP-URLs and MIME types: provide the ability to offer a service on the cloud and HTTP-CDNs
• Periods: provide a service provider the ability to combine/splice content with different properties into a single media presentation
• Adaptation Sets: enable the client/user selection of media content components based on user preferences, user interaction, device profiles and capabilities, using conditions or other metadata
• Representations: provide the ability to offer the same content with different encodings (bitrate, resolution, codecs)
• Descriptors: provide extensible syntax and semantics for describing Representation and Adaptation Set properties
• Segments and Subsegments: provide the ability to access content in small pieces and do proper scheduling of access
• Playlist, Templates, Segment Index: provide the ability for efficient signaling and deployment-optimized addressing
72
DASH Data Model
[Figure: an example MPD with three Periods (id 1 starting at 0 s, id 2 at 100 s, id 3 at 300 s); Period 2 contains Adaptation Set 0 (Turkish subtitles) and Adaptation Set 1 (video, BaseURL=http://abr.rocks.com/) with Representation 1 at 500 Kbps, Representation 2 at 1 Mbps, and Representation 3 at 2 Mbps/720p; for Representation 3, segment access uses the initialization segment http://abr.rocks.com/3/0.mp4 and 10 s media segments (start = 0 s, 10 s, …) addressed via the template 3/$Number$.mp4, e.g., http://abr.rocks.com/3/1.mp4 and http://abr.rocks.com/3/2.mp4]

Media delivery:
• Splicing of arbitrary content, e.g., ad insertion
• Selection of components/tracks based on properties
• Selecting/switching of Representation based on bandwidth, etc.
• Well-defined media format, i.e., ISO BMFF or MPEG-2 TS
• Chunks with unique addresses and associated timing
73
Example DASH Representation and Segments for ISOBMFF
[Figure: a Representation consists of an Initialization Segment (ftyp, moov) followed by Media Segments, each containing one or more moof/mdat pairs]
74
OMAF DASH delivery architecture and procedure
[Figure: on the content production side, DASH MPD generation (G) produces the MPD alongside the initialization and media segments (Fs) placed on a DASH server for DASH delivery; the OMAF player performs DASH MPD and segment reception (F’s), driven by head/eye tracking that supplies orientation/viewport metadata]

Fs/F’s: Initialization and media segments
G: MPD
− It additionally includes OMAF-specific metadata, such as information on projection and region-wise packing

Basic OMAF DASH streaming procedure
1) The client gets the MPD.
2) The client obtains the current viewing orientation and gets the estimated bandwidth.
3) The client chooses the Adaptation Set(s) and the Representation(s), and requests the (Sub)Segments to match the client’s capabilities, incl. OMAF-specific capabilities, and to maximize the quality, under the network bandwidth constraints, for the current viewing orientation.
4) Repeat steps 2 and 3.
75
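Step 3 of the procedure can be sketched as a pure selection function (hypothetical representation metadata: each Representation is a dict carrying its bitrate and the sphere region where its quality is highest, as could be derived from an SRQR descriptor; all keys are illustrative):

```python
def choose_representation(representations, view_az, view_el, bandwidth_bps):
    def covers(r):
        # Does the representation's high-quality region contain the viewport center?
        daz = (view_az - r["hq_center_az"] + 180.0) % 360.0 - 180.0
        return (abs(daz) <= r["hq_az_range"] / 2.0
                and abs(view_el - r["hq_center_el"]) <= r["hq_el_range"] / 2.0)

    affordable = [r for r in representations if r["bitrate"] <= bandwidth_bps]
    matching = [r for r in affordable if covers(r)]
    if matching:                  # best quality for the current viewport
        return max(matching, key=lambda r: r["bitrate"])
    if affordable:                # fall back to any stream that fits
        return max(affordable, key=lambda r: r["bitrate"])
    return min(representations, key=lambda r: r["bitrate"])
```

When the bandwidth drops, the function falls back to a lower-bitrate Representation covering the same viewport.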
OMAF signalling in DASH
OMAF-specific information carried on file format level and needed for content selection
is also carried in the MPD
− Thus either the same or a subset of that in file format
By using the newly defined OMAF-specific DASH MPD descriptors
− All under the URN "urn:mpeg:mpegI:omaf:2017"
− Projection format (PF) descriptor
− Region-wise packing (RWPK) descriptor
− Content coverage (CC) descriptor
− Spherical region-wise quality ranking (SRQR) descriptor
− 2D region-wise quality ranking (2DQR) descriptor
− Fisheye omnidirectional video (FOMV) descriptor
The frame packing information is signalled using the existing DASH FramePacking
element.
76
Media profiles and presentation profiles
77
Media profiles
A media profile for timed media is defined as requirements and constraints for a set of
one or more ISOBMFF tracks of a single media type.
The conformance of a set of one or more ISOBMFF tracks to a media profile is specified
as a combination of:
− Specification of which sample entry type(s) are allowed, and which constraints and extensions are
required in addition to those imposed by the sample entry type(s).
− Constraints on the samples of the tracks, typically expressed as constraints on the elementary
stream contained within the samples of the tracks.
A media profile for static media is defined as requirements and constraints for a set of
one or more ISOBMFF items of a single media type.
The conformance of a set of one or more ISOBMFF items to a media profile is specified
as a combination of:
− Specification of which item type(s) are allowed, and which constraints and extensions are required
in addition to those imposed by the item type(s).
− Constraints on the content of the items, typically expressed as constraints on the elementary
stream contained within the items.
78
Presentation profiles
A presentation profile is defined as requirements and constraints for an
ISOBMFF file containing tracks or items of any number of media types.
A specification of a presentation profile should refer to the specified media
profiles and may include additional requirements or constraints.
A file conforming to a presentation profile typically provides an omnidirectional
audio-visual experience.
79
OMAF specifies 9 media profiles
3 video profiles
− HEVC-based viewport-independent OMAF video profile
− HEVC-based viewport-dependent OMAF video profile
− AVC-based viewport-dependent OMAF video profile
2 audio profiles
− OMAF 3D audio baseline profile
− OMAF 2D audio legacy profile
2 image profiles
− OMAF HEVC image profile
− OMAF legacy image profile
2 timed text profiles
− OMAF IMSC1 timed text profile
− OMAF WebVTT timed text profile
80
OMAF video media profiles
Media Profile | Codec | Profile | Level | Required Scheme Types | Brand
HEVC-based viewport-independent OMAF video profile | HEVC | Main 10 | 5.1 | podv and erpv | hevi
HEVC-based viewport-dependent OMAF video profile | HEVC | Main 10 | 5.1 | podv and at least one of erpv and ercm | hevd
AVC-based viewport-dependent OMAF video profile | AVC | Progressive High | 5.1 | podv and at least one of erpv and ercm | avde

Note that HEVC Level 5.1 supports up to 3840x2160 @ 64 fps and 4096x2160 @ 60 fps, and the Main 10 profile does not exclude support of HDR/WCG video.

Key differences between the two HEVC-based OMAF video media profiles:
1) The viewport-dependent profile supports unconstrained region-wise packing while the other does not.
2) The viewport-dependent profile supports file format extractors to get a conforming HEVC bitstream when tile-based streaming is used (while the other does not).
81
OMAF audio media profiles
Media Profile | Codec | Profile | Level | Max Sampling Rate | 3D Metadata | Brand
OMAF 3D audio baseline profile | MPEG-H Audio | Low Complexity | 1, 2 or 3 | 48 kHz | included in codec | oabl
OMAF 2D audio legacy profile | AAC | HE-AACv2 | 4 | 48 kHz | no 3D metadata | oa2d
82
OMAF image media profiles
83
OMAF timed text media profiles
84
OMAF specifies 2 presentation profiles
OMAF viewport-independent baseline presentation profile
− File brand: ‘ovdp’
− Video: At least one video track shall conform to the HEVC-based viewport-independent OMAF video profile
− Audio: At least one audio track shall conform to the OMAF 3D audio baseline profile
OMAF viewport-dependent baseline presentation profile
− File brand: ‘ompp’
− Video: At least one video track shall conform to the HEVC-based viewport-dependent OMAF video profile
− Audio: At least one audio track shall conform to the OMAF 3D audio baseline profile
85
HEVC omnidirectional video SEI messages
86
HEVC omnidirectional video SEI messages
Both of the OMAF HEVC-based video media profiles mandate the presence of SEI
messages for signalling of projection, region-wise packing, etc.
This is to enable OMAF player implementations that rely on elementary-stream level
signalling for rendering of omnidirectional video.
The following omnidirectional video SEI messages have been recently specified for
HEVC (see JCTVC-AC1005):
− Equirectangular projection SEI message
− Cubemap projection SEI message
− Sphere rotation SEI message
− Region-wise packing SEI message
− Omnidirectional viewport SEI message
These SEI messages and the corresponding OMAF signalling are basically aligned with
each other.
These SEI messages are expected to be ported to the AVC/H.264 specification soon.
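Since these profiles rely on elementary-stream level signalling, a player first has to identify which SEI message it is looking at. HEVC codes an SEI payloadType as a run of 0xFF bytes plus one final byte. Below is a minimal sketch of that decoding; the value 155 for the region-wise packing SEI is an assumption about the payloadType assignment in JCTVC-AC1005, not something stated on this slide:

```python
def parse_sei_payload_type(rbsp: bytes) -> int:
    """Decode the payloadType of the first SEI message in an SEI RBSP.
    HEVC codes it as zero or more 0xFF bytes (each adding 255)
    followed by one final byte."""
    pt, i = 0, 0
    while rbsp[i] == 0xFF:
        pt += 255
        i += 1
    return pt + rbsp[i]

# payloadType 155 (assumed value for the region-wise packing SEI) is
# below 255, so it is coded as the single byte 0x9B:
print(parse_sei_payload_type(bytes([155])))        # 155
# A type >= 255 uses the 0xFF-prefix coding, e.g. 300 = 255 + 45:
print(parse_sei_payload_type(bytes([0xFF, 45])))   # 300
```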
87
Equirectangular projection SEI message
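For intuition about what this SEI message signals, a sketch of the equirectangular sample-to-sphere mapping used by OMAF (sample centres at (i+0.5, j+0.5), azimuth decreasing left to right, elevation decreasing top to bottom) can be written as follows; treat it as an illustration of the mapping, not the normative derivation:

```python
def erp_sample_to_sphere(i, j, width, height):
    """Map the centre of ERP luma sample (i, j) in a width x height
    picture to sphere coordinates (azimuth, elevation) in degrees."""
    u = (i + 0.5) / width
    v = (j + 0.5) / height
    azimuth = (0.5 - u) * 360.0     # range (-180, 180)
    elevation = (0.5 - v) * 180.0   # range (-90, 90)
    return azimuth, elevation

# Top-left sample of a tiny 4x2 ERP picture:
print(erp_sample_to_sphere(0, 0, 4, 2))   # (135.0, 45.0)
```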
88
Cubemap projection SEI message
89
Sphere rotation SEI message
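The sphere rotation SEI message carries yaw, pitch, and roll angles: yaw about the z axis, pitch about the y axis, and roll about the x axis. A sketch of building the corresponding rotation matrix is below; the composition order R = Rz(yaw) · Ry(pitch) · Rx(roll) is one common convention and is an assumption here, so check the normative text before relying on it:

```python
import math

def rotation_matrix(yaw, pitch, roll):
    """Return the 3x3 matrix Rz(yaw) @ Ry(pitch) @ Rx(roll),
    angles in degrees."""
    a, b, c = (math.radians(v) for v in (yaw, pitch, roll))
    rz = [[math.cos(a), -math.sin(a), 0],
          [math.sin(a),  math.cos(a), 0],
          [0, 0, 1]]
    ry = [[math.cos(b), 0, math.sin(b)],
          [0, 1, 0],
          [-math.sin(b), 0, math.cos(b)]]
    rx = [[1, 0, 0],
          [0, math.cos(c), -math.sin(c)],
          [0, math.sin(c),  math.cos(c)]]

    def matmul(m, n):
        return [[sum(m[i][k] * n[k][j] for k in range(3))
                 for j in range(3)] for i in range(3)]

    return matmul(rz, matmul(ry, rx))

# A 90-degree yaw maps the x axis (front) onto the y axis:
R = rotation_matrix(90, 0, 0)
```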
90
Region-wise packing SEI message
regionwise_packing( payloadSize ) {                       Descriptor
  rwp_cancel_flag                                         u(1)
  if( !rwp_cancel_flag ) {
    rwp_persistence_flag                                  u(1)
    constituent_picture_matching_flag                     u(1)
    rwp_reserved_zero_5bits                               u(5)
    num_packed_regions                                    u(8)
    proj_picture_width                                    u(32)
    proj_picture_height                                   u(32)
    packed_picture_width                                  u(16)
    packed_picture_height                                 u(16)
    for( i = 0; i < num_packed_regions; i++ ) {
      rwp_reserved_zero_4bits[ i ]                        u(4)
      transform_type[ i ]                                 u(3)
      guard_band_flag[ i ]                                u(1)
      proj_region_width[ i ]                              u(32)
      proj_region_height[ i ]                             u(32)
      proj_region_top[ i ]                                u(32)
      proj_region_left[ i ]                               u(32)
      packed_region_width[ i ]                            u(16)
      packed_region_height[ i ]                           u(16)
      packed_region_top[ i ]                              u(16)
      packed_region_left[ i ]                             u(16)
      if( guard_band_flag[ i ] ) {
        left_gb_width[ i ]                                u(8)
        right_gb_width[ i ]                               u(8)
        top_gb_height[ i ]                                u(8)
        bottom_gb_height[ i ]                             u(8)
        gb_not_used_for_pred_flag[ i ]                    u(1)
        for( j = 0; j < 4; j++ )
          gb_type[ i ][ j ]                               u(3)
        rwp_gb_reserved_zero_3bits[ i ]                   u(3)
      }
    }
  }
}
91
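For transform_type 0 (no rotation or mirroring), the region fields above define a simple linear resampling between the packed and projected pictures. A sketch of mapping a packed-region sample back to the projected picture is below; the dict-based region record and its values are illustrative, not from the specification, and other transform_type values would apply rotation/mirroring first:

```python
def packed_to_projected(x, y, r):
    """Map sample (x, y) of a packed region back to the projected
    picture, assuming transform_type 0 (no rotation/mirroring).
    `r` holds the region fields from the regionwise_packing syntax."""
    dx = (x - r["packed_region_left"]) * r["proj_region_width"] / r["packed_region_width"]
    dy = (y - r["packed_region_top"]) * r["proj_region_height"] / r["packed_region_height"]
    return r["proj_region_left"] + dx, r["proj_region_top"] + dy

# Hypothetical region: a 960x960 packed area downscaled from a
# 1920x1920 area of the projected picture.
region = {
    "packed_region_left": 0, "packed_region_top": 0,
    "packed_region_width": 960, "packed_region_height": 960,
    "proj_region_left": 1920, "proj_region_top": 0,
    "proj_region_width": 1920, "proj_region_height": 1920,
}
print(packed_to_projected(480, 480, region))   # (2880.0, 960.0)
```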
Omnidirectional viewport SEI message
omni_viewport( payloadSize ) {                            Descriptor
  omni_viewport_id                                        u(10)
  omni_viewport_cancel_flag                               u(1)
  if( !omni_viewport_cancel_flag ) {
    omni_viewport_persistence_flag                        u(1)
    omni_viewport_cnt_minus1                              u(4)
    for( i = 0; i <= omni_viewport_cnt_minus1; i++ ) {
      omni_viewport_azimuth_centre[ i ]                   i(32)
      omni_viewport_elevation_centre[ i ]                 i(32)
      omni_viewport_tilt_centre[ i ]                      i(32)
      omni_viewport_hor_range[ i ]                        u(32)
      omni_viewport_ver_range[ i ]                        u(32)
    }
  }
}
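The i(32) centre fields above are fixed-point values; to my understanding they are coded in units of 2^-16 degrees (an assumption worth verifying against JCTVC-AC1005). Under that assumption, converting them to degrees is a one-line scaling:

```python
def viewport_degrees(azimuth_q16, elevation_q16, tilt_q16):
    """Convert the i(32) viewport centre fields to degrees, assuming
    they are coded in units of 2^-16 degrees."""
    scale = 1.0 / (1 << 16)
    return (azimuth_q16 * scale, elevation_q16 * scale, tilt_q16 * scale)

# e.g. an azimuth of 90 degrees would be coded as 90 * 65536 = 5898240
print(viewport_degrees(5898240, 0, 0))   # (90.0, 0.0, 0.0)
```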
92
Viewport-dependent omnidirectional video processing
93
Viewport-dependent omnidirectional video processing
Multiple approaches documented in the informative Annex D of OMAF
− To tackle the bandwidth and processing complexity challenges
− To utilize the fact that only a part of entire encoded sphere video is rendered at any moment
− The approaches are enabled by the video media profiles
Viewport-dependent omnidirectional video processing approaches
(Some of these approaches are not documented in the final version of the OMAF specification, and some approaches documented in the final version are not included here)
− The conventional approach
− Region-wise quality ranked encoding of omnidirectional content
− Merging of HEVC MCTS-based sub-picture tracks of the same resolution
− Merging of HEVC MCTS-based sub-picture tracks of different resolutions with multiple decoders
− Merging of HEVC MCTS-based sub-picture tracks of different resolutions with one decoder
− SHVC with MCTS-based enhancement layer
− Simulcast with MCTS-based HEVC high-resolution representation
94
360° video encoding and decoding – conventional
95
Region-wise quality ranked encoding of omnidirectional content
Multiple coded single-layer bitstreams are stored at a server in different tracks.
Each bitstream contains the whole omnidirectional video.
Each bitstream has a different high quality encoded region.
[Figure: five encodings of the same content, each with a different high-quality (HQ) region]
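The player then streams whichever bitstream's HQ region best matches the current viewing orientation. A minimal sketch of that selection logic follows; the track names and the layout of five HQ regions spaced 72° apart in azimuth are hypothetical, not from the OMAF specification:

```python
def pick_track(viewport_azimuth, tracks):
    """Choose the track whose high-quality region centre is closest
    (in azimuth, wrapping at +/-180 degrees) to the viewport centre.
    `tracks` maps a track id to its HQ-region centre azimuth."""
    def wrap_dist(a, b):
        d = abs(a - b) % 360.0
        return min(d, 360.0 - d)
    return min(tracks, key=lambda t: wrap_dist(viewport_azimuth, tracks[t]))

# Five hypothetical encodings with HQ regions every 72 degrees:
tracks = {f"track_{k}": -180.0 + 72.0 * k + 36.0 for k in range(5)}
print(pick_track(10.0, tracks))   # track_2 (HQ region centred at 0)
```

A real player would also consider elevation and hysteresis to avoid switching tracks on every small head movement.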
96
Merging of HEVC MCTS-based sub-picture tracks of the same resolution
97
Merging of HEVC MCTS-based sub-picture tracks of different resolutions with multiple decoders
98
Merging of HEVC MCTS-based sub-picture tracks of different resolutions with one decoder
99
SHVC with MCTS-based enhancement layer
101
Acknowledgements
Thanks to the MPEG OMAF ad-hoc group members and others who contributed
to the development of the OMAF standard.
Thanks to Miska Hannuksela and Sachin Deshpande for helping me chair the review of,
and decision-making on, a number of OMAF proposals when needed (particularly those
from myself or my company).
Thanks to Miska Hannuksela, Thomas Stockhammer, Byeongdoo Choi, Sachin
Deshpande, Yago Sanchez, Robert Skupin, Alexandre Gabriel, Imed Bouazizi,
Cyril Concolato, et al., who drafted some of the figures and/or slides that I used
as is or with modifications in this slide deck.
102
References
ISO/IEC 23090-2: Information technology — Coded representation of immersive media
(MPEG-I) — Part 2: Omnidirectional media format
− The finalized FDIS text will be included in MPEG output document N17235.
− The latest work-in-progress draft versions of the FDIS text are included in MPEG input
document m41922.
ISO/IEC 14496-12, Information technology — Coding of audio-visual objects — Part 12:
ISO base media file format
ISO/IEC 14496-15, Information technology — Coding of audio-visual objects — Part 15:
Carriage of network abstraction layer (NAL) unit structured video in the ISO base media
file format
ISO/IEC 23009-1, Information technology — Dynamic adaptive streaming over HTTP
(DASH) — Part 1: Media presentation description and segment formats
JVET-H1004, Algorithm descriptions of projection format conversion and video quality
metrics in 360Lib version 5.
JCTVC-AC1005, HEVC additional supplemental enhancement information (draft 4)
103
Thank you
Follow us on:
For more information on Qualcomm, visit us at:
www.qualcomm.com & www.qualcomm.com/blog
105