<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE rfc [
  <!ENTITY nbsp    "&#160;">
  <!ENTITY zwsp   "&#8203;">
  <!ENTITY nbhy   "&#8209;">
  <!ENTITY wj     "&#8288;">
]>
<?xml-stylesheet type="text/xsl" href="rfc2629.xslt" ?>
<!-- generated by https://github.com/cabo/kramdown-rfc version 1.7.29 (Ruby 3.4.4) -->
<rfc xmlns:xi="http://www.w3.org/2001/XInclude" ipr="trust200902" docName="draft-mahy-mimi-av-metadata-00" category="info" consensus="true" submissionType="IETF" tocInclude="true" sortRefs="true" symRefs="true" version="3">
  <!-- xml2rfc v2v3 conversion 3.29.0 -->
  <front>
    <title abbrev="MIMI Content AV Metadata">Audio, Video, and Image Metadata extensions for the More Instant Messaging Interoperability (MIMI) Content format</title>
    <seriesInfo name="Internet-Draft" value="draft-mahy-mimi-av-metadata-00"/>
    <author fullname="Rohan Mahy">
      <organization/>
      <address>
        <email>rohan.mahy@gmail.com</email>
      </address>
    </author>
    <date year="2025" month="July" day="04"/>
    <area>Applications and Real-Time</area>
    <workgroup>More Instant Messaging Interoperability</workgroup>
    <keyword>mimi content</keyword>
    <keyword>image metadata</keyword>
    <keyword>audio metadata</keyword>
    <keyword>video metadata</keyword>
    <abstract>
      <?line 38?>

<t>The More Instant Messaging Interoperability (MIMI) content format is a container for rich content, which can reference image, video, and audio files.
This document describes metadata for these files to allow for more pleasant rendering.</t>
    </abstract>
    <note removeInRFC="true">
      <name>About This Document</name>
      <t>
        The latest revision of this draft can be found at <eref target="https://rohanmahy.github.io/mimi-av-metadata/draft-mahy-mimi-av-metadata.html"/>.
        Status information for this document may be found at <eref target="https://datatracker.ietf.org/doc/draft-mahy-mimi-av-metadata/"/>.
      </t>
      <t>
        Discussion of this document takes place on the
        More Instant Messaging Interoperability Working Group mailing list (<eref target="mailto:mimi@ietf.org"/>),
        which is archived at <eref target="https://mailarchive.ietf.org/arch/browse/mimi/"/>.
        Subscribe at <eref target="https://www.ietf.org/mailman/listinfo/mimi/"/>.
      </t>
      <t>Source for this draft and an issue tracker can be found at
        <eref target="https://github.com/rohanmahy/mimi-av-metadata"/>.</t>
    </note>
  </front>
  <middle>
    <?line 43?>

<section anchor="introduction">
      <name>Introduction</name>
      <t>The MIMI content format <xref target="I-D.ietf-mimi-content"/> can convey a variety of media types, as either inline or referenced external content.
In messaging applications it is common to display audio, video, and static image content, collectively audio/video (AV).
The layout for messaging applications often reserves a placeholder for the AV content.
While it is common for static images to be immediately displayed, audio and video content is often not immediately downloaded and rendered.
Even if image data is downloaded immediately, a network or server delay can leave a period during which the aspect ratio or dimensions of the image are not yet known.
It is therefore useful to have some hints about the media so that a client can lay out and render it more pleasantly.
This document defines extensions to the MIMI content format to provide these hints.</t>
    </section>
    <section anchor="conventions-and-definitions">
      <name>Conventions and Definitions</name>
      <t>The key words "<bcp14>MUST</bcp14>", "<bcp14>MUST NOT</bcp14>", "<bcp14>REQUIRED</bcp14>", "<bcp14>SHALL</bcp14>", "<bcp14>SHALL
NOT</bcp14>", "<bcp14>SHOULD</bcp14>", "<bcp14>SHOULD NOT</bcp14>", "<bcp14>RECOMMENDED</bcp14>", "<bcp14>NOT RECOMMENDED</bcp14>",
"<bcp14>MAY</bcp14>", and "<bcp14>OPTIONAL</bcp14>" in this document are to be interpreted as
described in BCP 14 <xref target="RFC2119"/> <xref target="RFC8174"/> when, and only when, they
appear in all capitals, as shown here.</t>
      <?line -18?>

<t>This document uses a variety of terms from the MIMI content format definition, especially <tt>NestedPart</tt>, <tt>SinglePart</tt>, <tt>ExternalPart</tt>, and <tt>MultiPart</tt>.</t>
    </section>
    <section anchor="av-metadata-extensions">
      <name>AV Metadata Extensions</name>
      <t>The AV Metadata MIMI content extension is an array of AV metadata entries.
Each AV metadata entry is a CBOR map of AV metadata properties, all of which are optional except the <tt>part_index</tt> and <tt>type</tt>. The semantics of the individual property fields are as follows:</t>
      <ul spacing="normal">
        <li>
          <t><tt>part_index</tt>: identifies the relevant part by its index in the order of MIMI parts inside the NestedPart structure of a MIMI content message. It can refer to a <tt>SinglePart</tt> or <tt>ExternalPart</tt>.</t>
        </li>
        <li>
          <t><tt>type</tt>: an integer enumeration representing the media type (not including the subtype). Audio is 1, image is 2, and video is 3. An extension socket is defined, although its need is not anticipated.</t>
        </li>
        <li>
          <t><tt>width</tt>: the width of the image or video in pixels.</t>
        </li>
        <li>
          <t><tt>height</tt>: the height of the image or video in pixels.</t>
        </li>
        <li>
          <t><tt>duration</tt>: the duration of the audio or video in seconds. It can be expressed as an unsigned integer or a positive floating-point number.</t>
        </li>
        <li>
          <t><tt>preview_index</tt>: for a video part, the <tt>partIndex</tt> of another related part that represents its image preview. It can refer to a <tt>SinglePart</tt> or <tt>ExternalPart</tt>, or to a <tt>MultiPart</tt> with <tt>chooseOne</tt> <tt>partSemantics</tt> that contains only <tt>SinglePart</tt> or <tt>ExternalPart</tt> types, each of which must be an image with a disposition value of <tt>preview</tt>.</t>
        </li>
        <li>
          <t><tt>accessibility_text</tt>: text that could be rendered instead of the audio, image, or video when various accessibility settings are enabled, or when network access is slow or unavailable and no cached or preview image is available.</t>
        </li>
        <li>
          <t><tt>rotation</tt>: one of four values: 0, 90, 180, or 270. This integer is the number of degrees of clockwise rotation (in 90-degree increments) needed to correctly view the image.</t>
        </li>
      </ul>
      <ul empty="true">
        <li>
          <t>Note that an orientation field is not necessary. Any image or video whose <tt>width</tt> is larger than its <tt>height</tt> is assumed to have a landscape orientation, while one whose <tt>height</tt> is larger than its <tt>width</tt> is assumed to have a portrait orientation.</t>
        </li>
      </ul>
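      <t>As a non-normative illustration of the note above, the following Python sketch derives an orientation from the <tt>width</tt> and <tt>height</tt> properties and sizes a placeholder before the media has been downloaded. The function names, the returned strings, and the treatment of equal width and height as "square" are assumptions made for this example; none of them is defined by this document.</t>
      <sourcecode type="python"><![CDATA[
def orientation(width, height):
    """Infer orientation from pixel dimensions, as in the note."""
    if width > height:
        return "landscape"
    if height > width:
        return "portrait"
    # Assumption: the note does not cover width == height.
    return "square"


def placeholder_height(width, height, display_width):
    """Scale the advertised height to a given display width,
    preserving the aspect ratio (result from Python's round())."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return round(height * display_width / width)


print(orientation(1920, 1080))
print(placeholder_height(1920, 1080, 320))
]]></sourcecode>
      <t>A client could use a calculation of this kind to reserve layout space of the right shape as soon as the message arrives, rather than resizing the placeholder once the media is available.</t>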
      <t>The following snippet of Concise Data Definition Language (CDDL) <xref target="RFC8160"/> is used to formally define the structure of the extension.</t>
      <sourcecode type="cddl"><![CDATA[
av_metadata_array = (
    "av_metadata" : [ * metadata_entry ]
)

metadata_entry = {
    &(part_index: 1) : uint16,
    &(type: 2)       : audio / image / video / $ext_media,
    ? &(width: 3)              : uint,
    ? &(height: 4)             : uint,
    ? &(duration: 5)           : nonnegative_number,
    ? &(preview_index: 6)      : uint16,
    ? &(accessibility_text: 7) : tstr,
    ? &(rotation: 8)           : 0 / 90 / 180 / 270,
    * $$ext_av_metadata
}

nonnegative_number = uint / float .gt 0.0
uint16 = uint .size 2

audio = 1
image = 2
video = 3
]]></sourcecode>
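      <t>The constraints in the CDDL above can also be checked in application code after CBOR decoding. The following non-normative Python sketch assumes an entry has already been decoded into a dict keyed by the integer labels. The function names (<tt>is_uint</tt>, <tt>validate_entry</tt>) and the error strings are hypothetical. The sketch accepts only the three media type values defined here, does not model the extension sockets, and does not enforce the upper bound of a CBOR unsigned integer.</t>
      <sourcecode type="python"><![CDATA[
# Integer map keys, copied from the CDDL above.
PART_INDEX, TYPE, WIDTH, HEIGHT = 1, 2, 3, 4
DURATION, PREVIEW_INDEX, TEXT, ROTATION = 5, 6, 7, 8


def is_uint(value, limit=None):
    """True for a non-negative int, optionally bounded by limit."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        return False
    return value >= 0 and (limit is None or value <= limit)


def validate_entry(entry):
    """Return a list of problems found in one decoded entry."""
    problems = []
    if not is_uint(entry.get(PART_INDEX), 0xFFFF):
        problems.append("part_index must be a uint16")
    # Only the three defined values are accepted here; values added
    # through the extension socket would also need to be allowed.
    kind = entry.get(TYPE)
    if not is_uint(kind) or kind not in (1, 2, 3):
        problems.append("type must be 1, 2, or 3")
    for key, name in ((WIDTH, "width"), (HEIGHT, "height")):
        if key in entry and not is_uint(entry[key]):
            problems.append(name + " must be a uint")
    if DURATION in entry:
        d = entry[DURATION]
        if not (is_uint(d) or (isinstance(d, float) and d > 0.0)):
            problems.append("duration must be uint or float > 0.0")
    if PREVIEW_INDEX in entry:
        if not is_uint(entry[PREVIEW_INDEX], 0xFFFF):
            problems.append("preview_index must be a uint16")
    if TEXT in entry and not isinstance(entry[TEXT], str):
        problems.append("accessibility_text must be a string")
    if ROTATION in entry:
        r = entry[ROTATION]
        if not is_uint(r) or r not in (0, 90, 180, 270):
            problems.append("rotation must be 0, 90, 180, or 270")
    return problems


video = {1: 2, 2: 3, 3: 1920, 4: 1080, 5: 37, 6: 4, 8: 0}
print(validate_entry(video))                # no problems
print(validate_entry({1: 2, 2: 3, 8: 45}))  # rotation is rejected
]]></sourcecode>
      <t>Returning a list of problems rather than raising on the first one is a design choice for this sketch: a receiver may prefer to ignore a single malformed hint and still render the message.</t>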
    </section>
    <section anchor="example">
      <name>Example</name>
      <t>Below is an example of metadata for a video of puppies, its preview image, and an audio clip.</t>
      <sourcecode type="cbor-diag"><![CDATA[
"av_metadata" : [
  {
     /part_index/         1: 2,
     /type      /         2: 3, /video/
     /width     /         3: 1920,
     /height    /         4: 1080,
     /duration  /         5: 37, / in seconds. can be uint or float /
     /preview_index /     6: 4,
     /accessibility_text/ 7: "two golden retriever puppies playing in " +
                             "overgrown grass lit with low sunlight"
  },
  {
     /part_index/         1: 4,
     /type      /         2: 2, /image/
     /width     /         3: 1920,
     /height    /         4: 1080
  },
  {
     /part_index/         1: 7,
     /type      /         2: 1, /audio/
     /duration  /         5: 9.45, / in seconds. can be uint or float /
     /accessibility_text/ 7: "uproarious laughter"
  }
]
]]></sourcecode>
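      <t>To show how a receiver might consume such an array, the following non-normative Python sketch holds the example entries as dicts keyed by the integer labels, indexes them by <tt>part_index</tt>, and produces a one-line summary for each entry. The helper names (<tt>index_by_part</tt>, <tt>format_duration</tt>, <tt>describe</tt>) and the summary format are hypothetical. The sketch also assumes that a preview part has its own metadata entry, which this document does not require.</t>
      <sourcecode type="python"><![CDATA[
PART_INDEX, TYPE, DURATION, PREVIEW_INDEX = 1, 2, 5, 6
TYPE_NAMES = {1: "audio", 2: "image", 3: "video"}

# The example above, decoded into dicts with integer keys.
av_metadata = [
    {1: 2, 2: 3, 3: 1920, 4: 1080, 5: 37, 6: 4,
     7: "two golden retriever puppies playing in "
        "overgrown grass lit with low sunlight"},
    {1: 4, 2: 2, 3: 1920, 4: 1080},
    {1: 7, 2: 1, 5: 9.45, 7: "uproarious laughter"},
]


def index_by_part(entries):
    """Map each part_index to its metadata entry."""
    return {e[PART_INDEX]: e for e in entries}


def format_duration(seconds):
    """Format a uint or float duration as m:ss (fraction dropped)."""
    whole = int(seconds)
    return "%d:%02d" % (whole // 60, whole % 60)


def describe(entry, by_part):
    """Summarize one entry for a placeholder label."""
    kind = TYPE_NAMES.get(entry[TYPE], "unknown")
    line = "part %d: %s" % (entry[PART_INDEX], kind)
    if DURATION in entry:
        line += ", " + format_duration(entry[DURATION])
    # Assumption: the preview part has its own metadata entry.
    preview = by_part.get(entry.get(PREVIEW_INDEX))
    if preview is not None:
        line += ", preview in part %d" % preview[PART_INDEX]
    return line


by_part = index_by_part(av_metadata)
for entry in av_metadata:
    print(describe(entry, by_part))
]]></sourcecode>
      <t>Both the integer duration (37) and the floating-point duration (9.45) pass through the same formatting helper, which reflects the fact that <tt>duration</tt> can be expressed either way.</t>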
    </section>
    <section anchor="security-considerations">
      <name>Security Considerations</name>
      <t>TODO Security</t>
    </section>
    <section anchor="iana-considerations">
      <name>IANA Considerations</name>
      <t>TODO register the extension with IANA.</t>
    </section>
  </middle>
  <back>
    <references anchor="sec-normative-references">
      <name>Normative References</name>
      <reference anchor="I-D.ietf-mimi-content">
        <front>
          <title>More Instant Messaging Interoperability (MIMI) message content</title>
          <author fullname="Rohan Mahy" initials="R." surname="Mahy">
            <organization>Rohan Mahy Consulting Services</organization>
          </author>
          <date day="28" month="February" year="2025"/>
          <abstract>
            <t>   This document describes content semantics common in Instant Messaging
   (IM) systems and describes a profile suitable for instant messaging
   interoperability of messages end-to-end encrypted inside the MLS
   (Message Layer Security) Protocol.

            </t>
          </abstract>
        </front>
        <seriesInfo name="Internet-Draft" value="draft-ietf-mimi-content-06"/>
      </reference>
      <reference anchor="RFC2119" xml:base="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml">
        <front>
          <title>Key words for use in RFCs to Indicate Requirement Levels</title>
          <author fullname="S. Bradner" initials="S." surname="Bradner"/>
          <date month="March" year="1997"/>
          <abstract>
            <t>In many standards track documents several words are used to signify the requirements in the specification. These words are often capitalized. This document defines these words as they should be interpreted in IETF documents. This document specifies an Internet Best Current Practices for the Internet Community, and requests discussion and suggestions for improvements.</t>
          </abstract>
        </front>
        <seriesInfo name="BCP" value="14"/>
        <seriesInfo name="RFC" value="2119"/>
        <seriesInfo name="DOI" value="10.17487/RFC2119"/>
      </reference>
      <reference anchor="RFC8174" xml:base="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8174.xml">
        <front>
          <title>Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words</title>
          <author fullname="B. Leiba" initials="B." surname="Leiba"/>
          <date month="May" year="2017"/>
          <abstract>
            <t>RFC 2119 specifies common key words that may be used in protocol specifications. This document aims to reduce the ambiguity by clarifying that only UPPERCASE usage of the key words have the defined special meanings.</t>
          </abstract>
        </front>
        <seriesInfo name="BCP" value="14"/>
        <seriesInfo name="RFC" value="8174"/>
        <seriesInfo name="DOI" value="10.17487/RFC8174"/>
      </reference>
      <reference anchor="RFC8160">
        <front>
          <title>Concise Data Definition Language (CDDL): A Notational Convention to Express Concise Binary Object Representation (CBOR) and JSON Data Structures</title>
          <author fullname="H. Birkholz" initials="H." surname="Birkholz"/>
          <author fullname="C. Vigano" initials="C." surname="Vigano"/>
          <author fullname="C. Bormann" initials="C." surname="Bormann"/>
          <date month="June" year="2019"/>
          <abstract>
            <t>This document proposes a notational convention to express Concise Binary Object Representation (CBOR) data structures (RFC 7049). Its main goal is to provide an easy and unambiguous way to express structures for protocol messages and data formats that use CBOR or JSON.</t>
          </abstract>
        </front>
        <seriesInfo name="RFC" value="8610"/>
        <seriesInfo name="DOI" value="10.17487/RFC8610"/>
      </reference>
    </references>
    <?line 165?>

<section numbered="false" anchor="acknowledgments">
      <name>Acknowledgments</name>
      <t>TODO acknowledge.</t>
    </section>
  </back>

</rfc>
