<?xml version="1.0" encoding="utf-8"?>

<?xml-model href="rfc7991bis.rnc"?>
<!DOCTYPE rfc [
  <!ENTITY nbsp "&#160;">
  <!ENTITY zwsp "&#8203;">
  <!ENTITY nbhy "&#8209;">
  <!ENTITY wj "&#8288;">
]>

<rfc xmlns:xi="http://www.w3.org/2001/XInclude" submissionType="independent"
category="info" ipr="trust200902" xml:lang="en" version="3"
docName="draft-lemieux-doi-uri-scheme-06" >
  <front>
    <title>The "doi" URI Scheme</title>

    <seriesInfo name="Internet-Draft" value="draft-lemieux-doi-uri-scheme-06"
    status="informational"/>

    <author initials='P.-A.' surname='Lemieux' fullname='Pierre-Anthony Lemieux' role="editor">

      <organization>Sandflow Consulting LLC</organization>
      <address>
        <postal>
          <city>San Mateo</city>
          <region>CA</region>
          <country>US</country>
        </postal>
        <email>pal@sandflow.com</email>
      </address>
    </author>

    <keyword>DOI</keyword>
    <keyword>URI</keyword>
    <keyword>digital object identifier</keyword>

    <abstract>
      <t>This document specifies the <tt>"doi"</tt> URI scheme.</t>
    </abstract>
  </front>

  <middle>
    <section>
      <name>Introduction</name>
      <t>A DOI name is a globally unique identifier of a referent, which can be
      any digital, physical or abstract entity, including inventions, literary
      and artistic works, ideas, symbols, names, images, designs, etc. DOI names
      are, for example, widely used to identify academic publications. The DOI
      system is specified in <xref target="iso26324"/> and <xref
      target="doi-handbook"/>, with the former offering regular formal snapshots
      of the latter.</t>

      <t>EXAMPLE 1: The DOI name "10.1103/PhysRevLett.59.381" refers to the
      article Per Bak, Chao Tang, and Kurt Wiesenfeld, "Self-organized
      criticality: An explanation of the 1/f noise", Phys. Rev. Lett. 59,
      381.</t>

      <t>A DOI name is persistent over time. This persistence is provided by the
      independence of the DOI name from the referent itself and its descriptive
      elements. These descriptive elements of a referent, including location and
      ownership, can change over time, and their current values are retrieved by
      resolving the DOI name. The set of elements retrieved by resolving a DOI
      name is called the DOI record. The DOI name resolution process uses the
      Handle System specified at <xref target="RFC3650"/>, <xref
      target="RFC3651"/> and <xref target="RFC3652"/>, as updated by <xref
      target="DOI-RP"/>.</t>

      <t>This document specifies a URI scheme for DOI names. This scheme
      conforms to the syntax specified at <xref target="RFC3986"/> and
      formalizes the notation "doi:&lt;DOI name&gt;", which is in widespread
      use. When dereferenced as detailed in <xref
      target="sec-doi-name-resolution"/>, the URI corresponding to a DOI name
      yields the DOI record associated with the name.</t>

      <t>EXAMPLE 2: "doi:10.1103/PhysRevLett.59.381" is the URI corresponding to
      the DOI name above.</t>

      <t>This document is intended to satisfy the guidelines and registration
      procedures specified at <xref target="RFC7595"/>.</t>
    </section>

    <section anchor="doi-name-syntax">
      <name>Syntax</name>

      <t>As specified at <xref target="iso26324"/>, a DOI name consists of an
      ordered sequence of Unicode code points of the Graphic type.</t>

      <t>A DOI Name URI is a URI that corresponds to a given DOI name. As
      defined at <xref target="RFC7595"/>, its scheme name SHALL be "doi" and
      its <tt>scheme-specific-part</tt> SHALL be equal to the result of the
      following ordered sequence of steps:</t>

      <ol>
        <li>express the ordered sequence of Unicode code points that comprise
        the DOI name as a UTF-8 String, as defined at <xref target="iso10646"/>,
        without the byte order mark and without any normalization;</li>
        <li>percent-encode any byte in the UTF-8 String that is neither
        <tt>unreserved</tt> nor equal to <tt>"/"</tt>.</li>
      </ol>

      <t>A DOI Name URI SHALL contain neither a query component nor a fragment
      component.</t>

      <t>EXAMPLE 1: The DOI name "10.5594/SMPTE.ST2067-21.2020" corresponds to
      the URI <tt>&lt;doi:10.5594/SMPTE.ST2067-21.2020&gt;</tt>.</t>

      <t>EXAMPLE 2: The DOI name "10.26321/Á.GUTIÉRREZ.ZARZA.02.2018.03" with
      the code point sequence &lt;U+0031, U+0030, U+002E, U+0032, U+0036, U+0033,
      U+0032, U+0031, U+002F, U+00C1, U+002E, U+0047, U+0055, U+0054, U+0049,
      U+00C9, U+0052, U+0052, U+0045, U+005A, U+002E, U+005A, U+0041, U+0052,
      U+005A, U+0041, U+002E, U+0030, U+0032, U+002E, U+0032, U+0030, U+0031,
      U+0038, U+002E, U+0030, U+0033&gt; corresponds to the URI
      <tt>&lt;doi:10.26321/%C3%81.GUTI%C3%89RREZ.ZARZA.02.2018.03&gt;</tt>.</t>
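
      <t>The two steps above can be sketched in Python as follows. This is a
      non-normative illustration only: the names <tt>doi_name_to_uri</tt>,
      <tt>UNRESERVED</tt> and <tt>SLASH</tt> are assumptions made for this
      example and are not defined by this document. The sketch assumes that
      <tt>unreserved</tt> is the set defined at <xref target="RFC3986"/>, i.e.,
      ALPHA, DIGIT, "-", ".", "_" and "~", and it reproduces the URIs of
      EXAMPLE 1 and EXAMPLE 2.</t>

      <sourcecode type="python">
<![CDATA[
# Non-normative sketch; all names below are illustrative assumptions.

# The "unreserved" set of RFC 3986: ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"0123456789-._~"
)
SLASH = ord("/")


def doi_name_to_uri(doi_name: str) -> str:
    """Return the DOI Name URI for a DOI name given as code points."""
    # Step 1: express the code points as UTF-8, without a byte order
    # mark and without any normalization.
    utf8 = doi_name.encode("utf-8")
    # Step 2: percent-encode every byte that is neither unreserved
    # nor equal to "/".
    encoded = "".join(
        chr(b) if (b in UNRESERVED or b == SLASH) else f"%{b:02X}"
        for b in utf8
    )
    return "doi:" + encoded


print(doi_name_to_uri("10.5594/SMPTE.ST2067-21.2020"))
print(doi_name_to_uri("10.26321/\u00c1.GUTI\u00c9RREZ.ZARZA.02.2018.03"))
]]>
      </sourcecode>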

      <t>NOTE 1: The sequence of code points comprising a DOI name is not
      normalized and equivalence between DOI names is based on code points. For
      example, two DOI names that differ only in the abstract character "Á"
      being encoded as &lt;U+00C1&gt; in the first and as &lt;U+0041, U+0301&gt;
      in the second are not identical.</t>

      <t>NOTE 2: Presenting a DOI name by rendering its sequence of code points
      to glyphs can be ambiguous since multiple code points or sequences of code
      points can result in the same glyphs. For example, U+002D HYPHEN-MINUS,
      U+2212 MINUS SIGN and U+2013 EN DASH are rendered as similar glyphs. As
      another example, the abstract character "á" can be represented by either
      the code point U+00E1 or the sequence of code points &lt;U+0061,
      U+0301&gt;. Presenting a DOI name in its URI form resolves this
      ambiguity.</t>
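
      <t>The following non-normative Python sketch illustrates NOTE 1 and NOTE
      2. The DOI names it uses are hypothetical and were chosen only for
      illustration, and the helper name <tt>to_uri</tt> is an assumption of this
      example. It relies on <tt>urllib.parse.quote</tt> from the Python standard
      library (version 3.7 or later), which leaves only unreserved characters
      and the characters listed in <tt>safe</tt> unescaped. Names that render to
      identical or similar glyphs yield visibly different URIs.</t>

      <sourcecode type="python">
<![CDATA[
# Non-normative sketch; the DOI names below are hypothetical.
import unicodedata
from urllib.parse import quote


def to_uri(doi_name: str) -> str:
    # quote() encodes the string as UTF-8 and percent-encodes every
    # byte that is neither unreserved nor "/".
    return "doi:" + quote(doi_name, safe="/")


# The same abstract character expressed as <U+00C1> and as
# <U+0041, U+0301>: the two DOI names are not identical.
precomposed = "10.1000/\u00c1"
decomposed = unicodedata.normalize("NFD", precomposed)

print(to_uri(precomposed))  # doi:10.1000/%C3%81
print(to_uri(decomposed))   # doi:10.1000/A%CC%81

# HYPHEN-MINUS, MINUS SIGN and EN DASH render as similar glyphs but
# are distinct code points, and so produce distinct URIs.
for dash in ("\u002d", "\u2212", "\u2013"):
    print(to_uri("10.1000/a" + dash + "b"))
]]>
      </sourcecode>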
    </section>

    <section anchor="sec-doi-name-equivalence">
      <name>Equivalence</name>

      <t>The following procedure SHALL be performed to determine whether two DOI
      Name URIs are equivalent:</t>

      <ol>
        <li>the <tt>scheme-specific-part</tt> of each of the two URIs is
        percent-decoded into a UTF-8 String;</li>
        <li>the two UTF-8 Strings are interpreted as two DOI names;</li>
        <li>the two DOI Name URIs are equivalent if the two DOI names are
        equivalent, as defined at <xref target="DOI-RP"/>.</li>
      </ol>

      <t>NOTE: When testing for equivalence, DOI names are case-insensitive only
      with respect to the Basic Latin Unicode block.</t>
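
      <t>The procedure above can be sketched in Python as follows. This sketch
      is non-normative and rests on a stated assumption: that DOI name
      equivalence reduces to a code point comparison that ignores case only for
      the letters A to Z, as suggested by the NOTE above. <xref
      target="DOI-RP"/> remains the authority on equivalence. The names
      <tt>doi_uris_equivalent</tt> and <tt>_doi_name_from_uri</tt> are
      illustrative, and accepting the scheme name in any case follows the
      generic rule at <xref target="RFC3986"/>.</t>

      <sourcecode type="python">
<![CDATA[
# Non-normative sketch; all names below are illustrative assumptions.
import string
from urllib.parse import unquote_to_bytes

# Case folding restricted to Basic Latin letters (assumption based on
# the NOTE above; see the DOI resolution protocol for the actual rule).
_FOLD = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def _doi_name_from_uri(uri: str) -> str:
    scheme, sep, ssp = uri.partition(":")
    if not sep or scheme.lower() != "doi":
        raise ValueError("not a DOI Name URI: " + uri)
    # Steps 1 and 2: percent-decode into a UTF-8 string and interpret
    # the result as a DOI name.
    return unquote_to_bytes(ssp).decode("utf-8")


def doi_uris_equivalent(uri_a: str, uri_b: str) -> bool:
    # Step 3: compare the two DOI names.
    name_a = _doi_name_from_uri(uri_a).translate(_FOLD)
    name_b = _doi_name_from_uri(uri_b).translate(_FOLD)
    return name_a == name_b


print(doi_uris_equivalent("doi:10.1103/PhysRevLett.59.381",
                          "doi:10.1103/physrevlett.59.381"))
print(doi_uris_equivalent("doi:10.26321/%C3%81",
                          "doi:10.26321/%C3%A1"))
]]>
      </sourcecode>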
    </section>

    <section anchor="sec-doi-name-resolution">
      <name>DOI Name Resolution</name>

      <t>Resolving a DOI name means retrieving its DOI record, which contains
      the descriptive elements associated with the referent identified by the
      DOI name.</t>

      <t>A DOI Name URI can be used to resolve its corresponding DOI name by
      performing an HTTP GET request to the following URL (expressed using ABNF
      syntax as defined at <xref target="RFC5234"/>):</t>

      <t><tt>"https://doi.org/api/handles/" scheme-specific-part</tt></t>

      <t>where <tt>scheme-specific-part</tt> is the scheme-specific-part of the
      DOI Name URI, as defined at <xref target="doi-name-syntax"/>, and the
      "https" scheme is specified at <xref target="RFC9110"/>.</t>
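
      <t>As a non-normative illustration, the URL construction and the HTTP GET
      request can be sketched in Python with the standard library. The names
      <tt>resolution_url</tt>, <tt>fetch_doi_record</tt> and
      <tt>RESOLVER_PREFIX</tt>, as well as the default timeout, are assumptions
      of this example. The sketch also assumes the caller supplies a well-formed
      DOI Name URI, and only rejects the query and fragment components that
      <xref target="doi-name-syntax"/> excludes.</t>

      <sourcecode type="python">
<![CDATA[
# Non-normative sketch; all names below are illustrative assumptions.
from urllib.request import urlopen

RESOLVER_PREFIX = "https://doi.org/api/handles/"


def resolution_url(doi_uri: str) -> str:
    """Return the URL to which the HTTP GET request is sent."""
    scheme, sep, ssp = doi_uri.partition(":")
    if not sep or scheme.lower() != "doi":
        raise ValueError("not a DOI Name URI: " + doi_uri)
    if "?" in ssp or "#" in ssp:
        raise ValueError("query and fragment components are not allowed")
    # The scheme-specific-part is already percent-encoded and is
    # appended unchanged.
    return RESOLVER_PREFIX + ssp


def fetch_doi_record(doi_uri: str, timeout: float = 10.0) -> bytes:
    """Perform the HTTP GET request and return the response body."""
    # urlopen() raises urllib.error.HTTPError for 4xx and 5xx status
    # codes; the response body can then be read from the exception.
    with urlopen(resolution_url(doi_uri), timeout=timeout) as response:
        return response.read()


print(resolution_url("doi:10.1000/182"))
]]>
      </sourcecode>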

      <t>The body of the response is a JSON object, as defined at <xref
      target="RFC8259"/>, that contains the following members:</t>

      <dl newline="true">
        <dt>responseCode</dt>
        <dd>
          <t>The property is a Number. The following values are defined:</t>
          <dl>
            <dt>1</dt>
            <dd>The resolution completed successfully. The HTTP response status
            code is <tt>200</tt>.</dd>
            <dt>2</dt>
            <dd>The resolution did not complete successfully because of a server
            error. The HTTP response status code is <tt>500</tt>.</dd>
            <dt>100</dt>
            <dd>The DOI name was not found. The HTTP response status code is
            <tt>404</tt>.</dd>
            <dt>200</dt>
            <dd>No descriptive elements were found for the requested DOI name.
            The HTTP response status code is <tt>200</tt>.</dd>
          </dl>
        </dd>

        <dt>handle</dt>
        <dd>The property is a String. It is equal to the DOI name for which
        resolution was requested.</dd>

        <dt>values</dt>
        <dd>The property is an Array. It contains the descriptive elements for
        the referent identified by the DOI name. The contents of the property
        are specified at <xref target="RFC3651"></xref>.</dd>
      </dl>
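
      <t>A client might interpret the members listed above along the lines of
      the following non-normative Python sketch. The names
      <tt>RESPONSE_CODES</tt> and <tt>summarize_response</tt> are assumptions of
      this example, as are the response bodies passed to it, which are
      abbreviated and hypothetical. The sketch treats <tt>values</tt> as a list
      and tolerates its absence, which is also an assumption.</t>

      <sourcecode type="python">
<![CDATA[
# Non-normative sketch; names and sample bodies are illustrative.
import json

# Meaning of the responseCode values defined above.
RESPONSE_CODES = {
    1: "resolution completed successfully",
    2: "server error",
    100: "DOI name not found",
    200: "no descriptive elements found",
}


def summarize_response(body: str):
    """Return (responseCode, meaning, handle, values) for a body."""
    record = json.loads(body)
    code = record["responseCode"]
    meaning = RESPONSE_CODES.get(code, "not defined in this document")
    # "values" is assumed to be a list that can be absent when the
    # resolution does not succeed.
    return code, meaning, record.get("handle"), record.get("values", [])


print(summarize_response(
    '{"responseCode": 1, "handle": "10.1000/182", "values": []}'))
print(summarize_response(
    '{"responseCode": 100, "handle": "10.1000/hypothetical"}'))
]]>
      </sourcecode>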

      <t><xref target="fig-sample-response"/> illustrates the DOI record, at the
      time of this writing, for the DOI name corresponding to the URI
      &lt;doi:10.1000/182&gt;. The DOI record was retrieved by performing an
      HTTP GET request to &lt;https://doi.org/api/handles/10.1000/182&gt;.</t>

      <figure anchor="fig-sample-response">
        <name>DOI record for the DOI name "10.1000/182" (at the time of this writing)</name>
        <sourcecode type="json">
<![CDATA[
{
  "responseCode": 1,
  "handle": "10.1000/182",
  "values": [
    {
      "index": 1,
      "type": "URL",
      "data": {
        "format": "string",
        "value": "http://www.doi.org/hb.html"
      },
      "ttl": 86400,
      "timestamp": "2004-01-21T14:14:17Z"
    },
    {
      "index": 100,
      "type": "HS_ADMIN",
      "data": {
        "format": "admin",
        "value": {
          "handle": "0.na/10.1000",
          "index": 200,
          "permissions": "011111110010",
          "legacyByteLength": true
        }
      },
      "ttl": 86400,
      "timestamp": "2000-06-23T15:17:46Z"
    }
  ]
}
]]>
        </sourcecode>
      </figure>

    </section>

    <section anchor="sec-doi-uri-dereferencing">
      <name>Retrieving the referent identified by a DOI name</name>

      <t>While <xref target="sec-doi-name-resolution"/> specifies the procedure
      for retrieving the DOI record associated with a DOI name, the steps
      necessary to retrieve the actual referent described by the record depend
      on the nature of the referent. For example, a referent can be a physical
      object.</t>

      <t>Some, but not all, referents can be retrieved by dereferencing an
      HTTP/HTTPS URI found in their respective DOI records, as illustrated in
      <xref target="fig-sample-response"/> where the referent identified by the
      DOI name "10.1000/182" can be retrieved at "http://www.doi.org/hb.html".</t>
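
      <t>The following non-normative Python sketch shows one way a client could
      collect candidate HTTP/HTTPS URIs from a DOI record shaped like the one in
      <xref target="fig-sample-response"/>. The function name
      <tt>http_urls</tt> is an assumption of this example, the record used is
      abbreviated from that figure, and the selection rule (elements whose
      <tt>type</tt> is "URL" and whose <tt>data</tt> has the "string" format) is
      a simplification inferred from the figure rather than a procedure defined
      by this document.</t>

      <sourcecode type="python">
<![CDATA[
# Non-normative sketch; the selection rule is a simplification.
def http_urls(record: dict) -> list:
    """Return the HTTP/HTTPS URIs found in a parsed DOI record."""
    urls = []
    for element in record.get("values", []):
        data = element.get("data", {})
        if element.get("type") != "URL" or data.get("format") != "string":
            continue
        value = data.get("value", "")
        if value.lower().startswith(("http://", "https://")):
            urls.append(value)
    return urls


# Record abbreviated from the sample response in this document.
sample = {
    "responseCode": 1,
    "handle": "10.1000/182",
    "values": [
        {"index": 1, "type": "URL",
         "data": {"format": "string",
                  "value": "http://www.doi.org/hb.html"}},
        {"index": 100, "type": "HS_ADMIN",
         "data": {"format": "admin",
                  "value": {"handle": "0.na/10.1000", "index": 200}}},
    ],
}

print(http_urls(sample))  # ['http://www.doi.org/hb.html']
]]>
      </sourcecode>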

      <t>The <em>single DOI resolution</em> and <em>multiple DOI resolution</em>
      functions at <xref target="DOI-RP"/> specify the process of retrieving a
      referent that is available by dereferencing an HTTP/HTTPS URI.</t>
    </section>

    <section>
      <name>Security Considerations</name>
      <t>A DOI name is an opaque string, which does not have a discernible
      meaning on its own and is for use by humans and machines alike. It
      consists of a sequence of Unicode code points and the security
      considerations at <xref target="UNICODE-TR36"/> apply. In particular, and
      as noted at <xref target="doi-name-syntax"/>, presenting a DOI name by
      rendering its sequence of code points to glyphs can be ambiguous. As a
      result, two DOI names rendering to the same sequence of glyphs can
      identify different referents, for example, two software executables with
      wildly different side effects. Presenting a DOI name in its URI form,
      which consists of a limited subset of characters, can lessen this
      risk.</t>

      <t>The DOI name resolution process is conducted using the Hypertext
      Transfer Protocol Secure, which ensures confidentiality and integrity of
      the transaction, and the security considerations at <xref
      target="RFC9110"/> apply.</t>

      <t>The result of the DOI name resolution process is a JSON object and the
      security considerations at <xref target="RFC8259"/> apply.</t>
    </section>

    <section>
      <name>IANA Considerations</name>
      <t>The following is the permanent URI Scheme Registration request, as
      defined in <xref target="RFC7595"></xref>:</t>
      <dl newline="true">
        <dt>Scheme name</dt>
        <dd>doi</dd>
        <dt>Status</dt>
        <dd>Permanent</dd>
        <dt>Contact</dt>
        <dd>Pierre-Anthony Lemieux &lt;pal@sandflow.com&gt;</dd>
        <dt>Change controller</dt>
        <dd>
          DOI Foundation<br/>
          Web: &lt;https://www.doi.org&gt;<br/>
          Email: &lt;info@doi.org&gt;
        </dd>
        <dt>References</dt>
        <dd>This document</dd>
      </dl>
    </section>
  </middle>

  <back>
    <references>
      <name>References</name>
      <references>
        <name>Normative References</name>
        <reference anchor="iso26324">
          <front>
            <title abbrev="ISO 26324">ISO 26324, Information and documentation,
            Digital object identifier system</title>
            <author>
              <organization>ISO</organization>
            </author>
          </front>
        </reference>
        <reference anchor="iso10646">
          <front>
            <title abbrev="ISO/IEC 10646">ISO/IEC 10646, Information technology,
            Universal coded character set (UCS)</title>
            <author>
              <organization>ISO</organization>
            </author>
          </front>
        </reference>
        <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.3986.xml"/>
        <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.5234.xml"/>
        <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.3651.xml"/>
        <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8259.xml"/>
        <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.9110.xml"/>
      </references>

      <references>
        <name>Informative References</name>
        <xi:include href="https://www.rfc-editor.org/refs/bibxml/reference.RFC.7595.xml"/>
        <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.3650.xml"/>
        <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.3652.xml"/>
        <reference anchor="doi-handbook" target="https://www.doi.org/the-identifier/resources/handbook/">
          <front>
            <title>DOI Handbook</title>
            <author>
              <organization>DOI Foundation</organization>
            </author>
          </front>
          <seriesInfo name="DOI" value="10.1000/182"/>
        </reference>
        <reference anchor="DOI-RP" target="https://www.dona.net/sites/default/files/2022-06/DO-IRPV3.0--2022-06-30.pdf">
          <front>
            <title>Digital Object Identifier Resolution Protocol Specification</title>
            <author>
              <organization>DONA Foundation</organization>
            </author>
          </front>
        </reference>
        <reference anchor="UNICODE-TR36" target="https://www.unicode.org/reports/tr36/">
          <front>
            <title>Unicode Security Considerations</title>
            <author>
              <organization>Unicode Consortium</organization>
            </author>
          </front>
        </reference>
      </references>
    </references>

  </back>
</rfc>