<?xml version="1.0" encoding="utf-8"?>
<!-- name="GENERATOR" content="github.com/mmarkdown/mmark Mmark Markdown Processor - mmark.miek.nl" -->
<rfc version="3" ipr="trust200902" docName="draft-timbru-sidrops-publication-server-bcp-00" submissionType="IETF" category="bcp" xml:lang="en" xmlns:xi="http://www.w3.org/2001/XInclude" obsoletes="8416" indexInclude="true" consensus="true">

<front>
<title abbrev="RPKI Publication Server Operations">RPKI Publication Server Best Current Practices</title><seriesInfo value="draft-timbru-sidrops-publication-server-bcp-00" status="bcp" name="Internet-Draft"></seriesInfo>
<author initials="T." surname="Bruijnzeels" fullname="Tim Bruijnzeels"><organization>NLnet Labs</organization><address><postal><street></street>
</postal><email>tim@nlnetlabs.nl</email>
<uri>https://www.nlnetlabs.nl/</uri>
</address></author><author initials="T." surname="de Kock" fullname="Ties de Kock"><organization>RIPE NCC</organization><address><postal><street></street>
</postal><email>tdekock@ripe.net</email>
</address></author><date/>
<area>Internet</area>
<workgroup></workgroup>

<abstract>
<t>This document describes best current practices for operating an RFC 8181
RPKI Publication Server and its rsync and RRDP (RFC 8182) public
repositories.</t>
</abstract>

</front>

<middle>

<section anchor="requirements-notation"><name>Requirements notation</name>
<t>The key words &quot;MUST&quot;, &quot;MUST NOT&quot;, &quot;REQUIRED&quot;, &quot;SHALL&quot;, &quot;SHALL NOT&quot;, &quot;SHOULD&quot;,
&quot;SHOULD NOT&quot;, &quot;RECOMMENDED&quot;, &quot;NOT RECOMMENDED&quot;, &quot;MAY&quot;, and &quot;OPTIONAL&quot; in
this document are to be interpreted as described in BCP 14 <xref target="RFC2119"></xref>
<xref target="RFC8174"></xref> when, and only when, they appear in all capitals, as shown here.</t>
</section>

<section anchor="introduction"><name>Introduction</name>
<t><xref target="RFC8181"></xref> describes the RPKI Publication Protocol used between
RPKI Certificate Authorities (CAs) and their Publication Repository server.
The server is responsible for handling publication requests sent by the
CAs, called Publishers in this context, and ensuring that their data is
made available to RPKI Relying Parties (RPs) in (public) rsync and RRDP
<xref target="RFC8182"></xref> publication points.</t>
<t>In this document, we will describe best current practices based on the
operational experience of several implementers and operators.</t>
</section>

<section anchor="glossary"><name>Glossary</name>
<table>
<thead>
<tr>
<th>Term</th>
<th>Description</th>
</tr>
</thead>

<tbody>
<tr>
<td>Publication Server</td>
<td><xref target="RFC8181"></xref> Publication Repository server</td>
</tr>

<tr>
<td>Publishers</td>
<td><xref target="RFC8181"></xref> Publishers (Certificate Authorities)</td>
</tr>

<tr>
<td>RRDP Repository</td>
<td>Public facing <xref target="RFC8182"></xref> RRDP repository</td>
</tr>

<tr>
<td>Rsync Repository</td>
<td>Public facing rsync server</td>
</tr>
</tbody>
</table></section>

<section anchor="publication-server"><name>Publication Server</name>
<t>The Publication Server handles the server side of the <xref target="RFC8181"></xref> Publication
Protocol. The Publication Server generates the content for the public-facing
RRDP and Rsync Repositories. It is strongly RECOMMENDED that these functions
are separated from serving the repository content.</t>

<section anchor="availability"><name>Availability</name>
<t>The Publication Server and the repository content place different demands on
availability and reachability. While the repository content MUST be highly
available to any RP worldwide, only publishers need to access the Publication
Server. Depending on the specific setup, this may allow for additional access
restrictions on the Publication Server. For example, the Publication Server can
limit access to known source IP addresses or apply rate limits.</t>
<t>If the Publication Server is unavailable for some reason, Publishers cannot
publish any updated RPKI objects. The most immediate impact is that a publisher
cannot update its ROAs, ASPAs, or BGPsec Router Certificates during the outage,
and therefore cannot authorise changes in its routing operations. If the outage
persists for an extended period, the published RPKI manifests and CRLs will
expire, causing RPs to reject the affected CA publication points.</t>
<t>For this reason, the Publication Server MUST be operated in a highly available
fashion. Maintenance windows SHOULD be planned and communicated to publishers,
so that they can, where possible, avoid needing to change published RPKI
objects during these windows.</t>
</section>
</section>

<section anchor="rrdp-repository"><name>RRDP Repository</name>
<t>In this section, we will elaborate on the following recommendations:</t>

<ul spacing="compact">
<li>Use a separate hostname: do not share fate with rsync or the Publication Server.</li>
<li>Use a CDN if possible</li>
<li>Use randomized filenames for Snapshot and Delta Files</li>
<li>Limit the size of the Notification File</li>
<li>Combine deltas to limit the size of the Notification File</li>
<li>Timing of publication of Notification File</li>
</ul>

<section anchor="unique-hostname"><name>Unique Hostname</name>
<t>It is RECOMMENDED that the public RRDP Repository URIs use a hostname that
differs from both the hostname in the <xref target="RFC8181"></xref> <tt>service_uri</tt> used by
publishers and the hostname used in rsync URIs (<tt>sia_base</tt>).</t>
<t>Using a unique hostname will allow the operator to use dedicated infrastructure
and/or a Content Delivery Network for its RRDP content without interfering with
the other functions.</t>
</section>

<section anchor="content-delivery-network"><name>Content Delivery Network</name>
<t>If possible, it is strongly RECOMMENDED that a Content Delivery Network is used
to serve the RRDP content. Care MUST be taken to ensure that the Notification
File is not cached for longer than 1 minute unless the back-end RRDP Repository
is unavailable, in which case it is RECOMMENDED that stale files are served.</t>
<t>When using a CDN, it will likely cache 404s for files not found on the back-end
server. Because of this, the Publication Server SHOULD use randomized,
unpredictable paths for Snapshot and Delta Files to avoid the CDN caching such
404s for future updates.</t>
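<t>A minimal sketch of such path generation is shown below. The directory layout
(session, serial, random token, file name) and the function name are
hypothetical; the only point illustrated is that a component of the path comes
from a cryptographically strong random source, so that it cannot be requested
before the file exists.</t>
<sourcecode type="python"><![CDATA[
import secrets


def rrdp_file_path(session_id: str, serial: int, kind: str) -> str:
    """Build a path with an unguessable component.

    kind is "snapshot" or "delta". The layout is hypothetical.
    """
    if kind not in ("snapshot", "delta"):
        raise ValueError("kind must be 'snapshot' or 'delta'")
    token = secrets.token_hex(16)  # 128 random bits, hex encoded
    return f"{session_id}/{serial}/{token}/{kind}.xml"
]]></sourcecode>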
<t>Alternatively, the Publication Server can delay writing the Notification File
for as long as the CDN caches such 404 responses, or it can clear the CDN cache
for any new files it publishes.</t>
</section>

<section anchor="limit-notification-file-size"><name>Limit Notification File Size</name>
<t>The size of the RRDP Notification File can significantly impact RRDP
operations. If this file becomes too large, it can easily cause significant
traffic if the RRDP Repository does not use a CDN, or high costs if it
does.</t>
<t><xref target="RFC8182"></xref> stipulates that any delta that, combined with all more recent
deltas, would cause the total size of the deltas to exceed the snapshot size
MUST be excluded, so that Relying Parties do not download more data than
necessary.</t>
<t>In addition to the restriction described above, we RECOMMEND that the
Notification File size is reduced by removing delta files that have been
available for more than 75 minutes. As RPs typically refresh their caches every
10 minutes, this will ensure that deltas are available for the vast majority of
RPs, while limiting the size of the Notification File.</t>
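<t>A possible way to apply the age limit is sketched below. The representation
of a delta as a (serial, publication time) pair and the function name are
assumptions made for illustration; only the 75-minute threshold comes from the
recommendation above.</t>
<sourcecode type="python"><![CDATA[
from datetime import datetime, timedelta, timezone

MAX_DELTA_AGE = timedelta(minutes=75)


def drop_old_deltas(deltas, now):
    """Keep deltas published at most MAX_DELTA_AGE before now.

    deltas: iterable of (serial, published_at) pairs, where
    published_at is a timezone-aware datetime.
    """
    cutoff = now - MAX_DELTA_AGE
    return [(s, t) for s, t in deltas if t >= cutoff]
]]></sourcecode>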
<t>Furthermore, we RECOMMEND that Publication Servers with many (e.g., thousands
of) Publishers ensure they do not produce Delta Files more frequently than once
per minute. A possible approach for this is that the Publication Server SHOULD
publish changes at a regular (one-minute) interval. The Publication Server then
publishes the updates received from all Publishers in this interval in a single
RRDP Delta File.</t>
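<t>The grouping step of such an approach might look like the sketch below. It
assumes, purely for illustration, that each received change is a tuple of
(Unix time, publisher name, payload); a real server would additionally need a
timer and its own representation of publication requests.</t>
<sourcecode type="python"><![CDATA[
from collections import defaultdict


def batch_by_interval(changes, interval=60):
    """Group changes into one batch per interval (in seconds).

    changes: iterable of (unix_time, publisher, payload) tuples.
    Returns a list of batches in time order; each batch would be
    published as a single RRDP Delta File.
    """
    batches = defaultdict(list)
    for ts, publisher, payload in changes:
        slot = int(ts // interval)
        batches[slot].append((publisher, payload))
    return [batches[slot] for slot in sorted(batches)]
]]></sourcecode>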
</section>

<section anchor="consistent-load-balancing-and-notification-file-timing"><name>Consistent load-balancing and Notification File Timing</name>
<t>Notification Files MUST NOT be available to RPs before the referenced snapshot
and delta files are available.</t>
<t>As a result, when using a load-balancing setup, care SHOULD be taken to ensure
that RPs that make multiple subsequent requests receive content from the same
node. This way, clients view the timeline on one node where the referenced
snapshot and delta files are available. Alternatively, publication
infrastructure SHOULD ensure a particular ordering of the visibility of the
snapshot plus delta and notification file. All nodes should receive the new
snapshot and delta files before any node receives the new notification file.</t>
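<t>On a single node, the required ordering can be sketched as follows. The
function name, the temporary file name, and the use of a local file system are
assumptions for this example. With several nodes, the first step would have to
complete on every node before the second step starts on any of them.</t>
<sourcecode type="python"><![CDATA[
import os
from pathlib import Path


def publish(root: Path, files: dict, notification: bytes) -> None:
    """Write snapshot and delta files, then the notification.

    files maps relative paths to file content (bytes).
    """
    # Step 1: make every referenced file available first.
    for rel_path, content in files.items():
        target = root / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
    # Step 2: replace the notification file in one rename, so
    # readers see either the old or the new version, never a
    # partially written one.
    tmp = root / "notification.xml.tmp"
    tmp.write_bytes(notification)
    os.replace(tmp, root / "notification.xml")
]]></sourcecode>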
</section>
</section>

<section anchor="rsync-repository"><name>Rsync Repository</name>
<t>In this section, we will elaborate on the following recommendations:</t>

<ul spacing="compact">
<li>Use symlinks to provide consistent content</li>
<li>Use deterministic timestamps for files</li>
<li>Load balancing and testing</li>
</ul>

<section anchor="consistent-content"><name>Consistent Content</name>
<t>A naive implementation of the Rsync Repository can change the repository
content while RPs transfer files. Even when the repository is consistent from
the repository server's point of view, clients may read an inconsistent set of
files. Clients may get a combination of newer and older files. This &quot;phantom
read&quot; can lead to unpredictable and unreliable results. While modern RPs will
treat such inconsistencies as a &quot;Failed Fetch&quot; (<xref target="RFC9286"></xref>), it is best to
avoid this situation since a failed fetch for one repository can cause the
rejection of the publication point for a sub-CA when resources change.</t>
<t>One way to ensure that rsyncd serves connected clients (RPs) with a consistent
view of the repository is by configuring the rsyncd 'module' path to a path
that contains a symlink that the repository-writing process updates for every
repository publication.</t>
<t>Whenever there is an update:</t>

<ul spacing="compact">
<li>write the complete updated repository into a new directory</li>
<li>fix the timestamps of files (see next section)</li>
<li>change the symlink to point to the new directory</li>
</ul>
<t>This way, rsyncd does not need to restart, and since rsyncd resolves this
symlink when it <tt>chdir</tt>s into the module directory when a client connects, any
connected RPs will be able to read a consistent state.</t>
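<t>The symlink switch in the last step can be sketched as below, assuming a
POSIX file system. The link name <tt>current</tt>, the temporary link name, and
the function name are hypothetical; the rsyncd module path would then point at
the <tt>current</tt> link. The new link is created under a temporary name and
renamed over the old one, so that the link always resolves to a complete
directory.</t>
<sourcecode type="python"><![CDATA[
import os
from pathlib import Path


def switch_current(base: Path, new_dir_name: str) -> None:
    """Atomically repoint base/current at base/new_dir_name."""
    tmp_link = base / "current.tmp"
    if tmp_link.is_symlink():
        # Left over from an interrupted earlier run.
        tmp_link.unlink()
    # Relative target: resolved relative to the link's directory.
    os.symlink(new_dir_name, tmp_link)
    # rename(2) replaces the old link itself in a single step.
    os.replace(tmp_link, base / "current")
]]></sourcecode>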
<t>The repository can remove old directories when no RP fetching at a reasonable
rate is reading that data. It's hard to determine this in practice. Empirical
data suggests that Rsync Repositories MAY assume that it is safe to do so after
one hour. We recommend repository operators monitor for &quot;file has vanished&quot;
lines in the rsync log file to detect how many clients are affected by these
deletions.</t>
</section>

<section anchor="deterministic-timestamps"><name>Deterministic Timestamps</name>
<t>By default, rsyncd uses the modification time and file size to determine if a
file has changed. Therefore, the modification time SHOULD NOT change for files
whose content did not change.</t>
<t>We RECOMMEND the following deterministic heuristics for objects' timestamps
when written to disk. These heuristics assume that a CA is compliant with
<xref target="RFC9286"></xref> and uses &quot;one-time-use&quot; EE certificates:</t>

<ul spacing="compact">
<li>For CRLs, use the value of &quot;this update&quot;.</li>
<li>For manifests, use the value of &quot;this update&quot;. Note that &quot;signing time&quot;
could, in theory, be a more accurate value for this, but since this
attribute is optional, it cannot be assumed to be present. A preference for
&quot;signing time&quot; with a fallback to &quot;not before&quot; can result in
inconsistencies between a manifest and its corresponding CRL.</li>
<li>For RPKI Signed Objects, use &quot;not before&quot; from the embedded EE Certificate.</li>
<li>For CA and BGPsec Router Certificates, use &quot;not before&quot;.</li>
<li>For directories, use any constant value.</li>
</ul>
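<t>The heuristics above could be applied as in the following sketch. It assumes
that the relevant fields have already been parsed from each object into
timezone-aware datetime values; the dictionary keys, the object type labels,
and the function name are hypothetical, and parsing of the RPKI objects is out
of scope for this example.</t>
<sourcecode type="python"><![CDATA[
import os

# Which (already parsed) field supplies the timestamp for each
# object type. Keys and field names are hypothetical labels.
TIMESTAMP_FIELD = {
    "crl": "this_update",
    "manifest": "this_update",
    "signed_object": "ee_not_before",
    "certificate": "not_before",
}


def set_mtime(path, kind, fields):
    """Set access and modification time of path deterministically.

    fields maps field names to timezone-aware datetime values.
    Returns the timestamp used, in seconds since the epoch.
    """
    field = TIMESTAMP_FIELD[kind]
    epoch = fields[field].timestamp()
    os.utime(path, (epoch, epoch))
    return epoch
]]></sourcecode>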
</section>

<section anchor="load-balancing-and-testing"><name>Load Balancing and Testing</name>
<t>It is RECOMMENDED that the Rsync Repository is load tested to ensure that it
can handle the requests by all RPs in case they need to fall back from RRDP,
which is currently the preferred protocol.</t>
<t>Because Rsync exchanges rely on sessions over TCP, there is no need for
consistent load-balancing between multiple rsyncd servers as long as they (1)
each provide a consistent view and (2) are updated more frequently than the
typical refresh rate for rsync repositories used by RPs.</t>
<t>We RECOMMEND serving rsync repositories from local storage so the host
operating system can optimally use its I/O cache. Using network storage
is NOT RECOMMENDED because it may not benefit from this cache. For
example, the operating system cannot cache stat operations to list the
repository content if NFS is used.</t>
<t>We RECOMMEND setting &quot;max connections&quot; to a value that a single node can
handle within the time an RP allows for rsync to fetch data, and re-evaluating
this value as the repository changes in size over time.</t>
<t>The number of rsyncd servers needed depends on the number of RPs, their refresh
rate, and the &quot;max connections&quot; used. These values are subject to change over
time, so we cannot give clear recommendations here except to restate that we
RECOMMEND load-testing rsync and re-evaluating these parameters over time.</t>
</section>
</section>

<section anchor="acknowledgements"><name>Acknowledgements</name>
<t>This document is the result of many informal discussions between implementers.
Proper acknowledgements will follow.</t>
</section>

</middle>

<back>
<references><name>Normative References</name>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8174.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8181.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8182.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.9286.xml"/>
</references>

</back>

</rfc>
