Implementation Guidance for the PKCS #1 RSA Cryptography Specification

Red Hat, Inc.

Purkynova 115 Brno 61200 Czech Republic hkario@redhat.com

General Internet Engineering Task Force RSA

This document specifies additions and amendments to RFC 8017. Specifically, it provides guidance to implementers of the standard to protect against side-channel attacks. It also deprecates the PKCS #1 v1.5 encryption padding, but provides an alternative depadding algorithm that protects against side-channel attacks arising from users of vulnerable APIs. The purpose of this specification is to increase the security of RSA implementations.

The PKCS #1 specification describes the RSA cryptosystem, providing guidance on implementing encryption schemes and signature schemes. Unfortunately, typical uses of RSA encryption schemes leave it vulnerable to side-channel attacks. Protections against them are not documented there, and attacks are mentioned only in passing.

The PKCS #1 v1.5 padding has been known to be problematic since 1998, when Daniel Bleichenbacher published his attack. Side-channel attacks against public key implementations, including RSA, have been known to be possible since 1996 thanks to work by Paul Kocher. Despite those results, side-channel attacks against RSA implementations continued to proliferate over the following 25 years. We thus provide guidance on how to implement those algorithms in a way that should be secure against at least the simple timing side-channel attacks.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

In this document we reuse the notation from RFC 8017. In addition, we define the following:

AL: alternative message length, non-negative integer, 0 <= AL <= k - 11
AM: alternative encoded message, an octet string
bb: base blinding factor, a positive integer
bbInv: base un-blinding factor, a positive integer, bbInv = bb^(-1) mod n
D: octet string representation of d
DH: an octet string of a SHA-256 hash of D
KDK: an octet string containing a Key Derivation Key for a specific ciphertext C
l: length in octets of the message M
b_i: an exponent blinding factor for i-th prime, non-negative integer.

Cryptographic implementations may provide many indirect signals to the attacker that include information about the secret data being processed. Depending on the type of information, those leaks can be used to decrypt data or retrieve private keys. The most common side-channels that leak information about secret data are:

Different errors returned
Different processing times of operations
Different patterns of jump instructions and memory accesses
Use of hardware instructions that take different amounts of time to execute depending on operands or result

Some of those leaks may be detectable over the network, while others may require closer access to the attacked system. With closer access, the attacker may be able to measure power usage, electromagnetic emanations, or sounds and correlate them with specific bits of secret information.

Recent research into network-based side-channel detection has shown that even very small side-channels (of just a few clock cycles) can be reliably detected over the network. The detectability depends on the sample size the attacker is able to collect, not on the size of the side-channel.

As a general rule, all operations that process secret information (be it parts of the private key or parts of the encrypted message) MUST be performed with code that doesn't have secret data dependent branch instructions, secret data dependent memory accesses, or non-constant time machine instructions (which instructions those are is architecture dependent, but division is commonly non-constant time). Special care should be taken with the code that handles the conversion of the numerical representation to the octet string representation in RSA decryption operations. All operations that use private keys SHOULD additionally employ both base blinding and exponent blinding as protections against leaks inside modular exponentiation code.

The underlying modular exponentiation algorithm MUST be constant time with regard to the exponent in all uses of the private key. For private key decryption, the modular exponentiation algorithm MUST be constant time with regard to the output of the exponentiation. In case the Chinese remainder theorem optimisation is used, the modular exponentiation algorithm must also be constant time with regard to the used moduli.

It's especially important to make sure that all values that are secret to the attacker are stored in memory buffers that have sizes determined by the public modulus. For example, the private exponents should be stored in memory buffers that have sizes determined by the public modulus value, not by the numerical values of the exponents themselves. Similarly, the size of the output buffer for multiplication should always be equal to the sum of the buffer sizes of the multiplicands. The output size of the modular reduction operation should similarly be equal to the size of the modulus and not depend on the bit size of the output.
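The sizing rule above can be illustrated with a minimal Python sketch. The helper names are hypothetical, and Python integers are variable-sized internally, so this only shows how buffer lengths are derived from the public modulus; it is not a side-channel free implementation.

```python
def modulus_octets(n: int) -> int:
    """Length k of the public modulus n in octets."""
    return (n.bit_length() + 7) // 8


def to_fixed_octets(value: int, n: int) -> bytes:
    """Encode a secret value in exactly k octets, whatever its magnitude."""
    return value.to_bytes(modulus_octets(n), "big")


def product_buffer_octets(a: bytes, b: bytes) -> int:
    """Size of a multiplication output buffer: sum of the operand sizes."""
    return len(a) + len(b)
```

With this approach a small private exponent and a large one occupy buffers of identical length, so the buffer size reveals nothing beyond the public modulus size.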

For the modular exponentiation algorithm to be side-channel free, every step of the calculation MUST NOT depend on the bits of the exponent. In particular, use of the simple square-and-multiply algorithm will leak information about bits of the exponent through the absence of a multiplication operation in individual exponentiation steps. The recommended workaround is the use of the Montgomery ladder construction. While that approach ensures that both the square and multiply operations are performed in every step, the results are placed in different memory locations depending on bits of the secret exponent, which provides enough information for an attacker to recover those bits. To counteract it, the implementation should ensure that both memory locations are accessed and updated on every step.
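A minimal sketch of a Montgomery ladder in which both working values are read and written on every step may look as follows. The function names and the masked conditional swap are illustrative assumptions, not part of RFC 8017, and Python's arbitrary-precision arithmetic is itself not constant time, so the sketch shows only the control flow and memory access pattern.

```python
def ct_swap(a: int, b: int, bit: int, width: int):
    """Swap a and b when bit == 1, touching both values either way."""
    mask = -bit & ((1 << width) - 1)   # all ones when bit == 1, else zero
    t = (a ^ b) & mask
    return a ^ t, b ^ t


def ladder_pow(base: int, exp: int, mod: int, exp_bits: int) -> int:
    """Compute base**exp % mod, processing exactly exp_bits exponent bits."""
    width = mod.bit_length()
    r0, r1 = 1 % mod, base % mod
    for i in reversed(range(exp_bits)):
        bit = (exp >> i) & 1
        r0, r1 = ct_swap(r0, r1, bit, width)
        r1 = (r0 * r1) % mod           # one multiplication on every step
        r0 = (r0 * r0) % mod           # one squaring on every step
        r0, r1 = ct_swap(r0, r1, bit, width)
    return r0
```

The number of iterations is fixed by exp_bits, which a caller would derive from the public modulus size rather than from the exponent value.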

As multiplication operations quickly make the intermediate values in modular exponentiation large, performing a modular reduction after every multiplication or squaring operation is a common optimisation. To further optimise the modular reduction, Montgomery modular multiplication is used for performing the combined multiply-and-reduce operation. The last step of that operation is a subtraction that is conditional on the value of the output. A side-channel free implementation should perform the subtraction in all cases and then copy either the result or the first operand of the subtraction, based on the sign of the result of the subtraction.
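The unconditional final subtraction can be sketched as below. This is an illustration under stated assumptions: the helper names are hypothetical, R is chosen as the smallest power of two above n, and pow(n, -1, r) requires Python 3.8 or later. Python integer arithmetic is not constant time, so only the branch-free selection logic carries over to a real implementation.

```python
def mont_setup(n: int):
    """Return (r_bits, n_prime) for an odd modulus n, with R = 2**r_bits > n."""
    r_bits = n.bit_length()
    r = 1 << r_bits
    n_prime = (-pow(n, -1, r)) % r     # n * n_prime == -1 (mod R)
    return r_bits, n_prime


def mont_mul(a: int, b: int, n: int, r_bits: int, n_prime: int) -> int:
    """Return a * b * R^(-1) mod n for operands 0 <= a, b < n."""
    r_mask = (1 << r_bits) - 1
    t = a * b
    m = ((t & r_mask) * n_prime) & r_mask
    u = (t + m * n) >> r_bits          # 0 <= u < 2n
    diff = u - n                       # subtraction performed in all cases
    borrow = (diff >> (r_bits + 1)) & 1    # 1 when diff is negative
    mask = -borrow                     # -1 keeps u, 0 keeps diff
    return (u & mask) | (diff & ~mask)
```

For example, converting a and b to Montgomery form (multiplying by R mod n), calling mont_mul on them, and then calling mont_mul on the result with 1 yields a * b mod n.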

As protection against multiple attacks, it's RECOMMENDED to perform all operations involving the private key with the use of blinding. It should be noted that for decryption operations the unblinding operation MUST be performed using side-channel free code that does not leak information about the result of this multiplication and modular reduction. To implement base blinding, select a number bb uniformly at random such that it is relatively prime to n and smaller than n. Compute the multiplicative inverse of bb modulo n:

bbInv = bb^(-1) mod n

In the RSADP() operation, after performing step 1, multiply c by bb^e mod n, where e is the public exponent. Use the result as the new c for all the remaining operations. Before returning the value m in step 3, multiply it by bbInv mod n. This removes the blinding because (c * bb^e)^d = c^d * bb mod n.

Note: multiplication by bbInv and reduction modulo n MUST be performed using side-channel free code with respect to the value m.

As calculating a multiplicative inverse is expensive, implementations MAY calculate new values of bb and bbInv by squaring them:

new bb = bb^2 mod n
new bbInv = bbInv^2 mod n

A given pair of blinding factors (bb, bbInv) MUST NOT be used for more than one RSADP() operation. Unless the multiplication (squaring) and modular reduction operations are verified to be side-channel free, it's RECOMMENDED to generate completely new blinding parameters every few hundred private key operations.
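As an illustration, base blinding around a private key operation might be sketched as follows. The function names are hypothetical and the public exponent e is assumed to be available. Because (c * bb^e)^d = c^d * bb mod n, the sketch multiplies c by bb^e so that a final multiplication by bbInv recovers m. Python's pow() is not side-channel free, so this shows only the arithmetic.

```python
import secrets
from math import gcd


def new_blinding_pair(n: int):
    """Pick bb uniformly from [2, n - 1] with gcd(bb, n) == 1; return (bb, bbInv)."""
    while True:
        bb = secrets.randbelow(n - 2) + 2
        if gcd(bb, n) == 1:
            return bb, pow(bb, -1, n)  # modular inverse, Python 3.8+


def refresh_blinding_pair(bb: int, bb_inv: int, n: int):
    """Derive the next pair by squaring instead of computing a new inverse."""
    return (bb * bb) % n, (bb_inv * bb_inv) % n


def rsadp_blinded(n: int, e: int, d: int, c: int) -> int:
    """RSADP with base blinding; each call uses a fresh (bb, bbInv) pair."""
    if not 0 <= c < n:
        raise ValueError("ciphertext representative out of range")
    bb, bb_inv = new_blinding_pair(n)
    c_blind = (c * pow(bb, e, n)) % n  # blind the ciphertext representative
    m_blind = pow(c_blind, d, n)       # equals m * bb mod n
    return (m_blind * bb_inv) % n      # unblind
```

A long-running implementation would typically keep the pair between calls and use refresh_blinding_pair, regenerating it from scratch periodically as recommended above.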

To further protect against private key leaks, it's RECOMMENDED to perform blinding of the used exponents. When performing the RSADP() operation, the blinding depends on the form of the private key.

If the key is in the first form, the pair (n, d), then the exponent d should be modified by adding a multiple of Euler's phi(n):

m = c^(d + b * phi(n)) mod n

where b is a 64 bit long uniform random number.

If the key is in the second form, the quintuple (p, q, dP, dQ, qInv) with an optional sequence of triplets (r_i, d_i, t_i), i = 3, ..., u, then each exponent used MUST be blinded individually:

m_1 = c^(dP + b_1 * phi(p)) mod p
m_2 = c^(dQ + b_2 * phi(q)) mod q

If u > 2, then

m_i = c^(d_i + b_i * phi(r_i)) mod r_i, i = 3, ..., u

where b_1, b_2, ..., b_u are all uniformly selected random numbers at least 64 bits long (or at least 2 machine word sizes, whichever is greater). As Euler's phi(p) for a prime argument p is equal to p - 1, it's simple to calculate in this case.

Note: the selection of random b_i values, multiplication of them by the result of the phi() function, and addition to the exponent MUST be performed with side-channel free code. Use of a smaller blinding factor is NOT RECOMMENDED, as values shorter than 64 bits have been shown to still be vulnerable to side-channel attacks.
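Exponent blinding for a two-prime key in the second (CRT) form could be sketched as below. The function names are illustrative, the recombination follows RSADP step 2.b of RFC 8017, and Python's big-integer arithmetic is not side-channel free, so the sketch demonstrates only that adding a random multiple of phi leaves the result unchanged.

```python
import secrets


def blind_exponent(d_i: int, prime: int, bits: int = 64) -> int:
    """Return d_i + b_i * phi(prime), with phi(prime) = prime - 1."""
    b_i = secrets.randbits(bits)       # uniform 64-bit blinding factor
    return d_i + b_i * (prime - 1)


def rsadp_crt_blinded(p: int, q: int, dP: int, dQ: int, qInv: int, c: int) -> int:
    """RSADP for a key in the form (p, q, dP, dQ, qInv) with blinded exponents."""
    m_1 = pow(c, blind_exponent(dP, p), p)
    m_2 = pow(c, blind_exponent(dQ, q), q)
    h = ((m_1 - m_2) * qInv) % p
    return m_2 + q * h
```

Each exponent receives its own independently drawn factor, so two decryptions of the same ciphertext use different exponent values.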

In case of RSA-OAEP, the padding is self-verifying, thus the depadding operation needs to follow the standard algorithm to provide a safe API to users. It MUST ignore the value of the very first octet of padding and process the remaining bytes as if it was equal to zero.

The PKCS #1 v1.5 padding is considered deprecated, and should be used only to process legacy data. It MUST NOT be used as part of online protocols or API endpoints. For implementations that can't remove support for this padding mode, it's RECOMMENDED to implement an implicit rejection mechanism that completely hides from the calling code whether the padding check failed or not. It should be noted that the algorithm MUST be implemented as stated; otherwise, in heterogeneous environments where two implementations use the same key but implement the implicit rejection differently, it may be possible for the attacker to compare behaviour between the implementations to guess if the padding check failed or not.

The basic idea of implicit rejection is to prepare a random but deterministic message to be returned in case the standard PKCS #1 v1.5 padding checks fail. To do that, use the private key and the provided ciphertext to derive a static, but unknown to the attacker, random value. It's a combination of the method documented in TLS 1.2 (RFC 5246) and the deterministic (EC)DSA signatures (RFC 6979).

For the calculation of the random message for implicit rejection we define a Pseudo-Random Function (PRF) as follows:

IRPRF( KDK, label, length )

Input:
KDK      the key derivation key
label    a label making the output unique for a given KDK
length   requested length of output in octets

Output: derived key, an octet string

Steps:

1. If KDK is not 32 octets long, or if length is larger than 8192, return error and stop.

2. The returned value is created by concatenation of subsequent calls to a SHA-256 HMAC function with the KDK as the HMAC key and the following octet string as the message:

   P_i = I || label || bitLength

   where I is an iterator value encoded as a two octet long big-endian integer, label is the passed in label, and bitLength is the length times 8 (to represent the number of bits of output) encoded as a two octet big-endian integer.

3. The iterator is initialised to 0 on the first call, and then incremented by 1 for every subsequent HMAC call.

4. The HMAC is iterated for as long as the concatenated output is shorter than length.

5. The output is the length left-most octets of the concatenated HMAC output.
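The IRPRF steps can be sketched with the Python standard library as follows. One assumption is made explicit: because bitLength is encoded in two octets, the sketch additionally rejects lengths for which length * 8 does not fit in 16 bits, a case the text above does not address.

```python
import hashlib
import hmac


def irprf(kdk: bytes, label: bytes, length: int) -> bytes:
    """Derive `length` octets from KDK and label using iterated HMAC-SHA-256."""
    if len(kdk) != 32 or not 0 <= length <= 8192:
        raise ValueError("invalid KDK size or requested length")
    if length * 8 > 0xFFFF:
        # Assumption of this sketch: bitLength must fit in two octets.
        raise ValueError("bitLength does not fit in two octets")
    bit_length = (length * 8).to_bytes(2, "big")
    out = b""
    iterator = 0
    while len(out) < length:
        p_i = iterator.to_bytes(2, "big") + label + bit_length
        out += hmac.new(kdk, p_i, hashlib.sha256).digest()
        iterator += 1
    return out[:length]
```

Since bitLength is part of every HMAC input, requesting a different length with the same KDK and label yields unrelated output rather than a prefix of the longer one.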

For implementations that cannot remove support for the PKCS #1 v1.5 padding nor provide a usage-specific API, it's possible to implement an implicit rejection algorithm as a protection measure. It should be noted that implementing it correctly is hard, thus it's RECOMMENDED to disable support for PKCS #1 v1.5 padding instead.

To implement implicit rejection, the RSAES-PKCS1-V1_5-DECRYPT from section 7.2.2 of RFC 8017 needs to be implemented as follows:

1. Length checking: If the length of the ciphertext C is not k octets (or if k < 11), output "decryption error" and stop.

2. RSA decryption:

   a. Convert the ciphertext C to an integer ciphertext representative c:

      c = OS2IP (C).

   b. Apply the RSADP decryption primitive to the RSA private key (n, d) and the ciphertext representative c to produce an integer message representative m:

      m = RSADP ((n, d), c).

      Note: the RSADP MUST be constant-time with respect to message m. If RSADP outputs "ciphertext representative out of range" (meaning that c >= n), output "decryption error" and stop.

   c. Convert the message representative m to an encoded message EM of length k octets:

      EM = I2OSP (m, k).

      Note: I2OSP MUST be constant-time with respect to m.

3. Derivation of alternative message:

   a. Derive the Key Derivation Key (KDK):

      i. Convert the private exponent d to a string of length k octets:

         D = I2OSP (d, k).

      ii. Hash the private exponent using the SHA-256 algorithm:

         DH = SHA256 (D).

         Note: This value MAY be cached between the decryption operations, but MUST be considered private-key equivalent.

      iii. Use the DH as the SHA-256 HMAC key and the provided ciphertext C as the message. If the ciphertext C is not k octets long, it MUST be left padded with octets of value zero.

         KDK = HMAC (DH, C, SHA256).

   b. Create the candidate lengths and the random message:

      i. Use the IRPRF with key KDK and "length" as a six octet label encoded with UTF-8 to generate a 256 octet output. Interpret this output as 128 two octet long big-endian numbers.

         CL = IRPRF (KDK, "length", 256).

      ii. Use the IRPRF with key KDK and "message" as a seven octet label encoded with UTF-8 to generate a k octet long output to be used as the alternative message:

         AM = IRPRF (KDK, "message", k).

   c. Select the alternative length for the alternative message. Note: this must be performed in a side-channel free way. Iterate over the 128 candidate CL lengths. For each, zero out the high order bits so that it has the same bit length as the maximum valid message size (k - 11). Select the last length that's not larger than k - 11; use 0 if none are. Save it as AL.

4. EME-PKCS1-v1_5 decoding: Separate the encoded message EM into an octet string PS consisting of nonzero octets and a message M as

      EM = 0x00 || 0x02 || PS || 0x00 || M.

   If the first octet of EM does not have hexadecimal value 0x00, if the second octet of EM does not have hexadecimal value 0x02, if there is no octet with hexadecimal value 0x00 to separate PS from M, or if the length of PS is less than 8 octets, the check variable must remember that one of those checks failed. Irrespective of the check variable value, the code should also return the length of message M: L. If there is no octet with hexadecimal value 0x00 to separate PS from M, then L should equal 0.

   Note: All those checks MUST be performed irrespective of whether previous checks failed or not. A common technique for that is to have a check variable that is OR-ed with the results of subsequent checks.

5. Decision which message to return: in case the check variable is set, the code should return the last AL octets of AM; in case the check variable is unset, the code should return the last L octets of EM.

   Note: The decision which length to use MUST be performed in a side-channel free manner. While the length of the returned message is not considered sensitive, the read memory location is. As such, when returning message M both EM and AM memory locations MUST be read.
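The derivation of the alternative message, the padding check, and the final selection can be sketched together as follows. This is an illustration under stated assumptions, not a reference implementation: the function names and the mask-based helpers are hypothetical, EM is taken as already produced by RSADP and I2OSP, and Python itself gives no constant-time guarantees, so only the branch-free structure is meaningful.

```python
import hashlib
import hmac


def irprf(kdk: bytes, label: bytes, length: int) -> bytes:
    """IRPRF as described in the text (length * 8 must fit in two octets)."""
    bit_length = (length * 8).to_bytes(2, "big")
    out, i = b"", 0
    while len(out) < length:
        msg = i.to_bytes(2, "big") + label + bit_length
        out += hmac.new(kdk, msg, hashlib.sha256).digest()
        i += 1
    return out[:length]


def is_negative(x: int) -> int:
    """1 if x < 0 else 0, for |x| < 2**32, computed without a branch."""
    return (x >> 32) & 1


def select(flag: int, a: int, b: int) -> int:
    """Return a when flag == 1 and b when flag == 0, without a branch."""
    mask = -flag
    return (a & mask) | (b & ~mask)


def alternative_length(cl: bytes, k: int) -> int:
    """Pick AL: the last masked candidate that is not larger than k - 11."""
    max_len = k - 11
    bit_mask = (1 << max_len.bit_length()) - 1
    al = 0
    for i in range(0, len(cl), 2):
        cand = int.from_bytes(cl[i:i + 2], "big") & bit_mask
        fits = is_negative(cand - max_len - 1)   # 1 when cand <= max_len
        al = select(fits, cand, al)
    return al


def check_padding(em: bytes):
    """Return (bad, l): bad is 1 if any check failed, l is the length of M."""
    k = len(em)
    bad = is_negative(-em[0]) | is_negative(-(em[1] ^ 0x02))
    found, zero_idx = 0, 0
    for i in range(2, k):                        # always scans all of EM
        is_zero = is_negative(em[i] - 1)         # 1 when em[i] == 0x00
        first = is_zero & (found ^ 1)
        zero_idx = select(first, i, zero_idx)
        found |= is_zero
    bad |= found ^ 1                             # no 0x00 separator
    bad |= is_negative(zero_idx - 10)            # PS shorter than 8 octets
    return bad, select(found, k - 1 - zero_idx, 0)


def implicit_rejection_decode(d: int, k: int, c: bytes, em: bytes) -> bytes:
    """Return M on valid padding, else the deterministic alternative message."""
    dh = hashlib.sha256(d.to_bytes(k, "big")).digest()
    kdk = hmac.new(dh, c.rjust(k, b"\x00"), hashlib.sha256).digest()
    cl = irprf(kdk, b"length", 256)
    am = irprf(kdk, b"message", k)
    al = alternative_length(cl, k)
    bad, l = check_padding(em)
    out_len = select(bad, al, l)
    mixed = bytes(select(bad, a, e) for a, e in zip(am, em))  # reads AM and EM
    return mixed[k - out_len:]
```

For a well-formed EM the function returns M; for any malformed EM it returns the same pseudo-random message on every call with the same key and ciphertext, so the caller cannot distinguish the two cases by the return path.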

Current protocol deployments MUST NOT use encryption with RSA PKCS #1 v1.5 padding. Support for RSA PKCS #1 v1.5 SHOULD be disabled in the default configuration of any implementation of the RSA cryptosystem. All new protocols MUST NOT specify PKCS #1 v1.5 as a valid encryption padding for RSA keys.

This memo includes no request to IANA.

This whole document specifies security considerations for RSA implementations.

RFC 8017
RFC 2119
RFC 5246
RFC 6979
Attacking Exponent Blinding in RSA without CRT
Exponent Blinding Does Not Always Lift (Partial) SPA Resistance to Higher-Level Security
Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS, and Other Systems

«provide test vectors here»