

# Recent advances with DPDK IPsec

DECLAN DOHERTY, FAN ROY ZHANG, KONSTANTIN ANANYEV, VLADIMIR MEDVEDKIN, INTEL





Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit www.intel.com/benchmarks.

Configurations: Estimates are based on internal Intel analysis using at least Data Plane Development Kit IpSec sample application on Intel(R) Xeon(R) Platinum 8160 CPU @ 2.10GHz

Performance results are based on testing as of 12/09/2019 and may not reflect all publicly available security updates. See configuration disclosure for details. No product or component can be absolutely Secure.

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.

The products and services described may contain defects or errors known as errata which may cause deviations from published specifications. Current characterized errata are available on request.

Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.

Other names and brands may be claimed as the property of others.

Copyright ©, Intel Corporation. All rights reserved.

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice Revision #20110804

Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks Test and System Configurations

No computer system can be totally secure

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors.



### librte\_ipsec: current status

- Introduced in 19.02
- Works with all different flavours of DPDK crypto devices (rte\_cryptodev, rte\_security)
- Supported features:
  - ESP protocol tunnel mode both IPv4/IPv6
  - ESP protocol transport mode both IPv4/IPv6
  - ESN and replay window
  - Supported algorithms: 3DES-CBC, AES-CBC, AES-CTR, AES-GCM, HMAC-SHA1, NULL
  - Multi-segment packets
- Support for fragment/reassemble packets in examples/ipsec-secgw



- Security Association Database (SAD) http://patches.dpdk.org/cover/58536/
- New synchronous rte\_security API for SW based crypto devices http://patches.dpdk.org/cover/58862/
- One SA over multiple crypto devices
  - Fall-back sessions for inline-crypto processing http://patches.dpdk.org/cover/58567/





#### Requirements

1. Scale up to several million sessions
2. Support fast lookup rate
3. Support incremental updates
4. RFC compliant

#### RFC 4301 4.4.2. Security Association Database (SAD):

In each IPsec implementation, there is a nominal Security Association Database (SAD), in which each entry defines the parameters associated with one SA. Each SA has an entry in the SAD.

#### RFC 4301 4.1 Definition and Scope:

Each entry in the SA Database (SAD) (Section 4.4.2) must indicate whether the SA lookup makes use of the destination IP address, or the destination and source IP addresses, in addition to the SPI. For each inbound, IPsec-protected packet, an implementation must conduct its search of the SAD such that it finds the entry that matches the "longest" SA identifier. In this context, if two or more SAD entries match based on the SPI value, then the entry that also matches based on destination address, or destination and source address (as indicated in the SAD entry) is the "longest" match.





```
struct rte_ipsec_sad *rte_ipsec_sad_create(const char *name,
        const struct rte_ipsec_sad_conf *conf);
void rte_ipsec_sad_free(struct rte_ipsec_sad *sad);

/** key to search for */
union rte_ipsec_sad_key {
    struct { uint32_t spi; uint32_t dip; uint32_t sip; } v4;
    struct { uint32_t spi; uint8_t dip[16]; uint8_t sip[16]; } v6;
};

/** type of key */
enum {
    RTE_IPSEC_SAD_SPI_ONLY, RTE_IPSEC_SAD_SPI_DIP, RTE_IPSEC_SAD_SPI_DIP_SIP
};

int rte_ipsec_sad_add(struct rte_ipsec_sad *sad,
        union rte_ipsec_sad_key *key, int key_type, void *sa);
int rte_ipsec_sad_del(struct rte_ipsec_sad *sad,
        union rte_ipsec_sad_key *key, int key_type);
int rte_ipsec_sad_lookup(const struct rte_ipsec_sad *sad,
        const union rte_ipsec_sad_key *keys[], uint32_t n, void *sa[]);
```



```
struct rte_ipsec_sad {
    ...
    struct rte_hash *hash[RTE_IPSEC_SAD_KEY_TYPE_MASK];
    __extension__ struct hash_cnt cnt_arr[];
};
```

- 3 hash tables (rte\_hash).
- Each table keeps entries for a specific key type (SPI\_ONLY or SPI\_DIP or SPI\_DIP\_SIP)
- The value in *SPI\_ONLY* uses 2 lsb's to indicate presence of more specific keys in *SPI\_DIP* and *SPI\_DIP\_SIP* tables for a given SPI.
- Lookup always starts with the SPI\_ONLY table and progresses to the other two based on the values of the presence bits.
- cnt\_arr[] entries contain counters for more specific keys in SPI\_DIP and SPI\_DIP\_SIP tables for given SPI and are used only by add/delete.



| SPI hash: key | SPI hash: value | cnt\_arr[]: dip | cnt\_arr[]: dip+sip | SPI+DIP hash: key | SPI+DIP hash: value | SPI+DIP+SIP hash: key | SPI+DIP+SIP hash: value |
|---------------|-----------------|----------------|--------------------|-------------------|---------------------|-----------------------------|-------------------------|
| 200           | (NIL \| 0x2)    | 0              | 2                  |                   |                     | 100, 192.0.2.1, 203.0.113.1 | V4                      |
| 500           | V1              | 0              | 0                  | 100, 192.0.2.1    | V3                  |                             |                         |
|               |                 |                |                    |                   |                     | 200, 192.0.2.1, 203.0.113.2 | V6                      |
| 100           | (V2 \| 0x3)     | 1              | 1                  |                   |                     | 200, 192.0.2.1, 203.0.113.1 | V5                      |

rte\_ipsec\_sad\_add(sad, &{.spi=500}, RTE\_IPSEC\_SAD\_SPI\_ONLY, V1);

rte\_ipsec\_sad\_add(sad, &{.spi=100}, RTE\_IPSEC\_SAD\_SPI\_ONLY, V2);

rte\_ipsec\_sad\_add(sad, &{.spi=100, .dip=192.0.2.1}, RTE\_IPSEC\_SAD\_SPI\_DIP, V3);

rte\_ipsec\_sad\_add(sad, &{.spi=100, .dip=192.0.2.1, .sip=203.0.113.1}, RTE\_IPSEC\_SAD\_SPI\_DIP\_SIP, V4);

rte\_ipsec\_sad\_add(sad, &{.spi=200, .dip=192.0.2.1, .sip=203.0.113.1}, RTE\_IPSEC\_SAD\_SPI\_DIP\_SIP, V5);

rte\_ipsec\_sad\_add(sad, &{.spi=200, .dip=192.0.2.1, .sip=203.0.113.2}, RTE\_IPSEC\_SAD\_SPI\_DIP\_SIP, V6);

rte\_ipsec\_sad\_lookup(sad, &{spi=100, dip=192.0.2.1, sip=198.151.100.100}) → V3

rte\_ipsec\_sad\_lookup(sad, &{spi=100, dip=192.0.2.200, sip=203.0.113.1}) → V2

rte\_ipsec\_sad\_lookup(sad, &{spi=100, dip=192.0.2.1, sip=203.0.113.1}) → V4
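The three-table layout and the lookups above can be reproduced with a small toy model. This is only a sketch under stated assumptions, not the librte\_ipsec implementation: all `toy_*` names are hypothetical, linear arrays stand in for the `rte_hash` tables, SA pointers are assumed to be at least 4-byte aligned so that the 2 lsb's are free for the presence bits, and delete together with the `cnt_arr[]` counters is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model: small linear arrays stand in for the three rte_hash tables. */
enum { KEY_SPI_ONLY, KEY_SPI_DIP, KEY_SPI_DIP_SIP, KEY_TYPES };
#define HAS_DIP     ((uintptr_t)0x1) /* SPI+DIP keys exist for this SPI */
#define HAS_DIP_SIP ((uintptr_t)0x2) /* SPI+DIP+SIP keys exist for this SPI */
#define FLAG_MASK   (HAS_DIP | HAS_DIP_SIP)
#define MAX_ENTRIES 16
#define IP4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | ((uint32_t)(c) << 8) | (uint32_t)(d))

struct toy_key { uint32_t spi, dip, sip; };
struct toy_entry { struct toy_key key; uintptr_t val; };
struct toy_sad {
    struct toy_entry tbl[KEY_TYPES][MAX_ENTRIES];
    size_t n[KEY_TYPES];
};

/* Index of the entry matching k for the given key type, or -1. */
static int toy_find(const struct toy_sad *sad, const struct toy_key *k, int type)
{
    for (size_t i = 0; i < sad->n[type]; i++) {
        const struct toy_key *e = &sad->tbl[type][i].key;
        if (e->spi == k->spi &&
            (type == KEY_SPI_ONLY || e->dip == k->dip) &&
            (type != KEY_SPI_DIP_SIP || e->sip == k->sip))
            return (int)i;
    }
    return -1;
}

/* Find the entry for (k, type) or append an empty one; NULL when full. */
static struct toy_entry *toy_slot(struct toy_sad *sad, const struct toy_key *k, int type)
{
    int i = toy_find(sad, k, type);
    if (i >= 0)
        return &sad->tbl[type][i];
    if (sad->n[type] == MAX_ENTRIES)
        return NULL;
    struct toy_entry *e = &sad->tbl[type][sad->n[type]++];
    e->key = *k;
    e->val = 0;
    return e;
}

int toy_sad_add(struct toy_sad *sad, struct toy_key k, int type, void *sa)
{
    assert(type >= 0 && type < KEY_TYPES);
    assert(((uintptr_t)sa & FLAG_MASK) == 0); /* the 2 lsb's carry the flags */
    /* Every SPI gets an SPI_ONLY entry, possibly with a NIL SA pointer. */
    struct toy_entry *spi = toy_slot(sad, &k, KEY_SPI_ONLY);
    if (spi == NULL)
        return -1;
    if (type == KEY_SPI_ONLY) {
        spi->val = (uintptr_t)sa | (spi->val & FLAG_MASK);
        return 0;
    }
    struct toy_entry *e = toy_slot(sad, &k, type);
    if (e == NULL)
        return -1;
    e->val = (uintptr_t)sa;
    spi->val |= (type == KEY_SPI_DIP) ? HAS_DIP : HAS_DIP_SIP;
    return 0;
}

/* Longest match: SPI+DIP+SIP first, then SPI+DIP, then SPI only. */
void *toy_sad_lookup(const struct toy_sad *sad, struct toy_key k)
{
    int i = toy_find(sad, &k, KEY_SPI_ONLY);
    if (i < 0)
        return NULL; /* unknown SPI: no other table is probed */
    uintptr_t spi_val = sad->tbl[KEY_SPI_ONLY][i].val;
    if ((spi_val & HAS_DIP_SIP) && (i = toy_find(sad, &k, KEY_SPI_DIP_SIP)) >= 0)
        return (void *)sad->tbl[KEY_SPI_DIP_SIP][i].val;
    if ((spi_val & HAS_DIP) && (i = toy_find(sad, &k, KEY_SPI_DIP)) >= 0)
        return (void *)sad->tbl[KEY_SPI_DIP][i].val;
    return (void *)(spi_val & ~FLAG_MASK);
}
```

Because the SPI\_ONLY entry carries the presence bits, a packet whose SPI has no more specific keys costs a single table probe in this model; the more specific tables are touched only when the bits say an entry may exist there.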



./testsad <eal opts> -- -n <10K/100K/1M> -I 50M -d [34/33/33, 70/20/10, 90/9/1]

(-d: ratio of SA key types SPI / SPI+DIP / SPI+DIP+SIP)







[Figures: two testsad benchmark charts, each covering the ADD, DEL and LOOKUP operations]

Disclaimer: For more complete information about performance and benchmark results, visit <u>www.intel.com/benchmarks</u>



#### Problem:

- The DPDK crypto-dev API is comprehensive, generic and asynchronous (HW oriented)
- This means high and unnecessary overhead for SW-backed PMDs (AESNI-MB, AESNI-GCM, etc.):
  - allocate/free an rte\_crypto\_op
  - fill/read an rte\_crypto\_op (costs 3 cache lines)
  - enqueue/dequeue per burst to simulate asynchronous mode
  - dequeue: extra cache line access (check status, retrieve mbuf pointer)
  - based on rte\_mbuf (extra layer to dereference for the data buffer address)
- For bigger packets and slow SW implementations such overhead is less significant.
- BUT for small packets and/or faster SW implementations the overhead takes a larger percentage of the processing cost and will grow even further.
- Proposal: a new API that:
  - works in synchronous mode (function call, all input/output data passed as function parameters)
  - bursts on a per session basis
  - accepts raw data buffers

# CPU\_CRYPTO: API



#### User level

/\* new session type and xform\*/

enum rte\_security\_session\_action\_type {..., RTE\_SECURITY\_ACTION\_TYPE\_CPU\_CRYPTO};

struct rte\_security\_cpu\_crypto\_xform {...};

/\* synchronous process function \*/

struct rte\_security\_vec {struct iovec \*vec; uint32\_t num;};

rte\_security\_process\_cpu\_crypto\_bulk(struct rte\_security\_ctx \*instance, struct rte\_security\_session \*sess, struct rte\_security\_vec buf[], void \*iv[], void \*aad[], void \*digest[], int status[], uint32\_t num);

#### PMD level

/\* new function in the PMD ops table \*/

struct rte\_security\_ops {...; security\_process\_cpu\_crypto\_bulk\_t process\_cpu\_crypto\_bulk;}
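The calling convention of such a synchronous, per-session burst function can be illustrated with a toy stand-in. This is a sketch, not the proposed DPDK function: `toy_vec`, `toy_session` and `toy_process_bulk` are hypothetical names, a byte-wise XOR replaces the real cipher, IV/AAD/digest arrays are omitted, and returning the number of successfully processed buffers is an assumption (the prototype above does not show a return type).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h> /* struct iovec */

/* Hypothetical mirror of the rte_security_vec layout: one buffer made of segments. */
struct toy_vec { struct iovec *vec; uint32_t num; };

/* Hypothetical session: a single byte stands in for real key material. */
struct toy_session { uint8_t key; };

/*
 * Synchronous burst for one session: every input and output is a function
 * parameter, and status[] is valid as soon as the call returns.
 * No crypto op is allocated and nothing is enqueued or dequeued.
 * Returns the number of buffers processed successfully (an assumption).
 */
uint32_t toy_process_bulk(const struct toy_session *sess, struct toy_vec buf[],
                          int status[], uint32_t num)
{
    assert(sess != NULL);
    uint32_t ok = 0;
    for (uint32_t i = 0; i < num; i++) {
        if (buf[i].vec == NULL || buf[i].num == 0) {
            status[i] = -1; /* malformed buffer: report it, keep going */
            continue;
        }
        for (uint32_t s = 0; s < buf[i].num; s++) {
            uint8_t *p = buf[i].vec[s].iov_base;
            for (size_t b = 0; b < buf[i].vec[s].iov_len; b++)
                p[b] ^= sess->key; /* XOR placeholder for the real cipher */
        }
        status[i] = 0;
        ok++;
    }
    return ok;
}
```

The caller hands over raw, possibly multi-segment buffers and reads the per-buffer status immediately after the call, which is the property the asynchronous enqueue/dequeue model cannot offer.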



- Currently supported by the AESNI-GCM and AESNI-MB PMDs
  - Further AESNI-MB improvements are planned
- librte\_ipsec supports the new security action type, so integration effort for the end user is minimal
  - examples/ipsec-secgw line changes:
    - 43 for control path
    - 2 for data path



./ipsec-secgw --lcores=7 -n 4 --vdev="crypto\_aesni\_gcm0" -w 18:00.0 -w 3b:00.0 -- -p 0x3 -u 1 -P -l --config="(0,0,7),(1,0,7)" ...

(1 core, 1 SA, 1 inbound port, 1 outbound port)

Intel(R) Xeon(R) Platinum 8160 CPU @ 2.10GHz, 2x Intel XXV710 for 25GbE




## Multiple sessions per SA



#### API brief

```
/* opaque SW representation of the SA (HW neutral) */
struct rte_ipsec_sa;

/*
 * associates SA with particular HW device (session).
 * Same SA can be referred by multiple sessions.
 */
struct rte_ipsec_session {
    struct rte_ipsec_sa *sa;
    /** session action type */
    enum rte_security_session_action_type type;
    /** session and related data */
    union {
        struct { struct rte_cryptodev_sym_session *ses; } crypto;
        struct { struct rte_security_session *ses; ... } security;
    };
    ...
};
```



- single SA / single core / multiple crypto-devs
  - load-balancing (not covered)
  - fall-back sessions
    - inline device does not support processing of IP fragmented IPsec packets
    - add fall-back session on crypto-dev for the same SA
- single SA / multiple cores / multiple crypto-devs
  - fat tunnel (not covered)

Current status:

- Since 19.08 ipsec-secgw can fragment packets bigger than the MTU and reassemble fragmented packets.
  - inbound packets: RX => reassemble => IPsec process
  - outbound packets: IPsec process => fragment => TX
- To minimize the possible performance impact, reassembly is implemented as an RX callback using librte\_ip\_frag.
- To process reassembled packets, ipsec-secgw relies on the ability of librte\_ipsec to handle multi-segment packets.
  - Attached crypto devices also have to support the 'In Place SGL' offload capability.



- NIC provides IPsec offload ability ... but with some limitations
  - No IP reassemble support in HW
  - Though for many usage scenarios % of fragmented packets is relatively low.
- For INLINE sessions add an ability to have 2 sessions per SA:
  - PRIMARY (INLINE-CRYPTO HW) used for majority of packets (fast-path).
  - FALL-BACK (CRYPTO-DEV HW/SW) handles packets that can't be processed by PRIMARY: reassembled packets, etc. (exception-path).
- Works, but there are a few things to be aware of ...
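The PRIMARY / FALL-BACK split described above can be sketched as a per-packet path decision. This is a simplified illustration, not the ipsec-secgw code: the `toy_*` names and the `reassembled` flag are hypothetical, and dropping packets when an SA has no fall-back session is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_path { PATH_PRIMARY, PATH_FALLBACK, PATH_DROP, PATH_COUNT };

/* Hypothetical SA state: only records whether a fall-back session exists. */
struct toy_sa { bool has_fallback; };

/* Hypothetical packet: 'reassembled' marks a packet rebuilt from fragments in SW. */
struct toy_pkt { int id; bool reassembled; };

/* Fast path for packets the inline HW can handle, exception path otherwise. */
enum toy_path toy_select_path(const struct toy_sa *sa, const struct toy_pkt *pkt)
{
    if (!pkt->reassembled)
        return PATH_PRIMARY;
    /* Assumption: without a fall-back session the packet cannot be processed. */
    return sa->has_fallback ? PATH_FALLBACK : PATH_DROP;
}

/* Count how many packets of a burst end up on each path. */
void toy_classify_burst(const struct toy_sa *sa, const struct toy_pkt burst[],
                        size_t n, size_t count[PATH_COUNT])
{
    assert(n == 0 || burst != NULL);
    for (size_t p = 0; p < PATH_COUNT; p++)
        count[p] = 0;
    for (size_t i = 0; i < n; i++)
        count[toy_select_path(sa, &burst[i])]++;
}
```

With a low share of fragmented traffic almost every packet takes the primary branch, so the extra check adds little to the fast path.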



input burst of N IPsec packets (same SA): *<pkt0, pkt1, pkt2\_frag0, pkt2\_frag1, pkt3, ..., pktN-1>*

after SW reassemble: *<pkt0, pkt1, pkt2, pkt3, ..., pktN-1>*

#### fall-back over ASYNC crypto-device:

/\* process first two packets \*/

ipsec\_process(pkt0,pkt1);

/\* pkt2 is enqueued for ASYNC processing and will become available at some point in the future \*/

ipsec\_prepare(pkt2);

rte\_cryptodev\_enqueue(pkt2);

/\* process rest of the bulk \*/

ipsec\_process(pkt3, ..., pktN-1);

/\* ... sometime later \*/

ipsec\_dequeue(...)=>pkt2;

ipsec\_process(pkt2);

packet reorder in IPsec processing path:

<pkt0, pkt1, pkt3, ..., pktN-1,pktN, ..., pktN+M,pkt2>

The replay window size has to be >= N+M (the value of M depends on HW/PMD latency).
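Why the window size matters can be shown with a minimal anti-replay sketch. It is not the librte\_ipsec code: the names are hypothetical, the window is a single 64-bit bitmap (so sizes are limited to 64), ESN is ignored, and packet *i* is assumed to carry sequence number *i + 1*.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct replay_win {
    uint64_t top;    /* highest sequence number accepted so far */
    uint64_t bitmap; /* bit i set: sequence number (top - i) was already seen */
    unsigned size;   /* window size in packets, 1..64 in this sketch */
};

/* Accept and record seq, or reject it as a replay or as too old. */
bool replay_check_update(struct replay_win *w, uint64_t seq)
{
    assert(w->size >= 1 && w->size <= 64);
    if (seq == 0)
        return false; /* sequence numbers start at 1 in this sketch */
    if (seq > w->top) { /* new highest: slide the window forward */
        uint64_t shift = seq - w->top;
        w->bitmap = (shift >= 64) ? 0 : (w->bitmap << shift);
        w->bitmap |= 1;
        w->top = seq;
        return true;
    }
    uint64_t off = w->top - seq;
    if (off >= w->size)
        return false; /* fell behind the window */
    if (w->bitmap & (UINT64_C(1) << off))
        return false; /* duplicate */
    w->bitmap |= UINT64_C(1) << off;
    return true;
}

/*
 * Packets 0..n+m are processed in order except pkt2, which comes back from
 * the asynchronous device only after all the others.
 * Returns whether the late pkt2 still passes the replay check.
 */
bool late_pkt2_accepted(unsigned win_size, unsigned n, unsigned m)
{
    struct replay_win w = { .top = 0, .bitmap = 0, .size = win_size };
    for (unsigned i = 0; i <= n + m; i++)
        if (i != 2 && !replay_check_update(&w, (uint64_t)i + 1))
            return false;
    return replay_check_update(&w, 3); /* pkt2 carries sequence number 3 */
}
```

With N = 8 and M = 4, a window of 12 (N+M) still accepts the late packet in this model, while a window of 8 rejects it because it has already fallen behind the window.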

#### fall-back over SYNC crypto-device:

/\* process first two packets \*/

ipsec\_process(pkt0,pkt1);

/\* SYNC processing for PKT2 \*/

ipsec\_prepare(pkt2);

rte\_security\_process\_cpu\_crypto\_bulk(pkt2);

ipsec\_process(pkt2);

/\* process rest of the bulk \*/

ipsec\_process(pkt3, ..., pktN-1);

- input packet order is preserved
- requires crypto-dev with synchronous API (CPU\_CRYPTO)
- Might slow down the fast path



# Q&A