Welcome: DPDK Awards & 10-Year Anniversary Celebration
Story of perfect system tuning for latency measurement
Reshma Pattan & David Hunt, Intel
This presentation will show how far one can go in tuning a system for accurate latency measurement, based on lessons learned while measuring latency with the DPDK skeleton application and the i40e PMD.
Various kernel boot options, kernel system settings, and a "secret" i40e PMD setting will be explained, along with how they affect latency.
These learnings can be leveraged by the ecosystem to measure the latency of other DPDK applications.
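As an illustration of the kind of settings involved (an assumed example, not necessarily the exact configuration used in the talk), kernel boot parameters along these lines are commonly used to shield the measurement cores from scheduler and timer-tick noise:

    isolcpus=2-7 nohz_full=2-7 rcu_nocbs=2-7 intel_idle.max_cstate=0 processor.max_cstate=0

Combined with pinning the application's lcores to the isolated CPUs (for example via the EAL -l option), this removes most sources of jitter from the latency measurement.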
DPDK for ultra low latency applications
Muhammad Ahmad & Ali Rizvi, eMumba Inc.
DPDK is the go-to, off-the-shelf, stable and reliable solution for data planes and switching applications globally. It is widely used to accelerate packet processing across various verticals, focusing more on throughput while providing decent latency.
In this presentation, we look at how to use DPDK to provide a network stack solution for ultra-low latency (ULL) applications in the world of algorithmic trading. We examine out-of-the-box latency performance from DPDK. Next, we show how, through systematic tuning and benchmarking, we were able to reduce round-trip time (RTT) latency. This involved configuring DPDK in scalar mode, pre-allocating mbufs by enabling RX bulk allocation, and using optimized versions of functions by enabling intrinsics. We used an open-source FreeBSD network stack on top of DPDK and modified it in a way that favors low latency (burst_size=1, timeout=0). For low-latency use cases, it is necessary to avoid context switches and data sharing between cores, so we used rte_flow to direct packets to specific cores. These optimizations enabled us to process packets at wire speed and reduce latency fivefold over the pre-tuning results. For benchmarking at these aggressively low latency levels, we built a testbed with commodity hardware providing 7-nanosecond timestamp granularity. We replicated the STAC-T1 test, which is a widely accepted latency benchmark in the electronic trading industry.
We also compare the results we achieved with DPDK against those we achieved with OpenOnload TCPDirect, the kernel bypass solution from Solarflare. We conclude with some thoughts on upstream contributions for enabling ULL use cases.
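As a rough illustration of the rte_flow-based steering mentioned above (a minimal sketch, not the authors' code; the queue index and TCP destination port are made-up values), a flow rule can pin one flow to the RX queue polled by a dedicated lcore:

    #include <rte_byteorder.h>
    #include <rte_flow.h>

    /* Steer TCP packets with destination port 7000 (hypothetical) to RX queue
     * 'rx_queue', which is polled by a dedicated lcore -- no data sharing and
     * no context switches on the hot path. */
    static struct rte_flow *
    steer_flow_to_queue(uint16_t port_id, uint16_t rx_queue)
    {
        struct rte_flow_attr attr = { .ingress = 1 };
        struct rte_flow_item_tcp tcp_spec = { .hdr.dst_port = RTE_BE16(7000) };
        struct rte_flow_item_tcp tcp_mask = { .hdr.dst_port = RTE_BE16(0xffff) };
        struct rte_flow_item pattern[] = {
            { .type = RTE_FLOW_ITEM_TYPE_ETH },
            { .type = RTE_FLOW_ITEM_TYPE_IPV4 },
            { .type = RTE_FLOW_ITEM_TYPE_TCP, .spec = &tcp_spec, .mask = &tcp_mask },
            { .type = RTE_FLOW_ITEM_TYPE_END },
        };
        struct rte_flow_action_queue queue = { .index = rx_queue };
        struct rte_flow_action actions[] = {
            { .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &queue },
            { .type = RTE_FLOW_ACTION_TYPE_END },
        };
        struct rte_flow_error error;

        return rte_flow_create(port_id, &attr, pattern, actions, &error);
    }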
Do DPDK APIs provide the highest performance?
Harry van Haaren, Intel
DPDK is a project known for its performance, but are the APIs really the best they could possibly be? In this talk we review the best-practices in DPDK datapath APIs (e.g. Ethdev, Rings, Eventdev) and understand how these contribute to the performance of DPDK: there will be lots of diagrams to help visualize things!
Next we explore the hazards of writing high-performance code, with a focus on SIMD implementations. This leads to some observations about specific APIs where DPDK does not enable the highest-performing PMD implementations.
Finally, we make suggestions as to how the DPDK APIs could be improved to provide the PMD with context about the calling code, and by doing so achieve even higher performance!
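As background for that discussion (a minimal sketch, not taken from the talk), the datapath APIs are burst oriented: one call hands the PMD a whole array of packets, which amortizes per-call overhead and gives the PMD room for SIMD-friendly implementations:

    #include <rte_ethdev.h>
    #include <rte_mbuf.h>

    #define BURST_SIZE 32   /* amortize per-call cost over many packets */

    /* Hypothetical polling loop showing the burst-oriented ethdev RX API. */
    static void
    rx_loop(uint16_t port_id, uint16_t queue_id)
    {
        struct rte_mbuf *pkts[BURST_SIZE];

        for (;;) {
            uint16_t nb_rx = rte_eth_rx_burst(port_id, queue_id, pkts, BURST_SIZE);

            for (uint16_t i = 0; i < nb_rx; i++) {
                /* ... application processing ... */
                rte_pktmbuf_free(pkts[i]);
            }
        }
    }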
Introducing flow performance application
Wisam Jaddo, NVIDIA
We introduce a new application aimed at providing easy-to-use and accurate measurement of rte_flow performance and footprint.
The application supports most of the matching items and a set of the actions available today in DPDK, and can be extended as needed.
In the session I will demonstrate its usage and discuss its features, such as the following (see the sketch after this list):
1- Calculating rte_flow insertion rate.
2- Calculating rte_flow deletion rate.
3- Calculating memory consumption of rte_flow.
4- Packet forwarding performance statistics in packets per second.
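As a rough sketch of what the insertion-rate measurement amounts to (illustrative only, not the application's actual code; create_rule() is a hypothetical helper that builds and inserts one rte_flow rule):

    #include <rte_cycles.h>
    #include <rte_flow.h>

    /* Hypothetical helper that builds and inserts a single rte_flow rule. */
    extern struct rte_flow *create_rule(uint16_t port_id, uint32_t rule_id);

    /* Time a batch of rule insertions and return rules inserted per second. */
    static double
    measure_insertion_rate(uint16_t port_id, uint32_t rules_count)
    {
        uint64_t start = rte_rdtsc();

        for (uint32_t i = 0; i < rules_count; i++)
            create_rule(port_id, i);

        double seconds = (double)(rte_rdtsc() - start) / rte_get_tsc_hz();

        return rules_count / seconds;
    }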
Debugging DPDK applications using rr
Dariusz Sosnowski
Debugging issues in DPDK applications running in production can be troublesome. Core dumps and sufficient logging can provide some insight, but finding the root causes of application issues can be hard. Attaching debuggers to running applications can sometimes be unacceptable because of the application's possible downtime. rr is a recording debugger, developed by the Mozilla Foundation, which allows developers to record a trace of a running application and debug it offline. This talk explores the possibility of using rr to troubleshoot issues with DPDK applications, the steps required to use it in the DPDK ecosystem, and the possible performance impact.
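As a brief illustration (the binary name and EAL options below are examples, not a prescribed invocation), recording and replaying a DPDK application with rr looks roughly like this:

    # record a trace of the application run (EAL options shown are examples)
    rr record ./my-dpdk-app -l 0-3 -n 4
    # later, replay the exact same execution offline under a gdb-like interface
    rr replay

Note that rr serializes the traced threads onto a single core while recording, which is one component of the performance impact the talk examines.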
eBPF Probes in DPDK applications for troubleshooting and monitoring
Vipin Varghese & Siva Tummala, Intel
End-user applications are often built with DPDK and other libraries. It becomes cumbersome to maintain well-placed debug and counter logic without affecting performance.
We would like to share an approach that uses eBPF to accommodate debugging, counters, and metadata matching in the various packet-processing stages.
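One possible shape of such a probe (purely illustrative; process_stage() is a hypothetical, non-inlined application function and the binary path is made up) is a uprobe attached from outside the running application, e.g. with bpftrace:

    # count invocations of a packet-processing stage without rebuilding the app
    bpftrace -e 'uprobe:/usr/local/bin/dpdk-app:process_stage { @calls = count(); }'

The counter lives in a kernel BPF map, so the application's fast path only pays the probe cost while the probe is attached.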
Cheat sheet to migrate from GNU make to meson
Vipin Varghese & Siva Tummala, Intel
GNU Make is being phased out of the DPDK build system in favor of meson.
However, many open-source and custom applications still rely on GNU Make.
We would like to discuss our learnings from using the meson build:
a. Passing DPDK libraries built with meson to existing applications built with GNU Make (see the sketch after this list).
b. Applications (e.g. OVS) making use of the meson build.
c. Things to take care of when cross-building applications against DPDK meson libraries.
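For point (a), the usual bridge is pkg-config: a meson-built and installed DPDK provides a libdpdk.pc file, which an existing GNU Make build can consume along these lines (a minimal sketch; variable names follow common Make conventions):

    # Makefile fragment: consume a meson-built, installed DPDK from a GNU Make build
    PKGCONF ?= pkg-config
    CFLAGS  += $(shell $(PKGCONF) --cflags libdpdk)
    LDLIBS  += $(shell $(PKGCONF) --libs libdpdk)

The existing compile and link rules then pick up the DPDK flags through CFLAGS and LDLIBS; when DPDK is installed to a non-default prefix, PKG_CONFIG_PATH needs to point at its pkgconfig directory.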
Stateful Flow Table (SFT) – Connection tracking in DPDK
Ori Kam, NVIDIA & Andrey Vesnovaty, Mellanox (NVIDIA)
As more and more packet-processing applications need to maintain connection state, we propose to introduce an SFT DPDK library and to provide a framework for connection tracking, for both offloaded and lookaside processing.
Examples of such applications:
• Security (Suricata)
• Virtual switches (OVS)
• GTP
Device virtualization in DPDK
Xiuchun Lu & Chenbo Xia, Intel
QEMU, often used as the hypervisor for virtual machines running in the cloud, can be susceptible to security attacks because it is a large monolithic program. Disaggregated QEMU, which involves separating QEMU services into separate host processes, reduces the attack surface. Disaggregating I/O services is a good place to begin QEMU disaggregation.
VFIO-over-Socket, also known as vfio-user, is a protocol that allows a device to be virtualized in a separate process outside QEMU. It can be the main transport mechanism for multi-process QEMU, and it can be used by other applications offering device virtualization. DPDK will gain vfio-user support by introducing and implementing a vfio-user bus driver. This provides a framework for DPDK applications to offer device virtualization and accommodates QEMU out-of-tree emulated devices in DPDK.
This presentation will cover the following items:
1. Why and how to allow a device to be virtualized outside QEMU
2. Introducing a framework for accommodating emulated/virtualized devices in DPDK
3. Introducing a specific emulated/virtualized device in DPDK
4. Other potential emulated devices in DPDK (optional)
vDPA: on the road to production
Maxime Coquelin & Adrian Moreno, Red Hat
vDPA, which stands for Virtio Datapath Acceleration, aims at providing open and standard L2 interfaces with wire speed and wire latency. The fundamental idea of vDPA is to push the specification-based virtio interface from software down to physical NICs, for VMs and containers to consume.
After a short introduction to vDPA technology and a high level presentation of both DPDK and Kernel alternatives, the presenters will provide an update on DPDK’s vDPA framework which was introduced two years ago, and introduce the upcoming vDPA daemon which aims at managing DPDK vDPA VFs.
Then, they will give an update on the Virtio-user PMD driver which is being used in containers to consume both DPDK and Kernel vDPA interfaces.
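As an illustration of that container use case (an example command line, assuming a vhost-vdpa device has already been created and exposed as /dev/vhost-vdpa-0), the Virtio-user PMD is instantiated as a vdev pointing at the vDPA device node:

    dpdk-testpmd -l 0-1 --no-pci \
        --vdev=net_virtio_user0,path=/dev/vhost-vdpa-0 \
        -- -i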
Finally, the presenters will give an overview of the higher-level picture, presenting the work being done with the Kubernetes community to provide vDPA interfaces to containers as Multus secondary interfaces.
Key takeaways from QUIC acceleration with DPDK
Siva Tummala & Vipin Varghese, Intel
For next-generation firewalls to inspect content, a high-performance QUIC proxy is a must.
This led us to explore moving from a kernel-based QUIC stack (~300 Mbps per core) to a user-space QUIC stack based on DPDK (~2 Gbps per core).
Accelerating O-RAN fronthaul with DPDK
Shahaf Shuler & Dotan Levi, NVIDIA
An Open Radio Access Network (O-RAN) is a totally disaggregated approach to deploying mobile fronthaul and mid-haul networks, built entirely on cloud-native principles. Under the O-RAN architecture, NICs along with accelerators (such as GPUs, FPGAs, etc.) will be placed at the network edge to handle the 5G MAC layer. DPDK is a good framework for implementing such functionality, enabling reception of the raw 5G packets for MAC-layer processing.
In this talk, we will show how we enabled a full softwarization of the telco edge (not only 5G) using the different offloads in DPDK that can be used to accelerate 5G packet processing. Specifically: the ability to zero-copy between the NIC and an accelerator, the use of PTP, advanced flow steering to dispatch control and data packets in hardware, and the use of the NIC's scheduling mechanisms to transmit a packet at a specific time fitting the radio unit's receive window.
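As a small illustration of the PTP piece (a sketch only, assuming the NIC and PMD support IEEE 1588 timesync; error handling is minimal), DPDK exposes the NIC's hardware clock through the ethdev timesync API:

    #include <time.h>
    #include <rte_ethdev.h>

    /* Enable IEEE 1588/PTP timesync on a port and read the NIC's current time. */
    static int
    read_nic_time(uint16_t port_id, struct timespec *ts)
    {
        int ret = rte_eth_timesync_enable(port_id);

        if (ret != 0)
            return ret;

        return rte_eth_timesync_read_time(port_id, ts);
    }

Transmitting a packet at a precise time relies on a separate Tx scheduling offload, which the talk covers in more detail.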
Closing Remarks