DPDK is a set of libraries and drivers for fast packet processing. It is designed to run on a wide range of processors: the first supported CPU was Intel x86, and support has since been extended to IBM POWER and Arm. It runs mostly in Linux userland, and a FreeBSD port is available for a subset of DPDK features. DPDK is an open source, BSD-licensed project; the most recent patches and enhancements, provided by the community, are available in the master branch.

The agenda for DPDK Summit North America 2018 will cover the latest developments in the DPDK framework and related projects such as FD.io, including plans for future releases, and will provide an opportunity to hear from DPDK users who have used the framework in their applications. Let's discuss the present and future, including DPDK roadmap suggestions, container networking, P4, hardware accelerators and any other networking innovation.
DPDK Summit North America 2018
December 3, 2018 @ 8:00 am - December 4, 2018 @ 5:00 pm PST
To access the summary, slides, and video links for a specific session, click on each of the tabs below.
SW Assisted vDPA for Live Migration
Xiao Wang, Intel
virtio is the de facto standard para-virtualization interface in cloud networking. vDPA (vhost data path acceleration) is designed to provide a HW acceleration framework for virtio that offers both pass-through-like performance and virtio flexibility. One of the main advantages of vDPA is live migration support: HW can do dirty page logging and report ring status just as SW vhost does.
To further reduce the HW requirements for supporting vDPA, we can use a SW-assisted solution that helps the device with the live-migration-related work. We add this helper into the vhost lib so that any vDPA device driver can leverage it to perform SW-assisted live migration. This SW-assisted solution provides a new option for vDPA HW design and can reduce HW design complexity.
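The core of the SW-assisted relay is dirty page logging: recording which guest pages each packet copy touched so the migration source can re-send them. As a purely illustrative sketch in Python (not the vhost lib API; `mark_dirty` and `dirty_pages` are hypothetical names), a vhost-style dirty log is just a bitmap over guest pages:

```python
PAGE_SHIFT = 12  # assume 4 KiB guest pages

def mark_dirty(bitmap: bytearray, guest_addr: int, length: int) -> None:
    """Set a bit for every guest page touched by a write of `length`
    bytes starting at `guest_addr`, as a vhost-style dirty log would."""
    first = guest_addr >> PAGE_SHIFT
    last = (guest_addr + length - 1) >> PAGE_SHIFT
    for page in range(first, last + 1):
        bitmap[page // 8] |= 1 << (page % 8)

def dirty_pages(bitmap: bytearray):
    """Return the page numbers recorded in the log."""
    return [i * 8 + b for i, byte in enumerate(bitmap)
            for b in range(8) if byte & (1 << b)]
```

A copy that straddles a page boundary marks both pages, which is exactly the per-descriptor work the SW helper performs on behalf of the device.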
Using nDPI over DPDK to Classify and Block Unwanted Network Traffic
Luca Deri, ntop
nDPI is an open source library that uses DPI (deep packet inspection) techniques to classify network traffic. It can be used in monitoring tools to characterise network traffic, or inline to enforce network traffic policies. nDPI currently supports over 250 protocols including Skype, BitTorrent, and Tor, and it is part of many open source applications and Linux distributions. This talk will cover the design of nDPI and explain how to use it on top of DPDK to efficiently monitor and block selected communication flows. Various real-world examples are demonstrated, ranging from parental control enforcement to the protection of IoT devices.
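To give a feel for how classification feeds policy in an inline deployment, here is a toy Python sketch. It is not nDPI (which inspects payloads through its C API rather than looking at ports); the port map and function names below are purely illustrative:

```python
# Toy port->protocol map standing in for nDPI's detected protocols.
# Real nDPI classifies by inspecting packet payloads, not just ports.
KNOWN_PORTS = {53: "DNS", 80: "HTTP", 443: "TLS", 6881: "BitTorrent"}

def classify(flow_key):
    """flow_key = (src_ip, dst_ip, src_port, dst_port, proto).
    Return a protocol label for the flow."""
    _, _, sport, dport, _ = flow_key
    return KNOWN_PORTS.get(dport) or KNOWN_PORTS.get(sport) or "Unknown"

def policy(label, blocked=("BitTorrent",)):
    """Map a protocol label to a forwarding decision."""
    return "DROP" if label in blocked else "FORWARD"
```

In the DPDK integration, the same two steps run per received burst: classify each flow once, cache the label, and apply the verdict to every subsequent packet of that flow.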
Reclaiming Memory – Efficient and Lock Free – rte_tqs
Honnappa Nagarahalli, Arm
At the Dublin summit, Arm introduced a lock-free rte_hash algorithm. Lock-free data structures require memory reclamation, and the Thread Quiescent State library was mentioned as a solution. In this presentation, I would like to cover the library in more detail, namely its APIs, its design, and the various use cases it enables in DPDK.
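The idea behind quiescent-state-based reclamation can be sketched in a few lines. This is an illustrative Python model, not the rte_tqs API (class and method names here are hypothetical): a writer that removed an element may free its memory only after every reader thread has reported a quiescent state at least once since the removal, i.e. after a "grace period":

```python
class QuiescentState:
    """Toy quiescent-state tracking: readers periodically report that
    they hold no references; a writer frees removed elements only once
    all readers have reported quiescence after the removal."""
    def __init__(self, num_readers):
        self.counters = [0] * num_readers   # per-reader quiescent counts

    def quiescent(self, reader_id):
        self.counters[reader_id] += 1       # reader: "not holding references"

    def snapshot(self):
        return list(self.counters)          # writer: taken at removal time

    def grace_period_over(self, snap):
        # True once every reader has reported quiescence since `snap`.
        return all(c > s for c, s in zip(self.counters, snap))
```

The writer's sequence is: unlink the element from the lock-free structure, take a snapshot, and defer the free until `grace_period_over` returns true.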
A Hierarchical SW Load Balancing Solution for Cloud Deployment
Hongjun Ni, Intel
For Cloud Native application deployments, high throughput, minimal latency and high availability are critical. Traditionally, load balancers rely on dedicated hardware, which leads to high cost and low flexibility.
This presentation will introduce a hierarchical software load balancing solution based on DPDK and VPP, which shows high performance and keeps flexibility in a large cloud deployment.
It contains the following key elements:
1) Implement a software router on DPDK, VPP and a legacy routing daemon, with ECMP enabled.
2) Implement a software load balancer enabling DSR (Direct Server Return), supporting tunnel or routing modes.
3) Implement a host-based service proxy, including host load balancing, DNAT and SNAT.
4) Integrate the SW router, load balancer and host-based service proxy to build flexible load balancing solutions.
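At the router tier, ECMP spreads flows across the load balancer instances by hashing the 5-tuple, which keeps that tier stateless. A minimal Python sketch of the idea (illustrative only; a real router hashes in hardware or via RSS):

```python
import zlib

def ecmp_pick(flow_key: tuple, next_hops: list):
    """Hash the 5-tuple and pick a next hop. The same flow always maps
    to the same next hop, so no per-flow state is needed at this tier."""
    h = zlib.crc32(repr(flow_key).encode())
    return next_hops[h % len(next_hops)]
```

Consistency per flow is the essential property: it lets the DSR and tunnel modes downstream return traffic directly to clients without the router tracking connections.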
DPDK Based L4 Load Balancer
M Jayakumar, Intel
DPVS is a DPDK-based open source high-performance Layer-4 load balancer. To highlight DPDK optimizations, the kernel-based load balancer LVS will be touched upon to make points about hashing and other algorithms. The presentation will illustrate three variants of load balancer topologies: (a) NAT, (b) IP Tunnel and (c) Direct Server Return. The session will wrap up with a discussion of a configuration nuance: how to have the load balancer receive client requests while the servers send replies directly to the client. The performance improvement is pronounced when replies carry heavy data compared to the queries. DPVS: https://github.com/iqiyi/dpvs
Accelerating Telco NFV Deployments with DPDK and Smart NIC
Kalimani Venkatesan Govindarajan, Aricent & Barak Perlman, Ethernity Network
Telco NFV deployments with white boxes and x86 compute are becoming more concrete. SD-WAN uCPEs and Telco cloud VNFs like vEPC, vBNG and vRouter have unique requirements, which need to be met by DPDK-based VNFs. For Telco VNFs specifically, hardware acceleration using Smart NICs is emerging as an economic model, but the simplicity of disaggregation requires clean interfaces for multiple 3rd-party VNFs to leverage the hardware acceleration offered by the Smart NICs. This talk shares our experiments with a DPDK-based interface to Smart NICs for multi-party VNF co-existence.
NFF-Go: Bringing DPDK to the Cloud
Areg Melik-Adamyan, Intel
NFF-Go provides a novel approach to network function development. Transmission speeds and the amount of data in networks are increasing exponentially, which makes middle-boxes less efficient due to cost, deployment, inflexibility, scalability and other issues. Network function virtualization was proposed to solve this problem by moving hardware functionality into software deployed on commodity hardware. However, this approach brought several new problems: slow development of network functions, lower performance compared to middle-boxes, and virtual machine scaling and deployment issues. Our approach presents a framework with a new high-level programming model for the rapid development of performant, scalable virtualized network functions, using DPDK as a performant I/O engine. It significantly lowers the entry bar for newcomers to the packet processing world, eases the development of custom packet processing applications by an order of magnitude, and drastically improves deployment to the cloud via an API-controlled, cloud-native scheduler. NFF-Go is already part of the DPDK umbrella project and can be found in the apps repository.
Enabling P4 in DPDK
Cristian Dumitrescu, Intel & Antonin Bas, Barefoot Networks
This presentation provides a technical overview, for companies and developers interested in describing their data plane pipelines in the P4 language, of how to generate performance-optimized DPDK code from a P4 program and the associated P4 Runtime API.
Accelerating DPDK via P4-programmable FPGA-based Smart NICs
Petr Kastovsky, Netcope Technologies
DPDK is an open source standard for developing the data plane of modern virtual network functions running on CPUs. There are various benefits to accelerating selected workloads in order to achieve better performance per watt and the low latency that is becoming critical for edge applications. FPGAs are well positioned to be the right acceleration technology. On the other hand, DPDK, being a software library, is making fast progress, introducing a wide set of new features with every release. Keeping up with such an innovation pace is not possible with a standard FPGA development workflow. Netcope provides P4 programmability for various FPGA-based smart NICs to remove that obstacle. A key component of successful adoption of P4-programmable FPGA-based smart NICs is a standardized API for users. DPDK is the development kit best positioned to address this challenge, and there are various extensions of DPDK that could be used already, namely DPDK RTE Flow and/or RTE Pipeline. During this presentation we will look into the pros and cons of these extensions from a P4 perspective.
DPDK Tunnel Offloading
Yongseok Koh & Rony Efraim, Mellanox
Contemporary data centers use overlay networks to support multi-tenancy and virtualization features such as VM migration, and to boost operational agility. Overlay networks mean tunneling protocols (VXLAN, GRE, GENEVE and more).
Handling tunneled packets at a high rate is a challenging task for a virtual switch. Standard RSS will not perform well, checksum computation needs to be validated on the inner part, and the tunnel header needs to be removed or added for each incoming or outgoing packet.
Recent work in DPDK exposed APIs to offload much of the tunnel packet overhead to the device and thus save precious CPU cycles for the application.
The talk will give an overview of the new offloads and demonstrate their use to achieve better and more scalable vswitch solutions.
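To make the per-packet encap/decap cost concrete, here is what a software datapath does for every VXLAN packet when nothing is offloaded (an illustrative Python sketch of the 8-byte VXLAN header from RFC 7348; real vswitches do this in C on mbufs):

```python
import struct

VXLAN_FLAGS = 0x08  # "VNI present" flag (RFC 7348)

def vxlan_encap(vni: int, inner_frame: bytes) -> bytes:
    """Prepend an 8-byte VXLAN header: flags(1) reserved(3) vni(3) reserved(1)."""
    hdr = struct.pack("!B3s3sB", VXLAN_FLAGS, b"\x00" * 3,
                      vni.to_bytes(3, "big"), 0)
    return hdr + inner_frame

def vxlan_decap(packet: bytes):
    """Strip the VXLAN header; return (vni, inner_frame)."""
    flags, _, vni_bytes, _ = struct.unpack("!B3s3sB", packet[:8])
    assert flags & VXLAN_FLAGS, "VNI-present flag must be set"
    return int.from_bytes(vni_bytes, "big"), packet[8:]
```

Doing this (plus outer UDP/IP handling and inner checksum validation) per packet on the CPU is precisely the overhead the new DPDK offload APIs push down to the NIC.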
DPDK on F5 BIG-IP Virtual ADCs
Brent Blood, F5 Networks
F5's application services are built on a high-performance, scalable architecture, the BIG-IP Traffic Management Microkernel (TMM), and have been used by the largest enterprises and service providers for over twenty years to ensure the availability, performance, and security of their applications. As BIG-IP has transitioned from purpose-built hardware to a virtualized appliance (VM) on COTS, how can we continue to scale cost-efficiently with the advent of 25/40/100G NICs on host servers? In this presentation, we will discuss F5's strategy of using DPDK to support multiple NIC vendors, enable high-performance workloads and services, and lessons learned from integrating a custom TMM, with its own TCP stack and memory manager, with DPDK.
Arm’s Efforts for DPDK and Optimization Plan
Gavin Hu & Honnappa Nagarahalli, Arm
In this presentation, we will talk about what Arm has done and is doing for DPDK, including feature enablement; build system, toolchain and documentation enhancements; DTS test case adaptation; and bug fixing and performance tuning (rte_ring, hash, KNI, …). We will also talk about our future optimization plans, including NEON implementations and relaxed memory ordering tuning for other components such as PMDs, examples and virtio.
DPDK Flow Classification and Traffic Profiling & Measurement
Ren Wang & Yipeng Wang, Intel Labs
In this talk, we will present new technologies to extend the current membership library to provide efficient traffic profiling and measurement capabilities, such as heavy hitter detection and cardinality estimation.
We will first provide an overview of the different classification libraries (e.g. the hash library, EFD library, and membership library) and highlight the set of usages where each library is the best fit, including the extendable bucket table design we recently added to the rte_hash library in DPDK v18.11 to support 100% guaranteed insertion of keys. Next, we provide details on the usage and design of the new extension we are adding to the membership library for traffic profiling and measurement, which is becoming increasingly important in both Telco and data center networks. We propose a memory-efficient, general-purpose "sketch"-based data structure in DPDK, targeting a wide range of traffic profiling usages. Specifically, our sketch designs provide library support to: 1) efficiently profile flow size to report heavy hitters for congestion and DoS attack detection; 2) estimate the total number of active flows (cardinality estimation) for QoS and traffic management purposes; 3) perform anomaly detection by profiling flows that suddenly undergo heavy changes; and many more potential usages. The inline profiling process is both memory- and computation-efficient with high accuracy.
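The heavy-hitter case rests on a classic sketch: a count-min sketch gives fixed-memory, per-flow counters that may only overestimate. This is a generic illustration of the technique, not the DPDK membership library's implementation:

```python
import zlib

class CountMinSketch:
    """Minimal count-min sketch: depth independent hash rows over a
    fixed-width counter array. Estimates never undercount a flow."""
    def __init__(self, width=1024, depth=4):
        self.width, self.depth = width, depth
        self.rows = [[0] * width for _ in range(depth)]

    def _cols(self, key):
        # One column per row, derived from a per-row salted hash.
        for d in range(self.depth):
            yield zlib.crc32(f"{d}:{key}".encode()) % self.width

    def add(self, key, count=1):
        for d, c in enumerate(self._cols(key)):
            self.rows[d][c] += count

    def estimate(self, key):
        # The minimum across rows bounds the collision error.
        return min(self.rows[d][c] for d, c in enumerate(self._cols(key)))
```

A heavy-hitter detector then simply reports any flow whose `estimate` exceeds a threshold; memory stays constant no matter how many flows pass through.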
Projects using DPDK
Stephen Hemminger, Microsoft
Many open source (and proprietary) networking projects are using DPDK, but not all projects use all features. This is a survey talk that discusses how these projects are integrating DPDK.
DPDK Open Lab Performance Continuous Integration
Jeremy Plsek, University of New Hampshire InterOperability Laboratory
The DPDK Open Lab is a performance-based continuous integration system supported by the DPDK project. When a patch is submitted, it is automatically sent to our CI to be applied and built. Once the patch is compiled and installed, it is run against each of the bare-metal environments hosted in the lab. This checks for performance degradations or speed-ups within DPDK on various hardware platforms. This talk will explore how this system supports the development community, for instance by accepting patches based on performance and tracking how DPDK performance has changed over time. We will go over how to navigate and use the Dashboard. We will show how DPDK performance has changed over the past six months, looking at relative numbers and graphs for various platforms. Finally, we will also talk about the future of the Open Lab, such as running more test cases, running unit tests for DPDK, additional capabilities for the dashboard, and making the systems more accessible to the development community.
Fast Prototyping DPDK Apps in Containernet
Andrew Wang, Comcast
When we first set out to develop network functions to provide new functionality needed in our infrastructure, we knew we wanted to try DPDK for fast packet processing and to build our app as a container to make packing, shipping, and deploying it easier. In our initial prototyping phase, our main focus was verifying that the applications we wrote performed as expected. Our first challenge was finding the correct setup that would allow us to successfully build a DPDK app. Then we were faced with where to run our app. Creating a virtual network out of multiple VMs on a single server soon exhausted its resources as we added more nodes. Our infrastructure team was (understandably) cautious about allowing us to run the functions in their production networks, and changing the network topology or dynamically scaling to add or remove nodes in a lab environment proved time-consuming.
Containernet is a fork of the mininet project which supports using Docker containers as hosts in emulated networks. Once we were able to configure DPDK's Environment Abstraction Layer (EAL) correctly, we could create a virtual network in seconds, easily scale to more nodes as needed, and access all the hosts in the network for debugging, all on our own laptops, which allowed us to explore the space freely and see how our apps behaved as we developed them.
In this talk I will introduce Containernet, explain how to create and set up a virtual network in it and how to configure DPDK's EAL to communicate with other hosts, describe the limitations and surprises we faced when running apps in Containernet, and conclude with a short demo showing all the pieces working together.
Implementing DPDK Based Application Container Framework with SPP
Yasufumi Ogawa, NTT
Soft Patch Panel (SPP) is a multi-process application providing an easy-to-use Service Function Chaining framework in an NFV environment. SPP enables users to connect DPDK applications running on the host and on virtual machines with several PMDs, including ring, vhost and PCAP. Zero-copy packet forwarding from VM to VM can achieve 10GbE throughput for 64-byte short packets.
We have tried to implement SPP for container networking with the latest DPDK. Implementing a multi-process application is challenging because DPDK was largely updated in v18.05 and its multi-process application support is unstable. In our presentation, we will introduce how to implement a DPDK multi-process application with container networking support.
Shaping the Future of IP Broadcasting with Cisco’s vMI and DPDK on Windows
Harini Ramakrishnan, Microsoft & Michael O’Gorman, Cisco
The video broadcasting industry is undergoing a massive transformation, moving from domain-specific Serial Digital Interface (SDI) interconnects to an IP-based network. Media software vendors are accelerating this network re-architecture, scaling to meet the bandwidth demands of next-gen media formats. Cisco's virtual media interface (vMI) is a software toolkit, open sourced as "Herrison", for media vendors undergoing this transition.
We are pleased to announce that Cisco, in partnership with Intel and Microsoft, is making this software toolkit highly optimized for media applications using DPDK on Windows, the platform of choice for media software vendors. We will talk about how vMI uses DPDK on Windows to overcome the performance limitations of kernel-mediated I/O. We will then demonstrate how vMI achieves capacity at parity with legacy SDI, scaling from 5 HD streams to 62 HD streams, representing over 100Gbps in throughput. Lastly, we will talk about how media appliances can incorporate this solution to reap the benefits of the efficient path to the NICs.
Improving Security and Flexibility within Windows DPDK Networking Stacks
Ranjit Menon, Intel Corporation & Omar Cardona, Microsoft
Windows support for DPDK was announced at the DPDK North America summit in November 2017. Since then, the code has been made available in a 'draft' repo at dpdk.org. The software stack for DPDK on Windows is similar to that on other operating systems, including the use of a Linux-style UIO driver to obtain access to the networking device. The use of a UIO driver in Windows is problematic from a multi-user/multi-process security point of view: it cannot be certified and signed independently by DPDK consumers, and its Windows certification is minimal as it does not fully utilize the capabilities of the networking device.
This presentation introduces a miniport pass-through Windows driver that exposes the device to a user-space application and can concurrently support DPDK and standard network functions in a shared and secure manner. These enhanced, Windows Logo certifiable network drivers will contain all standard functions while exposing a subset of resources for DPDK through two models: first, a bifurcated model for devices with minimal resources and, second, a multi-process/multi-user secure model for server-grade NICs.
Lastly, this presentation will also touch upon the current status of DPDK on Windows and the future roadmap.
Use DPDK to Accelerate Data Compression for Storage Applications
Fiona Trahe & Paul Luse, Intel
This presentation will showcase how the DPDK compressdev API can deliver data compression services through an accelerator-agnostic API, enabling the application to take advantage of either software or hardware acceleration engines. As Storage users also use SPDK to access DPDK services, it will report on the work in progress to integrate compressdev with SPDK. Feedback from Storage users will be welcomed to fine-tune the API to satisfy Storage use-cases.
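The per-operation flow a storage application runs on compressdev (compress, then verify by decompressing before committing to disk) can be mirrored with Python's zlib for illustration. This is not the compressdev API, which is a C burst-oriented interface; the function below is a hypothetical stand-in for one operation:

```python
import zlib

def compress_op(src: bytes, level=6):
    """One 'operation': compress the source block, verify it by
    round-tripping, and report the compression ratio, mirroring the
    compress-then-verify flow a storage stack runs per block."""
    dst = zlib.compress(src, level)
    assert zlib.decompress(dst) == src, "verification failed"
    ratio = len(dst) / len(src)
    return dst, ratio
```

With an accelerator-agnostic API, the same application logic runs whether the compress step lands on a software engine or a hardware one; only the driver underneath changes.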
Fine-grained Device Infrastructure for Network I/O Slicing in DPDK
Cunming Liang & John Mangan, Intel
Mediated devices have been introduced to allow fine-grained device partitioning in a generic manner. Kernel drivers of the parent device define the isolation boundaries and ultimately populate the mediated device instances.
Through the unified VFIO UAPI, mediated device instances can be passed through to a VM just like normal VFIO devices on a physical bus (e.g. PCIe). The recent DPDK PMD is able to access the isolated driver resources transparently on top of the emulated bus.
However, for ubiquitous use in bare-metal or container scenarios, DPDK needs to realize a mediated device bus, identify the bus layout, and consume VFIO mediated devices natively, which is not available yet. Meanwhile, we do not want to introduce a new individual PMD for mediated devices whose only difference from the existing PMD for regular devices is granularity.
This presentation describes the concept, outlines the DPDK impact and design in stages, and shows the landscape of user-space network functions in containers.
It also describes some innovative use cases, including transparent software abstraction (e.g. for NICs) and a means to securely share FPGA device resources without SR-IOV.
Embracing Externally Allocated Memory
Yongseok Koh, Mellanox
A few applications (GPU, storage apps and VPP) use externally allocated memory, and DPDK is now ready to support them. Since v18.05, rte_mbuf has supported external buffer attachment. This is useful for storage applications that read bulk data from storage and send it out to the network, since an mbuf can have indirect memory allocated outside a mempool. One remaining issue was registering externally allocated memory for DMA. Thanks to Anatoly's patchset for v18.11, externally allocated memory can now be managed within the DPDK framework once it is registered through the DPDK API. VFIO, or Mellanox's Memory Region (MR) which registers memory for DMA, will seamlessly work with such external memory. I will present the latest changes, which enable a broader range of applications for DPDK, and make further suggestions for DMA memory management.
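The essential semantics of external buffer attachment are reference counting plus an owner-supplied free callback: when the last user detaches, the owner's callback fires instead of a mempool put. A toy Python model of those semantics (the `ExtBuf` class and its methods are hypothetical illustrations, not the mbuf API):

```python
class ExtBuf:
    """Toy externally allocated buffer with a free callback, mimicking
    the semantics of attaching external memory to an mbuf: the last
    detach invokes the owner's callback rather than freeing to a pool."""
    def __init__(self, data: bytearray, free_cb):
        self.data, self.free_cb, self.refcnt = data, free_cb, 0

    def attach(self):
        self.refcnt += 1       # another mbuf now points at this memory
        return self

    def detach(self):
        self.refcnt -= 1
        if self.refcnt == 0:
            self.free_cb(self.data)   # owner reclaims (e.g. storage layer)
```

This is what lets a storage app hand a large read buffer to the network path zero-copy: the buffer is freed back to the storage layer only after every in-flight packet referencing it has been transmitted.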
Accelerating DPDK Para-Virtual I/O with DMA Copy Offload Engine
Jiayu Hu, Intel
VirtIO is a standard for para-virtual I/O between the host and VMs. In VirtIO, the host communicates with VMs by copying packets from and to VM memory. With TCP Segmentation Offloading enabled, VMs can use very large TCP packets, such as 64KB, to mitigate per-packet processing overhead. However, the overhead of copying large chunks of data in memory makes the VirtIO host interface the I/O bottleneck.
The DMA copy offload engine is a PCI-enumerated device in the Intel chipset which is extremely efficient at performing memory copy operations. With intensive benchmarks, we analyze DMA copy offload engine and CPU memory copy performance, and we propose an adaptation mechanism for different applications to fully utilize the DMA copy offload engine's capability. In this talk, we present the design of integrating the DMA copy offload engine into vhost-user and a dma-copy API framework for different usage scenarios. To our knowledge, our proposal is the first to use a DMA copy offload engine to mitigate the memory copy overhead for VirtIO. The experimental results show the DMA copy offload engine is capable of enhancing vhost-user throughput by up to 20%.
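The adaptation question reduces to a cost model: a DMA engine copies bytes cheaply but pays a fixed setup cost per job, so small copies should stay on the CPU. An illustrative sketch (all cost constants are made-up placeholders, not measured values; the real mechanism would calibrate them per platform):

```python
CPU_BYTE_COST = 1.0      # illustrative cost units per byte on the CPU
DMA_BYTE_COST = 0.25     # DMA engine moves bytes cheaper per byte...
DMA_SETUP_COST = 1500.0  # ...but each job pays a fixed setup cost

def pick_copy_path(length: int) -> str:
    """Choose the cheaper copy path for a packet of `length` bytes."""
    cpu = CPU_BYTE_COST * length
    dma = DMA_SETUP_COST + DMA_BYTE_COST * length
    return "dma" if dma < cpu else "cpu"
```

This is why the win shows up with TSO: 64KB packets amortize the setup cost easily, while small packets would lose to it.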
Revisiting 4K Page Performance Impact for DPDK Applications
Lei Yao & Jiayu Hu, Intel
DPDK reduces TLB and IOTLB misses by using 2M and 1G pages, but this requires DPDK applications to run as privileged users. Since the 17.11 release, DPDK has supported 4K pages, enabling applications to run as non-root users. However, 4K pages may hurt packet processing performance in some usage scenarios.
In this talk, we present detailed guidance for DPDK applications (e.g. Open vSwitch) using 4K pages. Our guidance reveals how 4K pages impact packet I/O performance and gives deployment suggestions to mitigate the performance degradation from 4K pages. Under this guidance, experimental results show that testpmd P2P throughput can improve by around 100%.
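The root of the 4K-page penalty is simple arithmetic: a TLB maps a fixed number of entries, so coverage scales with page size. A one-liner makes the gap concrete (the entry count below is an illustrative figure, not a specific CPU's):

```python
def tlb_coverage(entries: int, page_size: int) -> int:
    """Bytes of address space a fully populated TLB can map."""
    return entries * page_size

# With, say, 1536 TLB entries (illustrative figure):
#   4 KiB pages cover ~6 MiB of address space,
#   2 MiB pages cover ~3 GiB.
# A packet-buffer pool of a few GiB therefore thrashes the TLB
# constantly under 4K pages but fits comfortably under 2M pages.
```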
DPDK IPsec Library
Declan Doherty, Intel
This presentation will review the progress made in the community to enable a scalable, high-performance IPsec library in DPDK, which was announced at the DPDK Userspace event earlier this year, focusing on the evolving library APIs, the development roadmap and upstream plans for 2019. The presentation will then present a number of different example integrations of the library into data plane applications and look at early performance indicators.
Tungsten Fabric Performance Optimization by DPDK
Lei Yao, Intel
vRouter-dpdk is the user-space dataplane solution in the Tungsten Fabric project. Although new Intel platforms and new DPDK technology are developing rapidly, vRouter-dpdk was designed several years ago and could not benefit from them. This presentation introduces work that has been done on vRouter-dpdk, covering CPU core extension on the new SKL platform, tunnel acceleration with the rte_flow library powered by DDP technology on Intel NICs, Cuckoo hash library integration for the flow table, multi-queue support, and batched TX/RX support. With these enhancements, Tungsten Fabric becomes compatible with new hardware and software technology, and its performance is boosted as well.
DPDK Based Vswitch Upgrade
Yuanhan Liu, Tencent
Software has bugs, and new features keep being added; both require software upgrades. Unlike other software, a vswitch upgrade has a more critical requirement: the downtime has to be as small as possible, otherwise it may heavily impact all the virtual machines connected to it. This talk presents how we managed to greatly reduce the downtime: initially to less than 400ms, and with further enhancements to around 50ms.
Using New DPDK Port Representor by Switch Application like OVS
Rony Efraim, Mellanox
A new API for port representors has been introduced in DPDK for switch applications like OVS.
While running DPDK reduces the CPU overhead of interrupt-driven packet processing, CPU cores are still not completely freed from polling packet queues. We have already implemented acceleration through HW offloads that save the CPU cycles consumed by flow lookups.
To solve this challenge, the DPDK switch application is further accelerated through internal HW switch offloads of virtual ports like SR-IOV. Port representors for switches have already been introduced in DPDK, and we present how OVS-DPDK will use them.
We introduce a classification and forwarding methodology that enables a fully offloaded datapath in the NIC hardware.
We present the open source work being done in the DPDK and OVS communities and significant performance gains achieved. We also present how this work can be extended to VXLAN and other tunneling traffic.
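The classify-then-offload pattern behind this work can be sketched abstractly: the first packet of a flow misses the hardware table and takes the software slow path, which installs a rule so that subsequent packets are handled in hardware. The Python below is an illustrative model only (the class and its `hw_table` dict stand in for NIC flow rules programmed via rte_flow):

```python
class OffloadSwitch:
    """Toy datapath: a flow's first packet misses and goes through the
    software classifier, which installs a rule 'in hardware'; later
    packets of the same flow hit the offloaded table directly."""
    def __init__(self, classify):
        self.classify = classify      # software slow path (e.g. OVS)
        self.hw_table = {}            # stands in for NIC flow rules

    def rx(self, flow_key):
        action = self.hw_table.get(flow_key)
        if action is None:            # miss: slow path, then offload
            action = self.classify(flow_key)
            self.hw_table[flow_key] = action
        return action
```

Port representors are what make the install step possible from the host: each virtual port (e.g. an SR-IOV VF) appears to OVS-DPDK as a representor port on which flow rules and miss traffic are handled.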