To access the summary, slides, and video links for a specific session, click on each of the tabs below.
Tech Board Presentation & Panel Discussion
Zhang Fan (Intel)
The first Packet Framework library and application generator was born in 2014. By combining simple configuration items with CLI commands, various network functions such as firewalls, flow metering, and edge routing can be built easily and with impressive performance. Packet Framework became the second most used component within DPDK. However, its evolution did not stop there. In this presentation, we introduce our brand-new Packet Framework 2.0. The new framework takes a more flexible and scalable approach: it decouples tables and action profiles from the pipeline instance so that arbitrary actions can be mapped to pipeline ports and tables at will, and it allows configuration from external controllers such as OpenBras. The new Packet Framework is expected to maintain the performance benefits and to be usable for building a wider range of applications.
Multiple vDPI Functions Using DPDK and Hyperscan on OVS-DPDK Platform
Cheng-Chien Su, Lionic Corp.
We implement IPS, application identification, web content filtering, and antivirus functions using DPDK and Hyperscan. These vDPI functions are integrated into an OVS-DPDK-based vCPE platform. The carrier can control the functions according to consumer requirements via the OpenFlow protocol. When the DPI function identifies a session, it reports it and adds a flow entry to OVS-DPDK so that packet inspection is skipped for that session. This reduces unnecessary packet inspection and increases network performance.
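The session-bypass idea in this abstract can be illustrated with a minimal sketch. This is a toy model, not the Lionic or OVS-DPDK implementation: the `BypassSwitch` class, the 5-tuple `FlowKey`, and the `classify_http` stand-in for the DPI engine are all hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical 5-tuple flow key: (src_ip, dst_ip, src_port, dst_port, protocol)
FlowKey = Tuple[str, str, int, int, str]


class BypassSwitch:
    """Toy switch: a flow table sits in front of a (stand-in) DPI engine."""

    def __init__(self, classify: Callable[[bytes], Optional[str]]) -> None:
        self.classify = classify                 # returns a verdict, or None if undecided
        self.flow_table: Dict[FlowKey, str] = {}
        self.inspected = 0                       # packets that actually reached DPI

    def handle(self, key: FlowKey, payload: bytes) -> str:
        # Fast path: the session was already identified, so skip inspection.
        verdict = self.flow_table.get(key)
        if verdict is not None:
            return verdict
        # Slow path: hand the packet to the DPI engine.
        self.inspected += 1
        verdict = self.classify(payload)
        if verdict is None:
            return "unknown"                     # keep inspecting later packets
        # DPI identified the session: install a flow entry to bypass DPI.
        self.flow_table[key] = verdict
        return verdict


def classify_http(payload: bytes) -> Optional[str]:
    """Stand-in for a real DPI engine (assumption: trivial prefix match)."""
    return "http" if payload.startswith(b"GET ") else None
```

In this sketch, only the first packet of an identified session is counted in `inspected`; later packets of the same flow hit the flow table and return immediately.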
Hardware-Level Performance Analysis of Platform I/O
Roman Sudarikov, Intel
Performance analysis and software optimization have become increasingly challenging as overall system complexity grows. Rapid technological advances at every layer of execution make application performance tuning a very complicated task. A common technique for solving performance-related issues relies on on-chip Performance Monitoring Units (PMUs). Traditionally, performance analysis has focused mostly on the performance counters in CPU cores. However, when configuring a platform with I/O devices or selecting a platform for I/O usage models, it is equally important to have performance data from the Uncore (the rest of the processor besides the cores), I/O, and socket interconnect counters. Together, there are more than one thousand performance monitoring events that can help in understanding microarchitectural activity while an application runs. In this session, we introduce Uncore-based performance analysis of I/O-intensive applications as a complement to the traditional CPU core-centric approach. The presentation covers the platform components that are critical for I/O flows and their performance monitoring capabilities. We discuss Intel® Data Direct I/O Technology and why it is critical for applications dealing with concurrent I/O traffic. Finally, we describe the latest changes in the cache hierarchy and how they affect I/O transaction flows, and conclude with an overview of Intel tools that provide such platform-wide observability. To accompany the presentation, we will demonstrate various tools using the provider edge router sample application (ip_pipeline) from DPDK to illustrate I/O bandwidth, MMIO read/write accesses, DDIO hit/miss statistics, memory bandwidth, and much more.
Link-Level Network Slicing with DPDK
Jie Zheng, Network Virtualization Engineer, VMware
NFV intrinsically demands that the virtual network deliver higher bandwidth and lower latency, even on top of COTS hardware, improving network efficiency while maintaining high availability and scalability. To this end, we implement layer 2 network virtualization with dedicated network nodes alongside the infrastructure network. To coordinate them in a link-level fabric view, we introduce a controller cluster that employs smart layer 3 techniques. Working together, these components make it possible to slice quantified network resources. This session focuses on how DPDK fuels the data path.
What’s New in Virtio 1.1
Jason Wang, Red Hat
As a de facto standard for virtual I/O devices, virtio has become increasingly popular in both software and hardware implementations. This talk discusses several improvements in the upcoming 1.1 version that aim at better performance. It begins with a brief introduction to virtio and its history, then presents three major features. The first is the new packed ring layout, which aims to mitigate cache stress and reduce the number of PCI transactions for hardware backends. The second is the in-order feature, which allows a device to reduce the number of writes when adding used buffers. The third is the notification data feature, which is useful for hardware implementations when fetching descriptors and for debugging. Finally, performance numbers, community status, and future work will be discussed. The target audience is anyone interested in networking, NFV, DPDK, and virtualization.
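As a rough illustration of the packed ring layout mentioned above, the sketch below models how a single descriptor's flags encode its state using the AVAIL and USED bits together with a wrap counter. The flag bit positions follow the virtio 1.1 specification as we understand it; the helper function names are hypothetical, and the sketch deliberately ignores memory barriers, descriptor chaining, and event suppression.

```python
# Flag bits of a packed-ring descriptor (bit positions per virtio 1.1).
VIRTQ_DESC_F_AVAIL = 1 << 7
VIRTQ_DESC_F_USED = 1 << 15
_STATE_MASK = VIRTQ_DESC_F_AVAIL | VIRTQ_DESC_F_USED


def driver_make_available(flags: int, driver_wrap: bool) -> int:
    """Driver side: AVAIL = wrap counter, USED = inverse of wrap counter."""
    flags &= ~_STATE_MASK
    return flags | (VIRTQ_DESC_F_AVAIL if driver_wrap else VIRTQ_DESC_F_USED)


def device_mark_used(flags: int, device_wrap: bool) -> int:
    """Device side: both AVAIL and USED are set equal to the wrap counter."""
    flags &= ~_STATE_MASK
    return flags | (_STATE_MASK if device_wrap else 0)


def is_available(flags: int, device_wrap: bool) -> bool:
    """Device check: may this descriptor be consumed in the current lap?"""
    avail = bool(flags & VIRTQ_DESC_F_AVAIL)
    used = bool(flags & VIRTQ_DESC_F_USED)
    return avail == device_wrap and used != device_wrap


def is_used(flags: int, driver_wrap: bool) -> bool:
    """Driver check: has the device returned this descriptor in this lap?"""
    avail = bool(flags & VIRTQ_DESC_F_AVAIL)
    used = bool(flags & VIRTQ_DESC_F_USED)
    return avail == used == driver_wrap
```

Because availability and completion are both encoded in the descriptor itself, driver and device touch one ring instead of separate available and used rings, which is the property the abstract ties to lower cache stress and fewer PCI transactions.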
DPDK Support for Vhost Acceleration
Xiao Wang, Intel
Vhost Data Path Acceleration (vDPA) enables offload of the Vhost vring data path to HW devices in a para-virtualized way without direct pass-through to the guest. In addition to the SW Vhost lib, vDPA allows device-specific configuration and management. As a result, it achieves SR-IOV-like performance with cloud-friendly compatibility and supports live migration, which makes it possible to transparently upgrade a stock VM with virtio to a new HW-accelerated platform. This session introduces how to leverage the DPDK vDPA lib to support different kinds of accelerators and gives an update on the latest upstream status.
Zero-Copy Improvement and Best Practice
Liu Yong, Intel
The vhost dequeue zero-copy feature has been part of DPDK since 17.02 and should, in theory, significantly improve VM2VM and VM2NIC performance for large packets. However, there are still some stumbling blocks in using this feature: for example, it does not work with certain QEMU versions, and it can even degrade performance in OVS deployments. This session digs into the details of those obstacles and the best practices for removing them. With these measures, vhost dequeue zero copy can achieve its expected performance in deployment environments such as OVS.
Practices to Achieve Ultimate Performance in Cloud Networking
曹水 (Senior Researcher, Huawei)
In today’s cloud networking, new network applications emerge endlessly. As the foundation of cloud networking, the vSwitch has evolved quickly to meet ever-growing performance requirements, from kernel-based to userspace and from software to NIC offload. In this talk, we share what we have learned about achieving ultimate performance within the vSwitch. These practices have already been applied in Huawei’s new-generation network infrastructure in Huawei Cloud.
Accelerate Virtual Switch with Intelligent Adapter
Zhihui Chen, Mellanox
A virtual switch (vSwitch) is widely deployed in cloud and NFV environments to transparently switch traffic between virtual machines (VMs) and with the outside world. It is normally deployed as software on a server and suffers from poor performance and high CPU overhead. Emerging intelligent adapters provide flow-based switching among virtual NICs (vNICs) through a programmable embedded switch (eSwitch). With an intelligent adapter, a software vSwitch can offload a large portion of its packet processing to hardware, especially compute-intensive operations such as VXLAN encapsulation/decapsulation, packet classification based on the header fields defined by OpenFlow, packet header modification, QoS, and access control (ACL). There are two ways to optimize a vSwitch over intelligent adapters: Flex and Direct. In Flex mode, the data path remains in software while some key packet processing operations are offloaded to hardware, saving CPU and improving the efficiency of software packet classification. This mode stays compatible with the current vSwitch design and its interface to the VM. In Direct mode, the data path is offloaded to hardware, and the eSwitch is configured to switch traffic among vNICs and to handle all packet processing operations. The software vSwitch is used only for the control path: it offloads flow rules to the eSwitch and processes traffic that cannot be offloaded. In this mode, traffic bypasses the hypervisor and is delivered directly to the VM through an SR-IOV interface. This fully frees the CPU from network processing and provides the best performance. Our test of Open vSwitch (OVS) in this mode on Mellanox ConnectX-5 shows 66 Mpps at zero CPU utilization.
DPDK Multiple Sized Packet Buffer Pool
Gavin Hu, ARM
Currently, DPDK uses single-sized 2KB buffers to hold incoming packets regardless of their size. This wastes a large amount of memory for small packets, while chaining buffers for jumbo frames costs extra DMA transactions and CPU cycles. In this talk, we discuss how to improve the situation by using buffers of multiple sizes.
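The trade-off described in this abstract can be made concrete with a little arithmetic. The sketch below is an illustration only, not the design presented in the talk: the pool sizes in `POOL_SIZES` are hypothetical, all function names are made up for this example, and per-buffer metadata and headroom are ignored.

```python
import math
from typing import Optional, Sequence

SINGLE_BUF = 2048                      # the single 2KB buffer size from the abstract
POOL_SIZES = (256, 2048, 9216)         # hypothetical multi-size pools (assumption)


def pick_buffer_size(pkt_len: int, sizes: Sequence[int] = POOL_SIZES) -> Optional[int]:
    """Return the smallest pool size that holds the packet in one buffer."""
    for size in sorted(sizes):
        if pkt_len <= size:
            return size
    return None                        # no pool is large enough; chaining is needed


def segments_needed(pkt_len: int, buf_size: int = SINGLE_BUF) -> int:
    """Number of chained buffers required when every buffer is buf_size bytes."""
    return max(1, math.ceil(pkt_len / buf_size))


def wasted_bytes(pkt_len: int, buf_size: int) -> int:
    """Bytes allocated but unused when storing pkt_len in buf_size buffers."""
    return segments_needed(pkt_len, buf_size) * buf_size - pkt_len
```

Under these assumptions, a 64-byte packet leaves 1984 bytes unused in a 2KB buffer but only 192 bytes in a 256-byte one, and a 9000-byte jumbo frame needs five chained 2KB buffers but fits in a single 9216-byte buffer.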
FPGA Acceleration and Virtualization Technology in DPDK
Rosen Xu, Intel; Tianfei Zhang, Intel
Many e-commerce companies in China use cloud computing infrastructure to accelerate their business. The cloud aims to cut costs and helps users focus on their core business instead of being impeded by IT obstacles. SDN and NFV are increasingly deployed in internet companies. But how can a software network scale to an era of 40/50+ Gigabit networks and deliver great performance for network applications in cloud computing, such as during the Alibaba Double 11 shopping spree? In this presentation, Tianfei and Rosen introduce a new FPGA software framework in DPDK that uses Intel Xeon+A10 FPGA to accelerate Linux workloads with SR-IOV and virtualization technology. We introduce OPAE (Open Programmable Acceleration Engine), the open source software framework for FPGA devices, and its integration with DPDK for network function acceleration. With OPAE userspace drivers and APIs, we were able to create an open and consistent API for DPDK to integrate FPGA-accelerated network functions without dealing with hardware differences among FPGA devices. This significantly simplifies DPDK’s integration with FPGA accelerator devices. We have developed SmartNIC software that uses OPAE and virtualization technology to accelerate the business of some e-commerce companies in China. Finally, we discuss the status of integrating this FPGA software framework with the DPDK community.
DPDK to support InfiniBand Link Layer
Honnappa Nagarahalli, ARM
DPDK supports the run-to-completion and pipeline models of packet processing. The pipeline model uses queues (rte_ring functions) to exchange packets between the cores running different stages of the pipeline. Many networking SoCs provide acceleration capability for the queues. Since there are no queue APIs for inter-core communication, networking SoCs are forced to use the software-based rte_ring functions to support the pipeline model. Creating queue APIs also allows different types of queues (for example, non-blocking queues) to be introduced without having to create separate rte_ring functions for every type. This talk presents possible queue APIs and their advantages.
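To illustrate why a queue abstraction helps, the sketch below separates a generic queue interface from one software implementation, so that a pipeline stage depends only on the interface. This is not the API proposed in the talk and not DPDK code: `PacketQueue`, `SoftwareRing`, and `run_stage` are hypothetical names, and the burst-style enqueue/dequeue semantics are an assumption made for the example.

```python
from abc import ABC, abstractmethod
from collections import deque
from typing import Deque, List


class PacketQueue(ABC):
    """Hypothetical queue interface a pipeline stage would program against."""

    @abstractmethod
    def enqueue_burst(self, pkts: List[bytes]) -> int:
        """Enqueue as many packets as fit; return how many were accepted."""

    @abstractmethod
    def dequeue_burst(self, max_pkts: int) -> List[bytes]:
        """Dequeue up to max_pkts packets in FIFO order."""


class SoftwareRing(PacketQueue):
    """Software-only implementation: a bounded FIFO (single-threaded sketch)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._ring: Deque[bytes] = deque()

    def enqueue_burst(self, pkts: List[bytes]) -> int:
        free = self.capacity - len(self._ring)
        accepted = pkts[:free]
        self._ring.extend(accepted)
        return len(accepted)

    def dequeue_burst(self, max_pkts: int) -> List[bytes]:
        count = min(max_pkts, len(self._ring))
        return [self._ring.popleft() for _ in range(count)]


def run_stage(rx: PacketQueue, tx: PacketQueue, burst: int = 32) -> int:
    """One pipeline stage iteration; packets tx cannot accept are dropped."""
    pkts = rx.dequeue_burst(burst)
    processed = [p.upper() for p in pkts]   # stand-in for real packet processing
    return tx.enqueue_burst(processed)
```

Since `run_stage` only sees `PacketQueue`, a hardware-accelerated or non-blocking queue could be substituted by adding another subclass, without changing the stage code.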
DPDK based Load Balancer to Support Alibaba Dual 11 Festival
Liang Jun, Alibaba Cloud
A network load balancer is a service that improves the distribution of network workloads across multiple computing resources. By distributing traffic, it extends an application’s service capacity while eliminating single points of failure to improve system availability. The load balancer has therefore been widely deployed and has become an important component of many Alibaba services. The new generation of Alibaba’s load balancer is based on DPDK. Its high performance and high availability support the rapid growth of Alibaba’s business, and it successfully withstood the huge traffic burst of Alibaba’s 2017 Dual 11 festival.
This presentation introduces Alibaba’s new generation of load balancers from three aspects. First, it introduces the architecture of the high-performance load balancer based on DPDK. Then, it discusses the horizontally scalable, redundant physical network architecture, which raises the performance of the load balancer to a new level. Finally, it introduces the load balancer’s concurrent session synchronization mechanism, which keeps the load balancers in service during disaster recovery and upgrades, transparently to tenants.
DPDK Accelerated Load Balancer
Lei Chen, iQiYi.com
This talk introduces DPVS (DPDK+LVS), an open source L4 load balancer (LB) based on DPDK. Topics include:
* Why LVS in the kernel is not fast enough.
* How to accelerate the LB with DPDK and other techniques.
* DPVS architecture and design details.
* DPVS performance compared with LVS.
* Key issues we addressed during development.
* Use cases and deployment examples.
* DPVS roadmap.