
Now Available! DPDK Release 22.03

A new release is available: https://fast.dpdk.org/rel/dpdk-22.03.tar.xz

Winter release numbers are quite small, as usual:

  •  956 commits from 181 authors
  •   2289 files changed, 83849 insertions(+), 97801 deletions(-)

There are no plans for a maintenance branch for 22.03.

This version is ABI-compatible with 21.11.

Below are some new features:

  • Fast restart by reusing hugepages
  • UDP/TCP checksum on multi-segments
  • IP reassembly offload
  • Queue-based priority flow control
  • Flow API for templates and async operations
  • Private ethdev driver info dump
  • Private user data in asymmetric crypto session

More details are available in the official release notes: https://doc.dpdk.org/guides/rel_notes/release_22_03.html

There are 51 new contributors (including authors, reviewers and testers)!

Welcome to Abhimanyu Saini, Adham Masarwah, Asaf Ravid, Bin Zheng, Brian Dooley, Brick Yang, Bruce Merry, Christophe Fontaine, Chuanshe Zhang, Dawid Gorecki, Daxue Gao, Geoffrey Le Gourriérec, Gerry Gribbon, Harold Huang, Harshad Narayane, Igor Chauskin, Jakub Poczatek, Jeff Daly, Jie Hai, Josh Soref, Kamalakannan R, Karl Bonde Torp, Kevin Liu, Kumara Parameshwaran, Madhuker Mythri, Markus Theil, Martijn Bakker, Maxime Gouin, Megha Ajmera, Michael Barker, Michal Wilczynski, Nan Zhou, Nobuhiro Miki, Padraig Connolly, Peng Yu, Peng Zhang, Qiao Liu, Rahul Bhansali, Stephen Douthit, Tianli Lai, Tudor Brindus, Usama Arif, Wang Yunjian, Weiguo Li, Wenxiang Qian, Wenxuan Wu, Yajun Wu, Yiding Zhou, Yingya Han, Yu Wenjun and Yuan Wang.

Below is the percentage of commits per employer (with authors count):

A big thank you to all the courageous people who took on the unrewarding task of reviewing others’ work.

Based on Reviewed-by and Acked-by tags, the top non-PMD reviewers are:
         41     Akhil Goyal 
         29     Bruce Richardson 
         26     Ferruh Yigit 
         20     Ori Kam 
         19     David Marchand 
         16     Tyler Retzlaff 
         15     Viacheslav Ovsiienko 
         15     Morten Brørup 
         15     Chenbo Xia 
         14     Stephen Hemminger 
         14     Jerin Jacob 
         12     Dmitry Kozlyuk 
         11     Ruifeng Wang 
         11     Maxime Coquelin 

The next version of DPDK, 22.07, will be released in July.

The new features for 22.07 can be submitted during the next 3 weeks: http://core.dpdk.org/roadmap#dates.

Please share your roadmap.

Thanks everyone!

DPDK 21.11 is Now Available!

By David Marchand

A new major release of DPDK, DPDK 21.11, is now available: https://fast.dpdk.org/rel/dpdk-21.11.tar.xz

This is a big DPDK release, with:

  •    1875 commits from 204 authors
  •     2413 files changed, 259559 insertions(+), 87876 deletions(-)

The branch 21.11 should be supported for at least two years, making it recommended for system integration and deployment. The new major ABI version is 22. The next releases, 22.03 and 22.07, will be ABI-compatible with 21.11.

As you probably noticed, the year 2022 will see only two intermediate releases before the next 22.11 LTS.

Below are some new features, grouped by category:

* General

  •     hugetlbfs subdirectories
  •     AddressSanitizer (ASan) integration for debug
  •     mempool flag for non-IO usages
  •     device class for DMA accelerators, with drivers for HiSilicon, Intel DSA, Intel IOAT, Marvell CNXK and NXP DPAA
  •     device class for GPU devices and driver for NVIDIA CUDA
  •     Toeplitz hash using Galois Fields New Instructions (GFNI)

* Networking

  •     MTU handling rework
  •     get all MAC addresses of a port
  •     RSS based on L3/L4 checksum fields
  •     flow match on L2TPv2 and PPP
  •     flow flex parser for custom header
  •     control delivery of HW Rx metadata
  •     transfer flows API rework
  •     shared Rx queue
  •     Windows support of Intel e1000, ixgbe and iavf
  •     driver for NXP ENETFEC
  •     vDPA driver for Xilinx devices
  •     virtio RSS
  •     vhost power monitor wakeup
  •     testpmd multi-process
  •     pcapng library and dumpcap tool

* API/ABI

  •     API namespace improvements and cleanups
  •     API internals hidden
  •     flags check for future ABI compatibility

More details are in the release notes: http://doc.dpdk.org/guides/rel_notes/release_21_11.html

There are 55 new contributors (including authors, reviewers and testers)! Welcome to:  Abhijit Sinha, Ady Agbarih, Alexander Bechikov, Alice Michael, Artur Tyminski, Ben Magistro, Ben Pfaff, Charles Brett, Chengfeng Ye, Christopher Pau, Daniel Martin Buckley, Danny Patel, Dariusz Sosnowski, David George, Elena Agostini, Ganapati Kundapura, Georg Sauthoff, Hanumanth Reddy Pothula, Harneet Singh, Huichao Cai, Idan Hackmon, Ilyes Ben Hamouda, Jilei Chen, Jonathan Erb, Kumara Parameshwaran, Lewei Yang, Liang Longfeng, Longfeng Liang, Maciej Fijalkowski, Maciej Paczkowski, Maciej Szwed, Marcin Domagala, Miao Li, Michal Berger, Michal Michalik, Mihai Pogonaru, Mohamad Noor Alim Hussin, Nikhil Vasoya, Pawel Malinowski, Pei Zhang, Pravin Pathak, Przemyslaw Zegan, Qiming Chen, Rashmi Shetty, Richard Eklycke, Sean Zhang, Siddaraju DH, Steve Rempe, Sylwester Dziedziuch, Volodymyr Fialko, Wojciech Drewek, Wojciech Liguzinski, Xingguang He, Yu Wenjun, Yvonne Yang.

Below is the percentage of commits per employer:

A big thank you to all the courageous people who took on the unrewarding task of reviewing other people’s work.

Based on Reviewed-by and Acked-by tags, the top non-PMD reviewers are:
    113    Akhil Goyal 
     83    Ferruh Yigit 
     70    Andrew Rybchenko 
     51    Ray Kinsella 
     50    Konstantin Ananyev
     47    Bruce Richardson 
     46    Conor Walsh 
     45    David Marchand 
     39    Ruifeng Wang 
     37    Jerin Jacob 
     36    Olivier Matz 
     36    Fan Zhang 
     32    Chenbo Xia 
     32    Ajit Khaparde 
     25    Ori Kam 
     23    Kevin Laatz 
     22    Ciara Power 
     20    Thomas Monjalon 
     19    Xiaoyun Li 
     18    Maxime Coquelin 

The new features for 22.03 may be submitted during the next 4 weeks, so that we can all enjoy a good break at the end of this year. 2022 will see a change of pace in release timing; let’s make the best of it by doing good reviews.

DPDK 22.03 is scheduled for early March: http://core.dpdk.org/roadmap#dates

Please share your roadmap.

Thanks everyone!

DPDK 21.08 is Here!

By Thomas Monjalon, DPDK Tech Board

The DPDK community has issued its latest quarterly release, 21.08, which is available here: https://fast.dpdk.org/rel/dpdk-21.08.tar.xz

Stats for this (smaller) release:

  •   922 commits from 159 authors
  •   1069 files changed, 150746 insertions(+), 85146 deletions(-)

There are no plans to start a maintenance branch for 21.08. This version is ABI-compatible with 20.11, 21.02 and 21.05.

New features for 21.08 include:

  • Linux auxiliary bus
  • Aarch32 cross-compilation
  •  Arm CPPC power management
  •  Rx multi-queue monitoring for power management
  •  XZ compressed firmware read
  •  Marvell CNXK drivers for ethernet, crypto and baseband PHY
  •  Wangxun ngbe ethernet driver
  •  NVIDIA mlx5 crypto driver supporting AES-XTS
  •  ISAL compress support on Arm

More technical details about 21.08 are included in the release notes:

https://doc.dpdk.org/guides/rel_notes/release_21_08.html

There are 30 new contributors (including authors, reviewers and testers): 

Welcome to Aakash Sasidharan, Aman Deep Singh, Cheng Liu, Chenglian Sun, Conor Fogarty, Douglas Flint, Gaoxiang Liu, Ghalem Boudour, Gordon Noonan, Heng Wang, Henry Nadeau, James Grant, Jeffrey Huang, Jochen Behrens, John Levon, Lior Margalit, Martin Havlik, Naga Harish K S V, Nathan Skrzypczak, Owen Hilyard, Paulis Gributs, Raja Zidane, Rebecca Troy, Rob Scheepens, Rongwei Liu, Shai Brandes, Srujana Challa, Tudor Cornea, Vanshika Shukla, and Yixue Wang.

Below is the percentage of commits per employer company:

Based on Reviewed-by and Acked-by tags, the top non-PMD reviewers are:

         45     Akhil Goyal
         34     Jerin Jacob
         21     Ruifeng Wang
         20     Ajit Khaparde
         19     Matan Azrad
         19     Andrew Rybchenko
         17     Konstantin Ananyev
         15     Chenbo Xia
         14     Maxime Coquelin
         14     David Marchand
         13     Viacheslav Ovsiienko
         11     Thomas Monjalon
          9     Dmitry Kozlyuk
          8     Stephen Hemminger
          8     Bruce Richardson

The next DPDK release, 21.11, will be robust.  

New features for 21.11 can be submitted during the next month, at this link: http://core.dpdk.org/roadmap#dates. Please be sure to share your features roadmap.

Thanks to all involved! Enjoy the rest of the summer!

DPDK Testing Approaches

The DPDK (https://www.dpdk.org) project presents a unique set of challenges for fully testing the code, with the largest coverage possible (i.e. the goal of continuous integration and gating). The DPDK testing efforts are overseen by the Community CI group, which meets every other week to review status and discuss open issues with the testing. Before we dive into the details of the testing: the CI group has developed some terminology to describe the different types of testing used within the DPDK CI efforts. First, and probably most obvious, is the simple “compile” testing, or the “does it build” type of testing. This is typically referred to as “Compile Testing” by the CI group. Next up is the “Unit Testing,” which refers to running the unit tests provided directly within the DPDK source code tree and maintained by the DPDK developer community. From there, we have two more categories of testing: “Functional Testing” and “Performance Testing.” In these cases, the CI community uses the terms to refer to cases where the DPDK stack is operational on the test system (i.e. PMD drivers are bound). Next, we’ll delve a little further into each of the testing types, some of the system requirements, etc.

Compile testing also has the advantage of remaining relatively “uncoupled” from the testing infrastructure. For example, it doesn’t generally depend on specific hardware or PMDs. There are some small exceptions, around architecture, where the compile operation can be a “native” or a “cross compile” target. In the current CI testing, compile jobs are run natively on x86_64 and aarch64 architectures in the community lab, along with an additional cross-compile case (x86_64 to aarch64). Another dimension of the compile testing involves the actual OS, libraries, and compiler used for the “test.” In the community lab, thus far, most of the compile jobs run using GCC versions across various operating systems. The lab aims to maintain coverage of all operating systems officially supported by the current DPDK project releases, along with some additional common OSes, accomplished by running the compile jobs in containers. This greatly simplifies the OS update and maintenance processes, since we can always just update our container images to the latest base image versions, etc. Lastly, for a few of the OSes, the compile jobs also run with the Clang compiler.

Where the Compile Testing leaves off, the Unit Testing picks up. These tests start to run core parts of the compiled DPDK code, exercising operations involving memory allocations and event triggers. Since the unit testing runs parts of the compiled DPDK code, it does place some requirements on the underlying infrastructure. For example, the unit testing expects the system to be prepared with some hugepages and memory available for its consumption. In the Community Lab (https://www.iol.unh.edu/hosted-resources/dpdk-performance-lab), the unit testing still runs within the container-based infrastructure. The containers run with elevated privileges to support the hugepages access; however, the automation system limits execution to only one unit-testing container at a time (other compile testing might still be running in parallel). This ensures the unit testing has full access to the required memory and that there is only a single “instance” of DPDK running on the system.

With Functional Testing and Performance Testing, the tool chain switches to DTS (DPDK Test Suite), a Python-based tool chain developed within the community specifically to test DPDK running on bare-metal systems. The community splits the testing into cases that focus on functionality and cases that focus on performance. Functional testing verifies items such as configuring a packet filter or manipulating frames, and verifies that the stack implements the function. Performance testing focuses on verifying the stack against a specific benchmark, such as the maximum packet forwarding rate. Both types of testing run directly on bare-metal hardware, across combinations of system architecture and the actual network interface or PMD. This ensures DPDK code is tested with different interface vendor hardware combinations. The hardware vendors supporting the DPDK testing efforts keep the interface devices and firmware updated to their latest versions, so as the DPDK code evolves, so does the test environment and its hardware / firmware.

Hopefully the above descriptions have helped to explain the testing categories and approaches being taken by the DPDK CI community to test the DPDK project code as completely as possible. In future blogs we’ll get into more detail about how and when the different types of testing are run, i.e. triggered by / for new patches or run periodically on specific code branches. The CI community meets bi-weekly, on Thursdays at 1pm UTC, and all DPDK community members are welcome to join and ask questions or help contribute to the testing efforts. The community mailing list, ci@dpdk.org, is also a good resource.

Reader-Writer Concurrency

By Honnappa Nagarahalli

As an increasing number of cores are packed together in a SoC, thread synchronization plays a key role in the scalability of an application. In the context of networking applications, because of the partitioning of the application into control plane (write-mostly threads) and data plane (read-mostly threads), reader-writer concurrency is a frequently encountered thread synchronization use case. Without an effective solution for reader-writer concurrency, the application will end up with:

  • race conditions, which are hard to solve
  • unnecessary/excessive validations in the reader, resulting in degraded performance
  • excessive use of memory barriers, affecting performance
  • and, finally, code that is hard to understand and maintain

In this blog, I will 

  • briefly introduce the reader-writer concurrency problem
  • talk about solving reader-writer concurrency using full memory barriers and the C11 memory model
  • and, talk about complex scenarios that exist

Problem Statement 

Consider two or more threads or processes sharing memory. Writer threads/processes (writers) update a data structure in shared memory in order to convey information to readers. Reader threads/processes (readers) read the same data structure to carry out some action. For example, in the context of a networking application, the writer could be a control plane thread writing to a hash table, and the reader could be a data plane thread performing lookups in the hash table to identify an action to take on a packet.

Essentially, the writer wants to communicate some data to the reader. It must be done such that the reader observes the data consistently and atomically, instead of observing an incomplete or intermediate state of the data.

This problem can easily be solved using a reader-writer lock. But locks have scalability problems, and a lock-free solution is required so that performance scales linearly with the number of readers.

Solution 

In order to communicate the ‘data’ atomically, a ‘guard variable’ can be used, as follows.

The writer sets the guard variable atomically after it writes the data. The write barrier between the two ensures that the store to the data completes before the store to the guard variable.

The reader reads the guard variable atomically and checks if it is set. If it is set, indicating that the writer has completed writing the data, the reader can read the data. The read barrier between the two ensures that the load of the data starts only after the guard variable has been loaded, i.e. the reader is prevented from reading the data (speculatively or due to reordering) till the guard variable is set. There is no need for any additional validations in the reader.

As shown, the writer and the reader synchronize with each other using the guard variable. The use of the guard variable ensures that the data is available to the reader atomically, irrespective of the size of the data. There is no need for any additional barriers in the writer or the reader.
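Here is a minimal sketch of this pattern in C11, assuming a 64-bit payload and an integer guard; the names are illustrative, not taken from DPDK:

```c
#include <stdatomic.h>
#include <stdint.h>

static uint64_t data;      /* the 'data' being communicated */
static atomic_int guard;   /* the 'guard variable', initially 0 */

/* Writer: write the data, then set the guard behind a full write barrier. */
static void
writer(uint64_t value)
{
	data = value;
	atomic_thread_fence(memory_order_seq_cst);   /* full write barrier */
	atomic_store_explicit(&guard, 1, memory_order_relaxed);
}

/* Reader: read the data only after observing the guard set. */
static int
reader(uint64_t *out)
{
	if (atomic_load_explicit(&guard, memory_order_relaxed) == 0)
		return 0;                            /* writer not done yet */
	atomic_thread_fence(memory_order_seq_cst);   /* full read barrier */
	*out = data;
	return 1;
}
```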

Some CPU architectures enforce the program order for memory operations in the writer and the reader without the need for explicit barriers. However, since the compiler can introduce re-ordering, compiler barriers are still required on such architectures.

So, when working on a reader-writer concurrency use case, the first step is to identify the ‘data’ and the ‘guard variable’.

Solving reader-writer concurrency using C11 memory model 

In the above algorithm, the read and write barriers can be implemented as full barriers. However, full barriers are not necessary. The ordering needs to be enforced only between the memory operations on the data (and other operations dependent on the data) and the guard variable. Other independent memory operations need not be ordered with respect to memory operations on the data or the guard variable. This gives CPU micro-architectures more flexibility in re-ordering the operations while executing the code. The C11 memory model allows for expressing such behaviors. In particular, the C11 memory model allows for replacing the full barriers with one-way barriers.
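The same sketch as above, with the full barriers replaced by one-way barriers (again illustrative, reusing the data and guard variables from the previous sketch):

```c
/* Writer: the release store orders the write to data before the guard. */
static void
writer_c11(uint64_t value)
{
	data = value;
	atomic_store_explicit(&guard, 1, memory_order_release);
}

/* Reader: the acquire load orders the read of data after the guard. */
static int
reader_c11(uint64_t *out)
{
	if (atomic_load_explicit(&guard, memory_order_acquire) == 0)
		return 0;
	*out = data;
	return 1;
}
```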

As shown above, the writer uses the atomic_store_explicit function with memory_order_release to set the guard variable. This ensures that the write to the data completes before the write to the guard variable completes, while allowing later memory operations to hoist above the barrier.

The reader uses the atomic_load_explicit function with memory_order_acquire to load the guard variable. This ensures that the load of the data starts only after the load of the guard variable completes, while allowing earlier operations to sink below the barrier.

The atomic_store_explicit and atomic_load_explicit functions ensure that the operations are atomic and enforce the required compiler barriers implicitly.

Challenges 

The above paragraphs did not go into various details associated with the data. The contents of the data, support for atomic operations in the CPU architecture, and support for APIs to modify the data can present various challenges.

Size of data is more than the size of atomic operations 

Consider data of the following structure.
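For illustration, assume a hypothetical layout of five 32-bit fields (only the total size matters):

```c
#include <stdint.h>

/* Hypothetical 160-bit data structure; field names are illustrative. */
struct data {
	uint32_t a;
	uint32_t b;
	uint32_t c;
	uint32_t d;
	uint32_t e;
};
```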

The size of the data is 160 bits, which is more than the size of the atomic operations supported by CPU architectures. In this case, the guard variable is required to communicate the data atomically to the reader.

If support is required for an add API alone, the basic algorithm described in the above paragraphs is sufficient. The following code shows the implementation of the writer and the reader.
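Below is a minimal sketch of such an add-only writer and reader, under the assumptions above; the entry and guard names are illustrative:

```c
#include <stdatomic.h>
#include <stdbool.h>

struct entry {
	struct data d;       /* the 160-bit data */
	atomic_bool guard;   /* set once 'd' is fully written */
};

/* Writer (add): fill in all elements, then publish with a release store. */
static void
add(struct entry *e, const struct data *src)
{
	e->d = *src;
	atomic_store_explicit(&e->guard, true, memory_order_release);
}

/* Reader: the data may be read only after the guard is observed set. */
static bool
lookup(struct entry *e, struct data *out)
{
	if (!atomic_load_explicit(&e->guard, memory_order_acquire))
		return false;
	*out = e->d;
	return true;
}
```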

However, several challenges exist if one wants to support an API to modify/update the data, depending on the total size of the modifications.

Size of the modified elements is larger than the size of atomic operations

Let us consider the case when the total size of the modifications is more than the size of the atomic operations supported.

Since all modified elements of the data need to be observed by the reader atomically, a new copy of the data needs to be created. This new copy can consist of a mix of modified and unmodified elements. The guard variable can be a pointer to the data, or any atomically modifiable variable used to derive the address of the data, for example an index into an array. After the new copy is created, the writer updates the guard variable to point to the new data. This ensures that the modified data is observed by the reader atomically.

For example, suppose elements b, c and d need to be updated. Memory at a new address, say 0x200, is allocated for the new copy of the data. Element a is copied from the current data, while elements b, c and d are updated with new values in the new copy. The guard variable is then updated with the new memory address, 0x200. This ensures that the modifications of elements b, c and d appear atomically to the reader.

In the reader, there is a dependency between the data and the guard variable, as the data cannot be loaded without loading the guard variable. Hence, the barrier between the load of the guard variable and the data is not required. When using the C11 memory model, memory_order_relaxed can be used to load the guard variable.

The following code shows the implementation of writer and reader.
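A sketch under the assumptions above; allocation and reclamation details are simplified, and the names are illustrative:

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Guard: an atomic pointer to the current copy of the data. */
static struct data *_Atomic guard_ptr;

/* Writer: create a new copy mixing old and new elements, then publish it. */
static void
update_bcd(uint32_t b, uint32_t c, uint32_t d)
{
	struct data *old = atomic_load_explicit(&guard_ptr,
	                                        memory_order_relaxed);
	struct data *new_copy = malloc(sizeof(*new_copy));

	new_copy->a = old->a;   /* unmodified elements are copied over */
	new_copy->e = old->e;
	new_copy->b = b;        /* modified elements get new values */
	new_copy->c = c;
	new_copy->d = d;
	/* Release: the copy is fully written before the pointer is switched. */
	atomic_store_explicit(&guard_ptr, new_copy, memory_order_release);
	/* 'old' may be freed only after all readers stop referencing it. */
}

/* Reader: the address dependency orders the loads, so relaxed suffices. */
static const struct data *
read_current(void)
{
	return atomic_load_explicit(&guard_ptr, memory_order_relaxed);
}
```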

Note that the writer must ensure that all the readers have stopped referencing the memory containing the old data before freeing it.

Size of the modified elements is equal to the size of atomic operations 

Now, consider the case when the total size of the modifications is equal to the size of the atomic operations supported.

Such modifications do not need a new copy of the data, as the atomic operations supported by the CPU architecture ensure that the updates are observed by the reader atomically. These updates must use atomic operations.
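A minimal sketch, assuming element b is declared _Atomic (illustrative names):

```c
#include <stdatomic.h>
#include <stdint.h>

/* Variant of the structure whose element b is atomically updatable. */
struct data_v2 {
	uint32_t a;
	_Atomic uint32_t b;
	uint32_t c, d, e;
};

/* Writer: update element b in place; no new copy and no barrier needed. */
static void
update_b(struct data_v2 *p, uint32_t new_b)
{
	atomic_store_explicit(&p->b, new_b, memory_order_relaxed);
}

/* Reader: load element b atomically. */
static uint32_t
read_b(struct data_v2 *p)
{
	return atomic_load_explicit(&p->b, memory_order_relaxed);
}
```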

As shown above, if only element b has to be updated, it can be updated atomically without creating a new copy of the data in the writer. The store and load operations in the writer and the reader do not require any barriers; memory_order_relaxed can be used with the atomic_store_explicit and atomic_load_explicit functions.

Size of data is equal to the size of atomic operations 

Consider a structure for data as follows:
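For illustration, assume the following hypothetical 64-bit layout:

```c
#include <stdint.h>

/* Hypothetical 64-bit data structure; field names are illustrative. */
struct data64 {
	uint16_t a;
	uint16_t b;
	uint16_t c;
	uint8_t  d;
	uint8_t  valid;   /* part of the data can indicate validity */
};
```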

The size of the data is 64 bits, and all modern CPU architectures support 64-bit atomic operations.

In this case, there is no need for a separate guard variable for either the add or the modify APIs. An atomic store of the data in the writer and an atomic load of the data in the reader are sufficient. The store and load operations do not require any barrier in the writer or the reader either; memory_order_relaxed can be used with both the atomic_store_explicit and atomic_load_explicit functions. As shown above, if required, part of the data can indicate whether it contains valid data.
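A sketch of this case, type-punning the structure through a 64-bit integer (illustrative, assuming the 8-byte layout above):

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

static _Atomic uint64_t slot;   /* holds the whole struct data64 */

/* Writer: publish the entire structure with one relaxed atomic store. */
static void
write_data64(struct data64 v)
{
	uint64_t bits;

	memcpy(&bits, &v, sizeof(bits));
	atomic_store_explicit(&slot, bits, memory_order_relaxed);
}

/* Reader: load the entire structure with one relaxed atomic load. */
static struct data64
read_data64(void)
{
	uint64_t bits = atomic_load_explicit(&slot, memory_order_relaxed);
	struct data64 v;

	memcpy(&v, &bits, sizeof(v));
	return v;
}
```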

Conclusion 

So, whenever faced with a reader-writer synchronization issue, identify the data and consider whether a guard variable is required. Keep in mind that what matters is the ordering of memory operations, not when the data is visible to the reader. Following the methods mentioned above will ensure that unnecessary barriers and validations are avoided, and that the code is race-free and optimized.

Community Issues DPDK 21.05 Release

By Thomas Monjalon, DPDK Tech Board

A new DPDK release is now available, 21.05:  https://fast.dpdk.org/rel/dpdk-21.05.tar.xz

It was quite a big cycle, as is typical of our May releases:

  •  1352 commits from 176 authors
  •   2396 files changed, 134413 insertions(+), 63913 deletions(-)

There are no plans to start a maintenance branch for 21.05, and this version is ABI-compatible with 20.11 and 21.02.

Below are some new features:

  • General
    • compilation support for GCC 11, clang 12 and Alpine Linux
    •  mass renaming of lib directories
    • allow disabling some libraries
    • log names alignment and help command
    •  phase-fair lock
    • Marvell CN10K driver
  • Networking
    • predictable RSS
    •  port representor syntax for sub-function and multi-host
    • metering extended to flow rule matching
    • packet integrity in flow rule matching
    •  TCP connection tracking offload with flow rule
    •  Windows support of ice, pcap and vmxnet3 drivers

More details are in the release notes: https://doc.dpdk.org/guides/rel_notes/release_21_05.html

There are 41 new contributors (including authors, reviewers and testers)!  Welcome to Alexandre Ferrieux, Amir Shay, Ashish Paul, Ashwin Sekhar T K, Chaoyong He, David Bouyeure, Dheemanth Mallikarjun, Elad Nachman, Gabriel Ganne, Haifei Luo, Hengjian Zhang, Jie Wang, John Hurley, Joshua Hay, Junjie Wan, Kai Ji, Kamil Vojanec, Kathleen Capella, Keiichi Watanabe, Luc Pelletier, Maciej Machnikowski, Piotr Kubaj, Pu Xu, Richael Zhuang, Robert Malz, Roy Shterman, Salem Sol, Shay Agroskin, Shun Hao, Siwar Zitouni, Smadar Fuks, Sridhar Samudrala, Srikanth Yalavarthi, Stanislaw Kardach, Stefan Wegrzyn, Tengfei Zhang, Tianyu Li, Vidya Sagar Velumuri, Vishwas Danivas, Wenwu Ma and Yan Xia.

Below is the percentage of commits per employer company:

Based on Reviewed-by and Acked-by tags, the top non-PMD reviewers are:

         46     Ferruh Yigit 
         39     David Marchand 
         34     Andrew Rybchenko 
         30     Bruce Richardson
         27     Maxime Coquelin 
         26     Viacheslav Ovsiienko
         25     Akhil Goyal 
         23     Thomas Monjalon 
         22     Jerin Jacob 
         20     Ajit Khaparde 
         18     Xiaoyun Li 
         18     Anatoly Burakov 
         16     Ori Kam 
         12     Matan Azrad 
         12     Konstantin Ananyev 
         12     Honnappa Nagarahalli 
         11     Olivier Matz 
         11     Hemant Agrawal 
         10     Ruifeng Wang 

The new features for 21.08 may be submitted through 1 June.

DPDK 21.08 should be released in early August, on a tight schedule: http://core.dpdk.org/roadmap#dates

Please share your features roadmap.

Thanks, everyone, for allowing us to close a great 21.05 on 21/05!

DPDK Summit APAC Receives First “Diversity & Inclusion” Event Badge

We are thrilled to announce that our recent DPDK Summit APAC was the first Linux Foundation event to have earned a Gold Diversity & Inclusion badge from the Linux Foundation’s CHAOSS project! 

Event organizers can apply for an Event D&I Badge for reasons of leadership, self-reflection, and self-improvement on issues critical to fostering more diverse and inclusive events. For us, this means taking conscious steps to make it easier for people of all backgrounds to participate in our events — from submitting speaking sessions to attending.

The awarding of the CHAOSS D&I badge is an acknowledgement of DPDK’s commitment to implementing healthy Diversity and Inclusion (D&I) practices. These efforts are complementary to community-driven efforts to use more inclusive language within the DPDK codebase. For this event, we were measured on our efforts related to Speaker and Attendee Diversity & Inclusion, our commitment to a specific code of conduct for all attendees, and access to the event, specifically what barriers, if any, attendees must overcome to attend. Because all recent DPDK Summit events have been virtual, welcoming attendees from all over the world has been an especially important initiative. One way we have been able to accomplish this is to offer all virtual events free of cost.

While we are still a long way off from a fully diverse crowd, we are thrilled to have been recognized for the efforts we are making. It is so important, especially in a community as traditionally homogeneous as DPDK, that all voices be heard and welcomed.

Big thanks to Rachel Braun and the Linux Foundation Events team for helping make this a reality.

We look forward to making even more strides towards greater diversity and wider inclusion among DPDK events in the future. 


DPDK adopts the C11 memory model

By Honnappa Nagarahalli

DPDK is widely used across the technology industry to accelerate packet processing on a varied mix of platforms and architectures. However, not all architectures are created the same way, and they have differing characteristics. For frameworks such as DPDK, it is a challenge to efficiently use specific architecture features without impacting the performance of other supported architectures.

One such characteristic is the memory model, which describes the behavior of accesses to shared memory by multi-processor systems. The Arm and PowerPC architectures support a weakly ordered memory model whereas x86 supports a strongly ordered memory model. Consider the following table that shows the ordering guarantees provided by these architectures for various sequences of memory operations.
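  Sequence of operations    Arm (weakly ordered)    x86 (strongly ordered)
  Load – Load               can be reordered        executes in program order
  Load – Store              can be reordered        executes in program order
  Store – Store             can be reordered        executes in program order
  Store – Load              can be reordered        can be reordered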

As shown above, in the Arm architecture, all four sequences of memory operations can be reordered. If an algorithm requires these memory operations to be executed (completed) in the program order, memory barriers are required to enforce the ordering. On x86, only the store-load sequence can be reordered and requires a barrier to enforce the program order; the other sequences are guaranteed to execute in the program order.

However, not all algorithms need the stronger ordering guarantees. For example, take the case of a simple spinlock protecting a critical section.

The operations inside the critical section are not allowed to hoist above ‘spinlock-lock’ or sink below ‘spinlock-unlock’. But the operations above the critical section are allowed to sink below ‘spinlock-lock’, and the operations below are allowed to hoist above ‘spinlock-unlock’. In this case, protecting the critical section with locks does not require that the lock and unlock functions provide full barriers. This gives the CPU more flexibility to execute the instructions efficiently at run time. Note that this is just one example; many other algorithms have similar behaviors. The Arm architecture provides load/store instructions, atomic instructions and barriers that support sinking and hoisting of memory operations in one direction [3].
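A minimal sketch of such a lock using the __atomic built-ins discussed below (illustrative; not the DPDK implementation):

```c
/* Acquire: operations in the critical section cannot hoist above the lock. */
static inline void
spinlock_lock(int *lock)
{
	while (__atomic_exchange_n(lock, 1, __ATOMIC_ACQUIRE))
		;   /* spin until the lock is observed free */
}

/* Release: operations in the critical section cannot sink below the unlock. */
static inline void
spinlock_unlock(int *lock)
{
	__atomic_store_n(lock, 0, __ATOMIC_RELEASE);
}
```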

In order to support such architectural differences, DPDK uses abstract APIs. The rte_smp_mb/wmb/rmb APIs provide a full memory barrier, a store memory barrier and a load memory barrier respectively. APIs for atomic operations, such as rte_atomic_load/store/add/sub, are also available. These APIs use full barriers and do not take any memory ordering parameter that would help implement one-way barriers. Hence, it is not possible for DPDK internal algorithms and applications to make effective use of the underlying CPU architecture.

Exploring options

The DPDK community explored several options to solve this issue. The first option looked at relaxing the memory barriers used in the implementation of the above-mentioned APIs. It was soon realized that the type of memory ordering required depends on the algorithm and cannot be generalized, i.e., different algorithms may call the same API with different memory ordering. The second option looked at extending the existing APIs to take the memory ordering as one of the parameters while providing backward compatibility to existing APIs. This, however, would have resulted in writing a large number of APIs to conform with the existing ones.

Enter C11 Memory Model

The C language introduced the C11 memory model to address the above-discussed differences in CPU architectures. It provides a memory model that encompasses the behavior of widely used architectures, for multiple threads of execution to communicate with each other using shared memory. The C language supports this memory model through the atomic_xxx APIs provided in stdatomic.h. The GCC and Clang compilers also provide __atomic_xxx built-ins. These APIs and built-ins allow programmers to express the weak memory ordering inherent in their algorithms.

Several algorithms in DPDK were modified to use the C11 memory model on Arm. These changes did not affect performance on x86 while improving performance on weakly ordered memory model architectures like Arm. Thorough testing was done on several more algorithms to prove this further. Based on these results, the DPDK Tech Board voted unanimously to adopt the C11 memory model as the default memory model starting in the 20.08 release [4]. This means that all patches submitted by the DPDK community for subsequent releases should use the C11 memory model. In order to support this adoption, Arm has agreed to change the existing code to use the C11 memory model.

Community Discussions

The community discussed whether to use the __atomic_xxx built-ins or the atomic_xxx APIs provided via stdatomic.h. The __atomic_xxx built-ins require the memory order parameter in every call, requiring the programmer to make a conscious decision on the right memory order to use. They are supported by the GCC, ICC and Clang compilers. The atomic_xxx APIs are not supported by compilers packaged with older versions of Linux distributions. After some debate, it was decided to use the built-ins, as there was no advantage in using the atomic APIs.

On the x86 architecture, __atomic_thread_fence(__ATOMIC_SEQ_CST) generates the mfence instruction, but DPDK implements the same barrier semantics with less overhead. Hence, a decision was taken to introduce a wrapper, rte_atomic_thread_fence. On x86, it calls the optimized implementation for __ATOMIC_SEQ_CST and calls __atomic_thread_fence for everything else. On the rest of the architectures, it calls __atomic_thread_fence.
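A simplified sketch of the idea (not the exact DPDK source; rte_smp_mb() is DPDK's optimized full barrier and RTE_ARCH_X86 its build-time architecture flag):

```c
/* Sketch: route SEQ_CST fences on x86 to DPDK's cheaper full barrier. */
static inline void
rte_atomic_thread_fence(int memorder)
{
#ifdef RTE_ARCH_X86
	if (memorder == __ATOMIC_SEQ_CST)
		rte_smp_mb();   /* avoids the more expensive mfence */
	else
		__atomic_thread_fence(memorder);
#else
	__atomic_thread_fence(memorder);
#endif
}
```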

The community also decided to enforce this adoption using the checkpatch script. The script was updated to emit a warning if a patch uses the rte_smp_mb/rmb/wmb or rte_atomic_xxx APIs. All patch owners must fix such warnings by using the __atomic_xxx built-ins, and it is also the maintainers’ responsibility to look out for these warnings.

Current Status

Even though Arm has agreed to change the existing code to use C11 memory model, contributions from the community are highly welcome. So far 12 libraries have been updated with several more still to be updated. Several race conditions and atomicity related bugs have been identified and fixed during this process.

Conclusion

The adoption of the C11 memory model has helped the DPDK community develop robust code that performs best on all supported architectures. The community has developed a very good understanding of the C11 memory model.

If anyone needs help in designing or coding their next algorithm using the C11 memory model, please ask at dev@dpdk.org.

Acknowledgements

Special thanks to Ola Liljedahl, Philippe Robin, Ananyev Konstantin, David Christensen and David Marchand for reviewing this blog.

References

[1] https://openpowerfoundation.org/?resource_lib=power-isa-version-3-0
[2] https://software.intel.com/content/www/us/en/develop/download/intel-64-and-ia-32-architectures-sdm-volume-3a-system-programming-guide-part-1.html
[3] https://developer.arm.com/documentation/100941/0100/
[4] https://mails.dpdk.org/archives/dev/2020-April/165143.html

DPDK Release 21.02 is Now Available!

By Thomas Monjalon from the DPDK Tech Board 

The DPDK community has issued a new release, 21.02: https://fast.dpdk.org/rel/dpdk-21.02.tar.xz

It was a light period, like the February releases of the two previous years:

  •  941 commits from 140 authors
  •  1208 files changed, 53571 insertions(+), 21946 deletions(-)

There are no plans to start a maintenance branch for 21.02. This version is ABI-compatible with 20.11.

Note 1: you may need pyelftools to compile DPDK drivers.
Note 2: a cleanup made out-of-tree drivers more difficult to compile.

Below are some new features, grouped by category.

  • Networking
    • power management for core polling single ethdev queue
    •  generic modify action in flow API
    • GENEVE TLV option in flow API
    • Marvell OCTEON TX EP net PMD
    •  Virtio PMD rework
    •  Windows support of i40e and mlx5
  • Cryptography
    • callback API for enqueue/dequeue
  • Compression
    • Mellanox compress PMD
  •  Others
    • new pmdinfogen with Windows support

More technical details in the full release notes: https://doc.dpdk.org/guides/rel_notes/release_21_02.html

There are 34 new contributors (including authors, reviewers and testers).

Welcome to Andrew Boyer, Andrii Pypchenko, Ashish Sadanandan, Barry Cao, Dana Vardi, Dapeng Yu, Fabio Pricoco, Fei Chen, Francis Kelly, Fredrik A Lindgren, George Prekas, Jacek Bułatek, Jiawei Zhu, Kiran KN, Lingyu Liu, Louis Peens, Mateusz Pacuszka, Meir Levi, Milena Olech, Murphy Yang, Nalla Pradeep, Neel Patel, Odi Assli, Paolo Valerio, Peng He, Samik Gupta, Simon Ellmann, Somalapuram Amaranath, Subhi Masri, Szymon T Cudzilo, Tyler Retzlaff, Viacheslav Galaktionov, Wenjun Wu and Yongxin Liu.

Below is the percentage of commits per employer (with authors count):

Based on Reviewed-by and Acked-by tags, the top non-PMD reviewers are:

         31     Ruifeng Wang <ruifeng.wang@arm.com>
         29     Ferruh Yigit <ferruh.yigit@intel.com>
         25     David Marchand <david.marchand@redhat.com>
         23     Maxime Coquelin <maxime.coquelin@redhat.com>
         17     Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
         16     Ori Kam <orika@nvidia.com>
         15     Jerin Jacob <jerinj@marvell.com>
         13     Konstantin Ananyev <konstantin.ananyev@intel.com>
         11     Viacheslav Ovsiienko <viacheslavo@nvidia.com>

The new features for 21.05 may be submitted until mid-March, in order to be reviewed and integrated before mid-April.

DPDK 21.05 should be released in mid-May: https://core.dpdk.org/roadmap#dates

Please share your features roadmap.

Thanks everyone, enjoy Spring Festival and Valentine’s Day, and let’s share as much love as we can.

On ABI Stability, v20 in review

With DPDK 19.11, the DPDK community adopted ABI-stable releases with the aim of easing the adoption of new features for DPDK users. If ABI-stable releases are a new concept for you, or to simply refresh your memory, please read the previous post on Why is ABI Stability Important? The DPDK community put the policies and mechanisms in place to make ABI stability a reality from DPDK 19.11 onwards, with a commitment to reviewing progress after a year. A year has now passed, so how did things go? The good news is that 100% backward compatibility was preserved through all the quarterly releases since DPDK 19.11.

100% compatibility means that a packet processing application built with DPDK 19.11 works just as well when upgraded to DPDK 20.08, and crucially, without necessitating an application rebuild, making for seamless upgrades. This was during a period when significant new features were added to DPDK, such as Cryptodev support for the ChaCha20-Poly1305 AEAD algorithm. To really understand what DPDK ABI stability means for a DPDK user, watch the demo:

https://bit.ly/33PcDFd 

The DPDK 20.11 release declares a new major ABI version, v21, and begins a new period of backward compatibility with the DPDK 20.11 release. Kudos and thanks to all the maintainers and contributors whose hard work has made ABI stability a success.

Authored by: Ray Kinsella