[dpdk-dev] i40e: enable i40e pmd on ARM platform

Message ID 2DBBFF226F7CF64BAFCA79B681719D9502046341@shsmsx102.ccr.corp.intel.com (mailing list archive)
State Not Applicable, archived
Headers

Commit Message

Yao, Lei A Aug. 3, 2016, 3:26 a.m. UTC
  Hi, Jianbo

I have tested you patch on my X86 platform,  the single core performance for Non-vector PMD will have about 1Mpps drop
Non-vector PMD single core performance with patch               :  ~33.9 Mpps
Non-vector PMD single core performance without patch        :  ~35.1 Mpps
Is there any way to avoid such performance drop on X86? Thanks.

BRs
Lei

-----Original Message-----
From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jianbo Liu
Sent: Tuesday, August 2, 2016 2:58 PM
To: dev@dpdk.org; Zhang, Helin <helin.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>
Cc: Jianbo Liu <jianbo.liu@linaro.org>
Subject: [dpdk-dev] [PATCH] i40e: enable i40e pmd on ARM platform

And add read memory barrier to avoid status inconsistency between two RX descriptors readings.

Signed-off-by: Jianbo Liu <jianbo.liu@linaro.org>
---
 config/defconfig_arm64-armv8a-linuxapp-gcc | 2 +-
 doc/guides/nics/overview.rst               | 2 +-
 drivers/net/i40e/i40e_rxtx.c               | 2 ++
 3 files changed, 4 insertions(+), 2 deletions(-)

--
2.4.11
  

Comments

Jianbo Liu Aug. 3, 2016, 6:02 a.m. UTC | #1
On 3 August 2016 at 11:26, Yao, Lei A <lei.a.yao@intel.com> wrote:
> Hi, Jianbo
>
> I have tested you patch on my X86 platform,  the single core performance for Non-vector PMD will have about 1Mpps drop
> Non-vector PMD single core performance with patch               :  ~33.9 Mpps
> Non-vector PMD single core performance without patch        :  ~35.1 Mpps
> Is there any way to avoid such performance drop on X86? Thanks.
>

I think we can place a compiling condition before rte_rmb() to avoid
performance decrease on x86.
For example:  #if defined(RTE_ARCH_ARM) || defined(RTE_ARCH_ARM64)

Thanks!
Jianbo

> BRs
> Lei
>
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jianbo Liu
> Sent: Tuesday, August 2, 2016 2:58 PM
> To: dev@dpdk.org; Zhang, Helin <helin.zhang@intel.com>; Wu, Jingjing <jingjing.wu@intel.com>
> Cc: Jianbo Liu <jianbo.liu@linaro.org>
> Subject: [dpdk-dev] [PATCH] i40e: enable i40e pmd on ARM platform
>
> And add read memory barrier to avoid status inconsistency between two RX descriptors readings.
>
> Signed-off-by: Jianbo Liu <jianbo.liu@linaro.org>
> ---
>  config/defconfig_arm64-armv8a-linuxapp-gcc | 2 +-
>  doc/guides/nics/overview.rst               | 2 +-
>  drivers/net/i40e/i40e_rxtx.c               | 2 ++
>  3 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/config/defconfig_arm64-armv8a-linuxapp-gcc b/config/defconfig_arm64-armv8a-linuxapp-gcc
> index 1a17126..08f282b 100644
> --- a/config/defconfig_arm64-armv8a-linuxapp-gcc
> +++ b/config/defconfig_arm64-armv8a-linuxapp-gcc
> @@ -46,6 +46,6 @@ CONFIG_RTE_EAL_IGB_UIO=n
>
>  CONFIG_RTE_LIBRTE_IVSHMEM=n
>  CONFIG_RTE_LIBRTE_FM10K_PMD=n
> -CONFIG_RTE_LIBRTE_I40E_PMD=n
> +CONFIG_RTE_LIBRTE_I40E_INC_VECTOR=n
>
>  CONFIG_RTE_SCHED_VECTOR=n
> diff --git a/doc/guides/nics/overview.rst b/doc/guides/nics/overview.rst index 6abbae6..5175591 100644
> --- a/doc/guides/nics/overview.rst
> +++ b/doc/guides/nics/overview.rst
> @@ -138,7 +138,7 @@ Most of these differences are summarized below.
>     Linux VFIO                     Y Y   Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y                     Y   Y Y
>     Other kdrv                                                         Y Y               Y
>     ARMv7                                                                        Y             Y Y
> -   ARMv8                                              Y Y Y Y                   Y         Y   Y Y
> +   ARMv8                                  Y           Y Y Y Y                   Y         Y   Y Y
>     Power8                                                             Y Y       Y
>     TILE-Gx                                                                      Y
>     x86-32                         Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y       Y           Y Y Y
> diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c index 554d167..4004b8e 100644
> --- a/drivers/net/i40e/i40e_rxtx.c
> +++ b/drivers/net/i40e/i40e_rxtx.c
> @@ -994,6 +994,8 @@ i40e_rx_scan_hw_ring(struct i40e_rx_queue *rxq)
>                                         I40E_RXD_QW1_STATUS_SHIFT;
>                 }
>
> +               rte_rmb();
> +
>                 /* Compute how many status bits were set */
>                 for (j = 0, nb_dd = 0; j < I40E_LOOK_AHEAD; j++)
>                         nb_dd += s[j] & (1 << I40E_RX_DESC_STATUS_DD_SHIFT);
> --
> 2.4.11
>
  
Thomas Monjalon Aug. 3, 2016, 7:58 a.m. UTC | #2
2016-08-03 14:02, Jianbo Liu:
> I think we can place a compiling condition before rte_rmb() to avoid
> performance decrease on x86.
> For example:  #if defined(RTE_ARCH_ARM) || defined(RTE_ARCH_ARM64)

Please could you explain why a memory barrier would be needed on ARM but
not on x86? What about other architectures?
  
Ananyev, Konstantin Aug. 3, 2016, 8:29 a.m. UTC | #3
Hi Jianbo,

> > Hi, Jianbo

> >

> > I have tested you patch on my X86 platform,  the single core performance for Non-vector PMD will have about 1Mpps drop

> > Non-vector PMD single core performance with patch               :  ~33.9 Mpps

> > Non-vector PMD single core performance without patch        :  ~35.1 Mpps

> > Is there any way to avoid such performance drop on X86? Thanks.

> >

> 

> I think we can place a compiling condition before rte_rmb() to avoid performance decrease on x86.

> For example:  #if defined(RTE_ARCH_ARM) || defined(RTE_ARCH_ARM64)


I suppose you can use rte_smp_rmb() here?
Konstantin

> 

> Thanks!

> Jianbo

> 

> > BRs

> > Lei

> >

> > -----Original Message-----

> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jianbo Liu

> > Sent: Tuesday, August 2, 2016 2:58 PM

> > To: dev@dpdk.org; Zhang, Helin <helin.zhang@intel.com>; Wu, Jingjing

> > <jingjing.wu@intel.com>

> > Cc: Jianbo Liu <jianbo.liu@linaro.org>

> > Subject: [dpdk-dev] [PATCH] i40e: enable i40e pmd on ARM platform

> >

> > And add read memory barrier to avoid status inconsistency between two RX descriptors readings.

> >

> > Signed-off-by: Jianbo Liu <jianbo.liu@linaro.org>

> > ---

> >  config/defconfig_arm64-armv8a-linuxapp-gcc | 2 +-

> >  doc/guides/nics/overview.rst               | 2 +-

> >  drivers/net/i40e/i40e_rxtx.c               | 2 ++

> >  3 files changed, 4 insertions(+), 2 deletions(-)

> >

> > diff --git a/config/defconfig_arm64-armv8a-linuxapp-gcc

> > b/config/defconfig_arm64-armv8a-linuxapp-gcc

> > index 1a17126..08f282b 100644

> > --- a/config/defconfig_arm64-armv8a-linuxapp-gcc

> > +++ b/config/defconfig_arm64-armv8a-linuxapp-gcc

> > @@ -46,6 +46,6 @@ CONFIG_RTE_EAL_IGB_UIO=n

> >

> >  CONFIG_RTE_LIBRTE_IVSHMEM=n

> >  CONFIG_RTE_LIBRTE_FM10K_PMD=n

> > -CONFIG_RTE_LIBRTE_I40E_PMD=n

> > +CONFIG_RTE_LIBRTE_I40E_INC_VECTOR=n

> >

> >  CONFIG_RTE_SCHED_VECTOR=n

> > diff --git a/doc/guides/nics/overview.rst

> > b/doc/guides/nics/overview.rst index 6abbae6..5175591 100644

> > --- a/doc/guides/nics/overview.rst

> > +++ b/doc/guides/nics/overview.rst

> > @@ -138,7 +138,7 @@ Most of these differences are summarized below.

> >     Linux VFIO                     Y Y   Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y                     Y   Y Y

> >     Other kdrv                                                         Y Y               Y

> >     ARMv7                                                                        Y             Y Y

> > -   ARMv8                                              Y Y Y Y                   Y         Y   Y Y

> > +   ARMv8                                  Y           Y Y Y Y                   Y         Y   Y Y

> >     Power8                                                             Y Y       Y

> >     TILE-Gx                                                                      Y

> >     x86-32                         Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y       Y           Y Y Y

> > diff --git a/drivers/net/i40e/i40e_rxtx.c

> > b/drivers/net/i40e/i40e_rxtx.c index 554d167..4004b8e 100644

> > --- a/drivers/net/i40e/i40e_rxtx.c

> > +++ b/drivers/net/i40e/i40e_rxtx.c

> > @@ -994,6 +994,8 @@ i40e_rx_scan_hw_ring(struct i40e_rx_queue *rxq)

> >                                         I40E_RXD_QW1_STATUS_SHIFT;

> >                 }

> >

> > +               rte_rmb();

> > +

> >                 /* Compute how many status bits were set */

> >                 for (j = 0, nb_dd = 0; j < I40E_LOOK_AHEAD; j++)

> >                         nb_dd += s[j] & (1 <<

> > I40E_RX_DESC_STATUS_DD_SHIFT);

> > --

> > 2.4.11

> >
  
Jianbo Liu Aug. 3, 2016, 9:33 a.m. UTC | #4
Hi Thomas,

On 3 August 2016 at 15:58, Thomas Monjalon <thomas.monjalon@6wind.com> wrote:
> 2016-08-03 14:02, Jianbo Liu:
>> I think we can place a compiling condition before rte_rmb() to avoid
>> performance decrease on x86.
>> For example:  #if defined(RTE_ARCH_ARM) || defined(RTE_ARCH_ARM64)
>
> Please could you explain why a memory barrier would be needed on ARM but

The reason is that ARM is weealy ordered processor, and data access
will be executed out of order to improve performance.
In this case, we have to read 2 times, 8 descriptors each. The read
statuses could be wrong if no memory barrier.
I also got the outdated status for some descriptors in my testing.

> not on x86? What about other architectures?

I think Konstantin gave me a good solution, by using rte_smp_rmb :)

Jianbo
  
Jianbo Liu Aug. 3, 2016, 9:36 a.m. UTC | #5
On 3 August 2016 at 16:29, Ananyev, Konstantin
<konstantin.ananyev@intel.com> wrote:
>
> Hi Jianbo,
>
>> > Hi, Jianbo
>> >
>> > I have tested you patch on my X86 platform,  the single core performance for Non-vector PMD will have about 1Mpps drop
>> > Non-vector PMD single core performance with patch               :  ~33.9 Mpps
>> > Non-vector PMD single core performance without patch        :  ~35.1 Mpps
>> > Is there any way to avoid such performance drop on X86? Thanks.
>> >
>>
>> I think we can place a compiling condition before rte_rmb() to avoid performance decrease on x86.
>> For example:  #if defined(RTE_ARCH_ARM) || defined(RTE_ARCH_ARM64)
>
> I suppose you can use rte_smp_rmb() here?

Great. Thank Konstantin.
I'll send v2.

Jianbo
  

Patch

diff --git a/config/defconfig_arm64-armv8a-linuxapp-gcc b/config/defconfig_arm64-armv8a-linuxapp-gcc
index 1a17126..08f282b 100644
--- a/config/defconfig_arm64-armv8a-linuxapp-gcc
+++ b/config/defconfig_arm64-armv8a-linuxapp-gcc
@@ -46,6 +46,6 @@  CONFIG_RTE_EAL_IGB_UIO=n
 
 CONFIG_RTE_LIBRTE_IVSHMEM=n
 CONFIG_RTE_LIBRTE_FM10K_PMD=n
-CONFIG_RTE_LIBRTE_I40E_PMD=n
+CONFIG_RTE_LIBRTE_I40E_INC_VECTOR=n
 
 CONFIG_RTE_SCHED_VECTOR=n
diff --git a/doc/guides/nics/overview.rst b/doc/guides/nics/overview.rst index 6abbae6..5175591 100644
--- a/doc/guides/nics/overview.rst
+++ b/doc/guides/nics/overview.rst
@@ -138,7 +138,7 @@  Most of these differences are summarized below.
    Linux VFIO                     Y Y   Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y                     Y   Y Y
    Other kdrv                                                         Y Y               Y
    ARMv7                                                                        Y             Y Y
-   ARMv8                                              Y Y Y Y                   Y         Y   Y Y
+   ARMv8                                  Y           Y Y Y Y                   Y         Y   Y Y
    Power8                                                             Y Y       Y
    TILE-Gx                                                                      Y
    x86-32                         Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y       Y           Y Y Y
diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c index 554d167..4004b8e 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -994,6 +994,8 @@  i40e_rx_scan_hw_ring(struct i40e_rx_queue *rxq)
 					I40E_RXD_QW1_STATUS_SHIFT;
 		}
 
+		rte_rmb();
+
 		/* Compute how many status bits were set */
 		for (j = 0, nb_dd = 0; j < I40E_LOOK_AHEAD; j++)
 			nb_dd += s[j] & (1 << I40E_RX_DESC_STATUS_DD_SHIFT);