[dpdk-dev,v3,4/4] eal/arm: introduce CONFIG_RTE_ARCH_ARM_NEON_MEMCPY

Message ID 1458417485-29436-5-git-send-email-viktorin@rehivetech.com (mailing list archive)
State Accepted, archived
Delegated to: Thomas Monjalon
Headers

Commit Message

Jan Viktorin March 19, 2016, 7:58 p.m. UTC
  The flag is used to enable memcpy optimizations in EAL. As it is not always
the performance benefit, the flag allows to disable it.

Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
---
 config/defconfig_arm-armv7a-linuxapp-gcc               | 1 +
 lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h | 8 ++++++--
 2 files changed, 7 insertions(+), 2 deletions(-)
  

Comments

Thomas Monjalon March 19, 2016, 8:14 p.m. UTC | #1
2016-03-19 20:58, Jan Viktorin:
> The flag is used to enable memcpy optimizations in EAL. As it is not always
> the performance benefit, the flag allows to disable it.

Ideally the default should be to choose the best optimization.
If it is not possible, it would help to have some comments explaining
how to choose wether enabling NEON memcpy or not.
  
Jan Viktorin March 20, 2016, 9:41 a.m. UTC | #2
On Sat, 19 Mar 2016 21:14:57 +0100
Thomas Monjalon <thomas.monjalon@6wind.com> wrote:

> 2016-03-19 20:58, Jan Viktorin:
> > The flag is used to enable memcpy optimizations in EAL. As it is not always
> > the performance benefit, the flag allows to disable it.  
> 
> Ideally the default should be to choose the best optimization.
> If it is not possible, it would help to have some comments explaining
> how to choose wether enabling NEON memcpy or not.

Ok, we can rename the option to CONFIG_RTE_ARCH_ARM_AVOID_NEON_MEMCPY,
delete it from the defconfig and change the test in rte_memcpy_32.h to

#ifndef CONFIG_RTE_ARCH_ARM_AVOID_NEON_MEMCPY

Alternatively, to have a positive test like

#ifdef CONFIG_RTE_ARCH_ARM_AVOID_NEON_MEMCPY

I can create a bigger change that moves the non-neon-memcpy up in the
file...

Should I resend the whole series as v3?

Regards
Jan
  
Jan Viktorin March 20, 2016, 9:46 a.m. UTC | #3
On Sun, 20 Mar 2016 10:41:10 +0100
Jan Viktorin <viktorin@rehivetech.com> wrote:

> On Sat, 19 Mar 2016 21:14:57 +0100
> Thomas Monjalon <thomas.monjalon@6wind.com> wrote:
> 
> > 2016-03-19 20:58, Jan Viktorin:  
> > > The flag is used to enable memcpy optimizations in EAL. As it is not always
> > > the performance benefit, the flag allows to disable it.    
> > 
> > Ideally the default should be to choose the best optimization.
> > If it is not possible, it would help to have some comments explaining
> > how to choose wether enabling NEON memcpy or not.

The related statistics are mentioned here:

commit 04a2fde35daf5e9a271e72331a70b48b951d7568
Author: Vlastimil Kosar <kosar@rehivetech.com>
Date:   Tue Nov 3 00:47:20 2015 +0100

    eal/arm: add vector memcpy for ARMv7

It's quite difficult to easily summarize it, especially for so many
CPUs...
 
> 
> Ok, we can rename the option to CONFIG_RTE_ARCH_ARM_AVOID_NEON_MEMCPY,
> delete it from the defconfig and change the test in rte_memcpy_32.h to
> 
> #ifndef CONFIG_RTE_ARCH_ARM_AVOID_NEON_MEMCPY
> 
> Alternatively, to have a positive test like
> 
> #ifdef CONFIG_RTE_ARCH_ARM_AVOID_NEON_MEMCPY
> 
> I can create a bigger change that moves the non-neon-memcpy up in the
> file...
> 
> Should I resend the whole series as v3?
> 
> Regards
> Jan
  
Thomas Monjalon March 20, 2016, 10:29 a.m. UTC | #4
2016-03-20 10:41, Jan Viktorin:
> On Sat, 19 Mar 2016 21:14:57 +0100
> Thomas Monjalon <thomas.monjalon@6wind.com> wrote:
> 
> > 2016-03-19 20:58, Jan Viktorin:
> > > The flag is used to enable memcpy optimizations in EAL. As it is not always
> > > the performance benefit, the flag allows to disable it.  
> > 
> > Ideally the default should be to choose the best optimization.
> > If it is not possible, it would help to have some comments explaining
> > how to choose wether enabling NEON memcpy or not.
> 
> Ok, we can rename the option to CONFIG_RTE_ARCH_ARM_AVOID_NEON_MEMCPY,
> delete it from the defconfig and change the test in rte_memcpy_32.h to
> 
> #ifndef CONFIG_RTE_ARCH_ARM_AVOID_NEON_MEMCPY
> 
> Alternatively, to have a positive test like
> 
> #ifdef CONFIG_RTE_ARCH_ARM_AVOID_NEON_MEMCPY
> 
> I can create a bigger change that moves the non-neon-memcpy up in the
> file...
> 
> Should I resend the whole series as v3?

No, I don't think changing the name of the config or moving code
will change anything.
We just need to understand when it must be enabled or disabled.
  
Thomas Monjalon March 20, 2016, 10:33 a.m. UTC | #5
2016-03-20 10:46, Jan Viktorin:
> On Sun, 20 Mar 2016 10:41:10 +0100
> Jan Viktorin <viktorin@rehivetech.com> wrote:
> 
> > On Sat, 19 Mar 2016 21:14:57 +0100
> > Thomas Monjalon <thomas.monjalon@6wind.com> wrote:
> > 
> > > 2016-03-19 20:58, Jan Viktorin:  
> > > > The flag is used to enable memcpy optimizations in EAL. As it is not always
> > > > the performance benefit, the flag allows to disable it.    
> > > 
> > > Ideally the default should be to choose the best optimization.
> > > If it is not possible, it would help to have some comments explaining
> > > how to choose wether enabling NEON memcpy or not.
> 
> The related statistics are mentioned here:
> 
> commit 04a2fde35daf5e9a271e72331a70b48b951d7568
> Author: Vlastimil Kosar <kosar@rehivetech.com>
> Date:   Tue Nov 3 00:47:20 2015 +0100
> 
>     eal/arm: add vector memcpy for ARMv7
> 
> It's quite difficult to easily summarize it, especially for so many
> CPUs...

If it is difficult for you, it will be  impossible for the users of
this config option.
When someone will ask what is the best value for his CPU, what will
you answer?
At least, we can add a comment explaining that the performance is not
always better, depending of the buffer size and the CPU.
  
Jerin Jacob March 20, 2016, 5:38 p.m. UTC | #6
On Sun, Mar 20, 2016 at 11:29:48AM +0100, Thomas Monjalon wrote:
> 2016-03-20 10:41, Jan Viktorin:
> > On Sat, 19 Mar 2016 21:14:57 +0100
> > Thomas Monjalon <thomas.monjalon@6wind.com> wrote:
> > 
> > > 2016-03-19 20:58, Jan Viktorin:
> > > > The flag is used to enable memcpy optimizations in EAL. As it is not always
> > > > the performance benefit, the flag allows to disable it.  
> > > 
> > > Ideally the default should be to choose the best optimization.
> > > If it is not possible, it would help to have some comments explaining
> > > how to choose wether enabling NEON memcpy or not.
> > 
> > Ok, we can rename the option to CONFIG_RTE_ARCH_ARM_AVOID_NEON_MEMCPY,
> > delete it from the defconfig and change the test in rte_memcpy_32.h to
> > 
> > #ifndef CONFIG_RTE_ARCH_ARM_AVOID_NEON_MEMCPY
> > 
> > Alternatively, to have a positive test like
> > 
> > #ifdef CONFIG_RTE_ARCH_ARM_AVOID_NEON_MEMCPY
> > 
> > I can create a bigger change that moves the non-neon-memcpy up in the
> > file...
> > 
> > Should I resend the whole series as v3?
> 
> No, I don't think changing the name of the config or moving code
> will change anything.
> We just need to understand when it must be enabled or disabled.

By default, NEON implementation should be enabled in default config file,
if a given arm target/cpu has issue with NEON specific implementation
at target/cpu config level it can be disabled. IMO, Its inline with Jan's Patch.

The factors like NEON instruction execution cycles and pipelines
supported etc highly depend on the ARM target vendor implementation.
(ie arm specification does not mandate those fine-grained details)
so let target/cpu configuration decides any expectation is required or
not.

Jerin
  
Jianbo Liu March 21, 2016, 5:42 a.m. UTC | #7
On 20 March 2016 at 03:58, Jan Viktorin <viktorin@rehivetech.com> wrote:
> The flag is used to enable memcpy optimizations in EAL. As it is not always
> the performance benefit, the flag allows to disable it.
>
> Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
> ---
>  config/defconfig_arm-armv7a-linuxapp-gcc               | 1 +
>  lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h | 8 ++++++--
>  2 files changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc b/config/defconfig_arm-armv7a-linuxapp-gcc
> index 96c3343..2c60c2c 100644
> --- a/config/defconfig_arm-armv7a-linuxapp-gcc
> +++ b/config/defconfig_arm-armv7a-linuxapp-gcc
> @@ -36,6 +36,7 @@ CONFIG_RTE_ARCH="arm"
>  CONFIG_RTE_ARCH_ARM=y
>  CONFIG_RTE_ARCH_ARMv7=y
>  CONFIG_RTE_ARCH_ARM_TUNE="cortex-a9"
> +CONFIG_RTE_ARCH_ARM_NEON_MEMCPY=y
>
If it's not always benefit, why not disable here since it is common
armv7a config, and enable in your or other user's own config file?

Thanks!
Jianbo
  
Jan Viktorin March 21, 2016, 12:21 p.m. UTC | #8
On Mon, 21 Mar 2016 13:42:31 +0800
Jianbo Liu <jianbo.liu@linaro.org> wrote:

> On 20 March 2016 at 03:58, Jan Viktorin <viktorin@rehivetech.com> wrote:
> > The flag is used to enable memcpy optimizations in EAL. As it is not always
> > the performance benefit, the flag allows to disable it.
> >
> > Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
> > ---
> >  config/defconfig_arm-armv7a-linuxapp-gcc               | 1 +
> >  lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h | 8 ++++++--
> >  2 files changed, 7 insertions(+), 2 deletions(-)
> >
> > diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc b/config/defconfig_arm-armv7a-linuxapp-gcc
> > index 96c3343..2c60c2c 100644
> > --- a/config/defconfig_arm-armv7a-linuxapp-gcc
> > +++ b/config/defconfig_arm-armv7a-linuxapp-gcc
> > @@ -36,6 +36,7 @@ CONFIG_RTE_ARCH="arm"
> >  CONFIG_RTE_ARCH_ARM=y
> >  CONFIG_RTE_ARCH_ARMv7=y
> >  CONFIG_RTE_ARCH_ARM_TUNE="cortex-a9"
> > +CONFIG_RTE_ARCH_ARM_NEON_MEMCPY=y
> >  
> If it's not always benefit, why not disable here since it is common
> armv7a config, and enable in your or other user's own config file?

Jianbo, you are right. In that case, I'd just turn it off by default.
And when there is a new platform-specific defconfig, it can enable it.

Anyway, I am thinking of adding some comment into the rte_memcpy_32.h
file describing the potential of the NEON code. What about:

/* Enable in your defconfig to accelerate memcpy operations. Consider
   enabling this for Cortex-A15. For Cortex-A7 and Cortex-A9, It might
   accelerate short data copies (< 64 B). */

Thomas, do you consider this enough?

Jan

> 
> Thanks!
> Jianbo
  
Thomas Monjalon March 21, 2016, 1:24 p.m. UTC | #9
2016-03-21 13:21, Jan Viktorin:
> On Mon, 21 Mar 2016 13:42:31 +0800
> Jianbo Liu <jianbo.liu@linaro.org> wrote:
> 
> > On 20 March 2016 at 03:58, Jan Viktorin <viktorin@rehivetech.com> wrote:
> > > The flag is used to enable memcpy optimizations in EAL. As it is not always
> > > the performance benefit, the flag allows to disable it.
> > >
> > > Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
> > > ---
> > >  config/defconfig_arm-armv7a-linuxapp-gcc               | 1 +
> > >  lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h | 8 ++++++--
> > >  2 files changed, 7 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc b/config/defconfig_arm-armv7a-linuxapp-gcc
> > > index 96c3343..2c60c2c 100644
> > > --- a/config/defconfig_arm-armv7a-linuxapp-gcc
> > > +++ b/config/defconfig_arm-armv7a-linuxapp-gcc
> > > @@ -36,6 +36,7 @@ CONFIG_RTE_ARCH="arm"
> > >  CONFIG_RTE_ARCH_ARM=y
> > >  CONFIG_RTE_ARCH_ARMv7=y
> > >  CONFIG_RTE_ARCH_ARM_TUNE="cortex-a9"
> > > +CONFIG_RTE_ARCH_ARM_NEON_MEMCPY=y
> > >  
> > If it's not always benefit, why not disable here since it is common
> > armv7a config, and enable in your or other user's own config file?
> 
> Jianbo, you are right. In that case, I'd just turn it off by default.
> And when there is a new platform-specific defconfig, it can enable it.
> 
> Anyway, I am thinking of adding some comment into the rte_memcpy_32.h
> file describing the potential of the NEON code. What about:
> 
> /* Enable in your defconfig to accelerate memcpy operations. Consider
>    enabling this for Cortex-A15. For Cortex-A7 and Cortex-A9, It might
>    accelerate short data copies (< 64 B). */
> 
> Thomas, do you consider this enough?

Yes it is perfect.
Why not put it in defconfig_arm-armv7a-linuxapp-gcc?
  
Jan Viktorin March 21, 2016, 2:01 p.m. UTC | #10
On Mon, 21 Mar 2016 06:24:37 -0700 (PDT)
Thomas Monjalon <thomas.monjalon@6wind.com> wrote:

> 2016-03-21 13:21, Jan Viktorin:
> > On Mon, 21 Mar 2016 13:42:31 +0800
> > Jianbo Liu <jianbo.liu@linaro.org> wrote:
> >   
> > > On 20 March 2016 at 03:58, Jan Viktorin <viktorin@rehivetech.com> wrote:  
> > > > The flag is used to enable memcpy optimizations in EAL. As it is not always
> > > > the performance benefit, the flag allows to disable it.
> > > >
> > > > Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
> > > > ---
> > > >  config/defconfig_arm-armv7a-linuxapp-gcc               | 1 +
> > > >  lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h | 8 ++++++--
> > > >  2 files changed, 7 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc b/config/defconfig_arm-armv7a-linuxapp-gcc
> > > > index 96c3343..2c60c2c 100644
> > > > --- a/config/defconfig_arm-armv7a-linuxapp-gcc
> > > > +++ b/config/defconfig_arm-armv7a-linuxapp-gcc
> > > > @@ -36,6 +36,7 @@ CONFIG_RTE_ARCH="arm"
> > > >  CONFIG_RTE_ARCH_ARM=y
> > > >  CONFIG_RTE_ARCH_ARMv7=y
> > > >  CONFIG_RTE_ARCH_ARM_TUNE="cortex-a9"
> > > > +CONFIG_RTE_ARCH_ARM_NEON_MEMCPY=y
> > > >    
> > > If it's not always benefit, why not disable here since it is common
> > > armv7a config, and enable in your or other user's own config file?  
> > 
> > Jianbo, you are right. In that case, I'd just turn it off by default.
> > And when there is a new platform-specific defconfig, it can enable it.
> > 
> > Anyway, I am thinking of adding some comment into the rte_memcpy_32.h
> > file describing the potential of the NEON code. What about:
> > 
> > /* Enable in your defconfig to accelerate memcpy operations. Consider
> >    enabling this for Cortex-A15. For Cortex-A7 and Cortex-A9, It might
> >    accelerate short data copies (< 64 B). */
> > 
> > Thomas, do you consider this enough?  
> 
> Yes it is perfect.
> Why not put it in defconfig_arm-armv7a-linuxapp-gcc?

So, for now, I leave the patch as is and just add the comment.

Jan
  

Patch

diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc b/config/defconfig_arm-armv7a-linuxapp-gcc
index 96c3343..2c60c2c 100644
--- a/config/defconfig_arm-armv7a-linuxapp-gcc
+++ b/config/defconfig_arm-armv7a-linuxapp-gcc
@@ -36,6 +36,7 @@  CONFIG_RTE_ARCH="arm"
 CONFIG_RTE_ARCH_ARM=y
 CONFIG_RTE_ARCH_ARMv7=y
 CONFIG_RTE_ARCH_ARM_TUNE="cortex-a9"
+CONFIG_RTE_ARCH_ARM_NEON_MEMCPY=y
 
 CONFIG_RTE_FORCE_INTRINSICS=y
 CONFIG_RTE_ARCH_STRICT_ALIGN=y
diff --git a/lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h b/lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h
index ad8bc65..988125b 100644
--- a/lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h
+++ b/lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h
@@ -42,7 +42,11 @@  extern "C" {
 
 #include "generic/rte_memcpy.h"
 
-#ifdef RTE_MACHINE_CPUFLAG_NEON
+#ifdef RTE_ARCH_ARM_NEON_MEMCPY
+
+#ifndef RTE_MACHINE_CPUFLAG_NEON
+#error "Cannot optimize memcpy by NEON as the CPU seems to not support this"
+#endif
 
 /* ARM NEON Intrinsics are used to copy data */
 #include <arm_neon.h>
@@ -325,7 +329,7 @@  rte_memcpy_func(void *dst, const void *src, size_t n)
 	return memcpy(dst, src, n);
 }
 
-#endif /* RTE_MACHINE_CPUFLAG_NEON */
+#endif /* RTE_ARCH_ARM_NEON_MEMCPY */
 
 #ifdef __cplusplus
 }