Message ID | 1458417485-29436-5-git-send-email-viktorin@rehivetech.com (mailing list archive) |
---|---|
State | Accepted, archived |
Delegated to: | Thomas Monjalon |
Headers |
Return-Path: <dev-bounces@dpdk.org> X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [IPv6:::1]) by dpdk.org (Postfix) with ESMTP id 81036595A; Sat, 19 Mar 2016 20:58:15 +0100 (CET) Received: from wes1-so1.wedos.net (wes1-so1.wedos.net [46.28.106.15]) by dpdk.org (Postfix) with ESMTP id DC5025685 for <dev@dpdk.org>; Sat, 19 Mar 2016 20:58:10 +0100 (CET) Received: from pcviktorin.fit.vutbr.cz (pcviktorin.fit.vutbr.cz [147.229.13.147]) by wes1-so1.wedos.net (Postfix) with ESMTPSA id 3qSCZB4pwLz5qG; Sat, 19 Mar 2016 20:58:10 +0100 (CET) From: Jan Viktorin <viktorin@rehivetech.com> To: dev@dpdk.org Cc: Jan Viktorin <viktorin@rehivetech.com>, thomas.monjalon@6wind.com, jerin.jacob@caviumnetworks.com, tomaszx.kulasek@intel.com, jianbo.liu@linaro.org Date: Sat, 19 Mar 2016 20:58:05 +0100 Message-Id: <1458417485-29436-5-git-send-email-viktorin@rehivetech.com> X-Mailer: git-send-email 2.7.0 In-Reply-To: <1458417485-29436-1-git-send-email-viktorin@rehivetech.com> References: <1458417485-29436-1-git-send-email-viktorin@rehivetech.com> In-Reply-To: <1458379590-18618-1-git-send-email-viktorin@rehivetech.com> References: <1458379590-18618-1-git-send-email-viktorin@rehivetech.com> Subject: [dpdk-dev] [PATCH v3 4/4] eal/arm: introduce CONFIG_RTE_ARCH_ARM_NEON_MEMCPY X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK <dev.dpdk.org> List-Unsubscribe: <http://dpdk.org/ml/options/dev>, <mailto:dev-request@dpdk.org?subject=unsubscribe> List-Archive: <http://dpdk.org/ml/archives/dev/> List-Post: <mailto:dev@dpdk.org> List-Help: <mailto:dev-request@dpdk.org?subject=help> List-Subscribe: <http://dpdk.org/ml/listinfo/dev>, <mailto:dev-request@dpdk.org?subject=subscribe> Errors-To: dev-bounces@dpdk.org Sender: "dev" <dev-bounces@dpdk.org> |
Commit Message
Jan Viktorin
March 19, 2016, 7:58 p.m. UTC
The flag is used to enable memcpy optimizations in EAL. As it is not always
the performance benefit, the flag allows to disable it.
Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
---
config/defconfig_arm-armv7a-linuxapp-gcc | 1 +
lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h | 8 ++++++--
2 files changed, 7 insertions(+), 2 deletions(-)
Comments
2016-03-19 20:58, Jan Viktorin: > The flag is used to enable memcpy optimizations in EAL. As it is not always > the performance benefit, the flag allows to disable it. Ideally the default should be to choose the best optimization. If it is not possible, it would help to have some comments explaining how to choose wether enabling NEON memcpy or not.
On Sat, 19 Mar 2016 21:14:57 +0100 Thomas Monjalon <thomas.monjalon@6wind.com> wrote: > 2016-03-19 20:58, Jan Viktorin: > > The flag is used to enable memcpy optimizations in EAL. As it is not always > > the performance benefit, the flag allows to disable it. > > Ideally the default should be to choose the best optimization. > If it is not possible, it would help to have some comments explaining > how to choose wether enabling NEON memcpy or not. Ok, we can rename the option to CONFIG_RTE_ARCH_ARM_AVOID_NEON_MEMCPY, delete it from the defconfig and change the test in rte_memcpy_32.h to #ifndef CONFIG_RTE_ARCH_ARM_AVOID_NEON_MEMCPY Alternatively, to have a positive test like #ifdef CONFIG_RTE_ARCH_ARM_AVOID_NEON_MEMCPY I can create a bigger change that moves the non-neon-memcpy up in the file... Should I resend the whole series as v3? Regards Jan
On Sun, 20 Mar 2016 10:41:10 +0100 Jan Viktorin <viktorin@rehivetech.com> wrote: > On Sat, 19 Mar 2016 21:14:57 +0100 > Thomas Monjalon <thomas.monjalon@6wind.com> wrote: > > > 2016-03-19 20:58, Jan Viktorin: > > > The flag is used to enable memcpy optimizations in EAL. As it is not always > > > the performance benefit, the flag allows to disable it. > > > > Ideally the default should be to choose the best optimization. > > If it is not possible, it would help to have some comments explaining > > how to choose wether enabling NEON memcpy or not. The related statistics are mentioned here: commit 04a2fde35daf5e9a271e72331a70b48b951d7568 Author: Vlastimil Kosar <kosar@rehivetech.com> Date: Tue Nov 3 00:47:20 2015 +0100 eal/arm: add vector memcpy for ARMv7 It's quite difficult to easily summarize it, especially for so many CPUs... > > Ok, we can rename the option to CONFIG_RTE_ARCH_ARM_AVOID_NEON_MEMCPY, > delete it from the defconfig and change the test in rte_memcpy_32.h to > > #ifndef CONFIG_RTE_ARCH_ARM_AVOID_NEON_MEMCPY > > Alternatively, to have a positive test like > > #ifdef CONFIG_RTE_ARCH_ARM_AVOID_NEON_MEMCPY > > I can create a bigger change that moves the non-neon-memcpy up in the > file... > > Should I resend the whole series as v3? > > Regards > Jan
2016-03-20 10:41, Jan Viktorin: > On Sat, 19 Mar 2016 21:14:57 +0100 > Thomas Monjalon <thomas.monjalon@6wind.com> wrote: > > > 2016-03-19 20:58, Jan Viktorin: > > > The flag is used to enable memcpy optimizations in EAL. As it is not always > > > the performance benefit, the flag allows to disable it. > > > > Ideally the default should be to choose the best optimization. > > If it is not possible, it would help to have some comments explaining > > how to choose wether enabling NEON memcpy or not. > > Ok, we can rename the option to CONFIG_RTE_ARCH_ARM_AVOID_NEON_MEMCPY, > delete it from the defconfig and change the test in rte_memcpy_32.h to > > #ifndef CONFIG_RTE_ARCH_ARM_AVOID_NEON_MEMCPY > > Alternatively, to have a positive test like > > #ifdef CONFIG_RTE_ARCH_ARM_AVOID_NEON_MEMCPY > > I can create a bigger change that moves the non-neon-memcpy up in the > file... > > Should I resend the whole series as v3? No, I don't think changing the name of the config or moving code will change anything. We just need to understand when it must be enabled or disabled.
2016-03-20 10:46, Jan Viktorin: > On Sun, 20 Mar 2016 10:41:10 +0100 > Jan Viktorin <viktorin@rehivetech.com> wrote: > > > On Sat, 19 Mar 2016 21:14:57 +0100 > > Thomas Monjalon <thomas.monjalon@6wind.com> wrote: > > > > > 2016-03-19 20:58, Jan Viktorin: > > > > The flag is used to enable memcpy optimizations in EAL. As it is not always > > > > the performance benefit, the flag allows to disable it. > > > > > > Ideally the default should be to choose the best optimization. > > > If it is not possible, it would help to have some comments explaining > > > how to choose wether enabling NEON memcpy or not. > > The related statistics are mentioned here: > > commit 04a2fde35daf5e9a271e72331a70b48b951d7568 > Author: Vlastimil Kosar <kosar@rehivetech.com> > Date: Tue Nov 3 00:47:20 2015 +0100 > > eal/arm: add vector memcpy for ARMv7 > > It's quite difficult to easily summarize it, especially for so many > CPUs... If it is difficult for you, it will be impossible for the users of this config option. When someone will ask what is the best value for his CPU, what will you answer? At least, we can add a comment explaining that the performance is not always better, depending of the buffer size and the CPU.
On Sun, Mar 20, 2016 at 11:29:48AM +0100, Thomas Monjalon wrote: > 2016-03-20 10:41, Jan Viktorin: > > On Sat, 19 Mar 2016 21:14:57 +0100 > > Thomas Monjalon <thomas.monjalon@6wind.com> wrote: > > > > > 2016-03-19 20:58, Jan Viktorin: > > > > The flag is used to enable memcpy optimizations in EAL. As it is not always > > > > the performance benefit, the flag allows to disable it. > > > > > > Ideally the default should be to choose the best optimization. > > > If it is not possible, it would help to have some comments explaining > > > how to choose wether enabling NEON memcpy or not. > > > > Ok, we can rename the option to CONFIG_RTE_ARCH_ARM_AVOID_NEON_MEMCPY, > > delete it from the defconfig and change the test in rte_memcpy_32.h to > > > > #ifndef CONFIG_RTE_ARCH_ARM_AVOID_NEON_MEMCPY > > > > Alternatively, to have a positive test like > > > > #ifdef CONFIG_RTE_ARCH_ARM_AVOID_NEON_MEMCPY > > > > I can create a bigger change that moves the non-neon-memcpy up in the > > file... > > > > Should I resend the whole series as v3? > > No, I don't think changing the name of the config or moving code > will change anything. > We just need to understand when it must be enabled or disabled. By default, NEON implementation should be enabled in default config file, if a given arm target/cpu has issue with NEON specific implementation at target/cpu config level it can be disabled. IMO, Its inline with Jan's Patch. The factors like NEON instruction execution cycles and pipelines supported etc highly depend on the ARM target vendor implementation. (ie arm specification does not mandate those fine-grained details) so let target/cpu configuration decides any expectation is required or not. Jerin
On 20 March 2016 at 03:58, Jan Viktorin <viktorin@rehivetech.com> wrote: > The flag is used to enable memcpy optimizations in EAL. As it is not always > the performance benefit, the flag allows to disable it. > > Signed-off-by: Jan Viktorin <viktorin@rehivetech.com> > --- > config/defconfig_arm-armv7a-linuxapp-gcc | 1 + > lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h | 8 ++++++-- > 2 files changed, 7 insertions(+), 2 deletions(-) > > diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc b/config/defconfig_arm-armv7a-linuxapp-gcc > index 96c3343..2c60c2c 100644 > --- a/config/defconfig_arm-armv7a-linuxapp-gcc > +++ b/config/defconfig_arm-armv7a-linuxapp-gcc > @@ -36,6 +36,7 @@ CONFIG_RTE_ARCH="arm" > CONFIG_RTE_ARCH_ARM=y > CONFIG_RTE_ARCH_ARMv7=y > CONFIG_RTE_ARCH_ARM_TUNE="cortex-a9" > +CONFIG_RTE_ARCH_ARM_NEON_MEMCPY=y > If it's not always benefit, why not disable here since it is common armv7a config, and enable in your or other user's own config file? Thanks! Jianbo
On Mon, 21 Mar 2016 13:42:31 +0800 Jianbo Liu <jianbo.liu@linaro.org> wrote: > On 20 March 2016 at 03:58, Jan Viktorin <viktorin@rehivetech.com> wrote: > > The flag is used to enable memcpy optimizations in EAL. As it is not always > > the performance benefit, the flag allows to disable it. > > > > Signed-off-by: Jan Viktorin <viktorin@rehivetech.com> > > --- > > config/defconfig_arm-armv7a-linuxapp-gcc | 1 + > > lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h | 8 ++++++-- > > 2 files changed, 7 insertions(+), 2 deletions(-) > > > > diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc b/config/defconfig_arm-armv7a-linuxapp-gcc > > index 96c3343..2c60c2c 100644 > > --- a/config/defconfig_arm-armv7a-linuxapp-gcc > > +++ b/config/defconfig_arm-armv7a-linuxapp-gcc > > @@ -36,6 +36,7 @@ CONFIG_RTE_ARCH="arm" > > CONFIG_RTE_ARCH_ARM=y > > CONFIG_RTE_ARCH_ARMv7=y > > CONFIG_RTE_ARCH_ARM_TUNE="cortex-a9" > > +CONFIG_RTE_ARCH_ARM_NEON_MEMCPY=y > > > If it's not always benefit, why not disable here since it is common > armv7a config, and enable in your or other user's own config file? Jianbo, you are right. In that case, I'd just turn it off by default. And when there is a new platform-specific defconfig, it can enable it. Anyway, I am thinking of adding some comment into the rte_memcpy_32.h file describing the potential of the NEON code. What about: /* Enable in your defconfig to accelerate memcpy operations. Consider enabling this for Cortex-A15. For Cortex-A7 and Cortex-A9, It might accelerate short data copies (< 64 B). */ Thomas, do you consider this enough? Jan > > Thanks! > Jianbo
2016-03-21 13:21, Jan Viktorin: > On Mon, 21 Mar 2016 13:42:31 +0800 > Jianbo Liu <jianbo.liu@linaro.org> wrote: > > > On 20 March 2016 at 03:58, Jan Viktorin <viktorin@rehivetech.com> wrote: > > > The flag is used to enable memcpy optimizations in EAL. As it is not always > > > the performance benefit, the flag allows to disable it. > > > > > > Signed-off-by: Jan Viktorin <viktorin@rehivetech.com> > > > --- > > > config/defconfig_arm-armv7a-linuxapp-gcc | 1 + > > > lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h | 8 ++++++-- > > > 2 files changed, 7 insertions(+), 2 deletions(-) > > > > > > diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc b/config/defconfig_arm-armv7a-linuxapp-gcc > > > index 96c3343..2c60c2c 100644 > > > --- a/config/defconfig_arm-armv7a-linuxapp-gcc > > > +++ b/config/defconfig_arm-armv7a-linuxapp-gcc > > > @@ -36,6 +36,7 @@ CONFIG_RTE_ARCH="arm" > > > CONFIG_RTE_ARCH_ARM=y > > > CONFIG_RTE_ARCH_ARMv7=y > > > CONFIG_RTE_ARCH_ARM_TUNE="cortex-a9" > > > +CONFIG_RTE_ARCH_ARM_NEON_MEMCPY=y > > > > > If it's not always benefit, why not disable here since it is common > > armv7a config, and enable in your or other user's own config file? > > Jianbo, you are right. In that case, I'd just turn it off by default. > And when there is a new platform-specific defconfig, it can enable it. > > Anyway, I am thinking of adding some comment into the rte_memcpy_32.h > file describing the potential of the NEON code. What about: > > /* Enable in your defconfig to accelerate memcpy operations. Consider > enabling this for Cortex-A15. For Cortex-A7 and Cortex-A9, It might > accelerate short data copies (< 64 B). */ > > Thomas, do you consider this enough? Yes it is perfect. Why not put it in defconfig_arm-armv7a-linuxapp-gcc?
On Mon, 21 Mar 2016 06:24:37 -0700 (PDT) Thomas Monjalon <thomas.monjalon@6wind.com> wrote: > 2016-03-21 13:21, Jan Viktorin: > > On Mon, 21 Mar 2016 13:42:31 +0800 > > Jianbo Liu <jianbo.liu@linaro.org> wrote: > > > > > On 20 March 2016 at 03:58, Jan Viktorin <viktorin@rehivetech.com> wrote: > > > > The flag is used to enable memcpy optimizations in EAL. As it is not always > > > > the performance benefit, the flag allows to disable it. > > > > > > > > Signed-off-by: Jan Viktorin <viktorin@rehivetech.com> > > > > --- > > > > config/defconfig_arm-armv7a-linuxapp-gcc | 1 + > > > > lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h | 8 ++++++-- > > > > 2 files changed, 7 insertions(+), 2 deletions(-) > > > > > > > > diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc b/config/defconfig_arm-armv7a-linuxapp-gcc > > > > index 96c3343..2c60c2c 100644 > > > > --- a/config/defconfig_arm-armv7a-linuxapp-gcc > > > > +++ b/config/defconfig_arm-armv7a-linuxapp-gcc > > > > @@ -36,6 +36,7 @@ CONFIG_RTE_ARCH="arm" > > > > CONFIG_RTE_ARCH_ARM=y > > > > CONFIG_RTE_ARCH_ARMv7=y > > > > CONFIG_RTE_ARCH_ARM_TUNE="cortex-a9" > > > > +CONFIG_RTE_ARCH_ARM_NEON_MEMCPY=y > > > > > > > If it's not always benefit, why not disable here since it is common > > > armv7a config, and enable in your or other user's own config file? > > > > Jianbo, you are right. In that case, I'd just turn it off by default. > > And when there is a new platform-specific defconfig, it can enable it. > > > > Anyway, I am thinking of adding some comment into the rte_memcpy_32.h > > file describing the potential of the NEON code. What about: > > > > /* Enable in your defconfig to accelerate memcpy operations. Consider > > enabling this for Cortex-A15. For Cortex-A7 and Cortex-A9, It might > > accelerate short data copies (< 64 B). */ > > > > Thomas, do you consider this enough? > > Yes it is perfect. > Why not put it in defconfig_arm-armv7a-linuxapp-gcc? So, for now, I leave the patch as is and just add the comment. Jan
diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc b/config/defconfig_arm-armv7a-linuxapp-gcc index 96c3343..2c60c2c 100644 --- a/config/defconfig_arm-armv7a-linuxapp-gcc +++ b/config/defconfig_arm-armv7a-linuxapp-gcc @@ -36,6 +36,7 @@ CONFIG_RTE_ARCH="arm" CONFIG_RTE_ARCH_ARM=y CONFIG_RTE_ARCH_ARMv7=y CONFIG_RTE_ARCH_ARM_TUNE="cortex-a9" +CONFIG_RTE_ARCH_ARM_NEON_MEMCPY=y CONFIG_RTE_FORCE_INTRINSICS=y CONFIG_RTE_ARCH_STRICT_ALIGN=y diff --git a/lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h b/lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h index ad8bc65..988125b 100644 --- a/lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h +++ b/lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h @@ -42,7 +42,11 @@ extern "C" { #include "generic/rte_memcpy.h" -#ifdef RTE_MACHINE_CPUFLAG_NEON +#ifdef RTE_ARCH_ARM_NEON_MEMCPY + +#ifndef RTE_MACHINE_CPUFLAG_NEON +#error "Cannot optimize memcpy by NEON as the CPU seems to not support this" +#endif /* ARM NEON Intrinsics are used to copy data */ #include <arm_neon.h> @@ -325,7 +329,7 @@ rte_memcpy_func(void *dst, const void *src, size_t n) return memcpy(dst, src, n); } -#endif /* RTE_MACHINE_CPUFLAG_NEON */ +#endif /* RTE_ARCH_ARM_NEON_MEMCPY */ #ifdef __cplusplus }