Message ID | 1459781123-7556-1-git-send-email-tomaszx.kulasek@intel.com (mailing list archive) |
---|---|
State | Accepted, archived |
Headers |
Return-Path: <dev-bounces@dpdk.org> X-Original-To: patchwork@dpdk.org Delivered-To: patchwork@dpdk.org Received: from [92.243.14.124] (localhost [IPv6:::1]) by dpdk.org (Postfix) with ESMTP id B57DE2949; Mon, 4 Apr 2016 16:45:58 +0200 (CEST) Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by dpdk.org (Postfix) with ESMTP id 0B44C28F2 for <dev@dpdk.org>; Mon, 4 Apr 2016 16:45:56 +0200 (CEST) Received: from orsmga002.jf.intel.com ([10.7.209.21]) by fmsmga103.fm.intel.com with ESMTP; 04 Apr 2016 07:45:50 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.24,440,1455004800"; d="scan'208";a="947686515" Received: from unknown (HELO Sent) ([10.217.248.31]) by orsmga002.jf.intel.com with SMTP; 04 Apr 2016 07:45:49 -0700 Received: by Sent (sSMTP sendmail emulation); Mon, 04 Apr 2016 16:45:27 +0200 From: Tomasz Kulasek <tomaszx.kulasek@intel.com> To: dev@dpdk.org Date: Mon, 4 Apr 2016 16:45:23 +0200 Message-Id: <1459781123-7556-1-git-send-email-tomaszx.kulasek@intel.com> X-Mailer: git-send-email 2.1.4 Subject: [dpdk-dev] [PATCH] examples/l3fwd: fix segfault with gcc 5.x X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK <dev.dpdk.org> List-Unsubscribe: <http://dpdk.org/ml/options/dev>, <mailto:dev-request@dpdk.org?subject=unsubscribe> List-Archive: <http://dpdk.org/ml/archives/dev/> List-Post: <mailto:dev@dpdk.org> List-Help: <mailto:dev-request@dpdk.org?subject=help> List-Subscribe: <http://dpdk.org/ml/listinfo/dev>, <mailto:dev-request@dpdk.org?subject=subscribe> Errors-To: dev-bounces@dpdk.org Sender: "dev" <dev-bounces@dpdk.org> |
Commit Message
Tomasz Kulasek
April 4, 2016, 2:45 p.m. UTC
With gcc 5.x or newer and -O2/-O3 optimization, the packet grouping
algorithm appears to break.

When the last-packet pointer "lp" and the "pnum->u64" buffer refer to the
same memory, aggressive optimization can give unpredictable results. The
assignment of the precalculated group sizes appears to interfere with the
initialization of the new group size when lp points to a value inside the
current group that should not be changed.

With gcc 5.x or newer and optimization enabled, we cannot be sure which
assignment is performed first, so the group size can be counted incorrectly.

This patch removes the overlap between the assignment of the initial group
size (lp[0] = 1) and the precalculated group sizes when gptbl[v].idx < 4.
Fixes: 94c54b4158d5 ("examples/l3fwd: rework exact-match")
Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
examples/l3fwd/l3fwd_sse.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
Comments
Hi Tomasz,

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Tomasz Kulasek
> Sent: Monday, April 04, 2016 3:45 PM
> To: dev@dpdk.org
> Subject: [dpdk-dev] [PATCH] examples/l3fwd: fix segfault with gcc 5.x
>
> [...]
>
> diff --git a/examples/l3fwd/l3fwd_sse.h b/examples/l3fwd/l3fwd_sse.h
> index f9cf50a..1afa1f0 100644
> --- a/examples/l3fwd/l3fwd_sse.h
> +++ b/examples/l3fwd/l3fwd_sse.h
> @@ -283,9 +283,9 @@ port_groupx4(uint16_t pn[FWDSTEP + 1], uint16_t *lp, __m128i dp1, __m128i dp2)
>
>  	/* if dest port value has changed. */
>  	if (v != GRPMSK) {
> -		lp = pnum->u16 + gptbl[v].idx;
> -		lp[0] = 1;
>  		pnum->u64 = gptbl[v].pnum;
> +		pnum->u16[FWDSTEP] = 1;

Hmm, but FWDSTEP and gptbl[v].idx are not always equal.
Actually could you explain a bit more - what exactly is reordered by gcc 5.x,
and how to reproduce it?
i.e what sequence of input packets will trigger an error?
Konstantin

> +		lp = pnum->u16 + gptbl[v].idx;
>  	}
>
>  	return lp;
> --
> 1.7.9.5
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ananyev, Konstantin
> Sent: Monday, April 04, 2016 4:35 PM
> To: Kulasek, TomaszX
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] examples/l3fwd: fix segfault with gcc 5.x
>
> Hi Tomasz,
>
> [...]
> >  	/* if dest port value has changed. */
> >  	if (v != GRPMSK) {
> > -		lp = pnum->u16 + gptbl[v].idx;
> > -		lp[0] = 1;
> >  		pnum->u64 = gptbl[v].pnum;
> > +		pnum->u16[FWDSTEP] = 1;
>
> Hmm, but FWDSTEP and gptbl[v].idx are not always equal.
> Actually could you explain a bit more - what exactly is reordered by gcc 5.x,
> and how to reproduce it?
> i.e what sequence of input packets will trigger an error?

Hi Konstantin,

I could see the issue when having two flows in one port, one going to port 0
and the other to port 1 (using Exact Match).
There is no issue when there is just one flow per port, using an older gcc
version (< 5.0) or using O0/O1 (and of course, using LPM).

Pablo

> Konstantin
>
> > +		lp = pnum->u16 + gptbl[v].idx;
> >  	}
> >
> >  	return lp;
> > --
> > 1.7.9.5
Hi Konstantin,

> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Monday, April 4, 2016 17:35
> To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>
> Cc: dev@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH] examples/l3fwd: fix segfault with gcc 5.x
>
> [...]
> >  		pnum->u64 = gptbl[v].pnum;
> > +		pnum->u16[FWDSTEP] = 1;
>
> Hmm, but FWDSTEP and gptbl[v].idx are not always equal.
> Actually could you explain a bit more - what exactly is reordered by gcc
> 5.x, and how to reproduce it?
> i.e what sequence of input packets will trigger an error?
> Konstantin

E.g. for this case, when the group changes:

	{
		/* 0xb: a == b, b == c, c != d, d == e */
		.pnum = UINT64_C(0x0002000100020003),
		.idx = 3,
		.lpv = 2,
	},

We expect:

	pnum->u16 = { 3, 2, 1, 2, x }
	lp = pnum->u16 + 3;
	// should be lp[0] == 2

but with gcc 5.2

	lp = pnum->u16 + gptbl[v].idx;
	lp[0] = 1;
	pnum->u64 = gptbl[v].pnum;

gives, for some reason, lp[0] == 1, even though pnum->u16[3] should be 2.

As a result the group is shorter than it should be, and the application fails
when it tries to send the next group with a garbage length.

We should set lp[0] = 1 only when needed (gptbl[v].idx == 4), which is why I
set pnum->u16[4] = 1. I set it unconditionally to avoid a branch. For idx < 4
we don't need to set lp[0].

The problem is that both pointers operate on the same memory buffer, and it
seems that the gcc optimization produces (which is wrong):

	lp = pnum->u16 + gptbl[v].idx;
	pnum->u64 = gptbl[v].pnum;
	lp[0] = 1;

instead of:

	lp = pnum->u16 + gptbl[v].idx;
	lp[0] = 1;
	pnum->u64 = gptbl[v].pnum;

This issue occurs with gcc 5.x, and the application seems to fail for the
patterns where gptbl[v].idx < 4.

Tomasz
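To make the idx cases discussed above concrete, here is a minimal standalone sketch (not code from l3fwd) that models both orderings on a plain uint16_t array. It assumes FWDSTEP is 4 and a little-endian host, and it uses memcpy() for the 64-bit store so that the sketch does not itself rely on type punning. The helper names update_before, update_after and orderings_agree are hypothetical. One possible reading of the reordering, not confirmed anywhere in this thread, is that the compiler treats the uint16_t store and the uint64_t store as non-aliasing.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FWDSTEP 4 /* assumed value, matching the u16[4] mentioned in the thread */

/* Intended pre-patch semantics: lp[0] = 1 first, then the 64-bit store. */
static uint16_t *
update_before(uint16_t pn[FWDSTEP + 1], uint64_t tbl_pnum, int idx)
{
	uint16_t *lp;

	assert(idx >= 0 && idx <= FWDSTEP);
	lp = pn + idx;
	lp[0] = 1;
	memcpy(pn, &tbl_pnum, sizeof(tbl_pnum)); /* models pnum->u64 = ... */
	return lp;
}

/* Post-patch order: 64-bit store, then an unconditional pn[FWDSTEP] = 1. */
static uint16_t *
update_after(uint16_t pn[FWDSTEP + 1], uint64_t tbl_pnum, int idx)
{
	assert(idx >= 0 && idx <= FWDSTEP);
	memcpy(pn, &tbl_pnum, sizeof(tbl_pnum)); /* models pnum->u64 = ... */
	pn[FWDSTEP] = 1;
	return pn + idx;
}

/* Returns 1 if both orderings agree on *lp and on pn[0..FWDSTEP-1]. */
static int
orderings_agree(uint64_t tbl_pnum, int idx)
{
	uint16_t a[FWDSTEP + 1] = {0};
	uint16_t b[FWDSTEP + 1] = {0};
	uint16_t *la = update_before(a, tbl_pnum, idx);
	uint16_t *lb = update_after(b, tbl_pnum, idx);

	return *la == *lb && memcmp(a, b, FWDSTEP * sizeof(a[0])) == 0;
}
```

In this model the two orderings differ only in pn[FWDSTEP], which the post-patch version sets to 1 even when idx < 4. Whether that extra store is harmless depends on how the caller uses pn[FWDSTEP], which this sketch does not cover.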
> -----Original Message-----
> From: Kulasek, TomaszX
> Sent: Monday, April 04, 2016 5:20 PM
> To: Ananyev, Konstantin
> Cc: dev@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH] examples/l3fwd: fix segfault with gcc 5.x
>
> Hi Konstantin,
>
> [...]
>
> The problem is that both pointers operates on the same memory buffer and, it seems like gcc optimization will produce (it is wrong):
>
> lp = pnum->u16 + gptbl[v].idx;
> pnum->u64 = gptbl[v].pnum;
> lp[0] = 1;
>
> except:
>
> lp = pnum->u16 + gptbl[v].idx;
> lp[0] = 1;
> pnum->u64 = gptbl[v].pnum;
>
> This issue is with gcc 5.x and application seems to fail for the patterns where gptbl[v].idx < 4.

Thanks for the explanation, Tomasz.
So it reordered:
lp[0] = 1;
pnum->u64 = gptbl[v].pnum;
correct?
My first thought was to insert a rte_compiler_barrier() between these two lines,
but actually your approach looks cleaner.
Konstantin
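For comparison, the barrier alternative mentioned above could look roughly like the sketch below. This is not what the patch does and is not code from l3fwd: compiler_barrier() is a local stand-in modelled on DPDK's rte_compiler_barrier() (GCC/Clang inline asm), and union pnum_view, group_update_barrier and last_count_after_update are hypothetical names. FWDSTEP = 4 and a little-endian host are assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FWDSTEP 4 /* assumed value */

/* Local stand-in for a compiler-only barrier; GCC/Clang syntax. */
#define compiler_barrier() do { \
	__asm__ __volatile__("" : : : "memory"); \
} while (0)

/* Hypothetical view of the group-size buffer: 16-bit counters or one 64-bit word. */
union pnum_view {
	uint16_t u16[FWDSTEP + 1];
	uint64_t u64;
};

/* Original statement order, with a barrier between the two stores. */
static uint16_t *
group_update_barrier(union pnum_view *pnum, uint64_t tbl_pnum, int idx)
{
	uint16_t *lp;

	assert(idx >= 0 && idx <= FWDSTEP);
	lp = pnum->u16 + idx;
	lp[0] = 1;
	compiler_barrier(); /* keep the compiler from moving stores across this point */
	pnum->u64 = tbl_pnum;
	return lp;
}

/* Convenience wrapper: run the update on a zeroed buffer, return the counter at idx. */
static uint16_t
last_count_after_update(uint64_t tbl_pnum, int idx)
{
	union pnum_view p;

	memset(&p, 0, sizeof(p));
	group_update_barrier(&p, tbl_pnum, idx);
	return p.u16[idx];
}
```

The barrier keeps the original two-store sequence and only constrains the compiler, whereas the patch removes the overlapping store altogether, which is presumably why it was judged cleaner in the reply above.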
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Tomasz Kulasek
> Sent: Monday, April 04, 2016 3:45 PM
> To: dev@dpdk.org
> Subject: [dpdk-dev] [PATCH] examples/l3fwd: fix segfault with gcc 5.x
>
> [...]
>
> Fixes: 94c54b4158d5 ("examples/l3fwd: rework exact-match")
>
> Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
> ---
>  examples/l3fwd/l3fwd_sse.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> [...]

Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Monday, April 4, 2016 21:05
> To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>
> Cc: dev@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH] examples/l3fwd: fix segfault with gcc 5.x
>
> [...]
>
> Thanks for explanation Tomasz.
> So it reordered:
> lp[0] = 1;
> pnum->u64 = gptbl[v].pnum;
> correct?
> My first thought was to insert a rte_compiler_barrier() between these two
> lines, but actually your approach looks cleaner.
> Konstantin

Yes.
> > [...]
> >
> > Fixes: 94c54b4158d5 ("examples/l3fwd: rework exact-match")
> >
> > Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
>
> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

Applied, thanks
diff --git a/examples/l3fwd/l3fwd_sse.h b/examples/l3fwd/l3fwd_sse.h
index f9cf50a..1afa1f0 100644
--- a/examples/l3fwd/l3fwd_sse.h
+++ b/examples/l3fwd/l3fwd_sse.h
@@ -283,9 +283,9 @@ port_groupx4(uint16_t pn[FWDSTEP + 1], uint16_t *lp, __m128i dp1, __m128i dp2)
 
 	/* if dest port value has changed. */
 	if (v != GRPMSK) {
-		lp = pnum->u16 + gptbl[v].idx;
-		lp[0] = 1;
 		pnum->u64 = gptbl[v].pnum;
+		pnum->u16[FWDSTEP] = 1;
+		lp = pnum->u16 + gptbl[v].idx;
 	}
 
 	return lp;