[dpdk-dev] examples/l3fwd: fix segfault with gcc 5.x

Message ID 1459781123-7556-1-git-send-email-tomaszx.kulasek@intel.com (mailing list archive)
State Accepted, archived
Headers

Commit Message

Tomasz Kulasek April 4, 2016, 2:45 p.m. UTC
  It seems that with gcc >5.x and -O2/-O3 optimization breaks packet grouping
algorithm.

When last packet pointer "lp" and "pnum->u64" buffer points the same
memory buffer, high optimization can cause unpredictable results. It seems
that assignment of precalculated group sizes may interfere with
initialization of new group size when lp points value inside current group
and didn't should be changed.

With gcc >5.x and optimization we cannot be sure which assignment will be
done first, so the group size can be counted incorrectly.

This patch eliminates intersection of assignment of initial group size
(lp[0] = 1) and precalculated group sizes when gptbl[v].idx < 4.

Fixes: 94c54b4158d5 ("examples/l3fwd: rework exact-match")

Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
---
 examples/l3fwd/l3fwd_sse.h |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
  

Comments

Ananyev, Konstantin April 4, 2016, 3:34 p.m. UTC | #1
Hi Tomasz,

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Tomasz Kulasek
> Sent: Monday, April 04, 2016 3:45 PM
> To: dev@dpdk.org
> Subject: [dpdk-dev] [PATCH] examples/l3fwd: fix segfault with gcc 5.x
> 
> It seems that with gcc >5.x and -O2/-O3 optimization breaks packet grouping
> algorithm.
> 
> When last packet pointer "lp" and "pnum->u64" buffer points the same
> memory buffer, high optimization can cause unpredictable results. It seems
> that assignment of precalculated group sizes may interfere with
> initialization of new group size when lp points value inside current group
> and didn't should be changed.
> 
> With gcc >5.x and optimization we cannot be sure which assignment will be
> done first, so the group size can be counted incorrectly.
> 
> This patch eliminates intersection of assignment of initial group size
> (lp[0] = 1) and precalculated group sizes when gptbl[v].idx < 4.
> 
> Fixes: 94c54b4158d5 ("examples/l3fwd: rework exact-match")
> 
> Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
> ---
>  examples/l3fwd/l3fwd_sse.h |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/examples/l3fwd/l3fwd_sse.h b/examples/l3fwd/l3fwd_sse.h
> index f9cf50a..1afa1f0 100644
> --- a/examples/l3fwd/l3fwd_sse.h
> +++ b/examples/l3fwd/l3fwd_sse.h
> @@ -283,9 +283,9 @@ port_groupx4(uint16_t pn[FWDSTEP + 1], uint16_t *lp, __m128i dp1, __m128i dp2)
> 
>  	/* if dest port value has changed. */
>  	if (v != GRPMSK) {
> -		lp = pnum->u16 + gptbl[v].idx;
> -		lp[0] = 1;
>  		pnum->u64 = gptbl[v].pnum;
> +		pnum->u16[FWDSTEP] = 1;

Hmm, but  FWDSTEP and gptbl[v].idx are not always equal.
Actually could you explain a bit more - what exactly is reordered by gcc 5.x,
and how to reproduce it?
i.e what sequence of input packets will trigger an error? 
Konstantin

> +		lp = pnum->u16 + gptbl[v].idx;
>  	}
> 
>  	return lp;
> --
> 1.7.9.5
  
De Lara Guarch, Pablo April 4, 2016, 3:51 p.m. UTC | #2
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ananyev,
> Konstantin
> Sent: Monday, April 04, 2016 4:35 PM
> To: Kulasek, TomaszX
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] examples/l3fwd: fix segfault with gcc 5.x
> 
> Hi Tomasz,
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Tomasz Kulasek
> > Sent: Monday, April 04, 2016 3:45 PM
> > To: dev@dpdk.org
> > Subject: [dpdk-dev] [PATCH] examples/l3fwd: fix segfault with gcc 5.x
> >
> > It seems that with gcc >5.x and -O2/-O3 optimization breaks packet
> grouping
> > algorithm.
> >
> > When last packet pointer "lp" and "pnum->u64" buffer points the same
> > memory buffer, high optimization can cause unpredictable results. It seems
> > that assignment of precalculated group sizes may interfere with
> > initialization of new group size when lp points value inside current group
> > and didn't should be changed.
> >
> > With gcc >5.x and optimization we cannot be sure which assignment will be
> > done first, so the group size can be counted incorrectly.
> >
> > This patch eliminates intersection of assignment of initial group size
> > (lp[0] = 1) and precalculated group sizes when gptbl[v].idx < 4.
> >
> > Fixes: 94c54b4158d5 ("examples/l3fwd: rework exact-match")
> >
> > Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
> > ---
> >  examples/l3fwd/l3fwd_sse.h |    4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/examples/l3fwd/l3fwd_sse.h b/examples/l3fwd/l3fwd_sse.h
> > index f9cf50a..1afa1f0 100644
> > --- a/examples/l3fwd/l3fwd_sse.h
> > +++ b/examples/l3fwd/l3fwd_sse.h
> > @@ -283,9 +283,9 @@ port_groupx4(uint16_t pn[FWDSTEP + 1], uint16_t
> *lp, __m128i dp1, __m128i dp2)
> >
> >  	/* if dest port value has changed. */
> >  	if (v != GRPMSK) {
> > -		lp = pnum->u16 + gptbl[v].idx;
> > -		lp[0] = 1;
> >  		pnum->u64 = gptbl[v].pnum;
> > +		pnum->u16[FWDSTEP] = 1;
> 
> Hmm, but  FWDSTEP and gptbl[v].idx are not always equal.
> Actually could you explain a bit more - what exactly is reordered by gcc 5.x,
> and how to reproduce it?
> i.e what sequence of input packets will trigger an error?

Hi Konstantin,

I could see the issue when having two flows in one port, one going to port 0 and the other to port 1 (using Exact Match).
There is no issue when there is just one flow per port, using an older gcc version (< 5.0) or using O0/O1 (and of course, using LPM).

Pablo


> Konstantin
> 
> > +		lp = pnum->u16 + gptbl[v].idx;
> >  	}
> >
> >  	return lp;
> > --
> > 1.7.9.5
  
Tomasz Kulasek April 4, 2016, 4:20 p.m. UTC | #3
Hi Konstantin,

> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Monday, April 4, 2016 17:35
> To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>
> Cc: dev@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH] examples/l3fwd: fix segfault with gcc 5.x
> 
> Hi Tomasz,
> 
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Tomasz Kulasek
> > Sent: Monday, April 04, 2016 3:45 PM
> > To: dev@dpdk.org
> > Subject: [dpdk-dev] [PATCH] examples/l3fwd: fix segfault with gcc 5.x
> >
> > It seems that with gcc >5.x and -O2/-O3 optimization breaks packet
> > grouping algorithm.
> >
> > When last packet pointer "lp" and "pnum->u64" buffer points the same
> > memory buffer, high optimization can cause unpredictable results. It
> > seems that assignment of precalculated group sizes may interfere with
> > initialization of new group size when lp points value inside current
> > group and didn't should be changed.
> >
> > With gcc >5.x and optimization we cannot be sure which assignment will
> > be done first, so the group size can be counted incorrectly.
> >
> > This patch eliminates intersection of assignment of initial group size
> > (lp[0] = 1) and precalculated group sizes when gptbl[v].idx < 4.
> >
> > Fixes: 94c54b4158d5 ("examples/l3fwd: rework exact-match")
> >
> > Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
> > ---
> >  examples/l3fwd/l3fwd_sse.h |    4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/examples/l3fwd/l3fwd_sse.h b/examples/l3fwd/l3fwd_sse.h
> > index f9cf50a..1afa1f0 100644
> > --- a/examples/l3fwd/l3fwd_sse.h
> > +++ b/examples/l3fwd/l3fwd_sse.h
> > @@ -283,9 +283,9 @@ port_groupx4(uint16_t pn[FWDSTEP + 1], uint16_t
> > *lp, __m128i dp1, __m128i dp2)
> >
> >  	/* if dest port value has changed. */
> >  	if (v != GRPMSK) {
> > -		lp = pnum->u16 + gptbl[v].idx;
> > -		lp[0] = 1;
> >  		pnum->u64 = gptbl[v].pnum;
> > +		pnum->u16[FWDSTEP] = 1;
> 
> Hmm, but  FWDSTEP and gptbl[v].idx are not always equal.
> Actually could you explain a bit more - what exactly is reordered by gcc
> 5.x, and how to reproduce it?
> i.e what sequence of input packets will trigger an error?
> Konstantin
> 
> > +		lp = pnum->u16 + gptbl[v].idx;
> >  	}
> >
> >  	return lp;
> > --
> > 1.7.9.5


Eg. For this case, when group is changed:

	{
		/* 0xb: a == b, b == c, c != d, d == e */
		.pnum = UINT64_C(0x0002000100020003),
		.idx = 3,
		.lpv = 2,
	},

We expect:

	pnum->u16 = { 3, 2, 1, 2, x }
	lp = pnum->u16 + 3;
	// should be lp[0] == 2

but for gcc 5.2

	lp = pnum->u16 + gptbl[v].idx;
	lp[0] = 1;
	pnum->u64 = gptbl[v].pnum;

gives, for some reason lp[0] == 1, even if pnum->u16[3] == 2.

It causes, that group is shorter and fails trying to send next group with messy length.

We should set lp[0] = 1 only when needed (gptbl[v].idx == 4), so this is why I set pnum->u16[4] = 1. I set it up always to prevent condition. For idx < 4 we don't need to set lp[0].

The problem is that both pointers operates on the same memory buffer and, it seems like gcc optimization will produce (it is wrong):

	lp = pnum->u16 + gptbl[v].idx;
	pnum->u64 = gptbl[v].pnum;
	lp[0] = 1;

except:

	lp = pnum->u16 + gptbl[v].idx;
	lp[0] = 1;
	pnum->u64 = gptbl[v].pnum;

This issue is with gcc 5.x and application seems to fail for the patterns where gptbl[v].idx < 4.

Tomasz
  
Ananyev, Konstantin April 4, 2016, 7:05 p.m. UTC | #4
> -----Original Message-----
> From: Kulasek, TomaszX
> Sent: Monday, April 04, 2016 5:20 PM
> To: Ananyev, Konstantin
> Cc: dev@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH] examples/l3fwd: fix segfault with gcc 5.x
> 
> Hi Konstantin,
> 
> > -----Original Message-----
> > From: Ananyev, Konstantin
> > Sent: Monday, April 4, 2016 17:35
> > To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>
> > Cc: dev@dpdk.org
> > Subject: RE: [dpdk-dev] [PATCH] examples/l3fwd: fix segfault with gcc 5.x
> >
> > Hi Tomasz,
> >
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Tomasz Kulasek
> > > Sent: Monday, April 04, 2016 3:45 PM
> > > To: dev@dpdk.org
> > > Subject: [dpdk-dev] [PATCH] examples/l3fwd: fix segfault with gcc 5.x
> > >
> > > It seems that with gcc >5.x and -O2/-O3 optimization breaks packet
> > > grouping algorithm.
> > >
> > > When last packet pointer "lp" and "pnum->u64" buffer points the same
> > > memory buffer, high optimization can cause unpredictable results. It
> > > seems that assignment of precalculated group sizes may interfere with
> > > initialization of new group size when lp points value inside current
> > > group and didn't should be changed.
> > >
> > > With gcc >5.x and optimization we cannot be sure which assignment will
> > > be done first, so the group size can be counted incorrectly.
> > >
> > > This patch eliminates intersection of assignment of initial group size
> > > (lp[0] = 1) and precalculated group sizes when gptbl[v].idx < 4.
> > >
> > > Fixes: 94c54b4158d5 ("examples/l3fwd: rework exact-match")
> > >
> > > Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
> > > ---
> > >  examples/l3fwd/l3fwd_sse.h |    4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/examples/l3fwd/l3fwd_sse.h b/examples/l3fwd/l3fwd_sse.h
> > > index f9cf50a..1afa1f0 100644
> > > --- a/examples/l3fwd/l3fwd_sse.h
> > > +++ b/examples/l3fwd/l3fwd_sse.h
> > > @@ -283,9 +283,9 @@ port_groupx4(uint16_t pn[FWDSTEP + 1], uint16_t
> > > *lp, __m128i dp1, __m128i dp2)
> > >
> > >  	/* if dest port value has changed. */
> > >  	if (v != GRPMSK) {
> > > -		lp = pnum->u16 + gptbl[v].idx;
> > > -		lp[0] = 1;
> > >  		pnum->u64 = gptbl[v].pnum;
> > > +		pnum->u16[FWDSTEP] = 1;
> >
> > Hmm, but  FWDSTEP and gptbl[v].idx are not always equal.
> > Actually could you explain a bit more - what exactly is reordered by gcc
> > 5.x, and how to reproduce it?
> > i.e what sequence of input packets will trigger an error?
> > Konstantin
> >
> > > +		lp = pnum->u16 + gptbl[v].idx;
> > >  	}
> > >
> > >  	return lp;
> > > --
> > > 1.7.9.5
> 
> 
> Eg. For this case, when group is changed:
> 
> 	{
> 		/* 0xb: a == b, b == c, c != d, d == e */
> 		.pnum = UINT64_C(0x0002000100020003),
> 		.idx = 3,
> 		.lpv = 2,
> 	},
> 
> We expect:
> 
> 	pnum->u16 = { 3, 2, 1, 2, x }
> 	lp = pnum->u16 + 3;
> 	// should be lp[0] == 2
> 
> but for gcc 5.2
> 
> 	lp = pnum->u16 + gptbl[v].idx;
> 	lp[0] = 1;
> 	pnum->u64 = gptbl[v].pnum;
> 
> gives, for some reason lp[0] == 1, even if pnum->u16[3] == 2.
> 
> It causes, that group is shorter and fails trying to send next group with messy length.
> 
> We should set lp[0] = 1 only when needed (gptbl[v].idx == 4), so this is why I set pnum->u16[4] = 1. I set it up always to prevent
> condition. For idx < 4 we don't need to set lp[0].
> 
> The problem is that both pointers operates on the same memory buffer and, it seems like gcc optimization will produce (it is wrong):
> 
> 	lp = pnum->u16 + gptbl[v].idx;
> 	pnum->u64 = gptbl[v].pnum;
> 	lp[0] = 1;
> 
> except:
> 
> 	lp = pnum->u16 + gptbl[v].idx;
> 	lp[0] = 1;
> 	pnum->u64 = gptbl[v].pnum;
> 
> This issue is with gcc 5.x and application seems to fail for the patterns where gptbl[v].idx < 4.


Thanks for explanation Tomasz.
So it reordered:
lp[0] = 1;
pnum->u64 = gptbl[v].pnum;
correct?
My first thought was to insert a rte_complier_barrier() between these two lines,
but actually your approach looks cleaner. 
Konstantin
  
Ananyev, Konstantin April 4, 2016, 7:06 p.m. UTC | #5
> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Tomasz Kulasek
> Sent: Monday, April 04, 2016 3:45 PM
> To: dev@dpdk.org
> Subject: [dpdk-dev] [PATCH] examples/l3fwd: fix segfault with gcc 5.x
> 
> It seems that with gcc >5.x and -O2/-O3 optimization breaks packet grouping
> algorithm.
> 
> When last packet pointer "lp" and "pnum->u64" buffer points the same
> memory buffer, high optimization can cause unpredictable results. It seems
> that assignment of precalculated group sizes may interfere with
> initialization of new group size when lp points value inside current group
> and didn't should be changed.
> 
> With gcc >5.x and optimization we cannot be sure which assignment will be
> done first, so the group size can be counted incorrectly.
> 
> This patch eliminates intersection of assignment of initial group size
> (lp[0] = 1) and precalculated group sizes when gptbl[v].idx < 4.
> 
> Fixes: 94c54b4158d5 ("examples/l3fwd: rework exact-match")
> 
> Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
> ---
>  examples/l3fwd/l3fwd_sse.h |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/examples/l3fwd/l3fwd_sse.h b/examples/l3fwd/l3fwd_sse.h
> index f9cf50a..1afa1f0 100644
> --- a/examples/l3fwd/l3fwd_sse.h
> +++ b/examples/l3fwd/l3fwd_sse.h
> @@ -283,9 +283,9 @@ port_groupx4(uint16_t pn[FWDSTEP + 1], uint16_t *lp, __m128i dp1, __m128i dp2)
> 
>  	/* if dest port value has changed. */
>  	if (v != GRPMSK) {
> -		lp = pnum->u16 + gptbl[v].idx;
> -		lp[0] = 1;
>  		pnum->u64 = gptbl[v].pnum;
> +		pnum->u16[FWDSTEP] = 1;
> +		lp = pnum->u16 + gptbl[v].idx;
>  	}
> 
>  	return lp;
> --

Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

> 1.7.9.5
  
Tomasz Kulasek April 5, 2016, 12:02 p.m. UTC | #6
> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Monday, April 4, 2016 21:05
> To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>
> Cc: dev@dpdk.org
> Subject: RE: [dpdk-dev] [PATCH] examples/l3fwd: fix segfault with gcc 5.x
> 
> 
> 
> > -----Original Message-----
> > From: Kulasek, TomaszX
> > Sent: Monday, April 04, 2016 5:20 PM
> > To: Ananyev, Konstantin
> > Cc: dev@dpdk.org
> > Subject: RE: [dpdk-dev] [PATCH] examples/l3fwd: fix segfault with gcc
> > 5.x
> >
> > Hi Konstantin,
> >
> > > -----Original Message-----
> > > From: Ananyev, Konstantin
> > > Sent: Monday, April 4, 2016 17:35
> > > To: Kulasek, TomaszX <tomaszx.kulasek@intel.com>
> > > Cc: dev@dpdk.org
> > > Subject: RE: [dpdk-dev] [PATCH] examples/l3fwd: fix segfault with
> > > gcc 5.x
> > >
> > > Hi Tomasz,
> > >
> > > > -----Original Message-----
> > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Tomasz
> > > > Kulasek
> > > > Sent: Monday, April 04, 2016 3:45 PM
> > > > To: dev@dpdk.org
> > > > Subject: [dpdk-dev] [PATCH] examples/l3fwd: fix segfault with gcc
> > > > 5.x
> > > >
> > > > It seems that with gcc >5.x and -O2/-O3 optimization breaks packet
> > > > grouping algorithm.
> > > >
> > > > When last packet pointer "lp" and "pnum->u64" buffer points the
> > > > same memory buffer, high optimization can cause unpredictable
> > > > results. It seems that assignment of precalculated group sizes may
> > > > interfere with initialization of new group size when lp points
> > > > value inside current group and didn't should be changed.
> > > >
> > > > With gcc >5.x and optimization we cannot be sure which assignment
> > > > will be done first, so the group size can be counted incorrectly.
> > > >
> > > > This patch eliminates intersection of assignment of initial group
> > > > size (lp[0] = 1) and precalculated group sizes when gptbl[v].idx <
> 4.
> > > >
> > > > Fixes: 94c54b4158d5 ("examples/l3fwd: rework exact-match")
> > > >
> > > > Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
> > > > ---
> > > >  examples/l3fwd/l3fwd_sse.h |    4 ++--
> > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/examples/l3fwd/l3fwd_sse.h
> > > > b/examples/l3fwd/l3fwd_sse.h index f9cf50a..1afa1f0 100644
> > > > --- a/examples/l3fwd/l3fwd_sse.h
> > > > +++ b/examples/l3fwd/l3fwd_sse.h
> > > > @@ -283,9 +283,9 @@ port_groupx4(uint16_t pn[FWDSTEP + 1],
> > > > uint16_t *lp, __m128i dp1, __m128i dp2)
> > > >
> > > >  	/* if dest port value has changed. */
> > > >  	if (v != GRPMSK) {
> > > > -		lp = pnum->u16 + gptbl[v].idx;
> > > > -		lp[0] = 1;
> > > >  		pnum->u64 = gptbl[v].pnum;
> > > > +		pnum->u16[FWDSTEP] = 1;
> > >
> > > Hmm, but  FWDSTEP and gptbl[v].idx are not always equal.
> > > Actually could you explain a bit more - what exactly is reordered by
> > > gcc 5.x, and how to reproduce it?
> > > i.e what sequence of input packets will trigger an error?
> > > Konstantin
> > >
> > > > +		lp = pnum->u16 + gptbl[v].idx;
> > > >  	}
> > > >
> > > >  	return lp;
> > > > --
> > > > 1.7.9.5
> >
> >
> > Eg. For this case, when group is changed:
> >
> > 	{
> > 		/* 0xb: a == b, b == c, c != d, d == e */
> > 		.pnum = UINT64_C(0x0002000100020003),
> > 		.idx = 3,
> > 		.lpv = 2,
> > 	},
> >
> > We expect:
> >
> > 	pnum->u16 = { 3, 2, 1, 2, x }
> > 	lp = pnum->u16 + 3;
> > 	// should be lp[0] == 2
> >
> > but for gcc 5.2
> >
> > 	lp = pnum->u16 + gptbl[v].idx;
> > 	lp[0] = 1;
> > 	pnum->u64 = gptbl[v].pnum;
> >
> > gives, for some reason lp[0] == 1, even if pnum->u16[3] == 2.
> >
> > It causes, that group is shorter and fails trying to send next group
> with messy length.
> >
> > We should set lp[0] = 1 only when needed (gptbl[v].idx == 4), so this
> > is why I set pnum->u16[4] = 1. I set it up always to prevent condition.
> For idx < 4 we don't need to set lp[0].
> >
> > The problem is that both pointers operates on the same memory buffer
> and, it seems like gcc optimization will produce (it is wrong):
> >
> > 	lp = pnum->u16 + gptbl[v].idx;
> > 	pnum->u64 = gptbl[v].pnum;
> > 	lp[0] = 1;
> >
> > except:
> >
> > 	lp = pnum->u16 + gptbl[v].idx;
> > 	lp[0] = 1;
> > 	pnum->u64 = gptbl[v].pnum;
> >
> > This issue is with gcc 5.x and application seems to fail for the
> patterns where gptbl[v].idx < 4.
> 
> 
> Thanks for explanation Tomasz.
> So it reordered:
> lp[0] = 1;
> pnum->u64 = gptbl[v].pnum;
> correct?
> My first thought was to insert a rte_complier_barrier() between these two
> lines, but actually your approach looks cleaner.
> Konstantin

Yes.
  
Thomas Monjalon April 6, 2016, 9:27 a.m. UTC | #7
> > It seems that with gcc >5.x and -O2/-O3 optimization breaks packet grouping
> > algorithm.
> > 
> > When last packet pointer "lp" and "pnum->u64" buffer points the same
> > memory buffer, high optimization can cause unpredictable results. It seems
> > that assignment of precalculated group sizes may interfere with
> > initialization of new group size when lp points value inside current group
> > and didn't should be changed.
> > 
> > With gcc >5.x and optimization we cannot be sure which assignment will be
> > done first, so the group size can be counted incorrectly.
> > 
> > This patch eliminates intersection of assignment of initial group size
> > (lp[0] = 1) and precalculated group sizes when gptbl[v].idx < 4.
> > 
> > Fixes: 94c54b4158d5 ("examples/l3fwd: rework exact-match")
> > 
> > Signed-off-by: Tomasz Kulasek <tomaszx.kulasek@intel.com>
> 
> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

Applied, thanks
  

Patch

diff --git a/examples/l3fwd/l3fwd_sse.h b/examples/l3fwd/l3fwd_sse.h
index f9cf50a..1afa1f0 100644
--- a/examples/l3fwd/l3fwd_sse.h
+++ b/examples/l3fwd/l3fwd_sse.h
@@ -283,9 +283,9 @@  port_groupx4(uint16_t pn[FWDSTEP + 1], uint16_t *lp, __m128i dp1, __m128i dp2)
 
 	/* if dest port value has changed. */
 	if (v != GRPMSK) {
-		lp = pnum->u16 + gptbl[v].idx;
-		lp[0] = 1;
 		pnum->u64 = gptbl[v].pnum;
+		pnum->u16[FWDSTEP] = 1;
+		lp = pnum->u16 + gptbl[v].idx;
 	}
 
 	return lp;