[dpdk-dev] ixgbe: avoid unnecessary break when checking at the tail of rx hwring

Message ID 1457965558-15331-1-git-send-email-jianbo.liu@linaro.org (mailing list archive)
State Rejected, archived
Delegated to: Bruce Richardson
Headers

Commit Message

Jianbo Liu March 14, 2016, 2:25 p.m. UTC
  When checking the rx ring queue, it's possible that the loop will break at the tail
while there are still packets waiting at the head of the queue.

Signed-off-by: Jianbo Liu <jianbo.liu@linaro.org>
---
 drivers/net/ixgbe/ixgbe_rxtx_vec.c | 68 +++++++++++++++++++++-----------------
 1 file changed, 38 insertions(+), 30 deletions(-)
  

Comments

Wenzhuo Lu March 16, 2016, 6:06 a.m. UTC | #1
HI Jianbo,


> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jianbo Liu
> Sent: Monday, March 14, 2016 10:26 PM
> To: Zhang, Helin; Ananyev, Konstantin; dev@dpdk.org
> Cc: Jianbo Liu
> Subject: [dpdk-dev] [PATCH] ixgbe: avoid unnessary break when checking at the
> tail of rx hwring
> 
> When checking rx ring queue, it's possible that loop will break at the tail while
> there are packets still in the queue header.
Would you like to give more details about the scenario in which this issue will be hit? Thanks.
  
Jianbo Liu March 16, 2016, 7:51 a.m. UTC | #2
Hi Wenzhuo,

On 16 March 2016 at 14:06, Lu, Wenzhuo <wenzhuo.lu@intel.com> wrote:
> HI Jianbo,
>
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jianbo Liu
>> Sent: Monday, March 14, 2016 10:26 PM
>> To: Zhang, Helin; Ananyev, Konstantin; dev@dpdk.org
>> Cc: Jianbo Liu
>> Subject: [dpdk-dev] [PATCH] ixgbe: avoid unnessary break when checking at the
>> tail of rx hwring
>>
>> When checking rx ring queue, it's possible that loop will break at the tail while
>> there are packets still in the queue header.
> Would you like to give more details about in what scenario this issue will be hit? Thanks.
>

vPMD places an extra RTE_IXGBE_DESCS_PER_LOOP - 1 empty
descriptors at the end of the hwring to avoid overrunning it when
checking descriptors on the rx side.

The loop in _recv_raw_pkts_vec() checks 4 descriptors at a time. If
all 4 DD bits are set, all 4 packets are received; that's fine in the
middle of the ring.
But when we come to the end of the hwring with fewer than 4
descriptors left, we still check 4 descriptors at once, so the extra
empty descriptors are checked together with the real ones.
In that case the number of received packets is necessarily less than 4,
and we break out of the loop because of the condition "var !=
RTE_IXGBE_DESCS_PER_LOOP".
That's the problem: there may be more packets at the beginning of
the hwring that are still waiting to be received.
I think this fix avoids that situation, or at least reduces the
latency for the packets at the head of the ring.
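
For reference, the tail of the existing _recv_raw_pkts_vec() loop (as it
stands before this patch, and as shown in the diff context below) is roughly:

for (pos = 0, nb_pkts_recd = 0; pos < nb_pkts;
		pos += RTE_IXGBE_DESCS_PER_LOOP,
		rxdp += RTE_IXGBE_DESCS_PER_LOOP) {
	/* ... load 4 descriptors, convert 4 mbufs ... */

	/* C.4 calc available number of desc */
	var = __builtin_popcountll(_mm_cvtsi128_si64(staterr));
	nb_pkts_recd += var;
	/* fewer than 4 DD bits set: stop, even if only the dummy
	 * descriptors past the ring end were the ones not "done" */
	if (likely(var != RTE_IXGBE_DESCS_PER_LOOP))
		break;
}

The dummy descriptors never have DD set, so any 4-descriptor group that
reaches past the ring end makes var < RTE_IXGBE_DESCS_PER_LOOP and ends
the burst without wrapping back to ready packets at the start of the ring.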

Thanks!
Jianbo
  
Bruce Richardson March 16, 2016, 11:14 a.m. UTC | #3
On Wed, Mar 16, 2016 at 03:51:53PM +0800, Jianbo Liu wrote:
> Hi Wenzhuo,
> 
> On 16 March 2016 at 14:06, Lu, Wenzhuo <wenzhuo.lu@intel.com> wrote:
> > HI Jianbo,
> >
> >
> >> -----Original Message-----
> >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jianbo Liu
> >> Sent: Monday, March 14, 2016 10:26 PM
> >> To: Zhang, Helin; Ananyev, Konstantin; dev@dpdk.org
> >> Cc: Jianbo Liu
> >> Subject: [dpdk-dev] [PATCH] ixgbe: avoid unnessary break when checking at the
> >> tail of rx hwring
> >>
> >> When checking rx ring queue, it's possible that loop will break at the tail while
> >> there are packets still in the queue header.
> > Would you like to give more details about in what scenario this issue will be hit? Thanks.
> >
> 
> vPMD will place extra RTE_IXGBE_DESCS_PER_LOOP - 1 number of empty
> descriptiors at the end of hwring to avoid overflow when do checking
> on rx side.
> 
> For the loop in _recv_raw_pkts_vec(), we check 4 descriptors each
> time. If all 4 DD are set, and all 4 packets are received.That's OK in
> the middle.
> But if come to the end of hwring, and less than 4 descriptors left, we
> still need to check 4 descriptors at the same time, so the extra empty
> descriptors are checked with them.
> This time, the number of received packets is apparently less than 4,
> and we break out of the loop because of the condition "var !=
> RTE_IXGBE_DESCS_PER_LOOP".
> So the problem arises. It is possible that there could be more packets
> at the hwring beginning that still waiting for being received.
> I think this fix can avoid this situation, and at least reduce the
> latency for the packets in the header.
> 
Packets are always received in order from the NIC, so no packets ever get left
behind or skipped on an RX burst call.
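
To illustrate the pattern assumed here: the application sits in a tight poll
loop, so anything a short burst leaves at the head of the HW ring is picked up
by the very next rte_eth_rx_burst() call. A minimal sketch (the port/queue ids
and process_packet() are placeholders for illustration, not from the patch):

#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32

/* placeholder for whatever the application does with each packet */
extern void process_packet(struct rte_mbuf *m);

static void
rx_poll_loop(void)
{
	struct rte_mbuf *bufs[BURST_SIZE];
	uint16_t nb_rx, i;

	for (;;) {
		/* may return fewer than BURST_SIZE packets; the driver
		 * resumes from its saved rx_tail on the next call */
		nb_rx = rte_eth_rx_burst(0, 0, bufs, BURST_SIZE);
		for (i = 0; i < nb_rx; i++)
			process_packet(bufs[i]);
	}
}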

/Bruce
  
Jianbo Liu March 17, 2016, 2:20 a.m. UTC | #4
On 16 March 2016 at 19:14, Bruce Richardson <bruce.richardson@intel.com> wrote:
> On Wed, Mar 16, 2016 at 03:51:53PM +0800, Jianbo Liu wrote:
>> Hi Wenzhuo,
>>
>> On 16 March 2016 at 14:06, Lu, Wenzhuo <wenzhuo.lu@intel.com> wrote:
>> > HI Jianbo,
>> >
>> >
>> >> -----Original Message-----
>> >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jianbo Liu
>> >> Sent: Monday, March 14, 2016 10:26 PM
>> >> To: Zhang, Helin; Ananyev, Konstantin; dev@dpdk.org
>> >> Cc: Jianbo Liu
>> >> Subject: [dpdk-dev] [PATCH] ixgbe: avoid unnessary break when checking at the
>> >> tail of rx hwring
>> >>
>> >> When checking rx ring queue, it's possible that loop will break at the tail while
>> >> there are packets still in the queue header.
>> > Would you like to give more details about in what scenario this issue will be hit? Thanks.
>> >
>>
>> vPMD will place extra RTE_IXGBE_DESCS_PER_LOOP - 1 number of empty
>> descriptiors at the end of hwring to avoid overflow when do checking
>> on rx side.
>>
>> For the loop in _recv_raw_pkts_vec(), we check 4 descriptors each
>> time. If all 4 DD are set, and all 4 packets are received.That's OK in
>> the middle.
>> But if come to the end of hwring, and less than 4 descriptors left, we
>> still need to check 4 descriptors at the same time, so the extra empty
>> descriptors are checked with them.
>> This time, the number of received packets is apparently less than 4,
>> and we break out of the loop because of the condition "var !=
>> RTE_IXGBE_DESCS_PER_LOOP".
>> So the problem arises. It is possible that there could be more packets
>> at the hwring beginning that still waiting for being received.
>> I think this fix can avoid this situation, and at least reduce the
>> latency for the packets in the header.
>>
> Packets are always received in order from the NIC, so no packets ever get left
> behind or skipped on an RX burst call.
>
> /Bruce
>

I know packets are received in order, and no packets will be skipped,
but some will be left behind as I explained above.
vPMD will not receive the nb_pkts requested by one RX burst call, and
those at the beginning of the hwring are still waiting to be received
until the next call.

Thanks!
Jianbo
  
Bruce Richardson March 18, 2016, 10:03 a.m. UTC | #5
On Thu, Mar 17, 2016 at 10:20:01AM +0800, Jianbo Liu wrote:
> On 16 March 2016 at 19:14, Bruce Richardson <bruce.richardson@intel.com> wrote:
> > On Wed, Mar 16, 2016 at 03:51:53PM +0800, Jianbo Liu wrote:
> >> Hi Wenzhuo,
> >>
> >> On 16 March 2016 at 14:06, Lu, Wenzhuo <wenzhuo.lu@intel.com> wrote:
> >> > HI Jianbo,
> >> >
> >> >
> >> >> -----Original Message-----
> >> >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jianbo Liu
> >> >> Sent: Monday, March 14, 2016 10:26 PM
> >> >> To: Zhang, Helin; Ananyev, Konstantin; dev@dpdk.org
> >> >> Cc: Jianbo Liu
> >> >> Subject: [dpdk-dev] [PATCH] ixgbe: avoid unnessary break when checking at the
> >> >> tail of rx hwring
> >> >>
> >> >> When checking rx ring queue, it's possible that loop will break at the tail while
> >> >> there are packets still in the queue header.
> >> > Would you like to give more details about in what scenario this issue will be hit? Thanks.
> >> >
> >>
> >> vPMD will place extra RTE_IXGBE_DESCS_PER_LOOP - 1 number of empty
> >> descriptiors at the end of hwring to avoid overflow when do checking
> >> on rx side.
> >>
> >> For the loop in _recv_raw_pkts_vec(), we check 4 descriptors each
> >> time. If all 4 DD are set, and all 4 packets are received.That's OK in
> >> the middle.
> >> But if come to the end of hwring, and less than 4 descriptors left, we
> >> still need to check 4 descriptors at the same time, so the extra empty
> >> descriptors are checked with them.
> >> This time, the number of received packets is apparently less than 4,
> >> and we break out of the loop because of the condition "var !=
> >> RTE_IXGBE_DESCS_PER_LOOP".
> >> So the problem arises. It is possible that there could be more packets
> >> at the hwring beginning that still waiting for being received.
> >> I think this fix can avoid this situation, and at least reduce the
> >> latency for the packets in the header.
> >>
> > Packets are always received in order from the NIC, so no packets ever get left
> > behind or skipped on an RX burst call.
> >
> > /Bruce
> >
> 
> I knew packets are received in order, and no packets will be skipped,
> but some will be left behind as I explained above.
> vPMD will not received nb_pkts required by one RX burst call, and
> those at the beginning of hwring are still waiting to be received till
> the next call.
> 
> Thanks!
> Jianbo
HI Jianbo,

Ok, I understand now. I'm not sure that this is a significant problem though,
since we are working in polling mode. Is there a performance impact from your
change? I don't think we can reduce performance just to fix this.

Regards,
/Bruce
  
Jianbo Liu March 21, 2016, 2:26 a.m. UTC | #6
On 18 March 2016 at 18:03, Bruce Richardson <bruce.richardson@intel.com> wrote:
> On Thu, Mar 17, 2016 at 10:20:01AM +0800, Jianbo Liu wrote:
>> On 16 March 2016 at 19:14, Bruce Richardson <bruce.richardson@intel.com> wrote:
>> > On Wed, Mar 16, 2016 at 03:51:53PM +0800, Jianbo Liu wrote:
>> >> Hi Wenzhuo,
>> >>
>> >> On 16 March 2016 at 14:06, Lu, Wenzhuo <wenzhuo.lu@intel.com> wrote:
>> >> > HI Jianbo,
>> >> >
>> >> >
>> >> >> -----Original Message-----
>> >> >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jianbo Liu
>> >> >> Sent: Monday, March 14, 2016 10:26 PM
>> >> >> To: Zhang, Helin; Ananyev, Konstantin; dev@dpdk.org
>> >> >> Cc: Jianbo Liu
>> >> >> Subject: [dpdk-dev] [PATCH] ixgbe: avoid unnessary break when checking at the
>> >> >> tail of rx hwring
>> >> >>
>> >> >> When checking rx ring queue, it's possible that loop will break at the tail while
>> >> >> there are packets still in the queue header.
>> >> > Would you like to give more details about in what scenario this issue will be hit? Thanks.
>> >> >
>> >>
>> >> vPMD will place extra RTE_IXGBE_DESCS_PER_LOOP - 1 number of empty
>> >> descriptiors at the end of hwring to avoid overflow when do checking
>> >> on rx side.
>> >>
>> >> For the loop in _recv_raw_pkts_vec(), we check 4 descriptors each
>> >> time. If all 4 DD are set, and all 4 packets are received.That's OK in
>> >> the middle.
>> >> But if come to the end of hwring, and less than 4 descriptors left, we
>> >> still need to check 4 descriptors at the same time, so the extra empty
>> >> descriptors are checked with them.
>> >> This time, the number of received packets is apparently less than 4,
>> >> and we break out of the loop because of the condition "var !=
>> >> RTE_IXGBE_DESCS_PER_LOOP".
>> >> So the problem arises. It is possible that there could be more packets
>> >> at the hwring beginning that still waiting for being received.
>> >> I think this fix can avoid this situation, and at least reduce the
>> >> latency for the packets in the header.
>> >>
>> > Packets are always received in order from the NIC, so no packets ever get left
>> > behind or skipped on an RX burst call.
>> >
>> > /Bruce
>> >
>>
>> I knew packets are received in order, and no packets will be skipped,
>> but some will be left behind as I explained above.
>> vPMD will not received nb_pkts required by one RX burst call, and
>> those at the beginning of hwring are still waiting to be received till
>> the next call.
>>
>> Thanks!
>> Jianbo
> HI Jianbo,
>
> ok, I understand now. I'm not sure that this is a significant problem though,
> since we are working in polling mode. Is there a performance impact to your
> change, because I don't think that we can reduce performance just to fix this?
>
> Regards,
> /Bruce
It will be a problem because the probability could be high.
Considering that the rx hwring size is 128 and the rx burst is 32, the
probability can be 32/128.
I know this change is critical, so I want you (and the maintainers) to do
a full evaluation of throughput/latency before drawing a conclusion.

Jianbo
  
Ananyev, Konstantin March 22, 2016, 2:27 p.m. UTC | #7
> -----Original Message-----
> From: Jianbo Liu [mailto:jianbo.liu@linaro.org]
> Sent: Monday, March 21, 2016 2:27 AM
> To: Richardson, Bruce
> Cc: Lu, Wenzhuo; Zhang, Helin; Ananyev, Konstantin; dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] ixgbe: avoid unnessary break when checking at the tail of rx hwring
> 
> On 18 March 2016 at 18:03, Bruce Richardson <bruce.richardson@intel.com> wrote:
> > On Thu, Mar 17, 2016 at 10:20:01AM +0800, Jianbo Liu wrote:
> >> On 16 March 2016 at 19:14, Bruce Richardson <bruce.richardson@intel.com> wrote:
> >> > On Wed, Mar 16, 2016 at 03:51:53PM +0800, Jianbo Liu wrote:
> >> >> Hi Wenzhuo,
> >> >>
> >> >> On 16 March 2016 at 14:06, Lu, Wenzhuo <wenzhuo.lu@intel.com> wrote:
> >> >> > HI Jianbo,
> >> >> >
> >> >> >
> >> >> >> -----Original Message-----
> >> >> >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jianbo Liu
> >> >> >> Sent: Monday, March 14, 2016 10:26 PM
> >> >> >> To: Zhang, Helin; Ananyev, Konstantin; dev@dpdk.org
> >> >> >> Cc: Jianbo Liu
> >> >> >> Subject: [dpdk-dev] [PATCH] ixgbe: avoid unnessary break when checking at the
> >> >> >> tail of rx hwring
> >> >> >>
> >> >> >> When checking rx ring queue, it's possible that loop will break at the tail while
> >> >> >> there are packets still in the queue header.
> >> >> > Would you like to give more details about in what scenario this issue will be hit? Thanks.
> >> >> >
> >> >>
> >> >> vPMD will place extra RTE_IXGBE_DESCS_PER_LOOP - 1 number of empty
> >> >> descriptiors at the end of hwring to avoid overflow when do checking
> >> >> on rx side.
> >> >>
> >> >> For the loop in _recv_raw_pkts_vec(), we check 4 descriptors each
> >> >> time. If all 4 DD are set, and all 4 packets are received.That's OK in
> >> >> the middle.
> >> >> But if come to the end of hwring, and less than 4 descriptors left, we
> >> >> still need to check 4 descriptors at the same time, so the extra empty
> >> >> descriptors are checked with them.
> >> >> This time, the number of received packets is apparently less than 4,
> >> >> and we break out of the loop because of the condition "var !=
> >> >> RTE_IXGBE_DESCS_PER_LOOP".
> >> >> So the problem arises. It is possible that there could be more packets
> >> >> at the hwring beginning that still waiting for being received.
> >> >> I think this fix can avoid this situation, and at least reduce the
> >> >> latency for the packets in the header.
> >> >>
> >> > Packets are always received in order from the NIC, so no packets ever get left
> >> > behind or skipped on an RX burst call.
> >> >
> >> > /Bruce
> >> >
> >>
> >> I knew packets are received in order, and no packets will be skipped,
> >> but some will be left behind as I explained above.
> >> vPMD will not received nb_pkts required by one RX burst call, and
> >> those at the beginning of hwring are still waiting to be received till
> >> the next call.
> >>
> >> Thanks!
> >> Jianbo
> > HI Jianbo,
> >
> > ok, I understand now. I'm not sure that this is a significant problem though,
> > since we are working in polling mode. Is there a performance impact to your
> > change, because I don't think that we can reduce performance just to fix this?
> >
> > Regards,
> > /Bruce
> It will be a problem because the possibility could be high.
> Considering rx hwring size is 128 and rx burst is 32, the possiblity
> can be 32/128.
> I know this change is critical, so I want you (and maintainers) to do
> full evaluations about throughput/latency..before making conclusion.

I am still not sure what problem you are trying to solve here.
Yes, a recv_raw_pkts_vec() call doesn't wrap around the HW ring boundary,
and yes, it can return fewer packets than are actually available in the HW.
Though, as Bruce pointed out, they'll be returned to the user by the next call.
Actually, recv_pkts_bulk_alloc() works in a similar way.
Why do you consider that a problem?
Konstantin

> 
> Jianbo
  
Jianbo Liu March 25, 2016, 8:53 a.m. UTC | #8
On 22 March 2016 at 22:27, Ananyev, Konstantin
<konstantin.ananyev@intel.com> wrote:
>
>
>> -----Original Message-----
>> From: Jianbo Liu [mailto:jianbo.liu@linaro.org]
>> Sent: Monday, March 21, 2016 2:27 AM
>> To: Richardson, Bruce
>> Cc: Lu, Wenzhuo; Zhang, Helin; Ananyev, Konstantin; dev@dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH] ixgbe: avoid unnessary break when checking at the tail of rx hwring
>>
>> On 18 March 2016 at 18:03, Bruce Richardson <bruce.richardson@intel.com> wrote:
>> > On Thu, Mar 17, 2016 at 10:20:01AM +0800, Jianbo Liu wrote:
>> >> On 16 March 2016 at 19:14, Bruce Richardson <bruce.richardson@intel.com> wrote:
>> >> > On Wed, Mar 16, 2016 at 03:51:53PM +0800, Jianbo Liu wrote:
>> >> >> Hi Wenzhuo,
>> >> >>
>> >> >> On 16 March 2016 at 14:06, Lu, Wenzhuo <wenzhuo.lu@intel.com> wrote:
>> >> >> > HI Jianbo,
>> >> >> >
>> >> >> >
>> >> >> >> -----Original Message-----
>> >> >> >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jianbo Liu
>> >> >> >> Sent: Monday, March 14, 2016 10:26 PM
>> >> >> >> To: Zhang, Helin; Ananyev, Konstantin; dev@dpdk.org
>> >> >> >> Cc: Jianbo Liu
>> >> >> >> Subject: [dpdk-dev] [PATCH] ixgbe: avoid unnessary break when checking at the
>> >> >> >> tail of rx hwring
>> >> >> >>
>> >> >> >> When checking rx ring queue, it's possible that loop will break at the tail while
>> >> >> >> there are packets still in the queue header.
>> >> >> > Would you like to give more details about in what scenario this issue will be hit? Thanks.
>> >> >> >
>> >> >>
>> >> >> vPMD will place extra RTE_IXGBE_DESCS_PER_LOOP - 1 number of empty
>> >> >> descriptiors at the end of hwring to avoid overflow when do checking
>> >> >> on rx side.
>> >> >>
>> >> >> For the loop in _recv_raw_pkts_vec(), we check 4 descriptors each
>> >> >> time. If all 4 DD are set, and all 4 packets are received.That's OK in
>> >> >> the middle.
>> >> >> But if come to the end of hwring, and less than 4 descriptors left, we
>> >> >> still need to check 4 descriptors at the same time, so the extra empty
>> >> >> descriptors are checked with them.
>> >> >> This time, the number of received packets is apparently less than 4,
>> >> >> and we break out of the loop because of the condition "var !=
>> >> >> RTE_IXGBE_DESCS_PER_LOOP".
>> >> >> So the problem arises. It is possible that there could be more packets
>> >> >> at the hwring beginning that still waiting for being received.
>> >> >> I think this fix can avoid this situation, and at least reduce the
>> >> >> latency for the packets in the header.
>> >> >>
>> >> > Packets are always received in order from the NIC, so no packets ever get left
>> >> > behind or skipped on an RX burst call.
>> >> >
>> >> > /Bruce
>> >> >
>> >>
>> >> I knew packets are received in order, and no packets will be skipped,
>> >> but some will be left behind as I explained above.
>> >> vPMD will not received nb_pkts required by one RX burst call, and
>> >> those at the beginning of hwring are still waiting to be received till
>> >> the next call.
>> >>
>> >> Thanks!
>> >> Jianbo
>> > HI Jianbo,
>> >
>> > ok, I understand now. I'm not sure that this is a significant problem though,
>> > since we are working in polling mode. Is there a performance impact to your
>> > change, because I don't think that we can reduce performance just to fix this?
>> >
>> > Regards,
>> > /Bruce
>> It will be a problem because the possibility could be high.
>> Considering rx hwring size is 128 and rx burst is 32, the possiblity
>> can be 32/128.
>> I know this change is critical, so I want you (and maintainers) to do
>> full evaluations about throughput/latency..before making conclusion.
>
> I am still not sure what is a problem you are trying to solve here.
> Yes recv_raw_pkts_vec() call wouldn't wrap around HW ring boundary,
> and yes can return less packets that are actually available by the HW.
> Though as Bruce pointed, they'll be returned to the user by next call.
Have you thought about the interval between these two calls, and how long it could be?
If the application is a simple one like l2fwd/testpmd, that's fine.
But if the interval is long because the application has more work to do,
the two cases are different.

> Actually recv_pkts_bulk_alloc() works in a similar way.
> Why do you consider that as a problem?
The driver should pull packets out of the hardware and hand them to the
application as fast as possible.
If it doesn't, there is a possibility that more incoming packets will
overflow the hardware queue.
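
As a rough back-of-the-envelope figure (my assumption of 10GbE at 64-byte
line rate, about 14.88 Mpps, and the default 128-descriptor ring):

    time to fill a 128-entry ring ~= 128 / 14.88e6 ~= 8.6 microseconds

so any extra delay before the next rx burst call eats directly into that budget.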

I did some testing with pktgen-dpdk, and it behaves a little better
with this patch (at least not worse).
Sorry I can't provide more concrete evidence because I don't have
Ixia/Spirent equipment at hand.
That's why I asked you to do a full evaluation before rejecting this patch. :-)

Thanks!

> Konstantin
>
>>
>> Jianbo
  
Xu, Qian Q March 28, 2016, 2:30 a.m. UTC | #9
Jianbo
Could you tell me a case that reproduces the issue? We can help evaluate the performance impact on ixgbe, but I'm not sure how to check whether your patch really fixes a problem, because I don't know how to reproduce the problem! Could you first tell me how to reproduce your issue? Or have you not been able to reproduce it yourself?

Thanks
Qian


-----Original Message-----
From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jianbo Liu
Sent: Friday, March 25, 2016 4:53 PM
To: Ananyev, Konstantin
Cc: Richardson, Bruce; Lu, Wenzhuo; Zhang, Helin; dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH] ixgbe: avoid unnessary break when checking at the tail of rx hwring

On 22 March 2016 at 22:27, Ananyev, Konstantin <konstantin.ananyev@intel.com> wrote:
>
>
>> -----Original Message-----
>> From: Jianbo Liu [mailto:jianbo.liu@linaro.org]
>> Sent: Monday, March 21, 2016 2:27 AM
>> To: Richardson, Bruce
>> Cc: Lu, Wenzhuo; Zhang, Helin; Ananyev, Konstantin; dev@dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH] ixgbe: avoid unnessary break when
>> checking at the tail of rx hwring
>>
>> On 18 March 2016 at 18:03, Bruce Richardson <bruce.richardson@intel.com> wrote:
>> > On Thu, Mar 17, 2016 at 10:20:01AM +0800, Jianbo Liu wrote:
>> >> On 16 March 2016 at 19:14, Bruce Richardson <bruce.richardson@intel.com> wrote:
>> >> > On Wed, Mar 16, 2016 at 03:51:53PM +0800, Jianbo Liu wrote:
>> >> >> Hi Wenzhuo,
>> >> >>
>> >> >> On 16 March 2016 at 14:06, Lu, Wenzhuo <wenzhuo.lu@intel.com> wrote:
>> >> >> > HI Jianbo,
>> >> >> >
>> >> >> >
>> >> >> >> -----Original Message-----
>> >> >> >> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jianbo
>> >> >> >> Liu
>> >> >> >> Sent: Monday, March 14, 2016 10:26 PM
>> >> >> >> To: Zhang, Helin; Ananyev, Konstantin; dev@dpdk.org
>> >> >> >> Cc: Jianbo Liu
>> >> >> >> Subject: [dpdk-dev] [PATCH] ixgbe: avoid unnessary break
>> >> >> >> when checking at the tail of rx hwring
>> >> >> >>
>> >> >> >> When checking rx ring queue, it's possible that loop will
>> >> >> >> break at the tail while there are packets still in the queue header.
>> >> >> > Would you like to give more details about in what scenario this issue will be hit? Thanks.
>> >> >> >
>> >> >>
>> >> >> vPMD will place extra RTE_IXGBE_DESCS_PER_LOOP - 1 number of
>> >> >> empty descriptiors at the end of hwring to avoid overflow when
>> >> >> do checking on rx side.
>> >> >>
>> >> >> For the loop in _recv_raw_pkts_vec(), we check 4 descriptors
>> >> >> each time. If all 4 DD are set, and all 4 packets are
>> >> >> received.That's OK in the middle.
>> >> >> But if come to the end of hwring, and less than 4 descriptors
>> >> >> left, we still need to check 4 descriptors at the same time, so
>> >> >> the extra empty descriptors are checked with them.
>> >> >> This time, the number of received packets is apparently less
>> >> >> than 4, and we break out of the loop because of the condition
>> >> >> "var != RTE_IXGBE_DESCS_PER_LOOP".
>> >> >> So the problem arises. It is possible that there could be more
>> >> >> packets at the hwring beginning that still waiting for being received.
>> >> >> I think this fix can avoid this situation, and at least reduce
>> >> >> the latency for the packets in the header.
>> >> >>
>> >> > Packets are always received in order from the NIC, so no packets
>> >> > ever get left behind or skipped on an RX burst call.
>> >> >
>> >> > /Bruce
>> >> >
>> >>
>> >> I knew packets are received in order, and no packets will be
>> >> skipped, but some will be left behind as I explained above.
>> >> vPMD will not received nb_pkts required by one RX burst call, and
>> >> those at the beginning of hwring are still waiting to be received
>> >> till the next call.
>> >>
>> >> Thanks!
>> >> Jianbo
>> > HI Jianbo,
>> >
>> > ok, I understand now. I'm not sure that this is a significant
>> > problem though, since we are working in polling mode. Is there a
>> > performance impact to your change, because I don't think that we can reduce performance just to fix this?
>> >
>> > Regards,
>> > /Bruce
>> It will be a problem because the possibility could be high.
>> Considering rx hwring size is 128 and rx burst is 32, the possiblity
>> can be 32/128.
>> I know this change is critical, so I want you (and maintainers) to do
>> full evaluations about throughput/latency..before making conclusion.
>
> I am still not sure what is a problem you are trying to solve here.
> Yes recv_raw_pkts_vec() call wouldn't wrap around HW ring boundary,
> and yes can return less packets that are actually available by the HW.
> Though as Bruce pointed, they'll be returned to the user by next call.

Have you thought of the interval between these two call, how long could it be?
If application is a simple one like l2fwd/testpmd, that's fine.
But if the interval is long because application has more work to do, they are different.

> Actually recv_pkts_bulk_alloc() works in a similar way.
> Why do you consider that as a problem?

Driver should pull packets out of hardware and give them to APP as fast as possible.
If not, there is a possibility that overflow the hardware queue by more incoming packets.

I did some testings with pktgen-dpdk, and it behaves a little better with this patch (at least not worse).
Sorry I can't provide more concreate evidences because I don't have ixia/sprint equipment at hand.
That's why I asked you to do full evaluations before reject this patch. :-)

Thanks!

> Konstantin
>
>>
>> Jianbo
  
Jianbo Liu March 28, 2016, 8:48 a.m. UTC | #10
Hi Qian,

On 28 March 2016 at 10:30, Xu, Qian Q <qian.q.xu@intel.com> wrote:
> Jianbo
> Could you tell me the case that can reproduce the issue? We can help evaluate the impact of performance on ixgbe, but I'm not sure how to check if your patch really fix a problem because I don’t know how to reproduce the problem! Could you first teach me on how to reproduce your issue? Or you may not reproduce it by yourself?
>
It is more a refactoring of the original design than a fix for a
specific issue, so I don't know how to reproduce it either.
Can you run your usual performance test cases first and see if
there is any impact or improvement?

Thanks!
Jianbo
  
Bruce Richardson June 17, 2016, 10:09 a.m. UTC | #11
On Mon, Mar 28, 2016 at 04:48:17PM +0800, Jianbo Liu wrote:
> Hi Qian,
> 
> On 28 March 2016 at 10:30, Xu, Qian Q <qian.q.xu@intel.com> wrote:
> > Jianbo
> > Could you tell me the case that can reproduce the issue? We can help evaluate the impact of performance on ixgbe, but I'm not sure how to check if your patch really fix a problem because I don’t know how to reproduce the problem! Could you first teach me on how to reproduce your issue? Or you may not reproduce it by yourself?
> >
> It is more an refactoring to original design than fixing an issue. So
> I don't know how to reproduce either.
> Can you use your usual performance testing cases first, and see if
> there is any impact or improvement?
> 

Since there is no further discussion or update on this patch, I'm going to mark
it as rejected in patchwork, rather than have it live on as a zombie patch.

If this change is wanted for 16.11 or any subsequent release, please resubmit
it for consideration with any performance data justifications (and a reference
back to this thread).

Thanks,
/Bruce
  

Patch

diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
index ccd93c7..611e431 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
@@ -206,10 +206,9 @@  static inline uint16_t
 _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 		uint16_t nb_pkts, uint8_t *split_packet)
 {
-	volatile union ixgbe_adv_rx_desc *rxdp;
+	volatile union ixgbe_adv_rx_desc *rxdp, *rxdp_end;
 	struct ixgbe_rx_entry *sw_ring;
-	uint16_t nb_pkts_recd;
-	int pos;
+	uint16_t rev;
 	uint64_t var;
 	__m128i shuf_msk;
 	__m128i crc_adjust = _mm_set_epi16(
@@ -232,6 +231,7 @@  _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 	/* Just the act of getting into the function from the application is
 	 * going to cost about 7 cycles */
 	rxdp = rxq->rx_ring + rxq->rx_tail;
+	rxdp_end = rxq->rx_ring + rxq->nb_rx_desc;
 
 	_mm_prefetch((const void *)rxdp, _MM_HINT_T0);
 
@@ -275,9 +275,7 @@  _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 	 * [C*. extract the end-of-packet bit, if requested]
 	 * D. fill info. from desc to mbuf
 	 */
-	for (pos = 0, nb_pkts_recd = 0; pos < nb_pkts;
-			pos += RTE_IXGBE_DESCS_PER_LOOP,
-			rxdp += RTE_IXGBE_DESCS_PER_LOOP) {
+	for (rev = 0; rev < nb_pkts; ) {
 		__m128i descs0[RTE_IXGBE_DESCS_PER_LOOP];
 		__m128i descs[RTE_IXGBE_DESCS_PER_LOOP];
 		__m128i pkt_mb1, pkt_mb2, pkt_mb3, pkt_mb4;
@@ -285,17 +283,17 @@  _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 		__m128i mbp1, mbp2; /* two mbuf pointer in one XMM reg. */
 
 		/* B.1 load 1 mbuf point */
-		mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]);
+		mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[0]);
 
 		/* Read desc statuses backwards to avoid race condition */
 		/* A.1 load 4 pkts desc */
 		descs0[3] = _mm_loadu_si128((__m128i *)(rxdp + 3));
 
 		/* B.2 copy 2 mbuf point into rx_pkts  */
-		_mm_storeu_si128((__m128i *)&rx_pkts[pos], mbp1);
+		_mm_storeu_si128((__m128i *)&rx_pkts[rev], mbp1);
 
 		/* B.1 load 1 mbuf point */
-		mbp2 = _mm_loadu_si128((__m128i *)&sw_ring[pos+2]);
+		mbp2 = _mm_loadu_si128((__m128i *)&sw_ring[2]);
 
 		descs0[2] = _mm_loadu_si128((__m128i *)(rxdp + 2));
 		/* B.1 load 2 mbuf point */
@@ -303,13 +301,13 @@  _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 		descs0[0] = _mm_loadu_si128((__m128i *)(rxdp));
 
 		/* B.2 copy 2 mbuf point into rx_pkts  */
-		_mm_storeu_si128((__m128i *)&rx_pkts[pos+2], mbp2);
+		_mm_storeu_si128((__m128i *)&rx_pkts[rev + 2], mbp2);
 
 		if (split_packet) {
-			rte_prefetch0(&rx_pkts[pos]->cacheline1);
-			rte_prefetch0(&rx_pkts[pos + 1]->cacheline1);
-			rte_prefetch0(&rx_pkts[pos + 2]->cacheline1);
-			rte_prefetch0(&rx_pkts[pos + 3]->cacheline1);
+			rte_prefetch0(&rx_pkts[rev]->cacheline1);
+			rte_prefetch0(&rx_pkts[rev + 1]->cacheline1);
+			rte_prefetch0(&rx_pkts[rev + 2]->cacheline1);
+			rte_prefetch0(&rx_pkts[rev + 3]->cacheline1);
 		}
 
 		/* A* mask out 0~3 bits RSS type */
@@ -333,7 +331,7 @@  _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 		sterr_tmp1 = _mm_unpackhi_epi32(descs[1], descs[0]);
 
 		/* set ol_flags with vlan packet type */
-		desc_to_olflags_v(descs0, &rx_pkts[pos]);
+		desc_to_olflags_v(descs0, &rx_pkts[rev]);
 
 		/* D.2 pkt 3,4 set in_port/nb_seg and remove crc */
 		pkt_mb4 = _mm_add_epi16(pkt_mb4, crc_adjust);
@@ -348,9 +346,9 @@  _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 		staterr = _mm_unpacklo_epi32(sterr_tmp1, sterr_tmp2);
 
 		/* D.3 copy final 3,4 data to rx_pkts */
-		_mm_storeu_si128((void *)&rx_pkts[pos+3]->rx_descriptor_fields1,
+		_mm_storeu_si128((void *)&rx_pkts[rev+3]->rx_descriptor_fields1,
 				pkt_mb4);
-		_mm_storeu_si128((void *)&rx_pkts[pos+2]->rx_descriptor_fields1,
+		_mm_storeu_si128((void *)&rx_pkts[rev+2]->rx_descriptor_fields1,
 				pkt_mb3);
 
 		/* D.2 pkt 1,2 set in_port/nb_seg and remove crc */
@@ -375,13 +373,12 @@  _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 			eop_bits = _mm_shuffle_epi8(eop_bits, eop_shuf_mask);
 			/* store the resulting 32-bit value */
 			*(int *)split_packet = _mm_cvtsi128_si32(eop_bits);
-			split_packet += RTE_IXGBE_DESCS_PER_LOOP;
 
 			/* zero-out next pointers */
-			rx_pkts[pos]->next = NULL;
-			rx_pkts[pos + 1]->next = NULL;
-			rx_pkts[pos + 2]->next = NULL;
-			rx_pkts[pos + 3]->next = NULL;
+			rx_pkts[rev]->next = NULL;
+			rx_pkts[rev + 1]->next = NULL;
+			rx_pkts[rev + 2]->next = NULL;
+			rx_pkts[rev + 3]->next = NULL;
 		}
 
 		/* C.3 calc available number of desc */
@@ -389,24 +386,35 @@  _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 		staterr = _mm_packs_epi32(staterr, zero);
 
 		/* D.3 copy final 1,2 data to rx_pkts */
-		_mm_storeu_si128((void *)&rx_pkts[pos+1]->rx_descriptor_fields1,
+		_mm_storeu_si128((void *)&rx_pkts[rev+1]->rx_descriptor_fields1,
 				pkt_mb2);
-		_mm_storeu_si128((void *)&rx_pkts[pos]->rx_descriptor_fields1,
+		_mm_storeu_si128((void *)&rx_pkts[rev]->rx_descriptor_fields1,
 				pkt_mb1);
 
 		/* C.4 calc avaialbe number of desc */
 		var = __builtin_popcountll(_mm_cvtsi128_si64(staterr));
-		nb_pkts_recd += var;
-		if (likely(var != RTE_IXGBE_DESCS_PER_LOOP))
+		if (unlikely(var == 0))
 			break;
+		else {
+			if (split_packet)
+				 split_packet += var;
+
+			rev += var;
+			sw_ring += var;
+			rxdp += var;
+			if (rxdp == rxdp_end) {
+				sw_ring = rxq->sw_ring;
+				rxdp = rxq->rx_ring;
+			} else if (var < RTE_IXGBE_DESCS_PER_LOOP)
+				break;
+		}
 	}
 
 	/* Update our internal tail pointer */
-	rxq->rx_tail = (uint16_t)(rxq->rx_tail + nb_pkts_recd);
-	rxq->rx_tail = (uint16_t)(rxq->rx_tail & (rxq->nb_rx_desc - 1));
-	rxq->rxrearm_nb = (uint16_t)(rxq->rxrearm_nb + nb_pkts_recd);
+	rxq->rx_tail = rxdp - rxq->rx_ring;
+	rxq->rxrearm_nb = (uint16_t)(rxq->rxrearm_nb + rev);
 
-	return nb_pkts_recd;
+	return rev;
 }
 
 /*