[dpdk-dev] virtio: tx with can_push when VERSION_1 is set

Message ID 82834CFE-767C-41B0-9327-E64B8210E076@cisco.com (mailing list archive)
State Superseded, archived
Delegated to: Yuanhan Liu
Headers

Commit Message

Pierre Pfister Nov. 8, 2016, 9:31 a.m. UTC
  Current virtio driver advertises VERSION_1 support,
but does not handle device's VERSION_1 support when
sending packets (it looks for ANY_LAYOUT feature,
which is absent).

This patch enables 'can_push' in tx path when VERSION_1
is advertised by the device.

This significantly improves the small packet forwarding rate
towards devices advertising the VERSION_1 feature.

Signed-off-by: Pierre Pfister <ppfister@cisco.com>
---
 drivers/net/virtio/virtio_rxtx.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--
2.7.4 (Apple Git-66)
  

Comments

Maxime Coquelin Nov. 8, 2016, 9:44 a.m. UTC | #1
Hi Pierre,

On 11/08/2016 10:31 AM, Pierre Pfister (ppfister) wrote:
> Current virtio driver advertises VERSION_1 support,
> but does not handle device's VERSION_1 support when
> sending packets (it looks for ANY_LAYOUT feature,
> which is absent).
>
> This patch enables 'can_push' in tx path when VERSION_1
> is advertised by the device.
>
> This significantly improves small packets forwarding rate
> towards devices advertising VERSION_1 feature.
I think it depends on whether offloading is enabled or not.
If no offloading is enabled, I measured a significant drop.
Indeed, when no offloading is enabled, the Tx path in Virtio
does not access the virtio header before your patch, as the header is 
memset to zero at device init time.
With your patch, it gets memset to zero at every transmit in the hot
path.

With offloading enabled, it does make sense though, as the header will
be accessed.
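
For reference, here is a rough sketch of what the can_push path boils
down to in the Tx routine (hypothetical helper name, not the exact
virtio_rxtx.c code): the virtio-net header is prepended into the mbuf
headroom so header and data share one descriptor, at the cost of the
per-packet memset discussed above.

#include <string.h>
#include <rte_mbuf.h>

/* Sketch only: place the virtio-net header in the mbuf headroom and
 * clear it, so a single descriptor covers header + packet data. */
static inline void *
tx_push_virtio_hdr(struct rte_mbuf *txm, uint16_t hdr_size)
{
        void *hdr = rte_pktmbuf_prepend(txm, hdr_size);

        if (hdr != NULL)
                memset(hdr, 0, hdr_size); /* per-packet hot-path cost */
        return hdr;
}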

This patch is for v17.02 anyway, and we may provide a way to enable and
disable features at Virtio PMD init time by this release.

Thanks,
Maxime

>
> Signed-off-by: Pierre Pfister <ppfister@cisco.com>
> ---
>  drivers/net/virtio/virtio_rxtx.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
> index 724517e..2fe0338 100644
> --- a/drivers/net/virtio/virtio_rxtx.c
> +++ b/drivers/net/virtio/virtio_rxtx.c
> @@ -925,7 +925,8 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>                 }
>
>                 /* optimize ring usage */
> -               if (vtpci_with_feature(hw, VIRTIO_F_ANY_LAYOUT) &&
> +               if ((vtpci_with_feature(hw, VIRTIO_F_ANY_LAYOUT) ||
> +                     vtpci_with_feature(hw, VIRTIO_F_VERSION_1)) &&
>                     rte_mbuf_refcnt_read(txm) == 1 &&
>                     RTE_MBUF_DIRECT(txm) &&
>                     txm->nb_segs == 1 &&
> --
> 2.7.4 (Apple Git-66)
>
  
Pierre Pfister Nov. 9, 2016, 12:42 p.m. UTC | #2
Hello Maxime,

Sorry for the late reply.


> Le 8 nov. 2016 à 10:44, Maxime Coquelin <maxime.coquelin@redhat.com> a écrit :
> 
> Hi Pierre,
> 
> On 11/08/2016 10:31 AM, Pierre Pfister (ppfister) wrote:
>> Current virtio driver advertises VERSION_1 support,
>> but does not handle device's VERSION_1 support when
>> sending packets (it looks for ANY_LAYOUT feature,
>> which is absent).
>> 
>> This patch enables 'can_push' in tx path when VERSION_1
>> is advertised by the device.
>> 
>> This significantly improves small packets forwarding rate
>> towards devices advertising VERSION_1 feature.
> I think it depends whether offloading is enabled or not.
> If no offloading enabled, I measured significant drop.
> Indeed, when no offloading is enabled, the Tx path in Virtio
> does not access the virtio header before your patch, as the header is memset to zero at device init time.
> With your patch, it gets memset to zero at every transmit in the hot
> path.

Right. On the virtio side that is true, but on the device side, we have to access the header anyway.
And accessing two descriptors (with the address resolution and memory fetch that come with it)
is a costly operation compared to a single one.
In the case indirect descriptors are used, this is 1 desc access instead of 3.
And in the case chained descriptors are used, this doubles the number of packets that you can put in your queue.
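
To make the accounting concrete, here is a hedged sketch of the ring
slot arithmetic for a single-segment packet (hypothetical helper, close
to but not a verbatim copy of virtio_xmit_pkts):

#include <stdint.h>
#include <rte_mbuf.h>

/* Sketch only:
 *   can_push -> 1 ring slot, 1 descriptor (header in the mbuf headroom)
 *   indirect -> 1 ring slot, but 3 descriptor accesses (ring entry plus
 *               header and data descs in the indirect table)
 *   chained  -> 2 ring slots, 2 descriptors (separate header and data),
 *               i.e. half as many packets fit in the ring. */
static inline uint16_t
tx_slots_needed(const struct rte_mbuf *txm, int can_push, int use_indirect)
{
        return use_indirect ? 1 : txm->nb_segs + !can_push;
}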

Those are the results in my PHY -> VM (testpmd) -> PHY setup
Traffic is flowing bidirectionally. Numbers are for lossless-rates.

When chained buffers are used for dpdk's TX: 2x2.13Mpps
When indirect descriptors are used for dpdk's TX: 2x2.38Mpps
When shallow buffers are used for dpdk's TX (with this patch): 2x2.42Mpps

I must also note that qemu 2.5 does not seem to deal with VERSION_1 and ANY_LAYOUT correctly.
The patch I am proposing here works with qemu 2.7, but with qemu 2.5, testpmd still behaves as if ANY_LAYOUT (or VERSION_1) were not available. This is not catastrophic. But just note that you will not see the performance gain in some cases with qemu 2.5.

Cheers

- Pierre

> 
> With offloading enabled, it does makes sense though, as the header will
> be accessed.
> 
> This patch is for v17.02 anyway, and we may provide a way to enable and
> disable features at Virtio PMD init time by this release.
> 
> Thanks,
> Maxime
> 
>> 
>> Signed-off-by: Pierre Pfister <ppfister@cisco.com>
>> ---
>> drivers/net/virtio/virtio_rxtx.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>> 
>> diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
>> index 724517e..2fe0338 100644
>> --- a/drivers/net/virtio/virtio_rxtx.c
>> +++ b/drivers/net/virtio/virtio_rxtx.c
>> @@ -925,7 +925,8 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
>>                }
>> 
>>                /* optimize ring usage */
>> -               if (vtpci_with_feature(hw, VIRTIO_F_ANY_LAYOUT) &&
>> +               if ((vtpci_with_feature(hw, VIRTIO_F_ANY_LAYOUT) ||
>> +                     vtpci_with_feature(hw, VIRTIO_F_VERSION_1)) &&
>>                    rte_mbuf_refcnt_read(txm) == 1 &&
>>                    RTE_MBUF_DIRECT(txm) &&
>>                    txm->nb_segs == 1 &&
>> --
>> 2.7.4 (Apple Git-66)
>>
  
Maxime Coquelin Nov. 9, 2016, 2:42 p.m. UTC | #3
Hi Pierre,

On 11/09/2016 01:42 PM, Pierre Pfister (ppfister) wrote:
> Hello Maxime,
>
> Sorry for the late reply.
>
>
>> Le 8 nov. 2016 à 10:44, Maxime Coquelin <maxime.coquelin@redhat.com> a écrit :
>>
>> Hi Pierre,
>>
>> On 11/08/2016 10:31 AM, Pierre Pfister (ppfister) wrote:
>>> Current virtio driver advertises VERSION_1 support,
>>> but does not handle device's VERSION_1 support when
>>> sending packets (it looks for ANY_LAYOUT feature,
>>> which is absent).
>>>
>>> This patch enables 'can_push' in tx path when VERSION_1
>>> is advertised by the device.
>>>
>>> This significantly improves small packets forwarding rate
>>> towards devices advertising VERSION_1 feature.
>> I think it depends whether offloading is enabled or not.
>> If no offloading enabled, I measured significant drop.
>> Indeed, when no offloading is enabled, the Tx path in Virtio
>> does not access the virtio header before your patch, as the header is memset to zero at device init time.
>> With your patch, it gets memset to zero at every transmit in the hot
>> path.
>
> Right. On the virtio side that is true, but on the device side, we have to access the header anyway.
Not anymore, if no offload features have been negotiated.
I have done a patch that landed in v16.11 to skip header parsing in this case.
That said, we still have to access its descriptor.

> And accessing two descriptors (with the address resolution and memory fetch which comes with it)
> is a costy operation compared to a single one.
> In the case indirect descriptors are used, this is 1 desc access instead or 3.
> And in the case chained descriptors are used, this doubles the number of packets that you can put in your queue.
>
> Those are the results in my PHY -> VM (testpmd) -> PHY setup
> Traffic is flowing bidirectionally. Numbers are for lossless-rates.
>
> When chained buffers are used for dpdk's TX: 2x2.13Mpps
> When indirect descriptors are used for dpdk's TX: 2x2.38Mpps
> When shallow buffers are used for dpdk's TX (with this patch): 2x2.42Mpps
When I tried it, I also did a PVP 0% loss benchmark, and I got the
opposite results. Chained and indirect cases were significantly better.

My PVP setup was using a single NIC and single Virtio PMD, and NIC2VM
forwarding was IO mode done with testpmd on host, and Rx->Tx forwarding
was macswap mode on guest side.
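
For reference, those two forwarding modes can be selected from the
testpmd prompt roughly as follows (ports, cores and vdev options of the
actual setup are not shown here):

    Host side (plain IO forwarding between the NIC and the vhost-user port):
    testpmd> set fwd io
    testpmd> start

    Guest side (swap source/destination MACs and send back):
    testpmd> set fwd macswap
    testpmd> start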

That said, if I'm the only one seeing a performance regression, maybe 
something is wrong with my setup.
Yuanhan, did you make some benchmarks when you implemented your
any_layout enabling series?

>
> I must also note that qemu 2.5 does not seem to deal with VERSION_1 and ANY_LAYOUT correctly.
> The patch I am proposing here works for qemu 2.7, but with qemu 2.5, testpmd still behaves as if ANY_LAYOUT (or VERSION_1) was not available. This is not catastrophic. But just note that you will not see performance in some cases with qemu 2.5.
Thanks for the info.

Regards,
Maxime
  
Maxime Coquelin Nov. 9, 2016, 2:51 p.m. UTC | #4
Hi Pierre,

On 11/09/2016 01:42 PM, Pierre Pfister (ppfister) wrote:
> Hello Maxime,
>
> Sorry for the late reply.
>
>
>> Le 8 nov. 2016 à 10:44, Maxime Coquelin <maxime.coquelin@redhat.com> a écrit :
>>
>> Hi Pierre,
>>
>> On 11/08/2016 10:31 AM, Pierre Pfister (ppfister) wrote:
>>> Current virtio driver advertises VERSION_1 support,
>>> but does not handle device's VERSION_1 support when
>>> sending packets (it looks for ANY_LAYOUT feature,
>>> which is absent).
>>>
>>> This patch enables 'can_push' in tx path when VERSION_1
>>> is advertised by the device.
>>>
>>> This significantly improves small packets forwarding rate
>>> towards devices advertising VERSION_1 feature.
>> I think it depends whether offloading is enabled or not.
>> If no offloading enabled, I measured significant drop.
>> Indeed, when no offloading is enabled, the Tx path in Virtio
>> does not access the virtio header before your patch, as the header is memset to zero at device init time.
>> With your patch, it gets memset to zero at every transmit in the hot
>> path.
>
> Right. On the virtio side that is true, but on the device side, we have to access the header anyway.
Not anymore, if no offload features have been negotiated.
I have done a patch that landed in v16.11 to skip header parsing in
this case.
That said, we still have to access its descriptor.

> And accessing two descriptors (with the address resolution and memory fetch which comes with it)
> is a costy operation compared to a single one.
> In the case indirect descriptors are used, this is 1 desc access instead or 3.
I agree this is far from being optimal.

> And in the case chained descriptors are used, this doubles the number of packets that you can put in your queue.
>
> Those are the results in my PHY -> VM (testpmd) -> PHY setup
> Traffic is flowing bidirectionally. Numbers are for lossless-rates.
>
> When chained buffers are used for dpdk's TX: 2x2.13Mpps
> When indirect descriptors are used for dpdk's TX: 2x2.38Mpps
> When shallow buffers are used for dpdk's TX (with this patch): 2x2.42Mpps
When I tried it, I also did a PVP 0% loss benchmark, and I got the
opposite results. Chained and indirect cases were significantly better.

My PVP setup was using a single NIC and single Virtio PMD, and NIC2VM
forwarding was IO mode done with testpmd on host, and Rx->Tx forwarding
was macswap mode on guest side.

I also saw some perf regression when running a simple testpmd test on both
ends.

Yuanhan, did you run some benchmark with your series enabling
ANY_LAYOUT?

>
> I must also note that qemu 2.5 does not seem to deal with VERSION_1 and ANY_LAYOUT correctly.
> The patch I am proposing here works for qemu 2.7, but with qemu 2.5, testpmd still behaves as if ANY_LAYOUT (or VERSION_1) was not available. This is not catastrophic. But just note that you will not see performance in some cases with qemu 2.5.

Thanks for the info.

Regards,
Maxime
  
Pierre Pfister Nov. 22, 2016, 9:54 a.m. UTC | #5
Hello Maxime,

> Le 9 nov. 2016 à 15:51, Maxime Coquelin <maxime.coquelin@redhat.com> a écrit :
>
> Hi Pierre,
>
> On 11/09/2016 01:42 PM, Pierre Pfister (ppfister) wrote:
>> Hello Maxime,
>>
>> Sorry for the late reply.
>>
>>
>>> Le 8 nov. 2016 à 10:44, Maxime Coquelin <maxime.coquelin@redhat.com> a écrit :
>>>
>>> Hi Pierre,
>>>
>>> On 11/08/2016 10:31 AM, Pierre Pfister (ppfister) wrote:
>>>> Current virtio driver advertises VERSION_1 support,
>>>> but does not handle device's VERSION_1 support when
>>>> sending packets (it looks for ANY_LAYOUT feature,
>>>> which is absent).
>>>>
>>>> This patch enables 'can_push' in tx path when VERSION_1
>>>> is advertised by the device.
>>>>
>>>> This significantly improves small packets forwarding rate
>>>> towards devices advertising VERSION_1 feature.
>>> I think it depends whether offloading is enabled or not.
>>> If no offloading enabled, I measured significant drop.
>>> Indeed, when no offloading is enabled, the Tx path in Virtio
>>> does not access the virtio header before your patch, as the header is memset to zero at device init time.
>>> With your patch, it gets memset to zero at every transmit in the hot
>>> path.
>>
>> Right. On the virtio side that is true, but on the device side, we have to access the header anyway.
> No more now, if no offload features have been negotiated.
> I have done a patch that landed in v16.11 to skip header parsing in
> this case.
> That said, we still have to access its descriptor.
>
>> And accessing two descriptors (with the address resolution and memory fetch which comes with it)
>> is a costy operation compared to a single one.
>> In the case indirect descriptors are used, this is 1 desc access instead or 3.
> I agree this is far from being optimal.
>
>> And in the case chained descriptors are used, this doubles the number of packets that you can put in your queue.
>>
>> Those are the results in my PHY -> VM (testpmd) -> PHY setup
>> Traffic is flowing bidirectionally. Numbers are for lossless-rates.
>>
>> When chained buffers are used for dpdk's TX: 2x2.13Mpps
>> When indirect descriptors are used for dpdk's TX: 2x2.38Mpps
>> When shallow buffers are used for dpdk's TX (with this patch): 2x2.42Mpps
> When I tried it, I also did PVP 0% benchmark, and I got opposite results. Chained and indirect cases were significantly better.
>
> My PVP setup was using a single NIC and single Virtio PMD, and NIC2VM
> forwarding was IO mode done with testpmd on host, and Rx->Tx forwarding
> was macswap mode on guest side.
>
> I also saw some perf regression when running simple tespmd test on both
> ends.
>
> Yuanhan, did you run some benchmark with your series enabling
> ANY_LAYOUT?

It was enabled. But the spec specifies that VERSION_1 includes ANY_LAYOUT.
Therefore, Qemu removes ANY_LAYOUT when VERSION_1 is set.

We can keep arguing about which is fastest. I guess we have different setups and different results, so we probably are deadlocked here.
But in any case, the current code is inconsistent, as it uses a single descriptor when ANY_LAYOUT is set, but not when VERSION_1 is set.

I believe it makes sense to use a single descriptor any time it is possible, but you are free to think otherwise.
Please make a call and make the code consistent (remove single descriptors altogether, or use them when VERSION_1 is set too). Otherwise it just creates yet another testing headache.
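
As an illustration of the second option, a minimal sketch of a
consistent check (hypothetical helper name, not code from the tree)
could look like this:

#include <stdint.h>
#include "virtio_pci.h"

/* Sketch only: VERSION_1 implies the any-layout behaviour per the
 * virtio 1.0 spec, and QEMU clears ANY_LAYOUT once VERSION_1 is
 * negotiated, so either bit should allow the single-descriptor layout. */
static inline int
vtpci_tx_can_push(struct virtio_hw *hw)
{
        return vtpci_with_feature(hw, VIRTIO_F_ANY_LAYOUT) ||
               vtpci_with_feature(hw, VIRTIO_F_VERSION_1);
}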

Thanks,

- Pierre 

>
>>
>> I must also note that qemu 2.5 does not seem to deal with VERSION_1 and ANY_LAYOUT correctly.
>> The patch I am proposing here works for qemu 2.7, but with qemu 2.5, testpmd still behaves as if ANY_LAYOUT (or VERSION_1) was not available. This is not catastrophic. But just note that you will not see performance in some cases with qemu 2.5.
>
> Thanks for the info.
>
> Regards,
> Maxime
  
Maxime Coquelin Nov. 22, 2016, 1:17 p.m. UTC | #6
Hi Pierre,

On 11/22/2016 10:54 AM, Pierre Pfister (ppfister) wrote:
> Hello Maxime,
>
>> Le 9 nov. 2016 à 15:51, Maxime Coquelin <maxime.coquelin@redhat.com> a écrit :
>>
>> Hi Pierre,
>>
>> On 11/09/2016 01:42 PM, Pierre Pfister (ppfister) wrote:
>>> Hello Maxime,
>>>
>>> Sorry for the late reply.
>>>
>>>
>>>> Le 8 nov. 2016 à 10:44, Maxime Coquelin <maxime.coquelin@redhat.com> a écrit :
>>>>
>>>> Hi Pierre,
>>>>
>>>> On 11/08/2016 10:31 AM, Pierre Pfister (ppfister) wrote:
>>>>> Current virtio driver advertises VERSION_1 support,
>>>>> but does not handle device's VERSION_1 support when
>>>>> sending packets (it looks for ANY_LAYOUT feature,
>>>>> which is absent).
>>>>>
>>>>> This patch enables 'can_push' in tx path when VERSION_1
>>>>> is advertised by the device.
>>>>>
>>>>> This significantly improves small packets forwarding rate
>>>>> towards devices advertising VERSION_1 feature.
>>>> I think it depends whether offloading is enabled or not.
>>>> If no offloading enabled, I measured significant drop.
>>>> Indeed, when no offloading is enabled, the Tx path in Virtio
>>>> does not access the virtio header before your patch, as the header is memset to zero at device init time.
>>>> With your patch, it gets memset to zero at every transmit in the hot
>>>> path.
>>>
>>> Right. On the virtio side that is true, but on the device side, we have to access the header anyway.
>> No more now, if no offload features have been negotiated.
>> I have done a patch that landed in v16.11 to skip header parsing in
>> this case.
>> That said, we still have to access its descriptor.
>>
>>> And accessing two descriptors (with the address resolution and memory fetch which comes with it)
>>> is a costy operation compared to a single one.
>>> In the case indirect descriptors are used, this is 1 desc access instead or 3.
>> I agree this is far from being optimal.
>>
>>> And in the case chained descriptors are used, this doubles the number of packets that you can put in your queue.
>>>
>>> Those are the results in my PHY -> VM (testpmd) -> PHY setup
>>> Traffic is flowing bidirectionally. Numbers are for lossless-rates.
>>>
>>> When chained buffers are used for dpdk's TX: 2x2.13Mpps
>>> When indirect descriptors are used for dpdk's TX: 2x2.38Mpps
>>> When shallow buffers are used for dpdk's TX (with this patch): 2x2.42Mpps
>> When I tried it, I also did PVP 0% benchmark, and I got opposite results. Chained and indirect cases were significantly better.
>>
>> My PVP setup was using a single NIC and single Virtio PMD, and NIC2VM
>> forwarding was IO mode done with testpmd on host, and Rx->Tx forwarding
>> was macswap mode on guest side.
>>
>> I also saw some perf regression when running simple tespmd test on both
>> ends.
>>
>> Yuanhan, did you run some benchmark with your series enabling
>> ANY_LAYOUT?
>
> It was enabled. But the specs specify that VERSION_1 includes ANY_LAYOUT.
> Therefor, Qemu removes ANY_LAYOUT when VERSION_1 is set.
>
> We can keep arguing about which is fastest. I guess we have different setups and different results, so we probably are deadlocked here.
> But in any case, the current code is inconsistent, as it uses single descriptor when ANY_LAYOUT is set, but not when VERSION_1 is set.
>
> I believe it makes sense to use single-descriptor any time it is possible, but you are free to think otherwise.
> Please make a call and make the code consistent (removes single-descriptors all together, or use them when VERSION_1 is set too). Otherwise it just creates yet-another testing headache.
I also think it makes sense to have a single descriptor, but I had to
highlight that I noticed a significant performance degradation when
using a single descriptor on my setup.

I'm fine with taking your patch in virtio-next, so that more testing
can be conducted.

Thanks,
Maxime

>
> Thanks,
>
> - Pierre
>
>>
>>>
>>> I must also note that qemu 2.5 does not seem to deal with VERSION_1 and ANY_LAYOUT correctly.
>>> The patch I am proposing here works for qemu 2.7, but with qemu 2.5, testpmd still behaves as if ANY_LAYOUT (or VERSION_1) was not available. This is not catastrophic. But just note that you will not see performance in some cases with qemu 2.5.
>>
>> Thanks for the info.
>>
>> Regards,
>> Maxime
>
  
Pierre Pfister Nov. 30, 2016, 9:09 a.m. UTC | #7
> Le 22 nov. 2016 à 14:17, Maxime Coquelin <maxime.coquelin@redhat.com> a écrit :
>
> Hi Pierre,
>
> On 11/22/2016 10:54 AM, Pierre Pfister (ppfister) wrote:
>> Hello Maxime,
>>
>>> Le 9 nov. 2016 à 15:51, Maxime Coquelin <maxime.coquelin@redhat.com> a écrit :
>>>
>>> Hi Pierre,
>>>
>>> On 11/09/2016 01:42 PM, Pierre Pfister (ppfister) wrote:
>>>> Hello Maxime,
>>>>
>>>> Sorry for the late reply.
>>>>
>>>>
>>>>> Le 8 nov. 2016 à 10:44, Maxime Coquelin <maxime.coquelin@redhat.com> a écrit :
>>>>>
>>>>> Hi Pierre,
>>>>>
>>>>> On 11/08/2016 10:31 AM, Pierre Pfister (ppfister) wrote:
>>>>>> Current virtio driver advertises VERSION_1 support,
>>>>>> but does not handle device's VERSION_1 support when
>>>>>> sending packets (it looks for ANY_LAYOUT feature,
>>>>>> which is absent).
>>>>>>
>>>>>> This patch enables 'can_push' in tx path when VERSION_1
>>>>>> is advertised by the device.
>>>>>>
>>>>>> This significantly improves small packets forwarding rate
>>>>>> towards devices advertising VERSION_1 feature.
>>>>> I think it depends whether offloading is enabled or not.
>>>>> If no offloading enabled, I measured significant drop.
>>>>> Indeed, when no offloading is enabled, the Tx path in Virtio
>>>>> does not access the virtio header before your patch, as the header is memset to zero at device init time.
>>>>> With your patch, it gets memset to zero at every transmit in the hot
>>>>> path.
>>>>
>>>> Right. On the virtio side that is true, but on the device side, we have to access the header anyway.
>>> No more now, if no offload features have been negotiated.
>>> I have done a patch that landed in v16.11 to skip header parsing in
>>> this case.
>>> That said, we still have to access its descriptor.
>>>
>>>> And accessing two descriptors (with the address resolution and memory fetch which comes with it)
>>>> is a costy operation compared to a single one.
>>>> In the case indirect descriptors are used, this is 1 desc access instead or 3.
>>> I agree this is far from being optimal.
>>>
>>>> And in the case chained descriptors are used, this doubles the number of packets that you can put in your queue.
>>>>
>>>> Those are the results in my PHY -> VM (testpmd) -> PHY setup
>>>> Traffic is flowing bidirectionally. Numbers are for lossless-rates.
>>>>
>>>> When chained buffers are used for dpdk's TX: 2x2.13Mpps
>>>> When indirect descriptors are used for dpdk's TX: 2x2.38Mpps
>>>> When shallow buffers are used for dpdk's TX (with this patch): 2x2.42Mpps
>>> When I tried it, I also did PVP 0% benchmark, and I got opposite results. Chained and indirect cases were significantly better.
>>>
>>> My PVP setup was using a single NIC and single Virtio PMD, and NIC2VM
>>> forwarding was IO mode done with testpmd on host, and Rx->Tx forwarding
>>> was macswap mode on guest side.
>>>
>>> I also saw some perf regression when running simple tespmd test on both
>>> ends.
>>>
>>> Yuanhan, did you run some benchmark with your series enabling
>>> ANY_LAYOUT?
>>
>> It was enabled. But the specs specify that VERSION_1 includes ANY_LAYOUT.
>> Therefor, Qemu removes ANY_LAYOUT when VERSION_1 is set.
>>
>> We can keep arguing about which is fastest. I guess we have different setups and different results, so we probably are deadlocked here.
>> But in any case, the current code is inconsistent, as it uses single descriptor when ANY_LAYOUT is set, but not when VERSION_1 is set.
>>
>> I believe it makes sense to use single-descriptor any time it is possible, but you are free to think otherwise.
>> Please make a call and make the code consistent (removes single-descriptors all together, or use them when VERSION_1 is set too). Otherwise it just creates yet-another testing headache.
> I also think it makes sense to have a single descriptor, but I had to
> highlight that I noticed a significant performance degradation when
> using single descriptor on my setup.
>
> I'm fine we take your patch in virtio-next, so that more testing is conducted.
>

Thanks,

I just realised there was an indentation error in the patch.
Meaning that this patch didn't make it into 16.11 ...
I will send a new version.

- Pierre


> Thanks,
> Maxime
>
>>
>> Thanks,
>>
>> - Pierre
>>
>>>
>>>>
>>>> I must also note that qemu 2.5 does not seem to deal with VERSION_1 and ANY_LAYOUT correctly.
>>>> The patch I am proposing here works for qemu 2.7, but with qemu 2.5, testpmd still behaves as if ANY_LAYOUT (or VERSION_1) was not available. This is not catastrophic. But just note that you will not see performance in some cases with qemu 2.5.
>>>
>>> Thanks for the info.
>>>
>>> Regards,
>>> Maxime
>>
  

Patch

diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 724517e..2fe0338 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -925,7 +925,8 @@  virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
                }

                /* optimize ring usage */
-               if (vtpci_with_feature(hw, VIRTIO_F_ANY_LAYOUT) &&
+               if ((vtpci_with_feature(hw, VIRTIO_F_ANY_LAYOUT) ||
+                     vtpci_with_feature(hw, VIRTIO_F_VERSION_1)) &&
                    rte_mbuf_refcnt_read(txm) == 1 &&
                    RTE_MBUF_DIRECT(txm) &&
                    txm->nb_segs == 1 &&