[dpdk-dev] examples/vhost: fix perf regression

Message ID 1468936391-138371-1-git-send-email-jianfeng.tan@intel.com
State Superseded, archived

Commit Message

Jianfeng Tan July 19, 2016, 1:53 p.m. UTC
We found a significant performance drop introduced by the commit below
when the vhost example is started with --mergeable 0 and the kernel
virtio-net driver inside the VM is used to do IP-based forwarding.

The root cause is that the commit below adds support for
VIRTIO_NET_F_GUEST_TSO4 and VIRTIO_NET_F_GUEST_TSO6, and when
mergeable is disabled, this triggers the big_packets path of the
virtio-net driver. In this path, the virtio driver uses 19 descs with
18 4K-sized pages to receive each packet, so that it can receive a big
packet of up to 64K. But QEMU only creates 256 desc entries for each
vq, so only 13 packets (256 / 19) can be posted for receive at a time.
The VM kernel quickly handles those packets and goes to sleep (HLT).
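The arithmetic behind the 13-packet figure can be checked with a short C sketch. The constants are the values quoted in this commit message (a 256-entry virtqueue, 19 descriptors and 18 4K pages per big_packets receive buffer); the macro and function names are hypothetical and are not taken from DPDK or the Linux driver.

```c
#include <assert.h>
#include <stdint.h>

/* Values quoted in the commit message; names are illustrative only. */
#define VQ_SIZE         256u  /* descriptors QEMU creates per virtqueue */
#define BIG_PKT_DESCS    19u  /* descriptors per big_packets receive buffer */
#define BIG_PKT_PAGES    18u  /* 4K pages chained per big_packets buffer */
#define PAGE_BYTES     4096u

/* Number of complete receive buffers that fit in a ring of vq_size entries.
 * Integer division: a buffer that only partially fits cannot be posted. */
static uint32_t rx_buffers_in_ring(uint32_t vq_size, uint32_t descs_per_buf)
{
    assert(descs_per_buf > 0);
    return vq_size / descs_per_buf;
}

/* Data capacity, in bytes, of one big_packets receive buffer. */
static uint32_t big_buffer_bytes(uint32_t pages)
{
    return pages * PAGE_BYTES;
}
```

With the quoted numbers, `rx_buffers_in_ring(VQ_SIZE, BIG_PKT_DESCS)` is 13 and `big_buffer_bytes(BIG_PKT_PAGES)` is 73728 bytes, enough for a 64K packet. For comparison, a layout using one descriptor per buffer would fit 256 buffers in the same ring.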

As QEMU has no option to set the number of desc entries of a vq,
we disable VIRTIO_NET_F_GUEST_TSO4 and VIRTIO_NET_F_GUEST_TSO6
along with VIRTIO_NET_F_HOST_TSO4 and VIRTIO_NET_F_HOST_TSO6 when
TSO is disabled in the vhost example, so that the VM kernel virtio
driver does not go into the big_packets path.
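Why clearing only the HOST_TSO bits is not enough can be illustrated with a simplified model of how the guest picks its receive path. This sketch assumes the behavior of the Linux virtio-net driver of that era as we understand it: mergeable rx buffers are used if VIRTIO_NET_F_MRG_RXBUF is negotiated, otherwise any negotiated GUEST_TSO4/TSO6/ECN/UFO bit selects big_packets. The bit numbers follow the virtio-net specification; `guest_rx_path`, `tso_disable_mask` and `enum rx_path` are hypothetical names, not kernel or DPDK symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Feature bit numbers as defined by the virtio-net specification. */
#define VIRTIO_NET_F_GUEST_TSO4  7
#define VIRTIO_NET_F_GUEST_TSO6  8
#define VIRTIO_NET_F_GUEST_ECN   9
#define VIRTIO_NET_F_GUEST_UFO  10
#define VIRTIO_NET_F_HOST_TSO4  11
#define VIRTIO_NET_F_HOST_TSO6  12
#define VIRTIO_NET_F_MRG_RXBUF  15

#define FEAT(bit) (1ULL << (bit))

enum rx_path { RX_MERGEABLE, RX_BIG_PACKETS, RX_SMALL };

/* Simplified model of the guest driver's receive-path selection,
 * given the feature bits negotiated with the device. */
static enum rx_path guest_rx_path(uint64_t negotiated)
{
    const uint64_t guest_gso = FEAT(VIRTIO_NET_F_GUEST_TSO4) |
                               FEAT(VIRTIO_NET_F_GUEST_TSO6) |
                               FEAT(VIRTIO_NET_F_GUEST_ECN)  |
                               FEAT(VIRTIO_NET_F_GUEST_UFO);

    if (negotiated & FEAT(VIRTIO_NET_F_MRG_RXBUF))
        return RX_MERGEABLE;    /* e.g. --mergeable 1 */
    if (negotiated & guest_gso)
        return RX_BIG_PACKETS;  /* 19 descriptors per receive buffer */
    return RX_SMALL;
}

/* TSO bits cleared when TSO is off: HOST_TSO4/6 only (before this
 * patch), or HOST_TSO4/6 plus GUEST_TSO4/6 (after it). */
static uint64_t tso_disable_mask(int with_guest_bits)
{
    uint64_t mask = FEAT(VIRTIO_NET_F_HOST_TSO4) | FEAT(VIRTIO_NET_F_HOST_TSO6);

    if (with_guest_bits)
        mask |= FEAT(VIRTIO_NET_F_GUEST_TSO4) | FEAT(VIRTIO_NET_F_GUEST_TSO6);
    return mask;
}
```

Starting from an offer with all four TSO bits and no MRG_RXBUF, clearing `tso_disable_mask(0)` still leaves GUEST_TSO4/6 set, so the model picks `RX_BIG_PACKETS`; clearing `tso_disable_mask(1)` yields `RX_SMALL`.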

Fixes: 859b480d5afd ("vhost: add guest offload setting")

Reported-by: Qian Xu <qian.q.xu@intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
---
 examples/vhost/main.c | 2 ++
 1 file changed, 2 insertions(+)
  

Comments

Yuanhan Liu July 20, 2016, 1:44 a.m. UTC | #1
On Tue, Jul 19, 2016 at 01:53:11PM +0000, Jianfeng Tan wrote:
> [...]

We could apply this patch, but I don't think it actually fixes anything:

- it doesn't fix other vhost applications, say OVS, which is for sure
  way more widely used than vhost-example.

- it doesn't even fix it when tso is enabled and mergeable-rx is disabled
  with this vhost-example.

Thanks for the good root-cause, btw!

	--yliu
  
Jianfeng Tan July 20, 2016, 2:44 a.m. UTC | #2
Hi Yuanhan,

On 7/20/2016 9:44 AM, Yuanhan Liu wrote:
> On Tue, Jul 19, 2016 at 01:53:11PM +0000, Jianfeng Tan wrote:
>> [...]
> We could apply this patch, but I don't think it actually fixes anything:
>
> - it doesn't fix other vhost applications, say OVS, which is for sure
>    way more widely used than vhost-example.

If I remember correctly, OVS enables mergeable rx buffers.

>
> - it doesn't even fix it when tso is enabled and mergeable-rx is disabled
>    with this vhost-example.

But we should still avoid making users suspect that performance drops
because of that commit in the tso=off,mergeable=off case, right?

Thanks,
Jianfeng

>
> Thanks for the good root-cause, btw!
>
> 	--yliu
  
Xu, Qian Q July 20, 2016, 3:16 a.m. UTC | #3
My comments below. 

-----Original Message-----
From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Tan, Jianfeng
Sent: Wednesday, July 20, 2016 10:44 AM
To: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Cc: dev@dpdk.org; Wang, Zhihong <zhihong.wang@intel.com>
Subject: Re: [dpdk-dev] [PATCH] examples/vhost: fix perf regression

Hi Yuanhan,

On 7/20/2016 9:44 AM, Yuanhan Liu wrote:
> On Tue, Jul 19, 2016 at 01:53:11PM +0000, Jianfeng Tan wrote:
>> [...]
> We could apply this patch, but I don't think it actually fixes anything:
>
> - it doesn't fix other vhost applications, say OVS, which is for sure
>    way more widely used than vhost-example.

If I remember correctly, OVS enables mergeable rx buffers.

>
> - it doesn't even fix it when tso is enabled and mergeable-rx is disabled
>    with this vhost-example.

But we should still avoid making users suspect that performance drops because of that commit in the tso=off,mergeable=off case, right?

Normally, when people enable TSO, they should also turn on mergeable; if they don't, they shouldn't expect high performance, 
so this is not a problem. They may get low performance because of improper settings. 

As for a complete fix for the issue, we may need to go back to the TSO feature design for vhost. Currently, the feature negotiation code is in the application, 
but it would be better handled in the vhost/virtio library so that the application doesn't need to check/set the features. But it's too late for a complete fix now, 
so from my view the workaround is OK for this release. 

Thanks,
Jianfeng

>
> Thanks for the good root-cause, btw!
>
> 	--yliu
  
Yuanhan Liu July 20, 2016, 4 a.m. UTC | #4
On Wed, Jul 20, 2016 at 10:44:13AM +0800, Tan, Jianfeng wrote:
> Hi Yuanhan,
> 
> On 7/20/2016 9:44 AM, Yuanhan Liu wrote:
> >On Tue, Jul 19, 2016 at 01:53:11PM +0000, Jianfeng Tan wrote:
> >>[...]
> >We could apply this patch, but I don't think it actually fixes anything:
> >
> >- it doesn't fix other vhost applications, say OVS, which is for sure
> >   way more widely used than vhost-example.
> 
> If I remember correctly, OVS enables mergeable rx buffers.

Yes, and actually vhost-example should also have enabled it by default.
Meanwhile, every feature can be enabled/disabled by the user.

> >
> >- it doesn't even fix it when tso is enabled and mergeable-rx is disabled
> >   with this vhost-example.
> 
> But we should still avoid making users suspect that performance drops
> because of that commit in the tso=off,mergeable=off case, right?

I doubt people actually use vhost-example (besides developers like
us), meaning they will NOT see the benefit of this patch; it also means
that users __will__ still suspect a performance drop in the
tso=off,mergeable=off case.

Actually, it looks wrong to me to fiddle with those flags in vhost-example.
If you want to disable TSO, you should disable it on the QEMU side,
with something like:

    csum=off,gso=off,guest_tso4=off,guest_tso6=off,...
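The argument for doing this in QEMU can be modeled as a chain of bitwise ANDs: a feature reaches the guest only if the vhost backend offers it and QEMU's device properties allow it, and it is used only if the guest driver also supports it. This is a simplified sketch under that assumption; `negotiate` and `qemu_allowed_without_tso` are hypothetical names, and real QEMU/vhost-user negotiation involves more steps.

```c
#include <assert.h>
#include <stdint.h>

/* Feature bit numbers from the virtio-net specification. */
#define VIRTIO_NET_F_GUEST_TSO4  7
#define VIRTIO_NET_F_GUEST_TSO6  8
#define VIRTIO_NET_F_HOST_TSO4  11
#define VIRTIO_NET_F_HOST_TSO6  12

#define FEAT(bit) (1ULL << (bit))

/* Simplified model: each layer can only remove features, never add them. */
static uint64_t negotiate(uint64_t backend_offered, uint64_t qemu_allowed,
                          uint64_t guest_supported)
{
    return backend_offered & qemu_allowed & guest_supported;
}

/* Allowed set after turning off QEMU's guest_tso4, guest_tso6,
 * host_tso4 and host_tso6 properties (illustrative). */
static uint64_t qemu_allowed_without_tso(void)
{
    return ~(FEAT(VIRTIO_NET_F_GUEST_TSO4) | FEAT(VIRTIO_NET_F_GUEST_TSO6) |
             FEAT(VIRTIO_NET_F_HOST_TSO4)  | FEAT(VIRTIO_NET_F_HOST_TSO6));
}
```

Because the mask is applied in QEMU, no TSO bit survives negotiation regardless of which vhost backend (vhost-example, OVS, ...) offers it.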

	--yliu
  
Yuanhan Liu July 20, 2016, 4:38 a.m. UTC | #5
On Tue, Jul 19, 2016 at 01:53:11PM +0000, Jianfeng Tan wrote:
> [...]
> 
> Fixes: 859b480d5afd ("vhost: add guest offload setting")

And here you are patching the vhost example to try to fix an "issue"
in the vhost lib; this is __logically__ wrong.

	--yliu
> 
> Reported-by: Qian Xu <qian.q.xu@intel.com>
> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
> ---
>  examples/vhost/main.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/examples/vhost/main.c b/examples/vhost/main.c
> index 3b98f42..92a9823 100644
> --- a/examples/vhost/main.c
> +++ b/examples/vhost/main.c
> @@ -327,6 +327,8 @@ port_init(uint8_t port)
>  	if (enable_tso == 0) {
>  		rte_vhost_feature_disable(1ULL << VIRTIO_NET_F_HOST_TSO4);
>  		rte_vhost_feature_disable(1ULL << VIRTIO_NET_F_HOST_TSO6);
> +		rte_vhost_feature_disable(1ULL << VIRTIO_NET_F_GUEST_TSO4);
> +		rte_vhost_feature_disable(1ULL << VIRTIO_NET_F_GUEST_TSO6);
>  	}
>  
>  	rx_rings = (uint16_t)dev_info.max_rx_queues;
> -- 
> 2.7.4
  
Jianfeng Tan July 20, 2016, 5:50 a.m. UTC | #6
On 7/20/2016 12:38 PM, Yuanhan Liu wrote:
> On Tue, Jul 19, 2016 at 01:53:11PM +0000, Jianfeng Tan wrote:
>> [...]
>>
>> Fixes: 859b480d5afd ("vhost: add guest offload setting")
> And here you are patching the vhost example to try to fix an "issue"
> in the vhost lib; this is __logically__ wrong.
>
> 	--yliu

This is not an issue from the vhost lib's perspective; the vhost lib should
provide all the features it supports by default. Applications can
enable/disable features according to their own requirements. And after
this commit, the vhost example just triggers a slow path of the virtio
driver. So this fix just makes sure the vhost example does not go into
the slow path by default.
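The split described here (the vhost lib offers everything it supports, and the application clears what it does not want) can be sketched as below. As the calls in the patch suggest, `rte_vhost_feature_disable()` takes a 64-bit feature mask, so the four calls could in principle be folded into one. To keep the sketch self-contained, the library side is replaced by a hypothetical stand-in (`lib_features`, `feature_disable()`) rather than the real DPDK API.

```c
#include <assert.h>
#include <stdint.h>

#define VIRTIO_NET_F_GUEST_TSO4  7
#define VIRTIO_NET_F_GUEST_TSO6  8
#define VIRTIO_NET_F_HOST_TSO4  11
#define VIRTIO_NET_F_HOST_TSO6  12
#define VIRTIO_NET_F_MRG_RXBUF  15

#define FEAT(bit) (1ULL << (bit))

/* Hypothetical stand-in for the library's default feature set:
 * everything the library supports is offered unless the app clears it. */
static uint64_t lib_features = FEAT(VIRTIO_NET_F_GUEST_TSO4) |
                               FEAT(VIRTIO_NET_F_GUEST_TSO6) |
                               FEAT(VIRTIO_NET_F_HOST_TSO4)  |
                               FEAT(VIRTIO_NET_F_HOST_TSO6)  |
                               FEAT(VIRTIO_NET_F_MRG_RXBUF);

/* Hypothetical stand-in for a mask-based disable call. */
static int feature_disable(uint64_t mask)
{
    lib_features &= ~mask;
    return 0;
}

/* Application side: clear every TSO-related bit in one call when TSO is off. */
static void app_configure(int enable_tso)
{
    if (enable_tso == 0) {
        (void)feature_disable(FEAT(VIRTIO_NET_F_HOST_TSO4)  |
                              FEAT(VIRTIO_NET_F_HOST_TSO6)  |
                              FEAT(VIRTIO_NET_F_GUEST_TSO4) |
                              FEAT(VIRTIO_NET_F_GUEST_TSO6));
    }
}
```

After `app_configure(0)`, only MRG_RXBUF remains in the stand-in feature set, so neither the host nor the guest TSO bits are offered.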

By the way, should a fix patch only reference (via Fixes:) the commits whose code it changes?

Thanks,
Jianfeng
  
Yuanhan Liu July 20, 2016, 6:13 a.m. UTC | #7
On Wed, Jul 20, 2016 at 01:50:34PM +0800, Tan, Jianfeng wrote:
> 
> 
> On 7/20/2016 12:38 PM, Yuanhan Liu wrote:
> >On Tue, Jul 19, 2016 at 01:53:11PM +0000, Jianfeng Tan wrote:
> >>[...]
> >>
> >>Fixes: 859b480d5afd ("vhost: add guest offload setting")
> >And here you are patching the vhost example to try to fix an "issue"
> >in the vhost lib; this is __logically__ wrong.
> >
> >	--yliu
> 
> This is not an issue from the vhost lib's perspective; the vhost lib should
> provide all the features it supports by default.

Bingo, that's why "Fixes: 859b480d5afd ... " looks wrong to me.
  
> Applications can enable/disable
> features according to their own requirements.

Yes, applications can, but they normally don't. And as stated in my
earlier reply, QEMU is the place you should go to for enabling/disabling
all those options, not vhost (and not vhost-example).

I think it's sometimes handier if we can do that by introducing
some vhost-example options, and I guess that's why those options are
provided.

In other words, there is nothing wrong with commit 859b480d5afd;
if you want to "fix" anything here, the following commit is the one
we need to fix:

    Fixes: 9fd72e3cbd29 ("examples/vhost: add virtio offload")

Because that commit only partially disables the TSO-related features,
letting the virtio-net driver go to the slow path.

> And after this commit, the vhost example just triggers a slow path of
> the virtio driver. So this fix just makes sure the vhost example does
> not go into the slow path by default.

As I stated the first time, I am not objecting to having this patch
at all.

Meanwhile, the right "fix" is to disable all TSO-related features
from QEMU; that way, we should see no such issue with any vhost
application, not only this one, which we mostly use internally.

As you can see, it's more about the usage.

> By the way, should a fix patch only reference (via Fixes:) the commits whose code it changes?

IMO, logically, yes.

	--yliu
  
Jianfeng Tan July 20, 2016, 6:30 a.m. UTC | #8
On 7/20/2016 2:13 PM, Yuanhan Liu wrote:
> On Wed, Jul 20, 2016 at 01:50:34PM +0800, Tan, Jianfeng wrote:
>>
>> On 7/20/2016 12:38 PM, Yuanhan Liu wrote:
>>> On Tue, Jul 19, 2016 at 01:53:11PM +0000, Jianfeng Tan wrote:
>>>> [...]
>>>>
>>>> Fixes: 859b480d5afd ("vhost: add guest offload setting")
>>> And here you are patching the vhost example to try to fix an "issue"
>>> in the vhost lib; this is __logically__ wrong.
>>>
>>> 	--yliu
>> This is not an issue from the vhost lib's perspective; the vhost lib should
>> provide all the features it supports by default.
> Bingo, that's why "Fixes: 859b480d5afd ... " looks wrong to me.
>    
>> Applications can enable/disable
>> features according to their own requirements.
> Yes, applications can, but they normally don't. And as stated in my
> earlier reply, QEMU is the place you should go to for enabling/disabling
> all those options, not vhost (and not vhost-example).
>
> I think it's sometimes handier if we can do that by introducing
> some vhost-example options, and I guess that's why those options are
> provided.
>
> In other words, there is nothing wrong with commit 859b480d5afd;
> if you want to "fix" anything here, the following commit is the one
> we need to fix:
>
>      Fixes: 9fd72e3cbd29 ("examples/vhost: add virtio offload")
>
> Because that commit only partially disables the TSO-related features,
> letting the virtio-net driver go to the slow path.

Great, I see. Thanks for the detailed clarification. I'll send a v2.

>
>> And after this commit, the vhost example just triggers a slow path of
>> the virtio driver. So this fix just makes sure the vhost example does
>> not go into the slow path by default.
> As I stated the first time, I am not objecting to having this patch
> at all.
>
> Meanwhile, the right "fix" is to disable all TSO-related features
> from QEMU; that way, we should see no such issue with any vhost
> application, not only this one, which we mostly use internally.
>
> As you can see, it's more about the usage.

Yes, I agree this is the BKM (best known method) we should adopt and recommend to users.

Thanks,
Jianfeng

>
>> By the way, should a fix patch only reference (via Fixes:) the commits whose code it changes?
> IMO, logically, yes.
>
> 	--yliu
  

Patch

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 3b98f42..92a9823 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -327,6 +327,8 @@  port_init(uint8_t port)
 	if (enable_tso == 0) {
 		rte_vhost_feature_disable(1ULL << VIRTIO_NET_F_HOST_TSO4);
 		rte_vhost_feature_disable(1ULL << VIRTIO_NET_F_HOST_TSO6);
+		rte_vhost_feature_disable(1ULL << VIRTIO_NET_F_GUEST_TSO4);
+		rte_vhost_feature_disable(1ULL << VIRTIO_NET_F_GUEST_TSO6);
 	}
 
 	rx_rings = (uint16_t)dev_info.max_rx_queues;