[dpdk-dev,v3] examples/vhost: fix perf regression

Message ID 1469061765-50814-1-git-send-email-jianfeng.tan@intel.com (mailing list archive)
State Accepted, archived
Delegated to: Yuanhan Liu
Headers

Commit Message

Jianfeng Tan July 21, 2016, 12:42 a.m. UTC
  We find a significant performance drop introduced by the commit
below, when the vhost example is started with --mergeable 0 and,
inside the VM, the kernel virtio-net driver is used to do IP-based
forwarding.

The commit, 859b480d5afd ("vhost: add guest offload setting"), adds
support for VIRTIO_NET_F_GUEST_TSO4 and VIRTIO_NET_F_GUEST_TSO6
in the vhost lib. But inside the vhost example, the way TSO is
disabled only covers the direction from virtio to vhost, not the
opposite direction. When mergeable is disabled, this triggers the
big_packets path of the virtio-net driver, which prepares to receive
possible big packets of up to 64K. Because mergeable is off, for
each entry of the avail ring, the virtio driver uses 19 descs
chained together: one desc pointing to the header and the other 18
descs pointing to 4K-sized pages. But QEMU only creates 256 desc
entries for each vq, so only 13 packets can be received at a time.
The VM kernel can quickly handle those packets and go to sleep
(HLT).

As QEMU has no option to set the number of desc entries of a vq, we
disable VIRTIO_NET_F_GUEST_TSO4 and VIRTIO_NET_F_GUEST_TSO6 along
with VIRTIO_NET_F_HOST_TSO4 and VIRTIO_NET_F_HOST_TSO6 when TSO is
disabled in the vhost example, to keep the VM kernel virtio driver
from going into the big_packets path.

Fixes: 9fd72e3cbd29 ("examples/vhost: add virtio offload")

Reported-by: Qian Xu <qian.q.xu@intel.com>
Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
---
v3: reword commit log.
v2: change the Fixes line to point to proper commit to fix.
 examples/vhost/main.c | 2 ++
 1 file changed, 2 insertions(+)
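
To make the descriptor arithmetic in the commit message concrete,
here is a minimal C sketch; the constants (256 desc entries per vq,
19 descs per big-packet chain) are taken from the commit message
itself, not queried from QEMU:

	#include <stdio.h>

	int main(void)
	{
		/* Desc entries QEMU creates for each vq (fixed; per the
		 * commit message, QEMU has no option to change this). */
		const int vq_size = 256;
		/* One desc for the virtio-net header plus 18 descs
		 * pointing to 4K pages, for a possible 64K big packet. */
		const int descs_per_pkt = 1 + 18;

		/* Integer division: only 13 big-packet chains fit in
		 * the avail ring at once. */
		printf("big packets in flight: %d\n",
		       vq_size / descs_per_pkt);
		return 0;
	}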
  

Comments

Yuanhan Liu July 21, 2016, 1:34 a.m. UTC | #1
On Thu, Jul 21, 2016 at 12:42:45AM +0000, Jianfeng Tan wrote:
> We find a significant performance drop introduced by the commit
> below, when the vhost example is started with --mergeable 0 and,
> inside the VM, the kernel virtio-net driver is used to do IP-based
> forwarding.
> 
> The commit, 859b480d5afd ("vhost: add guest offload setting"), adds
> support for VIRTIO_NET_F_GUEST_TSO4 and VIRTIO_NET_F_GUEST_TSO6
> in the vhost lib. But inside the vhost example, the way TSO is
> disabled only covers the direction from virtio to vhost, not the
> opposite direction. When mergeable is disabled, this triggers the
> big_packets path of the virtio-net driver, which prepares to receive
> possible big packets of up to 64K. Because mergeable is off, for
> each entry of the avail ring, the virtio driver uses 19 descs
> chained together: one desc pointing to the header and the other 18
> descs pointing to 4K-sized pages. But QEMU only creates 256 desc
> entries for each vq, so only 13 packets can be received at a time.
> The VM kernel can quickly handle those packets and go to sleep
> (HLT).
> 
> As QEMU has no option to set the number of desc entries of a vq, we
> disable VIRTIO_NET_F_GUEST_TSO4 and VIRTIO_NET_F_GUEST_TSO6 along
> with VIRTIO_NET_F_HOST_TSO4 and VIRTIO_NET_F_HOST_TSO6 when TSO is
> disabled in the vhost example, to keep the VM kernel virtio driver
> from going into the big_packets path.
> 
> Fixes: 9fd72e3cbd29 ("examples/vhost: add virtio offload")
> 
> Reported-by: Qian Xu <qian.q.xu@intel.com>
> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
> ---
> v3: reword commit log.

Yes, much better. One minor nit: you forgot to carry the Tested-by from
Qian.

Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>

Thanks.

	--yliu
  
Xu, Qian Q July 21, 2016, 1:38 a.m. UTC | #2
Add the tested-by:)

Tested-by: Qian Xu <qian.q.xu@intel.com>

- Test Commit: 608487f3fc96704271c624d0f3fe9d7fb2187aea
- OS/Kernel: Fedora 21/4.1.13
- GCC: gcc (GCC) 4.9.2 20141101 (Red Hat 4.9.2-1)
- CPU: Intel(R) Xeon(R) CPU E5-2695 v4 @ 2.10GHz
- NIC: Intel(R) Ethernet Controller X710 for 10GbE SFP+
- Total 2 cases, 2 passed, 0 failed. 

Test Case1: Virtio-net IPV4 fwd performance with mergeable=off

Summary: 
Launch the vhost-switch sample, then launch a VM with 2 virtio-net devices and let them run IPV4 fwd. Send traffic to the NIC port so that it goes through both virtio-net devices, and check the performance.

Details: 
1. Bind one port to igb_uio. 
2. Run the vhost-switch sample with --mergeable 0 to disable mergeable buffers: 
taskset -c 18-19 ./examples/vhost/build/vhost-switch -c 0xc0000 -n 4 \
    --huge-dir /mnt/huge --socket-mem 1024,1024 \
    -- -p 0x1 --mergeable 0 --vm2vm 0
3. Launch the VM: 
taskset -c 22-23 \
/root/qemu-versions/qemu-2.6.0/x86_64-softmmu/qemu-system-x86_64 -name vm1 \
 -cpu host -enable-kvm -m 2048 \
 -object memory-backend-file,id=mem,size=2048M,mem-path=/mnt/huge,share=on \
 -numa node,memdev=mem -mem-prealloc \
 -smp cores=4 -drive file=/home/img/vm1.img \
 -chardev socket,id=char0,path=./vhost-net \
 -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \
 -device virtio-net-pci,mac=52:54:00:00:00:01,netdev=mynet1,mrg_rxbuf=on \
 -chardev socket,id=char1,path=./vhost-net \
 -netdev type=vhost-user,id=mynet2,chardev=char1,vhostforce \
 -device virtio-net-pci,mac=52:54:00:00:00:02,netdev=mynet2,mrg_rxbuf=on \
 -netdev tap,id=ipvm1,ifname=tap3,script=/etc/qemu-ifup \
 -device rtl8139,netdev=ipvm1,id=net0,mac=00:00:00:00:10:01 \
 -vnc :3 -daemonize
4. Set IPV4 fwd rules in the VM: 
virtio1=$1
virtio2=$2
systemctl stop firewalld.service
systemctl disable firewalld.service
systemctl stop ip6tables.service
systemctl disable ip6tables.service
systemctl stop iptables.service
systemctl disable iptables.service
systemctl stop NetworkManager.service
echo 1 >/proc/sys/net/ipv4/ip_forward
ip addr add 192.168.1.2/24 dev $virtio1
ip neigh add 192.168.1.1 lladdr 00:00:10:00:24:00 dev $virtio1
ip link set dev $virtio1 up

ip addr add 192.168.2.2/24 dev $virtio2
ip neigh add 192.168.2.1 lladdr 00:00:10:00:24:01 dev $virtio2
ip link set dev $virtio2 up

5. Send traffic to the NIC and check the performance coming back from virtio2. With the patch, the performance is back. 

Test Case2: Virtio-net IPV4 fwd performance with mergeable=on

Summary: 
Similar steps; the only difference is setting --mergeable 1 in the vhost-switch sample. With mergeable on, the performance is as good as before.

  
Thomas Monjalon July 22, 2016, 9:59 a.m. UTC | #3
2016-07-21 09:34, Yuanhan Liu:
> On Thu, Jul 21, 2016 at 12:42:45AM +0000, Jianfeng Tan wrote:
> > We find a significant performance drop introduced by the commit
> > below, when the vhost example is started with --mergeable 0 and,
> > inside the VM, the kernel virtio-net driver is used to do IP-based
> > forwarding.
> > 
> > The commit, 859b480d5afd ("vhost: add guest offload setting"), adds
> > support for VIRTIO_NET_F_GUEST_TSO4 and VIRTIO_NET_F_GUEST_TSO6
> > in the vhost lib. But inside the vhost example, the way TSO is
> > disabled only covers the direction from virtio to vhost, not the
> > opposite direction. When mergeable is disabled, this triggers the
> > big_packets path of the virtio-net driver, which prepares to receive
> > possible big packets of up to 64K. Because mergeable is off, for
> > each entry of the avail ring, the virtio driver uses 19 descs
> > chained together: one desc pointing to the header and the other 18
> > descs pointing to 4K-sized pages. But QEMU only creates 256 desc
> > entries for each vq, so only 13 packets can be received at a time.
> > The VM kernel can quickly handle those packets and go to sleep
> > (HLT).
> > 
> > As QEMU has no option to set the number of desc entries of a vq, we
> > disable VIRTIO_NET_F_GUEST_TSO4 and VIRTIO_NET_F_GUEST_TSO6 along
> > with VIRTIO_NET_F_HOST_TSO4 and VIRTIO_NET_F_HOST_TSO6 when TSO is
> > disabled in the vhost example, to keep the VM kernel virtio driver
> > from going into the big_packets path.
> > 
> > Fixes: 9fd72e3cbd29 ("examples/vhost: add virtio offload")
> > 
> > Reported-by: Qian Xu <qian.q.xu@intel.com>
> > Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
> > ---
> > v3: reword commit log.
> 
> Yes, much better. One minor nit: you forgot to carry the Tested-by from
> Qian.
> 
> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>

Applied, thanks
  

Patch

diff --git a/examples/vhost/main.c b/examples/vhost/main.c
index 3b98f42..92a9823 100644
--- a/examples/vhost/main.c
+++ b/examples/vhost/main.c
@@ -327,6 +327,8 @@  port_init(uint8_t port)
 	if (enable_tso == 0) {
 		rte_vhost_feature_disable(1ULL << VIRTIO_NET_F_HOST_TSO4);
 		rte_vhost_feature_disable(1ULL << VIRTIO_NET_F_HOST_TSO6);
+		rte_vhost_feature_disable(1ULL << VIRTIO_NET_F_GUEST_TSO4);
+		rte_vhost_feature_disable(1ULL << VIRTIO_NET_F_GUEST_TSO6);
 	}
 
 	rx_rings = (uint16_t)dev_info.max_rx_queues;
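
For context, a minimal C sketch of the guest-side behavior the patch
works around, paraphrasing the virtio-net driver logic described in
the commit message; has_feature() and uses_big_packets() here are
illustrative stand-ins, not real kernel or DPDK APIs, though the
feature-bit values follow the virtio spec:

	#include <stdbool.h>
	#include <stdint.h>

	/* Feature bit positions from the virtio spec. */
	#define VIRTIO_NET_F_GUEST_TSO4  7
	#define VIRTIO_NET_F_GUEST_TSO6  8
	#define VIRTIO_NET_F_MRG_RXBUF  15

	static bool has_feature(uint64_t features, unsigned int bit)
	{
		return (features & (1ULL << bit)) != 0;
	}

	/*
	 * Simplified view: the guest driver prepares 64K "big" receive
	 * buffers (19-desc chains) when mergeable buffers are off but a
	 * GUEST_TSO offload is still negotiated. With the patch, the
	 * host also clears GUEST_TSO4/6 whenever TSO is disabled, so
	 * this returns false and the ring is not starved of descs.
	 */
	static bool uses_big_packets(uint64_t negotiated)
	{
		return !has_feature(negotiated, VIRTIO_NET_F_MRG_RXBUF) &&
		       (has_feature(negotiated, VIRTIO_NET_F_GUEST_TSO4) ||
		        has_feature(negotiated, VIRTIO_NET_F_GUEST_TSO6));
	}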