| Message ID | 1463748604-27251-1-git-send-email-i.maximets@samsung.com (mailing list archive) |
|---|---|
| State | Rejected, archived |
| Delegated to: | Yuanhan Liu |

Headers:
Return-Path: <dev-bounces@dpdk.org>
From: Ilya Maximets <i.maximets@samsung.com>
To: dev@dpdk.org, Huawei Xie <huawei.xie@intel.com>, Yuanhan Liu <yuanhan.liu@linux.intel.com>
Cc: Dyasly Sergey <s.dyasly@samsung.com>, Heetae Ahn <heetae82.ahn@samsung.com>, Jianfeng Tan <jianfeng.tan@intel.com>, Ilya Maximets <i.maximets@samsung.com>
Date: Fri, 20 May 2016 15:50:04 +0300
Message-id: <1463748604-27251-1-git-send-email-i.maximets@samsung.com>
X-Mailer: git-send-email 2.5.0
Subject: [dpdk-dev] [PATCH] vhost: fix segfault on bad descriptor address.
List-Id: patches and discussions about DPDK <dev.dpdk.org>
Sender: "dev" <dev-bounces@dpdk.org>
Commit Message
Ilya Maximets
May 20, 2016, 12:50 p.m. UTC
In the current implementation the guest application can reinitialize vrings
by executing start after stop. At the same time the host application can
still poll the virtqueue while the device is stopped in the guest, and it
will crash with a segmentation fault during vring reinitialization because
it dereferences bad descriptor addresses.

An OVS crash, for example:
<------------------------------------------------------------------------>
[test-pmd inside guest VM]
testpmd> port stop all
Stopping ports...
Checking link statuses...
Port 0 Link Up - speed 10000 Mbps - full-duplex
Done
testpmd> port config all rxq 2
testpmd> port config all txq 2
testpmd> port start all
Configuring Port 0 (socket 0)
Port 0: 52:54:00:CB:44:C8
Checking link statuses...
Port 0 Link Up - speed 10000 Mbps - full-duplex
Done
[OVS on host]
Program received signal SIGSEGV, Segmentation fault.
rte_memcpy (n=2056, src=0xc, dst=0x7ff4d5247000) at rte_memcpy.h
(gdb) bt
#0 rte_memcpy (n=2056, src=0xc, dst=0x7ff4d5247000)
#1 copy_desc_to_mbuf
#2 rte_vhost_dequeue_burst
#3 netdev_dpdk_vhost_rxq_recv
...
(gdb) bt full
#0 rte_memcpy
...
#1 copy_desc_to_mbuf
desc_addr = 0
mbuf_offset = 0
desc_offset = 12
...
<------------------------------------------------------------------------>
Fix that by checking the addresses of descriptors before using them.

Note: For mergeable buffers this patch checks only the guest's address for
zero, while in the non-mergeable case the host's address is checked. This is
done because checking the host's address in the mergeable case requires
additional refactoring to keep the virtqueue in a consistent state in case
of error.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
---
Actually, the current virtio implementation looks broken to me, because
'virtio_dev_start' breaks the virtqueue while it is still available from
the vhost side.

There were two patches about this behaviour:

1. a85786dc816f ("virtio: fix states handling during initialization")
2. 9a0615af7746 ("virtio: fix restart")

The second patch somewhat fixes the issue introduced by the first one, but
it also breaks vhost in the way described above.
It's not entirely clear to me what to do in the current situation with
virtio, because it will be broken for the guest application even if vhost
does not crash.

Maybe it would be better to forbid stopping of the virtio device and force
the user to exit and start again (perhaps implemented in a way hidden from
the user)?

This patch adds additional sanity checks, so it should be applied anyway, IMHO.
lib/librte_vhost/vhost_rxtx.c | 17 ++++++++++++-----
1 file changed, 12 insertions(+), 5 deletions(-)
Comments
On Fri, May 20, 2016 at 03:50:04PM +0300, Ilya Maximets wrote:
> In current implementation guest application can reinitialize vrings
> by executing start after stop. In the same time host application
> can still poll virtqueue while device stopped in guest and it will
> crash with segmentation fault while vring reinitialization because
> of dereferencing of bad descriptor addresses.
>
> [testpmd and OVS crash logs snipped]

I actually didn't manage to reproduce it on my side, with the
vhost-example instead of OVS though. Is that all the commands needed
to reproduce it, run just after starting test-pmd?

> This patch adds additional sane checks, so it should be applied anyway, IMHO.

Agreed.

	--yliu
On 23.05.2016 13:57, Yuanhan Liu wrote:
> On Fri, May 20, 2016 at 03:50:04PM +0300, Ilya Maximets wrote:
>> [patch description and logs snipped]
>
> I actually didn't manage to reproduce it on my side, with the
> vhost-example instead of OVS though. Is that all the commands
> to reproduce it, and run them just after start test-pmd?

Actually, I think packet flow should be enabled while performing the
above actions, and some traffic should already have been sent through
the port to change the last used idx on the vhost side.

Something like:
	start
	..wait a while.. see that packets are flowing.
	stop
	port stop
	port config
	port config
	port start
Ping.

Best regards, Ilya Maximets.

On 23.05.2016 14:04, Ilya Maximets wrote:
> On 23.05.2016 13:57, Yuanhan Liu wrote:
>> [snipped]
>
> Actually, I think, packet flow should be enabled while performing
> above actions and some traffic already should be sent through port
> to change last used idx on vhost side.
> [remainder snipped]
Hi,

> -----Original Message-----
> From: Ilya Maximets [mailto:i.maximets@samsung.com]
> Sent: Friday, May 20, 2016 8:50 PM
> To: dev@dpdk.org; Xie, Huawei; Yuanhan Liu
> Cc: Dyasly Sergey; Heetae Ahn; Tan, Jianfeng; Ilya Maximets
> Subject: [PATCH] vhost: fix segfault on bad descriptor address.
>
> [patch description and logs snipped]
>
> Note: For mergeable buffers this patch checks only guest's address for
> zero, but in non-meargeable case host's address checked. This is done
> because checking of host's address in mergeable case requires additional
> refactoring to keep virtqueue in consistent state in case of error.

I agree with you that it should be fixed, because a malicious guest could
launch a DoS attack on the vswitch with the current implementation.

But I don't understand why you do not fix the mergeable case in
copy_mbuf_to_desc_mergable(), where gpa_to_vva() happens? And does the
change in fill_vec_buf(), checking !vq->desc[idx].addr, make any sense?

Thanks,
Jianfeng
On 30.05.2016 15:00, Tan, Jianfeng wrote:
> Hi,
>
> [quoted patch description snipped]
>
> I agree with you that it should be fixed because malicious guest could launch
> DOS attack on vswitch with the current implementation.
>
> But I don't understand why you do not fix the mergable case in
> copy_mbuf_to_desc_mergable() on where gpa_to_vva() happens? And the change in
> fill_vec_buf(), checking !vq->desc[idx].addr, make any sense?

Hi.
As I said in the commit message, checking the host's address in the mergeable
case requires additional refactoring to keep the virtqueue in a consistent
state.

There are a few issues with checking inside copy_mbuf_to_desc_mergable():

	1. Ring elements are already reserved, and we must fill them with
	   some sane data before returning from virtio_dev_merge_rx().

	2. copy_mbuf_to_desc_mergable() can't return an error in the current
	   implementation (additional checking is needed), otherwise
	   used->idx will be decremented (I think that's bad).

Checking !vq->desc[idx].addr inside fill_vec_buf() makes sense in case of
virtio reinitialization, because the guest's address will be zero (the case
described in the commit message). Checking the guest's address will not help
in case of a bad but non-NULL address, but it is useful in this common case.
Also, we can't catch a bad address that we are able to map, so a malicious
guest could break vhost anyway.

I agree that checking the host's address is better, but this may be done
later together with resolving the above issues.

Best regards, Ilya Maximets.
Hi Ilya,

Generally speaking, this patch looks good to me. But I guess I still need
more time to check this issue; I still failed to reproduce it on my side
after all. So, please allow a late merge.

Thanks.

	--yliu

On Mon, May 30, 2016 at 02:05:07PM +0300, Ilya Maximets wrote:
> Ping.
> [remainder snipped]
Hi,

On 5/30/2016 8:24 PM, Ilya Maximets wrote:
> [quoted discussion snipped]
>
> 2. copy_mbuf_to_desc_mergable() can't return an error in current
> implementation (additional checking needed), otherwise used->idx
> will be decremented (I think, it's bad).

Yes, currently there is no way to return these invalid descs back to virtio,
because there's no invalid flag in virtio_net_hdr to indicate that a desc
contains no pkt. I see the kernel just skips descriptors with a bad addr;
I think it relies on a reset of the virtio device to recover from such a
situation.

Another thing is that your patch only checks desc->addr, but we should
check desc->addr + desc->len too, right?

Thanks,
Jianfeng
On 31.05.2016 09:53, Tan, Jianfeng wrote: > Hi, > > > On 5/30/2016 8:24 PM, Ilya Maximets wrote: >> On 30.05.2016 15:00, Tan, Jianfeng wrote: >>> Hi, >>> >>>> -----Original Message----- >>>> From: Ilya Maximets [mailto:i.maximets@samsung.com] >>>> Sent: Friday, May 20, 2016 8:50 PM >>>> To: dev@dpdk.org; Xie, Huawei; Yuanhan Liu >>>> Cc: Dyasly Sergey; Heetae Ahn; Tan, Jianfeng; Ilya Maximets >>>> Subject: [PATCH] vhost: fix segfault on bad descriptor address. >>>> >>>> In current implementation guest application can reinitialize vrings >>>> by executing start after stop. In the same time host application >>>> can still poll virtqueue while device stopped in guest and it will >>>> crash with segmentation fault while vring reinitialization because >>>> of dereferencing of bad descriptor addresses. >>>> >>>> OVS crash for example: >>>> <------------------------------------------------------------------------> >>>> [test-pmd inside guest VM] >>>> >>>> testpmd> port stop all >>>> Stopping ports... >>>> Checking link statuses... >>>> Port 0 Link Up - speed 10000 Mbps - full-duplex >>>> Done >>>> testpmd> port config all rxq 2 >>>> testpmd> port config all txq 2 >>>> testpmd> port start all >>>> Configuring Port 0 (socket 0) >>>> Port 0: 52:54:00:CB:44:C8 >>>> Checking link statuses... >>>> Port 0 Link Up - speed 10000 Mbps - full-duplex >>>> Done >>>> >>>> [OVS on host] >>>> Program received signal SIGSEGV, Segmentation fault. >>>> rte_memcpy (n=2056, src=0xc, dst=0x7ff4d5247000) at >>>> rte_memcpy.h >>>> >>>> (gdb) bt >>>> #0 rte_memcpy (n=2056, src=0xc, dst=0x7ff4d5247000) >>>> #1 copy_desc_to_mbuf >>>> #2 rte_vhost_dequeue_burst >>>> #3 netdev_dpdk_vhost_rxq_recv >>>> ... >>>> >>>> (gdb) bt full >>>> #0 rte_memcpy >>>> ... >>>> #1 copy_desc_to_mbuf >>>> desc_addr = 0 >>>> mbuf_offset = 0 >>>> desc_offset = 12 >>>> ... 
>>>> <------------------------------------------------------------------------> >>>> >>>> Fix that by checking addresses of descriptors before using them. >>>> >>>> Note: For mergeable buffers this patch checks only guest's address for >>>> zero, but in non-meargeable case host's address checked. This is done >>>> because checking of host's address in mergeable case requires additional >>>> refactoring to keep virtqueue in consistent state in case of error. >>> >>> I agree with you that it should be fixed because malicious guest could launch >>> DOS attack on vswitch with the current implementation. >>> >>> But I don't understand why you do not fix the mergable case in >>> copy_mbuf_to_desc_mergable() on where gpa_to_vva() happens? And the change in >>> fill_vec_buf(), checking !vq->desc[idx].addr, make any sense? >>> >>> Thanks, >>> Jianfeng >> Hi. >> As I said inside commit-message, checking of host's address in mergeable case >> requires additional refactoring to keep virtqueue in consistent state. >> >> There are few issues with checking inside copy_mbuf_to_desc_mergable() : >> >> 1. Ring elements already reserved and we must fill them with some >> sane data before going out of virtio_dev_merge_rx(). >> >> 2. copy_mbuf_to_desc_mergable() can't return an error in current >> implementation (additional checking needed), otherwise used->idx >> will be decremented (I think, it's bad). > > Yes, currently there is no way to return these invalid desc back to virtio because there's no invalid flag in virtio_net_hdr to indicate this desc contains no pkt. I see kernel just skips those descriptors with bad addr. I think it may rely on reset of the virtio device to improve such situation. > > Another thing is that, your patch only checks the desc->addr, but we should check desc->addr + desc->len too, right? To do it fast we need to check whole range inside gpa_to_vva(), but even more refactoring is required for that. 
Also, this can be a different patch, because checking addr + len is not
required to fix the original issue with virtio reconfiguration.

> Thanks,
> Jianfeng
>
>>
>> Checking !vq->desc[idx].addr inside fill_vec_buf() make sense in case of virtio
>> reinitialization, because guest's address will be zero (case described in
>> commit-message). Checking of guest's address will not help in case of bad and
>> not NULL address, but useful in this common case.
>> Also, we can't catch bad address what we able to map, so, malicious guest could
>> break vhost anyway.
>>
>> I agree, that checking of host's address is better, but this may be done later
>> together with resolving above issues.
>>
>> Best regards, Ilya Maximets.
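[Editor's note] The range check discussed above can be illustrated with a small
standalone sketch. This is not the actual DPDK implementation: the region table
layout and the function name are simplified stand-ins. It only shows how
validating desc->addr together with desc->len inside the translation rejects
both unmapped addresses and descriptors that run past the end of a region:

```c
#include <stdint.h>
#include <stddef.h>

/* Simplified guest memory region, loosely modeled on the vhost-user
 * memory table; the field names here are illustrative. */
struct guest_region {
    uint64_t guest_phys_addr;
    uint64_t size;
    uint64_t host_user_addr;
};

/* Translate guest physical address 'gpa' to a host virtual address, but
 * only if the whole [gpa, gpa + len) range fits inside one registered
 * region.  Returns 0 on any out-of-range access, so the caller can drop
 * the descriptor instead of dereferencing garbage.  The subtraction-based
 * comparison avoids overflow on huge 'gpa' or 'len' values. */
static uint64_t
gpa_to_vva_checked(const struct guest_region *regions, size_t nregions,
                   uint64_t gpa, uint64_t len)
{
    for (size_t i = 0; i < nregions; i++) {
        const struct guest_region *r = &regions[i];

        if (gpa >= r->guest_phys_addr &&
            len <= r->size &&
            gpa - r->guest_phys_addr <= r->size - len)
            return gpa - r->guest_phys_addr + r->host_user_addr;
    }
    return 0;   /* address (or range) not backed by guest memory */
}
```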
OK.

On 30.05.2016 17:25, Yuanhan Liu wrote:
> Hi Ilya,
>
> Generically speaking, this patch looks good to me. But I guess still
> need more time to check this issue later; I still failed to reproduce
> it on my side after all. So, please allow a late merge.
>
> Thanks.
>
> 	--yliu
>
> On Mon, May 30, 2016 at 02:05:07PM +0300, Ilya Maximets wrote:
>> Ping.
>>
>> Best regards, Ilya Maximets.
>>
>> On 23.05.2016 14:04, Ilya Maximets wrote:
>>> On 23.05.2016 13:57, Yuanhan Liu wrote:
>>>> On Fri, May 20, 2016 at 03:50:04PM +0300, Ilya Maximets wrote:
>>>>> In current implementation guest application can reinitialize vrings
>>>>> by executing start after stop. In the same time host application
>>>>> can still poll virtqueue while device stopped in guest and it will
>>>>> crash with segmentation fault while vring reinitialization because
>>>>> of dereferencing of bad descriptor addresses.
>>>>>
>>>>> OVS crash for example:
>>>>> <------------------------------------------------------------------------>
>>>>> [test-pmd inside guest VM]
>>>>>
>>>>> testpmd> port stop all
>>>>> Stopping ports...
>>>>> Checking link statuses...
>>>>> Port 0 Link Up - speed 10000 Mbps - full-duplex
>>>>> Done
>>>>> testpmd> port config all rxq 2
>>>>> testpmd> port config all txq 2
>>>>> testpmd> port start all
>>>>> Configuring Port 0 (socket 0)
>>>>> Port 0: 52:54:00:CB:44:C8
>>>>> Checking link statuses...
>>>>> Port 0 Link Up - speed 10000 Mbps - full-duplex
>>>>> Done
>>>>
>>>> I actually didn't manage to reproduce it on my side, with the
>>>> vhost-example instead of OVS though. Is that all the commands
>>>> to reproduce it, and run them just after start test-pmd?
>>>
>>> Actually, I think, packet flow should be enabled while performing
>>> above actions and some traffic already should be sent through port
>>> to change last used idx on vhost side.
>>>
>>> Something like:
>>>     start
>>>     ..wait a while.. see that packets are flowing.
>>>     stop
>>>     port stop
>>>     port config
>>>     port config
>>>     port start
>>>>
>>>>> [OVS on host]
>>>>> Program received signal SIGSEGV, Segmentation fault.
>>>>> rte_memcpy (n=2056, src=0xc, dst=0x7ff4d5247000) at rte_memcpy.h
>>>>>
>>>>> (gdb) bt
>>>>> #0  rte_memcpy (n=2056, src=0xc, dst=0x7ff4d5247000)
>>>>> #1  copy_desc_to_mbuf
>>>>> #2  rte_vhost_dequeue_burst
>>>>> #3  netdev_dpdk_vhost_rxq_recv
>>>>> ...
>>>>>
>>>>> (gdb) bt full
>>>>> #0  rte_memcpy
>>>>> ...
>>>>> #1  copy_desc_to_mbuf
>>>>>         desc_addr = 0
>>>>>         mbuf_offset = 0
>>>>>         desc_offset = 12
>>>>> ...
>>>>> <------------------------------------------------------------------------>
>>>>>
>>>>> Fix that by checking addresses of descriptors before using them.
>>>>>
>>>>> Note: For mergeable buffers this patch checks only guest's address for
>>>>> zero, but in non-meargeable case host's address checked. This is done
>>>>> because checking of host's address in mergeable case requires additional
>>>>> refactoring to keep virtqueue in consistent state in case of error.
>>>>>
>>>>> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
>>>>> ---
>>>>>
>>>>> Actually, current virtio implementation looks broken for me. Because
>>>>> 'virtio_dev_start' breaks virtqueue while it still available from the vhost
>>>>> side.
>>>>>
>>>>> There was 2 patches about this behaviour:
>>>>>
>>>>> 1. a85786dc816f ("virtio: fix states handling during initialization")
>>>>> 2. 9a0615af7746 ("virtio: fix restart")
>>>>>
>>>>> The second patch fixes somehow issue intoduced in the first patch, but
>>>>> actually also breaks vhost in the way described above.
>>>>> It's not pretty clear for me what to do in current situation with virtio,
>>>>> because it will be broken for guest application even if vhost will not crash.
>>>>>
>>>>> May be it'll be better to forbid stopping of virtio device and force user to
>>>>> exit and start again (may be implemented in hidden from user way)?
>>>>>
>>>>> This patch adds additional sane checks, so it should be applied anyway, IMHO.
>>>>
>>>> Agreed.
>>>>
>>>> 	--yliu
On Fri, May 20, 2016 at 5:50 AM, Ilya Maximets <i.maximets@samsung.com> wrote:

> In current implementation guest application can reinitialize vrings
> by executing start after stop. In the same time host application
> can still poll virtqueue while device stopped in guest and it will
> crash with segmentation fault while vring reinitialization because
> of dereferencing of bad descriptor addresses.
>

I see a performance regression with this patch at large packet sizes (> 768
bytes). rte_vhost_enqueue_burst is consuming 10% more cycles. Strangely,
there's actually a ~1% performance improvement at small packet sizes.

The regression happens with GCC 4.8.4 and 5.3.0, but not 6.1.1.

AFAICT this is just the compiler generating bad code. One difference is that
it's storing the offset on the stack instead of in a register. A workaround
is to move the !desc_addr check outside the unlikely macros.

>     --- a/lib/librte_vhost/vhost_rxtx.c
>     +++ b/lib/librte_vhost/vhost_rxtx.c
>     @@ -147,10 +147,10 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
>          struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, 0, 0, 0, 0, 0}, 0};
>
>          desc = &vq->desc[desc_idx];
>     -    if (unlikely(desc->len < vq->vhost_hlen))
>     +    desc_addr = gpa_to_vva(dev, desc->addr);
>     +    if (unlikely(desc->len < vq->vhost_hlen || !desc_addr))

Workaround: change to "if (unlikely(desc->len < vq->vhost_hlen) || !desc_addr)".

>          return -1;
>
>     -    desc_addr = gpa_to_vva(dev, desc->addr);
>          rte_prefetch0((void *)(uintptr_t)desc_addr);
>
>          virtio_enqueue_offload(m, &virtio_hdr.hdr);
>     @@ -184,6 +184,9 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
>
>          desc = &vq->desc[desc->next];
>          desc_addr = gpa_to_vva(dev, desc->addr);
>     +    if (unlikely(!desc_addr))

Workaround: change to "if (!desc_addr)".

>     +        return -1;
>     +
>          desc_offset = 0;
>          desc_avail = desc->len;
>      }
>
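[Editor's note] For readers following the workaround: the two condition shapes
are behaviorally identical and differ only in how much of the test is marked
cold for the compiler. A minimal self-contained sketch — the unlikely()
definition mirrors DPDK's rte_branch_prediction.h, while the two functions are
toy stand-ins, not the real copy_mbuf_to_desc():

```c
#include <stdint.h>

/* GCC/Clang branch-prediction hint, as defined (modulo naming) in DPDK's
 * rte_branch_prediction.h. */
#define unlikely(x)  __builtin_expect(!!(x), 0)

/* Shape used by the patch: the whole disjunction is hinted cold. */
static int
check_combined(uint32_t len, uint32_t hlen, uint64_t desc_addr)
{
    if (unlikely(len < hlen || !desc_addr))
        return -1;
    return 0;
}

/* Shape from the workaround: only the first test is hinted cold, which
 * reportedly let GCC 4.8/5.x keep the offset in a register.  Both
 * functions return the same result for every input; they differ only in
 * the code the compiler may generate. */
static int
check_split(uint32_t len, uint32_t hlen, uint64_t desc_addr)
{
    if (unlikely(len < hlen) || !desc_addr)
        return -1;
    return 0;
}
```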
Hi, Rich.
Thank you for testing and analysing.

On 01.06.2016 01:06, Rich Lane wrote:
> On Fri, May 20, 2016 at 5:50 AM, Ilya Maximets <i.maximets@samsung.com
> <mailto:i.maximets@samsung.com>> wrote:
>
>     In current implementation guest application can reinitialize vrings
>     by executing start after stop. In the same time host application
>     can still poll virtqueue while device stopped in guest and it will
>     crash with segmentation fault while vring reinitialization because
>     of dereferencing of bad descriptor addresses.
>
> I see a performance regression with this patch at large packet sizes (> 768
> bytes). rte_vhost_enqueue_burst is consuming 10% more cycles. Strangely,
> there's actually a ~1% performance improvement at small packet sizes.
>
> The regression happens with GCC 4.8.4 and 5.3.0, but not 6.1.1.
>
> AFAICT this is just the compiler generating bad code. One difference is that
> it's storing the offset on the stack instead of in a register. A workaround
> is to move the !desc_addr check outside the unlikely macros.
>
>     --- a/lib/librte_vhost/vhost_rxtx.c
>     +++ b/lib/librte_vhost/vhost_rxtx.c
>     @@ -147,10 +147,10 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
>          struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, 0, 0, 0, 0, 0}, 0};
>
>          desc = &vq->desc[desc_idx];
>     -    if (unlikely(desc->len < vq->vhost_hlen))
>     +    desc_addr = gpa_to_vva(dev, desc->addr);
>     +    if (unlikely(desc->len < vq->vhost_hlen || !desc_addr))
>
> Workaround: change to "if (unlikely(desc->len < vq->vhost_hlen) || !desc_addr)".
>
>          return -1;
>
>     -    desc_addr = gpa_to_vva(dev, desc->addr);
>          rte_prefetch0((void *)(uintptr_t)desc_addr);
>
>          virtio_enqueue_offload(m, &virtio_hdr.hdr);
>     @@ -184,6 +184,9 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
>
>          desc = &vq->desc[desc->next];
>          desc_addr = gpa_to_vva(dev, desc->addr);
>     +    if (unlikely(!desc_addr))
>
> Workaround: change to "if (!desc_addr)".
>
>     +        return -1;
>     +
>          desc_offset = 0;
>          desc_avail = desc->len;
>      }
>

What about other places? Are there the same issues elsewhere, or is it only
inside copy_mbuf_to_desc()?

Best regards, Ilya Maximets.
On Thu, Jun 2, 2016 at 3:46 AM, Ilya Maximets <i.maximets@samsung.com> wrote: > Hi, Rich. > Thank you for testing and analysing. > > On 01.06.2016 01:06, Rich Lane wrote: > > On Fri, May 20, 2016 at 5:50 AM, Ilya Maximets <i.maximets@samsung.com > <mailto:i.maximets@samsung.com>> wrote: > > > > In current implementation guest application can reinitialize vrings > > by executing start after stop. In the same time host application > > can still poll virtqueue while device stopped in guest and it will > > crash with segmentation fault while vring reinitialization because > > of dereferencing of bad descriptor addresses. > > > > > > I see a performance regression with this patch at large packet sizes (> > 768 bytes). rte_vhost_enqueue_burst is consuming 10% more cycles. > Strangely, there's actually a ~1% performance improvement at small packet > sizes. > > > > The regression happens with GCC 4.8.4 and 5.3.0, but not 6.1.1. > > > > AFAICT this is just the compiler generating bad code. One difference is > that it's storing the offset on the stack instead of in a register. A > workaround is to move the !desc_addr check outside the unlikely macros. > > > > --- a/lib/librte_vhost/vhost_rxtx.c > > +++ b/lib/librte_vhost/vhost_rxtx.c > > @@ -147,10 +147,10 @@ copy_mbuf_to_desc(struct virtio_net *dev, > struct vhost_virtqueue *vq, > > struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, 0, 0, 0, > 0, 0}, 0}; > > > > desc = &vq->desc[desc_idx]; > > - if (unlikely(desc->len < vq->vhost_hlen)) > > + desc_addr = gpa_to_vva(dev, desc->addr); > > + if (unlikely(desc->len < vq->vhost_hlen || !desc_addr)) > > > > > > Workaround: change to "if (unlikely(desc->len < vq->vhost_hlen) || > !desc_addr)". 
> > > > return -1; > > > > > > - desc_addr = gpa_to_vva(dev, desc->addr); > > rte_prefetch0((void *)(uintptr_t)desc_addr); > > > > virtio_enqueue_offload(m, &virtio_hdr.hdr); > > @@ -184,6 +184,9 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct > vhost_virtqueue *vq, > > > > desc = &vq->desc[desc->next]; > > desc_addr = gpa_to_vva(dev, desc->addr); > > + if (unlikely(!desc_addr)) > > > > > > Workaround: change to "if (!desc_addr)". > > > > > > + return -1; > > + > > desc_offset = 0; > > desc_avail = desc->len; > > } > > > > What about other places? Is there same issues or it's only inside > copy_mbuf_to_desc() ? > Only copy_mbuf_to_desc has the issue.
On 02.06.2016 19:22, Rich Lane wrote: > On Thu, Jun 2, 2016 at 3:46 AM, Ilya Maximets <i.maximets@samsung.com <mailto:i.maximets@samsung.com>> wrote: > > Hi, Rich. > Thank you for testing and analysing. > > On 01.06.2016 01:06, Rich Lane wrote: > > On Fri, May 20, 2016 at 5:50 AM, Ilya Maximets <i.maximets@samsung.com <mailto:i.maximets@samsung.com> <mailto:i.maximets@samsung.com <mailto:i.maximets@samsung.com>>> wrote: > > > > In current implementation guest application can reinitialize vrings > > by executing start after stop. In the same time host application > > can still poll virtqueue while device stopped in guest and it will > > crash with segmentation fault while vring reinitialization because > > of dereferencing of bad descriptor addresses. > > > > > > I see a performance regression with this patch at large packet sizes (> 768 bytes). rte_vhost_enqueue_burst is consuming 10% more cycles. Strangely, there's actually a ~1% performance improvement at small packet sizes. > > > > The regression happens with GCC 4.8.4 and 5.3.0, but not 6.1.1. > > > > AFAICT this is just the compiler generating bad code. One difference is that it's storing the offset on the stack instead of in a register. A workaround is to move the !desc_addr check outside the unlikely macros. > > > > --- a/lib/librte_vhost/vhost_rxtx.c > > +++ b/lib/librte_vhost/vhost_rxtx.c > > @@ -147,10 +147,10 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq, > > struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, 0, 0, 0, 0, 0}, 0}; > > > > desc = &vq->desc[desc_idx]; > > - if (unlikely(desc->len < vq->vhost_hlen)) > > + desc_addr = gpa_to_vva(dev, desc->addr); > > + if (unlikely(desc->len < vq->vhost_hlen || !desc_addr)) > > > > > > Workaround: change to "if (unlikely(desc->len < vq->vhost_hlen) || !desc_addr)". 
>
>          return -1;
>
>     -    desc_addr = gpa_to_vva(dev, desc->addr);
>          rte_prefetch0((void *)(uintptr_t)desc_addr);
>
>          virtio_enqueue_offload(m, &virtio_hdr.hdr);
>     @@ -184,6 +184,9 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
>
>          desc = &vq->desc[desc->next];
>          desc_addr = gpa_to_vva(dev, desc->addr);
>     +    if (unlikely(!desc_addr))
>
> Workaround: change to "if (!desc_addr)".
>
>     +        return -1;
>     +
>          desc_offset = 0;
>          desc_avail = desc->len;
>      }
>
>> What about other places? Is there same issues or it's only inside
>> copy_mbuf_to_desc() ?
>
> Only copy_mbuf_to_desc has the issue.

Ok.

Actually, I can't reproduce this performance issue using gcc 4.8.5 from
RHEL 7.2. I'm not sure if I should post v2 with the above fixes. Maybe they
could be applied while pushing the patch to the repository?

Best regards, Ilya Maximets.
Hi,

Sorry for the long delay.

On Fri, May 20, 2016 at 03:50:04PM +0300, Ilya Maximets wrote:
> In current implementation guest application can reinitialize vrings
> by executing start after stop. In the same time host application
> can still poll virtqueue while device stopped in guest and it will
> crash with segmentation fault while vring reinitialization because
> of dereferencing of bad descriptor addresses.

Yes, you are right that the vring will be reinitialized after restart.
But even so, I don't see the reason it would cause a vhost crash, since
the reinitialization resets all the vring memory to 0:

    memset(vq->vq_ring_virt_mem, 0, vq->vq_ring_size);

That means those bad descriptors will be skipped, safely, at the vhost
side by:

    if (unlikely(desc->len < dev->vhost_hlen))
        return -1;

>
> OVS crash for example:
> <------------------------------------------------------------------------>
> [test-pmd inside guest VM]
>
> testpmd> port stop all
> Stopping ports...
> Checking link statuses...
> Port 0 Link Up - speed 10000 Mbps - full-duplex
> Done
> testpmd> port config all rxq 2
> testpmd> port config all txq 2
> testpmd> port start all
> Configuring Port 0 (socket 0)
> Port 0: 52:54:00:CB:44:C8
> Checking link statuses...
> Port 0 Link Up - speed 10000 Mbps - full-duplex
> Done
>
> [OVS on host]
> Program received signal SIGSEGV, Segmentation fault.
> rte_memcpy (n=2056, src=0xc, dst=0x7ff4d5247000) at rte_memcpy.h

Interesting, so it bypasses the above check since desc->len is non-zero
while desc->addr is zero. The size (2056) also looks weird.

Do you mind checking this issue a bit deeper, say, why desc->addr is
zero while desc->len is not?

> (gdb) bt
> #0  rte_memcpy (n=2056, src=0xc, dst=0x7ff4d5247000)
> #1  copy_desc_to_mbuf
> #2  rte_vhost_dequeue_burst
> #3  netdev_dpdk_vhost_rxq_recv
> ...
>
> (gdb) bt full
> #0  rte_memcpy
> ...
> #1  copy_desc_to_mbuf
>         desc_addr = 0
>         mbuf_offset = 0
>         desc_offset = 12
> ...
> <------------------------------------------------------------------------>
>
> Fix that by checking addresses of descriptors before using them.
>
> Note: For mergeable buffers this patch checks only guest's address for
> zero, but in non-meargeable case host's address checked. This is done
> because checking of host's address in mergeable case requires additional
> refactoring to keep virtqueue in consistent state in case of error.
>
> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
> ---
>
> Actually, current virtio implementation looks broken for me. Because
> 'virtio_dev_start' breaks virtqueue while it still available from the vhost
> side.

Yes, this sounds buggy. Maybe we could avoid resetting the avail idx; in
that case vhost dequeue/enqueue will just return, as there are no more
packets to dequeue and no more space to enqueue, respectively?

	--yliu
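[Editor's note] The "vhost will just return" behavior Yuanhan describes follows
directly from ring-index arithmetic. A tiny sketch — simplified, not the DPDK
code — of how a vhost-style consumer decides whether there is any work:

```c
#include <stdint.h>

/* A vhost-style consumer compares the guest-written avail index with its
 * own last-seen index using wrapping 16-bit arithmetic (the virtio ring
 * convention).  If a guest restart preserved avail->idx, the difference
 * stays zero and no descriptor is ever dereferenced. */
static uint16_t
pending_entries(uint16_t avail_idx, uint16_t last_used_idx)
{
    /* uint16_t subtraction wraps modulo 65536, so this stays correct
     * across index wrap-around. */
    return (uint16_t)(avail_idx - last_used_idx);
}
```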
On 01.07.2016 10:35, Yuanhan Liu wrote: > Hi, > > Sorry for the long delay. > > On Fri, May 20, 2016 at 03:50:04PM +0300, Ilya Maximets wrote: >> In current implementation guest application can reinitialize vrings >> by executing start after stop. In the same time host application >> can still poll virtqueue while device stopped in guest and it will >> crash with segmentation fault while vring reinitialization because >> of dereferencing of bad descriptor addresses. > > Yes, you are right that vring will be reinitialized after restart. > But even though, I don't see the reason it will cause a vhost crash, > since the reinitialization will reset all the vring memeory by 0: > > memset(vq->vq_ring_virt_mem, 0, vq->vq_ring_size); > > That means those bad descriptors will be skipped, safely, at vhost > side by: > > if (unlikely(desc->len < dev->vhost_hlen)) > return -1; > >> >> OVS crash for example: >> <------------------------------------------------------------------------> >> [test-pmd inside guest VM] >> >> testpmd> port stop all >> Stopping ports... >> Checking link statuses... >> Port 0 Link Up - speed 10000 Mbps - full-duplex >> Done >> testpmd> port config all rxq 2 >> testpmd> port config all txq 2 >> testpmd> port start all >> Configuring Port 0 (socket 0) >> Port 0: 52:54:00:CB:44:C8 >> Checking link statuses... >> Port 0 Link Up - speed 10000 Mbps - full-duplex >> Done >> >> [OVS on host] >> Program received signal SIGSEGV, Segmentation fault. >> rte_memcpy (n=2056, src=0xc, dst=0x7ff4d5247000) at rte_memcpy.h > > Interesting, so it bypasses the above check since desc->len is non-zero > while desc->addr is zero. The size (2056) also looks weird. > > Do you mind to check this issue a bit deeper, say why desc->addr is > zero, however, desc->len is not? OK. I checked this few more times. Actually, I see, that desc->addr is not zero. 
Actually, I see that desc->addr is not zero.
All desc memory looks like some rubbish:

<------------------------------------------------------------------------------>
(gdb)
#3  copy_desc_to_mbuf (mbuf_pool=0x7fe9da9f4480, desc_idx=65363,
        m=0x7fe9db269400, vq=0x7fe9fff7bac0, dev=0x7fe9fff7cbc0)
        desc = 0x2aabc00ff530
        desc_addr = 0
        mbuf_offset = 0
        prev = 0x7fe9db269400
        nr_desc = 1
        desc_offset = 12
        cur = 0x7fe9db269400
        hdr = 0x0
        desc_avail = 1012591375
        mbuf_avail = 1526
        cpy_len = 1526

(gdb) p *desc
$2 = {addr = 8507655620301055744, len = 1012591387, flags = 3845, next = 48516}
<------------------------------------------------------------------------------>

And 'desc_addr' equals zero because 'gpa_to_vva' just can't map this huge
address to host's.

Scenario was the same. SIGSEGV received right after 'port start all'.

Another thought:

Actually, there is a race window between 'memset' in guest and reading
of 'desc->len' and 'desc->addr' on host. So, it's possible to read non
zero 'len' and zero 'addr' right after that.

But you're right, this case should be very rare.

>
>> (gdb) bt
>> #0  rte_memcpy (n=2056, src=0xc, dst=0x7ff4d5247000)
>> #1  copy_desc_to_mbuf
>> #2  rte_vhost_dequeue_burst
>> #3  netdev_dpdk_vhost_rxq_recv
>> ...
>>
>> (gdb) bt full
>> #0  rte_memcpy
>> ...
>> #1  copy_desc_to_mbuf
>>         desc_addr = 0
>>         mbuf_offset = 0
>>         desc_offset = 12
>> ...
>> <------------------------------------------------------------------------>
>>
>> Fix that by checking addresses of descriptors before using them.
>>
>> Note: For mergeable buffers this patch checks only guest's address for
>> zero, but in non-meargeable case host's address checked. This is done
>> because checking of host's address in mergeable case requires additional
>> refactoring to keep virtqueue in consistent state in case of error.
>>
>> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
>> ---
>>
>> Actually, current virtio implementation looks broken for me.
>> Because
>> 'virtio_dev_start' breaks virtqueue while it still available from the vhost
>> side.
>
> Yes, this sounds buggy. Maybe we could not reset the avail idx, in such
> case vhost dequeue/enqueue will just return as there are no more packets
> to dequeue and no more space to enqueue, respectively?

Maybe this would be a good fix for virtio, because vhost will not try to
receive from wrong descriptors. But this will not help if vhost is already
trying to receive something at the time of the guest's reconfiguration.

Best regards, Ilya Maximets.
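[Editor's note] The race window Ilya describes is why a defensive vhost should
snapshot the descriptor fields and validate them together before use. A rough
sketch — the struct layout follows the virtio specification, but the function
itself is illustrative, not the DPDK code:

```c
#include <stdint.h>

/* Descriptor layout as defined by the virtio specification. */
struct vring_desc {
    uint64_t addr;   /* guest-physical buffer address */
    uint32_t len;
    uint16_t flags;
    uint16_t next;
};

/* Illustrative defensive read: copy the fields once into locals, then
 * validate both together.  If the guest is concurrently reinitializing
 * the ring (e.g. a memset during device restart), the worst outcome is
 * a dropped descriptor, never a dereference of an unmapped address.
 * 'volatile' models guest memory changing underneath us; real code
 * would add proper atomics/barriers on top of this. */
static int
snapshot_desc(const volatile struct vring_desc *d, uint32_t vhost_hlen,
              uint64_t *addr, uint32_t *len)
{
    uint64_t a = d->addr;
    uint32_t l = d->len;

    if (a == 0 || l < vhost_hlen)
        return -1;      /* looks like a (partially) reset ring: skip it */

    *addr = a;
    *len = l;
    return 0;
}
```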
On Wed, Jul 06, 2016 at 02:19:12PM +0300, Ilya Maximets wrote: > On 01.07.2016 10:35, Yuanhan Liu wrote: > > Hi, > > > > Sorry for the long delay. > > > > On Fri, May 20, 2016 at 03:50:04PM +0300, Ilya Maximets wrote: > >> In current implementation guest application can reinitialize vrings > >> by executing start after stop. In the same time host application > >> can still poll virtqueue while device stopped in guest and it will > >> crash with segmentation fault while vring reinitialization because > >> of dereferencing of bad descriptor addresses. > > > > Yes, you are right that vring will be reinitialized after restart. > > But even though, I don't see the reason it will cause a vhost crash, > > since the reinitialization will reset all the vring memeory by 0: > > > > memset(vq->vq_ring_virt_mem, 0, vq->vq_ring_size); > > > > That means those bad descriptors will be skipped, safely, at vhost > > side by: > > > > if (unlikely(desc->len < dev->vhost_hlen)) > > return -1; > > > >> > >> OVS crash for example: > >> <------------------------------------------------------------------------> > >> [test-pmd inside guest VM] > >> > >> testpmd> port stop all > >> Stopping ports... > >> Checking link statuses... > >> Port 0 Link Up - speed 10000 Mbps - full-duplex > >> Done > >> testpmd> port config all rxq 2 > >> testpmd> port config all txq 2 > >> testpmd> port start all > >> Configuring Port 0 (socket 0) > >> Port 0: 52:54:00:CB:44:C8 > >> Checking link statuses... > >> Port 0 Link Up - speed 10000 Mbps - full-duplex > >> Done > >> > >> [OVS on host] > >> Program received signal SIGSEGV, Segmentation fault. > >> rte_memcpy (n=2056, src=0xc, dst=0x7ff4d5247000) at rte_memcpy.h > > > > Interesting, so it bypasses the above check since desc->len is non-zero > > while desc->addr is zero. The size (2056) also looks weird. > > > > Do you mind to check this issue a bit deeper, say why desc->addr is > > zero, however, desc->len is not? > > OK. I checked this few more times. 
Thanks!

> Actually, I see, that desc->addr is
> not zero. All desc memory looks like some rubbish:
>
> <------------------------------------------------------------------------------>
> (gdb)
> #3  copy_desc_to_mbuf (mbuf_pool=0x7fe9da9f4480, desc_idx=65363,
>         m=0x7fe9db269400, vq=0x7fe9fff7bac0, dev=0x7fe9fff7cbc0)
>         desc = 0x2aabc00ff530
>         desc_addr = 0
>         mbuf_offset = 0
>         prev = 0x7fe9db269400
>         nr_desc = 1
>         desc_offset = 12
>         cur = 0x7fe9db269400
>         hdr = 0x0
>         desc_avail = 1012591375
>         mbuf_avail = 1526
>         cpy_len = 1526
>
> (gdb) p *desc
> $2 = {addr = 8507655620301055744, len = 1012591387, flags = 3845, next = 48516}
>
> <------------------------------------------------------------------------------>
>
> And 'desc_addr' equals zero because 'gpa_to_vva' just can't map this huge
> address to host's.
>
> Scenario was the same. SIGSEGV received right after 'port start all'.
>
> Another thought:
>
> Actually, there is a race window between 'memset' in guest and reading
> of 'desc->len' and 'desc->addr' on host. So, it's possible to read non
> zero 'len' and zero 'addr' right after that.

That's also what I was thinking: it should be the only reason that could
cause such an issue.

> But you're right, this case should be very rare.

Yes, it should be very rare. What troubles me is that it seems you can
reproduce this issue very easily, so I doubt it's caused by this rare
race. Could the reason be something else?

	--yliu
On 06.07.2016 15:24, Yuanhan Liu wrote: > On Wed, Jul 06, 2016 at 02:19:12PM +0300, Ilya Maximets wrote: >> On 01.07.2016 10:35, Yuanhan Liu wrote: >>> Hi, >>> >>> Sorry for the long delay. >>> >>> On Fri, May 20, 2016 at 03:50:04PM +0300, Ilya Maximets wrote: >>>> In current implementation guest application can reinitialize vrings >>>> by executing start after stop. In the same time host application >>>> can still poll virtqueue while device stopped in guest and it will >>>> crash with segmentation fault while vring reinitialization because >>>> of dereferencing of bad descriptor addresses. >>> >>> Yes, you are right that vring will be reinitialized after restart. >>> But even though, I don't see the reason it will cause a vhost crash, >>> since the reinitialization will reset all the vring memeory by 0: >>> >>> memset(vq->vq_ring_virt_mem, 0, vq->vq_ring_size); >>> >>> That means those bad descriptors will be skipped, safely, at vhost >>> side by: >>> >>> if (unlikely(desc->len < dev->vhost_hlen)) >>> return -1; >>> >>>> >>>> OVS crash for example: >>>> <------------------------------------------------------------------------> >>>> [test-pmd inside guest VM] >>>> >>>> testpmd> port stop all >>>> Stopping ports... >>>> Checking link statuses... >>>> Port 0 Link Up - speed 10000 Mbps - full-duplex >>>> Done >>>> testpmd> port config all rxq 2 >>>> testpmd> port config all txq 2 >>>> testpmd> port start all >>>> Configuring Port 0 (socket 0) >>>> Port 0: 52:54:00:CB:44:C8 >>>> Checking link statuses... >>>> Port 0 Link Up - speed 10000 Mbps - full-duplex >>>> Done >>>> >>>> [OVS on host] >>>> Program received signal SIGSEGV, Segmentation fault. >>>> rte_memcpy (n=2056, src=0xc, dst=0x7ff4d5247000) at rte_memcpy.h >>> >>> Interesting, so it bypasses the above check since desc->len is non-zero >>> while desc->addr is zero. The size (2056) also looks weird. >>> >>> Do you mind to check this issue a bit deeper, say why desc->addr is >>> zero, however, desc->len is not? 
>>
>> OK. I checked this few more times.
>
> Thanks!
>
>> Actually, I see, that desc->addr is
>> not zero. All desc memory looks like some rubbish:
>>
>> <------------------------------------------------------------------------------>
>> (gdb)
>> #3  copy_desc_to_mbuf (mbuf_pool=0x7fe9da9f4480, desc_idx=65363,
>>         m=0x7fe9db269400, vq=0x7fe9fff7bac0, dev=0x7fe9fff7cbc0)
>>         desc = 0x2aabc00ff530
>>         desc_addr = 0
>>         mbuf_offset = 0
>>         prev = 0x7fe9db269400
>>         nr_desc = 1
>>         desc_offset = 12
>>         cur = 0x7fe9db269400
>>         hdr = 0x0
>>         desc_avail = 1012591375
>>         mbuf_avail = 1526
>>         cpy_len = 1526
>>
>> (gdb) p *desc
>> $2 = {addr = 8507655620301055744, len = 1012591387, flags = 3845, next = 48516}
>>
>> <------------------------------------------------------------------------------>
>>
>> And 'desc_addr' equals zero because 'gpa_to_vva' just can't map this huge
>> address to host's.
>>
>> Scenario was the same. SIGSEGV received right after 'port start all'.
>>
>> Another thought:
>>
>> Actually, there is a race window between 'memset' in guest and reading
>> of 'desc->len' and 'desc->addr' on host. So, it's possible to read non
>> zero 'len' and zero 'addr' right after that.
>
> That's also what I was thinking, that it should the only reason caused
> such issue.
>
>> But you're right, this case should be very rare.
>
> Yes, it should be very rare. What troubles me is that seems you can
> reproduce this issue very easily, that I doubt it's caused by this
> rare race. The reason could be something else?

I don't know what exactly happens, but it constantly happens on 'port
start all' command execution. Descriptors become broken for some time
and after that they become normal again. I think so because, with my
patch applied, the network is working. It means that broken descriptors
appear for some short time while virtio is restarting with the new
configuration.

Another point is that the crash constantly happens on queue_id=3 (the
second RX queue) in my scenario. It is a newly allocated virtqueue after
reconfiguration from rxq=1 to rxq=2.

Obviously, virtio needs to be fixed, but we need to check the address on
the vhost side anyway, because in the common case we don't know what
happens in the guest.

Best regards, Ilya Maximets.
On Fri, Jul 08, 2016 at 02:48:56PM +0300, Ilya Maximets wrote:
>
> Another point is that the crash constantly happens on queue_id=3 (the
> second RX queue) in my scenario. It is a newly allocated virtqueue after
> reconfiguration from rxq=1 to rxq=2.

That's a valuable message: what's your DPDK HEAD commit while triggering
this issue?

	--yliu

>
> Obviously, virtio needs to be fixed, but we need to check the address on
> the vhost side anyway, because in the common case we don't know what
> happens in the guest.
>
> Best regards, Ilya Maximets.
On Sun, Jul 10, 2016 at 09:17:31PM +0800, Yuanhan Liu wrote:
> On Fri, Jul 08, 2016 at 02:48:56PM +0300, Ilya Maximets wrote:
>>
>> Another point is that the crash constantly happens on queue_id=3 (the
>> second RX queue) in my scenario. It is a newly allocated virtqueue after
>> reconfiguration from rxq=1 to rxq=2.
>
> That's a valuable message: what's your DPDK HEAD commit while triggering
> this issue?

I guess I have understood what goes wrong in your case.

I would guess that your vhost has 2 queues (here I mean queue-pairs, each
including one Tx and one Rx queue; below usage is the same) configured,
and so does your QEMU. However, you enabled just 1 queue while starting
testpmd inside the guest, and you want to enable 2 queues by running the
following testpmd commands:

    stop
    port stop all
    port config all rxq 2
    port config all txq 2
    port start all

Badly, that won't work with the current virtio PMD implementation, and
what's worse, it triggers a vhost crash, the one you saw.

Here is how it comes. Since you enabled just 1 queue while starting
testpmd, it will set up 1 queue only, meaning only one queue's **valid**
information will be sent to vhost. You might see SET_VRING_ADDR (and
related vhost messages) for the other queue as well, but they are just
dummy messages: they don't include any valid/real information about the
2nd queue; the driver didn't set it up, after all.

So far, so good. It becomes broken when you run the above commands. Those
commands do the setup for the 2nd queue; however, they fail to trigger the
QEMU virtio device to start the vhost-user negotiation, meaning no
SET_VRING_ADDR will be sent for the 2nd queue, leaving vhost untold and
not updated.

What's worse, the above commands trigger QEMU to send SET_VRING_ENABLE
messages, to enable all the vrings. And since the vrings for the 2nd
queue are not properly configured, the crash happens.

So maybe we should do a virtio reset on port start?

	--yliu
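[Editor's note] One defensive option on the vhost side, given the message
sequence described above, is to refuse to treat a ring as ready until its
addresses were actually provided. A sketch only, with invented names — the
real vhost-user message handling is considerably more involved:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical per-vring bookkeeping; field and function names are
 * invented for illustration. */
struct vring_state {
    uint64_t desc_addr;     /* filled in by a SET_VRING_ADDR-style message */
    uint64_t avail_addr;
    uint64_t used_addr;
    int      enabled;
};

/* Only honor an enable request if the ring was actually configured.
 * A ring that received just dummy messages keeps enabled == 0, so the
 * datapath never touches its (garbage) descriptors. */
static int
vring_try_enable(struct vring_state *vr)
{
    if (vr == NULL ||
        vr->desc_addr == 0 || vr->avail_addr == 0 || vr->used_addr == 0)
        return -1;          /* not configured: refuse to enable */

    vr->enabled = 1;
    return 0;
}
```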
On 11.07.2016 11:38, Yuanhan Liu wrote:
> On Sun, Jul 10, 2016 at 09:17:31PM +0800, Yuanhan Liu wrote:
>> That's a valuable message: what's your DPDK HEAD commit while triggering
>> this issue?

fbfd99551ca3 ("mbuf: add raw allocation function")

> [...]
>
> What's worse, the above commands trigger QEMU to send SET_VRING_ENABLE
> messages to enable all the vrings. And since the vrings for the 2nd
> queue are not properly configured, the crash happens.

Hmm, why does the 2nd queue work properly with my fix to vhost in this case?

> So maybe we should do a virtio reset on port start?

I guess it was removed by this patch:
a85786dc816f ("virtio: fix states handling during initialization").
On Mon, Jul 11, 2016 at 12:50:24PM +0300, Ilya Maximets wrote:
> On 11.07.2016 11:38, Yuanhan Liu wrote:
> [...]
>
> Hmm, why does the 2nd queue work properly with my fix to vhost in this case?

Hmm, really? Are you sure that data flows in your 2nd queue after those
commands? From what I know, your patch just avoids the crash, but does
not fix the queue.

> > So maybe we should do a virtio reset on port start?
>
> I guess it was removed by this patch:
> a85786dc816f ("virtio: fix states handling during initialization").

Seems yes.

	--yliu
On 11.07.2016 14:05, Yuanhan Liu wrote:
> On Mon, Jul 11, 2016 at 12:50:24PM +0300, Ilya Maximets wrote:
> [...]
>
>> Hmm, why does the 2nd queue work properly with my fix to vhost in this case?
>
> Hmm, really? Are you sure that data flows in your 2nd queue after those
> commands? From what I know, your patch just avoids the crash, but does
> not fix the queue.

Oh, sorry. Yes, it doesn't work. With my patch applied I get a QEMU hang.
On Mon, Jul 11, 2016 at 02:47:56PM +0300, Ilya Maximets wrote:
> On 11.07.2016 14:05, Yuanhan Liu wrote:
> [...]
>
> Oh, sorry. Yes, it doesn't work. With my patch applied I get a QEMU hang.

The crash actually could be avoided, accidentally, by commit 0823c1cb0a73
("vhost: workaround stale vring base"). That's why I asked you above for
the HEAD commit you were using.

> >>> So maybe we should do a virtio reset on port start?
> >>
> >> I guess it was removed by this patch:
> >> a85786dc816f ("virtio: fix states handling during initialization").
> >
> > Seems yes.

Actually, we should not do that reset on port start. The right fix is to
allocate the MAX number of queues the virtio device supports (2 here).
That would allow us to change the queue number dynamically.

But this doesn't sound like a simple fix; it involves many code changes,
because it was not designed this way before. Therefore, we will not fix
it in this release; it's too late. Let's fix it in the next release
instead. As for the crash issue, it will not happen with the latest HEAD.
Though it's an accidental fix, I think we are fine here.

	--yliu
On 12.07.2016 05:43, Yuanhan Liu wrote:
> On Mon, Jul 11, 2016 at 02:47:56PM +0300, Ilya Maximets wrote:
> [...]
>
> The crash actually could be avoided, accidentally, by commit 0823c1cb0a73
> ("vhost: workaround stale vring base"). That's why I asked you above for
> the HEAD commit you were using.

Thanks for pointing this out. I'll check it.

> Actually, we should not do that reset on port start. The right fix is to
> allocate the MAX number of queues the virtio device supports (2 here).
> That would allow us to change the queue number dynamically.

Yes, I agree that this is the right way to fix this issue.
On 12.07.2016 08:53, Ilya Maximets wrote:
> On 12.07.2016 05:43, Yuanhan Liu wrote:
> [...]
>
>> The crash actually could be avoided, accidentally, by commit 0823c1cb0a73
>> ("vhost: workaround stale vring base"). That's why I asked you above for
>> the HEAD commit you were using.
>
> Thanks for pointing this out. I'll check it.

I've checked my DPDK HEAD with the above commit backported. Yes, it helps
to avoid the vhost crash in my scenario. As expected, the new virtqueue
doesn't work after reconfiguration, and QEMU sometimes hangs.

>> But this doesn't sound like a simple fix; it involves many code changes,
>> because it was not designed this way before. Therefore, we will not fix
>> it in this release; it's too late. Let's fix it in the next release
>> instead. As for the crash issue, it will not happen with the latest HEAD.
>> Though it's an accidental fix, I think we are fine here.

This scenario is fixed somehow, I agree. But this patch is still needed
to protect vhost from an untrusted VM, i.e. from a malicious or buggy
virtio application. Maybe we could change the commit message and resend
this patch as a security enhancement? What do you think?
On Wed, Jul 13, 2016 at 10:34:07AM +0300, Ilya Maximets wrote:
> On 12.07.2016 08:53, Ilya Maximets wrote:
> [...]
>
> This scenario is fixed somehow, I agree. But this patch is still needed
> to protect vhost from an untrusted VM, i.e. from a malicious or buggy
> virtio application. Maybe we could change the commit message and resend
> this patch as a security enhancement? What do you think?

Indeed, but I'm a bit concerned about the performance regression found
by Rich, and I am not quite sure why it happens, though Rich claimed
that it seems to be a compiler-related problem.

	--yliu
On Wednesday, July 13, 2016, Yuanhan Liu <yuanhan.liu@linux.intel.com> wrote:
> On Wed, Jul 13, 2016 at 10:34:07AM +0300, Ilya Maximets wrote:
> > This scenario is fixed somehow, I agree. But this patch is still needed
> > to protect vhost from an untrusted VM, i.e. from a malicious or buggy
> > virtio application. Maybe we could change the commit message and resend
> > this patch as a security enhancement? What do you think?
>
> Indeed, but I'm a bit concerned about the performance regression found
> by Rich, and I am not quite sure why it happens, though Rich claimed
> that it seems to be a compiler-related problem.

The workaround I suggested solves the performance regression. But even if
it hadn't, this is a security fix that should be merged regardless of the
performance impact.
On Wed, Jul 13, 2016 at 08:54:08AM -0700, Rich Lane wrote:
> [...]
>
> The workaround I suggested solves the performance regression. But even if
> it hadn't, this is a security fix that should be merged regardless of the
> performance impact.

Good point. Ilya, would you reword the commit log and resend based on the
latest code?

	--yliu
On 14.07.2016 04:42, Yuanhan Liu wrote:
> [...]
>
> Good point. Ilya, would you reword the commit log and resend based on the
> latest code?

OK.
Version 3:
	* Patch split in two.
	* Applied the workaround from Rich Lane and added a comment about
	  the performance issue with some compilers and the 'unlikely' macro.

Version 2:
	* Rebased on top of current master.
	* Host's address is now checked in the mergeable case,
	  because the needed refactoring is already done.
	* Commit message changed because the old issue with
	  virtio reload was accidentally fixed by commit
	  0823c1cb0a73.

Ilya Maximets (2):
  vhost: fix using of bad return value on mergeable enqueue
  vhost: do sanity check for ring descriptor address

 lib/librte_vhost/vhost_rxtx.c | 33 ++++++++++++++++++++++++++-------
 1 file changed, 26 insertions(+), 7 deletions(-)
On Fri, Jul 15, 2016 at 02:15:03PM +0300, Ilya Maximets wrote:
> Version 3:
> 	* Patch split in two.
> 	* Applied the workaround from Rich Lane and added a comment about
> 	  the performance issue with some compilers and the 'unlikely' macro.

Thanks a lot for the patches.

Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>

	--yliu
2016-07-15 20:14, Yuanhan Liu:
> On Fri, Jul 15, 2016 at 02:15:03PM +0300, Ilya Maximets wrote:
> > Version 3:
> > 	* Patch split in two.
> > 	* Applied the workaround from Rich Lane and added a comment about
> > 	  the performance issue with some compilers and the 'unlikely' macro.
>
> Thanks a lot for the patches.
>
> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>

Applied, thanks
diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
index 750821a..9d05739 100644
--- a/lib/librte_vhost/vhost_rxtx.c
+++ b/lib/librte_vhost/vhost_rxtx.c
@@ -147,10 +147,10 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, 0, 0, 0, 0, 0}, 0};
 
 	desc = &vq->desc[desc_idx];
-	if (unlikely(desc->len < vq->vhost_hlen))
+	desc_addr = gpa_to_vva(dev, desc->addr);
+	if (unlikely(desc->len < vq->vhost_hlen || !desc_addr))
 		return -1;
 
-	desc_addr = gpa_to_vva(dev, desc->addr);
 	rte_prefetch0((void *)(uintptr_t)desc_addr);
 
 	virtio_enqueue_offload(m, &virtio_hdr.hdr);
@@ -184,6 +184,9 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
 			desc = &vq->desc[desc->next];
 			desc_addr = gpa_to_vva(dev, desc->addr);
+			if (unlikely(!desc_addr))
+				return -1;
+
 			desc_offset = 0;
 			desc_avail  = desc->len;
 		}
@@ -344,7 +347,8 @@ fill_vec_buf(struct vhost_virtqueue *vq, uint32_t avail_idx,
 	uint32_t len    = *allocated;
 
 	while (1) {
-		if (unlikely(vec_id >= BUF_VECTOR_MAX || idx >= vq->size))
+		if (unlikely(vec_id >= BUF_VECTOR_MAX || idx >= vq->size
+				|| !vq->desc[idx].addr))
 			return -1;
 
 		len += vq->desc[idx].len;
@@ -747,10 +751,10 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	uint32_t nr_desc = 1;
 
 	desc = &vq->desc[desc_idx];
-	if (unlikely(desc->len < vq->vhost_hlen))
+	desc_addr = gpa_to_vva(dev, desc->addr);
+	if (unlikely(desc->len < vq->vhost_hlen || !desc_addr))
 		return -1;
 
-	desc_addr = gpa_to_vva(dev, desc->addr);
 	rte_prefetch0((void *)(uintptr_t)desc_addr);
 
 	/* Retrieve virtio net header */
@@ -769,6 +773,9 @@ copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 			desc = &vq->desc[desc->next];
 			desc_addr = gpa_to_vva(dev, desc->addr);
+			if (unlikely(!desc_addr))
+				return -1;
+
 			rte_prefetch0((void *)(uintptr_t)desc_addr);
 
 			desc_offset = 0;