[dpdk-dev] vhost: fix segfault on bad descriptor address.

Message ID 1463748604-27251-1-git-send-email-i.maximets@samsung.com (mailing list archive)
State Rejected, archived
Delegated to: Yuanhan Liu

Commit Message

Ilya Maximets May 20, 2016, 12:50 p.m. UTC
  In the current implementation, the guest application can reinitialize
vrings by executing start after stop. At the same time, the host
application can still poll the virtqueue while the device is stopped in
the guest, and it will crash with a segmentation fault during vring
reinitialization because it dereferences bad descriptor addresses.

An OVS crash, for example:
<------------------------------------------------------------------------>
[test-pmd inside guest VM]

	testpmd> port stop all
	    Stopping ports...
	    Checking link statuses...
	    Port 0 Link Up - speed 10000 Mbps - full-duplex
	    Done
	testpmd> port config all rxq 2
	testpmd> port config all txq 2
	testpmd> port start all
	    Configuring Port 0 (socket 0)
	    Port 0: 52:54:00:CB:44:C8
	    Checking link statuses...
	    Port 0 Link Up - speed 10000 Mbps - full-duplex
	    Done

[OVS on host]
	Program received signal SIGSEGV, Segmentation fault.
	rte_memcpy (n=2056, src=0xc, dst=0x7ff4d5247000) at rte_memcpy.h

	(gdb) bt
	    #0  rte_memcpy (n=2056, src=0xc, dst=0x7ff4d5247000)
	    #1  copy_desc_to_mbuf
	    #2  rte_vhost_dequeue_burst
	    #3  netdev_dpdk_vhost_rxq_recv
	    ...

	(gdb) bt full
	    #0  rte_memcpy
	        ...
	    #1  copy_desc_to_mbuf
	        desc_addr = 0
	        mbuf_offset = 0
	        desc_offset = 12
	        ...
<------------------------------------------------------------------------>

Fix that by checking the addresses of descriptors before using them.

Note: For mergeable buffers this patch checks only the guest's address
for zero, but in the non-mergeable case the host's address is checked.
This is done because checking the host's address in the mergeable case
requires additional refactoring to keep the virtqueue in a consistent
state in case of error.
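A rough sketch of the non-mergeable check described above, with hypothetical simplified structures and a stub translation function (not the actual DPDK definitions): the translated host address is validated before anything dereferences it, which is where the reported segfault happened.

```c
#include <stdint.h>

/* Simplified stand-in for a vring descriptor; illustrative only. */
struct vring_desc_s {
	uint64_t addr;	/* guest physical address */
	uint32_t len;
};

/* Hypothetical guest-physical -> host-virtual translation: returns 0
 * when the address cannot be mapped (e.g. after guest reinit zeroed
 * the ring). The offset here is a fake mapping for the sketch. */
static uint64_t gpa_to_vva_stub(uint64_t gpa)
{
	return gpa ? gpa + 0x1000 : 0;
}

/* The gist of the fix: bail out before rte_memcpy() would dereference
 * a bad host address. Returns -1 on a bad descriptor, 0 on success. */
static int copy_desc_checked(const struct vring_desc_s *desc)
{
	uint64_t desc_addr = gpa_to_vva_stub(desc->addr);

	if (desc_addr == 0)
		return -1;	/* previously this fell through to the copy */

	/* ... proceed with the copy using (void *)(uintptr_t)desc_addr ... */
	return 0;
}
```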

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
---

Actually, the current virtio implementation looks broken to me, because
'virtio_dev_start' breaks the virtqueue while it is still available from
the vhost side.

There were two patches about this behaviour:

	1. a85786dc816f ("virtio: fix states handling during initialization")
	2. 9a0615af7746 ("virtio: fix restart")

The second patch somewhat fixes the issue introduced in the first patch,
but it also breaks vhost in the way described above.
It's not entirely clear to me what to do in the current situation with
virtio, because it will be broken for the guest application even if
vhost does not crash.

Maybe it would be better to forbid stopping the virtio device and force
the user to exit and start again (this may be implemented in a way
hidden from the user)?

This patch adds additional sanity checks, so it should be applied anyway, IMHO.

 lib/librte_vhost/vhost_rxtx.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)
  

Comments

Yuanhan Liu May 23, 2016, 10:57 a.m. UTC | #1
On Fri, May 20, 2016 at 03:50:04PM +0300, Ilya Maximets wrote:
> In current implementation guest application can reinitialize vrings
> by executing start after stop. In the same time host application
> can still poll virtqueue while device stopped in guest and it will
> crash with segmentation fault while vring reinitialization because
> of dereferencing of bad descriptor addresses.
> 
> OVS crash for example:
> <------------------------------------------------------------------------>
> [test-pmd inside guest VM]
> 
> 	testpmd> port stop all
> 	    Stopping ports...
> 	    Checking link statuses...
> 	    Port 0 Link Up - speed 10000 Mbps - full-duplex
> 	    Done
> 	testpmd> port config all rxq 2
> 	testpmd> port config all txq 2
> 	testpmd> port start all
> 	    Configuring Port 0 (socket 0)
> 	    Port 0: 52:54:00:CB:44:C8
> 	    Checking link statuses...
> 	    Port 0 Link Up - speed 10000 Mbps - full-duplex
> 	    Done

I actually didn't manage to reproduce it on my side, though with the
vhost-example instead of OVS. Are those all the commands needed to
reproduce it, run just after starting test-pmd?

> [OVS on host]
> 	Program received signal SIGSEGV, Segmentation fault.
> 	rte_memcpy (n=2056, src=0xc, dst=0x7ff4d5247000) at rte_memcpy.h
> 
> 	(gdb) bt
> 	    #0  rte_memcpy (n=2056, src=0xc, dst=0x7ff4d5247000)
> 	    #1  copy_desc_to_mbuf
> 	    #2  rte_vhost_dequeue_burst
> 	    #3  netdev_dpdk_vhost_rxq_recv
> 	    ...
> 
> 	(gdb) bt full
> 	    #0  rte_memcpy
> 	        ...
> 	    #1  copy_desc_to_mbuf
> 	        desc_addr = 0
> 	        mbuf_offset = 0
> 	        desc_offset = 12
> 	        ...
> <------------------------------------------------------------------------>
> 
> Fix that by checking addresses of descriptors before using them.
> 
> Note: For mergeable buffers this patch checks only guest's address for
> zero, but in non-meargeable case host's address checked. This is done
> because checking of host's address in mergeable case requires additional
> refactoring to keep virtqueue in consistent state in case of error.
> 
> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
> ---
> 
> Actually, current virtio implementation looks broken for me. Because
> 'virtio_dev_start' breaks virtqueue while it still available from the vhost
> side.
> 
> There was 2 patches about this behaviour:
> 
> 	1. a85786dc816f ("virtio: fix states handling during initialization")
> 	2. 9a0615af7746 ("virtio: fix restart")
> 
> The second patch fixes somehow issue intoduced in the first patch, but actually
> also breaks vhost in the way described above.
> It's not pretty clear for me what to do in current situation with virtio,
> because it will be broken for guest application even if vhost will not crash.
> 
> May be it'll be better to forbid stopping of virtio device and force user to
> exit and start again (may be implemented in hidden from user way)?
> 
> This patch adds additional sane checks, so it should be applied anyway, IMHO.

Agreed.

	--yliu
  
Ilya Maximets May 23, 2016, 11:04 a.m. UTC | #2
On 23.05.2016 13:57, Yuanhan Liu wrote:
> On Fri, May 20, 2016 at 03:50:04PM +0300, Ilya Maximets wrote:
>> In current implementation guest application can reinitialize vrings
>> by executing start after stop. In the same time host application
>> can still poll virtqueue while device stopped in guest and it will
>> crash with segmentation fault while vring reinitialization because
>> of dereferencing of bad descriptor addresses.
>>
>> OVS crash for example:
>> <------------------------------------------------------------------------>
>> [test-pmd inside guest VM]
>>
>> 	testpmd> port stop all
>> 	    Stopping ports...
>> 	    Checking link statuses...
>> 	    Port 0 Link Up - speed 10000 Mbps - full-duplex
>> 	    Done
>> 	testpmd> port config all rxq 2
>> 	testpmd> port config all txq 2
>> 	testpmd> port start all
>> 	    Configuring Port 0 (socket 0)
>> 	    Port 0: 52:54:00:CB:44:C8
>> 	    Checking link statuses...
>> 	    Port 0 Link Up - speed 10000 Mbps - full-duplex
>> 	    Done
> 
> I actually didn't manage to reproduce it on my side, with the
> vhost-example instead of OVS though. Is that all the commands
> to reproduce it, and run them just after start test-pmd?

Actually, I think packet flow should be enabled while performing the
above actions, and some traffic should already have been sent through
the port to change the last used idx on the vhost side.

Something like:
	start
	..wait a while.. see that packets are flowing.
	stop
	port stop
	port config
	port config
	port start
> 
>> [OVS on host]
>> 	Program received signal SIGSEGV, Segmentation fault.
>> 	rte_memcpy (n=2056, src=0xc, dst=0x7ff4d5247000) at rte_memcpy.h
>>
>> 	(gdb) bt
>> 	    #0  rte_memcpy (n=2056, src=0xc, dst=0x7ff4d5247000)
>> 	    #1  copy_desc_to_mbuf
>> 	    #2  rte_vhost_dequeue_burst
>> 	    #3  netdev_dpdk_vhost_rxq_recv
>> 	    ...
>>
>> 	(gdb) bt full
>> 	    #0  rte_memcpy
>> 	        ...
>> 	    #1  copy_desc_to_mbuf
>> 	        desc_addr = 0
>> 	        mbuf_offset = 0
>> 	        desc_offset = 12
>> 	        ...
>> <------------------------------------------------------------------------>
>>
>> Fix that by checking addresses of descriptors before using them.
>>
>> Note: For mergeable buffers this patch checks only guest's address for
>> zero, but in non-meargeable case host's address checked. This is done
>> because checking of host's address in mergeable case requires additional
>> refactoring to keep virtqueue in consistent state in case of error.
>>
>> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
>> ---
>>
>> Actually, current virtio implementation looks broken for me. Because
>> 'virtio_dev_start' breaks virtqueue while it still available from the vhost
>> side.
>>
>> There was 2 patches about this behaviour:
>>
>> 	1. a85786dc816f ("virtio: fix states handling during initialization")
>> 	2. 9a0615af7746 ("virtio: fix restart")
>>
>> The second patch fixes somehow issue intoduced in the first patch, but actually
>> also breaks vhost in the way described above.
>> It's not pretty clear for me what to do in current situation with virtio,
>> because it will be broken for guest application even if vhost will not crash.
>>
>> May be it'll be better to forbid stopping of virtio device and force user to
>> exit and start again (may be implemented in hidden from user way)?
>>
>> This patch adds additional sane checks, so it should be applied anyway, IMHO.
> 
> Agreed.
> 
> 	--yliu
> 
>
  
Ilya Maximets May 30, 2016, 11:05 a.m. UTC | #3
Ping.

Best regards, Ilya Maximets.

On 23.05.2016 14:04, Ilya Maximets wrote:
> On 23.05.2016 13:57, Yuanhan Liu wrote:
>> On Fri, May 20, 2016 at 03:50:04PM +0300, Ilya Maximets wrote:
>>> In current implementation guest application can reinitialize vrings
>>> by executing start after stop. In the same time host application
>>> can still poll virtqueue while device stopped in guest and it will
>>> crash with segmentation fault while vring reinitialization because
>>> of dereferencing of bad descriptor addresses.
>>>
>>> OVS crash for example:
>>> <------------------------------------------------------------------------>
>>> [test-pmd inside guest VM]
>>>
>>> 	testpmd> port stop all
>>> 	    Stopping ports...
>>> 	    Checking link statuses...
>>> 	    Port 0 Link Up - speed 10000 Mbps - full-duplex
>>> 	    Done
>>> 	testpmd> port config all rxq 2
>>> 	testpmd> port config all txq 2
>>> 	testpmd> port start all
>>> 	    Configuring Port 0 (socket 0)
>>> 	    Port 0: 52:54:00:CB:44:C8
>>> 	    Checking link statuses...
>>> 	    Port 0 Link Up - speed 10000 Mbps - full-duplex
>>> 	    Done
>>
>> I actually didn't manage to reproduce it on my side, with the
>> vhost-example instead of OVS though. Is that all the commands
>> to reproduce it, and run them just after start test-pmd?
> 
> Actually, I think, packet flow should be enabled while performing
> above actions and some traffic already should be sent through port
> to change last used idx on vhost side.
> 
> Something like:
> 	start
> 	..wait a while.. see that packets are flowing.
> 	stop
> 	port stop
> 	port config
> 	port config
> 	port start
>>
>>> [OVS on host]
>>> 	Program received signal SIGSEGV, Segmentation fault.
>>> 	rte_memcpy (n=2056, src=0xc, dst=0x7ff4d5247000) at rte_memcpy.h
>>>
>>> 	(gdb) bt
>>> 	    #0  rte_memcpy (n=2056, src=0xc, dst=0x7ff4d5247000)
>>> 	    #1  copy_desc_to_mbuf
>>> 	    #2  rte_vhost_dequeue_burst
>>> 	    #3  netdev_dpdk_vhost_rxq_recv
>>> 	    ...
>>>
>>> 	(gdb) bt full
>>> 	    #0  rte_memcpy
>>> 	        ...
>>> 	    #1  copy_desc_to_mbuf
>>> 	        desc_addr = 0
>>> 	        mbuf_offset = 0
>>> 	        desc_offset = 12
>>> 	        ...
>>> <------------------------------------------------------------------------>
>>>
>>> Fix that by checking addresses of descriptors before using them.
>>>
>>> Note: For mergeable buffers this patch checks only guest's address for
>>> zero, but in non-meargeable case host's address checked. This is done
>>> because checking of host's address in mergeable case requires additional
>>> refactoring to keep virtqueue in consistent state in case of error.
>>>
>>> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
>>> ---
>>>
>>> Actually, current virtio implementation looks broken for me. Because
>>> 'virtio_dev_start' breaks virtqueue while it still available from the vhost
>>> side.
>>>
>>> There was 2 patches about this behaviour:
>>>
>>> 	1. a85786dc816f ("virtio: fix states handling during initialization")
>>> 	2. 9a0615af7746 ("virtio: fix restart")
>>>
>>> The second patch fixes somehow issue intoduced in the first patch, but actually
>>> also breaks vhost in the way described above.
>>> It's not pretty clear for me what to do in current situation with virtio,
>>> because it will be broken for guest application even if vhost will not crash.
>>>
>>> May be it'll be better to forbid stopping of virtio device and force user to
>>> exit and start again (may be implemented in hidden from user way)?
>>>
>>> This patch adds additional sane checks, so it should be applied anyway, IMHO.
>>
>> Agreed.
>>
>> 	--yliu
>>
>>
  
Jianfeng Tan May 30, 2016, noon UTC | #4
Hi,

> -----Original Message-----
> From: Ilya Maximets [mailto:i.maximets@samsung.com]
> Sent: Friday, May 20, 2016 8:50 PM
> To: dev@dpdk.org; Xie, Huawei; Yuanhan Liu
> Cc: Dyasly Sergey; Heetae Ahn; Tan, Jianfeng; Ilya Maximets
> Subject: [PATCH] vhost: fix segfault on bad descriptor address.
> 
> In current implementation guest application can reinitialize vrings
> by executing start after stop. In the same time host application
> can still poll virtqueue while device stopped in guest and it will
> crash with segmentation fault while vring reinitialization because
> of dereferencing of bad descriptor addresses.
> 
> OVS crash for example:
> <------------------------------------------------------------------------>
> [test-pmd inside guest VM]
> 
> 	testpmd> port stop all
> 	    Stopping ports...
> 	    Checking link statuses...
> 	    Port 0 Link Up - speed 10000 Mbps - full-duplex
> 	    Done
> 	testpmd> port config all rxq 2
> 	testpmd> port config all txq 2
> 	testpmd> port start all
> 	    Configuring Port 0 (socket 0)
> 	    Port 0: 52:54:00:CB:44:C8
> 	    Checking link statuses...
> 	    Port 0 Link Up - speed 10000 Mbps - full-duplex
> 	    Done
> 
> [OVS on host]
> 	Program received signal SIGSEGV, Segmentation fault.
> 	rte_memcpy (n=2056, src=0xc, dst=0x7ff4d5247000) at
> rte_memcpy.h
> 
> 	(gdb) bt
> 	    #0  rte_memcpy (n=2056, src=0xc, dst=0x7ff4d5247000)
> 	    #1  copy_desc_to_mbuf
> 	    #2  rte_vhost_dequeue_burst
> 	    #3  netdev_dpdk_vhost_rxq_recv
> 	    ...
> 
> 	(gdb) bt full
> 	    #0  rte_memcpy
> 	        ...
> 	    #1  copy_desc_to_mbuf
> 	        desc_addr = 0
> 	        mbuf_offset = 0
> 	        desc_offset = 12
> 	        ...
> <------------------------------------------------------------------------>
> 
> Fix that by checking addresses of descriptors before using them.
> 
> Note: For mergeable buffers this patch checks only guest's address for
> zero, but in non-meargeable case host's address checked. This is done
> because checking of host's address in mergeable case requires additional
> refactoring to keep virtqueue in consistent state in case of error.


I agree with you that it should be fixed, because a malicious guest could launch a DoS attack on the vswitch with the current implementation.

But I don't understand why you do not fix the mergeable case in copy_mbuf_to_desc_mergeable(), where gpa_to_vva() happens. And does the change in fill_vec_buf(), checking !vq->desc[idx].addr, make any sense?

Thanks,
Jianfeng
  
Ilya Maximets May 30, 2016, 12:24 p.m. UTC | #5
On 30.05.2016 15:00, Tan, Jianfeng wrote:
> Hi,
> 
>> -----Original Message-----
>> From: Ilya Maximets [mailto:i.maximets@samsung.com]
>> Sent: Friday, May 20, 2016 8:50 PM
>> To: dev@dpdk.org; Xie, Huawei; Yuanhan Liu
>> Cc: Dyasly Sergey; Heetae Ahn; Tan, Jianfeng; Ilya Maximets
>> Subject: [PATCH] vhost: fix segfault on bad descriptor address.
>>
>> In current implementation guest application can reinitialize vrings
>> by executing start after stop. In the same time host application
>> can still poll virtqueue while device stopped in guest and it will
>> crash with segmentation fault while vring reinitialization because
>> of dereferencing of bad descriptor addresses.
>>
>> OVS crash for example:
>> <------------------------------------------------------------------------>
>> [test-pmd inside guest VM]
>>
>> 	testpmd> port stop all
>> 	    Stopping ports...
>> 	    Checking link statuses...
>> 	    Port 0 Link Up - speed 10000 Mbps - full-duplex
>> 	    Done
>> 	testpmd> port config all rxq 2
>> 	testpmd> port config all txq 2
>> 	testpmd> port start all
>> 	    Configuring Port 0 (socket 0)
>> 	    Port 0: 52:54:00:CB:44:C8
>> 	    Checking link statuses...
>> 	    Port 0 Link Up - speed 10000 Mbps - full-duplex
>> 	    Done
>>
>> [OVS on host]
>> 	Program received signal SIGSEGV, Segmentation fault.
>> 	rte_memcpy (n=2056, src=0xc, dst=0x7ff4d5247000) at
>> rte_memcpy.h
>>
>> 	(gdb) bt
>> 	    #0  rte_memcpy (n=2056, src=0xc, dst=0x7ff4d5247000)
>> 	    #1  copy_desc_to_mbuf
>> 	    #2  rte_vhost_dequeue_burst
>> 	    #3  netdev_dpdk_vhost_rxq_recv
>> 	    ...
>>
>> 	(gdb) bt full
>> 	    #0  rte_memcpy
>> 	        ...
>> 	    #1  copy_desc_to_mbuf
>> 	        desc_addr = 0
>> 	        mbuf_offset = 0
>> 	        desc_offset = 12
>> 	        ...
>> <------------------------------------------------------------------------>
>>
>> Fix that by checking addresses of descriptors before using them.
>>
>> Note: For mergeable buffers this patch checks only guest's address for
>> zero, but in non-meargeable case host's address checked. This is done
>> because checking of host's address in mergeable case requires additional
>> refactoring to keep virtqueue in consistent state in case of error.
> 
> 
> I agree with you that it should be fixed because malicious guest could launch
> DOS attack on vswitch with the current implementation.
> 
> But I don't understand why you do not fix the mergable case in
> copy_mbuf_to_desc_mergable() on where gpa_to_vva() happens? And the change in
> fill_vec_buf(), checking !vq->desc[idx].addr, make any sense?
> 
> Thanks,
> Jianfeng

Hi.
As I said in the commit message, checking the host's address in the
mergeable case requires additional refactoring to keep the virtqueue in
a consistent state.

There are a few issues with checking inside copy_mbuf_to_desc_mergeable():

	1. Ring elements are already reserved, and we must fill them with
	   some sane data before leaving virtio_dev_merge_rx().

	2. copy_mbuf_to_desc_mergeable() can't return an error in the
	   current implementation (additional checking is needed),
	   otherwise used->idx will be decremented (I think that's bad).


Checking !vq->desc[idx].addr inside fill_vec_buf() makes sense in the
case of virtio reinitialization, because the guest's address will be
zero (the case described in the commit message). Checking the guest's
address will not help in the case of a bad, non-NULL address, but it is
useful in this common case.
Also, we can't catch a bad address that we are able to map, so a
malicious guest could break vhost anyway.

I agree that checking the host's address is better, but that may be done
later together with resolving the above issues.
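A minimal sketch of the guest-address check being discussed, using simplified stand-ins for the vring structures and for the chain walk that fill_vec_buf() performs (names are illustrative; only the VRING_DESC_F_NEXT value is the real one):

```c
#include <stdint.h>

/* Illustrative descriptor layout; not the exact DPDK definition. */
struct vring_desc_s {
	uint64_t addr;
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

#define DESC_F_NEXT 0x1	/* VRING_DESC_F_NEXT: chain continues */

/* Walk a descriptor chain the way fill_vec_buf() does, aborting on a
 * zero guest-physical address (the signature of a guest-side ring
 * reinitialization). Returns the chain length, or -1 on a bad entry. */
static int walk_desc_chain(const struct vring_desc_s *desc,
			   uint16_t idx, uint16_t size)
{
	int cnt = 0;

	while (1) {
		if (idx >= size || desc[idx].addr == 0)
			return -1;	/* bad/reinitialized descriptor */
		cnt++;
		if (!(desc[idx].flags & DESC_F_NEXT))
			break;
		idx = desc[idx].next;
	}
	return cnt;
}
```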

Best regards, Ilya Maximets.
  
Yuanhan Liu May 30, 2016, 2:25 p.m. UTC | #6
Hi Ilya,

Generally speaking, this patch looks good to me. But I guess I still
need more time to check this issue later; I still failed to reproduce
it on my side after all. So, please allow a late merge.

Thanks.

	--yliu

On Mon, May 30, 2016 at 02:05:07PM +0300, Ilya Maximets wrote:
> Ping.
> 
> Best regards, Ilya Maximets.
> 
> On 23.05.2016 14:04, Ilya Maximets wrote:
> > On 23.05.2016 13:57, Yuanhan Liu wrote:
> >> On Fri, May 20, 2016 at 03:50:04PM +0300, Ilya Maximets wrote:
> >>> In current implementation guest application can reinitialize vrings
> >>> by executing start after stop. In the same time host application
> >>> can still poll virtqueue while device stopped in guest and it will
> >>> crash with segmentation fault while vring reinitialization because
> >>> of dereferencing of bad descriptor addresses.
> >>>
> >>> OVS crash for example:
> >>> <------------------------------------------------------------------------>
> >>> [test-pmd inside guest VM]
> >>>
> >>> 	testpmd> port stop all
> >>> 	    Stopping ports...
> >>> 	    Checking link statuses...
> >>> 	    Port 0 Link Up - speed 10000 Mbps - full-duplex
> >>> 	    Done
> >>> 	testpmd> port config all rxq 2
> >>> 	testpmd> port config all txq 2
> >>> 	testpmd> port start all
> >>> 	    Configuring Port 0 (socket 0)
> >>> 	    Port 0: 52:54:00:CB:44:C8
> >>> 	    Checking link statuses...
> >>> 	    Port 0 Link Up - speed 10000 Mbps - full-duplex
> >>> 	    Done
> >>
> >> I actually didn't manage to reproduce it on my side, with the
> >> vhost-example instead of OVS though. Is that all the commands
> >> to reproduce it, and run them just after start test-pmd?
> > 
> > Actually, I think, packet flow should be enabled while performing
> > above actions and some traffic already should be sent through port
> > to change last used idx on vhost side.
> > 
> > Something like:
> > 	start
> > 	..wait a while.. see that packets are flowing.
> > 	stop
> > 	port stop
> > 	port config
> > 	port config
> > 	port start
> >>
> >>> [OVS on host]
> >>> 	Program received signal SIGSEGV, Segmentation fault.
> >>> 	rte_memcpy (n=2056, src=0xc, dst=0x7ff4d5247000) at rte_memcpy.h
> >>>
> >>> 	(gdb) bt
> >>> 	    #0  rte_memcpy (n=2056, src=0xc, dst=0x7ff4d5247000)
> >>> 	    #1  copy_desc_to_mbuf
> >>> 	    #2  rte_vhost_dequeue_burst
> >>> 	    #3  netdev_dpdk_vhost_rxq_recv
> >>> 	    ...
> >>>
> >>> 	(gdb) bt full
> >>> 	    #0  rte_memcpy
> >>> 	        ...
> >>> 	    #1  copy_desc_to_mbuf
> >>> 	        desc_addr = 0
> >>> 	        mbuf_offset = 0
> >>> 	        desc_offset = 12
> >>> 	        ...
> >>> <------------------------------------------------------------------------>
> >>>
> >>> Fix that by checking addresses of descriptors before using them.
> >>>
> >>> Note: For mergeable buffers this patch checks only guest's address for
> >>> zero, but in non-meargeable case host's address checked. This is done
> >>> because checking of host's address in mergeable case requires additional
> >>> refactoring to keep virtqueue in consistent state in case of error.
> >>>
> >>> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
> >>> ---
> >>>
> >>> Actually, current virtio implementation looks broken for me. Because
> >>> 'virtio_dev_start' breaks virtqueue while it still available from the vhost
> >>> side.
> >>>
> >>> There was 2 patches about this behaviour:
> >>>
> >>> 	1. a85786dc816f ("virtio: fix states handling during initialization")
> >>> 	2. 9a0615af7746 ("virtio: fix restart")
> >>>
> >>> The second patch fixes somehow issue intoduced in the first patch, but actually
> >>> also breaks vhost in the way described above.
> >>> It's not pretty clear for me what to do in current situation with virtio,
> >>> because it will be broken for guest application even if vhost will not crash.
> >>>
> >>> May be it'll be better to forbid stopping of virtio device and force user to
> >>> exit and start again (may be implemented in hidden from user way)?
> >>>
> >>> This patch adds additional sane checks, so it should be applied anyway, IMHO.
> >>
> >> Agreed.
> >>
> >> 	--yliu
> >>
> >>
  
Jianfeng Tan May 31, 2016, 6:53 a.m. UTC | #7
Hi,


On 5/30/2016 8:24 PM, Ilya Maximets wrote:
> On 30.05.2016 15:00, Tan, Jianfeng wrote:
>> Hi,
>>
>>> -----Original Message-----
>>> From: Ilya Maximets [mailto:i.maximets@samsung.com]
>>> Sent: Friday, May 20, 2016 8:50 PM
>>> To: dev@dpdk.org; Xie, Huawei; Yuanhan Liu
>>> Cc: Dyasly Sergey; Heetae Ahn; Tan, Jianfeng; Ilya Maximets
>>> Subject: [PATCH] vhost: fix segfault on bad descriptor address.
>>>
>>> In current implementation guest application can reinitialize vrings
>>> by executing start after stop. In the same time host application
>>> can still poll virtqueue while device stopped in guest and it will
>>> crash with segmentation fault while vring reinitialization because
>>> of dereferencing of bad descriptor addresses.
>>>
>>> OVS crash for example:
>>> <------------------------------------------------------------------------>
>>> [test-pmd inside guest VM]
>>>
>>> 	testpmd> port stop all
>>> 	    Stopping ports...
>>> 	    Checking link statuses...
>>> 	    Port 0 Link Up - speed 10000 Mbps - full-duplex
>>> 	    Done
>>> 	testpmd> port config all rxq 2
>>> 	testpmd> port config all txq 2
>>> 	testpmd> port start all
>>> 	    Configuring Port 0 (socket 0)
>>> 	    Port 0: 52:54:00:CB:44:C8
>>> 	    Checking link statuses...
>>> 	    Port 0 Link Up - speed 10000 Mbps - full-duplex
>>> 	    Done
>>>
>>> [OVS on host]
>>> 	Program received signal SIGSEGV, Segmentation fault.
>>> 	rte_memcpy (n=2056, src=0xc, dst=0x7ff4d5247000) at
>>> rte_memcpy.h
>>>
>>> 	(gdb) bt
>>> 	    #0  rte_memcpy (n=2056, src=0xc, dst=0x7ff4d5247000)
>>> 	    #1  copy_desc_to_mbuf
>>> 	    #2  rte_vhost_dequeue_burst
>>> 	    #3  netdev_dpdk_vhost_rxq_recv
>>> 	    ...
>>>
>>> 	(gdb) bt full
>>> 	    #0  rte_memcpy
>>> 	        ...
>>> 	    #1  copy_desc_to_mbuf
>>> 	        desc_addr = 0
>>> 	        mbuf_offset = 0
>>> 	        desc_offset = 12
>>> 	        ...
>>> <------------------------------------------------------------------------>
>>>
>>> Fix that by checking addresses of descriptors before using them.
>>>
>>> Note: For mergeable buffers this patch checks only guest's address for
>>> zero, but in non-meargeable case host's address checked. This is done
>>> because checking of host's address in mergeable case requires additional
>>> refactoring to keep virtqueue in consistent state in case of error.
>>
>> I agree with you that it should be fixed because malicious guest could launch
>> DOS attack on vswitch with the current implementation.
>>
>> But I don't understand why you do not fix the mergable case in
>> copy_mbuf_to_desc_mergable() on where gpa_to_vva() happens? And the change in
>> fill_vec_buf(), checking !vq->desc[idx].addr, make any sense?
>>
>> Thanks,
>> Jianfeng
> Hi.
> As I said inside commit-message, checking of host's address in mergeable case
> requires additional refactoring to keep virtqueue in consistent state.
>
> There are few issues with checking inside copy_mbuf_to_desc_mergable() :
>
> 	1. Ring elements already reserved and we must fill them with some
> 	   sane data before going out of virtio_dev_merge_rx().
>
> 	2. copy_mbuf_to_desc_mergable() can't return an error in current
> 	   implementation (additional checking needed), otherwise used->idx
> 	   will be decremented (I think, it's bad).

Yes, currently there is no way to return these invalid descriptors back
to virtio because there's no invalid flag in virtio_net_hdr to indicate
that a desc contains no packet. I see the kernel just skips descriptors
with a bad addr. I think it may rely on a reset of the virtio device to
improve such a situation.

Another thing: your patch only checks desc->addr, but we should check
desc->addr + desc->len too, right?
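The addr + len concern can be illustrated with a hedged sketch: assuming a single hypothetical mapped guest region (real vhost walks a table of regions), a range check must bound both the start and the tail of the descriptor, and do so without integer overflow on the addition.

```c
#include <stdint.h>

/* Hypothetical single mapped guest-physical region; illustrative only. */
struct guest_region_s {
	uint64_t gpa_start;
	uint64_t size;
};

/* Checking only the start address misses a descriptor whose tail
 * (addr + len) runs past the mapped area. Written with subtractions
 * so a huge addr cannot wrap the comparison around. */
static int desc_range_ok(const struct guest_region_s *r,
			 uint64_t addr, uint32_t len)
{
	return addr >= r->gpa_start &&
	       len <= r->size &&
	       addr - r->gpa_start <= r->size - len;
}
```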

Thanks,
Jianfeng

>
>
> Checking !vq->desc[idx].addr inside fill_vec_buf() make sense in case of virtio
> reinitialization, because guest's address will be zero (case described in
> commit-message). Checking of guest's address will not help in case of bad and
> not NULL address, but useful in this common case.
> Also, we can't catch bad address what we able to map, so, malicious guest could
> break vhost anyway.
>
> I agree, that checking of host's address is better, but this may be done later
> together with resolving above issues.
>
> Best regards, Ilya Maximets.
  
Ilya Maximets May 31, 2016, 9:10 a.m. UTC | #8
On 31.05.2016 09:53, Tan, Jianfeng wrote:
> Hi,
> 
> 
> On 5/30/2016 8:24 PM, Ilya Maximets wrote:
>> On 30.05.2016 15:00, Tan, Jianfeng wrote:
>>> Hi,
>>>
>>>> -----Original Message-----
>>>> From: Ilya Maximets [mailto:i.maximets@samsung.com]
>>>> Sent: Friday, May 20, 2016 8:50 PM
>>>> To: dev@dpdk.org; Xie, Huawei; Yuanhan Liu
>>>> Cc: Dyasly Sergey; Heetae Ahn; Tan, Jianfeng; Ilya Maximets
>>>> Subject: [PATCH] vhost: fix segfault on bad descriptor address.
>>>>
>>>> In current implementation guest application can reinitialize vrings
>>>> by executing start after stop. In the same time host application
>>>> can still poll virtqueue while device stopped in guest and it will
>>>> crash with segmentation fault while vring reinitialization because
>>>> of dereferencing of bad descriptor addresses.
>>>>
>>>> OVS crash for example:
>>>> <------------------------------------------------------------------------>
>>>> [test-pmd inside guest VM]
>>>>
>>>>     testpmd> port stop all
>>>>         Stopping ports...
>>>>         Checking link statuses...
>>>>         Port 0 Link Up - speed 10000 Mbps - full-duplex
>>>>         Done
>>>>     testpmd> port config all rxq 2
>>>>     testpmd> port config all txq 2
>>>>     testpmd> port start all
>>>>         Configuring Port 0 (socket 0)
>>>>         Port 0: 52:54:00:CB:44:C8
>>>>         Checking link statuses...
>>>>         Port 0 Link Up - speed 10000 Mbps - full-duplex
>>>>         Done
>>>>
>>>> [OVS on host]
>>>>     Program received signal SIGSEGV, Segmentation fault.
>>>>     rte_memcpy (n=2056, src=0xc, dst=0x7ff4d5247000) at
>>>> rte_memcpy.h
>>>>
>>>>     (gdb) bt
>>>>         #0  rte_memcpy (n=2056, src=0xc, dst=0x7ff4d5247000)
>>>>         #1  copy_desc_to_mbuf
>>>>         #2  rte_vhost_dequeue_burst
>>>>         #3  netdev_dpdk_vhost_rxq_recv
>>>>         ...
>>>>
>>>>     (gdb) bt full
>>>>         #0  rte_memcpy
>>>>             ...
>>>>         #1  copy_desc_to_mbuf
>>>>             desc_addr = 0
>>>>             mbuf_offset = 0
>>>>             desc_offset = 12
>>>>             ...
>>>> <------------------------------------------------------------------------>
>>>>
>>>> Fix that by checking addresses of descriptors before using them.
>>>>
>>>> Note: For mergeable buffers this patch checks only guest's address for
>>>> zero, but in non-meargeable case host's address checked. This is done
>>>> because checking of host's address in mergeable case requires additional
>>>> refactoring to keep virtqueue in consistent state in case of error.
>>>
>>> I agree with you that it should be fixed because malicious guest could launch
>>> DOS attack on vswitch with the current implementation.
>>>
>>> But I don't understand why you do not fix the mergable case in
>>> copy_mbuf_to_desc_mergable() on where gpa_to_vva() happens? And the change in
>>> fill_vec_buf(), checking !vq->desc[idx].addr, make any sense?
>>>
>>> Thanks,
>>> Jianfeng
>> Hi.
>> As I said inside commit-message, checking of host's address in mergeable case
>> requires additional refactoring to keep virtqueue in consistent state.
>>
>> There are few issues with checking inside copy_mbuf_to_desc_mergable() :
>>
>>     1. Ring elements already reserved and we must fill them with some
>>        sane data before going out of virtio_dev_merge_rx().
>>
>>     2. copy_mbuf_to_desc_mergable() can't return an error in current
>>        implementation (additional checking needed), otherwise used->idx
>>        will be decremented (I think, it's bad).
> 
> Yes, currently there is no way to return these invalid desc back to virtio because there's no invalid flag in virtio_net_hdr to indicate this desc contains no pkt. I see kernel just skips those descriptors with bad addr. I think it may rely on reset of the virtio device to improve such situation.
> 
> Another thing is that, your patch only checks the desc->addr, but we should check desc->addr + desc->len too, right?

To do that efficiently we would need to check the whole range inside
gpa_to_vva(), but that requires even more refactoring. Also, this can be a
separate patch, because checking addr + len is not required to fix the
original issue with virtio reconfiguration.
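The "check the whole range inside gpa_to_vva()" idea can be sketched roughly as below. This is a hypothetical, simplified model: the `struct guest_region` layout and the `gpa_range_to_vva()` helper are illustrative, not DPDK's actual structures. Translation fails unless the entire [gpa, gpa + len) range fits within one mapped guest memory region.

```c
#include <stdint.h>
#include <stddef.h>

/* Simplified, hypothetical view of a guest memory region table; the real
 * DPDK structures differ in detail. */
struct guest_region {
	uint64_t guest_phys_addr;
	uint64_t size;
	uint64_t host_user_addr;
};

/* Translate a guest-physical range to a host-virtual address. Returns 0
 * unless the whole [gpa, gpa + len) range lies inside one mapped region,
 * so a descriptor whose addr looks fine but whose len runs past the
 * region end is rejected as well. */
static uint64_t
gpa_range_to_vva(const struct guest_region *regions, size_t nregions,
		 uint64_t gpa, uint64_t len)
{
	for (size_t i = 0; i < nregions; i++) {
		const struct guest_region *r = &regions[i];

		if (gpa >= r->guest_phys_addr &&
		    gpa + len <= r->guest_phys_addr + r->size)
			return r->host_user_addr + (gpa - r->guest_phys_addr);
	}
	return 0;	/* unmapped address or range crossing a region end */
}
```

Callers would then treat a 0 return exactly like the existing `!desc_addr` check, with no separate addr + len validation needed at each call site.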

> 
> Thanks,
> Jianfeng
> 
>>
>>
>> Checking !vq->desc[idx].addr inside fill_vec_buf() makes sense in the case
>> of virtio reinitialization, because the guest's address will be zero (the
>> case described in the commit message). Checking the guest's address will
>> not help against a bad non-NULL address, but it is useful in this common
>> case. Also, we can't catch a bad address that we are able to map, so a
>> malicious guest could break vhost anyway.
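The zero-address case that this check catches can be illustrated as follows. `struct vring_desc_s` mirrors the vring descriptor layout, and `chain_is_sane()` is a hypothetical helper rather than the actual fill_vec_buf() code: a freshly memset() vring has addr == 0 in every descriptor, which is exactly what the walk rejects.

```c
#include <stdint.h>

/* Simplified vring descriptor; field layout mirrors struct vring_desc. */
struct vring_desc_s {
	uint64_t addr;
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

#define VRING_DESC_F_NEXT 1

/* Walk a descriptor chain and reject it as soon as a descriptor with a
 * zero guest address is met -- the state a freshly memset() vring is in
 * during guest reinitialization. Also bails out on an out-of-range index
 * or a looping chain. Hypothetical sketch, not DPDK code. */
static int
chain_is_sane(const struct vring_desc_s *desc, uint16_t idx, uint16_t size)
{
	uint16_t hops = 0;

	while (1) {
		if (idx >= size || desc[idx].addr == 0)
			return 0;		/* bad descriptor */
		if (!(desc[idx].flags & VRING_DESC_F_NEXT))
			return 1;		/* chain ends cleanly */
		idx = desc[idx].next;
		if (++hops > size)
			return 0;		/* loop in the chain */
	}
}
```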
>>
>> I agree that checking the host's address is better, but that can be done
>> later, together with resolving the above issues.
>>
>> Best regards, Ilya Maximets.
  
Ilya Maximets May 31, 2016, 9:12 a.m. UTC | #9
OK.

On 30.05.2016 17:25, Yuanhan Liu wrote:
> Hi Ilya,
> 
> Generally speaking, this patch looks good to me. But I guess I still need
> more time to check this issue; after all, I still failed to reproduce it
> on my side. So, please allow a late merge.
> 
> Thanks.
> 
> 	--yliu
> 
> On Mon, May 30, 2016 at 02:05:07PM +0300, Ilya Maximets wrote:
>> Ping.
>>
>> Best regards, Ilya Maximets.
>>
>> On 23.05.2016 14:04, Ilya Maximets wrote:
>>> On 23.05.2016 13:57, Yuanhan Liu wrote:
>>>> On Fri, May 20, 2016 at 03:50:04PM +0300, Ilya Maximets wrote:
>>>>> In the current implementation a guest application can reinitialize
>>>>> vrings by executing start after stop. At the same time the host
>>>>> application can still poll the virtqueue while the device is stopped
>>>>> in the guest, and it will crash with a segmentation fault during vring
>>>>> reinitialization because of dereferencing bad descriptor addresses.
>>>>>
>>>>> OVS crash for example:
>>>>> <------------------------------------------------------------------------>
>>>>> [test-pmd inside guest VM]
>>>>>
>>>>> 	testpmd> port stop all
>>>>> 	    Stopping ports...
>>>>> 	    Checking link statuses...
>>>>> 	    Port 0 Link Up - speed 10000 Mbps - full-duplex
>>>>> 	    Done
>>>>> 	testpmd> port config all rxq 2
>>>>> 	testpmd> port config all txq 2
>>>>> 	testpmd> port start all
>>>>> 	    Configuring Port 0 (socket 0)
>>>>> 	    Port 0: 52:54:00:CB:44:C8
>>>>> 	    Checking link statuses...
>>>>> 	    Port 0 Link Up - speed 10000 Mbps - full-duplex
>>>>> 	    Done
>>>>
>>>> I actually didn't manage to reproduce it on my side, though with the
>>>> vhost-example instead of OVS. Are those all the commands needed to
>>>> reproduce it, run just after starting test-pmd?
>>>
>>> Actually, I think packet flow should be enabled while performing the
>>> above actions, and some traffic should already have been sent through
>>> the port to change the last used idx on the vhost side.
>>>
>>> Something like:
>>> 	start
>>> 	..wait a while.. see that packets are flowing.
>>> 	stop
>>> 	port stop
>>> 	port config
>>> 	port config
>>> 	port start
>>>>
>>>>> [OVS on host]
>>>>> 	Program received signal SIGSEGV, Segmentation fault.
>>>>> 	rte_memcpy (n=2056, src=0xc, dst=0x7ff4d5247000) at rte_memcpy.h
>>>>>
>>>>> 	(gdb) bt
>>>>> 	    #0  rte_memcpy (n=2056, src=0xc, dst=0x7ff4d5247000)
>>>>> 	    #1  copy_desc_to_mbuf
>>>>> 	    #2  rte_vhost_dequeue_burst
>>>>> 	    #3  netdev_dpdk_vhost_rxq_recv
>>>>> 	    ...
>>>>>
>>>>> 	(gdb) bt full
>>>>> 	    #0  rte_memcpy
>>>>> 	        ...
>>>>> 	    #1  copy_desc_to_mbuf
>>>>> 	        desc_addr = 0
>>>>> 	        mbuf_offset = 0
>>>>> 	        desc_offset = 12
>>>>> 	        ...
>>>>> <------------------------------------------------------------------------>
>>>>>
>>>>> Fix that by checking addresses of descriptors before using them.
>>>>>
>>>>> Note: For mergeable buffers this patch checks only the guest's address
>>>>> for zero, but in the non-mergeable case the host's address is checked.
>>>>> This is done because checking the host's address in the mergeable case
>>>>> requires additional refactoring to keep the virtqueue in a consistent
>>>>> state in case of error.
>>>>>
>>>>> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
>>>>> ---
>>>>>
>>>>> Actually, the current virtio implementation looks broken to me,
>>>>> because 'virtio_dev_start' breaks the virtqueue while it is still
>>>>> available from the vhost side.
>>>>>
>>>>> There were two patches about this behaviour:
>>>>>
>>>>> 	1. a85786dc816f ("virtio: fix states handling during initialization")
>>>>> 	2. 9a0615af7746 ("virtio: fix restart")
>>>>>
>>>>> The second patch somehow fixes the issue introduced in the first
>>>>> patch, but it also breaks vhost in the way described above.
>>>>> It's not entirely clear to me what to do in the current situation with
>>>>> virtio, because it will be broken for the guest application even if
>>>>> vhost does not crash.
>>>>>
>>>>> Maybe it would be better to forbid stopping of the virtio device and
>>>>> force the user to exit and start again (perhaps implemented in a way
>>>>> hidden from the user)?
>>>>>
>>>>> This patch adds additional sanity checks, so it should be applied
>>>>> anyway, IMHO.
>>>>
>>>> Agreed.
>>>>
>>>> 	--yliu
>>>>
>>>>
> 
>
  
Rich Lane May 31, 2016, 10:06 p.m. UTC | #10
On Fri, May 20, 2016 at 5:50 AM, Ilya Maximets <i.maximets@samsung.com>
wrote:

> In the current implementation a guest application can reinitialize
> vrings by executing start after stop. At the same time the host
> application can still poll the virtqueue while the device is stopped
> in the guest, and it will crash with a segmentation fault during vring
> reinitialization because of dereferencing bad descriptor addresses.
>

I see a performance regression with this patch at large packet sizes (> 768
bytes). rte_vhost_enqueue_burst is consuming 10% more cycles. Strangely,
there's actually a ~1% performance improvement at small packet sizes.

The regression happens with GCC 4.8.4 and 5.3.0, but not 6.1.1.

AFAICT this is just the compiler generating bad code. One difference is
that it's storing the offset on the stack instead of in a register. A
workaround is to move the !desc_addr check outside the unlikely macros.

--- a/lib/librte_vhost/vhost_rxtx.c
> +++ b/lib/librte_vhost/vhost_rxtx.c
> @@ -147,10 +147,10 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct
> vhost_virtqueue *vq,
>         struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, 0, 0, 0, 0, 0},
> 0};
>
>         desc = &vq->desc[desc_idx];
> -       if (unlikely(desc->len < vq->vhost_hlen))
> +       desc_addr = gpa_to_vva(dev, desc->addr);
> +       if (unlikely(desc->len < vq->vhost_hlen || !desc_addr))
>

 Workaround: change to "if (unlikely(desc->len < vq->vhost_hlen) ||
!desc_addr)".

                return -1;


> -       desc_addr = gpa_to_vva(dev, desc->addr);
>         rte_prefetch0((void *)(uintptr_t)desc_addr);
>
>         virtio_enqueue_offload(m, &virtio_hdr.hdr);
> @@ -184,6 +184,9 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct
> vhost_virtqueue *vq,
>
>                         desc = &vq->desc[desc->next];
>                         desc_addr   = gpa_to_vva(dev, desc->addr);
> +                       if (unlikely(!desc_addr))
>

Workaround: change to "if (!desc_addr)".


> +                               return -1;
> +
>                         desc_offset = 0;
>                         desc_avail  = desc->len;
>                 }
>
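Both variants above are behaviourally identical; only the placement of the branch hint changes. A minimal sketch of the two forms, assuming DPDK's `unlikely()` expands to GCC's `__builtin_expect` (the `struct desc_s` and function names here are illustrative, not the real vhost code):

```c
#include <stdint.h>

/* DPDK's unlikely() is essentially this GCC hint. */
#define my_unlikely(x) __builtin_expect(!!(x), 0)

struct desc_s { uint32_t len; };

/* Form that regressed on GCC 4.8/5.3: both tests under one unlikely(). */
static int
check_combined(const struct desc_s *desc, uint32_t vhost_hlen,
	       uint64_t desc_addr)
{
	if (my_unlikely(desc->len < vhost_hlen || !desc_addr))
		return -1;
	return 0;
}

/* Workaround from the thread: keep only the length test inside the hint,
 * leaving the address test unhinted, which older GCC compiles better. */
static int
check_split(const struct desc_s *desc, uint32_t vhost_hlen,
	    uint64_t desc_addr)
{
	if (my_unlikely(desc->len < vhost_hlen) || !desc_addr)
		return -1;
	return 0;
}
```

The two functions return the same result for every input; the difference is purely in the generated code, which is why the regression depends on the compiler version.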
  
Ilya Maximets June 2, 2016, 10:46 a.m. UTC | #11
Hi, Rich.
Thank you for testing and analysing.

On 01.06.2016 01:06, Rich Lane wrote:
> On Fri, May 20, 2016 at 5:50 AM, Ilya Maximets <i.maximets@samsung.com <mailto:i.maximets@samsung.com>> wrote:
> 
>     In the current implementation a guest application can reinitialize
>     vrings by executing start after stop. At the same time the host
>     application can still poll the virtqueue while the device is stopped
>     in the guest, and it will crash with a segmentation fault during vring
>     reinitialization because of dereferencing bad descriptor addresses.
> 
> 
> I see a performance regression with this patch at large packet sizes (> 768 bytes). rte_vhost_enqueue_burst is consuming 10% more cycles. Strangely, there's actually a ~1% performance improvement at small packet sizes.
> 
> The regression happens with GCC 4.8.4 and 5.3.0, but not 6.1.1.
> 
> AFAICT this is just the compiler generating bad code. One difference is that it's storing the offset on the stack instead of in a register. A workaround is to move the !desc_addr check outside the unlikely macros.
> 
>     --- a/lib/librte_vhost/vhost_rxtx.c
>     +++ b/lib/librte_vhost/vhost_rxtx.c
>     @@ -147,10 +147,10 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
>             struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, 0, 0, 0, 0, 0}, 0};
> 
>             desc = &vq->desc[desc_idx];
>     -       if (unlikely(desc->len < vq->vhost_hlen))
>     +       desc_addr = gpa_to_vva(dev, desc->addr);
>     +       if (unlikely(desc->len < vq->vhost_hlen || !desc_addr))
> 
> 
>  Workaround: change to "if (unlikely(desc->len < vq->vhost_hlen) || !desc_addr)".
> 
>                     return -1;
> 
> 
>     -       desc_addr = gpa_to_vva(dev, desc->addr);
>             rte_prefetch0((void *)(uintptr_t)desc_addr);
> 
>             virtio_enqueue_offload(m, &virtio_hdr.hdr);
>     @@ -184,6 +184,9 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
> 
>                             desc = &vq->desc[desc->next];
>                             desc_addr   = gpa_to_vva(dev, desc->addr);
>     +                       if (unlikely(!desc_addr))
> 
> 
> Workaround: change to "if (!desc_addr)".
>  
> 
>     +                               return -1;
>     +
>                             desc_offset = 0;
>                             desc_avail  = desc->len;
>                     }
> 

What about other places? Are there the same issues elsewhere, or is it only
inside copy_mbuf_to_desc()?

Best regards, Ilya Maximets.
  
Rich Lane June 2, 2016, 4:22 p.m. UTC | #12
On Thu, Jun 2, 2016 at 3:46 AM, Ilya Maximets <i.maximets@samsung.com>
wrote:

> Hi, Rich.
> Thank you for testing and analysing.
>
> On 01.06.2016 01:06, Rich Lane wrote:
> > On Fri, May 20, 2016 at 5:50 AM, Ilya Maximets <i.maximets@samsung.com
> <mailto:i.maximets@samsung.com>> wrote:
> >
> >     In the current implementation a guest application can reinitialize
> >     vrings by executing start after stop. At the same time the host
> >     application can still poll the virtqueue while the device is stopped
> >     in the guest, and it will crash with a segmentation fault during
> >     vring reinitialization because of dereferencing bad descriptor addresses.
> >
> >
> > I see a performance regression with this patch at large packet sizes (>
> 768 bytes). rte_vhost_enqueue_burst is consuming 10% more cycles.
> Strangely, there's actually a ~1% performance improvement at small packet
> sizes.
> >
> > The regression happens with GCC 4.8.4 and 5.3.0, but not 6.1.1.
> >
> > AFAICT this is just the compiler generating bad code. One difference is
> that it's storing the offset on the stack instead of in a register. A
> workaround is to move the !desc_addr check outside the unlikely macros.
> >
> >     --- a/lib/librte_vhost/vhost_rxtx.c
> >     +++ b/lib/librte_vhost/vhost_rxtx.c
> >     @@ -147,10 +147,10 @@ copy_mbuf_to_desc(struct virtio_net *dev,
> struct vhost_virtqueue *vq,
> >             struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, 0, 0, 0,
> 0, 0}, 0};
> >
> >             desc = &vq->desc[desc_idx];
> >     -       if (unlikely(desc->len < vq->vhost_hlen))
> >     +       desc_addr = gpa_to_vva(dev, desc->addr);
> >     +       if (unlikely(desc->len < vq->vhost_hlen || !desc_addr))
> >
> >
> >  Workaround: change to "if (unlikely(desc->len < vq->vhost_hlen) ||
> !desc_addr)".
> >
> >                     return -1;
> >
> >
> >     -       desc_addr = gpa_to_vva(dev, desc->addr);
> >             rte_prefetch0((void *)(uintptr_t)desc_addr);
> >
> >             virtio_enqueue_offload(m, &virtio_hdr.hdr);
> >     @@ -184,6 +184,9 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct
> vhost_virtqueue *vq,
> >
> >                             desc = &vq->desc[desc->next];
> >                             desc_addr   = gpa_to_vva(dev, desc->addr);
> >     +                       if (unlikely(!desc_addr))
> >
> >
> > Workaround: change to "if (!desc_addr)".
> >
> >
> >     +                               return -1;
> >     +
> >                             desc_offset = 0;
> >                             desc_avail  = desc->len;
> >                     }
> >
>
> What about other places? Are there the same issues elsewhere, or is it
> only inside copy_mbuf_to_desc()?
>

Only copy_mbuf_to_desc has the issue.
  
Ilya Maximets June 3, 2016, 6:01 a.m. UTC | #13
On 02.06.2016 19:22, Rich Lane wrote:
> On Thu, Jun 2, 2016 at 3:46 AM, Ilya Maximets <i.maximets@samsung.com <mailto:i.maximets@samsung.com>> wrote:
> 
>     Hi, Rich.
>     Thank you for testing and analysing.
> 
>     On 01.06.2016 01:06, Rich Lane wrote:
>     > On Fri, May 20, 2016 at 5:50 AM, Ilya Maximets <i.maximets@samsung.com <mailto:i.maximets@samsung.com> <mailto:i.maximets@samsung.com <mailto:i.maximets@samsung.com>>> wrote:
>     >
>     >     In the current implementation a guest application can
>     >     reinitialize vrings by executing start after stop. At the same
>     >     time the host application can still poll the virtqueue while the
>     >     device is stopped in the guest, and it will crash with a
>     >     segmentation fault during vring reinitialization because of
>     >     dereferencing bad descriptor addresses.
>     >
>     >
>     > I see a performance regression with this patch at large packet sizes (> 768 bytes). rte_vhost_enqueue_burst is consuming 10% more cycles. Strangely, there's actually a ~1% performance improvement at small packet sizes.
>     >
>     > The regression happens with GCC 4.8.4 and 5.3.0, but not 6.1.1.
>     >
>     > AFAICT this is just the compiler generating bad code. One difference is that it's storing the offset on the stack instead of in a register. A workaround is to move the !desc_addr check outside the unlikely macros.
>     >
>     >     --- a/lib/librte_vhost/vhost_rxtx.c
>     >     +++ b/lib/librte_vhost/vhost_rxtx.c
>     >     @@ -147,10 +147,10 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
>     >             struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, 0, 0, 0, 0, 0}, 0};
>     >
>     >             desc = &vq->desc[desc_idx];
>     >     -       if (unlikely(desc->len < vq->vhost_hlen))
>     >     +       desc_addr = gpa_to_vva(dev, desc->addr);
>     >     +       if (unlikely(desc->len < vq->vhost_hlen || !desc_addr))
>     >
>     >
>     >  Workaround: change to "if (unlikely(desc->len < vq->vhost_hlen) || !desc_addr)".
>     >
>     >                     return -1;
>     >
>     >
>     >     -       desc_addr = gpa_to_vva(dev, desc->addr);
>     >             rte_prefetch0((void *)(uintptr_t)desc_addr);
>     >
>     >             virtio_enqueue_offload(m, &virtio_hdr.hdr);
>     >     @@ -184,6 +184,9 @@ copy_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
>     >
>     >                             desc = &vq->desc[desc->next];
>     >                             desc_addr   = gpa_to_vva(dev, desc->addr);
>     >     +                       if (unlikely(!desc_addr))
>     >
>     >
>     > Workaround: change to "if (!desc_addr)".
>     >
>     >
>     >     +                               return -1;
>     >     +
>     >                             desc_offset = 0;
>     >                             desc_avail  = desc->len;
>     >                     }
>     >
> 
>     What about other places? Are there the same issues elsewhere, or is it only inside copy_mbuf_to_desc()?
> 
> 
> Only copy_mbuf_to_desc has the issue.

Ok.
Actually, I can't reproduce this performance issue using GCC 4.8.5 from
RHEL 7.2. I'm not sure if I should post a v2 with the above fixes. Maybe
they could be applied while pushing the patch to the repository?

Best regards, Ilya Maximets.
  
Yuanhan Liu July 1, 2016, 7:35 a.m. UTC | #14
Hi,

Sorry for the long delay.

On Fri, May 20, 2016 at 03:50:04PM +0300, Ilya Maximets wrote:
> In the current implementation a guest application can reinitialize
> vrings by executing start after stop. At the same time the host
> application can still poll the virtqueue while the device is stopped
> in the guest, and it will crash with a segmentation fault during vring
> reinitialization because of dereferencing bad descriptor addresses.

Yes, you are right that the vring will be reinitialized after restart.
Even so, I don't see why it would cause a vhost crash, since the
reinitialization resets all the vring memory to 0:

    memset(vq->vq_ring_virt_mem, 0, vq->vq_ring_size);

That means those bad descriptors will be skipped, safely, at vhost
side by:

	if (unlikely(desc->len < dev->vhost_hlen))
		return -1;

> 
> OVS crash for example:
> <------------------------------------------------------------------------>
> [test-pmd inside guest VM]
> 
> 	testpmd> port stop all
> 	    Stopping ports...
> 	    Checking link statuses...
> 	    Port 0 Link Up - speed 10000 Mbps - full-duplex
> 	    Done
> 	testpmd> port config all rxq 2
> 	testpmd> port config all txq 2
> 	testpmd> port start all
> 	    Configuring Port 0 (socket 0)
> 	    Port 0: 52:54:00:CB:44:C8
> 	    Checking link statuses...
> 	    Port 0 Link Up - speed 10000 Mbps - full-duplex
> 	    Done
> 
> [OVS on host]
> 	Program received signal SIGSEGV, Segmentation fault.
> 	rte_memcpy (n=2056, src=0xc, dst=0x7ff4d5247000) at rte_memcpy.h

Interesting, so it bypasses the above check since desc->len is non-zero
while desc->addr is zero. The size (2056) also looks weird.

Do you mind checking this issue a bit deeper, say, why desc->addr is
zero while desc->len is not?

> 	(gdb) bt
> 	    #0  rte_memcpy (n=2056, src=0xc, dst=0x7ff4d5247000)
> 	    #1  copy_desc_to_mbuf
> 	    #2  rte_vhost_dequeue_burst
> 	    #3  netdev_dpdk_vhost_rxq_recv
> 	    ...
> 
> 	(gdb) bt full
> 	    #0  rte_memcpy
> 	        ...
> 	    #1  copy_desc_to_mbuf
> 	        desc_addr = 0
> 	        mbuf_offset = 0
> 	        desc_offset = 12
> 	        ...
> <------------------------------------------------------------------------>
> 
> Fix that by checking addresses of descriptors before using them.
> 
> Note: For mergeable buffers this patch checks only the guest's address
> for zero, but in the non-mergeable case the host's address is checked.
> This is done because checking the host's address in the mergeable case
> requires additional refactoring to keep the virtqueue in a consistent
> state in case of error.
> 
> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
> ---
> 
> Actually, the current virtio implementation looks broken to me, because
> 'virtio_dev_start' breaks the virtqueue while it is still available from
> the vhost side.

Yes, this sounds buggy. Maybe we could avoid resetting the avail idx; in
that case vhost dequeue/enqueue would just return, as there are no more
packets to dequeue and no more space to enqueue, respectively?

	--yliu
  
Ilya Maximets July 6, 2016, 11:19 a.m. UTC | #15
On 01.07.2016 10:35, Yuanhan Liu wrote:
> Hi,
> 
> Sorry for the long delay.
> 
> On Fri, May 20, 2016 at 03:50:04PM +0300, Ilya Maximets wrote:
>> In the current implementation a guest application can reinitialize
>> vrings by executing start after stop. At the same time the host
>> application can still poll the virtqueue while the device is stopped
>> in the guest, and it will crash with a segmentation fault during vring
>> reinitialization because of dereferencing bad descriptor addresses.
> 
> Yes, you are right that vring will be reinitialized after restart.
> But even though, I don't see the reason it will cause a vhost crash,
> since the reinitialization will reset all the vring memory by 0:
> 
>     memset(vq->vq_ring_virt_mem, 0, vq->vq_ring_size);
> 
> That means those bad descriptors will be skipped, safely, at vhost
> side by:
> 
> 	if (unlikely(desc->len < dev->vhost_hlen))
> 		return -1;
> 
>>
>> OVS crash for example:
>> <------------------------------------------------------------------------>
>> [test-pmd inside guest VM]
>>
>> 	testpmd> port stop all
>> 	    Stopping ports...
>> 	    Checking link statuses...
>> 	    Port 0 Link Up - speed 10000 Mbps - full-duplex
>> 	    Done
>> 	testpmd> port config all rxq 2
>> 	testpmd> port config all txq 2
>> 	testpmd> port start all
>> 	    Configuring Port 0 (socket 0)
>> 	    Port 0: 52:54:00:CB:44:C8
>> 	    Checking link statuses...
>> 	    Port 0 Link Up - speed 10000 Mbps - full-duplex
>> 	    Done
>>
>> [OVS on host]
>> 	Program received signal SIGSEGV, Segmentation fault.
>> 	rte_memcpy (n=2056, src=0xc, dst=0x7ff4d5247000) at rte_memcpy.h
> 
> Interesting, so it bypasses the above check since desc->len is non-zero
> while desc->addr is zero. The size (2056) also looks weird.
> 
> Do you mind to check this issue a bit deeper, say why desc->addr is
> zero, however, desc->len is not?

OK. I checked this a few more times. Actually, I see that desc->addr is
not zero. All the desc memory looks like rubbish:

<------------------------------------------------------------------------------>
(gdb)
#3 copy_desc_to_mbuf (mbuf_pool=0x7fe9da9f4480, desc_idx=65363,
                      m=0x7fe9db269400, vq=0x7fe9fff7bac0, dev=0x7fe9fff7cbc0)
        desc = 0x2aabc00ff530
        desc_addr = 0
        mbuf_offset = 0
        prev = 0x7fe9db269400
        nr_desc = 1
        desc_offset = 12
        cur = 0x7fe9db269400
        hdr = 0x0
        desc_avail = 1012591375
        mbuf_avail = 1526
        cpy_len = 1526

(gdb) p *desc
$2 = {addr = 8507655620301055744, len = 1012591387, flags = 3845, next = 48516}

<------------------------------------------------------------------------------>

And 'desc_addr' equals zero because 'gpa_to_vva' just can't map this huge
address to a host address.

Scenario was the same. SIGSEGV received right after 'port start all'.

Another thought:

	Actually, there is a race window between the 'memset' in the guest
	and the reading of 'desc->len' and 'desc->addr' on the host. So it's
	possible to read a non-zero 'len' and a zero 'addr' right after
	that. But you're right, this case should be very rare.
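One defensive pattern against such a race is to snapshot the shared descriptor fields once and validate the copy that will actually be used, so the addr that gets dereferenced is the same value that passed the check. A hypothetical sketch (not the actual patch; field layout is simplified):

```c
#include <stdint.h>

/* Simplified descriptor with only the fields relevant to the race. */
struct vring_desc_s {
	uint64_t addr;
	uint32_t len;
};

/* Snapshot the shared descriptor once, then validate the local copy. The
 * guest can memset() the ring at any moment, so every field must be
 * checked on the copy actually used for the memcpy -- checking len on one
 * read and then using addr from a later read is exactly the race
 * described above. Returns -1 and never dereferences addr on rejection. */
static int
read_desc_checked(const volatile struct vring_desc_s *shared,
		  struct vring_desc_s *out, uint32_t vhost_hlen)
{
	out->addr = shared->addr;
	out->len  = shared->len;

	if (out->len < vhost_hlen || out->addr == 0)
		return -1;	/* reject torn or reinitialized descriptor */
	return 0;
}
```

Note that this keeps the checks consistent with one snapshot, but it cannot remove the race entirely: a hostile or buggy guest can still change the ring between the snapshot and its use, which is why the address must also survive translation (gpa_to_vva() returning non-zero) before being dereferenced.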

> 
>> 	(gdb) bt
>> 	    #0  rte_memcpy (n=2056, src=0xc, dst=0x7ff4d5247000)
>> 	    #1  copy_desc_to_mbuf
>> 	    #2  rte_vhost_dequeue_burst
>> 	    #3  netdev_dpdk_vhost_rxq_recv
>> 	    ...
>>
>> 	(gdb) bt full
>> 	    #0  rte_memcpy
>> 	        ...
>> 	    #1  copy_desc_to_mbuf
>> 	        desc_addr = 0
>> 	        mbuf_offset = 0
>> 	        desc_offset = 12
>> 	        ...
>> <------------------------------------------------------------------------>
>>
>> Fix that by checking addresses of descriptors before using them.
>>
>> Note: For mergeable buffers this patch checks only the guest's address
>> for zero, but in the non-mergeable case the host's address is checked.
>> This is done because checking the host's address in the mergeable case
>> requires additional refactoring to keep the virtqueue in a consistent
>> state in case of error.
>>
>> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
>> ---
>>
>> Actually, the current virtio implementation looks broken to me, because
>> 'virtio_dev_start' breaks the virtqueue while it is still available from
>> the vhost side.
> 
> Yes, this sounds buggy. Maybe we could not reset the avail idx, in such
> case vhost dequeue/enqueue will just return as there are no more packets
> to dequeue and no more space to enqueue, respectively?

Maybe this will be a good fix for virtio, because vhost will not try to
receive from wrong descriptors. But it will not help if vhost is already
trying to receive something at the time of the guest's reconfiguration.

Best regards, Ilya Maximets.
  
Yuanhan Liu July 6, 2016, 12:24 p.m. UTC | #16
On Wed, Jul 06, 2016 at 02:19:12PM +0300, Ilya Maximets wrote:
> On 01.07.2016 10:35, Yuanhan Liu wrote:
> > Hi,
> > 
> > Sorry for the long delay.
> > 
> > On Fri, May 20, 2016 at 03:50:04PM +0300, Ilya Maximets wrote:
> >> In the current implementation a guest application can reinitialize
> >> vrings by executing start after stop. At the same time the host
> >> application can still poll the virtqueue while the device is stopped
> >> in the guest, and it will crash with a segmentation fault during vring
> >> reinitialization because of dereferencing bad descriptor addresses.
> > 
> > Yes, you are right that vring will be reinitialized after restart.
> > But even though, I don't see the reason it will cause a vhost crash,
> > since the reinitialization will reset all the vring memory by 0:
> > 
> >     memset(vq->vq_ring_virt_mem, 0, vq->vq_ring_size);
> > 
> > That means those bad descriptors will be skipped, safely, at vhost
> > side by:
> > 
> > 	if (unlikely(desc->len < dev->vhost_hlen))
> > 		return -1;
> > 
> >>
> >> OVS crash for example:
> >> <------------------------------------------------------------------------>
> >> [test-pmd inside guest VM]
> >>
> >> 	testpmd> port stop all
> >> 	    Stopping ports...
> >> 	    Checking link statuses...
> >> 	    Port 0 Link Up - speed 10000 Mbps - full-duplex
> >> 	    Done
> >> 	testpmd> port config all rxq 2
> >> 	testpmd> port config all txq 2
> >> 	testpmd> port start all
> >> 	    Configuring Port 0 (socket 0)
> >> 	    Port 0: 52:54:00:CB:44:C8
> >> 	    Checking link statuses...
> >> 	    Port 0 Link Up - speed 10000 Mbps - full-duplex
> >> 	    Done
> >>
> >> [OVS on host]
> >> 	Program received signal SIGSEGV, Segmentation fault.
> >> 	rte_memcpy (n=2056, src=0xc, dst=0x7ff4d5247000) at rte_memcpy.h
> > 
> > Interesting, so it bypasses the above check since desc->len is non-zero
> > while desc->addr is zero. The size (2056) also looks weird.
> > 
> > Do you mind to check this issue a bit deeper, say why desc->addr is
> > zero, however, desc->len is not?
> 
> OK. I checked this a few more times.

Thanks!

> Actually, I see, that desc->addr is
> not zero. All desc memory looks like some rubbish:
> 
> <------------------------------------------------------------------------------>
> (gdb)
> #3 copy_desc_to_mbuf (mbuf_pool=0x7fe9da9f4480, desc_idx=65363,
>                       m=0x7fe9db269400, vq=0x7fe9fff7bac0, dev=0x7fe9fff7cbc0)
>         desc = 0x2aabc00ff530
>         desc_addr = 0
>         mbuf_offset = 0
>         prev = 0x7fe9db269400
>         nr_desc = 1
>         desc_offset = 12
>         cur = 0x7fe9db269400
>         hdr = 0x0
>         desc_avail = 1012591375
>         mbuf_avail = 1526
>         cpy_len = 1526
> 
> (gdb) p *desc
> $2 = {addr = 8507655620301055744, len = 1012591387, flags = 3845, next = 48516}
> 
> <------------------------------------------------------------------------------>
> 
> And 'desc_addr' equals zero because 'gpa_to_vva' just can't map this huge
> address to host's.
> 
> Scenario was the same. SIGSEGV received right after 'port start all'.
> 
> Another thought:
> 
> 	Actually, there is a race window between the 'memset' in the guest
> 	and the reading of 'desc->len' and 'desc->addr' on the host. So it's
> 	possible to read a non-zero 'len' and a zero 'addr' right after that.

That's also what I was thinking: that should be the only reason causing
such an issue.

> But you're right, this case should be very rare.

Yes, it should be very rare. What troubles me is that you seem to be able
to reproduce this issue very easily, so I doubt it's caused by this rare
race. Could the reason be something else?

	--yliu
  
Ilya Maximets July 8, 2016, 11:48 a.m. UTC | #17
On 06.07.2016 15:24, Yuanhan Liu wrote:
> On Wed, Jul 06, 2016 at 02:19:12PM +0300, Ilya Maximets wrote:
>> On 01.07.2016 10:35, Yuanhan Liu wrote:
>>> Hi,
>>>
>>> Sorry for the long delay.
>>>
>>> On Fri, May 20, 2016 at 03:50:04PM +0300, Ilya Maximets wrote:
>>>> In the current implementation a guest application can reinitialize
>>>> vrings by executing start after stop. At the same time the host
>>>> application can still poll the virtqueue while the device is stopped
>>>> in the guest, and it will crash with a segmentation fault during vring
>>>> reinitialization because of dereferencing bad descriptor addresses.
>>>
>>> Yes, you are right that vring will be reinitialized after restart.
>>> But even though, I don't see the reason it will cause a vhost crash,
>>> since the reinitialization will reset all the vring memory by 0:
>>>
>>>     memset(vq->vq_ring_virt_mem, 0, vq->vq_ring_size);
>>>
>>> That means those bad descriptors will be skipped, safely, at vhost
>>> side by:
>>>
>>> 	if (unlikely(desc->len < dev->vhost_hlen))
>>> 		return -1;
>>>
>>>>
>>>> OVS crash for example:
>>>> <------------------------------------------------------------------------>
>>>> [test-pmd inside guest VM]
>>>>
>>>> 	testpmd> port stop all
>>>> 	    Stopping ports...
>>>> 	    Checking link statuses...
>>>> 	    Port 0 Link Up - speed 10000 Mbps - full-duplex
>>>> 	    Done
>>>> 	testpmd> port config all rxq 2
>>>> 	testpmd> port config all txq 2
>>>> 	testpmd> port start all
>>>> 	    Configuring Port 0 (socket 0)
>>>> 	    Port 0: 52:54:00:CB:44:C8
>>>> 	    Checking link statuses...
>>>> 	    Port 0 Link Up - speed 10000 Mbps - full-duplex
>>>> 	    Done
>>>>
>>>> [OVS on host]
>>>> 	Program received signal SIGSEGV, Segmentation fault.
>>>> 	rte_memcpy (n=2056, src=0xc, dst=0x7ff4d5247000) at rte_memcpy.h
>>>
>>> Interesting, so it bypasses the above check since desc->len is non-zero
>>> while desc->addr is zero. The size (2056) also looks weird.
>>>
>>> Do you mind to check this issue a bit deeper, say why desc->addr is
>>> zero, however, desc->len is not?
>>
>> OK. I checked this a few more times.
> 
> Thanks!
> 
>> Actually, I see, that desc->addr is
>> not zero. All desc memory looks like some rubbish:
>>
>> <------------------------------------------------------------------------------>
>> (gdb)
>> #3 copy_desc_to_mbuf (mbuf_pool=0x7fe9da9f4480, desc_idx=65363,
>>                       m=0x7fe9db269400, vq=0x7fe9fff7bac0, dev=0x7fe9fff7cbc0)
>>         desc = 0x2aabc00ff530
>>         desc_addr = 0
>>         mbuf_offset = 0
>>         prev = 0x7fe9db269400
>>         nr_desc = 1
>>         desc_offset = 12
>>         cur = 0x7fe9db269400
>>         hdr = 0x0
>>         desc_avail = 1012591375
>>         mbuf_avail = 1526
>>         cpy_len = 1526
>>
>> (gdb) p *desc
>> $2 = {addr = 8507655620301055744, len = 1012591387, flags = 3845, next = 48516}
>>
>> <------------------------------------------------------------------------------>
>>
>> And 'desc_addr' equals zero because 'gpa_to_vva' just can't map this huge
>> address to host's.
>>
>> Scenario was the same. SIGSEGV received right after 'port start all'.
>>
>> Another thought:
>>
>> 	Actually, there is a race window between the 'memset' in the guest
>> 	and the reading of 'desc->len' and 'desc->addr' on the host. So it's
>> 	possible to read a non-zero 'len' and a zero 'addr' right after that.
> 
> That's also what I was thinking: that should be the only reason causing
> such an issue.
> 
>> But you're right, this case should be very rare.
> 
> Yes, it should be very rare. What troubles me is that you seem to be able
> to reproduce this issue very easily, so I doubt it's caused by this rare
> race. Could the reason be something else?

I don't know exactly what happens, but it constantly happens on
'port start all' command execution. Descriptors become broken for some time
and afterwards become normal again. I think so because, with my patch
applied, the network keeps working. This means that broken descriptors
appear for a short time while virtio restarts with the new configuration.

Another point is that the crash always happens on queue_id=3 (the second RX
queue) in my scenario. It is the virtqueue newly allocated during the
reconfiguration from rxq=1 to rxq=2.

Obviously, virtio needs to be fixed, but we need to check the address on
the vhost side anyway, because in the general case we don't know what
happens in the guest.

Best regards, Ilya Maximets.
  
Yuanhan Liu July 10, 2016, 1:17 p.m. UTC | #18
On Fri, Jul 08, 2016 at 02:48:56PM +0300, Ilya Maximets wrote:
> 
> Another point is that the crash always happens on queue_id=3 (the second
> RX queue) in my scenario. It is the virtqueue newly allocated during the
> reconfiguration from rxq=1 to rxq=2.

That's a valuable message: what's your DPDK HEAD commit while triggering
this issue?

	--yliu
> 
> Obviously, virtio needs to be fixed, but we need to check the address on
> the vhost side anyway, because in the general case we don't know what
> happens in the guest.
> 
> Best regards, Ilya Maximets.
  
Yuanhan Liu July 11, 2016, 8:38 a.m. UTC | #19
On Sun, Jul 10, 2016 at 09:17:31PM +0800, Yuanhan Liu wrote:
> On Fri, Jul 08, 2016 at 02:48:56PM +0300, Ilya Maximets wrote:
> > 
> > Another point is that crash constantly happens on queue_id=3 (second RX queue) in
> > my scenario. It is newly allocated virtqueue while reconfiguration from rxq=1 to
> > rxq=2.
> 
> That's a valuable message: what's your DPDK HEAD commit while triggering
> this issue?

I guess I have understood what goes wrong in your case.

I would guess that your vhost has 2 queues configured (here, and below, I mean
queue pairs, each including one Tx and one Rx queue), and so does your QEMU.
However, you enabled just 1 queue while starting testpmd inside the guest,
and then you tried to enable 2 queues by running the following testpmd
commands:

    stop
    port stop all
    port config all rxq 2
    port config all txq 2
    port start all

Sadly, that won't work with the current virtio PMD implementation, and what's
worse, it triggers a vhost crash, the one you saw.

Here is how it happens. Since you enabled just 1 queue while starting
testpmd, it sets up only that queue, meaning only one queue's **valid**
information is sent to vhost. You might see SET_VRING_ADDR
(and related vhost messages) for the other queue as well, but those
are just dummy messages: they don't carry any valid/real
information about the 2nd queue, since the driver never set it up.

So far, so good. Things break when you run the above commands. Those
commands do set up the 2nd queue; however, they fail to make
the QEMU virtio device restart the vhost-user negotiation, meaning
no SET_VRING_ADDR is sent for the 2nd queue, leaving vhost
uninformed and not updated.

What's worse, the above commands do trigger QEMU to send SET_VRING_ENABLE
messages to enable all the vrings. And since the vrings for the 2nd
queue were never properly configured, the crash happens.
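The failure mode described above can be sketched independently of QEMU: if vhost marks a ring usable on SET_VRING_ENABLE without ever having received real addresses for it via SET_VRING_ADDR, the datapath ends up dereferencing zeroed pointers. A tiny state sketch (the struct and helper names are simplified stand-ins, not the actual vhost-user library code; the message names follow the vhost-user protocol):

```c
#include <stdint.h>
#include <stdbool.h>

/* Simplified per-virtqueue state on the vhost side. */
struct vq_state {
    uint64_t desc_addr, avail_addr, used_addr; /* set by SET_VRING_ADDR */
    bool enabled;                              /* set by SET_VRING_ENABLE */
};

/* A dummy SET_VRING_ADDR for a queue the guest driver never configured
 * leaves all three ring addresses at zero. */
static bool vq_addrs_valid(const struct vq_state *vq)
{
    return vq->desc_addr && vq->avail_addr && vq->used_addr;
}

/* The datapath must require both conditions before polling the ring;
 * "enabled" alone is not enough, which is the crash scenario above. */
static bool vq_ready_for_polling(const struct vq_state *vq)
{
    return vq->enabled && vq_addrs_valid(vq);
}
```

In the scenario above, the 2nd queue is exactly the `enabled but never configured` case, so a check like this keeps vhost off the ring until real addresses arrive.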

So maybe we should do virtio reset on port start?

	--yliu
  
Ilya Maximets July 11, 2016, 9:50 a.m. UTC | #20
On 11.07.2016 11:38, Yuanhan Liu wrote:
> On Sun, Jul 10, 2016 at 09:17:31PM +0800, Yuanhan Liu wrote:
>> On Fri, Jul 08, 2016 at 02:48:56PM +0300, Ilya Maximets wrote:
>>>
>>> Another point is that crash constantly happens on queue_id=3 (second RX queue) in
>>> my scenario. It is newly allocated virtqueue while reconfiguration from rxq=1 to
>>> rxq=2.
>>
>> That's a valuable message: what's your DPDK HEAD commit while triggering
>> this issue?

fbfd99551ca3 ("mbuf: add raw allocation function")

> 
> I guess I have understood what goes wrong in you case.
> 
> I would guess that your vhost has 2 queues (here I mean queue-pairs,
> including one Tx and Rx queue; below usage is the same) configured,
> so does to your QEMU. However, you just enabled 1 queue while starting
> testpmd inside the guest, and you want to enable 2 queues by running
> following testpmd commands:
> 
>     stop
>     port stop all
>     port config all rxq 2
>     port config all txq 2
>     port start all
> 
> Badly, that won't work for current virtio PMD implementation, and what's
> worse, it triggers a vhost crash, the one you saw.
> 
> Here is how it comes. Since you just enabled 1 queue while starting
> testpmd, it will setup 1 queue only, meaning only one queue's **valid**
> information will be sent to vhost. You might see SET_VRING_ADDR
> (and related vhost messages) for the other queue as well, but they
> are just the dummy messages: they don't include any valid/real
> information about the 2nd queue: the driver don't setup it after all.
> 
> So far, so good. It became broken when you run above commands. Those
> commands do setup for the 2nd queue, however, they failed to trigger
> the QEMU virtio device to start the vhost-user negotiation, meaning
> no SET_VRING_ADDR will be sent for the 2nd queue, leaving vhost
> untold and not updated.
> 
> What's worse, above commands trigger the QEMU to send SET_VRING_ENABLE
> messages, to enable all the vrings. And since the vrings for the 2nd
> queue are not properly configured, the crash happens.

Hmm, then why does the 2nd queue work properly with my vhost fix in this case?
 
> So maybe we should do virtio reset on port start?

I guess it was removed by this patch:
a85786dc816f ("virtio: fix states handling during initialization").
  
Yuanhan Liu July 11, 2016, 11:05 a.m. UTC | #21
On Mon, Jul 11, 2016 at 12:50:24PM +0300, Ilya Maximets wrote:
> On 11.07.2016 11:38, Yuanhan Liu wrote:
> > On Sun, Jul 10, 2016 at 09:17:31PM +0800, Yuanhan Liu wrote:
> >> On Fri, Jul 08, 2016 at 02:48:56PM +0300, Ilya Maximets wrote:
> >>>
> >>> Another point is that crash constantly happens on queue_id=3 (second RX queue) in
> >>> my scenario. It is newly allocated virtqueue while reconfiguration from rxq=1 to
> >>> rxq=2.
> >>
> >> That's a valuable message: what's your DPDK HEAD commit while triggering
> >> this issue?
> 
> fbfd99551ca3 ("mbuf: add raw allocation function")
> 
> > 
> > I guess I have understood what goes wrong in you case.
> > 
> > I would guess that your vhost has 2 queues (here I mean queue-pairs,
> > including one Tx and Rx queue; below usage is the same) configured,
> > so does to your QEMU. However, you just enabled 1 queue while starting
> > testpmd inside the guest, and you want to enable 2 queues by running
> > following testpmd commands:
> > 
> >     stop
> >     port stop all
> >     port config all rxq 2
> >     port config all txq 2
> >     port start all
> > 
> > Badly, that won't work for current virtio PMD implementation, and what's
> > worse, it triggers a vhost crash, the one you saw.
> > 
> > Here is how it comes. Since you just enabled 1 queue while starting
> > testpmd, it will setup 1 queue only, meaning only one queue's **valid**
> > information will be sent to vhost. You might see SET_VRING_ADDR
> > (and related vhost messages) for the other queue as well, but they
> > are just the dummy messages: they don't include any valid/real
> > information about the 2nd queue: the driver don't setup it after all.
> > 
> > So far, so good. It became broken when you run above commands. Those
> > commands do setup for the 2nd queue, however, they failed to trigger
> > the QEMU virtio device to start the vhost-user negotiation, meaning
> > no SET_VRING_ADDR will be sent for the 2nd queue, leaving vhost
> > untold and not updated.
> > 
> > What's worse, above commands trigger the QEMU to send SET_VRING_ENABLE
> > messages, to enable all the vrings. And since the vrings for the 2nd
> > queue are not properly configured, the crash happens.
> 
> Hmm, why 2nd queue works properly with my fix to vhost in this case?

Hmm, really? Are you sure that data flows through your 2nd queue after those
commands? From what I know, your patch just avoids the crash, but
does not fix the queue.

> > So maybe we should do virtio reset on port start?
> 
> I guess it was removed by this patch:
> a85786dc816f ("virtio: fix states handling during initialization").

Seems yes.

	--yliu
  
Ilya Maximets July 11, 2016, 11:47 a.m. UTC | #22
On 11.07.2016 14:05, Yuanhan Liu wrote:
> On Mon, Jul 11, 2016 at 12:50:24PM +0300, Ilya Maximets wrote:
>> On 11.07.2016 11:38, Yuanhan Liu wrote:
>>> On Sun, Jul 10, 2016 at 09:17:31PM +0800, Yuanhan Liu wrote:
>>>> On Fri, Jul 08, 2016 at 02:48:56PM +0300, Ilya Maximets wrote:
>>>>>
>>>>> Another point is that crash constantly happens on queue_id=3 (second RX queue) in
>>>>> my scenario. It is newly allocated virtqueue while reconfiguration from rxq=1 to
>>>>> rxq=2.
>>>>
>>>> That's a valuable message: what's your DPDK HEAD commit while triggering
>>>> this issue?
>>
>> fbfd99551ca3 ("mbuf: add raw allocation function")
>>
>>>
>>> I guess I have understood what goes wrong in you case.
>>>
>>> I would guess that your vhost has 2 queues (here I mean queue-pairs,
>>> including one Tx and Rx queue; below usage is the same) configured,
>>> so does to your QEMU. However, you just enabled 1 queue while starting
>>> testpmd inside the guest, and you want to enable 2 queues by running
>>> following testpmd commands:
>>>
>>>     stop
>>>     port stop all
>>>     port config all rxq 2
>>>     port config all txq 2
>>>     port start all
>>>
>>> Badly, that won't work for current virtio PMD implementation, and what's
>>> worse, it triggers a vhost crash, the one you saw.
>>>
>>> Here is how it comes. Since you just enabled 1 queue while starting
>>> testpmd, it will setup 1 queue only, meaning only one queue's **valid**
>>> information will be sent to vhost. You might see SET_VRING_ADDR
>>> (and related vhost messages) for the other queue as well, but they
>>> are just the dummy messages: they don't include any valid/real
>>> information about the 2nd queue: the driver don't setup it after all.
>>>
>>> So far, so good. It became broken when you run above commands. Those
>>> commands do setup for the 2nd queue, however, they failed to trigger
>>> the QEMU virtio device to start the vhost-user negotiation, meaning
>>> no SET_VRING_ADDR will be sent for the 2nd queue, leaving vhost
>>> untold and not updated.
>>>
>>> What's worse, above commands trigger the QEMU to send SET_VRING_ENABLE
>>> messages, to enable all the vrings. And since the vrings for the 2nd
>>> queue are not properly configured, the crash happens.
>>
>> Hmm, why 2nd queue works properly with my fix to vhost in this case?
> 
> Hmm, really? You are sure that data flows in your 2nd queue after those
> commands? From what I know is that your patch just avoid a crash, but
> does not fix it.

Oh, sorry. Yes, it doesn't work. With my patch applied I get a QEMU hang.

>>> So maybe we should do virtio reset on port start?
>>
>> I guess it was removed by this patch:
>> a85786dc816f ("virtio: fix states handling during initialization").
> 
> Seems yes.
> 
> 	--yliu
> 
>
  
Yuanhan Liu July 12, 2016, 2:43 a.m. UTC | #23
On Mon, Jul 11, 2016 at 02:47:56PM +0300, Ilya Maximets wrote:
> On 11.07.2016 14:05, Yuanhan Liu wrote:
> > On Mon, Jul 11, 2016 at 12:50:24PM +0300, Ilya Maximets wrote:
> >> On 11.07.2016 11:38, Yuanhan Liu wrote:
> >>> On Sun, Jul 10, 2016 at 09:17:31PM +0800, Yuanhan Liu wrote:
> >>>> On Fri, Jul 08, 2016 at 02:48:56PM +0300, Ilya Maximets wrote:
> >>>>>
> >>>>> Another point is that crash constantly happens on queue_id=3 (second RX queue) in
> >>>>> my scenario. It is newly allocated virtqueue while reconfiguration from rxq=1 to
> >>>>> rxq=2.
> >>>>
> >>>> That's a valuable message: what's your DPDK HEAD commit while triggering
> >>>> this issue?
> >>
> >> fbfd99551ca3 ("mbuf: add raw allocation function")
> >>
> >>>
> >>> I guess I have understood what goes wrong in you case.
> >>>
> >>> I would guess that your vhost has 2 queues (here I mean queue-pairs,
> >>> including one Tx and Rx queue; below usage is the same) configured,
> >>> so does to your QEMU. However, you just enabled 1 queue while starting
> >>> testpmd inside the guest, and you want to enable 2 queues by running
> >>> following testpmd commands:
> >>>
> >>>     stop
> >>>     port stop all
> >>>     port config all rxq 2
> >>>     port config all txq 2
> >>>     port start all
> >>>
> >>> Badly, that won't work for current virtio PMD implementation, and what's
> >>> worse, it triggers a vhost crash, the one you saw.
> >>>
> >>> Here is how it comes. Since you just enabled 1 queue while starting
> >>> testpmd, it will setup 1 queue only, meaning only one queue's **valid**
> >>> information will be sent to vhost. You might see SET_VRING_ADDR
> >>> (and related vhost messages) for the other queue as well, but they
> >>> are just the dummy messages: they don't include any valid/real
> >>> information about the 2nd queue: the driver don't setup it after all.
> >>>
> >>> So far, so good. It became broken when you run above commands. Those
> >>> commands do setup for the 2nd queue, however, they failed to trigger
> >>> the QEMU virtio device to start the vhost-user negotiation, meaning
> >>> no SET_VRING_ADDR will be sent for the 2nd queue, leaving vhost
> >>> untold and not updated.
> >>>
> >>> What's worse, above commands trigger the QEMU to send SET_VRING_ENABLE
> >>> messages, to enable all the vrings. And since the vrings for the 2nd
> >>> queue are not properly configured, the crash happens.
> >>
> >> Hmm, why 2nd queue works properly with my fix to vhost in this case?
> > 
> > Hmm, really? You are sure that data flows in your 2nd queue after those
> > commands? From what I know is that your patch just avoid a crash, but
> > does not fix it.
> 
> Oh, sorry. Yes, it doesn't work. With my patch applied I have a QEMU hang.

The crash can actually be avoided, accidentally, by commit 0823c1cb0a73
("vhost: workaround stale vring base"). That's why I asked you
above which HEAD commit you were using.

> >>> So maybe we should do virtio reset on port start?
> >>
> >> I guess it was removed by this patch:
> >> a85786dc816f ("virtio: fix states handling during initialization").
> > 
> > Seems yes.

Actually, we should not do that (reset on port start). The right fix
is to allocate the MAX number of queues the virtio device supports (2 here).
This would allow us to change the queue number dynamically.

But this doesn't sound like a simple fix; it involves many code changes,
because the code was not designed this way before. Therefore, we will not fix
it in this release: it's too late. Let's fix it in the next release
instead. As for the crash, it will not happen with the latest HEAD.
Though that fix is accidental, I think we are fine here.
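The "allocate the maximum up front" idea can be sketched as follows: the virtio PMD would allocate virtqueue state for every queue pair the device advertises at negotiation time, so that a later `port config all rxq 2` merely activates an already-known queue instead of creating one behind vhost's back. This is a hypothetical structure to illustrate the design choice, not the actual DPDK implementation:

```c
#include <stdlib.h>

struct virtqueue { int configured; };

struct virtio_dev {
    unsigned max_queue_pairs;   /* from device feature negotiation */
    unsigned cur_queue_pairs;   /* what the application currently uses */
    struct virtqueue *vqs;      /* 2 * max_queue_pairs entries (Rx + Tx) */
};

/* Allocate state for the maximum number of queue pairs once, at init. */
static int virtio_dev_init(struct virtio_dev *dev, unsigned max_pairs)
{
    dev->max_queue_pairs = max_pairs;
    dev->cur_queue_pairs = 0;
    dev->vqs = calloc(2 * (size_t)max_pairs, sizeof(*dev->vqs));
    return dev->vqs ? 0 : -1;
}

/* Changing the queue count becomes bounds-checked bookkeeping;
 * no new virtqueues appear after negotiation has completed. */
static int virtio_dev_set_queues(struct virtio_dev *dev, unsigned pairs)
{
    if (pairs > dev->max_queue_pairs)
        return -1;
    dev->cur_queue_pairs = pairs;
    return 0;
}
```

The design benefit is exactly the one discussed above: reconfiguration can never introduce a queue that vhost has not been told about.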

	--yliu
  
Ilya Maximets July 12, 2016, 5:53 a.m. UTC | #24
On 12.07.2016 05:43, Yuanhan Liu wrote:
> On Mon, Jul 11, 2016 at 02:47:56PM +0300, Ilya Maximets wrote:
>> On 11.07.2016 14:05, Yuanhan Liu wrote:
>>> On Mon, Jul 11, 2016 at 12:50:24PM +0300, Ilya Maximets wrote:
>>>> On 11.07.2016 11:38, Yuanhan Liu wrote:
>>>>> On Sun, Jul 10, 2016 at 09:17:31PM +0800, Yuanhan Liu wrote:
>>>>>> On Fri, Jul 08, 2016 at 02:48:56PM +0300, Ilya Maximets wrote:
>>>>>>>
>>>>>>> Another point is that crash constantly happens on queue_id=3 (second RX queue) in
>>>>>>> my scenario. It is newly allocated virtqueue while reconfiguration from rxq=1 to
>>>>>>> rxq=2.
>>>>>>
>>>>>> That's a valuable message: what's your DPDK HEAD commit while triggering
>>>>>> this issue?
>>>>
>>>> fbfd99551ca3 ("mbuf: add raw allocation function")
>>>>
>>>>>
>>>>> I guess I have understood what goes wrong in you case.
>>>>>
>>>>> I would guess that your vhost has 2 queues (here I mean queue-pairs,
>>>>> including one Tx and Rx queue; below usage is the same) configured,
>>>>> so does to your QEMU. However, you just enabled 1 queue while starting
>>>>> testpmd inside the guest, and you want to enable 2 queues by running
>>>>> following testpmd commands:
>>>>>
>>>>>     stop
>>>>>     port stop all
>>>>>     port config all rxq 2
>>>>>     port config all txq 2
>>>>>     port start all
>>>>>
>>>>> Badly, that won't work for current virtio PMD implementation, and what's
>>>>> worse, it triggers a vhost crash, the one you saw.
>>>>>
>>>>> Here is how it comes. Since you just enabled 1 queue while starting
>>>>> testpmd, it will setup 1 queue only, meaning only one queue's **valid**
>>>>> information will be sent to vhost. You might see SET_VRING_ADDR
>>>>> (and related vhost messages) for the other queue as well, but they
>>>>> are just the dummy messages: they don't include any valid/real
>>>>> information about the 2nd queue: the driver don't setup it after all.
>>>>>
>>>>> So far, so good. It became broken when you run above commands. Those
>>>>> commands do setup for the 2nd queue, however, they failed to trigger
>>>>> the QEMU virtio device to start the vhost-user negotiation, meaning
>>>>> no SET_VRING_ADDR will be sent for the 2nd queue, leaving vhost
>>>>> untold and not updated.
>>>>>
>>>>> What's worse, above commands trigger the QEMU to send SET_VRING_ENABLE
>>>>> messages, to enable all the vrings. And since the vrings for the 2nd
>>>>> queue are not properly configured, the crash happens.
>>>>
>>>> Hmm, why 2nd queue works properly with my fix to vhost in this case?
>>>
>>> Hmm, really? You are sure that data flows in your 2nd queue after those
>>> commands? From what I know is that your patch just avoid a crash, but
>>> does not fix it.
>>
>> Oh, sorry. Yes, it doesn't work. With my patch applied I have a QEMU hang.
> 
> The crash actually could be avoided by commit 0823c1cb0a73 ("vhost:
> workaround stale vring base"), accidentally. That's why I asked you
> above the HEAD commit you were using.

Thanks for pointing this out. I'll check it.
 
>>>>> So maybe we should do virtio reset on port start?
>>>>
>>>> I guess it was removed by this patch:
>>>> a85786dc816f ("virtio: fix states handling during initialization").
>>>
>>> Seems yes.
> 
> Actually, we should not do that: do reset on port start. The right fix
> should be allocating MAX queues virtio device supports (2 here). This
> would allow us changing the queue number dynamically.

Yes, I agree that this is the right way to fix this issue.
 
> But this doesn't sound a simple fix; it involves many code changes, due
> to it was not designed this way before. Therefore, we will not fix it
> in this release, due to it's too late. Let's fix it in the next release
> instead. For the crash issue, it will not happen with the latest HEAD.
> Though it's accident fix, I think we are fine here.
> 
> 	--yliu
> 
>
  
Ilya Maximets July 13, 2016, 7:34 a.m. UTC | #25
On 12.07.2016 08:53, Ilya Maximets wrote:
> On 12.07.2016 05:43, Yuanhan Liu wrote:
>> On Mon, Jul 11, 2016 at 02:47:56PM +0300, Ilya Maximets wrote:
>>> On 11.07.2016 14:05, Yuanhan Liu wrote:
>>>> On Mon, Jul 11, 2016 at 12:50:24PM +0300, Ilya Maximets wrote:
>>>>> On 11.07.2016 11:38, Yuanhan Liu wrote:
>>>>>> On Sun, Jul 10, 2016 at 09:17:31PM +0800, Yuanhan Liu wrote:
>>>>>>> On Fri, Jul 08, 2016 at 02:48:56PM +0300, Ilya Maximets wrote:
>>>>>>>>
>>>>>>>> Another point is that crash constantly happens on queue_id=3 (second RX queue) in
>>>>>>>> my scenario. It is newly allocated virtqueue while reconfiguration from rxq=1 to
>>>>>>>> rxq=2.
>>>>>>>
>>>>>>> That's a valuable message: what's your DPDK HEAD commit while triggering
>>>>>>> this issue?
>>>>>
>>>>> fbfd99551ca3 ("mbuf: add raw allocation function")
>>>>>
>>>>>>
>>>>>> I guess I have understood what goes wrong in you case.
>>>>>>
>>>>>> I would guess that your vhost has 2 queues (here I mean queue-pairs,
>>>>>> including one Tx and Rx queue; below usage is the same) configured,
>>>>>> so does to your QEMU. However, you just enabled 1 queue while starting
>>>>>> testpmd inside the guest, and you want to enable 2 queues by running
>>>>>> following testpmd commands:
>>>>>>
>>>>>>     stop
>>>>>>     port stop all
>>>>>>     port config all rxq 2
>>>>>>     port config all txq 2
>>>>>>     port start all
>>>>>>
>>>>>> Badly, that won't work for current virtio PMD implementation, and what's
>>>>>> worse, it triggers a vhost crash, the one you saw.
>>>>>>
>>>>>> Here is how it comes. Since you just enabled 1 queue while starting
>>>>>> testpmd, it will setup 1 queue only, meaning only one queue's **valid**
>>>>>> information will be sent to vhost. You might see SET_VRING_ADDR
>>>>>> (and related vhost messages) for the other queue as well, but they
>>>>>> are just the dummy messages: they don't include any valid/real
>>>>>> information about the 2nd queue: the driver don't setup it after all.
>>>>>>
>>>>>> So far, so good. It became broken when you run above commands. Those
>>>>>> commands do setup for the 2nd queue, however, they failed to trigger
>>>>>> the QEMU virtio device to start the vhost-user negotiation, meaning
>>>>>> no SET_VRING_ADDR will be sent for the 2nd queue, leaving vhost
>>>>>> untold and not updated.
>>>>>>
>>>>>> What's worse, above commands trigger the QEMU to send SET_VRING_ENABLE
>>>>>> messages, to enable all the vrings. And since the vrings for the 2nd
>>>>>> queue are not properly configured, the crash happens.
>>>>>
>>>>> Hmm, why 2nd queue works properly with my fix to vhost in this case?
>>>>
>>>> Hmm, really? You are sure that data flows in your 2nd queue after those
>>>> commands? From what I know is that your patch just avoid a crash, but
>>>> does not fix it.
>>>
>>> Oh, sorry. Yes, it doesn't work. With my patch applied I have a QEMU hang.
>>
>> The crash actually could be avoided by commit 0823c1cb0a73 ("vhost:
>> workaround stale vring base"), accidentally. That's why I asked you
>> above the HEAD commit you were using.
> 
> Thanks for pointing this. I'll check it.

I've checked my DPDK HEAD with the above commit backported. Yes, it avoids
the vhost crash in my scenario. As expected, the new virtqueue doesn't work
after reconfiguration, and QEMU sometimes hangs.

>>>>>> So maybe we should do virtio reset on port start?
>>>>>
>>>>> I guess it was removed by this patch:
>>>>> a85786dc816f ("virtio: fix states handling during initialization").
>>>>
>>>> Seems yes.
>>
>> Actually, we should not do that: do reset on port start. The right fix
>> should be allocating MAX queues virtio device supports (2 here). This
>> would allow us changing the queue number dynamically.
> 
> Yes, I agree that this is the right way to fix this issue.
>  
>> But this doesn't sound a simple fix; it involves many code changes, due
>> to it was not designed this way before. Therefore, we will not fix it
>> in this release, due to it's too late. Let's fix it in the next release
>> instead. For the crash issue, it will not happen with the latest HEAD.
>> Though it's accident fix, I think we are fine here.

This scenario is fixed somehow, I agree. But this patch is still needed to protect
vhost from an untrusted VM, i.e. from a malicious or buggy virtio application.
Maybe we could change the commit message and resend this patch as a
security enhancement? What do you think?
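The untrusted-guest concern extends beyond single descriptors to chains: the guest controls both the `next` indices and the addresses, so a defensive walk needs bounds checks as well as address checks, which is what the final patch adds in `fill_vec_buf`. A simplified sketch of such a walk (the constants, struct layout, and function name are stand-ins for illustration):

```c
#include <stdint.h>

#define VQ_SIZE        256
#define BUF_VECTOR_MAX 256
#define DESC_F_NEXT    1

struct desc { uint64_t addr; uint32_t len; uint16_t flags, next; };

/* Walk a guest-provided descriptor chain, refusing out-of-range
 * indices, zero addresses, and over-long (possibly looping) chains. */
static int walk_chain(const struct desc *ring, uint16_t idx,
                      uint32_t *total_len)
{
    uint32_t len = 0;
    uint16_t hops = 0;

    while (1) {
        if (hops++ >= BUF_VECTOR_MAX || idx >= VQ_SIZE || !ring[idx].addr)
            return -1;              /* malicious or unconfigured ring */
        len += ring[idx].len;
        if (!(ring[idx].flags & DESC_F_NEXT))
            break;
        idx = ring[idx].next;
    }
    *total_len = len;
    return 0;
}
```

The hop counter also caps self-referencing chains, so a guest pointing a descriptor at itself cannot spin vhost forever.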
  
Yuanhan Liu July 13, 2016, 8:47 a.m. UTC | #26
On Wed, Jul 13, 2016 at 10:34:07AM +0300, Ilya Maximets wrote:
> On 12.07.2016 08:53, Ilya Maximets wrote:
> > On 12.07.2016 05:43, Yuanhan Liu wrote:
> >> On Mon, Jul 11, 2016 at 02:47:56PM +0300, Ilya Maximets wrote:
> >>> On 11.07.2016 14:05, Yuanhan Liu wrote:
> >>>> On Mon, Jul 11, 2016 at 12:50:24PM +0300, Ilya Maximets wrote:
> >>>>> On 11.07.2016 11:38, Yuanhan Liu wrote:
> >>>>>> On Sun, Jul 10, 2016 at 09:17:31PM +0800, Yuanhan Liu wrote:
> >>>>>>> On Fri, Jul 08, 2016 at 02:48:56PM +0300, Ilya Maximets wrote:
> >>>>>>>>
> >>>>>>>> Another point is that crash constantly happens on queue_id=3 (second RX queue) in
> >>>>>>>> my scenario. It is newly allocated virtqueue while reconfiguration from rxq=1 to
> >>>>>>>> rxq=2.
> >>>>>>>
> >>>>>>> That's a valuable message: what's your DPDK HEAD commit while triggering
> >>>>>>> this issue?
> >>>>>
> >>>>> fbfd99551ca3 ("mbuf: add raw allocation function")
> >>>>>
> >>>>>>
> >>>>>> I guess I have understood what goes wrong in you case.
> >>>>>>
> >>>>>> I would guess that your vhost has 2 queues (here I mean queue-pairs,
> >>>>>> including one Tx and Rx queue; below usage is the same) configured,
> >>>>>> so does to your QEMU. However, you just enabled 1 queue while starting
> >>>>>> testpmd inside the guest, and you want to enable 2 queues by running
> >>>>>> following testpmd commands:
> >>>>>>
> >>>>>>     stop
> >>>>>>     port stop all
> >>>>>>     port config all rxq 2
> >>>>>>     port config all txq 2
> >>>>>>     port start all
> >>>>>>
> >>>>>> Badly, that won't work for current virtio PMD implementation, and what's
> >>>>>> worse, it triggers a vhost crash, the one you saw.
> >>>>>>
> >>>>>> Here is how it comes. Since you just enabled 1 queue while starting
> >>>>>> testpmd, it will setup 1 queue only, meaning only one queue's **valid**
> >>>>>> information will be sent to vhost. You might see SET_VRING_ADDR
> >>>>>> (and related vhost messages) for the other queue as well, but they
> >>>>>> are just the dummy messages: they don't include any valid/real
> >>>>>> information about the 2nd queue: the driver don't setup it after all.
> >>>>>>
> >>>>>> So far, so good. It became broken when you run above commands. Those
> >>>>>> commands do setup for the 2nd queue, however, they failed to trigger
> >>>>>> the QEMU virtio device to start the vhost-user negotiation, meaning
> >>>>>> no SET_VRING_ADDR will be sent for the 2nd queue, leaving vhost
> >>>>>> untold and not updated.
> >>>>>>
> >>>>>> What's worse, above commands trigger the QEMU to send SET_VRING_ENABLE
> >>>>>> messages, to enable all the vrings. And since the vrings for the 2nd
> >>>>>> queue are not properly configured, the crash happens.
> >>>>>
> >>>>> Hmm, why 2nd queue works properly with my fix to vhost in this case?
> >>>>
> >>>> Hmm, really? You are sure that data flows in your 2nd queue after those
> >>>> commands? From what I know is that your patch just avoid a crash, but
> >>>> does not fix it.
> >>>
> >>> Oh, sorry. Yes, it doesn't work. With my patch applied I have a QEMU hang.
> >>
> >> The crash actually could be avoided by commit 0823c1cb0a73 ("vhost:
> >> workaround stale vring base"), accidentally. That's why I asked you
> >> above the HEAD commit you were using.
> > 
> > Thanks for pointing this. I'll check it.
> 
> I've checked my DPDK HEAD with above commit backported. Yes, it helps to
> avoid vhost crash in my scenario. As expected, after reconfiguration new
> virtqueue doesn't work, QEMU hangs sometimes.
> >>>>>> So maybe we should do virtio reset on port start?
> >>>>>
> >>>>> I guess it was removed by this patch:
> >>>>> a85786dc816f ("virtio: fix states handling during initialization").
> >>>>
> >>>> Seems yes.
> >>
> >> Actually, we should not do that: do reset on port start. The right fix
> >> should be allocating MAX queues virtio device supports (2 here). This
> >> would allow us changing the queue number dynamically.
> > 
> > Yes, I agree that this is the right way to fix this issue.
> >  
> >> But this doesn't sound a simple fix; it involves many code changes, due
> >> to it was not designed this way before. Therefore, we will not fix it
> >> in this release, due to it's too late. Let's fix it in the next release
> >> instead. For the crash issue, it will not happen with the latest HEAD.
> >> Though it's accident fix, I think we are fine here.
> 
> This scenario fixed somehow, I agree. But this patch still needed to protect
> vhost from untrusted VM, from malicious or buggy virtio application.
> Maybe we could change the commit-message and resend this patch as a
> security enhancement? What do you think?

Indeed, but I'm a bit concerned about the performance regression found
by Rich. I am not quite sure why it happens, though Rich suggested
that it seems to be a compiler-related problem.

	--yliu
  
Rich Lane July 13, 2016, 3:54 p.m. UTC | #27
On Wednesday, July 13, 2016, Yuanhan Liu <yuanhan.liu@linux.intel.com>
wrote:

> On Wed, Jul 13, 2016 at 10:34:07AM +0300, Ilya Maximets wrote:
> > This scenario fixed somehow, I agree. But this patch still needed to
> protect
> > vhost from untrusted VM, from malicious or buggy virtio application.
> > Maybe we could change the commit-message and resend this patch as a
> > security enhancement? What do you think?
>
> Indeed, but I'm a bit concerned about the performance regression found
> by Rich, yet I am not quite sure why it happens, though Rich claimed
> that it seems to be a problem related to compiler.


The workaround I suggested solves the performance regression. But even if
it hadn't, this is a security fix that should be merged regardless of the
performance impact.
  
Yuanhan Liu July 14, 2016, 1:42 a.m. UTC | #28
On Wed, Jul 13, 2016 at 08:54:08AM -0700, Rich Lane wrote:
> On Wednesday, July 13, 2016, Yuanhan Liu <yuanhan.liu@linux.intel.com> wrote:
> 
>     On Wed, Jul 13, 2016 at 10:34:07AM +0300, Ilya Maximets wrote:
>     > This scenario fixed somehow, I agree. But this patch still needed to
>     protect
>     > vhost from untrusted VM, from malicious or buggy virtio application.
>     > Maybe we could change the commit-message and resend this patch as a
>     > security enhancement? What do you think?
> 
>     Indeed, but I'm a bit concerned about the performance regression found
>     by Rich, yet I am not quite sure why it happens, though Rich claimed
>     that it seems to be a problem related to compiler.
> 
> 
> The workaround I suggested solves the performance regression. But even if it
> hadn't, this is a security fix that should be merged regardless of the
> performance impact.

Good point. Ilya, would you reword the commit log and resend based on
latest code?

	--yliu
  
Ilya Maximets July 14, 2016, 4:38 a.m. UTC | #29
On 14.07.2016 04:42, Yuanhan Liu wrote:
> On Wed, Jul 13, 2016 at 08:54:08AM -0700, Rich Lane wrote:
>> On Wednesday, July 13, 2016, Yuanhan Liu <yuanhan.liu@linux.intel.com> wrote:
>>
>>     On Wed, Jul 13, 2016 at 10:34:07AM +0300, Ilya Maximets wrote:
>>     > This scenario fixed somehow, I agree. But this patch still needed to
>>     protect
>>     > vhost from untrusted VM, from malicious or buggy virtio application.
>>     > Maybe we could change the commit-message and resend this patch as a
>>     > security enhancement? What do you think?
>>
>>     Indeed, but I'm a bit concerned about the performance regression found
>>     by Rich, yet I am not quite sure why it happens, though Rich claimed
>>     that it seems to be a problem related to compiler.
>>
>>
>> The workaround I suggested solves the performance regression. But even if it
>> hadn't, this is a security fix that should be merged regardless of the
>> performance impact.
> 
> Good point. Ilya, would you reword the commit log and resend based on
> latest code?

OK.
  
Ilya Maximets July 15, 2016, 11:15 a.m. UTC | #30
Version 3:
	* Patch split in two.
	* Applied the workaround from Rich Lane and added a comment about
	  the performance issue with some compilers and the 'unlikely' macro.

Version 2:
	* Rebased on top of current master.
	* Host's address is now checked in the mergeable case,
	  because the needed refactoring is already done.
	* Commit message changed because the old virtio reload
	  issue was accidentally fixed by commit 0823c1cb0a73.

Ilya Maximets (2):
  vhost: fix using of bad return value on mergeable enqueue
  vhost: do sanity check for ring descriptor address

 lib/librte_vhost/vhost_rxtx.c | 33 ++++++++++++++++++++++++++-------
 1 file changed, 26 insertions(+), 7 deletions(-)
  
Yuanhan Liu July 15, 2016, 12:14 p.m. UTC | #31
On Fri, Jul 15, 2016 at 02:15:03PM +0300, Ilya Maximets wrote:
> Version 3:
> 	* Patch splitted in two.
> 	* Applied workaround from Rich Lane and added comment about
> 	  performance issue with some compilers and 'unlikely' macro.

Thanks a lot for the patches.

Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>

	--yliu
> 
> Version 2:
> 	* Rebased on top of current master.
> 	* host's address now checked in meargeable case,
> 	  because needed refactoring already done.
> 	* Commit-message changed because old issue with
> 	  virtio reload accidentially fixed by commit
> 	  0823c1cb0a73.
> 
> Ilya Maximets (2):
>   vhost: fix using of bad return value on mergeable enqueue
>   vhost: do sanity check for ring descriptor address
> 
>  lib/librte_vhost/vhost_rxtx.c | 33 ++++++++++++++++++++++++++-------
>  1 file changed, 26 insertions(+), 7 deletions(-)
> 
> -- 
> 2.7.4
  
Thomas Monjalon July 15, 2016, 7:37 p.m. UTC | #32
2016-07-15 20:14, Yuanhan Liu:
> On Fri, Jul 15, 2016 at 02:15:03PM +0300, Ilya Maximets wrote:
> > Version 3:
> > 	* Patch splitted in two.
> > 	* Applied workaround from Rich Lane and added comment about
> > 	  performance issue with some compilers and 'unlikely' macro.
> 
> Thanks a lot for the patches.
> 
> Acked-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>

Applied, thanks
  

Patch

diff --git a/lib/librte_vhost/vhost_rxtx.c b/lib/librte_vhost/vhost_rxtx.c
index 750821a..9d05739 100644
--- a/lib/librte_vhost/vhost_rxtx.c
+++ b/lib/librte_vhost/vhost_rxtx.c
@@ -147,10 +147,10 @@  copy_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	struct virtio_net_hdr_mrg_rxbuf virtio_hdr = {{0, 0, 0, 0, 0, 0}, 0};
 
 	desc = &vq->desc[desc_idx];
-	if (unlikely(desc->len < vq->vhost_hlen))
+	desc_addr = gpa_to_vva(dev, desc->addr);
+	if (unlikely(desc->len < vq->vhost_hlen || !desc_addr))
 		return -1;
 
-	desc_addr = gpa_to_vva(dev, desc->addr);
 	rte_prefetch0((void *)(uintptr_t)desc_addr);
 
 	virtio_enqueue_offload(m, &virtio_hdr.hdr);
@@ -184,6 +184,9 @@  copy_mbuf_to_desc(struct virtio_net *dev, struct vhost_virtqueue *vq,
 
 			desc = &vq->desc[desc->next];
 			desc_addr   = gpa_to_vva(dev, desc->addr);
+			if (unlikely(!desc_addr))
+				return -1;
+
 			desc_offset = 0;
 			desc_avail  = desc->len;
 		}
@@ -344,7 +347,8 @@  fill_vec_buf(struct vhost_virtqueue *vq, uint32_t avail_idx,
 	uint32_t len    = *allocated;
 
 	while (1) {
-		if (unlikely(vec_id >= BUF_VECTOR_MAX || idx >= vq->size))
+		if (unlikely(vec_id >= BUF_VECTOR_MAX || idx >= vq->size
+			|| !vq->desc[idx].addr))
 			return -1;
 
 		len += vq->desc[idx].len;
@@ -747,10 +751,10 @@  copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	uint32_t nr_desc = 1;
 
 	desc = &vq->desc[desc_idx];
-	if (unlikely(desc->len < vq->vhost_hlen))
+	desc_addr = gpa_to_vva(dev, desc->addr);
+	if (unlikely(desc->len < vq->vhost_hlen || !desc_addr))
 		return -1;
 
-	desc_addr = gpa_to_vva(dev, desc->addr);
 	rte_prefetch0((void *)(uintptr_t)desc_addr);
 
 	/* Retrieve virtio net header */
@@ -769,6 +773,9 @@  copy_desc_to_mbuf(struct virtio_net *dev, struct vhost_virtqueue *vq,
 			desc = &vq->desc[desc->next];
 
 			desc_addr = gpa_to_vva(dev, desc->addr);
+			if (unlikely(!desc_addr))
+				return -1;
+
 			rte_prefetch0((void *)(uintptr_t)desc_addr);
 
 			desc_offset = 0;