[dpdk-dev,v8,3/8] vhost: vring queue setup for multiple queue support

Message ID 20151026054215.GY3115@yliu-dev.sh.intel.com (mailing list archive)
State Accepted, archived

Commit Message

Yuanhan Liu Oct. 26, 2015, 5:42 a.m. UTC
  On Mon, Oct 26, 2015 at 02:24:08PM +0900, Tetsuya Mukawa wrote:
> On 2015/10/22 21:35, Yuanhan Liu wrote:
...
> > @@ -292,13 +300,13 @@ user_get_vring_base(struct vhost_device_ctx ctx,
> >  	 * sent and only sent in vhost_vring_stop.
> >  	 * TODO: cleanup the vring, it isn't usable since here.
> >  	 */
> > -	if ((dev->virtqueue[VIRTIO_RXQ]->kickfd) >= 0) {
> > -		close(dev->virtqueue[VIRTIO_RXQ]->kickfd);
> > -		dev->virtqueue[VIRTIO_RXQ]->kickfd = -1;
> > +	if ((dev->virtqueue[state->index]->kickfd + VIRTIO_RXQ) >= 0) {
> > +		close(dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd);
> > +		dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd = -1;
> >  	}
> 
> Hi Yuanhan,
> 
> Please let me make sure whether below is correct.
>     if ((dev->virtqueue[state->index]->kickfd + VIRTIO_RXQ) >= 0) {
> 
> > -	if ((dev->virtqueue[VIRTIO_TXQ]->kickfd) >= 0) {
> > -		close(dev->virtqueue[VIRTIO_TXQ]->kickfd);
> > -		dev->virtqueue[VIRTIO_TXQ]->kickfd = -1;
> > +	if ((dev->virtqueue[state->index]->kickfd + VIRTIO_TXQ) >= 0) {
> > +		close(dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd);
> > +		dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd = -1;
> 
> Also, same question here.

Oops, silly typos... Thanks for catching it out!
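
For reference, the slip is that the queue offset was added to the value of `kickfd` instead of to the array index. A minimal standalone sketch of the difference follows. It is not DPDK code: `struct vq` and both helper names are invented for illustration, and only the `VIRTIO_RXQ`/`VIRTIO_TXQ` values (rxq is 0, txq is 1) are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { VIRTIO_RXQ = 0, VIRTIO_TXQ = 1 };

/* Hypothetical stand-in for struct vhost_virtqueue. */
struct vq { int kickfd; };

/* v8 form: the offset is added to the fd value, and the queue at
 * `base` is always the one inspected. */
static bool kickfd_open_v8(struct vq *const *vqs, unsigned base, int off)
{
	assert(vqs != NULL);
	return (vqs[base]->kickfd + off) >= 0;
}

/* Intended form: the offset selects the queue, the fd is tested as-is. */
static bool kickfd_open_fixed(struct vq *const *vqs, unsigned base, int off)
{
	assert(vqs != NULL);
	return vqs[base + off]->kickfd >= 0;
}
```

With `VIRTIO_RXQ` the two forms happen to agree, because the offset is 0. With `VIRTIO_TXQ` the v8 form tests the RX queue's fd plus one, so an already-closed fd of -1 passes the `>= 0` check.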

Here is an update patch (Thomas, please let me know if you prefer me
to send the whole patchset for you to apply):

-- >8 --
From 2b7d8155b6c9f37bffcbb220e87f7634f329acee Mon Sep 17 00:00:00 2001
From: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Date: Fri, 18 Sep 2015 16:01:10 +0800
Subject: [PATCH] vhost: vring queue setup for multiple queue support

All queue pairs, including the default (first) queue pair,
are allocated dynamically when a vring_call message is first
received for a specific queue pair.

This is refactoring work to enable vhost-user multiple queue;
it should not break anything, as it makes no functional changes:
we don't support mq set, so there is at most one queue pair.

This patch is based on Changchun's patch.

Signed-off-by: Ouyang Changchun <changchun.ouyang@intel.com>
Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>

---

v9: - fix silly error "dev->virtqueue[state->index]->kickfd + VIRTIO_RXQ"

v8: - move virtqueue field to the end of `virtio_net' struct.

    - Add a FIXME at set_vring_call() for doing vring queue pair
      allocation.
---
 lib/librte_vhost/rte_virtio_net.h             |   3 +-
 lib/librte_vhost/vhost_user/virtio-net-user.c |  46 ++++----
 lib/librte_vhost/virtio-net.c                 | 156 ++++++++++++++++----------
 3 files changed, 123 insertions(+), 82 deletions(-)
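
The on-demand allocation described in the commit message can be sketched roughly as follows. This is a simplified standalone illustration, not the patch itself: it uses plain `malloc` where the patch uses `rte_malloc`, `MAX_QUEUE_PAIRS`, `struct dev`, `struct vq`, `alloc_pair` and `get_vq` are hypothetical names, and the check that rejects a pair arriving out of order is an extra guard that the patch does not have.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum { VIRTIO_RXQ = 0, VIRTIO_TXQ = 1, VIRTIO_QNUM = 2 };
#define MAX_QUEUE_PAIRS 8	/* hypothetical bound for this sketch */

struct vq { int kickfd; int callfd; };

struct dev {
	uint32_t   virt_qp_nb;	/* queue pairs allocated so far */
	struct vq *virtqueue[MAX_QUEUE_PAIRS * VIRTIO_QNUM];
};

/* Allocate the RX and TX rings of one pair as a single block. */
static int alloc_pair(struct dev *dev, uint32_t qp_idx)
{
	struct vq *pair = malloc(sizeof(*pair) * VIRTIO_QNUM);

	if (pair == NULL)
		return -1;
	for (int i = 0; i < VIRTIO_QNUM; i++) {
		memset(&pair[i], 0, sizeof(pair[i]));
		pair[i].kickfd = -1;	/* -1 marks "no fd yet" */
		pair[i].callfd = -1;
	}
	dev->virtqueue[qp_idx * VIRTIO_QNUM + VIRTIO_RXQ] = &pair[VIRTIO_RXQ];
	dev->virtqueue[qp_idx * VIRTIO_QNUM + VIRTIO_TXQ] = &pair[VIRTIO_TXQ];
	dev->virt_qp_nb += 1;
	return 0;
}

/* Map a flat vring index to its ring, allocating the pair on first use. */
static struct vq *get_vq(struct dev *dev, uint32_t vring_idx)
{
	uint32_t qp_idx = vring_idx / VIRTIO_QNUM;

	assert(dev != NULL);
	/* Extra guard (not in the patch): pairs must arrive in order. */
	if (qp_idx >= MAX_QUEUE_PAIRS || qp_idx > dev->virt_qp_nb)
		return NULL;
	if (qp_idx == dev->virt_qp_nb && alloc_pair(dev, qp_idx) < 0)
		return NULL;
	return dev->virtqueue[vring_idx];
}
```

The index arithmetic is the same as in the patch: pair `n` owns slots `n * VIRTIO_QNUM + VIRTIO_RXQ` and `n * VIRTIO_QNUM + VIRTIO_TXQ`, and `virt_qp_nb` counts how many pairs exist so that cleanup loops know where to stop.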
  

Comments

Tetsuya Mukawa Oct. 27, 2015, 6:20 a.m. UTC | #1
On 2015/10/26 14:42, Yuanhan Liu wrote:
> On Mon, Oct 26, 2015 at 02:24:08PM +0900, Tetsuya Mukawa wrote:
>> On 2015/10/22 21:35, Yuanhan Liu wrote:
> ...
>>> @@ -292,13 +300,13 @@ user_get_vring_base(struct vhost_device_ctx ctx,
>>>  	 * sent and only sent in vhost_vring_stop.
>>>  	 * TODO: cleanup the vring, it isn't usable since here.
>>>  	 */
>>> -	if ((dev->virtqueue[VIRTIO_RXQ]->kickfd) >= 0) {
>>> -		close(dev->virtqueue[VIRTIO_RXQ]->kickfd);
>>> -		dev->virtqueue[VIRTIO_RXQ]->kickfd = -1;
>>> +	if ((dev->virtqueue[state->index]->kickfd + VIRTIO_RXQ) >= 0) {
>>> +		close(dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd);
>>> +		dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd = -1;
>>>  	}
>> Hi Yuanhan,
>>
>> Please let me make sure whether below is correct.
>>     if ((dev->virtqueue[state->index]->kickfd + VIRTIO_RXQ) >= 0) {
>>
>>> -	if ((dev->virtqueue[VIRTIO_TXQ]->kickfd) >= 0) {
>>> -		close(dev->virtqueue[VIRTIO_TXQ]->kickfd);
>>> -		dev->virtqueue[VIRTIO_TXQ]->kickfd = -1;
>>> +	if ((dev->virtqueue[state->index]->kickfd + VIRTIO_TXQ) >= 0) {
>>> +		close(dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd);
>>> +		dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd = -1;
>> Also, same question here.
> Oops, silly typos... Thanks for catching it out!
>
> Here is an update patch (Thomas, please let me know if you prefer me
> to send the whole patchset for you to apply):

Hi Yuanhan,

I may miss one more issue here.
Could you please see below patch I've submitted today?
(I may find a similar issue, so I've fixed it also in below patch.)
 
- http://dpdk.org/dev/patchwork/patch/8038/
 
Thanks,
Tetsuya
  
Michael S. Tsirkin Oct. 27, 2015, 9:17 a.m. UTC | #2
On Tue, Oct 27, 2015 at 03:20:40PM +0900, Tetsuya Mukawa wrote:
> On 2015/10/26 14:42, Yuanhan Liu wrote:
> > On Mon, Oct 26, 2015 at 02:24:08PM +0900, Tetsuya Mukawa wrote:
> >> On 2015/10/22 21:35, Yuanhan Liu wrote:
> > ...
> >>> @@ -292,13 +300,13 @@ user_get_vring_base(struct vhost_device_ctx ctx,
> >>>  	 * sent and only sent in vhost_vring_stop.
> >>>  	 * TODO: cleanup the vring, it isn't usable since here.
> >>>  	 */
> >>> -	if ((dev->virtqueue[VIRTIO_RXQ]->kickfd) >= 0) {
> >>> -		close(dev->virtqueue[VIRTIO_RXQ]->kickfd);
> >>> -		dev->virtqueue[VIRTIO_RXQ]->kickfd = -1;
> >>> +	if ((dev->virtqueue[state->index]->kickfd + VIRTIO_RXQ) >= 0) {
> >>> +		close(dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd);
> >>> +		dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd = -1;
> >>>  	}
> >> Hi Yuanhan,
> >>
> >> Please let me make sure whether below is correct.
> >>     if ((dev->virtqueue[state->index]->kickfd + VIRTIO_RXQ) >= 0) {
> >>
> >>> -	if ((dev->virtqueue[VIRTIO_TXQ]->kickfd) >= 0) {
> >>> -		close(dev->virtqueue[VIRTIO_TXQ]->kickfd);
> >>> -		dev->virtqueue[VIRTIO_TXQ]->kickfd = -1;
> >>> +	if ((dev->virtqueue[state->index]->kickfd + VIRTIO_TXQ) >= 0) {
> >>> +		close(dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd);
> >>> +		dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd = -1;
> >> Also, same question here.
> > Oops, silly typos... Thanks for catching it out!
> >
> > Here is an update patch (Thomas, please let me know if you prefer me
> > to send the whole patchset for you to apply):
> 
> Hi Yuanhan,
> 
> I may miss one more issue here.
> Could you please see below patch I've submitted today?
> (I may find a similar issue, so I've fixed it also in below patch.)
>  
> - http://dpdk.org/dev/patchwork/patch/8038/
>  
> Thanks,
> Tetsuya

Looking at that, at least when MQ is enabled, please don't key
stopping queues off GET_VRING_BASE.

There are ENABLE/DISABLE messages for that.

Generally guys, don't take whatever QEMU happens to do for
granted! Look at the protocol spec under doc/specs directory,
if you are making more assumptions you must document them!
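
As a rough illustration of the point above, queue start/stop can be driven by an explicit per-ring flag that an enable/disable handler sets, rather than being inferred from GET_VRING_BASE. The sketch below is standalone and hypothetical: `struct vq`, `set_vring_enable` and `vq_can_process` are invented names, and the `{index, num}` payload with a non-zero `num` meaning "enable" is an assumption that should be checked against the protocol spec.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed payload shape: ring index plus an enable (non-zero) or
 * disable (zero) value. */
struct vring_state { uint32_t index; uint32_t num; };

/* Hypothetical per-ring state; a real virtqueue carries much more. */
struct vq { bool enabled; };

/* Handler for an enable/disable request on one ring. */
static int set_vring_enable(struct vq *vqs, size_t n,
			    const struct vring_state *s)
{
	assert(vqs != NULL && s != NULL);
	if (s->index >= n)
		return -1;	/* reject an out-of-range ring index */
	vqs[s->index].enabled = (s->num != 0);
	return 0;
}

/* Datapath check: only touch rings that have been enabled. */
static bool vq_can_process(const struct vq *vq)
{
	return vq->enabled;
}
```

Keeping the flag per ring means one queue can be stopped without closing file descriptors or disturbing the other queues of the device.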
  
Yuanhan Liu Oct. 27, 2015, 9:30 a.m. UTC | #3
On Tue, Oct 27, 2015 at 11:17:16AM +0200, Michael S. Tsirkin wrote:
> On Tue, Oct 27, 2015 at 03:20:40PM +0900, Tetsuya Mukawa wrote:
> > On 2015/10/26 14:42, Yuanhan Liu wrote:
> > > On Mon, Oct 26, 2015 at 02:24:08PM +0900, Tetsuya Mukawa wrote:
> > >> On 2015/10/22 21:35, Yuanhan Liu wrote:
> > > ...
> > >>> @@ -292,13 +300,13 @@ user_get_vring_base(struct vhost_device_ctx ctx,
> > >>>  	 * sent and only sent in vhost_vring_stop.
> > >>>  	 * TODO: cleanup the vring, it isn't usable since here.
> > >>>  	 */
> > >>> -	if ((dev->virtqueue[VIRTIO_RXQ]->kickfd) >= 0) {
> > >>> -		close(dev->virtqueue[VIRTIO_RXQ]->kickfd);
> > >>> -		dev->virtqueue[VIRTIO_RXQ]->kickfd = -1;
> > >>> +	if ((dev->virtqueue[state->index]->kickfd + VIRTIO_RXQ) >= 0) {
> > >>> +		close(dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd);
> > >>> +		dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd = -1;
> > >>>  	}
> > >> Hi Yuanhan,
> > >>
> > >> Please let me make sure whether below is correct.
> > >>     if ((dev->virtqueue[state->index]->kickfd + VIRTIO_RXQ) >= 0) {
> > >>
> > >>> -	if ((dev->virtqueue[VIRTIO_TXQ]->kickfd) >= 0) {
> > >>> -		close(dev->virtqueue[VIRTIO_TXQ]->kickfd);
> > >>> -		dev->virtqueue[VIRTIO_TXQ]->kickfd = -1;
> > >>> +	if ((dev->virtqueue[state->index]->kickfd + VIRTIO_TXQ) >= 0) {
> > >>> +		close(dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd);
> > >>> +		dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd = -1;
> > >> Also, same question here.
> > > Oops, silly typos... Thanks for catching it out!
> > >
> > > Here is an update patch (Thomas, please let me know if you prefer me
> > > to send the whole patchset for you to apply):
> > 
> > Hi Yuanhan,
> > 
> > I may miss one more issue here.
> > Could you please see below patch I've submitted today?
> > (I may find a similar issue, so I've fixed it also in below patch.)
> >  
> > - http://dpdk.org/dev/patchwork/patch/8038/
> >  
> > Thanks,
> > Tetsuya
> 
> Looking at that, at least when MQ is enabled, please don't key
> stopping queues off GET_VRING_BASE.

Yes, that's only a workaround. I guess it has been there for quite a
while, maybe at the time qemu doesn't send RESET_OWNER message.

> There are ENABLE/DISABLE messages for that.

That's something new, though I have plan to use them instead, we still
need to make sure our code work with old qemu, without ENABLE/DISABLE
messages.

And I will think more while enabling live migration: I should have
more time to address issues like this at that time.

> Generally guys, don't take whatever QEMU happens to do for
> granted! Look at the protocol spec under doc/specs directory,
> if you are making more assumptions you must document them!

Indeed. And we will try to address them bit by bit in future.

	--yliu
  
Michael S. Tsirkin Oct. 27, 2015, 9:42 a.m. UTC | #4
On Tue, Oct 27, 2015 at 05:30:41PM +0800, Yuanhan Liu wrote:
> On Tue, Oct 27, 2015 at 11:17:16AM +0200, Michael S. Tsirkin wrote:
> > On Tue, Oct 27, 2015 at 03:20:40PM +0900, Tetsuya Mukawa wrote:
> > > On 2015/10/26 14:42, Yuanhan Liu wrote:
> > > > On Mon, Oct 26, 2015 at 02:24:08PM +0900, Tetsuya Mukawa wrote:
> > > >> On 2015/10/22 21:35, Yuanhan Liu wrote:
> > > > ...
> > > >>> @@ -292,13 +300,13 @@ user_get_vring_base(struct vhost_device_ctx ctx,
> > > >>>  	 * sent and only sent in vhost_vring_stop.
> > > >>>  	 * TODO: cleanup the vring, it isn't usable since here.
> > > >>>  	 */
> > > >>> -	if ((dev->virtqueue[VIRTIO_RXQ]->kickfd) >= 0) {
> > > >>> -		close(dev->virtqueue[VIRTIO_RXQ]->kickfd);
> > > >>> -		dev->virtqueue[VIRTIO_RXQ]->kickfd = -1;
> > > >>> +	if ((dev->virtqueue[state->index]->kickfd + VIRTIO_RXQ) >= 0) {
> > > >>> +		close(dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd);
> > > >>> +		dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd = -1;
> > > >>>  	}
> > > >> Hi Yuanhan,
> > > >>
> > > >> Please let me make sure whether below is correct.
> > > >>     if ((dev->virtqueue[state->index]->kickfd + VIRTIO_RXQ) >= 0) {
> > > >>
> > > >>> -	if ((dev->virtqueue[VIRTIO_TXQ]->kickfd) >= 0) {
> > > >>> -		close(dev->virtqueue[VIRTIO_TXQ]->kickfd);
> > > >>> -		dev->virtqueue[VIRTIO_TXQ]->kickfd = -1;
> > > >>> +	if ((dev->virtqueue[state->index]->kickfd + VIRTIO_TXQ) >= 0) {
> > > >>> +		close(dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd);
> > > >>> +		dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd = -1;
> > > >> Also, same question here.
> > > > Oops, silly typos... Thanks for catching it out!
> > > >
> > > > Here is an update patch (Thomas, please let me know if you prefer me
> > > > to send the whole patchset for you to apply):
> > > 
> > > Hi Yuanhan,
> > > 
> > > I may miss one more issue here.
> > > Could you please see below patch I've submitted today?
> > > (I may find a similar issue, so I've fixed it also in below patch.)
> > >  
> > > - http://dpdk.org/dev/patchwork/patch/8038/
> > >  
> > > Thanks,
> > > Tetsuya
> > 
> > Looking at that, at least when MQ is enabled, please don't key
> > stopping queues off GET_VRING_BASE.
> 
> Yes, that's only a workaround. I guess it has been there for quite a
> while, maybe at the time qemu doesn't send RESET_OWNER message.

RESET_OWNER was a bad idea since it basically closes
everything.

> > There are ENABLE/DISABLE messages for that.
> 
> That's something new,

That's part of multiqueue support. If you ignore them,
nothing works properly.

> though I have plan to use them instead, we still
> need to make sure our code work with old qemu, without ENABLE/DISABLE
> messages.

OK but don't rely on this for new code.

> And I will think more while enabling live migration: I should have
> more time to address issues like this at that time.
> 
> > Generally guys, don't take whatever QEMU happens to do for
> > granted! Look at the protocol spec under doc/specs directory,
> > if you are making more assumptions you must document them!
> 
> Indeed. And we will try to address them bit by bit in future.
> 
> 	--yliu

But don't pile up these workarounds meanwhile.  I'm very worried.  The
way you are carrying on, each new QEMU is likely to break your
assumptions.
  
Thomas Monjalon Oct. 27, 2015, 9:51 a.m. UTC | #5
2015-10-27 11:42, Michael S. Tsirkin:
> On Tue, Oct 27, 2015 at 05:30:41PM +0800, Yuanhan Liu wrote:
> > On Tue, Oct 27, 2015 at 11:17:16AM +0200, Michael S. Tsirkin wrote:
> > > Looking at that, at least when MQ is enabled, please don't key
> > > stopping queues off GET_VRING_BASE.
> > 
> > Yes, that's only a workaround. I guess it has been there for quite a
> > while, maybe at the time qemu doesn't send RESET_OWNER message.
> 
> RESET_OWNER was a bad idea since it basically closes
> everything.
> 
> > > There are ENABLE/DISABLE messages for that.
> > 
> > That's something new,
> 
> That's part of multiqueue support. If you ignore them,
> nothing works properly.
> 
> > though I have plan to use them instead, we still
> > need to make sure our code work with old qemu, without ENABLE/DISABLE
> > messages.
> 
> OK but don't rely on this for new code.
> 
> > And I will think more while enabling live migration: I should have
> > more time to address issues like this at that time.
> > 
> > > Generally guys, don't take whatever QEMU happens to do for
> > > granted! Look at the protocol spec under doc/specs directory,
> > > if you are making more assumptions you must document them!
> > 
> > Indeed. And we will try to address them bit by bit in future.
> > 
> > 	--yliu
> 
> But don't pile up these workarounds meanwhile.  I'm very worried.  The
> way you are carrying on, each new QEMU is likely to break your
> assumptions.

I think it may be saner to increase the minimum QEMU version supported in
each DPDK release, dropping old stuff progressively.
Michael, you are welcome to suggest how to move precisely.
Thanks
  
Yuanhan Liu Oct. 27, 2015, 9:53 a.m. UTC | #6
On Tue, Oct 27, 2015 at 11:42:24AM +0200, Michael S. Tsirkin wrote:
...
> > > Looking at that, at least when MQ is enabled, please don't key
> > > stopping queues off GET_VRING_BASE.
> > 
> > Yes, that's only a workaround. I guess it has been there for quite a
> > while, maybe at the time qemu doesn't send RESET_OWNER message.
> 
> RESET_OWNER was a bad idea since it basically closes
> everything.
> 
> > > There are ENABLE/DISABLE messages for that.
> > 
> > That's something new,
> 
> That's part of multiqueue support. If you ignore them,
> nothing works properly.

I will handle them shortly. (well, it may still need weeks :(

> > though I have plan to use them instead, we still
> > need to make sure our code work with old qemu, without ENABLE/DISABLE
> > messages.
> 
> OK but don't rely on this for new code.

Yes.

> 
> > And I will think more while enabling live migration: I should have
> > more time to address issues like this at that time.
> > 
> > > Generally guys, don't take whatever QEMU happens to do for
> > > granted! Look at the protocol spec under doc/specs directory,
> > > if you are making more assumptions you must document them!
> > 
> > Indeed. And we will try to address them bit by bit in future.
> > 
> > 	--yliu
> 
> But don't pile up these workarounds meanwhile.  I'm very worried.  The
> way you are carrying on, each new QEMU is likely to break your
> assumptions.

Good point. I'll have more discussion with Huawei, to see if we can
fix them sooner.

	--yliu
  
Michael S. Tsirkin Oct. 27, 2015, 9:55 a.m. UTC | #7
On Tue, Oct 27, 2015 at 10:51:14AM +0100, Thomas Monjalon wrote:
> 2015-10-27 11:42, Michael S. Tsirkin:
> > On Tue, Oct 27, 2015 at 05:30:41PM +0800, Yuanhan Liu wrote:
> > > On Tue, Oct 27, 2015 at 11:17:16AM +0200, Michael S. Tsirkin wrote:
> > > > Looking at that, at least when MQ is enabled, please don't key
> > > > stopping queues off GET_VRING_BASE.
> > > 
> > > Yes, that's only a workaround. I guess it has been there for quite a
> > > while, maybe at the time qemu doesn't send RESET_OWNER message.
> > 
> > RESET_OWNER was a bad idea since it basically closes
> > everything.
> > 
> > > > There are ENABLE/DISABLE messages for that.
> > > 
> > > That's something new,
> > 
> > That's part of multiqueue support. If you ignore them,
> > nothing works properly.
> > 
> > > though I have plan to use them instead, we still
> > > need to make sure our code work with old qemu, without ENABLE/DISABLE
> > > messages.
> > 
> > OK but don't rely on this for new code.
> > 
> > > And I will think more while enabling live migration: I should have
> > > more time to address issues like this at that time.
> > > 
> > > > Generally guys, don't take whatever QEMU happens to do for
> > > > granted! Look at the protocol spec under doc/specs directory,
> > > > if you are making more assumptions you must document them!
> > > 
> > > Indeed. And we will try to address them bit by bit in future.
> > > 
> > > 	--yliu
> > 
> > But don't pile up these workarounds meanwhile.  I'm very worried.  The
> > way you are carrying on, each new QEMU is likely to break your
> > assumptions.
> 
> I think it may be saner to increase the minimum QEMU version supported in
> each DPDK release, dropping old stuff progressively.
> Michael, you are welcome to suggest how to move precisely.
> Thanks

This doesn't work for downstreams which need to backport fixes and
features.

Just go by the spec, and if you find issues, fix them at the
source instead of working around them - the code is open.

For new features, we have protocol feature bits.
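
A small sketch of what gating on a protocol feature bit might look like, assuming the negotiated set is kept in a 64-bit mask such as the `protocol_features` field of `struct virtio_net`. The macro `SKETCH_PROTOCOL_F_MQ`, its bit position, and the helper names are hypothetical; the real bit definitions belong to the protocol spec.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name and bit position for the multiqueue protocol
 * feature; take the real value from the spec. */
#define SKETCH_PROTOCOL_F_MQ 0

static bool has_feature(uint64_t features, unsigned bit)
{
	assert(bit < 64);
	/* 64-bit constant, so that bits above 31 are also safe to test. */
	return (features & (UINT64_C(1) << bit)) != 0;
}

/* Keep only the bits that both sides understand. */
static uint64_t negotiate(uint64_t offered, uint64_t supported)
{
	return offered & supported;
}

/* New behaviour is selected by the negotiated bit, not by guessing
 * what a particular QEMU version happens to send. */
static bool use_enable_messages(uint64_t protocol_features)
{
	return has_feature(protocol_features, SKETCH_PROTOCOL_F_MQ);
}
```

With this shape, an older peer that never offers the bit keeps the legacy path, while a newer one opts in explicitly, which is also friendlier to downstream backports than a minimum-version check.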
  
Huawei Xie Oct. 27, 2015, 10:41 a.m. UTC | #8
On 10/27/2015 5:56 PM, Michael S. Tsirkin wrote:
> On Tue, Oct 27, 2015 at 10:51:14AM +0100, Thomas Monjalon wrote:
>> 2015-10-27 11:42, Michael S. Tsirkin:
>>> On Tue, Oct 27, 2015 at 05:30:41PM +0800, Yuanhan Liu wrote:
>>>> On Tue, Oct 27, 2015 at 11:17:16AM +0200, Michael S. Tsirkin wrote:
>>>>> Looking at that, at least when MQ is enabled, please don't key
>>>>> stopping queues off GET_VRING_BASE.
>>>> Yes, that's only a workaround. I guess it has been there for quite a
>>>> while, maybe at the time qemu doesn't send RESET_OWNER message.
>>> RESET_OWNER was a bad idea since it basically closes
>>> everything.
>>>
>>>>> There are ENABLE/DISABLE messages for that.
>>>> That's something new,
>>> That's part of multiqueue support. If you ignore them,
>>> nothing works properly.
>>>
>>>> though I have plan to use them instead, we still
>>>> need to make sure our code work with old qemu, without ENABLE/DISABLE
>>>> messages.
>>> OK but don't rely on this for new code.
>>>
>>>> And I will think more while enabling live migration: I should have
>>>> more time to address issues like this at that time.
>>>>
>>>>> Generally guys, don't take whatever QEMU happens to do for
>>>>> granted! Look at the protocol spec under doc/specs directory,
>>>>> if you are making more assumptions you must document them!
>>>> Indeed. And we will try to address them bit by bit in future.
>>>>
>>>> 	--yliu
>>> But don't pile up these workarounds meanwhile.  I'm very worried.  The
>>> way you are carrying on, each new QEMU is likely to break your
>>> assumptions.
>> I think it may be saner to increase the minimum QEMU version supported in
>> each DPDK release, dropping old stuff progressively.
>> Michael, you are welcome to suggest how to move precisely.
>> Thanks
> This doesn't work for downstreams which need to backport fixes and
> features.
>
> Just go by the spec, and if you find issues, fix them at the
> source instead of working around them - the code is open.
>
> For new features, we have protocol feature bits.
To me, one requirement is that we need a clear message (or spec) to know
when a virtio device (or, better, an individual queue) can be processed
and when processing should be stopped. We need to have a clear state
machine in mind. As another requirement, we hope QEMU could send vhost an
ID so that vhost-user can identify the connection. Let us discuss this in
another thread.
  

Patch

diff --git a/lib/librte_vhost/rte_virtio_net.h b/lib/librte_vhost/rte_virtio_net.h
index e3a21e5..9a32a95 100644
--- a/lib/librte_vhost/rte_virtio_net.h
+++ b/lib/librte_vhost/rte_virtio_net.h
@@ -96,7 +96,6 @@  struct vhost_virtqueue {
  * Device structure contains all configuration information relating to the device.
  */
 struct virtio_net {
-	struct vhost_virtqueue	*virtqueue[VIRTIO_QNUM];	/**< Contains all virtqueue information. */
 	struct virtio_memory	*mem;		/**< QEMU memory and memory region information. */
 	uint64_t		features;	/**< Negotiated feature set. */
 	uint64_t		protocol_features;	/**< Negotiated protocol feature set. */
@@ -104,7 +103,9 @@  struct virtio_net {
 	uint32_t		flags;		/**< Device flags. Only used to check if device is running on data core. */
 #define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? PATH_MAX : IFNAMSIZ)
 	char			ifname[IF_NAME_SZ];	/**< Name of the tap device or socket path. */
+	uint32_t		virt_qp_nb;	/**< number of queue pair we have allocated */
 	void			*priv;		/**< private context */
+	struct vhost_virtqueue	*virtqueue[VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX];	/**< Contains all virtqueue information. */
 } __rte_cache_aligned;
 
 /**
diff --git a/lib/librte_vhost/vhost_user/virtio-net-user.c b/lib/librte_vhost/vhost_user/virtio-net-user.c
index 6da729d..7fc3805 100644
--- a/lib/librte_vhost/vhost_user/virtio-net-user.c
+++ b/lib/librte_vhost/vhost_user/virtio-net-user.c
@@ -206,25 +206,33 @@  err_mmap:
 }
 
 static int
+vq_is_ready(struct vhost_virtqueue *vq)
+{
+	return vq && vq->desc   &&
+	       vq->kickfd != -1 &&
+	       vq->callfd != -1;
+}
+
+static int
 virtio_is_ready(struct virtio_net *dev)
 {
 	struct vhost_virtqueue *rvq, *tvq;
+	uint32_t i;
 
-	/* mq support in future.*/
-	rvq = dev->virtqueue[VIRTIO_RXQ];
-	tvq = dev->virtqueue[VIRTIO_TXQ];
-	if (rvq && tvq && rvq->desc && tvq->desc &&
-		(rvq->kickfd != -1) &&
-		(rvq->callfd != -1) &&
-		(tvq->kickfd != -1) &&
-		(tvq->callfd != -1)) {
-		RTE_LOG(INFO, VHOST_CONFIG,
-			"virtio is now ready for processing.\n");
-		return 1;
+	for (i = 0; i < dev->virt_qp_nb; i++) {
+		rvq = dev->virtqueue[i * VIRTIO_QNUM + VIRTIO_RXQ];
+		tvq = dev->virtqueue[i * VIRTIO_QNUM + VIRTIO_TXQ];
+
+		if (!vq_is_ready(rvq) || !vq_is_ready(tvq)) {
+			RTE_LOG(INFO, VHOST_CONFIG,
+				"virtio is not ready for processing.\n");
+			return 0;
+		}
 	}
+
 	RTE_LOG(INFO, VHOST_CONFIG,
-		"virtio isn't ready for processing.\n");
-	return 0;
+		"virtio is now ready for processing.\n");
+	return 1;
 }
 
 void
@@ -292,13 +300,13 @@  user_get_vring_base(struct vhost_device_ctx ctx,
 	 * sent and only sent in vhost_vring_stop.
 	 * TODO: cleanup the vring, it isn't usable since here.
 	 */
-	if ((dev->virtqueue[VIRTIO_RXQ]->kickfd) >= 0) {
-		close(dev->virtqueue[VIRTIO_RXQ]->kickfd);
-		dev->virtqueue[VIRTIO_RXQ]->kickfd = -1;
+	if (dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd >= 0) {
+		close(dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd);
+		dev->virtqueue[state->index + VIRTIO_RXQ]->kickfd = -1;
 	}
-	if ((dev->virtqueue[VIRTIO_TXQ]->kickfd) >= 0) {
-		close(dev->virtqueue[VIRTIO_TXQ]->kickfd);
-		dev->virtqueue[VIRTIO_TXQ]->kickfd = -1;
+	if (dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd >= 0) {
+		close(dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd);
+		dev->virtqueue[state->index + VIRTIO_TXQ]->kickfd = -1;
 	}
 
 	return 0;
diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index 830f22a..772f835 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -36,6 +36,7 @@ 
 #include <stddef.h>
 #include <stdint.h>
 #include <stdlib.h>
+#include <assert.h>
 #include <sys/mman.h>
 #include <unistd.h>
 #ifdef RTE_LIBRTE_VHOST_NUMA
@@ -178,6 +179,15 @@  add_config_ll_entry(struct virtio_net_config_ll *new_ll_dev)
 
 }
 
+static void
+cleanup_vq(struct vhost_virtqueue *vq)
+{
+	if (vq->callfd >= 0)
+		close(vq->callfd);
+	if (vq->kickfd >= 0)
+		close(vq->kickfd);
+}
+
 /*
  * Unmap any memory, close any file descriptors and
  * free any memory owned by a device.
@@ -185,6 +195,8 @@  add_config_ll_entry(struct virtio_net_config_ll *new_ll_dev)
 static void
 cleanup_device(struct virtio_net *dev)
 {
+	uint32_t i;
+
 	/* Unmap QEMU memory file if mapped. */
 	if (dev->mem) {
 		munmap((void *)(uintptr_t)dev->mem->mapped_address,
@@ -192,15 +204,10 @@  cleanup_device(struct virtio_net *dev)
 		free(dev->mem);
 	}
 
-	/* Close any event notifiers opened by device. */
-	if (dev->virtqueue[VIRTIO_RXQ]->callfd >= 0)
-		close(dev->virtqueue[VIRTIO_RXQ]->callfd);
-	if (dev->virtqueue[VIRTIO_RXQ]->kickfd >= 0)
-		close(dev->virtqueue[VIRTIO_RXQ]->kickfd);
-	if (dev->virtqueue[VIRTIO_TXQ]->callfd >= 0)
-		close(dev->virtqueue[VIRTIO_TXQ]->callfd);
-	if (dev->virtqueue[VIRTIO_TXQ]->kickfd >= 0)
-		close(dev->virtqueue[VIRTIO_TXQ]->kickfd);
+	for (i = 0; i < dev->virt_qp_nb; i++) {
+		cleanup_vq(dev->virtqueue[i * VIRTIO_QNUM + VIRTIO_RXQ]);
+		cleanup_vq(dev->virtqueue[i * VIRTIO_QNUM + VIRTIO_TXQ]);
+	}
 }
 
 /*
@@ -209,9 +216,11 @@  cleanup_device(struct virtio_net *dev)
 static void
 free_device(struct virtio_net_config_ll *ll_dev)
 {
-	/* Free any malloc'd memory */
-	rte_free(ll_dev->dev.virtqueue[VIRTIO_RXQ]);
-	rte_free(ll_dev->dev.virtqueue[VIRTIO_TXQ]);
+	uint32_t i;
+
+	for (i = 0; i < ll_dev->dev.virt_qp_nb; i++)
+		rte_free(ll_dev->dev.virtqueue[i * VIRTIO_QNUM]);
+
 	rte_free(ll_dev);
 }
 
@@ -244,34 +253,68 @@  rm_config_ll_entry(struct virtio_net_config_ll *ll_dev,
 	}
 }
 
+static void
+init_vring_queue(struct vhost_virtqueue *vq)
+{
+	memset(vq, 0, sizeof(struct vhost_virtqueue));
+
+	vq->kickfd = -1;
+	vq->callfd = -1;
+
+	/* Backends are set to -1 indicating an inactive device. */
+	vq->backend = -1;
+}
+
+static void
+init_vring_queue_pair(struct virtio_net *dev, uint32_t qp_idx)
+{
+	init_vring_queue(dev->virtqueue[qp_idx * VIRTIO_QNUM + VIRTIO_RXQ]);
+	init_vring_queue(dev->virtqueue[qp_idx * VIRTIO_QNUM + VIRTIO_TXQ]);
+}
+
+static int
+alloc_vring_queue_pair(struct virtio_net *dev, uint32_t qp_idx)
+{
+	struct vhost_virtqueue *virtqueue = NULL;
+	uint32_t virt_rx_q_idx = qp_idx * VIRTIO_QNUM + VIRTIO_RXQ;
+	uint32_t virt_tx_q_idx = qp_idx * VIRTIO_QNUM + VIRTIO_TXQ;
+
+	virtqueue = rte_malloc(NULL,
+			       sizeof(struct vhost_virtqueue) * VIRTIO_QNUM, 0);
+	if (virtqueue == NULL) {
+		RTE_LOG(ERR, VHOST_CONFIG,
+			"Failed to allocate memory for virt qp:%d.\n", qp_idx);
+		return -1;
+	}
+
+	dev->virtqueue[virt_rx_q_idx] = virtqueue;
+	dev->virtqueue[virt_tx_q_idx] = virtqueue + VIRTIO_TXQ;
+
+	init_vring_queue_pair(dev, qp_idx);
+
+	dev->virt_qp_nb += 1;
+
+	return 0;
+}
+
 /*
  *  Initialise all variables in device structure.
  */
 static void
 init_device(struct virtio_net *dev)
 {
-	uint64_t vq_offset;
+	int vq_offset;
+	uint32_t i;
 
 	/*
 	 * Virtqueues have already been malloced so
 	 * we don't want to set them to NULL.
 	 */
-	vq_offset = offsetof(struct virtio_net, mem);
-
-	/* Set everything to 0. */
-	memset((void *)(uintptr_t)((uint64_t)(uintptr_t)dev + vq_offset), 0,
-		(sizeof(struct virtio_net) - (size_t)vq_offset));
-	memset(dev->virtqueue[VIRTIO_RXQ], 0, sizeof(struct vhost_virtqueue));
-	memset(dev->virtqueue[VIRTIO_TXQ], 0, sizeof(struct vhost_virtqueue));
-
-	dev->virtqueue[VIRTIO_RXQ]->kickfd = -1;
-	dev->virtqueue[VIRTIO_RXQ]->callfd = -1;
-	dev->virtqueue[VIRTIO_TXQ]->kickfd = -1;
-	dev->virtqueue[VIRTIO_TXQ]->callfd = -1;
+	vq_offset = offsetof(struct virtio_net, virtqueue);
+	memset(dev, 0, vq_offset);
 
-	/* Backends are set to -1 indicating an inactive device. */
-	dev->virtqueue[VIRTIO_RXQ]->backend = VIRTIO_DEV_STOPPED;
-	dev->virtqueue[VIRTIO_TXQ]->backend = VIRTIO_DEV_STOPPED;
+	for (i = 0; i < dev->virt_qp_nb; i++)
+		init_vring_queue_pair(dev, i);
 }
 
 /*
@@ -283,7 +326,6 @@  static int
 new_device(struct vhost_device_ctx ctx)
 {
 	struct virtio_net_config_ll *new_ll_dev;
-	struct vhost_virtqueue *virtqueue_rx, *virtqueue_tx;
 
 	/* Setup device and virtqueues. */
 	new_ll_dev = rte_malloc(NULL, sizeof(struct virtio_net_config_ll), 0);
@@ -294,28 +336,6 @@  new_device(struct vhost_device_ctx ctx)
 		return -1;
 	}
 
-	virtqueue_rx = rte_malloc(NULL, sizeof(struct vhost_virtqueue), 0);
-	if (virtqueue_rx == NULL) {
-		rte_free(new_ll_dev);
-		RTE_LOG(ERR, VHOST_CONFIG,
-			"(%"PRIu64") Failed to allocate memory for rxq.\n",
-			ctx.fh);
-		return -1;
-	}
-
-	virtqueue_tx = rte_malloc(NULL, sizeof(struct vhost_virtqueue), 0);
-	if (virtqueue_tx == NULL) {
-		rte_free(virtqueue_rx);
-		rte_free(new_ll_dev);
-		RTE_LOG(ERR, VHOST_CONFIG,
-			"(%"PRIu64") Failed to allocate memory for txq.\n",
-			ctx.fh);
-		return -1;
-	}
-
-	new_ll_dev->dev.virtqueue[VIRTIO_RXQ] = virtqueue_rx;
-	new_ll_dev->dev.virtqueue[VIRTIO_TXQ] = virtqueue_tx;
-
 	/* Initialise device and virtqueues. */
 	init_device(&new_ll_dev->dev);
 
@@ -441,6 +461,8 @@  static int
 set_features(struct vhost_device_ctx ctx, uint64_t *pu)
 {
 	struct virtio_net *dev;
+	uint16_t vhost_hlen;
+	uint16_t i;
 
 	dev = get_device(ctx);
 	if (dev == NULL)
@@ -448,27 +470,26 @@  set_features(struct vhost_device_ctx ctx, uint64_t *pu)
 	if (*pu & ~VHOST_FEATURES)
 		return -1;
 
-	/* Store the negotiated feature list for the device. */
 	dev->features = *pu;
-
-	/* Set the vhost_hlen depending on if VIRTIO_NET_F_MRG_RXBUF is set. */
 	if (dev->features & (1 << VIRTIO_NET_F_MRG_RXBUF)) {
 		LOG_DEBUG(VHOST_CONFIG,
 			"(%"PRIu64") Mergeable RX buffers enabled\n",
 			dev->device_fh);
-		dev->virtqueue[VIRTIO_RXQ]->vhost_hlen =
-			sizeof(struct virtio_net_hdr_mrg_rxbuf);
-		dev->virtqueue[VIRTIO_TXQ]->vhost_hlen =
-			sizeof(struct virtio_net_hdr_mrg_rxbuf);
+		vhost_hlen = sizeof(struct virtio_net_hdr_mrg_rxbuf);
 	} else {
 		LOG_DEBUG(VHOST_CONFIG,
 			"(%"PRIu64") Mergeable RX buffers disabled\n",
 			dev->device_fh);
-		dev->virtqueue[VIRTIO_RXQ]->vhost_hlen =
-			sizeof(struct virtio_net_hdr);
-		dev->virtqueue[VIRTIO_TXQ]->vhost_hlen =
-			sizeof(struct virtio_net_hdr);
+		vhost_hlen = sizeof(struct virtio_net_hdr);
+	}
+
+	for (i = 0; i < dev->virt_qp_nb; i++) {
+		uint16_t base_idx = i * VIRTIO_QNUM;
+
+		dev->virtqueue[base_idx + VIRTIO_RXQ]->vhost_hlen = vhost_hlen;
+		dev->virtqueue[base_idx + VIRTIO_TXQ]->vhost_hlen = vhost_hlen;
 	}
+
 	return 0;
 }
 
@@ -684,13 +705,24 @@  set_vring_call(struct vhost_device_ctx ctx, struct vhost_vring_file *file)
 {
 	struct virtio_net *dev;
 	struct vhost_virtqueue *vq;
+	uint32_t cur_qp_idx = file->index / VIRTIO_QNUM;
 
 	dev = get_device(ctx);
 	if (dev == NULL)
 		return -1;
 
+	/*
+	 * FIXME: VHOST_SET_VRING_CALL is the first per-vring message
+	 * we get, so we do vring queue pair allocation here.
+	 */
+	if (cur_qp_idx + 1 > dev->virt_qp_nb) {
+		if (alloc_vring_queue_pair(dev, cur_qp_idx) < 0)
+			return -1;
+	}
+
 	/* file->index refers to the queue index. The txq is 1, rxq is 0. */
 	vq = dev->virtqueue[file->index];
+	assert(vq != NULL);
 
 	if (vq->callfd >= 0)
 		close(vq->callfd);