Message ID | 82834CFE-767C-41B0-9327-E64B8210E076@cisco.com (mailing list archive)
---|---
State | Superseded, archived
Delegated to: | Yuanhan Liu
Subject | [dpdk-dev] [PATCH] virtio: tx with can_push when VERSION_1 is set
Commit Message
Pierre Pfister
Nov. 8, 2016, 9:31 a.m. UTC
Current virtio driver advertises VERSION_1 support,
but does not handle device's VERSION_1 support when
sending packets (it looks for ANY_LAYOUT feature,
which is absent).
This patch enables 'can_push' in tx path when VERSION_1
is advertised by the device.
This significantly improves small packets forwarding rate
towards devices advertising VERSION_1 feature.
Signed-off-by: Pierre Pfister <ppfister@cisco.com>
---
drivers/net/virtio/virtio_rxtx.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--
2.7.4 (Apple Git-66)
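The condition the patch relaxes can be summarized as a feature-bit check. Below is a minimal sketch, assuming the feature-bit positions defined by the virtio specification (ANY_LAYOUT = 27, VERSION_1 = 32); `can_push_allowed` is a hypothetical helper written for illustration, not the PMD's actual API:

```c
#include <stdint.h>

/* Feature-bit positions per the virtio specification. */
#define VIRTIO_F_ANY_LAYOUT 27
#define VIRTIO_F_VERSION_1  32

/* Hypothetical helper mirroring the patch's condition: the virtio-net
 * header may be pushed into the packet buffer (single descriptor) when
 * either ANY_LAYOUT or VERSION_1 was negotiated, since VERSION_1
 * implies ANY_LAYOUT semantics in the spec. */
static int can_push_allowed(uint64_t negotiated_features)
{
	return (negotiated_features & (1ULL << VIRTIO_F_ANY_LAYOUT)) != 0 ||
	       (negotiated_features & (1ULL << VIRTIO_F_VERSION_1)) != 0;
}
```

Before the patch, only the ANY_LAYOUT half of this disjunction was tested, so a modern device offering VERSION_1 (which, per spec, makes QEMU drop the redundant ANY_LAYOUT bit) never took the single-descriptor path.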
Comments
Hi Pierre,

On 11/08/2016 10:31 AM, Pierre Pfister (ppfister) wrote:
> Current virtio driver advertises VERSION_1 support,
> but does not handle device's VERSION_1 support when
> sending packets (it looks for ANY_LAYOUT feature,
> which is absent).
>
> This patch enables 'can_push' in tx path when VERSION_1
> is advertised by the device.
>
> This significantly improves small packets forwarding rate
> towards devices advertising VERSION_1 feature.

I think it depends whether offloading is enabled or not.
If no offloading is enabled, I measured a significant drop.
Indeed, when no offloading is enabled, the Tx path in Virtio
does not access the virtio header before your patch, as the header is
memset to zero at device init time. With your patch, it gets memset to
zero at every transmit in the hot path.

With offloading enabled, it does make sense though, as the header will
be accessed.

This patch is for v17.02 anyway, and we may provide a way to enable and
disable features at Virtio PMD init time by this release.

Thanks,
Maxime

[quoted patch trimmed]
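Maxime's hot-path cost argument can be sketched as follows. This is an illustrative model, not the PMD's actual code: the struct mirrors the virtio-net header fields from the spec, but the names and the function are assumptions made for the example.

```c
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for struct virtio_net_hdr (field layout follows
 * the virtio-net spec; this is an illustration, not DPDK's definition). */
struct vnet_hdr_sketch {
	uint8_t  flags;
	uint8_t  gso_type;
	uint16_t hdr_len;
	uint16_t gso_size;
	uint16_t csum_start;
	uint16_t csum_offset;
};

/* With can_push, the header lives in each mbuf's headroom, so the Tx
 * hot path must zero it for every packet (no offloads means an
 * all-zero header). Without can_push, headers sit in a region zeroed
 * once at device init and are never touched again on this path. */
static void tx_push_header(uint8_t *headroom)
{
	memset(headroom, 0, sizeof(struct vnet_hdr_sketch));
}
```

The trade-off debated in the thread is exactly this per-packet `memset` (plus one header write by the device) against the savings of touching one descriptor instead of two or three.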
Hello Maxime,

Sorry for the late reply.

> On 8 Nov 2016, at 10:44, Maxime Coquelin <maxime.coquelin@redhat.com> wrote:
>
> I think it depends whether offloading is enabled or not.
> If no offloading enabled, I measured significant drop.
> Indeed, when no offloading is enabled, the Tx path in Virtio
> does not access the virtio header before your patch, as the header is
> memset to zero at device init time.
> With your patch, it gets memset to zero at every transmit in the hot
> path.

Right. On the virtio side that is true, but on the device side, we have
to access the header anyway. And accessing two descriptors (with the
address resolution and memory fetch which come with it) is a costly
operation compared to a single one.
In the case indirect descriptors are used, this is 1 desc access instead
of 3. And in the case chained descriptors are used, this doubles the
number of packets that you can put in your queue.

These are the results in my PHY -> VM (testpmd) -> PHY setup.
Traffic is flowing bidirectionally. Numbers are for lossless rates.

When chained buffers are used for dpdk's TX: 2x2.13Mpps
When indirect descriptors are used for dpdk's TX: 2x2.38Mpps
When shallow buffers are used for dpdk's TX (with this patch): 2x2.42Mpps

I must also note that qemu 2.5 does not seem to deal with VERSION_1 and
ANY_LAYOUT correctly. The patch I am proposing here works for qemu 2.7,
but with qemu 2.5, testpmd still behaves as if ANY_LAYOUT (or VERSION_1)
was not available. This is not catastrophic, but just note that you will
not see the performance gain in some cases with qemu 2.5.

Cheers,

- Pierre

[rest of quoted message trimmed]
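Pierre's descriptor-count argument can be put in rough numbers. The sketch below encodes the per-packet access counts quoted in the thread for a single-segment packet; the function name and the exact accounting are assumptions for illustration, not measurements or PMD code.

```c
/* Descriptor accesses per single-segment Tx packet, using the counts
 * quoted in the thread:
 * - can_push: header and data share one descriptor           -> 1
 * - indirect: ring entry plus a two-entry indirect table     -> 3
 * - chained:  header descriptor chained to a data descriptor -> 2
 * Note the chained case also halves the number of packets the ring
 * can hold, since each packet consumes two ring entries.
 */
static int desc_accesses_per_pkt(int use_indirect, int use_can_push)
{
	if (use_can_push)
		return 1;
	if (use_indirect)
		return 3;
	return 2;
}
```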
Hi Pierre,

On 11/09/2016 01:42 PM, Pierre Pfister (ppfister) wrote:
> Right. On the virtio side that is true, but on the device side, we have
> to access the header anyway.

No more now, if no offload features have been negotiated.
I have done a patch that landed in v16.11 to skip header parsing in
this case. That said, we still have to access its descriptor.

> And accessing two descriptors (with the address resolution and memory
> fetch which come with it) is a costly operation compared to a single one.
> In the case indirect descriptors are used, this is 1 desc access
> instead of 3.

I agree this is far from being optimal.

> And in the case chained descriptors are used, this doubles the number
> of packets that you can put in your queue.
>
> These are the results in my PHY -> VM (testpmd) -> PHY setup.
> Traffic is flowing bidirectionally. Numbers are for lossless rates.
>
> When chained buffers are used for dpdk's TX: 2x2.13Mpps
> When indirect descriptors are used for dpdk's TX: 2x2.38Mpps
> When shallow buffers are used for dpdk's TX (with this patch): 2x2.42Mpps

When I tried it, I also did a PVP 0% loss benchmark, and I got the
opposite results: the chained and indirect cases were significantly
better.

My PVP setup was using a single NIC and single Virtio PMD; NIC2VM
forwarding was IO mode done with testpmd on the host, and Rx->Tx
forwarding was macswap mode on the guest side.

I also saw some perf regression when running a simple testpmd test on
both ends. That said, if I'm the only one seeing a performance
regression, maybe something is wrong with my setup.

Yuanhan, did you run some benchmarks with your series enabling
ANY_LAYOUT?

> I must also note that qemu 2.5 does not seem to deal with VERSION_1 and
> ANY_LAYOUT correctly. The patch I am proposing here works for qemu 2.7,
> but with qemu 2.5, testpmd still behaves as if ANY_LAYOUT (or VERSION_1)
> was not available.

Thanks for the info.

Regards,
Maxime
Hello Maxime,

> On 9 Nov 2016, at 15:51, Maxime Coquelin <maxime.coquelin@redhat.com> wrote:
>
> I also saw some perf regression when running a simple testpmd test on
> both ends.
>
> Yuanhan, did you run some benchmarks with your series enabling
> ANY_LAYOUT?

It was enabled. But the spec specifies that VERSION_1 includes
ANY_LAYOUT. Therefore, Qemu removes ANY_LAYOUT when VERSION_1 is set.

We can keep arguing about which is fastest. I guess we have different
setups and different results, so we probably are deadlocked here.
But in any case, the current code is inconsistent, as it uses a single
descriptor when ANY_LAYOUT is set, but not when VERSION_1 is set.

I believe it makes sense to use a single descriptor any time it is
possible, but you are free to think otherwise.
Please make a call and make the code consistent (remove single
descriptors altogether, or use them when VERSION_1 is set too).
Otherwise it just creates yet another testing headache.

Thanks,

- Pierre

[rest of quoted message trimmed]
Hi Pierre,

On 11/22/2016 10:54 AM, Pierre Pfister (ppfister) wrote:
> I believe it makes sense to use a single descriptor any time it is
> possible, but you are free to think otherwise.
> Please make a call and make the code consistent (remove single
> descriptors altogether, or use them when VERSION_1 is set too).
> Otherwise it just creates yet another testing headache.

I also think it makes sense to have a single descriptor, but I had to
highlight that I noticed a significant performance degradation when
using a single descriptor on my setup.

I'm fine if we take your patch in virtio-next, so that more testing is
conducted.

Thanks,
Maxime
> On 22 Nov 2016, at 14:17, Maxime Coquelin <maxime.coquelin@redhat.com> wrote:
>
> I also think it makes sense to have a single descriptor, but I had to
> highlight that I noticed a significant performance degradation when
> using a single descriptor on my setup.
>
> I'm fine if we take your patch in virtio-next, so that more testing is
> conducted.

I just realised there was an indentation error in the patch, meaning
that this patch didn't make it to 16.11 ... I will send a new version.

- Pierre
diff --git a/drivers/net/virtio/virtio_rxtx.c b/drivers/net/virtio/virtio_rxtx.c
index 724517e..2fe0338 100644
--- a/drivers/net/virtio/virtio_rxtx.c
+++ b/drivers/net/virtio/virtio_rxtx.c
@@ -925,7 +925,8 @@ virtio_xmit_pkts(void *tx_queue, struct rte_mbuf **tx_pkts, uint16_t nb_pkts)
 	}
 
 	/* optimize ring usage */
-	if (vtpci_with_feature(hw, VIRTIO_F_ANY_LAYOUT) &&
+	if ((vtpci_with_feature(hw, VIRTIO_F_ANY_LAYOUT) ||
+	     vtpci_with_feature(hw, VIRTIO_F_VERSION_1)) &&
 	    rte_mbuf_refcnt_read(txm) == 1 &&
 	    RTE_MBUF_DIRECT(txm) &&
 	    txm->nb_segs == 1 &&