[dpdk-dev,v2] ixgbe: speed up transmit

Message ID 1447431013-23245-1-git-send-email-stephen@networkplumber.org (mailing list archive)
State Accepted, archived
Delegated to: Bruce Richardson

Commit Message

Stephen Hemminger Nov. 13, 2015, 4:10 p.m. UTC
  From: Stephen Hemminger <shemming@brocade.com>

The freeing of mbufs in ixgbe is one of the observable hot spots
under load. Optimize it by doing a bulk free of mbufs, using code similar
to i40e and fm10k.

Drop the no longer needed micro-optimization for the no refcount flag.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
v2 - rebase and use variable names consistent with i40e

 drivers/net/ixgbe/ixgbe_rxtx.c | 32 ++++++++++++++++++++------------
 drivers/net/ixgbe/ixgbe_rxtx.h |  2 +-
 2 files changed, 21 insertions(+), 13 deletions(-)
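
The batching described in the commit message can be illustrated outside DPDK. The following is a minimal sketch, not the driver code: `toy_pool`, `toy_buf`, `toy_prefree`, `toy_pool_put_bulk`, `toy_free_bufs` and `MAX_FREE_BATCH` are hypothetical stand-ins for the mempool, the mbuf, the prefree helper, the bulk put, the free routine and the batch-size macro. It assumes only what the patch shows: a buffer is returned once its last reference is dropped, and a batch is flushed when it is full or when the next buffer belongs to a different pool.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FREE_BATCH 64 /* hypothetical; plays the role of the batch-size macro */

/* Toy stand-ins for a mempool and an mbuf; these are not DPDK types. */
struct toy_pool {
	unsigned int bulk_calls; /* number of bulk returns made to this pool */
	unsigned int returned;   /* total buffers returned to this pool */
};

struct toy_buf {
	struct toy_pool *pool;   /* pool the buffer was allocated from */
	unsigned int refcnt;     /* outstanding references */
};

/* Return n buffers to a pool in a single call. */
static void toy_pool_put_bulk(struct toy_pool *p, struct toy_buf **bufs,
			      unsigned int n)
{
	(void)bufs;
	p->bulk_calls++;
	p->returned += n;
}

/* Drop one reference; hand the buffer back only if that was the last one. */
static struct toy_buf *toy_prefree(struct toy_buf *b)
{
	if (b == NULL || b->refcnt == 0)
		return NULL;
	if (--b->refcnt != 0)
		return NULL; /* still referenced elsewhere, must not be freed */
	return b;
}

/* Release n ring entries, grouping consecutive same-pool buffers. */
static unsigned int toy_free_bufs(struct toy_buf **ring, unsigned int n)
{
	struct toy_buf *batch[MAX_FREE_BATCH];
	unsigned int nb_free = 0, total = 0, i;

	assert(ring != NULL);

	for (i = 0; i < n; i++) {
		struct toy_buf *m = toy_prefree(ring[i]);

		ring[i] = NULL;
		if (m == NULL)
			continue;

		/* Flush when the batch is full or the pool changes. */
		if (nb_free >= MAX_FREE_BATCH ||
		    (nb_free > 0 && m->pool != batch[0]->pool)) {
			toy_pool_put_bulk(batch[0]->pool, batch, nb_free);
			nb_free = 0;
		}

		batch[nb_free++] = m;
		total++;
	}

	/* Flush whatever is left after the loop. */
	if (nb_free > 0)
		toy_pool_put_bulk(batch[0]->pool, batch, nb_free);

	return total;
}
```

In this sketch the per-buffer reference check happens for every entry, so no separate no-refcount fast path is needed; the saving comes from making one pool call per run of same-pool buffers instead of one call per buffer.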
  

Comments

Stephen Hemminger Dec. 11, 2015, 4:48 p.m. UTC | #1
On Fri, 13 Nov 2015 08:10:13 -0800
Stephen Hemminger <stephen@networkplumber.org> wrote:

> From: Stephen Hemminger <shemming@brocade.com>
> 
> The freeing of mbuf's in ixgbe is one of the observable hot spots
> under load. Optimize it by doing bulk free of mbufs using code similar
> to i40e and fm10k.
> 
> Drop the no longer needed micro-optimization for the no refcount flag.
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

How come this patch got no review or comments?
It gives a visible performance gain of up to 10% in some cases.

I understand maintainers are busy with internal work, but they need
to read the mailing list as well.
  
Ananyev, Konstantin Dec. 11, 2015, 6:52 p.m. UTC | #2
> -----Original Message-----
> From: Stephen Hemminger [mailto:shemming@brocade.com]
> Sent: Friday, December 11, 2015 4:48 PM
> To: dev@dpdk.org; Zhang, Helin; Ananyev, Konstantin
> Subject: Re: [PATCH v2 ] ixgbe: speed up transmit
> 
> On Fri, 13 Nov 2015 08:10:13 -0800
> Stephen Hemminger <stephen@networkplumber.org> wrote:
> 
> > From: Stephen Hemminger <shemming@brocade.com>
> >
> > The freeing of mbuf's in ixgbe is one of the observable hot spots
> > under load. Optimize it by doing bulk free of mbufs using code similar
> > to i40e and fm10k.
> >
> > Drop the no longer needed micro-optimization for the no refcount flag.
> >
> > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> 
> How come this patch got no review or comments?
> It gets a visible performance gain of up to 10% in some cases.
> 
> I understand maintainers are busy with internal work, but they need
> to read mailing list as well.

Yep, I missed it somehow.
BTW, for some reason I couldn't find it in patchwork.
Was it submitted too late for the 2.2 timeframe?
Anyway, about the patch itself: it looks good to me, and it does indeed provide a 10%+
performance improvement.
So here is my ACK for it.
Konstantin
  
Bruce Richardson Feb. 10, 2016, 3:19 p.m. UTC | #3
On Fri, Dec 11, 2015 at 06:52:36PM +0000, Ananyev, Konstantin wrote:
> > -----Original Message-----
> > From: Stephen Hemminger [mailto:shemming at brocade.com]
> > Sent: Friday, December 11, 2015 4:48 PM
> > To: dev at dpdk.org; Zhang, Helin; Ananyev, Konstantin
> > Subject: Re: [PATCH v2 ] ixgbe: speed up transmit
> > 
> > On Fri, 13 Nov 2015 08:10:13 -0800
> > Stephen Hemminger <stephen at networkplumber.org> wrote:
> > 
> > > From: Stephen Hemminger <shemming at brocade.com>
> > >
> > > The freeing of mbuf's in ixgbe is one of the observable hot spots
> > > under load. Optimize it by doing bulk free of mbufs using code similar
> > > to i40e and fm10k.
> > >
> > > Drop the no longer needed micro-optimization for the no refcount flag.
> > >
> > > Signed-off-by: Stephen Hemminger <stephen at networkplumber.org>
> > 
> > How come this patch got no review or comments?
> > It gets a visible performance gain of up to 10% in some cases.
> > 
> > I understand maintainers are busy with internal work, but they need
> > to read mailing list as well.
> 
> Yep, I missed it somehow.
> BTW, by some reason I couldn't find it patchwork....
> Was it submitted too late for 2.2 timeframe?
> Anyway, about the patch itself: it looks good to me and indeed it provides 10%+
> performance improvement.
> So here is my ACK for it.
> Konstantin

Applied to dpdk-next-net/rel_16_04, with edited title to clarify that it's the 
non-vectorized tx code that is being sped up, and not all ixgbe transmit paths.

Thanks,
/Bruce
  

Patch

diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c b/drivers/net/ixgbe/ixgbe_rxtx.c
index 52a263c..0b087c3 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx.c
@@ -126,7 +126,8 @@  ixgbe_tx_free_bufs(struct ixgbe_tx_queue *txq)
 {
 	struct ixgbe_tx_entry *txep;
 	uint32_t status;
-	int i;
+	int i, nb_free = 0;
+	struct rte_mbuf *m, *free[RTE_IXGBE_TX_MAX_FREE_BUF_SZ];
 
 	/* check DD bit on threshold descriptor */
 	status = txq->tx_ring[txq->tx_next_dd].wb.status;
@@ -139,20 +140,27 @@  ixgbe_tx_free_bufs(struct ixgbe_tx_queue *txq)
 	 */
 	txep = &(txq->sw_ring[txq->tx_next_dd - (txq->tx_rs_thresh - 1)]);
 
-	/* free buffers one at a time */
-	if ((txq->txq_flags & (uint32_t)ETH_TXQ_FLAGS_NOREFCOUNT) != 0) {
-		for (i = 0; i < txq->tx_rs_thresh; ++i, ++txep) {
-			txep->mbuf->next = NULL;
-			rte_mempool_put(txep->mbuf->pool, txep->mbuf);
-			txep->mbuf = NULL;
-		}
-	} else {
-		for (i = 0; i < txq->tx_rs_thresh; ++i, ++txep) {
-			rte_pktmbuf_free_seg(txep->mbuf);
-			txep->mbuf = NULL;
+	for (i = 0; i < txq->tx_rs_thresh; ++i, ++txep) {
+		/* drop one reference; NULL means the mbuf is still in use */
+		m = __rte_pktmbuf_prefree_seg(txep->mbuf);
+		txep->mbuf = NULL;
+
+		if (unlikely(m == NULL))
+			continue;
+
+		if (nb_free >= RTE_IXGBE_TX_MAX_FREE_BUF_SZ ||
+		    (nb_free > 0 && m->pool != free[0]->pool)) {
+			rte_mempool_put_bulk(free[0]->pool,
+					     (void **)free, nb_free);
+			nb_free = 0;
 		}
+
+		free[nb_free++] = m;
 	}
 
+	if (nb_free > 0)
+		rte_mempool_put_bulk(free[0]->pool, (void **)free, nb_free);
+
 	/* buffers were freed, update counters */
 	txq->nb_tx_free = (uint16_t)(txq->nb_tx_free + txq->tx_rs_thresh);
 	txq->tx_next_dd = (uint16_t)(txq->tx_next_dd + txq->tx_rs_thresh);
diff --git a/drivers/net/ixgbe/ixgbe_rxtx.h b/drivers/net/ixgbe/ixgbe_rxtx.h
index 475a800..064cbda 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx.h
+++ b/drivers/net/ixgbe/ixgbe_rxtx.h
@@ -58,6 +58,7 @@ 
 
 #define RTE_PMD_IXGBE_TX_MAX_BURST 32
 #define RTE_PMD_IXGBE_RX_MAX_BURST 32
+#define RTE_IXGBE_TX_MAX_FREE_BUF_SZ 64
 
 #define RTE_IXGBE_DESCS_PER_LOOP    4
 
@@ -70,7 +71,6 @@ 
 #ifdef RTE_IXGBE_INC_VECTOR
 #define RTE_IXGBE_RXQ_REARM_THRESH      32
 #define RTE_IXGBE_MAX_RX_BURST          RTE_IXGBE_RXQ_REARM_THRESH
-#define RTE_IXGBE_TX_MAX_FREE_BUF_SZ    64
 #endif
 
 #define RX_RING_SZ ((IXGBE_MAX_RING_DESC + RTE_IXGBE_DESCS_PER_LOOP - 1) * \