[dpdk-dev] mbuf: remove inconsistent assert statements

Message ID	1465374688-11729-1-git-send-email-adrien.mazarguil@6wind.com (mailing list archive)
State	Rejected, archived
Headers	From: Adrien Mazarguil <adrien.mazarguil@6wind.com> To: dev@dpdk.org Cc: Olivier Matz <olivier.matz@6wind.com> Date: Wed, 8 Jun 2016 10:31:28 +0200 Message-Id: <1465374688-11729-1-git-send-email-adrien.mazarguil@6wind.com> Subject: [dpdk-dev] [PATCH] mbuf: remove inconsistent assert statements Precedence: list Errors-To: dev-bounces@dpdk.org Sender: "dev" <dev-bounces@dpdk.org>

Message ID

1465374688-11729-1-git-send-email-adrien.mazarguil@6wind.com (mailing list archive)

State

Rejected, archived

Headers

From: Adrien Mazarguil <adrien.mazarguil@6wind.com>
To: dev@dpdk.org
Cc: Olivier Matz <olivier.matz@6wind.com>
Date: Wed,  8 Jun 2016 10:31:28 +0200
Message-Id: <1465374688-11729-1-git-send-email-adrien.mazarguil@6wind.com>
Subject: [dpdk-dev] [PATCH] mbuf: remove inconsistent assert statements
Precedence: list
Errors-To: dev-bounces@dpdk.org
Sender: "dev" <dev-bounces@dpdk.org>

Commit Message

Adrien Mazarguil June 8, 2016, 8:31 a.m. UTC

  An assertion failure occurs in __rte_mbuf_raw_free() (called by a few PMDs)
when compiling DPDK with CONFIG_RTE_LOG_LEVEL=RTE_LOG_DEBUG and starting
applications with a log level high enough to trigger it.

While rte_mbuf_raw_alloc() sets refcount to 1, __rte_mbuf_raw_free()
expects it to be 0. Considering users are not expected to reset the
reference count to satisfy assert() and that raw functions are designed on
purpose without safety belts, remove these checks.

Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
---
 lib/librte_mbuf/rte_mbuf.h | 2 --
 1 file changed, 2 deletions(-)

Comments

Ananyev, Konstantin June 8, 2016, 10:34 a.m. UTC | #1

Hi Adrien,

> 
> An assertion failure occurs in __rte_mbuf_raw_free() (called by a few PMDs)
> when compiling DPDK with CONFIG_RTE_LOG_LEVEL=RTE_LOG_DEBUG and starting
> applications with a log level high enough to trigger it.
> 
> While rte_mbuf_raw_alloc() sets refcount to 1, __rte_mbuf_raw_free()
> expects it to be 0. 
>Considering users are not expected to reset the
> reference count to satisfy assert() and that raw functions are designed on
> purpose without safety belts, remove these checks.

Yes, it refcnt supposed to be set to 0 by __rte_pktmbuf_prefree_seg().
Wright now, it is a user responsibility to make sure refcnt==0 before pushing
mbuf back to the pool.
Not sure why do you consider that wrong?
If the user calls __rte_mbuf_raw_free() manualy it is his responsibility to make
sure mbuf's refcn==0.
BTW, why are you doing it?
The comment clearly states that the function is for internal use:
/**
 * @internal Put mbuf back into its original mempool.
 * The use of that function is reserved for RTE internal needs.
 * Please use rte_pktmbuf_free().
 *
 * @param m
 *   The mbuf to be freed.
 */
static inline void __attribute__((always_inline))
__rte_mbuf_raw_free(struct rte_mbuf *m)

Konstantin

> 
> Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
> ---
>  lib/librte_mbuf/rte_mbuf.h | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> index 11fa06d..7070bb8 100644
> --- a/lib/librte_mbuf/rte_mbuf.h
> +++ b/lib/librte_mbuf/rte_mbuf.h
> @@ -1108,7 +1108,6 @@ static inline struct rte_mbuf *rte_mbuf_raw_alloc(struct rte_mempool *mp)
>  	if (rte_mempool_get(mp, &mb) < 0)
>  		return NULL;
>  	m = (struct rte_mbuf *)mb;
> -	RTE_ASSERT(rte_mbuf_refcnt_read(m) == 0);
>  	rte_mbuf_refcnt_set(m, 1);
>  	__rte_mbuf_sanity_check(m, 0);
> 
> @@ -1133,7 +1132,6 @@ __rte_mbuf_raw_alloc(struct rte_mempool *mp)
>  static inline void __attribute__((always_inline))
>  __rte_mbuf_raw_free(struct rte_mbuf *m)
>  {
> -	RTE_ASSERT(rte_mbuf_refcnt_read(m) == 0);
>  	rte_mempool_put(m->pool, m);
>  }
> 
> --
> 2.1.4

Adrien Mazarguil June 8, 2016, 12:27 p.m. UTC | #2

Hi Konstantin,

On Wed, Jun 08, 2016 at 10:34:17AM +0000, Ananyev, Konstantin wrote:
> Hi Adrien,
> 
> > 
> > An assertion failure occurs in __rte_mbuf_raw_free() (called by a few PMDs)
> > when compiling DPDK with CONFIG_RTE_LOG_LEVEL=RTE_LOG_DEBUG and starting
> > applications with a log level high enough to trigger it.
> > 
> > While rte_mbuf_raw_alloc() sets refcount to 1, __rte_mbuf_raw_free()
> > expects it to be 0. 
> >Considering users are not expected to reset the
> > reference count to satisfy assert() and that raw functions are designed on
> > purpose without safety belts, remove these checks.
> 
> Yes, it refcnt supposed to be set to 0 by __rte_pktmbuf_prefree_seg().
> Wright now, it is a user responsibility to make sure refcnt==0 before pushing
> mbuf back to the pool.
> Not sure why do you consider that wrong?

I do not consider this wrong and I'm all for using assert() to catch
programming errors, however in this specific case, I think they are
inconsistent and misleading.

> If the user calls __rte_mbuf_raw_free() manualy it is his responsibility to make
> sure mbuf's refcn==0.

Sure, however what harm does it cause (besides assert() to fail), since the
allocation function sets refcount to 1?

Why having the allocation function set the refcount if we are sure it is
already 0 (assert() proves it). Removing rte_mbuf_refcnt_set(m, 1) can
surely improve performance.

> BTW, why are you doing it?
> The comment clearly states that the function is for internal use:
> /**
>  * @internal Put mbuf back into its original mempool.
>  * The use of that function is reserved for RTE internal needs.
>  * Please use rte_pktmbuf_free().
>  *
>  * @param m
>  *   The mbuf to be freed.
>  */
> static inline void __attribute__((always_inline))
> __rte_mbuf_raw_free(struct rte_mbuf *m)

Several PMDs are using it anyway (won't name names, but I know one of them
quite well). I chose to modify this code instead of its users for the
following reasons:

- Considering their names, these functions should be opposites and able to
  work together like malloc()/free().

- PMDs are relying on these functions for performance reasons, we can assume
  they took the extra care necessary to make sure it would work properly.

- Preventing it would make these PMDs slower and is not acceptable either.

What remains is the consistency issue, I think these statements were only
added to catch multiple frees, and those should be caught at a higher
level, where other consistency checks are also performed.

> > Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
> > ---
> >  lib/librte_mbuf/rte_mbuf.h | 2 --
> >  1 file changed, 2 deletions(-)
> > 
> > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> > index 11fa06d..7070bb8 100644
> > --- a/lib/librte_mbuf/rte_mbuf.h
> > +++ b/lib/librte_mbuf/rte_mbuf.h
> > @@ -1108,7 +1108,6 @@ static inline struct rte_mbuf *rte_mbuf_raw_alloc(struct rte_mempool *mp)
> >  	if (rte_mempool_get(mp, &mb) < 0)
> >  		return NULL;
> >  	m = (struct rte_mbuf *)mb;
> > -	RTE_ASSERT(rte_mbuf_refcnt_read(m) == 0);
> >  	rte_mbuf_refcnt_set(m, 1);
> >  	__rte_mbuf_sanity_check(m, 0);
> > 
> > @@ -1133,7 +1132,6 @@ __rte_mbuf_raw_alloc(struct rte_mempool *mp)
> >  static inline void __attribute__((always_inline))
> >  __rte_mbuf_raw_free(struct rte_mbuf *m)
> >  {
> > -	RTE_ASSERT(rte_mbuf_refcnt_read(m) == 0);
> >  	rte_mempool_put(m->pool, m);
> >  }
> > 
> > --
> > 2.1.4
>

Ananyev, Konstantin June 8, 2016, 1:09 p.m. UTC | #3

> 
> Hi Konstantin,
> 
> On Wed, Jun 08, 2016 at 10:34:17AM +0000, Ananyev, Konstantin wrote:
> > Hi Adrien,
> >
> > >
> > > An assertion failure occurs in __rte_mbuf_raw_free() (called by a few PMDs)
> > > when compiling DPDK with CONFIG_RTE_LOG_LEVEL=RTE_LOG_DEBUG and starting
> > > applications with a log level high enough to trigger it.
> > >
> > > While rte_mbuf_raw_alloc() sets refcount to 1, __rte_mbuf_raw_free()
> > > expects it to be 0.
> > >Considering users are not expected to reset the
> > > reference count to satisfy assert() and that raw functions are designed on
> > > purpose without safety belts, remove these checks.
> >
> > Yes, it refcnt supposed to be set to 0 by __rte_pktmbuf_prefree_seg().
> > Wright now, it is a user responsibility to make sure refcnt==0 before pushing
> > mbuf back to the pool.
> > Not sure why do you consider that wrong?
> 
> I do not consider this wrong and I'm all for using assert() to catch
> programming errors, however in this specific case, I think they are
> inconsistent and misleading.

Honestly, I don't understand why.
Right now the rule of thumb is - when mbuf is in the pool, it's refcnt should be equal zero.
Yes, as you pointed below - that rule probably can be changed to: 
when mbuf is in the pool, it's refcnt should equal one, and that would probably allow us
to speedup things a bit, but I suppose that's the matter of another aptch/discussion.

> 
> > If the user calls __rte_mbuf_raw_free() manualy it is his responsibility to make
> > sure mbuf's refcn==0.
> 
> Sure, however what harm does it cause (besides assert() to fail), since the
> allocation function sets refcount to 1?
> 
> Why having the allocation function set the refcount if we are sure it is
> already 0 (assert() proves it). Removing rte_mbuf_refcnt_set(m, 1) can
> surely improve performance.

That's' just an assert() enabled when MBUF_DEBUG  is on.
Its sole purpose is to help troubleshoot the bugs and help to catch situations
when someone silently updates mbufs supposed to be free.  

> 
> > BTW, why are you doing it?
> > The comment clearly states that the function is for internal use:
> > /**
> >  * @internal Put mbuf back into its original mempool.
> >  * The use of that function is reserved for RTE internal needs.
> >  * Please use rte_pktmbuf_free().
> >  *
> >  * @param m
> >  *   The mbuf to be freed.
> >  */
> > static inline void __attribute__((always_inline))
> > __rte_mbuf_raw_free(struct rte_mbuf *m)
> 
> Several PMDs are using it anyway (won't name names, but I know one of them
> quite well).

Then it probably is a bug in these PMDs that need to be fixed.


> I chose to modify this code instead of its users for the
> following reasons:
> 
> - Considering their names, these functions should be opposites and able to
>   work together like malloc()/free().

These are internal functions.
Comments in mbuf clearly state that library users shouldn't call them directly.
They are written to fit internal librte_mbuf needs, and no-one ever promised
malloc/free() compatibility here. 

> 
> - PMDs are relying on these functions for performance reasons, we can assume
>   they took the extra care necessary to make sure it would work properly.

That just doesn't seem correct to me.
The proper way to do free fo mbuf segment is:

static inline void __attribute__((always_inline))
rte_pktmbuf_free_seg(struct rte_mbuf *m)
{
        if (likely(NULL != (m = __rte_pktmbuf_prefree_seg(m)))) {
                m->next = NULL;
                __rte_mbuf_raw_free(m);
        }
}

If by some reason you choose not to use this function, then it is your
responsibility to perform similar actions on your own before pushing mbuf into the pool.
That's what some TX functions for some Intel NICs do to improve performance:
they call _prefree_seg() manually and try to put mbufs into the pool in groups.

> 
> - Preventing it would make these PMDs slower and is not acceptable either.

I can hardly imagine that __rte_pktmbuf_prefree_seg() impact would be that severe...
But ok, probably  you do have some very specific case, but then why you PMD just doesn't call:
rte_mempool_put(m->pool, m); 
directly?
Why instead you choose to change common functions and compromise
librte_mbuf debug ability?

> 
> What remains is the consistency issue, I think these statements were only
> added to catch multiple frees,

Yes these asserts() here to help catch bugs,
and I think it is a good thing to have them here. 

> and those should be caught at a higher
> level, where other consistency checks are also performed.

Like where?

Konstantin


> 
> > > Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
> > > ---
> > >  lib/librte_mbuf/rte_mbuf.h | 2 --
> > >  1 file changed, 2 deletions(-)
> > >
> > > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> > > index 11fa06d..7070bb8 100644
> > > --- a/lib/librte_mbuf/rte_mbuf.h
> > > +++ b/lib/librte_mbuf/rte_mbuf.h
> > > @@ -1108,7 +1108,6 @@ static inline struct rte_mbuf *rte_mbuf_raw_alloc(struct rte_mempool *mp)
> > >  	if (rte_mempool_get(mp, &mb) < 0)
> > >  		return NULL;
> > >  	m = (struct rte_mbuf *)mb;
> > > -	RTE_ASSERT(rte_mbuf_refcnt_read(m) == 0);
> > >  	rte_mbuf_refcnt_set(m, 1);
> > >  	__rte_mbuf_sanity_check(m, 0);
> > >
> > > @@ -1133,7 +1132,6 @@ __rte_mbuf_raw_alloc(struct rte_mempool *mp)
> > >  static inline void __attribute__((always_inline))
> > >  __rte_mbuf_raw_free(struct rte_mbuf *m)
> > >  {
> > > -	RTE_ASSERT(rte_mbuf_refcnt_read(m) == 0);
> > >  	rte_mempool_put(m->pool, m);
> > >  }
> > >
> > > --
> > > 2.1.4
> >
> 
> --
> Adrien Mazarguil
> 6WIND

Adrien Mazarguil June 8, 2016, 1:57 p.m. UTC | #4

On Wed, Jun 08, 2016 at 01:09:18PM +0000, Ananyev, Konstantin wrote:
> > 
> > Hi Konstantin,
> > 
> > On Wed, Jun 08, 2016 at 10:34:17AM +0000, Ananyev, Konstantin wrote:
> > > Hi Adrien,
> > >
> > > >
> > > > An assertion failure occurs in __rte_mbuf_raw_free() (called by a few PMDs)
> > > > when compiling DPDK with CONFIG_RTE_LOG_LEVEL=RTE_LOG_DEBUG and starting
> > > > applications with a log level high enough to trigger it.
> > > >
> > > > While rte_mbuf_raw_alloc() sets refcount to 1, __rte_mbuf_raw_free()
> > > > expects it to be 0.
> > > >Considering users are not expected to reset the
> > > > reference count to satisfy assert() and that raw functions are designed on
> > > > purpose without safety belts, remove these checks.
> > >
> > > Yes, it refcnt supposed to be set to 0 by __rte_pktmbuf_prefree_seg().
> > > Wright now, it is a user responsibility to make sure refcnt==0 before pushing
> > > mbuf back to the pool.
> > > Not sure why do you consider that wrong?
> > 
> > I do not consider this wrong and I'm all for using assert() to catch
> > programming errors, however in this specific case, I think they are
> > inconsistent and misleading.
> 
> Honestly, I don't understand why.
> Right now the rule of thumb is - when mbuf is in the pool, it's refcnt should be equal zero.
> Yes, as you pointed below - that rule probably can be changed to: 
> when mbuf is in the pool, it's refcnt should equal one, and that would probably allow us
> to speedup things a bit, but I suppose that's the matter of another aptch/discussion.

Agreed.

> > > If the user calls __rte_mbuf_raw_free() manualy it is his responsibility to make
> > > sure mbuf's refcn==0.
> > 
> > Sure, however what harm does it cause (besides assert() to fail), since the
> > allocation function sets refcount to 1?
> > 
> > Why having the allocation function set the refcount if we are sure it is
> > already 0 (assert() proves it). Removing rte_mbuf_refcnt_set(m, 1) can
> > surely improve performance.
> 
> That's' just an assert() enabled when MBUF_DEBUG  is on.
> Its sole purpose is to help troubleshoot the bugs and help to catch situations
> when someone silently updates mbufs supposed to be free.  

I perfectly understand and I cannot agree more with this explanation,
however the fact these functions are not symmetrical remains an issue that
needs to be addressed somehow in my opinion.

> > > BTW, why are you doing it?
> > > The comment clearly states that the function is for internal use:
> > > /**
> > >  * @internal Put mbuf back into its original mempool.
> > >  * The use of that function is reserved for RTE internal needs.
> > >  * Please use rte_pktmbuf_free().
> > >  *
> > >  * @param m
> > >  *   The mbuf to be freed.
> > >  */
> > > static inline void __attribute__((always_inline))
> > > __rte_mbuf_raw_free(struct rte_mbuf *m)
> > 
> > Several PMDs are using it anyway (won't name names, but I know one of them
> > quite well).
> 
> Then it probably is a bug in these PMDs that need to be fixed.
> 
> > I chose to modify this code instead of its users for the
> > following reasons:
> > 
> > - Considering their names, these functions should be opposites and able to
> >   work together like malloc()/free().
> 
> These are internal functions.
> Comments in mbuf clearly state that library users shouldn't call them directly.
> They are written to fit internal librte_mbuf needs, and no-one ever promised
> malloc/free() compatibility here. 

So it cannot be provided for the sake of not providing it or is there a good
reason?

What I meant is that since PMDs already made the mistake of using these
functions to benefit from the improved performance, DPDK being all about
performance and stuff, let them use it as intended. Perhaps we should drop
those "__" like for rte_mbuf_raw_alloc().

> > 
> > - PMDs are relying on these functions for performance reasons, we can assume
> >   they took the extra care necessary to make sure it would work properly.
> 
> That just doesn't seem correct to me.
> The proper way to do free fo mbuf segment is:
> 
> static inline void __attribute__((always_inline))
> rte_pktmbuf_free_seg(struct rte_mbuf *m)
> {
>         if (likely(NULL != (m = __rte_pktmbuf_prefree_seg(m)))) {
>                 m->next = NULL;
>                 __rte_mbuf_raw_free(m);
>         }
> }
> 
> If by some reason you choose not to use this function, then it is your
> responsibility to perform similar actions on your own before pushing mbuf into the pool.
> That's what some TX functions for some Intel NICs do to improve performance:
> they call _prefree_seg() manually and try to put mbufs into the pool in groups.

Not anymore it seems, but in the current code base both ena and mpipe PMDs
(soon mlx5 as well) seem to get this wrong.

> > - Preventing it would make these PMDs slower and is not acceptable either.
> 
> I can hardly imagine that __rte_pktmbuf_prefree_seg() impact would be that severe...
> But ok, probably  you do have some very specific case, but then why you PMD just doesn't call:
> rte_mempool_put(m->pool, m); 
> directly?

To survive the upcoming transition to mbufs backed by libc malloc() without
having to fix them? Joke aside, I guess the reason is to use functions with
"mbuf" in their names when dealing with mbufs.

> Why instead you choose to change common functions and compromise
> librte_mbuf debug ability?

No, I'm fine with keeping the debug ability, however I did not find a way to
both keep it and fix the consistency issue without breaking something
(performance or refcount assumptions I'm not familiar with elsewhere).

> > What remains is the consistency issue, I think these statements were only
> > added to catch multiple frees,
> 
> Yes these asserts() here to help catch bugs,
> and I think it is a good thing to have them here. 
> 
> > and those should be caught at a higher
> > level, where other consistency checks are also performed.
> 
> Like where?

Possibly rte_pktmbuf_free_seg().

> > > > Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
> > > > ---
> > > >  lib/librte_mbuf/rte_mbuf.h | 2 --
> > > >  1 file changed, 2 deletions(-)
> > > >
> > > > diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> > > > index 11fa06d..7070bb8 100644
> > > > --- a/lib/librte_mbuf/rte_mbuf.h
> > > > +++ b/lib/librte_mbuf/rte_mbuf.h
> > > > @@ -1108,7 +1108,6 @@ static inline struct rte_mbuf *rte_mbuf_raw_alloc(struct rte_mempool *mp)
> > > >  	if (rte_mempool_get(mp, &mb) < 0)
> > > >  		return NULL;
> > > >  	m = (struct rte_mbuf *)mb;
> > > > -	RTE_ASSERT(rte_mbuf_refcnt_read(m) == 0);
> > > >  	rte_mbuf_refcnt_set(m, 1);
> > > >  	__rte_mbuf_sanity_check(m, 0);
> > > >
> > > > @@ -1133,7 +1132,6 @@ __rte_mbuf_raw_alloc(struct rte_mempool *mp)
> > > >  static inline void __attribute__((always_inline))
> > > >  __rte_mbuf_raw_free(struct rte_mbuf *m)
> > > >  {
> > > > -	RTE_ASSERT(rte_mbuf_refcnt_read(m) == 0);
> > > >  	rte_mempool_put(m->pool, m);
> > > >  }
> > > >
> > > > --
> > > > 2.1.4
> > >
> > 
> > --
> > Adrien Mazarguil
> > 6WIND

Olivier Matz June 8, 2016, 2:11 p.m. UTC | #5

Hi Adrien, Konstantin,

I'm jumping in this (interesting) discussion. I already talked a
bit with Adrien in point to point, and I think its patch is valid.
Please see some comments below.


On 06/08/2016 03:57 PM, Adrien Mazarguil wrote:
> On Wed, Jun 08, 2016 at 01:09:18PM +0000, Ananyev, Konstantin wrote:
>>>
>>> Hi Konstantin,
>>>
>>> On Wed, Jun 08, 2016 at 10:34:17AM +0000, Ananyev, Konstantin wrote:
>>>> Hi Adrien,
>>>>
>>>>>
>>>>> An assertion failure occurs in __rte_mbuf_raw_free() (called by a few PMDs)
>>>>> when compiling DPDK with CONFIG_RTE_LOG_LEVEL=RTE_LOG_DEBUG and starting
>>>>> applications with a log level high enough to trigger it.
>>>>>
>>>>> While rte_mbuf_raw_alloc() sets refcount to 1, __rte_mbuf_raw_free()
>>>>> expects it to be 0.
>>>>> Considering users are not expected to reset the
>>>>> reference count to satisfy assert() and that raw functions are designed on
>>>>> purpose without safety belts, remove these checks.
>>>>
>>>> Yes, it refcnt supposed to be set to 0 by __rte_pktmbuf_prefree_seg().
>>>> Wright now, it is a user responsibility to make sure refcnt==0 before pushing
>>>> mbuf back to the pool.
>>>> Not sure why do you consider that wrong?
>>>
>>> I do not consider this wrong and I'm all for using assert() to catch
>>> programming errors, however in this specific case, I think they are
>>> inconsistent and misleading.
>>
>> Honestly, I don't understand why.
>> Right now the rule of thumb is - when mbuf is in the pool, it's refcnt should be equal zero.

What is the purpose of this? Is there some code relying on this?

>> Yes, as you pointed below - that rule probably can be changed to: 
>> when mbuf is in the pool, it's refcnt should equal one, and that would probably allow us
>> to speedup things a bit, but I suppose that's the matter of another aptch/discussion.
> 
> Agreed.

I agree this is somehow another discussion, but let's dive into :)

But since [1], it is allowed to call rte_mbuf_raw_alloc() in PMDs (all
PMDs were calling an internal function before). We could argue that
rte_mbuf_raw_free() should also be made public for PMDs.

As you said below, no-one promised that the free() reverts the malloc(),
but given the function names, one can legitimately expect that the
following code is valid:

m = __rte_mbuf_raw_alloc();
/* do nothing here */
__rte_mbuf_raw_free(m);

If no code relies on having the refcnt set to 0 when a mbuf is in
the pool, I suggest to relax this constraint as Adrien proposed.

Then, my opinion is that the refcount should be set to 1 in
rte_pktmbuf_reset(). And rte_pktmbuf_free() should not be allowed on
an uninitialized mbuf (yes, this would require some changes in PMDs).
This would open the door for bulk allocation/free in the mbuf api.

This could be done in several steps:

1/ remove these assert(), add introduce a public rte_mbuf_raw_free()
2/ announce that rte_pktmbuf_free() won't work on uninitialized mbufs
in a future version, and ensure that all PMD are inline with this
requirement
3/ later, move refcount initialization in rte_pktmbuf_reset()

What do you think?

Another option is to remove the rte_mbuf_raw_alloc()/rte_mbuf_raw_free()
and use mempool calls. But having a mbuf wrapper does not seem a bad
thing to me.

By the way, __rte_pktmbuf_prefree_seg() is also an internal function.
I think we should try to update the mbuf API so that the PMDs do not
need to call these internal functions.


[1]
http://dpdk.org/browse/dpdk/commit/?id=fbfd99551ca370266f4bfff58ce441cf5cb1203a


Regards,
Olivier


> 
>>>> If the user calls __rte_mbuf_raw_free() manualy it is his responsibility to make
>>>> sure mbuf's refcn==0.
>>>
>>> Sure, however what harm does it cause (besides assert() to fail), since the
>>> allocation function sets refcount to 1?
>>>
>>> Why having the allocation function set the refcount if we are sure it is
>>> already 0 (assert() proves it). Removing rte_mbuf_refcnt_set(m, 1) can
>>> surely improve performance.
>>
>> That's' just an assert() enabled when MBUF_DEBUG  is on.
>> Its sole purpose is to help troubleshoot the bugs and help to catch situations
>> when someone silently updates mbufs supposed to be free.  
> 
> I perfectly understand and I cannot agree more with this explanation,
> however the fact these functions are not symmetrical remains an issue that
> needs to be addressed somehow in my opinion.
> 
>>>> BTW, why are you doing it?
>>>> The comment clearly states that the function is for internal use:
>>>> /**
>>>>  * @internal Put mbuf back into its original mempool.
>>>>  * The use of that function is reserved for RTE internal needs.
>>>>  * Please use rte_pktmbuf_free().
>>>>  *
>>>>  * @param m
>>>>  *   The mbuf to be freed.
>>>>  */
>>>> static inline void __attribute__((always_inline))
>>>> __rte_mbuf_raw_free(struct rte_mbuf *m)
>>>
>>> Several PMDs are using it anyway (won't name names, but I know one of them
>>> quite well).
>>
>> Then it probably is a bug in these PMDs that need to be fixed.
>>
>>> I chose to modify this code instead of its users for the
>>> following reasons:
>>>
>>> - Considering their names, these functions should be opposites and able to
>>>   work together like malloc()/free().
>>
>> These are internal functions.
>> Comments in mbuf clearly state that library users shouldn't call them directly.
>> They are written to fit internal librte_mbuf needs, and no-one ever promised
>> malloc/free() compatibility here. 
> 
> So it cannot be provided for the sake of not providing it or is there a good
> reason?
> 
> What I meant is that since PMDs already made the mistake of using these
> functions to benefit from the improved performance, DPDK being all about
> performance and stuff, let them use it as intended. Perhaps we should drop
> those "__" like for rte_mbuf_raw_alloc().
> 
>>>
>>> - PMDs are relying on these functions for performance reasons, we can assume
>>>   they took the extra care necessary to make sure it would work properly.
>>
>> That just doesn't seem correct to me.
>> The proper way to do free fo mbuf segment is:
>>
>> static inline void __attribute__((always_inline))
>> rte_pktmbuf_free_seg(struct rte_mbuf *m)
>> {
>>         if (likely(NULL != (m = __rte_pktmbuf_prefree_seg(m)))) {
>>                 m->next = NULL;
>>                 __rte_mbuf_raw_free(m);
>>         }
>> }
>>
>> If by some reason you choose not to use this function, then it is your
>> responsibility to perform similar actions on your own before pushing mbuf into the pool.
>> That's what some TX functions for some Intel NICs do to improve performance:
>> they call _prefree_seg() manually and try to put mbufs into the pool in groups.
> 
> Not anymore it seems, but in the current code base both ena and mpipe PMDs
> (soon mlx5 as well) seem to get this wrong.
> 
>>> - Preventing it would make these PMDs slower and is not acceptable either.
>>
>> I can hardly imagine that __rte_pktmbuf_prefree_seg() impact would be that severe...
>> But ok, probably  you do have some very specific case, but then why you PMD just doesn't call:
>> rte_mempool_put(m->pool, m); 
>> directly?
> 
> To survive the upcoming transition to mbufs backed by libc malloc() without
> having to fix them? Joke aside, I guess the reason is to use functions with
> "mbuf" in their names when dealing with mbufs.
> 
>> Why instead you choose to change common functions and compromise
>> librte_mbuf debug ability?
> 
> No, I'm fine with keeping the debug ability, however I did not find a way to
> both keep it and fix the consistency issue without breaking something
> (performance or refcount assumptions I'm not familiar with elsewhere).
> 
>>> What remains is the consistency issue, I think these statements were only
>>> added to catch multiple frees,
>>
>> Yes these asserts() here to help catch bugs,
>> and I think it is a good thing to have them here. 
>>
>>> and those should be caught at a higher
>>> level, where other consistency checks are also performed.
>>
>> Like where?
> 
> Possibly rte_pktmbuf_free_seg().
> 
>>>>> Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
>>>>> ---
>>>>>  lib/librte_mbuf/rte_mbuf.h | 2 --
>>>>>  1 file changed, 2 deletions(-)
>>>>>
>>>>> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
>>>>> index 11fa06d..7070bb8 100644
>>>>> --- a/lib/librte_mbuf/rte_mbuf.h
>>>>> +++ b/lib/librte_mbuf/rte_mbuf.h
>>>>> @@ -1108,7 +1108,6 @@ static inline struct rte_mbuf *rte_mbuf_raw_alloc(struct rte_mempool *mp)
>>>>>  	if (rte_mempool_get(mp, &mb) < 0)
>>>>>  		return NULL;
>>>>>  	m = (struct rte_mbuf *)mb;
>>>>> -	RTE_ASSERT(rte_mbuf_refcnt_read(m) == 0);
>>>>>  	rte_mbuf_refcnt_set(m, 1);
>>>>>  	__rte_mbuf_sanity_check(m, 0);
>>>>>
>>>>> @@ -1133,7 +1132,6 @@ __rte_mbuf_raw_alloc(struct rte_mempool *mp)
>>>>>  static inline void __attribute__((always_inline))
>>>>>  __rte_mbuf_raw_free(struct rte_mbuf *m)
>>>>>  {
>>>>> -	RTE_ASSERT(rte_mbuf_refcnt_read(m) == 0);
>>>>>  	rte_mempool_put(m->pool, m);
>>>>>  }
>>>>>
>>>>> --
>>>>> 2.1.4
>>>>
>>>
>>> --
>>> Adrien Mazarguil
>>> 6WIND
>

Ananyev, Konstantin June 8, 2016, 4:07 p.m. UTC | #6

Hi Olivier,

> 
> Hi Adrien, Konstantin,
> 
> I'm jumping in this (interesting) discussion. I already talked a
> bit with Adrien in point to point, and I think its patch is valid.
> Please see some comments below.
> 
> 
> On 06/08/2016 03:57 PM, Adrien Mazarguil wrote:
> > On Wed, Jun 08, 2016 at 01:09:18PM +0000, Ananyev, Konstantin wrote:
> >>>
> >>> Hi Konstantin,
> >>>
> >>> On Wed, Jun 08, 2016 at 10:34:17AM +0000, Ananyev, Konstantin wrote:
> >>>> Hi Adrien,
> >>>>
> >>>>>
> >>>>> An assertion failure occurs in __rte_mbuf_raw_free() (called by a few PMDs)
> >>>>> when compiling DPDK with CONFIG_RTE_LOG_LEVEL=RTE_LOG_DEBUG and starting
> >>>>> applications with a log level high enough to trigger it.
> >>>>>
> >>>>> While rte_mbuf_raw_alloc() sets refcount to 1, __rte_mbuf_raw_free()
> >>>>> expects it to be 0.
> >>>>> Considering users are not expected to reset the
> >>>>> reference count to satisfy assert() and that raw functions are designed on
> >>>>> purpose without safety belts, remove these checks.
> >>>>
> >>>> Yes, it refcnt supposed to be set to 0 by __rte_pktmbuf_prefree_seg().
> >>>> Wright now, it is a user responsibility to make sure refcnt==0 before pushing
> >>>> mbuf back to the pool.
> >>>> Not sure why do you consider that wrong?
> >>>
> >>> I do not consider this wrong and I'm all for using assert() to catch
> >>> programming errors, however in this specific case, I think they are
> >>> inconsistent and misleading.
> >>
> >> Honestly, I don't understand why.
> >> Right now the rule of thumb is - when mbuf is in the pool, it's refcnt should be equal zero.
> 
> What is the purpose of this? Is there some code relying on this?

The whole current implementation of mbuf_free code path relies on that.
Straight here: 
if (likely(NULL != (m = __rte_pktmbuf_prefree_seg(m)))) {
                m->next = NULL;
                __rte_mbuf_raw_free(m);
        }

If we'll exclude indirect mbuf logic, all it does:
if (rte_mbuf_refcnt_update(m, -1) == 0) {
	m->next = NULL;
	__rte_mbuf_raw_free(m);
}

I.E.:
decrement mbuf->refcnt.
If new value of refcnt is zero, then put it back into the pool.

So having ASERT(mbuf->refcnt==0) inside
__rte_mbuf_raw_free()/__rte_mbuf_raw_alloc()
looks absolutely valid to me.
I *has* to be zero at that point with current implementation,
And if it is not then we probably have (or will have) a silent memory corruption.

> 
> >> Yes, as you pointed below - that rule probably can be changed to:
> >> when mbuf is in the pool, it's refcnt should equal one, and that would probably allow us
> >> to speedup things a bit, but I suppose that's the matter of another aptch/discussion.
> >
> > Agreed.
> 
> I agree this is somehow another discussion, but let's dive into :)

OK :)

> 
> But since [1], it is allowed to call rte_mbuf_raw_alloc() in PMDs (all
> PMDs were calling an internal function before).

Yes, raw_alloc is public, NP with that.

> We could argue that
> rte_mbuf_raw_free() should also be made public for PMDs.

We could, but right now it is not.
Again, as I said, user could use it on his own but he obviously has to
obey the rules and do manually what __rte_pktmbuf_prefree_seg() does.
At least:

rte_mbuf_refcnt_set(m, 0);
__rte_mbuf_raw_free(m);

> 
> As you said below, no-one promised that the free() reverts the malloc(),
> but given the function names, one can legitimately expect that the
> following code is valid:
> 
> m = __rte_mbuf_raw_alloc();
> /* do nothing here */
> __rte_mbuf_raw_free(m);

Surely people could (and would) expect various things...
But the reality right now is: __rte_mbuf_raw_free() is an internal
function and not counterpart of __rte_mbuf_raw_alloc().
If the people don't bother to read API docs or/and the actual code,
I can't see how we can help them :)

> 
> If no code relies on having the refcnt set to 0 when a mbuf is in
> the pool, I suggest to relax this constraint as Adrien proposed.

Why not just rename it to __rte_mbuf_raw_free_dont_use_after_raw_alloc()?
To avoid any further confusion :)
Seriously speaking I would prefer to leave it as it is.
If you feel we have to introduce a counterpart of  rte_mbuf_raw_alloc(),
we can make a new public one:

rte_mbuf_raw_free(stuct rte_mbuf *m)
{
      if (rte_mbuf_refcnt_update(m, -1) == 0)
                __rte_mbuf_raw_free(m);
}

> 
> Then, my opinion is that the refcount should be set to 1 in
> rte_pktmbuf_reset().

I don't think we need to update rte_pktmbuf_reset(),
it doesn't touch refcnt at all and probably better to keep it that way.
To achieve what you suggesting, we probably need to:
1) update  _rte_pktmbuf_prefree_seg and rte_pktmbuf_detach() to
set refcnt back to 1, i.e:

static inline struct rte_mbuf* __attribute__((always_inline))
__rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
{
        __rte_mbuf_sanity_check(m, 0);

        if (likely(rte_mbuf_refcnt_update(m, -1) == 0)) {
                /* if this is an indirect mbuf, it is detached. */
                if (RTE_MBUF_INDIRECT(m))
                        rte_pktmbuf_detach(m);
+	rte_mbuf_refcnt_set(m, 1);
                return m;
        }
        return NULL;
}

2) either:
   a) update mbuf constructor function, so it sets refcnt=1.
        I suppose that is easy for rte_pktmbuf_init(), but it means that all custom
        constructors should do the same.
        Which means possible changes in existing user code and all ABI change related hassle.
  b) keep rte_mbuf_raw_alloc() setting mbuf->refcnt=1.
      But then I don't see how will get any performance gain here.  

So not sure is it really worth it.

> And rte_pktmbuf_free() should not be allowed on
> an uninitialized mbuf (yes, this would require some changes in PMDs).

Not sure I understand you here...
free() wouldn not be allowed on mbuf whose recnf==0?
But it is not allowed right now anyway.

> This would open the door for bulk allocation/free in the mbuf api.

Hmm and what stops us having one right now?
As I know we do have rte_pktmbuf_alloc_bulk(), I don't see
why we can't have rte_pktmbuf_free_bulk() right now.

> 
> This could be done in several steps:
> 
> 1/ remove these assert(), add introduce a public rte_mbuf_raw_free()
> 2/ announce that rte_pktmbuf_free() won't work on uninitialized mbufs
> in a future version, and ensure that all PMD are inline with this
> requirement
> 3/ later, move refcount initialization in rte_pktmbuf_reset()
> 
> What do you think?

I still think that assert() is on a right place :)
*If* we'll change mbuf free code in a way that mbufs inside the pool
should have refcnt==1, then I think we'll change it to:
RTE_ASSERT(rte_mbuf_refcnt_read(m) == 1); 
But as I stated above, that change might cause some backward compatibility hassle: 2.a)
Or might not give us any performance gain: 2.b).

> 
> Another option is to remove the rte_mbuf_raw_alloc()/rte_mbuf_raw_free()
> and use mempool calls.

Don't see why we have to remove them...
Basically we have a bug in PMD, but instead of fixing it,
you guys suggest to change mbuf code.
Sounds a bit strange to me.
Why not just make for these PMDs to call rte_mempool_put() directly, if you are sure it is safe here?

> But having a mbuf wrapper does not seem a bad
> thing to me.

We can add some extra wrapper then, something like:
#define __RTE_MBUF_PUT_FREED(m)	(rte_mempool_put((m)->pool, m))
?

Konstantin


> 
> By the way, __rte_pktmbuf_prefree_seg() is also an internal function.
> I think we should try to update the mbuf API so that the PMDs do not
> need to call these internal functions.
> 
> 
> [1]
> http://dpdk.org/browse/dpdk/commit/?id=fbfd99551ca370266f4bfff58ce441cf5cb1203a
> 
> 
> Regards,
> Olivier
> 
> 
> >
> >>>> If the user calls __rte_mbuf_raw_free() manualy it is his responsibility to make
> >>>> sure mbuf's refcn==0.
> >>>
> >>> Sure, however what harm does it cause (besides assert() to fail), since the
> >>> allocation function sets refcount to 1?
> >>>
> >>> Why having the allocation function set the refcount if we are sure it is
> >>> already 0 (assert() proves it). Removing rte_mbuf_refcnt_set(m, 1) can
> >>> surely improve performance.
> >>
> >> That's' just an assert() enabled when MBUF_DEBUG  is on.
> >> Its sole purpose is to help troubleshoot the bugs and help to catch situations
> >> when someone silently updates mbufs supposed to be free.
> >
> > I perfectly understand and I cannot agree more with this explanation,
> > however the fact these functions are not symmetrical remains an issue that
> > needs to be addressed somehow in my opinion.
> >
> >>>> BTW, why are you doing it?
> >>>> The comment clearly states that the function is for internal use:
> >>>> /**
> >>>>  * @internal Put mbuf back into its original mempool.
> >>>>  * The use of that function is reserved for RTE internal needs.
> >>>>  * Please use rte_pktmbuf_free().
> >>>>  *
> >>>>  * @param m
> >>>>  *   The mbuf to be freed.
> >>>>  */
> >>>> static inline void __attribute__((always_inline))
> >>>> __rte_mbuf_raw_free(struct rte_mbuf *m)
> >>>
> >>> Several PMDs are using it anyway (won't name names, but I know one of them
> >>> quite well).
> >>
> >> Then it probably is a bug in these PMDs that need to be fixed.
> >>
> >>> I chose to modify this code instead of its users for the
> >>> following reasons:
> >>>
> >>> - Considering their names, these functions should be opposites and able to
> >>>   work together like malloc()/free().
> >>
> >> These are internal functions.
> >> Comments in mbuf clearly state that library users shouldn't call them directly.
> >> They are written to fit internal librte_mbuf needs, and no-one ever promised
> >> malloc/free() compatibility here.
> >
> > So it cannot be provided for the sake of not providing it or is there a good
> > reason?
> >
> > What I meant is that since PMDs already made the mistake of using these
> > functions to benefit from the improved performance, DPDK being all about
> > performance and stuff, let them use it as intended. Perhaps we should drop
> > those "__" like for rte_mbuf_raw_alloc().
> >
> >>>
> >>> - PMDs are relying on these functions for performance reasons, we can assume
> >>>   they took the extra care necessary to make sure it would work properly.
> >>
> >> That just doesn't seem correct to me.
> >> The proper way to do free fo mbuf segment is:
> >>
> >> static inline void __attribute__((always_inline))
> >> rte_pktmbuf_free_seg(struct rte_mbuf *m)
> >> {
> >>         if (likely(NULL != (m = __rte_pktmbuf_prefree_seg(m)))) {
> >>                 m->next = NULL;
> >>                 __rte_mbuf_raw_free(m);
> >>         }
> >> }
> >>
> >> If by some reason you choose not to use this function, then it is your
> >> responsibility to perform similar actions on your own before pushing mbuf into the pool.
> >> That's what some TX functions for some Intel NICs do to improve performance:
> >> they call _prefree_seg() manually and try to put mbufs into the pool in groups.
> >
> > Not anymore it seems, but in the current code base both ena and mpipe PMDs
> > (soon mlx5 as well) seem to get this wrong.
> >
> >>> - Preventing it would make these PMDs slower and is not acceptable either.
> >>
> >> I can hardly imagine that __rte_pktmbuf_prefree_seg() impact would be that severe...
> >> But ok, probably  you do have some very specific case, but then why you PMD just doesn't call:
> >> rte_mempool_put(m->pool, m);
> >> directly?
> >
> > To survive the upcoming transition to mbufs backed by libc malloc() without
> > having to fix them? Joke aside, I guess the reason is to use functions with
> > "mbuf" in their names when dealing with mbufs.
> >
> >> Why instead you choose to change common functions and compromise
> >> librte_mbuf debug ability?
> >
> > No, I'm fine with keeping the debug ability, however I did not find a way to
> > both keep it and fix the consistency issue without breaking something
> > (performance or refcount assumptions I'm not familiar with elsewhere).
> >
> >>> What remains is the consistency issue, I think these statements were only
> >>> added to catch multiple frees,
> >>
> >> Yes these asserts() here to help catch bugs,
> >> and I think it is a good thing to have them here.
> >>
> >>> and those should be caught at a higher
> >>> level, where other consistency checks are also performed.
> >>
> >> Like where?
> >
> > Possibly rte_pktmbuf_free_seg().
> >
> >>>>> Signed-off-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
> >>>>> ---
> >>>>>  lib/librte_mbuf/rte_mbuf.h | 2 --
> >>>>>  1 file changed, 2 deletions(-)
> >>>>>
> >>>>> diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
> >>>>> index 11fa06d..7070bb8 100644
> >>>>> --- a/lib/librte_mbuf/rte_mbuf.h
> >>>>> +++ b/lib/librte_mbuf/rte_mbuf.h
> >>>>> @@ -1108,7 +1108,6 @@ static inline struct rte_mbuf *rte_mbuf_raw_alloc(struct rte_mempool *mp)
> >>>>>  	if (rte_mempool_get(mp, &mb) < 0)
> >>>>>  		return NULL;
> >>>>>  	m = (struct rte_mbuf *)mb;
> >>>>> -	RTE_ASSERT(rte_mbuf_refcnt_read(m) == 0);
> >>>>>  	rte_mbuf_refcnt_set(m, 1);
> >>>>>  	__rte_mbuf_sanity_check(m, 0);
> >>>>>
> >>>>> @@ -1133,7 +1132,6 @@ __rte_mbuf_raw_alloc(struct rte_mempool *mp)
> >>>>>  static inline void __attribute__((always_inline))
> >>>>>  __rte_mbuf_raw_free(struct rte_mbuf *m)
> >>>>>  {
> >>>>> -	RTE_ASSERT(rte_mbuf_refcnt_read(m) == 0);
> >>>>>  	rte_mempool_put(m->pool, m);
> >>>>>  }
> >>>>>
> >>>>> --
> >>>>> 2.1.4
> >>>>
> >>>
> >>> --
> >>> Adrien Mazarguil
> >>> 6WIND
> >

Olivier Matz June 9, 2016, 7:46 a.m. UTC | #7

Hi Konstantin,

>>>>>> Yes, it refcnt supposed to be set to 0 by __rte_pktmbuf_prefree_seg().
>>>>>> Wright now, it is a user responsibility to make sure refcnt==0 before pushing
>>>>>> mbuf back to the pool.
>>>>>> Not sure why do you consider that wrong?
>>>>>
>>>>> I do not consider this wrong and I'm all for using assert() to catch
>>>>> programming errors, however in this specific case, I think they are
>>>>> inconsistent and misleading.
>>>>
>>>> Honestly, I don't understand why.
>>>> Right now the rule of thumb is - when mbuf is in the pool, it's refcnt should be equal zero.
>>
>> What is the purpose of this? Is there some code relying on this?
> 
> The whole current implementation of mbuf_free code path relies on that.
> Straight here: 
> if (likely(NULL != (m = __rte_pktmbuf_prefree_seg(m)))) {
>                 m->next = NULL;
>                 __rte_mbuf_raw_free(m);
>         }
> 
> If we'll exclude indirect mbuf logic, all it does:
> if (rte_mbuf_refcnt_update(m, -1) == 0) {
> 	m->next = NULL;
> 	__rte_mbuf_raw_free(m);
> }
> 
> I.E.:
> decrement mbuf->refcnt.
> If new value of refcnt is zero, then put it back into the pool.
> 
> So having ASERT(mbuf->refcnt==0) inside
> __rte_mbuf_raw_free()/__rte_mbuf_raw_alloc()
> looks absolutely valid to me.
> I *has* to be zero at that point with current implementation,
> And if it is not then we probably have (or will have) a silent memory corruption.

This explains how the refcount is used, and why it is set
to zero before returning the mbuf to the pool with the mbuf
free functions.

It does not explain which code relies on the refcnt beeing 0
while the mbuf is in the pool.


>> But since [1], it is allowed to call rte_mbuf_raw_alloc() in PMDs (all
>> PMDs were calling an internal function before).
> 
> Yes, raw_alloc is public, NP with that.
> 
>> We could argue that
>> rte_mbuf_raw_free() should also be made public for PMDs.
> 
> We could, but right now it is not.
> Again, as I said, user could use it on his own but he obviously has to
> obey the rules and do manually what __rte_pktmbuf_prefree_seg() does.
> At least:
> 
> rte_mbuf_refcnt_set(m, 0);
> __rte_mbuf_raw_free(m);
> 
>>
>> As you said below, no-one promised that the free() reverts the malloc(),
>> but given the function names, one can legitimately expect that the
>> following code is valid:
>>
>> m = __rte_mbuf_raw_alloc();
>> /* do nothing here */
>> __rte_mbuf_raw_free(m);
> 
> Surely people could (and would) expect various things...
> But the reality right now is: __rte_mbuf_raw_free() is an internal
> function and not counterpart of __rte_mbuf_raw_alloc().
> If the people don't bother to read API docs or/and the actual code,
> I can't see how we can help them :)

Yes, of course, people should read the doc.
This does not prevent to have a nice API that behaves in a
natural way :)

By the way, the fact that today the mbuf refcnt should be 0 while
in a pool is not in the api doc, but in the code.

>> If no code relies on having the refcnt set to 0 when a mbuf is in
>> the pool, I suggest to relax this constraint as Adrien proposed.
> 
> Why not just rename it to __rte_mbuf_raw_free_dont_use_after_raw_alloc()?
> To avoid any further confusion :)
> Seriously speaking I would prefer to leave it as it is.
> If you feel we have to introduce a counterpart of  rte_mbuf_raw_alloc(),
> we can make a new public one:
> 
> rte_mbuf_raw_free(stuct rte_mbuf *m)
> {
>       if (rte_mbuf_refcnt_update(m, -1) == 0)
>                 __rte_mbuf_raw_free(m);
> }

This is an option, but I think it's not efficient to touch
the mbuf structure when allocating/freeing. See below.

>> Then, my opinion is that the refcount should be set to 1 in
>> rte_pktmbuf_reset().
> 
> I don't think we need to update rte_pktmbuf_reset(),
> it doesn't touch refcnt at all and probably better to keep it that way.

Why would it be better?
All mbuf struct initializations are done in that function, why would
it be different for the refcnt?

> To achieve what you suggesting, we probably need to:
> 1) update  _rte_pktmbuf_prefree_seg and rte_pktmbuf_detach() to
> set refcnt back to 1, i.e:
> 
> static inline struct rte_mbuf* __attribute__((always_inline))
> __rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> {
>         __rte_mbuf_sanity_check(m, 0);
> 
>         if (likely(rte_mbuf_refcnt_update(m, -1) == 0)) {
>                 /* if this is an indirect mbuf, it is detached. */
>                 if (RTE_MBUF_INDIRECT(m))
>                         rte_pktmbuf_detach(m);
> +	rte_mbuf_refcnt_set(m, 1);
>                 return m;
>         }
>         return NULL;
> }
> 
> 2) either:
>    a) update mbuf constructor function, so it sets refcnt=1.
>         I suppose that is easy for rte_pktmbuf_init(), but it means that all custom
>         constructors should do the same.
>         Which means possible changes in existing user code and all ABI change related hassle.
>   b) keep rte_mbuf_raw_alloc() setting mbuf->refcnt=1.
>       But then I don't see how will get any performance gain here.  
> 
> So not sure is it really worth it.
> 
>> And rte_pktmbuf_free() should not be allowed on
>> an uninitialized mbuf (yes, this would require some changes in PMDs).
> 
> Not sure I understand you here...
> free() wouldn not be allowed on mbuf whose recnf==0?
> But it is not allowed right now anyway.

Today:

  /* allowed */
  m = rte_pktmbuf_alloc();
  rte_pktmbuf_free(m);

  /* not allowed */
  m = rte_mbuf_raw_alloc();
  __rte_mbuf_raw_free(m);

  /* we should do instead (strange): */
  m = rte_mbuf_raw_alloc();
  rte_pktmbuf_free(m);

What I suggest to have:

  /* allowed, no change */
  m = rte_pktmbuf_alloc();
  rte_pktmbuf_free(m);

  /* allowed, these functions would be symetrical */
  m = rte_mbuf_raw_alloc();
  rte_mbuf_raw_free(m);

  /* not allowed, m->refcnt is uninitialized */
  m = rte_mbuf_raw_alloc();
  rte_pktmbuf_free(m);



> 
>> This would open the door for bulk allocation/free in the mbuf api.
> 
> Hmm and what stops us having one right now?
> As I know we do have rte_pktmbuf_alloc_bulk(), I don't see
> why we can't have rte_pktmbuf_free_bulk() right now.

I think, not touching the refcnt on raw allocation allows a PMD to
do the following:

  rte_mbuf_allow_raw_bulk(pool, &mbufs, n)
  for (i = 0; i < n; i++) {
    prefetch(m[i+1]);
    rte_pktmbuf_reset(); /* including refcnt = 1 */
    do_stuff(m);
  }
  /* if all mbufs are not used, free the remaining */
  rte_mbuf_free_raw_bulk(&mbufs[i], n-i);

In that case, we can prefetch mbufs before using them, and
we can free the unused mbufs without touching the structure.
It looks to be a good advantage.

Yes, we can do that with mempool functions. But I feel having a mbuf
API with type checking is better.

>> This could be done in several steps:
>>
>> 1/ remove these assert(), add introduce a public rte_mbuf_raw_free()
>> 2/ announce that rte_pktmbuf_free() won't work on uninitialized mbufs
>> in a future version, and ensure that all PMD are inline with this
>> requirement
>> 3/ later, move refcount initialization in rte_pktmbuf_reset()
>>
>> What do you think?
> 
> I still think that assert() is on a right place :)
> *If* we'll change mbuf free code in a way that mbufs inside the pool
> should have refcnt==1, then I think we'll change it to:
> RTE_ASSERT(rte_mbuf_refcnt_read(m) == 1); 
> But as I stated above, that change might cause some backward compatibility hassle: 2.a)
> Or might not give us any performance gain: 2.b).

I suggest instead to have no constraint on the value of the refcnt
in the pool.

>> Another option is to remove the rte_mbuf_raw_alloc()/rte_mbuf_raw_free()
>> and use mempool calls.
> 
> Don't see why we have to remove them...
> Basically we have a bug in PMD, but instead of fixing it,
> you guys suggest to change mbuf code.
> Sounds a bit strange to me.
> Why not just make for these PMDs to call rte_mempool_put() directly, if you are sure it is safe here?

Yes, there are some bugs in the PMDs regarding the refcnt value when
compiled with MBUF_DEBUG=y. By the way, looking at the i40e code, we
have:

  i40e_alloc_rx_queue_mbufs()
  {
    for (...) {
       struct rte_mbuf *mbuf = rte_mbuf_raw_alloc(rxq->mp);
       ...
       rte_mbuf_refcnt_set(mbuf, 1);  /* not needed */
     ...
  }

  i40e_tx_free_bufs()
  {
    ...
    if (txq->txq_flags & (uint32_t)ETH_TXQ_FLAGS_NOREFCOUNT) {
      for (...) {
         /* returning a mbuf with refcnt=1 */
         rte_mempool_put(txep->mbuf->pool, txep->mbuf);
   ...

The fact that we can find many bugs related to that in PMDs is a
sign that the API is not understandable enough.

The point is not to fix the bugs in PMDs. I think the point is
to enhance the mbuf API.

Hope I have convinced you ;)

> 
>> But having a mbuf wrapper does not seem a bad
>> thing to me.
> 
> We can add some extra wrapper then, something like:
> #define __RTE_MBUF_PUT_FREED(m)	(rte_mempool_put((m)->pool, m))
> ?

I think a wrapper should do the type checking (struct mbuf).

Thanks for this exchange.
Regards,
Olivier

Ananyev, Konstantin June 9, 2016, 1:21 p.m. UTC | #8

Hi Olivier,

> -----Original Message-----
> From: Olivier Matz [mailto:olivier.matz@6wind.com]
> Sent: Thursday, June 09, 2016 8:47 AM
> To: Ananyev, Konstantin; dev@dpdk.org; Adrien Mazarguil
> Subject: Re: [dpdk-dev] [PATCH] mbuf: remove inconsistent assert statements
> 
> Hi Konstantin,
> 
> >>>>>> Yes, it refcnt supposed to be set to 0 by __rte_pktmbuf_prefree_seg().
> >>>>>> Wright now, it is a user responsibility to make sure refcnt==0 before pushing
> >>>>>> mbuf back to the pool.
> >>>>>> Not sure why do you consider that wrong?
> >>>>>
> >>>>> I do not consider this wrong and I'm all for using assert() to catch
> >>>>> programming errors, however in this specific case, I think they are
> >>>>> inconsistent and misleading.
> >>>>
> >>>> Honestly, I don't understand why.
> >>>> Right now the rule of thumb is - when mbuf is in the pool, it's refcnt should be equal zero.
> >>
> >> What is the purpose of this? Is there some code relying on this?
> >
> > The whole current implementation of mbuf_free code path relies on that.
> > Straight here:
> > if (likely(NULL != (m = __rte_pktmbuf_prefree_seg(m)))) {
> >                 m->next = NULL;
> >                 __rte_mbuf_raw_free(m);
> >         }
> >
> > If we'll exclude indirect mbuf logic, all it does:
> > if (rte_mbuf_refcnt_update(m, -1) == 0) {
> > 	m->next = NULL;
> > 	__rte_mbuf_raw_free(m);
> > }
> >
> > I.E.:
> > decrement mbuf->refcnt.
> > If new value of refcnt is zero, then put it back into the pool.
> >
> > So having ASERT(mbuf->refcnt==0) inside
> > __rte_mbuf_raw_free()/__rte_mbuf_raw_alloc()
> > looks absolutely valid to me.
> > I *has* to be zero at that point with current implementation,
> > And if it is not then we probably have (or will have) a silent memory corruption.
> 
> This explains how the refcount is used, and why it is set
> to zero before returning the mbuf to the pool with the mbuf
> free functions.

From my point, that shows that rte_pktmbuf_free() relies on the value of the refcnt
to make a decision is it ok to put mbuf back to the pool or not.
Right now it puts mbuf to the pool *only* if it's refcnt==0.
As discussed below, we probably can change it to be refcnt==1
(if there really would be noticeable performance gain).
But I think it still should be just one predefined value of refcnt (0 or 1).
In theory it is possible to allow both (0 and 1),
but that will make it hard to debug any alloc/free issues,
plus would neglect any possible performance gain -
as in that case raw_alloc (or it's analog) should still do
mbuf->refcnt=1; 

> 
> It does not explain which code relies on the refcnt beeing 0
> while the mbuf is in the pool.
> 
> 
> >> But since [1], it is allowed to call rte_mbuf_raw_alloc() in PMDs (all
> >> PMDs were calling an internal function before).
> >
> > Yes, raw_alloc is public, NP with that.
> >
> >> We could argue that
> >> rte_mbuf_raw_free() should also be made public for PMDs.
> >
> > We could, but right now it is not.
> > Again, as I said, user could use it on his own but he obviously has to
> > obey the rules and do manually what __rte_pktmbuf_prefree_seg() does.
> > At least:
> >
> > rte_mbuf_refcnt_set(m, 0);
> > __rte_mbuf_raw_free(m);
> >
> >>
> >> As you said below, no-one promised that the free() reverts the malloc(),
> >> but given the function names, one can legitimately expect that the
> >> following code is valid:
> >>
> >> m = __rte_mbuf_raw_alloc();
> >> /* do nothing here */
> >> __rte_mbuf_raw_free(m);
> >
> > Surely people could (and would) expect various things...
> > But the reality right now is: __rte_mbuf_raw_free() is an internal
> > function and not counterpart of __rte_mbuf_raw_alloc().
> > If the people don't bother to read API docs or/and the actual code,
> > I can't see how we can help them :)
> 
> Yes, of course, people should read the doc.
> This does not prevent to have a nice API that behaves in a
> natural way :)
> 
> By the way, the fact that today the mbuf refcnt should be 0 while
> in a pool is not in the api doc, but in the code.

Ok, I admit there is a bug in the docs, let's add this line to the PG and fix it :)

> 
> >> If no code relies on having the refcnt set to 0 when a mbuf is in
> >> the pool, I suggest to relax this constraint as Adrien proposed.
> >
> > Why not just rename it to __rte_mbuf_raw_free_dont_use_after_raw_alloc()?
> > To avoid any further confusion :)
> > Seriously speaking I would prefer to leave it as it is.
> > If you feel we have to introduce a counterpart of  rte_mbuf_raw_alloc(),
> > we can make a new public one:
> >
> > rte_mbuf_raw_free(stuct rte_mbuf *m)
> > {
> >       if (rte_mbuf_refcnt_update(m, -1) == 0)
> >                 __rte_mbuf_raw_free(m);
> > }
> 
> This is an option, but I think it's not efficient to touch
> the mbuf structure when allocating/freeing. See below.

I don't think it is totally avoidable for generic case anyway.

> 
> >> Then, my opinion is that the refcount should be set to 1 in
> >> rte_pktmbuf_reset().
> >
> > I don't think we need to update rte_pktmbuf_reset(),
> > it doesn't touch refcnt at all and probably better to keep it that way.
> 
> Why would it be better?

First because pktmbuf_reset() is not very efficient one, and many PMDs
avoid to use it.
Second I think managing refcnt is more part of alloc/free then init procedure.

> All mbuf struct initializations are done in that function, why would
> it be different for the refcnt?
> 
> > To achieve what you suggesting, we probably need to:
> > 1) update  _rte_pktmbuf_prefree_seg and rte_pktmbuf_detach() to
> > set refcnt back to 1, i.e:
> >
> > static inline struct rte_mbuf* __attribute__((always_inline))
> > __rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
> > {
> >         __rte_mbuf_sanity_check(m, 0);
> >
> >         if (likely(rte_mbuf_refcnt_update(m, -1) == 0)) {
> >                 /* if this is an indirect mbuf, it is detached. */
> >                 if (RTE_MBUF_INDIRECT(m))
> >                         rte_pktmbuf_detach(m);
> > +	rte_mbuf_refcnt_set(m, 1);
> >                 return m;
> >         }
> >         return NULL;
> > }
> >
> > 2) either:
> >    a) update mbuf constructor function, so it sets refcnt=1.
> >         I suppose that is easy for rte_pktmbuf_init(), but it means that all custom
> >         constructors should do the same.
> >         Which means possible changes in existing user code and all ABI change related hassle.
> >   b) keep rte_mbuf_raw_alloc() setting mbuf->refcnt=1.
> >       But then I don't see how will get any performance gain here.
> >
> > So not sure is it really worth it.
> >
> >> And rte_pktmbuf_free() should not be allowed on
> >> an uninitialized mbuf (yes, this would require some changes in PMDs).
> >
> > Not sure I understand you here...
> > free() wouldn not be allowed on mbuf whose recnf==0?
> > But it is not allowed right now anyway.
> 
> Today:
> 
>   /* allowed */
>   m = rte_pktmbuf_alloc();
>   rte_pktmbuf_free(m);
> 
>   /* not allowed */
>   m = rte_mbuf_raw_alloc();
>   __rte_mbuf_raw_free(m);
> 
>   /* we should do instead (strange): */
>   m = rte_mbuf_raw_alloc();
>   rte_pktmbuf_free(m);
> 
> What I suggest to have:
> 
>   /* allowed, no change */
>   m = rte_pktmbuf_alloc();
>   rte_pktmbuf_free(m);
> 
>   /* allowed, these functions would be symetrical */
>   m = rte_mbuf_raw_alloc();
>   rte_mbuf_raw_free(m);
> 
>   /* not allowed, m->refcnt is uninitialized */
>   m = rte_mbuf_raw_alloc();
>   rte_pktmbuf_free(m);

Hmm, and what it will buy us (except of symmetry)?

> 
> 
> 
> >
> >> This would open the door for bulk allocation/free in the mbuf api.
> >
> > Hmm and what stops us having one right now?
> > As I know we do have rte_pktmbuf_alloc_bulk(), I don't see
> > why we can't have rte_pktmbuf_free_bulk() right now.
> 
> I think, not touching the refcnt on raw allocation allows a PMD to
> do the following:
> 
>   rte_mbuf_allow_raw_bulk(pool, &mbufs, n)
>   for (i = 0; i < n; i++) {
>     prefetch(m[i+1]);
>     rte_pktmbuf_reset(); /* including refcnt = 1 */
>     do_stuff(m);
>   }

You can do that part right now.
That what many PMDs do, just function names are a bit different:

rte_mempool_get_bulk(pool, &mbufs, n);
for (i = 0; i < n; i++) {
      custom_rte_pktmbuf_fill(); /* including refcnt = 1 */
      do_stuff(m);
}

>   /* if all mbufs are not used, free the remaining */
>   rte_mbuf_free_raw_bulk(&mbufs[i], n-i);

Ok and what be a performance critical use-case for that?
As I know, the only possible cases inside PMD when it allocates & free same
mbuf internally are:
- error handling, queue stop (which usually not that performance critical).
- situation when you have cut crc bytes and they are in separate segment
(again a corner case).
What is the use case you have in your mind for that?

> 
> In that case, we can prefetch mbufs before using them,

You can do that now, look at rte_pktmbuf_alloc_bulk().
Nothing stops you to add prefetch() here, if it will be really faster -
extra parameter for rte_pktmbuf_alloc_bulk() or new function rte_pktmbuf_alloc_bulk_prefetch().

> and
> we can free the unused mbufs without touching the structure.

In many case this is not possible anyway:
you can't guarantee that all mbufs you get from upper layer are from the same pool,
and they all have refcnt==1.
There are probably some custom cases when you can safely assume that,
but I am not sure we should restructure all mbuf API for that particular case.
In such situations some custom function should do.

> It looks to be a good advantage.
> 
> Yes, we can do that with mempool functions. But I feel having a mbuf
> API with type checking is better.
> 
> >> This could be done in several steps:
> >>
> >> 1/ remove these assert(), add introduce a public rte_mbuf_raw_free()
> >> 2/ announce that rte_pktmbuf_free() won't work on uninitialized mbufs
> >> in a future version, and ensure that all PMD are inline with this
> >> requirement
> >> 3/ later, move refcount initialization in rte_pktmbuf_reset()
> >>
> >> What do you think?
> >
> > I still think that assert() is on a right place :)
> > *If* we'll change mbuf free code in a way that mbufs inside the pool
> > should have refcnt==1, then I think we'll change it to:
> > RTE_ASSERT(rte_mbuf_refcnt_read(m) == 1);
> > But as I stated above, that change might cause some backward compatibility hassle: 2.a)
> > Or might not give us any performance gain: 2.b).
> 
> I suggest instead to have no constraint on the value of the refcnt
> in the pool.
> 
> >> Another option is to remove the rte_mbuf_raw_alloc()/rte_mbuf_raw_free()
> >> and use mempool calls.
> >
> > Don't see why we have to remove them...
> > Basically we have a bug in PMD, but instead of fixing it,
> > you guys suggest to change mbuf code.
> > Sounds a bit strange to me.
> > Why not just make for these PMDs to call rte_mempool_put() directly, if you are sure it is safe here?
> 
> Yes, there are some bugs in the PMDs regarding the refcnt value when
> compiled with MBUF_DEBUG=y.

Ok, then I suggest we fix them first, then start to discuss possible mbuf optimisations.
It is really strange when invalid usage of generic API in some submodules drives the generic API change.

> By the way, looking at the i40e code, we
> have:
> 
>   i40e_alloc_rx_queue_mbufs()
>   {
>     for (...) {
>        struct rte_mbuf *mbuf = rte_mbuf_raw_alloc(rxq->mp);
>        ...
>        rte_mbuf_refcnt_set(mbuf, 1);  /* not needed */
>      ...
>   }

Yes, I agree it is unnecessary here, and could be safely removed,
but how it relates to our discussion?

> 
>   i40e_tx_free_bufs()
>   {
>     ...
>     if (txq->txq_flags & (uint32_t)ETH_TXQ_FLAGS_NOREFCOUNT) {
>       for (...) {
>          /* returning a mbuf with refcnt=1 */
>          rte_mempool_put(txep->mbuf->pool, txep->mbuf);
>    ...

Hmm, from what I see that code is not correct anyway -
as we still can't guarantee that all mbufs we TX-ed are from the same pool.
So I think that code has to be removed.
Probably i40e maintainers have a better explanation why it is here.
Again, personally I think the ETH_TXQ_FLAGS_NOREFCOUNT flag should
also be removed (or deprecated) - I think it was left here from old days.

> 
> The fact that we can find many bugs related to that in PMDs is a
> sign that the API is not understandable enough.

For me it is just a sign that we need put a better effort in writing/reviewing the code.

> 
> The point is not to fix the bugs in PMDs. I think the point is
> to enhance the mbuf API.

As I said - I think bugs in PMDs shouldn't drive API changes :)
They need to be fixed first.

BTW, I looked who is using __rte_mbuf_raw_free() in the current DPDK code.
I found only two instances:

1) drivers/net/mpipe/mpipe_tilegx.c:
Inside mpipe_recv_flush_stack(struct mpipe_dev_priv *priv)
As I can see that function does set mbuf->refcnt=1 before calling __raw_free.
So we should be fine here, I think.
  
2) drivers/net/ena/ena_ethdev.c
Inside ena_rx_queue_release_bufs().
Again that one seems ok, as I can see from the code:
ena PMD calls  rte_mempool_get_bulk() and keep mbuf->refcnt==0
till either:
-  that mbuf is freed by ena_rx_queue_release_bufs() 
- or filled by eth_ena_recv_pkts() - at that stage mbuf->recnt is set to 1.

So another question is what PMD is failing?

> 
> Hope I have convinced you ;)
> 
> >
> >> But having a mbuf wrapper does not seem a bad
> >> thing to me.
> >
> > We can add some extra wrapper then, something like:
> > #define __RTE_MBUF_PUT_FREED(m)	(rte_mempool_put((m)->pool, m))
> > ?
> 
> I think a wrapper should do the type checking (struct mbuf).

Sure, we can have a wrapper inline function, this is just an example.

> 
> Thanks for this exchange.
> Regards,
> Olivier

Bruce Richardson June 9, 2016, 2:19 p.m. UTC | #9

On Thu, Jun 09, 2016 at 01:21:18PM +0000, Ananyev, Konstantin wrote:
> Hi Olivier,
> 
> > -----Original Message-----
> > From: Olivier Matz [mailto:olivier.matz@6wind.com]
> > Sent: Thursday, June 09, 2016 8:47 AM
> > To: Ananyev, Konstantin; dev@dpdk.org; Adrien Mazarguil
> > Subject: Re: [dpdk-dev] [PATCH] mbuf: remove inconsistent assert statements
> > 
> > Hi Konstantin,
> > 
> > >>>>>> Yes, it refcnt supposed to be set to 0 by __rte_pktmbuf_prefree_seg().
> > >>>>>> Wright now, it is a user responsibility to make sure refcnt==0 before pushing
> > >>>>>> mbuf back to the pool.
> > >>>>>> Not sure why do you consider that wrong?
> > >>>>>
> > >>>>> I do not consider this wrong and I'm all for using assert() to catch
> > >>>>> programming errors, however in this specific case, I think they are
> > >>>>> inconsistent and misleading.
> > >>>>
> > >>>> Honestly, I don't understand why.
> > >>>> Right now the rule of thumb is - when mbuf is in the pool, it's refcnt should be equal zero.
> > >>
> > >> What is the purpose of this? Is there some code relying on this?
> > >
> > > The whole current implementation of mbuf_free code path relies on that.
> > > Straight here:
> > > if (likely(NULL != (m = __rte_pktmbuf_prefree_seg(m)))) {
> > >                 m->next = NULL;
> > >                 __rte_mbuf_raw_free(m);
> > >         }
> > >
> > > If we'll exclude indirect mbuf logic, all it does:
> > > if (rte_mbuf_refcnt_update(m, -1) == 0) {
> > > 	m->next = NULL;
> > > 	__rte_mbuf_raw_free(m);
> > > }
> > >
> > > I.E.:
> > > decrement mbuf->refcnt.
> > > If new value of refcnt is zero, then put it back into the pool.
> > >
> > > So having ASERT(mbuf->refcnt==0) inside
> > > __rte_mbuf_raw_free()/__rte_mbuf_raw_alloc()
> > > looks absolutely valid to me.
> > > I *has* to be zero at that point with current implementation,
> > > And if it is not then we probably have (or will have) a silent memory corruption.
> > 
> > This explains how the refcount is used, and why it is set
> > to zero before returning the mbuf to the pool with the mbuf
> > free functions.
> 
> From my point, that shows that rte_pktmbuf_free() relies on the value of the refcnt
> to make a decision is it ok to put mbuf back to the pool or not.
> Right now it puts mbuf to the pool *only* if it's refcnt==0.
> As discussed below, we probably can change it to be refcnt==1
> (if there really would be noticeable performance gain).
> But I think it still should be just one predefined value of refcnt (0 or 1).
> In theory it is possible to allow both (0 and 1),
> but that will make it hard to debug any alloc/free issues,
> plus would neglect any possible performance gain -
> as in that case raw_alloc (or it's analog) should still do
> mbuf->refcnt=1; 
> 

I think enforcing the rule of the refcnt being 1 inside the mempool is reasonable.
It saves setting the flag to zero and back to one again in the normal case. The
one case, that I can think of, where it does cause extra work is when two
threads simultaneously do the atomic decrement of the refcnt and one of them
gets zero and must free the buffer. However, in that case, the additional cost
of setting the count back to 1 on free is far less than the penalty paid on the
atomic decrement - and this should definitely be an edge-case anyway, since it
requires the two threads to free the same mbuf at the same time.

If we make this change, it also may open the possibility to look at moving the
refcount to cacheline 1 if we are reorganising the mbuf. It would no longer be
needed inside RX in our PMDs, and the path to free mbufs already uses the pool
pointer from the second cache-line anyway. That would free up 16-bits of space
to expand out the nb_segs and port values to 16-bit each if we wanted.

Regards,
/Bruce

Thomas Monjalon June 9, 2016, 3:27 p.m. UTC | #10

2016-06-09 13:21, Ananyev, Konstantin:
> From: Olivier Matz [mailto:olivier.matz@6wind.com]
> > Today:
> > 
> >   /* allowed */
> >   m = rte_pktmbuf_alloc();
> >   rte_pktmbuf_free(m);
> > 
> >   /* not allowed */
> >   m = rte_mbuf_raw_alloc();
> >   __rte_mbuf_raw_free(m);
> > 
> >   /* we should do instead (strange): */
> >   m = rte_mbuf_raw_alloc();
> >   rte_pktmbuf_free(m);
> > 
> > What I suggest to have:
> > 
> >   /* allowed, no change */
> >   m = rte_pktmbuf_alloc();
> >   rte_pktmbuf_free(m);
> > 
> >   /* allowed, these functions would be symetrical */
> >   m = rte_mbuf_raw_alloc();
> >   rte_mbuf_raw_free(m);
> > 
> >   /* not allowed, m->refcnt is uninitialized */
> >   m = rte_mbuf_raw_alloc();
> >   rte_pktmbuf_free(m);
> 
> Hmm, and what it will buy us (except of symmetry)?

API consistency is important.
It is a matter of making our users confident in DPDK.

I would not like to use a library where I need to check in the doc
for each function because I don't remember and I'm not confident
that the function fish() do some fishing or hunting.

Ananyev, Konstantin June 9, 2016, 3:45 p.m. UTC | #11

> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> Sent: Thursday, June 09, 2016 4:28 PM
> To: Ananyev, Konstantin
> Cc: dev@dpdk.org; Olivier Matz; Adrien Mazarguil; Zhang, Helin
> Subject: Re: [dpdk-dev] [PATCH] mbuf: remove inconsistent assert statements
> 
> 2016-06-09 13:21, Ananyev, Konstantin:
> > From: Olivier Matz [mailto:olivier.matz@6wind.com]
> > > Today:
> > >
> > >   /* allowed */
> > >   m = rte_pktmbuf_alloc();
> > >   rte_pktmbuf_free(m);
> > >
> > >   /* not allowed */
> > >   m = rte_mbuf_raw_alloc();
> > >   __rte_mbuf_raw_free(m);
> > >
> > >   /* we should do instead (strange): */
> > >   m = rte_mbuf_raw_alloc();
> > >   rte_pktmbuf_free(m);
> > >
> > > What I suggest to have:
> > >
> > >   /* allowed, no change */
> > >   m = rte_pktmbuf_alloc();
> > >   rte_pktmbuf_free(m);
> > >
> > >   /* allowed, these functions would be symetrical */
> > >   m = rte_mbuf_raw_alloc();
> > >   rte_mbuf_raw_free(m);
> > >
> > >   /* not allowed, m->refcnt is uninitialized */
> > >   m = rte_mbuf_raw_alloc();
> > >   rte_pktmbuf_free(m);
> >
> > Hmm, and what it will buy us (except of symmetry)?
> 
> API consistency is important.
> It is a matter of making our users confident in DPDK.
> 
> I would not like to use a library where I need to check in the doc
> for each function because I don't remember and I'm not confident
> that the function fish() do some fishing or hunting.

As you remember, the whole story started when people used
'internal mbuf function' inside their code and then
figured out that it doesn't work as they expect :)

But as I said, if everyone are that desperate about symmetry, 
we can just create a new public function:

void
rte_mbuf_raw_free(stuct rte_mbuf *m)
{
      if (rte_mbuf_refcnt_update(m, -1) == 0)
                __rte_mbuf_raw_free(m);
}

That would be 'symmetric' to rte_mbuf_raw_alloc(),
and all three combinations above would be allowed.
Konstantin

don provan June 9, 2016, 6:42 p.m. UTC | #12

> -----Original Message-----
> From: Ananyev, Konstantin [mailto:konstantin.ananyev@intel.com] 
> Sent: Thursday, June 09, 2016 8:45 AM
> To: Thomas Monjalon <thomas.monjalon@6wind.com>
> Cc: dev@dpdk.org; Olivier Matz <olivier.matz@6wind.com>; Adrien Mazarguil <adrien.mazarguil@6wind.com>; Zhang, Helin <helin.zhang@intel.com>
> Subject: Re: [dpdk-dev] [PATCH] mbuf: remove inconsistent assert statements
>...
> But as I said, if everyone are that desperate about symmetry, we can just create a new public function:
>
> void
> rte_mbuf_raw_free(stuct rte_mbuf *m)
> {
>       if (rte_mbuf_refcnt_update(m, -1) == 0)
>                 __rte_mbuf_raw_free(m);
> }

Well, if you're going to do that, let's recognize that rte_mbuf_raw_free() was misnamed:
it doesn't free an mbuf, it returns an mbuf that's already free back to the pool. So instead
of "__rte_mbuf_raw_free", I'd call it "rte_mbuf_raw_release".

Logically I like this solution, as I'm very uncomfortable with the idea that a free mbuf
might sometimes have refcnt set to one. On the other hand, if mbufs can often be freed
without touching refcnt, that could be a big win that might persuade me to accept being
uncomfortable.

-don provan
dprovan@bivio.net

Adrien Mazarguil June 20, 2016, 1:49 p.m. UTC | #13

On Wed, Jun 08, 2016 at 10:34:17AM +0000, Ananyev, Konstantin wrote:
> Hi Adrien,
> 
> > 
> > An assertion failure occurs in __rte_mbuf_raw_free() (called by a few PMDs)
> > when compiling DPDK with CONFIG_RTE_LOG_LEVEL=RTE_LOG_DEBUG and starting
> > applications with a log level high enough to trigger it.
> > 
> > While rte_mbuf_raw_alloc() sets refcount to 1, __rte_mbuf_raw_free()
> > expects it to be 0. 
> >Considering users are not expected to reset the
> > reference count to satisfy assert() and that raw functions are designed on
> > purpose without safety belts, remove these checks.
> 
> Yes, it refcnt supposed to be set to 0 by __rte_pktmbuf_prefree_seg().
> Wright now, it is a user responsibility to make sure refcnt==0 before pushing
> mbuf back to the pool.
> Not sure why do you consider that wrong?
> If the user calls __rte_mbuf_raw_free() manualy it is his responsibility to make
> sure mbuf's refcn==0.
> BTW, why are you doing it?
> The comment clearly states that the function is for internal use:
> /**
>  * @internal Put mbuf back into its original mempool.
>  * The use of that function is reserved for RTE internal needs.
>  * Please use rte_pktmbuf_free().
>  *
>  * @param m
>  *   The mbuf to be freed.
>  */
> static inline void __attribute__((always_inline))
> __rte_mbuf_raw_free(struct rte_mbuf *m)

Bottom line is that I'm dropping this patch for now. We'll update our last
mlx5 patchset to do it properly and keep these assert in place.

Thanks.

diff mbox

Patch

diff --git a/lib/librte_mbuf/rte_mbuf.h b/lib/librte_mbuf/rte_mbuf.h
index 11fa06d..7070bb8 100644
--- a/lib/librte_mbuf/rte_mbuf.h
+++ b/lib/librte_mbuf/rte_mbuf.h
@@ -1108,7 +1108,6 @@  static inline struct rte_mbuf *rte_mbuf_raw_alloc(struct rte_mempool *mp)
 	if (rte_mempool_get(mp, &mb) < 0)
 		return NULL;
 	m = (struct rte_mbuf *)mb;
-	RTE_ASSERT(rte_mbuf_refcnt_read(m) == 0);
 	rte_mbuf_refcnt_set(m, 1);
 	__rte_mbuf_sanity_check(m, 0);
 
@@ -1133,7 +1132,6 @@  __rte_mbuf_raw_alloc(struct rte_mempool *mp)
 static inline void __attribute__((always_inline))
 __rte_mbuf_raw_free(struct rte_mbuf *m)
 {
-	RTE_ASSERT(rte_mbuf_refcnt_read(m) == 0);
 	rte_mempool_put(m->pool, m);
 }