[dpdk-dev,v3,2/2] mempool: pktmbuf pool default fallback for mempool ops error

Message ID 1474044395-11627-2-git-send-email-hemant.agrawal@nxp.com (mailing list archive)
State Changes Requested, archived
Delegated to: Thomas Monjalon
Headers

Commit Message

Hemant Agrawal Sept. 16, 2016, 4:46 p.m. UTC
  In the rte_pktmbuf_pool_create, if the default external mempool is
not available, the implementation can default to "ring_mp_mc", which
is an software implementation.

Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
---
Changes in V3:
* adding warning message to say that falling back to default sw pool
---
 lib/librte_mbuf/rte_mbuf.c | 8 ++++++++
 1 file changed, 8 insertions(+)
  

Comments

Olivier Matz Sept. 19, 2016, 1:57 p.m. UTC | #1
Hi Hemant,

On 09/16/2016 06:46 PM, Hemant Agrawal wrote:
> In the rte_pktmbuf_pool_create, if the default external mempool is
> not available, the implementation can default to "ring_mp_mc", which
> is an software implementation.
> 
> Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
> ---
> Changes in V3:
> * adding warning message to say that falling back to default sw pool
> ---
>  lib/librte_mbuf/rte_mbuf.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
> index 4846b89..8ab0eb1 100644
> --- a/lib/librte_mbuf/rte_mbuf.c
> +++ b/lib/librte_mbuf/rte_mbuf.c
> @@ -176,6 +176,14 @@ rte_pktmbuf_pool_create(const char *name, unsigned n,
>  
>  	rte_errno = rte_mempool_set_ops_byname(mp,
>  			RTE_MBUF_DEFAULT_MEMPOOL_OPS, NULL);
> +
> +	/* on error, try falling back to the software based default pool */
> +	if (rte_errno == -EOPNOTSUPP) {
> +		RTE_LOG(WARNING, MBUF, "Default HW Mempool not supported. "
> +			"falling back to sw mempool \"ring_mp_mc\"");
> +		rte_errno = rte_mempool_set_ops_byname(mp, "ring_mp_mc", NULL);
> +	}
> +
>  	if (rte_errno != 0) {
>  		RTE_LOG(ERR, MBUF, "error setting mempool handler\n");
>  		return NULL;
> 

Without adding a new method ".supported()", the first call to
rte_mempool_populate() could return the same error ENOTSUP. In this
case, it is still possible to fallback.

I've just submitted an RFC, which I think is quite linked:
http://dpdk.org/ml/archives/dev/2016-September/046974.html
Assuming a new parameter "mempool_ops" is added to
rte_pktmbuf_pool_create(), would it make sense to fallback to
"ring_mp_mc"? What about just returning ENOTSUP? The application could
do the job and decide which sw fallback to use.


Regards,
Olivier
  
Hemant Agrawal Sept. 22, 2016, 1:12 p.m. UTC | #2
Hi Olivier

On 9/19/2016 7:27 PM, Olivier Matz wrote:
> Hi Hemant,
>
> On 09/16/2016 06:46 PM, Hemant Agrawal wrote:
>> In the rte_pktmbuf_pool_create, if the default external mempool is
>> not available, the implementation can default to "ring_mp_mc", which
>> is an software implementation.
>>
>> Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
>> ---
>> Changes in V3:
>> * adding warning message to say that falling back to default sw pool
>> ---
>>  lib/librte_mbuf/rte_mbuf.c | 8 ++++++++
>>  1 file changed, 8 insertions(+)
>>
>> diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
>> index 4846b89..8ab0eb1 100644
>> --- a/lib/librte_mbuf/rte_mbuf.c
>> +++ b/lib/librte_mbuf/rte_mbuf.c
>> @@ -176,6 +176,14 @@ rte_pktmbuf_pool_create(const char *name, unsigned n,
>>
>>  	rte_errno = rte_mempool_set_ops_byname(mp,
>>  			RTE_MBUF_DEFAULT_MEMPOOL_OPS, NULL);
>> +
>> +	/* on error, try falling back to the software based default pool */
>> +	if (rte_errno == -EOPNOTSUPP) {
>> +		RTE_LOG(WARNING, MBUF, "Default HW Mempool not supported. "
>> +			"falling back to sw mempool \"ring_mp_mc\"");
>> +		rte_errno = rte_mempool_set_ops_byname(mp, "ring_mp_mc", NULL);
>> +	}
>> +
>>  	if (rte_errno != 0) {
>>  		RTE_LOG(ERR, MBUF, "error setting mempool handler\n");
>>  		return NULL;
>>
>
> Without adding a new method ".supported()", the first call to
> rte_mempool_populate() could return the same error ENOTSUP. In this
> case, it is still possible to fallback.
>
It will be bit late.

On failure, than we have to set the default ops and do a goto before
rte_pktmbuf_pool_init(mp, &mbp_priv);


> I've just submitted an RFC, which I think is quite linked:
> http://dpdk.org/ml/archives/dev/2016-September/046974.html
> Assuming a new parameter "mempool_ops" is added to
> rte_pktmbuf_pool_create(), would it make sense to fallback to
> "ring_mp_mc"? What about just returning ENOTSUP? The application could
> do the job and decide which sw fallback to use.

We ran into this issue when trying to run the standard DPDK examples 
(l3fwd) in VM. Do you think, is it practical to add fallback handling in 
each of the DPDK examples?

Typically when someone is writing a application on host, he need not 
worry non-availability of the hw offloaded mempool. He may also want to 
run the same binary in virtual machine. In VM, it is not guaranteed that 
hw offloaded mempools will be available.

w.r.t your RFC, we can do this:
if the user has specified a mempool_ops in rte_pktmbuf_pool_create(), 
don't fallback. It will be responsibility for application to decide on 
calling again rte_pktmbuf_pool_create() with different mempool_ops.

>
>
> Regards,
> Olivier
>
  
Hemant Agrawal Oct. 13, 2016, 1:15 p.m. UTC | #3
Hi Olivier,
	Any updates w.r.t this patch set?

Regards
Hemant
On 9/22/2016 6:42 PM, Hemant Agrawal wrote:
> Hi Olivier
>
> On 9/19/2016 7:27 PM, Olivier Matz wrote:
>> Hi Hemant,
>>
>> On 09/16/2016 06:46 PM, Hemant Agrawal wrote:
>>> In the rte_pktmbuf_pool_create, if the default external mempool is
>>> not available, the implementation can default to "ring_mp_mc", which
>>> is an software implementation.
>>>
>>> Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
>>> ---
>>> Changes in V3:
>>> * adding warning message to say that falling back to default sw pool
>>> ---
>>>  lib/librte_mbuf/rte_mbuf.c | 8 ++++++++
>>>  1 file changed, 8 insertions(+)
>>>
>>> diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
>>> index 4846b89..8ab0eb1 100644
>>> --- a/lib/librte_mbuf/rte_mbuf.c
>>> +++ b/lib/librte_mbuf/rte_mbuf.c
>>> @@ -176,6 +176,14 @@ rte_pktmbuf_pool_create(const char *name,
>>> unsigned n,
>>>
>>>      rte_errno = rte_mempool_set_ops_byname(mp,
>>>              RTE_MBUF_DEFAULT_MEMPOOL_OPS, NULL);
>>> +
>>> +    /* on error, try falling back to the software based default pool */
>>> +    if (rte_errno == -EOPNOTSUPP) {
>>> +        RTE_LOG(WARNING, MBUF, "Default HW Mempool not supported. "
>>> +            "falling back to sw mempool \"ring_mp_mc\"");
>>> +        rte_errno = rte_mempool_set_ops_byname(mp, "ring_mp_mc", NULL);
>>> +    }
>>> +
>>>      if (rte_errno != 0) {
>>>          RTE_LOG(ERR, MBUF, "error setting mempool handler\n");
>>>          return NULL;
>>>
>>
>> Without adding a new method ".supported()", the first call to
>> rte_mempool_populate() could return the same error ENOTSUP. In this
>> case, it is still possible to fallback.
>>
> It will be bit late.
>
> On failure, than we have to set the default ops and do a goto before
> rte_pktmbuf_pool_init(mp, &mbp_priv);
>
>
>> I've just submitted an RFC, which I think is quite linked:
>> http://dpdk.org/ml/archives/dev/2016-September/046974.html
>> Assuming a new parameter "mempool_ops" is added to
>> rte_pktmbuf_pool_create(), would it make sense to fallback to
>> "ring_mp_mc"? What about just returning ENOTSUP? The application could
>> do the job and decide which sw fallback to use.
>
> We ran into this issue when trying to run the standard DPDK examples
> (l3fwd) in VM. Do you think, is it practical to add fallback handling in
> each of the DPDK examples?
>
> Typically when someone is writing a application on host, he need not
> worry non-availability of the hw offloaded mempool. He may also want to
> run the same binary in virtual machine. In VM, it is not guaranteed that
> hw offloaded mempools will be available.
>
> w.r.t your RFC, we can do this:
> if the user has specified a mempool_ops in rte_pktmbuf_pool_create(),
> don't fallback. It will be responsibility for application to decide on
> calling again rte_pktmbuf_pool_create() with different mempool_ops.
>
>>
>>
>> Regards,
>> Olivier
>>
>
>
>
  
Olivier Matz Oct. 14, 2016, 12:10 p.m. UTC | #4
Hi Hemant,

Sorry for the late answer. Please see some comments inline.

On 10/13/2016 03:15 PM, Hemant Agrawal wrote:
> Hi Olivier,
>     Any updates w.r.t this patch set?
> 
> Regards
> Hemant
> On 9/22/2016 6:42 PM, Hemant Agrawal wrote:
>> Hi Olivier
>>
>> On 9/19/2016 7:27 PM, Olivier Matz wrote:
>>> Hi Hemant,
>>>
>>> On 09/16/2016 06:46 PM, Hemant Agrawal wrote:
>>>> In the rte_pktmbuf_pool_create, if the default external mempool is
>>>> not available, the implementation can default to "ring_mp_mc", which
>>>> is an software implementation.
>>>>
>>>> Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
>>>> ---
>>>> Changes in V3:
>>>> * adding warning message to say that falling back to default sw pool
>>>> ---
>>>>  lib/librte_mbuf/rte_mbuf.c | 8 ++++++++
>>>>  1 file changed, 8 insertions(+)
>>>>
>>>> diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
>>>> index 4846b89..8ab0eb1 100644
>>>> --- a/lib/librte_mbuf/rte_mbuf.c
>>>> +++ b/lib/librte_mbuf/rte_mbuf.c
>>>> @@ -176,6 +176,14 @@ rte_pktmbuf_pool_create(const char *name,
>>>> unsigned n,
>>>>
>>>>      rte_errno = rte_mempool_set_ops_byname(mp,
>>>>              RTE_MBUF_DEFAULT_MEMPOOL_OPS, NULL);
>>>> +
>>>> +    /* on error, try falling back to the software based default
>>>> pool */
>>>> +    if (rte_errno == -EOPNOTSUPP) {
>>>> +        RTE_LOG(WARNING, MBUF, "Default HW Mempool not supported. "
>>>> +            "falling back to sw mempool \"ring_mp_mc\"");
>>>> +        rte_errno = rte_mempool_set_ops_byname(mp, "ring_mp_mc",
>>>> NULL);
>>>> +    }
>>>> +
>>>>      if (rte_errno != 0) {
>>>>          RTE_LOG(ERR, MBUF, "error setting mempool handler\n");
>>>>          return NULL;
>>>>
>>>
>>> Without adding a new method ".supported()", the first call to
>>> rte_mempool_populate() could return the same error ENOTSUP. In this
>>> case, it is still possible to fallback.
>>>
>> It will be bit late.
>>
>> On failure, than we have to set the default ops and do a goto before
>> rte_pktmbuf_pool_init(mp, &mbp_priv);

I still think we can do the job without adding the .supported() method.
The following code is just an (untested) example:

struct rte_mempool *
rte_pktmbuf_pool_create(const char *name, unsigned n,
    unsigned cache_size, uint16_t priv_size, uint16_t data_room_size,
    int socket_id)
{
    struct rte_mempool *mp;
    struct rte_pktmbuf_pool_private mbp_priv;
    unsigned elt_size;
    int ret;
    const char *ops[] = {
        RTE_MBUF_DEFAULT_MEMPOOL_OPS, "ring_mp_mc", NULL,
    };
    const char **op;

    if (RTE_ALIGN(priv_size, RTE_MBUF_PRIV_ALIGN) != priv_size) {
        RTE_LOG(ERR, MBUF, "mbuf priv_size=%u is not aligned\n",
            priv_size);
        rte_errno = EINVAL;
        return NULL;
    }
    elt_size = sizeof(struct rte_mbuf) + (unsigned)priv_size +
        (unsigned)data_room_size;
    mbp_priv.mbuf_data_room_size = data_room_size;
    mbp_priv.mbuf_priv_size = priv_size;

    for (op = &ops[0]; *op != NULL; op++) {
        mp = rte_mempool_create_empty(name, n, elt_size, cache_size,
            sizeof(struct rte_pktmbuf_pool_private), socket_id, 0);
        if (mp == NULL)
            return NULL;

        ret = rte_mempool_set_ops_byname(mp, *op, NULL);
        if (ret != 0) {
            RTE_LOG(ERR, MBUF, "error setting mempool handler\n");
            rte_mempool_free(mp);
            if (ret == -ENOTSUP)
                continue;
            rte_errno = -ret;
            return NULL;
        }
        rte_pktmbuf_pool_init(mp, &mbp_priv);

        ret = rte_mempool_populate_default(mp);
        if (ret < 0) {
            rte_mempool_free(mp);
            if (ret == -ENOTSUP)
                continue;
            rte_errno = -ret;
            return NULL;
        }
    }

    rte_mempool_obj_iter(mp, rte_pktmbuf_init, NULL);

    return mp;
}


>>> I've just submitted an RFC, which I think is quite linked:
>>> http://dpdk.org/ml/archives/dev/2016-September/046974.html
>>> Assuming a new parameter "mempool_ops" is added to
>>> rte_pktmbuf_pool_create(), would it make sense to fallback to
>>> "ring_mp_mc"? What about just returning ENOTSUP? The application could
>>> do the job and decide which sw fallback to use.
>>
>> We ran into this issue when trying to run the standard DPDK examples
>> (l3fwd) in VM. Do you think, is it practical to add fallback handling in
>> each of the DPDK examples?

OK. What is still unclear for me, is how the software is aware of the
different hardware-assisted handlers. Moreover, we could imagine more
software handlers, which could be used depending on the use case.

I think this choice has to be made by the user or the application:

- the application may want to use a specific (sw or hw) handler: in
  this case, it want to be notified if it fails, instead of having
  a quiet fallback to ring_mp_mc
- if several handlers are available, the application may want to
  try them in a specific order
- maybe some handlers will have some limitations with some
  configurations or driver? The application could decide to use
  a different handler according the configuration.

So that's why I think this is an application decision.

>> Typically when someone is writing a application on host, he need not
>> worry non-availability of the hw offloaded mempool. He may also want to
>> run the same binary in virtual machine. In VM, it is not guaranteed that
>> hw offloaded mempools will be available.

Running the same binary is of course a need. But if your VM does not
provide the same virtualized hardware than the host, I think the command
line option makes sense.

AFAIU, on the host, you can use a hw mempool handler or a sw one, and
on the guest, only a sw one. So you need an option to select the
behavior you want on the host, without recompiling.

I understand that modifying all the applications is not a good option
either. I'm thinking a a EAL parameter that would allow to configure a
library (mbuf in this case). Something like kernel boot options.
Example: testpmd -l 2,4 --opt mbuf.handler="ring_mp_mc" -- [app args]

I don't know if this is feasible, this has to be discussed first, but
what do you think on the principle? The value could also be an
ordered list if we want a fallback, and this option could be overriden
by the application before creating the mbuf pool. There was some
discussions about a kind of dpdk global configuration some time ago.


Regards,
Olivier
  
Hemant Agrawal Nov. 7, 2016, 12:30 p.m. UTC | #5
Hi Olivier,
	
> -----Original Message-----
> From: Olivier Matz [mailto:olivier.matz@6wind.com]
> Sent: Friday, October 14, 2016 5:41 PM
> > On 9/22/2016 6:42 PM, Hemant Agrawal wrote:
> >> Hi Olivier
> >>
> >> On 9/19/2016 7:27 PM, Olivier Matz wrote:
> >>> Hi Hemant,
> >>>
> >>> On 09/16/2016 06:46 PM, Hemant Agrawal wrote:
> >>>> In the rte_pktmbuf_pool_create, if the default external mempool is
> >>>> not available, the implementation can default to "ring_mp_mc",
> >>>> which is an software implementation.
> >>>>
> >>>> Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
> >>>> ---
> >>>> Changes in V3:
> >>>> * adding warning message to say that falling back to default sw
> >>>> pool
> >>>> ---
> >>>>  lib/librte_mbuf/rte_mbuf.c | 8 ++++++++
> >>>>  1 file changed, 8 insertions(+)
> >>>>
> >>>> diff --git a/lib/librte_mbuf/rte_mbuf.c
> >>>> b/lib/librte_mbuf/rte_mbuf.c index 4846b89..8ab0eb1 100644
> >>>> --- a/lib/librte_mbuf/rte_mbuf.c
> >>>> +++ b/lib/librte_mbuf/rte_mbuf.c
> >>>> @@ -176,6 +176,14 @@ rte_pktmbuf_pool_create(const char *name,
> >>>> unsigned n,
> >>>>
> >>>>      rte_errno = rte_mempool_set_ops_byname(mp,
> >>>>              RTE_MBUF_DEFAULT_MEMPOOL_OPS, NULL);
> >>>> +
> >>>> +    /* on error, try falling back to the software based default
> >>>> pool */
> >>>> +    if (rte_errno == -EOPNOTSUPP) {
> >>>> +        RTE_LOG(WARNING, MBUF, "Default HW Mempool not supported. "
> >>>> +            "falling back to sw mempool \"ring_mp_mc\"");
> >>>> +        rte_errno = rte_mempool_set_ops_byname(mp, "ring_mp_mc",
> >>>> NULL);
> >>>> +    }
> >>>> +
> >>>>      if (rte_errno != 0) {
> >>>>          RTE_LOG(ERR, MBUF, "error setting mempool handler\n");
> >>>>          return NULL;
> >>>>
> >>>
> >>> Without adding a new method ".supported()", the first call to
> >>> rte_mempool_populate() could return the same error ENOTSUP. In this
> >>> case, it is still possible to fallback.
> >>>
> >> It will be bit late.
> >>
> >> On failure, than we have to set the default ops and do a goto before
> >> rte_pktmbuf_pool_init(mp, &mbp_priv);
> 
> I still think we can do the job without adding the .supported() method.
> The following code is just an (untested) example:
> 
> struct rte_mempool *
> rte_pktmbuf_pool_create(const char *name, unsigned n,
>     unsigned cache_size, uint16_t priv_size, uint16_t data_room_size,
>     int socket_id)
> {
>     struct rte_mempool *mp;
>     struct rte_pktmbuf_pool_private mbp_priv;
>     unsigned elt_size;
>     int ret;
>     const char *ops[] = {
>         RTE_MBUF_DEFAULT_MEMPOOL_OPS, "ring_mp_mc", NULL,
>     };
>     const char **op;
> 
>     if (RTE_ALIGN(priv_size, RTE_MBUF_PRIV_ALIGN) != priv_size) {
>         RTE_LOG(ERR, MBUF, "mbuf priv_size=%u is not aligned\n",
>             priv_size);
>         rte_errno = EINVAL;
>         return NULL;
>     }
>     elt_size = sizeof(struct rte_mbuf) + (unsigned)priv_size +
>         (unsigned)data_room_size;
>     mbp_priv.mbuf_data_room_size = data_room_size;
>     mbp_priv.mbuf_priv_size = priv_size;
> 
>     for (op = &ops[0]; *op != NULL; op++) {
>         mp = rte_mempool_create_empty(name, n, elt_size, cache_size,
>             sizeof(struct rte_pktmbuf_pool_private), socket_id, 0);
>         if (mp == NULL)
>             return NULL;
> 
>         ret = rte_mempool_set_ops_byname(mp, *op, NULL);
>         if (ret != 0) {
>             RTE_LOG(ERR, MBUF, "error setting mempool handler\n");
>             rte_mempool_free(mp);
>             if (ret == -ENOTSUP)
>                 continue;
>             rte_errno = -ret;
>             return NULL;
>         }
>         rte_pktmbuf_pool_init(mp, &mbp_priv);
> 
>         ret = rte_mempool_populate_default(mp);
>         if (ret < 0) {
>             rte_mempool_free(mp);
>             if (ret == -ENOTSUP)
>                 continue;
>             rte_errno = -ret;
>             return NULL;
>         }
>     }
> 
>     rte_mempool_obj_iter(mp, rte_pktmbuf_init, NULL);
> 
>     return mp;
> }
> 
> 
[Hemant]  This look fine to me. Please submit a patch for the same. 

> >>> I've just submitted an RFC, which I think is quite linked:
> >>> http://dpdk.org/ml/archives/dev/2016-September/046974.html
> >>> Assuming a new parameter "mempool_ops" is added to
> >>> rte_pktmbuf_pool_create(), would it make sense to fallback to
> >>> "ring_mp_mc"? What about just returning ENOTSUP? The application
> >>> could do the job and decide which sw fallback to use.
> >>
> >> We ran into this issue when trying to run the standard DPDK examples
> >> (l3fwd) in VM. Do you think, is it practical to add fallback handling
> >> in each of the DPDK examples?
> 
> OK. What is still unclear for me, is how the software is aware of the different
> hardware-assisted handlers. Moreover, we could imagine more software
> handlers, which could be used depending on the use case.
> 
> I think this choice has to be made by the user or the application:
> 
> - the application may want to use a specific (sw or hw) handler: in
>   this case, it want to be notified if it fails, instead of having
>   a quiet fallback to ring_mp_mc
> - if several handlers are available, the application may want to
>   try them in a specific order
> - maybe some handlers will have some limitations with some
>   configurations or driver? The application could decide to use
>   a different handler according the configuration.
> 
> So that's why I think this is an application decision.
> 
[Hemant]  We should simplify it:  if the application has supplied the handler, it is application's responsibility to take care of failure. Only if the application want to use the default handler, the implementation can fallback.  The fallback handler can again be configurable. 

> >> Typically when someone is writing a application on host, he need not
> >> worry non-availability of the hw offloaded mempool. He may also want
> >> to run the same binary in virtual machine. In VM, it is not
> >> guaranteed that hw offloaded mempools will be available.
> 
> Running the same binary is of course a need. But if your VM does not provide
> the same virtualized hardware than the host, I think the command line option
> makes sense.
> 
> AFAIU, on the host, you can use a hw mempool handler or a sw one, and on the
> guest, only a sw one. So you need an option to select the behavior you want on
> the host, without recompiling.
> 
> I understand that modifying all the applications is not a good option either. I'm
> thinking a a EAL parameter that would allow to configure a library (mbuf in this
> case). Something like kernel boot options.
> Example: testpmd -l 2,4 --opt mbuf.handler="ring_mp_mc" -- [app args]
> 
> I don't know if this is feasible, this has to be discussed first, but what do you
> think on the principle? The value could also be an ordered list if we want a
> fallback, and this option could be overriden by the application before creating
> the mbuf pool. There was some discussions about a kind of dpdk global
> configuration some time ago.
> 
[Hemant] I agree that command line option provide a better control in this case. 
On the flipside, We need to be careful that we do not end up having too many command line options. 

> 
> Regards,
> Olivier
  
Olivier Matz Nov. 22, 2016, 9:24 a.m. UTC | #6
Hi Hemant,

Back on this topic, please see some comments below.

On 11/07/2016 01:30 PM, Hemant Agrawal wrote:
> Hi Olivier,
> 	
>> -----Original Message-----
>> From: Olivier Matz [mailto:olivier.matz@6wind.com]
>> Sent: Friday, October 14, 2016 5:41 PM
>>> On 9/22/2016 6:42 PM, Hemant Agrawal wrote:
>>>> Hi Olivier
>>>>
>>>> On 9/19/2016 7:27 PM, Olivier Matz wrote:
>>>>> Hi Hemant,
>>>>>
>>>>> On 09/16/2016 06:46 PM, Hemant Agrawal wrote:
>>>>>> In the rte_pktmbuf_pool_create, if the default external mempool is
>>>>>> not available, the implementation can default to "ring_mp_mc",
>>>>>> which is an software implementation.
>>>>>>
>>>>>> Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
>>>>>> ---
>>>>>> Changes in V3:
>>>>>> * adding warning message to say that falling back to default sw
>>>>>> pool
>>>>>> ---
>>>>>>  lib/librte_mbuf/rte_mbuf.c | 8 ++++++++
>>>>>>  1 file changed, 8 insertions(+)
>>>>>>
>>>>>> diff --git a/lib/librte_mbuf/rte_mbuf.c
>>>>>> b/lib/librte_mbuf/rte_mbuf.c index 4846b89..8ab0eb1 100644
>>>>>> --- a/lib/librte_mbuf/rte_mbuf.c
>>>>>> +++ b/lib/librte_mbuf/rte_mbuf.c
>>>>>> @@ -176,6 +176,14 @@ rte_pktmbuf_pool_create(const char *name,
>>>>>> unsigned n,
>>>>>>
>>>>>>      rte_errno = rte_mempool_set_ops_byname(mp,
>>>>>>              RTE_MBUF_DEFAULT_MEMPOOL_OPS, NULL);
>>>>>> +
>>>>>> +    /* on error, try falling back to the software based default
>>>>>> pool */
>>>>>> +    if (rte_errno == -EOPNOTSUPP) {
>>>>>> +        RTE_LOG(WARNING, MBUF, "Default HW Mempool not supported. "
>>>>>> +            "falling back to sw mempool \"ring_mp_mc\"");
>>>>>> +        rte_errno = rte_mempool_set_ops_byname(mp, "ring_mp_mc",
>>>>>> NULL);
>>>>>> +    }
>>>>>> +
>>>>>>      if (rte_errno != 0) {
>>>>>>          RTE_LOG(ERR, MBUF, "error setting mempool handler\n");
>>>>>>          return NULL;
>>>>>>
>>>>>
>>>>> Without adding a new method ".supported()", the first call to
>>>>> rte_mempool_populate() could return the same error ENOTSUP. In this
>>>>> case, it is still possible to fallback.
>>>>>
>>>> It will be bit late.
>>>>
>>>> On failure, than we have to set the default ops and do a goto before
>>>> rte_pktmbuf_pool_init(mp, &mbp_priv);
>>
>> I still think we can do the job without adding the .supported() method.
>> The following code is just an (untested) example:
>>
>> struct rte_mempool *
>> rte_pktmbuf_pool_create(const char *name, unsigned n,
>>     unsigned cache_size, uint16_t priv_size, uint16_t data_room_size,
>>     int socket_id)
>> {
>>     struct rte_mempool *mp;
>>     struct rte_pktmbuf_pool_private mbp_priv;
>>     unsigned elt_size;
>>     int ret;
>>     const char *ops[] = {
>>         RTE_MBUF_DEFAULT_MEMPOOL_OPS, "ring_mp_mc", NULL,
>>     };
>>     const char **op;
>>
>>     if (RTE_ALIGN(priv_size, RTE_MBUF_PRIV_ALIGN) != priv_size) {
>>         RTE_LOG(ERR, MBUF, "mbuf priv_size=%u is not aligned\n",
>>             priv_size);
>>         rte_errno = EINVAL;
>>         return NULL;
>>     }
>>     elt_size = sizeof(struct rte_mbuf) + (unsigned)priv_size +
>>         (unsigned)data_room_size;
>>     mbp_priv.mbuf_data_room_size = data_room_size;
>>     mbp_priv.mbuf_priv_size = priv_size;
>>
>>     for (op = &ops[0]; *op != NULL; op++) {
>>         mp = rte_mempool_create_empty(name, n, elt_size, cache_size,
>>             sizeof(struct rte_pktmbuf_pool_private), socket_id, 0);
>>         if (mp == NULL)
>>             return NULL;
>>
>>         ret = rte_mempool_set_ops_byname(mp, *op, NULL);
>>         if (ret != 0) {
>>             RTE_LOG(ERR, MBUF, "error setting mempool handler\n");
>>             rte_mempool_free(mp);
>>             if (ret == -ENOTSUP)
>>                 continue;
>>             rte_errno = -ret;
>>             return NULL;
>>         }
>>         rte_pktmbuf_pool_init(mp, &mbp_priv);
>>
>>         ret = rte_mempool_populate_default(mp);
>>         if (ret < 0) {
>>             rte_mempool_free(mp);
>>             if (ret == -ENOTSUP)
>>                 continue;
>>             rte_errno = -ret;
>>             return NULL;
>>         }
>>     }
>>
>>     rte_mempool_obj_iter(mp, rte_pktmbuf_init, NULL);
>>
>>     return mp;
>> }
>>
>>
> [Hemant]  This look fine to me. Please submit a patch for the same. 
> 
>>>>> I've just submitted an RFC, which I think is quite linked:
>>>>> http://dpdk.org/ml/archives/dev/2016-September/046974.html
>>>>> Assuming a new parameter "mempool_ops" is added to
>>>>> rte_pktmbuf_pool_create(), would it make sense to fallback to
>>>>> "ring_mp_mc"? What about just returning ENOTSUP? The application
>>>>> could do the job and decide which sw fallback to use.
>>>>
>>>> We ran into this issue when trying to run the standard DPDK examples
>>>> (l3fwd) in VM. Do you think, is it practical to add fallback handling
>>>> in each of the DPDK examples?
>>
>> OK. What is still unclear for me, is how the software is aware of the different
>> hardware-assisted handlers. Moreover, we could imagine more software
>> handlers, which could be used depending on the use case.
>>
>> I think this choice has to be made by the user or the application:
>>
>> - the application may want to use a specific (sw or hw) handler: in
>>   this case, it want to be notified if it fails, instead of having
>>   a quiet fallback to ring_mp_mc
>> - if several handlers are available, the application may want to
>>   try them in a specific order
>> - maybe some handlers will have some limitations with some
>>   configurations or driver? The application could decide to use
>>   a different handler according the configuration.
>>
>> So that's why I think this is an application decision.
>>
> [Hemant]  We should simplify it:  if the application has supplied the handler, it is application's responsibility to take care of failure. Only if the application want to use the default handler, the implementation can fallback.  The fallback handler can again be configurable. 

Honestly, I'm not very convinced that having a quiet fallback is
the proper solution: the application won't be notified that the
fallback occured.

I understand your patch solves your use-case, but my fear is that
we just integrate this minimal patch and never do the harder work
of solving all the use-cases. I think we need to give the tools to
the applications to control what occurs because we also need to solve
these issues:
- you have a hardware handler, but you want to use the software
  handler
- you have several software handlers, and you (as an administrator) or
  the application knows which one is the best to use in your case.

An alternative (which I don't like either) to solve your issue without
modifying the mbuf code is to have your own handler fallbacking to
ring_mp_mc.


>>>> Typically when someone is writing a application on host, he need not
>>>> worry non-availability of the hw offloaded mempool. He may also want
>>>> to run the same binary in virtual machine. In VM, it is not
>>>> guaranteed that hw offloaded mempools will be available.
>>
>> Running the same binary is of course a need. But if your VM does not provide
>> the same virtualized hardware than the host, I think the command line option
>> makes sense.
>>
>> AFAIU, on the host, you can use a hw mempool handler or a sw one, and on the
>> guest, only a sw one. So you need an option to select the behavior you want on
>> the host, without recompiling.
>>
>> I understand that modifying all the applications is not a good option either. I'm
>> thinking a a EAL parameter that would allow to configure a library (mbuf in this
>> case). Something like kernel boot options.
>> Example: testpmd -l 2,4 --opt mbuf.handler="ring_mp_mc" -- [app args]
>>
>> I don't know if this is feasible, this has to be discussed first, but what do you
>> think on the principle? The value could also be an ordered list if we want a
>> fallback, and this option could be overriden by the application before creating
>> the mbuf pool. There was some discussions about a kind of dpdk global
>> configuration some time ago.
>>
> [Hemant] I agree that command line option provide a better control in this case. 
> On the flipside, We need to be careful that we do not end up having too many command line options. 

Yes, we should avoid having too many command line arguments.
I think we should go in the direction of having a sort of key/value
database in EAL. It could be set by:
- a generic command line argument
- a config file (maybe later)
- the application through an API

Then, it would be used by:
- dpdk libraries
- the application (maybe)

This has been discussed some times, the latest was probably:
http://dpdk.org/ml/archives/dev/2016-June/040079.html

I think it would be a good tool for dpdk configurability, and it would
help to remove some compile-time options and maybe some eal options.
Unfortunately, I don't have the time to work on this at the moment.

Regards,
Olivier
  
Hemant Agrawal Dec. 8, 2016, 8:57 a.m. UTC | #7
Hi Olivier,
	Apology for a delayed response.

> -----Original Message-----
> From: Olivier Matz [mailto:olivier.matz@6wind.com]
> Sent: Tuesday, November 22, 2016 2:55 PM
> To: Hemant Agrawal <hemant.agrawal@nxp.com>
> Cc: dev@dpdk.org; jerin.jacob@caviumnetworks.com; david.hunt@intel.com
> Subject: Re: [PATCH v3 2/2] mempool: pktmbuf pool default fallback for
> mempool ops error
> 
> Hi Hemant,
> 
> Back on this topic, please see some comments below.
> 
> On 11/07/2016 01:30 PM, Hemant Agrawal wrote:
> > Hi Olivier,
> >
> >> -----Original Message-----
> >> From: Olivier Matz [mailto:olivier.matz@6wind.com]
> >> Sent: Friday, October 14, 2016 5:41 PM
> >>> On 9/22/2016 6:42 PM, Hemant Agrawal wrote:
> >>>> Hi Olivier
> >>>>
> >>>> On 9/19/2016 7:27 PM, Olivier Matz wrote:
> >>>>> Hi Hemant,
> >>>>>
> >>>>> On 09/16/2016 06:46 PM, Hemant Agrawal wrote:
> >>>>>> In the rte_pktmbuf_pool_create, if the default external mempool
> >>>>>> is not available, the implementation can default to "ring_mp_mc",
> >>>>>> which is an software implementation.
> >>>>>>
> >>>>>> Signed-off-by: Hemant Agrawal <hemant.agrawal@nxp.com>
> >>>>>> ---
> >>>>>> Changes in V3:
> >>>>>> * adding warning message to say that falling back to default sw
> >>>>>> pool
> >>>>>> ---
> >>>>>>  lib/librte_mbuf/rte_mbuf.c | 8 ++++++++
> >>>>>>  1 file changed, 8 insertions(+)
> >>>>>>
> >>>>>> diff --git a/lib/librte_mbuf/rte_mbuf.c
> >>>>>> b/lib/librte_mbuf/rte_mbuf.c index 4846b89..8ab0eb1 100644
> >>>>>> --- a/lib/librte_mbuf/rte_mbuf.c
> >>>>>> +++ b/lib/librte_mbuf/rte_mbuf.c
> >>>>>> @@ -176,6 +176,14 @@ rte_pktmbuf_pool_create(const char *name,
> >>>>>> unsigned n,
> >>>>>>
> >>>>>>      rte_errno = rte_mempool_set_ops_byname(mp,
> >>>>>>              RTE_MBUF_DEFAULT_MEMPOOL_OPS, NULL);
> >>>>>> +
> >>>>>> +    /* on error, try falling back to the software based default
> >>>>>> pool */
> >>>>>> +    if (rte_errno == -EOPNOTSUPP) {
> >>>>>> +        RTE_LOG(WARNING, MBUF, "Default HW Mempool not
> supported. "
> >>>>>> +            "falling back to sw mempool \"ring_mp_mc\"");
> >>>>>> +        rte_errno = rte_mempool_set_ops_byname(mp, "ring_mp_mc",
> >>>>>> NULL);
> >>>>>> +    }
> >>>>>> +
> >>>>>>      if (rte_errno != 0) {
> >>>>>>          RTE_LOG(ERR, MBUF, "error setting mempool handler\n");
> >>>>>>          return NULL;
> >>>>>>
> >>>>>
> >>>>> Without adding a new method ".supported()", the first call to
> >>>>> rte_mempool_populate() could return the same error ENOTSUP. In
> >>>>> this case, it is still possible to fallback.
> >>>>>
> >>>> It will be bit late.
> >>>>
> >>>> On failure, than we have to set the default ops and do a goto
> >>>> before rte_pktmbuf_pool_init(mp, &mbp_priv);
> >>
> >> I still think we can do the job without adding the .supported() method.
> >> The following code is just an (untested) example:
> >>
> >> struct rte_mempool *
> >> rte_pktmbuf_pool_create(const char *name, unsigned n,
> >>     unsigned cache_size, uint16_t priv_size, uint16_t data_room_size,
> >>     int socket_id)
> >> {
> >>     struct rte_mempool *mp;
> >>     struct rte_pktmbuf_pool_private mbp_priv;
> >>     unsigned elt_size;
> >>     int ret;
> >>     const char *ops[] = {
> >>         RTE_MBUF_DEFAULT_MEMPOOL_OPS, "ring_mp_mc", NULL,
> >>     };
> >>     const char **op;
> >>
> >>     if (RTE_ALIGN(priv_size, RTE_MBUF_PRIV_ALIGN) != priv_size) {
> >>         RTE_LOG(ERR, MBUF, "mbuf priv_size=%u is not aligned\n",
> >>             priv_size);
> >>         rte_errno = EINVAL;
> >>         return NULL;
> >>     }
> >>     elt_size = sizeof(struct rte_mbuf) + (unsigned)priv_size +
> >>         (unsigned)data_room_size;
> >>     mbp_priv.mbuf_data_room_size = data_room_size;
> >>     mbp_priv.mbuf_priv_size = priv_size;
> >>
> >>     for (op = &ops[0]; *op != NULL; op++) {
> >>         mp = rte_mempool_create_empty(name, n, elt_size, cache_size,
> >>             sizeof(struct rte_pktmbuf_pool_private), socket_id, 0);
> >>         if (mp == NULL)
> >>             return NULL;
> >>
> >>         ret = rte_mempool_set_ops_byname(mp, *op, NULL);
> >>         if (ret != 0) {
> >>             RTE_LOG(ERR, MBUF, "error setting mempool handler\n");
> >>             rte_mempool_free(mp);
> >>             if (ret == -ENOTSUP)
> >>                 continue;
> >>             rte_errno = -ret;
> >>             return NULL;
> >>         }
> >>         rte_pktmbuf_pool_init(mp, &mbp_priv);
> >>
> >>         ret = rte_mempool_populate_default(mp);
> >>         if (ret < 0) {
> >>             rte_mempool_free(mp);
> >>             if (ret == -ENOTSUP)
> >>                 continue;
> >>             rte_errno = -ret;
> >>             return NULL;
> >>         }
> >>     }
> >>
> >>     rte_mempool_obj_iter(mp, rte_pktmbuf_init, NULL);
> >>
> >>     return mp;
> >> }
> >>
> >>
> > [Hemant]  This look fine to me. Please submit a patch for the same.
> >
> >>>>> I've just submitted an RFC, which I think is quite linked:
> >>>>> http://dpdk.org/ml/archives/dev/2016-September/046974.html
> >>>>> Assuming a new parameter "mempool_ops" is added to
> >>>>> rte_pktmbuf_pool_create(), would it make sense to fallback to
> >>>>> "ring_mp_mc"? What about just returning ENOTSUP? The application
> >>>>> could do the job and decide which sw fallback to use.
> >>>>
> >>>> We ran into this issue when trying to run the standard DPDK
> >>>> examples
> >>>> (l3fwd) in VM. Do you think, is it practical to add fallback
> >>>> handling in each of the DPDK examples?
> >>
> >> OK. What is still unclear for me, is how the software is aware of the
> >> different hardware-assisted handlers. Moreover, we could imagine more
> >> software handlers, which could be used depending on the use case.
> >>
> >> I think this choice has to be made by the user or the application:
> >>
> >> - the application may want to use a specific (sw or hw) handler: in
> >>   this case, it want to be notified if it fails, instead of having
> >>   a quiet fallback to ring_mp_mc
> >> - if several handlers are available, the application may want to
> >>   try them in a specific order
> >> - maybe some handlers will have some limitations with some
> >>   configurations or driver? The application could decide to use
> >>   a different handler according the configuration.
> >>
> >> So that's why I think this is an application decision.
> >>
> > [Hemant]  We should simplify it:  if the application has supplied the handler, it
> is application's responsibility to take care of failure. Only if the application want
> to use the default handler, the implementation can fallback.  The fallback
> handler can again be configurable.
> 
> Honestly, I'm not very convinced that having a quiet fallback is the proper
> solution: the application won't be notified that the fallback occured.
> 
> I understand your patch solves your use-case, but my fear is that we just
> integrate this minimal patch and never do the harder work of solving all the use-
> cases. I think we need to give the tools to the applications to control what
> occurs because we also need to solve these issues:
> - you have a hardware handler, but you want to use the software
>   handler
> - you have several software handlers, and you (as an administrator) or
>   the application knows which one is the best to use in your case.
> 
> An alternative (which I don't like either) to solve your issue without modifying
> the mbuf code is to have your own handler fallbacking to ring_mp_mc.
> 
[Hemant] are you suggesting that we also configure a fallback handler in addition to default? 

I agree that we need to provide full flexibility to the applications, who are willing to implement the fallback mechanism. 

> 
> >>>> Typically when someone is writing a application on host, he need
> >>>> not worry non-availability of the hw offloaded mempool. He may also
> >>>> want to run the same binary in virtual machine. In VM, it is not
> >>>> guaranteed that hw offloaded mempools will be available.
> >>
> >> Running the same binary is of course a need. But if your VM does not
> >> provide the same virtualized hardware than the host, I think the
> >> command line option makes sense.
> >>
> >> AFAIU, on the host, you can use a hw mempool handler or a sw one, and
> >> on the guest, only a sw one. So you need an option to select the
> >> behavior you want on the host, without recompiling.
> >>
> >> I understand that modifying all the applications is not a good option
> >> either. I'm thinking a a EAL parameter that would allow to configure
> >> a library (mbuf in this case). Something like kernel boot options.
> >> Example: testpmd -l 2,4 --opt mbuf.handler="ring_mp_mc" -- [app args]
> >>
> >> I don't know if this is feasible, this has to be discussed first, but
> >> what do you think on the principle? The value could also be an
> >> ordered list if we want a fallback, and this option could be
> >> overriden by the application before creating the mbuf pool. There was
> >> some discussions about a kind of dpdk global configuration some time ago.
> >>
> > [Hemant] I agree that command line option provide a better control in this
> case.
> > On the flipside, We need to be careful that we do not end up having too many
> command line options.
> 
> Yes, we should avoid having too many command line arguments.
> I think we should go in the direction of having a sort of key/value database in
> EAL. It could be set by:
> - a generic command line argument
> - a config file (maybe later)
> - the application through an API
> 
> Then, it would be used by:
> - dpdk libraries
> - the application (maybe)
> 
> This has been discussed some times, the latest was probably:
> http://dpdk.org/ml/archives/dev/2016-June/040079.html
> 
> I think it would be a good tool for dpdk configurability, and it would help to
> remove some compile-time options and maybe some eal options.
> Unfortunately, I don't have the time to work on this at the moment.
> 
> Regards,
> Olivier
  

Patch

diff --git a/lib/librte_mbuf/rte_mbuf.c b/lib/librte_mbuf/rte_mbuf.c
index 4846b89..8ab0eb1 100644
--- a/lib/librte_mbuf/rte_mbuf.c
+++ b/lib/librte_mbuf/rte_mbuf.c
@@ -176,6 +176,14 @@  rte_pktmbuf_pool_create(const char *name, unsigned n,
 
 	rte_errno = rte_mempool_set_ops_byname(mp,
 			RTE_MBUF_DEFAULT_MEMPOOL_OPS, NULL);
+
+	/* on error, try falling back to the software based default pool */
+	if (rte_errno == -EOPNOTSUPP) {
+		RTE_LOG(WARNING, MBUF, "Default HW Mempool not supported. "
+			"falling back to sw mempool \"ring_mp_mc\"");
+		rte_errno = rte_mempool_set_ops_byname(mp, "ring_mp_mc", NULL);
+	}
+
 	if (rte_errno != 0) {
 		RTE_LOG(ERR, MBUF, "error setting mempool handler\n");
 		return NULL;