[dpdk-dev] Revert "bonding: use existing enslaved device queues"

Message ID 1473251290-22053-1-git-send-email-i.maximets@samsung.com (mailing list archive)
State Superseded, archived
Delegated to: Bruce Richardson
Headers

Commit Message

Ilya Maximets Sept. 7, 2016, 12:28 p.m. UTC
  This reverts commit 5b7bb2bda5519b7800f814df64d4e015282140e5.

It is necessary to reconfigure all queues every time because configuration
can be changed.

For example, if we're reconfiguring bonding device with new memory pool,
already configured queues will still use the old one. And if the old
mempool be freed, application likely will panic in attempt to use
freed mempool.

This happens when we use the bonding device with OVS 2.6 while MTU
reconfiguration:

PANIC in rte_mempool_get_ops():
assert "(ops_index >= 0) && (ops_index < RTE_MEMPOOL_MAX_OPS_IDX)" failed

Cc: <stable@dpdk.org>
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
---
 drivers/net/bonding/rte_eth_bond_pmd.c | 10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)
  

Comments

Ilya Maximets Sept. 16, 2016, 5:03 a.m. UTC | #1
Ping.

Best regards, Ilya Maximets.

On 07.09.2016 15:28, Ilya Maximets wrote:
> This reverts commit 5b7bb2bda5519b7800f814df64d4e015282140e5.
> 
> It is necessary to reconfigure all queues every time because configuration
> can be changed.
> 
> For example, if we're reconfiguring bonding device with new memory pool,
> already configured queues will still use the old one. And if the old
> mempool be freed, application likely will panic in attempt to use
> freed mempool.
> 
> This happens when we use the bonding device with OVS 2.6 while MTU
> reconfiguration:
> 
> PANIC in rte_mempool_get_ops():
> assert "(ops_index >= 0) && (ops_index < RTE_MEMPOOL_MAX_OPS_IDX)" failed
> 
> Cc: <stable@dpdk.org>
> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
> ---
>  drivers/net/bonding/rte_eth_bond_pmd.c | 10 ++--------
>  1 file changed, 2 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
> index b20a272..eb5b6d1 100644
> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
> @@ -1305,8 +1305,6 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
>  	struct bond_rx_queue *bd_rx_q;
>  	struct bond_tx_queue *bd_tx_q;
>  
> -	uint16_t old_nb_tx_queues = slave_eth_dev->data->nb_tx_queues;
> -	uint16_t old_nb_rx_queues = slave_eth_dev->data->nb_rx_queues;
>  	int errval;
>  	uint16_t q_id;
>  
> @@ -1347,9 +1345,7 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
>  	}
>  
>  	/* Setup Rx Queues */
> -	/* Use existing queues, if any */
> -	for (q_id = old_nb_rx_queues;
> -	     q_id < bonded_eth_dev->data->nb_rx_queues; q_id++) {
> +	for (q_id = 0; q_id < bonded_eth_dev->data->nb_rx_queues; q_id++) {
>  		bd_rx_q = (struct bond_rx_queue *)bonded_eth_dev->data->rx_queues[q_id];
>  
>  		errval = rte_eth_rx_queue_setup(slave_eth_dev->data->port_id, q_id,
> @@ -1365,9 +1361,7 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
>  	}
>  
>  	/* Setup Tx Queues */
> -	/* Use existing queues, if any */
> -	for (q_id = old_nb_tx_queues;
> -	     q_id < bonded_eth_dev->data->nb_tx_queues; q_id++) {
> +	for (q_id = 0; q_id < bonded_eth_dev->data->nb_tx_queues; q_id++) {
>  		bd_tx_q = (struct bond_tx_queue *)bonded_eth_dev->data->tx_queues[q_id];
>  
>  		errval = rte_eth_tx_queue_setup(slave_eth_dev->data->port_id, q_id,
>
  
Doherty, Declan Oct. 6, 2016, 2:32 p.m. UTC | #2
On 07/09/16 13:28, Ilya Maximets wrote:
> This reverts commit 5b7bb2bda5519b7800f814df64d4e015282140e5.
>
> It is necessary to reconfigure all queues every time because configuration
> can be changed.
>

Hey Ilya, this makes sense. I guess this was my original intention but I 
missed this case in my review of the change.

> For example, if we're reconfiguring bonding device with new memory pool,
> already configured queues will still use the old one. And if the old
> mempool be freed, application likely will panic in attempt to use
> freed mempool.
>
> This happens when we use the bonding device with OVS 2.6 while MTU
> reconfiguration:
>
> PANIC in rte_mempool_get_ops():
> assert "(ops_index >= 0) && (ops_index < RTE_MEMPOOL_MAX_OPS_IDX)" failed
>
> Cc: <stable@dpdk.org>
> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
> ---
>  drivers/net/bonding/rte_eth_bond_pmd.c | 10 ++--------
>  1 file changed, 2 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
> index b20a272..eb5b6d1 100644
> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
> @@ -1305,8 +1305,6 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
>  	struct bond_rx_queue *bd_rx_q;
>  	struct bond_tx_queue *bd_tx_q;
>
> -	uint16_t old_nb_tx_queues = slave_eth_dev->data->nb_tx_queues;
> -	uint16_t old_nb_rx_queues = slave_eth_dev->data->nb_rx_queues;
>  	int errval;
>  	uint16_t q_id;
>
> @@ -1347,9 +1345,7 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
>  	}
>
>  	/* Setup Rx Queues */
> -	/* Use existing queues, if any */
> -	for (q_id = old_nb_rx_queues;
> -	     q_id < bonded_eth_dev->data->nb_rx_queues; q_id++) {
> +	for (q_id = 0; q_id < bonded_eth_dev->data->nb_rx_queues; q_id++) {
>  		bd_rx_q = (struct bond_rx_queue *)bonded_eth_dev->data->rx_queues[q_id];
>
>  		errval = rte_eth_rx_queue_setup(slave_eth_dev->data->port_id, q_id,
> @@ -1365,9 +1361,7 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
>  	}
>
>  	/* Setup Tx Queues */
> -	/* Use existing queues, if any */
> -	for (q_id = old_nb_tx_queues;
> -	     q_id < bonded_eth_dev->data->nb_tx_queues; q_id++) {
> +	for (q_id = 0; q_id < bonded_eth_dev->data->nb_tx_queues; q_id++) {
>  		bd_tx_q = (struct bond_tx_queue *)bonded_eth_dev->data->tx_queues[q_id];
>
>  		errval = rte_eth_tx_queue_setup(slave_eth_dev->data->port_id, q_id,
>

Acked-by: Declan Doherty <declan.doherty@intel.com>
  
Eric Kinzie Oct. 7, 2016, 2:02 a.m. UTC | #3
On Wed Sep 07 15:28:10 +0300 2016, Ilya Maximets wrote:
> This reverts commit 5b7bb2bda5519b7800f814df64d4e015282140e5.
> 
> It is necessary to reconfigure all queues every time because configuration
> can be changed.
> 
> For example, if we're reconfiguring bonding device with new memory pool,
> already configured queues will still use the old one. And if the old
> mempool be freed, application likely will panic in attempt to use
> freed mempool.
> 
> This happens when we use the bonding device with OVS 2.6 while MTU
> reconfiguration:
> 
> PANIC in rte_mempool_get_ops():
> assert "(ops_index >= 0) && (ops_index < RTE_MEMPOOL_MAX_OPS_IDX)" failed
> 
> Cc: <stable@dpdk.org>
> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
> ---
>  drivers/net/bonding/rte_eth_bond_pmd.c | 10 ++--------
>  1 file changed, 2 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
> index b20a272..eb5b6d1 100644
> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
> @@ -1305,8 +1305,6 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
>  	struct bond_rx_queue *bd_rx_q;
>  	struct bond_tx_queue *bd_tx_q;
>  
> -	uint16_t old_nb_tx_queues = slave_eth_dev->data->nb_tx_queues;
> -	uint16_t old_nb_rx_queues = slave_eth_dev->data->nb_rx_queues;
>  	int errval;
>  	uint16_t q_id;
>  
> @@ -1347,9 +1345,7 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
>  	}
>  
>  	/* Setup Rx Queues */
> -	/* Use existing queues, if any */
> -	for (q_id = old_nb_rx_queues;
> -	     q_id < bonded_eth_dev->data->nb_rx_queues; q_id++) {
> +	for (q_id = 0; q_id < bonded_eth_dev->data->nb_rx_queues; q_id++) {
>  		bd_rx_q = (struct bond_rx_queue *)bonded_eth_dev->data->rx_queues[q_id];
>  
>  		errval = rte_eth_rx_queue_setup(slave_eth_dev->data->port_id, q_id,
> @@ -1365,9 +1361,7 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
>  	}
>  
>  	/* Setup Tx Queues */
> -	/* Use existing queues, if any */
> -	for (q_id = old_nb_tx_queues;
> -	     q_id < bonded_eth_dev->data->nb_tx_queues; q_id++) {
> +	for (q_id = 0; q_id < bonded_eth_dev->data->nb_tx_queues; q_id++) {
>  		bd_tx_q = (struct bond_tx_queue *)bonded_eth_dev->data->tx_queues[q_id];
>  
>  		errval = rte_eth_tx_queue_setup(slave_eth_dev->data->port_id, q_id,
> -- 
> 2.7.4
> 

NAK

There are still some users of this code.  Let's give them a chance to
comment before removing it.


Thanks,

Eric
  
Ilya Maximets Oct. 12, 2016, 1:24 p.m. UTC | #4
On 07.10.2016 05:02, Eric Kinzie wrote:
> On Wed Sep 07 15:28:10 +0300 2016, Ilya Maximets wrote:
>> This reverts commit 5b7bb2bda5519b7800f814df64d4e015282140e5.
>>
>> It is necessary to reconfigure all queues every time because configuration
>> can be changed.
>>
>> For example, if we're reconfiguring bonding device with new memory pool,
>> already configured queues will still use the old one. And if the old
>> mempool be freed, application likely will panic in attempt to use
>> freed mempool.
>>
>> This happens when we use the bonding device with OVS 2.6 while MTU
>> reconfiguration:
>>
>> PANIC in rte_mempool_get_ops():
>> assert "(ops_index >= 0) && (ops_index < RTE_MEMPOOL_MAX_OPS_IDX)" failed
>>
>> Cc: <stable@dpdk.org>
>> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
>> ---
>>  drivers/net/bonding/rte_eth_bond_pmd.c | 10 ++--------
>>  1 file changed, 2 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
>> index b20a272..eb5b6d1 100644
>> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
>> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
>> @@ -1305,8 +1305,6 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
>>  	struct bond_rx_queue *bd_rx_q;
>>  	struct bond_tx_queue *bd_tx_q;
>>  
>> -	uint16_t old_nb_tx_queues = slave_eth_dev->data->nb_tx_queues;
>> -	uint16_t old_nb_rx_queues = slave_eth_dev->data->nb_rx_queues;
>>  	int errval;
>>  	uint16_t q_id;
>>  
>> @@ -1347,9 +1345,7 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
>>  	}
>>  
>>  	/* Setup Rx Queues */
>> -	/* Use existing queues, if any */
>> -	for (q_id = old_nb_rx_queues;
>> -	     q_id < bonded_eth_dev->data->nb_rx_queues; q_id++) {
>> +	for (q_id = 0; q_id < bonded_eth_dev->data->nb_rx_queues; q_id++) {
>>  		bd_rx_q = (struct bond_rx_queue *)bonded_eth_dev->data->rx_queues[q_id];
>>  
>>  		errval = rte_eth_rx_queue_setup(slave_eth_dev->data->port_id, q_id,
>> @@ -1365,9 +1361,7 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
>>  	}
>>  
>>  	/* Setup Tx Queues */
>> -	/* Use existing queues, if any */
>> -	for (q_id = old_nb_tx_queues;
>> -	     q_id < bonded_eth_dev->data->nb_tx_queues; q_id++) {
>> +	for (q_id = 0; q_id < bonded_eth_dev->data->nb_tx_queues; q_id++) {
>>  		bd_tx_q = (struct bond_tx_queue *)bonded_eth_dev->data->tx_queues[q_id];
>>  
>>  		errval = rte_eth_tx_queue_setup(slave_eth_dev->data->port_id, q_id,
>> -- 
>> 2.7.4
>>
> 
> NAK
> 
> There are still some users of this code.  Let's give them a chance to
> comment before removing it.

Hi Eric,

Are these users in CC-list? If not, could you, please, add them?
This patch awaits in mail-list already more than a month. I think, it's enough
time period for all who wants to say something. Patch fixes a real bug that
prevent using of DPDK bonding in all applications that reconfigures devices
in runtime including OVS.

Best regards, Ilya Maximets.
  
Bruce Richardson Oct. 12, 2016, 3:24 p.m. UTC | #5
On Wed, Oct 12, 2016 at 04:24:54PM +0300, Ilya Maximets wrote:
> On 07.10.2016 05:02, Eric Kinzie wrote:
> > On Wed Sep 07 15:28:10 +0300 2016, Ilya Maximets wrote:
> >> This reverts commit 5b7bb2bda5519b7800f814df64d4e015282140e5.
> >>
> >> It is necessary to reconfigure all queues every time because configuration
> >> can be changed.
> >>
> >> For example, if we're reconfiguring bonding device with new memory pool,
> >> already configured queues will still use the old one. And if the old
> >> mempool be freed, application likely will panic in attempt to use
> >> freed mempool.
> >>
> >> This happens when we use the bonding device with OVS 2.6 while MTU
> >> reconfiguration:
> >>
> >> PANIC in rte_mempool_get_ops():
> >> assert "(ops_index >= 0) && (ops_index < RTE_MEMPOOL_MAX_OPS_IDX)" failed
> >>
> >> Cc: <stable@dpdk.org>
> >> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
> >> ---
> >>  drivers/net/bonding/rte_eth_bond_pmd.c | 10 ++--------
> >>  1 file changed, 2 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
> >> index b20a272..eb5b6d1 100644
> >> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
> >> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
> >> @@ -1305,8 +1305,6 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
> >>  	struct bond_rx_queue *bd_rx_q;
> >>  	struct bond_tx_queue *bd_tx_q;
> >>  
> >> -	uint16_t old_nb_tx_queues = slave_eth_dev->data->nb_tx_queues;
> >> -	uint16_t old_nb_rx_queues = slave_eth_dev->data->nb_rx_queues;
> >>  	int errval;
> >>  	uint16_t q_id;
> >>  
> >> @@ -1347,9 +1345,7 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
> >>  	}
> >>  
> >>  	/* Setup Rx Queues */
> >> -	/* Use existing queues, if any */
> >> -	for (q_id = old_nb_rx_queues;
> >> -	     q_id < bonded_eth_dev->data->nb_rx_queues; q_id++) {
> >> +	for (q_id = 0; q_id < bonded_eth_dev->data->nb_rx_queues; q_id++) {
> >>  		bd_rx_q = (struct bond_rx_queue *)bonded_eth_dev->data->rx_queues[q_id];
> >>  
> >>  		errval = rte_eth_rx_queue_setup(slave_eth_dev->data->port_id, q_id,
> >> @@ -1365,9 +1361,7 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
> >>  	}
> >>  
> >>  	/* Setup Tx Queues */
> >> -	/* Use existing queues, if any */
> >> -	for (q_id = old_nb_tx_queues;
> >> -	     q_id < bonded_eth_dev->data->nb_tx_queues; q_id++) {
> >> +	for (q_id = 0; q_id < bonded_eth_dev->data->nb_tx_queues; q_id++) {
> >>  		bd_tx_q = (struct bond_tx_queue *)bonded_eth_dev->data->tx_queues[q_id];
> >>  
> >>  		errval = rte_eth_tx_queue_setup(slave_eth_dev->data->port_id, q_id,
> >> -- 
> >> 2.7.4
> >>
> > 
> > NAK
> > 
> > There are still some users of this code.  Let's give them a chance to
> > comment before removing it.
> 
> Hi Eric,
> 
> Are these users in CC-list? If not, could you, please, add them?
> This patch awaits in mail-list already more than a month. I think, it's enough
> time period for all who wants to say something. Patch fixes a real bug that
> prevent using of DPDK bonding in all applications that reconfigures devices
> in runtime including OVS.
> 
Agreed.

Eric, does reverting this patch cause you problems directly, or is your concern
just with regards to potential impact to others?

Thanks,
/Bruce
  
Eric Kinzie Oct. 13, 2016, 11:37 p.m. UTC | #6
On Wed Oct 12 16:24:21 +0100 2016, Bruce Richardson wrote:
> On Wed, Oct 12, 2016 at 04:24:54PM +0300, Ilya Maximets wrote:
> > On 07.10.2016 05:02, Eric Kinzie wrote:
> > > On Wed Sep 07 15:28:10 +0300 2016, Ilya Maximets wrote:
> > >> This reverts commit 5b7bb2bda5519b7800f814df64d4e015282140e5.
> > >>
> > >> It is necessary to reconfigure all queues every time because configuration
> > >> can be changed.
> > >>
> > >> For example, if we're reconfiguring bonding device with new memory pool,
> > >> already configured queues will still use the old one. And if the old
> > >> mempool be freed, application likely will panic in attempt to use
> > >> freed mempool.
> > >>
> > >> This happens when we use the bonding device with OVS 2.6 while MTU
> > >> reconfiguration:
> > >>
> > >> PANIC in rte_mempool_get_ops():
> > >> assert "(ops_index >= 0) && (ops_index < RTE_MEMPOOL_MAX_OPS_IDX)" failed
> > >>
> > >> Cc: <stable@dpdk.org>
> > >> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
> > >> ---
> > >>  drivers/net/bonding/rte_eth_bond_pmd.c | 10 ++--------
> > >>  1 file changed, 2 insertions(+), 8 deletions(-)
> > >>
> > >> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
> > >> index b20a272..eb5b6d1 100644
> > >> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
> > >> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
> > >> @@ -1305,8 +1305,6 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
> > >>  	struct bond_rx_queue *bd_rx_q;
> > >>  	struct bond_tx_queue *bd_tx_q;
> > >>  
> > >> -	uint16_t old_nb_tx_queues = slave_eth_dev->data->nb_tx_queues;
> > >> -	uint16_t old_nb_rx_queues = slave_eth_dev->data->nb_rx_queues;
> > >>  	int errval;
> > >>  	uint16_t q_id;
> > >>  
> > >> @@ -1347,9 +1345,7 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
> > >>  	}
> > >>  
> > >>  	/* Setup Rx Queues */
> > >> -	/* Use existing queues, if any */
> > >> -	for (q_id = old_nb_rx_queues;
> > >> -	     q_id < bonded_eth_dev->data->nb_rx_queues; q_id++) {
> > >> +	for (q_id = 0; q_id < bonded_eth_dev->data->nb_rx_queues; q_id++) {
> > >>  		bd_rx_q = (struct bond_rx_queue *)bonded_eth_dev->data->rx_queues[q_id];
> > >>  
> > >>  		errval = rte_eth_rx_queue_setup(slave_eth_dev->data->port_id, q_id,
> > >> @@ -1365,9 +1361,7 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
> > >>  	}
> > >>  
> > >>  	/* Setup Tx Queues */
> > >> -	/* Use existing queues, if any */
> > >> -	for (q_id = old_nb_tx_queues;
> > >> -	     q_id < bonded_eth_dev->data->nb_tx_queues; q_id++) {
> > >> +	for (q_id = 0; q_id < bonded_eth_dev->data->nb_tx_queues; q_id++) {
> > >>  		bd_tx_q = (struct bond_tx_queue *)bonded_eth_dev->data->tx_queues[q_id];
> > >>  
> > >>  		errval = rte_eth_tx_queue_setup(slave_eth_dev->data->port_id, q_id,
> > >> -- 
> > >> 2.7.4
> > >>
> > > 
> > > NAK
> > > 
> > > There are still some users of this code.  Let's give them a chance to
> > > comment before removing it.
> > 
> > Hi Eric,
> > 
> > Are these users in CC-list? If not, could you, please, add them?
> > This patch awaits in mail-list already more than a month. I think, it's enough
> > time period for all who wants to say something. Patch fixes a real bug that
> > prevent using of DPDK bonding in all applications that reconfigures devices
> > in runtime including OVS.
> > 
> Agreed.
> 
> Eric, does reverting this patch cause you problems directly, or is your concern
> just with regards to potential impact to others?
> 
> Thanks,
> /Bruce

This won't impact me directly.  The users are CCed (different thread)
and I haven't seen any comment, so I no longer have any objection to
reverting this change.

Eric
  
Jan Blunck Oct. 18, 2016, 12:28 p.m. UTC | #7
If the application already configured queues the PMD should not
silently claim ownership and reset them.

What exactly is the problem when changing MTU? This works fine from
what I can tell.

Cheers,
Jan

On Wed, Sep 7, 2016 at 2:28 PM, Ilya Maximets <i.maximets@samsung.com> wrote:
> This reverts commit 5b7bb2bda5519b7800f814df64d4e015282140e5.
>
> It is necessary to reconfigure all queues every time because configuration
> can be changed.
>
> For example, if we're reconfiguring bonding device with new memory pool,
> already configured queues will still use the old one. And if the old
> mempool be freed, application likely will panic in attempt to use
> freed mempool.
>
> This happens when we use the bonding device with OVS 2.6 while MTU
> reconfiguration:
>
> PANIC in rte_mempool_get_ops():
> assert "(ops_index >= 0) && (ops_index < RTE_MEMPOOL_MAX_OPS_IDX)" failed
>
> Cc: <stable@dpdk.org>
> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
> ---
>  drivers/net/bonding/rte_eth_bond_pmd.c | 10 ++--------
>  1 file changed, 2 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
> index b20a272..eb5b6d1 100644
> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
> @@ -1305,8 +1305,6 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
>         struct bond_rx_queue *bd_rx_q;
>         struct bond_tx_queue *bd_tx_q;
>
> -       uint16_t old_nb_tx_queues = slave_eth_dev->data->nb_tx_queues;
> -       uint16_t old_nb_rx_queues = slave_eth_dev->data->nb_rx_queues;
>         int errval;
>         uint16_t q_id;
>
> @@ -1347,9 +1345,7 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
>         }
>
>         /* Setup Rx Queues */
> -       /* Use existing queues, if any */
> -       for (q_id = old_nb_rx_queues;
> -            q_id < bonded_eth_dev->data->nb_rx_queues; q_id++) {
> +       for (q_id = 0; q_id < bonded_eth_dev->data->nb_rx_queues; q_id++) {
>                 bd_rx_q = (struct bond_rx_queue *)bonded_eth_dev->data->rx_queues[q_id];
>
>                 errval = rte_eth_rx_queue_setup(slave_eth_dev->data->port_id, q_id,
> @@ -1365,9 +1361,7 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
>         }
>
>         /* Setup Tx Queues */
> -       /* Use existing queues, if any */
> -       for (q_id = old_nb_tx_queues;
> -            q_id < bonded_eth_dev->data->nb_tx_queues; q_id++) {
> +       for (q_id = 0; q_id < bonded_eth_dev->data->nb_tx_queues; q_id++) {
>                 bd_tx_q = (struct bond_tx_queue *)bonded_eth_dev->data->tx_queues[q_id];
>
>                 errval = rte_eth_tx_queue_setup(slave_eth_dev->data->port_id, q_id,
> --
> 2.7.4
>
  
Ilya Maximets Oct. 18, 2016, 12:49 p.m. UTC | #8
On 18.10.2016 15:28, Jan Blunck wrote:
> If the application already configured queues the PMD should not
> silently claim ownership and reset them.
> 
> What exactly is the problem when changing MTU? This works fine from
> what I can tell.

Following scenario leads to APP PANIC:

	1. mempool_1 = rte_mempool_create()
	2. rte_eth_rx_queue_setup(bond0, ..., mempool_1);
	3. rte_eth_dev_start(bond0);
	4. mempool_2 = rte_mempool_create();
	5. rte_eth_dev_stop(bond0);
	6. rte_eth_rx_queue_setup(bond0, ..., mempool_2);
	7. rte_eth_dev_start(bond0);
	* RX queues still use 'mempool_1' because reconfiguration doesn't affect them. *
	8. rte_mempool_free(mempool_1);
	9. On any rx operation we'll get PANIC because of using freed 'mempool_1':
	 PANIC in rte_mempool_get_ops():
	 assert "(ops_index >= 0) && (ops_index < RTE_MEMPOOL_MAX_OPS_IDX)" failed

You may just start OVS 2.6 with DPDK bonding device and attempt to change MTU via 'mtu_request'.
Bug is easily reproducible.

Best regards, Ilya Maximets.

> 
> On Wed, Sep 7, 2016 at 2:28 PM, Ilya Maximets <i.maximets@samsung.com> wrote:
>> This reverts commit 5b7bb2bda5519b7800f814df64d4e015282140e5.
>>
>> It is necessary to reconfigure all queues every time because configuration
>> can be changed.
>>
>> For example, if we're reconfiguring bonding device with new memory pool,
>> already configured queues will still use the old one. And if the old
>> mempool be freed, application likely will panic in attempt to use
>> freed mempool.
>>
>> This happens when we use the bonding device with OVS 2.6 while MTU
>> reconfiguration:
>>
>> PANIC in rte_mempool_get_ops():
>> assert "(ops_index >= 0) && (ops_index < RTE_MEMPOOL_MAX_OPS_IDX)" failed
>>
>> Cc: <stable@dpdk.org>
>> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
>> ---
>>  drivers/net/bonding/rte_eth_bond_pmd.c | 10 ++--------
>>  1 file changed, 2 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
>> index b20a272..eb5b6d1 100644
>> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
>> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
>> @@ -1305,8 +1305,6 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
>>         struct bond_rx_queue *bd_rx_q;
>>         struct bond_tx_queue *bd_tx_q;
>>
>> -       uint16_t old_nb_tx_queues = slave_eth_dev->data->nb_tx_queues;
>> -       uint16_t old_nb_rx_queues = slave_eth_dev->data->nb_rx_queues;
>>         int errval;
>>         uint16_t q_id;
>>
>> @@ -1347,9 +1345,7 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
>>         }
>>
>>         /* Setup Rx Queues */
>> -       /* Use existing queues, if any */
>> -       for (q_id = old_nb_rx_queues;
>> -            q_id < bonded_eth_dev->data->nb_rx_queues; q_id++) {
>> +       for (q_id = 0; q_id < bonded_eth_dev->data->nb_rx_queues; q_id++) {
>>                 bd_rx_q = (struct bond_rx_queue *)bonded_eth_dev->data->rx_queues[q_id];
>>
>>                 errval = rte_eth_rx_queue_setup(slave_eth_dev->data->port_id, q_id,
>> @@ -1365,9 +1361,7 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
>>         }
>>
>>         /* Setup Tx Queues */
>> -       /* Use existing queues, if any */
>> -       for (q_id = old_nb_tx_queues;
>> -            q_id < bonded_eth_dev->data->nb_tx_queues; q_id++) {
>> +       for (q_id = 0; q_id < bonded_eth_dev->data->nb_tx_queues; q_id++) {
>>                 bd_tx_q = (struct bond_tx_queue *)bonded_eth_dev->data->tx_queues[q_id];
>>
>>                 errval = rte_eth_tx_queue_setup(slave_eth_dev->data->port_id, q_id,
>> --
>> 2.7.4
>>
> 
> 
>
  
Jan Blunck Oct. 18, 2016, 3:19 p.m. UTC | #9
On Tue, Oct 18, 2016 at 2:49 PM, Ilya Maximets <i.maximets@samsung.com> wrote:
> On 18.10.2016 15:28, Jan Blunck wrote:
>> If the application already configured queues the PMD should not
>> silently claim ownership and reset them.
>>
>> What exactly is the problem when changing MTU? This works fine from
>> what I can tell.
>
> Following scenario leads to APP PANIC:
>
>         1. mempool_1 = rte_mempool_create()
>         2. rte_eth_rx_queue_setup(bond0, ..., mempool_1);
>         3. rte_eth_dev_start(bond0);
>         4. mempool_2 = rte_mempool_create();
>         5. rte_eth_dev_stop(bond0);
>         6. rte_eth_rx_queue_setup(bond0, ..., mempool_2);
>         7. rte_eth_dev_start(bond0);
>         * RX queues still use 'mempool_1' because reconfiguration doesn't affect them. *
>         8. rte_mempool_free(mempool_1);
>         9. On any rx operation we'll get PANIC because of using freed 'mempool_1':
>          PANIC in rte_mempool_get_ops():
>          assert "(ops_index >= 0) && (ops_index < RTE_MEMPOOL_MAX_OPS_IDX)" failed
>
> You may just start OVS 2.6 with DPDK bonding device and attempt to change MTU via 'mtu_request'.
> Bug is easily reproducible.
>

I see. I'm not 100% that this is expected to work without leaking the
driver's queues though. The driver is allowed to do allocations in
its rx_queue_setup() function that are being freed via
rx_queue_release() later. But rx_queue_release() is only called if you
reconfigure the
device with 0 queues. From what I understand there is no other way to
reconfigure a device to use another mempool.

But ... even that wouldn't work with the bonding driver right now: the
bonding master only configures the slaves during startup. I can put
that on my todo list.

Coming back to your original problem: changing the MTU for the bond
does work through rte_eth_dev_set_mtu() for slaves supporting that. In
any other case you could (re-)configure rxmode.max_rx_pkt_len (and
jumbo_frame / enable_scatter accordingly). This does work without a
call to rte_eth_rx_queue_setup().

Hope this helps,
Jan

> Best regards, Ilya Maximets.
>
>>
>> On Wed, Sep 7, 2016 at 2:28 PM, Ilya Maximets <i.maximets@samsung.com> wrote:
>>> This reverts commit 5b7bb2bda5519b7800f814df64d4e015282140e5.
>>>
>>> It is necessary to reconfigure all queues every time because configuration
>>> can be changed.
>>>
>>> For example, if we're reconfiguring bonding device with new memory pool,
>>> already configured queues will still use the old one. And if the old
>>> mempool be freed, application likely will panic in attempt to use
>>> freed mempool.
>>>
>>> This happens when we use the bonding device with OVS 2.6 while MTU
>>> reconfiguration:
>>>
>>> PANIC in rte_mempool_get_ops():
>>> assert "(ops_index >= 0) && (ops_index < RTE_MEMPOOL_MAX_OPS_IDX)" failed
>>>
>>> Cc: <stable@dpdk.org>
>>> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
>>> ---
>>>  drivers/net/bonding/rte_eth_bond_pmd.c | 10 ++--------
>>>  1 file changed, 2 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
>>> index b20a272..eb5b6d1 100644
>>> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
>>> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
>>> @@ -1305,8 +1305,6 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
>>>         struct bond_rx_queue *bd_rx_q;
>>>         struct bond_tx_queue *bd_tx_q;
>>>
>>> -       uint16_t old_nb_tx_queues = slave_eth_dev->data->nb_tx_queues;
>>> -       uint16_t old_nb_rx_queues = slave_eth_dev->data->nb_rx_queues;
>>>         int errval;
>>>         uint16_t q_id;
>>>
>>> @@ -1347,9 +1345,7 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
>>>         }
>>>
>>>         /* Setup Rx Queues */
>>> -       /* Use existing queues, if any */
>>> -       for (q_id = old_nb_rx_queues;
>>> -            q_id < bonded_eth_dev->data->nb_rx_queues; q_id++) {
>>> +       for (q_id = 0; q_id < bonded_eth_dev->data->nb_rx_queues; q_id++) {
>>>                 bd_rx_q = (struct bond_rx_queue *)bonded_eth_dev->data->rx_queues[q_id];
>>>
>>>                 errval = rte_eth_rx_queue_setup(slave_eth_dev->data->port_id, q_id,
>>> @@ -1365,9 +1361,7 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
>>>         }
>>>
>>>         /* Setup Tx Queues */
>>> -       /* Use existing queues, if any */
>>> -       for (q_id = old_nb_tx_queues;
>>> -            q_id < bonded_eth_dev->data->nb_tx_queues; q_id++) {
>>> +       for (q_id = 0; q_id < bonded_eth_dev->data->nb_tx_queues; q_id++) {
>>>                 bd_tx_q = (struct bond_tx_queue *)bonded_eth_dev->data->tx_queues[q_id];
>>>
>>>                 errval = rte_eth_tx_queue_setup(slave_eth_dev->data->port_id, q_id,
>>> --
>>> 2.7.4
>>>
>>
>>
>>
  
Ilya Maximets Oct. 19, 2016, 9:47 a.m. UTC | #10
On 18.10.2016 18:19, Jan Blunck wrote:
> On Tue, Oct 18, 2016 at 2:49 PM, Ilya Maximets <i.maximets@samsung.com> wrote:
>> On 18.10.2016 15:28, Jan Blunck wrote:
>>> If the application already configured queues the PMD should not
>>> silently claim ownership and reset them.
>>>
>>> What exactly is the problem when changing MTU? This works fine from
>>> what I can tell.
>>
>> Following scenario leads to APP PANIC:
>>
>>         1. mempool_1 = rte_mempool_create()
>>         2. rte_eth_rx_queue_setup(bond0, ..., mempool_1);
>>         3. rte_eth_dev_start(bond0);
>>         4. mempool_2 = rte_mempool_create();
>>         5. rte_eth_dev_stop(bond0);
>>         6. rte_eth_rx_queue_setup(bond0, ..., mempool_2);
>>         7. rte_eth_dev_start(bond0);
>>         * RX queues still use 'mempool_1' because reconfiguration doesn't affect them. *
>>         8. rte_mempool_free(mempool_1);
>>         9. On any rx operation we'll get PANIC because of using freed 'mempool_1':
>>          PANIC in rte_mempool_get_ops():
>>          assert "(ops_index >= 0) && (ops_index < RTE_MEMPOOL_MAX_OPS_IDX)" failed
>>
>> You may just start OVS 2.6 with DPDK bonding device and attempt to change MTU via 'mtu_request'.
>> Bug is easily reproducible.
>>
> 
> I see. I'm not 100% that this is expected to work without leaking the
> driver's queues though. The driver is allowed to do allocations in
> its rx_queue_setup() function that are being freed via
> rx_queue_release() later. But rx_queue_release() is only called if you
> reconfigure the
> device with 0 queues. From what I understand there is no other way to
> reconfigure a device to use another mempool.
> 
> But ... even that wouldn't work with the bonding driver right now: the
> bonding master only configures the slaves during startup. I can put
> that on my todo list.
> 
> Coming back to your original problem: changing the MTU for the bond
> does work through rte_eth_dev_set_mtu() for slaves supporting that. In
> any other case you could (re-)configure rxmode.max_rx_pkt_len (and
> jumbo_frame / enable_scatter accordingly). This does work without a
> call to rte_eth_rx_queue_setup().

Thanks for suggestion, but using of rte_eth_dev_set_mtu() without
reconfiguration will require to have mempools with huge mbufs (9KB)
for all ports from the start. This is unacceptable because leads to
significant performance regressions because of fast cache exhausting.
Also this will require big work to rewrite OVS reconfiguration code
this way.
Anyway, it isn't the MTU only problem. Number of rx/tx descriptors
also can't be changed in runtime.


I'm not fully understand what is the use case for this 'reusing' code.
Could you, please, describe situation where this behaviour is necessary?

Best regards, Ilya Maximets.

>>
>>>
>>> On Wed, Sep 7, 2016 at 2:28 PM, Ilya Maximets <i.maximets@samsung.com> wrote:
>>>> This reverts commit 5b7bb2bda5519b7800f814df64d4e015282140e5.
>>>>
>>>> It is necessary to reconfigure all queues every time because configuration
>>>> can be changed.
>>>>
>>>> For example, if we're reconfiguring bonding device with new memory pool,
>>>> already configured queues will still use the old one. And if the old
>>>> mempool be freed, application likely will panic in attempt to use
>>>> freed mempool.
>>>>
>>>> This happens when we use the bonding device with OVS 2.6 while MTU
>>>> reconfiguration:
>>>>
>>>> PANIC in rte_mempool_get_ops():
>>>> assert "(ops_index >= 0) && (ops_index < RTE_MEMPOOL_MAX_OPS_IDX)" failed
>>>>
>>>> Cc: <stable@dpdk.org>
>>>> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
>>>> ---
>>>>  drivers/net/bonding/rte_eth_bond_pmd.c | 10 ++--------
>>>>  1 file changed, 2 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
>>>> index b20a272..eb5b6d1 100644
>>>> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
>>>> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
>>>> @@ -1305,8 +1305,6 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
>>>>         struct bond_rx_queue *bd_rx_q;
>>>>         struct bond_tx_queue *bd_tx_q;
>>>>
>>>> -       uint16_t old_nb_tx_queues = slave_eth_dev->data->nb_tx_queues;
>>>> -       uint16_t old_nb_rx_queues = slave_eth_dev->data->nb_rx_queues;
>>>>         int errval;
>>>>         uint16_t q_id;
>>>>
>>>> @@ -1347,9 +1345,7 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
>>>>         }
>>>>
>>>>         /* Setup Rx Queues */
>>>> -       /* Use existing queues, if any */
>>>> -       for (q_id = old_nb_rx_queues;
>>>> -            q_id < bonded_eth_dev->data->nb_rx_queues; q_id++) {
>>>> +       for (q_id = 0; q_id < bonded_eth_dev->data->nb_rx_queues; q_id++) {
>>>>                 bd_rx_q = (struct bond_rx_queue *)bonded_eth_dev->data->rx_queues[q_id];
>>>>
>>>>                 errval = rte_eth_rx_queue_setup(slave_eth_dev->data->port_id, q_id,
>>>> @@ -1365,9 +1361,7 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
>>>>         }
>>>>
>>>>         /* Setup Tx Queues */
>>>> -       /* Use existing queues, if any */
>>>> -       for (q_id = old_nb_tx_queues;
>>>> -            q_id < bonded_eth_dev->data->nb_tx_queues; q_id++) {
>>>> +       for (q_id = 0; q_id < bonded_eth_dev->data->nb_tx_queues; q_id++) {
>>>>                 bd_tx_q = (struct bond_tx_queue *)bonded_eth_dev->data->tx_queues[q_id];
>>>>
>>>>                 errval = rte_eth_tx_queue_setup(slave_eth_dev->data->port_id, q_id,
>>>> --
>>>> 2.7.4
>>>>
>>>
>>>
>>>
> 
> 
>
  
Bruce Richardson Oct. 19, 2016, 9:55 a.m. UTC | #11
On Thu, Oct 06, 2016 at 03:32:36PM +0100, Declan Doherty wrote:
> On 07/09/16 13:28, Ilya Maximets wrote:
> > This reverts commit 5b7bb2bda5519b7800f814df64d4e015282140e5.
> > 
> > It is necessary to reconfigure all queues every time because configuration
> > can be changed.
> > 
> 
> Hey Ilya, this makes sense. I guess this was my original intention but I
> missed this case in my review of the change.
> 
> > For example, if we're reconfiguring bonding device with new memory pool,
> > already configured queues will still use the old one. And if the old
> > mempool be freed, application likely will panic in attempt to use
> > freed mempool.
> > 
> > This happens when we use the bonding device with OVS 2.6 while MTU
> > reconfiguration:
> > 
> > PANIC in rte_mempool_get_ops():
> > assert "(ops_index >= 0) && (ops_index < RTE_MEMPOOL_MAX_OPS_IDX)" failed
> > 
> > Cc: <stable@dpdk.org>
> > Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
> > ---
> 
> Acked-by: Declan Doherty <declan.doherty@intel.com>

Applied to dpdk-next-net/rel_16_11

/Bruce
  
Doherty, Declan Oct. 24, 2016, 11:02 a.m. UTC | #12
On 14/10/16 00:37, Eric Kinzie wrote:
> On Wed Oct 12 16:24:21 +0100 2016, Bruce Richardson wrote:
>> On Wed, Oct 12, 2016 at 04:24:54PM +0300, Ilya Maximets wrote:
>>> On 07.10.2016 05:02, Eric Kinzie wrote:
>>>> On Wed Sep 07 15:28:10 +0300 2016, Ilya Maximets wrote:
>>>>> This reverts commit 5b7bb2bda5519b7800f814df64d4e015282140e5.
>>>>>
>>>>> It is necessary to reconfigure all queues every time because configuration
>>>>> can be changed.
>>>>>
>>>>> For example, if we're reconfiguring bonding device with new memory pool,
>>>>> already configured queues will still use the old one. And if the old
>>>>> mempool be freed, application likely will panic in attempt to use
>>>>> freed mempool.
>>>>>
>>>>> This happens when we use the bonding device with OVS 2.6 while MTU
>>>>> reconfiguration:
>>>>>
>>>>> PANIC in rte_mempool_get_ops():
>>>>> assert "(ops_index >= 0) && (ops_index < RTE_MEMPOOL_MAX_OPS_IDX)" failed
>>>>>
>>>>> Cc: <stable@dpdk.org>
>>>>> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
>>>>> ---
>>>>>  drivers/net/bonding/rte_eth_bond_pmd.c | 10 ++--------
>>>>>  1 file changed, 2 insertions(+), 8 deletions(-)
>>>>>
>>>>> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
>>>>> index b20a272..eb5b6d1 100644
>>>>> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
>>>>> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
>>>>> @@ -1305,8 +1305,6 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
>>>>>  	struct bond_rx_queue *bd_rx_q;
>>>>>  	struct bond_tx_queue *bd_tx_q;
>>>>>
>>>>> -	uint16_t old_nb_tx_queues = slave_eth_dev->data->nb_tx_queues;
>>>>> -	uint16_t old_nb_rx_queues = slave_eth_dev->data->nb_rx_queues;
>>>>>  	int errval;
>>>>>  	uint16_t q_id;
>>>>>
>>>>> @@ -1347,9 +1345,7 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
>>>>>  	}
>>>>>
>>>>>  	/* Setup Rx Queues */
>>>>> -	/* Use existing queues, if any */
>>>>> -	for (q_id = old_nb_rx_queues;
>>>>> -	     q_id < bonded_eth_dev->data->nb_rx_queues; q_id++) {
>>>>> +	for (q_id = 0; q_id < bonded_eth_dev->data->nb_rx_queues; q_id++) {
>>>>>  		bd_rx_q = (struct bond_rx_queue *)bonded_eth_dev->data->rx_queues[q_id];
>>>>>
>>>>>  		errval = rte_eth_rx_queue_setup(slave_eth_dev->data->port_id, q_id,
>>>>> @@ -1365,9 +1361,7 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
>>>>>  	}
>>>>>
>>>>>  	/* Setup Tx Queues */
>>>>> -	/* Use existing queues, if any */
>>>>> -	for (q_id = old_nb_tx_queues;
>>>>> -	     q_id < bonded_eth_dev->data->nb_tx_queues; q_id++) {
>>>>> +	for (q_id = 0; q_id < bonded_eth_dev->data->nb_tx_queues; q_id++) {
>>>>>  		bd_tx_q = (struct bond_tx_queue *)bonded_eth_dev->data->tx_queues[q_id];
>>>>>
>>>>>  		errval = rte_eth_tx_queue_setup(slave_eth_dev->data->port_id, q_id,
>>>>> --
>>>>> 2.7.4
>>>>>
>>>>
>>>> NAK
>>>>
>>>> There are still some users of this code.  Let's give them a chance to
>>>> comment before removing it.
>>>
>>> Hi Eric,
>>>
>>> Are these users in CC-list? If not, could you, please, add them?
>>> This patch awaits in mail-list already more than a month. I think, it's enough
>>> time period for all who wants to say something. Patch fixes a real bug that
>>> prevent using of DPDK bonding in all applications that reconfigures devices
>>> in runtime including OVS.
>>>
>> Agreed.
>>
>> Eric, does reverting this patch cause you problems directly, or is your concern
>> just with regards to potential impact to others?
>>
>> Thanks,
>> /Bruce
>
> This won't impact me directly.  The users are CCed (different thread)
> and I haven't seen any comment, so I no longer have any objection to
> reverting this change.
>
> Eric
>

As there has been no further objections and this reinstates the original 
expected behavior of the bonding driver. I'm re-ack'ing for inclusion in 
release.

Acked-by: Declan Doherty <declan.doherty@intel.com>
  
Jan Blunck Oct. 24, 2016, 2:51 p.m. UTC | #13
On Mon, Oct 24, 2016 at 7:02 AM, Declan Doherty
<declan.doherty@intel.com> wrote:
> On 14/10/16 00:37, Eric Kinzie wrote:
>>
>> On Wed Oct 12 16:24:21 +0100 2016, Bruce Richardson wrote:
>>>
>>> On Wed, Oct 12, 2016 at 04:24:54PM +0300, Ilya Maximets wrote:
>>>>
>>>> On 07.10.2016 05:02, Eric Kinzie wrote:
>>>>>
>>>>> On Wed Sep 07 15:28:10 +0300 2016, Ilya Maximets wrote:
>>>>>>
>>>>>> This reverts commit 5b7bb2bda5519b7800f814df64d4e015282140e5.
>>>>>>
>>>>>> It is necessary to reconfigure all queues every time because
>>>>>> configuration
>>>>>> can be changed.
>>>>>>
>>>>>> For example, if we're reconfiguring bonding device with new memory
>>>>>> pool,
>>>>>> already configured queues will still use the old one. And if the old
>>>>>> mempool be freed, application likely will panic in attempt to use
>>>>>> freed mempool.
>>>>>>
>>>>>> This happens when we use the bonding device with OVS 2.6 while MTU
>>>>>> reconfiguration:
>>>>>>
>>>>>> PANIC in rte_mempool_get_ops():
>>>>>> assert "(ops_index >= 0) && (ops_index < RTE_MEMPOOL_MAX_OPS_IDX)"
>>>>>> failed
>>>>>>
>>>>>> Cc: <stable@dpdk.org>
>>>>>> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
>>>>>> ---
>>>>>>  drivers/net/bonding/rte_eth_bond_pmd.c | 10 ++--------
>>>>>>  1 file changed, 2 insertions(+), 8 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c
>>>>>> b/drivers/net/bonding/rte_eth_bond_pmd.c
>>>>>> index b20a272..eb5b6d1 100644
>>>>>> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
>>>>>> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
>>>>>> @@ -1305,8 +1305,6 @@ slave_configure(struct rte_eth_dev
>>>>>> *bonded_eth_dev,
>>>>>>         struct bond_rx_queue *bd_rx_q;
>>>>>>         struct bond_tx_queue *bd_tx_q;
>>>>>>
>>>>>> -       uint16_t old_nb_tx_queues = slave_eth_dev->data->nb_tx_queues;
>>>>>> -       uint16_t old_nb_rx_queues = slave_eth_dev->data->nb_rx_queues;
>>>>>>         int errval;
>>>>>>         uint16_t q_id;
>>>>>>
>>>>>> @@ -1347,9 +1345,7 @@ slave_configure(struct rte_eth_dev
>>>>>> *bonded_eth_dev,
>>>>>>         }
>>>>>>
>>>>>>         /* Setup Rx Queues */
>>>>>> -       /* Use existing queues, if any */
>>>>>> -       for (q_id = old_nb_rx_queues;
>>>>>> -            q_id < bonded_eth_dev->data->nb_rx_queues; q_id++) {
>>>>>> +       for (q_id = 0; q_id < bonded_eth_dev->data->nb_rx_queues;
>>>>>> q_id++) {
>>>>>>                 bd_rx_q = (struct bond_rx_queue
>>>>>> *)bonded_eth_dev->data->rx_queues[q_id];
>>>>>>
>>>>>>                 errval =
>>>>>> rte_eth_rx_queue_setup(slave_eth_dev->data->port_id, q_id,
>>>>>> @@ -1365,9 +1361,7 @@ slave_configure(struct rte_eth_dev
>>>>>> *bonded_eth_dev,
>>>>>>         }
>>>>>>
>>>>>>         /* Setup Tx Queues */
>>>>>> -       /* Use existing queues, if any */
>>>>>> -       for (q_id = old_nb_tx_queues;
>>>>>> -            q_id < bonded_eth_dev->data->nb_tx_queues; q_id++) {
>>>>>> +       for (q_id = 0; q_id < bonded_eth_dev->data->nb_tx_queues;
>>>>>> q_id++) {
>>>>>>                 bd_tx_q = (struct bond_tx_queue
>>>>>> *)bonded_eth_dev->data->tx_queues[q_id];
>>>>>>
>>>>>>                 errval =
>>>>>> rte_eth_tx_queue_setup(slave_eth_dev->data->port_id, q_id,
>>>>>> --
>>>>>> 2.7.4
>>>>>>
>>>>>
>>>>> NAK
>>>>>
>>>>> There are still some users of this code.  Let's give them a chance to
>>>>> comment before removing it.
>>>>
>>>>
>>>> Hi Eric,
>>>>
>>>> Are these users in CC-list? If not, could you, please, add them?
>>>> This patch awaits in mail-list already more than a month. I think, it's
>>>> enough
>>>> time period for all who wants to say something. Patch fixes a real bug
>>>> that
>>>> prevent using of DPDK bonding in all applications that reconfigures
>>>> devices
>>>> in runtime including OVS.
>>>>
>>> Agreed.
>>>
>>> Eric, does reverting this patch cause you problems directly, or is your
>>> concern
>>> just with regards to potential impact to others?
>>>
>>> Thanks,
>>> /Bruce
>>
>>
>> This won't impact me directly.  The users are CCed (different thread)
>> and I haven't seen any comment, so I no longer have any objection to
>> reverting this change.
>>
>> Eric
>>
>
> As there has been no further objections and this reinstates the original
> expected behavior of the bonding driver. I'm re-ack'ing for inclusion in
> release.
>
> Acked-by: Declan Doherty <declan.doherty@intel.com>

Ok, I can revert the revert for us.

Do I read this correctly that you are not interested in fixing this properly?!

Thanks,
Jan
  
Jan Blunck Oct. 24, 2016, 2:54 p.m. UTC | #14
On Wed, Oct 19, 2016 at 5:47 AM, Ilya Maximets <i.maximets@samsung.com> wrote:
> On 18.10.2016 18:19, Jan Blunck wrote:
>> On Tue, Oct 18, 2016 at 2:49 PM, Ilya Maximets <i.maximets@samsung.com> wrote:
>>> On 18.10.2016 15:28, Jan Blunck wrote:
>>>> If the application already configured queues the PMD should not
>>>> silently claim ownership and reset them.
>>>>
>>>> What exactly is the problem when changing MTU? This works fine from
>>>> what I can tell.
>>>
>>> Following scenario leads to APP PANIC:
>>>
>>>         1. mempool_1 = rte_mempool_create()
>>>         2. rte_eth_rx_queue_setup(bond0, ..., mempool_1);
>>>         3. rte_eth_dev_start(bond0);
>>>         4. mempool_2 = rte_mempool_create();
>>>         5. rte_eth_dev_stop(bond0);
>>>         6. rte_eth_rx_queue_setup(bond0, ..., mempool_2);
>>>         7. rte_eth_dev_start(bond0);
>>>         * RX queues still use 'mempool_1' because reconfiguration doesn't affect them. *
>>>         8. rte_mempool_free(mempool_1);
>>>         9. On any rx operation we'll get PANIC because of using freed 'mempool_1':
>>>          PANIC in rte_mempool_get_ops():
>>>          assert "(ops_index >= 0) && (ops_index < RTE_MEMPOOL_MAX_OPS_IDX)" failed
>>>
>>> You may just start OVS 2.6 with DPDK bonding device and attempt to change MTU via 'mtu_request'.
>>> Bug is easily reproducible.
>>>
>>
>> I see. I'm not 100% that this is expected to work without leaking the
>> driver's queues though. The driver is allowed to do allocations in
>> its rx_queue_setup() function that are being freed via
>> rx_queue_release() later. But rx_queue_release() is only called if you
>> reconfigure the
>> device with 0 queues. From what I understand there is no other way to
>> reconfigure a device to use another mempool.
>>
>> But ... even that wouldn't work with the bonding driver right now: the
>> bonding master only configures the slaves during startup. I can put
>> that on my todo list.
>>
>> Coming back to your original problem: changing the MTU for the bond
>> does work through rte_eth_dev_set_mtu() for slaves supporting that. In
>> any other case you could (re-)configure rxmode.max_rx_pkt_len (and
>> jumbo_frame / enable_scatter accordingly). This does work without a
>> call to rte_eth_rx_queue_setup().
>
> Thanks for suggestion, but using of rte_eth_dev_set_mtu() without
> reconfiguration will require to have mempools with huge mbufs (9KB)
> for all ports from the start. This is unacceptable because leads to
> significant performance regressions because of fast cache exhausting.
> Also this will require big work to rewrite OVS reconfiguration code
> this way.
> Anyway, it isn't the MTU only problem. Number of rx/tx descriptors
> also can't be changed in runtime.
>
>
> I'm not fully understand what is the use case for this 'reusing' code.
> Could you, please, describe situation where this behaviour is necessary?

The device that is added to the bond was used before and therefore
already has allocated queues. Therefore we reuse the existing queues
of the devices instead of borrowing the queues of the bond device. If
the slave is removed from the bond again there is no need to allocate
the queues again.

Hope that clarifies the usecase,
Jan


>
> Best regards, Ilya Maximets.
>
>>>
>>>>
>>>> On Wed, Sep 7, 2016 at 2:28 PM, Ilya Maximets <i.maximets@samsung.com> wrote:
>>>>> This reverts commit 5b7bb2bda5519b7800f814df64d4e015282140e5.
>>>>>
>>>>> It is necessary to reconfigure all queues every time because configuration
>>>>> can be changed.
>>>>>
>>>>> For example, if we're reconfiguring bonding device with new memory pool,
>>>>> already configured queues will still use the old one. And if the old
>>>>> mempool be freed, application likely will panic in attempt to use
>>>>> freed mempool.
>>>>>
>>>>> This happens when we use the bonding device with OVS 2.6 while MTU
>>>>> reconfiguration:
>>>>>
>>>>> PANIC in rte_mempool_get_ops():
>>>>> assert "(ops_index >= 0) && (ops_index < RTE_MEMPOOL_MAX_OPS_IDX)" failed
>>>>>
>>>>> Cc: <stable@dpdk.org>
>>>>> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
>>>>> ---
>>>>>  drivers/net/bonding/rte_eth_bond_pmd.c | 10 ++--------
>>>>>  1 file changed, 2 insertions(+), 8 deletions(-)
>>>>>
>>>>> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
>>>>> index b20a272..eb5b6d1 100644
>>>>> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
>>>>> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
>>>>> @@ -1305,8 +1305,6 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
>>>>>         struct bond_rx_queue *bd_rx_q;
>>>>>         struct bond_tx_queue *bd_tx_q;
>>>>>
>>>>> -       uint16_t old_nb_tx_queues = slave_eth_dev->data->nb_tx_queues;
>>>>> -       uint16_t old_nb_rx_queues = slave_eth_dev->data->nb_rx_queues;
>>>>>         int errval;
>>>>>         uint16_t q_id;
>>>>>
>>>>> @@ -1347,9 +1345,7 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
>>>>>         }
>>>>>
>>>>>         /* Setup Rx Queues */
>>>>> -       /* Use existing queues, if any */
>>>>> -       for (q_id = old_nb_rx_queues;
>>>>> -            q_id < bonded_eth_dev->data->nb_rx_queues; q_id++) {
>>>>> +       for (q_id = 0; q_id < bonded_eth_dev->data->nb_rx_queues; q_id++) {
>>>>>                 bd_rx_q = (struct bond_rx_queue *)bonded_eth_dev->data->rx_queues[q_id];
>>>>>
>>>>>                 errval = rte_eth_rx_queue_setup(slave_eth_dev->data->port_id, q_id,
>>>>> @@ -1365,9 +1361,7 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
>>>>>         }
>>>>>
>>>>>         /* Setup Tx Queues */
>>>>> -       /* Use existing queues, if any */
>>>>> -       for (q_id = old_nb_tx_queues;
>>>>> -            q_id < bonded_eth_dev->data->nb_tx_queues; q_id++) {
>>>>> +       for (q_id = 0; q_id < bonded_eth_dev->data->nb_tx_queues; q_id++) {
>>>>>                 bd_tx_q = (struct bond_tx_queue *)bonded_eth_dev->data->tx_queues[q_id];
>>>>>
>>>>>                 errval = rte_eth_tx_queue_setup(slave_eth_dev->data->port_id, q_id,
>>>>> --
>>>>> 2.7.4
>>>>>
>>>>
>>>>
>>>>
>>
>>
>>
  
Doherty, Declan Oct. 24, 2016, 3:07 p.m. UTC | #15
On 24/10/16 15:51, Jan Blunck wrote:
> On Mon, Oct 24, 2016 at 7:02 AM, Declan Doherty
> <declan.doherty@intel.com> wrote:
>> On 14/10/16 00:37, Eric Kinzie wrote:
>>>
>>> On Wed Oct 12 16:24:21 +0100 2016, Bruce Richardson wrote:
>>>>
>>>> On Wed, Oct 12, 2016 at 04:24:54PM +0300, Ilya Maximets wrote:
>>>>>
>>>>> On 07.10.2016 05:02, Eric Kinzie wrote:
>>>>>>
>>>>>> On Wed Sep 07 15:28:10 +0300 2016, Ilya Maximets wrote:
>>>>>>>
>>>>>>> This reverts commit 5b7bb2bda5519b7800f814df64d4e015282140e5.
>>>>>>>
>>>>>>> It is necessary to reconfigure all queues every time because
>>>>>>> configuration
>>>>>>> can be changed.
>>>>>>>
>>>>>>> For example, if we're reconfiguring bonding device with new memory
>>>>>>> pool,
>>>>>>> already configured queues will still use the old one. And if the old
>>>>>>> mempool be freed, application likely will panic in attempt to use
>>>>>>> freed mempool.
>>>>>>>
>>>>>>> This happens when we use the bonding device with OVS 2.6 while MTU
>>>>>>> reconfiguration:
>>>>>>>
>>>>>>> PANIC in rte_mempool_get_ops():
>>>>>>> assert "(ops_index >= 0) && (ops_index < RTE_MEMPOOL_MAX_OPS_IDX)"
>>>>>>> failed
>>>>>>>
>>>>>>> Cc: <stable@dpdk.org>
>>>>>>> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
>>>>>>> ---
>>>>>>>  drivers/net/bonding/rte_eth_bond_pmd.c | 10 ++--------
>>>>>>>  1 file changed, 2 insertions(+), 8 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c
>>>>>>> b/drivers/net/bonding/rte_eth_bond_pmd.c
>>>>>>> index b20a272..eb5b6d1 100644
>>>>>>> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
>>>>>>> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
>>>>>>> @@ -1305,8 +1305,6 @@ slave_configure(struct rte_eth_dev
>>>>>>> *bonded_eth_dev,
>>>>>>>         struct bond_rx_queue *bd_rx_q;
>>>>>>>         struct bond_tx_queue *bd_tx_q;
>>>>>>>
>>>>>>> -       uint16_t old_nb_tx_queues = slave_eth_dev->data->nb_tx_queues;
>>>>>>> -       uint16_t old_nb_rx_queues = slave_eth_dev->data->nb_rx_queues;
>>>>>>>         int errval;
>>>>>>>         uint16_t q_id;
>>>>>>>
>>>>>>> @@ -1347,9 +1345,7 @@ slave_configure(struct rte_eth_dev
>>>>>>> *bonded_eth_dev,
>>>>>>>         }
>>>>>>>
>>>>>>>         /* Setup Rx Queues */
>>>>>>> -       /* Use existing queues, if any */
>>>>>>> -       for (q_id = old_nb_rx_queues;
>>>>>>> -            q_id < bonded_eth_dev->data->nb_rx_queues; q_id++) {
>>>>>>> +       for (q_id = 0; q_id < bonded_eth_dev->data->nb_rx_queues;
>>>>>>> q_id++) {
>>>>>>>                 bd_rx_q = (struct bond_rx_queue
>>>>>>> *)bonded_eth_dev->data->rx_queues[q_id];
>>>>>>>
>>>>>>>                 errval =
>>>>>>> rte_eth_rx_queue_setup(slave_eth_dev->data->port_id, q_id,
>>>>>>> @@ -1365,9 +1361,7 @@ slave_configure(struct rte_eth_dev
>>>>>>> *bonded_eth_dev,
>>>>>>>         }
>>>>>>>
>>>>>>>         /* Setup Tx Queues */
>>>>>>> -       /* Use existing queues, if any */
>>>>>>> -       for (q_id = old_nb_tx_queues;
>>>>>>> -            q_id < bonded_eth_dev->data->nb_tx_queues; q_id++) {
>>>>>>> +       for (q_id = 0; q_id < bonded_eth_dev->data->nb_tx_queues;
>>>>>>> q_id++) {
>>>>>>>                 bd_tx_q = (struct bond_tx_queue
>>>>>>> *)bonded_eth_dev->data->tx_queues[q_id];
>>>>>>>
>>>>>>>                 errval =
>>>>>>> rte_eth_tx_queue_setup(slave_eth_dev->data->port_id, q_id,
>>>>>>> --
>>>>>>> 2.7.4
>>>>>>>
>>>>>>
>>>>>> NAK
>>>>>>
>>>>>> There are still some users of this code.  Let's give them a chance to
>>>>>> comment before removing it.
>>>>>
>>>>>
>>>>> Hi Eric,
>>>>>
>>>>> Are these users in CC-list? If not, could you, please, add them?
>>>>> This patch awaits in mail-list already more than a month. I think, it's
>>>>> enough
>>>>> time period for all who wants to say something. Patch fixes a real bug
>>>>> that
>>>>> prevent using of DPDK bonding in all applications that reconfigures
>>>>> devices
>>>>> in runtime including OVS.
>>>>>
>>>> Agreed.
>>>>
>>>> Eric, does reverting this patch cause you problems directly, or is your
>>>> concern
>>>> just with regards to potential impact to others?
>>>>
>>>> Thanks,
>>>> /Bruce
>>>
>>>
>>> This won't impact me directly.  The users are CCed (different thread)
>>> and I haven't seen any comment, so I no longer have any objection to
>>> reverting this change.
>>>
>>> Eric
>>>
>>
>> As there has been no further objections and this reinstates the original
>> expected behavior of the bonding driver. I'm re-ack'ing for inclusion in
>> release.
>>
>> Acked-by: Declan Doherty <declan.doherty@intel.com>
>
> Ok, I can revert the revert for us.
>
> Do I read this correctly that you are not interested in fixing this properly?!
>
> Thanks,
> Jan
>

Jan, sorry I missed the replies from last week due to the way my mail 
client was filtering the conversation. Let me have another look at this 
and I'll come back to the list.

Thanks
Declan
  
Ilya Maximets Oct. 25, 2016, 6:26 a.m. UTC | #16
On 24.10.2016 17:54, Jan Blunck wrote:
> On Wed, Oct 19, 2016 at 5:47 AM, Ilya Maximets <i.maximets@samsung.com> wrote:
>> On 18.10.2016 18:19, Jan Blunck wrote:
>>> On Tue, Oct 18, 2016 at 2:49 PM, Ilya Maximets <i.maximets@samsung.com> wrote:
>>>> On 18.10.2016 15:28, Jan Blunck wrote:
>>>>> If the application already configured queues the PMD should not
>>>>> silently claim ownership and reset them.
>>>>>
>>>>> What exactly is the problem when changing MTU? This works fine from
>>>>> what I can tell.
>>>>
>>>> Following scenario leads to APP PANIC:
>>>>
>>>>         1. mempool_1 = rte_mempool_create()
>>>>         2. rte_eth_rx_queue_setup(bond0, ..., mempool_1);
>>>>         3. rte_eth_dev_start(bond0);
>>>>         4. mempool_2 = rte_mempool_create();
>>>>         5. rte_eth_dev_stop(bond0);
>>>>         6. rte_eth_rx_queue_setup(bond0, ..., mempool_2);
>>>>         7. rte_eth_dev_start(bond0);
>>>>         * RX queues still use 'mempool_1' because reconfiguration doesn't affect them. *
>>>>         8. rte_mempool_free(mempool_1);
>>>>         9. On any rx operation we'll get PANIC because of using freed 'mempool_1':
>>>>          PANIC in rte_mempool_get_ops():
>>>>          assert "(ops_index >= 0) && (ops_index < RTE_MEMPOOL_MAX_OPS_IDX)" failed
>>>>
>>>> You may just start OVS 2.6 with DPDK bonding device and attempt to change MTU via 'mtu_request'.
>>>> Bug is easily reproducible.
>>>>
>>>
>>> I see. I'm not 100% that this is expected to work without leaking the
>>> driver's queues though. The driver is allowed to do allocations in
>>> its rx_queue_setup() function that are being freed via
>>> rx_queue_release() later. But rx_queue_release() is only called if you
>>> reconfigure the
>>> device with 0 queues.

It's not true. Drivers usually calls 'rx_queue_release()' inside
'rx_queue_setup()' function while reallocating the already allocated
queue. (See ixgbe driver for example). Also all HW queues are
usually destroyed inside 'eth_dev_stop()' and reallocated in
'eth_dev_start()' or '{rx,tx}_queue_setup()'.
So, there is no leaks at all.

>>> From what I understand there is no other way to
>>> reconfigure a device to use another mempool.
>>>
>>> But ... even that wouldn't work with the bonding driver right now: the
>>> bonding master only configures the slaves during startup. I can put
>>> that on my todo list.

No, bonding master reconfigures new slaves in 'rte_eth_bond_slave_add()'
if needed.

>>> Coming back to your original problem: changing the MTU for the bond
>>> does work through rte_eth_dev_set_mtu() for slaves supporting that. In
>>> any other case you could (re-)configure rxmode.max_rx_pkt_len (and
>>> jumbo_frame / enable_scatter accordingly). This does work without a
>>> call to rte_eth_rx_queue_setup().
>>
>> Thanks for suggestion, but using of rte_eth_dev_set_mtu() without
>> reconfiguration will require to have mempools with huge mbufs (9KB)
>> for all ports from the start. This is unacceptable because leads to
>> significant performance regressions because of fast cache exhausting.
>> Also this will require big work to rewrite OVS reconfiguration code
>> this way.
>> Anyway, it isn't the MTU only problem. Number of rx/tx descriptors
>> also can't be changed in runtime.
>>
>>
>> I'm not fully understand what is the use case for this 'reusing' code.
>> Could you, please, describe situation where this behaviour is necessary?
> 
> The device that is added to the bond was used before and therefore
> already has allocated queues. Therefore we reuse the existing queues
> of the devices instead of borrowing the queues of the bond device. If
> the slave is removed from the bond again there is no need to allocate
> the queues again.
> 
> Hope that clarifies the usecase

So, As I see, this is just an optimization that leads to differently
configured queues of same device and possible application crash if the
old configuration doesn't valid any more.

With revert applied in your usecase while adding the device to bond
it's queues will be automatically reconfigured according to configuration
of the bond device. If the slave is removed from the bond all its'
queues will remain as configured by bond. You can reconfigure them if
needed. I guess, that in your case configuration of slave devices,
actually, matches configuration of bond device. In that case slave
will remain in the same state after removing from bond as it was
before adding.

Best regards, Ilya Maximets.

>>>>>
>>>>> On Wed, Sep 7, 2016 at 2:28 PM, Ilya Maximets <i.maximets@samsung.com> wrote:
>>>>>> This reverts commit 5b7bb2bda5519b7800f814df64d4e015282140e5.
>>>>>>
>>>>>> It is necessary to reconfigure all queues every time because configuration
>>>>>> can be changed.
>>>>>>
>>>>>> For example, if we're reconfiguring bonding device with new memory pool,
>>>>>> already configured queues will still use the old one. And if the old
>>>>>> mempool be freed, application likely will panic in attempt to use
>>>>>> freed mempool.
>>>>>>
>>>>>> This happens when we use the bonding device with OVS 2.6 while MTU
>>>>>> reconfiguration:
>>>>>>
>>>>>> PANIC in rte_mempool_get_ops():
>>>>>> assert "(ops_index >= 0) && (ops_index < RTE_MEMPOOL_MAX_OPS_IDX)" failed
>>>>>>
>>>>>> Cc: <stable@dpdk.org>
>>>>>> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
>>>>>> ---
>>>>>>  drivers/net/bonding/rte_eth_bond_pmd.c | 10 ++--------
>>>>>>  1 file changed, 2 insertions(+), 8 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
>>>>>> index b20a272..eb5b6d1 100644
>>>>>> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
>>>>>> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
>>>>>> @@ -1305,8 +1305,6 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
>>>>>>         struct bond_rx_queue *bd_rx_q;
>>>>>>         struct bond_tx_queue *bd_tx_q;
>>>>>>
>>>>>> -       uint16_t old_nb_tx_queues = slave_eth_dev->data->nb_tx_queues;
>>>>>> -       uint16_t old_nb_rx_queues = slave_eth_dev->data->nb_rx_queues;
>>>>>>         int errval;
>>>>>>         uint16_t q_id;
>>>>>>
>>>>>> @@ -1347,9 +1345,7 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
>>>>>>         }
>>>>>>
>>>>>>         /* Setup Rx Queues */
>>>>>> -       /* Use existing queues, if any */
>>>>>> -       for (q_id = old_nb_rx_queues;
>>>>>> -            q_id < bonded_eth_dev->data->nb_rx_queues; q_id++) {
>>>>>> +       for (q_id = 0; q_id < bonded_eth_dev->data->nb_rx_queues; q_id++) {
>>>>>>                 bd_rx_q = (struct bond_rx_queue *)bonded_eth_dev->data->rx_queues[q_id];
>>>>>>
>>>>>>                 errval = rte_eth_rx_queue_setup(slave_eth_dev->data->port_id, q_id,
>>>>>> @@ -1365,9 +1361,7 @@ slave_configure(struct rte_eth_dev *bonded_eth_dev,
>>>>>>         }
>>>>>>
>>>>>>         /* Setup Tx Queues */
>>>>>> -       /* Use existing queues, if any */
>>>>>> -       for (q_id = old_nb_tx_queues;
>>>>>> -            q_id < bonded_eth_dev->data->nb_tx_queues; q_id++) {
>>>>>> +       for (q_id = 0; q_id < bonded_eth_dev->data->nb_tx_queues; q_id++) {
>>>>>>                 bd_tx_q = (struct bond_tx_queue *)bonded_eth_dev->data->tx_queues[q_id];
>>>>>>
>>>>>>                 errval = rte_eth_tx_queue_setup(slave_eth_dev->data->port_id, q_id,
>>>>>> --
>>>>>> 2.7.4
  
Bruce Richardson Oct. 25, 2016, 12:57 p.m. UTC | #17
On Mon, Oct 24, 2016 at 04:07:17PM +0100, Declan Doherty wrote:
> On 24/10/16 15:51, Jan Blunck wrote:
> > On Mon, Oct 24, 2016 at 7:02 AM, Declan Doherty
> > <declan.doherty@intel.com> wrote:
> > > On 14/10/16 00:37, Eric Kinzie wrote:
> > > > 
> > > > On Wed Oct 12 16:24:21 +0100 2016, Bruce Richardson wrote:
> > > > > 
> > > > > On Wed, Oct 12, 2016 at 04:24:54PM +0300, Ilya Maximets wrote:
> > > > > > 
> > > > > > On 07.10.2016 05:02, Eric Kinzie wrote:
> > > > > > > 
> > > > > > > On Wed Sep 07 15:28:10 +0300 2016, Ilya Maximets wrote:
> > > > > > > > 
> > > > > > > > This reverts commit 5b7bb2bda5519b7800f814df64d4e015282140e5.
> > > > > > > > 
> > > > > > > > It is necessary to reconfigure all queues every time because
> > > > > > > > configuration
> > > > > > > > can be changed.
> > > > > > > > 
> > > > > > > > For example, if we're reconfiguring bonding device with new memory
> > > > > > > > pool,
> > > > > > > > already configured queues will still use the old one. And if the old
> > > > > > > > mempool be freed, application likely will panic in attempt to use
> > > > > > > > freed mempool.
> > > > > > > > 
> > > > > > > > This happens when we use the bonding device with OVS 2.6 while MTU
> > > > > > > > reconfiguration:
> > > > > > > > 
> > > > > > > > PANIC in rte_mempool_get_ops():
> > > > > > > > assert "(ops_index >= 0) && (ops_index < RTE_MEMPOOL_MAX_OPS_IDX)"
> > > > > > > > failed
> > > > > > > > 
> > > > > > > > Cc: <stable@dpdk.org>
> > > > > > > > Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
> > > > > > > > ---
> > > > > > > >  drivers/net/bonding/rte_eth_bond_pmd.c | 10 ++--------
> > > > > > > >  1 file changed, 2 insertions(+), 8 deletions(-)
> > > > > > > > 
> > > > > > > > diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c
> > > > > > > > b/drivers/net/bonding/rte_eth_bond_pmd.c
> > > > > > > > index b20a272..eb5b6d1 100644
> > > > > > > > --- a/drivers/net/bonding/rte_eth_bond_pmd.c
> > > > > > > > +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
> > > > > > > > @@ -1305,8 +1305,6 @@ slave_configure(struct rte_eth_dev
> > > > > > > > *bonded_eth_dev,
> > > > > > > >         struct bond_rx_queue *bd_rx_q;
> > > > > > > >         struct bond_tx_queue *bd_tx_q;
> > > > > > > > 
> > > > > > > > -       uint16_t old_nb_tx_queues = slave_eth_dev->data->nb_tx_queues;
> > > > > > > > -       uint16_t old_nb_rx_queues = slave_eth_dev->data->nb_rx_queues;
> > > > > > > >         int errval;
> > > > > > > >         uint16_t q_id;
> > > > > > > > 
> > > > > > > > @@ -1347,9 +1345,7 @@ slave_configure(struct rte_eth_dev
> > > > > > > > *bonded_eth_dev,
> > > > > > > >         }
> > > > > > > > 
> > > > > > > >         /* Setup Rx Queues */
> > > > > > > > -       /* Use existing queues, if any */
> > > > > > > > -       for (q_id = old_nb_rx_queues;
> > > > > > > > -            q_id < bonded_eth_dev->data->nb_rx_queues; q_id++) {
> > > > > > > > +       for (q_id = 0; q_id < bonded_eth_dev->data->nb_rx_queues;
> > > > > > > > q_id++) {
> > > > > > > >                 bd_rx_q = (struct bond_rx_queue
> > > > > > > > *)bonded_eth_dev->data->rx_queues[q_id];
> > > > > > > > 
> > > > > > > >                 errval =
> > > > > > > > rte_eth_rx_queue_setup(slave_eth_dev->data->port_id, q_id,
> > > > > > > > @@ -1365,9 +1361,7 @@ slave_configure(struct rte_eth_dev
> > > > > > > > *bonded_eth_dev,
> > > > > > > >         }
> > > > > > > > 
> > > > > > > >         /* Setup Tx Queues */
> > > > > > > > -       /* Use existing queues, if any */
> > > > > > > > -       for (q_id = old_nb_tx_queues;
> > > > > > > > -            q_id < bonded_eth_dev->data->nb_tx_queues; q_id++) {
> > > > > > > > +       for (q_id = 0; q_id < bonded_eth_dev->data->nb_tx_queues;
> > > > > > > > q_id++) {
> > > > > > > >                 bd_tx_q = (struct bond_tx_queue
> > > > > > > > *)bonded_eth_dev->data->tx_queues[q_id];
> > > > > > > > 
> > > > > > > >                 errval =
> > > > > > > > rte_eth_tx_queue_setup(slave_eth_dev->data->port_id, q_id,
> > > > > > > > --
> > > > > > > > 2.7.4
> > > > > > > > 
> > > > > > > 
> > > > > > > NAK
> > > > > > > 
> > > > > > > There are still some users of this code.  Let's give them a chance to
> > > > > > > comment before removing it.
> > > > > > 
> > > > > > 
> > > > > > Hi Eric,
> > > > > > 
> > > > > > Are these users in CC-list? If not, could you, please, add them?
> > > > > > This patch awaits in mail-list already more than a month. I think, it's
> > > > > > enough
> > > > > > time period for all who wants to say something. Patch fixes a real bug
> > > > > > that
> > > > > > prevent using of DPDK bonding in all applications that reconfigures
> > > > > > devices
> > > > > > in runtime including OVS.
> > > > > > 
> > > > > Agreed.
> > > > > 
> > > > > Eric, does reverting this patch cause you problems directly, or is your
> > > > > concern
> > > > > just with regards to potential impact to others?
> > > > > 
> > > > > Thanks,
> > > > > /Bruce
> > > > 
> > > > 
> > > > This won't impact me directly.  The users are CCed (different thread)
> > > > and I haven't seen any comment, so I no longer have any objection to
> > > > reverting this change.
> > > > 
> > > > Eric
> > > > 
> > > 
> > > As there has been no further objections and this reinstates the original
> > > expected behavior of the bonding driver. I'm re-ack'ing for inclusion in
> > > release.
> > > 
> > > Acked-by: Declan Doherty <declan.doherty@intel.com>
> > 
> > Ok, I can revert the revert for us.
> > 
> > Do I read this correctly that you are not interested in fixing this properly?!
> > 
> > Thanks,
> > Jan
> > 
> 
> Jan, sorry I missed the replies from last week due to the way my mail client
> was filtering the conversation. Let me have another look at this and I'll
> come back to the list.
> 
> Thanks
> Declan

While this patch has already been applied to dpdk-next-net tree, it
appears that there is still some ongoing discussion about it. I'm
therefore planning to pull it back out of the tree for rc2. If a
subsequent consensus is reached we can see about including it in rc3.

Declan, as maintainer, does this seem reasonable to you.

Regards,
/Bruce
  
Doherty, Declan Oct. 25, 2016, 1:48 p.m. UTC | #18
On 25/10/16 13:57, Bruce Richardson wrote:
> On Mon, Oct 24, 2016 at 04:07:17PM +0100, Declan Doherty wrote:
>> On 24/10/16 15:51, Jan Blunck wrote:
>>> On Mon, Oct 24, 2016 at 7:02 AM, Declan Doherty
>>> <declan.doherty@intel.com> wrote:
>>>> On 14/10/16 00:37, Eric Kinzie wrote:
>>>>>
>>>>> On Wed Oct 12 16:24:21 +0100 2016, Bruce Richardson wrote:
>>>>>>
>>>>>> On Wed, Oct 12, 2016 at 04:24:54PM +0300, Ilya Maximets wrote:
>>>>>>>
>>>>>>> On 07.10.2016 05:02, Eric Kinzie wrote:
>>>>>>>>
>>>>>>>> On Wed Sep 07 15:28:10 +0300 2016, Ilya Maximets wrote:
>>>>>>>>>
>>>>>>>>> This reverts commit 5b7bb2bda5519b7800f814df64d4e015282140e5.
>>>>>>>>>
>>>>>>>>> It is necessary to reconfigure all queues every time because
>>>>>>>>> configuration
>>>>>>>>> can be changed.
>>>>>>>>>
>>>>>>>>> For example, if we're reconfiguring bonding device with new memory
>>>>>>>>> pool,
>>>>>>>>> already configured queues will still use the old one. And if the old
>>>>>>>>> mempool be freed, application likely will panic in attempt to use
>>>>>>>>> freed mempool.
>>>>>>>>>
>>>>>>>>> This happens when we use the bonding device with OVS 2.6 while MTU
>>>>>>>>> reconfiguration:
>>>>>>>>>
>>>>>>>>> PANIC in rte_mempool_get_ops():
>>>>>>>>> assert "(ops_index >= 0) && (ops_index < RTE_MEMPOOL_MAX_OPS_IDX)"
>>>>>>>>> failed
>>>>>>>>>
>>>>>>>>> Cc: <stable@dpdk.org>
>>>>>>>>> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
>>>>>>>>> ---
>>>>>>>>>  drivers/net/bonding/rte_eth_bond_pmd.c | 10 ++--------
>>>>>>>>>  1 file changed, 2 insertions(+), 8 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c
>>>>>>>>> b/drivers/net/bonding/rte_eth_bond_pmd.c
>>>>>>>>> index b20a272..eb5b6d1 100644
>>>>>>>>> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
>>>>>>>>> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
>>>>>>>>> @@ -1305,8 +1305,6 @@ slave_configure(struct rte_eth_dev
>>>>>>>>> *bonded_eth_dev,
>>>>>>>>>         struct bond_rx_queue *bd_rx_q;
>>>>>>>>>         struct bond_tx_queue *bd_tx_q;
>>>>>>>>>
>>>>>>>>> -       uint16_t old_nb_tx_queues = slave_eth_dev->data->nb_tx_queues;
>>>>>>>>> -       uint16_t old_nb_rx_queues = slave_eth_dev->data->nb_rx_queues;
>>>>>>>>>         int errval;
>>>>>>>>>         uint16_t q_id;
>>>>>>>>>
>>>>>>>>> @@ -1347,9 +1345,7 @@ slave_configure(struct rte_eth_dev
>>>>>>>>> *bonded_eth_dev,
>>>>>>>>>         }
>>>>>>>>>
>>>>>>>>>         /* Setup Rx Queues */
>>>>>>>>> -       /* Use existing queues, if any */
>>>>>>>>> -       for (q_id = old_nb_rx_queues;
>>>>>>>>> -            q_id < bonded_eth_dev->data->nb_rx_queues; q_id++) {
>>>>>>>>> +       for (q_id = 0; q_id < bonded_eth_dev->data->nb_rx_queues;
>>>>>>>>> q_id++) {
>>>>>>>>>                 bd_rx_q = (struct bond_rx_queue
>>>>>>>>> *)bonded_eth_dev->data->rx_queues[q_id];
>>>>>>>>>
>>>>>>>>>                 errval =
>>>>>>>>> rte_eth_rx_queue_setup(slave_eth_dev->data->port_id, q_id,
>>>>>>>>> @@ -1365,9 +1361,7 @@ slave_configure(struct rte_eth_dev
>>>>>>>>> *bonded_eth_dev,
>>>>>>>>>         }
>>>>>>>>>
>>>>>>>>>         /* Setup Tx Queues */
>>>>>>>>> -       /* Use existing queues, if any */
>>>>>>>>> -       for (q_id = old_nb_tx_queues;
>>>>>>>>> -            q_id < bonded_eth_dev->data->nb_tx_queues; q_id++) {
>>>>>>>>> +       for (q_id = 0; q_id < bonded_eth_dev->data->nb_tx_queues;
>>>>>>>>> q_id++) {
>>>>>>>>>                 bd_tx_q = (struct bond_tx_queue
>>>>>>>>> *)bonded_eth_dev->data->tx_queues[q_id];
>>>>>>>>>
>>>>>>>>>                 errval =
>>>>>>>>> rte_eth_tx_queue_setup(slave_eth_dev->data->port_id, q_id,
>>>>>>>>> --
>>>>>>>>> 2.7.4
>>>>>>>>>
>>>>>>>>
>>>>>>>> NAK
>>>>>>>>
>>>>>>>> There are still some users of this code.  Let's give them a chance to
>>>>>>>> comment before removing it.
>>>>>>>
>>>>>>>
>>>>>>> Hi Eric,
>>>>>>>
>>>>>>> Are these users in CC-list? If not, could you, please, add them?
>>>>>>> This patch awaits in mail-list already more than a month. I think, it's
>>>>>>> enough
>>>>>>> time period for all who wants to say something. Patch fixes a real bug
>>>>>>> that
>>>>>>> prevent using of DPDK bonding in all applications that reconfigures
>>>>>>> devices
>>>>>>> in runtime including OVS.
>>>>>>>
>>>>>> Agreed.
>>>>>>
>>>>>> Eric, does reverting this patch cause you problems directly, or is your
>>>>>> concern
>>>>>> just with regards to potential impact to others?
>>>>>>
>>>>>> Thanks,
>>>>>> /Bruce
>>>>>
>>>>>
>>>>> This won't impact me directly.  The users are CCed (different thread)
>>>>> and I haven't seen any comment, so I no longer have any objection to
>>>>> reverting this change.
>>>>>
>>>>> Eric
>>>>>
>>>>
>>>> As there has been no further objections and this reinstates the original
>>>> expected behavior of the bonding driver. I'm re-ack'ing for inclusion in
>>>> release.
>>>>
>>>> Acked-by: Declan Doherty <declan.doherty@intel.com>
>>>
>>> Ok, I can revert the revert for us.
>>>
>>> Do I read this correctly that you are not interested in fixing this properly?!
>>>
>>> Thanks,
>>> Jan
>>>
>>
>> Jan, sorry I missed the replies from last week due to the way my mail client
>> was filtering the conversation. Let me have another look at this and I'll
>> come back to the list.
>>
>> Thanks
>> Declan
>
> While this patch has already been applied to dpdk-next-net tree, it
> appears that there is still some ongoing discussion about it. I'm
> therefore planning to pull it back out of the tree for rc2. If a
> subsequent consensus is reached we can see about including it in rc3.
>
> Declan, as maintainer, does this seem reasonable to you.
>
> Regards,
> /Bruce
>


Hey Bruce, that seems reasonable, I would like to discuss this further 
with Jan and Ilya.

Declan
  
Bruce Richardson Oct. 25, 2016, 1:59 p.m. UTC | #19
On Wed, Oct 19, 2016 at 10:55:25AM +0100, Bruce Richardson wrote:
> On Thu, Oct 06, 2016 at 03:32:36PM +0100, Declan Doherty wrote:
> > On 07/09/16 13:28, Ilya Maximets wrote:
> > > This reverts commit 5b7bb2bda5519b7800f814df64d4e015282140e5.
> > > 
> > > It is necessary to reconfigure all queues every time because configuration
> > > can be changed.
> > > 
> > 
> > Hey Ilya, this makes sense. I guess this was my original intention but I
> > missed this case in my review of the change.
> > 
> > > For example, if we're reconfiguring bonding device with new memory pool,
> > > already configured queues will still use the old one. And if the old
> > > mempool be freed, application likely will panic in attempt to use
> > > freed mempool.
> > > 
> > > This happens when we use the bonding device with OVS 2.6 while MTU
> > > reconfiguration:
> > > 
> > > PANIC in rte_mempool_get_ops():
> > > assert "(ops_index >= 0) && (ops_index < RTE_MEMPOOL_MAX_OPS_IDX)" failed
> > > 
> > > Cc: <stable@dpdk.org>
> > > Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
> > > ---
> > 
> > Acked-by: Declan Doherty <declan.doherty@intel.com>
> 
> Applied to dpdk-next-net/rel_16_11
> 
Patch taken out of branch due to on-going discussion in this thread. It
can be re-applied later if consensus is reached that it is ok.

Apologies for any confusion caused by this, but after discussion with
the driver maintainer, we feel this is the safest course for now.

/Bruce
  
Bruce Richardson Oct. 25, 2016, 2 p.m. UTC | #20
On Tue, Oct 25, 2016 at 02:48:04PM +0100, Declan Doherty wrote:
> On 25/10/16 13:57, Bruce Richardson wrote:
> > On Mon, Oct 24, 2016 at 04:07:17PM +0100, Declan Doherty wrote:
> > > On 24/10/16 15:51, Jan Blunck wrote:
> > > > On Mon, Oct 24, 2016 at 7:02 AM, Declan Doherty
> > > > <declan.doherty@intel.com> wrote:
> > > > > On 14/10/16 00:37, Eric Kinzie wrote:
> > > > > > 
> > > > > > On Wed Oct 12 16:24:21 +0100 2016, Bruce Richardson wrote:
> > > > > > > 
> > > > > > > On Wed, Oct 12, 2016 at 04:24:54PM +0300, Ilya Maximets wrote:
> > > > > > > > 
> > > > > > > > On 07.10.2016 05:02, Eric Kinzie wrote:
> > > > > > > > > 
> > > > > > > > > On Wed Sep 07 15:28:10 +0300 2016, Ilya Maximets wrote:
> > > > > > > > > > 
> > > > > > > > > > This reverts commit 5b7bb2bda5519b7800f814df64d4e015282140e5.
> > > > > > > > > > 
> > > > > > > > > > It is necessary to reconfigure all queues every time because
> > > > > > > > > > configuration
> > > > > > > > > > can be changed.
> > > > > > > > > > 
> > > > > > > > > > For example, if we're reconfiguring bonding device with new memory
> > > > > > > > > > pool,
> > > > > > > > > > already configured queues will still use the old one. And if the old
> > > > > > > > > > mempool be freed, application likely will panic in attempt to use
> > > > > > > > > > freed mempool.
> > > > > > > > > > 
> > > > > > > > > > This happens when we use the bonding device with OVS 2.6 while MTU
> > > > > > > > > > reconfiguration:
> > > > > > > > > > 
> > > > > > > > > > PANIC in rte_mempool_get_ops():
> > > > > > > > > > assert "(ops_index >= 0) && (ops_index < RTE_MEMPOOL_MAX_OPS_IDX)"
> > > > > > > > > > failed
> > > > > > > > > > 
> > > > > > > > > > Cc: <stable@dpdk.org>
> > > > > > > > > > Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
> > > > > > > > > > ---
> > > > > > > > > >  drivers/net/bonding/rte_eth_bond_pmd.c | 10 ++--------
> > > > > > > > > >  1 file changed, 2 insertions(+), 8 deletions(-)
> > > > > > > > > > 
> > > > > > > > > > diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c
> > > > > > > > > > b/drivers/net/bonding/rte_eth_bond_pmd.c
> > > > > > > > > > index b20a272..eb5b6d1 100644
> > > > > > > > > > --- a/drivers/net/bonding/rte_eth_bond_pmd.c
> > > > > > > > > > +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
> > > > > > > > > > @@ -1305,8 +1305,6 @@ slave_configure(struct rte_eth_dev
> > > > > > > > > > *bonded_eth_dev,
> > > > > > > > > >         struct bond_rx_queue *bd_rx_q;
> > > > > > > > > >         struct bond_tx_queue *bd_tx_q;
> > > > > > > > > > 
> > > > > > > > > > -       uint16_t old_nb_tx_queues = slave_eth_dev->data->nb_tx_queues;
> > > > > > > > > > -       uint16_t old_nb_rx_queues = slave_eth_dev->data->nb_rx_queues;
> > > > > > > > > >         int errval;
> > > > > > > > > >         uint16_t q_id;
> > > > > > > > > > 
> > > > > > > > > > @@ -1347,9 +1345,7 @@ slave_configure(struct rte_eth_dev
> > > > > > > > > > *bonded_eth_dev,
> > > > > > > > > >         }
> > > > > > > > > > 
> > > > > > > > > >         /* Setup Rx Queues */
> > > > > > > > > > -       /* Use existing queues, if any */
> > > > > > > > > > -       for (q_id = old_nb_rx_queues;
> > > > > > > > > > -            q_id < bonded_eth_dev->data->nb_rx_queues; q_id++) {
> > > > > > > > > > +       for (q_id = 0; q_id < bonded_eth_dev->data->nb_rx_queues;
> > > > > > > > > > q_id++) {
> > > > > > > > > >                 bd_rx_q = (struct bond_rx_queue
> > > > > > > > > > *)bonded_eth_dev->data->rx_queues[q_id];
> > > > > > > > > > 
> > > > > > > > > >                 errval =
> > > > > > > > > > rte_eth_rx_queue_setup(slave_eth_dev->data->port_id, q_id,
> > > > > > > > > > @@ -1365,9 +1361,7 @@ slave_configure(struct rte_eth_dev
> > > > > > > > > > *bonded_eth_dev,
> > > > > > > > > >         }
> > > > > > > > > > 
> > > > > > > > > >         /* Setup Tx Queues */
> > > > > > > > > > -       /* Use existing queues, if any */
> > > > > > > > > > -       for (q_id = old_nb_tx_queues;
> > > > > > > > > > -            q_id < bonded_eth_dev->data->nb_tx_queues; q_id++) {
> > > > > > > > > > +       for (q_id = 0; q_id < bonded_eth_dev->data->nb_tx_queues;
> > > > > > > > > > q_id++) {
> > > > > > > > > >                 bd_tx_q = (struct bond_tx_queue
> > > > > > > > > > *)bonded_eth_dev->data->tx_queues[q_id];
> > > > > > > > > > 
> > > > > > > > > >                 errval =
> > > > > > > > > > rte_eth_tx_queue_setup(slave_eth_dev->data->port_id, q_id,
> > > > > > > > > > --
> > > > > > > > > > 2.7.4
> > > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > NAK
> > > > > > > > > 
> > > > > > > > > There are still some users of this code.  Let's give them a chance to
> > > > > > > > > comment before removing it.
> > > > > > > > 
> > > > > > > > 
> > > > > > > > Hi Eric,
> > > > > > > > 
> > > > > > > > Are these users in CC-list? If not, could you, please, add them?
> > > > > > > > This patch awaits in mail-list already more than a month. I think, it's
> > > > > > > > enough
> > > > > > > > time period for all who wants to say something. Patch fixes a real bug
> > > > > > > > that
> > > > > > > > prevent using of DPDK bonding in all applications that reconfigures
> > > > > > > > devices
> > > > > > > > in runtime including OVS.
> > > > > > > > 
> > > > > > > Agreed.
> > > > > > > 
> > > > > > > Eric, does reverting this patch cause you problems directly, or is your
> > > > > > > concern
> > > > > > > just with regards to potential impact to others?
> > > > > > > 
> > > > > > > Thanks,
> > > > > > > /Bruce
> > > > > > 
> > > > > > 
> > > > > > This won't impact me directly.  The users are CCed (different thread)
> > > > > > and I haven't seen any comment, so I no longer have any objection to
> > > > > > reverting this change.
> > > > > > 
> > > > > > Eric
> > > > > > 
> > > > > 
> > > > > As there has been no further objections and this reinstates the original
> > > > > expected behavior of the bonding driver. I'm re-ack'ing for inclusion in
> > > > > release.
> > > > > 
> > > > > Acked-by: Declan Doherty <declan.doherty@intel.com>
> > > > 
> > > > Ok, I can revert the revert for us.
> > > > 
> > > > Do I read this correctly that you are not interested in fixing this properly?!
> > > > 
> > > > Thanks,
> > > > Jan
> > > > 
> > > 
> > > Jan, sorry I missed the replies from last week due to the way my mail client
> > > was filtering the conversation. Let me have another look at this and I'll
> > > come back to the list.
> > > 
> > > Thanks
> > > Declan
> > 
> > While this patch has already been applied to dpdk-next-net tree, it
> > appears that there is still some ongoing discussion about it. I'm
> > therefore planning to pull it back out of the tree for rc2. If a
> > subsequent consensus is reached we can see about including it in rc3.
> > 
> > Declan, as maintainer, does this seem reasonable to you.
> > 
> > Regards,
> > /Bruce
> > 
> 
> 
> Hey Bruce, that seems reasonable, I would like to discuss this further with
> Jan and Ilya.
> 

Done. Hopefully consensus on a correct solution for this driver can be
reached soon.

Regards,
/Bruce
  
Ilya Maximets Oct. 28, 2016, 6:14 a.m. UTC | #21
On 25.10.2016 09:26, Ilya Maximets wrote:
> On 24.10.2016 17:54, Jan Blunck wrote:
>> On Wed, Oct 19, 2016 at 5:47 AM, Ilya Maximets <i.maximets@samsung.com> wrote:
>>> On 18.10.2016 18:19, Jan Blunck wrote:
>>>> On Tue, Oct 18, 2016 at 2:49 PM, Ilya Maximets <i.maximets@samsung.com> wrote:
>>>>> On 18.10.2016 15:28, Jan Blunck wrote:
>>>>>> If the application already configured queues the PMD should not
>>>>>> silently claim ownership and reset them.
>>>>>>
>>>>>> What exactly is the problem when changing MTU? This works fine from
>>>>>> what I can tell.
>>>>>
>>>>> Following scenario leads to APP PANIC:
>>>>>
>>>>>         1. mempool_1 = rte_mempool_create()
>>>>>         2. rte_eth_rx_queue_setup(bond0, ..., mempool_1);
>>>>>         3. rte_eth_dev_start(bond0);
>>>>>         4. mempool_2 = rte_mempool_create();
>>>>>         5. rte_eth_dev_stop(bond0);
>>>>>         6. rte_eth_rx_queue_setup(bond0, ..., mempool_2);
>>>>>         7. rte_eth_dev_start(bond0);
>>>>>         * RX queues still use 'mempool_1' because reconfiguration doesn't affect them. *
>>>>>         8. rte_mempool_free(mempool_1);
>>>>>         9. On any rx operation we'll get PANIC because of using freed 'mempool_1':
>>>>>          PANIC in rte_mempool_get_ops():
>>>>>          assert "(ops_index >= 0) && (ops_index < RTE_MEMPOOL_MAX_OPS_IDX)" failed
>>>>>
>>>>> You may just start OVS 2.6 with DPDK bonding device and attempt to change MTU via 'mtu_request'.
>>>>> Bug is easily reproducible.
>>>>>
>>>>
>>>> I see. I'm not 100% that this is expected to work without leaking the
>>>> driver's queues though. The driver is allowed to do allocations in
>>>> its rx_queue_setup() function that are being freed via
>>>> rx_queue_release() later. But rx_queue_release() is only called if you
>>>> reconfigure the
>>>> device with 0 queues.
> 
> It's not true. Drivers usually calls 'rx_queue_release()' inside
> 'rx_queue_setup()' function while reallocating the already allocated
> queue. (See ixgbe driver for example). Also all HW queues are
> usually destroyed inside 'eth_dev_stop()' and reallocated in
> 'eth_dev_start()' or '{rx,tx}_queue_setup()'.
> So, there is no leaks at all.
> 
>>>> From what I understand there is no other way to
>>>> reconfigure a device to use another mempool.
>>>>
>>>> But ... even that wouldn't work with the bonding driver right now: the
>>>> bonding master only configures the slaves during startup. I can put
>>>> that on my todo list.
> 
> No, bonding master reconfigures new slaves in 'rte_eth_bond_slave_add()'
> if needed.
> 
>>>> Coming back to your original problem: changing the MTU for the bond
>>>> does work through rte_eth_dev_set_mtu() for slaves supporting that. In
>>>> any other case you could (re-)configure rxmode.max_rx_pkt_len (and
>>>> jumbo_frame / enable_scatter accordingly). This does work without a
>>>> call to rte_eth_rx_queue_setup().
>>>
>>> Thanks for suggestion, but using of rte_eth_dev_set_mtu() without
>>> reconfiguration will require to have mempools with huge mbufs (9KB)
>>> for all ports from the start. This is unacceptable because leads to
>>> significant performance regressions because of fast cache exhausting.
>>> Also this will require big work to rewrite OVS reconfiguration code
>>> this way.
>>> Anyway, it isn't the MTU only problem. Number of rx/tx descriptors
>>> also can't be changed in runtime.
>>>
>>>
>>> I'm not fully understand what is the use case for this 'reusing' code.
>>> Could you, please, describe situation where this behaviour is necessary?
>>
>> The device that is added to the bond was used before and therefore
>> already has allocated queues. Therefore we reuse the existing queues
>> of the devices instead of borrowing the queues of the bond device. If
>> the slave is removed from the bond again there is no need to allocate
>> the queues again.
>>
>> Hope that clarifies the usecase
> 
> So, As I see, this is just an optimization that leads to differently
> configured queues of same device and possible application crash if the
> old configuration doesn't valid any more.
> 
> With revert applied in your usecase while adding the device to bond
> it's queues will be automatically reconfigured according to configuration
> of the bond device. If the slave is removed from the bond all its'
> queues will remain as configured by bond. You can reconfigure them if
> needed. I guess, that in your case configuration of slave devices,
> actually, matches configuration of bond device. In that case slave
> will remain in the same state after removing from bond as it was
> before adding.

So, Jan, Declan, what do you think about this?

Best regards, Ilya Maximets.
  
Ilya Maximets Nov. 11, 2016, 9:16 a.m. UTC | #22
Ping.

On 28.10.2016 09:14, Ilya Maximets wrote:
> On 25.10.2016 09:26, Ilya Maximets wrote:
>> On 24.10.2016 17:54, Jan Blunck wrote:
>>> On Wed, Oct 19, 2016 at 5:47 AM, Ilya Maximets <i.maximets@samsung.com> wrote:
>>>> On 18.10.2016 18:19, Jan Blunck wrote:
>>>>> On Tue, Oct 18, 2016 at 2:49 PM, Ilya Maximets <i.maximets@samsung.com> wrote:
>>>>>> On 18.10.2016 15:28, Jan Blunck wrote:
>>>>>>> If the application already configured queues the PMD should not
>>>>>>> silently claim ownership and reset them.
>>>>>>>
>>>>>>> What exactly is the problem when changing MTU? This works fine from
>>>>>>> what I can tell.
>>>>>>
>>>>>> Following scenario leads to APP PANIC:
>>>>>>
>>>>>>         1. mempool_1 = rte_mempool_create()
>>>>>>         2. rte_eth_rx_queue_setup(bond0, ..., mempool_1);
>>>>>>         3. rte_eth_dev_start(bond0);
>>>>>>         4. mempool_2 = rte_mempool_create();
>>>>>>         5. rte_eth_dev_stop(bond0);
>>>>>>         6. rte_eth_rx_queue_setup(bond0, ..., mempool_2);
>>>>>>         7. rte_eth_dev_start(bond0);
>>>>>>         * RX queues still use 'mempool_1' because reconfiguration doesn't affect them. *
>>>>>>         8. rte_mempool_free(mempool_1);
>>>>>>         9. On any rx operation we'll get PANIC because of using freed 'mempool_1':
>>>>>>          PANIC in rte_mempool_get_ops():
>>>>>>          assert "(ops_index >= 0) && (ops_index < RTE_MEMPOOL_MAX_OPS_IDX)" failed
>>>>>>
>>>>>> You may just start OVS 2.6 with DPDK bonding device and attempt to change MTU via 'mtu_request'.
>>>>>> Bug is easily reproducible.
>>>>>>
>>>>>
>>>>> I see. I'm not 100% that this is expected to work without leaking the
>>>>> driver's queues though. The driver is allowed to do allocations in
>>>>> its rx_queue_setup() function that are being freed via
>>>>> rx_queue_release() later. But rx_queue_release() is only called if you
>>>>> reconfigure the
>>>>> device with 0 queues.
>>
>> It's not true. Drivers usually calls 'rx_queue_release()' inside
>> 'rx_queue_setup()' function while reallocating the already allocated
>> queue. (See ixgbe driver for example). Also all HW queues are
>> usually destroyed inside 'eth_dev_stop()' and reallocated in
>> 'eth_dev_start()' or '{rx,tx}_queue_setup()'.
>> So, there is no leaks at all.
>>
>>>>> From what I understand there is no other way to
>>>>> reconfigure a device to use another mempool.
>>>>>
>>>>> But ... even that wouldn't work with the bonding driver right now: the
>>>>> bonding master only configures the slaves during startup. I can put
>>>>> that on my todo list.
>>
>> No, bonding master reconfigures new slaves in 'rte_eth_bond_slave_add()'
>> if needed.
>>
>>>>> Coming back to your original problem: changing the MTU for the bond
>>>>> does work through rte_eth_dev_set_mtu() for slaves supporting that. In
>>>>> any other case you could (re-)configure rxmode.max_rx_pkt_len (and
>>>>> jumbo_frame / enable_scatter accordingly). This does work without a
>>>>> call to rte_eth_rx_queue_setup().
>>>>
>>>> Thanks for suggestion, but using of rte_eth_dev_set_mtu() without
>>>> reconfiguration will require to have mempools with huge mbufs (9KB)
>>>> for all ports from the start. This is unacceptable because leads to
>>>> significant performance regressions because of fast cache exhausting.
>>>> Also this will require big work to rewrite OVS reconfiguration code
>>>> this way.
>>>> Anyway, it isn't the MTU only problem. Number of rx/tx descriptors
>>>> also can't be changed in runtime.
>>>>
>>>>
>>>> I'm not fully understand what is the use case for this 'reusing' code.
>>>> Could you, please, describe situation where this behaviour is necessary?
>>>
>>> The device that is added to the bond was used before and therefore
>>> already has allocated queues. Therefore we reuse the existing queues
>>> of the devices instead of borrowing the queues of the bond device. If
>>> the slave is removed from the bond again there is no need to allocate
>>> the queues again.
>>>
>>> Hope that clarifies the usecase
>>
>> So, As I see, this is just an optimization that leads to differently
>> configured queues of same device and possible application crash if the
>> old configuration doesn't valid any more.
>>
>> With revert applied in your usecase while adding the device to bond
>> it's queues will be automatically reconfigured according to configuration
>> of the bond device. If the slave is removed from the bond all its'
>> queues will remain as configured by bond. You can reconfigure them if
>> needed. I guess, that in your case configuration of slave devices,
>> actually, matches configuration of bond device. In that case slave
>> will remain in the same state after removing from bond as it was
>> before adding.
> 
> So, Jan, Declan, what do you think about this?
> 
> Best regards, Ilya Maximets.
>
  
Ferruh Yigit Nov. 21, 2016, 11:30 a.m. UTC | #23
On 10/25/2016 3:00 PM, Bruce Richardson wrote:
> On Tue, Oct 25, 2016 at 02:48:04PM +0100, Declan Doherty wrote:
>> On 25/10/16 13:57, Bruce Richardson wrote:
>>> On Mon, Oct 24, 2016 at 04:07:17PM +0100, Declan Doherty wrote:
>>>> On 24/10/16 15:51, Jan Blunck wrote:
>>>>> On Mon, Oct 24, 2016 at 7:02 AM, Declan Doherty
>>>>> <declan.doherty@intel.com> wrote:
>>>>>> On 14/10/16 00:37, Eric Kinzie wrote:
>>>>>>>
>>>>>>> On Wed Oct 12 16:24:21 +0100 2016, Bruce Richardson wrote:
>>>>>>>>
>>>>>>>> On Wed, Oct 12, 2016 at 04:24:54PM +0300, Ilya Maximets wrote:
>>>>>>>>>
>>>>>>>>> On 07.10.2016 05:02, Eric Kinzie wrote:
>>>>>>>>>>
>>>>>>>>>> On Wed Sep 07 15:28:10 +0300 2016, Ilya Maximets wrote:
>>>>>>>>>>>
>>>>>>>>>>> This reverts commit 5b7bb2bda5519b7800f814df64d4e015282140e5.
>>>>>>>>>>>
>>>>>>>>>>> It is necessary to reconfigure all queues every time because
>>>>>>>>>>> configuration
>>>>>>>>>>> can be changed.
>>>>>>>>>>>
>>>>>>>>>>> For example, if we're reconfiguring bonding device with new memory
>>>>>>>>>>> pool,
>>>>>>>>>>> already configured queues will still use the old one. And if the old
>>>>>>>>>>> mempool be freed, application likely will panic in attempt to use
>>>>>>>>>>> freed mempool.
>>>>>>>>>>>
>>>>>>>>>>> This happens when we use the bonding device with OVS 2.6 while MTU
>>>>>>>>>>> reconfiguration:
>>>>>>>>>>>
>>>>>>>>>>> PANIC in rte_mempool_get_ops():
>>>>>>>>>>> assert "(ops_index >= 0) && (ops_index < RTE_MEMPOOL_MAX_OPS_IDX)"
>>>>>>>>>>> failed
>>>>>>>>>>>
>>>>>>>>>>> Cc: <stable@dpdk.org>
>>>>>>>>>>> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
>>>>>>>>>>> ---
>>>>>>>>>>>  drivers/net/bonding/rte_eth_bond_pmd.c | 10 ++--------
>>>>>>>>>>>  1 file changed, 2 insertions(+), 8 deletions(-)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c
>>>>>>>>>>> b/drivers/net/bonding/rte_eth_bond_pmd.c
>>>>>>>>>>> index b20a272..eb5b6d1 100644
>>>>>>>>>>> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
>>>>>>>>>>> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
>>>>>>>>>>> @@ -1305,8 +1305,6 @@ slave_configure(struct rte_eth_dev
>>>>>>>>>>> *bonded_eth_dev,
>>>>>>>>>>>         struct bond_rx_queue *bd_rx_q;
>>>>>>>>>>>         struct bond_tx_queue *bd_tx_q;
>>>>>>>>>>>
>>>>>>>>>>> -       uint16_t old_nb_tx_queues = slave_eth_dev->data->nb_tx_queues;
>>>>>>>>>>> -       uint16_t old_nb_rx_queues = slave_eth_dev->data->nb_rx_queues;
>>>>>>>>>>>         int errval;
>>>>>>>>>>>         uint16_t q_id;
>>>>>>>>>>>
>>>>>>>>>>> @@ -1347,9 +1345,7 @@ slave_configure(struct rte_eth_dev
>>>>>>>>>>> *bonded_eth_dev,
>>>>>>>>>>>         }
>>>>>>>>>>>
>>>>>>>>>>>         /* Setup Rx Queues */
>>>>>>>>>>> -       /* Use existing queues, if any */
>>>>>>>>>>> -       for (q_id = old_nb_rx_queues;
>>>>>>>>>>> -            q_id < bonded_eth_dev->data->nb_rx_queues; q_id++) {
>>>>>>>>>>> +       for (q_id = 0; q_id < bonded_eth_dev->data->nb_rx_queues;
>>>>>>>>>>> q_id++) {
>>>>>>>>>>>                 bd_rx_q = (struct bond_rx_queue
>>>>>>>>>>> *)bonded_eth_dev->data->rx_queues[q_id];
>>>>>>>>>>>
>>>>>>>>>>>                 errval =
>>>>>>>>>>> rte_eth_rx_queue_setup(slave_eth_dev->data->port_id, q_id,
>>>>>>>>>>> @@ -1365,9 +1361,7 @@ slave_configure(struct rte_eth_dev
>>>>>>>>>>> *bonded_eth_dev,
>>>>>>>>>>>         }
>>>>>>>>>>>
>>>>>>>>>>>         /* Setup Tx Queues */
>>>>>>>>>>> -       /* Use existing queues, if any */
>>>>>>>>>>> -       for (q_id = old_nb_tx_queues;
>>>>>>>>>>> -            q_id < bonded_eth_dev->data->nb_tx_queues; q_id++) {
>>>>>>>>>>> +       for (q_id = 0; q_id < bonded_eth_dev->data->nb_tx_queues;
>>>>>>>>>>> q_id++) {
>>>>>>>>>>>                 bd_tx_q = (struct bond_tx_queue
>>>>>>>>>>> *)bonded_eth_dev->data->tx_queues[q_id];
>>>>>>>>>>>
>>>>>>>>>>>                 errval =
>>>>>>>>>>> rte_eth_tx_queue_setup(slave_eth_dev->data->port_id, q_id,
>>>>>>>>>>> --
>>>>>>>>>>> 2.7.4
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> NAK
>>>>>>>>>>
>>>>>>>>>> There are still some users of this code.  Let's give them a chance to
>>>>>>>>>> comment before removing it.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi Eric,
>>>>>>>>>
>>>>>>>>> Are these users in CC-list? If not, could you, please, add them?
>>>>>>>>> This patch awaits in mail-list already more than a month. I think, it's
>>>>>>>>> enough
>>>>>>>>> time period for all who wants to say something. Patch fixes a real bug
>>>>>>>>> that
>>>>>>>>> prevent using of DPDK bonding in all applications that reconfigures
>>>>>>>>> devices
>>>>>>>>> in runtime including OVS.
>>>>>>>>>
>>>>>>>> Agreed.
>>>>>>>>
>>>>>>>> Eric, does reverting this patch cause you problems directly, or is your
>>>>>>>> concern
>>>>>>>> just with regards to potential impact to others?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> /Bruce
>>>>>>>
>>>>>>>
>>>>>>> This won't impact me directly.  The users are CCed (different thread)
>>>>>>> and I haven't seen any comment, so I no longer have any objection to
>>>>>>> reverting this change.
>>>>>>>
>>>>>>> Eric
>>>>>>>
>>>>>>
>>>>>> As there has been no further objections and this reinstates the original
>>>>>> expected behavior of the bonding driver. I'm re-ack'ing for inclusion in
>>>>>> release.
>>>>>>
>>>>>> Acked-by: Declan Doherty <declan.doherty@intel.com>
>>>>>
>>>>> Ok, I can revert the revert for us.
>>>>>
>>>>> Do I read this correctly that you are not interested in fixing this properly?!
>>>>>
>>>>> Thanks,
>>>>> Jan
>>>>>
>>>>
>>>> Jan, sorry I missed the replies from last week due to the way my mail client
>>>> was filtering the conversation. Let me have another look at this and I'll
>>>> come back to the list.
>>>>
>>>> Thanks
>>>> Declan
>>>
>>> While this patch has already been applied to dpdk-next-net tree, it
>>> appears that there is still some ongoing discussion about it. I'm
>>> therefore planning to pull it back out of the tree for rc2. If a
>>> subsequent consensus is reached we can see about including it in rc3.
>>>
>>> Declan, as maintainer, does this seem reasonable to you.
>>>
>>> Regards,
>>> /Bruce
>>>
>>
>>
>> Hey Bruce, that seems reasonable, I would like to discuss this further with
>> Jan and Ilya.
>>
> 
> Done. Hopefully consensus on a correct solution for this driver can be
> reached soon.
> 

Is there an update for this patch? Is a consensus reached?

Thanks,
ferruh
  
Jan Blunck Nov. 21, 2016, 11:39 a.m. UTC | #24
Ferruh,

I've been working on a patch but was distracted by other stuff and
therefore haven't tested it yet.

Stay tuned,
Jan

On Mon, Nov 21, 2016 at 12:30 PM, Ferruh Yigit <ferruh.yigit@intel.com> wrote:
> On 10/25/2016 3:00 PM, Bruce Richardson wrote:
>> On Tue, Oct 25, 2016 at 02:48:04PM +0100, Declan Doherty wrote:
>>> On 25/10/16 13:57, Bruce Richardson wrote:
>>>> On Mon, Oct 24, 2016 at 04:07:17PM +0100, Declan Doherty wrote:
>>>>> On 24/10/16 15:51, Jan Blunck wrote:
>>>>>> On Mon, Oct 24, 2016 at 7:02 AM, Declan Doherty
>>>>>> <declan.doherty@intel.com> wrote:
>>>>>>> On 14/10/16 00:37, Eric Kinzie wrote:
>>>>>>>>
>>>>>>>> On Wed Oct 12 16:24:21 +0100 2016, Bruce Richardson wrote:
>>>>>>>>>
>>>>>>>>> On Wed, Oct 12, 2016 at 04:24:54PM +0300, Ilya Maximets wrote:
>>>>>>>>>>
>>>>>>>>>> On 07.10.2016 05:02, Eric Kinzie wrote:
>>>>>>>>>>>
>>>>>>>>>>> On Wed Sep 07 15:28:10 +0300 2016, Ilya Maximets wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> This reverts commit 5b7bb2bda5519b7800f814df64d4e015282140e5.
>>>>>>>>>>>>
>>>>>>>>>>>> It is necessary to reconfigure all queues every time because
>>>>>>>>>>>> configuration
>>>>>>>>>>>> can be changed.
>>>>>>>>>>>>
>>>>>>>>>>>> For example, if we're reconfiguring bonding device with new memory
>>>>>>>>>>>> pool,
>>>>>>>>>>>> already configured queues will still use the old one. And if the old
>>>>>>>>>>>> mempool be freed, application likely will panic in attempt to use
>>>>>>>>>>>> freed mempool.
>>>>>>>>>>>>
>>>>>>>>>>>> This happens when we use the bonding device with OVS 2.6 while MTU
>>>>>>>>>>>> reconfiguration:
>>>>>>>>>>>>
>>>>>>>>>>>> PANIC in rte_mempool_get_ops():
>>>>>>>>>>>> assert "(ops_index >= 0) && (ops_index < RTE_MEMPOOL_MAX_OPS_IDX)"
>>>>>>>>>>>> failed
>>>>>>>>>>>>
>>>>>>>>>>>> Cc: <stable@dpdk.org>
>>>>>>>>>>>> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
>>>>>>>>>>>> ---
>>>>>>>>>>>>  drivers/net/bonding/rte_eth_bond_pmd.c | 10 ++--------
>>>>>>>>>>>>  1 file changed, 2 insertions(+), 8 deletions(-)
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c
>>>>>>>>>>>> b/drivers/net/bonding/rte_eth_bond_pmd.c
>>>>>>>>>>>> index b20a272..eb5b6d1 100644
>>>>>>>>>>>> --- a/drivers/net/bonding/rte_eth_bond_pmd.c
>>>>>>>>>>>> +++ b/drivers/net/bonding/rte_eth_bond_pmd.c
>>>>>>>>>>>> @@ -1305,8 +1305,6 @@ slave_configure(struct rte_eth_dev
>>>>>>>>>>>> *bonded_eth_dev,
>>>>>>>>>>>>         struct bond_rx_queue *bd_rx_q;
>>>>>>>>>>>>         struct bond_tx_queue *bd_tx_q;
>>>>>>>>>>>>
>>>>>>>>>>>> -       uint16_t old_nb_tx_queues = slave_eth_dev->data->nb_tx_queues;
>>>>>>>>>>>> -       uint16_t old_nb_rx_queues = slave_eth_dev->data->nb_rx_queues;
>>>>>>>>>>>>         int errval;
>>>>>>>>>>>>         uint16_t q_id;
>>>>>>>>>>>>
>>>>>>>>>>>> @@ -1347,9 +1345,7 @@ slave_configure(struct rte_eth_dev
>>>>>>>>>>>> *bonded_eth_dev,
>>>>>>>>>>>>         }
>>>>>>>>>>>>
>>>>>>>>>>>>         /* Setup Rx Queues */
>>>>>>>>>>>> -       /* Use existing queues, if any */
>>>>>>>>>>>> -       for (q_id = old_nb_rx_queues;
>>>>>>>>>>>> -            q_id < bonded_eth_dev->data->nb_rx_queues; q_id++) {
>>>>>>>>>>>> +       for (q_id = 0; q_id < bonded_eth_dev->data->nb_rx_queues;
>>>>>>>>>>>> q_id++) {
>>>>>>>>>>>>                 bd_rx_q = (struct bond_rx_queue
>>>>>>>>>>>> *)bonded_eth_dev->data->rx_queues[q_id];
>>>>>>>>>>>>
>>>>>>>>>>>>                 errval =
>>>>>>>>>>>> rte_eth_rx_queue_setup(slave_eth_dev->data->port_id, q_id,
>>>>>>>>>>>> @@ -1365,9 +1361,7 @@ slave_configure(struct rte_eth_dev
>>>>>>>>>>>> *bonded_eth_dev,
>>>>>>>>>>>>         }
>>>>>>>>>>>>
>>>>>>>>>>>>         /* Setup Tx Queues */
>>>>>>>>>>>> -       /* Use existing queues, if any */
>>>>>>>>>>>> -       for (q_id = old_nb_tx_queues;
>>>>>>>>>>>> -            q_id < bonded_eth_dev->data->nb_tx_queues; q_id++) {
>>>>>>>>>>>> +       for (q_id = 0; q_id < bonded_eth_dev->data->nb_tx_queues;
>>>>>>>>>>>> q_id++) {
>>>>>>>>>>>>                 bd_tx_q = (struct bond_tx_queue
>>>>>>>>>>>> *)bonded_eth_dev->data->tx_queues[q_id];
>>>>>>>>>>>>
>>>>>>>>>>>>                 errval =
>>>>>>>>>>>> rte_eth_tx_queue_setup(slave_eth_dev->data->port_id, q_id,
>>>>>>>>>>>> --
>>>>>>>>>>>> 2.7.4
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> NAK
>>>>>>>>>>>
>>>>>>>>>>> There are still some users of this code.  Let's give them a chance to
>>>>>>>>>>> comment before removing it.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi Eric,
>>>>>>>>>>
>>>>>>>>>> Are these users in CC-list? If not, could you, please, add them?
>>>>>>>>>> This patch awaits in mail-list already more than a month. I think, it's
>>>>>>>>>> enough
>>>>>>>>>> time period for all who wants to say something. Patch fixes a real bug
>>>>>>>>>> that
>>>>>>>>>> prevent using of DPDK bonding in all applications that reconfigures
>>>>>>>>>> devices
>>>>>>>>>> in runtime including OVS.
>>>>>>>>>>
>>>>>>>>> Agreed.
>>>>>>>>>
>>>>>>>>> Eric, does reverting this patch cause you problems directly, or is your
>>>>>>>>> concern
>>>>>>>>> just with regards to potential impact to others?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> /Bruce
>>>>>>>>
>>>>>>>>
>>>>>>>> This won't impact me directly.  The users are CCed (different thread)
>>>>>>>> and I haven't seen any comment, so I no longer have any objection to
>>>>>>>> reverting this change.
>>>>>>>>
>>>>>>>> Eric
>>>>>>>>
>>>>>>>
>>>>>>> As there has been no further objections and this reinstates the original
>>>>>>> expected behavior of the bonding driver. I'm re-ack'ing for inclusion in
>>>>>>> release.
>>>>>>>
>>>>>>> Acked-by: Declan Doherty <declan.doherty@intel.com>
>>>>>>
>>>>>> Ok, I can revert the revert for us.
>>>>>>
>>>>>> Do I read this correctly that you are not interested in fixing this properly?!
>>>>>>
>>>>>> Thanks,
>>>>>> Jan
>>>>>>
>>>>>
>>>>> Jan, sorry I missed the replies from last week due to the way my mail client
>>>>> was filtering the conversation. Let me have another look at this and I'll
>>>>> come back to the list.
>>>>>
>>>>> Thanks
>>>>> Declan
>>>>
>>>> While this patch has already been applied to dpdk-next-net tree, it
>>>> appears that there is still some ongoing discussion about it. I'm
>>>> therefore planning to pull it back out of the tree for rc2. If a
>>>> subsequent consensus is reached we can see about including it in rc3.
>>>>
>>>> Declan, as maintainer, does this seem reasonable to you.
>>>>
>>>> Regards,
>>>> /Bruce
>>>>
>>>
>>>
>>> Hey Bruce, that seems reasonable, I would like to discuss this further with
>>> Jan and Ilya.
>>>
>>
>> Done. Hopefully consensus on a correct solution for this driver can be
>> reached soon.
>>
>
> Is there an update for this patch? Is a consensus reached?
>
> Thanks,
> ferruh
>
  
Ilya Maximets Nov. 21, 2016, 12:49 p.m. UTC | #25
On 21.11.2016 14:39, Jan Blunck wrote:
> Ferruh,
> 
> I've been working on a patch but was distracted by other stuff and
> therefore haven't tested it yet.

Jan, on what patch are you working? I don't understand.

> 
> Stay tuned,
> Jan
> 
> On Mon, Nov 21, 2016 at 12:30 PM, Ferruh Yigit <ferruh.yigit@intel.com> wrote:
>>
>> Is there an update for this patch? Is a consensus reached?

Ferruh, I didn't receive any response on my e-mails. I've pinged this thread
few times, but didn't receive any feedback and I have no idea what patch Jan
talking about.

Best regards, Ilya Maximets.
  
Ilya Maximets Nov. 21, 2016, 1:11 p.m. UTC | #26
To be clear, I still insist on applying this path as is.

Best regards, Ilya Maximets.

On 21.11.2016 15:49, Ilya Maximets wrote:
> On 21.11.2016 14:39, Jan Blunck wrote:
>> Ferruh,
>>
>> I've been working on a patch but was distracted by other stuff and
>> therefore haven't tested it yet.
> 
> Jan, on what patch are you working? I don't understand.
> 
>>
>> Stay tuned,
>> Jan
>>
>> On Mon, Nov 21, 2016 at 12:30 PM, Ferruh Yigit <ferruh.yigit@intel.com> wrote:
>>>
>>> Is there an update for this patch? Is a consensus reached?
> 
> Ferruh, I didn't receive any response on my e-mails. I've pinged this thread
> few times, but didn't receive any feedback and I have no idea what patch Jan
> talking about.
> 
> Best regards, Ilya Maximets.
>
  
Jan Blunck Nov. 23, 2016, 8:35 p.m. UTC | #27
On Mon, Nov 21, 2016 at 2:11 PM, Ilya Maximets <i.maximets@samsung.com> wrote:
> To be clear, I still insist on applying this path as is.
>

That is very kind of you. Please see the patches that I've send in
reply to this thread which are required additionally to the revert you
send.

Happy Thanksgiving,
Jan


> Best regards, Ilya Maximets.
>
> On 21.11.2016 15:49, Ilya Maximets wrote:
>> On 21.11.2016 14:39, Jan Blunck wrote:
>>> Ferruh,
>>>
>>> I've been working on a patch but was distracted by other stuff and
>>> therefore haven't tested it yet.
>>
>> Jan, on what patch are you working? I don't understand.
>>
>>>
>>> Stay tuned,
>>> Jan
>>>
>>> On Mon, Nov 21, 2016 at 12:30 PM, Ferruh Yigit <ferruh.yigit@intel.com> wrote:
>>>>
>>>> Is there an update for this patch? Is a consensus reached?
>>
>> Ferruh, I didn't receive any response on my e-mails. I've pinged this thread
>> few times, but didn't receive any feedback and I have no idea what patch Jan
>> talking about.
>>
>> Best regards, Ilya Maximets.
>>
  

Patch

diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
index b20a272..eb5b6d1 100644
--- a/drivers/net/bonding/rte_eth_bond_pmd.c
+++ b/drivers/net/bonding/rte_eth_bond_pmd.c
@@ -1305,8 +1305,6 @@  slave_configure(struct rte_eth_dev *bonded_eth_dev,
 	struct bond_rx_queue *bd_rx_q;
 	struct bond_tx_queue *bd_tx_q;
 
-	uint16_t old_nb_tx_queues = slave_eth_dev->data->nb_tx_queues;
-	uint16_t old_nb_rx_queues = slave_eth_dev->data->nb_rx_queues;
 	int errval;
 	uint16_t q_id;
 
@@ -1347,9 +1345,7 @@  slave_configure(struct rte_eth_dev *bonded_eth_dev,
 	}
 
 	/* Setup Rx Queues */
-	/* Use existing queues, if any */
-	for (q_id = old_nb_rx_queues;
-	     q_id < bonded_eth_dev->data->nb_rx_queues; q_id++) {
+	for (q_id = 0; q_id < bonded_eth_dev->data->nb_rx_queues; q_id++) {
 		bd_rx_q = (struct bond_rx_queue *)bonded_eth_dev->data->rx_queues[q_id];
 
 		errval = rte_eth_rx_queue_setup(slave_eth_dev->data->port_id, q_id,
@@ -1365,9 +1361,7 @@  slave_configure(struct rte_eth_dev *bonded_eth_dev,
 	}
 
 	/* Setup Tx Queues */
-	/* Use existing queues, if any */
-	for (q_id = old_nb_tx_queues;
-	     q_id < bonded_eth_dev->data->nb_tx_queues; q_id++) {
+	for (q_id = 0; q_id < bonded_eth_dev->data->nb_tx_queues; q_id++) {
 		bd_tx_q = (struct bond_tx_queue *)bonded_eth_dev->data->tx_queues[q_id];
 
 		errval = rte_eth_tx_queue_setup(slave_eth_dev->data->port_id, q_id,