[dpdk-dev,v2,1/1] mempool: Add sanity check when secondary link in less mempools than primary

Message ID 20161028183705.GA6907@labs.hpe.com (mailing list archive)
State Superseded, archived
Headers

Commit Message

Jean Tourrilhes Oct. 28, 2016, 6:37 p.m. UTC
  If the mempool ops the caller wants to use is not registered, the
library will segfault in an obscure way when trying to use that
mempool. It's better to catch it early and warn the user.

If the primary and secondary process were build using different build
systems, the list of constructors included by the linker in each
binary might be different. Mempools are registered via constructors, so
the linker magic will directly impact which tailqs are registered with
the primary and the secondary.
DPDK currently assumes that the secondary has a superset of the
mempools registered at the primary, and they are in the same order
(same index in primary and secondary). In some build scenario, the
secondary might not initialise any mempools at all.

This would also catch cases where there is a bug in the mempool
registration, or some memory corruptions, but this has not been
observed.

Signed-off-by: Jean Tourrilhes <jt@labs.hpe.com>
---
 lib/librte_mempool/rte_mempool.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)
  

Comments

Olivier Matz Nov. 8, 2016, 1:59 p.m. UTC | #1
Hello Jean,

On 10/28/2016 08:37 PM, Jean Tourrilhes wrote:
> If the mempool ops the caller wants to use is not registered, the
> library will segfault in an obscure way when trying to use that
> mempool. It's better to catch it early and warn the user.
> 
> If the primary and secondary process were build using different build
> systems, the list of constructors included by the linker in each
> binary might be different. Mempools are registered via constructors, so
> the linker magic will directly impact which tailqs are registered with
> the primary and the secondary.
> DPDK currently assumes that the secondary has a superset of the
> mempools registered at the primary, and they are in the same order
> (same index in primary and secondary). In some build scenario, the
> secondary might not initialise any mempools at all.
> 
> This would also catch cases where there is a bug in the mempool
> registration, or some memory corruptions, but this has not been
> observed.

I still don't get how you can have different constructors in your
primary and secondary. As I said in my previous answer, my
understanding of how secondary process works in dpdk is that the
binaries have to be synchronized.

Your are just talking about linker magic but you don't explain how
to reproduce your issue easily. Please, can you provide a simple
example (let's say one .c and one Makefile) plus a simple screenshot
that highlights the issue?

We'll then check if this issue should be fixed in mempool or if
we should provide helpers in the build system to avoid this situation.

> 
> Signed-off-by: Jean Tourrilhes <jt@labs.hpe.com>
> ---
>  lib/librte_mempool/rte_mempool.c | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
> index 2e28e2e..82260cc 100644
> --- a/lib/librte_mempool/rte_mempool.c
> +++ b/lib/librte_mempool/rte_mempool.c
> @@ -1275,6 +1275,25 @@ rte_mempool_lookup(const char *name)
>  		return NULL;
>  	}
>  
> +	/* Sanity check : secondary may have initialised less mempools
> +	 * than primary due to linker and constructor magic. Or maybe
> +	 * there is a mempool corruption or bug. In any case, we can't
> +	 * go on, we will segfault in an obscure way.
> +	 * This does not detect the case where the constructor order
> +	 * is different between primary and secondary and where the
> +	 * index points to the wrong ops. This would require more
> +	 * extensive changes, and is much less likely.
> +	 * Jean II */
> +	if(mp->ops_index >= (int32_t) rte_mempool_ops_table.num_ops) {
> +		unsigned i;
> +		/* Dump list of mempool ops for further investigation. */
> +		for (i = 0; i < rte_mempool_ops_table.num_ops; i++) {
> +			RTE_LOG(ERR, EAL, "Registered mempool[%d] is %s\n", i, rte_mempool_ops_table.ops[i].name);
> +		}
> +		/* Do not dump mempool list itself, it will segfault. */
> +		rte_panic("Cannot find ops for mempool, ops_index %d, num_ops %d - maybe due to build process or linker configuration\n", mp->ops_index, rte_mempool_ops_table.num_ops);
> +	}
> +

Also, please use checkpatch to ensure it matches the style.
See
http://dpdk.org/doc/guides/contributing/patches.html#checking-the-patches

I don't feel signing your comments is absolutely required. In addition
it does not give a lot of information about who wrote it, given the
large number of Jean II: https://fr.wikipedia.org/wiki/Jean_II

Regards,
Olivier
  
Jean Tourrilhes Nov. 18, 2016, 4:39 p.m. UTC | #2
On Fri, Nov 18, 2016 at 04:11:12PM +0100, Olivier Matz wrote:
> Hi Jean,
> 
> 
> Do you mind if we put back this conversation on the ML?

	Oh, I forgot to do it ? I intended to. Bummer. Please do so.

> I think your example shows that there is no linker magic: you just
> need the same linker flags for dpdk libraries than in the dpdk
> framework. I suppose we need something in the build framework
> to provide these flags externally,

	Good luck integrating that in all foreign build system (I'm
looking at you, Snort).

> but I don't think we need to patch mempool for that.

	This sanity check is just that, a sanity check. I don't
understand what's bad about a sanity check, it does not change
functionality, it does not fix anything and just warn users about
those issues.
	Please look at the patch itself at face value.

> Regards,
> Olivier

	Regards,

	Jean
  

Patch

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index 2e28e2e..82260cc 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -1275,6 +1275,25 @@  rte_mempool_lookup(const char *name)
 		return NULL;
 	}
 
+	/* Sanity check : secondary may have initialised less mempools
+	 * than primary due to linker and constructor magic. Or maybe
+	 * there is a mempool corruption or bug. In any case, we can't
+	 * go on, we will segfault in an obscure way.
+	 * This does not detect the case where the constructor order
+	 * is different between primary and secondary and where the
+	 * index points to the wrong ops. This would require more
+	 * extensive changes, and is much less likely.
+	 * Jean II */
+	if(mp->ops_index >= (int32_t) rte_mempool_ops_table.num_ops) {
+		unsigned i;
+		/* Dump list of mempool ops for further investigation. */
+		for (i = 0; i < rte_mempool_ops_table.num_ops; i++) {
+			RTE_LOG(ERR, EAL, "Registered mempool[%d] is %s\n", i, rte_mempool_ops_table.ops[i].name);
+		}
+		/* Do not dump mempool list itself, it will segfault. */
+		rte_panic("Cannot find ops for mempool, ops_index %d, num_ops %d - maybe due to build process or linker configuration\n", mp->ops_index, rte_mempool_ops_table.num_ops);
+	}
+
 	return mp;
 }