[dpdk-dev] eal: add function to check if primary proc alive

Message ID 1453296322-1210-1-git-send-email-harry.van.haaren@intel.com (mailing list archive)
State Superseded, archived
Headers

Commit Message

Van Haaren, Harry Jan. 20, 2016, 1:25 p.m. UTC
  This patch adds a new function to the EAL API:
int rte_eal_primary_proc_alive(const char *path);

The function indicates if a primary process is alive right now.
This functionality is implemented by testing for a write-
lock on the config file, and the function tests for a lock.

The use case for this functionality is that a secondary
process can wait until a primary process starts by polling
the function and waiting. When the primary is running, the
secondary continues to poll to detect if the primary process
has quit unexpectedly, the secondary process can detect this.

The RTE_MAGIC number is written to the shared config by the
primary process, this is the signal to the secondary process
that the EAL is set up, and ready to be used. The function
rte_eal_mcfg_complete() writes RTE_MAGIC. This has been
delayed in the EAL init proceedure, as the PCI probing in
the primary process can interfere with the secondary running.

Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
---
 doc/guides/rel_notes/release_2_3.rst            |  7 +++++++
 lib/librte_eal/bsdapp/eal/rte_eal_version.map   |  8 ++++++++
 lib/librte_eal/common/include/rte_eal.h         | 19 +++++++++++++++++++
 lib/librte_eal/linuxapp/eal/eal.c               | 18 ++++++++++++++++--
 lib/librte_eal/linuxapp/eal/rte_eal_version.map |  7 +++++++
 5 files changed, 57 insertions(+), 2 deletions(-)
  

Comments

Michael Qiu Jan. 21, 2016, 6:14 a.m. UTC | #1
On 1/20/2016 9:26 PM, Harry van Haaren wrote:
> This patch adds a new function to the EAL API:
> int rte_eal_primary_proc_alive(const char *path);
>
> The function indicates if a primary process is alive right now.
> This functionality is implemented by testing for a write-
> lock on the config file, and the function tests for a lock.
>
> The use case for this functionality is that a secondary
> process can wait until a primary process starts by polling
> the function and waiting. When the primary is running, the
> secondary continues to poll to detect if the primary process
> has quit unexpectedly, the secondary process can detect this.
>
> The RTE_MAGIC number is written to the shared config by the
> primary process, this is the signal to the secondary process
> that the EAL is set up, and ready to be used. The function
> rte_eal_mcfg_complete() writes RTE_MAGIC. This has been
> delayed in the EAL init proceedure, as the PCI probing in
> the primary process can interfere with the secondary running.
>
> Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
> ---

one question:

As we could start up many primaries, how does your secondary process
work with them?

Thanks,
Michael
  
Matthew Hall Jan. 21, 2016, 6:19 a.m. UTC | #2
On 1/20/16 10:14 PM, Qiu, Michael wrote:
> As we could start up many primaries, how does your secondary process
> work with them?

I just worked on this tonight myself. When doing > 1 primary (for 
example pktgen and app), I had to specify:

--no-shconf
--file-prefix pktgen
--file-prefix app

Or you get a panic and RTE fails to init, but the file-prefix seems to 
get applied both to the hugepage mmap() files and also to the lockfiles 
in /var/run:

$ ls -a /var/run | egrep -i '^\.'
.
..
.pktgen_hugepage_info
.rte_config
.rte_hugepage_info
.sdn_sensor_hugepage_info

So I think you have to keep the different primary-secondary sets 
separate using --file-prefix .

Matthew.
  
Van Haaren, Harry Jan. 21, 2016, 9:02 a.m. UTC | #3
> From: Qiu, Michael
> Sent: Thursday, January 21, 2016 6:14 AM
> To: Van Haaren, Harry <harry.van.haaren@intel.com>; david.marchand@6wind.com
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] eal: add function to check if primary proc alive
> <snip>
> As we could start up many primaries, how does your secondary process
> work with them?

When a primary process initializes, the location of the config file is important. The default is /var/run/.rte_config

To run multiple primary processes, the --file-prefix= option is used to specific a custom location for the config file. Eg: --file-prefix=testing    /var/run/.testing_config

The rte_eal_check_primary_alive(const char*) function takes a char* parameter - this is the location of the config file that the secondary process will wait for. Setting it to the correct value will make this secondary process wait for the corresponding primary process.

Regards, -Harry
  
Bruce Richardson Jan. 22, 2016, 5:37 p.m. UTC | #4
On Thu, Jan 21, 2016 at 09:02:41AM +0000, Van Haaren, Harry wrote:
> > From: Qiu, Michael
> > Sent: Thursday, January 21, 2016 6:14 AM
> > To: Van Haaren, Harry <harry.van.haaren@intel.com>; david.marchand@6wind.com
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH] eal: add function to check if primary proc alive
> > <snip>
> > As we could start up many primaries, how does your secondary process
> > work with them?
> 
> When a primary process initializes, the location of the config file is important. The default is /var/run/.rte_config
> 
> To run multiple primary processes, the --file-prefix= option is used to specific a custom location for the config file. Eg: --file-prefix=testing    /var/run/.testing_config
> 
> The rte_eal_check_primary_alive(const char*) function takes a char* parameter - this is the location of the config file that the secondary process will wait for. Setting it to the correct value will make this secondary process wait for the corresponding primary process.
> 
> Regards, -Harry

Since a given secondary process only works with a single primary process, I'm not
sure why the user should want or need to pass in this parameter. What's the use
case for a secondary process wanting to know about a different primary process?
The details of what the config file is should largely be hidden from the user
IMHO.

If you want to allow a secondary to query an arbitrary primary process can you
still allow a NULL string to query the default primary based on the passed in
file-prefix parameter (if any)?

/Bruce
  
Michael Qiu Jan. 25, 2016, 8:06 a.m. UTC | #5
On 1/23/2016 1:38 AM, Richardson, Bruce wrote:
> On Thu, Jan 21, 2016 at 09:02:41AM +0000, Van Haaren, Harry wrote:
>>> From: Qiu, Michael
>>> Sent: Thursday, January 21, 2016 6:14 AM
>>> To: Van Haaren, Harry <harry.van.haaren@intel.com>; david.marchand@6wind.com
>>> Cc: dev@dpdk.org
>>> Subject: Re: [dpdk-dev] [PATCH] eal: add function to check if primary proc alive
>>> <snip>
>>> As we could start up many primaries, how does your secondary process
>>> work with them?
>> When a primary process initializes, the location of the config file is important. The default is /var/run/.rte_config
>>
>> To run multiple primary processes, the --file-prefix= option is used to specific a custom location for the config file. Eg: --file-prefix=testing    /var/run/.testing_config
>>
>> The rte_eal_check_primary_alive(const char*) function takes a char* parameter - this is the location of the config file that the secondary process will wait for. Setting it to the correct value will make this secondary process wait for the corresponding primary process.
>>
>> Regards, -Harry
> Since a given secondary process only works with a single primary process, I'm not
> sure why the user should want or need to pass in this parameter. What's the use
> case for a secondary process wanting to know about a different primary process?
> The details of what the config file is should largely be hidden from the user
> IMHO.

So using the prefix, and get the file name inside the
API(--file-prefix=xxx then the config file /var/run/.xxx_config), if no
perfix, then could be /var/run/.rte_config.

Just a suggestion. Maybe there are better solutions .

Thanks,
Michael
> If you want to allow a secondary to query an arbitrary primary process can you
> still allow a NULL string to query the default primary based on the passed in
> file-prefix parameter (if any)?
>
> /Bruce
>
  
Michael Qiu Jan. 25, 2016, 8:11 a.m. UTC | #6
On 1/20/2016 9:26 PM, Harry van Haaren wrote:
> This patch adds a new function to the EAL API:
> int rte_eal_primary_proc_alive(const char *path);
>
> The function indicates if a primary process is alive right now.
> This functionality is implemented by testing for a write-
> lock on the config file, and the function tests for a lock.
>
> The use case for this functionality is that a secondary
> process can wait until a primary process starts by polling
> the function and waiting. When the primary is running, the
> secondary continues to poll to detect if the primary process
> has quit unexpectedly, the secondary process can detect this.
>
> The RTE_MAGIC number is written to the shared config by the
> primary process, this is the signal to the secondary process
> that the EAL is set up, and ready to be used. The function
> rte_eal_mcfg_complete() writes RTE_MAGIC. This has been
> delayed in the EAL init proceedure, as the PCI probing in
> the primary process can interfere with the secondary running.
>
> Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
> ---
>  

Hi, Harry

So secondary  will waste a whole lcore to do such polling?

Thanks,
Michael
  
Van Haaren, Harry Jan. 25, 2016, 11:44 a.m. UTC | #7
> From: Richardson, Bruce
> The details of what the config file is should largely be hidden from the user
> IMHO.

Agreed, however hiding it totally removes the flexibility of waiting for a primary
that is starting with --file-prefix (aka: in a non-default location). Imposing
a limit on only monitoring primary procs in the default location seems wrong.

> If you want to allow a secondary to query an arbitrary primary process can you
> still allow a NULL string to query the default primary based on the passed in
> file-prefix parameter (if any)?

Yep, I've made a v2 which includes handling NULL as path arg and using the
default config file, will post later after some more testing.

-Harry
  
Van Haaren, Harry Jan. 25, 2016, 11:51 a.m. UTC | #8
> From: Qiu, Michael
> Subject: Re: [dpdk-dev] [PATCH] eal: add function to check if primary proc alive
>
> So secondary will waste a whole lcore to do such polling?

Not really, the secondary process will need some CPU,
however it can sleep so it doesn't have to use 100% of it.
It shouldn't be run on a core that is used by the primary
for packet-forwarding though - that will impact performance.

-Harry
  
Michael Qiu Jan. 26, 2016, 2:25 a.m. UTC | #9
On 1/25/2016 7:51 PM, Van Haaren, Harry wrote:
>> From: Qiu, Michael
>> Subject: Re: [dpdk-dev] [PATCH] eal: add function to check if primary proc alive
>>
>> So secondary will waste a whole lcore to do such polling?
> Not really, the secondary process will need some CPU,
> however it can sleep so it doesn't have to use 100% of it.
> It shouldn't be run on a core that is used by the primary
> for packet-forwarding though - that will impact performance.

If not, what will happen if the primary been killed after you check
alive? At that time, the secondary may be doing some work need primary
alive.

Thanks,
Michael
> -Harry
>
  
Van Haaren, Harry Jan. 26, 2016, 9:04 a.m. UTC | #10
> From: Qiu, Michael
> On 1/25/2016 7:51 PM, Van Haaren, Harry wrote:
> > Not really, the secondary process will need some CPU,
> > however it can sleep so it doesn't have to use 100% of it.
> > It shouldn't be run on a core that is used by the primary
> > for packet-forwarding though - that will impact performance.
> 
> If not, what will happen if the primary been killed after you check
> alive? At that time, the secondary may be doing some work need primary
> alive.

What work are you thinking of? Apart from the shared config
and hugepages, primary and secondary processes are running
in their own address-space, and if the primary gets killed,
the secondary will notice when it next polls rte_eal_primary_proc_alive().

Whatever work the secondary was performing (in its own address space)
won't be directly changed by the primary being killed, because the
shared config and hugepages stay (EAL "cleans up" when the primary
is re-launched, not on quit).

-Harry
  
Michael Qiu Jan. 26, 2016, 11:07 a.m. UTC | #11
On 1/26/2016 5:04 PM, Van Haaren, Harry wrote:
>> From: Qiu, Michael
>> On 1/25/2016 7:51 PM, Van Haaren, Harry wrote:
>>> Not really, the secondary process will need some CPU,
>>> however it can sleep so it doesn't have to use 100% of it.
>>> It shouldn't be run on a core that is used by the primary
>>> for packet-forwarding though - that will impact performance.
>> If not, what will happen if the primary been killed after you check
>> alive? At that time, the secondary may be doing some work need primary
>> alive.
> What work are you thinking of? Apart from the shared config
> and hugepages, primary and secondary processes are running
> in their own address-space, and if the primary gets killed,
> the secondary will notice when it next polls rte_eal_primary_proc_alive().
>
> Whatever work the secondary was performing (in its own address space)
> won't be directly changed by the primary being killed, because the
> shared config and hugepages stay (EAL "cleans up" when the primary
> is re-launched, not on quit).

OK,  when primary quit or be killed, the queues will be freed, it will
be a potential issue when secondary try to access, maybe I'm wrong.

Thanks,
Michael

> -Harry
>
>
  
Van Haaren, Harry Jan. 26, 2016, 11:19 a.m. UTC | #12
> From: Qiu, Michael
> > Whatever work the secondary was performing (in its own address space)
> > won't be directly changed by the primary being killed, because the
> > shared config and hugepages stay (EAL "cleans up" when the primary
> > is re-launched, not on quit).
> 
> OK,  when primary quit or be killed, the queues will be freed, it will
> be a potential issue when secondary try to access, maybe I'm wrong.

The use-case for this patch is monitoring statistics and fault-detection.
That involves reading registers directly from the NIC, and the NIC
rx/tx queues are not used. I think you are right that using the rx/tx
queues from a secondary process when they have been cleaned-up by the
primary process will indeed cause issues.

If there is a valid use-case where both primary and secondary processes
will be forwarding packets on the same NIC, this issue should be discussed
in more detail.

In its current state, this patch solves a problem for the use case of a
primary process forwarding packets, and a secondary process monitoring
and providing fault-detection.

-Harry
  
Bruce Richardson Jan. 26, 2016, 7:13 p.m. UTC | #13
On Mon, Jan 25, 2016 at 11:44:59AM +0000, Van Haaren, Harry wrote:
> > From: Richardson, Bruce
> > The details of what the config file is should largely be hidden from the user
> > IMHO.
> 
> Agreed, however hiding it totally removes the flexibility of waiting for a primary
> that is starting with --file-prefix (aka: in a non-default location). Imposing
> a limit on only monitoring primary procs in the default location seems wrong.
> 

But the secondary also needs the same prefix. Is that prefix not accessible by
this function to be used?

/Bruce
  
Van Haaren, Harry Jan. 27, 2016, 10:35 a.m. UTC | #14
> From: Richardson, Bruce
> > Agreed, however hiding it totally removes the flexibility of waiting for a primary
> > that is starting with --file-prefix (aka: in a non-default location). Imposing
> > a limit on only monitoring primary procs in the default location seems wrong.
> 
> But the secondary also needs the same prefix. Is that prefix not accessible by
> this function to be used?

The issue is that the EAL parsing code is performed during rte_init(), which
is exactly what this function tries to avoid - initializing EAL before a primary
process starts.

I looked at changing the EAL parsing to come before rte_init(), and considered
adding a minimal parser for --file-prefix. Both routes seem a bad solution,
either for complexity or code-duplication.

v2 of this patch posted to list:
http://dpdk.org/dev/patchwork/patch/10126/

-Harry
  

Patch

diff --git a/doc/guides/rel_notes/release_2_3.rst b/doc/guides/rel_notes/release_2_3.rst
index 99de186..14b5b06 100644
--- a/doc/guides/rel_notes/release_2_3.rst
+++ b/doc/guides/rel_notes/release_2_3.rst
@@ -11,6 +11,13 @@  Resolved Issues
 EAL
 ~~~
 
+* **Added rte_eal_primary_proc_alive() function**
+
+  A new function ``rte_eal_primary_proc_alive()`` has been added
+  to allow the user to detect if a primary process is running.
+  Use cases for this feature include fault detection, and monitoring
+  using secondary processes.
+
 
 Drivers
 ~~~~~~~
diff --git a/lib/librte_eal/bsdapp/eal/rte_eal_version.map b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
index 9d7adf1..0e28017 100644
--- a/lib/librte_eal/bsdapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/bsdapp/eal/rte_eal_version.map
@@ -135,3 +135,11 @@  DPDK_2.2 {
 	rte_xen_dom0_supported;
 
 } DPDK_2.1;
+
+
+DPDK_2.3 {
+       global:
+
+       rte_eal_primary_proc_alive;
+
+} DPDK_2.2;
diff --git a/lib/librte_eal/common/include/rte_eal.h b/lib/librte_eal/common/include/rte_eal.h
index d2816a8..6eb65f9 100644
--- a/lib/librte_eal/common/include/rte_eal.h
+++ b/lib/librte_eal/common/include/rte_eal.h
@@ -156,6 +156,25 @@  int rte_eal_iopl_init(void);
  *   - On failure, a negative error value.
  */
 int rte_eal_init(int argc, char **argv);
+
+/**
+ * Check if a primary process is currently alive
+ *
+ * This function returns true when a primary process is currently
+ * active.
+ *
+ * @param config_file_path
+ *   The config_file_path argument provided should point at the location
+ *   that the primary process will create its config file. By default,
+ *   /var/run/.rte_config is used. This path can be customized when starting
+ *   a primary process using --file-prefix=custom_path
+ *
+ * @return
+ *  - If alive, returns one.
+ *  - If dead, returns zero.
+ */
+int rte_eal_primary_proc_alive(const char *config_file_path);
+
 /**
  * Usage function typedef used by the application usage function.
  *
diff --git a/lib/librte_eal/linuxapp/eal/eal.c b/lib/librte_eal/linuxapp/eal/eal.c
index 635ec36..b419066 100644
--- a/lib/librte_eal/linuxapp/eal/eal.c
+++ b/lib/librte_eal/linuxapp/eal/eal.c
@@ -818,8 +818,6 @@  rte_eal_init(int argc, char **argv)
 
 	eal_check_mem_on_local_socket();
 
-	rte_eal_mcfg_complete();
-
 	if (eal_plugins_init() < 0)
 		rte_panic("Cannot init plugins\n");
 
@@ -877,9 +875,25 @@  rte_eal_init(int argc, char **argv)
 	if (rte_eal_pci_probe())
 		rte_panic("Cannot probe PCI\n");
 
+	rte_eal_mcfg_complete();
+
 	return fctret;
 }
 
+int
+rte_eal_primary_proc_alive(const char *config_file_path)
+{
+	int config_fd;
+	config_fd = open(config_file_path, O_RDONLY);
+	if (config_fd < 0)
+		return 0;
+
+	int ret = lockf(config_fd, F_TEST, 0);
+	close(config_fd);
+
+	return !!ret;
+}
+
 /* get core role */
 enum rte_lcore_role_t
 rte_eal_lcore_role(unsigned lcore_id)
diff --git a/lib/librte_eal/linuxapp/eal/rte_eal_version.map b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
index cbe175f..7a8c530 100644
--- a/lib/librte_eal/linuxapp/eal/rte_eal_version.map
+++ b/lib/librte_eal/linuxapp/eal/rte_eal_version.map
@@ -138,3 +138,10 @@  DPDK_2.2 {
 	rte_xen_dom0_supported;
 
 } DPDK_2.1;
+
+DPDK_2.3 {
+	global:
+
+	rte_eal_primary_proc_alive;
+
+} DPDK_2.2;