[dpdk-dev,v2] ethdev: fix multi-process NULL dereference crashes
Commit Message
Secondary processes were blanket zeroing ethernet device memory,
resulting in NULL dereference crashes in multi-process setups.
Fixes: 7f95f78a8aea ("ethdev: clear data when allocating device")
Signed-off-by: Remy Horton <remy.horton@intel.com>
---
doc/guides/rel_notes/release_17_02.rst | 5 +++++
lib/librte_ether/rte_ethdev.c | 4 +++-
2 files changed, 8 insertions(+), 1 deletion(-)
Comments
2017-01-24 15:01, Remy Horton:
> Secondary processes were blanket zeroing ethernet device memory,
> resulting in NULL dereference crashes in multi-process setups.
>
> Fixes: 7f95f78a8aea ("ethdev: clear data when allocating device")
>
> Signed-off-by: Remy Horton <remy.horton@intel.com>
> ---
> doc/guides/rel_notes/release_17_02.rst | 5 +++++
> lib/librte_ether/rte_ethdev.c | 4 +++-
> 2 files changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/doc/guides/rel_notes/release_17_02.rst b/doc/guides/rel_notes/release_17_02.rst
> index 0ecd720..1472f84 100644
> --- a/doc/guides/rel_notes/release_17_02.rst
> +++ b/doc/guides/rel_notes/release_17_02.rst
> @@ -222,6 +222,11 @@ Drivers
> Fixed few regressions introduced in recent releases that break the virtio
> multiple process support.
>
> +* **ethdev: Fixed crash with multi-processing.**
> +
> + Secondary processes were blanket zeroing ethernet device memory,
> + resulting in NULL dereference crashes in multi-process setups.
It does not describe exactly the use-case it is fixing (same in commit message).
I guess you saw an issue when creating a vdev in the primary process and
another one in a secondary process, erasing the data of the first one.
nit: the ethdev bug should be listed before PMD bugs such as the virtio one above.
> --- a/lib/librte_ether/rte_ethdev.c
> +++ b/lib/librte_ether/rte_ethdev.c
> @@ -225,8 +225,10 @@ rte_eth_dev_allocate(const char *name)
> return NULL;
> }
>
> - memset(&rte_eth_dev_data[port_id], 0, sizeof(struct rte_eth_dev_data));
> eth_dev = eth_dev_get(port_id);
> + if (rte_eal_process_type() == RTE_PROC_PRIMARY)
> + memset(&rte_eth_dev_data[port_id], 0,
> + sizeof(struct rte_eth_dev_data));
My previous proposal was:
memset(eth_dev->data, 0, sizeof(*eth_dev->data))
It is better to avoid reference to the global array rte_eth_dev_data.
Anyway, the shared data are still overwritten for the name, the port id
and the MTU.
Please describe the exact case where it is working for you.
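To make the remark about the remaining overwrites concrete, here is a minimal, self-contained C sketch. It does not use DPDK: `mock_proc_type`, `mock_dev_data`, `mock_dev`, `mock_allocate()` and `MOCK_MTU` are hypothetical stand-ins for the result of `rte_eal_process_type()`, `struct rte_eth_dev_data`, the device structure, `rte_eth_dev_allocate()` and `ETHER_MTU`, and a single process plays both roles. It assumes the guarded memset from the patch, written through `eth_dev->data` as proposed above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins; none of these names exist in DPDK. */
enum mock_proc_type { MOCK_PROC_PRIMARY, MOCK_PROC_SECONDARY };

struct mock_dev_data {
	char name[32];
	unsigned int port_id;
	unsigned int mtu;
	void *dev_private;	/* pointer the primary relies on later */
};

struct mock_dev {
	struct mock_dev_data *data;
};

#define MOCK_MAX_PORTS 4
#define MOCK_MTU 1500

/* Models the region that primary and secondary processes share. */
static struct mock_dev_data shared_data[MOCK_MAX_PORTS];
static struct mock_dev devices[MOCK_MAX_PORTS];

static struct mock_dev *
mock_allocate(enum mock_proc_type who, unsigned int port_id, const char *name)
{
	struct mock_dev *dev;

	assert(port_id < MOCK_MAX_PORTS);
	dev = &devices[port_id];
	dev->data = &shared_data[port_id];

	/* Guarded clear, going through dev->data rather than the global array. */
	if (who == MOCK_PROC_PRIMARY)
		memset(dev->data, 0, sizeof(*dev->data));

	/* These three writes still happen for every caller, secondary included. */
	snprintf(dev->data->name, sizeof(dev->data->name), "%s", name);
	dev->data->port_id = port_id;
	dev->data->mtu = MOCK_MTU;
	return dev;
}
```

In this sketch a secondary caller no longer wipes `dev_private`, but it still replaces the name and resets the MTU that the primary had configured, which is the residual problem pointed out above.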
On 25/01/2017 11:56, Thomas Monjalon wrote:
[..]
> It does not describe exactly the use-case it is fixing (same in commit message).
> I guess you saw an issue when creating a vdev in the primary process and
> another one in a secondary process, erasing the data of the first one.
In my use-case the secondary process is proc_info, which appeared to be
blanking the shared memory then leaving the NULL-pointer landmines for
the primary process to land on. I'm not entirely sure why this type of
secondary process needs to be running any ethdev startup code at all, as
all it is doing is pulling data out of shared memory.
> My previous proposal was:
> memset(eth_dev->data, 0, sizeof(*eth_dev->data))
> It is better to avoid reference to the global array rte_eth_dev_data.
Git rebase screwed up, and it got lost en-route :(
..Remy
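The proc_info failure mode described above can be sketched in plain C without DPDK. Everything here is a hypothetical model: `shm_port`, `primary_configure()`, `secondary_startup_unguarded()` and `primary_poll()` are illustrative names, and one process stands in for both the primary and the secondary.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for per-port data kept in shared memory. */
struct shm_port {
	void *dev_private;	/* set by the primary, used on its data path */
	unsigned int nb_queues;
};

static struct shm_port shm_port0;

/* The primary fills in the shared data during its own startup. */
static void
primary_configure(void *priv, unsigned int nb_queues)
{
	assert(priv != NULL);
	shm_port0.dev_private = priv;
	shm_port0.nb_queues = nb_queues;
}

/*
 * Models the pre-fix behaviour: a secondary that only wants to read
 * statistics still runs allocation code that zeroes the shared region.
 */
static void
secondary_startup_unguarded(void)
{
	memset(&shm_port0, 0, sizeof(shm_port0));
}

/*
 * Returns 0 when the primary can keep going, -1 when its pointer has been
 * wiped. A real primary would dereference the pointer and crash instead.
 */
static int
primary_poll(void)
{
	return shm_port0.dev_private != NULL ? 0 : -1;
}
```

Calling `primary_configure()`, then `secondary_startup_unguarded()`, then `primary_poll()` yields -1 in this model: the primary's pointer is gone even though the primary itself never touched it.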
On 24/01/2017 15:01, Remy Horton wrote:
> Secondary processes were blanket zeroing ethernet device memory,
> resulting in NULL dereference crashes in multi-process setups.
>
> Fixes: 7f95f78a8aea ("ethdev: clear data when allocating device")
>
> Signed-off-by: Remy Horton <remy.horton@intel.com>
Self-NAK: Condition is now tautology on code path that was causing crashes
2017-01-25 14:02, Remy Horton:
>
> On 24/01/2017 15:01, Remy Horton wrote:
> > Secondary processes were blanket zeroing ethernet device memory,
> > resulting in NULL dereference crashes in multi-process setups.
> >
> > Fixes: 7f95f78a8aea ("ethdev: clear data when allocating device")
> >
> > Signed-off-by: Remy Horton <remy.horton@intel.com>
>
> Self-NAK: Condition is now tautology on code path that was causing crashes
What do you mean exactly?
On 25/01/2017 14:31, Thomas Monjalon wrote:
> 2017-01-25 14:02, Remy Horton:
[..]
>> Self-NAK: Condition is now tautology on code path that was causing crashes
>
> What do you mean exactly?
There is an if(rte_eal_process_type() == RTE_PROC_PRIMARY) in a calling
function, so the one my patch was introducing is now redundant.
..Remy
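The redundancy can be shown with a small stand-alone C model. The names `mock_role`, `mock_probe()` and `mock_allocate_guarded()` are hypothetical; the thread does not name the calling function, so this only assumes a caller that branches on the process type before reaching the allocation code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the process type. */
enum mock_role { MOCK_PRIMARY, MOCK_SECONDARY };

/* Counts how many times the shared data would have been cleared. */
static int zeroed;

/* Inner allocation carrying the check that the patch added. */
static void
mock_allocate_guarded(enum mock_role role)
{
	if (role == MOCK_PRIMARY)	/* always true when reached via mock_probe() */
		zeroed++;
}

/* Caller that already branches on the process type. */
static bool
mock_probe(enum mock_role role)
{
	if (role == MOCK_PRIMARY) {
		int before = zeroed;

		mock_allocate_guarded(role);
		/* The inner guard never skips the clear on this path. */
		assert(zeroed == before + 1);
		return true;
	}
	/* A secondary would attach to the existing data instead. */
	return false;
}
```

Because the only route into `mock_allocate_guarded()` is the primary branch of `mock_probe()`, the inner condition cannot be false there, which is the tautology mentioned in the self-NAK.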
--- a/doc/guides/rel_notes/release_17_02.rst
+++ b/doc/guides/rel_notes/release_17_02.rst
@@ -222,6 +222,11 @@ Drivers
   Fixed few regressions introduced in recent releases that break the virtio
   multiple process support.
 
+* **ethdev: Fixed crash with multi-processing.**
+
+  Secondary processes were blanket zeroing ethernet device memory,
+  resulting in NULL dereference crashes in multi-process setups.
+
 
 Libraries
 ~~~~~~~~~
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -225,8 +225,10 @@ rte_eth_dev_allocate(const char *name)
 		return NULL;
 	}
 
-	memset(&rte_eth_dev_data[port_id], 0, sizeof(struct rte_eth_dev_data));
 	eth_dev = eth_dev_get(port_id);
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
+		memset(&rte_eth_dev_data[port_id], 0,
+			sizeof(struct rte_eth_dev_data));
 	snprintf(eth_dev->data->name, sizeof(eth_dev->data->name), "%s", name);
 	eth_dev->data->port_id = port_id;
 	eth_dev->data->mtu = ETHER_MTU;