[dpdk-dev,v2,2/6] ethdev: add port ownership

Message ID	1515318351-4756-3-git-send-email-matan@mellanox.com (mailing list archive)
State	Superseded, archived
Delegated to:	Ferruh Yigit
Headers	From: Matan Azrad <matan@mellanox.com> To: Thomas Monjalon <thomas@monjalon.net>, Gaetan Rivet <gaetan.rivet@6wind.com>, Jingjing Wu <jingjing.wu@intel.com> Cc: dev@dpdk.org, Neil Horman <nhorman@tuxdriver.com>, Bruce Richardson <bruce.richardson@intel.com>, Konstantin Ananyev <konstantin.ananyev@intel.com> Date: Sun, 7 Jan 2018 09:45:47 +0000 Message-Id: <1515318351-4756-3-git-send-email-matan@mellanox.com> In-Reply-To: <1515318351-4756-1-git-send-email-matan@mellanox.com> References: <1511870281-15282-1-git-send-email-matan@mellanox.com> <1515318351-4756-1-git-send-email-matan@mellanox.com> MIME-Version: 1.0 Content-Type: text/plain Received-SPF: None (protection.outlook.com: mellanox.com does not designate permitted sender hosts) Subject: [dpdk-dev] [PATCH v2 2/6] ethdev: add port ownership Precedence: list Errors-To: dev-bounces@dpdk.org Sender: "dev" <dev-bounces@dpdk.org>

Checks

Context	Check	Description
ci/checkpatch	success	coding style OK
ci/Intel-compilation	success	Compilation OK

Commit Message

Matan Azrad Jan. 7, 2018, 9:45 a.m. UTC

The ownership of a port is implicit in DPDK.
Making it explicit is better for the following reasons:
1. It clearly defines who is in charge of port usage synchronization.
2. A library could work on top of a port.
3. A port can work on top of another port.

Also, in the fail-safe case, an issue was encountered in testpmd.
We need to check that the application is not trying to use a port which
is already managed by fail-safe.

A port owner is built from an owner id (number) and an owner name (string).
The owner id must be unique, to distinguish between two identical entity
instances, while the owner name can be any name.
The name helps different DPDK entities to logically recognize the owner
and makes debugging easier.
Each DPDK entity can allocate a unique owner identifier and can use it
and its preferred name to own valid ethdev ports.
Each DPDK entity can get any port owner status to decide if it can
manage the port or not.

The mechanism is synchronized for both the primary process threads and
the secondary process threads, to allow a secondary process entity to be
a port owner.

Add a synchronized ownership mechanism to DPDK Ethernet devices to
avoid multiple management of a device by different DPDK entities.

The current ethdev internal port management is not affected by this
feature.

Signed-off-by: Matan Azrad <matan@mellanox.com>
---
 doc/guides/prog_guide/poll_mode_drv.rst |  14 ++-
 lib/librte_ether/rte_ethdev.c           | 206 ++++++++++++++++++++++++++++++--
 lib/librte_ether/rte_ethdev.h           |  89 ++++++++++++++
 lib/librte_ether/rte_ethdev_version.map |  12 ++
 4 files changed, 311 insertions(+), 10 deletions(-)
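
To make the ownership model in the commit message concrete, here is a minimal single-threaded sketch of the allocate/set/unset flow. It is not the DPDK implementation: every `sketch_`/`SKETCH_` name is hypothetical, the port table is a plain array rather than shared memory, and the locking that the patch adds is left out on purpose.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* All sketch_* names are hypothetical; they only mirror the patch's API shape. */
#define SKETCH_MAX_PORTS 4
#define SKETCH_NO_OWNER 0
#define SKETCH_MAX_OWNER_NAME_LEN 64

struct sketch_owner {
	uint16_t id;                          /* unique owner identifier */
	char name[SKETCH_MAX_OWNER_NAME_LEN]; /* free-form owner name */
};

static struct sketch_owner sketch_ports[SKETCH_MAX_PORTS]; /* zeroed: ownerless */
static uint16_t sketch_next_owner_id = SKETCH_NO_OWNER + 1;

/* An id is valid if it is not the reserved 0 and has already been handed out. */
static int
sketch_is_valid_owner_id(uint16_t owner_id)
{
	if (owner_id == SKETCH_NO_OWNER)
		return 0;
	/* After a wrap the counter is 0 and every non-zero id has been issued. */
	return sketch_next_owner_id == SKETCH_NO_OWNER ||
	       owner_id < sketch_next_owner_id;
}

/* Hand out the next unused identifier; fail once the counter has wrapped. */
int
sketch_owner_new(uint16_t *owner_id)
{
	assert(owner_id != NULL);
	if (sketch_next_owner_id == SKETCH_NO_OWNER)
		return -EUSERS;
	*owner_id = sketch_next_owner_id++;
	return 0;
}

/* Claim a port: allowed only if it is ownerless or already owned by owner->id. */
int
sketch_owner_set(uint16_t port_id, const struct sketch_owner *owner)
{
	struct sketch_owner *cur;

	if (port_id >= SKETCH_MAX_PORTS)
		return -ENODEV;
	if (!sketch_is_valid_owner_id(owner->id))
		return -EINVAL;
	cur = &sketch_ports[port_id];
	if (cur->id != SKETCH_NO_OWNER && cur->id != owner->id)
		return -EPERM;
	snprintf(cur->name, sizeof(cur->name), "%s", owner->name);
	cur->id = owner->id;
	return 0;
}

/* Release a port; only the current owner may do so. */
int
sketch_owner_unset(uint16_t port_id, uint16_t owner_id)
{
	if (port_id >= SKETCH_MAX_PORTS)
		return -ENODEV;
	if (owner_id == SKETCH_NO_OWNER || sketch_ports[port_id].id != owner_id)
		return -EPERM;
	memset(&sketch_ports[port_id], 0, sizeof(sketch_ports[port_id]));
	return 0;
}
```

In this sketch, a second entity that tries to claim an already owned port gets -EPERM, which is the check the fail-safe/testpmd case in the commit message asks for.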

Comments

Ananyev, Konstantin Jan. 10, 2018, 1:36 p.m. UTC | #1

Hi Matan,

A few comments from me below.
BTW, do you plan to add a mandatory ownership check to the control path functions
that change port configuration?
Konstantin

> -----Original Message-----
> From: Matan Azrad [mailto:matan@mellanox.com]
> Sent: Sunday, January 7, 2018 9:46 AM
> To: Thomas Monjalon <thomas@monjalon.net>; Gaetan Rivet <gaetan.rivet@6wind.com>; Wu, Jingjing <jingjing.wu@intel.com>
> Cc: dev@dpdk.org; Neil Horman <nhorman@tuxdriver.com>; Richardson, Bruce <bruce.richardson@intel.com>; Ananyev, Konstantin
> <konstantin.ananyev@intel.com>
> Subject: [PATCH v2 2/6] ethdev: add port ownership
> 
> The ownership of a port is implicit in DPDK.
> Making it explicit is better for the following reasons:
> 1. It will define well who is in charge of the port usage synchronization.
> 2. A library could work on top of a port.
> 3. A port can work on top of another port.
> 
> Also in the fail-safe case, an issue has been met in testpmd.
> We need to check that the application is not trying to use a port which
> is already managed by fail-safe.
> 
> A port owner is built from owner id(number) and owner name(string) while
> the owner id must be unique to distinguish between two identical entity
> instances and the owner name can be any name.
> The name helps to logically recognize the owner by different DPDK
> entities and allows easy debug.
> Each DPDK entity can allocate an owner unique identifier and can use it
> and its preferred name to own valid ethdev ports.
> Each DPDK entity can get any port owner status to decide if it can
> manage the port or not.
> 
> The mechanism is synchronized for both the primary process threads and
> the secondary processes threads to allow secondary process entity to be
> a port owner.
> 
> Add a synchronized ownership mechanism to DPDK Ethernet devices to
> avoid multiple management of a device by different DPDK entities.
> 
> The current ethdev internal port management is not affected by this
> feature.
> 
> Signed-off-by: Matan Azrad <matan@mellanox.com>
> ---
>  doc/guides/prog_guide/poll_mode_drv.rst |  14 ++-
>  lib/librte_ether/rte_ethdev.c           | 206 ++++++++++++++++++++++++++++++--
>  lib/librte_ether/rte_ethdev.h           |  89 ++++++++++++++
>  lib/librte_ether/rte_ethdev_version.map |  12 ++
>  4 files changed, 311 insertions(+), 10 deletions(-)


> 
> 
> diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
> index 684e3e8..0e12452 100644
> --- a/lib/librte_ether/rte_ethdev.c
> +++ b/lib/librte_ether/rte_ethdev.c
> @@ -70,7 +70,10 @@
> 
>  static const char *MZ_RTE_ETH_DEV_DATA = "rte_eth_dev_data";
>  struct rte_eth_dev rte_eth_devices[RTE_MAX_ETHPORTS];
> +/* ports data array stored in shared memory */
>  static struct rte_eth_dev_data *rte_eth_dev_data;
> +/* next owner identifier stored in shared memory */
> +static uint16_t *rte_eth_next_owner_id;
>  static uint8_t eth_dev_last_created_port;
> 
>  /* spinlock for eth device callbacks */
> @@ -82,6 +85,9 @@
>  /* spinlock for add/remove tx callbacks */
>  static rte_spinlock_t rte_eth_tx_cb_lock = RTE_SPINLOCK_INITIALIZER;
> 
> +/* spinlock for eth device ownership management stored in shared memory */
> +static rte_spinlock_t *rte_eth_dev_ownership_lock;
> +
>  /* store statistics names and its offset in stats structure  */
>  struct rte_eth_xstats_name_off {
>  	char name[RTE_ETH_XSTATS_NAME_SIZE];
> @@ -153,14 +159,18 @@ enum {
>  }
> 
>  static void
> -rte_eth_dev_data_alloc(void)
> +rte_eth_dev_share_data_alloc(void)
>  {
>  	const unsigned flags = 0;
>  	const struct rte_memzone *mz;
> +	const unsigned int data_size = RTE_MAX_ETHPORTS *
> +						sizeof(*rte_eth_dev_data);
> 
>  	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
> +		/* Allocate shared memory for port data and ownership */
>  		mz = rte_memzone_reserve(MZ_RTE_ETH_DEV_DATA,
> -				RTE_MAX_ETHPORTS * sizeof(*rte_eth_dev_data),
> +				data_size + sizeof(*rte_eth_next_owner_id) +
> +				sizeof(*rte_eth_dev_ownership_lock),
>  				rte_socket_id(), flags);
>  	} else
>  		mz = rte_memzone_lookup(MZ_RTE_ETH_DEV_DATA);
> @@ -168,9 +178,17 @@ enum {
>  		rte_panic("Cannot allocate memzone for ethernet port data\n");
> 
>  	rte_eth_dev_data = mz->addr;
> -	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
> -		memset(rte_eth_dev_data, 0,
> -				RTE_MAX_ETHPORTS * sizeof(*rte_eth_dev_data));
> +	rte_eth_next_owner_id = (uint16_t *)((uintptr_t)mz->addr +
> +					     data_size);
> +	rte_eth_dev_ownership_lock = (rte_spinlock_t *)
> +		((uintptr_t)rte_eth_next_owner_id +
> +		 sizeof(*rte_eth_next_owner_id));


I think that might leave rte_eth_dev_ownership_lock at a location that is not 4B aligned...
Why not just put all the data that you are trying to allocate as one chunk into the same struct:
static struct {
        uint16_t next_owner_id;
        /* spinlock for eth device ownership management stored in shared memory */
        rte_spinlock_t dev_ownership_lock;
        struct rte_eth_dev_data *data;
} rte_eth_dev_data;
and allocate/use it everywhere?
That would simplify the allocation/management code.
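
The alignment concern can be illustrated with a small sketch. It assumes the spinlock wraps a single `int` (the `sketch_spinlock_t` type below is a hypothetical stand-in, not the real `rte_spinlock_t` definition) and that the memzone base and the port data size are themselves suitably aligned. Placing the lock by hand right after a 16-bit counter then lands it 2 bytes past an aligned boundary, whereas grouping the fields in a struct lets the compiler insert the padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for rte_spinlock_t: assumed to wrap a single int. */
typedef struct {
	volatile int locked;
} sketch_spinlock_t;

/*
 * Offset of the lock when it is placed by hand directly after a 16-bit
 * counter that itself follows data_size bytes of port data.
 */
size_t
sketch_manual_lock_offset(size_t data_size)
{
	return data_size + sizeof(uint16_t);
}

/* Grouping the fields in one struct makes the compiler add the padding. */
struct sketch_shared {
	uint16_t next_owner_id;
	sketch_spinlock_t lock;
};

/* Checked at compile time: the lock member is always properly aligned. */
static_assert(offsetof(struct sketch_shared, lock) %
	      _Alignof(sketch_spinlock_t) == 0,
	      "lock member must be aligned");

size_t
sketch_struct_lock_offset(void)
{
	return offsetof(struct sketch_shared, lock);
}
```

With a data size of 4096 bytes, for example, the manual layout puts the lock at offset 4098, which is not a multiple of 4.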

It is good to see that scanning/updating rte_eth_dev_data[] is now lock protected,
but it might not be the best choice to protect both data[] and next_owner_id with the same lock.
In fact, for next_owner_id you don't need a lock - just rte_atomic_t should be enough.
Another alternative would be to use 2 locks - one for next_owner_id and a second for the actual data[]
protection.

Another thing - you'll probably need to grab/release a lock inside rte_eth_dev_allocated() too.
It is a public function used by drivers, so it needs to be protected too.

> +
> +	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
> +		memset(rte_eth_dev_data, 0, data_size);
> +		*rte_eth_next_owner_id = RTE_ETH_DEV_NO_OWNER + 1;
> +		rte_spinlock_init(rte_eth_dev_ownership_lock);
> +	}
>  }
> 
>  struct rte_eth_dev *
> @@ -225,7 +243,7 @@ struct rte_eth_dev *
>  	}
> 
>  	if (rte_eth_dev_data == NULL)
> -		rte_eth_dev_data_alloc();
> +		rte_eth_dev_share_data_alloc();
> 
>  	if (rte_eth_dev_allocated(name) != NULL) {
>  		RTE_PMD_DEBUG_TRACE("Ethernet Device with name %s already allocated!\n",
> @@ -253,7 +271,7 @@ struct rte_eth_dev *
>  	struct rte_eth_dev *eth_dev;
> 
>  	if (rte_eth_dev_data == NULL)
> -		rte_eth_dev_data_alloc();
> +		rte_eth_dev_share_data_alloc();
> 
>  	for (i = 0; i < RTE_MAX_ETHPORTS; i++) {
>  		if (strcmp(rte_eth_dev_data[i].name, name) == 0)
> @@ -278,8 +296,12 @@ struct rte_eth_dev *
>  	if (eth_dev == NULL)
>  		return -EINVAL;
> 
> -	memset(eth_dev->data, 0, sizeof(struct rte_eth_dev_data));
> +	rte_spinlock_lock(rte_eth_dev_ownership_lock);
> +
>  	eth_dev->state = RTE_ETH_DEV_UNUSED;
> +	memset(eth_dev->data, 0, sizeof(struct rte_eth_dev_data));
> +
> +	rte_spinlock_unlock(rte_eth_dev_ownership_lock);
>  	return 0;
>  }
> 
> @@ -294,6 +316,174 @@ struct rte_eth_dev *
>  		return 1;
>  }
> 
> +static int
> +rte_eth_is_valid_owner_id(uint16_t owner_id)
> +{
> +	if (owner_id == RTE_ETH_DEV_NO_OWNER ||
> +	    (*rte_eth_next_owner_id > RTE_ETH_DEV_NO_OWNER &&
> +	     *rte_eth_next_owner_id <= owner_id)) {
> +		RTE_LOG(ERR, EAL, "Invalid owner_id=%d.\n", owner_id);
> +		return 0;
> +	}
> +	return 1;
> +}
> +
> +uint16_t
> +rte_eth_find_next_owned_by(uint16_t port_id, const uint16_t owner_id)
> +{
> +	while (port_id < RTE_MAX_ETHPORTS &&
> +	       (rte_eth_devices[port_id].state != RTE_ETH_DEV_ATTACHED ||
> +	       rte_eth_devices[port_id].data->owner.id != owner_id))
> +		port_id++;
> +
> +	if (port_id >= RTE_MAX_ETHPORTS)
> +		return RTE_MAX_ETHPORTS;
> +
> +	return port_id;
> +}
> +
> +int
> +rte_eth_dev_owner_new(uint16_t *owner_id)
> +{
> +	int ret = 0;
> +
> +	rte_spinlock_lock(rte_eth_dev_ownership_lock);
> +
> +	if (*rte_eth_next_owner_id == RTE_ETH_DEV_NO_OWNER) {
> +		/* Counter wrap around. */
> +		RTE_PMD_DEBUG_TRACE("Reached maximum number of Ethernet port owners.\n");
> +		ret = -EUSERS;
> +	} else {
> +		*owner_id = (*rte_eth_next_owner_id)++;
> +	}
> +
> +	rte_spinlock_unlock(rte_eth_dev_ownership_lock);
> +	return ret;
> +}
> +
> +int
> +rte_eth_dev_owner_set(const uint16_t port_id,
> +		      const struct rte_eth_dev_owner *owner)

As a nit - if you had rte_eth_dev_owner_set(port_id, old_owner, new_owner),
that might be more convenient for the user, and it would greatly simplify the unset() part:
just set(port_id, cur_owner, zero_owner);
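
A rough sketch of that compare-and-set style follows. It is simplified to bare owner ids instead of full owner structs, omits locking, and uses hypothetical `sketch_` names throughout; it is only meant to show how "unset" becomes a special case of "set".

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SKETCH_MAX_PORTS 4
#define SKETCH_NO_OWNER 0

static uint16_t sketch_port_owner[SKETCH_MAX_PORTS]; /* zeroed: ownerless */

/* Replace the owner only if the caller names the current owner correctly. */
int
sketch_owner_swap(uint16_t port_id, uint16_t old_owner, uint16_t new_owner)
{
	if (port_id >= SKETCH_MAX_PORTS)
		return -ENODEV;
	if (sketch_port_owner[port_id] != old_owner)
		return -EPERM;
	sketch_port_owner[port_id] = new_owner;
	return 0;
}
```

Taking a free port is `sketch_owner_swap(port, SKETCH_NO_OWNER, me)` and releasing it is `sketch_owner_swap(port, me, SKETCH_NO_OWNER)`.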

> +{
> +	struct rte_eth_dev_owner *port_owner;
> +	int ret = 0;
> +	int sret;
> +
> +	rte_spinlock_lock(rte_eth_dev_ownership_lock);
> +
> +	if (!rte_eth_dev_is_valid_port(port_id)) {
> +		RTE_PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
> +		ret = -ENODEV;
> +		goto unlock;
> +	}
> +
> +	if (!rte_eth_is_valid_owner_id(owner->id)) {
> +		ret = -EINVAL;
> +		goto unlock;
> +	}
> +
> +	port_owner = &rte_eth_devices[port_id].data->owner;
> +	if (port_owner->id != RTE_ETH_DEV_NO_OWNER &&
> +	    port_owner->id != owner->id) {
> +		RTE_LOG(ERR, EAL,
> +			"Cannot set owner to port %d already owned by %s_%05d.\n",
> +			port_id, port_owner->name, port_owner->id);
> +		ret = -EPERM;
> +		goto unlock;
> +	}
> +
> +	sret = snprintf(port_owner->name, RTE_ETH_MAX_OWNER_NAME_LEN, "%s",
> +			owner->name);
> +	if (sret < 0 || sret >= RTE_ETH_MAX_OWNER_NAME_LEN) {

Personally, I don't see any reason to fail if the description was truncated...
Another alternative - just use rte_malloc() here to allocate a big enough buffer to hold the description.
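
For illustration, a truncation-tolerant copy might look like the sketch below. The helper name and its return convention are hypothetical; it only relies on the standard guarantee that `snprintf()` NUL-terminates the destination when the size is non-zero.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_MAX_OWNER_NAME_LEN 64

/*
 * Copy src into dst, truncating silently if it does not fit.
 * Returns 1 if the name was truncated, 0 if it fit, and -1 if
 * snprintf() reported an error.
 */
int
sketch_copy_owner_name(char dst[SKETCH_MAX_OWNER_NAME_LEN], const char *src)
{
	int n = snprintf(dst, SKETCH_MAX_OWNER_NAME_LEN, "%s", src);

	if (n < 0)
		return -1;
	return n >= SKETCH_MAX_OWNER_NAME_LEN;
}
```

The caller can then decide whether a truncated name deserves a warning rather than a hard failure.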

> +		memset(port_owner->name, 0, RTE_ETH_MAX_OWNER_NAME_LEN);
> +		RTE_LOG(ERR, EAL, "Invalid owner name.\n");
> +		ret = -EINVAL;
> +		goto unlock;
> +	}
> +
> +	port_owner->id = owner->id;
> +	RTE_PMD_DEBUG_TRACE("Port %d owner is %s_%05d.\n", port_id,
> +			    owner->name, owner->id);
> +

As another nit - you can avoid all these gotos by restructuring the code a bit:

int
rte_eth_dev_owner_set(const uint16_t port_id, const struct rte_eth_dev_owner *owner)
{
    int ret;

    rte_spinlock_lock(...);
    ret = _eth_dev_owner_set_unlocked(port_id, owner);
    rte_spinlock_unlock(...);
    return ret;
}
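
A self-contained sketch of this wrapper pattern is shown below. It assumes a C11 `atomic_flag` spin loop in place of the DPDK spinlock and tracks owners as bare ids; both choices, and all `sketch_` names, are illustrative assumptions rather than the patch's code.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

#define SKETCH_MAX_PORTS 4
#define SKETCH_NO_OWNER 0

static atomic_flag sketch_lock = ATOMIC_FLAG_INIT; /* stand-in spinlock */
static uint16_t sketch_port_owner[SKETCH_MAX_PORTS];

/* Caller must hold sketch_lock; every error path is a plain return. */
static int
sketch_owner_set_unlocked(uint16_t port_id, uint16_t owner_id)
{
	if (port_id >= SKETCH_MAX_PORTS)
		return -ENODEV;
	if (owner_id == SKETCH_NO_OWNER)
		return -EINVAL;
	if (sketch_port_owner[port_id] != SKETCH_NO_OWNER &&
	    sketch_port_owner[port_id] != owner_id)
		return -EPERM;
	sketch_port_owner[port_id] = owner_id;
	return 0;
}

/* Public entry point: the lock/unlock pair appears exactly once. */
int
sketch_owner_set(uint16_t port_id, uint16_t owner_id)
{
	int ret;

	while (atomic_flag_test_and_set_explicit(&sketch_lock,
						 memory_order_acquire))
		; /* spin until the flag is released */
	ret = sketch_owner_set_unlocked(port_id, owner_id);
	atomic_flag_clear_explicit(&sketch_lock, memory_order_release);
	return ret;
}
```

Because the helper never touches the lock, no error path can return with the lock still held.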


> +unlock:
> +	rte_spinlock_unlock(rte_eth_dev_ownership_lock);
> +	return ret;
> +}
> +
> +int
> +rte_eth_dev_owner_unset(const uint16_t port_id, const uint16_t owner_id)
> +{
> +	struct rte_eth_dev_owner *port_owner;
> +	int ret = 0;
> +
> +	rte_spinlock_lock(rte_eth_dev_ownership_lock);
> +
> +	if (!rte_eth_dev_is_valid_port(port_id)) {
> +		RTE_PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
> +		ret = -ENODEV;
> +		goto unlock;
> +	}
> +
> +	if (!rte_eth_is_valid_owner_id(owner_id)) {
> +		ret = -EINVAL;
> +		goto unlock;
> +	}
> +
> +	port_owner = &rte_eth_devices[port_id].data->owner;
> +	if (port_owner->id != owner_id) {
> +		RTE_LOG(ERR, EAL, "Cannot unset port %d owner (%s_%05d) by"
> +			" a different owner with id %5d.\n", port_id,
> +			port_owner->name, port_owner->id, owner_id);
> +		ret = -EPERM;
> +		goto unlock;
> +	}
> +	RTE_PMD_DEBUG_TRACE("Port %d owner %s_%05d has removed.\n", port_id,
> +			    port_owner->name, port_owner->id);
> +
> +	memset(port_owner, 0, sizeof(struct rte_eth_dev_owner));
> +
> +unlock:
> +	rte_spinlock_unlock(rte_eth_dev_ownership_lock);
> +	return ret;
> +}
> +
> +void
> +rte_eth_dev_owner_delete(const uint16_t owner_id)
> +{
> +	uint16_t port_id;
> +
> +	rte_spinlock_lock(rte_eth_dev_ownership_lock);
> +
> +	if (rte_eth_is_valid_owner_id(owner_id)) {
> +		RTE_ETH_FOREACH_DEV_OWNED_BY(port_id, owner_id)
> +			memset(&rte_eth_devices[port_id].data->owner, 0,
> +			       sizeof(struct rte_eth_dev_owner));
> +		RTE_PMD_DEBUG_TRACE("All port owners owned by %05d identifier"
> +				    " have removed.\n", owner_id);
> +	}
> +
> +	rte_spinlock_unlock(rte_eth_dev_ownership_lock);
> +}
> +
> +int
> +rte_eth_dev_owner_get(const uint16_t port_id, struct rte_eth_dev_owner *owner)
> +{
> +	int ret = 0;
> +
> +	rte_spinlock_lock(rte_eth_dev_ownership_lock);
> +
> +	if (!rte_eth_dev_is_valid_port(port_id)) {
> +		RTE_PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
> +		ret = -ENODEV;
> +	} else {
> +		rte_memcpy(owner, &rte_eth_devices[port_id].data->owner,
> +			   sizeof(*owner));
> +	}
> +
> +	rte_spinlock_unlock(rte_eth_dev_ownership_lock);
> +	return ret;
> +}
> +
>  int
>  rte_eth_dev_socket_id(uint16_t port_id)
>  {
> diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
> index 57b61ed..88ad765 100644
> --- a/lib/librte_ether/rte_ethdev.h
> +++ b/lib/librte_ether/rte_ethdev.h
> @@ -1760,6 +1760,15 @@ struct rte_eth_dev_sriov {
> 
>  #define RTE_ETH_NAME_MAX_LEN RTE_DEV_NAME_MAX_LEN
> 
> +#define RTE_ETH_DEV_NO_OWNER 0
> +
> +#define RTE_ETH_MAX_OWNER_NAME_LEN 64
> +
> +struct rte_eth_dev_owner {
> +	uint16_t id; /**< The owner unique identifier. */

Why limit yourself to 16 bits here?
Why not uint32_t/uint64_t - or even uuid_t, and let a system library generate it for you?
You wouldn't need to worry about overflows then.

> +	char name[RTE_ETH_MAX_OWNER_NAME_LEN]; /**< The owner name. */
> +};
> +
>  /**
>   * @internal
>   * The data part, with no function pointers, associated with each ethernet device.
> @@ -1810,6 +1819,7 @@ struct rte_eth_dev_data {
>  	int numa_node;  /**< NUMA node connection */
>  	struct rte_vlan_filter_conf vlan_filter_conf;
>  	/**< VLAN filter configuration. */
> +	struct rte_eth_dev_owner owner; /**< The port owner. */
>  };
> 
>  /** Device supports link state interrupt */
> @@ -1846,6 +1856,85 @@ struct rte_eth_dev_data {
> 
> 
>  /**
> + * Iterates over valid ethdev ports owned by a specific owner.
> + *
> + * @param port_id
> + *   The id of the next possible valid owned port.
> + * @param	owner_id
> + *  The owner identifier.
> + *  RTE_ETH_DEV_NO_OWNER means iterate over all valid ownerless ports.
> + * @return
> + *   Next valid port id owned by owner_id, RTE_MAX_ETHPORTS if there is none.
> + */
> +uint16_t rte_eth_find_next_owned_by(uint16_t port_id, const uint16_t owner_id);
> +
> +/**
> + * Macro to iterate over all enabled ethdev ports owned by a specific owner.
> + */
> +#define RTE_ETH_FOREACH_DEV_OWNED_BY(p, o) \
> +	for (p = rte_eth_find_next_owned_by(0, o); \
> +	     (unsigned int)p < (unsigned int)RTE_MAX_ETHPORTS; \
> +	     p = rte_eth_find_next_owned_by(p + 1, o))
> +
> +/**
> + * Get a new unique owner identifier.
> + * An owner identifier is used to own Ethernet devices by only one DPDK entity
> + * to avoid multiple management of device by different entities.
> + *
> + * @param	owner_id
> + *   Owner identifier pointer.
> + * @return
> + *   Negative errno value on error, 0 on success.
> + */
> +int rte_eth_dev_owner_new(uint16_t *owner_id);
> +
> +/**
> + * Set an Ethernet device owner.
> + *
> + * @param	port_id
> + *  The identifier of the port to own.
> + * @param	owner
> + *  The owner pointer.
> + * @return
> + *  Negative errno value on error, 0 on success.
> + */
> +int rte_eth_dev_owner_set(const uint16_t port_id,
> +			  const struct rte_eth_dev_owner *owner);
> +
> +/**
> + * Unset Ethernet device owner to make the device ownerless.
> + *
> + * @param	port_id
> + *  The identifier of port to make ownerless.
> + * @param	owner_id
> + *  The owner identifier.
> + * @return
> + *  0 on success, negative errno value on error.
> + */
> +int rte_eth_dev_owner_unset(const uint16_t port_id, const uint16_t owner_id);
> +
> +/**
> + * Remove owner from all Ethernet devices owned by a specific owner.
> + *
> + * @param	owner_id
> + *  The owner identifier.
> + */
> +void rte_eth_dev_owner_delete(const uint16_t owner_id);
> +
> +/**
> + * Get the owner of an Ethernet device.
> + *
> + * @param	port_id
> + *  The port identifier.
> + * @param	owner
> + *  The owner structure pointer to fill.
> + * @return
> + *  0 on success, negative errno value on error.
> + */
> +int rte_eth_dev_owner_get(const uint16_t port_id,
> +			  struct rte_eth_dev_owner *owner);
> +
> +/**
>   * Get the total number of Ethernet devices that have been successfully
>   * initialized by the matching Ethernet driver during the PCI probing phase
>   * and that are available for applications to use. These devices must be
> diff --git a/lib/librte_ether/rte_ethdev_version.map b/lib/librte_ether/rte_ethdev_version.map
> index e9681ac..5d20b5f 100644
> --- a/lib/librte_ether/rte_ethdev_version.map
> +++ b/lib/librte_ether/rte_ethdev_version.map
> @@ -198,6 +198,18 @@ DPDK_17.11 {
> 
>  } DPDK_17.08;
> 
> +DPDK_18.02 {
> +	global:
> +
> +	rte_eth_dev_owner_delete;
> +	rte_eth_dev_owner_get;
> +	rte_eth_dev_owner_new;
> +	rte_eth_dev_owner_set;
> +	rte_eth_dev_owner_unset;
> +	rte_eth_find_next_owned_by;
> +
> +} DPDK_17.11;
> +
>  EXPERIMENTAL {
>  	global:
> 
> --
> 1.8.3.1
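
The owner-filtered iteration introduced by the patch (rte_eth_find_next_owned_by() and RTE_ETH_FOREACH_DEV_OWNED_BY) can be tried out in isolation with a sketch like the following. The port table, the `attached` flag and all `sketch_`/`SKETCH_` names are hypothetical stand-ins for the ethdev internals.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_PORTS 8

struct sketch_port {
	int attached;      /* stands in for state == RTE_ETH_DEV_ATTACHED */
	uint16_t owner_id; /* 0 means no owner */
};

static struct sketch_port sketch_ports[SKETCH_MAX_PORTS];

/* Return the first attached port >= port_id owned by owner_id, or MAX. */
uint16_t
sketch_find_next_owned_by(uint16_t port_id, uint16_t owner_id)
{
	while (port_id < SKETCH_MAX_PORTS &&
	       (!sketch_ports[port_id].attached ||
		sketch_ports[port_id].owner_id != owner_id))
		port_id++;
	return port_id < SKETCH_MAX_PORTS ? port_id : SKETCH_MAX_PORTS;
}

/* Iterate over all attached ports owned by owner o. */
#define SKETCH_FOREACH_DEV_OWNED_BY(p, o) \
	for ((p) = sketch_find_next_owned_by(0, (o)); \
	     (p) < SKETCH_MAX_PORTS; \
	     (p) = sketch_find_next_owned_by((uint16_t)((p) + 1), (o)))

/* Example consumer: count the ports a given owner currently holds. */
unsigned int
sketch_count_owned_by(uint16_t owner_id)
{
	uint16_t p;
	unsigned int n = 0;

	SKETCH_FOREACH_DEV_OWNED_BY(p, owner_id)
		n++;
	return n;
}
```

Passing owner id 0 iterates over the attached ports that nobody owns, matching the RTE_ETH_DEV_NO_OWNER note in the doc comment.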

Matan Azrad Jan. 10, 2018, 4:58 p.m. UTC | #2

Hi Konstantin

From: Ananyev, Konstantin, Wednesday, January 10, 2018 3:36 PM
> Hi Matan,
> 
> Few comments from me below.
> BTW, do you plan to add ownership mandatory check in control path
> functions that change port configuration?

No.


> Konstantin
> 
> > -----Original Message-----
> > From: Matan Azrad [mailto:matan@mellanox.com]
> > Sent: Sunday, January 7, 2018 9:46 AM
> > To: Thomas Monjalon <thomas@monjalon.net>; Gaetan Rivet
> > <gaetan.rivet@6wind.com>; Wu, Jingjing <jingjing.wu@intel.com>
> > Cc: dev@dpdk.org; Neil Horman <nhorman@tuxdriver.com>; Richardson,
> > Bruce <bruce.richardson@intel.com>; Ananyev, Konstantin
> > <konstantin.ananyev@intel.com>
> > Subject: [PATCH v2 2/6] ethdev: add port ownership
> >
> > The ownership of a port is implicit in DPDK.
> > Making it explicit is better for the following reasons:
> > 1. It will define well who is in charge of the port usage synchronization.
> > 2. A library could work on top of a port.
> > 3. A port can work on top of another port.
> >
> > Also in the fail-safe case, an issue has been met in testpmd.
> > We need to check that the application is not trying to use a port
> > which is already managed by fail-safe.
> >
> > A port owner is built from owner id(number) and owner name(string)
> > while the owner id must be unique to distinguish between two identical
> > entity instances and the owner name can be any name.
> > The name helps to logically recognize the owner by different DPDK
> > entities and allows easy debug.
> > Each DPDK entity can allocate an owner unique identifier and can use
> > it and its preferred name to own valid ethdev ports.
> > Each DPDK entity can get any port owner status to decide if it can
> > manage the port or not.
> >
> > The mechanism is synchronized for both the primary process threads and
> > the secondary processes threads to allow secondary process entity to
> > be a port owner.
> >
> > Add a synchronized ownership mechanism to DPDK Ethernet devices to
> > avoid multiple management of a device by different DPDK entities.
> >
> > The current ethdev internal port management is not affected by this
> > feature.
> >
> > Signed-off-by: Matan Azrad <matan@mellanox.com>
> > ---
> >  doc/guides/prog_guide/poll_mode_drv.rst |  14 ++-
> >  lib/librte_ether/rte_ethdev.c           | 206 ++++++++++++++++++++++++++++++--
> >  lib/librte_ether/rte_ethdev.h           |  89 ++++++++++++++
> >  lib/librte_ether/rte_ethdev_version.map |  12 ++
> >  4 files changed, 311 insertions(+), 10 deletions(-)
> 
> 
> >
> >
> > diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
> > index 684e3e8..0e12452 100644
> > --- a/lib/librte_ether/rte_ethdev.c
> > +++ b/lib/librte_ether/rte_ethdev.c
> > @@ -70,7 +70,10 @@
> >
> >  static const char *MZ_RTE_ETH_DEV_DATA = "rte_eth_dev_data";
> >  struct rte_eth_dev rte_eth_devices[RTE_MAX_ETHPORTS];
> > +/* ports data array stored in shared memory */
> >  static struct rte_eth_dev_data *rte_eth_dev_data;
> > +/* next owner identifier stored in shared memory */
> > +static uint16_t *rte_eth_next_owner_id;
> >  static uint8_t eth_dev_last_created_port;
> >
> >  /* spinlock for eth device callbacks */
> > @@ -82,6 +85,9 @@
> >  /* spinlock for add/remove tx callbacks */
> >  static rte_spinlock_t rte_eth_tx_cb_lock = RTE_SPINLOCK_INITIALIZER;
> >
> > +/* spinlock for eth device ownership management stored in shared memory */
> > +static rte_spinlock_t *rte_eth_dev_ownership_lock;
> > +
> >  /* store statistics names and its offset in stats structure  */
> >  struct rte_eth_xstats_name_off {
> >  	char name[RTE_ETH_XSTATS_NAME_SIZE];
> > @@ -153,14 +159,18 @@ enum {
> >  }
> >
> >  static void
> > -rte_eth_dev_data_alloc(void)
> > +rte_eth_dev_share_data_alloc(void)
> >  {
> >  	const unsigned flags = 0;
> >  	const struct rte_memzone *mz;
> > +	const unsigned int data_size = RTE_MAX_ETHPORTS *
> > +						sizeof(*rte_eth_dev_data);
> >
> >  	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
> > +		/* Allocate shared memory for port data and ownership */
> >  		mz = rte_memzone_reserve(MZ_RTE_ETH_DEV_DATA,
> > -				RTE_MAX_ETHPORTS * sizeof(*rte_eth_dev_data),
> > +				data_size + sizeof(*rte_eth_next_owner_id) +
> > +				sizeof(*rte_eth_dev_ownership_lock),
> >  				rte_socket_id(), flags);
> >  	} else
> >  		mz = rte_memzone_lookup(MZ_RTE_ETH_DEV_DATA);
> > @@ -168,9 +178,17 @@ enum {
> >  		rte_panic("Cannot allocate memzone for ethernet port data\n");
> >
> >  	rte_eth_dev_data = mz->addr;
> > -	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
> > -		memset(rte_eth_dev_data, 0,
> > -				RTE_MAX_ETHPORTS * sizeof(*rte_eth_dev_data));
> > +	rte_eth_next_owner_id = (uint16_t *)((uintptr_t)mz->addr +
> > +					     data_size);
> > +	rte_eth_dev_ownership_lock = (rte_spinlock_t *)
> > +		((uintptr_t)rte_eth_next_owner_id +
> > +		 sizeof(*rte_eth_next_owner_id));
> 
> 
> I think that might make  rte_eth_dev_ownership_lock location not 4B
> aligned...

Where can I find the documentation about it?

> Why just not to put all data that you are trying to allocate as one chunck into
> the same struct:
> static struct {
>         uint16_t next_owner_id;
>         /* spinlock for eth device ownership management stored in shared
> memory */
>         rte_spinlock_t dev_ownership_lock;
>         rte_eth_dev_data *data;
> } rte_eth_dev_data;
> and allocate/use it everywhere?
> That would simplify allocation/management stuff.
>
I don't understand exactly what you mean.
If you mean grouping everything in one struct like:

static struct {
        uint16_t next_owner_id;
        rte_spinlock_t dev_ownership_lock;
        struct rte_eth_dev_data data[];
} rte_eth_dev_share_data;

just to simplify the address calculation above,
it will change more code in ethdev relative to the old rte_eth_dev_data global array and will be more intrusive.
Leaving it as is keeps the change focused here.

I can just move the spinlock memory allocation to the beginning of the memzone (to be sure about the alignment).
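
Another way to get the alignment right while keeping the hand-computed layout is to round the lock offset up to the lock's alignment. The sketch below shows the arithmetic only; `sketch_spinlock_t` is a hypothetical stand-in assumed to wrap one `int`, the memzone base is assumed to be suitably aligned, and the memzone reservation would need to grow by the padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for rte_spinlock_t: assumed to wrap a single int. */
typedef struct {
	volatile int locked;
} sketch_spinlock_t;

/* Round offset up to the next multiple of align (a power of two). */
size_t
sketch_align_up(size_t offset, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (offset + align - 1) & ~(align - 1);
}

/* Lock placed after the port data and the 16-bit counter, with padding. */
size_t
sketch_lock_offset(size_t data_size)
{
	return sketch_align_up(data_size + sizeof(uint16_t),
			       _Alignof(sketch_spinlock_t));
}
```

For a 4096-byte data area and a 4-byte lock alignment, the unpadded offset 4098 is rounded up to 4100.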
 
> It is good to see that now scanning/updating rte_eth_dev_data[] is lock
> protected, but it might be not very plausible to protect both data[] and
> next_owner_id using the same lock.

I guess you mean the owner structure in rte_eth_dev_data[port_id].
The next_owner_id is read by the ownership APIs (for owner validation), so it makes sense to use the same lock.
Actually, why not?

> In fact, for next_owner_id, you don't need a lock - just rte_atomic_t should
> be enough.

I don't think so; it is problematic when next_owner_id wraps around, and it may complicate the code in other places which read it.
Why not just keep it simple and use the same lock?

> Another alternative would be to use 2 locks - one for next_owner_id second
> for actual data[] protection.
> 
> Another thing - you'll probably need to grab/release a lock inside
> rte_eth_dev_allocated() too.
> It is a public function used by drivers, so need to be protected too.
> 

Yes, I thought about it, but decided not to use the lock in the following:
rte_eth_dev_allocated
rte_eth_dev_count
rte_eth_dev_get_name_by_port
rte_eth_dev_get_port_by_name
maybe more...

Don't you think it is just timing dependent? (Ask a moment later and you may get a different answer.) I don't see a possible crash.

> > +
> > +	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
> > +		memset(rte_eth_dev_data, 0, data_size);
> > +		*rte_eth_next_owner_id = RTE_ETH_DEV_NO_OWNER + 1;
> > +		rte_spinlock_init(rte_eth_dev_ownership_lock);
> > +	}
> >  }
> >
> >  struct rte_eth_dev *
> > @@ -225,7 +243,7 @@ struct rte_eth_dev *
> >  	}
> >
> >  	if (rte_eth_dev_data == NULL)
> > -		rte_eth_dev_data_alloc();
> > +		rte_eth_dev_share_data_alloc();
> >
> >  	if (rte_eth_dev_allocated(name) != NULL) {
> >  		RTE_PMD_DEBUG_TRACE("Ethernet Device with name %s already allocated!\n",
> > @@ -253,7 +271,7 @@ struct rte_eth_dev *
> >  	struct rte_eth_dev *eth_dev;
> >
> >  	if (rte_eth_dev_data == NULL)
> > -		rte_eth_dev_data_alloc();
> > +		rte_eth_dev_share_data_alloc();
> >
> >  	for (i = 0; i < RTE_MAX_ETHPORTS; i++) {
> >  		if (strcmp(rte_eth_dev_data[i].name, name) == 0)
> > @@ -278,8 +296,12 @@ struct rte_eth_dev *
> >  	if (eth_dev == NULL)
> >  		return -EINVAL;
> >
> > -	memset(eth_dev->data, 0, sizeof(struct rte_eth_dev_data));
> > +	rte_spinlock_lock(rte_eth_dev_ownership_lock);
> > +
> >  	eth_dev->state = RTE_ETH_DEV_UNUSED;
> > +	memset(eth_dev->data, 0, sizeof(struct rte_eth_dev_data));
> > +
> > +	rte_spinlock_unlock(rte_eth_dev_ownership_lock);
> >  	return 0;
> >  }
> >
> > @@ -294,6 +316,174 @@ struct rte_eth_dev *
> >  		return 1;
> >  }
> >
> > +static int
> > +rte_eth_is_valid_owner_id(uint16_t owner_id)
> > +{
> > +	if (owner_id == RTE_ETH_DEV_NO_OWNER ||
> > +	    (*rte_eth_next_owner_id > RTE_ETH_DEV_NO_OWNER &&
> > +	     *rte_eth_next_owner_id <= owner_id)) {
> > +		RTE_LOG(ERR, EAL, "Invalid owner_id=%d.\n", owner_id);
> > +		return 0;
> > +	}
> > +	return 1;
> > +}
> > +
> > +uint16_t
> > +rte_eth_find_next_owned_by(uint16_t port_id, const uint16_t owner_id)
> > +{
> > +	while (port_id < RTE_MAX_ETHPORTS &&
> > +	       (rte_eth_devices[port_id].state != RTE_ETH_DEV_ATTACHED ||
> > +	       rte_eth_devices[port_id].data->owner.id != owner_id))
> > +		port_id++;
> > +
> > +	if (port_id >= RTE_MAX_ETHPORTS)
> > +		return RTE_MAX_ETHPORTS;
> > +
> > +	return port_id;
> > +}
> > +
> > +int
> > +rte_eth_dev_owner_new(uint16_t *owner_id)
> > +{
> > +	int ret = 0;
> > +
> > +	rte_spinlock_lock(rte_eth_dev_ownership_lock);
> > +
> > +	if (*rte_eth_next_owner_id == RTE_ETH_DEV_NO_OWNER) {
> > +		/* Counter wrap around. */
> > +		RTE_PMD_DEBUG_TRACE("Reached maximum number of Ethernet port owners.\n");
> > +		ret = -EUSERS;
> > +	} else {
> > +		*owner_id = (*rte_eth_next_owner_id)++;
> > +	}
> > +
> > +	rte_spinlock_unlock(rte_eth_dev_ownership_lock);
> > +	return ret;
> > +}
> > +
> > +int
> > +rte_eth_dev_owner_set(const uint16_t port_id,
> > +		      const struct rte_eth_dev_owner *owner)
> 
> As a nit - if you'll have rte_eth_dev_owner_set(port_id, old_owner,
> new_owner)
> - that might be more plausible for user, and would greatly simplify unset()
> part:
> just set(port_id, cur_owner, zero_owner);
> 

How should the user know the old owner?

> > +{
> > +	struct rte_eth_dev_owner *port_owner;
> > +	int ret = 0;
> > +	int sret;
> > +
> > +	rte_spinlock_lock(rte_eth_dev_ownership_lock);
> > +
> > +	if (!rte_eth_dev_is_valid_port(port_id)) {
> > +		RTE_PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
> > +		ret = -ENODEV;
> > +		goto unlock;
> > +	}
> > +
> > +	if (!rte_eth_is_valid_owner_id(owner->id)) {
> > +		ret = -EINVAL;
> > +		goto unlock;
> > +	}
> > +
> > +	port_owner = &rte_eth_devices[port_id].data->owner;
> > +	if (port_owner->id != RTE_ETH_DEV_NO_OWNER &&
> > +	    port_owner->id != owner->id) {
> > +		RTE_LOG(ERR, EAL,
> > +			"Cannot set owner to port %d already owned by
> %s_%05d.\n",
> > +			port_id, port_owner->name, port_owner->id);
> > +		ret = -EPERM;
> > +		goto unlock;
> > +	}
> > +
> > +	sret = snprintf(port_owner->name,
> RTE_ETH_MAX_OWNER_NAME_LEN, "%s",
> > +			owner->name);
> > +	if (sret < 0 || sret >= RTE_ETH_MAX_OWNER_NAME_LEN) {
> 
> Personally, I don't see any reason to fail if description was truncated...
> Another alternative - just use rte_malloc() here to allocate big enough buffer
> to hold the description.
> 

But it is a static allocation, like the device name; why allocate it differently?
 
> > +		memset(port_owner->name, 0,
> RTE_ETH_MAX_OWNER_NAME_LEN);
> > +		RTE_LOG(ERR, EAL, "Invalid owner name.\n");
> > +		ret = -EINVAL;
> > +		goto unlock;
> > +	}
> > +
> > +	port_owner->id = owner->id;
> > +	RTE_PMD_DEBUG_TRACE("Port %d owner is %s_%05d.\n", port_id,
> > +			    owner->name, owner->id);
> > +
> 
> As another nit - you can avoid all these gotos by restructuring code a bit:
> 
> rte_eth_dev_owner_set(const uint16_t port_id, const struct
> rte_eth_dev_owner *owner) {
>     rte_spinlock_lock(...);
>     ret = _eth_dev_owner_set_unlocked(port_id, owner);
>     rte_spinlock_unlock(...);
>     return ret;
> }
> 
Don't you like gotos? :)
I personally use them only in error/performance scenarios.
Do you think it is worth the effort?

> 
> > +unlock:
> > +	rte_spinlock_unlock(rte_eth_dev_ownership_lock);
> > +	return ret;
> > +}
> > +
> > +int
> > +rte_eth_dev_owner_unset(const uint16_t port_id, const uint16_t
> > +owner_id) {
> > +	struct rte_eth_dev_owner *port_owner;
> > +	int ret = 0;
> > +
> > +	rte_spinlock_lock(rte_eth_dev_ownership_lock);
> > +
> > +	if (!rte_eth_dev_is_valid_port(port_id)) {
> > +		RTE_PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
> > +		ret = -ENODEV;
> > +		goto unlock;
> > +	}
> > +
> > +	if (!rte_eth_is_valid_owner_id(owner_id)) {
> > +		ret = -EINVAL;
> > +		goto unlock;
> > +	}
> > +
> > +	port_owner = &rte_eth_devices[port_id].data->owner;
> > +	if (port_owner->id != owner_id) {
> > +		RTE_LOG(ERR, EAL, "Cannot unset port %d owner (%s_%05d)
> by"
> > +			" a different owner with id %5d.\n", port_id,
> > +			port_owner->name, port_owner->id, owner_id);
> > +		ret = -EPERM;
> > +		goto unlock;
> > +	}
> > +	RTE_PMD_DEBUG_TRACE("Port %d owner %s_%05d has
> removed.\n", port_id,
> > +			    port_owner->name, port_owner->id);
> > +
> > +	memset(port_owner, 0, sizeof(struct rte_eth_dev_owner));
> > +
> > +unlock:
> > +	rte_spinlock_unlock(rte_eth_dev_ownership_lock);
> > +	return ret;
> > +}
> > +
> > +void
> > +rte_eth_dev_owner_delete(const uint16_t owner_id) {
> > +	uint16_t port_id;
> > +
> > +	rte_spinlock_lock(rte_eth_dev_ownership_lock);
> > +
> > +	if (rte_eth_is_valid_owner_id(owner_id)) {
> > +		RTE_ETH_FOREACH_DEV_OWNED_BY(port_id, owner_id)
> > +			memset(&rte_eth_devices[port_id].data->owner, 0,
> > +			       sizeof(struct rte_eth_dev_owner));
> > +		RTE_PMD_DEBUG_TRACE("All port owners owned by %05d
> identifier"
> > +				    " have removed.\n", owner_id);
> > +	}
> > +
> > +	rte_spinlock_unlock(rte_eth_dev_ownership_lock);
> > +}
> > +
> > +int
> > +rte_eth_dev_owner_get(const uint16_t port_id, struct
> > +rte_eth_dev_owner *owner) {
> > +	int ret = 0;
> > +
> > +	rte_spinlock_lock(rte_eth_dev_ownership_lock);
> > +
> > +	if (!rte_eth_dev_is_valid_port(port_id)) {
> > +		RTE_PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
> > +		ret = -ENODEV;
> > +	} else {
> > +		rte_memcpy(owner, &rte_eth_devices[port_id].data-
> >owner,
> > +			   sizeof(*owner));
> > +	}
> > +
> > +	rte_spinlock_unlock(rte_eth_dev_ownership_lock);
> > +	return ret;
> > +}
> > +
> >  int
> >  rte_eth_dev_socket_id(uint16_t port_id)  { diff --git
> > a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h index
> > 57b61ed..88ad765 100644
> > --- a/lib/librte_ether/rte_ethdev.h
> > +++ b/lib/librte_ether/rte_ethdev.h
> > @@ -1760,6 +1760,15 @@ struct rte_eth_dev_sriov {
> >
> >  #define RTE_ETH_NAME_MAX_LEN RTE_DEV_NAME_MAX_LEN
> >
> > +#define RTE_ETH_DEV_NO_OWNER 0
> > +
> > +#define RTE_ETH_MAX_OWNER_NAME_LEN 64
> > +
> > +struct rte_eth_dev_owner {
> > +	uint16_t id; /**< The owner unique identifier. */
> 
> Why limit yourself to 16bit here?
> Why not uint32_t/uint64_t - or even uuid_t and make system library to
> generate it for you?
> Wouldn't need to worry about overflows then.
> 

Interesting.
I will change it and remove the overflow code from next_id!
(I just didn't think about a realistic use with a lot of owners, so I took the same type as the port ID.)

> > +	char name[RTE_ETH_MAX_OWNER_NAME_LEN]; /**< The owner
> name. */ };
> > +
> >  /**
> >   * @internal
> >   * The data part, with no function pointers, associated with each ethernet
> device.
> > @@ -1810,6 +1819,7 @@ struct rte_eth_dev_data {
> >  	int numa_node;  /**< NUMA node connection */
> >  	struct rte_vlan_filter_conf vlan_filter_conf;
> >  	/**< VLAN filter configuration. */
> > +	struct rte_eth_dev_owner owner; /**< The port owner. */
> >  };
> >
> >  /** Device supports link state interrupt */ @@ -1846,6 +1856,85 @@
> > struct rte_eth_dev_data {
> >
> >
> >  /**
> > + * Iterates over valid ethdev ports owned by a specific owner.
> > + *
> > + * @param port_id
> > + *   The id of the next possible valid owned port.
> > + * @param	owner_id
> > + *  The owner identifier.
> > + *  RTE_ETH_DEV_NO_OWNER means iterate over all valid ownerless
> ports.
> > + * @return
> > + *   Next valid port id owned by owner_id, RTE_MAX_ETHPORTS if there is
> none.
> > + */
> > +uint16_t rte_eth_find_next_owned_by(uint16_t port_id, const uint16_t
> > +owner_id);
> > +
> > +/**
> > + * Macro to iterate over all enabled ethdev ports owned by a specific
> owner.
> > + */
> > +#define RTE_ETH_FOREACH_DEV_OWNED_BY(p, o) \
> > +	for (p = rte_eth_find_next_owned_by(0, o); \
> > +	     (unsigned int)p < (unsigned int)RTE_MAX_ETHPORTS; \
> > +	     p = rte_eth_find_next_owned_by(p + 1, o))
> > +
> > +/**
> > + * Get a new unique owner identifier.
> > + * An owner identifier is used to owns Ethernet devices by only one
> > +DPDK entity
> > + * to avoid multiple management of device by different entities.
> > + *
> > + * @param	owner_id
> > + *   Owner identifier pointer.
> > + * @return
> > + *   Negative errno value on error, 0 on success.
> > + */
> > +int rte_eth_dev_owner_new(uint16_t *owner_id);
> > +
> > +/**
> > + * Set an Ethernet device owner.
> > + *
> > + * @param	port_id
> > + *  The identifier of the port to own.
> > + * @param	owner
> > + *  The owner pointer.
> > + * @return
> > + *  Negative errno value on error, 0 on success.
> > + */
> > +int rte_eth_dev_owner_set(const uint16_t port_id,
> > +			  const struct rte_eth_dev_owner *owner);
> > +
> > +/**
> > + * Unset Ethernet device owner to make the device ownerless.
> > + *
> > + * @param	port_id
> > + *  The identifier of port to make ownerless.
> > + * @param	owner
> > + *  The owner identifier.
> > + * @return
> > + *  0 on success, negative errno value on error.
> > + */
> > +int rte_eth_dev_owner_unset(const uint16_t port_id, const uint16_t
> > +owner_id);
> > +
> > +/**
> > + * Remove owner from all Ethernet devices owned by a specific owner.
> > + *
> > + * @param	owner
> > + *  The owner identifier.
> > + */
> > +void rte_eth_dev_owner_delete(const uint16_t owner_id);
> > +
> > +/**
> > + * Get the owner of an Ethernet device.
> > + *
> > + * @param	port_id
> > + *  The port identifier.
> > + * @param	owner
> > + *  The owner structure pointer to fill.
> > + * @return
> > + *  0 on success, negative errno value on error..
> > + */
> > +int rte_eth_dev_owner_get(const uint16_t port_id,
> > +			  struct rte_eth_dev_owner *owner);
> > +
> > +/**
> >   * Get the total number of Ethernet devices that have been successfully
> >   * initialized by the matching Ethernet driver during the PCI probing phase
> >   * and that are available for applications to use. These devices must
> > be diff --git a/lib/librte_ether/rte_ethdev_version.map
> > b/lib/librte_ether/rte_ethdev_version.map
> > index e9681ac..5d20b5f 100644
> > --- a/lib/librte_ether/rte_ethdev_version.map
> > +++ b/lib/librte_ether/rte_ethdev_version.map
> > @@ -198,6 +198,18 @@ DPDK_17.11 {
> >
> >  } DPDK_17.08;
> >
> > +DPDK_18.02 {
> > +	global:
> > +
> > +	rte_eth_dev_owner_delete;
> > +	rte_eth_dev_owner_get;
> > +	rte_eth_dev_owner_new;
> > +	rte_eth_dev_owner_set;
> > +	rte_eth_dev_owner_unset;
> > +	rte_eth_find_next_owned_by;
> > +
> > +} DPDK_17.11;
> > +
> >  EXPERIMENTAL {
> >  	global:
> >
> > --
> > 1.8.3.1

Ananyev, Konstantin Jan. 11, 2018, 12:40 p.m. UTC | #3

Hi Matan,

> 
> Hi Konstantin
> 
> From: Ananyev, Konstantin, Wednesday, January 10, 2018 3:36 PM
> > Hi Matan,
> >
> > Few comments from me below.
> > BTW, do you plan to add ownership mandatory check in control path
> > functions that change port configuration?
> 
> No.

So its usage is still totally voluntary, and the application needs to be changed
to exploit it?
Apart from the RTE_FOR_EACH_DEV() change proposed by Gaetan?

> 
> 
> > Konstantin
> >
> > > -----Original Message-----
> > > From: Matan Azrad [mailto:matan@mellanox.com]
> > > Sent: Sunday, January 7, 2018 9:46 AM
> > > To: Thomas Monjalon <thomas@monjalon.net>; Gaetan Rivet
> > > <gaetan.rivet@6wind.com>; Wu, Jingjing <jingjing.wu@intel.com>
> > > Cc: dev@dpdk.org; Neil Horman <nhorman@tuxdriver.com>; Richardson,
> > > Bruce <bruce.richardson@intel.com>; Ananyev, Konstantin
> > > <konstantin.ananyev@intel.com>
> > > Subject: [PATCH v2 2/6] ethdev: add port ownership
> > >
> > > The ownership of a port is implicit in DPDK.
> > > Making it explicit is better from the next reasons:
> > > 1. It will define well who is in charge of the port usage synchronization.
> > > 2. A library could work on top of a port.
> > > 3. A port can work on top of another port.
> > >
> > > Also in the fail-safe case, an issue has been met in testpmd.
> > > We need to check that the application is not trying to use a port
> > > which is already managed by fail-safe.
> > >
> > > A port owner is built from owner id(number) and owner name(string)
> > > while the owner id must be unique to distinguish between two identical
> > > entity instances and the owner name can be any name.
> > > The name helps to logically recognize the owner by different DPDK
> > > entities and allows easy debug.
> > > Each DPDK entity can allocate an owner unique identifier and can use
> > > it and its preferred name to owns valid ethdev ports.
> > > Each DPDK entity can get any port owner status to decide if it can
> > > manage the port or not.
> > >
> > > The mechanism is synchronized for both the primary process threads and
> > > the secondary processes threads to allow secondary process entity to
> > > be a port owner.
> > >
> > > Add a sinchronized ownership mechanism to DPDK Ethernet devices to
> > > avoid multiple management of a device by different DPDK entities.
> > >
> > > The current ethdev internal port management is not affected by this
> > > feature.
> > >
> > > Signed-off-by: Matan Azrad <matan@mellanox.com>
> > > ---
> > >  doc/guides/prog_guide/poll_mode_drv.rst |  14 ++-
> > >  lib/librte_ether/rte_ethdev.c           | 206
> > ++++++++++++++++++++++++++++++--
> > >  lib/librte_ether/rte_ethdev.h           |  89 ++++++++++++++
> > >  lib/librte_ether/rte_ethdev_version.map |  12 ++
> > >  4 files changed, 311 insertions(+), 10 deletions(-)
> >
> >
> > >
> > >
> > > diff --git a/lib/librte_ether/rte_ethdev.c
> > > b/lib/librte_ether/rte_ethdev.c index 684e3e8..0e12452 100644
> > > --- a/lib/librte_ether/rte_ethdev.c
> > > +++ b/lib/librte_ether/rte_ethdev.c
> > > @@ -70,7 +70,10 @@
> > >
> > >  static const char *MZ_RTE_ETH_DEV_DATA = "rte_eth_dev_data";  struct
> > > rte_eth_dev rte_eth_devices[RTE_MAX_ETHPORTS];
> > > +/* ports data array stored in shared memory */
> > >  static struct rte_eth_dev_data *rte_eth_dev_data;
> > > +/* next owner identifier stored in shared memory */ static uint16_t
> > > +*rte_eth_next_owner_id;
> > >  static uint8_t eth_dev_last_created_port;
> > >
> > >  /* spinlock for eth device callbacks */ @@ -82,6 +85,9 @@
> > >  /* spinlock for add/remove tx callbacks */  static rte_spinlock_t
> > > rte_eth_tx_cb_lock = RTE_SPINLOCK_INITIALIZER;
> > >
> > > +/* spinlock for eth device ownership management stored in shared
> > > +memory */ static rte_spinlock_t *rte_eth_dev_ownership_lock;
> > > +
> > >  /* store statistics names and its offset in stats structure  */
> > > struct rte_eth_xstats_name_off {
> > >  	char name[RTE_ETH_XSTATS_NAME_SIZE]; @@ -153,14 +159,18 @@
> > enum {  }
> > >
> > >  static void
> > > -rte_eth_dev_data_alloc(void)
> > > +rte_eth_dev_share_data_alloc(void)
> > >  {
> > >  	const unsigned flags = 0;
> > >  	const struct rte_memzone *mz;
> > > +	const unsigned int data_size = RTE_MAX_ETHPORTS *
> > > +						sizeof(*rte_eth_dev_data);
> > >
> > >  	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
> > > +		/* Allocate shared memory for port data and ownership */
> > >  		mz = rte_memzone_reserve(MZ_RTE_ETH_DEV_DATA,
> > > -				RTE_MAX_ETHPORTS *
> > sizeof(*rte_eth_dev_data),
> > > +				data_size + sizeof(*rte_eth_next_owner_id)
> > +
> > > +				sizeof(*rte_eth_dev_ownership_lock),
> > >  				rte_socket_id(), flags);
> > >  	} else
> > >  		mz = rte_memzone_lookup(MZ_RTE_ETH_DEV_DATA);
> > > @@ -168,9 +178,17 @@ enum {
> > >  		rte_panic("Cannot allocate memzone for ethernet port
> > data\n");
> > >
> > >  	rte_eth_dev_data = mz->addr;
> > > -	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
> > > -		memset(rte_eth_dev_data, 0,
> > > -				RTE_MAX_ETHPORTS *
> > sizeof(*rte_eth_dev_data));
> > > +	rte_eth_next_owner_id = (uint16_t *)((uintptr_t)mz->addr +
> > > +					     data_size);
> > > +	rte_eth_dev_ownership_lock = (rte_spinlock_t *)
> > > +		((uintptr_t)rte_eth_next_owner_id +
> > > +		 sizeof(*rte_eth_next_owner_id));
> >
> >
> > I think that might make  rte_eth_dev_ownership_lock location not 4B
> > aligned...
> 
> Where can I find the documentation about it?

That's in your code above - data_size and mz->addr are both at least 4B aligned, so
rte_eth_dev_ownership_lock = mz->addr + data_size + 2 is only 2B aligned.
You can align it manually, but as discussed below it is probably easier to group related
fields into the same struct.
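
A minimal sketch of the offset arithmetic behind this remark, not code from the patch. It assumes the spinlock wraps a single int (mock_spinlock_t is a stand-in for rte_spinlock_t) and that data_size is a multiple of 4; both helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for rte_spinlock_t (assumption: it wraps one int). */
typedef struct { volatile int locked; } mock_spinlock_t;

/* Offset of the lock when it directly follows a uint16_t counter that
 * itself follows data_size bytes of port data, as in the patch. */
static size_t lock_offset_unaligned(size_t data_size)
{
    return data_size + sizeof(uint16_t);
}

/* Same layout, with the offset rounded up to the lock's alignment. */
static size_t lock_offset_aligned(size_t data_size)
{
    size_t align = _Alignof(mock_spinlock_t);
    size_t off = data_size + sizeof(uint16_t);

    return (off + align - 1) / align * align;
}
```

With a 4-byte-aligned data_size, the first helper lands 2 bytes past an aligned boundary, while the second rounds up to the next boundary.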

> 
> > Why just not to put all data that you are trying to allocate as one chunck into
> > the same struct:
> > static struct {
> >         uint16_t next_owner_id;
> >         /* spinlock for eth device ownership management stored in shared
> > memory */
> >         rte_spinlock_t dev_ownership_lock;
> >         rte_eth_dev_data *data;
> > } rte_eth_dev_data;
> > and allocate/use it everywhere?
> > That would simplify allocation/management stuff.
> >
> I don't understand what exactly do you mean. ?
> If you mean to group all in one struct like:
> 
> static struct {
>         uint16_t next_owner_id;
>         rte_spinlock_t dev_ownership_lock;
>         rte_eth_dev_data  data[];
> } rte_eth_dev_share_data;
> 
> Just to simplify the addresses calculation above,

Yep, that's exactly what I meant.
As you said, it would help with the bulk allocation/alignment stuff, plus
IMO it is better and easier to group several related globals together -
it improves code quality and makes the code easier to read & maintain in future.
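
A rough sketch of the grouped layout, under stated assumptions: mock_spinlock_t and mock_port_data are stand-ins for rte_spinlock_t and struct rte_eth_dev_data, MOCK_MAX_PORTS stands in for RTE_MAX_ETHPORTS, and a fixed-size array is used instead of the flexible array member shown in the thread so that sizeof covers the whole block.

```c
#include <stddef.h>
#include <stdint.h>

#define MOCK_MAX_PORTS 32

typedef struct { volatile int locked; } mock_spinlock_t;
struct mock_port_data { char name[64]; uint16_t owner_id; };

/* Everything that lives in the shared memzone, in one struct. The compiler
 * inserts the padding needed to align ownership_lock and data[]. */
struct mock_shared_data {
    uint16_t next_owner_id;
    mock_spinlock_t ownership_lock;
    struct mock_port_data data[MOCK_MAX_PORTS];
};

/* Size to reserve for the memzone: one sizeof instead of a manual sum. */
static size_t shared_data_size(void)
{
    return sizeof(struct mock_shared_data);
}

/* Offset of the lock inside the block, computed by the compiler. */
static size_t lock_offset(void)
{
    return offsetof(struct mock_shared_data, ownership_lock);
}
```

The point of the design is that no pointer arithmetic is left in the allocation code: every field is reached through the struct, and its alignment follows from the type.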

> It will change more code in ethdev relative to the old rte_eth_dev_data global array and will be more intrusive.
> Stay it as is, focuses the change only here.

Yes, it would require a few more changes, though I think it is worth it.

> 
> I can just move the spinlock memory allocation to be at the beginning of the memzone(to be sure about the alignment).
> 
> > It is good to see that now scanning/updating rte_eth_dev_data[] is lock
> > protected, but it might be not very plausible to protect both data[] and
> > next_owner_id using the same lock.
> 
> I guess you mean to the owner structure in rte_eth_dev_data[port_id].
> The next_owner_id is read by ownership APIs(for owner validation), so it makes sense to use the same lock.
> Actually, why not?

Well, to me next_owner_id and rte_eth_dev_data[] are not directly related.
You may create a new owner_id, but it doesn't mean you would update rte_eth_dev_data[] immediately.
And vice versa - you might just want to update rte_eth_dev_data[].name or .owner_id.
It is not very good coding practice to use the same lock for unrelated data structures.

> 
> > In fact, for next_owner_id, you don't need a lock - just rte_atomic_t should
> > be enough.
> 
> I don't think so, it is problematic in next_owner_id wraparound and may complicate the code in other places which read it.

IMO it is not that complicated, something like this should work, I think.

/* points to a counter in shared memory, initialized to 0 at startup */
rte_atomic32_t *owner_id;

int new_owner_id(void)
{
    int32_t x;

    x = rte_atomic32_add_return(owner_id, 1);
    if (x > UINT16_MAX) {
        /* undo the increment so the counter does not keep growing */
        rte_atomic32_dec(owner_id);
        return -EOVERFLOW;
    }
    return x;
}


> Why not just to keep it simple and using the same lock?

A lock is also fine, I just think it had better be a separate one - that would protect just next_owner_id.
Though if you are going to use uuid here - all that is probably not relevant any more.
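
For comparison, a lock-free counter can also be sketched with C11 atomics and a compare-exchange loop, so the counter never moves past the limit. This is a variant technique, not the rte_atomic32 code above; last_owner_id and the function name are hypothetical, and 0 plays the role of RTE_ETH_DEV_NO_OWNER.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

/* Last identifier handed out; 0 means "no owner" and is never returned. */
static _Atomic uint32_t last_owner_id;

/* Returns a fresh identifier in 1..UINT16_MAX, or -EOVERFLOW once exhausted. */
static int new_owner_id(void)
{
    uint32_t cur = atomic_load(&last_owner_id);

    do {
        if (cur >= UINT16_MAX)
            return -EOVERFLOW;
        /* On failure, cur is reloaded with the current value and we retry. */
    } while (!atomic_compare_exchange_weak(&last_owner_id, &cur, cur + 1));

    return (int)(cur + 1);
}
```

Because the limit is checked before the exchange, a failed allocation leaves the counter untouched and no rollback step is needed.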

> 
> > Another alternative would be to use 2 locks - one for next_owner_id second
> > for actual data[] protection.
> >
> > Another thing - you'll probably need to grab/release a lock inside
> > rte_eth_dev_allocated() too.
> > It is a public function used by drivers, so need to be protected too.
> >
> 
> Yes, I thought about it, but decided not to use lock in next:
> rte_eth_dev_allocated
> rte_eth_dev_count
> rte_eth_dev_get_name_by_port
> rte_eth_dev_get_port_by_name
> maybe more...

As I can see, in patch #3 you protect access to rte_eth_dev_data[].name with the lock
(which seems like a good thing).
So I think any other public function that accesses rte_eth_dev_data[].name should be
protected by the same lock.

> 
> Don't you think it is just timing depended?(ask in the next moment and you may get another answer) I don't see optional crash.
> 
> > > +
> > > +	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
> > > +		memset(rte_eth_dev_data, 0, data_size);
> > > +		*rte_eth_next_owner_id = RTE_ETH_DEV_NO_OWNER + 1;
> > > +		rte_spinlock_init(rte_eth_dev_ownership_lock);
> > > +	}
> > >  }
> > >
> > >  struct rte_eth_dev *
> > > @@ -225,7 +243,7 @@ struct rte_eth_dev *
> > >  	}
> > >
> > >  	if (rte_eth_dev_data == NULL)
> > > -		rte_eth_dev_data_alloc();
> > > +		rte_eth_dev_share_data_alloc();
> > >
> > >  	if (rte_eth_dev_allocated(name) != NULL) {
> > >  		RTE_PMD_DEBUG_TRACE("Ethernet Device with name %s
> > already
> > > allocated!\n", @@ -253,7 +271,7 @@ struct rte_eth_dev *
> > >  	struct rte_eth_dev *eth_dev;
> > >
> > >  	if (rte_eth_dev_data == NULL)
> > > -		rte_eth_dev_data_alloc();
> > > +		rte_eth_dev_share_data_alloc();
> > >
> > >  	for (i = 0; i < RTE_MAX_ETHPORTS; i++) {
> > >  		if (strcmp(rte_eth_dev_data[i].name, name) == 0) @@ -
> > 278,8 +296,12
> > > @@ struct rte_eth_dev *
> > >  	if (eth_dev == NULL)
> > >  		return -EINVAL;
> > >
> > > -	memset(eth_dev->data, 0, sizeof(struct rte_eth_dev_data));
> > > +	rte_spinlock_lock(rte_eth_dev_ownership_lock);
> > > +
> > >  	eth_dev->state = RTE_ETH_DEV_UNUSED;
> > > +	memset(eth_dev->data, 0, sizeof(struct rte_eth_dev_data));
> > > +
> > > +	rte_spinlock_unlock(rte_eth_dev_ownership_lock);
> > >  	return 0;
> > >  }
> > >
> > > @@ -294,6 +316,174 @@ struct rte_eth_dev *
> > >  		return 1;
> > >  }
> > >
> > > +static int
> > > +rte_eth_is_valid_owner_id(uint16_t owner_id) {
> > > +	if (owner_id == RTE_ETH_DEV_NO_OWNER ||
> > > +	    (*rte_eth_next_owner_id > RTE_ETH_DEV_NO_OWNER &&
> > > +	     *rte_eth_next_owner_id <= owner_id)) {
> > > +		RTE_LOG(ERR, EAL, "Invalid owner_id=%d.\n", owner_id);
> > > +		return 0;
> > > +	}
> > > +	return 1;
> > > +}
> > > +
> > > +uint16_t
> > > +rte_eth_find_next_owned_by(uint16_t port_id, const uint16_t
> > owner_id)
> > > +{
> > > +	while (port_id < RTE_MAX_ETHPORTS &&
> > > +	       (rte_eth_devices[port_id].state != RTE_ETH_DEV_ATTACHED ||
> > > +	       rte_eth_devices[port_id].data->owner.id != owner_id))
> > > +		port_id++;
> > > +
> > > +	if (port_id >= RTE_MAX_ETHPORTS)
> > > +		return RTE_MAX_ETHPORTS;
> > > +
> > > +	return port_id;
> > > +}
> > > +
> > > +int
> > > +rte_eth_dev_owner_new(uint16_t *owner_id) {
> > > +	int ret = 0;
> > > +
> > > +	rte_spinlock_lock(rte_eth_dev_ownership_lock);
> > > +
> > > +	if (*rte_eth_next_owner_id == RTE_ETH_DEV_NO_OWNER) {
> > > +		/* Counter wrap around. */
> > > +		RTE_PMD_DEBUG_TRACE("Reached maximum number of
> > Ethernet port owners.\n");
> > > +		ret = -EUSERS;
> > > +	} else {
> > > +		*owner_id = (*rte_eth_next_owner_id)++;
> > > +	}
> > > +
> > > +	rte_spinlock_unlock(rte_eth_dev_ownership_lock);
> > > +	return ret;
> > > +}
> > > +
> > > +int
> > > +rte_eth_dev_owner_set(const uint16_t port_id,
> > > +		      const struct rte_eth_dev_owner *owner)
> >
> > As a nit - if you'll have rte_eth_dev_owner_set(port_id, old_owner,
> > new_owner)
> > - that might be more plausible for user, and would greatly simplify unset()
> > part:
> > just set(port_id, cur_owner, zero_owner);
> >
> 
> How the user should know the old owner?

By dev_owner_get() or it might have it stored somewhere already
(or constructed on the fly in case of NO_OWNER).
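
A minimal sketch of what a set(port_id, old_owner, new_owner) call could look like, reduced to owner ids only. It is an illustration of the idea, not the proposed ethdev API: mock_owner_set, MOCK_MAX_PORTS and MOCK_NO_OWNER are hypothetical names, and an atomic compare-exchange stands in for the locking done in the patch.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

#define MOCK_MAX_PORTS 4
#define MOCK_NO_OWNER  0

/* One owner id per port; zero-initialized, i.e. all ports start ownerless. */
static _Atomic uint16_t port_owner[MOCK_MAX_PORTS];

/* Replace old_owner with new_owner only if old_owner is the current owner. */
static int mock_owner_set(uint16_t port_id, uint16_t old_owner,
                          uint16_t new_owner)
{
    if (port_id >= MOCK_MAX_PORTS)
        return -ENODEV;
    if (!atomic_compare_exchange_strong(&port_owner[port_id],
                                        &old_owner, new_owner))
        return -EPERM;
    return 0;
}
```

Taking ownership is then set(port, MOCK_NO_OWNER, me) and releasing it is set(port, me, MOCK_NO_OWNER), so a separate unset() is not required.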

> 
> > > +{
> > > +	struct rte_eth_dev_owner *port_owner;
> > > +	int ret = 0;
> > > +	int sret;
> > > +
> > > +	rte_spinlock_lock(rte_eth_dev_ownership_lock);
> > > +
> > > +	if (!rte_eth_dev_is_valid_port(port_id)) {
> > > +		RTE_PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
> > > +		ret = -ENODEV;
> > > +		goto unlock;
> > > +	}
> > > +
> > > +	if (!rte_eth_is_valid_owner_id(owner->id)) {
> > > +		ret = -EINVAL;
> > > +		goto unlock;
> > > +	}
> > > +
> > > +	port_owner = &rte_eth_devices[port_id].data->owner;
> > > +	if (port_owner->id != RTE_ETH_DEV_NO_OWNER &&
> > > +	    port_owner->id != owner->id) {
> > > +		RTE_LOG(ERR, EAL,
> > > +			"Cannot set owner to port %d already owned by
> > %s_%05d.\n",
> > > +			port_id, port_owner->name, port_owner->id);
> > > +		ret = -EPERM;
> > > +		goto unlock;
> > > +	}
> > > +
> > > +	sret = snprintf(port_owner->name,
> > RTE_ETH_MAX_OWNER_NAME_LEN, "%s",
> > > +			owner->name);
> > > +	if (sret < 0 || sret >= RTE_ETH_MAX_OWNER_NAME_LEN) {
> >
> > Personally, I don't see any reason to fail if description was truncated...
> > Another alternative - just use rte_malloc() here to allocate big enough buffer
> > to hold the description.
> >
> 
> But it is static allocation like in the device name, why to allocate it differently?

Static allocation is fine by me - I just said there is probably no need to fail
if the description provided by the user is truncated in that case.
Though if the user's description is *that* important - rte_malloc() can help here.
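
A small sketch of the "truncate instead of fail" behaviour, assuming a deliberately short buffer so the effect is visible. MOCK_OWNER_NAME_LEN and copy_owner_name are hypothetical; only snprintf() is a real library call.

```c
#include <stdio.h>

#define MOCK_OWNER_NAME_LEN 8

/* Copies src into dst, always NUL-terminated.
 * Returns 0 if the name fit, 1 if it was truncated, -1 on output error. */
static int copy_owner_name(char dst[MOCK_OWNER_NAME_LEN], const char *src)
{
    int n = snprintf(dst, MOCK_OWNER_NAME_LEN, "%s", src);

    if (n < 0) {
        dst[0] = '\0';
        return -1;
    }
    /* snprintf returns the length it wanted to write, so n >= size
     * means the stored name is a prefix of src. */
    return n >= MOCK_OWNER_NAME_LEN;
}
```

The caller can log the truncation and carry on, since the stored prefix is still a valid string.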

> 
> > > +		memset(port_owner->name, 0,
> > RTE_ETH_MAX_OWNER_NAME_LEN);
> > > +		RTE_LOG(ERR, EAL, "Invalid owner name.\n");
> > > +		ret = -EINVAL;
> > > +		goto unlock;
> > > +	}
> > > +
> > > +	port_owner->id = owner->id;
> > > +	RTE_PMD_DEBUG_TRACE("Port %d owner is %s_%05d.\n", port_id,
> > > +			    owner->name, owner->id);
> > > +
> >
> > As another nit - you can avoid all these gotos by restructuring code a bit:
> >
> > rte_eth_dev_owner_set(const uint16_t port_id, const struct
> > rte_eth_dev_owner *owner) {
> >     rte_spinlock_lock(...);
> >     ret = _eth_dev_owner_set_unlocked(port_id, owner);
> >     rte_spinlock_unlock(...);
> >     return ret;
> > }
> >
> Don't you like gotos? :)

Not really :)

> I personally use it only in error\performance scenarios.

Same here - prefer to avoid them if possible.

> Do you think it worth the effort?

IMO - yes, well structured code is much easier to understand and maintain.
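
The restructuring discussed above can be sketched as follows, under stated assumptions: the port table holds owner ids only, an atomic_flag spinlock stands in for rte_spinlock_t, and all mock_* names are hypothetical.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

#define MOCK_MAX_PORTS 4
#define MOCK_NO_OWNER  0

static uint16_t port_owner[MOCK_MAX_PORTS];
static atomic_flag owner_lock = ATOMIC_FLAG_INIT;

static void mock_lock(void)
{
    while (atomic_flag_test_and_set(&owner_lock))
        ; /* spin until the flag is released */
}

static void mock_unlock(void)
{
    atomic_flag_clear(&owner_lock);
}

/* All checks use plain early returns; the caller must hold owner_lock. */
static int mock_owner_set_unlocked(uint16_t port_id, uint16_t owner_id)
{
    if (port_id >= MOCK_MAX_PORTS)
        return -ENODEV;
    if (owner_id == MOCK_NO_OWNER)
        return -EINVAL;
    if (port_owner[port_id] != MOCK_NO_OWNER &&
        port_owner[port_id] != owner_id)
        return -EPERM;
    port_owner[port_id] = owner_id;
    return 0;
}

/* Public entry point: the only place that takes and releases the lock. */
static int mock_owner_set(uint16_t port_id, uint16_t owner_id)
{
    int ret;

    mock_lock();
    ret = mock_owner_set_unlocked(port_id, owner_id);
    mock_unlock();
    return ret;
}
```

Every exit path of the inner function passes through the single unlock in the wrapper, which is what the goto labels achieve in the patch.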
Konstantin

Matan Azrad Jan. 11, 2018, 2:51 p.m. UTC | #4

Hi Konstantin

From: Ananyev, Konstantin, Thursday, January 11, 2018 2:40 PM
> Hi Matan,
> 
> >
> > Hi Konstantin
> >
> > From: Ananyev, Konstantin, Wednesday, January 10, 2018 3:36 PM
> > > Hi Matan,
> > >
> > > Few comments from me below.
> > > BTW, do you plan to add ownership mandatory check in control path
> > > functions that change port configuration?
> >
> > No.
> 
> So it still totally voluntary usage and application nneds to be changed to
> exploit it?
> Apart from RTE_FOR_EACH_DEV() change proposed by Gaetan?
> 

Also, the RTE_FOR_EACH_DEV() change proposed by Gaetan is not protected, because two DPDK entities can get the same port while using it.
As I wrote in the log/docs and as discussed a lot in the first version:
The new synchronization rules are:
1. The port allocation and port release synchronization will be
   managed by ethdev.
2. The port usage synchronization will be managed by the port owner.
3. The port ownership API synchronization (also with port creation) will be managed by ethdev.
4. A DPDK entity which wants to use a port must take ownership of it first.

Ethdev should not protect 2 and 4 according to these rules.

> > > Konstantin
> > >
> > > > -----Original Message-----
> > > > From: Matan Azrad [mailto:matan@mellanox.com]
> > > > Sent: Sunday, January 7, 2018 9:46 AM
> > > > To: Thomas Monjalon <thomas@monjalon.net>; Gaetan Rivet
> > > > <gaetan.rivet@6wind.com>; Wu, Jingjing <jingjing.wu@intel.com>
> > > > Cc: dev@dpdk.org; Neil Horman <nhorman@tuxdriver.com>;
> Richardson,
> > > > Bruce <bruce.richardson@intel.com>; Ananyev, Konstantin
> > > > <konstantin.ananyev@intel.com>
> > > > Subject: [PATCH v2 2/6] ethdev: add port ownership
> > > >
> > > > The ownership of a port is implicit in DPDK.
> > > > Making it explicit is better from the next reasons:
> > > > 1. It will define well who is in charge of the port usage synchronization.
> > > > 2. A library could work on top of a port.
> > > > 3. A port can work on top of another port.
> > > >
> > > > Also in the fail-safe case, an issue has been met in testpmd.
> > > > We need to check that the application is not trying to use a port
> > > > which is already managed by fail-safe.
> > > >
> > > > A port owner is built from owner id(number) and owner name(string)
> > > > while the owner id must be unique to distinguish between two
> > > > identical entity instances and the owner name can be any name.
> > > > The name helps to logically recognize the owner by different DPDK
> > > > entities and allows easy debug.
> > > > Each DPDK entity can allocate an owner unique identifier and can
> > > > use it and its preferred name to owns valid ethdev ports.
> > > > Each DPDK entity can get any port owner status to decide if it can
> > > > manage the port or not.
> > > >
> > > > The mechanism is synchronized for both the primary process threads
> > > > and the secondary processes threads to allow secondary process
> > > > entity to be a port owner.
> > > >
> > > > Add a sinchronized ownership mechanism to DPDK Ethernet devices to
> > > > avoid multiple management of a device by different DPDK entities.
> > > >
> > > > The current ethdev internal port management is not affected by
> > > > this feature.
> > > >
> > > > Signed-off-by: Matan Azrad <matan@mellanox.com>
> > > > ---
> > > >  doc/guides/prog_guide/poll_mode_drv.rst |  14 ++-
> > > >  lib/librte_ether/rte_ethdev.c           | 206
> > > ++++++++++++++++++++++++++++++--
> > > >  lib/librte_ether/rte_ethdev.h           |  89 ++++++++++++++
> > > >  lib/librte_ether/rte_ethdev_version.map |  12 ++
> > > >  4 files changed, 311 insertions(+), 10 deletions(-)
> > >
> > >
> > > >
> > > >
> > > > diff --git a/lib/librte_ether/rte_ethdev.c
> > > > b/lib/librte_ether/rte_ethdev.c index 684e3e8..0e12452 100644
> > > > --- a/lib/librte_ether/rte_ethdev.c
> > > > +++ b/lib/librte_ether/rte_ethdev.c
> > > > @@ -70,7 +70,10 @@
> > > >
> > > >  static const char *MZ_RTE_ETH_DEV_DATA = "rte_eth_dev_data";
> > > > struct rte_eth_dev rte_eth_devices[RTE_MAX_ETHPORTS];
> > > > +/* ports data array stored in shared memory */
> > > >  static struct rte_eth_dev_data *rte_eth_dev_data;
> > > > +/* next owner identifier stored in shared memory */ static
> > > > +uint16_t *rte_eth_next_owner_id;
> > > >  static uint8_t eth_dev_last_created_port;
> > > >
> > > >  /* spinlock for eth device callbacks */ @@ -82,6 +85,9 @@
> > > >  /* spinlock for add/remove tx callbacks */  static rte_spinlock_t
> > > > rte_eth_tx_cb_lock = RTE_SPINLOCK_INITIALIZER;
> > > >
> > > > +/* spinlock for eth device ownership management stored in shared
> > > > +memory */ static rte_spinlock_t *rte_eth_dev_ownership_lock;
> > > > +
> > > >  /* store statistics names and its offset in stats structure  */
> > > > struct rte_eth_xstats_name_off {
> > > >  	char name[RTE_ETH_XSTATS_NAME_SIZE]; @@ -153,14 +159,18 @@
> > > enum {  }
> > > >
> > > >  static void
> > > > -rte_eth_dev_data_alloc(void)
> > > > +rte_eth_dev_share_data_alloc(void)
> > > >  {
> > > >  	const unsigned flags = 0;
> > > >  	const struct rte_memzone *mz;
> > > > +	const unsigned int data_size = RTE_MAX_ETHPORTS *
> > > > +						sizeof(*rte_eth_dev_data);
> > > >
> > > >  	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
> > > > +		/* Allocate shared memory for port data and ownership */
> > > >  		mz = rte_memzone_reserve(MZ_RTE_ETH_DEV_DATA,
> > > > -				RTE_MAX_ETHPORTS *
> > > sizeof(*rte_eth_dev_data),
> > > > +				data_size + sizeof(*rte_eth_next_owner_id)
> > > +
> > > > +				sizeof(*rte_eth_dev_ownership_lock),
> > > >  				rte_socket_id(), flags);
> > > >  	} else
> > > >  		mz = rte_memzone_lookup(MZ_RTE_ETH_DEV_DATA);
> > > > @@ -168,9 +178,17 @@ enum {
> > > >  		rte_panic("Cannot allocate memzone for ethernet port
> > > data\n");
> > > >
> > > >  	rte_eth_dev_data = mz->addr;
> > > > -	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
> > > > -		memset(rte_eth_dev_data, 0,
> > > > -				RTE_MAX_ETHPORTS *
> > > sizeof(*rte_eth_dev_data));
> > > > +	rte_eth_next_owner_id = (uint16_t *)((uintptr_t)mz->addr +
> > > > +					     data_size);
> > > > +	rte_eth_dev_ownership_lock = (rte_spinlock_t *)
> > > > +		((uintptr_t)rte_eth_next_owner_id +
> > > > +		 sizeof(*rte_eth_next_owner_id));
> > >
> > >
> > > I think that might make  rte_eth_dev_ownership_lock location not 4B
> > > aligned...
> >
> > Where can I find the documentation about it?
> 
> That's in your code above - data_size and mz_->addr are both at least 4B
> aligned - rte_eth_dev_ownership_lock = mz->addr + data_size + 2; You can
> align it manually, but as discussed below it is probably easier to group related
> fields into the same struct.
> 
I mean the documentation about the needed alignment for spinlock. Where is it?

> >
> > > Why just not to put all data that you are trying to allocate as one
> > > chunck into the same struct:
> > > static struct {
> > >         uint16_t next_owner_id;
> > >         /* spinlock for eth device ownership management stored in
> > > shared memory */
> > >         rte_spinlock_t dev_ownership_lock;
> > >         rte_eth_dev_data *data;
> > > } rte_eth_dev_data;
> > > and allocate/use it everywhere?
> > > That would simplify allocation/management stuff.
> > >
> > I don't understand what exactly do you mean. ?
> > If you mean to group all in one struct like:
> >
> > static struct {
> >         uint16_t next_owner_id;
> >         rte_spinlock_t dev_ownership_lock;
> >         rte_eth_dev_data  data[];
> > } rte_eth_dev_share_data;
> >
> > Just to simplify the addresses calculation above,
> 
> Yep, that's exactly what I meant.
> As you said it would help with bulk allocation/alignment stuff, plus IMO it is
> better and easier to group several related global together - Improve code
> quality, will make it easier to read & maintain in future.
> 
> > It will change more code in ethdev relative to the old rte_eth_dev_data
> global array and will be more intrusive.
> > Stay it as is, focuses the change only here.
> 
> Yes it would require few more changes, though I think it worth it.
> 

Ok, Got you and agree.

> >
> > I can just move the spinlock memory allocation to be at the beginning of
> the memzone(to be sure about the alignment).
> >
> > > It is good to see that now scanning/updating rte_eth_dev_data[] is
> > > lock protected, but it might be not very plausible to protect both
> > > data[] and next_owner_id using the same lock.
> >
> > I guess you mean to the owner structure in rte_eth_dev_data[port_id].
> > The next_owner_id is read by ownership APIs(for owner validation), so it
> makes sense to use the same lock.
> > Actually, why not?
> 
> Well to me next_owner_id and rte_eth_dev_data[] are not directly related.
> You may create new owner_id but it doesn't mean you would update
> rte_eth_dev_data[] immediately.
> And visa-versa - you might just want to update rte_eth_dev_data[].name or
> .owner_id.
> It is not very good coding practice to use same lock for non-related data
> structures.
>
I see the relation like next:
Since the ownership mechanism synchronization is in ethdev responsibility,
we must protect against user mistakes as much as we can by using the same lock.
So, if user try to set by invalid owner (exactly the ID which currently is allocated) we can protect on it.
 
> >
> > > In fact, for next_owner_id, you don't need a lock - just
> > > rte_atomic_t should be enough.
> >
> > I don't think so, it is problematic in next_owner_id wraparound and may
> complicate the code in other places which read it.
> 
> IMO it is not that complicated, something like that should work I think.
> 
> /* init to 0 at startup*/
> rte_atomic32_t *owner_id;
> 
> int new_owner_id(void)
> {
>     int32_t x;
>     x = rte_atomic32_add_return(owner_id, 1);
>     if (x > UINT16_MAX) {
>        rte_atomic32_dec(owner_id);
>        return -EOVERFLOW;
>     } else
>         return x;
> }
> 
> 
> > Why not just to keep it simple and using the same lock?
> 
> Lock is also fine, I just think it better be a separate one - that would protect
> just next_owner_id.
> Though if you are going to use uuid here - all that probably not relevant any
> more.
> 

I agree about the uuid but still think the same lock should be used for both.

> >
> > > Another alternative would be to use 2 locks - one for next_owner_id
> > > second for actual data[] protection.
> > >
> > > Another thing - you'll probably need to grab/release a lock inside
> > > rte_eth_dev_allocated() too.
> > > It is a public function used by drivers, so need to be protected too.
> > >
> >
> > Yes, I thought about it, but decided not to use lock in next:
> > rte_eth_dev_allocated
> > rte_eth_dev_count
> > rte_eth_dev_get_name_by_port
> > rte_eth_dev_get_port_by_name
> > maybe more...
> 
> As I can see in patch #3 you protect by lock access to
> rte_eth_dev_data[].name (which seems like a good  thing).
> So I think any other public function that access rte_eth_dev_data[].name
> should be protected by the same lock.
> 

I don't think so. I can understand using the ownership lock here (as in port creation), but I don't think it is necessary.
What exactly are we protecting here?
Don't you think it is just timing? (Ask a moment later and you
may get a different answer.) I don't see a possible crash.
 
> > > > +
> > > > +	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
> > > > +		memset(rte_eth_dev_data, 0, data_size);
> > > > +		*rte_eth_next_owner_id = RTE_ETH_DEV_NO_OWNER + 1;
> > > > +		rte_spinlock_init(rte_eth_dev_ownership_lock);
> > > > +	}
> > > >  }
> > > >
> > > >  struct rte_eth_dev *
> > > > @@ -225,7 +243,7 @@ struct rte_eth_dev *
> > > >  	}
> > > >
> > > >  	if (rte_eth_dev_data == NULL)
> > > > -		rte_eth_dev_data_alloc();
> > > > +		rte_eth_dev_share_data_alloc();
> > > >
> > > >  	if (rte_eth_dev_allocated(name) != NULL) {
> > > >  		RTE_PMD_DEBUG_TRACE("Ethernet Device with name %s
> > > already
> > > > allocated!\n", @@ -253,7 +271,7 @@ struct rte_eth_dev *
> > > >  	struct rte_eth_dev *eth_dev;
> > > >
> > > >  	if (rte_eth_dev_data == NULL)
> > > > -		rte_eth_dev_data_alloc();
> > > > +		rte_eth_dev_share_data_alloc();
> > > >
> > > >  	for (i = 0; i < RTE_MAX_ETHPORTS; i++) {
> > > >  		if (strcmp(rte_eth_dev_data[i].name, name) == 0) @@ -
> > > 278,8 +296,12
> > > > @@ struct rte_eth_dev *
> > > >  	if (eth_dev == NULL)
> > > >  		return -EINVAL;
> > > >
> > > > -	memset(eth_dev->data, 0, sizeof(struct rte_eth_dev_data));
> > > > +	rte_spinlock_lock(rte_eth_dev_ownership_lock);
> > > > +
> > > >  	eth_dev->state = RTE_ETH_DEV_UNUSED;
> > > > +	memset(eth_dev->data, 0, sizeof(struct rte_eth_dev_data));
> > > > +
> > > > +	rte_spinlock_unlock(rte_eth_dev_ownership_lock);
> > > >  	return 0;
> > > >  }
> > > >
> > > > @@ -294,6 +316,174 @@ struct rte_eth_dev *
> > > >  		return 1;
> > > >  }
> > > >
> > > > +static int
> > > > +rte_eth_is_valid_owner_id(uint16_t owner_id) {
> > > > +	if (owner_id == RTE_ETH_DEV_NO_OWNER ||
> > > > +	    (*rte_eth_next_owner_id > RTE_ETH_DEV_NO_OWNER &&
> > > > +	     *rte_eth_next_owner_id <= owner_id)) {
> > > > +		RTE_LOG(ERR, EAL, "Invalid owner_id=%d.\n", owner_id);
> > > > +		return 0;
> > > > +	}
> > > > +	return 1;
> > > > +}
> > > > +
> > > > +uint16_t
> > > > +rte_eth_find_next_owned_by(uint16_t port_id, const uint16_t
> > > owner_id)
> > > > +{
> > > > +	while (port_id < RTE_MAX_ETHPORTS &&
> > > > +	       (rte_eth_devices[port_id].state != RTE_ETH_DEV_ATTACHED ||
> > > > +	       rte_eth_devices[port_id].data->owner.id != owner_id))
> > > > +		port_id++;
> > > > +
> > > > +	if (port_id >= RTE_MAX_ETHPORTS)
> > > > +		return RTE_MAX_ETHPORTS;
> > > > +
> > > > +	return port_id;
> > > > +}
> > > > +
> > > > +int
> > > > +rte_eth_dev_owner_new(uint16_t *owner_id) {
> > > > +	int ret = 0;
> > > > +
> > > > +	rte_spinlock_lock(rte_eth_dev_ownership_lock);
> > > > +
> > > > +	if (*rte_eth_next_owner_id == RTE_ETH_DEV_NO_OWNER) {
> > > > +		/* Counter wrap around. */
> > > > +		RTE_PMD_DEBUG_TRACE("Reached maximum number of
> > > Ethernet port owners.\n");
> > > > +		ret = -EUSERS;
> > > > +	} else {
> > > > +		*owner_id = (*rte_eth_next_owner_id)++;
> > > > +	}
> > > > +
> > > > +	rte_spinlock_unlock(rte_eth_dev_ownership_lock);
> > > > +	return ret;
> > > > +}
> > > > +
> > > > +int
> > > > +rte_eth_dev_owner_set(const uint16_t port_id,
> > > > +		      const struct rte_eth_dev_owner *owner)
> > >
> > > As a nit - if you'll have rte_eth_dev_owner_set(port_id, old_owner,
> > > new_owner)
> > > - that might be more plausible for user, and would greatly simplify
> > > unset()
> > > part:
> > > just set(port_id, cur_owner, zero_owner);
> > >
> >
> > How the user should know the old owner?
> 
> By dev_owner_get() or it might have it stored somewhere already (or
> constructed on the fly in case of NO_OWNER).
> 
It complicates the usage.
What about creating an internal API _rte_eth_dev_owner_set(port_id, old_owner,
new_owner) and using it from the currently exposed set/unset APIs?

> >
> > > > +{
> > > > +	struct rte_eth_dev_owner *port_owner;
> > > > +	int ret = 0;
> > > > +	int sret;
> > > > +
> > > > +	rte_spinlock_lock(rte_eth_dev_ownership_lock);
> > > > +
> > > > +	if (!rte_eth_dev_is_valid_port(port_id)) {
> > > > +		RTE_PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
> > > > +		ret = -ENODEV;
> > > > +		goto unlock;
> > > > +	}
> > > > +
> > > > +	if (!rte_eth_is_valid_owner_id(owner->id)) {
> > > > +		ret = -EINVAL;
> > > > +		goto unlock;
> > > > +	}
> > > > +
> > > > +	port_owner = &rte_eth_devices[port_id].data->owner;
> > > > +	if (port_owner->id != RTE_ETH_DEV_NO_OWNER &&
> > > > +	    port_owner->id != owner->id) {
> > > > +		RTE_LOG(ERR, EAL,
> > > > +			"Cannot set owner to port %d already owned by
> > > %s_%05d.\n",
> > > > +			port_id, port_owner->name, port_owner->id);
> > > > +		ret = -EPERM;
> > > > +		goto unlock;
> > > > +	}
> > > > +
> > > > +	sret = snprintf(port_owner->name,
> > > RTE_ETH_MAX_OWNER_NAME_LEN, "%s",
> > > > +			owner->name);
> > > > +	if (sret < 0 || sret >= RTE_ETH_MAX_OWNER_NAME_LEN) {
> > >
> > > Personally, I don't see any reason to fail if description was truncated...
> > > Another alternative - just use rte_malloc() here to allocate big
> > > enough buffer to hold the description.
> > >
> >
> > But it is static allocation like in the device name, why to allocate it
> differently?
> 
> Static allocation is fine by me - I just said there is probably no need to fail if
> description provide by use will be truncated in that case.
> Though if used description is *that* important - rte_malloc() can help here.
> 
Again, what is the difference between port name and owner name regarding the allocations?
The advantage of static allocation:
1. Not use protected malloc\free functions in other protected code.
2.  Easier to the user.

> >
> > > > +		memset(port_owner->name, 0,
> > > RTE_ETH_MAX_OWNER_NAME_LEN);
> > > > +		RTE_LOG(ERR, EAL, "Invalid owner name.\n");
> > > > +		ret = -EINVAL;
> > > > +		goto unlock;
> > > > +	}
> > > > +
> > > > +	port_owner->id = owner->id;
> > > > +	RTE_PMD_DEBUG_TRACE("Port %d owner is %s_%05d.\n", port_id,
> > > > +			    owner->name, owner->id);
> > > > +
> > >
> > > As another nit - you can avoid all these gotos by restructuring code a bit:
> > >
> > > rte_eth_dev_owner_set(const uint16_t port_id, const struct
> > > rte_eth_dev_owner *owner) {
> > >     rte_spinlock_lock(...);
> > >     ret = _eth_dev_owner_set_unlocked(port_id, owner);
> > >     rte_spinlock_unlock(...);
> > >     return ret;
> > > }
> > >
> > Don't you like gotos? :)
> 
> Not really :)
> 
> > I personally use it only in error\performance scenarios.
> 
> Same here - prefer to avoid them if possible.
> 
> > Do you think it worth the effort?
> 
> IMO - yes, well structured code is much easier to understand and maintain.
I don't think so for error cases (and performance); it is really clear here, but if you insist, I will change it.
Do you?
(If the community agrees with you, I think a "goto" check should be added to checkpatch.)

Thanks, a lot, 
Matan.

> Konstantin

Ananyev, Konstantin Jan. 12, 2018, 12:02 a.m. UTC | #5

Hi Matan,

> 
> Hi Konstantin
> 
> From: Ananyev, Konstantin, Thursday, January 11, 2018 2:40 PM
> > Hi Matan,
> >
> > >
> > > Hi Konstantin
> > >
> > > From: Ananyev, Konstantin, Wednesday, January 10, 2018 3:36 PM
> > > > Hi Matan,
> > > >
> > > > Few comments from me below.
> > > > BTW, do you plan to add ownership mandatory check in control path
> > > > functions that change port configuration?
> > >
> > > No.
> >
> > So it is still totally voluntary usage, and the application needs to be changed to
> > exploit it?
> > Apart from RTE_FOR_EACH_DEV() change proposed by Gaetan?
> >
> 
> Also RTE_FOR_EACH_DEV() change proposed by Gaetan is not protected because 2 DPDK entities can get the same port while using it.

I am not talking about racing condition here.
Right now even from the same thread - I can call dev_configure()
for the port which I don't own (let say it belongs to failsafe port),
and that would remain, correct?
 
> As I wrote in the log\docs and as discussed a lot in the first version:
> The new synchronization rules are:
> 1. The port allocation and port release synchronization will be
>    managed by ethdev.
> 2. The port usage synchronization will be managed by the port owner.
> 3. The port ownership API synchronization(also with port creation) will be managed by ethdev.
> 4. DPDK entity which want to use a port must take ownership before.
> 
> Ethdev should not protect 2 and 4 according these rules.
> 
> > > > Konstantin
> > > >
> > > > > -----Original Message-----
> > > > > From: Matan Azrad [mailto:matan@mellanox.com]
> > > > > Sent: Sunday, January 7, 2018 9:46 AM
> > > > > To: Thomas Monjalon <thomas@monjalon.net>; Gaetan Rivet
> > > > > <gaetan.rivet@6wind.com>; Wu, Jingjing <jingjing.wu@intel.com>
> > > > > Cc: dev@dpdk.org; Neil Horman <nhorman@tuxdriver.com>;
> > Richardson,
> > > > > Bruce <bruce.richardson@intel.com>; Ananyev, Konstantin
> > > > > <konstantin.ananyev@intel.com>
> > > > > Subject: [PATCH v2 2/6] ethdev: add port ownership
> > > > >
> > > > > The ownership of a port is implicit in DPDK.
> > > > > Making it explicit is better from the next reasons:
> > > > > 1. It will define well who is in charge of the port usage synchronization.
> > > > > 2. A library could work on top of a port.
> > > > > 3. A port can work on top of another port.
> > > > >
> > > > > Also in the fail-safe case, an issue has been met in testpmd.
> > > > > We need to check that the application is not trying to use a port
> > > > > which is already managed by fail-safe.
> > > > >
> > > > > A port owner is built from owner id(number) and owner name(string)
> > > > > while the owner id must be unique to distinguish between two
> > > > > identical entity instances and the owner name can be any name.
> > > > > The name helps to logically recognize the owner by different DPDK
> > > > > entities and allows easy debug.
> > > > > Each DPDK entity can allocate an owner unique identifier and can
> > > > > use it and its preferred name to owns valid ethdev ports.
> > > > > Each DPDK entity can get any port owner status to decide if it can
> > > > > manage the port or not.
> > > > >
> > > > > The mechanism is synchronized for both the primary process threads
> > > > > and the secondary processes threads to allow secondary process
> > > > > entity to be a port owner.
> > > > >
> > > > > Add a synchronized ownership mechanism to DPDK Ethernet devices to
> > > > > avoid multiple management of a device by different DPDK entities.
> > > > >
> > > > > The current ethdev internal port management is not affected by
> > > > > this feature.
> > > > >
> > > > > Signed-off-by: Matan Azrad <matan@mellanox.com>
> > > > > ---
> > > > >  doc/guides/prog_guide/poll_mode_drv.rst |  14 ++-
> > > > >  lib/librte_ether/rte_ethdev.c           | 206
> > > > ++++++++++++++++++++++++++++++--
> > > > >  lib/librte_ether/rte_ethdev.h           |  89 ++++++++++++++
> > > > >  lib/librte_ether/rte_ethdev_version.map |  12 ++
> > > > >  4 files changed, 311 insertions(+), 10 deletions(-)
> > > >
> > > >
> > > > >
> > > > >
> > > > > diff --git a/lib/librte_ether/rte_ethdev.c
> > > > > b/lib/librte_ether/rte_ethdev.c index 684e3e8..0e12452 100644
> > > > > --- a/lib/librte_ether/rte_ethdev.c
> > > > > +++ b/lib/librte_ether/rte_ethdev.c
> > > > > @@ -70,7 +70,10 @@
> > > > >
> > > > >  static const char *MZ_RTE_ETH_DEV_DATA = "rte_eth_dev_data";
> > > > > struct rte_eth_dev rte_eth_devices[RTE_MAX_ETHPORTS];
> > > > > +/* ports data array stored in shared memory */
> > > > >  static struct rte_eth_dev_data *rte_eth_dev_data;
> > > > > +/* next owner identifier stored in shared memory */ static
> > > > > +uint16_t *rte_eth_next_owner_id;
> > > > >  static uint8_t eth_dev_last_created_port;
> > > > >
> > > > >  /* spinlock for eth device callbacks */ @@ -82,6 +85,9 @@
> > > > >  /* spinlock for add/remove tx callbacks */  static rte_spinlock_t
> > > > > rte_eth_tx_cb_lock = RTE_SPINLOCK_INITIALIZER;
> > > > >
> > > > > +/* spinlock for eth device ownership management stored in shared
> > > > > +memory */ static rte_spinlock_t *rte_eth_dev_ownership_lock;
> > > > > +
> > > > >  /* store statistics names and its offset in stats structure  */
> > > > > struct rte_eth_xstats_name_off {
> > > > >  	char name[RTE_ETH_XSTATS_NAME_SIZE]; @@ -153,14 +159,18 @@
> > > > enum {  }
> > > > >
> > > > >  static void
> > > > > -rte_eth_dev_data_alloc(void)
> > > > > +rte_eth_dev_share_data_alloc(void)
> > > > >  {
> > > > >  	const unsigned flags = 0;
> > > > >  	const struct rte_memzone *mz;
> > > > > +	const unsigned int data_size = RTE_MAX_ETHPORTS *
> > > > > +						sizeof(*rte_eth_dev_data);
> > > > >
> > > > >  	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
> > > > > +		/* Allocate shared memory for port data and ownership */
> > > > >  		mz = rte_memzone_reserve(MZ_RTE_ETH_DEV_DATA,
> > > > > -				RTE_MAX_ETHPORTS *
> > > > sizeof(*rte_eth_dev_data),
> > > > > +				data_size + sizeof(*rte_eth_next_owner_id)
> > > > +
> > > > > +				sizeof(*rte_eth_dev_ownership_lock),
> > > > >  				rte_socket_id(), flags);
> > > > >  	} else
> > > > >  		mz = rte_memzone_lookup(MZ_RTE_ETH_DEV_DATA);
> > > > > @@ -168,9 +178,17 @@ enum {
> > > > >  		rte_panic("Cannot allocate memzone for ethernet port
> > > > data\n");
> > > > >
> > > > >  	rte_eth_dev_data = mz->addr;
> > > > > -	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
> > > > > -		memset(rte_eth_dev_data, 0,
> > > > > -				RTE_MAX_ETHPORTS *
> > > > sizeof(*rte_eth_dev_data));
> > > > > +	rte_eth_next_owner_id = (uint16_t *)((uintptr_t)mz->addr +
> > > > > +					     data_size);
> > > > > +	rte_eth_dev_ownership_lock = (rte_spinlock_t *)
> > > > > +		((uintptr_t)rte_eth_next_owner_id +
> > > > > +		 sizeof(*rte_eth_next_owner_id));
> > > >
> > > >
> > > > I think that might make  rte_eth_dev_ownership_lock location not 4B
> > > > aligned...
> > >
> > > Where can I find the documentation about it?
> >
> > That's in your code above - data_size and mz_->addr are both at least 4B
> > aligned - rte_eth_dev_ownership_lock = mz->addr + data_size + 2; You can
> > align it manually, but as discussed below it is probably easier to group related
> > fields into the same struct.
> >
> I mean the documentation about the needed alignment for spinlock. Where is it?

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka15414.html
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dht0008a/CJAGCFAF.html

Maybe the ARM and PPC guys can provide you with some more complete/recent docs.
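
To illustrate the alignment point without pulling in DPDK headers, here is a minimal sketch. All names in it (xmpl_spinlock_t, xmpl_port_data, XMPL_MAX_PORTS) are hypothetical stand-ins, not the real ethdev definitions, and the sizes are chosen only to mirror the "data, then uint16_t, then lock" layout from the patch. It compares the lock offset computed by hand with the offset the compiler picks when the fields are grouped in one struct.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, NOT the real DPDK types. */
typedef struct { volatile int32_t locked; } xmpl_spinlock_t;
struct xmpl_port_data { char name[64]; uint32_t flags; };

#define XMPL_MAX_PORTS 32

/* Grouped layout: the compiler pads after next_owner_id so that
 * the lock lands on a boundary suitable for its type. */
struct xmpl_shared_data {
	uint16_t next_owner_id;
	xmpl_spinlock_t ownership_lock;
	struct xmpl_port_data data[XMPL_MAX_PORTS];
};

/* Manual layout as in the patch: port data first, then a uint16_t,
 * then the lock placed right behind it with no padding. */
static size_t xmpl_manual_lock_offset(void)
{
	return XMPL_MAX_PORTS * sizeof(struct xmpl_port_data) +
	       sizeof(uint16_t);
}

/* Returns 1 when the lock member of the grouped struct is aligned. */
static int xmpl_struct_lock_is_aligned(void)
{
	return offsetof(struct xmpl_shared_data, ownership_lock) %
	       alignof(xmpl_spinlock_t) == 0;
}
```

With these stand-in sizes the manual offset ends 2 bytes past a 4-byte boundary, while the struct member is always placed correctly by the compiler.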


> 
> > >
> > > > Why just not to put all data that you are trying to allocate as one
> > > > chunck into the same struct:
> > > > static struct {
> > > >         uint16_t next_owner_id;
> > > >         /* spinlock for eth device ownership management stored in
> > > > shared memory */
> > > >         rte_spinlock_t dev_ownership_lock;
> > > >         rte_eth_dev_data *data;
> > > > } rte_eth_dev_data;
> > > > and allocate/use it everywhere?
> > > > That would simplify allocation/management stuff.
> > > >
> > > I don't understand what exactly do you mean. ?
> > > If you mean to group all in one struct like:
> > >
> > > static struct {
> > >         uint16_t next_owner_id;
> > >         rte_spinlock_t dev_ownership_lock;
> > >         rte_eth_dev_data  data[];
> > > } rte_eth_dev_share_data;
> > >
> > > Just to simplify the addresses calculation above,
> >
> > Yep, that's exactly what I meant.
> > As you said it would help with bulk allocation/alignment stuff, plus IMO it is
> > better and easier to group several related global together - Improve code
> > quality, will make it easier to read & maintain in future.
> >
> > > It will change more code in ethdev relative to the old rte_eth_dev_data
> > global array and will be more intrusive.
> > > Stay it as is, focuses the change only here.
> >
> > Yes it would require few more changes, though I think it worth it.
> >
> 
> Ok, Got you and agree.
> 
> > >
> > > I can just move the spinlock memory allocation to be at the beginning of
> > the memzone(to be sure about the alignment).
> > >
> > > > It is good to see that now scanning/updating rte_eth_dev_data[] is
> > > > lock protected, but it might be not very plausible to protect both
> > > > data[] and next_owner_id using the same lock.
> > >
> > > I guess you mean to the owner structure in rte_eth_dev_data[port_id].
> > > The next_owner_id is read by ownership APIs(for owner validation), so it
> > makes sense to use the same lock.
> > > Actually, why not?
> >
> > Well to me next_owner_id and rte_eth_dev_data[] are not directly related.
> > You may create new owner_id but it doesn't mean you would update
> > rte_eth_dev_data[] immediately.
> > And visa-versa - you might just want to update rte_eth_dev_data[].name or
> > .owner_id.
> > It is not very good coding practice to use same lock for non-related data
> > structures.
> >
> I see the relation like next:
> Since the ownership mechanism synchronization is in ethdev responsibility,
> we must protect against user mistakes as much as we can by using the same lock.
> So, if user try to set by invalid owner (exactly the ID which currently is allocated) we can protect on it.

Hmm, not sure why you can't do the same checking with a different lock or an atomic variable?

> 
> > >
> > > > In fact, for next_owner_id, you don't need a lock - just
> > > > rte_atomic_t should be enough.
> > >
> > > I don't think so, it is problematic in next_owner_id wraparound and may
> > complicate the code in other places which read it.
> >
> > IMO it is not that complicated, something like that should work I think.
> >
> > /* init to 0 at startup*/
> > rte_atomic32_t *owner_id;
> >
> > int new_owner_id(void)
> > {
> >     int32_t x;
> >     x = rte_atomic32_add_return(owner_id, 1);
> >     if (x > UINT16_MAX) {
> >        rte_atomic32_dec(owner_id);
> >        return -EOVERFLOW;
> >     } else
> >         return x;
> > }
> >
> >
> > > Why not just to keep it simple and using the same lock?
> >
> > Lock is also fine, I just think it better be a separate one - that would protect
> > just next_owner_id.
> > Though if you are going to use uuid here - all that probably not relevant any
> > more.
> >
> 
> I agree about the uuid but still think the same lock should be used for both.

But with uuid you don't need next_owner_id at all, right?
So lock will only be used for rte_eth_dev_data[] fields anyway.
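
If a plain counter is kept after all, a lock-free allocator could look roughly like the sketch below. It uses C11 <stdatomic.h> instead of rte_atomic32_t so that it compiles on its own, and a compare-and-swap loop so the counter never moves past the limit. The names ids_next_owner_id, ids_owner_new and IDS_NO_OWNER are made up for this example.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

#define IDS_NO_OWNER 0 /* hypothetical "no owner" value */

/* Next id to hand out; starts right after the "no owner" value. */
static _Atomic uint32_t ids_next_owner_id = IDS_NO_OWNER + 1;

/*
 * Return a fresh owner id in [1, UINT16_MAX], or -EOVERFLOW once
 * all 16-bit ids have been handed out. No lock is taken.
 */
static int ids_owner_new(void)
{
	uint32_t cur = atomic_load(&ids_next_owner_id);

	do {
		if (cur > UINT16_MAX)
			return -EOVERFLOW;
		/* On failure the CAS reloads cur with the current value. */
	} while (!atomic_compare_exchange_weak(&ids_next_owner_id,
					       &cur, cur + 1));

	return (int)cur;
}
```

Because the counter is only advanced when the old value is still valid, there is no window where another thread sees an out-of-range value, which was the wraparound concern raised above.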

> 
> > >
> > > > Another alternative would be to use 2 locks - one for next_owner_id
> > > > second for actual data[] protection.
> > > >
> > > > Another thing - you'll probably need to grab/release a lock inside
> > > > rte_eth_dev_allocated() too.
> > > > It is a public function used by drivers, so need to be protected too.
> > > >
> > >
> > > Yes, I thought about it, but decided not to use lock in next:
> > > rte_eth_dev_allocated
> > > rte_eth_dev_count
> > > rte_eth_dev_get_name_by_port
> > > rte_eth_dev_get_port_by_name
> > > maybe more...
> >
> > As I can see in patch #3 you protect by lock access to
> > rte_eth_dev_data[].name (which seems like a good  thing).
> > So I think any other public function that access rte_eth_dev_data[].name
> > should be protected by the same lock.
> >
> 
> I don't think so, I can understand to use the ownership lock here(as in port creation) but I don't think it is necessary too.
> What are we exactly protecting here?
> Don't you think it is just timing?(ask in the next moment and you
>  may get another answer) I don't see optional crash.

Not sure what you mean here by timing...
As I understand it, rte_eth_dev_data[].name uniquely identifies the device and is used
by the port allocation/release/find functions.
As you stated above:
"1. The port allocation and port release synchronization will be  managed by ethdev."
To me it means that ethdev layer has to make sure that all accesses to
rte_eth_dev_data[].name are atomic.
Otherwise what would prevent the situation when one process does
rte_eth_dev_allocate()->snprintf(rte_eth_dev_data[x].name, ...)
while second one does rte_eth_dev_allocated(rte_eth_dev_data[x].name, ...)
?
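
A rough sketch of what I have in mind: both the writer and the lookup take the same lock, so the lookup can never compare against a half-written name. This is standalone C11 with an atomic_flag standing in for rte_spinlock_t; nm_names, nm_allocate and nm_allocated are hypothetical names and only mimic the shape of the ethdev functions.

```c
#include <stdatomic.h>
#include <stdio.h>
#include <string.h>

#define NM_MAX_PORTS 4
#define NM_NAME_LEN 64

/* Stand-in for the shared ownership spinlock. */
static atomic_flag nm_lock = ATOMIC_FLAG_INIT;
/* Stand-in for rte_eth_dev_data[].name; empty string means free. */
static char nm_names[NM_MAX_PORTS][NM_NAME_LEN];

static void nm_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&nm_lock,
						 memory_order_acquire))
		; /* spin */
}

static void nm_lock_release(void)
{
	atomic_flag_clear_explicit(&nm_lock, memory_order_release);
}

/* Writer: claim the first free slot; returns its index or -1. */
static int nm_allocate(const char *name)
{
	int port = -1;

	nm_lock_acquire();
	for (int i = 0; i < NM_MAX_PORTS; i++) {
		if (nm_names[i][0] == '\0') {
			snprintf(nm_names[i], NM_NAME_LEN, "%s", name);
			port = i;
			break;
		}
	}
	nm_lock_release();
	return port;
}

/* Reader: look a name up under the same lock; returns index or -1. */
static int nm_allocated(const char *name)
{
	int port = -1;

	nm_lock_acquire();
	for (int i = 0; i < NM_MAX_PORTS; i++) {
		if (strcmp(nm_names[i], name) == 0) {
			port = i;
			break;
		}
	}
	nm_lock_release();
	return port;
}
```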

> 
> > > > > +
> > > > > +	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
> > > > > +		memset(rte_eth_dev_data, 0, data_size);
> > > > > +		*rte_eth_next_owner_id = RTE_ETH_DEV_NO_OWNER + 1;
> > > > > +		rte_spinlock_init(rte_eth_dev_ownership_lock);
> > > > > +	}
> > > > >  }
> > > > >
> > > > >  struct rte_eth_dev *
> > > > > @@ -225,7 +243,7 @@ struct rte_eth_dev *
> > > > >  	}
> > > > >
> > > > >  	if (rte_eth_dev_data == NULL)
> > > > > -		rte_eth_dev_data_alloc();
> > > > > +		rte_eth_dev_share_data_alloc();
> > > > >
> > > > >  	if (rte_eth_dev_allocated(name) != NULL) {
> > > > >  		RTE_PMD_DEBUG_TRACE("Ethernet Device with name %s
> > > > already
> > > > > allocated!\n", @@ -253,7 +271,7 @@ struct rte_eth_dev *
> > > > >  	struct rte_eth_dev *eth_dev;
> > > > >
> > > > >  	if (rte_eth_dev_data == NULL)
> > > > > -		rte_eth_dev_data_alloc();
> > > > > +		rte_eth_dev_share_data_alloc();
> > > > >
> > > > >  	for (i = 0; i < RTE_MAX_ETHPORTS; i++) {
> > > > >  		if (strcmp(rte_eth_dev_data[i].name, name) == 0) @@ -
> > > > 278,8 +296,12
> > > > > @@ struct rte_eth_dev *
> > > > >  	if (eth_dev == NULL)
> > > > >  		return -EINVAL;
> > > > >
> > > > > -	memset(eth_dev->data, 0, sizeof(struct rte_eth_dev_data));
> > > > > +	rte_spinlock_lock(rte_eth_dev_ownership_lock);
> > > > > +
> > > > >  	eth_dev->state = RTE_ETH_DEV_UNUSED;
> > > > > +	memset(eth_dev->data, 0, sizeof(struct rte_eth_dev_data));
> > > > > +
> > > > > +	rte_spinlock_unlock(rte_eth_dev_ownership_lock);
> > > > >  	return 0;
> > > > >  }
> > > > >
> > > > > @@ -294,6 +316,174 @@ struct rte_eth_dev *
> > > > >  		return 1;
> > > > >  }
> > > > >
> > > > > +static int
> > > > > +rte_eth_is_valid_owner_id(uint16_t owner_id) {
> > > > > +	if (owner_id == RTE_ETH_DEV_NO_OWNER ||
> > > > > +	    (*rte_eth_next_owner_id > RTE_ETH_DEV_NO_OWNER &&
> > > > > +	     *rte_eth_next_owner_id <= owner_id)) {
> > > > > +		RTE_LOG(ERR, EAL, "Invalid owner_id=%d.\n", owner_id);
> > > > > +		return 0;
> > > > > +	}
> > > > > +	return 1;
> > > > > +}
> > > > > +
> > > > > +uint16_t
> > > > > +rte_eth_find_next_owned_by(uint16_t port_id, const uint16_t
> > > > owner_id)
> > > > > +{
> > > > > +	while (port_id < RTE_MAX_ETHPORTS &&
> > > > > +	       (rte_eth_devices[port_id].state != RTE_ETH_DEV_ATTACHED ||
> > > > > +	       rte_eth_devices[port_id].data->owner.id != owner_id))
> > > > > +		port_id++;
> > > > > +
> > > > > +	if (port_id >= RTE_MAX_ETHPORTS)
> > > > > +		return RTE_MAX_ETHPORTS;
> > > > > +
> > > > > +	return port_id;
> > > > > +}
> > > > > +
> > > > > +int
> > > > > +rte_eth_dev_owner_new(uint16_t *owner_id) {
> > > > > +	int ret = 0;
> > > > > +
> > > > > +	rte_spinlock_lock(rte_eth_dev_ownership_lock);
> > > > > +
> > > > > +	if (*rte_eth_next_owner_id == RTE_ETH_DEV_NO_OWNER) {
> > > > > +		/* Counter wrap around. */
> > > > > +		RTE_PMD_DEBUG_TRACE("Reached maximum number of
> > > > Ethernet port owners.\n");
> > > > > +		ret = -EUSERS;
> > > > > +	} else {
> > > > > +		*owner_id = (*rte_eth_next_owner_id)++;
> > > > > +	}
> > > > > +
> > > > > +	rte_spinlock_unlock(rte_eth_dev_ownership_lock);
> > > > > +	return ret;
> > > > > +}
> > > > > +
> > > > > +int
> > > > > +rte_eth_dev_owner_set(const uint16_t port_id,
> > > > > +		      const struct rte_eth_dev_owner *owner)
> > > >
> > > > As a nit - if you'll have rte_eth_dev_owner_set(port_id, old_owner,
> > > > new_owner)
> > > > - that might be more plausible for user, and would greatly simplify
> > > > unset()
> > > > part:
> > > > just set(port_id, cur_owner, zero_owner);
> > > >
> > >
> > > How the user should know the old owner?
> >
> > By dev_owner_get() or it might have it stored somewhere already (or
> > constructed on the fly in case of NO_OWNER).
> >
> It complicates the usage.
> What's about creating an internal API  _rte_eth_dev_owner_set(port_id, old_owner,
> new_owner) and using it by the current exposed set\unset APIs?

Sounds good to me.

> 
> > >
> > > > > +{
> > > > > +	struct rte_eth_dev_owner *port_owner;
> > > > > +	int ret = 0;
> > > > > +	int sret;
> > > > > +
> > > > > +	rte_spinlock_lock(rte_eth_dev_ownership_lock);
> > > > > +
> > > > > +	if (!rte_eth_dev_is_valid_port(port_id)) {
> > > > > +		RTE_PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
> > > > > +		ret = -ENODEV;
> > > > > +		goto unlock;
> > > > > +	}
> > > > > +
> > > > > +	if (!rte_eth_is_valid_owner_id(owner->id)) {
> > > > > +		ret = -EINVAL;
> > > > > +		goto unlock;
> > > > > +	}
> > > > > +
> > > > > +	port_owner = &rte_eth_devices[port_id].data->owner;
> > > > > +	if (port_owner->id != RTE_ETH_DEV_NO_OWNER &&
> > > > > +	    port_owner->id != owner->id) {
> > > > > +		RTE_LOG(ERR, EAL,
> > > > > +			"Cannot set owner to port %d already owned by
> > > > %s_%05d.\n",
> > > > > +			port_id, port_owner->name, port_owner->id);
> > > > > +		ret = -EPERM;
> > > > > +		goto unlock;
> > > > > +	}
> > > > > +
> > > > > +	sret = snprintf(port_owner->name,
> > > > RTE_ETH_MAX_OWNER_NAME_LEN, "%s",
> > > > > +			owner->name);
> > > > > +	if (sret < 0 || sret >= RTE_ETH_MAX_OWNER_NAME_LEN) {
> > > >
> > > > Personally, I don't see any reason to fail if description was truncated...
> > > > Another alternative - just use rte_malloc() here to allocate big
> > > > enough buffer to hold the description.
> > > >
> > >
> > > But it is static allocation like in the device name, why to allocate it
> > differently?
> >
> > Static allocation is fine by me - I just said there is probably no need to fail if
> > description provide by use will be truncated in that case.
> > Though if used description is *that* important - rte_malloc() can help here.
> >
> Again, what is the difference between port name and owner name regarding the allocations?

As I understand it, rte_eth_dev_data[].name uniquely identifies the device and always has to be consistent.
owner.name is not critical for system operation, and I don't see a big deal if it is truncated.

> The advantage of static allocation:
> 1. Not use protected malloc\free functions in other protected code.

You can call malloc/free before/after grabbing the lock.
But as I said - I am fine with static array here too - I just don't think
truncating user description should cause a failure.  
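
Something like the sketch below is what I mean: copy into the static array, report truncation to the caller if it cares, but do not treat it as an error. TR_OWNER_NAME_LEN and tr_set_owner_name are hypothetical names, and the tiny buffer size is only there to make truncation easy to trigger.

```c
#include <stdio.h>
#include <string.h>

#define TR_OWNER_NAME_LEN 8 /* deliberately small for the example */

/*
 * Copy src into the fixed-size dst buffer.
 * Returns 0 if the name fit, 1 if it was truncated (dst still holds
 * a valid NUL-terminated prefix), -1 if snprintf itself failed.
 */
static int tr_set_owner_name(char dst[TR_OWNER_NAME_LEN], const char *src)
{
	int n = snprintf(dst, TR_OWNER_NAME_LEN, "%s", src);

	if (n < 0) {
		dst[0] = '\0';
		return -1;
	}
	return n >= TR_OWNER_NAME_LEN;
}
```

The caller can log a warning on a return value of 1 and still go ahead with setting the owner id.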

> 2.  Easier to the user.
> 
> > >
> > > > > +		memset(port_owner->name, 0,
> > > > RTE_ETH_MAX_OWNER_NAME_LEN);
> > > > > +		RTE_LOG(ERR, EAL, "Invalid owner name.\n");
> > > > > +		ret = -EINVAL;
> > > > > +		goto unlock;
> > > > > +	}
> > > > > +
> > > > > +	port_owner->id = owner->id;
> > > > > +	RTE_PMD_DEBUG_TRACE("Port %d owner is %s_%05d.\n", port_id,
> > > > > +			    owner->name, owner->id);
> > > > > +
> > > >
> > > > As another nit - you can avoid all these gotos by restructuring code a bit:
> > > >
> > > > rte_eth_dev_owner_set(const uint16_t port_id, const struct
> > > > rte_eth_dev_owner *owner) {
> > > >     rte_spinlock_lock(...);
> > > >     ret = _eth_dev_owner_set_unlocked(port_id, owner);
> > > >     rte_spinlock_unlock(...);
> > > >     return ret;
> > > > }
> > > >
> > > Don't you like gotos? :)
> >
> > Not really :)
> >
> > > I personally use it only in error\performance scenarios.
> >
> > Same here - prefer to avoid them if possible.
> >
> > > Do you think it worth the effort?
> >
> > IMO - yes, well structured code is much easier to understand and maintain.
> I don't think so in error cases(and performance), It is really clear here, but if you are insisting, I will change it.
> Are you?

Yes, that would be my preference.
Why otherwise I would bother to write all this? :)
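
To make the suggestion concrete, here is a standalone sketch of the split I proposed: an unlocked helper that uses plain early returns, and a thin wrapper that owns the lock. It is simplified to owner ids only, uses a C11 atomic_flag in place of rte_spinlock_t, and gw_port_owner, gw_owner_set and gw_owner_set_unlocked are hypothetical names, not code from the patch.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

#define GW_MAX_PORTS 4
#define GW_NO_OWNER 0

static atomic_flag gw_owner_lock = ATOMIC_FLAG_INIT;
static uint16_t gw_port_owner[GW_MAX_PORTS]; /* 0 means not owned */

/* All checks live here; every error path is a plain return. */
static int gw_owner_set_unlocked(uint16_t port_id, uint16_t owner_id)
{
	if (port_id >= GW_MAX_PORTS)
		return -ENODEV;
	if (owner_id == GW_NO_OWNER)
		return -EINVAL;
	if (gw_port_owner[port_id] != GW_NO_OWNER &&
	    gw_port_owner[port_id] != owner_id)
		return -EPERM;

	gw_port_owner[port_id] = owner_id;
	return 0;
}

/* The wrapper is the only place that touches the lock. */
static int gw_owner_set(uint16_t port_id, uint16_t owner_id)
{
	int ret;

	while (atomic_flag_test_and_set_explicit(&gw_owner_lock,
						 memory_order_acquire))
		; /* spin */
	ret = gw_owner_set_unlocked(port_id, owner_id);
	atomic_flag_clear_explicit(&gw_owner_lock, memory_order_release);
	return ret;
}
```

There is a single lock/unlock pair, so no error path can forget the unlock, and no goto is needed.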

> (If the community thinks like you I think "goto" check should be added to checkpatch).

Maybe there are pieces of code where gotos are really hard to avoid,
and/or where using goto would provide some performance benefit or so...
But this case definitely doesn't look like that.
Konstantin

Matan Azrad Jan. 12, 2018, 7:24 a.m. UTC | #6

Hi Konstantin

From: Ananyev, Konstantin, Friday, January 12, 2018 2:02 AM
> Hi Matan,
> 
> >
> > Hi Konstantin
> >
> > From: Ananyev, Konstantin, Thursday, January 11, 2018 2:40 PM
> > > Hi Matan,
> > >
> > > >
> > > > Hi Konstantin
> > > >
> > > > From: Ananyev, Konstantin, Wednesday, January 10, 2018 3:36 PM
> > > > > Hi Matan,
<snip>
> > > > > Few comments from me below.
> > > > > BTW, do you plan to add ownership mandatory check in control
> > > > > path functions that change port configuration?
> > > >
> > > > No.
> > >
> > > So it is still totally voluntary usage and the application needs to be
> > > changed to exploit it?
> > > Apart from RTE_FOR_EACH_DEV() change proposed by Gaetan?
> > >
> >
> > Also RTE_FOR_EACH_DEV() change proposed by Gaetan is not protected
> because 2 DPDK entities can get the same port while using it.
> 
> I am not talking about racing condition here.
> Right now even from the same thread - I can call dev_configure() for the port
> which I don't own (let say it belongs to failsafe port), and that would remain,
> correct?
> 
Yes.

> > As I wrote in the log\docs and as discussed a lot in the first version:
> > The new synchronization rules are:
> > 1. The port allocation and port release synchronization will be
> >    managed by ethdev.
> > 2. The port usage synchronization will be managed by the port owner.
> > 3. The port ownership API synchronization(also with port creation) will be
> managed by ethdev.
> > 4. DPDK entity which want to use a port must take ownership before.
> >
> > Ethdev should not protect 2 and 4 according these rules.
> >
> > > > > Konstantin
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Matan Azrad [mailto:matan@mellanox.com]
<snip>
> > I mean the documentation about the needed alignment for spinlock.
> Where is it?
> 
> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka15414.html
> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dht0008a/CJAGCFAF.html
> 
> Might be ARM and PPC guys can provide you some more complete/recent
> docs.
Thanks.
<snip> 
> > > > > It is good to see that now scanning/updating rte_eth_dev_data[]
> > > > > is lock protected, but it might be not very plausible to protect
> > > > > both data[] and next_owner_id using the same lock.
> > > >
> > > > I guess you mean to the owner structure in rte_eth_dev_data[port_id].
> > > > The next_owner_id is read by ownership APIs(for owner validation),
> > > > so it
> > > makes sense to use the same lock.
> > > > Actually, why not?
> > >
> > > Well to me next_owner_id and rte_eth_dev_data[] are not directly
> related.
> > > You may create new owner_id but it doesn't mean you would update
> > > rte_eth_dev_data[] immediately.
> > > And vice versa - you might just want to update
> > > rte_eth_dev_data[].name or .owner_id.
> > > It is not very good coding practice to use same lock for non-related
> > > data structures.
> > >
> > I see the relation like next:
> > Since the ownership mechanism synchronization is in ethdev
> > responsibility, we must protect against user mistakes as much as we can by
> using the same lock.
> > So, if user try to set by invalid owner (exactly the ID which currently is
> allocated) we can protect on it.
> 
> Hmm, not sure why you can't do same checking with different lock or atomic
> variable?
> 
The set-ownership API is protected by the ownership lock and checks owner ID validity
by reading the next owner ID.
So the owner ID allocation and the set API should use the same atomic mechanism.
The set (and other) ownership APIs already use the ownership lock, so I think it makes sense to use the same lock for ID allocation too.
 
> > > > > In fact, for next_owner_id, you don't need a lock - just
> > > > > rte_atomic_t should be enough.
> > > >
> > > > I don't think so, it is problematic in next_owner_id wraparound
> > > > and may
> > > complicate the code in other places which read it.
> > >
> > > IMO it is not that complicated, something like that should work I think.
> > >
> > > /* init to 0 at startup*/
> > > rte_atomic32_t owner_id;
> > >
> > > int new_owner_id(void)
> > > {
> > >     int32_t x;
> > >     x = rte_atomic32_add_return(&owner_id, 1);
> > >     if (x > UINT16_MAX) {
> > >        rte_atomic32_dec(&owner_id);
> > >        return -EOVERFLOW;
> > >     } else
> > >         return x;
> > > }
> > >
> > >
> > > > Why not just to keep it simple and using the same lock?
> > >
> > > Lock is also fine, I just think it better be a separate one - that
> > > would protect just next_owner_id.
> > > Though if you are going to use uuid here - all that probably not
> > > relevant any more.
> > >
> >
> > I agree about the uuid but still think the same lock should be used for both.
> 
> But with uuid you don't need next_owner_id at all, right?
> So lock will only be used for rte_eth_dev_data[] fields anyway.
>
Sorry, I meant uint64_t, not uuid.

> > > > > Another alternative would be to use 2 locks - one for
> > > > > next_owner_id second for actual data[] protection.
> > > > >
> > > > > Another thing - you'll probably need to grab/release a lock
> > > > > inside
> > > > > rte_eth_dev_allocated() too.
> > > > > It is a public function used by drivers, so need to be protected too.
> > > > >
> > > >
> > > > Yes, I thought about it, but decided not to use lock in next:
> > > > rte_eth_dev_allocated
> > > > rte_eth_dev_count
> > > > rte_eth_dev_get_name_by_port
> > > > rte_eth_dev_get_port_by_name
> > > > maybe more...
> > >
> > > As I can see in patch #3 you protect by lock access to
> > > rte_eth_dev_data[].name (which seems like a good  thing).
> > > So I think any other public function that access
> > > rte_eth_dev_data[].name should be protected by the same lock.
> > >
> >
> > I don't think so, I can understand to use the ownership lock here(as in port
> creation) but I don't think it is necessary too.
> > What are we exactly protecting here?
> > Don't you think it is just timing?(ask in the next moment and you  may
> > get another answer) I don't see optional crash.
> 
> Not sure what you mean here by timing...
> As I understand rte_eth_dev_data[].name unique identifies device and is
> used by  port allocation/release/find functions.
> As you stated above:
> "1. The port allocation and port release synchronization will be  managed by
> ethdev."
> To me it means that ethdev layer has to make sure that all accesses to
> rte_eth_dev_data[].name are atomic.
> Otherwise what would prevent the situation when one process does
> rte_eth_dev_allocate()->snprintf(rte_eth_dev_data[x].name, ...) while
> second one does rte_eth_dev_allocated(rte_eth_dev_data[x].name, ...) ?
> 
The second will get True or False and that is it.
If it had been called a moment later, it might have got a different answer.
Because these APIs don't change the ethdev structure (they just read it), it can be OK.
But again, I can understand using the ownership lock here too.

<snip>
> > > Static allocation is fine by me - I just said there is probably no
> > > need to fail if description provide by use will be truncated in that case.
> > > Though if used description is *that* important - rte_malloc() can help
> here.
> > >
> > Again, what is the difference between port name and owner name
> regarding the allocations?
> 
> As I understand rte_eth_dev_data[].name unique identifies device and
> always has to be consistent.
> owner.name is not critical for system operation, and I don't see a big deal if it
> would be truncated.
> 
> > The advantage of static allocation:
> > 1. Not use protected malloc\free functions in other protected code.
> 
> You can call malloc/free before/after grabbing the lock.
> But as I said - I am fine with static array here too - I just don't think truncating
> user description should cause a failure.
> 

OK, I will just print a warning in the truncation case.
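
A minimal sketch of that behaviour, assuming a fixed-size name array: the buffer length, the function name and the use of stderr (instead of RTE_LOG) are hypothetical and chosen only to keep the example standalone. snprintf() returns the length the full string would have needed, so a return value at or above the buffer size means the name was cut.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define OWNER_NAME_LEN 16 /* hypothetical static buffer size */

/*
 * Copy src into the fixed-size owner name buffer.
 * Returns 1 if the name had to be truncated, 0 otherwise.
 * Truncation is reported but is not treated as an error.
 */
int owner_name_copy(char *dst, const char *src)
{
	int n;

	assert(dst != NULL && src != NULL);
	n = snprintf(dst, OWNER_NAME_LEN, "%s", src);
	if (n >= OWNER_NAME_LEN) {
		fprintf(stderr, "warning: owner name truncated to \"%s\"\n", dst);
		return 1;
	}
	return 0;
}
```

The caller can ignore the return value; it is there only so that the truncation can be observed.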

> > 2.  Easier to the user.
> >
> > > >
> > > > > > +		memset(port_owner->name, 0,
> > > > > RTE_ETH_MAX_OWNER_NAME_LEN);
> > > > > > +		RTE_LOG(ERR, EAL, "Invalid owner name.\n");
> > > > > > +		ret = -EINVAL;
> > > > > > +		goto unlock;
> > > > > > +	}
> > > > > > +
> > > > > > +	port_owner->id = owner->id;
> > > > > > +	RTE_PMD_DEBUG_TRACE("Port %d owner is %s_%05d.\n",
> port_id,
> > > > > > +			    owner->name, owner->id);
> > > > > > +
> > > > >
> > > > > As another nit - you can avoid all these gotos by restructuring code a
> bit:
> > > > >
> > > > > rte_eth_dev_owner_set(const uint16_t port_id, const struct
> > > > > rte_eth_dev_owner *owner) {
> > > > >     rte_spinlock_lock(...);
> > > > >     ret = _eth_dev_owner_set_unlocked(port_id, owner);
> > > > >     rte_spinlock_unlock(...);
> > > > >     return ret;
> > > > > }
> > > > >
> > > > Don't you like gotos? :)
> > >
> > > Not really :)
> > >
> > > > I personally use it only in error\performance scenarios.
> > >
> > > Same here - prefer to avoid them if possible.
> > >
> > > > Do you think it worth the effort?
> > >
> > > IMO - yes, well structured code is much easier to understand and
> maintain.
> > I don't think so in error cases(and performance), It is really clear here, but if
> you are insisting, I will change it.
> > Are you?
> 
> Yes, that would be my preference.
> Why otherwise I would bother to write all this? :)
> 
> > (If the community thinks like you I think "goto" check should be added to
> checkpatch).
> 
> Might be there are pieces of code there goto are really hard to avoid, and/or
> using goto would provide some performance benefit or so...
> But that case definitely doesn't look like that.

Let's stop the "goto" discussion here. Although I don't agree with you in general, in this case I have no problem changing it.

Thanks,
Matan.

Ananyev, Konstantin Jan. 15, 2018, 11:45 a.m. UTC | #7

Hi Matan,

> 
> 
> Hi Konstantin
> 
> From: Ananyev, Konstantin, Friday, January 12, 2018 2:02 AM
> > Hi Matan,
> >
> > >
> > > Hi Konstantin
> > >
> > > From: Ananyev, Konstantin, Thursday, January 11, 2018 2:40 PM
> > > > Hi Matan,
> > > >
> > > > >
> > > > > Hi Konstantin
> > > > >
> > > > > From: Ananyev, Konstantin, Wednesday, January 10, 2018 3:36 PM
> > > > > > Hi Matan,
> <snip>
> > > > > > Few comments from me below.
> > > > > > BTW, do you plan to add ownership mandatory check in control
> > > > > > path functions that change port configuration?
> > > > >
> > > > > No.
> > > >
> > > > So it is still totally voluntary usage and the application needs to be
> > > > changed to exploit it?
> > > > Apart from RTE_FOR_EACH_DEV() change proposed by Gaetan?
> > > >
> > >
> > > Also RTE_FOR_EACH_DEV() change proposed by Gaetan is not protected
> > because 2 DPDK entities can get the same port while using it.
> >
> > I am not talking about racing condition here.
> > Right now even from the same thread - I can call dev_configure() for the port
> > which I don't own (let say it belongs to failsafe port), and that would remain,
> > correct?
> >
> Yes.

OK, thanks for the clarification.
I think that makes the current approach sort of incomplete, but that might be a
subject for a separate discussion.

> 
> > > As I wrote in the log\docs and as discussed a lot in the first version:
> > > The new synchronization rules are:
> > > 1. The port allocation and port release synchronization will be
> > >    managed by ethdev.
> > > 2. The port usage synchronization will be managed by the port owner.
> > > 3. The port ownership API synchronization(also with port creation) will be
> > managed by ethdev.
> > > 4. DPDK entity which want to use a port must take ownership before.
> > >
> > > Ethdev should not protect 2 and 4 according these rules.
> > >
> > > > > > Konstantin
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Matan Azrad [mailto:matan@mellanox.com]
> <snip>
> > > I mean the documentation about the needed alignment for spinlock.
> > Where is it?
> >
> > http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka15414.html
> > http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dht0008a/CJAGCFAF.html
> >
> > Might be ARM and PPC guys can provide you some more complete/recent
> > docs.
> Thanks.
> <snip>
> > > > > > It is good to see that now scanning/updating rte_eth_dev_data[]
> > > > > > is lock protected, but it might be not very plausible to protect
> > > > > > both data[] and next_owner_id using the same lock.
> > > > >
> > > > > I guess you mean to the owner structure in rte_eth_dev_data[port_id].
> > > > > The next_owner_id is read by ownership APIs(for owner validation),
> > > > > so it
> > > > makes sense to use the same lock.
> > > > > Actually, why not?
> > > >
> > > > Well to me next_owner_id and rte_eth_dev_data[] are not directly
> > related.
> > > > You may create new owner_id but it doesn't mean you would update
> > > > rte_eth_dev_data[] immediately.
> > > > And vice versa - you might just want to update
> > > > rte_eth_dev_data[].name or .owner_id.
> > > > It is not very good coding practice to use same lock for non-related
> > > > data structures.
> > > >
> > > I see the relation like next:
> > > Since the ownership mechanism synchronization is in ethdev
> > > responsibility, we must protect against user mistakes as much as we can by
> > using the same lock.
> > > So, if user try to set by invalid owner (exactly the ID which currently is
> > allocated) we can protect on it.
> >
> > Hmm, not sure why you can't do same checking with different lock or atomic
> > variable?
> >
> The set ownership API is protected by ownership lock and checks the owner ID validity
> By reading the next owner ID.
> So, the owner ID allocation and set API should use the same atomic mechanism.

Sure, but all you are doing to check validity is checking that
owner_id > 0 && owner_id < next_owner_id, right?
As you don't allow owner_id overlap (16/3248 bits) you can safely do the same check
with just atomic_get(&next_owner_id).
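
The check described here can be sketched with C11 atomics as below. This is an illustration under assumptions, not the patch code: the function names are made up, the counter starts at 1 so that 0 can mean "no owner", and a compare-exchange loop is used so that the counter never moves past the 16-bit limit. Whether such a check is sufficient against a caller that guesses an ID is a separate question that the sketch does not settle.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

/* Next ID to hand out; 0 is reserved for "no owner" in this sketch. */
static atomic_uint_fast32_t next_owner_id = 1;

/* Allocate a new owner ID without a lock; fails once 16 bits are used up. */
int owner_id_new(void)
{
	uint_fast32_t cur = atomic_load(&next_owner_id);

	do {
		if (cur > UINT16_MAX)
			return -EOVERFLOW; /* counter is left untouched */
		/* on failure, cur is reloaded with the current value */
	} while (!atomic_compare_exchange_weak(&next_owner_id, &cur, cur + 1));

	assert(cur >= 1 && cur <= UINT16_MAX);
	return (int)cur;
}

/* An ID is valid if it is non-zero and has already been handed out. */
int owner_id_is_valid(uint32_t id)
{
	return id > 0 && id < atomic_load(&next_owner_id);
}
```

Because the counter only grows, an ID that passes the check once keeps passing it.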

> The set(and others) ownership APIs already uses the ownership lock so I think it makes sense to use the same lock also in ID allocation.
> 
> > > > > > In fact, for next_owner_id, you don't need a lock - just
> > > > > > rte_atomic_t should be enough.
> > > > >
> > > > > I don't think so, it is problematic in next_owner_id wraparound
> > > > > and may
> > > > complicate the code in other places which read it.
> > > >
> > > > IMO it is not that complicated, something like that should work I think.
> > > >
> > > > /* init to 0 at startup*/
> > > > rte_atomic32_t owner_id;
> > > >
> > > > int new_owner_id(void)
> > > > {
> > > >     int32_t x;
> > > >     x = rte_atomic32_add_return(&owner_id, 1);
> > > >     if (x > UINT16_MAX) {
> > > >        rte_atomic32_dec(&owner_id);
> > > >        return -EOVERFLOW;
> > > >     } else
> > > >         return x;
> > > > }
> > > >
> > > >
> > > > > Why not just to keep it simple and using the same lock?
> > > >
> > > > Lock is also fine, I just think it better be a separate one - that
> > > > would protect just next_owner_id.
> > > > Though if you are going to use uuid here - all that probably not
> > > > relevant any more.
> > > >
> > >
> > > I agree about the uuid but still think the same lock should be used for both.
> >
> > But with uuid you don't need next_owner_id at all, right?
> > So lock will only be used for rte_eth_dev_data[] fields anyway.
> >
> Sorry, I meant uint64_t, not uuid.

Ah OK. My thought was that uuid_t is better, as with it you don't need to maintain your own code
to allocate a new owner_id, but can rely on system libs instead.
But I wouldn't insist here.

> 
> > > > > > Another alternative would be to use 2 locks - one for
> > > > > > next_owner_id second for actual data[] protection.
> > > > > >
> > > > > > Another thing - you'll probably need to grab/release a lock
> > > > > > inside
> > > > > > rte_eth_dev_allocated() too.
> > > > > > It is a public function used by drivers, so need to be protected too.
> > > > > >
> > > > >
> > > > > Yes, I thought about it, but decided not to use lock in next:
> > > > > rte_eth_dev_allocated
> > > > > rte_eth_dev_count
> > > > > rte_eth_dev_get_name_by_port
> > > > > rte_eth_dev_get_port_by_name
> > > > > maybe more...
> > > >
> > > > As I can see in patch #3 you protect by lock access to
> > > > rte_eth_dev_data[].name (which seems like a good  thing).
> > > > So I think any other public function that access
> > > > rte_eth_dev_data[].name should be protected by the same lock.
> > > >
> > >
> > > I don't think so, I can understand to use the ownership lock here(as in port
> > creation) but I don't think it is necessary too.
> > > What are we exactly protecting here?
> > > Don't you think it is just timing?(ask in the next moment and you  may
> > > get another answer) I don't see optional crash.
> >
> > Not sure what you mean here by timing...
> > As I understand rte_eth_dev_data[].name unique identifies device and is
> > used by  port allocation/release/find functions.
> > As you stated above:
> > "1. The port allocation and port release synchronization will be  managed by
> > ethdev."
> > To me it means that ethdev layer has to make sure that all accesses to
> > rte_eth_dev_data[].name are atomic.
> > Otherwise what would prevent the situation when one process does
> > rte_eth_dev_allocate()->snprintf(rte_eth_dev_data[x].name, ...) while
> > second one does rte_eth_dev_allocated(rte_eth_dev_data[x].name, ...) ?
> >
> The second will get True or False and that is it.

Under a race condition, in the worst case it might crash, though for that you'd have to be really unlucky.
In most cases, as you said, it would just not operate correctly.
I think if we start to protect dev->name by a lock we need to do it for all instances
(both read and write).
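
To illustrate what protecting both sides means, here is a minimal sketch in which the writer (allocation) and the reader (lookup by name) take the same lock. It is not the ethdev implementation: the name table, the sizes, the function names and the atomic_flag spinlock are all assumptions made for a standalone example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_PORTS 4  /* hypothetical table size */
#define NAME_LEN 32  /* hypothetical name length */

static char port_names[MAX_PORTS][NAME_LEN]; /* empty string = free slot */
static atomic_flag name_lock = ATOMIC_FLAG_INIT;

static void name_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&name_lock, memory_order_acquire))
		; /* spin */
}

static void name_lock_release(void)
{
	atomic_flag_clear_explicit(&name_lock, memory_order_release);
}

/* Writer: claims the first free slot; returns its index or -1. */
int port_allocate(const char *name)
{
	int port = -1;

	assert(name != NULL);
	name_lock_acquire();
	for (int i = 0; i < MAX_PORTS; i++) {
		if (strcmp(port_names[i], name) == 0) {
			port = -1; /* name already taken (or empty) */
			break;
		}
		if (port < 0 && port_names[i][0] == '\0')
			port = i;
	}
	if (port >= 0)
		snprintf(port_names[port], NAME_LEN, "%s", name);
	name_lock_release();
	return port;
}

/* Reader: takes the same lock, so it never sees a half-written name. */
int port_find(const char *name)
{
	int port = -1;

	assert(name != NULL);
	name_lock_acquire();
	for (int i = 0; i < MAX_PORTS; i++) {
		if (port_names[i][0] != '\0' && strcmp(port_names[i], name) == 0) {
			port = i;
			break;
		}
	}
	name_lock_release();
	return port;
}
```

With the lock on both paths, a lookup runs either entirely before or entirely after a name is written.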

> Maybe if it had been called just a moment after, It might get different answer.
> Because these APIs don't change ethdev structure(just read), it can be OK.
> But again, I can understand to use ownership lock also here.
> 

Konstantin

Matan Azrad Jan. 15, 2018, 1:09 p.m. UTC | #8

Hi Konstantin

From: Ananyev, Konstantin, Monday, January 15, 2018 1:45 PM
> Hi Matan,
> 
> >
> >
> > Hi Konstantin
> >
> > From: Ananyev, Konstantin, Friday, January 12, 2018 2:02 AM
> > > Hi Matan,
> > >
> > > >
> > > > Hi Konstantin
> > > >
> > > > From: Ananyev, Konstantin, Thursday, January 11, 2018 2:40 PM
> > > > > Hi Matan,
> > > > >
> > > > > >
> > > > > > Hi Konstantin
> > > > > >
> > > > > > From: Ananyev, Konstantin, Wednesday, January 10, 2018 3:36 PM
> > > > > > > Hi Matan,
<snip>
> > > > > > > It is good to see that now scanning/updating
> > > > > > > rte_eth_dev_data[] is lock protected, but it might be not
> > > > > > > very plausible to protect both data[] and next_owner_id using the
> same lock.
> > > > > >
> > > > > > I guess you mean to the owner structure in
> rte_eth_dev_data[port_id].
> > > > > > The next_owner_id is read by ownership APIs(for owner
> > > > > > validation), so it
> > > > > makes sense to use the same lock.
> > > > > > Actually, why not?
> > > > >
> > > > > Well to me next_owner_id and rte_eth_dev_data[] are not directly
> > > related.
> > > > > You may create new owner_id but it doesn't mean you would update
> > > > > rte_eth_dev_data[] immediately.
> > > > > And vice versa - you might just want to update
> > > > > rte_eth_dev_data[].name or .owner_id.
> > > > > It is not very good coding practice to use same lock for
> > > > > non-related data structures.
> > > > >
> > > > I see the relation like next:
> > > > Since the ownership mechanism synchronization is in ethdev
> > > > responsibility, we must protect against user mistakes as much as
> > > > we can by
> > > using the same lock.
> > > > So, if user try to set by invalid owner (exactly the ID which
> > > > currently is
> > > allocated) we can protect on it.
> > >
> > > Hmm, not sure why you can't do same checking with different lock or
> > > atomic variable?
> > >
> > The set ownership API is protected by ownership lock and checks the
> > owner ID validity By reading the next owner ID.
> > So, the owner ID allocation and set API should use the same atomic
> mechanism.
> 
> Sure but all you are doing for checking validity, is  check that owner_id > 0
> && owner_id < next_owner_id, right?
> As you don't allow owner_id overlap (16/3248 bits) you can safely do same
> check with just atomic_get(&next_owner_id).
> 
It will not protect it. Scenario:
- The current next_id is X.
- Thread 0 calls set ownership of port A with owner ID X (by user mistake).
- Context switch.
- Thread 1 allocates a new ID, gets X, and changes next_id to X+1 atomically.
- Context switch.
- Thread 0 validates X by atomic_read and succeeds in taking ownership.
- The system has lost the port (or it will be managed by two entities) - crash.
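
The ordering in this scenario can be replayed sequentially, as in the sketch below. Everything in it is hypothetical (the state struct, the function names, the validity rule 0 < id < next_id taken from the discussion above). Each step runs as one indivisible operation, so the replay only shows how the outcome depends on the order of the steps and says nothing by itself about which synchronization primitive is used.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct state {
	uint16_t next_id;    /* next owner ID to hand out */
	uint16_t port_owner; /* owner of port A, 0 = none */
};

/* Thread 1's step: allocate a new owner ID. */
uint16_t alloc_id(struct state *s)
{
	assert(s != NULL);
	return s->next_id++;
}

/* Thread 0's step: validate the ID, then take ownership of port A. */
int set_owner(struct state *s, uint16_t id)
{
	assert(s != NULL);
	if (id == 0 || id >= s->next_id)
		return -EINVAL; /* ID was never handed out */
	if (s->port_owner != 0)
		return -EPERM;  /* port already owned */
	s->port_owner = id;
	return 0;
}
```

If set_owner() with the mistaken ID X runs before alloc_id(), it is rejected; if it runs after, X has become a valid ID and the call succeeds, although the ID was handed out to a different entity.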


> > The set(and others) ownership APIs already uses the ownership lock so I
> think it makes sense to use the same lock also in ID allocation.
> >
> > > > > > > In fact, for next_owner_id, you don't need a lock - just
> > > > > > > rte_atomic_t should be enough.
> > > > > >
> > > > > > I don't think so, it is problematic in next_owner_id
> > > > > > wraparound and may
> > > > > complicate the code in other places which read it.
> > > > >
> > > > > IMO it is not that complicated, something like that should work I think.
> > > > >
> > > > > /* init to 0 at startup*/
> > > > > rte_atomic32_t owner_id;
> > > > >
> > > > > int new_owner_id(void)
> > > > > {
> > > > >     int32_t x;
> > > > >     x = rte_atomic32_add_return(&owner_id, 1);
> > > > >     if (x > UINT16_MAX) {
> > > > >        rte_atomic32_dec(&owner_id);
> > > > >        return -EOVERFLOW;
> > > > >     } else
> > > > >         return x;
> > > > > }
> > > > >
> > > > >
> > > > > > Why not just to keep it simple and using the same lock?
> > > > >
> > > > > Lock is also fine, I just think it better be a separate one -
> > > > > that would protect just next_owner_id.
> > > > > Though if you are going to use uuid here - all that probably not
> > > > > relevant any more.
> > > > >
> > > >
> > > > I agree about the uuid but still think the same lock should be used for
> both.
> > >
> > > But with uuid you don't need next_owner_id at all, right?
> > > So lock will only be used for rte_eth_dev_data[] fields anyway.
> > >
> > Sorry, I meant uint64_t, not uuid.
> 
> Ah ok, my thought uuid_t is better as with it you don't need to support your
> own code to allocate new owner_id, but rely on system libs instead.
> But wouldn't insist here.
> 
> >
> > > > > > > Another alternative would be to use 2 locks - one for
> > > > > > > next_owner_id second for actual data[] protection.
> > > > > > >
> > > > > > > Another thing - you'll probably need to grab/release a lock
> > > > > > > inside
> > > > > > > rte_eth_dev_allocated() too.
> > > > > > > It is a public function used by drivers, so need to be protected too.
> > > > > > >
> > > > > >
> > > > > > Yes, I thought about it, but decided not to use lock in next:
> > > > > > rte_eth_dev_allocated
> > > > > > rte_eth_dev_count
> > > > > > rte_eth_dev_get_name_by_port
> > > > > > rte_eth_dev_get_port_by_name
> > > > > > maybe more...
> > > > >
> > > > > As I can see in patch #3 you protect by lock access to
> > > > > rte_eth_dev_data[].name (which seems like a good  thing).
> > > > > So I think any other public function that access
> > > > > rte_eth_dev_data[].name should be protected by the same lock.
> > > > >
> > > >
> > > > I don't think so, I can understand to use the ownership lock
> > > > here(as in port
> > > creation) but I don't think it is necessary too.
> > > > What are we exactly protecting here?
> > > > Don't you think it is just timing?(ask in the next moment and you
> > > > may get another answer) I don't see optional crash.
> > >
> > > Not sure what you mean here by timing...
> > > As I understand rte_eth_dev_data[].name unique identifies device and
> > > is used by  port allocation/release/find functions.
> > > As you stated above:
> > > "1. The port allocation and port release synchronization will be
> > > managed by ethdev."
> > > To me it means that ethdev layer has to make sure that all accesses
> > > to rte_eth_dev_data[].name are atomic.
> > > Otherwise what would prevent the situation when one process does
> > > rte_eth_dev_allocate()->snprintf(rte_eth_dev_data[x].name, ...)
> > > while second one does
> rte_eth_dev_allocated(rte_eth_dev_data[x].name, ...) ?
> > >
> > The second will get True or False and that is it.
> 
> Under race condition - in the worst case it might crash, though for that you'll
> have to be really unlucky.
> Though in most cases as you said it would just not operate correctly.
> I think if we start to protect dev->name by lock we need to do it for all
> instances (both read and write).
> 
Since under the ownership rules the user must take ownership of a port before using it, I still don't see a problem here.
Can you please describe a specific crash scenario and explain how locking could fix it?

> > Maybe if it had been called just a moment after, It might get different
> answer.
> > Because these APIs don't change ethdev structure(just read), it can be OK.
> > But again, I can understand to use ownership lock also here.
> >
> 
> Konstantin

Ananyev, Konstantin Jan. 15, 2018, 6:43 p.m. UTC | #9

Hi Matan,

> 
> Hi Konstantin
> 
> From: Ananyev, Konstantin, Monday, January 15, 2018 1:45 PM
> > Hi Matan,
> >
> > >
> > >
> > > Hi Konstantin
> > >
> > > From: Ananyev, Konstantin, Friday, January 12, 2018 2:02 AM
> > > > Hi Matan,
> > > >
> > > > >
> > > > > Hi Konstantin
> > > > >
> > > > > From: Ananyev, Konstantin, Thursday, January 11, 2018 2:40 PM
> > > > > > Hi Matan,
> > > > > >
> > > > > > >
> > > > > > > Hi Konstantin
> > > > > > >
> > > > > > > From: Ananyev, Konstantin, Wednesday, January 10, 2018 3:36 PM
> > > > > > > > Hi Matan,
> <snip>
> > > > > > > > It is good to see that now scanning/updating
> > > > > > > > rte_eth_dev_data[] is lock protected, but it might be not
> > > > > > > > very plausible to protect both data[] and next_owner_id using the
> > same lock.
> > > > > > >
> > > > > > > I guess you mean to the owner structure in
> > rte_eth_dev_data[port_id].
> > > > > > > The next_owner_id is read by ownership APIs(for owner
> > > > > > > validation), so it
> > > > > > makes sense to use the same lock.
> > > > > > > Actually, why not?
> > > > > >
> > > > > > Well to me next_owner_id and rte_eth_dev_data[] are not directly
> > > > related.
> > > > > > You may create new owner_id but it doesn't mean you would update
> > > > > > rte_eth_dev_data[] immediately.
> > > > > > And vice versa - you might just want to update
> > > > > > rte_eth_dev_data[].name or .owner_id.
> > > > > > It is not very good coding practice to use same lock for
> > > > > > non-related data structures.
> > > > > >
> > > > > I see the relation like next:
> > > > > Since the ownership mechanism synchronization is in ethdev
> > > > > responsibility, we must protect against user mistakes as much as
> > > > > we can by
> > > > using the same lock.
> > > > > So, if user try to set by invalid owner (exactly the ID which
> > > > > currently is
> > > > allocated) we can protect on it.
> > > >
> > > > Hmm, not sure why you can't do same checking with different lock or
> > > > atomic variable?
> > > >
> > > The set ownership API is protected by ownership lock and checks the
> > > owner ID validity By reading the next owner ID.
> > > So, the owner ID allocation and set API should use the same atomic
> > mechanism.
> >
> > Sure but all you are doing for checking validity, is  check that owner_id > 0
> > && owner_id < next_owner_id, right?
> > As you don't allow owner_id overlap (16/3248 bits) you can safely do same
> > check with just atomic_get(&next_owner_id).
> >
> It will not protect it, scenario:
> - current next_id is X.
> - call set ownership of port A with owner id X by thread 0(by user mistake).
> - context switch
> - allocate new id by thread 1 and get X and change next_id to X+1 atomically.
> -  context switch
> - Thread 0 validate X by atomic_read and succeed to take ownership.
> - The system lost the port(or will be managed by two entities) - crash.


OK, and how will using a lock protect you in such a scenario?
I don't think you can protect yourself against such a scenario with or without locking,
unless you make it harder for the misbehaving thread to guess a valid owner_id,
or add some extra logic here.

> 
> 
> > > The set(and others) ownership APIs already uses the ownership lock so I
> > think it makes sense to use the same lock also in ID allocation.
> > >
> > > > > > > > In fact, for next_owner_id, you don't need a lock - just
> > > > > > > > rte_atomic_t should be enough.
> > > > > > >
> > > > > > > I don't think so, it is problematic in next_owner_id
> > > > > > > wraparound and may
> > > > > > complicate the code in other places which read it.
> > > > > >
> > > > > > IMO it is not that complicated, something like that should work I think.
> > > > > >
> > > > > > /* init to 0 at startup*/
> > > > > > rte_atomic32_t owner_id;
> > > > > >
> > > > > > int new_owner_id(void)
> > > > > > {
> > > > > >     int32_t x;
> > > > > >     x = rte_atomic32_add_return(&owner_id, 1);
> > > > > >     if (x > UINT16_MAX) {
> > > > > >        rte_atomic32_dec(&owner_id);
> > > > > >        return -EOVERFLOW;
> > > > > >     } else
> > > > > >         return x;
> > > > > > }
> > > > > >
> > > > > >
> > > > > > > Why not just to keep it simple and using the same lock?
> > > > > >
> > > > > > Lock is also fine, I just think it better be a separate one -
> > > > > > that would protect just next_owner_id.
> > > > > > Though if you are going to use uuid here - all that probably not
> > > > > > relevant any more.
> > > > > >
> > > > >
> > > > > I agree about the uuid but still think the same lock should be used for
> > both.
> > > >
> > > > But with uuid you don't need next_owner_id at all, right?
> > > > So lock will only be used for rte_eth_dev_data[] fields anyway.
> > > >
> > > Sorry, I meant uint64_t, not uuid.
> >
> > Ah ok, my thought uuid_t is better as with it you don't need to support your
> > own code to allocate new owner_id, but rely on system libs instead.
> > But wouldn't insist here.
> >
> > >
> > > > > > > > Another alternative would be to use 2 locks - one for
> > > > > > > > next_owner_id second for actual data[] protection.
> > > > > > > >
> > > > > > > > Another thing - you'll probably need to grab/release a lock
> > > > > > > > inside
> > > > > > > > rte_eth_dev_allocated() too.
> > > > > > > > It is a public function used by drivers, so need to be protected too.
> > > > > > > >
> > > > > > >
> > > > > > > Yes, I thought about it, but decided not to use lock in next:
> > > > > > > rte_eth_dev_allocated
> > > > > > > rte_eth_dev_count
> > > > > > > rte_eth_dev_get_name_by_port
> > > > > > > rte_eth_dev_get_port_by_name
> > > > > > > maybe more...
> > > > > >
> > > > > > As I can see in patch #3 you protect by lock access to
> > > > > > rte_eth_dev_data[].name (which seems like a good  thing).
> > > > > > So I think any other public function that access
> > > > > > rte_eth_dev_data[].name should be protected by the same lock.
> > > > > >
> > > > >
> > > > > I don't think so, I can understand to use the ownership lock
> > > > > here(as in port
> > > > creation) but I don't think it is necessary too.
> > > > > What are we exactly protecting here?
> > > > > Don't you think it is just timing?(ask in the next moment and you
> > > > > may get another answer) I don't see optional crash.
> > > >
> > > > Not sure what you mean here by timing...
> > > > As I understand rte_eth_dev_data[].name unique identifies device and
> > > > is used by  port allocation/release/find functions.
> > > > As you stated above:
> > > > "1. The port allocation and port release synchronization will be
> > > > managed by ethdev."
> > > > To me it means that ethdev layer has to make sure that all accesses
> > > > to rte_eth_dev_data[].name are atomic.
> > > > Otherwise what would prevent the situation when one process does
> > > > rte_eth_dev_allocate()->snprintf(rte_eth_dev_data[x].name, ...)
> > > > while second one does
> > rte_eth_dev_allocated(rte_eth_dev_data[x].name, ...) ?
> > > >
> > > The second will get True or False and that is it.
> >
> > Under race condition - in the worst case it might crash, though for that you'll
> > have to be really unlucky.
> > Though in most cases as you said it would just not operate correctly.
> > I think if we start to protect dev->name by lock we need to do it for all
> > instances (both read and write).
> >
> Since under the ownership rules, the user must take ownership of a port before using it, I still don't see a problem here.

I am not talking about owner id or name here.
I am talking about dev->name.

> Please, Can you describe specific crash scenario and explain how could the locking fix it?

Let's say thread 0 is doing rte_eth_dev_allocate()->snprintf(rte_eth_dev_data[x].name, ...),
while thread 1 is doing rte_pmd_ring_remove()->rte_eth_dev_allocated()->strcmp().
Because of the race condition, rte_eth_dev_allocated() could return the rte_eth_dev *
for the wrong device.
Then rte_pmd_ring_remove() will call rte_free() for the related resources, while
they can still be in use by someone else.
Konstantin
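
[Editorial sketch] To illustrate the point about protecting both reads and writes of the name, here is a minimal, self-contained model. It is not DPDK code: `ports`, `ports_lock`, `port_allocate()` and `port_find()` are hypothetical stand-ins for rte_eth_dev_data[], the ethdev lock, rte_eth_dev_allocate() and rte_eth_dev_allocated(). The lock is a C11 atomic_flag spinlock, which only covers threads of one process as written; the multi-process case discussed in this thread would need the lock to live in shared memory.

```c
#include <stdatomic.h>
#include <stdio.h>
#include <string.h>

#define MAX_PORTS 4
#define NAME_LEN  64

/* Hypothetical stand-in for rte_eth_dev_data[]; not the DPDK structure. */
struct port_data {
    char name[NAME_LEN];
    int  in_use;
};

static struct port_data ports[MAX_PORTS];
static atomic_flag ports_lock = ATOMIC_FLAG_INIT;

static void ports_lock_take(void)
{
    /* Spin until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&ports_lock, memory_order_acquire))
        ;
}

static void ports_lock_release(void)
{
    atomic_flag_clear_explicit(&ports_lock, memory_order_release);
}

/* Writer: claim a free slot and write its name while holding the lock. */
static int port_allocate(const char *name)
{
    int id = -1;

    ports_lock_take();
    for (int i = 0; i < MAX_PORTS; i++) {
        if (!ports[i].in_use) {
            snprintf(ports[i].name, NAME_LEN, "%s", name);
            ports[i].in_use = 1;
            id = i;
            break;
        }
    }
    ports_lock_release();
    return id;
}

/* Reader: compare names under the same lock, so strcmp() never
 * runs against a name that snprintf() is still writing. */
static int port_find(const char *name)
{
    int id = -1;

    ports_lock_take();
    for (int i = 0; i < MAX_PORTS; i++) {
        if (ports[i].in_use && strcmp(ports[i].name, name) == 0) {
            id = i;
            break;
        }
    }
    ports_lock_release();
    return id;
}
```

The only design point is that the reader takes the same lock as the writer. Whether ethdev should do this for every public accessor is exactly what is being debated in this thread.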

> 
> > > Maybe if it had been called just a moment after, It might get different
> > answer.
> > > Because these APIs don't change ethdev structure(just read), it can be OK.
> > > But again, I can understand to use ownership lock also here.
> > >
> >
> > Konstantin

Matan Azrad Jan. 16, 2018, 8:04 a.m. UTC | #10

Hi Konstantin
From: Ananyev, Konstantin, Monday, January 15, 2018 8:44 PM
> Hi Matan,
> > Hi Konstantin
> > From: Ananyev, Konstantin, Monday, January 15, 2018 1:45 PM
> > > Hi Matan,
> > > > Hi Konstantin
> > > > From: Ananyev, Konstantin, Friday, January 12, 2018 2:02 AM
> > > > > Hi Matan,
> > > > > > Hi Konstantin
> > > > > > From: Ananyev, Konstantin, Thursday, January 11, 2018 2:40 PM
> > > > > > > Hi Matan,
> > > > > > > > Hi Konstantin
> > > > > > > > From: Ananyev, Konstantin, Wednesday, January 10, 2018
> > > > > > > > 3:36 PM
> > > > > > > > > Hi Matan,
 <snip>
> > > > > > > > > It is good to see that now scanning/updating
> > > > > > > > > rte_eth_dev_data[] is lock protected, but it might be
> > > > > > > > > not very plausible to protect both data[] and
> > > > > > > > > next_owner_id using the
> > > same lock.
> > > > > > > >
> > > > > > > > I guess you mean to the owner structure in
> > > rte_eth_dev_data[port_id].
> > > > > > > > The next_owner_id is read by ownership APIs(for owner
> > > > > > > > validation), so it
> > > > > > > makes sense to use the same lock.
> > > > > > > > Actually, why not?
> > > > > > >
> > > > > > > Well to me next_owner_id and rte_eth_dev_data[] are not
> > > > > > > directly
> > > > > related.
> > > > > > > You may create new owner_id but it doesn't mean you would
> > > > > > > update rte_eth_dev_data[] immediately.
> > > > > > > And visa-versa - you might just want to update
> > > > > > > rte_eth_dev_data[].name or .owner_id.
> > > > > > > It is not very good coding practice to use same lock for
> > > > > > > non-related data structures.
> > > > > > >
> > > > > > I see the relation like next:
> > > > > > Since the ownership mechanism synchronization is in ethdev
> > > > > > responsibility, we must protect against user mistakes as much
> > > > > > as we can by
> > > > > using the same lock.
> > > > > > So, if user try to set by invalid owner (exactly the ID which
> > > > > > currently is
> > > > > allocated) we can protect on it.
> > > > >
> > > > > Hmm, not sure why you can't do same checking with different lock
> > > > > or atomic variable?
> > > > >
> > > > The set ownership API is protected by ownership lock and checks
> > > > the owner ID validity By reading the next owner ID.
> > > > So, the owner ID allocation and set API should use the same atomic
> > > mechanism.
> > >
> > > Sure but all you are doing for checking validity, is  check that
> > > owner_id > 0 &&& owner_id < next_ownwe_id, right?
> > > As you don't allow owner_id overlap (16/3248 bits) you can safely do
> > > same check with just atomic_get(&next_owner_id).
> > >
> > It will not protect it, scenario:
> > - current next_id is X.
> > - call set ownership of port A with owner id X by thread 0(by user mistake).
> > - context switch 
> > - allocate new id by thread 1 and get X and change next_id to X+1
> atomically.
> > -  context switch
> > - Thread 0 validate X by atomic_read and succeed to take ownership.
> > - The system loosed the port(or will be managed by two entities) - crash.
> 
> 
> Ok, and how using lock will protect you with such scenario?

The validation in the owner set API should fail for thread 0, because the owner validation is part of the lock-protected section.

> I don't think you can protect yourself against such scenario with or without
> locking.
> Unless you'll make it harder for the mis-behaving thread to guess valid
> owner_id, or add some extra logic here.
> 
> >
> >
> > > > The set(and others) ownership APIs already uses the ownership lock
> > > > so I
> > > think it makes sense to use the same lock also in ID allocation.
> > > >
> > > > > > > > > In fact, for next_owner_id, you don't need a lock - just
> > > > > > > > > rte_atomic_t should be enough.
> > > > > > > >
> > > > > > > > I don't think so, it is problematic in next_owner_id
> > > > > > > > wraparound and may
> > > > > > > complicate the code in other places which read it.
> > > > > > >
> > > > > > > IMO it is not that complicated, something like that should work I
> think.
> > > > > > >
> > > > > > > /* init to 0 at startup*/
> > > > > > > > rte_atomic32_t owner_id;
> > > > > > >
> > > > > > > int new_owner_id(void)
> > > > > > > {
> > > > > > >     int32_t x;
> > > > > > >     x = rte_atomic32_add_return(&owner_id, 1);
> > > > > > >     if (x > UINT16_MAX) {
> > > > > > >        rte_atomic32_dec(&owner_id);
> > > > > > > >        return -EOVERFLOW;
> > > > > > >     } else
> > > > > > >         return x;
> > > > > > > }
> > > > > > >
> > > > > > >
> > > > > > > > Why not just to keep it simple and using the same lock?
> > > > > > >
> > > > > > > Lock is also fine, I just think it better be a separate one
> > > > > > > - that would protext just next_owner_id.
> > > > > > > Though if you are going to use uuid here - all that probably
> > > > > > > not relevant any more.
> > > > > > >
> > > > > >
> > > > > > I agree about the uuid but still think the same lock should be
> > > > > > used for
> > > both.
> > > > >
> > > > > But with uuid you don't need next_owner_id at all, right?
> > > > > So lock will only be used for rte_eth_dev_data[] fields anyway.
> > > > >
> > > > Sorry, I meant uint64_t, not uuid.
> > >
> > > Ah ok, my thought uuid_t is better as with it you don't need to
> > > support your own code to allocate new owner_id, but rely on system libs
> instead.
> > > But wouldn't insist here.
> > >
> > > >
> > > > > > > > > Another alternative would be to use 2 locks - one for
> > > > > > > > > next_owner_id second for actual data[] protection.
> > > > > > > > >
> > > > > > > > > Another thing - you'll probably need to grab/release a
> > > > > > > > > lock inside
> > > > > > > > > rte_eth_dev_allocated() too.
> > > > > > > > > It is a public function used by drivers, so need to be protected
> too.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Yes, I thought about it, but decided not to use lock in next:
> > > > > > > > rte_eth_dev_allocated
> > > > > > > > rte_eth_dev_count
> > > > > > > > rte_eth_dev_get_name_by_port
> > > > > > > > > rte_eth_dev_get_port_by_name
> > > > > > > > maybe more...
> > > > > > >
> > > > > > > As I can see in patch #3 you protect by lock access to
> > > > > > > rte_eth_dev_data[].name (which seems like a good  thing).
> > > > > > > So I think any other public function that access
> > > > > > > rte_eth_dev_data[].name should be protected by the same lock.
> > > > > > >
> > > > > >
> > > > > > I don't think so, I can understand to use the ownership lock
> > > > > > here(as in port
> > > > > creation) but I don't think it is necessary too.
> > > > > > What are we exactly protecting here?
> > > > > > Don't you think it is just timing?(ask in the next moment and
> > > > > > you may get another answer) I don't see optional crash.
> > > > >
> > > > > Not sure what you mean here by timing...
> > > > > As I understand rte_eth_dev_data[].name unique identifies device
> > > > > and is used by  port allocation/release/find functions.
> > > > > As you stated above:
> > > > > "1. The port allocation and port release synchronization will be
> > > > > managed by ethdev."
> > > > > To me it means that ethdev layer has to make sure that all
> > > > > accesses to rte_eth_dev_data[].name are atomic.
> > > > > Otherwise what would prevent the situation when one process does
> > > > > rte_eth_dev_allocate()->snprintf(rte_eth_dev_data[x].name, ...)
> > > > > while second one does
> > > rte_eth_dev_allocated(rte_eth_dev_data[x].name, ...) ?
> > > > >
> > > > The second will get True or False and that is it.
> > >
> > > Under race condition - in the worst case it might crash, though for
> > > that you'll have to be really unlucky.
> > > Though in most cases as you said it would just not operate correctly.
> > > I think if we start to protect dev->name by lock we need to do it
> > > for all instances (both read and write).
> > >
> > Since under the ownership rules, the user must take ownership of a port
> before using it, I still don't see a problem here.
> 
> I am not talking about owner id or name here.
> I am talking about dev->name.
> 
So? The user should still take ownership of a device before using it (by name or by port id).
The user can read it without owning it, but cannot manage it.

> > Please, Can you describe specific crash scenario and explain how could the
> locking fix it?
> 
> Let say thread 0 doing rte_eth_dev_allocate()-
> >snprintf(rte_eth_dev_data[x].name, ...), thread 1 doing
> rte_pmd_ring_remove()->rte_eth_dev_allocated()->strcmp().
> And because of race condition - rte_eth_dev_allocated() will return
> rte_eth_dev * for the wrong device.
Which wrong device do you mean? I guess it is the device that is currently being created by thread 0.
> Then rte_pmd_ring_remove() will call rte_free() for related resources, while
> It can still be in use by someone else.
The caller of rte_pmd_ring_remove() (some DPDK entity) must take ownership of a port (or validate that it is the owner) before freeing or releasing it, so there is no issue here.


Also, I'm not sure I fully understand your scenario, but it looks like moving the device state setting in allocation to after the name setting would be good.
What do you think?

> Konstantin
> 
> >
> > > > Maybe if it had been called just a moment after, It might get
> > > > different
> > > answer.
> > > > Because these APIs don't change ethdev structure(just read), it can be
> OK.
> > > > But again, I can understand to use ownership lock also here.
> > > >
> > >
> > > Konstantin

Ananyev, Konstantin Jan. 16, 2018, 7:11 p.m. UTC | #11

Hi Matan,

> 
> Hi Konstantin
> From: Ananyev, Konstantin, Monday, January 15, 2018 8:44 PM
> > Hi Matan,
> > > Hi Konstantin
> > > From: Ananyev, Konstantin, Monday, January 15, 2018 1:45 PM
> > > > Hi Matan,
> > > > > Hi Konstantin
> > > > > From: Ananyev, Konstantin, Friday, January 12, 2018 2:02 AM
> > > > > > Hi Matan,
> > > > > > > Hi Konstantin
> > > > > > > From: Ananyev, Konstantin, Thursday, January 11, 2018 2:40 PM
> > > > > > > > Hi Matan,
> > > > > > > > > Hi Konstantin
> > > > > > > > > From: Ananyev, Konstantin, Wednesday, January 10, 2018
> > > > > > > > > 3:36 PM
> > > > > > > > > > Hi Matan,
>  <snip>
> > > > > > > > > > It is good to see that now scanning/updating
> > > > > > > > > > rte_eth_dev_data[] is lock protected, but it might be
> > > > > > > > > > not very plausible to protect both data[] and
> > > > > > > > > > next_owner_id using the
> > > > same lock.
> > > > > > > > >
> > > > > > > > > I guess you mean to the owner structure in
> > > > rte_eth_dev_data[port_id].
> > > > > > > > > The next_owner_id is read by ownership APIs(for owner
> > > > > > > > > validation), so it
> > > > > > > > makes sense to use the same lock.
> > > > > > > > > Actually, why not?
> > > > > > > >
> > > > > > > > Well to me next_owner_id and rte_eth_dev_data[] are not
> > > > > > > > directly
> > > > > > related.
> > > > > > > > You may create new owner_id but it doesn't mean you would
> > > > > > > > update rte_eth_dev_data[] immediately.
> > > > > > > > And visa-versa - you might just want to update
> > > > > > > > rte_eth_dev_data[].name or .owner_id.
> > > > > > > > It is not very good coding practice to use same lock for
> > > > > > > > non-related data structures.
> > > > > > > >
> > > > > > > I see the relation like next:
> > > > > > > Since the ownership mechanism synchronization is in ethdev
> > > > > > > responsibility, we must protect against user mistakes as much
> > > > > > > as we can by
> > > > > > using the same lock.
> > > > > > > So, if user try to set by invalid owner (exactly the ID which
> > > > > > > currently is
> > > > > > allocated) we can protect on it.
> > > > > >
> > > > > > Hmm, not sure why you can't do same checking with different lock
> > > > > > or atomic variable?
> > > > > >
> > > > > The set ownership API is protected by ownership lock and checks
> > > > > the owner ID validity By reading the next owner ID.
> > > > > So, the owner ID allocation and set API should use the same atomic
> > > > mechanism.
> > > >
> > > > Sure but all you are doing for checking validity, is  check that
> > > > owner_id > 0 &&& owner_id < next_ownwe_id, right?
> > > > As you don't allow owner_id overlap (16/3248 bits) you can safely do
> > > > same check with just atomic_get(&next_owner_id).
> > > >
> > > It will not protect it, scenario:
> > > - current next_id is X.
> > > - call set ownership of port A with owner id X by thread 0(by user mistake).
> > > - context switch
> > > - allocate new id by thread 1 and get X and change next_id to X+1
> > atomically.
> > > -  context switch
> > > - Thread 0 validate X by atomic_read and succeed to take ownership.
> > > - The system loosed the port(or will be managed by two entities) - crash.
> >
> >
> > Ok, and how using lock will protect you with such scenario?
> 
> The owner set API validation by thread 0 should fail because the owner validation is included in the protected section.

Then your validation function would fail even if you used atomic ops instead of a lock.
But in fact your code is not protected against that scenario, no matter whether you use a lock or atomic ops.
Let's consider your current code with the following scenario:

next_owner_id  == 1
1) Process 0:
     rte_eth_dev_owner_new(&owner_id);
     now owner_id == 1 and next_owner_id == 2
2) Process 1 (by mistake):
    rte_eth_dev_owner_set(port_id=1, owner->id=1);
It will complete successfully, as owner_id == 1 is considered valid.
3) Process 0:
      rte_eth_dev_owner_set(port_id=1, owner->id=1);
It will also complete with success, as owner->id is valid and is equal to the current port owner_id.
So you end up with 2 processes, each assuming that it exclusively owns the same port.
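
[Editorial sketch] The three steps above can be replayed with a small single-threaded model. This is not the patch's code: `owner_new()`, `owner_id_is_valid()`, `owner_set()` and `port_owner[]` are hypothetical names, and the checks only encode what is stated in this thread (an id is valid if 0 < id < next_owner_id, and a set succeeds if the port is free or already has the same owner id).

```c
#include <stdint.h>

#define NO_OWNER 0

/* Hypothetical model of the state discussed above; not the patch's code. */
static uint16_t next_owner_id = 1;
static uint16_t port_owner[2];   /* owner id per port, NO_OWNER if free */

/* Hand out the next id (overflow handling omitted for brevity). */
static uint16_t owner_new(void)
{
    return next_owner_id++;
}

/* An id counts as valid once it has been handed out. */
static int owner_id_is_valid(uint16_t id)
{
    return id > NO_OWNER && id < next_owner_id;
}

/* Succeeds if the id is valid and the port is free or already has this owner. */
static int owner_set(unsigned int port, uint16_t id)
{
    if (!owner_id_is_valid(id))
        return -1;
    if (port_owner[port] != NO_OWNER && port_owner[port] != id)
        return -1;
    port_owner[port] = id;
    return 0;
}
```

Calling owner_new() once, then owner_set(1, 1) twice (once for each "process"), returns 0 both times in this model. No step ever runs concurrently with another, so the double success does not depend on whether a lock or an atomic protects next_owner_id.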

Honestly, in that situation locking around next_owner_id wouldn't give you any advantage
over atomic ops.
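
[Editorial sketch] For reference, the atomic allocator quoted further down in this message can be written in portable C11 roughly as follows. Using <stdatomic.h> instead of rte_atomic32_t is an assumption made only to keep the sketch self-contained, and `next_id` is a hypothetical name. atomic_fetch_add() returns the old value, so 1 is added to mimic the "add and return the new value" behaviour of the quoted code.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

/* Static storage, so it starts at 0 (hypothetical name). */
static atomic_int next_id;

/* Returns a new id in 1..UINT16_MAX, or -EOVERFLOW when ids are exhausted. */
static int new_owner_id(void)
{
    int x = atomic_fetch_add(&next_id, 1) + 1;

    if (x > UINT16_MAX) {
        /* Undo our increment so the counter does not keep growing. */
        atomic_fetch_sub(&next_id, 1);
        return -EOVERFLOW;
    }
    return x;
}
```

This only makes the id allocation itself atomic. It does nothing for the validation race discussed in this thread.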

> 
> > I don't think you can protect yourself against such scenario with or without
> > locking.
> > Unless you'll make it harder for the mis-behaving thread to guess valid
> > owner_id, or add some extra logic here.
> >
> > >
> > >
> > > > > The set(and others) ownership APIs already uses the ownership lock
> > > > > so I
> > > > think it makes sense to use the same lock also in ID allocation.
> > > > >
> > > > > > > > > > In fact, for next_owner_id, you don't need a lock - just
> > > > > > > > > > rte_atomic_t should be enough.
> > > > > > > > >
> > > > > > > > > I don't think so, it is problematic in next_owner_id
> > > > > > > > > wraparound and may
> > > > > > > > complicate the code in other places which read it.
> > > > > > > >
> > > > > > > > IMO it is not that complicated, something like that should work I
> > think.
> > > > > > > >
> > > > > > > > /* init to 0 at startup*/
> > > > > > > > > rte_atomic32_t owner_id;
> > > > > > > >
> > > > > > > > int new_owner_id(void)
> > > > > > > > {
> > > > > > > >     int32_t x;
> > > > > > > >     x = rte_atomic32_add_return(&owner_id, 1);
> > > > > > > >     if (x > UINT16_MAX) {
> > > > > > > >        rte_atomic32_dec(&owner_id);
> > > > > > > > >        return -EOVERFLOW;
> > > > > > > >     } else
> > > > > > > >         return x;
> > > > > > > > }
> > > > > > > >
> > > > > > > >
> > > > > > > > > Why not just to keep it simple and using the same lock?
> > > > > > > >
> > > > > > > > Lock is also fine, I just think it better be a separate one
> > > > > > > > - that would protext just next_owner_id.
> > > > > > > > Though if you are going to use uuid here - all that probably
> > > > > > > > not relevant any more.
> > > > > > > >
> > > > > > >
> > > > > > > I agree about the uuid but still think the same lock should be
> > > > > > > used for
> > > > both.
> > > > > >
> > > > > > But with uuid you don't need next_owner_id at all, right?
> > > > > > So lock will only be used for rte_eth_dev_data[] fields anyway.
> > > > > >
> > > > > Sorry, I meant uint64_t, not uuid.
> > > >
> > > > Ah ok, my thought uuid_t is better as with it you don't need to
> > > > support your own code to allocate new owner_id, but rely on system libs
> > instead.
> > > > But wouldn't insist here.
> > > >
> > > > >
> > > > > > > > > > Another alternative would be to use 2 locks - one for
> > > > > > > > > > next_owner_id second for actual data[] protection.
> > > > > > > > > >
> > > > > > > > > > Another thing - you'll probably need to grab/release a
> > > > > > > > > > lock inside
> > > > > > > > > > rte_eth_dev_allocated() too.
> > > > > > > > > > It is a public function used by drivers, so need to be protected
> > too.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Yes, I thought about it, but decided not to use lock in next:
> > > > > > > > > rte_eth_dev_allocated
> > > > > > > > > rte_eth_dev_count
> > > > > > > > > rte_eth_dev_get_name_by_port
> > > > > > > > > > rte_eth_dev_get_port_by_name
> > > > > > > > > maybe more...
> > > > > > > >
> > > > > > > > As I can see in patch #3 you protect by lock access to
> > > > > > > > rte_eth_dev_data[].name (which seems like a good  thing).
> > > > > > > > So I think any other public function that access
> > > > > > > > rte_eth_dev_data[].name should be protected by the same lock.
> > > > > > > >
> > > > > > >
> > > > > > > I don't think so, I can understand to use the ownership lock
> > > > > > > here(as in port
> > > > > > creation) but I don't think it is necessary too.
> > > > > > > What are we exactly protecting here?
> > > > > > > Don't you think it is just timing?(ask in the next moment and
> > > > > > > you may get another answer) I don't see optional crash.
> > > > > >
> > > > > > Not sure what you mean here by timing...
> > > > > > As I understand rte_eth_dev_data[].name unique identifies device
> > > > > > and is used by  port allocation/release/find functions.
> > > > > > As you stated above:
> > > > > > "1. The port allocation and port release synchronization will be
> > > > > > managed by ethdev."
> > > > > > To me it means that ethdev layer has to make sure that all
> > > > > > accesses to rte_eth_dev_data[].name are atomic.
> > > > > > Otherwise what would prevent the situation when one process does
> > > > > > rte_eth_dev_allocate()->snprintf(rte_eth_dev_data[x].name, ...)
> > > > > > while second one does
> > > > rte_eth_dev_allocated(rte_eth_dev_data[x].name, ...) ?
> > > > > >
> > > > > The second will get True or False and that is it.
> > > >
> > > > Under race condition - in the worst case it might crash, though for
> > > > that you'll have to be really unlucky.
> > > > Though in most cases as you said it would just not operate correctly.
> > > > I think if we start to protect dev->name by lock we need to do it
> > > > for all instances (both read and write).
> > > >
> > > Since under the ownership rules, the user must take ownership of a port
> > before using it, I still don't see a problem here.
> >
> > I am not talking about owner id or name here.
> > I am talking about dev->name.
> >
> So? The user still should take ownership of a device before using it (by name or by port id).
> It can just read it without owning it, but no managing it.
> 
> > > Please, Can you describe specific crash scenario and explain how could the
> > locking fix it?
> >
> > Let say thread 0 doing rte_eth_dev_allocate()-
> > >snprintf(rte_eth_dev_data[x].name, ...), thread 1 doing
> > rte_pmd_ring_remove()->rte_eth_dev_allocated()->strcmp().
> > And because of race condition - rte_eth_dev_allocated() will return
> > rte_eth_dev * for the wrong device.
> Which wrong device do you mean? I guess it is the device which currently is being created by thread 0.
> > Then rte_pmd_ring_remove() will call rte_free() for related resources, while
> > It can still be in use by someone else.
> The rte_pmd_ring_remove caller(some DPDK entity) must take ownership (or validate that he is the owner) of a port before doing it(free,
> release), so no issue here.

Forget about ownership for a second.
Suppose we have a process that created a ring port for itself (without setting any ownership) and used it for some time.
Then it decides to remove it, so it calls rte_pmd_ring_remove() for it.
At the same time a second process decides to call rte_eth_dev_allocate() (let's say for another ring port).
They could collide trying to read (process 0) and modify (process 1) the same string, rte_eth_dev_data[].name.

Konstantin 

> 
> 
> Also I'm not sure I fully understand your scenario looks like moving the device state setting in allocation to be after the name setting will be
> good.
> What do you think?
> 
> > Konstantin
> >
> > >
> > > > > Maybe if it had been called just a moment after, It might get
> > > > > different
> > > > answer.
> > > > > Because these APIs don't change ethdev structure(just read), it can be
> > OK.
> > > > > But again, I can understand to use ownership lock also here.
> > > > >
> > > >
> > > > Konstantin

Matan Azrad Jan. 16, 2018, 8:32 p.m. UTC | #12

Hi Konstantin

From: Ananyev, Konstantin, Tuesday, January 16, 2018 9:11 PM
> Hi Matan,
> 
> >
> > Hi Konstantin
> > From: Ananyev, Konstantin, Monday, January 15, 2018 8:44 PM
> > > Hi Matan,
> > > > Hi Konstantin
> > > > From: Ananyev, Konstantin, Monday, January 15, 2018 1:45 PM
> > > > > Hi Matan,
> > > > > > Hi Konstantin
> > > > > > From: Ananyev, Konstantin, Friday, January 12, 2018 2:02 AM
> > > > > > > Hi Matan,
> > > > > > > > Hi Konstantin
> > > > > > > > From: Ananyev, Konstantin, Thursday, January 11, 2018 2:40
> > > > > > > > PM
> > > > > > > > > Hi Matan,
> > > > > > > > > > Hi Konstantin
> > > > > > > > > > From: Ananyev, Konstantin, Wednesday, January 10, 2018
> > > > > > > > > > 3:36 PM
> > > > > > > > > > > Hi Matan,
> >  <snip>
> > > > > > > > > > > It is good to see that now scanning/updating
> > > > > > > > > > > rte_eth_dev_data[] is lock protected, but it might
> > > > > > > > > > > be not very plausible to protect both data[] and
> > > > > > > > > > > next_owner_id using the
> > > > > same lock.
> > > > > > > > > >
> > > > > > > > > > I guess you mean to the owner structure in
> > > > > rte_eth_dev_data[port_id].
> > > > > > > > > > The next_owner_id is read by ownership APIs(for owner
> > > > > > > > > > validation), so it
> > > > > > > > > makes sense to use the same lock.
> > > > > > > > > > Actually, why not?
> > > > > > > > >
> > > > > > > > > Well to me next_owner_id and rte_eth_dev_data[] are not
> > > > > > > > > directly
> > > > > > > related.
> > > > > > > > > You may create new owner_id but it doesn't mean you
> > > > > > > > > would update rte_eth_dev_data[] immediately.
> > > > > > > > > And visa-versa - you might just want to update
> > > > > > > > > rte_eth_dev_data[].name or .owner_id.
> > > > > > > > > It is not very good coding practice to use same lock for
> > > > > > > > > non-related data structures.
> > > > > > > > >
> > > > > > > > I see the relation like next:
> > > > > > > > Since the ownership mechanism synchronization is in ethdev
> > > > > > > > responsibility, we must protect against user mistakes as
> > > > > > > > much as we can by
> > > > > > > using the same lock.
> > > > > > > > So, if user try to set by invalid owner (exactly the ID
> > > > > > > > which currently is
> > > > > > > allocated) we can protect on it.
> > > > > > >
> > > > > > > Hmm, not sure why you can't do same checking with different
> > > > > > > lock or atomic variable?
> > > > > > >
> > > > > > The set ownership API is protected by ownership lock and
> > > > > > checks the owner ID validity By reading the next owner ID.
> > > > > > So, the owner ID allocation and set API should use the same
> > > > > > atomic
> > > > > mechanism.
> > > > >
> > > > > Sure but all you are doing for checking validity, is  check that
> > > > > owner_id > 0 &&& owner_id < next_ownwe_id, right?
> > > > > As you don't allow owner_id overlap (16/3248 bits) you can
> > > > > safely do same check with just atomic_get(&next_owner_id).
> > > > >
> > > > It will not protect it, scenario:
> > > > - current next_id is X.
> > > > - call set ownership of port A with owner id X by thread 0(by user
> mistake).
> > > > - context switch
> > > > - allocate new id by thread 1 and get X and change next_id to X+1
> > > atomically.
> > > > -  context switch
> > > > - Thread 0 validate X by atomic_read and succeed to take ownership.
> > > > - The system loosed the port(or will be managed by two entities) -
> crash.
> > >
> > >
> > > Ok, and how using lock will protect you with such scenario?
> >
> > The owner set API validation by thread 0 should fail because the owner
> validation is included in the protected section.
> 
> Then your validation function would fail even if you'll use atomic ops instead
> of lock.
No.
With atomic ops, this specific scenario will cause the validation to pass.
With a lock, next_id cannot change while the thread is inside the set API.

> But in fact your code is not protected for that scenario - doesn't matter will
> you'll use lock or atomic ops.
> Let's considerer your current code with the following scenario:
> 
> next_owner_id  == 1
> 1) Process 0:
>      rte_eth_dev_owner_new(&owner_id);
>      now owner_id == 1 and next_owner_id == 2
> 2) Process 1 (by mistake):
>     rte_eth_dev_owner_set(port_id=1, owner->id=1);
> It will complete successfully, as owner_id == 1 is considered valid.
> 3) Process 0:
>       rte_eth_dev_owner_set(port_id=1, owner->id=1);
> It will also complete with success, as owner->id is valid and is equal to the current port owner_id.
> So you end up with 2 processes, each assuming that it exclusively owns the same port.
> 
> Honestly, in that situation locking around next_owner_id wouldn't give you
> any advantage over atomic ops.
> 

This is a different scenario, one that we can't protect against with either atomics or locks.
But for the first scenario I described, I think we can.
Please read it again, I described it step by step.

> >
> > > I don't think you can protect yourself against such scenario with or
> > > without locking.
> > > Unless you'll make it harder for the mis-behaving thread to guess
> > > valid owner_id, or add some extra logic here.
> > >
> > > >
> > > >
> > > > > > The set(and others) ownership APIs already uses the ownership
> > > > > > lock so I
> > > > > think it makes sense to use the same lock also in ID allocation.
> > > > > >
> > > > > > > > > > > In fact, for next_owner_id, you don't need a lock -
> > > > > > > > > > > just rte_atomic_t should be enough.
> > > > > > > > > >
> > > > > > > > > > I don't think so, it is problematic in next_owner_id
> > > > > > > > > > wraparound and may
> > > > > > > > > complicate the code in other places which read it.
> > > > > > > > >
> > > > > > > > > IMO it is not that complicated, something like that
> > > > > > > > > should work I
> > > think.
> > > > > > > > >
> > > > > > > > > > /* init to 0 at startup*/
> > > > > > > > > > rte_atomic32_t owner_id;
> > > > > > > > >
> > > > > > > > > int new_owner_id(void)
> > > > > > > > > {
> > > > > > > > >     int32_t x;
> > > > > > > > >     x = rte_atomic32_add_return(&owner_id, 1);
> > > > > > > > >     if (x > UINT16_MAX) {
> > > > > > > > >        rte_atomic32_dec(&owner_id);
> > > > > > > > > >        return -EOVERFLOW;
> > > > > > > > >     } else
> > > > > > > > >         return x;
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > Why not just to keep it simple and using the same lock?
> > > > > > > > >
> > > > > > > > > Lock is also fine, I just think it better be a separate
> > > > > > > > > one
> > > > > > > > > - that would protext just next_owner_id.
> > > > > > > > > Though if you are going to use uuid here - all that
> > > > > > > > > probably not relevant any more.
> > > > > > > > >
> > > > > > > >
> > > > > > > > I agree about the uuid but still think the same lock
> > > > > > > > should be used for
> > > > > both.
> > > > > > >
> > > > > > > But with uuid you don't need next_owner_id at all, right?
> > > > > > > So lock will only be used for rte_eth_dev_data[] fields anyway.
> > > > > > >
> > > > > > Sorry, I meant uint64_t, not uuid.
> > > > >
> > > > > Ah ok, my thought uuid_t is better as with it you don't need to
> > > > > support your own code to allocate new owner_id, but rely on
> > > > > system libs
> > > instead.
> > > > > But wouldn't insist here.
> > > > >
> > > > > >
> > > > > > > > > > > Another alternative would be to use 2 locks - one
> > > > > > > > > > > for next_owner_id second for actual data[] protection.
> > > > > > > > > > >
> > > > > > > > > > > Another thing - you'll probably need to grab/release
> > > > > > > > > > > a lock inside
> > > > > > > > > > > rte_eth_dev_allocated() too.
> > > > > > > > > > > It is a public function used by drivers, so need to
> > > > > > > > > > > be protected
> > > too.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Yes, I thought about it, but decided not to use lock in next:
> > > > > > > > > > rte_eth_dev_allocated
> > > > > > > > > > rte_eth_dev_count
> > > > > > > > > > rte_eth_dev_get_name_by_port
> > > > > > > > > > > rte_eth_dev_get_port_by_name
> > > > > > > > > > maybe more...
> > > > > > > > >
> > > > > > > > > As I can see in patch #3 you protect by lock access to
> > > > > > > > > rte_eth_dev_data[].name (which seems like a good  thing).
> > > > > > > > > So I think any other public function that access
> > > > > > > > > rte_eth_dev_data[].name should be protected by the same
> lock.
> > > > > > > > >
> > > > > > > >
> > > > > > > > I don't think so, I can understand to use the ownership
> > > > > > > > lock here(as in port
> > > > > > > creation) but I don't think it is necessary too.
> > > > > > > > What are we exactly protecting here?
> > > > > > > > Don't you think it is just timing?(ask in the next moment
> > > > > > > > and you may get another answer) I don't see optional crash.
> > > > > > >
> > > > > > > Not sure what you mean here by timing...
> > > > > > > As I understand rte_eth_dev_data[].name unique identifies
> > > > > > > device and is used by  port allocation/release/find functions.
> > > > > > > As you stated above:
> > > > > > > "1. The port allocation and port release synchronization
> > > > > > > will be managed by ethdev."
> > > > > > > To me it means that ethdev layer has to make sure that all
> > > > > > > accesses to rte_eth_dev_data[].name are atomic.
> > > > > > > Otherwise what would prevent the situation when one process
> > > > > > > does
> > > > > > > rte_eth_dev_allocate()->snprintf(rte_eth_dev_data[x].name,
> > > > > > > ...) while second one does
> > > > > rte_eth_dev_allocated(rte_eth_dev_data[x].name, ...) ?
> > > > > > >
> > > > > > The second will get True or False and that is it.
> > > > >
> > > > > Under race condition - in the worst case it might crash, though
> > > > > for that you'll have to be really unlucky.
> > > > > Though in most cases as you said it would just not operate correctly.
> > > > > I think if we start to protect dev->name by lock we need to do
> > > > > it for all instances (both read and write).
> > > > >
> > > > Since under the ownership rules, the user must take ownership of a
> > > > port
> > > before using it, I still don't see a problem here.
> > >
> > > I am not talking about owner id or name here.
> > > I am talking about dev->name.
> > >
> > So? The user still should take ownership of a device before using it (by
> name or by port id).
> > It can just read it without owning it, but no managing it.
> >
> > > > Please, Can you describe specific crash scenario and explain how
> > > > could the
> > > locking fix it?
> > >
> > > Let say thread 0 doing rte_eth_dev_allocate()-
> > > >snprintf(rte_eth_dev_data[x].name, ...), thread 1 doing
> > > rte_pmd_ring_remove()->rte_eth_dev_allocated()->strcmp().
> > > And because of race condition - rte_eth_dev_allocated() will return
> > > rte_eth_dev * for the wrong device.
> > Which wrong device do you mean? I guess it is the device which currently is
> being created by thread 0.
> > > Then rte_pmd_ring_remove() will call rte_free() for related
> > > resources, while It can still be in use by someone else.
> > The rte_pmd_ring_remove caller(some DPDK entity) must take ownership
> > (or validate that he is the owner) of a port before doing it(free, release), so
> no issue here.
> 
> Forget about ownership for a second.
> Suppose we have a process it created ring port for itself (without setting any
> ownership)  and used it for some time.
> Then it decided to remove it, so it calls rte_pmd_ring_remove() for it.
> At the same time second process decides to call rte_eth_dev_allocate() (let
> say for anither ring port).
> They could collide trying to read (process 0) and modify (process 1) same
> string rte_eth_dev_data[].name.
>
Do you mean that process 0 will successfully match the name of process 1's new port?
The device state is kept in process-local memory, so process 0 will not compare against process 1's port: from its point of view this port is in the UNUSED state.
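
To make the process-local state argument concrete, here is a minimal sketch in C. It is a simplified model and not the ethdev code: the names shared_name, local_state, find_port_by_name and the DEV_* states are hypothetical stand-ins for the shared rte_eth_dev_data[].name and the per-process device state. It only shows that a lookup which tests the local state first never reaches strcmp() for a slot this process still sees as unused.

```c
#include <string.h>

#define MAX_PORTS 4
#define NAME_LEN 64

/* Hypothetical per-process device states. */
enum dev_state { DEV_UNUSED = 0, DEV_ATTACHED };

/* Stand-in for the names kept in memory shared between processes. */
static char shared_name[MAX_PORTS][NAME_LEN];
/* Stand-in for this process's private view of each slot. */
static enum dev_state local_state[MAX_PORTS];

/* Return the slot index whose name matches, or -1 if none is attached here. */
static int find_port_by_name(const char *name)
{
	int i;

	for (i = 0; i < MAX_PORTS; i++) {
		/* The name is only compared for slots attached in this process. */
		if (local_state[i] == DEV_ATTACHED &&
		    strcmp(shared_name[i], name) == 0)
			return i;
	}
	return -1;
}
```

In this model the name written by another process is skipped until the local state changes, so the open question is whether the slot can already be attached in the reading process.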

> Konstantin
> 
> >
> >
> > Also I'm not sure I fully understand your scenario looks like moving
> > the device state setting in allocation to be after the name setting will be
> good.
> > What do you think?
> >
> > > Konstantin
> > >
> > > >
> > > > > > Maybe if it had been called just a moment after, It might get
> > > > > > different
> > > > > answer.
> > > > > > Because these APIs don't change ethdev structure(just read),
> > > > > > it can be
> > > OK.
> > > > > > But again, I can understand to use ownership lock also here.
> > > > > >
> > > > >
> > > > > Konstantin

Ananyev, Konstantin Jan. 17, 2018, 11:24 a.m. UTC | #13

Hi Matan,

> Hi Konstantin
> 
> From: Ananyev, Konstantin, Tuesday, January 16, 2018 9:11 PM
> > Hi Matan,
> >
> > >
> > > Hi Konstantin
> > > From: Ananyev, Konstantin, Monday, January 15, 2018 8:44 PM
> > > > Hi Matan,
> > > > > Hi Konstantin
> > > > > From: Ananyev, Konstantin, Monday, January 15, 2018 1:45 PM
> > > > > > Hi Matan,
> > > > > > > Hi Konstantin
> > > > > > > From: Ananyev, Konstantin, Friday, January 12, 2018 2:02 AM
> > > > > > > > Hi Matan,
> > > > > > > > > Hi Konstantin
> > > > > > > > > From: Ananyev, Konstantin, Thursday, January 11, 2018 2:40
> > > > > > > > > PM
> > > > > > > > > > Hi Matan,
> > > > > > > > > > > Hi Konstantin
> > > > > > > > > > > From: Ananyev, Konstantin, Wednesday, January 10, 2018
> > > > > > > > > > > 3:36 PM
> > > > > > > > > > > > Hi Matan,
> > >  <snip>
> > > > > > > > > > > > It is good to see that now scanning/updating
> > > > > > > > > > > > rte_eth_dev_data[] is lock protected, but it might
> > > > > > > > > > > > be not very plausible to protect both data[] and
> > > > > > > > > > > > next_owner_id using the
> > > > > > same lock.
> > > > > > > > > > >
> > > > > > > > > > > I guess you mean to the owner structure in
> > > > > > rte_eth_dev_data[port_id].
> > > > > > > > > > > The next_owner_id is read by ownership APIs(for owner
> > > > > > > > > > > validation), so it
> > > > > > > > > > makes sense to use the same lock.
> > > > > > > > > > > Actually, why not?
> > > > > > > > > >
> > > > > > > > > > Well to me next_owner_id and rte_eth_dev_data[] are not
> > > > > > > > > > directly
> > > > > > > > related.
> > > > > > > > > > You may create new owner_id but it doesn't mean you
> > > > > > > > > > would update rte_eth_dev_data[] immediately.
> > > > > > > > > > And visa-versa - you might just want to update
> > > > > > > > > > rte_eth_dev_data[].name or .owner_id.
> > > > > > > > > > It is not very good coding practice to use same lock for
> > > > > > > > > > non-related data structures.
> > > > > > > > > >
> > > > > > > > > I see the relation like next:
> > > > > > > > > Since the ownership mechanism synchronization is in ethdev
> > > > > > > > > responsibility, we must protect against user mistakes as
> > > > > > > > > much as we can by
> > > > > > > > using the same lock.
> > > > > > > > > So, if user try to set by invalid owner (exactly the ID
> > > > > > > > > which currently is
> > > > > > > > allocated) we can protect on it.
> > > > > > > >
> > > > > > > > Hmm, not sure why you can't do same checking with different
> > > > > > > > lock or atomic variable?
> > > > > > > >
> > > > > > > The set ownership API is protected by ownership lock and
> > > > > > > checks the owner ID validity By reading the next owner ID.
> > > > > > > So, the owner ID allocation and set API should use the same
> > > > > > > atomic
> > > > > > mechanism.
> > > > > >
> > > > > > Sure but all you are doing for checking validity, is  check that
> > > > > > owner_id > 0 &&& owner_id < next_ownwe_id, right?
> > > > > > As you don't allow owner_id overlap (16/3248 bits) you can
> > > > > > safely do same check with just atomic_get(&next_owner_id).
> > > > > >
> > > > > It will not protect it, scenario:
> > > > > - current next_id is X.
> > > > > - call set ownership of port A with owner id X by thread 0(by user
> > mistake).
> > > > > - context switch
> > > > > - allocate new id by thread 1 and get X and change next_id to X+1
> > > > atomically.
> > > > > -  context switch
> > > > > - Thread 0 validate X by atomic_read and succeed to take ownership.
> > > > > - The system loosed the port(or will be managed by two entities) -
> > crash.
> > > >
> > > >
> > > > Ok, and how using lock will protect you with such scenario?
> > >
> > > The owner set API validation by thread 0 should fail because the owner
> > validation is included in the protected section.
> >
> > Then your validation function would fail even if you'll use atomic ops instead
> > of lock.
> No.
> With atomic this specific scenario will cause the validation to pass.

Can you explain to me how?

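static int
rte_eth_is_valid_owner_id(uint16_t owner_id)
{
	/* Snapshot the shared counter once; next_owner_id is assumed to point to an rte_atomic32_t. */
	int32_t cur_owner_id = RTE_MIN(rte_atomic32_read(next_owner_id), UINT16_MAX);

	/* Reject the "no owner" value and any ID above the snapshot. */
	if (owner_id == RTE_ETH_DEV_NO_OWNER || owner_id > cur_owner_id) {
		RTE_LOG(ERR, EAL, "Invalid owner_id=%d.\n", owner_id);
		return 0;
	}
	return 1;
}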

Let's say next_owner_id == X and you invoke rte_eth_is_valid_owner_id(owner_id=X+1):
it would fail.
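
For readers who want to try the check outside DPDK, the following is a rough, self-contained sketch using C11 atomics instead of rte_atomic32_t. All names here (last_owner_id, new_owner_id, is_valid_owner_id, NO_OWNER, MAX_OWNER_ID) are hypothetical, and it assumes the counter holds the last ID handed out, which is how the allocator quoted further down in this mail behaves.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-ins; not the DPDK definitions. */
#define NO_OWNER 0
#define MAX_OWNER_ID UINT16_MAX

/* Last owner ID handed out; 0 means none allocated yet. */
static atomic_int last_owner_id;

/* Return a fresh ID in [1, MAX_OWNER_ID], or -1 when IDs are exhausted. */
static int new_owner_id(void)
{
	int x = atomic_fetch_add(&last_owner_id, 1) + 1;

	if (x > MAX_OWNER_ID) {
		/* Undo the increment so the counter does not keep growing. */
		atomic_fetch_sub(&last_owner_id, 1);
		return -1;
	}
	return x;
}

/* An ID is valid if it is non-zero and not above the counter snapshot. */
static int is_valid_owner_id(uint16_t owner_id)
{
	int cur = atomic_load(&last_owner_id);

	if (cur > MAX_OWNER_ID)
		cur = MAX_OWNER_ID;
	return owner_id != NO_OWNER && owner_id <= cur;
}
```

With the counter at X, an ID of X+1 is rejected, which is the case described above. The sketch says nothing about an allocation that happens between a caller's mistake and the check.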

> With lock no next_id changes can be done while the thread is in the set API.
> 
> > But in fact your code is not protected for that scenario - doesn't matter will
> > you'll use lock or atomic ops.
> > Let's considerer your current code with the following scenario:
> >
> > next_owner_id  == 1
> > 1) Process 0:
> >      rte_eth_dev_owner_new(&owner_id);
> >      now owner_id == 1 and next_owner_id == 2
> > 2) Process 1 (by mistake):
> >     rte_eth_dev_owner_set(port_id=1, owner->id=1); It will complete
> > successfully, as owner_id ==1 is considered as valid.
> > 3) Process 0:
> >       rte_eth_dev_owner_set(port_id=1, owner->id=1); It will also complete
> > with success, as owner->id is valid is equal to current port owner_id.
> > So you finished with 2 processes assuming that they do own exclusively then
> > same port.
> >
> > Honestly in that situation  locking around nest_owner_id wouldn't give you
> > any advantages over atomic ops.
> >
> 
> This is a different scenario that we can't protect on it with atomic or locks.
> But for the first scenario I described I think we can.
> Please read it again, I described it step by step.
> 
> > >
> > > > I don't think you can protect yourself against such scenario with or
> > > > without locking.
> > > > Unless you'll make it harder for the mis-behaving thread to guess
> > > > valid owner_id, or add some extra logic here.
> > > >
> > > > >
> > > > >
> > > > > > > The set(and others) ownership APIs already uses the ownership
> > > > > > > lock so I
> > > > > > think it makes sense to use the same lock also in ID allocation.
> > > > > > >
> > > > > > > > > > > > In fact, for next_owner_id, you don't need a lock -
> > > > > > > > > > > > just rte_atomic_t should be enough.
> > > > > > > > > > >
> > > > > > > > > > > I don't think so, it is problematic in next_owner_id
> > > > > > > > > > > wraparound and may
> > > > > > > > > > complicate the code in other places which read it.
> > > > > > > > > >
> > > > > > > > > > IMO it is not that complicated, something like that
> > > > > > > > > > should work I
> > > > think.
> > > > > > > > > >
> > > > > > > > > > /* init to 0 at startup*/ rte_atomic32_t *owner_id;
> > > > > > > > > >
> > > > > > > > > > int new_owner_id(void)
> > > > > > > > > > {
> > > > > > > > > >     int32_t x;
> > > > > > > > > >     x = rte_atomic32_add_return(&owner_id, 1);
> > > > > > > > > >     if (x > UINT16_MAX) {
> > > > > > > > > >        rte_atomic32_dec(&owner_id);
> > > > > > > > > >        return -EOVERWLOW;
> > > > > > > > > >     } else
> > > > > > > > > >         return x;
> > > > > > > > > > }
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > Why not just to keep it simple and using the same lock?
> > > > > > > > > >
> > > > > > > > > > Lock is also fine, I just think it better be a separate
> > > > > > > > > > one
> > > > > > > > > > - that would protext just next_owner_id.
> > > > > > > > > > Though if you are going to use uuid here - all that
> > > > > > > > > > probably not relevant any more.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > I agree about the uuid but still think the same lock
> > > > > > > > > should be used for
> > > > > > both.
> > > > > > > >
> > > > > > > > But with uuid you don't need next_owner_id at all, right?
> > > > > > > > So lock will only be used for rte_eth_dev_data[] fields anyway.
> > > > > > > >
> > > > > > > Sorry, I meant uint64_t, not uuid.
> > > > > >
> > > > > > Ah ok, my thought uuid_t is better as with it you don't need to
> > > > > > support your own code to allocate new owner_id, but rely on
> > > > > > system libs
> > > > instead.
> > > > > > But wouldn't insist here.
> > > > > >
> > > > > > >
> > > > > > > > > > > > Another alternative would be to use 2 locks - one
> > > > > > > > > > > > for next_owner_id second for actual data[] protection.
> > > > > > > > > > > >
> > > > > > > > > > > > Another thing - you'll probably need to grab/release
> > > > > > > > > > > > a lock inside
> > > > > > > > > > > > rte_eth_dev_allocated() too.
> > > > > > > > > > > > It is a public function used by drivers, so need to
> > > > > > > > > > > > be protected
> > > > too.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Yes, I thought about it, but decided not to use lock in next:
> > > > > > > > > > > rte_eth_dev_allocated
> > > > > > > > > > > rte_eth_dev_count
> > > > > > > > > > > rte_eth_dev_get_name_by_port
> > > > rte_eth_dev_get_port_by_name
> > > > > > > > > > > maybe more...
> > > > > > > > > >
> > > > > > > > > > As I can see in patch #3 you protect by lock access to
> > > > > > > > > > rte_eth_dev_data[].name (which seems like a good  thing).
> > > > > > > > > > So I think any other public function that access
> > > > > > > > > > rte_eth_dev_data[].name should be protected by the same
> > lock.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > I don't think so, I can understand to use the ownership
> > > > > > > > > lock here(as in port
> > > > > > > > creation) but I don't think it is necessary too.
> > > > > > > > > What are we exactly protecting here?
> > > > > > > > > Don't you think it is just timing?(ask in the next moment
> > > > > > > > > and you may get another answer) I don't see optional crash.
> > > > > > > >
> > > > > > > > Not sure what you mean here by timing...
> > > > > > > > As I understand rte_eth_dev_data[].name unique identifies
> > > > > > > > device and is used by  port allocation/release/find functions.
> > > > > > > > As you stated above:
> > > > > > > > "1. The port allocation and port release synchronization
> > > > > > > > will be managed by ethdev."
> > > > > > > > To me it means that ethdev layer has to make sure that all
> > > > > > > > accesses to rte_eth_dev_data[].name are atomic.
> > > > > > > > Otherwise what would prevent the situation when one process
> > > > > > > > does
> > > > > > > > rte_eth_dev_allocate()->snprintf(rte_eth_dev_data[x].name,
> > > > > > > > ...) while second one does
> > > > > > rte_eth_dev_allocated(rte_eth_dev_data[x].name, ...) ?
> > > > > > > >
> > > > > > > The second will get True or False and that is it.
> > > > > >
> > > > > > Under race condition - in the worst case it might crash, though
> > > > > > for that you'll have to be really unlucky.
> > > > > > Though in most cases as you said it would just not operate correctly.
> > > > > > I think if we start to protect dev->name by lock we need to do
> > > > > > it for all instances (both read and write).
> > > > > >
> > > > > Since under the ownership rules, the user must take ownership of a
> > > > > port
> > > > before using it, I still don't see a problem here.
> > > >
> > > > I am not talking about owner id or name here.
> > > > I am talking about dev->name.
> > > >
> > > So? The user still should take ownership of a device before using it (by
> > name or by port id).
> > > It can just read it without owning it, but no managing it.
> > >
> > > > > Please, Can you describe specific crash scenario and explain how
> > > > > could the
> > > > locking fix it?
> > > >
> > > > Let say thread 0 doing rte_eth_dev_allocate()-
> > > > >snprintf(rte_eth_dev_data[x].name, ...), thread 1 doing
> > > > rte_pmd_ring_remove()->rte_eth_dev_allocated()->strcmp().
> > > > And because of race condition - rte_eth_dev_allocated() will return
> > > > rte_eth_dev * for the wrong device.
> > > Which wrong device do you mean? I guess it is the device which currently is
> > being created by thread 0.
> > > > Then rte_pmd_ring_remove() will call rte_free() for related
> > > > resources, while It can still be in use by someone else.
> > > The rte_pmd_ring_remove caller(some DPDK entity) must take ownership
> > > (or validate that he is the owner) of a port before doing it(free, release), so
> > no issue here.
> >
> > Forget about ownership for a second.
> > Suppose we have a process it created ring port for itself (without setting any
> > ownership)  and used it for some time.
> > Then it decided to remove it, so it calls rte_pmd_ring_remove() for it.
> > At the same time second process decides to call rte_eth_dev_allocate() (let
> > say for anither ring port).
> > They could collide trying to read (process 0) and modify (process 1) same
> > string rte_eth_dev_data[].name.
> >
> Do you mean that process 0 will compare successfully the process 1 new port name?

Yes.

> The state are in local process memory - so process 0 will not compare the process 1 port, from its point of view this port is in UNUSED
> state.
>

OK, and why can't it be in the attached state in process 0 too?
Konstantin
 
> > Konstantin
> >
> > >
> > >
> > > Also I'm not sure I fully understand your scenario looks like moving
> > > the device state setting in allocation to be after the name setting will be
> > good.
> > > What do you think?
> > >
> > > > Konstantin
> > > >
> > > > >
> > > > > > > Maybe if it had been called just a moment after, It might get
> > > > > > > different
> > > > > > answer.
> > > > > > > Because these APIs don't change ethdev structure(just read),
> > > > > > > it can be
> > > > OK.
> > > > > > > But again, I can understand to use ownership lock also here.
> > > > > > >
> > > > > >
> > > > > > Konstantin

Matan Azrad Jan. 17, 2018, 12:05 p.m. UTC | #14

Hi Konstantin
From: Ananyev, Konstantin, Sent: Wednesday, January 17, 2018 1:24 PM
> Hi Matan,
> 
> > Hi Konstantin
> >
> > From: Ananyev, Konstantin, Tuesday, January 16, 2018 9:11 PM
> > > Hi Matan,
> > >
> > > >
> > > > Hi Konstantin
> > > > From: Ananyev, Konstantin, Monday, January 15, 2018 8:44 PM
> > > > > Hi Matan,
> > > > > > Hi Konstantin
> > > > > > From: Ananyev, Konstantin, Monday, January 15, 2018 1:45 PM
> > > > > > > Hi Matan,
> > > > > > > > Hi Konstantin
> > > > > > > > From: Ananyev, Konstantin, Friday, January 12, 2018 2:02
> > > > > > > > AM
> > > > > > > > > Hi Matan,
> > > > > > > > > > Hi Konstantin
> > > > > > > > > > From: Ananyev, Konstantin, Thursday, January 11, 2018
> > > > > > > > > > 2:40 PM
> > > > > > > > > > > Hi Matan,
> > > > > > > > > > > > Hi Konstantin
> > > > > > > > > > > > From: Ananyev, Konstantin, Wednesday, January 10,
> > > > > > > > > > > > 2018
> > > > > > > > > > > > 3:36 PM
> > > > > > > > > > > > > Hi Matan,
> > > >  <snip>
> > > > > > > > > > > > > It is good to see that now scanning/updating
> > > > > > > > > > > > > rte_eth_dev_data[] is lock protected, but it
> > > > > > > > > > > > > might be not very plausible to protect both
> > > > > > > > > > > > > data[] and next_owner_id using the
> > > > > > > same lock.
> > > > > > > > > > > >
> > > > > > > > > > > > I guess you mean to the owner structure in
> > > > > > > rte_eth_dev_data[port_id].
> > > > > > > > > > > > The next_owner_id is read by ownership APIs(for
> > > > > > > > > > > > owner validation), so it
> > > > > > > > > > > makes sense to use the same lock.
> > > > > > > > > > > > Actually, why not?
> > > > > > > > > > >
> > > > > > > > > > > Well to me next_owner_id and rte_eth_dev_data[] are
> > > > > > > > > > > not directly
> > > > > > > > > related.
> > > > > > > > > > > You may create new owner_id but it doesn't mean you
> > > > > > > > > > > would update rte_eth_dev_data[] immediately.
> > > > > > > > > > > And visa-versa - you might just want to update
> > > > > > > > > > > rte_eth_dev_data[].name or .owner_id.
> > > > > > > > > > > It is not very good coding practice to use same lock
> > > > > > > > > > > for non-related data structures.
> > > > > > > > > > >
> > > > > > > > > > I see the relation like next:
> > > > > > > > > > Since the ownership mechanism synchronization is in
> > > > > > > > > > ethdev responsibility, we must protect against user
> > > > > > > > > > mistakes as much as we can by
> > > > > > > > > using the same lock.
> > > > > > > > > > So, if user try to set by invalid owner (exactly the
> > > > > > > > > > ID which currently is
> > > > > > > > > allocated) we can protect on it.
> > > > > > > > >
> > > > > > > > > Hmm, not sure why you can't do same checking with
> > > > > > > > > different lock or atomic variable?
> > > > > > > > >
> > > > > > > > The set ownership API is protected by ownership lock and
> > > > > > > > checks the owner ID validity By reading the next owner ID.
> > > > > > > > So, the owner ID allocation and set API should use the
> > > > > > > > same atomic
> > > > > > > mechanism.
> > > > > > >
> > > > > > > Sure but all you are doing for checking validity, is  check
> > > > > > > that owner_id > 0 &&& owner_id < next_ownwe_id, right?
> > > > > > > As you don't allow owner_id overlap (16/3248 bits) you can
> > > > > > > safely do same check with just atomic_get(&next_owner_id).
> > > > > > >
> > > > > > It will not protect it, scenario:
> > > > > > - current next_id is X.
> > > > > > - call set ownership of port A with owner id X by thread 0(by
> > > > > > user
> > > mistake).
> > > > > > - context switch
> > > > > > - allocate new id by thread 1 and get X and change next_id to
> > > > > > X+1
> > > > > atomically.
> > > > > > -  context switch
> > > > > > - Thread 0 validate X by atomic_read and succeed to take
> ownership.
> > > > > > - The system loosed the port(or will be managed by two
> > > > > > entities) -
> > > crash.
> > > > >
> > > > >
> > > > > Ok, and how using lock will protect you with such scenario?
> > > >
> > > > The owner set API validation by thread 0 should fail because the
> > > > owner
> > > validation is included in the protected section.
> > >
> > > Then your validation function would fail even if you'll use atomic
> > > ops instead of lock.
> > No.
> > With atomic this specific scenario will cause the validation to pass.
> 
> Can you explain to me how?
> 
> rte_eth_is_valid_owner_id(uint16_t owner_id) {
>               int32_t cur_owner_id = RTE_MIN(rte_atomic32_get(next_owner_id),
> UINT16_MAX);
> 
> 	if (owner_id == RTE_ETH_DEV_NO_OWNER || owner >
> cur_owner_id) {
> 		RTE_LOG(ERR, EAL, "Invalid owner_id=%d.\n", owner_id);
> 		return 0;
> 	}
> 	return 1;
> }
> 
> Let say your next_owne_id==X, and you invoke
> rte_eth_is_valid_owner_id(owner_id=X+1)  - it would fail.

Explanation:
The scenario with locks:
next_owner_id = X.
Thread 0 calls the set API (with invalid owner Y=X) and takes the lock.
Context switch.
Thread 1 calls owner_new and blocks on the lock.
Context switch.
Thread 0 does the owner ID validation, which fails (Y>=X); it unlocks the lock and returns failure to the user.
Context switch.
Thread 1 takes the lock, updates X to X+1, then unlocks the lock.
Everything is OK!

The same scenario with atomics:
next_owner_id = X.
Thread 0 calls the set API (with invalid owner Y=X) and takes the lock.
Context switch.
Thread 1 calls owner_new and changes X to X+1 (atomically).
Context switch.
Thread 0 does the owner ID validation, which succeeds (Y<(atomic)X+1); it unlocks the lock and returns success to the user.
Problem!
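
The two interleavings can be replayed deterministically in a few lines of C. This is only a sketch under stated assumptions: it uses C11 atomics, the names next_id, owner_new, owner_id_is_valid, validate_then_new and new_then_validate are hypothetical, and the thread switches are simulated by calling the steps in a fixed order rather than by running real threads. As in the scenario, an ID counts as valid when it is non-zero and below next_id.

```c
#include <stdatomic.h>

/* Hypothetical model: every ID below next_id has been allocated. */
static atomic_uint next_id;

/* Allocate a new owner ID: return the current value and advance the counter. */
static unsigned int owner_new(void)
{
	return atomic_fetch_add(&next_id, 1);
}

/* Validation as described in the scenario: non-zero and below next_id. */
static int owner_id_is_valid(unsigned int id)
{
	return id != 0 && id < atomic_load(&next_id);
}

/*
 * Lock case: thread 1 is blocked until thread 0 leaves the set API,
 * so validation of Y = X completes before the allocation.
 */
static int validate_then_new(unsigned int x)
{
	int ok;

	atomic_store(&next_id, x);
	ok = owner_id_is_valid(x);	/* thread 0, inside the lock */
	(void)owner_new();		/* thread 1, after the unlock */
	return ok;
}

/*
 * Atomic case: nothing stops thread 1 from allocating X between
 * thread 0 entering the set API and thread 0 validating Y = X.
 */
static int new_then_validate(unsigned int x)
{
	atomic_store(&next_id, x);
	(void)owner_new();		/* thread 1 slips in */
	return owner_id_is_valid(x);	/* thread 0 now sees X+1 */
}
```

Under these assumptions validate_then_new(5) reports the ID as invalid, while new_then_validate(5) reports it as valid. The sketch does not cover the other order in the lock case, where thread 1 takes the lock first.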
 
> > With lock no next_id changes can be done while the thread is in the set
> API.
> >
> > > But in fact your code is not protected for that scenario - doesn't
> > > matter will you'll use lock or atomic ops.
> > > Let's considerer your current code with the following scenario:
> > >
> > > next_owner_id  == 1
> > > 1) Process 0:
> > >      rte_eth_dev_owner_new(&owner_id);
> > >      now owner_id == 1 and next_owner_id == 2
> > > 2) Process 1 (by mistake):
> > >     rte_eth_dev_owner_set(port_id=1, owner->id=1); It will complete
> > > successfully, as owner_id ==1 is considered as valid.
> > > 3) Process 0:
> > >       rte_eth_dev_owner_set(port_id=1, owner->id=1); It will also
> > > complete with success, as owner->id is valid is equal to current port
> owner_id.
> > > So you finished with 2 processes assuming that they do own
> > > exclusively then same port.
> > >
> > > Honestly in that situation  locking around nest_owner_id wouldn't
> > > give you any advantages over atomic ops.
> > >
> >
> > This is a different scenario that we can't protect on it with atomic or locks.
> > But for the first scenario I described I think we can.
> > Please read it again, I described it step by step.
> >
> > > >
> > > > > I don't think you can protect yourself against such scenario
> > > > > with or without locking.
> > > > > Unless you'll make it harder for the mis-behaving thread to
> > > > > guess valid owner_id, or add some extra logic here.
> > > > >
> > > > > >
> > > > > >
> > > > > > > > The set(and others) ownership APIs already uses the
> > > > > > > > ownership lock so I
> > > > > > > think it makes sense to use the same lock also in ID allocation.
> > > > > > > >
> > > > > > > > > > > > > In fact, for next_owner_id, you don't need a
> > > > > > > > > > > > > lock - just rte_atomic_t should be enough.
> > > > > > > > > > > >
> > > > > > > > > > > > I don't think so, it is problematic in
> > > > > > > > > > > > next_owner_id wraparound and may
> > > > > > > > > > > complicate the code in other places which read it.
> > > > > > > > > > >
> > > > > > > > > > > IMO it is not that complicated, something like that
> > > > > > > > > > > should work I
> > > > > think.
> > > > > > > > > > >
> > > > > > > > > > > /* init to 0 at startup*/ rte_atomic32_t *owner_id;
> > > > > > > > > > >
> > > > > > > > > > > int new_owner_id(void) {
> > > > > > > > > > >     int32_t x;
> > > > > > > > > > >     x = rte_atomic32_add_return(&owner_id, 1);
> > > > > > > > > > >     if (x > UINT16_MAX) {
> > > > > > > > > > >        rte_atomic32_dec(&owner_id);
> > > > > > > > > > >        return -EOVERWLOW;
> > > > > > > > > > >     } else
> > > > > > > > > > >         return x;
> > > > > > > > > > > }
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > Why not just to keep it simple and using the same lock?
> > > > > > > > > > >
> > > > > > > > > > > Lock is also fine, I just think it better be a separate
> > > > > > > > > > > one
> > > > > > > > > > > - that would protext just next_owner_id.
> > > > > > > > > > > Though if you are going to use uuid here - all that
> > > > > > > > > > > probably not relevant any more.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I agree about the uuid but still think the same lock
> > > > > > > > > > should be used for
> > > > > > > both.
> > > > > > > > >
> > > > > > > > > But with uuid you don't need next_owner_id at all, right?
> > > > > > > > > So lock will only be used for rte_eth_dev_data[] fields
> anyway.
> > > > > > > > >
> > > > > > > > Sorry, I meant uint64_t, not uuid.
> > > > > > >
> > > > > > > Ah ok, my thought uuid_t is better as with it you don't need to
> > > > > > > support your own code to allocate new owner_id, but rely on
> > > > > > > system libs
> > > > > instead.
> > > > > > > But wouldn't insist here.
> > > > > > >
> > > > > > > >
> > > > > > > > > > > > > Another alternative would be to use 2 locks - one
> > > > > > > > > > > > > for next_owner_id second for actual data[] protection.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Another thing - you'll probably need to grab/release
> > > > > > > > > > > > > a lock inside
> > > > > > > > > > > > > rte_eth_dev_allocated() too.
> > > > > > > > > > > > > It is a public function used by drivers, so need to
> > > > > > > > > > > > > be protected
> > > > > too.
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Yes, I thought about it, but decided not to use lock in
> next:
> > > > > > > > > > > > rte_eth_dev_allocated
> > > > > > > > > > > > rte_eth_dev_count
> > > > > > > > > > > > rte_eth_dev_get_name_by_port
> > > > > rte_eth_dev_get_port_by_name
> > > > > > > > > > > > maybe more...
> > > > > > > > > > >
> > > > > > > > > > > As I can see in patch #3 you protect by lock access to
> > > > > > > > > > > rte_eth_dev_data[].name (which seems like a good
> thing).
> > > > > > > > > > > So I think any other public function that access
> > > > > > > > > > > rte_eth_dev_data[].name should be protected by the
> same
> > > lock.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I don't think so, I can understand to use the ownership
> > > > > > > > > > lock here(as in port
> > > > > > > > > creation) but I don't think it is necessary too.
> > > > > > > > > > What are we exactly protecting here?
> > > > > > > > > > Don't you think it is just timing?(ask in the next moment
> > > > > > > > > > and you may get another answer) I don't see optional crash.
> > > > > > > > >
> > > > > > > > > Not sure what you mean here by timing...
> > > > > > > > > As I understand rte_eth_dev_data[].name unique identifies
> > > > > > > > > device and is used by  port allocation/release/find functions.
> > > > > > > > > As you stated above:
> > > > > > > > > "1. The port allocation and port release synchronization
> > > > > > > > > will be managed by ethdev."
> > > > > > > > > To me it means that ethdev layer has to make sure that all
> > > > > > > > > accesses to rte_eth_dev_data[].name are atomic.
> > > > > > > > > Otherwise what would prevent the situation when one
> process
> > > > > > > > > does
> > > > > > > > > rte_eth_dev_allocate()->snprintf(rte_eth_dev_data[x].name,
> > > > > > > > > ...) while second one does
> > > > > > > rte_eth_dev_allocated(rte_eth_dev_data[x].name, ...) ?
> > > > > > > > >
> > > > > > > > The second will get True or False and that is it.
> > > > > > >
> > > > > > > Under race condition - in the worst case it might crash, though
> > > > > > > for that you'll have to be really unlucky.
> > > > > > > Though in most cases as you said it would just not operate
> correctly.
> > > > > > > I think if we start to protect dev->name by lock we need to do
> > > > > > > it for all instances (both read and write).
> > > > > > >
> > > > > > Since under the ownership rules, the user must take ownership of a
> > > > > > port
> > > > > before using it, I still don't see a problem here.
> > > > >
> > > > > I am not talking about owner id or name here.
> > > > > I am talking about dev->name.
> > > > >
> > > > So? The user still should take ownership of a device before using it (by
> > > name or by port id).
> > > > It can just read it without owning it, but no managing it.
> > > >
> > > > > > Please, Can you describe specific crash scenario and explain how
> > > > > > could the
> > > > > locking fix it?
> > > > >
> > > > > Let say thread 0 doing rte_eth_dev_allocate()-
> > > > > >snprintf(rte_eth_dev_data[x].name, ...), thread 1 doing
> > > > > rte_pmd_ring_remove()->rte_eth_dev_allocated()->strcmp().
> > > > > And because of race condition - rte_eth_dev_allocated() will return
> > > > > rte_eth_dev * for the wrong device.
> > > > Which wrong device do you mean? I guess it is the device which
> currently is
> > > being created by thread 0.
> > > > > Then rte_pmd_ring_remove() will call rte_free() for related
> > > > > resources, while It can still be in use by someone else.
> > > > The rte_pmd_ring_remove caller(some DPDK entity) must take
> ownership
> > > > (or validate that he is the owner) of a port before doing it(free,
> release), so
> > > no issue here.
> > >
> > > Forget about ownership for a second.
> > > Suppose we have a process it created ring port for itself (without setting
> any
> > > ownership)  and used it for some time.
> > > Then it decided to remove it, so it calls rte_pmd_ring_remove() for it.
> > > At the same time second process decides to call rte_eth_dev_allocate()
> (let
> > > say for anither ring port).
> > > They could collide trying to read (process 0) and modify (process 1) same
> > > string rte_eth_dev_data[].name.
> > >
> > Do you mean that process 0 will compare successfully the process 1 new
> port name?
> 
> Yes.
> 
> > The state are in local process memory - so process 0 will not compare the
> process 1 port, from its point of view this port is in UNUSED
> > state.
> >
> 
> Ok, and why it can't be in attached state in process 0 too?

Someone in process 0 should attach it using protected attach_secondary somewhere in your scenario.


> Konstantin
> 
> > > Konstantin
> > >
> > > >
> > > >
> > > > Also I'm not sure I fully understand your scenario looks like moving
> > > > the device state setting in allocation to be after the name setting will be
> > > good.
> > > > What do you think?
> > > >
> > > > > Konstantin
> > > > >
> > > > > >
> > > > > > > > Maybe if it had been called just a moment after, It might get
> > > > > > > > different
> > > > > > > answer.
> > > > > > > > Because these APIs don't change ethdev structure(just read),
> > > > > > > > it can be
> > > > > OK.
> > > > > > > > But again, I can understand to use ownership lock also here.
> > > > > > > >
> > > > > > >
> > > > > > > Konstantin

Ananyev, Konstantin Jan. 17, 2018, 12:54 p.m. UTC | #15

> 
> 
> Hi Konstantin
> From: Ananyev, Konstantin, Sent: Wednesday, January 17, 2018 1:24 PM
> > Hi Matan,
> >
> > > Hi Konstantin
> > >
> > > From: Ananyev, Konstantin, Tuesday, January 16, 2018 9:11 PM
> > > > Hi Matan,
> > > >
> > > > >
> > > > > Hi Konstantin
> > > > > From: Ananyev, Konstantin, Monday, January 15, 2018 8:44 PM
> > > > > > Hi Matan,
> > > > > > > Hi Konstantin
> > > > > > > From: Ananyev, Konstantin, Monday, January 15, 2018 1:45 PM
> > > > > > > > Hi Matan,
> > > > > > > > > Hi Konstantin
> > > > > > > > > From: Ananyev, Konstantin, Friday, January 12, 2018 2:02
> > > > > > > > > AM
> > > > > > > > > > Hi Matan,
> > > > > > > > > > > Hi Konstantin
> > > > > > > > > > > From: Ananyev, Konstantin, Thursday, January 11, 2018
> > > > > > > > > > > 2:40 PM
> > > > > > > > > > > > Hi Matan,
> > > > > > > > > > > > > Hi Konstantin
> > > > > > > > > > > > > From: Ananyev, Konstantin, Wednesday, January 10,
> > > > > > > > > > > > > 2018
> > > > > > > > > > > > > 3:36 PM
> > > > > > > > > > > > > > Hi Matan,
> > > > >  <snip>
> > > > > > > > > > > > > > It is good to see that now scanning/updating
> > > > > > > > > > > > > > rte_eth_dev_data[] is lock protected, but it
> > > > > > > > > > > > > > might be not very plausible to protect both
> > > > > > > > > > > > > > data[] and next_owner_id using the
> > > > > > > > same lock.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I guess you mean to the owner structure in
> > > > > > > > rte_eth_dev_data[port_id].
> > > > > > > > > > > > > The next_owner_id is read by ownership APIs(for
> > > > > > > > > > > > > owner validation), so it
> > > > > > > > > > > > makes sense to use the same lock.
> > > > > > > > > > > > > Actually, why not?
> > > > > > > > > > > >
> > > > > > > > > > > > Well to me next_owner_id and rte_eth_dev_data[] are
> > > > > > > > > > > > not directly
> > > > > > > > > > related.
> > > > > > > > > > > > You may create new owner_id but it doesn't mean you
> > > > > > > > > > > > would update rte_eth_dev_data[] immediately.
> > > > > > > > > > > > And vice versa - you might just want to update
> > > > > > > > > > > > rte_eth_dev_data[].name or .owner_id.
> > > > > > > > > > > > It is not very good coding practice to use same lock
> > > > > > > > > > > > for non-related data structures.
> > > > > > > > > > > >
> > > > > > > > > > > I see the relation like next:
> > > > > > > > > > > Since the ownership mechanism synchronization is in
> > > > > > > > > > > ethdev responsibility, we must protect against user
> > > > > > > > > > > mistakes as much as we can by
> > > > > > > > > > using the same lock.
> > > > > > > > > > > So, if user try to set by invalid owner (exactly the
> > > > > > > > > > > ID which currently is
> > > > > > > > > > allocated) we can protect on it.
> > > > > > > > > >
> > > > > > > > > > Hmm, not sure why you can't do same checking with
> > > > > > > > > > different lock or atomic variable?
> > > > > > > > > >
> > > > > > > > > The set ownership API is protected by ownership lock and
> > > > > > > > > checks the owner ID validity By reading the next owner ID.
> > > > > > > > > So, the owner ID allocation and set API should use the
> > > > > > > > > same atomic
> > > > > > > > mechanism.
> > > > > > > >
> > > > > > > > Sure but all you are doing for checking validity, is  check
> > > > > > > > that owner_id > 0 && owner_id < next_owner_id, right?
> > > > > > > > As you don't allow owner_id overlap (16/3248 bits) you can
> > > > > > > > safely do same check with just atomic_get(&next_owner_id).
> > > > > > > >
> > > > > > > It will not protect it, scenario:
> > > > > > > - current next_id is X.
> > > > > > > - call set ownership of port A with owner id X by thread 0(by
> > > > > > > user
> > > > mistake).
> > > > > > > - context switch
> > > > > > > - allocate new id by thread 1 and get X and change next_id to
> > > > > > > X+1
> > > > > > atomically.
> > > > > > > -  context switch
> > > > > > > - Thread 0 validate X by atomic_read and succeed to take
> > ownership.
> > > > > > > - The system lost the port (or it will be managed by two
> > > > > > > entities) -
> > > > crash.
> > > > > >
> > > > > >
> > > > > > Ok, and how using lock will protect you with such scenario?
> > > > >
> > > > > The owner set API validation by thread 0 should fail because the
> > > > > owner
> > > > validation is included in the protected section.
> > > >
> > > > Then your validation function would fail even if you'll use atomic
> > > > ops instead of lock.
> > > No.
> > > With atomic this specific scenario will cause the validation to pass.
> >
> > Can you explain to me how?
> >
> > int rte_eth_is_valid_owner_id(uint16_t owner_id) {
> > 	int32_t cur_owner_id = RTE_MIN(rte_atomic32_get(&next_owner_id),
> > 				       UINT16_MAX);
> >
> > 	if (owner_id == RTE_ETH_DEV_NO_OWNER || owner_id > cur_owner_id) {
> > 		RTE_LOG(ERR, EAL, "Invalid owner_id=%d.\n", owner_id);
> > 		return 0;
> > 	}
> > 	return 1;
> > }
> >
> > Let's say your next_owner_id==X, and you invoke
> > rte_eth_is_valid_owner_id(owner_id=X+1)  - it would fail.
> 
> Explanation:
> The scenario with locks:
> next_owner_id = X.
> Thread 0 call to set API(with invalid owner Y=X) and take lock.

Ok, I see what you mean.
But, as I said before, if thread 0 grabs the lock first - you'll experience the same failure.
I understand now that for some reason you treat these two scenarios as something different,
but for me it is pretty much the same case.
And to me it means that neither a lock nor an atomic can fully protect you here.
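
To make the ordering argument concrete, here is a minimal single-threaded sketch in plain C11. It is not the ethdev code: owner_new(), owner_id_is_valid() and guess_accepted() are hypothetical names, and I assume next_owner_id holds the next ID to hand out, so an ID counts as valid when 0 < id < next_owner_id. It replays the two possible orderings of "allocate a new ID" and "validate a guessed ID" one step at a time.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the ethdev counter: the next ID to hand out. */
static atomic_uint next_owner_id = 1;

/* Hand out a fresh owner ID; fetch_add returns the value before the add. */
static unsigned int owner_new(void)
{
	return atomic_fetch_add(&next_owner_id, 1);
}

/* An ID is treated as valid if it was already handed out: 0 < id < next. */
static int owner_id_is_valid(unsigned int id)
{
	return id != 0 && id < atomic_load(&next_owner_id);
}

/*
 * Replay one ordering of "thread 1 allocates" and "thread 0 validates a
 * guessed ID equal to the current next_owner_id".
 * Returns 1 if the guessed ID was accepted.
 */
static int guess_accepted(int allocate_first)
{
	unsigned int guessed;
	int accepted;

	atomic_store(&next_owner_id, 5);
	guessed = 5;                     /* not handed out to anyone yet */

	if (allocate_first)
		(void)owner_new();       /* the allocation wins the race */
	accepted = owner_id_is_valid(guessed);
	if (!allocate_first)
		(void)owner_new();       /* the allocation runs afterwards */

	assert(atomic_load(&next_owner_id) == 6);
	return accepted;
}
```

In this sketch guess_accepted(1) returns 1 and guess_accepted(0) returns 0. The result depends only on which operation is serialized first, and a lock serializes the two operations just as the atomic does, so it cannot pick the "good" ordering for you.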

> Context switch.
> Thread 1 call to owner_new and stuck in the lock.
> Context switch.
> Thread 0 does owner id validation and failed(Y>=X) - unlock the lock and return failure to the user.
> Context switch.
> Thread 1 take the lock and update X to X+1, then, unlock the lock.
> Everything is OK!
> 
> The same scenario with atomics:
> next_owner_id = X.
> Thread 0 call to set API(with invalid owner Y=X) and take lock.
> Context switch.
> Thread 1 call to owner_new and change X to X+1(atomically).
> Context switch.
> Thread 0 does owner id validation and success(Y<(atomic)X+1) - unlock the lock and return success to the  user.
> Problem!
> 
> > > With lock no next_id changes can be done while the thread is in the set
> > API.
> > >
> > > > But in fact your code is not protected for that scenario - doesn't
> > > > matter will you'll use lock or atomic ops.
> > > > Let's consider your current code with the following scenario:
> > > >
> > > > next_owner_id  == 1
> > > > 1) Process 0:
> > > >      rte_eth_dev_owner_new(&owner_id);
> > > >      now owner_id == 1 and next_owner_id == 2
> > > > 2) Process 1 (by mistake):
> > > >     rte_eth_dev_owner_set(port_id=1, owner->id=1); It will complete
> > > > successfully, as owner_id ==1 is considered as valid.
> > > > 3) Process 0:
> > > >       rte_eth_dev_owner_set(port_id=1, owner->id=1); It will also
> > > > complete with success, as owner->id is valid is equal to current port
> > owner_id.
> > > > So you finished with 2 processes assuming that they do own
> > > > exclusively the same port.
> > > >
> > > > Honestly in that situation  locking around next_owner_id wouldn't
> > > > give you any advantages over atomic ops.
> > > >
> > >
> > > This is a different scenario that we can't protect on it with atomic or locks.
> > > But for the first scenario I described I think we can.
> > > Please read it again, I described it step by step.
> > >
> > > > >
> > > > > > I don't think you can protect yourself against such scenario
> > > > > > with or without locking.
> > > > > > Unless you'll make it harder for the mis-behaving thread to
> > > > > > guess valid owner_id, or add some extra logic here.
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > > The set(and others) ownership APIs already uses the
> > > > > > > > > ownership lock so I
> > > > > > > > think it makes sense to use the same lock also in ID allocation.
> > > > > > > > >
> > > > > > > > > > > > > > In fact, for next_owner_id, you don't need a
> > > > > > > > > > > > > > lock - just rte_atomic_t should be enough.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I don't think so, it is problematic in
> > > > > > > > > > > > > next_owner_id wraparound and may
> > > > > > > > > > > > complicate the code in other places which read it.
> > > > > > > > > > > >
> > > > > > > > > > > > IMO it is not that complicated, something like that
> > > > > > > > > > > > should work I
> > > > > > think.
> > > > > > > > > > > >
> > > > > > > > > > > > /* init to 0 at startup */ rte_atomic32_t owner_id;
> > > > > > > > > > > >
> > > > > > > > > > > > int new_owner_id(void) {
> > > > > > > > > > > >     int32_t x;
> > > > > > > > > > > >     x = rte_atomic32_add_return(&owner_id, 1);
> > > > > > > > > > > >     if (x > UINT16_MAX) {
> > > > > > > > > > > >        rte_atomic32_dec(&owner_id);
> > > > > > > > > > > >        return -EOVERFLOW;
> > > > > > > > > > > >     } else
> > > > > > > > > > > >         return x;
> > > > > > > > > > > > }
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > Why not just to keep it simple and using the same lock?
> > > > > > > > > > > >
> > > > > > > > > > > > Lock is also fine, I just think it better be a separate
> > > > > > > > > > > > one
> > > > > > > > > > > > - that would protect just next_owner_id.
> > > > > > > > > > > > Though if you are going to use uuid here - all that
> > > > > > > > > > > > probably not relevant any more.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > I agree about the uuid but still think the same lock
> > > > > > > > > > > should be used for
> > > > > > > > both.
> > > > > > > > > >
> > > > > > > > > > But with uuid you don't need next_owner_id at all, right?
> > > > > > > > > > So lock will only be used for rte_eth_dev_data[] fields
> > anyway.
> > > > > > > > > >
> > > > > > > > > Sorry, I meant uint64_t, not uuid.
> > > > > > > >
> > > > > > > > Ah ok, my thought uuid_t is better as with it you don't need to
> > > > > > > > support your own code to allocate new owner_id, but rely on
> > > > > > > > system libs
> > > > > > instead.
> > > > > > > > But wouldn't insist here.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > > > > > Another alternative would be to use 2 locks - one
> > > > > > > > > > > > > > for next_owner_id second for actual data[] protection.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Another thing - you'll probably need to grab/release
> > > > > > > > > > > > > > a lock inside
> > > > > > > > > > > > > > rte_eth_dev_allocated() too.
> > > > > > > > > > > > > > It is a public function used by drivers, so need to
> > > > > > > > > > > > > > be protected
> > > > > > too.
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Yes, I thought about it, but decided not to use lock in
> > next:
> > > > > > > > > > > > > rte_eth_dev_allocated
> > > > > > > > > > > > > rte_eth_dev_count
> > > > > > > > > > > > > rte_eth_dev_get_name_by_port
> > > > > > rte_eth_dev_get_port_by_name
> > > > > > > > > > > > > maybe more...
> > > > > > > > > > > >
> > > > > > > > > > > > As I can see in patch #3 you protect by lock access to
> > > > > > > > > > > > rte_eth_dev_data[].name (which seems like a good
> > thing).
> > > > > > > > > > > > So I think any other public function that access
> > > > > > > > > > > > rte_eth_dev_data[].name should be protected by the
> > same
> > > > lock.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > I don't think so, I can understand to use the ownership
> > > > > > > > > > > lock here(as in port
> > > > > > > > > > creation) but I don't think it is necessary too.
> > > > > > > > > > > What are we exactly protecting here?
> > > > > > > > > > > Don't you think it is just timing?(ask in the next moment
> > > > > > > > > > > and you may get another answer) I don't see optional crash.
> > > > > > > > > >
> > > > > > > > > > Not sure what you mean here by timing...
> > > > > > > > > > As I understand rte_eth_dev_data[].name unique identifies
> > > > > > > > > > device and is used by  port allocation/release/find functions.
> > > > > > > > > > As you stated above:
> > > > > > > > > > "1. The port allocation and port release synchronization
> > > > > > > > > > will be managed by ethdev."
> > > > > > > > > > To me it means that ethdev layer has to make sure that all
> > > > > > > > > > accesses to rte_eth_dev_data[].name are atomic.
> > > > > > > > > > Otherwise what would prevent the situation when one
> > process
> > > > > > > > > > does
> > > > > > > > > > rte_eth_dev_allocate()->snprintf(rte_eth_dev_data[x].name,
> > > > > > > > > > ...) while second one does
> > > > > > > > rte_eth_dev_allocated(rte_eth_dev_data[x].name, ...) ?
> > > > > > > > > >
> > > > > > > > > The second will get True or False and that is it.
> > > > > > > >
> > > > > > > > Under race condition - in the worst case it might crash, though
> > > > > > > > for that you'll have to be really unlucky.
> > > > > > > > Though in most cases as you said it would just not operate
> > correctly.
> > > > > > > > I think if we start to protect dev->name by lock we need to do
> > > > > > > > it for all instances (both read and write).
> > > > > > > >
> > > > > > > Since under the ownership rules, the user must take ownership of a
> > > > > > > port
> > > > > > before using it, I still don't see a problem here.
> > > > > >
> > > > > > I am not talking about owner id or name here.
> > > > > > I am talking about dev->name.
> > > > > >
> > > > > So? The user still should take ownership of a device before using it (by
> > > > name or by port id).
> > > > > It can just read it without owning it, but no managing it.
> > > > >
> > > > > > > Please, Can you describe specific crash scenario and explain how
> > > > > > > could the
> > > > > > locking fix it?
> > > > > >
> > > > > > Let say thread 0 doing rte_eth_dev_allocate()-
> > > > > > >snprintf(rte_eth_dev_data[x].name, ...), thread 1 doing
> > > > > > rte_pmd_ring_remove()->rte_eth_dev_allocated()->strcmp().
> > > > > > And because of race condition - rte_eth_dev_allocated() will return
> > > > > > rte_eth_dev * for the wrong device.
> > > > > Which wrong device do you mean? I guess it is the device which
> > currently is
> > > > being created by thread 0.
> > > > > > Then rte_pmd_ring_remove() will call rte_free() for related
> > > > > > resources, while It can still be in use by someone else.
> > > > > The rte_pmd_ring_remove caller(some DPDK entity) must take
> > ownership
> > > > > (or validate that he is the owner) of a port before doing it(free,
> > release), so
> > > > no issue here.
> > > >
> > > > Forget about ownership for a second.
> > > > Suppose we have a process it created ring port for itself (without setting
> > any
> > > > ownership)  and used it for some time.
> > > > Then it decided to remove it, so it calls rte_pmd_ring_remove() for it.
> > > > At the same time second process decides to call rte_eth_dev_allocate()
> > (let
> > > > say for another ring port).
> > > > They could collide trying to read (process 0) and modify (process 1) same
> > > > string rte_eth_dev_data[].name.
> > > >
> > > Do you mean that process 0 will compare successfully the process 1 new
> > port name?
> >
> > Yes.
> >
> > > The state are in local process memory - so process 0 will not compare the
> > process 1 port, from its point of view this port is in UNUSED
> > > state.
> > >
> >
> > Ok, and why it can't be in attached state in process 0 too?
> 
> Someone in process 0 should attach it using protected attach_secondary somewhere in your scenario.

Yes, process 0 can have this port attached too, why not?
Konstantin
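
To illustrate what I mean by protecting every access to the name, here is a minimal sketch in plain C11. It is not ethdev code: port_allocate(), port_lookup(), port_release() and the spinlock helpers are hypothetical, and the lock here is process-local, while for the multi-process case it would have to live in shared memory. The point is only that the writer (snprintf) and the reader (strcmp) take the same lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>
#include <string.h>

#define MAX_PORTS 4
#define NAME_LEN  64

static_assert(NAME_LEN > 1, "name buffer too small");

static atomic_flag name_lock = ATOMIC_FLAG_INIT;
static char port_name[MAX_PORTS][NAME_LEN]; /* empty string == unused slot */

static void name_lock_take(void)
{
	while (atomic_flag_test_and_set_explicit(&name_lock,
						 memory_order_acquire))
		; /* spin until the flag is released */
}

static void name_lock_release(void)
{
	atomic_flag_clear_explicit(&name_lock, memory_order_release);
}

/* Writer: claim the first free slot and set its name under the lock. */
static int port_allocate(const char *name)
{
	int i, port = -1;

	if (name == NULL || name[0] == '\0' || strlen(name) >= NAME_LEN)
		return -1;
	name_lock_take();
	for (i = 0; i < MAX_PORTS; i++) {
		if (strcmp(port_name[i], name) == 0) {
			port = -1;       /* name already in use */
			break;
		}
		if (port < 0 && port_name[i][0] == '\0')
			port = i;        /* remember the first free slot */
	}
	if (port >= 0)
		snprintf(port_name[port], NAME_LEN, "%s", name);
	name_lock_release();
	return port;
}

/* Reader: compare names under the same lock, never a half-written one. */
static int port_lookup(const char *name)
{
	int i, port = -1;

	if (name == NULL || name[0] == '\0')
		return -1;
	name_lock_take();
	for (i = 0; i < MAX_PORTS; i++) {
		if (strcmp(port_name[i], name) == 0) {
			port = i;
			break;
		}
	}
	name_lock_release();
	return port;
}

/* Release: clear the name under the lock so the slot can be reused. */
static void port_release(int port)
{
	if (port < 0 || port >= MAX_PORTS)
		return;
	name_lock_take();
	port_name[port][0] = '\0';
	name_lock_release();
}
```

With this shape a lookup either sees the complete old name or the complete new name of a slot. It still says nothing about who is allowed to free the port afterwards, which is the separate ownership question.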

> 
> 
> > Konstantin
> >
> > > > Konstantin
> > > >
> > > > >
> > > > >
> > > > > Also I'm not sure I fully understand your scenario looks like moving
> > > > > the device state setting in allocation to be after the name setting will be
> > > > good.
> > > > > What do you think?
> > > > >
> > > > > > Konstantin
> > > > > >
> > > > > > >
> > > > > > > > > Maybe if it had been called just a moment after, It might get
> > > > > > > > > different
> > > > > > > > answer.
> > > > > > > > > Because these APIs don't change ethdev structure(just read),
> > > > > > > > > it can be
> > > > > > OK.
> > > > > > > > > But again, I can understand to use ownership lock also here.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Konstantin

Matan Azrad Jan. 17, 2018, 1:10 p.m. UTC | #16

Hi Konstantin
From: Ananyev, Konstantin, Wednesday, January 17, 2018 2:55 PM
> >
> >
> > Hi Konstantin
> > From: Ananyev, Konstantin, Sent: Wednesday, January 17, 2018 1:24 PM
> > > Hi Matan,
> > >
> > > > Hi Konstantin
> > > >
> > > > From: Ananyev, Konstantin, Tuesday, January 16, 2018 9:11 PM
> > > > > Hi Matan,
> > > > >
> > > > > >
> > > > > > Hi Konstantin
> > > > > > From: Ananyev, Konstantin, Monday, January 15, 2018 8:44 PM
> > > > > > > Hi Matan,
> > > > > > > > Hi Konstantin
> > > > > > > > From: Ananyev, Konstantin, Monday, January 15, 2018 1:45
> > > > > > > > PM
> > > > > > > > > Hi Matan,
> > > > > > > > > > Hi Konstantin
> > > > > > > > > > From: Ananyev, Konstantin, Friday, January 12, 2018
> > > > > > > > > > 2:02 AM
> > > > > > > > > > > Hi Matan,
> > > > > > > > > > > > Hi Konstantin
> > > > > > > > > > > > From: Ananyev, Konstantin, Thursday, January 11,
> > > > > > > > > > > > 2018
> > > > > > > > > > > > 2:40 PM
> > > > > > > > > > > > > Hi Matan,
> > > > > > > > > > > > > > Hi Konstantin
> > > > > > > > > > > > > > From: Ananyev, Konstantin, Wednesday, January
> > > > > > > > > > > > > > 10,
> > > > > > > > > > > > > > 2018
> > > > > > > > > > > > > > 3:36 PM
> > > > > > > > > > > > > > > Hi Matan,
> > > > > >  <snip>
> > > > > > > > > > > > > > > It is good to see that now scanning/updating
> > > > > > > > > > > > > > > rte_eth_dev_data[] is lock protected, but it
> > > > > > > > > > > > > > > might be not very plausible to protect both
> > > > > > > > > > > > > > > data[] and next_owner_id using the
> > > > > > > > > same lock.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I guess you mean to the owner structure in
> > > > > > > > > rte_eth_dev_data[port_id].
> > > > > > > > > > > > > > The next_owner_id is read by ownership
> > > > > > > > > > > > > > APIs(for owner validation), so it
> > > > > > > > > > > > > makes sense to use the same lock.
> > > > > > > > > > > > > > Actually, why not?
> > > > > > > > > > > > >
> > > > > > > > > > > > > Well to me next_owner_id and rte_eth_dev_data[]
> > > > > > > > > > > > > are not directly
> > > > > > > > > > > related.
> > > > > > > > > > > > > You may create new owner_id but it doesn't mean
> > > > > > > > > > > > > you would update rte_eth_dev_data[] immediately.
> > > > > > > > > > > > > And vice versa - you might just want to update
> > > > > > > > > > > > > rte_eth_dev_data[].name or .owner_id.
> > > > > > > > > > > > > It is not very good coding practice to use same
> > > > > > > > > > > > > lock for non-related data structures.
> > > > > > > > > > > > >
> > > > > > > > > > > > I see the relation like next:
> > > > > > > > > > > > Since the ownership mechanism synchronization is
> > > > > > > > > > > > in ethdev responsibility, we must protect against
> > > > > > > > > > > > user mistakes as much as we can by
> > > > > > > > > > > using the same lock.
> > > > > > > > > > > > So, if user try to set by invalid owner (exactly
> > > > > > > > > > > > the ID which currently is
> > > > > > > > > > > allocated) we can protect on it.
> > > > > > > > > > >
> > > > > > > > > > > Hmm, not sure why you can't do same checking with
> > > > > > > > > > > different lock or atomic variable?
> > > > > > > > > > >
> > > > > > > > > > The set ownership API is protected by ownership lock
> > > > > > > > > > and checks the owner ID validity By reading the next owner
> ID.
> > > > > > > > > > So, the owner ID allocation and set API should use the
> > > > > > > > > > same atomic
> > > > > > > > > mechanism.
> > > > > > > > >
> > > > > > > > > Sure but all you are doing for checking validity, is
> > > > > > > > > check that owner_id > 0 && owner_id < next_owner_id,
> right?
> > > > > > > > > As you don't allow owner_id overlap (16/3248 bits) you
> > > > > > > > > can safely do same check with just
> atomic_get(&next_owner_id).
> > > > > > > > >
> > > > > > > > It will not protect it, scenario:
> > > > > > > > - current next_id is X.
> > > > > > > > - call set ownership of port A with owner id X by thread
> > > > > > > > 0(by user
> > > > > mistake).
> > > > > > > > - context switch
> > > > > > > > - allocate new id by thread 1 and get X and change next_id
> > > > > > > > to
> > > > > > > > X+1
> > > > > > > atomically.
> > > > > > > > -  context switch
> > > > > > > > - Thread 0 validate X by atomic_read and succeed to take
> > > ownership.
> > > > > > > > - The system lost the port (or it will be managed by two
> > > > > > > > entities) -
> > > > > crash.
> > > > > > >
> > > > > > >
> > > > > > > Ok, and how using lock will protect you with such scenario?
> > > > > >
> > > > > > The owner set API validation by thread 0 should fail because
> > > > > > the owner
> > > > > validation is included in the protected section.
> > > > >
> > > > > Then your validation function would fail even if you'll use
> > > > > atomic ops instead of lock.
> > > > No.
> > > > With atomic this specific scenario will cause the validation to pass.
> > >
> > > Can you explain to me how?
> > >
> > > int rte_eth_is_valid_owner_id(uint16_t owner_id) {
> > > 	int32_t cur_owner_id = RTE_MIN(rte_atomic32_get(&next_owner_id),
> > > 				       UINT16_MAX);
> > >
> > > 	if (owner_id == RTE_ETH_DEV_NO_OWNER || owner_id > cur_owner_id) {
> > > 		RTE_LOG(ERR, EAL, "Invalid owner_id=%d.\n", owner_id);
> > > 		return 0;
> > > 	}
> > > 	return 1;
> > > }
> > >
> > > Let's say your next_owner_id==X, and you invoke
> > > rte_eth_is_valid_owner_id(owner_id=X+1)  - it would fail.
> >
> > Explanation:
> > The scenario with locks:
> > next_owner_id = X.
> > Thread 0 call to set API(with invalid owner Y=X) and take lock.
> 
> Ok I see what you mean.
> But, as I said before, if thread 0 will grab the lock first - you'll experience the
> same failure.
> I understand now that by some reason you treat these two scenarios as
> something different, but for me it is pretty much the same case.
> And to me it means that neither lock, neither atomic can fully protect you
> here.
> 

I agree that we are not fully protected even when using locks, but one lock gives more protection than either atomics or 2 different locks.
So, I think keeping it as is (with one lock) makes sense.
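
For reference, a minimal sketch of the one-lock approach in plain C11. This is not the patch code: owner_new(), owner_set(), the fixed port table and the spinlock helpers are hypothetical simplifications, and a real multi-process lock would sit in shared memory. The idea is that ID allocation, ID validation and the owner update all run under the same lock, so next_owner_id cannot change while a set is in progress.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NO_OWNER  0u
#define MAX_PORTS 4

static_assert(MAX_PORTS > 0, "need at least one port");

static atomic_flag ownership_lock = ATOMIC_FLAG_INIT;
static uint32_t next_owner_id = 1;      /* protected by ownership_lock */
static uint32_t port_owner[MAX_PORTS];  /* protected by ownership_lock */

static void ownership_lock_take(void)
{
	while (atomic_flag_test_and_set_explicit(&ownership_lock,
						 memory_order_acquire))
		; /* spin until the flag is released */
}

static void ownership_lock_release(void)
{
	atomic_flag_clear_explicit(&ownership_lock, memory_order_release);
}

/* Allocate a new owner ID under the ownership lock. */
static uint32_t owner_new(void)
{
	uint32_t id;

	ownership_lock_take();
	id = next_owner_id++;
	ownership_lock_release();
	return id;
}

/*
 * Validate the ID and take the port in one critical section.
 * Returns 0 on success, -1 on an invalid ID/port or a port owned by
 * somebody else.
 */
static int owner_set(unsigned int port, uint32_t id)
{
	int ret = -1;

	if (port >= MAX_PORTS)
		return -1;
	ownership_lock_take();
	if (id != NO_OWNER && id < next_owner_id &&
	    (port_owner[port] == NO_OWNER || port_owner[port] == id)) {
		port_owner[port] = id;
		ret = 0;
	}
	ownership_lock_release();
	return ret;
}
```

In this sketch a set with an ID that has not been allocated yet fails, and a second set with the same valid ID succeeds again. So it also reproduces the three-step scenario quoted below, where two callers using the same ID both believe they own the port; that part is not solved by the lock.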

> > Context switch.
> > Thread 1 call to owner_new and stuck in the lock.
> > Context switch.
> > Thread 0 does owner id validation and failed(Y>=X) - unlock the lock and
> return failure to the user.
> > Context switch.
> > Thread 1 take the lock and update X to X+1, then, unlock the lock.
> > Everything is OK!
> >
> > The same scenario with atomics:
> > next_owner_id = X.
> > Thread 0 call to set API(with invalid owner Y=X) and take lock.
> > Context switch.
> > Thread 1 call to owner_new and change X to X+1(atomically).
> > Context switch.
> > Thread 0 does owner id validation and success(Y<(atomic)X+1) - unlock the
> lock and return success to the  user.
> > Problem!
> >
> > > > With lock no next_id changes can be done while the thread is in
> > > > the set
> > > API.
> > > >
> > > > > But in fact your code is not protected for that scenario -
> > > > > doesn't matter will you'll use lock or atomic ops.
> > > > > Let's consider your current code with the following scenario:
> > > > >
> > > > > next_owner_id  == 1
> > > > > 1) Process 0:
> > > > >      rte_eth_dev_owner_new(&owner_id);
> > > > >      now owner_id == 1 and next_owner_id == 2
> > > > > 2) Process 1 (by mistake):
> > > > >     rte_eth_dev_owner_set(port_id=1, owner->id=1); It will
> > > > > complete successfully, as owner_id ==1 is considered as valid.
> > > > > 3) Process 0:
> > > > >       rte_eth_dev_owner_set(port_id=1, owner->id=1); It will
> > > > > also complete with success, as owner->id is valid is equal to
> > > > > current port
> > > owner_id.
> > > > > So you finished with 2 processes assuming that they do own
> > > > > exclusively the same port.
> > > > >
> > > > > Honestly in that situation  locking around next_owner_id
> > > > > wouldn't give you any advantages over atomic ops.
> > > > >
> > > >
> > > > This is a different scenario that we can't protect on it with atomic or
> locks.
> > > > But for the first scenario I described I think we can.
> > > > Please read it again, I described it step by step.
> > > >
> > > > > >
> > > > > > > I don't think you can protect yourself against such scenario
> > > > > > > with or without locking.
> > > > > > > Unless you'll make it harder for the mis-behaving thread to
> > > > > > > guess valid owner_id, or add some extra logic here.
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > > The set(and others) ownership APIs already uses the
> > > > > > > > > > ownership lock so I
> > > > > > > > > think it makes sense to use the same lock also in ID allocation.
> > > > > > > > > >
> > > > > > > > > > > > > > > In fact, for next_owner_id, you don't need a
> > > > > > > > > > > > > > > lock - just rte_atomic_t should be enough.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I don't think so, it is problematic in
> > > > > > > > > > > > > > next_owner_id wraparound and may
> > > > > > > > > > > > > complicate the code in other places which read it.
> > > > > > > > > > > > >
> > > > > > > > > > > > > IMO it is not that complicated, something like
> > > > > > > > > > > > > that should work I
> > > > > > > think.
> > > > > > > > > > > > >
> > > > > > > > > > > > > /* init to 0 at startup */ rte_atomic32_t owner_id;
> > > > > > > > > > > > >
> > > > > > > > > > > > > int new_owner_id(void) {
> > > > > > > > > > > > >     int32_t x;
> > > > > > > > > > > > >     x = rte_atomic32_add_return(&owner_id, 1);
> > > > > > > > > > > > >     if (x > UINT16_MAX) {
> > > > > > > > > > > > >        rte_atomic32_dec(&owner_id);
> > > > > > > > > > > > >        return -EOVERFLOW;
> > > > > > > > > > > > >     } else
> > > > > > > > > > > > >         return x; }
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Why not just to keep it simple and using the same
> lock?
> > > > > > > > > > > > >
> > > > > > > > > > > > > Lock is also fine, I just think it better be a separate
> > > > > > > > > > > > > one
> > > > > > > > > > > > > - that would protect just next_owner_id.
> > > > > > > > > > > > > Though if you are going to use uuid here - all that
> > > > > > > > > > > > > probably not relevant any more.
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > I agree about the uuid but still think the same lock
> > > > > > > > > > > > should be used for
> > > > > > > > > both.
> > > > > > > > > > >
> > > > > > > > > > > But with uuid you don't need next_owner_id at all, right?
> > > > > > > > > > > So lock will only be used for rte_eth_dev_data[] fields
> > > anyway.
> > > > > > > > > > >
> > > > > > > > > > Sorry, I meant uint64_t, not uuid.
> > > > > > > > >
> > > > > > > > > Ah ok, my thought uuid_t is better as with it you don't need to
> > > > > > > > > support your own code to allocate new owner_id, but rely on
> > > > > > > > > system libs
> > > > > > > instead.
> > > > > > > > > But wouldn't insist here.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > > > > > Another alternative would be to use 2 locks - one
> > > > > > > > > > > > > > > for next_owner_id second for actual data[]
> protection.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Another thing - you'll probably need to
> grab/release
> > > > > > > > > > > > > > > a lock inside
> > > > > > > > > > > > > > > rte_eth_dev_allocated() too.
> > > > > > > > > > > > > > > It is a public function used by drivers, so need to
> > > > > > > > > > > > > > > be protected
> > > > > > > too.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Yes, I thought about it, but decided not to use lock in
> > > next:
> > > > > > > > > > > > > > rte_eth_dev_allocated
> > > > > > > > > > > > > > rte_eth_dev_count
> > > > > > > > > > > > > > rte_eth_dev_get_name_by_port
> > > > > > > rte_eth_dev_get_port_by_name
> > > > > > > > > > > > > > maybe more...
> > > > > > > > > > > > >
> > > > > > > > > > > > > As I can see in patch #3 you protect by lock access to
> > > > > > > > > > > > > rte_eth_dev_data[].name (which seems like a good
> > > thing).
> > > > > > > > > > > > > So I think any other public function that access
> > > > > > > > > > > > > rte_eth_dev_data[].name should be protected by the
> > > same
> > > > > lock.
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > I don't think so, I can understand to use the ownership
> > > > > > > > > > > > lock here(as in port
> > > > > > > > > > > creation) but I don't think it is necessary too.
> > > > > > > > > > > > What are we exactly protecting here?
> > > > > > > > > > > > Don't you think it is just timing?(ask in the next moment
> > > > > > > > > > > > and you may get another answer) I don't see optional
> crash.
> > > > > > > > > > >
> > > > > > > > > > > Not sure what you mean here by timing...
> > > > > > > > > > > As I understand rte_eth_dev_data[].name unique
> identifies
> > > > > > > > > > > device and is used by  port allocation/release/find
> functions.
> > > > > > > > > > > As you stated above:
> > > > > > > > > > > "1. The port allocation and port release synchronization
> > > > > > > > > > > will be managed by ethdev."
> > > > > > > > > > > To me it means that ethdev layer has to make sure that all
> > > > > > > > > > > accesses to rte_eth_dev_data[].name are atomic.
> > > > > > > > > > > Otherwise what would prevent the situation when one
> > > process
> > > > > > > > > > > does
> > > > > > > > > > > rte_eth_dev_allocate()-
> >snprintf(rte_eth_dev_data[x].name,
> > > > > > > > > > > ...) while second one does
> > > > > > > > > rte_eth_dev_allocated(rte_eth_dev_data[x].name, ...) ?
> > > > > > > > > > >
> > > > > > > > > > The second will get True or False and that is it.
> > > > > > > > >
> > > > > > > > > Under race condition - in the worst case it might crash, though
> > > > > > > > > for that you'll have to be really unlucky.
> > > > > > > > > Though in most cases as you said it would just not operate
> > > correctly.
> > > > > > > > > I think if we start to protect dev->name by lock we need to do
> > > > > > > > > it for all instances (both read and write).
> > > > > > > > >
> > > > > > > > Since under the ownership rules, the user must take ownership
> of a
> > > > > > > > port
> > > > > > > before using it, I still don't see a problem here.
> > > > > > >
> > > > > > > I am not talking about owner id or name here.
> > > > > > > I am talking about dev->name.
> > > > > > >
> > > > > > So? The user still should take ownership of a device before using it
> (by
> > > > > name or by port id).
> > > > > > It can just read it without owning it, but no managing it.
> > > > > >
> > > > > > > > Please, Can you describe specific crash scenario and explain how
> > > > > > > > could the
> > > > > > > locking fix it?
> > > > > > >
> > > > > > > Let say thread 0 doing rte_eth_dev_allocate()-
> > > > > > > >snprintf(rte_eth_dev_data[x].name, ...), thread 1 doing
> > > > > > > rte_pmd_ring_remove()->rte_eth_dev_allocated()->strcmp().
> > > > > > > And because of race condition - rte_eth_dev_allocated() will
> return
> > > > > > > rte_eth_dev * for the wrong device.
> > > > > > Which wrong device do you mean? I guess it is the device which
> > > currently is
> > > > > being created by thread 0.
> > > > > > > Then rte_pmd_ring_remove() will call rte_free() for related
> > > > > > > resources, while It can still be in use by someone else.
> > > > > > The rte_pmd_ring_remove caller(some DPDK entity) must take
> > > ownership
> > > > > > (or validate that he is the owner) of a port before doing it(free,
> > > release), so
> > > > > no issue here.
> > > > >
> > > > > Forget about ownership for a second.
> > > > > Suppose we have a process it created ring port for itself (without
> setting
> > > any
> > > > > ownership)  and used it for some time.
> > > > > Then it decided to remove it, so it calls rte_pmd_ring_remove() for it.
> > > > > At the same time second process decides to call
> rte_eth_dev_allocate()
> > > (let
> > > > > say for anither ring port).
> > > > > They could collide trying to read (process 0) and modify (process 1)
> same
> > > > > string rte_eth_dev_data[].name.
> > > > >
> > > > Do you mean that process 0 will compare successfully the process 1
> new
> > > port name?
> > >
> > > Yes.
> > >
> > > > The state are in local process memory - so process 0 will not compare
> the
> > > process 1 port, from its point of view this port is in UNUSED
> > > > state.
> > > >
> > >
> > > Ok, and why it can't be in attached state in process 0 too?
> >
> > Someone in process 0 should attach it using protected attach_secondary
> somewhere in your scenario.
> 
> Yes, process 0 can have this port attached too, why not?
See the function with inline comments:

struct rte_eth_dev *
rte_eth_dev_allocated(const char *name)
{
	unsigned i;

	for (i = 0; i < RTE_MAX_ETHPORTS; i++) {
		/*
		 * The state checked below lives in process-local memory.
		 * If process 1 allocates a new port here (the current i), sets
		 * its local state to ATTACHED and writes the name, that state
		 * is not visible to process 0 until someone in process 0
		 * attaches the port with rte_eth_dev_attach_secondary.
		 * To call rte_eth_dev_attach_secondary, process 0 must take
		 * the lock, and it cannot, because process 1 currently holds it.
		 */
		if ((rte_eth_devices[i].state == RTE_ETH_DEV_ATTACHED) &&
		    strcmp(rte_eth_devices[i].data->name, name) == 0)
			return &rte_eth_devices[i];
	}
	return NULL;
}


> Konstantin
> 
> >
> >
> > > Konstantin
> > >
> > > > > Konstantin
> > > > >
> > > > > >
> > > > > >
> > > > > > Also I'm not sure I fully understand your scenario looks like moving
> > > > > > the device state setting in allocation to be after the name setting
> will be
> > > > > good.
> > > > > > What do you think?
> > > > > >
> > > > > > > Konstantin
> > > > > > >
> > > > > > > >
> > > > > > > > > > Maybe if it had been called just a moment after, It might get
> > > > > > > > > > different
> > > > > > > > > answer.
> > > > > > > > > > Because these APIs don't change ethdev structure(just
> read),
> > > > > > > > > > it can be
> > > > > > > OK.
> > > > > > > > > > But again, I can understand to use ownership lock also here.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Konstantin

Neil Horman Jan. 17, 2018, 2 p.m. UTC | #17

On Wed, Jan 17, 2018 at 12:05:42PM +0000, Matan Azrad wrote:
> 
> Hi Konstantin
> From: Ananyev, Konstantin, Sent: Wednesday, January 17, 2018 1:24 PM
> > Hi Matan,
> > 
> > > Hi Konstantin
> > >
> > > From: Ananyev, Konstantin, Tuesday, January 16, 2018 9:11 PM
> > > > Hi Matan,
> > > >
> > > > >
> > > > > Hi Konstantin
> > > > > From: Ananyev, Konstantin, Monday, January 15, 2018 8:44 PM
> > > > > > Hi Matan,
> > > > > > > Hi Konstantin
> > > > > > > From: Ananyev, Konstantin, Monday, January 15, 2018 1:45 PM
> > > > > > > > Hi Matan,
> > > > > > > > > Hi Konstantin
> > > > > > > > > From: Ananyev, Konstantin, Friday, January 12, 2018 2:02
> > > > > > > > > AM
> > > > > > > > > > Hi Matan,
> > > > > > > > > > > Hi Konstantin
> > > > > > > > > > > From: Ananyev, Konstantin, Thursday, January 11, 2018
> > > > > > > > > > > 2:40 PM
> > > > > > > > > > > > Hi Matan,
> > > > > > > > > > > > > Hi Konstantin
> > > > > > > > > > > > > From: Ananyev, Konstantin, Wednesday, January 10,
> > > > > > > > > > > > > 2018
> > > > > > > > > > > > > 3:36 PM
> > > > > > > > > > > > > > Hi Matan,
> > > > >  <snip>
> > > > > > > > > > > > > > It is good to see that now scanning/updating
> > > > > > > > > > > > > > rte_eth_dev_data[] is lock protected, but it
> > > > > > > > > > > > > > might be not very plausible to protect both
> > > > > > > > > > > > > > data[] and next_owner_id using the
> > > > > > > > same lock.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I guess you mean to the owner structure in
> > > > > > > > rte_eth_dev_data[port_id].
> > > > > > > > > > > > > The next_owner_id is read by ownership APIs(for
> > > > > > > > > > > > > owner validation), so it
> > > > > > > > > > > > makes sense to use the same lock.
> > > > > > > > > > > > > Actually, why not?
> > > > > > > > > > > >
> > > > > > > > > > > > Well to me next_owner_id and rte_eth_dev_data[] are
> > > > > > > > > > > > not directly
> > > > > > > > > > related.
> > > > > > > > > > > > You may create new owner_id but it doesn't mean you
> > > > > > > > > > > > would update rte_eth_dev_data[] immediately.
> > > > > > > > > > > > And visa-versa - you might just want to update
> > > > > > > > > > > > rte_eth_dev_data[].name or .owner_id.
> > > > > > > > > > > > It is not very good coding practice to use same lock
> > > > > > > > > > > > for non-related data structures.
> > > > > > > > > > > >
> > > > > > > > > > > I see the relation like next:
> > > > > > > > > > > Since the ownership mechanism synchronization is in
> > > > > > > > > > > ethdev responsibility, we must protect against user
> > > > > > > > > > > mistakes as much as we can by
> > > > > > > > > > using the same lock.
> > > > > > > > > > > So, if user try to set by invalid owner (exactly the
> > > > > > > > > > > ID which currently is
> > > > > > > > > > allocated) we can protect on it.
> > > > > > > > > >
> > > > > > > > > > Hmm, not sure why you can't do same checking with
> > > > > > > > > > different lock or atomic variable?
> > > > > > > > > >
> > > > > > > > > The set ownership API is protected by ownership lock and
> > > > > > > > > checks the owner ID validity By reading the next owner ID.
> > > > > > > > > So, the owner ID allocation and set API should use the
> > > > > > > > > same atomic
> > > > > > > > mechanism.
> > > > > > > >
> > > > > > > > Sure but all you are doing for checking validity, is  check
> > > > > > > > that owner_id > 0 &&& owner_id < next_ownwe_id, right?
> > > > > > > > As you don't allow owner_id overlap (16/3248 bits) you can
> > > > > > > > safely do same check with just atomic_get(&next_owner_id).
> > > > > > > >
> > > > > > > It will not protect it, scenario:
> > > > > > > - current next_id is X.
> > > > > > > - call set ownership of port A with owner id X by thread 0(by
> > > > > > > user
> > > > mistake).
> > > > > > > - context switch
> > > > > > > - allocate new id by thread 1 and get X and change next_id to
> > > > > > > X+1
> > > > > > atomically.
> > > > > > > -  context switch
> > > > > > > - Thread 0 validate X by atomic_read and succeed to take
> > ownership.
> > > > > > > - The system loosed the port(or will be managed by two
> > > > > > > entities) -
> > > > crash.
> > > > > >
> > > > > >
> > > > > > Ok, and how using lock will protect you with such scenario?
> > > > >
> > > > > The owner set API validation by thread 0 should fail because the
> > > > > owner
> > > > validation is included in the protected section.
> > > >
> > > > Then your validation function would fail even if you'll use atomic
> > > > ops instead of lock.
> > > No.
> > > With atomic this specific scenario will cause the validation to pass.
> > 
> > Can you explain to me how?
> > 
> > rte_eth_is_valid_owner_id(uint16_t owner_id) {
> >               int32_t cur_owner_id = RTE_MIN(rte_atomic32_get(next_owner_id),
> > UINT16_MAX);
> > 
> > 	if (owner_id == RTE_ETH_DEV_NO_OWNER || owner >
> > cur_owner_id) {
> > 		RTE_LOG(ERR, EAL, "Invalid owner_id=%d.\n", owner_id);
> > 		return 0;
> > 	}
> > 	return 1;
> > }
> > 
> > Let say your next_owne_id==X, and you invoke
> > rte_eth_is_valid_owner_id(owner_id=X+1)  - it would fail.
> 
> Explanation:
> The scenario with locks:
> next_owner_id = X.
> Thread 0 calls the set API (with invalid owner Y=X) and takes the lock.
> Context switch.
> Thread 1 calls owner_new and blocks on the lock.
> Context switch.
> Thread 0 validates the owner ID and fails (Y>=X), then unlocks and returns failure to the user.
> Context switch.
> Thread 1 takes the lock, updates X to X+1, then unlocks.
> Everything is OK!
> 
> The same scenario with atomics:
> next_owner_id = X.
> Thread 0 calls the set API (with invalid owner Y=X) and takes the lock.
> Context switch.
> Thread 1 calls owner_new and changes X to X+1 (atomically).
> Context switch.
> Thread 0 validates the owner ID and succeeds (Y<(atomic)X+1), then unlocks and returns success to the user.
> Problem!
> 


Matan is correct here. There is no way to perform parallel set operations using
just an atomic variable, because multiple reads of next_owner_id need to
be performed while it is stable.  That is to say, rte_eth_next_owner_id must be
compared to RTE_ETH_DEV_NO_OWNER and owner_id in rte_eth_is_valid_owner_id.  If
you were to use only an atomic_read on such a variable, it could be incremented
by the owner_new function between the checks, and an invalid owner value could
become valid because a third thread incremented the next value.  The state of
next_owner_id must be kept stable during any validity checks.
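
To make that concrete, here is a minimal sketch (not the code from the patch)
in which ID allocation and the set path share one lock, so next_owner_id
cannot move while an ID is being validated.  The names owner_lock, owner_new,
owner_set, port_owner, MAX_PORTS and NO_OWNER are hypothetical, a pthread
mutex stands in for whatever lock ethdev actually uses, and wraparound of the
16-bit ID is deliberately ignored.

```c
#include <pthread.h>
#include <stdint.h>

#define MAX_PORTS 4   /* hypothetical; stands in for RTE_MAX_ETHPORTS */
#define NO_OWNER  0   /* hypothetical; stands in for RTE_ETH_DEV_NO_OWNER */

/* One lock covers both the ID counter and the per-port owner records. */
static pthread_mutex_t owner_lock = PTHREAD_MUTEX_INITIALIZER;
static uint16_t next_owner_id = 1;      /* first ID handed out is 1 */
static uint16_t port_owner[MAX_PORTS];  /* zero-initialized: NO_OWNER */

/* Caller must hold owner_lock, so next_owner_id is stable during the check. */
static int is_valid_owner_id(uint16_t id)
{
	return id != NO_OWNER && id < next_owner_id;
}

/* Allocate a new owner ID (wraparound not handled in this sketch). */
int owner_new(uint16_t *id)
{
	pthread_mutex_lock(&owner_lock);
	*id = next_owner_id++;
	pthread_mutex_unlock(&owner_lock);
	return 0;
}

/* Take ownership of a port; validation and update happen under one lock. */
int owner_set(unsigned int port, uint16_t id)
{
	int ret = -1;

	pthread_mutex_lock(&owner_lock);
	if (port < MAX_PORTS && is_valid_owner_id(id) &&
	    (port_owner[port] == NO_OWNER || port_owner[port] == id)) {
		port_owner[port] = id;
		ret = 0;
	}
	pthread_mutex_unlock(&owner_lock);
	return ret;
}
```

Because owner_new cannot run between the validity check and the write in
owner_set, an ID that was invalid when the set call took the lock stays
invalid until that call returns.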

That said, I really have to wonder why ownership ids are needed here at
all.  It seems this design could be much simpler with the addition of a per-port
lock (and optional ownership record).  The API could consist of four
operations:

ownership_set
ownership_tryset
ownership_release
ownership_get


The first call simply tries to take the per-port lock (blocking if it's already
locked).

The second call is a non-blocking version of the first.

The third unlocks the port, allowing others to take ownership.

The fourth returns whatever ownership record you want to encode with the lock.

The addition of all this id checking seems a bit overcomplicated.
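
A rough sketch of what those four calls could look like, assuming a pthread
mutex per port and a string as the optional ownership record.  Everything
here (struct port_ownership, ownership_init, MAX_PORTS, the signatures) is
hypothetical and only illustrates the shape of the idea; a real multi-process
DPDK version would need a lock that lives in shared memory.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_PORTS 4 /* hypothetical; stands in for RTE_MAX_ETHPORTS */

struct port_ownership {
	pthread_mutex_t lock;        /* held for as long as the port is owned */
	_Atomic(const char *) owner; /* optional record; NULL when unowned */
};

static struct port_ownership ports[MAX_PORTS];

/* Must be called once before any other call. */
void ownership_init(void)
{
	for (size_t i = 0; i < MAX_PORTS; i++) {
		pthread_mutex_init(&ports[i].lock, NULL);
		atomic_init(&ports[i].owner, NULL);
	}
}

/* Blocks until the port is free, then records the new owner. */
int ownership_set(unsigned int port, const char *owner)
{
	if (port >= MAX_PORTS)
		return -1;
	pthread_mutex_lock(&ports[port].lock);
	atomic_store(&ports[port].owner, owner);
	return 0;
}

/* Non-blocking variant: returns -1 if the port is already owned. */
int ownership_tryset(unsigned int port, const char *owner)
{
	if (port >= MAX_PORTS ||
	    pthread_mutex_trylock(&ports[port].lock) != 0)
		return -1;
	atomic_store(&ports[port].owner, owner);
	return 0;
}

/* Must be called by the thread that currently owns the port. */
int ownership_release(unsigned int port)
{
	if (port >= MAX_PORTS)
		return -1;
	atomic_store(&ports[port].owner, NULL);
	pthread_mutex_unlock(&ports[port].lock);
	return 0;
}

/* Returns the current ownership record, or NULL if the port is unowned. */
const char *ownership_get(unsigned int port)
{
	return port < MAX_PORTS ? atomic_load(&ports[port].owner) : NULL;
}
```

With this shape there is no ID to validate: holding the per-port lock is the
ownership, and the record is only informational.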

Neil

Ananyev, Konstantin Jan. 17, 2018, 4:52 p.m. UTC | #18

Hi Matan,

> 
> Hi Konstantin
> From: Ananyev, Konstantin, Wednesday, January 17, 2018 2:55 PM
> > >
> > >
> > > Hi Konstantin
> > > From: Ananyev, Konstantin, Sent: Wednesday, January 17, 2018 1:24 PM
> > > > Hi Matan,
> > > >
> > > > > Hi Konstantin
> > > > >
> > > > > From: Ananyev, Konstantin, Tuesday, January 16, 2018 9:11 PM
> > > > > > Hi Matan,
> > > > > >
> > > > > > >
> > > > > > > Hi Konstantin
> > > > > > > From: Ananyev, Konstantin, Monday, January 15, 2018 8:44 PM
> > > > > > > > Hi Matan,
> > > > > > > > > Hi Konstantin
> > > > > > > > > From: Ananyev, Konstantin, Monday, January 15, 2018 1:45
> > > > > > > > > PM
> > > > > > > > > > Hi Matan,
> > > > > > > > > > > Hi Konstantin
> > > > > > > > > > > From: Ananyev, Konstantin, Friday, January 12, 2018
> > > > > > > > > > > 2:02 AM
> > > > > > > > > > > > Hi Matan,
> > > > > > > > > > > > > Hi Konstantin
> > > > > > > > > > > > > From: Ananyev, Konstantin, Thursday, January 11,
> > > > > > > > > > > > > 2018
> > > > > > > > > > > > > 2:40 PM
> > > > > > > > > > > > > > Hi Matan,
> > > > > > > > > > > > > > > Hi Konstantin
> > > > > > > > > > > > > > > From: Ananyev, Konstantin, Wednesday, January
> > > > > > > > > > > > > > > 10,
> > > > > > > > > > > > > > > 2018
> > > > > > > > > > > > > > > 3:36 PM
> > > > > > > > > > > > > > > > Hi Matan,
> > > > > > >  <snip>
> > > > > > > > > > > > > > > > It is good to see that now scanning/updating
> > > > > > > > > > > > > > > > rte_eth_dev_data[] is lock protected, but it
> > > > > > > > > > > > > > > > might be not very plausible to protect both
> > > > > > > > > > > > > > > > data[] and next_owner_id using the
> > > > > > > > > > same lock.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I guess you mean to the owner structure in
> > > > > > > > > > rte_eth_dev_data[port_id].
> > > > > > > > > > > > > > > The next_owner_id is read by ownership
> > > > > > > > > > > > > > > APIs(for owner validation), so it
> > > > > > > > > > > > > > makes sense to use the same lock.
> > > > > > > > > > > > > > > Actually, why not?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Well to me next_owner_id and rte_eth_dev_data[]
> > > > > > > > > > > > > > are not directly
> > > > > > > > > > > > related.
> > > > > > > > > > > > > > You may create new owner_id but it doesn't mean
> > > > > > > > > > > > > > you would update rte_eth_dev_data[] immediately.
> > > > > > > > > > > > > > And visa-versa - you might just want to update
> > > > > > > > > > > > > > rte_eth_dev_data[].name or .owner_id.
> > > > > > > > > > > > > > It is not very good coding practice to use same
> > > > > > > > > > > > > > lock for non-related data structures.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > I see the relation like next:
> > > > > > > > > > > > > Since the ownership mechanism synchronization is
> > > > > > > > > > > > > in ethdev responsibility, we must protect against
> > > > > > > > > > > > > user mistakes as much as we can by
> > > > > > > > > > > > using the same lock.
> > > > > > > > > > > > > So, if user try to set by invalid owner (exactly
> > > > > > > > > > > > > the ID which currently is
> > > > > > > > > > > > allocated) we can protect on it.
> > > > > > > > > > > >
> > > > > > > > > > > > Hmm, not sure why you can't do same checking with
> > > > > > > > > > > > different lock or atomic variable?
> > > > > > > > > > > >
> > > > > > > > > > > The set ownership API is protected by ownership lock
> > > > > > > > > > > and checks the owner ID validity By reading the next owner
> > ID.
> > > > > > > > > > > So, the owner ID allocation and set API should use the
> > > > > > > > > > > same atomic
> > > > > > > > > > mechanism.
> > > > > > > > > >
> > > > > > > > > > Sure but all you are doing for checking validity, is
> > > > > > > > > > check that owner_id > 0 &&& owner_id < next_ownwe_id,
> > right?
> > > > > > > > > > As you don't allow owner_id overlap (16/3248 bits) you
> > > > > > > > > > can safely do same check with just
> > atomic_get(&next_owner_id).
> > > > > > > > > >
> > > > > > > > > It will not protect it, scenario:
> > > > > > > > > - current next_id is X.
> > > > > > > > > - call set ownership of port A with owner id X by thread
> > > > > > > > > 0(by user
> > > > > > mistake).
> > > > > > > > > - context switch
> > > > > > > > > - allocate new id by thread 1 and get X and change next_id
> > > > > > > > > to
> > > > > > > > > X+1
> > > > > > > > atomically.
> > > > > > > > > -  context switch
> > > > > > > > > - Thread 0 validate X by atomic_read and succeed to take
> > > > ownership.
> > > > > > > > > - The system loosed the port(or will be managed by two
> > > > > > > > > entities) -
> > > > > > crash.
> > > > > > > >
> > > > > > > >
> > > > > > > > Ok, and how using lock will protect you with such scenario?
> > > > > > >
> > > > > > > The owner set API validation by thread 0 should fail because
> > > > > > > the owner
> > > > > > validation is included in the protected section.
> > > > > >
> > > > > > Then your validation function would fail even if you'll use
> > > > > > atomic ops instead of lock.
> > > > > No.
> > > > > With atomic this specific scenario will cause the validation to pass.
> > > >
> > > > Can you explain to me how?
> > > >
> > > > rte_eth_is_valid_owner_id(uint16_t owner_id) {
> > > >               int32_t cur_owner_id =
> > > > RTE_MIN(rte_atomic32_get(next_owner_id),
> > > > UINT16_MAX);
> > > >
> > > > 	if (owner_id == RTE_ETH_DEV_NO_OWNER || owner >
> > > > cur_owner_id) {
> > > > 		RTE_LOG(ERR, EAL, "Invalid owner_id=%d.\n", owner_id);
> > > > 		return 0;
> > > > 	}
> > > > 	return 1;
> > > > }
> > > >
> > > > Let say your next_owne_id==X, and you invoke
> > > > rte_eth_is_valid_owner_id(owner_id=X+1)  - it would fail.
> > >
> > > Explanation:
> > > The scenario with locks:
> > > next_owner_id = X.
> > > Thread 0 call to set API(with invalid owner Y=X) and take lock.
> >
> > OK, I see what you mean.
> > But, as I said before, if thread 0 grabs the lock first, you'll experience the
> > same failure.
> > I understand now that for some reason you treat these two scenarios as
> > something different, but for me it is pretty much the same case.
> > And to me it means that neither a lock nor an atomic can fully protect you
> > here.
> >
> 
> I agree that we are not fully protected even when using locks, but one lock gives more protection than either atomics or 2 different locks.
> So, I think keeping it as is (with one lock) makes sense.

OK, if that is your preference, let's keep your current approach here.

> 
> > > Context switch.
> > > Thread 1 call to owner_new and stuck in the lock.
> > > Context switch.
> > > Thread 0 does owner id validation and failed(Y>=X) - unlock the lock and
> > return failure to the user.
> > > Context switch.
> > > Thread 1 take the lock and update X to X+1, then, unlock the lock.
> > > Everything is OK!
> > >
> > > The same scenario with atomics:
> > > next_owner_id = X.
> > > Thread 0 call to set API(with invalid owner Y=X) and take lock.
> > > Context switch.
> > > Thread 1 call to owner_new and change X to X+1(atomically).
> > > Context switch.
> > > Thread 0 does owner id validation and success(Y<(atomic)X+1) - unlock the
> > lock and return success to the  user.
> > > Problem!
> > >
> > > > > With lock no next_id changes can be done while the thread is in
> > > > > the set
> > > > API.
> > > > >
> > > > > > But in fact your code is not protected for that scenario -
> > > > > > doesn't matter will you'll use lock or atomic ops.
> > > > > > Let's considerer your current code with the following scenario:
> > > > > >
> > > > > > next_owner_id  == 1
> > > > > > 1) Process 0:
> > > > > >      rte_eth_dev_owner_new(&owner_id);
> > > > > >      now owner_id == 1 and next_owner_id == 2
> > > > > > 2) Process 1 (by mistake):
> > > > > >     rte_eth_dev_owner_set(port_id=1, owner->id=1); It will
> > > > > > complete successfully, as owner_id ==1 is considered as valid.
> > > > > > 3) Process 0:
> > > > > >       rte_eth_dev_owner_set(port_id=1, owner->id=1); It will
> > > > > > also complete with success, as owner->id is valid is equal to
> > > > > > current port
> > > > owner_id.
> > > > > > So you finished with 2 processes assuming that they do own
> > > > > > exclusively then same port.
> > > > > >
> > > > > > Honestly in that situation  locking around nest_owner_id
> > > > > > wouldn't give you any advantages over atomic ops.
> > > > > >
> > > > >
> > > > > This is a different scenario that we can't protect on it with atomic or
> > locks.
> > > > > But for the first scenario I described I think we can.
> > > > > Please read it again, I described it step by step.
> > > > >
> > > > > > >
> > > > > > > > I don't think you can protect yourself against such scenario
> > > > > > > > with or without locking.
> > > > > > > > Unless you'll make it harder for the mis-behaving thread to
> > > > > > > > guess valid owner_id, or add some extra logic here.
> > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > > The set(and others) ownership APIs already uses the
> > > > > > > > > > > ownership lock so I
> > > > > > > > > > think it makes sense to use the same lock also in ID allocation.
> > > > > > > > > > >
> > > > > > > > > > > > > > > > In fact, for next_owner_id, you don't need a
> > > > > > > > > > > > > > > > lock - just rte_atomic_t should be enough.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I don't think so, it is problematic in
> > > > > > > > > > > > > > > next_owner_id wraparound and may
> > > > > > > > > > > > > > complicate the code in other places which read it.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > IMO it is not that complicated, something like
> > > > > > > > > > > > > > that should work I
> > > > > > > > think.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > /* init to 0 at startup*/ rte_atomic32_t
> > > > > > > > > > > > > > *owner_id;
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > int new_owner_id(void) {
> > > > > > > > > > > > > >     int32_t x;
> > > > > > > > > > > > > >     x = rte_atomic32_add_return(&owner_id, 1);
> > > > > > > > > > > > > >     if (x > UINT16_MAX) {
> > > > > > > > > > > > > >        rte_atomic32_dec(&owner_id);
> > > > > > > > > > > > > >        return -EOVERWLOW;
> > > > > > > > > > > > > >     } else
> > > > > > > > > > > > > >         return x; }
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Why not just to keep it simple and using the same
> > lock?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Lock is also fine, I just think it better be a separate
> > > > > > > > > > > > > > one
> > > > > > > > > > > > > > - that would protext just next_owner_id.
> > > > > > > > > > > > > > Though if you are going to use uuid here - all that
> > > > > > > > > > > > > > probably not relevant any more.
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > I agree about the uuid but still think the same lock
> > > > > > > > > > > > > should be used for
> > > > > > > > > > both.
> > > > > > > > > > > >
> > > > > > > > > > > > But with uuid you don't need next_owner_id at all, right?
> > > > > > > > > > > > So lock will only be used for rte_eth_dev_data[] fields
> > > > anyway.
> > > > > > > > > > > >
> > > > > > > > > > > Sorry, I meant uint64_t, not uuid.
> > > > > > > > > >
> > > > > > > > > > Ah ok, my thought uuid_t is better as with it you don't need to
> > > > > > > > > > support your own code to allocate new owner_id, but rely on
> > > > > > > > > > system libs
> > > > > > > > instead.
> > > > > > > > > > But wouldn't insist here.
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > > > > > Another alternative would be to use 2 locks - one
> > > > > > > > > > > > > > > > for next_owner_id second for actual data[]
> > protection.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Another thing - you'll probably need to
> > grab/release
> > > > > > > > > > > > > > > > a lock inside
> > > > > > > > > > > > > > > > rte_eth_dev_allocated() too.
> > > > > > > > > > > > > > > > It is a public function used by drivers, so need to
> > > > > > > > > > > > > > > > be protected
> > > > > > > > too.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Yes, I thought about it, but decided not to use lock in
> > > > next:
> > > > > > > > > > > > > > > rte_eth_dev_allocated
> > > > > > > > > > > > > > > rte_eth_dev_count
> > > > > > > > > > > > > > > rte_eth_dev_get_name_by_port
> > > > > > > > rte_eth_dev_get_port_by_name
> > > > > > > > > > > > > > > maybe more...
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > As I can see in patch #3 you protect by lock access to
> > > > > > > > > > > > > > rte_eth_dev_data[].name (which seems like a good
> > > > thing).
> > > > > > > > > > > > > > So I think any other public function that access
> > > > > > > > > > > > > > rte_eth_dev_data[].name should be protected by the
> > > > same
> > > > > > lock.
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > I don't think so, I can understand to use the ownership
> > > > > > > > > > > > > lock here(as in port
> > > > > > > > > > > > creation) but I don't think it is necessary too.
> > > > > > > > > > > > > What are we exactly protecting here?
> > > > > > > > > > > > > Don't you think it is just timing?(ask in the next moment
> > > > > > > > > > > > > and you may get another answer) I don't see optional
> > crash.
> > > > > > > > > > > >
> > > > > > > > > > > > Not sure what you mean here by timing...
> > > > > > > > > > > > As I understand rte_eth_dev_data[].name unique
> > identifies
> > > > > > > > > > > > device and is used by  port allocation/release/find
> > functions.
> > > > > > > > > > > > As you stated above:
> > > > > > > > > > > > "1. The port allocation and port release synchronization
> > > > > > > > > > > > will be managed by ethdev."
> > > > > > > > > > > > To me it means that ethdev layer has to make sure that all
> > > > > > > > > > > > accesses to rte_eth_dev_data[].name are atomic.
> > > > > > > > > > > > Otherwise what would prevent the situation when one
> > > > process
> > > > > > > > > > > > does
> > > > > > > > > > > > rte_eth_dev_allocate()-
> > >snprintf(rte_eth_dev_data[x].name,
> > > > > > > > > > > > ...) while second one does
> > > > > > > > > > rte_eth_dev_allocated(rte_eth_dev_data[x].name, ...) ?
> > > > > > > > > > > >
> > > > > > > > > > > The second will get True or False and that is it.
> > > > > > > > > >
> > > > > > > > > > Under race condition - in the worst case it might crash, though
> > > > > > > > > > for that you'll have to be really unlucky.
> > > > > > > > > > Though in most cases as you said it would just not operate
> > > > correctly.
> > > > > > > > > > I think if we start to protect dev->name by lock we need to do
> > > > > > > > > > it for all instances (both read and write).
> > > > > > > > > >
> > > > > > > > > Since under the ownership rules, the user must take ownership
> > of a
> > > > > > > > > port
> > > > > > > > before using it, I still don't see a problem here.
> > > > > > > >
> > > > > > > > I am not talking about owner id or name here.
> > > > > > > > I am talking about dev->name.
> > > > > > > >
> > > > > > > So? The user still should take ownership of a device before using it
> > (by
> > > > > > name or by port id).
> > > > > > > It can just read it without owning it, but no managing it.
> > > > > > >
> > > > > > > > > Please, Can you describe specific crash scenario and explain how
> > > > > > > > > could the
> > > > > > > > locking fix it?
> > > > > > > >
> > > > > > > > Let say thread 0 doing rte_eth_dev_allocate()-
> > > > > > > > >snprintf(rte_eth_dev_data[x].name, ...), thread 1 doing
> > > > > > > > rte_pmd_ring_remove()->rte_eth_dev_allocated()->strcmp().
> > > > > > > > And because of race condition - rte_eth_dev_allocated() will
> > return
> > > > > > > > rte_eth_dev * for the wrong device.
> > > > > > > Which wrong device do you mean? I guess it is the device which
> > > > currently is
> > > > > > being created by thread 0.
> > > > > > > > Then rte_pmd_ring_remove() will call rte_free() for related
> > > > > > > > resources, while It can still be in use by someone else.
> > > > > > > The rte_pmd_ring_remove caller(some DPDK entity) must take
> > > > ownership
> > > > > > > (or validate that he is the owner) of a port before doing it(free,
> > > > release), so
> > > > > > no issue here.
> > > > > >
> > > > > > Forget about ownership for a second.
> > > > > > Suppose we have a process it created ring port for itself (without
> > setting
> > > > any
> > > > > > ownership)  and used it for some time.
> > > > > > Then it decided to remove it, so it calls rte_pmd_ring_remove() for it.
> > > > > > At the same time second process decides to call
> > rte_eth_dev_allocate()
> > > > (let
> > > > > > say for anither ring port).
> > > > > > They could collide trying to read (process 0) and modify (process 1)
> > same
> > > > > > string rte_eth_dev_data[].name.
> > > > > >
> > > > > Do you mean that process 0 will compare successfully the process 1
> > new
> > > > port name?
> > > >
> > > > Yes.
> > > >
> > > > > The state are in local process memory - so process 0 will not compare
> > the
> > > > process 1 port, from its point of view this port is in UNUSED
> > > > > state.
> > > > >
> > > >
> > > > Ok, and why it can't be in attached state in process 0 too?
> > >
> > > Someone in process 0 should attach it using protected attach_secondary
> > somewhere in your scenario.
> >
> > Yes, process 0 can have this port attached too, why not?
> See the function with inline comments:
> 
> struct rte_eth_dev *
> rte_eth_dev_allocated(const char *name)
> {
> 	unsigned i;
> 
> 	for (i = 0; i < RTE_MAX_ETHPORTS; i++) {
> 
> 	    	The below state are in local process memory,
> 		So, if here process 1 will allocate a new port (the current i), update its local state to ATTACHED and write the name,
> 		the state is not visible by process 0 until someone in process 0 will attach it by rte_eth_dev_attach_secondary.
> 		So, to use rte_eth_dev_attach_secondary process 0 must take the lock and it can't, because it is currently locked by
> process 1.

OK, I see.
Thanks for your patience.
BTW, does that mean that if, say, process 0 calls rte_eth_dev_allocate("xxx")
and process 1 calls rte_eth_dev_allocate("yyy"), we can end up with the
same port_id being used for different devices, and the 2 processes overwriting
the same rte_eth_dev_data[port_id]?
Konstantin

> 
> 		if ((rte_eth_devices[i].state == RTE_ETH_DEV_ATTACHED) &&
> 		strcmp(rte_eth_devices[i].data->name, name) == 0)
> 			return &rte_eth_devices[i];
> 	}
> 	return NULL;
> 
>

Ananyev, Konstantin Jan. 17, 2018, 5:01 p.m. UTC | #19

> -----Original Message-----
> From: Neil Horman [mailto:nhorman@tuxdriver.com]
> Sent: Wednesday, January 17, 2018 2:00 PM
> To: Matan Azrad <matan@mellanox.com>
> Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Thomas Monjalon <thomas@monjalon.net>; Gaetan Rivet
> <gaetan.rivet@6wind.com>; Wu, Jingjing <jingjing.wu@intel.com>; dev@dpdk.org; Richardson, Bruce <bruce.richardson@intel.com>
> Subject: Re: [PATCH v2 2/6] ethdev: add port ownership
> 
> On Wed, Jan 17, 2018 at 12:05:42PM +0000, Matan Azrad wrote:
> >
> > Hi Konstantin
> > From: Ananyev, Konstantin, Sent: Wednesday, January 17, 2018 1:24 PM
> > > Hi Matan,
> > >
> > > > Hi Konstantin
> > > >
> > > > From: Ananyev, Konstantin, Tuesday, January 16, 2018 9:11 PM
> > > > > Hi Matan,
> > > > >
> > > > > >
> > > > > > Hi Konstantin
> > > > > > From: Ananyev, Konstantin, Monday, January 15, 2018 8:44 PM
> > > > > > > Hi Matan,
> > > > > > > > Hi Konstantin
> > > > > > > > From: Ananyev, Konstantin, Monday, January 15, 2018 1:45 PM
> > > > > > > > > Hi Matan,
> > > > > > > > > > Hi Konstantin
> > > > > > > > > > From: Ananyev, Konstantin, Friday, January 12, 2018 2:02
> > > > > > > > > > AM
> > > > > > > > > > > Hi Matan,
> > > > > > > > > > > > Hi Konstantin
> > > > > > > > > > > > From: Ananyev, Konstantin, Thursday, January 11, 2018
> > > > > > > > > > > > 2:40 PM
> > > > > > > > > > > > > Hi Matan,
> > > > > > > > > > > > > > Hi Konstantin
> > > > > > > > > > > > > > From: Ananyev, Konstantin, Wednesday, January 10,
> > > > > > > > > > > > > > 2018
> > > > > > > > > > > > > > 3:36 PM
> > > > > > > > > > > > > > > Hi Matan,
> > > > > >  <snip>
> > > > > > > > > > > > > > > It is good to see that now scanning/updating
> > > > > > > > > > > > > > > rte_eth_dev_data[] is lock protected, but it
> > > > > > > > > > > > > > > might be not very plausible to protect both
> > > > > > > > > > > > > > > data[] and next_owner_id using the
> > > > > > > > > same lock.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I guess you mean to the owner structure in
> > > > > > > > > rte_eth_dev_data[port_id].
> > > > > > > > > > > > > > The next_owner_id is read by ownership APIs(for
> > > > > > > > > > > > > > owner validation), so it
> > > > > > > > > > > > > makes sense to use the same lock.
> > > > > > > > > > > > > > Actually, why not?
> > > > > > > > > > > > >
> > > > > > > > > > > > > Well to me next_owner_id and rte_eth_dev_data[] are
> > > > > > > > > > > > > not directly
> > > > > > > > > > > related.
> > > > > > > > > > > > > You may create new owner_id but it doesn't mean you
> > > > > > > > > > > > > would update rte_eth_dev_data[] immediately.
> > > > > > > > > > > > > And visa-versa - you might just want to update
> > > > > > > > > > > > > rte_eth_dev_data[].name or .owner_id.
> > > > > > > > > > > > > It is not very good coding practice to use same lock
> > > > > > > > > > > > > for non-related data structures.
> > > > > > > > > > > > >
> > > > > > > > > > > > I see the relation like next:
> > > > > > > > > > > > Since the ownership mechanism synchronization is in
> > > > > > > > > > > > ethdev responsibility, we must protect against user
> > > > > > > > > > > > mistakes as much as we can by
> > > > > > > > > > > using the same lock.
> > > > > > > > > > > > So, if user try to set by invalid owner (exactly the
> > > > > > > > > > > > ID which currently is
> > > > > > > > > > > allocated) we can protect on it.
> > > > > > > > > > >
> > > > > > > > > > > Hmm, not sure why you can't do same checking with
> > > > > > > > > > > different lock or atomic variable?
> > > > > > > > > > >
> > > > > > > > > > The set ownership API is protected by ownership lock and
> > > > > > > > > > checks the owner ID validity By reading the next owner ID.
> > > > > > > > > > So, the owner ID allocation and set API should use the
> > > > > > > > > > same atomic
> > > > > > > > > mechanism.
> > > > > > > > >
> > > > > > > > > Sure but all you are doing for checking validity, is  check
> > > > > > > > > that owner_id > 0 &&& owner_id < next_ownwe_id, right?
> > > > > > > > > As you don't allow owner_id overlap (16/3248 bits) you can
> > > > > > > > > safely do same check with just atomic_get(&next_owner_id).
> > > > > > > > >
> > > > > > > > It will not protect it, scenario:
> > > > > > > > - current next_id is X.
> > > > > > > > - call set ownership of port A with owner id X by thread 0(by
> > > > > > > > user
> > > > > mistake).
> > > > > > > > - context switch
> > > > > > > > - allocate new id by thread 1 and get X and change next_id to
> > > > > > > > X+1
> > > > > > > atomically.
> > > > > > > > -  context switch
> > > > > > > > - Thread 0 validate X by atomic_read and succeed to take
> > > ownership.
> > > > > > > > - The system loosed the port(or will be managed by two
> > > > > > > > entities) -
> > > > > crash.
> > > > > > >
> > > > > > >
> > > > > > > Ok, and how using lock will protect you with such scenario?
> > > > > >
> > > > > > The owner set API validation by thread 0 should fail because the
> > > > > > owner
> > > > > validation is included in the protected section.
> > > > >
> > > > > Then your validation function would fail even if you'll use atomic
> > > > > ops instead of lock.
> > > > No.
> > > > With atomic this specific scenario will cause the validation to pass.
> > >
> > > Can you explain to me how?
> > >
> > > static int
> > > rte_eth_is_valid_owner_id(uint16_t owner_id) {
> > > 	/* Snapshot the counter with a single atomic read. */
> > > 	int32_t cur_owner_id = RTE_MIN(rte_atomic32_read(&next_owner_id),
> > > 				       UINT16_MAX);
> > >
> > > 	if (owner_id == RTE_ETH_DEV_NO_OWNER || owner_id > cur_owner_id) {
> > > 		RTE_LOG(ERR, EAL, "Invalid owner_id=%d.\n", owner_id);
> > > 		return 0;
> > > 	}
> > > 	return 1;
> > > }
> > >
> > > Let say your next_owne_id==X, and you invoke
> > > rte_eth_is_valid_owner_id(owner_id=X+1)  - it would fail.
> >
> > Explanation:
> > The scenario with locks:
> > next_owner_id = X.
> > Thread 0 call to set API(with invalid owner Y=X) and take lock.
> > Context switch.
> > Thread 1 call to owner_new and stuck in the lock.
> > Context switch.
> > Thread 0 does owner id validation and failed(Y>=X) - unlock the lock and return failure to the user.
> > Context switch.
> > Thread 1 take the lock and update X to X+1, then, unlock the lock.
> > Everything is OK!
> >
> > The same scenario with atomics:
> > next_owner_id = X.
> > Thread 0 call to set API(with invalid owner Y=X) and take lock.
> > Context switch.
> > Thread 1 call to owner_new and change X to X+1(atomically).
> > Context switch.
> > Thread 0 does owner id validation and success(Y<(atomic)X+1) - unlock the lock and return success to the  user.
> > Problem!
> >
> 
> 
> Matan is correct here, there is no way to perform parallel set operations using
> just an atomic variable here, because multiple reads of next_owner_id need to
> be performed while it is stable.  That is to say rte_eth_next_owner_id must be
> compared to RTE_ETH_DEV_NO_OWNER and owner_id in rte_eth_is_valid_owner_id.  If
> you were to only use an atomic_read on such a variable, it could be incremented
> by the owner_new function between the checks and an invalid owner value could
> become valid because a third thread incremented the next value.  The state of
> next_owner_id must be kept stable during any validity checks.

It could still be incremented between the checks: say a different thread
invokes new_owner_id, grabs the lock, updates the counter, and releases the
lock, all before the check.
But OK, there is probably no point in arguing over this one any longer.
Let's keep the lock here; nothing will be broken with it for sure.
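
To make that point concrete, here is a minimal sketch (not the patch code) of
lock-protected ID allocation and validation. The names owner_lock, owner_new
and owner_id_is_valid are hypothetical, a pthread mutex stands in for the
ethdev lock, and counter wraparound is ignored. The lock only serialises the
two calls; whether a guessed ID passes still depends on which call runs first.

```c
#include <pthread.h>
#include <stdint.h>

#define NO_OWNER 0 /* hypothetical stand-in for RTE_ETH_DEV_NO_OWNER */

static pthread_mutex_t owner_lock = PTHREAD_MUTEX_INITIALIZER;
static uint16_t next_owner_id = 1; /* next ID to hand out */

/* Allocate a fresh owner ID under the lock. */
static uint16_t owner_new(void)
{
	pthread_mutex_lock(&owner_lock);
	uint16_t id = next_owner_id++;
	pthread_mutex_unlock(&owner_lock);
	return id;
}

/* Validate an ID under the same lock: 1 if it was already handed out. */
static int owner_id_is_valid(uint16_t id)
{
	pthread_mutex_lock(&owner_lock);
	int ok = (id != NO_OWNER && id < next_owner_id);
	pthread_mutex_unlock(&owner_lock);
	return ok;
}
```

With next_owner_id == X, owner_id_is_valid(X) returns 0 if it runs before
owner_new() and 1 if it runs after, which is the same outcome an atomic
counter would give for those two orderings.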

> 
> That said, I really have to wonder why ownership ids are really needed here at
> all.  It seems this design could be much simpler with the addition of a per-port
> lock (and optional ownership record).  The API could consist of four
> operations:
> 
> ownership_set
> ownership_tryset
> ownership_release
> ownership_get
> 

OK, but how do we distinguish who the current owner of the port is,
to make sure that only the owner is allowed to perform control ops?
Konstantin

> 
> The first call simply tries to take the per-port lock (blocking if it's already
> locked)
> 
> The second call is a non-blocking version of the first
> 
> The third unlocks the port, allowing others to take ownership
> 
> The fourth returns whatever ownership record you want to encode with the lock.
> 
> The addition of all this id checking seems a bit overcomplicated
> 
> Neil
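
For reference, the four per-port operations proposed above could look roughly
like the sketch below. This is only an illustration under stated assumptions:
the port_lock structure, MAX_PORTS and the string ownership record are
hypothetical, and a C11 atomic flag with a busy-wait stands in for whatever
lock type would really be used. Callers must pass port < MAX_PORTS.

```c
#include <stdatomic.h>
#include <stddef.h>

#define MAX_PORTS 4 /* hypothetical table size */

struct port_lock {
	atomic_int locked;  /* 0 = free, 1 = held */
	const char *owner;  /* optional ownership record, set while held */
};

static struct port_lock port_locks[MAX_PORTS];

/* Non-blocking: 0 on success, -1 if the port is already owned. */
static int ownership_tryset(unsigned int port, const char *owner)
{
	int expected = 0;

	if (!atomic_compare_exchange_strong(&port_locks[port].locked,
					    &expected, 1))
		return -1;
	port_locks[port].owner = owner;
	return 0;
}

/* Blocking: spins until the port becomes free, then takes it. */
static void ownership_set(unsigned int port, const char *owner)
{
	while (ownership_tryset(port, owner) != 0)
		; /* busy-wait */
}

/* Unlock the port so that others can take ownership. */
static void ownership_release(unsigned int port)
{
	port_locks[port].owner = NULL;
	atomic_store(&port_locks[port].locked, 0);
}

/* Unsynchronised snapshot of the record; NULL if the port is free. */
static const char *ownership_get(unsigned int port)
{
	return atomic_load(&port_locks[port].locked) ?
		port_locks[port].owner : NULL;
}
```

Note that nothing in this sketch stops a non-owner from calling
ownership_release(), which is the gap the question about identifying the
current owner points at.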

Matan Azrad Jan. 17, 2018, 5:58 p.m. UTC | #20

Hi Neil

 From: Neil Horman, Wednesday, January 17, 2018 4:00 PM
> On Wed, Jan 17, 2018 at 12:05:42PM +0000, Matan Azrad wrote:
> >
> > Hi Konstantin
> > From: Ananyev, Konstantin, Sent: Wednesday, January 17, 2018 1:24 PM
> > > Hi Matan,
> > >
> > > > Hi Konstantin
> > > >
> > > > From: Ananyev, Konstantin, Tuesday, January 16, 2018 9:11 PM
> > > > > Hi Matan,
> > > > >
> > > > > >
> > > > > > Hi Konstantin
> > > > > > From: Ananyev, Konstantin, Monday, January 15, 2018 8:44 PM
> > > > > > > Hi Matan,
> > > > > > > > Hi Konstantin
> > > > > > > > From: Ananyev, Konstantin, Monday, January 15, 2018 1:45
> > > > > > > > PM
> > > > > > > > > Hi Matan,
> > > > > > > > > > Hi Konstantin
> > > > > > > > > > From: Ananyev, Konstantin, Friday, January 12, 2018
> > > > > > > > > > 2:02 AM
> > > > > > > > > > > Hi Matan,
> > > > > > > > > > > > Hi Konstantin
> > > > > > > > > > > > From: Ananyev, Konstantin, Thursday, January 11,
> > > > > > > > > > > > 2018
> > > > > > > > > > > > 2:40 PM
> > > > > > > > > > > > > Hi Matan,
> > > > > > > > > > > > > > Hi Konstantin
> > > > > > > > > > > > > > From: Ananyev, Konstantin, Wednesday, January
> > > > > > > > > > > > > > 10,
> > > > > > > > > > > > > > 2018
> > > > > > > > > > > > > > 3:36 PM
> > > > > > > > > > > > > > > Hi Matan,
> > > > > >  <snip>
> > > > > > > > > > > > > > > It is good to see that now scanning/updating
> > > > > > > > > > > > > > > rte_eth_dev_data[] is lock protected, but it
> > > > > > > > > > > > > > > might be not very plausible to protect both
> > > > > > > > > > > > > > > data[] and next_owner_id using the
> > > > > > > > > same lock.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I guess you mean to the owner structure in
> > > > > > > > > rte_eth_dev_data[port_id].
> > > > > > > > > > > > > > The next_owner_id is read by ownership
> > > > > > > > > > > > > > APIs(for owner validation), so it
> > > > > > > > > > > > > makes sense to use the same lock.
> > > > > > > > > > > > > > Actually, why not?
> > > > > > > > > > > > >
> > > > > > > > > > > > > Well to me next_owner_id and rte_eth_dev_data[]
> > > > > > > > > > > > > are not directly
> > > > > > > > > > > related.
> > > > > > > > > > > > > You may create new owner_id but it doesn't mean
> > > > > > > > > > > > > you would update rte_eth_dev_data[] immediately.
> > > > > > > > > > > > > And visa-versa - you might just want to update
> > > > > > > > > > > > > rte_eth_dev_data[].name or .owner_id.
> > > > > > > > > > > > > It is not very good coding practice to use same
> > > > > > > > > > > > > lock for non-related data structures.
> > > > > > > > > > > > >
> > > > > > > > > > > > I see the relation like next:
> > > > > > > > > > > > Since the ownership mechanism synchronization is
> > > > > > > > > > > > in ethdev responsibility, we must protect against
> > > > > > > > > > > > user mistakes as much as we can by
> > > > > > > > > > > using the same lock.
> > > > > > > > > > > > So, if user try to set by invalid owner (exactly
> > > > > > > > > > > > the ID which currently is
> > > > > > > > > > > allocated) we can protect on it.
> > > > > > > > > > >
> > > > > > > > > > > Hmm, not sure why you can't do same checking with
> > > > > > > > > > > different lock or atomic variable?
> > > > > > > > > > >
> > > > > > > > > > The set ownership API is protected by ownership lock
> > > > > > > > > > and checks the owner ID validity By reading the next owner
> ID.
> > > > > > > > > > So, the owner ID allocation and set API should use the
> > > > > > > > > > same atomic
> > > > > > > > > mechanism.
> > > > > > > > >
> > > > > > > > > Sure but all you are doing for checking validity, is
> > > > > > > > > check that owner_id > 0 &&& owner_id < next_ownwe_id,
> right?
> > > > > > > > > As you don't allow owner_id overlap (16/3248 bits) you
> > > > > > > > > can safely do same check with just
> atomic_get(&next_owner_id).
> > > > > > > > >
> > > > > > > > It will not protect it, scenario:
> > > > > > > > - current next_id is X.
> > > > > > > > - call set ownership of port A with owner id X by thread
> > > > > > > > 0(by user
> > > > > mistake).
> > > > > > > > - context switch
> > > > > > > > - allocate new id by thread 1 and get X and change next_id
> > > > > > > > to
> > > > > > > > X+1
> > > > > > > atomically.
> > > > > > > > -  context switch
> > > > > > > > - Thread 0 validate X by atomic_read and succeed to take
> > > ownership.
> > > > > > > > - The system loosed the port(or will be managed by two
> > > > > > > > entities) -
> > > > > crash.
> > > > > > >
> > > > > > >
> > > > > > > Ok, and how using lock will protect you with such scenario?
> > > > > >
> > > > > > The owner set API validation by thread 0 should fail because
> > > > > > the owner
> > > > > validation is included in the protected section.
> > > > >
> > > > > Then your validation function would fail even if you'll use
> > > > > atomic ops instead of lock.
> > > > No.
> > > > With atomic this specific scenario will cause the validation to pass.
> > >
> > > Can you explain to me how?
> > >
> > > static int
> > > rte_eth_is_valid_owner_id(uint16_t owner_id) {
> > > 	/* Snapshot the counter with a single atomic read. */
> > > 	int32_t cur_owner_id = RTE_MIN(rte_atomic32_read(&next_owner_id),
> > > 				       UINT16_MAX);
> > >
> > > 	if (owner_id == RTE_ETH_DEV_NO_OWNER || owner_id > cur_owner_id) {
> > > 		RTE_LOG(ERR, EAL, "Invalid owner_id=%d.\n", owner_id);
> > > 		return 0;
> > > 	}
> > > 	return 1;
> > > }
> > >
> > > Let say your next_owne_id==X, and you invoke
> > > rte_eth_is_valid_owner_id(owner_id=X+1)  - it would fail.
> >
> > Explanation:
> > The scenario with locks:
> > next_owner_id = X.
> > Thread 0 call to set API(with invalid owner Y=X) and take lock.
> > Context switch.
> > Thread 1 call to owner_new and stuck in the lock.
> > Context switch.
> > Thread 0 does owner id validation and failed(Y>=X) - unlock the lock and
> return failure to the user.
> > Context switch.
> > Thread 1 take the lock and update X to X+1, then, unlock the lock.
> > Everything is OK!
> >
> > The same scenario with atomics:
> > next_owner_id = X.
> > Thread 0 call to set API(with invalid owner Y=X) and take lock.
> > Context switch.
> > Thread 1 call to owner_new and change X to X+1(atomically).
> > Context switch.
> > Thread 0 does owner id validation and success(Y<(atomic)X+1) - unlock the
> lock and return success to the  user.
> > Problem!
> >
> 
> 
> Matan is correct here, there is no way to perform parallel set operations
> using just an atomic variable here, because multiple reads of
> next_owner_id need to be performed while it is stable.  That is to say
> rte_eth_next_owner_id must be compared to RTE_ETH_DEV_NO_OWNER
> and owner_id in rte_eth_is_valid_owner_id.  If you were to only use an
> atomic_read on such a variable, it could be incremented by the owner_new
> function between the checks and an invalid owner value could become valid
> because a third thread incremented the next value.  The state of
> next_owner_id must be kept stable during any validity checks.
> 
> That said, I really have to wonder why ownership ids are really needed here
> at all.  It seems this design could be much simpler with the addition of a
> per-port lock (and optional ownership record).  The API could consist of four
> operations:
> 
> ownership_set
> ownership_tryset
> ownership_release
> ownership_get
> 
> 
> The first call simply tries to take the per-port lock (blocking if it's already
> locked)
> 

A per-port lock is not good because the ownership mechanism must be synchronized with port creation/release.
So port creation and port ownership should use the same lock.

I didn't find a precedent for a blocking function in ethdev.
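
A rough sketch of what sharing one lock means here, assuming a simplified port
table rather than the real rte_eth_dev_data[] layout. The names ports,
ports_lock, port_allocate, port_owner_set and port_release are hypothetical,
and a pthread mutex stands in for the ethdev lock. Because creation, release
and ownership all take the same lock, an ownership request can never observe a
half-created or half-released port.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_PORTS 4 /* hypothetical table size */
#define NO_OWNER  0 /* hypothetical stand-in for RTE_ETH_DEV_NO_OWNER */

struct port {
	int used;
	uint16_t owner_id;
	char name[32];
};

static struct port ports[MAX_PORTS];
static pthread_mutex_t ports_lock = PTHREAD_MUTEX_INITIALIZER;

/* Create a port: returns its index, or -1 if the table is full. */
static int port_allocate(const char *name)
{
	int idx = -1;

	pthread_mutex_lock(&ports_lock);
	for (int i = 0; i < MAX_PORTS; i++) {
		if (!ports[i].used) {
			ports[i].used = 1;
			ports[i].owner_id = NO_OWNER;
			snprintf(ports[i].name, sizeof(ports[i].name),
				 "%s", name);
			idx = i;
			break;
		}
	}
	pthread_mutex_unlock(&ports_lock);
	return idx;
}

/* Take ownership: 0 on success, -1 if the port is absent or owned by
 * a different ID. Non-blocking, like the rest of the sketch. */
static int port_owner_set(int idx, uint16_t owner_id)
{
	int ret = -1;

	if (idx < 0 || idx >= MAX_PORTS || owner_id == NO_OWNER)
		return -1;
	pthread_mutex_lock(&ports_lock);
	if (ports[idx].used && (ports[idx].owner_id == NO_OWNER ||
				ports[idx].owner_id == owner_id)) {
		ports[idx].owner_id = owner_id;
		ret = 0;
	}
	pthread_mutex_unlock(&ports_lock);
	return ret;
}

/* Release a port; the slot and its ownership are cleared together. */
static void port_release(int idx)
{
	if (idx < 0 || idx >= MAX_PORTS)
		return;
	pthread_mutex_lock(&ports_lock);
	ports[idx].used = 0;
	ports[idx].owner_id = NO_OWNER;
	pthread_mutex_unlock(&ports_lock);
}
```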

> The second call is a non-blocking version of the first
> 
> The third unlocks the port, allowing others to take ownership
> 
> The fourth returns whatever ownership record you want to encode with the
> lock.
> 
> The addition of all this id checking seems a bit overcomplicated

You are missing the identification of the owner: we want to expose owner info for printing and easy debugging.
And it makes sense to manage owner uniqueness with a unique ID.

The API was already discussed a lot in the previous version. Do you really want to reopen it now?
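
As a small illustration of the debugging use case, an owner record could pair
the unique ID with a human-readable name. The struct layout, OWNER_NAME_LEN
and port_owner_describe below are assumptions made for this sketch, not the
definitions from the patch.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define OWNER_NAME_LEN 64 /* hypothetical name length */

struct port_owner {
	uint64_t id;               /* unique owner ID */
	char name[OWNER_NAME_LEN]; /* free-form name, for logs only */
};

/* Format a one-line description of who owns a port.
 * Returns what snprintf returns (the length the full text needs). */
static int port_owner_describe(char *buf, size_t len, unsigned int port_id,
			       const struct port_owner *owner)
{
	return snprintf(buf, len, "port %u owned by %s (id %" PRIu64 ")",
			port_id, owner->name, owner->id);
}
```

The ID is what ownership checks compare; the name only makes log lines and
debug output readable.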
 
> Neil

Matan Azrad Jan. 17, 2018, 6:02 p.m. UTC | #21

Hi Konstantin

From: Ananyev, Konstantin, Wednesday, January 17, 2018 6:53 PM
> Hi Matan,
> 
> >
> > Hi Konstantin
> > From: Ananyev, Konstantin, Wednesday, January 17, 2018 2:55 PM
> > > >
> > > >
> > > > Hi Konstantin
> > > > From: Ananyev, Konstantin, Sent: Wednesday, January 17, 2018 1:24
> > > > PM
> > > > > Hi Matan,
> > > > >
> > > > > > Hi Konstantin
> > > > > >
> > > > > > From: Ananyev, Konstantin, Tuesday, January 16, 2018 9:11 PM
> > > > > > > Hi Matan,
> > > > > > >
> > > > > > > >
> > > > > > > > Hi Konstantin
> > > > > > > > From: Ananyev, Konstantin, Monday, January 15, 2018 8:44
> > > > > > > > PM
> > > > > > > > > Hi Matan,
> > > > > > > > > > Hi Konstantin
> > > > > > > > > > From: Ananyev, Konstantin, Monday, January 15, 2018
> > > > > > > > > > 1:45 PM
> > > > > > > > > > > Hi Matan,
> > > > > > > > > > > > Hi Konstantin
> > > > > > > > > > > > From: Ananyev, Konstantin, Friday, January 12,
> > > > > > > > > > > > 2018
> > > > > > > > > > > > 2:02 AM
> > > > > > > > > > > > > Hi Matan,
> > > > > > > > > > > > > > Hi Konstantin
> > > > > > > > > > > > > > From: Ananyev, Konstantin, Thursday, January
> > > > > > > > > > > > > > 11,
> > > > > > > > > > > > > > 2018
> > > > > > > > > > > > > > 2:40 PM
> > > > > > > > > > > > > > > Hi Matan,
> > > > > > > > > > > > > > > > Hi Konstantin
> > > > > > > > > > > > > > > > From: Ananyev, Konstantin, Wednesday,
> > > > > > > > > > > > > > > > January 10,
> > > > > > > > > > > > > > > > 2018
> > > > > > > > > > > > > > > > 3:36 PM
> > > > > > > > > > > > > > > > > Hi Matan,
> > > > > > > >  <snip>
> > > > > > > > > > > > > > > > > It is good to see that now
> > > > > > > > > > > > > > > > > scanning/updating rte_eth_dev_data[] is
> > > > > > > > > > > > > > > > > lock protected, but it might be not very
> > > > > > > > > > > > > > > > > plausible to protect both data[] and
> > > > > > > > > > > > > > > > > next_owner_id using the
> > > > > > > > > > > same lock.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I guess you mean to the owner structure in
> > > > > > > > > > > rte_eth_dev_data[port_id].
> > > > > > > > > > > > > > > > The next_owner_id is read by ownership
> > > > > > > > > > > > > > > > APIs(for owner validation), so it
> > > > > > > > > > > > > > > makes sense to use the same lock.
> > > > > > > > > > > > > > > > Actually, why not?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Well to me next_owner_id and
> > > > > > > > > > > > > > > rte_eth_dev_data[] are not directly
> > > > > > > > > > > > > related.
> > > > > > > > > > > > > > > You may create new owner_id but it doesn't
> > > > > > > > > > > > > > > mean you would update rte_eth_dev_data[]
> immediately.
> > > > > > > > > > > > > > > And visa-versa - you might just want to
> > > > > > > > > > > > > > > update rte_eth_dev_data[].name or .owner_id.
> > > > > > > > > > > > > > > It is not very good coding practice to use
> > > > > > > > > > > > > > > same lock for non-related data structures.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > I see the relation like next:
> > > > > > > > > > > > > > Since the ownership mechanism synchronization
> > > > > > > > > > > > > > is in ethdev responsibility, we must protect
> > > > > > > > > > > > > > against user mistakes as much as we can by
> > > > > > > > > > > > > using the same lock.
> > > > > > > > > > > > > > So, if user try to set by invalid owner
> > > > > > > > > > > > > > (exactly the ID which currently is
> > > > > > > > > > > > > allocated) we can protect on it.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Hmm, not sure why you can't do same checking
> > > > > > > > > > > > > with different lock or atomic variable?
> > > > > > > > > > > > >
> > > > > > > > > > > > The set ownership API is protected by ownership
> > > > > > > > > > > > lock and checks the owner ID validity By reading
> > > > > > > > > > > > the next owner
> > > ID.
> > > > > > > > > > > > So, the owner ID allocation and set API should use
> > > > > > > > > > > > the same atomic
> > > > > > > > > > > mechanism.
> > > > > > > > > > >
> > > > > > > > > > > Sure but all you are doing for checking validity, is
> > > > > > > > > > > check that owner_id > 0 &&& owner_id <
> > > > > > > > > > > next_ownwe_id,
> > > right?
> > > > > > > > > > > As you don't allow owner_id overlap (16/3248 bits)
> > > > > > > > > > > you can safely do same check with just
> > > atomic_get(&next_owner_id).
> > > > > > > > > > >
> > > > > > > > > > It will not protect it, scenario:
> > > > > > > > > > - current next_id is X.
> > > > > > > > > > - call set ownership of port A with owner id X by
> > > > > > > > > > thread 0(by user
> > > > > > > mistake).
> > > > > > > > > > - context switch
> > > > > > > > > > - allocate new id by thread 1 and get X and change
> > > > > > > > > > next_id to
> > > > > > > > > > X+1
> > > > > > > > > atomically.
> > > > > > > > > > -  context switch
> > > > > > > > > > - Thread 0 validate X by atomic_read and succeed to
> > > > > > > > > > take
> > > > > ownership.
> > > > > > > > > > - The system loosed the port(or will be managed by two
> > > > > > > > > > entities) -
> > > > > > > crash.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Ok, and how using lock will protect you with such scenario?
> > > > > > > >
> > > > > > > > The owner set API validation by thread 0 should fail
> > > > > > > > because the owner
> > > > > > > validation is included in the protected section.
> > > > > > >
> > > > > > > Then your validation function would fail even if you'll use
> > > > > > > atomic ops instead of lock.
> > > > > > No.
> > > > > > With atomic this specific scenario will cause the validation to pass.
> > > > >
> > > > > Can you explain to me how?
> > > > >
> > > > > static int
> > > > > rte_eth_is_valid_owner_id(uint16_t owner_id) {
> > > > > 	/* Snapshot the counter with a single atomic read. */
> > > > > 	int32_t cur_owner_id = RTE_MIN(rte_atomic32_read(&next_owner_id),
> > > > > 				       UINT16_MAX);
> > > > >
> > > > > 	if (owner_id == RTE_ETH_DEV_NO_OWNER || owner_id > cur_owner_id) {
> > > > > 		RTE_LOG(ERR, EAL, "Invalid owner_id=%d.\n", owner_id);
> > > > > 		return 0;
> > > > > 	}
> > > > > 	return 1;
> > > > > }
> > > > >
> > > > > Let say your next_owne_id==X, and you invoke
> > > > > rte_eth_is_valid_owner_id(owner_id=X+1)  - it would fail.
> > > >
> > > > Explanation:
> > > > The scenario with locks:
> > > > next_owner_id = X.
> > > > Thread 0 call to set API(with invalid owner Y=X) and take lock.
> > >
> > > Ok I see what you mean.
> > > But, as I said before, if thread 0 will grab the lock first - you'll
> > > experience the same failure.
> > > I understand now that by some reason you treat these two scenarios
> > > as something different, but for me it is pretty much the same case.
> > > And to me it means that neither lock, neither atomic can fully
> > > protect you here.
> > >
> >
> > I agree that we are not fully protected even when using locks but one lock
> are more protected than ether atomics or 2 different locks.
> > So, I think keeping it as is (with one lock) makes sense.
> 
> Ok if that your preference - let's keep your current approach here.
> 
> >
> > > > Context switch.
> > > > Thread 1 call to owner_new and stuck in the lock.
> > > > Context switch.
> > > > Thread 0 does owner id validation and failed(Y>=X) - unlock the
> > > > lock and
> > > return failure to the user.
> > > > Context switch.
> > > > Thread 1 take the lock and update X to X+1, then, unlock the lock.
> > > > Everything is OK!
> > > >
> > > > The same scenario with atomics:
> > > > next_owner_id = X.
> > > > Thread 0 call to set API(with invalid owner Y=X) and take lock.
> > > > Context switch.
> > > > Thread 1 call to owner_new and change X to X+1(atomically).
> > > > Context switch.
> > > > Thread 0 does owner id validation and success(Y<(atomic)X+1) -
> > > > unlock the
> > > lock and return success to the  user.
> > > > Problem!
> > > >
> > > > > > With lock no next_id changes can be done while the thread is
> > > > > > in the set
> > > > > API.
> > > > > >
> > > > > > > But in fact your code is not protected for that scenario -
> > > > > > > doesn't matter will you'll use lock or atomic ops.
> > > > > > > Let's considerer your current code with the following scenario:
> > > > > > >
> > > > > > > next_owner_id  == 1
> > > > > > > 1) Process 0:
> > > > > > >      rte_eth_dev_owner_new(&owner_id);
> > > > > > >      now owner_id == 1 and next_owner_id == 2
> > > > > > > 2) Process 1 (by mistake):
> > > > > > >     rte_eth_dev_owner_set(port_id=1, owner->id=1); It will
> > > > > > > complete successfully, as owner_id ==1 is considered as valid.
> > > > > > > 3) Process 0:
> > > > > > >       rte_eth_dev_owner_set(port_id=1, owner->id=1); It will
> > > > > > > also complete with success, as owner->id is valid is equal
> > > > > > > to current port
> > > > > owner_id.
> > > > > > > So you finished with 2 processes assuming that they do own
> > > > > > > exclusively then same port.
> > > > > > >
> > > > > > > Honestly in that situation  locking around nest_owner_id
> > > > > > > wouldn't give you any advantages over atomic ops.
> > > > > > >
> > > > > >
> > > > > > This is a different scenario that we can't protect on it with
> > > > > > atomic or
> > > locks.
> > > > > > But for the first scenario I described I think we can.
> > > > > > Please read it again, I described it step by step.
> > > > > >
> > > > > > > >
> > > > > > > > > I don't think you can protect yourself against such
> > > > > > > > > scenario with or without locking.
> > > > > > > > > Unless you'll make it harder for the mis-behaving thread
> > > > > > > > > to guess valid owner_id, or add some extra logic here.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > > The set(and others) ownership APIs already uses
> > > > > > > > > > > > the ownership lock so I
> > > > > > > > > > > think it makes sense to use the same lock also in ID
> allocation.
> > > > > > > > > > > >
> > > > > > > > > > > > > > > > > In fact, for next_owner_id, you don't
> > > > > > > > > > > > > > > > > need a lock - just rte_atomic_t should be
> enough.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I don't think so, it is problematic in
> > > > > > > > > > > > > > > > next_owner_id wraparound and may
> > > > > > > > > > > > > > > complicate the code in other places which read it.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > IMO it is not that complicated, something
> > > > > > > > > > > > > > > like that should work I
> > > > > > > > > think.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > /* init to 0 at startup*/ rte_atomic32_t
> > > > > > > > > > > > > > > *owner_id;
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > int new_owner_id(void) {
> > > > > > > > > > > > > > >     int32_t x;
> > > > > > > > > > > > > > >     x = rte_atomic32_add_return(&owner_id, 1);
> > > > > > > > > > > > > > >     if (x > UINT16_MAX) {
> > > > > > > > > > > > > > >        rte_atomic32_dec(&owner_id);
> > > > > > > > > > > > > > >        return -EOVERWLOW;
> > > > > > > > > > > > > > >     } else
> > > > > > > > > > > > > > >         return x; }
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Why not just to keep it simple and using
> > > > > > > > > > > > > > > > the same
> > > lock?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Lock is also fine, I just think it better be
> > > > > > > > > > > > > > > a separate one
> > > > > > > > > > > > > > > - that would protext just next_owner_id.
> > > > > > > > > > > > > > > Though if you are going to use uuid here -
> > > > > > > > > > > > > > > all that probably not relevant any more.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I agree about the uuid but still think the
> > > > > > > > > > > > > > same lock should be used for
> > > > > > > > > > > both.
> > > > > > > > > > > > >
> > > > > > > > > > > > > But with uuid you don't need next_owner_id at all,
> right?
> > > > > > > > > > > > > So lock will only be used for rte_eth_dev_data[]
> > > > > > > > > > > > > fields
> > > > > anyway.
> > > > > > > > > > > > >
> > > > > > > > > > > > Sorry, I meant uint64_t, not uuid.
> > > > > > > > > > >
> > > > > > > > > > > Ah ok, my thought uuid_t is better as with it you
> > > > > > > > > > > don't need to support your own code to allocate new
> > > > > > > > > > > owner_id, but rely on system libs
> > > > > > > > > instead.
> > > > > > > > > > > But wouldn't insist here.
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Another alternative would be to use 2
> > > > > > > > > > > > > > > > > locks - one for next_owner_id second for
> > > > > > > > > > > > > > > > > actual data[]
> > > protection.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Another thing - you'll probably need to
> > > grab/release
> > > > > > > > > > > > > > > > > a lock inside
> > > > > > > > > > > > > > > > > rte_eth_dev_allocated() too.
> > > > > > > > > > > > > > > > > It is a public function used by drivers,
> > > > > > > > > > > > > > > > > so need to be protected
> > > > > > > > > too.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Yes, I thought about it, but decided not
> > > > > > > > > > > > > > > > to use lock in
> > > > > next:
> > > > > > > > > > > > > > > > rte_eth_dev_allocated rte_eth_dev_count
> > > > > > > > > > > > > > > > rte_eth_dev_get_name_by_port
> > > > > > > > > rte_eth_dev_get_port_by_name
> > > > > > > > > > > > > > > > maybe more...
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > As I can see in patch #3 you protect by lock
> > > > > > > > > > > > > > > access to rte_eth_dev_data[].name (which
> > > > > > > > > > > > > > > seems like a good
> > > > > thing).
> > > > > > > > > > > > > > > So I think any other public function that
> > > > > > > > > > > > > > > access rte_eth_dev_data[].name should be
> > > > > > > > > > > > > > > protected by the
> > > > > same
> > > > > > > lock.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I don't think so, I can understand to use the
> > > > > > > > > > > > > > ownership lock here(as in port
> > > > > > > > > > > > > creation) but I don't think it is necessary too.
> > > > > > > > > > > > > > What are we exactly protecting here?
> > > > > > > > > > > > > > Don't you think it is just timing?(ask in the
> > > > > > > > > > > > > > next moment and you may get another answer) I
> > > > > > > > > > > > > > don't see optional
> > > crash.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Not sure what you mean here by timing...
> > > > > > > > > > > > > As I understand rte_eth_dev_data[].name unique
> > > identifies
> > > > > > > > > > > > > device and is used by  port
> > > > > > > > > > > > > allocation/release/find
> > > functions.
> > > > > > > > > > > > > As you stated above:
> > > > > > > > > > > > > "1. The port allocation and port release
> > > > > > > > > > > > > synchronization will be managed by ethdev."
> > > > > > > > > > > > > To me it means that ethdev layer has to make
> > > > > > > > > > > > > sure that all accesses to rte_eth_dev_data[].name are
> atomic.
> > > > > > > > > > > > > Otherwise what would prevent the situation when
> > > > > > > > > > > > > one
> > > > > process
> > > > > > > > > > > > > does
> > > > > > > > > > > > > rte_eth_dev_allocate()-
> > > >snprintf(rte_eth_dev_data[x].name,
> > > > > > > > > > > > > ...) while second one does
> > > > > > > > > > > rte_eth_dev_allocated(rte_eth_dev_data[x].name, ...) ?
> > > > > > > > > > > > >
> > > > > > > > > > > > The second will get True or False and that is it.
> > > > > > > > > > >
> > > > > > > > > > > Under race condition - in the worst case it might
> > > > > > > > > > > crash, though for that you'll have to be really unlucky.
> > > > > > > > > > > Though in most cases as you said it would just not
> > > > > > > > > > > operate
> > > > > correctly.
> > > > > > > > > > > I think if we start to protect dev->name by lock we
> > > > > > > > > > > need to do it for all instances (both read and write).
> > > > > > > > > > >
> > > > > > > > > > Since under the ownership rules, the user must take
> > > > > > > > > > ownership
> > > of a
> > > > > > > > > > port
> > > > > > > > > before using it, I still don't see a problem here.
> > > > > > > > >
> > > > > > > > > I am not talking about owner id or name here.
> > > > > > > > > I am talking about dev->name.
> > > > > > > > >
> > > > > > > > So? The user still should take ownership of a device
> > > > > > > > before using it
> > > (by
> > > > > > > name or by port id).
> > > > > > > > It can just read it without owning it, but no managing it.
> > > > > > > >
> > > > > > > > > > Please, Can you describe specific crash scenario and
> > > > > > > > > > explain how could the
> > > > > > > > > locking fix it?
> > > > > > > > >
> > > > > > > > > Let say thread 0 doing rte_eth_dev_allocate()-
> > > > > > > > > >snprintf(rte_eth_dev_data[x].name, ...), thread 1 doing
> > > > > > > > > rte_pmd_ring_remove()->rte_eth_dev_allocated()-
> >strcmp().
> > > > > > > > > And because of race condition - rte_eth_dev_allocated()
> > > > > > > > > will
> > > return
> > > > > > > > > rte_eth_dev * for the wrong device.
> > > > > > > > Which wrong device do you mean? I guess it is the device
> > > > > > > > which
> > > > > currently is
> > > > > > > being created by thread 0.
> > > > > > > > > Then rte_pmd_ring_remove() will call rte_free() for
> > > > > > > > > related resources, while It can still be in use by someone else.
> > > > > > > > The rte_pmd_ring_remove caller(some DPDK entity) must take
> > > > > ownership
> > > > > > > > (or validate that he is the owner) of a port before doing
> > > > > > > > it(free,
> > > > > release), so
> > > > > > > no issue here.
> > > > > > >
> > > > > > > Forget about ownership for a second.
> > > > > > > Suppose we have a process it created ring port for itself
> > > > > > > (without
> > > setting
> > > > > any
> > > > > > > ownership)  and used it for some time.
> > > > > > > Then it decided to remove it, so it calls rte_pmd_ring_remove()
> for it.
> > > > > > > At the same time second process decides to call
> > > rte_eth_dev_allocate()
> > > > > (let
> > > > > > > say for anither ring port).
> > > > > > > They could collide trying to read (process 0) and modify
> > > > > > > (process 1)
> > > same
> > > > > > > string rte_eth_dev_data[].name.
> > > > > > >
> > > > > > Do you mean that process 0 will compare successfully the
> > > > > > process 1
> > > new
> > > > > port name?
> > > > >
> > > > > Yes.
> > > > >
> > > > > > The state are in local process memory - so process 0 will not
> > > > > > compare
> > > the
> > > > > process 1 port, from its point of view this port is in UNUSED
> > > > > > state.
> > > > > >
> > > > >
> > > > > Ok, and why it can't be in attached state in process 0 too?
> > > >
> > > > Someone in process 0 should attach it using protected
> > > > attach_secondary
> > > somewhere in your scenario.
> > >
> > > Yes, process 0 can have this port attached too, why not?
> > See the function with inline comments:
> >
> > struct rte_eth_dev *
> > rte_eth_dev_allocated(const char *name) {
> > 	unsigned i;
> >
> > 	for (i = 0; i < RTE_MAX_ETHPORTS; i++) {
> >
> > 	    	The below state are in local process memory,
> > 		So, if here process 1 will allocate a new port (the current i),
> update its local state to ATTACHED and write the name,
> > 		the state is not visible by process 0 until someone in process
> 0 will attach it by rte_eth_dev_attach_secondary.
> > 		So, to use rte_eth_dev_attach_secondary process 0 must
> take the lock
> > and it can't, because it is currently locked by process 1.
> 
> Ok I see.
> Thanks for your patience.
> BTW, that means that if let say process 0 will call rte_eth_dev_allocate("xxx")
> and process 1 will call rte_eth_dev_allocate("yyy") we can endup with same
> port_id be used for different devices and 2 processes will overwrite the
> same rte_eth_dev_data[port_id]?

No. Unlike the state, the lock itself is in shared memory, so two processes cannot allocate a port at the same time (you can see it in the next patch of this series).
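
As a rough illustration of that point (not the code from the patch), the sketch below keeps a spinlock and the port names together in one shared mapping, so every process that inherits the mapping serializes on the same lock. The names `shared_ports`, `shared_ports_create` and `port_allocate` are hypothetical, and an anonymous shared `mmap()` stands in for the DPDK shared memory zone.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdatomic.h>
#include <string.h>
#include <sys/mman.h>

#define MAX_PORTS 8
#define NAME_LEN  32

/* Hypothetical stand-in for the shared ethdev data. */
struct shared_ports {
	atomic_flag lock;               /* spinlock, visible to every process */
	char name[MAX_PORTS][NAME_LEN]; /* empty string means the slot is free */
};

/* Map the structure so that it is shared with any child created by fork(). */
static struct shared_ports *
shared_ports_create(void)
{
	struct shared_ports *sp = mmap(NULL, sizeof(*sp),
				       PROT_READ | PROT_WRITE,
				       MAP_SHARED | MAP_ANONYMOUS, -1, 0);

	if (sp == MAP_FAILED)
		return NULL;
	/* Anonymous mappings are zero-filled, so all names start empty. */
	atomic_flag_clear(&sp->lock);
	return sp;
}

/* Claim the first free slot for 'name'; returns the port id or -1. */
static int
port_allocate(struct shared_ports *sp, const char *name)
{
	int id = -1;

	assert(sp != NULL && name != NULL && name[0] != '\0');

	/* The lock lives in the shared mapping, so it excludes other processes too. */
	while (atomic_flag_test_and_set_explicit(&sp->lock, memory_order_acquire))
		; /* spin */

	for (int i = 0; i < MAX_PORTS; i++) {
		if (sp->name[i][0] == '\0') {
			strncpy(sp->name[i], name, NAME_LEN - 1);
			id = i;
			break;
		}
	}

	atomic_flag_clear_explicit(&sp->lock, memory_order_release);
	return id;
}
```

If one process calls `port_allocate(sp, "xxx")` and another calls `port_allocate(sp, "yyy")` on the same mapping, the slot scan and the name write happen inside the critical section, so the two calls should end up with different port ids.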

> Konstantin
> 
> >
> > 		if ((rte_eth_devices[i].state == RTE_ETH_DEV_ATTACHED)
> &&
> > 		strcmp(rte_eth_devices[i].data->name, name) == 0)
> > 			return &rte_eth_devices[i];
> > 	}
> > 	return NULL;
> >
> >

Matan Azrad Jan. 17, 2018, 8:34 p.m. UTC | #22

From: Matan Azrad, Wednesday, January 17, 2018 8:02 PM
> Hi Konstantin
> 
> From: Ananyev, Konstantin, Wednesday, January 17, 2018 6:53 PM
> > Hi Matan,
> >
> > >
> > > Hi Konstantin
> > > From: Ananyev, Konstantin, Wednesday, January 17, 2018 2:55 PM
> > > > >
> > > > >
> > > > > Hi Konstantin
> > > > > From: Ananyev, Konstantin, Sent: Wednesday, January 17, 2018
> > > > > 1:24 PM
> > > > > > Hi Matan,
> > > > > >
> > > > > > > Hi Konstantin
> > > > > > >
> > > > > > > From: Ananyev, Konstantin, Tuesday, January 16, 2018 9:11 PM
> > > > > > > > Hi Matan,
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Hi Konstantin
> > > > > > > > > From: Ananyev, Konstantin, Monday, January 15, 2018 8:44
> > > > > > > > > PM
> > > > > > > > > > Hi Matan,
> > > > > > > > > > > Hi Konstantin
> > > > > > > > > > > From: Ananyev, Konstantin, Monday, January 15, 2018
> > > > > > > > > > > 1:45 PM
> > > > > > > > > > > > Hi Matan,
> > > > > > > > > > > > > Hi Konstantin
> > > > > > > > > > > > > From: Ananyev, Konstantin, Friday, January 12,
> > > > > > > > > > > > > 2018
> > > > > > > > > > > > > 2:02 AM
> > > > > > > > > > > > > > Hi Matan,
> > > > > > > > > > > > > > > Hi Konstantin
> > > > > > > > > > > > > > > From: Ananyev, Konstantin, Thursday, January
> > > > > > > > > > > > > > > 11,
> > > > > > > > > > > > > > > 2018
> > > > > > > > > > > > > > > 2:40 PM
> > > > > > > > > > > > > > > > Hi Matan,
> > > > > > > > > > > > > > > > > Hi Konstantin
> > > > > > > > > > > > > > > > > From: Ananyev, Konstantin, Wednesday,
> > > > > > > > > > > > > > > > > January 10,
> > > > > > > > > > > > > > > > > 2018
> > > > > > > > > > > > > > > > > 3:36 PM
> > > > > > > > > > > > > > > > > > Hi Matan,
> > > > > > > > >  <snip>
> > > > > > > > > > > > > > > > > > It is good to see that now
> > > > > > > > > > > > > > > > > > scanning/updating rte_eth_dev_data[]
> > > > > > > > > > > > > > > > > > is lock protected, but it might be not
> > > > > > > > > > > > > > > > > > very plausible to protect both data[]
> > > > > > > > > > > > > > > > > > and next_owner_id using the
> > > > > > > > > > > > same lock.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I guess you mean to the owner structure
> > > > > > > > > > > > > > > > > in
> > > > > > > > > > > > rte_eth_dev_data[port_id].
> > > > > > > > > > > > > > > > > The next_owner_id is read by ownership
> > > > > > > > > > > > > > > > > APIs(for owner validation), so it
> > > > > > > > > > > > > > > > makes sense to use the same lock.
> > > > > > > > > > > > > > > > > Actually, why not?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Well to me next_owner_id and
> > > > > > > > > > > > > > > > rte_eth_dev_data[] are not directly
> > > > > > > > > > > > > > related.
> > > > > > > > > > > > > > > > You may create new owner_id but it doesn't
> > > > > > > > > > > > > > > > mean you would update rte_eth_dev_data[]
> > immediately.
> > > > > > > > > > > > > > > > And visa-versa - you might just want to
> > > > > > > > > > > > > > > > update rte_eth_dev_data[].name or .owner_id.
> > > > > > > > > > > > > > > > It is not very good coding practice to use
> > > > > > > > > > > > > > > > same lock for non-related data structures.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I see the relation like next:
> > > > > > > > > > > > > > > Since the ownership mechanism
> > > > > > > > > > > > > > > synchronization is in ethdev responsibility,
> > > > > > > > > > > > > > > we must protect against user mistakes as
> > > > > > > > > > > > > > > much as we can by
> > > > > > > > > > > > > > using the same lock.
> > > > > > > > > > > > > > > So, if user try to set by invalid owner
> > > > > > > > > > > > > > > (exactly the ID which currently is
> > > > > > > > > > > > > > allocated) we can protect on it.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Hmm, not sure why you can't do same checking
> > > > > > > > > > > > > > with different lock or atomic variable?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > The set ownership API is protected by ownership
> > > > > > > > > > > > > lock and checks the owner ID validity By reading
> > > > > > > > > > > > > the next owner
> > > > ID.
> > > > > > > > > > > > > So, the owner ID allocation and set API should
> > > > > > > > > > > > > use the same atomic
> > > > > > > > > > > > mechanism.
> > > > > > > > > > > >
> > > > > > > > > > > > Sure but all you are doing for checking validity,
> > > > > > > > > > > > is check that owner_id > 0 &&& owner_id <
> > > > > > > > > > > > next_ownwe_id,
> > > > right?
> > > > > > > > > > > > As you don't allow owner_id overlap (16/3248 bits)
> > > > > > > > > > > > you can safely do same check with just
> > > > atomic_get(&next_owner_id).
> > > > > > > > > > > >
> > > > > > > > > > > It will not protect it, scenario:
> > > > > > > > > > > - current next_id is X.
> > > > > > > > > > > - call set ownership of port A with owner id X by
> > > > > > > > > > > thread 0(by user
> > > > > > > > mistake).
> > > > > > > > > > > - context switch
> > > > > > > > > > > - allocate new id by thread 1 and get X and change
> > > > > > > > > > > next_id to
> > > > > > > > > > > X+1
> > > > > > > > > > atomically.
> > > > > > > > > > > -  context switch
> > > > > > > > > > > - Thread 0 validate X by atomic_read and succeed to
> > > > > > > > > > > take
> > > > > > ownership.
> > > > > > > > > > > - The system loosed the port(or will be managed by
> > > > > > > > > > > two
> > > > > > > > > > > entities) -
> > > > > > > > crash.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Ok, and how using lock will protect you with such scenario?
> > > > > > > > >
> > > > > > > > > The owner set API validation by thread 0 should fail
> > > > > > > > > because the owner
> > > > > > > > validation is included in the protected section.
> > > > > > > >
> > > > > > > > Then your validation function would fail even if you'll
> > > > > > > > use atomic ops instead of lock.
> > > > > > > No.
> > > > > > > With atomic this specific scenario will cause the validation to pass.
> > > > > >
> > > > > > Can you explain to me how?
> > > > > >
> > > > > > rte_eth_is_valid_owner_id(uint16_t owner_id) {
> > > > > >               int32_t cur_owner_id =
> > > > > > RTE_MIN(rte_atomic32_get(next_owner_id),
> > > > > > UINT16_MAX);
> > > > > >
> > > > > > 	if (owner_id == RTE_ETH_DEV_NO_OWNER || owner_id >
> > > > > > cur_owner_id) {
> > > > > > 		RTE_LOG(ERR, EAL, "Invalid owner_id=%d.\n",
> owner_id);
> > > > > > 		return 0;
> > > > > > 	}
> > > > > > 	return 1;
> > > > > > }
> > > > > >
> > > > > > Let say your next_owne_id==X, and you invoke
> > > > > > rte_eth_is_valid_owner_id(owner_id=X+1)  - it would fail.
> > > > >
> > > > > Explanation:
> > > > > The scenario with locks:
> > > > > next_owner_id = X.
> > > > > Thread 0 call to set API(with invalid owner Y=X) and take lock.
> > > >
> > > > Ok I see what you mean.
> > > > But, as I said before, if thread 0 will grab the lock first -
> > > > you'll experience the same failure.
> > > > I understand now that by some reason you treat these two scenarios
> > > > as something different, but for me it is pretty much the same case.
> > > > And to me it means that neither lock, neither atomic can fully
> > > > protect you here.
> > > >
> > >
> > > I agree that we are not fully protected even when using locks but
> > > one lock
> > are more protected than ether atomics or 2 different locks.
> > > So, I think keeping it as is (with one lock) makes sense.
> >
> > Ok if that your preference - let's keep your current approach here.
> >
> > >
> > > > > Context switch.
> > > > > Thread 1 call to owner_new and stuck in the lock.
> > > > > Context switch.
> > > > > Thread 0 does owner id validation and failed(Y>=X) - unlock the
> > > > > lock and
> > > > return failure to the user.
> > > > > Context switch.
> > > > > Thread 1 take the lock and update X to X+1, then, unlock the lock.
> > > > > Everything is OK!
> > > > >
> > > > > The same scenario with atomics:
> > > > > next_owner_id = X.
> > > > > Thread 0 call to set API(with invalid owner Y=X) and take lock.
> > > > > Context switch.
> > > > > Thread 1 call to owner_new and change X to X+1(atomically).
> > > > > Context switch.
> > > > > Thread 0 does owner id validation and success(Y<(atomic)X+1) -
> > > > > unlock the
> > > > lock and return success to the  user.
> > > > > Problem!
> > > > >
> > > > > > > With lock no next_id changes can be done while the thread is
> > > > > > > in the set
> > > > > > API.
> > > > > > >
> > > > > > > > But in fact your code is not protected for that scenario -
> > > > > > > > doesn't matter will you'll use lock or atomic ops.
> > > > > > > > Let's considerer your current code with the following scenario:
> > > > > > > >
> > > > > > > > next_owner_id  == 1
> > > > > > > > 1) Process 0:
> > > > > > > >      rte_eth_dev_owner_new(&owner_id);
> > > > > > > >      now owner_id == 1 and next_owner_id == 2
> > > > > > > > 2) Process 1 (by mistake):
> > > > > > > >     rte_eth_dev_owner_set(port_id=1, owner->id=1); It will
> > > > > > > > complete successfully, as owner_id ==1 is considered as valid.
> > > > > > > > 3) Process 0:
> > > > > > > >       rte_eth_dev_owner_set(port_id=1, owner->id=1); It
> > > > > > > > will also complete with success, as owner->id is valid is
> > > > > > > > equal to current port
> > > > > > owner_id.
> > > > > > > > So you finished with 2 processes assuming that they do own
> > > > > > > > exclusively then same port.
> > > > > > > >
> > > > > > > > Honestly in that situation  locking around nest_owner_id
> > > > > > > > wouldn't give you any advantages over atomic ops.
> > > > > > > >
> > > > > > >
> > > > > > > This is a different scenario that we can't protect on it
> > > > > > > with atomic or
> > > > locks.
> > > > > > > But for the first scenario I described I think we can.
> > > > > > > Please read it again, I described it step by step.
> > > > > > >
> > > > > > > > >
> > > > > > > > > > I don't think you can protect yourself against such
> > > > > > > > > > scenario with or without locking.
> > > > > > > > > > Unless you'll make it harder for the mis-behaving
> > > > > > > > > > thread to guess valid owner_id, or add some extra logic
> here.
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > > The set(and others) ownership APIs already uses
> > > > > > > > > > > > > the ownership lock so I
> > > > > > > > > > > > think it makes sense to use the same lock also in
> > > > > > > > > > > > ID
> > allocation.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > In fact, for next_owner_id, you don't
> > > > > > > > > > > > > > > > > > need a lock - just rte_atomic_t should
> > > > > > > > > > > > > > > > > > be
> > enough.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I don't think so, it is problematic in
> > > > > > > > > > > > > > > > > next_owner_id wraparound and may
> > > > > > > > > > > > > > > > complicate the code in other places which read it.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > IMO it is not that complicated, something
> > > > > > > > > > > > > > > > like that should work I
> > > > > > > > > > think.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > /* init to 0 at startup*/ rte_atomic32_t
> > > > > > > > > > > > > > > > *owner_id;
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > int new_owner_id(void) {
> > > > > > > > > > > > > > > >     int32_t x;
> > > > > > > > > > > > > > > >     x = rte_atomic32_add_return(&owner_id, 1);
> > > > > > > > > > > > > > > >     if (x > UINT16_MAX) {
> > > > > > > > > > > > > > > >        rte_atomic32_dec(&owner_id);
> > > > > > > > > > > > > > > >        return -EOVERWLOW;
> > > > > > > > > > > > > > > >     } else
> > > > > > > > > > > > > > > >         return x; }
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Why not just to keep it simple and using
> > > > > > > > > > > > > > > > > the same
> > > > lock?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Lock is also fine, I just think it better
> > > > > > > > > > > > > > > > be a separate one
> > > > > > > > > > > > > > > > - that would protext just next_owner_id.
> > > > > > > > > > > > > > > > Though if you are going to use uuid here -
> > > > > > > > > > > > > > > > all that probably not relevant any more.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I agree about the uuid but still think the
> > > > > > > > > > > > > > > same lock should be used for
> > > > > > > > > > > > both.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > But with uuid you don't need next_owner_id at
> > > > > > > > > > > > > > all,
> > right?
> > > > > > > > > > > > > > So lock will only be used for
> > > > > > > > > > > > > > rte_eth_dev_data[] fields
> > > > > > anyway.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > Sorry, I meant uint64_t, not uuid.
> > > > > > > > > > > >
> > > > > > > > > > > > Ah ok, my thought uuid_t is better as with it you
> > > > > > > > > > > > don't need to support your own code to allocate
> > > > > > > > > > > > new owner_id, but rely on system libs
> > > > > > > > > > instead.
> > > > > > > > > > > > But wouldn't insist here.
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Another alternative would be to use 2
> > > > > > > > > > > > > > > > > > locks - one for next_owner_id second
> > > > > > > > > > > > > > > > > > for actual data[]
> > > > protection.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Another thing - you'll probably need
> > > > > > > > > > > > > > > > > > to
> > > > grab/release
> > > > > > > > > > > > > > > > > > a lock inside
> > > > > > > > > > > > > > > > > > rte_eth_dev_allocated() too.
> > > > > > > > > > > > > > > > > > It is a public function used by
> > > > > > > > > > > > > > > > > > drivers, so need to be protected
> > > > > > > > > > too.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Yes, I thought about it, but decided not
> > > > > > > > > > > > > > > > > to use lock in
> > > > > > next:
> > > > > > > > > > > > > > > > > rte_eth_dev_allocated rte_eth_dev_count
> > > > > > > > > > > > > > > > > rte_eth_dev_get_name_by_port
> > > > > > > > > > rte_eth_dev_get_port_by_name
> > > > > > > > > > > > > > > > > maybe more...
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > As I can see in patch #3 you protect by
> > > > > > > > > > > > > > > > lock access to rte_eth_dev_data[].name
> > > > > > > > > > > > > > > > (which seems like a good
> > > > > > thing).
> > > > > > > > > > > > > > > > So I think any other public function that
> > > > > > > > > > > > > > > > access rte_eth_dev_data[].name should be
> > > > > > > > > > > > > > > > protected by the
> > > > > > same
> > > > > > > > lock.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I don't think so, I can understand to use
> > > > > > > > > > > > > > > the ownership lock here(as in port
> > > > > > > > > > > > > > creation) but I don't think it is necessary too.
> > > > > > > > > > > > > > > What are we exactly protecting here?
> > > > > > > > > > > > > > > Don't you think it is just timing?(ask in
> > > > > > > > > > > > > > > the next moment and you may get another
> > > > > > > > > > > > > > > answer) I don't see optional
> > > > crash.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Not sure what you mean here by timing...
> > > > > > > > > > > > > > As I understand rte_eth_dev_data[].name unique
> > > > identifies
> > > > > > > > > > > > > > device and is used by  port
> > > > > > > > > > > > > > allocation/release/find
> > > > functions.
> > > > > > > > > > > > > > As you stated above:
> > > > > > > > > > > > > > "1. The port allocation and port release
> > > > > > > > > > > > > > synchronization will be managed by ethdev."
> > > > > > > > > > > > > > To me it means that ethdev layer has to make
> > > > > > > > > > > > > > sure that all accesses to
> > > > > > > > > > > > > > rte_eth_dev_data[].name are
> > atomic.
> > > > > > > > > > > > > > Otherwise what would prevent the situation
> > > > > > > > > > > > > > when one
> > > > > > process
> > > > > > > > > > > > > > does
> > > > > > > > > > > > > > rte_eth_dev_allocate()-
> > > > >snprintf(rte_eth_dev_data[x].name,
> > > > > > > > > > > > > > ...) while second one does
> > > > > > > > > > > > rte_eth_dev_allocated(rte_eth_dev_data[x].name, ...) ?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > The second will get True or False and that is it.
> > > > > > > > > > > >
> > > > > > > > > > > > Under race condition - in the worst case it might
> > > > > > > > > > > > crash, though for that you'll have to be really unlucky.
> > > > > > > > > > > > Though in most cases as you said it would just not
> > > > > > > > > > > > operate
> > > > > > correctly.
> > > > > > > > > > > > I think if we start to protect dev->name by lock
> > > > > > > > > > > > we need to do it for all instances (both read and write).
> > > > > > > > > > > >
> > > > > > > > > > > Since under the ownership rules, the user must take
> > > > > > > > > > > ownership
> > > > of a
> > > > > > > > > > > port
> > > > > > > > > > before using it, I still don't see a problem here.
> > > > > > > > > >
> > > > > > > > > > I am not talking about owner id or name here.
> > > > > > > > > > I am talking about dev->name.
> > > > > > > > > >
> > > > > > > > > So? The user still should take ownership of a device
> > > > > > > > > before using it
> > > > (by
> > > > > > > > name or by port id).
> > > > > > > > > It can just read it without owning it, but no managing it.
> > > > > > > > >
> > > > > > > > > > > Please, Can you describe specific crash scenario and
> > > > > > > > > > > explain how could the
> > > > > > > > > > locking fix it?
> > > > > > > > > >
> > > > > > > > > > Let say thread 0 doing rte_eth_dev_allocate()-
> > > > > > > > > > >snprintf(rte_eth_dev_data[x].name, ...), thread 1
> > > > > > > > > > >doing
> > > > > > > > > > rte_pmd_ring_remove()->rte_eth_dev_allocated()-
> > >strcmp().
> > > > > > > > > > And because of race condition -
> > > > > > > > > > rte_eth_dev_allocated() will
> > > > return
> > > > > > > > > > rte_eth_dev * for the wrong device.
> > > > > > > > > Which wrong device do you mean? I guess it is the device
> > > > > > > > > which
> > > > > > currently is
> > > > > > > > being created by thread 0.
> > > > > > > > > > Then rte_pmd_ring_remove() will call rte_free() for
> > > > > > > > > > related resources, while It can still be in use by someone
> else.
> > > > > > > > > The rte_pmd_ring_remove caller(some DPDK entity) must
> > > > > > > > > take
> > > > > > ownership
> > > > > > > > > (or validate that he is the owner) of a port before
> > > > > > > > > doing it(free,
> > > > > > release), so
> > > > > > > > no issue here.
> > > > > > > >
> > > > > > > > Forget about ownership for a second.
> > > > > > > > Suppose we have a process it created ring port for itself
> > > > > > > > (without
> > > > setting
> > > > > > any
> > > > > > > > ownership)  and used it for some time.
> > > > > > > > Then it decided to remove it, so it calls
> > > > > > > > rte_pmd_ring_remove()
> > for it.
> > > > > > > > At the same time second process decides to call
> > > > rte_eth_dev_allocate()
> > > > > > (let
> > > > > > > > say for anither ring port).
> > > > > > > > They could collide trying to read (process 0) and modify
> > > > > > > > (process 1)
> > > > same
> > > > > > > > string rte_eth_dev_data[].name.
> > > > > > > >
> > > > > > > Do you mean that process 0 will compare successfully the
> > > > > > > process 1
> > > > new
> > > > > > port name?
> > > > > >
> > > > > > Yes.
> > > > > >
> > > > > > > The state are in local process memory - so process 0 will
> > > > > > > not compare
> > > > the
> > > > > > process 1 port, from its point of view this port is in UNUSED
> > > > > > > state.
> > > > > > >
> > > > > >
> > > > > > Ok, and why it can't be in attached state in process 0 too?
> > > > >
> > > > > Someone in process 0 should attach it using protected
> > > > > attach_secondary
> > > > somewhere in your scenario.
> > > >
> > > > Yes, process 0 can have this port attached too, why not?
> > > See the function with inline comments:
> > >
> > > struct rte_eth_dev *
> > > rte_eth_dev_allocated(const char *name) {
> > > 	unsigned i;
> > >
> > > 	for (i = 0; i < RTE_MAX_ETHPORTS; i++) {
> > >
> > > 	    	The below state are in local process memory,
> > > 		So, if here process 1 will allocate a new port (the current i),
> > update its local state to ATTACHED and write the name,
> > > 		the state is not visible by process 0 until someone in process
> > 0 will attach it by rte_eth_dev_attach_secondary.
> > > 		So, to use rte_eth_dev_attach_secondary process 0 must
> > take the lock
> > > and it can't, because it is currently locked by process 1.
> >
> > Ok I see.
> > Thanks for your patience.
> > BTW, that means that if let say process 0 will call
> > rte_eth_dev_allocate("xxx") and process 1 will call
> > rte_eth_dev_allocate("yyy") we can endup with same port_id be used for
> > different devices and 2 processes will overwrite the same
> rte_eth_dev_data[port_id]?
> 
> No, contrary to the state, the lock itself is in shared memory, so 2 processes
> cannot allocate port in the same time.(you can see it in the next patch of this
> series).
>

Actually, I think only one process (the primary) should allocate ports; the others should attach to them.
The port allocation race exists only between the threads of the primary process.
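
A minimal sketch of this split, assuming a simplified model rather than the real ethdev structures: the name lives in data that all processes see, while the ATTACHED/UNUSED state is private to each process. The types and functions below (`process_view`, `dev_allocate`, `dev_attach_secondary`, `dev_allocated`) are hypothetical, and locking is left out to keep the focus on visibility.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PORTS 4
#define NAME_LEN  32

enum dev_state { DEV_UNUSED = 0, DEV_ATTACHED };

/* Shared between processes in the real library; a plain array in this sketch. */
struct shared_data {
	char name[NAME_LEN];
};

/* Per-process entry: local state plus a pointer to the shared data. */
struct local_dev {
	enum dev_state state;
	struct shared_data *data;
};

/* Everything one process knows about the ports. */
struct process_view {
	struct local_dev devs[MAX_PORTS];
};

static void
view_init(struct process_view *v, struct shared_data *shared)
{
	for (int i = 0; i < MAX_PORTS; i++) {
		v->devs[i].state = DEV_UNUSED;
		v->devs[i].data = &shared[i];
	}
}

/* Primary only: take a free shared slot, write the name, mark it attached locally. */
static int
dev_allocate(struct process_view *v, const char *name)
{
	assert(name != NULL && name[0] != '\0');
	for (int i = 0; i < MAX_PORTS; i++) {
		if (v->devs[i].data->name[0] == '\0') {
			strncpy(v->devs[i].data->name, name, NAME_LEN - 1);
			v->devs[i].data->name[NAME_LEN - 1] = '\0';
			v->devs[i].state = DEV_ATTACHED;
			return i;
		}
	}
	return -1;
}

/* Secondary: find an already allocated name and mark it attached locally. */
static int
dev_attach_secondary(struct process_view *v, const char *name)
{
	assert(name != NULL && name[0] != '\0');
	for (int i = 0; i < MAX_PORTS; i++) {
		if (strcmp(v->devs[i].data->name, name) == 0) {
			v->devs[i].state = DEV_ATTACHED;
			return i;
		}
	}
	return -1;
}

/* Lookup in the style of the quoted function: local state first, then the name. */
static struct local_dev *
dev_allocated(struct process_view *v, const char *name)
{
	assert(name != NULL);
	for (int i = 0; i < MAX_PORTS; i++) {
		if (v->devs[i].state == DEV_ATTACHED &&
		    strcmp(v->devs[i].data->name, name) == 0)
			return &v->devs[i];
	}
	return NULL;
}
```

In this model a port allocated by the primary is not returned by `dev_allocated()` in a secondary view until that view has called `dev_attach_secondary()`, which is the behaviour described in the inline comments quoted below.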

 
> > Konstantin
> >
> > >
> > > 		if ((rte_eth_devices[i].state == RTE_ETH_DEV_ATTACHED)
> > &&
> > > 		strcmp(rte_eth_devices[i].data->name, name) == 0)
> > > 			return &rte_eth_devices[i];
> > > 	}
> > > 	return NULL;
> > >
> > >

Neil Horman Jan. 18, 2018, 1:10 p.m. UTC | #23

On Wed, Jan 17, 2018 at 05:01:10PM +0000, Ananyev, Konstantin wrote:
> 
> 
> > -----Original Message-----
> > From: Neil Horman [mailto:nhorman@tuxdriver.com]
> > Sent: Wednesday, January 17, 2018 2:00 PM
> > To: Matan Azrad <matan@mellanox.com>
> > Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Thomas Monjalon <thomas@monjalon.net>; Gaetan Rivet
> > <gaetan.rivet@6wind.com>; Wu, Jingjing <jingjing.wu@intel.com>; dev@dpdk.org; Richardson, Bruce <bruce.richardson@intel.com>
> > Subject: Re: [PATCH v2 2/6] ethdev: add port ownership
> > 
> > On Wed, Jan 17, 2018 at 12:05:42PM +0000, Matan Azrad wrote:
> > >
> > > Hi Konstantin
> > > From: Ananyev, Konstantin, Sent: Wednesday, January 17, 2018 1:24 PM
> > > > Hi Matan,
> > > >
> > > > > Hi Konstantin
> > > > >
> > > > > From: Ananyev, Konstantin, Tuesday, January 16, 2018 9:11 PM
> > > > > > Hi Matan,
> > > > > >
> > > > > > >
> > > > > > > Hi Konstantin
> > > > > > > From: Ananyev, Konstantin, Monday, January 15, 2018 8:44 PM
> > > > > > > > Hi Matan,
> > > > > > > > > Hi Konstantin
> > > > > > > > > From: Ananyev, Konstantin, Monday, January 15, 2018 1:45 PM
> > > > > > > > > > Hi Matan,
> > > > > > > > > > > Hi Konstantin
> > > > > > > > > > > From: Ananyev, Konstantin, Friday, January 12, 2018 2:02
> > > > > > > > > > > AM
> > > > > > > > > > > > Hi Matan,
> > > > > > > > > > > > > Hi Konstantin
> > > > > > > > > > > > > From: Ananyev, Konstantin, Thursday, January 11, 2018
> > > > > > > > > > > > > 2:40 PM
> > > > > > > > > > > > > > Hi Matan,
> > > > > > > > > > > > > > > Hi Konstantin
> > > > > > > > > > > > > > > From: Ananyev, Konstantin, Wednesday, January 10,
> > > > > > > > > > > > > > > 2018
> > > > > > > > > > > > > > > 3:36 PM
> > > > > > > > > > > > > > > > Hi Matan,
> > > > > > >  <snip>
> > > > > > > > > > > > > > > > It is good to see that now scanning/updating
> > > > > > > > > > > > > > > > rte_eth_dev_data[] is lock protected, but it
> > > > > > > > > > > > > > > > might be not very plausible to protect both
> > > > > > > > > > > > > > > > data[] and next_owner_id using the
> > > > > > > > > > same lock.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I guess you mean to the owner structure in
> > > > > > > > > > rte_eth_dev_data[port_id].
> > > > > > > > > > > > > > > The next_owner_id is read by ownership APIs(for
> > > > > > > > > > > > > > > owner validation), so it
> > > > > > > > > > > > > > makes sense to use the same lock.
> > > > > > > > > > > > > > > Actually, why not?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Well to me next_owner_id and rte_eth_dev_data[] are
> > > > > > > > > > > > > > not directly
> > > > > > > > > > > > related.
> > > > > > > > > > > > > > You may create new owner_id but it doesn't mean you
> > > > > > > > > > > > > > would update rte_eth_dev_data[] immediately.
> > > > > > > > > > > > > > And visa-versa - you might just want to update
> > > > > > > > > > > > > > rte_eth_dev_data[].name or .owner_id.
> > > > > > > > > > > > > > It is not very good coding practice to use same lock
> > > > > > > > > > > > > > for non-related data structures.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > I see the relation like next:
> > > > > > > > > > > > > Since the ownership mechanism synchronization is in
> > > > > > > > > > > > > ethdev responsibility, we must protect against user
> > > > > > > > > > > > > mistakes as much as we can by
> > > > > > > > > > > > using the same lock.
> > > > > > > > > > > > > So, if user try to set by invalid owner (exactly the
> > > > > > > > > > > > > ID which currently is
> > > > > > > > > > > > allocated) we can protect on it.
> > > > > > > > > > > >
> > > > > > > > > > > > Hmm, not sure why you can't do same checking with
> > > > > > > > > > > > different lock or atomic variable?
> > > > > > > > > > > >
> > > > > > > > > > > The set ownership API is protected by ownership lock and
> > > > > > > > > > > checks the owner ID validity By reading the next owner ID.
> > > > > > > > > > > So, the owner ID allocation and set API should use the
> > > > > > > > > > > same atomic
> > > > > > > > > > mechanism.
> > > > > > > > > >
> > > > > > > > > > Sure but all you are doing for checking validity, is  check
> > > > > > > > > > that owner_id > 0 &&& owner_id < next_ownwe_id, right?
> > > > > > > > > > As you don't allow owner_id overlap (16/3248 bits) you can
> > > > > > > > > > safely do same check with just atomic_get(&next_owner_id).
> > > > > > > > > >
> > > > > > > > > It will not protect it, scenario:
> > > > > > > > > - current next_id is X.
> > > > > > > > > - call set ownership of port A with owner id X by thread 0(by
> > > > > > > > > user
> > > > > > mistake).
> > > > > > > > > - context switch
> > > > > > > > > - allocate new id by thread 1 and get X and change next_id to
> > > > > > > > > X+1
> > > > > > > > atomically.
> > > > > > > > > -  context switch
> > > > > > > > > - Thread 0 validate X by atomic_read and succeed to take
> > > > ownership.
> > > > > > > > > - The system loosed the port(or will be managed by two
> > > > > > > > > entities) -
> > > > > > crash.
> > > > > > > >
> > > > > > > >
> > > > > > > > Ok, and how using lock will protect you with such scenario?
> > > > > > >
> > > > > > > The owner set API validation by thread 0 should fail because the
> > > > > > > owner
> > > > > > validation is included in the protected section.
> > > > > >
> > > > > > Then your validation function would fail even if you'll use atomic
> > > > > > ops instead of lock.
> > > > > No.
> > > > > With atomic this specific scenario will cause the validation to pass.
> > > >
> > > > Can you explain to me how?
> > > >
> > > > rte_eth_is_valid_owner_id(uint16_t owner_id) {
> > > >               int32_t cur_owner_id = RTE_MIN(rte_atomic32_get(next_owner_id),
> > > > UINT16_MAX);
> > > >
> > > > 	if (owner_id == RTE_ETH_DEV_NO_OWNER || owner_id >
> > > > cur_owner_id) {
> > > > 		RTE_LOG(ERR, EAL, "Invalid owner_id=%d.\n", owner_id);
> > > > 		return 0;
> > > > 	}
> > > > 	return 1;
> > > > }
> > > >
> > > > Let say your next_owne_id==X, and you invoke
> > > > rte_eth_is_valid_owner_id(owner_id=X+1)  - it would fail.
> > >
> > > Explanation:
> > > The scenario with locks:
> > > next_owner_id = X.
> > > Thread 0 call to set API(with invalid owner Y=X) and take lock.
> > > Context switch.
> > > Thread 1 call to owner_new and stuck in the lock.
> > > Context switch.
> > > Thread 0 does owner id validation and failed(Y>=X) - unlock the lock and return failure to the user.
> > > Context switch.
> > > Thread 1 take the lock and update X to X+1, then, unlock the lock.
> > > Everything is OK!
> > >
> > > The same scenario with atomics:
> > > next_owner_id = X.
> > > Thread 0 call to set API(with invalid owner Y=X) and take lock.
> > > Context switch.
> > > Thread 1 call to owner_new and change X to X+1(atomically).
> > > Context switch.
> > > Thread 0 does owner id validation and success(Y<(atomic)X+1) - unlock the lock and return success to the  user.
> > > Problem!
> > >
> > 
> > 
> > Matan is correct here, there is no way to perform parallel set operations using
> > just an atomic variable here, because multiple reads of next_owner_id need to
> > be performed while it is stable.  That is to say rte_eth_next_owner_id must be
> > compared to RTE_ETH_DEV_NO_OWNER and owner_id in rte_eth_is_valid_owner_id.  If
> > you were to only use an atomic_read on such a variable, it could be incremented
> > by the owner_new function between the checks and an invalid owner value could
> > become valid because  a third thread incremented the next value.  The state of
> > next_owner_id must be kept stable during any validity checks
> 
> It could still be incremented between the checks - if let say different thread will
> invoke new_onwer_id, grab the lock update counter, release the lock - all that
> before the check.
I don't see how.  All of the contents of rte_eth_dev_owner_set are protected
under rte_eth_dev_ownership_lock, as is rte_eth_dev_owner_new.  next_owner_id
might increment between another thread's calls to owner_new and owner_set, but
that will just cause a transition from an ownership id being valid to invalid,
and that's ok, as long as there is consistency in the model that enforces a
single valid owner at a time (in that case the subsequent caller to owner_new).
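
To make the model concrete, here is a minimal sketch of the single-lock scheme
described above.  It is not the patch's code: a pthread mutex stands in for
rte_eth_dev_ownership_lock, the names owner_new, owner_set, port_owner and
MAX_PORTS are hypothetical, and the validity rule (non-zero and below
next_owner_id) is an assumption taken from the check discussed earlier in the
thread.

```c
#include <pthread.h>
#include <stdint.h>

#define NO_OWNER  0   /* stand-in for RTE_ETH_DEV_NO_OWNER */
#define MAX_PORTS 4   /* hypothetical port count */

static pthread_mutex_t ownership_lock = PTHREAD_MUTEX_INITIALIZER;
static uint16_t next_owner_id = 1;      /* first id not yet handed out */
static uint16_t port_owner[MAX_PORTS];  /* NO_OWNER means the port is free */

/* Allocate a fresh owner id; the counter only moves under the lock. */
static uint16_t owner_new(void)
{
	uint16_t id;

	pthread_mutex_lock(&ownership_lock);
	id = next_owner_id++;
	pthread_mutex_unlock(&ownership_lock);
	return id;
}

/* Validate the id and claim the port inside one critical section, so
 * next_owner_id cannot change between the check and the update. */
static int owner_set(uint16_t port, uint16_t owner_id)
{
	int ret = -1;

	pthread_mutex_lock(&ownership_lock);
	if (port < MAX_PORTS &&
	    owner_id != NO_OWNER && owner_id < next_owner_id &&
	    port_owner[port] == NO_OWNER) {
		port_owner[port] = owner_id;
		ret = 0;
	}
	pthread_mutex_unlock(&ownership_lock);
	return ret;
}
```

In this sketch a set call that passes an id which has not been allocated yet
fails for as long as it holds the lock before owner_new runs, which is the
behaviour Matan describes for the locked case.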

Though this confusion does underscore my assertion, I think, that this API is
overly complicated.

Neil

Neil Horman Jan. 18, 2018, 1:20 p.m. UTC | #24

On Wed, Jan 17, 2018 at 05:58:07PM +0000, Matan Azrad wrote:
> 
> Hi Neil
> 
>  From: Neil Horman, Wednesday, January 17, 2018 4:00 PM
> > On Wed, Jan 17, 2018 at 12:05:42PM +0000, Matan Azrad wrote:
> > >
> > > Hi Konstantin
> > > From: Ananyev, Konstantin, Sent: Wednesday, January 17, 2018 1:24 PM
> > > > Hi Matan,
> > > >
> > > > > Hi Konstantin
> > > > >
> > > > > From: Ananyev, Konstantin, Tuesday, January 16, 2018 9:11 PM
> > > > > > Hi Matan,
> > > > > >
> > > > > > >
> > > > > > > Hi Konstantin
> > > > > > > From: Ananyev, Konstantin, Monday, January 15, 2018 8:44 PM
> > > > > > > > Hi Matan,
> > > > > > > > > Hi Konstantin
> > > > > > > > > From: Ananyev, Konstantin, Monday, January 15, 2018 1:45
> > > > > > > > > PM
> > > > > > > > > > Hi Matan,
> > > > > > > > > > > Hi Konstantin
> > > > > > > > > > > From: Ananyev, Konstantin, Friday, January 12, 2018
> > > > > > > > > > > 2:02 AM
> > > > > > > > > > > > Hi Matan,
> > > > > > > > > > > > > Hi Konstantin
> > > > > > > > > > > > > From: Ananyev, Konstantin, Thursday, January 11,
> > > > > > > > > > > > > 2018
> > > > > > > > > > > > > 2:40 PM
> > > > > > > > > > > > > > Hi Matan,
> > > > > > > > > > > > > > > Hi Konstantin
> > > > > > > > > > > > > > > From: Ananyev, Konstantin, Wednesday, January
> > > > > > > > > > > > > > > 10,
> > > > > > > > > > > > > > > 2018
> > > > > > > > > > > > > > > 3:36 PM
> > > > > > > > > > > > > > > > Hi Matan,
> > > > > > >  <snip>
> > > > > > > > > > > > > > > > It is good to see that now scanning/updating
> > > > > > > > > > > > > > > > rte_eth_dev_data[] is lock protected, but it
> > > > > > > > > > > > > > > > might be not very plausible to protect both
> > > > > > > > > > > > > > > > data[] and next_owner_id using the
> > > > > > > > > > same lock.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I guess you mean to the owner structure in
> > > > > > > > > > rte_eth_dev_data[port_id].
> > > > > > > > > > > > > > > The next_owner_id is read by ownership
> > > > > > > > > > > > > > > APIs(for owner validation), so it
> > > > > > > > > > > > > > makes sense to use the same lock.
> > > > > > > > > > > > > > > Actually, why not?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Well to me next_owner_id and rte_eth_dev_data[]
> > > > > > > > > > > > > > are not directly
> > > > > > > > > > > > related.
> > > > > > > > > > > > > > You may create new owner_id but it doesn't mean
> > > > > > > > > > > > > > you would update rte_eth_dev_data[] immediately.
> > > > > > > > > > > > > > And visa-versa - you might just want to update
> > > > > > > > > > > > > > rte_eth_dev_data[].name or .owner_id.
> > > > > > > > > > > > > > It is not very good coding practice to use same
> > > > > > > > > > > > > > lock for non-related data structures.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > I see the relation like next:
> > > > > > > > > > > > > Since the ownership mechanism synchronization is
> > > > > > > > > > > > > in ethdev responsibility, we must protect against
> > > > > > > > > > > > > user mistakes as much as we can by
> > > > > > > > > > > > using the same lock.
> > > > > > > > > > > > > So, if user try to set by invalid owner (exactly
> > > > > > > > > > > > > the ID which currently is
> > > > > > > > > > > > allocated) we can protect on it.
> > > > > > > > > > > >
> > > > > > > > > > > > Hmm, not sure why you can't do same checking with
> > > > > > > > > > > > different lock or atomic variable?
> > > > > > > > > > > >
> > > > > > > > > > > The set ownership API is protected by ownership lock
> > > > > > > > > > > and checks the owner ID validity By reading the next owner
> > ID.
> > > > > > > > > > > So, the owner ID allocation and set API should use the
> > > > > > > > > > > same atomic
> > > > > > > > > > mechanism.
> > > > > > > > > >
> > > > > > > > > > Sure but all you are doing for checking validity, is
> > > > > > > > > > check that owner_id > 0 && owner_id < next_owner_id,
> > > > > > > > > > right?
> > > > > > > > > > As you don't allow owner_id overlap (16/3248 bits) you
> > > > > > > > > > can safely do same check with just
> > atomic_get(&next_owner_id).
> > > > > > > > > >
> > > > > > > > > It will not protect it, scenario:
> > > > > > > > > - current next_id is X.
> > > > > > > > > - call set ownership of port A with owner id X by thread
> > > > > > > > > 0(by user
> > > > > > mistake).
> > > > > > > > > - context switch
> > > > > > > > > - allocate new id by thread 1 and get X and change next_id
> > > > > > > > > to
> > > > > > > > > X+1
> > > > > > > > atomically.
> > > > > > > > > -  context switch
> > > > > > > > > - Thread 0 validate X by atomic_read and succeed to take
> > > > ownership.
> > > > > > > > > - The system loosed the port(or will be managed by two
> > > > > > > > > entities) -
> > > > > > crash.
> > > > > > > >
> > > > > > > >
> > > > > > > > Ok, and how using lock will protect you with such scenario?
> > > > > > >
> > > > > > > The owner set API validation by thread 0 should fail because
> > > > > > > the owner
> > > > > > validation is included in the protected section.
> > > > > >
> > > > > > Then your validation function would fail even if you'll use
> > > > > > atomic ops instead of lock.
> > > > > No.
> > > > > With atomic this specific scenario will cause the validation to pass.
> > > >
> > > > Can you explain to me how?
> > > >
> > > > static int
> > > > rte_eth_is_valid_owner_id(uint16_t owner_id) {
> > > > 	/* Read the counter once; clamp it to the 16-bit owner id range. */
> > > > 	int32_t cur_owner_id = RTE_MIN(rte_atomic32_read(&next_owner_id),
> > > > 				       UINT16_MAX);
> > > >
> > > > 	if (owner_id == RTE_ETH_DEV_NO_OWNER || owner_id > cur_owner_id) {
> > > > 		RTE_LOG(ERR, EAL, "Invalid owner_id=%d.\n", owner_id);
> > > > 		return 0;
> > > > 	}
> > > > 	return 1;
> > > > }
> > > >
> > > > Let say your next_owne_id==X, and you invoke
> > > > rte_eth_is_valid_owner_id(owner_id=X+1)  - it would fail.
> > >
> > > Explanation:
> > > The scenario with locks:
> > > next_owner_id = X.
> > > Thread 0 call to set API(with invalid owner Y=X) and take lock.
> > > Context switch.
> > > Thread 1 call to owner_new and stuck in the lock.
> > > Context switch.
> > > Thread 0 does owner id validation and failed(Y>=X) - unlock the lock and
> > return failure to the user.
> > > Context switch.
> > > Thread 1 take the lock and update X to X+1, then, unlock the lock.
> > > Everything is OK!
> > >
> > > The same scenario with atomics:
> > > next_owner_id = X.
> > > Thread 0 call to set API(with invalid owner Y=X) and take lock.
> > > Context switch.
> > > Thread 1 call to owner_new and change X to X+1(atomically).
> > > Context switch.
> > > Thread 0 does owner id validation and success(Y<(atomic)X+1) - unlock the
> > lock and return success to the  user.
> > > Problem!
> > >
> > 
> > 
> > Matan is correct here, there is no way to perform parallel set operations
> > using just an atomic variable here, because multiple reads of
> > next_owner_id need to be performed while it is stable.  That is to say
> > rte_eth_next_owner_id must be compared to RTE_ETH_DEV_NO_OWNER
> > and owner_id in rte_eth_is_valid_owner_id.  If you were to only use an
> > atomic_read on such a variable, it could be incremented by the owner_new
> > function between the checks and an invalid owner value could become valid
> > because  a third thread incremented the next value.  The state of
> > next_owner_id must be kept stable during any validity checks
> > 
> > That said, I really have to wonder why ownership ids are really needed here
> > at all.  It seems this design could be much simpler with the addition of a per-
> > port lock (and optional ownership record).  The API could consist of four
> > operations:
> > 
> > ownership_set
> > ownership_tryset
> > ownership_release
> > ownership_get
> > 
> > 
> > The first call simply tries to take the per-port lock (blocking if its already
> > locked)
> > 
> 
> Per port lock is not good because the ownership mechanism must to be synchronized with the port creation\release.
> So the port creation and port ownership should use the same lock.
> 
In what way do you need to synchronize with port creation?  If a port has not
yet been created, then by definition the owner must be the thread calling the
create function.  If you are concerned about the mechanics of the port data
structure (i.e. the fact that rte_eth_devices is statically allocated), you can
add a lock structure to the rte_eth_dev struct and initialize it statically with
RTE_SPINLOCK_INITIALIZER.


> I didn't find precedence for blocking function in ethdev.
> 
Then perhaps we don't need that api call.  Perhaps ownership_tryset is enough.

> > The second call is a non-blocking version of the first
> > 
> > The third unlocks the port, allowing others to take ownership
> > 
> > The fourth returns whatever ownership record you want to encode with the
> > lock.
> > 
> > The addition of all this id checking seems a bit overcomplicated
> 
> You miss the identification of the owner - we want to allow info of the owner for printing and easy debug.
> And it is makes sense to manage the owner uniqueness by unique ID.
> 
I specifically pointed that out above.  There is no reason an ownership record
couldn't be added to the rte_eth_dev structure.
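
As a rough illustration of that alternative, the sketch below keeps a lock and
an ownership record per port and offers the non-blocking tryset, release and
get operations.  Everything here is an assumption made for the example: struct
port, the function names, MAX_PORTS and OWNER_NAME_LEN are hypothetical, a
pthread mutex stands in for a statically initialized rte_spinlock_t, and the
lock guards the record rather than being held for the whole ownership period,
which is only one possible reading of the proposal.

```c
#include <pthread.h>
#include <stdio.h>

#define MAX_PORTS      2    /* hypothetical port count */
#define OWNER_NAME_LEN 64   /* hypothetical record size */

struct port {
	pthread_mutex_t lock;             /* protects the two fields below */
	int owned;
	char owner_name[OWNER_NAME_LEN];  /* record for printing and debug */
};

/* Statically allocated and statically initialized, one lock per port. */
static struct port ports[MAX_PORTS] = {
	{ .lock = PTHREAD_MUTEX_INITIALIZER },
	{ .lock = PTHREAD_MUTEX_INITIALIZER },
};

/* Non-blocking claim: 0 on success, -1 if the port is already owned. */
static int ownership_tryset(unsigned int port, const char *name)
{
	int ret = -1;

	if (port >= MAX_PORTS)
		return -1;
	pthread_mutex_lock(&ports[port].lock);
	if (!ports[port].owned) {
		ports[port].owned = 1;
		snprintf(ports[port].owner_name, OWNER_NAME_LEN, "%s", name);
		ret = 0;
	}
	pthread_mutex_unlock(&ports[port].lock);
	return ret;
}

/* Give the port back so that another entity can claim it. */
static void ownership_release(unsigned int port)
{
	if (port >= MAX_PORTS)
		return;
	pthread_mutex_lock(&ports[port].lock);
	ports[port].owned = 0;
	ports[port].owner_name[0] = '\0';
	pthread_mutex_unlock(&ports[port].lock);
}

/* Copy the ownership record into buf: 0 if owned, -1 otherwise. */
static int ownership_get(unsigned int port, char *buf, size_t len)
{
	int ret = -1;

	if (port >= MAX_PORTS || len == 0)
		return -1;
	pthread_mutex_lock(&ports[port].lock);
	if (ports[port].owned) {
		snprintf(buf, len, "%s", ports[port].owner_name);
		ret = 0;
	}
	pthread_mutex_unlock(&ports[port].lock);
	return ret;
}
```

No owner id allocation or id validation is needed in this shape; the record is
purely informational.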

> The API already discussed a lot in the previous version, Do you really want, now, to open it again?  
>  
What I want is the most useful and elegant ownership API available.  If you
think what you have is that, so be it.  I only bring this up because the amount
of debate you and Konstantin have had over lock safety causes me to wonder if
this isn't an overly complex design.

Neil


> > Neil
> 
>

Matan Azrad Jan. 18, 2018, 2 p.m. UTC | #25

Hi Neil

From: Neil Horman, Thursday, January 18, 2018 3:10 PM
> On Wed, Jan 17, 2018 at 05:01:10PM +0000, Ananyev, Konstantin wrote:
> >
> >
> > > -----Original Message-----
> > > From: Neil Horman [mailto:nhorman@tuxdriver.com]
> > > Sent: Wednesday, January 17, 2018 2:00 PM
> > > To: Matan Azrad <matan@mellanox.com>
> > > Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Thomas
> > > Monjalon <thomas@monjalon.net>; Gaetan Rivet
> > > <gaetan.rivet@6wind.com>; Wu, Jingjing <jingjing.wu@intel.com>;
> > > dev@dpdk.org; Richardson, Bruce <bruce.richardson@intel.com>
> > > Subject: Re: [PATCH v2 2/6] ethdev: add port ownership
> > >
> > > On Wed, Jan 17, 2018 at 12:05:42PM +0000, Matan Azrad wrote:
> > > >
> > > > Hi Konstantin
> > > > From: Ananyev, Konstantin, Sent: Wednesday, January 17, 2018 1:24
> > > > PM
> > > > > Hi Matan,
> > > > >
> > > > > > Hi Konstantin
> > > > > >
> > > > > > From: Ananyev, Konstantin, Tuesday, January 16, 2018 9:11 PM
> > > > > > > Hi Matan,
> > > > > > >
> > > > > > > >
> > > > > > > > Hi Konstantin
> > > > > > > > From: Ananyev, Konstantin, Monday, January 15, 2018 8:44
> > > > > > > > PM
> > > > > > > > > Hi Matan,
> > > > > > > > > > Hi Konstantin
> > > > > > > > > > From: Ananyev, Konstantin, Monday, January 15, 2018
> > > > > > > > > > 1:45 PM
> > > > > > > > > > > Hi Matan,
> > > > > > > > > > > > Hi Konstantin
> > > > > > > > > > > > From: Ananyev, Konstantin, Friday, January 12,
> > > > > > > > > > > > 2018 2:02 AM
> > > > > > > > > > > > > Hi Matan,
> > > > > > > > > > > > > > Hi Konstantin
> > > > > > > > > > > > > > From: Ananyev, Konstantin, Thursday, January
> > > > > > > > > > > > > > 11, 2018
> > > > > > > > > > > > > > 2:40 PM
> > > > > > > > > > > > > > > Hi Matan,
> > > > > > > > > > > > > > > > Hi Konstantin
> > > > > > > > > > > > > > > > From: Ananyev, Konstantin, Wednesday,
> > > > > > > > > > > > > > > > January 10,
> > > > > > > > > > > > > > > > 2018
> > > > > > > > > > > > > > > > 3:36 PM
> > > > > > > > > > > > > > > > > Hi Matan,
> > > > > > > >  <snip>
> > > > > > > > > > > > > > > > > It is good to see that now
> > > > > > > > > > > > > > > > > scanning/updating rte_eth_dev_data[] is
> > > > > > > > > > > > > > > > > lock protected, but it might be not very
> > > > > > > > > > > > > > > > > plausible to protect both data[] and
> > > > > > > > > > > > > > > > > next_owner_id using the
> > > > > > > > > > > same lock.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I guess you mean to the owner structure in
> > > > > > > > > > > rte_eth_dev_data[port_id].
> > > > > > > > > > > > > > > > The next_owner_id is read by ownership
> > > > > > > > > > > > > > > > APIs(for owner validation), so it
> > > > > > > > > > > > > > > makes sense to use the same lock.
> > > > > > > > > > > > > > > > Actually, why not?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Well to me next_owner_id and
> > > > > > > > > > > > > > > rte_eth_dev_data[] are not directly
> > > > > > > > > > > > > related.
> > > > > > > > > > > > > > > You may create new owner_id but it doesn't
> > > > > > > > > > > > > > > mean you would update rte_eth_dev_data[]
> immediately.
> > > > > > > > > > > > > > > And visa-versa - you might just want to
> > > > > > > > > > > > > > > update rte_eth_dev_data[].name or .owner_id.
> > > > > > > > > > > > > > > It is not very good coding practice to use
> > > > > > > > > > > > > > > same lock for non-related data structures.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > I see the relation like next:
> > > > > > > > > > > > > > Since the ownership mechanism synchronization
> > > > > > > > > > > > > > is in ethdev responsibility, we must protect
> > > > > > > > > > > > > > against user mistakes as much as we can by
> > > > > > > > > > > > > using the same lock.
> > > > > > > > > > > > > > So, if user try to set by invalid owner
> > > > > > > > > > > > > > (exactly the ID which currently is
> > > > > > > > > > > > > allocated) we can protect on it.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Hmm, not sure why you can't do same checking
> > > > > > > > > > > > > with different lock or atomic variable?
> > > > > > > > > > > > >
> > > > > > > > > > > > The set ownership API is protected by ownership
> > > > > > > > > > > > lock and checks the owner ID validity By reading the next
> owner ID.
> > > > > > > > > > > > So, the owner ID allocation and set API should use
> > > > > > > > > > > > the same atomic
> > > > > > > > > > > mechanism.
> > > > > > > > > > >
> > > > > > > > > > > Sure but all you are doing for checking validity, is
> > > > > > > > > > > check that owner_id > 0 && owner_id < next_owner_id,
> > > > > > > > > > > right?
> > > > > > > > > > > As you don't allow owner_id overlap (16/3248 bits)
> > > > > > > > > > > you can safely do same check with just
> atomic_get(&next_owner_id).
> > > > > > > > > > >
> > > > > > > > > > It will not protect it, scenario:
> > > > > > > > > > - current next_id is X.
> > > > > > > > > > - call set ownership of port A with owner id X by
> > > > > > > > > > thread 0(by user
> > > > > > > mistake).
> > > > > > > > > > - context switch
> > > > > > > > > > - allocate new id by thread 1 and get X and change
> > > > > > > > > > next_id to
> > > > > > > > > > X+1
> > > > > > > > > atomically.
> > > > > > > > > > -  context switch
> > > > > > > > > > - Thread 0 validate X by atomic_read and succeed to
> > > > > > > > > > take
> > > > > ownership.
> > > > > > > > > > - The system loosed the port(or will be managed by two
> > > > > > > > > > entities) -
> > > > > > > crash.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Ok, and how using lock will protect you with such scenario?
> > > > > > > >
> > > > > > > > The owner set API validation by thread 0 should fail
> > > > > > > > because the owner
> > > > > > > validation is included in the protected section.
> > > > > > >
> > > > > > > Then your validation function would fail even if you'll use
> > > > > > > atomic ops instead of lock.
> > > > > > No.
> > > > > > With atomic this specific scenario will cause the validation to pass.
> > > > >
> > > > > Can you explain to me how?
> > > > >
> > > > > static int
> > > > > rte_eth_is_valid_owner_id(uint16_t owner_id) {
> > > > > 	/* Read the counter once; clamp it to the 16-bit owner id range. */
> > > > > 	int32_t cur_owner_id = RTE_MIN(rte_atomic32_read(&next_owner_id),
> > > > > 				       UINT16_MAX);
> > > > >
> > > > > 	if (owner_id == RTE_ETH_DEV_NO_OWNER || owner_id > cur_owner_id) {
> > > > > 		RTE_LOG(ERR, EAL, "Invalid owner_id=%d.\n", owner_id);
> > > > > 		return 0;
> > > > > 	}
> > > > > 	return 1;
> > > > > }
> > > > >
> > > > > Let say your next_owne_id==X, and you invoke
> > > > > rte_eth_is_valid_owner_id(owner_id=X+1)  - it would fail.
> > > >
> > > > Explanation:
> > > > The scenario with locks:
> > > > next_owner_id = X.
> > > > Thread 0 call to set API(with invalid owner Y=X) and take lock.
> > > > Context switch.
> > > > Thread 1 call to owner_new and stuck in the lock.
> > > > Context switch.
> > > > Thread 0 does owner id validation and failed(Y>=X) - unlock the lock and
> return failure to the user.
> > > > Context switch.
> > > > Thread 1 take the lock and update X to X+1, then, unlock the lock.
> > > > Everything is OK!
> > > >
> > > > The same scenario with atomics:
> > > > next_owner_id = X.
> > > > Thread 0 call to set API(with invalid owner Y=X) and take lock.
> > > > Context switch.
> > > > Thread 1 call to owner_new and change X to X+1(atomically).
> > > > Context switch.
> > > > Thread 0 does owner id validation and success(Y<(atomic)X+1) - unlock
> the lock and return success to the  user.
> > > > Problem!
> > > >
> > >
> > >
> > > Matan is correct here, there is no way to perform parallel set
> > > operations using just an atomic variable here, because multiple
> > > reads of next_owner_id need to be performed while it is stable.
> > > That is to say rte_eth_next_owner_id must be compared to
> > > RTE_ETH_DEV_NO_OWNER and owner_id in rte_eth_is_valid_owner_id.  If
> > > you were to only use an atomic_read on such a variable, it could be
> > > incremented by the owner_new function between the checks and an
> > > invalid owner value could become valid because  a third thread
> > > incremented the next value.  The state of next_owner_id must be kept
> > > stable during any validity checks
> >
> > It could still be incremented between the checks - if let say
> > different thread will invoke new_onwer_id, grab the lock update
> > counter, release the lock - all that before the check.
> I don't see how.  All of the contents of rte_eth_dev_owner_set are protected
> under rte_eth_dev_ownership_lock, as is rte_eth_dev_owner_new.
> next_owner_id might increment between another thread's calls to owner_new
> and owner_set, but that will just cause a transition from an ownership id
> being valid to invalid, and that's ok, as long as there is consistency in the
> model that enforces a single valid owner at a time (in that case the
> subsequent caller to owner_new).
> 

I'm not sure I fully understand you, but see:
we can't protect against every user mistake (such as using the wrong owner id),
but we do as much as we can.


> Though this confusion does underscore my assertion, I think, that this API is
> overly complicated.
> 

I really don't think it is complicated: just take ownership of a port (by the owner id allocation and set APIs) and manage the port as you want.

> Neil

Ananyev, Konstantin Jan. 18, 2018, 2:17 p.m. UTC | #26

Hi Matan,

> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Another thing - you'll probably need
> > > > > > > > > > > > > > > > > > > to
> > > > > grab/release
> > > > > > > > > > > > > > > > > > > a lock inside
> > > > > > > > > > > > > > > > > > > rte_eth_dev_allocated() too.
> > > > > > > > > > > > > > > > > > > It is a public function used by
> > > > > > > > > > > > > > > > > > > drivers, so need to be protected
> > > > > > > > > > > too.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Yes, I thought about it, but decided not
> > > > > > > > > > > > > > > > > > to use lock in
> > > > > > > next:
> > > > > > > > > > > > > > > > > > rte_eth_dev_allocated rte_eth_dev_count
> > > > > > > > > > > > > > > > > > rte_eth_dev_get_name_by_port
> > > > > > > > > > > rte_eth_dev_get_port_by_name
> > > > > > > > > > > > > > > > > > maybe more...
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > As I can see in patch #3 you protect by
> > > > > > > > > > > > > > > > > lock access to rte_eth_dev_data[].name
> > > > > > > > > > > > > > > > > (which seems like a good
> > > > > > > thing).
> > > > > > > > > > > > > > > > > So I think any other public function that
> > > > > > > > > > > > > > > > > access rte_eth_dev_data[].name should be
> > > > > > > > > > > > > > > > > protected by the
> > > > > > > same
> > > > > > > > > lock.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I don't think so, I can understand to use
> > > > > > > > > > > > > > > > the ownership lock here(as in port
> > > > > > > > > > > > > > > creation) but I don't think it is necessary too.
> > > > > > > > > > > > > > > > What are we exactly protecting here?
> > > > > > > > > > > > > > > > Don't you think it is just timing?(ask in
> > > > > > > > > > > > > > > > the next moment and you may get another
> > > > > > > > > > > > > > > > answer) I don't see optional
> > > > > crash.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Not sure what you mean here by timing...
> > > > > > > > > > > > > > > As I understand rte_eth_dev_data[].name unique
> > > > > identifies
> > > > > > > > > > > > > > > device and is used by  port
> > > > > > > > > > > > > > > allocation/release/find
> > > > > functions.
> > > > > > > > > > > > > > > As you stated above:
> > > > > > > > > > > > > > > "1. The port allocation and port release
> > > > > > > > > > > > > > > synchronization will be managed by ethdev."
> > > > > > > > > > > > > > > To me it means that ethdev layer has to make
> > > > > > > > > > > > > > > sure that all accesses to
> > > > > > > > > > > > > > > rte_eth_dev_data[].name are
> > > atomic.
> > > > > > > > > > > > > > > Otherwise what would prevent the situation
> > > > > > > > > > > > > > > when one
> > > > > > > process
> > > > > > > > > > > > > > > does
> > > > > > > > > > > > > > > rte_eth_dev_allocate()-
> > > > > >snprintf(rte_eth_dev_data[x].name,
> > > > > > > > > > > > > > > ...) while second one does
> > > > > > > > > > > > > rte_eth_dev_allocated(rte_eth_dev_data[x].name, ...) ?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > The second will get True or False and that is it.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Under race condition - in the worst case it might
> > > > > > > > > > > > > crash, though for that you'll have to be really unlucky.
> > > > > > > > > > > > > Though in most cases as you said it would just not
> > > > > > > > > > > > > operate
> > > > > > > correctly.
> > > > > > > > > > > > > I think if we start to protect dev->name by lock
> > > > > > > > > > > > > we need to do it for all instances (both read and write).
> > > > > > > > > > > > >
> > > > > > > > > > > > Since under the ownership rules, the user must take
> > > > > > > > > > > > ownership
> > > > > of a
> > > > > > > > > > > > port
> > > > > > > > > > > before using it, I still don't see a problem here.
> > > > > > > > > > >
> > > > > > > > > > > I am not talking about owner id or name here.
> > > > > > > > > > > I am talking about dev->name.
> > > > > > > > > > >
> > > > > > > > > > So? The user still should take ownership of a device
> > > > > > > > > > before using it
> > > > > (by
> > > > > > > > > name or by port id).
> > > > > > > > > > It can just read it without owning it, but no managing it.
> > > > > > > > > >
> > > > > > > > > > > > Please, Can you describe specific crash scenario and
> > > > > > > > > > > > explain how could the
> > > > > > > > > > > locking fix it?
> > > > > > > > > > >
> > > > > > > > > > > Let say thread 0 doing rte_eth_dev_allocate()-
> > > > > > > > > > > >snprintf(rte_eth_dev_data[x].name, ...), thread 1
> > > > > > > > > > > >doing
> > > > > > > > > > > rte_pmd_ring_remove()->rte_eth_dev_allocated()-
> > > >strcmp().
> > > > > > > > > > > And because of race condition -
> > > > > > > > > > > rte_eth_dev_allocated() will
> > > > > return
> > > > > > > > > > > rte_eth_dev * for the wrong device.
> > > > > > > > > > Which wrong device do you mean? I guess it is the device
> > > > > > > > > > which
> > > > > > > currently is
> > > > > > > > > being created by thread 0.
> > > > > > > > > > > Then rte_pmd_ring_remove() will call rte_free() for
> > > > > > > > > > > related resources, while It can still be in use by someone
> > else.
> > > > > > > > > > The rte_pmd_ring_remove caller(some DPDK entity) must
> > > > > > > > > > take
> > > > > > > ownership
> > > > > > > > > > (or validate that he is the owner) of a port before
> > > > > > > > > > doing it(free,
> > > > > > > release), so
> > > > > > > > > no issue here.
> > > > > > > > >
> > > > > > > > > Forget about ownership for a second.
> > > > > > > > > Suppose we have a process it created ring port for itself
> > > > > > > > > (without
> > > > > setting
> > > > > > > any
> > > > > > > > > ownership)  and used it for some time.
> > > > > > > > > Then it decided to remove it, so it calls
> > > > > > > > > rte_pmd_ring_remove()
> > > for it.
> > > > > > > > > At the same time second process decides to call
> > > > > rte_eth_dev_allocate()
> > > > > > > (let
> > > > > > > > > say for anither ring port).
> > > > > > > > > They could collide trying to read (process 0) and modify
> > > > > > > > > (process 1)
> > > > > same
> > > > > > > > > string rte_eth_dev_data[].name.
> > > > > > > > >
> > > > > > > > Do you mean that process 0 will compare successfully the
> > > > > > > > process 1
> > > > > new
> > > > > > > port name?
> > > > > > >
> > > > > > > Yes.
> > > > > > >
> > > > > > > > The state are in local process memory - so process 0 will
> > > > > > > > not compare
> > > > > the
> > > > > > > process 1 port, from its point of view this port is in UNUSED
> > > > > > > > state.
> > > > > > > >
> > > > > > >
> > > > > > > Ok, and why it can't be in attached state in process 0 too?
> > > > > >
> > > > > > Someone in process 0 should attach it using protected
> > > > > > attach_secondary
> > > > > somewhere in your scenario.
> > > > >
> > > > > Yes, process 0 can have this port attached too, why not?
> > > > See the function with inline comments:
> > > >
> > > > struct rte_eth_dev *
> > > > rte_eth_dev_allocated(const char *name) {
> > > > 	unsigned i;
> > > >
> > > > 	for (i = 0; i < RTE_MAX_ETHPORTS; i++) {
> > > >
> > > > 	    	The below state are in local process memory,
> > > > 		So, if here process 1 will allocate a new port (the current i),
> > > update its local state to ATTACHED and write the name,
> > > > 		the state is not visible by process 0 until someone in process
> > > 0 will attach it by rte_eth_dev_attach_secondary.
> > > > 		So, to use rte_eth_dev_attach_secondary process 0 must
> > > take the lock
> > > > and it can't, because it is currently locked by process 1.
> > >
> > > Ok I see.
> > > Thanks for your patience.
> > > BTW, that means that if let say process 0 will call
> > > rte_eth_dev_allocate("xxx") and process 1 will call
> > > rte_eth_dev_allocate("yyy") we can endup with same port_id be used for
> > > different devices and 2 processes will overwrite the same
> > rte_eth_dev_data[port_id]?
> >
> > No, contrary to the state, the lock itself is in shared memory, so 2 processes
> > cannot allocate port in the same time.(you can see it in the next patch of this
> > series).

I am not talking about racing here.
Let's say process 0 calls rte_pmd_ring_probe()->....->rte_eth_dev_allocate("xxx").
rte_eth_dev_allocate() finds that port N is 'free', i.e.
local rte_eth_devices[N].state == RTE_ETH_DEV_UNUSED
so it assigns new dev ("xxx") to port N.
Then after some time process 1 calls rte_pmd_ring_probe()->....->rte_eth_dev_allocate("yyy").
From its perspective port N is still free:  rte_eth_devices[N].state == RTE_ETH_DEV_UNUSED,
so it will assign new dev ("yyy") to the same port.
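
The sequence above can be reproduced with a small single-threaded model.  This
is only a sketch under stated assumptions: struct process, dev_allocate and
shared_name are hypothetical stand-ins for the per-process
rte_eth_devices[].state and the shared rte_eth_dev_data[].name, and no locking
is modelled because the two calls do not overlap in time.

```c
#include <stdio.h>

#define MAX_PORTS 2    /* hypothetical port count */
#define NAME_LEN  32   /* hypothetical name size */

/* Mirrors RTE_ETH_DEV_UNUSED / RTE_ETH_DEV_ATTACHED. */
enum dev_state { DEV_UNUSED = 0, DEV_ATTACHED };

/* Stand-in for rte_eth_dev_data[].name: one copy seen by every process. */
static char shared_name[MAX_PORTS][NAME_LEN];

/* Stand-in for the private rte_eth_devices[].state of one process. */
struct process {
	enum dev_state state[MAX_PORTS];
};

/* Pick the first port that looks free locally and write its shared name. */
static int dev_allocate(struct process *p, const char *name)
{
	for (int i = 0; i < MAX_PORTS; i++) {
		if (p->state[i] == DEV_UNUSED) {
			snprintf(shared_name[i], NAME_LEN, "%s", name);
			p->state[i] = DEV_ATTACHED;
			return i;
		}
	}
	return -1;
}
```

If two zero-initialized struct process values call dev_allocate one after the
other, both get port 0 and the second name replaces the first in shared_name,
because neither process can see the other's state.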
Konstantin
  

> >
> 
> Actually I think only one process(primary) should allocate ports, the others should attach them.
> The race of port allocation is only between the threads of the primary process.
> 
> 
> > > Konstantin
> > >
> > > >
> > > > 		if ((rte_eth_devices[i].state == RTE_ETH_DEV_ATTACHED)
> > > &&
> > > > 		strcmp(rte_eth_devices[i].data->name, name) == 0)
> > > > 			return &rte_eth_devices[i];
> > > > 	}
> > > > 	return NULL;
> > > >
> > > >

Matan Azrad Jan. 18, 2018, 2:26 p.m. UTC | #27

Hi Konstantin

> Hi Matan,
> 
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Another thing - you'll probably
> > > > > > > > > > > > > > > > > > > > need to
> > > > > > grab/release
> > > > > > > > > > > > > > > > > > > > a lock inside
> > > > > > > > > > > > > > > > > > > > rte_eth_dev_allocated() too.
> > > > > > > > > > > > > > > > > > > > It is a public function used by
> > > > > > > > > > > > > > > > > > > > drivers, so need to be protected
> > > > > > > > > > > > too.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > Yes, I thought about it, but decided
> > > > > > > > > > > > > > > > > > > not to use lock in
> > > > > > > > next:
> > > > > > > > > > > > > > > > > > > rte_eth_dev_allocated
> > > > > > > > > > > > > > > > > > > rte_eth_dev_count
> > > > > > > > > > > > > > > > > > > rte_eth_dev_get_name_by_port
> > > > > > > > > > > > rte_eth_dev_get_port_by_name
> > > > > > > > > > > > > > > > > > > maybe more...
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > As I can see in patch #3 you protect
> > > > > > > > > > > > > > > > > > by lock access to
> > > > > > > > > > > > > > > > > > rte_eth_dev_data[].name (which seems
> > > > > > > > > > > > > > > > > > like a good
> > > > > > > > thing).
> > > > > > > > > > > > > > > > > > So I think any other public function
> > > > > > > > > > > > > > > > > > that access rte_eth_dev_data[].name
> > > > > > > > > > > > > > > > > > should be protected by the
> > > > > > > > same
> > > > > > > > > > lock.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I don't think so, I can understand to
> > > > > > > > > > > > > > > > > use the ownership lock here(as in port
> > > > > > > > > > > > > > > > creation) but I don't think it is necessary too.
> > > > > > > > > > > > > > > > > What are we exactly protecting here?
> > > > > > > > > > > > > > > > > Don't you think it is just timing?(ask
> > > > > > > > > > > > > > > > > in the next moment and you may get
> > > > > > > > > > > > > > > > > another
> > > > > > > > > > > > > > > > > answer) I don't see optional
> > > > > > crash.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Not sure what you mean here by timing...
> > > > > > > > > > > > > > > > As I understand rte_eth_dev_data[].name
> > > > > > > > > > > > > > > > unique
> > > > > > identifies
> > > > > > > > > > > > > > > > device and is used by  port
> > > > > > > > > > > > > > > > allocation/release/find
> > > > > > functions.
> > > > > > > > > > > > > > > > As you stated above:
> > > > > > > > > > > > > > > > "1. The port allocation and port release
> > > > > > > > > > > > > > > > synchronization will be managed by ethdev."
> > > > > > > > > > > > > > > > To me it means that ethdev layer has to
> > > > > > > > > > > > > > > > make sure that all accesses to
> > > > > > > > > > > > > > > > rte_eth_dev_data[].name are
> > > > atomic.
> > > > > > > > > > > > > > > > Otherwise what would prevent the situation
> > > > > > > > > > > > > > > > when one
> > > > > > > > process
> > > > > > > > > > > > > > > > does
> > > > > > > > > > > > > > > > rte_eth_dev_allocate()-
> > > > > > >snprintf(rte_eth_dev_data[x].name,
> > > > > > > > > > > > > > > > ...) while second one does
> > > > > > > > > > > > > > rte_eth_dev_allocated(rte_eth_dev_data[x].name,
> ...) ?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The second will get True or False and that is it.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Under race condition - in the worst case it
> > > > > > > > > > > > > > might crash, though for that you'll have to be really
> unlucky.
> > > > > > > > > > > > > > Though in most cases as you said it would just
> > > > > > > > > > > > > > not operate
> > > > > > > > correctly.
> > > > > > > > > > > > > > I think if we start to protect dev->name by
> > > > > > > > > > > > > > lock we need to do it for all instances (both read and
> write).
> > > > > > > > > > > > > >
> > > > > > > > > > > > > Since under the ownership rules, the user must
> > > > > > > > > > > > > take ownership
> > > > > > of a
> > > > > > > > > > > > > port
> > > > > > > > > > > > before using it, I still don't see a problem here.
> > > > > > > > > > > >
> > > > > > > > > > > > I am not talking about owner id or name here.
> > > > > > > > > > > > I am talking about dev->name.
> > > > > > > > > > > >
> > > > > > > > > > > So? The user still should take ownership of a device
> > > > > > > > > > > before using it
> > > > > > (by
> > > > > > > > > > name or by port id).
> > > > > > > > > > > It can just read it without owning it, but no managing it.
> > > > > > > > > > >
> > > > > > > > > > > > > Please, Can you describe specific crash scenario
> > > > > > > > > > > > > and explain how could the
> > > > > > > > > > > > locking fix it?
> > > > > > > > > > > >
> > > > > > > > > > > > Let say thread 0 doing rte_eth_dev_allocate()-
> > > > > > > > > > > > >snprintf(rte_eth_dev_data[x].name, ...), thread 1
> > > > > > > > > > > > >doing
> > > > > > > > > > > > rte_pmd_ring_remove()->rte_eth_dev_allocated()-
> > > > >strcmp().
> > > > > > > > > > > > And because of race condition -
> > > > > > > > > > > > rte_eth_dev_allocated() will
> > > > > > return
> > > > > > > > > > > > rte_eth_dev * for the wrong device.
> > > > > > > > > > > Which wrong device do you mean? I guess it is the
> > > > > > > > > > > device which
> > > > > > > > currently is
> > > > > > > > > > being created by thread 0.
> > > > > > > > > > > > Then rte_pmd_ring_remove() will call rte_free()
> > > > > > > > > > > > for related resources, while It can still be in
> > > > > > > > > > > > use by someone
> > > else.
> > > > > > > > > > > The rte_pmd_ring_remove caller(some DPDK entity)
> > > > > > > > > > > must take
> > > > > > > > ownership
> > > > > > > > > > > (or validate that he is the owner) of a port before
> > > > > > > > > > > doing it(free,
> > > > > > > > release), so
> > > > > > > > > > no issue here.
> > > > > > > > > >
> > > > > > > > > > Forget about ownership for a second.
> > > > > > > > > > Suppose we have a process it created ring port for
> > > > > > > > > > itself (without
> > > > > > setting
> > > > > > > > any
> > > > > > > > > > ownership)  and used it for some time.
> > > > > > > > > > Then it decided to remove it, so it calls
> > > > > > > > > > rte_pmd_ring_remove()
> > > > for it.
> > > > > > > > > > At the same time second process decides to call
> > > > > > rte_eth_dev_allocate()
> > > > > > > > (let
> > > > > > > > > > say for anither ring port).
> > > > > > > > > > They could collide trying to read (process 0) and
> > > > > > > > > > modify (process 1)
> > > > > > same
> > > > > > > > > > string rte_eth_dev_data[].name.
> > > > > > > > > >
> > > > > > > > > Do you mean that process 0 will compare successfully the
> > > > > > > > > process 1
> > > > > > new
> > > > > > > > port name?
> > > > > > > >
> > > > > > > > Yes.
> > > > > > > >
> > > > > > > > > The state are in local process memory - so process 0
> > > > > > > > > will not compare
> > > > > > the
> > > > > > > > process 1 port, from its point of view this port is in
> > > > > > > > UNUSED
> > > > > > > > > state.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Ok, and why it can't be in attached state in process 0 too?
> > > > > > >
> > > > > > > Someone in process 0 should attach it using protected
> > > > > > > attach_secondary
> > > > > > somewhere in your scenario.
> > > > > >
> > > > > > Yes, process 0 can have this port attached too, why not?
> > > > > See the function with inline comments:
> > > > >
> > > > > struct rte_eth_dev *
> > > > > rte_eth_dev_allocated(const char *name) {
> > > > > 	unsigned i;
> > > > >
> > > > > 	for (i = 0; i < RTE_MAX_ETHPORTS; i++) {
> > > > >
> > > > > 	    	The below state are in local process memory,
> > > > > 		So, if here process 1 will allocate a new port (the current
> > > > > i),
> > > > update its local state to ATTACHED and write the name,
> > > > > 		the state is not visible by process 0 until someone in process
> > > > 0 will attach it by rte_eth_dev_attach_secondary.
> > > > > 		So, to use rte_eth_dev_attach_secondary process 0 must
> > > > take the lock
> > > > > and it can't, because it is currently locked by process 1.
> > > >
> > > > Ok I see.
> > > > Thanks for your patience.
> > > > BTW, that means that if let say process 0 will call
> > > > rte_eth_dev_allocate("xxx") and process 1 will call
> > > > rte_eth_dev_allocate("yyy") we can endup with same port_id be used
> > > > for different devices and 2 processes will overwrite the same
> > > rte_eth_dev_data[port_id]?
> > >
> > > No, contrary to the state, the lock itself is in shared memory, so 2
> > > processes cannot allocate port in the same time.(you can see it in
> > > the next patch of this series).
> 
> I am not talking about racing here.
> Let say process 0 calls rte_pmd_ring_probe()->....-
> >rte_eth_dev_allocate("xxx")
> rte_eth_dev_allocate() finds that port N is 'free', i.e.
> local rte_eth_devices[N].state == RTE_ETH_DEV_UNUSED so it assigns new
> dev ("xxx") to port N.
> Then after some time process 1 calls rte_pmd_ring_probe()->....-
> >rte_eth_dev_allocate("yyy").
> From its perspective port N is still free:  rte_eth_devices[N].state ==
> RTE_ETH_DEV_UNUSED, so it will assign new dev ("yyy") to the same port.
> 

Yes, you are right, this is a problem (not actually related to port ownership), but look:
As I understand it, secondary processes are not allowed to create ports and must use the attach_secondary API, but there is no hardcoded check that prevents them from doing so.


> Konstantin
> 
> 
> > >
> >
> > Actually I think only one process(primary) should allocate ports, the others
> should attach them.
> > The race of port allocation is only between the threads of the primary
> process.
> >
> >
> > > > Konstantin
> > > >
> > > > >
> > > > > 		if ((rte_eth_devices[i].state == RTE_ETH_DEV_ATTACHED)
> > > > &&
> > > > > 		strcmp(rte_eth_devices[i].data->name, name) == 0)
> > > > > 			return &rte_eth_devices[i];
> > > > > 	}
> > > > > 	return NULL;
> > > > >
> > > > >

Ananyev, Konstantin Jan. 18, 2018, 2:41 p.m. UTC | #28

> 
> Hi Konstantine
> 
> > Hi Matan,
> >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Another thing - you'll probably
> > > > > > > > > > > > > > > > > > > > > need to
> > > > > > > grab/release
> > > > > > > > > > > > > > > > > > > > > a lock inside
> > > > > > > > > > > > > > > > > > > > > rte_eth_dev_allocated() too.
> > > > > > > > > > > > > > > > > > > > > It is a public function used by
> > > > > > > > > > > > > > > > > > > > > drivers, so need to be protected
> > > > > > > > > > > > > too.
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > Yes, I thought about it, but decided
> > > > > > > > > > > > > > > > > > > > not to use lock in
> > > > > > > > > next:
> > > > > > > > > > > > > > > > > > > > rte_eth_dev_allocated
> > > > > > > > > > > > > > > > > > > > rte_eth_dev_count
> > > > > > > > > > > > > > > > > > > > rte_eth_dev_get_name_by_port
> > > > > > > > > > > > > rte_eth_dev_get_port_by_name
> > > > > > > > > > > > > > > > > > > > maybe more...
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > As I can see in patch #3 you protect
> > > > > > > > > > > > > > > > > > > by lock access to
> > > > > > > > > > > > > > > > > > > rte_eth_dev_data[].name (which seems
> > > > > > > > > > > > > > > > > > > like a good
> > > > > > > > > thing).
> > > > > > > > > > > > > > > > > > > So I think any other public function
> > > > > > > > > > > > > > > > > > > that access rte_eth_dev_data[].name
> > > > > > > > > > > > > > > > > > > should be protected by the
> > > > > > > > > same
> > > > > > > > > > > lock.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I don't think so, I can understand to
> > > > > > > > > > > > > > > > > > use the ownership lock here(as in port
> > > > > > > > > > > > > > > > > creation) but I don't think it is necessary too.
> > > > > > > > > > > > > > > > > > What are we exactly protecting here?
> > > > > > > > > > > > > > > > > > Don't you think it is just timing?(ask
> > > > > > > > > > > > > > > > > > in the next moment and you may get
> > > > > > > > > > > > > > > > > > another
> > > > > > > > > > > > > > > > > > answer) I don't see optional
> > > > > > > crash.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Not sure what you mean here by timing...
> > > > > > > > > > > > > > > > > As I understand rte_eth_dev_data[].name
> > > > > > > > > > > > > > > > > unique
> > > > > > > identifies
> > > > > > > > > > > > > > > > > device and is used by  port
> > > > > > > > > > > > > > > > > allocation/release/find
> > > > > > > functions.
> > > > > > > > > > > > > > > > > As you stated above:
> > > > > > > > > > > > > > > > > "1. The port allocation and port release
> > > > > > > > > > > > > > > > > synchronization will be managed by ethdev."
> > > > > > > > > > > > > > > > > To me it means that ethdev layer has to
> > > > > > > > > > > > > > > > > make sure that all accesses to
> > > > > > > > > > > > > > > > > rte_eth_dev_data[].name are
> > > > > atomic.
> > > > > > > > > > > > > > > > > Otherwise what would prevent the situation
> > > > > > > > > > > > > > > > > when one
> > > > > > > > > process
> > > > > > > > > > > > > > > > > does
> > > > > > > > > > > > > > > > > rte_eth_dev_allocate()-
> > > > > > > >snprintf(rte_eth_dev_data[x].name,
> > > > > > > > > > > > > > > > > ...) while second one does
> > > > > > > > > > > > > > > rte_eth_dev_allocated(rte_eth_dev_data[x].name,
> > ...) ?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > The second will get True or False and that is it.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Under race condition - in the worst case it
> > > > > > > > > > > > > > > might crash, though for that you'll have to be really
> > unlucky.
> > > > > > > > > > > > > > > Though in most cases as you said it would just
> > > > > > > > > > > > > > > not operate
> > > > > > > > > correctly.
> > > > > > > > > > > > > > > I think if we start to protect dev->name by
> > > > > > > > > > > > > > > lock we need to do it for all instances (both read and
> > write).
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > Since under the ownership rules, the user must
> > > > > > > > > > > > > > take ownership
> > > > > > > of a
> > > > > > > > > > > > > > port
> > > > > > > > > > > > > before using it, I still don't see a problem here.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I am not talking about owner id or name here.
> > > > > > > > > > > > > I am talking about dev->name.
> > > > > > > > > > > > >
> > > > > > > > > > > > So? The user still should take ownership of a device
> > > > > > > > > > > > before using it
> > > > > > > (by
> > > > > > > > > > > name or by port id).
> > > > > > > > > > > > It can just read it without owning it, but no managing it.
> > > > > > > > > > > >
> > > > > > > > > > > > > > Please, Can you describe specific crash scenario
> > > > > > > > > > > > > > and explain how could the
> > > > > > > > > > > > > locking fix it?
> > > > > > > > > > > > >
> > > > > > > > > > > > > Let say thread 0 doing rte_eth_dev_allocate()-
> > > > > > > > > > > > > >snprintf(rte_eth_dev_data[x].name, ...), thread 1
> > > > > > > > > > > > > >doing
> > > > > > > > > > > > > rte_pmd_ring_remove()->rte_eth_dev_allocated()-
> > > > > >strcmp().
> > > > > > > > > > > > > And because of race condition -
> > > > > > > > > > > > > rte_eth_dev_allocated() will
> > > > > > > return
> > > > > > > > > > > > > rte_eth_dev * for the wrong device.
> > > > > > > > > > > > Which wrong device do you mean? I guess it is the
> > > > > > > > > > > > device which
> > > > > > > > > currently is
> > > > > > > > > > > being created by thread 0.
> > > > > > > > > > > > > Then rte_pmd_ring_remove() will call rte_free()
> > > > > > > > > > > > > for related resources, while It can still be in
> > > > > > > > > > > > > use by someone
> > > > else.
> > > > > > > > > > > > The rte_pmd_ring_remove caller(some DPDK entity)
> > > > > > > > > > > > must take
> > > > > > > > > ownership
> > > > > > > > > > > > (or validate that he is the owner) of a port before
> > > > > > > > > > > > doing it(free,
> > > > > > > > > release), so
> > > > > > > > > > > no issue here.
> > > > > > > > > > >
> > > > > > > > > > > Forget about ownership for a second.
> > > > > > > > > > > Suppose we have a process it created ring port for
> > > > > > > > > > > itself (without
> > > > > > > setting
> > > > > > > > > any
> > > > > > > > > > > ownership)  and used it for some time.
> > > > > > > > > > > Then it decided to remove it, so it calls
> > > > > > > > > > > rte_pmd_ring_remove()
> > > > > for it.
> > > > > > > > > > > At the same time second process decides to call
> > > > > > > rte_eth_dev_allocate()
> > > > > > > > > (let
> > > > > > > > > > > say for anither ring port).
> > > > > > > > > > > They could collide trying to read (process 0) and
> > > > > > > > > > > modify (process 1)
> > > > > > > same
> > > > > > > > > > > string rte_eth_dev_data[].name.
> > > > > > > > > > >
> > > > > > > > > > Do you mean that process 0 will compare successfully the
> > > > > > > > > > process 1
> > > > > > > new
> > > > > > > > > port name?
> > > > > > > > >
> > > > > > > > > Yes.
> > > > > > > > >
> > > > > > > > > > The state are in local process memory - so process 0
> > > > > > > > > > will not compare
> > > > > > > the
> > > > > > > > > process 1 port, from its point of view this port is in
> > > > > > > > > UNUSED
> > > > > > > > > > state.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Ok, and why it can't be in attached state in process 0 too?
> > > > > > > >
> > > > > > > > Someone in process 0 should attach it using protected
> > > > > > > > attach_secondary
> > > > > > > somewhere in your scenario.
> > > > > > >
> > > > > > > Yes, process 0 can have this port attached too, why not?
> > > > > > See the function with inline comments:
> > > > > >
> > > > > > struct rte_eth_dev *
> > > > > > rte_eth_dev_allocated(const char *name) {
> > > > > > 	unsigned i;
> > > > > >
> > > > > > 	for (i = 0; i < RTE_MAX_ETHPORTS; i++) {
> > > > > >
> > > > > > 	    	The below state are in local process memory,
> > > > > > 		So, if here process 1 will allocate a new port (the current
> > > > > > i),
> > > > > update its local state to ATTACHED and write the name,
> > > > > > 		the state is not visible by process 0 until someone in process
> > > > > 0 will attach it by rte_eth_dev_attach_secondary.
> > > > > > 		So, to use rte_eth_dev_attach_secondary process 0 must
> > > > > take the lock
> > > > > > and it can't, because it is currently locked by process 1.
> > > > >
> > > > > Ok I see.
> > > > > Thanks for your patience.
> > > > > BTW, that means that if let say process 0 will call
> > > > > rte_eth_dev_allocate("xxx") and process 1 will call
> > > > > rte_eth_dev_allocate("yyy") we can endup with same port_id be used
> > > > > for different devices and 2 processes will overwrite the same
> > > > rte_eth_dev_data[port_id]?
> > > >
> > > > No, contrary to the state, the lock itself is in shared memory, so 2
> > > > processes cannot allocate port in the same time.(you can see it in
> > > > the next patch of this series).
> >
> > I am not talking about racing here.
> > Let say process 0 calls rte_pmd_ring_probe()->....-
> > >rte_eth_dev_allocate("xxx")
> > rte_eth_dev_allocate() finds that port N is 'free', i.e.
> > local rte_eth_devices[N].state == RTE_ETH_DEV_UNUSED so it assigns new
> > dev ("xxx") to port N.
> > Then after some time process 1 calls rte_pmd_ring_probe()->....-
> > >rte_eth_dev_allocate("yyy").
> > From its perspective port N is still free:  rte_eth_devices[N].state ==
> > RTE_ETH_DEV_UNUSED, so it will assign new dev ("yyy") to the same port.
> >
> 
> Yes you right, this is a problem(not related actually to port ownership)

Yep that's true - it was there before your patches.

> but look:
> As I understand the secondary processes are not allowed to create a ports and they must to use attach_secondary API, but there is not
> hardcoded check which prevent them to do it.

Secondary processes have the ability to allocate their own vdevs, and it should probably stay like that.
I just thought it was a good opportunity to fix it while you are working on these changes anyway,
but OK, we can leave it for now.
 
Konstantin

Matan Azrad Jan. 18, 2018, 2:45 p.m. UTC | #29

Hi

From: Ananyev, Konstantin, Thursday, January 18, 2018 4:42 PM
> > Hi Konstantine
> >
> > > Hi Matan,
> > >
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > > Another thing - you'll
> > > > > > > > > > > > > > > > > > > > > > probably need to
> > > > > > > > grab/release
> > > > > > > > > > > > > > > > > > > > > > a lock inside
> > > > > > > > > > > > > > > > > > > > > > rte_eth_dev_allocated() too.
> > > > > > > > > > > > > > > > > > > > > > It is a public function used
> > > > > > > > > > > > > > > > > > > > > > by drivers, so need to be
> > > > > > > > > > > > > > > > > > > > > > protected
> > > > > > > > > > > > > > too.
> > > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > > Yes, I thought about it, but
> > > > > > > > > > > > > > > > > > > > > decided not to use lock in
> > > > > > > > > > next:
> > > > > > > > > > > > > > > > > > > > > rte_eth_dev_allocated
> > > > > > > > > > > > > > > > > > > > > rte_eth_dev_count
> > > > > > > > > > > > > > > > > > > > > rte_eth_dev_get_name_by_port
> > > > > > > > > > > > > > rte_eth_dev_get_port_by_name
> > > > > > > > > > > > > > > > > > > > > maybe more...
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > > As I can see in patch #3 you
> > > > > > > > > > > > > > > > > > > > protect by lock access to
> > > > > > > > > > > > > > > > > > > > rte_eth_dev_data[].name (which
> > > > > > > > > > > > > > > > > > > > seems like a good
> > > > > > > > > > thing).
> > > > > > > > > > > > > > > > > > > > So I think any other public
> > > > > > > > > > > > > > > > > > > > function that access
> > > > > > > > > > > > > > > > > > > > rte_eth_dev_data[].name should be
> > > > > > > > > > > > > > > > > > > > protected by the
> > > > > > > > > > same
> > > > > > > > > > > > lock.
> > > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > I don't think so, I can understand
> > > > > > > > > > > > > > > > > > > to use the ownership lock here(as in
> > > > > > > > > > > > > > > > > > > port
> > > > > > > > > > > > > > > > > > creation) but I don't think it is necessary too.
> > > > > > > > > > > > > > > > > > > What are we exactly protecting here?
> > > > > > > > > > > > > > > > > > > Don't you think it is just
> > > > > > > > > > > > > > > > > > > timing?(ask in the next moment and
> > > > > > > > > > > > > > > > > > > you may get another
> > > > > > > > > > > > > > > > > > > answer) I don't see optional
> > > > > > > > crash.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Not sure what you mean here by timing...
> > > > > > > > > > > > > > > > > > As I understand
> > > > > > > > > > > > > > > > > > rte_eth_dev_data[].name unique
> > > > > > > > identifies
> > > > > > > > > > > > > > > > > > device and is used by  port
> > > > > > > > > > > > > > > > > > allocation/release/find
> > > > > > > > functions.
> > > > > > > > > > > > > > > > > > As you stated above:
> > > > > > > > > > > > > > > > > > "1. The port allocation and port
> > > > > > > > > > > > > > > > > > release synchronization will be managed by
> ethdev."
> > > > > > > > > > > > > > > > > > To me it means that ethdev layer has
> > > > > > > > > > > > > > > > > > to make sure that all accesses to
> > > > > > > > > > > > > > > > > > rte_eth_dev_data[].name are
> > > > > > atomic.
> > > > > > > > > > > > > > > > > > Otherwise what would prevent the
> > > > > > > > > > > > > > > > > > situation when one
> > > > > > > > > > process
> > > > > > > > > > > > > > > > > > does
> > > > > > > > > > > > > > > > > > rte_eth_dev_allocate()-
> > > > > > > > >snprintf(rte_eth_dev_data[x].name,
> > > > > > > > > > > > > > > > > > ...) while second one does
> > > > > > > > > > > > > > > > rte_eth_dev_allocated(rte_eth_dev_data[x].
> > > > > > > > > > > > > > > > name,
> > > ...) ?
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > The second will get True or False and that is it.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Under race condition - in the worst case
> > > > > > > > > > > > > > > > it might crash, though for that you'll
> > > > > > > > > > > > > > > > have to be really
> > > unlucky.
> > > > > > > > > > > > > > > > Though in most cases as you said it would
> > > > > > > > > > > > > > > > just not operate
> > > > > > > > > > correctly.
> > > > > > > > > > > > > > > > I think if we start to protect dev->name
> > > > > > > > > > > > > > > > by lock we need to do it for all instances
> > > > > > > > > > > > > > > > (both read and
> > > write).
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Since under the ownership rules, the user
> > > > > > > > > > > > > > > must take ownership
> > > > > > > > of a
> > > > > > > > > > > > > > > port
> > > > > > > > > > > > > > before using it, I still don't see a problem here.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I am not talking about owner id or name here.
> > > > > > > > > > > > > > I am talking about dev->name.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > So? The user still should take ownership of a
> > > > > > > > > > > > > device before using it
> > > > > > > > (by
> > > > > > > > > > > > name or by port id).
> > > > > > > > > > > > > It can just read it without owning it, but no managing it.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > > Please, Can you describe specific crash
> > > > > > > > > > > > > > > scenario and explain how could the
> > > > > > > > > > > > > > locking fix it?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Let say thread 0 doing rte_eth_dev_allocate()-
> > > > > > > > > > > > > > >snprintf(rte_eth_dev_data[x].name, ...),
> > > > > > > > > > > > > > >thread 1 doing
> > > > > > > > > > > > > > rte_pmd_ring_remove()->rte_eth_dev_allocated()
> > > > > > > > > > > > > > -
> > > > > > >strcmp().
> > > > > > > > > > > > > > And because of race condition -
> > > > > > > > > > > > > > rte_eth_dev_allocated() will
> > > > > > > > return
> > > > > > > > > > > > > > rte_eth_dev * for the wrong device.
> > > > > > > > > > > > > Which wrong device do you mean? I guess it is
> > > > > > > > > > > > > the device which
> > > > > > > > > > currently is
> > > > > > > > > > > > being created by thread 0.
> > > > > > > > > > > > > > Then rte_pmd_ring_remove() will call
> > > > > > > > > > > > > > rte_free() for related resources, while It can
> > > > > > > > > > > > > > still be in use by someone
> > > > > else.
> > > > > > > > > > > > > The rte_pmd_ring_remove caller(some DPDK entity)
> > > > > > > > > > > > > must take
> > > > > > > > > > ownership
> > > > > > > > > > > > > (or validate that he is the owner) of a port
> > > > > > > > > > > > > before doing it(free,
> > > > > > > > > > release), so
> > > > > > > > > > > > no issue here.
> > > > > > > > > > > >
> > > > > > > > > > > > Forget about ownership for a second.
> > > > > > > > > > > > Suppose we have a process it created ring port for
> > > > > > > > > > > > itself (without
> > > > > > > > setting
> > > > > > > > > > any
> > > > > > > > > > > > ownership)  and used it for some time.
> > > > > > > > > > > > Then it decided to remove it, so it calls
> > > > > > > > > > > > rte_pmd_ring_remove()
> > > > > > for it.
> > > > > > > > > > > > At the same time second process decides to call
> > > > > > > > rte_eth_dev_allocate()
> > > > > > > > > > (let
> > > > > > > > > > > > say for anither ring port).
> > > > > > > > > > > > They could collide trying to read (process 0) and
> > > > > > > > > > > > modify (process 1)
> > > > > > > > same
> > > > > > > > > > > > string rte_eth_dev_data[].name.
> > > > > > > > > > > >
> > > > > > > > > > > Do you mean that process 0 will compare successfully
> > > > > > > > > > > the process 1
> > > > > > > > new
> > > > > > > > > > port name?
> > > > > > > > > >
> > > > > > > > > > Yes.
> > > > > > > > > >
> > > > > > > > > > > The state are in local process memory - so process 0
> > > > > > > > > > > will not compare
> > > > > > > > the
> > > > > > > > > > process 1 port, from its point of view this port is in
> > > > > > > > > > UNUSED
> > > > > > > > > > > state.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Ok, and why it can't be in attached state in process 0 too?
> > > > > > > > >
> > > > > > > > > Someone in process 0 should attach it using protected
> > > > > > > > > attach_secondary
> > > > > > > > somewhere in your scenario.
> > > > > > > >
> > > > > > > > Yes, process 0 can have this port attached too, why not?
> > > > > > > See the function with inline comments:
> > > > > > >
> > > > > > > struct rte_eth_dev *
> > > > > > > rte_eth_dev_allocated(const char *name) {
> > > > > > > 	unsigned i;
> > > > > > >
> > > > > > > 	for (i = 0; i < RTE_MAX_ETHPORTS; i++) {
> > > > > > >
> > > > > > > 	    	The below state are in local process memory,
> > > > > > > 		So, if here process 1 will allocate a new port (the
> > > > > > > current i),
> > > > > > update its local state to ATTACHED and write the name,
> > > > > > > 		the state is not visible by process 0 until someone in
> > > > > > > process
> > > > > > 0 will attach it by rte_eth_dev_attach_secondary.
> > > > > > > 		So, to use rte_eth_dev_attach_secondary process 0
> must
> > > > > > take the lock
> > > > > > > and it can't, because it is currently locked by process 1.
> > > > > >
> > > > > > Ok I see.
> > > > > > Thanks for your patience.
> > > > > > BTW, that means that if let say process 0 will call
> > > > > > rte_eth_dev_allocate("xxx") and process 1 will call
> > > > > > rte_eth_dev_allocate("yyy") we can endup with same port_id be
> > > > > > used for different devices and 2 processes will overwrite the
> > > > > > same
> > > > > rte_eth_dev_data[port_id]?
> > > > >
> > > > > No, contrary to the state, the lock itself is in shared memory,
> > > > > so 2 processes cannot allocate port in the same time.(you can
> > > > > see it in the next patch of this series).
> > >
> > > I am not talking about racing here.
> > > Let say process 0 calls rte_pmd_ring_probe()->....-
> > > >rte_eth_dev_allocate("xxx")
> > > rte_eth_dev_allocate() finds that port N is 'free', i.e.
> > > local rte_eth_devices[N].state == RTE_ETH_DEV_UNUSED so it assigns
> > > new dev ("xxx") to port N.
> > > Then after some time process 1 calls rte_pmd_ring_probe()->....-
> > > >rte_eth_dev_allocate("yyy").
> > > From its perspective port N is still free:  rte_eth_devices[N].state
> > > == RTE_ETH_DEV_UNUSED, so it will assign new dev ("yyy") to the same
> port.
> > >
> >
> > Yes you right, this is a problem(not related actually to port
> > ownership)
> 
> Yep that's true - it was there before your patches.
> 
> > but look:
> > As I understand the secondary processes are not allowed to create a
> > ports and they must to use attach_secondary API, but there is not
> hardcoded check which prevent them to do it.
> 
> Secondary processes have the ability to allocate their own vdevs and probably it
> should stay like that.
> I just thought it is a good opportunity to fix it while you are on these changes
> anyway, but ok we can leave it for now.
> 
It looks like the fix would break the ABI (it means moving the state to shared memory), so let's try to fix it in the next version :)
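
A rough sketch of what moving the state to shared memory could look like, under stated assumptions: the state is stored next to the name in a single table that every process consults, so a port taken by one process no longer looks free to another. struct shared_port, shared_ports and toy_allocate_shared() are hypothetical names, not the actual ethdev layout, and the shared lock discussed earlier in this thread would still have to be held around the scan (it is omitted here for brevity).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_PORTS 4
#define NAME_LEN 32

enum dev_state { DEV_UNUSED = 0, DEV_ATTACHED };

/* Hypothetical shared-memory entry: the state lives next to the name. */
struct shared_port {
	enum dev_state state;
	char name[NAME_LEN];
};

/* One table for all processes; zero-initialized, so every port is UNUSED. */
static struct shared_port shared_ports[MAX_PORTS];

/*
 * Pick the first port whose shared state is UNUSED and claim it.
 * Returns the port index, or -1 if every port is taken.
 */
static int toy_allocate_shared(const char *name)
{
	int i;

	assert(name != NULL);
	for (i = 0; i < MAX_PORTS; i++) {
		if (shared_ports[i].state == DEV_UNUSED) {
			shared_ports[i].state = DEV_ATTACHED;
			snprintf(shared_ports[i].name, NAME_LEN, "%s", name);
			return i;
		}
	}
	return -1;
}
```

Here two successive calls with "xxx" and "yyy" get ports 0 and 1, whichever process makes them, because both see the same state. Changing where the state is stored is what would alter the layout visible to applications, hence the ABI concern.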

> Konstantin

Ananyev, Konstantin Jan. 18, 2018, 2:51 p.m. UTC | #30

> -----Original Message-----
> From: Matan Azrad [mailto:matan@mellanox.com]
> Sent: Thursday, January 18, 2018 2:45 PM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Thomas Monjalon <thomas@monjalon.net>; Gaetan Rivet
> <gaetan.rivet@6wind.com>; Wu, Jingjing <jingjing.wu@intel.com>
> Cc: dev@dpdk.org; Neil Horman <nhorman@tuxdriver.com>; Richardson, Bruce <bruce.richardson@intel.com>
> Subject: RE: [PATCH v2 2/6] ethdev: add port ownership
> 
> HI
> 
> From: Ananyev, Konstantin, Thursday, January 18, 2018 4:42 PM
> > > Hi Konstantine
> > >
> > > > Hi Matan,
> > > >
<snip>
> > > > > > > BTW, that means that if, let's say, process 0 calls
> > > > > > > rte_eth_dev_allocate("xxx") and process 1 calls
> > > > > > > rte_eth_dev_allocate("yyy"), we can end up with the same port_id
> > > > > > > being used for different devices, and the 2 processes will
> > > > > > > overwrite the same rte_eth_dev_data[port_id]?
> > > > > >
> > > > > > No, contrary to the state, the lock itself is in shared memory,
> > > > > > so 2 processes cannot allocate a port at the same time (you can
> > > > > > see it in the next patch of this series).
> > > >
> > > > I am not talking about racing here.
> > > > Let's say process 0 calls
> > > > rte_pmd_ring_probe()->....->rte_eth_dev_allocate("xxx").
> > > > rte_eth_dev_allocate() finds that port N is 'free', i.e.
> > > > local rte_eth_devices[N].state == RTE_ETH_DEV_UNUSED, so it assigns
> > > > new dev ("xxx") to port N.
> > > > Then after some time process 1 calls
> > > > rte_pmd_ring_probe()->....->rte_eth_dev_allocate("yyy").
> > > > From its perspective port N is still free: rte_eth_devices[N].state
> > > > == RTE_ETH_DEV_UNUSED, so it will assign new dev ("yyy") to the same
> > > > port.
> > > >
> > >
> > > Yes, you are right, this is a problem (not actually related to port
> > > ownership)
> >
> > Yep that's true - it was there before your patches.
> >
> > > but look:
> > > As I understand it, secondary processes are not allowed to create
> > > ports and must use the attach_secondary API, but there is no
> > > hardcoded check which prevents them from doing it.
> >
> > Secondary processes have the ability to allocate their own vdevs and probably it
> > should stay like that.
> > I just thought it is a good opportunity to fix it while you are on these changes
> > anyway, but ok we can leave it for now.
> >
> It looks like the fix would break the ABI (moving the state to shared memory), so let's try to fix it in the next version :)

Not necessarily - I think we can just add a check inside rte_eth_dev_find_free_port() that
rte_eth_dev_data[port_id].name is an empty string.
Konstantin


> 
> > Konstantin
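
For readers following the thread, the check proposed above can be sketched outside DPDK roughly as follows. This is a minimal model under stated assumptions, not the ethdev code: struct shared_data stands in for rte_eth_dev_data[] (shared between processes), struct local_dev stands in for the per-process rte_eth_devices[] state, and port_find_free() / port_allocate() are hypothetical names. Locking is left out; in ethdev the scan would presumably run under the shared lock discussed in this series.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_PORTS 4
#define NAME_LEN  64

enum dev_state { DEV_UNUSED = 0, DEV_ATTACHED };

/* Hypothetical stand-in for rte_eth_dev_data[]: visible to every process. */
struct shared_data {
	char name[NAME_LEN];
};

/* Hypothetical stand-in for rte_eth_devices[]: private to one process. */
struct local_dev {
	enum dev_state state;
};

/*
 * Return the first port that is unused in this process AND has an empty
 * name in the shared data, or MAX_PORTS if there is none.
 */
static unsigned int
port_find_free(const struct local_dev *local, const struct shared_data *shared)
{
	unsigned int i;

	for (i = 0; i < MAX_PORTS; i++) {
		if (local[i].state == DEV_UNUSED && shared[i].name[0] == '\0')
			return i;
	}
	return MAX_PORTS;
}

/* Claim a free port for "name"; return its id, or -1 if no port is free. */
static int
port_allocate(struct local_dev *local, struct shared_data *shared,
	      const char *name)
{
	unsigned int id;

	assert(name != NULL && name[0] != '\0');
	id = port_find_free(local, shared);
	if (id == MAX_PORTS)
		return -1;
	/* The name lands in shared memory, so other processes can see it. */
	snprintf(shared[id].name, NAME_LEN, "%s", name);
	local[id].state = DEV_ATTACHED;
	return (int)id;
}
```

With one shared array and two local arrays (one per process), allocating "xxx" from the first process and "yyy" from the second yields two different port ids. If the name check is dropped, the second process still sees the port as DEV_UNUSED locally and reuses it.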

Matan Azrad Jan. 18, 2018, 2:52 p.m. UTC | #31

Hi Neil

From: Neil Horman, Thursday, January 18, 2018 3:21 PM
> On Wed, Jan 17, 2018 at 05:58:07PM +0000, Matan Azrad wrote:
> >
> > Hi Neil
> >
> >  From: Neil Horman, Wednesday, January 17, 2018 4:00 PM
> > > On Wed, Jan 17, 2018 at 12:05:42PM +0000, Matan Azrad wrote:
<snip>
> > > Matan is correct here, there is no way to perform parallel set
> > > operations using just an atomic variable here, because multiple
> > > reads of next_owner_id need to be performed while it is stable.
> > > That is to say rte_eth_next_owner_id must be compared to
> > > RTE_ETH_DEV_NO_OWNER and owner_id in rte_eth_is_valid_owner_id. If
> > > you were to only use an atomic_read on such a variable, it could be
> > > incremented by the owner_new function between the checks and an
> > > invalid owner value could become valid because a third thread
> > > incremented the next value.  The state of next_owner_id must be kept
> > > stable during any validity checks.
> > >
> > > That said, I really have to wonder why ownership ids are really
> > > needed here at all.  It seems this design could be much simpler with
> > > the addition of a per- port lock (and optional ownership record).
> > > The API could consist of four
> > > operations:
> > >
> > > ownership_set
> > > ownership_tryset
> > > ownership_release
> > > ownership_get
> > >
> > >
> > > The first call simply tries to take the per-port lock (blocking if
> > > it's already locked)
> > >
> >
> > A per-port lock is not good because the ownership mechanism must be
> > synchronized with port creation/release.
> > So the port creation and port ownership should use the same lock.
> >
> In what way do you need to synchronize with port creation?

The port release zeroes the data field of the port owner, so it should be synchronized with the ownership APIs.
The port creation should be synchronized with the port release.


>  If a port has not
> yet been created, then by definition the owner must be the thread calling
> the create function.

No, the owner can be any DPDK entity (an application with single or multiple threads/processes, a PMD, a library).
So the port allocation (usually done from the port PMD by one thread from one process) should just allocate a port.


>  If you are concerned about the mechanics of the port
> data structure (i.e. the fact that rte_eth_devices is statically allocated), you
> can add a lock structure to the rte_eth_dev struct and initialize it statically
> with RTE_SPINLOCK_INITIALIZER.
> 

The lock should be in shared memory to allow entities in secondary processes to take ownership safely.
 
> > I didn't find a precedent for a blocking function in ethdev.
> >
> Then perhaps we don't need that api call.  Perhaps ownership_tryset is
> enough.
>

As I already did :)
 
> > > The second call is a non-blocking version of the first
> > >
> > > The third unlocks the port, allowing others to take ownership
> > >
> > > The fourth returns whatever ownership record you want to encode with
> > > the lock.
> > >
> > > The addition of all this id checking seems a bit overcomplicated
> >
> > You miss the identification of the owner - we want to keep info about the
> > owner for printing and easy debugging.
> > And it makes sense to manage owner uniqueness with a unique ID.
> >
> I specifically pointed that out above.  There is no reason an ownership record
> couldn't be added to the rte_eth_dev structure.
> 

Sorry, I don't understand why.

> > The API was already discussed a lot in the previous version. Do you really
> > want to open it again now?
> >
> What I want is the most useful and elegant ownership API available.  If you
> think what you have is that, so be it.  I only bring this up because the amount
> of debate you and Konstantin have had over lock safety causes me to
> wonder if this isn't an overly complex design.

I think the complexity is in the secondary/primary process model, not in the current port ownership design.
I think there is some work to do there regardless of port ownership.
I also think there is some work in progress on it.

Thanks a lot.

> 
> Neil
> 
> 
> > > Neil
> >
> >
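
To make the locking argument in this exchange concrete, here is a small stand-alone sketch of an owner-ID scheme in which ID allocation, validation and the owner write all happen under one lock, so next_owner_id cannot move between the validity check and the assignment. It is only an illustration under stated assumptions: all names (owner_new, owner_set, owner_unset, owner_get, ownership_lock_take, ...) are hypothetical, and a C11 atomic_flag in process memory stands in for the lock, whereas the thread says the real lock must live in shared memory so that secondary processes can use it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define MAX_PORTS 4
#define NO_OWNER  0

/* One lock for both the ID counter and the per-port owner records. */
static atomic_flag ownership_lock = ATOMIC_FLAG_INIT;
static uint16_t next_owner_id = 1;     /* every ID below this was handed out */
static uint16_t port_owner[MAX_PORTS]; /* NO_OWNER means the port is free */

static void
ownership_lock_take(void)
{
	while (atomic_flag_test_and_set_explicit(&ownership_lock,
						 memory_order_acquire))
		; /* spin until the flag was previously clear */
}

static void
ownership_lock_release(void)
{
	atomic_flag_clear_explicit(&ownership_lock, memory_order_release);
}

/* Caller must hold the lock, so both comparisons see one stable counter. */
static int
owner_id_is_valid(uint16_t id)
{
	return id != NO_OWNER && id < next_owner_id;
}

/* Hand out a fresh owner ID. */
static uint16_t
owner_new(void)
{
	uint16_t id;

	ownership_lock_take();
	assert(next_owner_id != UINT16_MAX); /* this sketch never reuses IDs */
	id = next_owner_id++;
	ownership_lock_release();
	return id;
}

/* Non-blocking set: fails if the ID is invalid or the port is owned. */
static int
owner_set(unsigned int port, uint16_t id)
{
	int ret = -1;

	ownership_lock_take();
	if (port < MAX_PORTS && owner_id_is_valid(id) &&
	    port_owner[port] == NO_OWNER) {
		port_owner[port] = id;
		ret = 0;
	}
	ownership_lock_release();
	return ret;
}

/* Release the port, but only if "id" is the current owner. */
static int
owner_unset(unsigned int port, uint16_t id)
{
	int ret = -1;

	ownership_lock_take();
	if (port < MAX_PORTS && id != NO_OWNER && port_owner[port] == id) {
		port_owner[port] = NO_OWNER;
		ret = 0;
	}
	ownership_lock_release();
	return ret;
}

/* Return the current owner ID, or NO_OWNER. */
static uint16_t
owner_get(unsigned int port)
{
	uint16_t id = NO_OWNER;

	ownership_lock_take();
	if (port < MAX_PORTS)
		id = port_owner[port];
	ownership_lock_release();
	return id;
}
```

In this sketch a set with an ID that has not been allocated yet is rejected, and a second owner cannot take a port that is already owned. Whether a per-port lock could replace the single lock is exactly the open question in the discussion above; the sketch does not settle it.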

Matan Azrad Jan. 18, 2018, 3 p.m. UTC | #32

From: Ananyev, Konstantin, Thursday, January 18, 2018 4:52 PM
> 
> > -----Original Message-----
> > From: Matan Azrad [mailto:matan@mellanox.com]
> > Sent: Thursday, January 18, 2018 2:45 PM
> > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Thomas
> > Monjalon <thomas@monjalon.net>; Gaetan Rivet
> <gaetan.rivet@6wind.com>;
> > Wu, Jingjing <jingjing.wu@intel.com>
> > Cc: dev@dpdk.org; Neil Horman <nhorman@tuxdriver.com>; Richardson,
> > Bruce <bruce.richardson@intel.com>
> > Subject: RE: [PATCH v2 2/6] ethdev: add port ownership
> >
> > HI
> >
> > From: Ananyev, Konstantin, Thursday, January 18, 2018 4:42 PM
> > > > Hi Konstantine
> > > >
> > > > > Hi Matan,
> > > > >
<snip>
> > > > > > > > > BTW, that means that if, let's say, process 0 calls
> > > > > > > > > rte_eth_dev_allocate("xxx") and process 1 calls
> > > > > > > > > rte_eth_dev_allocate("yyy"), we can end up with the same
> > > > > > > > > port_id being used for different devices, and the 2
> > > > > > > > > processes will overwrite the same
> > > > > > > > > rte_eth_dev_data[port_id]?
> > > > > > > >
> > > > > > > > No, contrary to the state, the lock itself is in shared
> > > > > > > > memory, so 2 processes cannot allocate a port at the same
> > > > > > > > time (you can see it in the next patch of this series).
> > > > > >
> > > > > > I am not talking about racing here.
> > > > > > Let's say process 0 calls
> > > > > > rte_pmd_ring_probe()->....->rte_eth_dev_allocate("xxx").
> > > > > > rte_eth_dev_allocate() finds that port N is 'free', i.e.
> > > > > > local rte_eth_devices[N].state == RTE_ETH_DEV_UNUSED, so it
> > > > > > assigns new dev ("xxx") to port N.
> > > > > > Then after some time process 1 calls
> > > > > > rte_pmd_ring_probe()->....->rte_eth_dev_allocate("yyy").
> > > > > > From its perspective port N is still free:
> > > > > > rte_eth_devices[N].state == RTE_ETH_DEV_UNUSED, so it will
> > > > > > assign new dev ("yyy") to the same port.
> > > > > >
> > > > >
> > > > > Yes, you are right, this is a problem (not actually related to
> > > > > port ownership)
> > > >
> > > > Yep that's true - it was there before your patches.
> > > >
> > > > > but look:
> > > > > As I understand it, secondary processes are not allowed to
> > > > > create ports and must use the attach_secondary API, but there
> > > > > is no hardcoded check which prevents them from doing it.
> > > >
> > > > Secondary processes have the ability to allocate their own vdevs and
> > > > probably it should stay like that.
> > > > I just thought it is a good opportunity to fix it while you are on
> > > > these changes anyway, but ok we can leave it for now.
> > > >
> > > It looks like the fix would break the ABI (moving the state to shared
> > > memory), so let's try to fix it in the next version :)
> > 
> > Not necessarily - I think we can just add a check inside
> > rte_eth_dev_find_free_port() that rte_eth_dev_data[port_id].name is an
> > empty string.

Good idea, I will add it (actually the first patch in this series allows it).

Thanks.


> Konstantin
> 
> 
> >
> > > Konstantin

Neil Horman Jan. 18, 2018, 4:27 p.m. UTC | #33

On Wed, Jan 17, 2018 at 05:01:10PM +0000, Ananyev, Konstantin wrote:
> 
> 
> > -----Original Message-----
> > From: Neil Horman [mailto:nhorman@tuxdriver.com]
> > Sent: Wednesday, January 17, 2018 2:00 PM
> > To: Matan Azrad <matan@mellanox.com>
> > Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Thomas Monjalon <thomas@monjalon.net>; Gaetan Rivet
> > <gaetan.rivet@6wind.com>; Wu, Jingjing <jingjing.wu@intel.com>; dev@dpdk.org; Richardson, Bruce <bruce.richardson@intel.com>
> > Subject: Re: [PATCH v2 2/6] ethdev: add port ownership
> > 
> > On Wed, Jan 17, 2018 at 12:05:42PM +0000, Matan Azrad wrote:
> > >
> > > Hi Konstantin
> > > From: Ananyev, Konstantin, Sent: Wednesday, January 17, 2018 1:24 PM
> > > > Hi Matan,
> > > >
> > > > > Hi Konstantin
> > > > >
> > > > > From: Ananyev, Konstantin, Tuesday, January 16, 2018 9:11 PM
> > > > > > Hi Matan,
> > > > > >
> > > > > > >
> > > > > > > Hi Konstantin
> > > > > > > From: Ananyev, Konstantin, Monday, January 15, 2018 8:44 PM
> > > > > > > > Hi Matan,
> > > > > > > > > Hi Konstantin
> > > > > > > > > From: Ananyev, Konstantin, Monday, January 15, 2018 1:45 PM
> > > > > > > > > > Hi Matan,
> > > > > > > > > > > Hi Konstantin
> > > > > > > > > > > From: Ananyev, Konstantin, Friday, January 12, 2018 2:02
> > > > > > > > > > > AM
> > > > > > > > > > > > Hi Matan,
> > > > > > > > > > > > > Hi Konstantin
> > > > > > > > > > > > > From: Ananyev, Konstantin, Thursday, January 11, 2018
> > > > > > > > > > > > > 2:40 PM
> > > > > > > > > > > > > > Hi Matan,
> > > > > > > > > > > > > > > Hi Konstantin
> > > > > > > > > > > > > > > From: Ananyev, Konstantin, Wednesday, January 10,
> > > > > > > > > > > > > > > 2018
> > > > > > > > > > > > > > > 3:36 PM
> > > > > > > > > > > > > > > > Hi Matan,
> > > > > > >  <snip>
> > > > > > > > > > > > > > > > > It is good to see that now scanning/updating
> > > > > > > > > > > > > > > > > rte_eth_dev_data[] is lock protected, but it
> > > > > > > > > > > > > > > > > might be not very plausible to protect both
> > > > > > > > > > > > > > > > > data[] and next_owner_id using the same lock.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I guess you mean the owner structure in
> > > > > > > > > > > > > > > > rte_eth_dev_data[port_id].
> > > > > > > > > > > > > > > > The next_owner_id is read by ownership APIs (for
> > > > > > > > > > > > > > > > owner validation), so it makes sense to use the
> > > > > > > > > > > > > > > > same lock.
> > > > > > > > > > > > > > > > Actually, why not?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Well to me next_owner_id and rte_eth_dev_data[] are
> > > > > > > > > > > > > > > not directly related.
> > > > > > > > > > > > > > > You may create new owner_id but it doesn't mean you
> > > > > > > > > > > > > > > would update rte_eth_dev_data[] immediately.
> > > > > > > > > > > > > > > And vice versa - you might just want to update
> > > > > > > > > > > > > > > rte_eth_dev_data[].name or .owner_id.
> > > > > > > > > > > > > > > It is not very good coding practice to use same lock
> > > > > > > > > > > > > > > for non-related data structures.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > I see the relation as follows:
> > > > > > > > > > > > > > Since the ownership mechanism synchronization is
> > > > > > > > > > > > > > ethdev's responsibility, we must protect against user
> > > > > > > > > > > > > > mistakes as much as we can by using the same lock.
> > > > > > > > > > > > > > So, if a user tries to set an invalid owner (exactly the
> > > > > > > > > > > > > > ID which is currently being allocated), we can protect
> > > > > > > > > > > > > > against it.
> > > > > > > > > > > >
> > > > > > > > > > > > Hmm, not sure why you can't do same checking with
> > > > > > > > > > > > different lock or atomic variable?
> > > > > > > > > > > >
> > > > > > > > > > > The set ownership API is protected by ownership lock and
> > > > > > > > > > > checks the owner ID validity By reading the next owner ID.
> > > > > > > > > > > So, the owner ID allocation and set API should use the
> > > > > > > > > > > same atomic
> > > > > > > > > > mechanism.
> > > > > > > > > >
> > > > > > > > > > Sure, but all you are doing to check validity is checking
> > > > > > > > > > that owner_id > 0 && owner_id < next_owner_id, right?
> > > > > > > > > > As you don't allow owner_id overlap (16/3248 bits) you can
> > > > > > > > > > safely do same check with just atomic_get(&next_owner_id).
> > > > > > > > > >
> > > > > > > > > It will not protect it. Scenario:
> > > > > > > > > - The current next_id is X.
> > > > > > > > > - Thread 0 calls set ownership of port A with owner id X (by
> > > > > > > > >   user mistake).
> > > > > > > > > - Context switch.
> > > > > > > > > - Thread 1 allocates a new id, gets X, and changes next_id to
> > > > > > > > >   X+1 atomically.
> > > > > > > > > - Context switch.
> > > > > > > > > - Thread 0 validates X by atomic_read and succeeds in taking
> > > > > > > > >   ownership.
> > > > > > > > > - The system has lost the port (or it will be managed by two
> > > > > > > > >   entities) - crash.
> > > > > > > >
> > > > > > > >
> > > > > > > > Ok, and how using lock will protect you with such scenario?
> > > > > > >
> > > > > > > The owner set API validation by thread 0 should fail because the
> > > > > > > owner
> > > > > > validation is included in the protected section.
> > > > > >
> > > > > > Then your validation function would fail even if you'll use atomic
> > > > > > ops instead of lock.
> > > > > No.
> > > > > With atomic this specific scenario will cause the validation to pass.
> > > >
> > > > Can you explain to me how?
> > > >
> > > > static int
> > > > rte_eth_is_valid_owner_id(uint16_t owner_id)
> > > > {
> > > > 	/* Snapshot the counter once; next_owner_id is assumed to be an rte_atomic32_t. */
> > > > 	int32_t cur_owner_id = RTE_MIN(rte_atomic32_read(&next_owner_id),
> > > > 				       UINT16_MAX);
> > > >
> > > > 	if (owner_id == RTE_ETH_DEV_NO_OWNER || owner_id > cur_owner_id) {
> > > > 		RTE_LOG(ERR, EAL, "Invalid owner_id=%d.\n", owner_id);
> > > > 		return 0;
> > > > 	}
> > > > 	return 1;
> > > > }
> > > >
> > > > Let's say your next_owner_id==X, and you invoke
> > > > rte_eth_is_valid_owner_id(owner_id=X+1) - it would fail.
> > >
> > > Explanation:
> > > The scenario with locks:
> > > next_owner_id = X.
> > > Thread 0 calls the set API (with invalid owner Y=X) and takes the lock.
> > > Context switch.
> > > Thread 1 calls owner_new and blocks on the lock.
> > > Context switch.
> > > Thread 0 does the owner id validation, which fails (Y>=X); it unlocks the lock and returns failure to the user.
> > > Context switch.
> > > Thread 1 takes the lock and updates X to X+1, then unlocks the lock.
> > > Everything is OK!
> > >
> > > The same scenario with atomics:
> > > next_owner_id = X.
> > > Thread 0 calls the set API (with invalid owner Y=X) and takes the lock.
> > > Context switch.
> > > Thread 1 calls owner_new and changes X to X+1 (atomically).
> > > Context switch.
> > > Thread 0 does the owner id validation, which succeeds (Y<(atomic)X+1); it unlocks the lock and returns success to the user.
> > > Problem!
> > >
> > 
> > 
> > Matan is correct here, there is no way to perform parallel set operations using
> > just an atomic variable here, because multiple reads of next_owner_id need to
> > be performed while it is stable.  That is to say rte_eth_next_owner_id must be
> > compared to RTE_ETH_DEV_NO_OWNER and owner_id in rte_eth_is_valid_owner_id.  If
> > you were to only use an atomic_read on such a variable, it could be incremented
> > by the owner_new function between the checks and an invalid owner value could
> > become valid because a third thread incremented the next value.  The state of
> > next_owner_id must be kept stable during any validity checks.
> 
> It could still be incremented between the checks - if, let's say, a different thread
> invokes new_owner_id, grabs the lock, updates the counter, and releases the lock - all that
> before the check.
Yes, as I mentioned previously, that's an artifact of this implementation, and
arguably OK, because the state of next_owner_id is still kept steady during the
check process.  There's no guarantee that, once you call new, you will be able to
take ownership; the result of the set operation determines that.  If you want to
ensure that you claim ownership on set, then you need to make the allocation of
an owner object atomic with its acquisition of the port, the way my proposed API
below does.

> But ok, there is probably no point to argue on that one any longer -
> let's keep the lock here, nothing will be broken with it for sure.
> 
Agree.

> > 
> > That said, I really have to wonder why ownership ids are really needed here at
> > all.  It seems this design could be much simpler with the addition of a per-port
> > lock (and optional ownership record).  The API could consist of four
> > operations:
> > 
> > ownership_set
> > ownership_tryset
> > ownership_release
> > ownership_get
> > 
> 
> Ok, but how to distinguish who is the current owner of the port?
> To make sure that only owner is allowed to perform control ops?
> Konstantin
> 
As I said above, if you want to have an ownership record, there's no reason you
can't (that's what ownership_get is intended to return to you).  Perhaps a better
API would be an is_owner(owner_record) call, which can atomically compare a
passed-in owner record with the current ownership and return true or false
depending on whether they match.
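
For illustration, a minimal sketch of what such a per-port lock with an
optional ownership record might look like.  It uses plain C11 atomics rather
than any DPDK primitive, and every name in it (struct port_ownership,
ownership_tryset, ownership_set, ownership_release, is_owner) is hypothetical
and not part of ethdev:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-port state; not an existing ethdev structure. */
struct port_ownership {
	atomic_flag locked;       /* per-port lock: set while the port is owned */
	_Atomic uint32_t record;  /* optional owner record, 0 while unowned */
};

#define PORT_OWNERSHIP_INIT { ATOMIC_FLAG_INIT, 0 }

/* Non-blocking claim: succeeds only if nobody currently holds the port. */
bool ownership_tryset(struct port_ownership *p, uint32_t record)
{
	if (atomic_flag_test_and_set(&p->locked))
		return false;	/* already owned by someone else */
	atomic_store(&p->record, record);
	return true;
}

/* Blocking claim: spins until the current owner releases the port. */
void ownership_set(struct port_ownership *p, uint32_t record)
{
	while (!ownership_tryset(p, record))
		;
}

/* Give the port back; only the current owner is expected to call this. */
void ownership_release(struct port_ownership *p)
{
	atomic_store(&p->record, 0);
	atomic_flag_clear(&p->locked);
}

/* True if 'record' matches the record stored by the current owner. */
bool is_owner(struct port_ownership *p, uint32_t record)
{
	return record != 0 && atomic_load(&p->record) == record;
}
```

In this sketch the record plays no part in deciding whether a claim succeeds;
only the state of the per-port lock does.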

Neil
> > 
> > The first call simply tries to take the per-port lock (blocking if it's already
> > locked)
> > 
> > The second call is a non-blocking version of the first
> > 
> > The third unlocks the port, allowing others to take ownership
> > 
> > The fourth returns whatever ownership record you want to encode with the lock.
> > 
> > The addition of all this id checking seems a bit overcomplicated
> > 
> > Neil
> 
>

Neil Horman Jan. 18, 2018, 4:54 p.m. UTC | #34

On Thu, Jan 18, 2018 at 02:00:23PM +0000, Matan Azrad wrote:
> Hi Neil
> 
> From: Neil Horman, Thursday, January 18, 2018 3:10 PM
> > On Wed, Jan 17, 2018 at 05:01:10PM +0000, Ananyev, Konstantin wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: Neil Horman [mailto:nhorman@tuxdriver.com]
> > > > Sent: Wednesday, January 17, 2018 2:00 PM
> > > > To: Matan Azrad <matan@mellanox.com>
> > > > Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Thomas
> > > > Monjalon <thomas@monjalon.net>; Gaetan Rivet
> > > > <gaetan.rivet@6wind.com>; Wu, Jingjing <jingjing.wu@intel.com>;
> > > > dev@dpdk.org; Richardson, Bruce <bruce.richardson@intel.com>
> > > > Subject: Re: [PATCH v2 2/6] ethdev: add port ownership
> > > >
> > > > On Wed, Jan 17, 2018 at 12:05:42PM +0000, Matan Azrad wrote:
> > > > >
> > > > > Hi Konstantin
> > > > > From: Ananyev, Konstantin, Sent: Wednesday, January 17, 2018 1:24
> > > > > PM
> > > > > > Hi Matan,
> > > > > >
> > > > > > > Hi Konstantin
> > > > > > >
> > > > > > > From: Ananyev, Konstantin, Tuesday, January 16, 2018 9:11 PM
> > > > > > > > Hi Matan,
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Hi Konstantin
> > > > > > > > > From: Ananyev, Konstantin, Monday, January 15, 2018 8:44
> > > > > > > > > PM
> > > > > > > > > > Hi Matan,
> > > > > > > > > > > Hi Konstantin
> > > > > > > > > > > From: Ananyev, Konstantin, Monday, January 15, 2018
> > > > > > > > > > > 1:45 PM
> > > > > > > > > > > > Hi Matan,
> > > > > > > > > > > > > Hi Konstantin
> > > > > > > > > > > > > From: Ananyev, Konstantin, Friday, January 12,
> > > > > > > > > > > > > 2018 2:02 AM
> > > > > > > > > > > > > > Hi Matan,
> > > > > > > > > > > > > > > Hi Konstantin
> > > > > > > > > > > > > > > From: Ananyev, Konstantin, Thursday, January
> > > > > > > > > > > > > > > 11, 2018
> > > > > > > > > > > > > > > 2:40 PM
> > > > > > > > > > > > > > > > Hi Matan,
> > > > > > > > > > > > > > > > > Hi Konstantin
> > > > > > > > > > > > > > > > > From: Ananyev, Konstantin, Wednesday,
> > > > > > > > > > > > > > > > > January 10,
> > > > > > > > > > > > > > > > > 2018
> > > > > > > > > > > > > > > > > 3:36 PM
> > > > > > > > > > > > > > > > > > Hi Matan,
> > > > > > > > >  <snip>
> > > > > > > > > > > > > > > > > > It is good to see that now
> > > > > > > > > > > > > > > > > > scanning/updating rte_eth_dev_data[] is
> > > > > > > > > > > > > > > > > > lock protected, but it might be not very
> > > > > > > > > > > > > > > > > > plausible to protect both data[] and
> > > > > > > > > > > > > > > > > > next_owner_id using the
> > > > > > > > > > > > same lock.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I guess you mean to the owner structure in
> > > > > > > > > > > > rte_eth_dev_data[port_id].
> > > > > > > > > > > > > > > > > The next_owner_id is read by ownership
> > > > > > > > > > > > > > > > > APIs(for owner validation), so it
> > > > > > > > > > > > > > > > makes sense to use the same lock.
> > > > > > > > > > > > > > > > > Actually, why not?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Well to me next_owner_id and
> > > > > > > > > > > > > > > > rte_eth_dev_data[] are not directly
> > > > > > > > > > > > > > related.
> > > > > > > > > > > > > > > > You may create new owner_id but it doesn't
> > > > > > > > > > > > > > > > mean you would update rte_eth_dev_data[]
> > immediately.
> > > > > > > > > > > > > > > > And visa-versa - you might just want to
> > > > > > > > > > > > > > > > update rte_eth_dev_data[].name or .owner_id.
> > > > > > > > > > > > > > > > It is not very good coding practice to use
> > > > > > > > > > > > > > > > same lock for non-related data structures.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I see the relation like next:
> > > > > > > > > > > > > > > Since the ownership mechanism synchronization
> > > > > > > > > > > > > > > is in ethdev responsibility, we must protect
> > > > > > > > > > > > > > > against user mistakes as much as we can by
> > > > > > > > > > > > > > using the same lock.
> > > > > > > > > > > > > > > So, if user try to set by invalid owner
> > > > > > > > > > > > > > > (exactly the ID which currently is
> > > > > > > > > > > > > > allocated) we can protect on it.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Hmm, not sure why you can't do same checking
> > > > > > > > > > > > > > with different lock or atomic variable?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > The set ownership API is protected by ownership
> > > > > > > > > > > > > lock and checks the owner ID validity By reading the next
> > owner ID.
> > > > > > > > > > > > > So, the owner ID allocation and set API should use
> > > > > > > > > > > > > the same atomic
> > > > > > > > > > > > mechanism.
> > > > > > > > > > > >
> > > > > > > > > > > > Sure but all you are doing for checking validity, is
> > > > > > > > > > > > check that owner_id > 0 &&& owner_id < next_ownwe_id,
> > right?
> > > > > > > > > > > > As you don't allow owner_id overlap (16/3248 bits)
> > > > > > > > > > > > you can safely do same check with just
> > atomic_get(&next_owner_id).
> > > > > > > > > > > >
> > > > > > > > > > > It will not protect it, scenario:
> > > > > > > > > > > - current next_id is X.
> > > > > > > > > > > - call set ownership of port A with owner id X by
> > > > > > > > > > > thread 0(by user
> > > > > > > > mistake).
> > > > > > > > > > > - context switch
> > > > > > > > > > > - allocate new id by thread 1 and get X and change
> > > > > > > > > > > next_id to
> > > > > > > > > > > X+1
> > > > > > > > > > atomically.
> > > > > > > > > > > -  context switch
> > > > > > > > > > > - Thread 0 validate X by atomic_read and succeed to
> > > > > > > > > > > take
> > > > > > ownership.
> > > > > > > > > > > - The system loosed the port(or will be managed by two
> > > > > > > > > > > entities) -
> > > > > > > > crash.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Ok, and how using lock will protect you with such scenario?
> > > > > > > > >
> > > > > > > > > The owner set API validation by thread 0 should fail
> > > > > > > > > because the owner
> > > > > > > > validation is included in the protected section.
> > > > > > > >
> > > > > > > > Then your validation function would fail even if you'll use
> > > > > > > > atomic ops instead of lock.
> > > > > > > No.
> > > > > > > With atomic this specific scenario will cause the validation to pass.
> > > > > >
> > > > > > Can you explain to me how?
> > > > > >
> > > > > > static int
> > > > > > rte_eth_is_valid_owner_id(uint16_t owner_id)
> > > > > > {
> > > > > > 	/* Snapshot the counter once; next_owner_id is assumed to be an rte_atomic32_t. */
> > > > > > 	int32_t cur_owner_id = RTE_MIN(rte_atomic32_read(&next_owner_id),
> > > > > > 				       UINT16_MAX);
> > > > > >
> > > > > > 	if (owner_id == RTE_ETH_DEV_NO_OWNER || owner_id > cur_owner_id) {
> > > > > > 		RTE_LOG(ERR, EAL, "Invalid owner_id=%d.\n", owner_id);
> > > > > > 		return 0;
> > > > > > 	}
> > > > > > 	return 1;
> > > > > > }
> > > > > >
> > > > > > Let's say your next_owner_id==X, and you invoke
> > > > > > rte_eth_is_valid_owner_id(owner_id=X+1) - it would fail.
> > > > >
> > > > > Explanation:
> > > > > The scenario with locks:
> > > > > next_owner_id = X.
> > > > > Thread 0 call to set API(with invalid owner Y=X) and take lock.
> > > > > Context switch.
> > > > > Thread 1 call to owner_new and stuck in the lock.
> > > > > Context switch.
> > > > > Thread 0 does owner id validation and failed(Y>=X) - unlock the lock and
> > return failure to the user.
> > > > > Context switch.
> > > > > Thread 1 take the lock and update X to X+1, then, unlock the lock.
> > > > > Everything is OK!
> > > > >
> > > > > The same scenario with atomics:
> > > > > next_owner_id = X.
> > > > > Thread 0 call to set API(with invalid owner Y=X) and take lock.
> > > > > Context switch.
> > > > > Thread 1 call to owner_new and change X to X+1(atomically).
> > > > > Context switch.
> > > > > Thread 0 does owner id validation and success(Y<(atomic)X+1) - unlock
> > the lock and return success to the  user.
> > > > > Problem!
> > > > >
> > > >
> > > >
> > > > Matan is correct here, there is no way to perform parallel set
> > > > operations using just an atomic variable here, because multiple
> > > > reads of next_owner_id need to be performed while it is stable.
> > > > That is to say rte_eth_next_owner_id must be compared to
> > > > RTE_ETH_DEV_NO_OWNER and owner_id in rte_eth_is_valid_owner_id.
> > > > If you were to only use an atomic_read on such a variable, it could be
> > > > incremented by the owner_new function between the checks and an
> > > > invalid owner value could become valid because a third thread
> > > > incremented the next value.  The state of next_owner_id must be kept
> > > > stable during any validity checks.
> > >
> > > It could still be incremented between the checks - if, let's say, a
> > > different thread invokes new_owner_id, grabs the lock, updates the
> > > counter, and releases the lock - all that before the check.
> > I don't see how all of the contents of rte_eth_dev_owner_set is protected
> > under rte_eth_dev_ownership_lock, as is rte_eth_dev_owner_new.
> > Next_owner might increment between another threads calls to owner_new
> > and owner_set, but that will just cause a transition from an ownership id
> > being valid to invalid, and thats ok, as long as there is consistency in the
> > model that enforces a single valid owner at a time (in that case the
> > subsequent caller to owner_new).
> > 
> 
> I'm not sure I fully understand you, but see:
> we can't protect against all user mistakes (such as using the wrong owner id),
> but we are doing the maximum we can.
> 
Yeah, my writing was atrocious, apologies.  All I meant to say was that the
locking you have is OK, in that it maintains a steady state for the data being
read during the period it's being read.  The fact that a given set operation may
fail because someone else created an ownership record is an artifact of the API,
not a bug in its implementation.  I think we're basically in agreement on the
semantics here, but this goes to my argument about complexity (more below).
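
To make the "steady state" point concrete, here is a minimal sketch of the
locking pattern under discussion: the ID check and the write to the port record
sit in the same critical section that ID allocation uses, so next_owner_id
cannot move between the check and the update.  This is not the patch's code.
The spinlock is a C11 atomic_flag standing in for the ethdev ownership lock,
the port table size is arbitrary, and refusing to claim a port that already
has an owner is an assumption about the intended semantics:

```c
#include <stdatomic.h>
#include <stdint.h>

#define NO_OWNER  0u	/* stand-in for RTE_ETH_DEV_NO_OWNER */
#define NUM_PORTS 4u	/* arbitrary size for the sketch */

static atomic_flag ownership_lock = ATOMIC_FLAG_INIT;
static uint16_t next_owner_id = 1;	/* guarded by ownership_lock */
static uint16_t port_owner[NUM_PORTS];	/* guarded by ownership_lock */

static void lock(void)
{
	while (atomic_flag_test_and_set_explicit(&ownership_lock,
						 memory_order_acquire))
		;
}

static void unlock(void)
{
	atomic_flag_clear_explicit(&ownership_lock, memory_order_release);
}

/* Allocate a fresh owner id under the lock. */
uint16_t owner_new(void)
{
	lock();
	uint16_t id = next_owner_id++;
	unlock();
	return id;
}

/* Validate the id and claim the port inside one critical section. */
int owner_set(unsigned int port, uint16_t owner_id)
{
	int ret = -1;

	lock();
	if (port < NUM_PORTS &&
	    owner_id != NO_OWNER && owner_id < next_owner_id &&
	    port_owner[port] == NO_OWNER) {
		port_owner[port] = owner_id;
		ret = 0;
	}
	unlock();
	return ret;
}
```

A caller that passes an ID which has not been allocated yet is rejected no
matter how the threads interleave, because owner_new cannot run between the
comparison and the store.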

> 
> > Though this confusion does underscore my assertion I think that this API is
> > overly complicated
> > 
> 
> I really don't think it is complicated: just take ownership of a port (via the owner id allocation and set APIs) and manage the port as you want.
> 
But that's not all.  The determination of success or failure in claiming
ownership is largely dependent on the behavior of other threads' actions, not a
function of the state of the system at the moment ownership is requested.  That
is to say, if you have N threads, and they all create ownership objects
identified as X, X+1, X+2...X+N, only the thread with id X+N will be able to
claim ownership of any port, because they all will have incremented the shared
next_id variable.  Determination of ownership by the programmer will have to be
done via debugging, and errors will likely be transient, dependent on the order
in which threads execute (subject to scheduling jitter).

Rather than making ownership success dependent on any data contained within the
ownership record, ownership should be entirely dependent on the state of port
ownership at the time that it was requested.  That is to say, port ownership
should succeed if and only if the port is unowned at the time that a given
thread requests ownership.  Any ancillary data regarding which context owns the
port should be exactly that, ancillary, and have no impact on whether or not
the port ownership request succeeds.
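
As a rough sketch of that rule, a claim can be reduced to a single
compare-and-swap on a per-port owner word: it succeeds if and only if the word
reads "unowned" at that instant, and the caller's tag is only recorded, never
validated.  The names port_claim, port_release and PORT_UNOWNED are
hypothetical, and C11 atomics stand in for whatever primitive ethdev would
actually use:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define PORT_UNOWNED 0u	/* hypothetical sentinel for "no owner" */

/*
 * Claim the port.  Succeeds iff the port is unowned at the moment of the
 * compare-exchange; 'tag' is ancillary data and must be non-zero only so
 * that it can be told apart from the sentinel.
 */
bool port_claim(_Atomic uint32_t *owner, uint32_t tag)
{
	uint32_t expected = PORT_UNOWNED;

	if (tag == PORT_UNOWNED)
		return false;
	return atomic_compare_exchange_strong(owner, &expected, tag);
}

/* Release the port; only the context that holds it can do so. */
bool port_release(_Atomic uint32_t *owner, uint32_t tag)
{
	uint32_t expected = tag;

	if (tag == PORT_UNOWNED)
		return false;
	return atomic_compare_exchange_strong(owner, &expected, PORT_UNOWNED);
}
```

With this shape, whether a claim succeeds does not depend on how many other
contexts exist or in what order they were created, only on whether the port is
free when the claim is made.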

Regards
Neil

> > Neil
> 
>

Matan Azrad Jan. 18, 2018, 5:20 p.m. UTC | #35

Hi Neil

From: Neil Horman, Thursday, January 18, 2018 6:55 PM
> On Thu, Jan 18, 2018 at 02:00:23PM +0000, Matan Azrad wrote:
> > Hi Neil
> >
> > From: Neil Horman, Thursday, January 18, 2018 3:10 PM
> > > On Wed, Jan 17, 2018 at 05:01:10PM +0000, Ananyev, Konstantin wrote:
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Neil Horman [mailto:nhorman@tuxdriver.com]
> > > > > Sent: Wednesday, January 17, 2018 2:00 PM
> > > > > To: Matan Azrad <matan@mellanox.com>
> > > > > Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Thomas
> > > > > Monjalon <thomas@monjalon.net>; Gaetan Rivet
> > > > > <gaetan.rivet@6wind.com>; Wu, Jingjing <jingjing.wu@intel.com>;
> > > > > dev@dpdk.org; Richardson, Bruce <bruce.richardson@intel.com>
> > > > > Subject: Re: [PATCH v2 2/6] ethdev: add port ownership
> > > > >
> > > > > On Wed, Jan 17, 2018 at 12:05:42PM +0000, Matan Azrad wrote:
> > > > > >
> > > > > > Hi Konstantin
> > > > > > From: Ananyev, Konstantin, Sent: Wednesday, January 17, 2018
> > > > > > 1:24 PM
> > > > > > > Hi Matan,
> > > > > > >
> > > > > > > > Hi Konstantin
> > > > > > > >
> > > > > > > > From: Ananyev, Konstantin, Tuesday, January 16, 2018 9:11
> > > > > > > > PM
> > > > > > > > > Hi Matan,
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Hi Konstantin
> > > > > > > > > > From: Ananyev, Konstantin, Monday, January 15, 2018
> > > > > > > > > > 8:44 PM
> > > > > > > > > > > Hi Matan,
> > > > > > > > > > > > Hi Konstantin
> > > > > > > > > > > > From: Ananyev, Konstantin, Monday, January 15,
> > > > > > > > > > > > 2018
> > > > > > > > > > > > 1:45 PM
> > > > > > > > > > > > > Hi Matan,
> > > > > > > > > > > > > > Hi Konstantin
> > > > > > > > > > > > > > From: Ananyev, Konstantin, Friday, January 12,
> > > > > > > > > > > > > > 2018 2:02 AM
> > > > > > > > > > > > > > > Hi Matan,
> > > > > > > > > > > > > > > > Hi Konstantin
> > > > > > > > > > > > > > > > From: Ananyev, Konstantin, Thursday,
> > > > > > > > > > > > > > > > January 11, 2018
> > > > > > > > > > > > > > > > 2:40 PM
> > > > > > > > > > > > > > > > > Hi Matan,
> > > > > > > > > > > > > > > > > > Hi Konstantin
> > > > > > > > > > > > > > > > > > From: Ananyev, Konstantin, Wednesday,
> > > > > > > > > > > > > > > > > > January 10,
> > > > > > > > > > > > > > > > > > 2018
> > > > > > > > > > > > > > > > > > 3:36 PM
> > > > > > > > > > > > > > > > > > > Hi Matan,
> > > > > > > > > >  <snip>
> > > > > > > > > > > > > > > > > > > It is good to see that now
> > > > > > > > > > > > > > > > > > > scanning/updating rte_eth_dev_data[]
> > > > > > > > > > > > > > > > > > > is lock protected, but it might be
> > > > > > > > > > > > > > > > > > > not very plausible to protect both
> > > > > > > > > > > > > > > > > > > data[] and next_owner_id using the
> > > > > > > > > > > > > same lock.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > I guess you mean to the owner
> > > > > > > > > > > > > > > > > > structure in
> > > > > > > > > > > > > rte_eth_dev_data[port_id].
> > > > > > > > > > > > > > > > > > The next_owner_id is read by ownership
> > > > > > > > > > > > > > > > > > APIs(for owner validation), so it
> > > > > > > > > > > > > > > > > makes sense to use the same lock.
> > > > > > > > > > > > > > > > > > Actually, why not?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Well to me next_owner_id and
> > > > > > > > > > > > > > > > > rte_eth_dev_data[] are not directly
> > > > > > > > > > > > > > > related.
> > > > > > > > > > > > > > > > > You may create new owner_id but it
> > > > > > > > > > > > > > > > > doesn't mean you would update
> > > > > > > > > > > > > > > > > rte_eth_dev_data[]
> > > immediately.
> > > > > > > > > > > > > > > > > And visa-versa - you might just want to
> > > > > > > > > > > > > > > > > update rte_eth_dev_data[].name or
> .owner_id.
> > > > > > > > > > > > > > > > > It is not very good coding practice to
> > > > > > > > > > > > > > > > > use same lock for non-related data structures.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I see the relation like next:
> > > > > > > > > > > > > > > > Since the ownership mechanism
> > > > > > > > > > > > > > > > synchronization is in ethdev
> > > > > > > > > > > > > > > > responsibility, we must protect against
> > > > > > > > > > > > > > > > user mistakes as much as we can by
> > > > > > > > > > > > > > > using the same lock.
> > > > > > > > > > > > > > > > So, if user try to set by invalid owner
> > > > > > > > > > > > > > > > (exactly the ID which currently is
> > > > > > > > > > > > > > > allocated) we can protect on it.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hmm, not sure why you can't do same checking
> > > > > > > > > > > > > > > with different lock or atomic variable?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > The set ownership API is protected by
> > > > > > > > > > > > > > ownership lock and checks the owner ID
> > > > > > > > > > > > > > validity By reading the next
> > > owner ID.
> > > > > > > > > > > > > > So, the owner ID allocation and set API should
> > > > > > > > > > > > > > use the same atomic
> > > > > > > > > > > > > mechanism.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Sure but all you are doing for checking
> > > > > > > > > > > > > validity, is check that owner_id > 0 &&&
> > > > > > > > > > > > > owner_id < next_ownwe_id,
> > > right?
> > > > > > > > > > > > > As you don't allow owner_id overlap (16/3248
> > > > > > > > > > > > > bits) you can safely do same check with just
> > > atomic_get(&next_owner_id).
> > > > > > > > > > > > >
> > > > > > > > > > > > It will not protect it, scenario:
> > > > > > > > > > > > - current next_id is X.
> > > > > > > > > > > > - call set ownership of port A with owner id X by
> > > > > > > > > > > > thread 0(by user
> > > > > > > > > mistake).
> > > > > > > > > > > > - context switch
> > > > > > > > > > > > - allocate new id by thread 1 and get X and change
> > > > > > > > > > > > next_id to
> > > > > > > > > > > > X+1
> > > > > > > > > > > atomically.
> > > > > > > > > > > > -  context switch
> > > > > > > > > > > > - Thread 0 validate X by atomic_read and succeed
> > > > > > > > > > > > to take
> > > > > > > ownership.
> > > > > > > > > > > > - The system loosed the port(or will be managed by
> > > > > > > > > > > > two
> > > > > > > > > > > > entities) -
> > > > > > > > > crash.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Ok, and how using lock will protect you with such scenario?
> > > > > > > > > >
> > > > > > > > > > The owner set API validation by thread 0 should fail
> > > > > > > > > > because the owner
> > > > > > > > > validation is included in the protected section.
> > > > > > > > >
> > > > > > > > > Then your validation function would fail even if you'll
> > > > > > > > > use atomic ops instead of lock.
> > > > > > > > No.
> > > > > > > > With atomic this specific scenario will cause the validation to
> pass.
> > > > > > >
> > > > > > > Can you explain to me how?
> > > > > > >
> > > > > > > static int
> > > > > > > rte_eth_is_valid_owner_id(uint16_t owner_id)
> > > > > > > {
> > > > > > > 	/* Snapshot the counter once; next_owner_id is assumed to be an rte_atomic32_t. */
> > > > > > > 	int32_t cur_owner_id = RTE_MIN(rte_atomic32_read(&next_owner_id),
> > > > > > > 				       UINT16_MAX);
> > > > > > >
> > > > > > > 	if (owner_id == RTE_ETH_DEV_NO_OWNER || owner_id > cur_owner_id) {
> > > > > > > 		RTE_LOG(ERR, EAL, "Invalid owner_id=%d.\n", owner_id);
> > > > > > > 		return 0;
> > > > > > > 	}
> > > > > > > 	return 1;
> > > > > > > }
> > > > > > >
> > > > > > > Let's say your next_owner_id==X, and you invoke
> > > > > > > rte_eth_is_valid_owner_id(owner_id=X+1) - it would fail.
> > > > > >
> > > > > > Explanation:
> > > > > > The scenario with locks:
> > > > > > next_owner_id = X.
> > > > > > Thread 0 call to set API(with invalid owner Y=X) and take lock.
> > > > > > Context switch.
> > > > > > Thread 1 call to owner_new and stuck in the lock.
> > > > > > Context switch.
> > > > > > Thread 0 does owner id validation and failed(Y>=X) - unlock
> > > > > > the lock and
> > > return failure to the user.
> > > > > > Context switch.
> > > > > > Thread 1 take the lock and update X to X+1, then, unlock the lock.
> > > > > > Everything is OK!
> > > > > >
> > > > > > The same scenario with atomics:
> > > > > > next_owner_id = X.
> > > > > > Thread 0 call to set API(with invalid owner Y=X) and take lock.
> > > > > > Context switch.
> > > > > > Thread 1 call to owner_new and change X to X+1(atomically).
> > > > > > Context switch.
> > > > > > Thread 0 does owner id validation and success(Y<(atomic)X+1) -
> > > > > > unlock
> > > the lock and return success to the  user.
> > > > > > Problem!
> > > > > >
> > > > >
> > > > >
> > > > > Matan is correct here, there is no way to perform parallel set
> > > > > operations using just an atomic variable here, because multiple
> > > > > reads of next_owner_id need to be performed while it is stable.
> > > > > That is to say rte_eth_next_owner_id must be compared to
> > > > > RTE_ETH_DEV_NO_OWNER and owner_id in rte_eth_is_valid_owner_id.
> > > > > If you were to only use an atomic_read on such a variable, it could
> > > > > be incremented by the owner_new function between the checks and
> > > > > an invalid owner value could become valid because a third
> > > > > thread incremented the next value.  The state of next_owner_id
> > > > > must be kept stable during any validity checks.
> > > >
> > > > It could still be incremented between the checks - if let say
> > > > different thread will invoke new_onwer_id, grab the lock update
> > > > counter, release the lock - all that before the check.
> > > I don't see how all of the contents of rte_eth_dev_owner_set is
> > > protected under rte_eth_dev_ownership_lock, as is
> rte_eth_dev_owner_new.
> > > Next_owner might increment between another threads calls to
> > > owner_new and owner_set, but that will just cause a transition from
> > > an ownership id being valid to invalid, and thats ok, as long as
> > > there is consistency in the model that enforces a single valid owner
> > > at a time (in that case the subsequent caller to owner_new).
> > >
> >
> > I'm not sure I fully understand you, but see:
> > we can't protect all of the user mistakes(using the wrong owner id).
> > But we are doing the maximum for it.
> >
> Yeah, my writing was atrocious, apologies.  All I meant to say was that the
> locking you have is OK, in that it maintains a steady state for the data being
> read during the period it's being read.  The fact that a given set operation may
> fail because someone else created an ownership record is an artifact of the
> API, not a bug in its implementation.  I think we're basically in agreement on
> the semantics here, but this goes to my argument about complexity (more
> below).
> 
> >
> > > Though this confusion does underscore my assertion I think that this
> > > API is overly complicated
> > >
> >
> > I really don't think it is complicated. - just take ownership of a port(by
> owner id allocation and set APIs) and manage the port as you want.
> >
> But that's not all.  The determination of success or failure in claiming
> ownership is largely dependent on the behavior of other threads' actions, not
> a function of the state of the system at the moment ownership is requested.
> That is to say, if you have N threads, and they all create ownership objects
> identified as X, X+1, X+2...X+N, only the thread with id X+N will be able to
> claim ownership of any port, because they all will have incremented the
> shared next_id variable.

Why? Each one will get its owner id according to some order (the critical section is protected by a spinlock).
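
A small sketch of this point, assuming the validity rule quoted earlier in the
thread (owner_id != 0 and owner_id < next_owner_id).  The helper names and the
atomic_flag spinlock are stand-ins, not the patch's code.  Under that rule a
later allocation only raises the upper bound, so every ID handed out so far
stays valid:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

static atomic_flag id_lock = ATOMIC_FLAG_INIT;	/* stand-in spinlock */
static uint16_t next_owner_id = 1;		/* guarded by id_lock */

static void id_lock_take(void)
{
	while (atomic_flag_test_and_set(&id_lock))
		;
}

static void id_lock_drop(void)
{
	atomic_flag_clear(&id_lock);
}

/* Each caller gets the next id in lock-acquisition order. */
uint16_t owner_new(void)
{
	id_lock_take();
	uint16_t id = next_owner_id++;
	id_lock_drop();
	return id;
}

/* An id is valid if it is non-zero and was already handed out. */
bool owner_id_is_valid(uint16_t id)
{
	id_lock_take();
	bool ok = (id != 0 && id < next_owner_id);
	id_lock_drop();
	return ok;
}
```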

>  Determination of ownership by the programmer will
> have to be done via debugging, and errors will likely be transient dependent
> on the order in which threads execute (subject to scheduling jitter).
> 
Yes.

> Rather than making ownership success dependent on any data contained
> within the ownership record, ownership should be entirely dependent on
> the state of port ownership at the time that it was requested.  That is to say,
> port ownership should succeed if and only if the port is unowned at the time
> that a given thread requests ownership.

Yes.

>  Any ancillary data regarding which
> context owns the port should be exactly that, ancillary, and have no impact
> on whether or not the port ownership request succeeds.
> 

Yes, I understand what you are saying: there is no deterministic outcome for an ownership set.
Actually, I think it will be very hard to reach determinism for port ownership in DPDK when multiple threads are in the game,
especially since it depends on the implementation of many DPDK entities.
But the current non-deterministic approach still brings good order to the game.



> Regards
> Neil
> 
> > > Neil
> >
> >

Neil Horman Jan. 18, 2018, 6:41 p.m. UTC | #36

On Thu, Jan 18, 2018 at 05:20:31PM +0000, Matan Azrad wrote:
> Hi Neil
> 
> From: Neil Horman, Thursday, January 18, 2018 6:55 PM
> > On Thu, Jan 18, 2018 at 02:00:23PM +0000, Matan Azrad wrote:
> > > Hi Neil
> > >
> > > From: Neil Horman, Thursday, January 18, 2018 3:10 PM
> > > > On Wed, Jan 17, 2018 at 05:01:10PM +0000, Ananyev, Konstantin wrote:
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Neil Horman [mailto:nhorman@tuxdriver.com]
> > > > > > Sent: Wednesday, January 17, 2018 2:00 PM
> > > > > > To: Matan Azrad <matan@mellanox.com>
> > > > > > Cc: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Thomas
> > > > > > Monjalon <thomas@monjalon.net>; Gaetan Rivet
> > > > > > <gaetan.rivet@6wind.com>; Wu, Jingjing <jingjing.wu@intel.com>;
> > > > > > dev@dpdk.org; Richardson, Bruce <bruce.richardson@intel.com>
> > > > > > Subject: Re: [PATCH v2 2/6] ethdev: add port ownership
> > > > > >
> > > > > > On Wed, Jan 17, 2018 at 12:05:42PM +0000, Matan Azrad wrote:
> > > > > > >
> > > > > > > Hi Konstantin
> > > > > > > From: Ananyev, Konstantin, Sent: Wednesday, January 17, 2018
> > > > > > > 1:24 PM
> > > > > > > > Hi Matan,
> > > > > > > >
> > > > > > > > > Hi Konstantin
> > > > > > > > >
> > > > > > > > > From: Ananyev, Konstantin, Tuesday, January 16, 2018 9:11
> > > > > > > > > PM
> > > > > > > > > > Hi Matan,
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Hi Konstantin
> > > > > > > > > > > From: Ananyev, Konstantin, Monday, January 15, 2018
> > > > > > > > > > > 8:44 PM
> > > > > > > > > > > > Hi Matan,
> > > > > > > > > > > > > Hi Konstantin
> > > > > > > > > > > > > From: Ananyev, Konstantin, Monday, January 15,
> > > > > > > > > > > > > 2018
> > > > > > > > > > > > > 1:45 PM
> > > > > > > > > > > > > > Hi Matan,
> > > > > > > > > > > > > > > Hi Konstantin
> > > > > > > > > > > > > > > From: Ananyev, Konstantin, Friday, January 12,
> > > > > > > > > > > > > > > 2018 2:02 AM
> > > > > > > > > > > > > > > > Hi Matan,
> > > > > > > > > > > > > > > > > Hi Konstantin
> > > > > > > > > > > > > > > > > From: Ananyev, Konstantin, Thursday,
> > > > > > > > > > > > > > > > > January 11, 2018
> > > > > > > > > > > > > > > > > 2:40 PM
> > > > > > > > > > > > > > > > > > Hi Matan,
> > > > > > > > > > > > > > > > > > > Hi Konstantin
> > > > > > > > > > > > > > > > > > > From: Ananyev, Konstantin, Wednesday,
> > > > > > > > > > > > > > > > > > > January 10,
> > > > > > > > > > > > > > > > > > > 2018
> > > > > > > > > > > > > > > > > > > 3:36 PM
> > > > > > > > > > > > > > > > > > > > Hi Matan,
> > > > > > > > > > >  <snip>
> > > > > > > > > > > > > > > > > > > > It is good to see that now
> > > > > > > > > > > > > > > > > > > > scanning/updating rte_eth_dev_data[]
> > > > > > > > > > > > > > > > > > > > is lock protected, but it might be
> > > > > > > > > > > > > > > > > > > > not very plausible to protect both
> > > > > > > > > > > > > > > > > > > > data[] and next_owner_id using the
> > > > > > > > > > > > > > same lock.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > I guess you mean to the owner
> > > > > > > > > > > > > > > > > > > structure in
> > > > > > > > > > > > > > rte_eth_dev_data[port_id].
> > > > > > > > > > > > > > > > > > > The next_owner_id is read by ownership
> > > > > > > > > > > > > > > > > > > APIs(for owner validation), so it
> > > > > > > > > > > > > > > > > > makes sense to use the same lock.
> > > > > > > > > > > > > > > > > > > Actually, why not?
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Well to me next_owner_id and
> > > > > > > > > > > > > > > > > > rte_eth_dev_data[] are not directly
> > > > > > > > > > > > > > > > related.
> > > > > > > > > > > > > > > > > > You may create new owner_id but it
> > > > > > > > > > > > > > > > > > doesn't mean you would update
> > > > > > > > > > > > > > > > > > rte_eth_dev_data[]
> > > > immediately.
> > > > > > > > > > > > > > > > > > And visa-versa - you might just want to
> > > > > > > > > > > > > > > > > > update rte_eth_dev_data[].name or
> > .owner_id.
> > > > > > > > > > > > > > > > > > It is not very good coding practice to
> > > > > > > > > > > > > > > > > > use same lock for non-related data structures.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I see the relation like next:
> > > > > > > > > > > > > > > > > Since the ownership mechanism
> > > > > > > > > > > > > > > > > synchronization is in ethdev
> > > > > > > > > > > > > > > > > responsibility, we must protect against
> > > > > > > > > > > > > > > > > user mistakes as much as we can by
> > > > > > > > > > > > > > > > using the same lock.
> > > > > > > > > > > > > > > > > So, if user try to set by invalid owner
> > > > > > > > > > > > > > > > > (exactly the ID which currently is
> > > > > > > > > > > > > > > > allocated) we can protect on it.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hmm, not sure why you can't do same checking
> > > > > > > > > > > > > > > > with different lock or atomic variable?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The set ownership API is protected by
> > > > > > > > > > > > > > > ownership lock and checks the owner ID
> > > > > > > > > > > > > > > validity By reading the next
> > > > owner ID.
> > > > > > > > > > > > > > > So, the owner ID allocation and set API should
> > > > > > > > > > > > > > > use the same atomic
> > > > > > > > > > > > > > mechanism.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Sure but all you are doing for checking
> > > > > > > > > > > > > > validity, is check that owner_id > 0 &&&
> > > > > > > > > > > > > > owner_id < next_ownwe_id,
> > > > right?
> > > > > > > > > > > > > > As you don't allow owner_id overlap (16/3248
> > > > > > > > > > > > > > bits) you can safely do same check with just
> > > > atomic_get(&next_owner_id).
> > > > > > > > > > > > > >
> > > > > > > > > > > > > It will not protect it, scenario:
> > > > > > > > > > > > > - current next_id is X.
> > > > > > > > > > > > > - call set ownership of port A with owner id X by
> > > > > > > > > > > > > thread 0(by user
> > > > > > > > > > mistake).
> > > > > > > > > > > > > - context switch
> > > > > > > > > > > > > - allocate new id by thread 1 and get X and change
> > > > > > > > > > > > > next_id to
> > > > > > > > > > > > > X+1
> > > > > > > > > > > > atomically.
> > > > > > > > > > > > > -  context switch
> > > > > > > > > > > > > - Thread 0 validate X by atomic_read and succeed
> > > > > > > > > > > > > to take
> > > > > > > > ownership.
> > > > > > > > > > > > > - The system loosed the port(or will be managed by
> > > > > > > > > > > > > two
> > > > > > > > > > > > > entities) -
> > > > > > > > > > crash.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Ok, and how using lock will protect you with such scenario?
> > > > > > > > > > >
> > > > > > > > > > > The owner set API validation by thread 0 should fail
> > > > > > > > > > > because the owner
> > > > > > > > > > validation is included in the protected section.
> > > > > > > > > >
> > > > > > > > > > Then your validation function would fail even if you'll
> > > > > > > > > > use atomic ops instead of lock.
> > > > > > > > > No.
> > > > > > > > > With atomic this specific scenario will cause the validation to
> > pass.
> > > > > > > >
> > > > > > > > Can you explain to me how?
> > > > > > > >
> > > > > > > > rte_eth_is_valid_owner_id(uint16_t owner_id) {
> > > > > > > >               int32_t cur_owner_id =
> > > > > > > > RTE_MIN(rte_atomic32_get(next_owner_id),
> > > > > > > > UINT16_MAX);
> > > > > > > >
> > > > > > > > 	if (owner_id == RTE_ETH_DEV_NO_OWNER || owner >
> > > > > > > > cur_owner_id) {
> > > > > > > > 		RTE_LOG(ERR, EAL, "Invalid owner_id=%d.\n",
> > owner_id);
> > > > > > > > 		return 0;
> > > > > > > > 	}
> > > > > > > > 	return 1;
> > > > > > > > }
> > > > > > > >
> > > > > > > > Let say your next_owne_id==X, and you invoke
> > > > > > > > rte_eth_is_valid_owner_id(owner_id=X+1)  - it would fail.
> > > > > > >
> > > > > > > Explanation:
> > > > > > > The scenario with locks:
> > > > > > > next_owner_id = X.
> > > > > > > Thread 0 call to set API(with invalid owner Y=X) and take lock.
> > > > > > > Context switch.
> > > > > > > Thread 1 call to owner_new and stuck in the lock.
> > > > > > > Context switch.
> > > > > > > Thread 0 does owner id validation and failed(Y>=X) - unlock
> > > > > > > the lock and
> > > > return failure to the user.
> > > > > > > Context switch.
> > > > > > > Thread 1 take the lock and update X to X+1, then, unlock the lock.
> > > > > > > Everything is OK!
> > > > > > >
> > > > > > > The same scenario with atomics:
> > > > > > > next_owner_id = X.
> > > > > > > Thread 0 call to set API(with invalid owner Y=X) and take lock.
> > > > > > > Context switch.
> > > > > > > Thread 1 call to owner_new and change X to X+1(atomically).
> > > > > > > Context switch.
> > > > > > > Thread 0 does owner id validation and success(Y<(atomic)X+1) -
> > > > > > > unlock
> > > > the lock and return success to the  user.
> > > > > > > Problem!
> > > > > > >
> > > > > >
> > > > > >
> > > > > > Matan is correct here, there is no way to preform parallel set
> > > > > > operations using just and atomic variable here, because multiple
> > > > > > reads of next_owner_id need to be preformed while it is stable.
> > > > > > That is to say rte_eth_next_owner_id must be compared to
> > > > > > RTE_ETH_DEV_NO_OWNER and owner_id in
> > rte_eth_is_valid_owner_id.
> > > > If
> > > > > > you were to only use an atomic_read on such a variable, it could
> > > > > > be incremented by the owner_new function between the checks and
> > > > > > an invalid owner value could become valid because  a third
> > > > > > thread incremented the next value.  The state of next_owner_id
> > > > > > must be kept stable during any validity checks
> > > > >
> > > > > It could still be incremented between the checks - if let say
> > > > > different thread will invoke new_onwer_id, grab the lock update
> > > > > counter, release the lock - all that before the check.
> > > > I don't see how all of the contents of rte_eth_dev_owner_set is
> > > > protected under rte_eth_dev_ownership_lock, as is
> > rte_eth_dev_owner_new.
> > > > Next_owner might increment between another threads calls to
> > > > owner_new and owner_set, but that will just cause a transition from
> > > > an ownership id being valid to invalid, and thats ok, as long as
> > > > there is consistency in the model that enforces a single valid owner
> > > > at a time (in that case the subsequent caller to owner_new).
> > > >
> > >
> > > I'm not sure I fully understand you, but see:
> > > we can't protect all of the user mistakes(using the wrong owner id).
> > > But we are doing the maximum for it.
> > >
> > Yeah, my writing was atrocious, apologies.  All I meant to say was that the
> > locking you have is ok, in that it maintains a steady state for the data being
> > read during the period its being read.  The fact that a given set operation may
> > fail because someone else created an ownership record is an artifact of the
> > api, not a bug in its implementation.  I think we're basically in agreement on
> > the semantics here, but this goes to my argument about complexity (more
> > below).
> > 
> > >
> > > > Though this confusion does underscore my assertion I think that this
> > > > API is overly complicated
> > > >
> > >
> > > I really don't think it is complicated. - just take ownership of a port(by
> > owner id allocation and set APIs) and manage the port as you want.
> > >
> > But thats not all.  The determination of success or failure in claiming
> > ownership is largely dependent on the behavior of other threads actions, not
> > a function of the state of the system at the moment ownership is requested.
> > That is to say, if you have N threads, and they all create ownership objects
> > identified as X, x+1, X+2...X+N, only the thread with id X+N will be able to
> > claim ownership of any port, because they all will have incremented the
> > shared nex_id variable.
> 
> Why? Each one will get its owner id according to some order(The critical section is protected by spinlock).
> 
Yes, and that's my issue here, the ordering.  Perhaps my issue is one of
perception.  When I consider an ownership library, what I really think about is
mutual exclusion (i.e. guaranteeing that only one entity is capable of access to
a resource at any one time).  The semantics of this library don't really
conform to any semantics that you usually see with other mutual exclusion
mechanisms.  That is to say, a spinlock or a mutex succeeds in locking if its prior
state is unlocked.  This library succeeds in acquiring the resource it protects
if and only if allocation of ownership records occurs in a particular order
relative to one another.  That just seems odd to me.  What advantage do these
new semantics have over more traditional established semantics?

 
> >  Determination of ownership by the programmer will
> > have to be done via debugging, and errors will likely be transient dependent
> > on the order in which threads execute (subject to scheduling jitter).
> > 
> Yes.
> 
But why put yourself through that pain?  Traditional semantics are far simpler
to comprehend, with and without a debugger.

> > Rather than making ownership success dependent on any data contained
> > within the ownership record, ownership should be entirely dependent on
> > the state of port ownership at the time that it was requested.  That is to say,
> > port ownership should succede if and only if the port is unowned at the time
> > that a given thread requets ownership.
> 
> Yes.
> 
So, we agree?  Then why do your ownership semantics require a check for the
highest allocated owner id?

> >  Any ancilliary data regarding which
> > context owns the port should be exactly that, ancilliary, and have no impact
> > on weather or not the port ownership request succedes.
> > 
> 
> Yes, I understand what you say - there is no deterministic state for ownership set success.
I think we agree here.  To be clear, I'm not saying that acquisition success or
failure should be deterministic in the sense that you should know which thread
can claim ownership, but only that you should be able to determine the success
or failure of ownership acquisition based on data in the locking mechanism, rather
than both data in the lock mechanism and data held by the requesting context.

> Actually I think it will be very hard to arrive to determination in DPDK regarding port ownership when multi-thread is in the game,
> Especially it depend in a lot of DPDK entities implementation..
Why?  A simple spinlock is sufficient for what I'm talking about.  If it's locked
you don't get ownership, if it isn't you do.
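
To illustrate (a minimal sketch only, not code from the patch: `port_lock`,
`port_try_lock` and `port_unlock` are made-up names, and a C11 `atomic_flag`
stands in for whatever lock ethdev would really use), success depends on
nothing but the prior state of the flag:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-port lock; a clear flag means "unowned". */
struct port_lock {
	atomic_flag owned;
};

#define PORT_LOCK_INIT { ATOMIC_FLAG_INIT }

/* Succeeds if and only if the port was unowned at the time of the call. */
static bool port_try_lock(struct port_lock *l)
{
	/* test_and_set returns the previous value: false means we took it. */
	return !atomic_flag_test_and_set(&l->owned);
}

/* Makes the port claimable again; no owner check in this simplified form. */
static void port_unlock(struct port_lock *l)
{
	atomic_flag_clear(&l->owned);
}
```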


> But the current non-deterministic approach makes good order in the game. 
Can you explain to me why the ordering is valuable?  Perhaps that would help me
out here, because currently, I don't see how the order is valuable, especially
given that the allocating contexts have no real control over the order in which
those objects are allocated.

Neil

> 
> 
> 
> > Regards
> > Neil
> > 
> > > > Neil
> > >
> > >
>

Matan Azrad Jan. 18, 2018, 8:21 p.m. UTC | #37

Hi Neil.

From: Neil Horman, Thursday, January 18, 2018 8:42 PM

<snip>
> > > But thats not all.  The determination of success or failure in
> > > claiming ownership is largely dependent on the behavior of other
> > > threads actions, not a function of the state of the system at the moment
> ownership is requested.
> > > That is to say, if you have N threads, and they all create ownership
> > > objects identified as X, x+1, X+2...X+N, only the thread with id X+N
> > > will be able to claim ownership of any port, because they all will
> > > have incremented the shared nex_id variable.
> >
> > Why? Each one will get its owner id according to some order(The critical
> section is protected by spinlock).
> >
> Yes, and thats my issue here, the ordering.  Perhaps my issue is one of
> perception.  When I consider an ownership library, what I really think about is
> mutual exclusion (i.e. guaranteing that only one entity is capable of access to
> a resource at any one time).  This semantics of this library don't really
> conform to any semantics that you usually see with other mutual exclusion
> mechanisms.  That is to say a spinlock or a mutex succedes locking if its prior
> state is unlocked.  This library succeeds aqusition of the resource it protects if
> and only if allocation of ownership records occurs in a particular order relative
> to one another.  That just seems odd to me.  What advantage do these new
> semantics have over more traditional established semantics?
> 
> 
> > >  Determination of ownership by the programmer will have to be done
> > > via debugging, and errors will likely be transient dependent on the
> > > order in which threads execute (subject to scheduling jitter).
> > >
> > Yes.
> >
> But why put yourself through that pain?  Traditional semantics are far simpler
> to comprehend, with and without a debugger.
> 

It looks like I misunderstood you, sorry.
Please describe the following:

1. What exactly do you want to improve (in detail)?
2. Which API (or part of the code) specifically do you want to change?
3. What is missing in the current code that should be fixed? (You can answer against the V3 I sent if you want.)


<snip> Sorry for that; I think there is no point in continuing the discussion if we do not fully understand each other. So let's start from the beginning "with good order :)" by answering the above questions.

Neil Horman Jan. 19, 2018, 1:41 a.m. UTC | #38

On Thu, Jan 18, 2018 at 08:21:34PM +0000, Matan Azrad wrote:
> Hi Neil.
> 
> From: Neil Horman, Thursday, January 18, 2018 8:42 PM
> 
> <snip>
> > > > But thats not all.  The determination of success or failure in
> > > > claiming ownership is largely dependent on the behavior of other
> > > > threads actions, not a function of the state of the system at the moment
> > ownership is requested.
> > > > That is to say, if you have N threads, and they all create ownership
> > > > objects identified as X, x+1, X+2...X+N, only the thread with id X+N
> > > > will be able to claim ownership of any port, because they all will
> > > > have incremented the shared nex_id variable.
> > >
> > > Why? Each one will get its owner id according to some order(The critical
> > section is protected by spinlock).
> > >
> > Yes, and thats my issue here, the ordering.  Perhaps my issue is one of
> > perception.  When I consider an ownership library, what I really think about is
> > mutual exclusion (i.e. guaranteing that only one entity is capable of access to
> > a resource at any one time).  This semantics of this library don't really
> > conform to any semantics that you usually see with other mutual exclusion
> > mechanisms.  That is to say a spinlock or a mutex succedes locking if its prior
> > state is unlocked.  This library succeeds aqusition of the resource it protects if
> > and only if allocation of ownership records occurs in a particular order relative
> > to one another.  That just seems odd to me.  What advantage do these new
> > semantics have over more traditional established semantics?
> > 
> > 
> > > >  Determination of ownership by the programmer will have to be done
> > > > via debugging, and errors will likely be transient dependent on the
> > > > order in which threads execute (subject to scheduling jitter).
> > > >
> > > Yes.
> > >
> > But why put yourself through that pain?  Traditional semantics are far simpler
> > to comprehend, with and without a debugger.
> > 
> 
> Looks like I missed you, sorry:
> Please describe next:
> 
> 1. What exactly do you want to improve?(in details)
> 2. Which API specifically do you want to change(\ part of code)?
> 3. What is the missing in current code(you can answer it in V3 I sent if you want) which should be fixed?
> 
> 
> <snip> sorry for that, I think it is not relevant continue discussion if we are not fully understand each other. So let's start from the beginning "with good order :)" by answering the above questions.

Sure, this seems like a reasonable way to level set.  

I mentioned in another thread that perhaps some of my issue here is perception
regarding what is meant by ownership.  When I think of an ownership api I think
primarily of mutual exclusion (that is to say, enforcement of a single execution
context having access to a resource at any given time).  In my mind the simplest
form of ownership is a spinlock or a mutex.  A single execution context either
does or does not hold the resource at any one time.  Those contexts that attempt
to gain exclusive access to the resource call an api that (depending on
implementation) either blocks continued execution of that thread until exclusive
access to the resource can be granted, or returns immediately with a success or
error indicator to let the caller know if access is granted.

If I were to codify this port ownership api in pseudo code it would look
something like this:

struct rte_eth_dev {

	< eth dev bits >
	rte_spinlock_t owner_lock;
	bool locked;
	pid_t owner_pid;
}

bool rte_port_claim_ownership(struct rte_eth_dev *dev)
{
	bool ret = false;

	spin_lock(dev->owner_lock);
	if (dev->locked)
		goto out;
	dev->locked = true;
	dev->owner_pid = getpid();
	ret = true;
out:
	spin_unlock(dev->owner_lock);
	return ret;		
}

bool rte_port_release_ownership(struct rte_eth_dev *dev)
{
	bool ret = false;

	spin_lock(dev->owner_lock);
	if (!dev->locked)
		goto out;
	/* only the current owner may release the port */
	if (dev->owner_pid != getpid())
		goto out;
	dev->locked = false;
	dev->owner_pid = 0;
	ret = true;
out:
	spin_unlock(dev->owner_lock);
	return ret;
}

bool rte_port_is_owned_by(struct rte_eth_dev *dev, pid_t pid)
{
	bool ret = false;

	spin_lock(dev->owner_lock);
	if (pid)
		ret = (dev->locked && (pid == dev->owner_pid));
	else
		ret = dev->locked;
	spin_unlock(dev->owner_lock);
	return ret;
}

The idea here is that lock state is isolated from ownership information.  Any
context has the opportunity to lock the resource (in this case the eth port)
regardless of its ownership object.

In comparison, your api, which is in many ways similar, separates the creation
of ownership objects into a separate api call, and that ownership information
embodies state that is integral to the ability to get exclusive access to the
resource.  I.e. if thread A calls your owner_new call, and then thread B calls
owner_new, thread A will never be able to get access to any port unless it calls
owner_new again.

Does that help clarify my position?

Regards
Neil


Matan Azrad Jan. 19, 2018, 7:14 a.m. UTC | #39

Hi Neil
From: Neil Horman, Friday, January 19, 2018 3:41 AM
> On Thu, Jan 18, 2018 at 08:21:34PM +0000, Matan Azrad wrote:
> > Hi Neil.
> >
> > From: Neil Horman, Thursday, January 18, 2018 8:42 PM

<snip>
> > 1. What exactly do you want to improve?(in details) 2. Which API
> > specifically do you want to change(\ part of code)?
> > 3. What is the missing in current code(you can answer it in V3 I sent if you
> want) which should be fixed?
> >
> >
> > <snip> sorry for that, I think it is not relevant continue discussion if we are
> not fully understand each other. So let's start from the beginning "with good
> order :)" by answering the above questions.
> 
> 
> Sure, this seems like a reasonable way to level set.
> 
> I mentioned in another thread that perhaps some of my issue here is
> perception regarding what is meant by ownership.  When I think of an
> ownership api I think primarily of mutual exclusion (that is to say,
> enforcement of a single execution context having access to a resource at any
> given time.  In my mind the simplest form of ownership is a spinlock or a
> mutex.  A single execution context either does or does not hold the resource
> at any one time.  Those contexts that attempt to gain excusive access to the
> resource call an api that (depending on
> implementation) either block continued execution of that thread until
> exclusive access to the resource can be granted, or returns immediately with
> a success or error indicator to let the caller know if access is granted.
> 
> If I were to codify this port ownership api in pseudo code it would look
> something like this:
> 
> struct rte_eth_dev {
> 
> 	< eth dev bits >
> 	rte_spinlock_t owner_lock;
> 	bool locked;
> 	pid_t owner_pid;
> }
> 
> 
> bool rte_port_claim_ownership(struct rte_eth_dev *dev) {
> 	bool ret = false;
> 
> 	spin_lock(dev->owner_lock);
> 	if (dev->locked)
> 		goto out;
> 	dev->locked = true;
> 	dev->owner_pid = getpid();
> 	ret = true;
> out:
> 	spin_unlock(dev->owner_lock);
> 	return ret;
> }
> 
> 
> bool rte_port_release_ownership(struct rte_eth_dev *dev) {
> 
> 	bool ret = false;
> 	spin_lock(dev->owner_lock);
> 	if (!dev->locked)
> 		goto out;
> 	if (dev->owner_pid != getpid())
> 		goto out;
> 	dev->locked = false;
> 	dev->owner_pid = 0;
> 	ret = true;
> out:
> 	spin_unlock(dev->owner_lock);
> 	return ret;
> }
> 
> bool rte_port_is_owned_by(struct rte_eth_dev *dev, pid_t pid) {
> 	bool ret = false;
> 
> 	spin_lock(dev->owner_lock);
> 	if (pid)
> 		ret = (dev->locked && (pid == dev->owner_pid));
> 	else
> 		ret = dev->locked;
> 	spin_unlock(dev->owner_lock);
> 	return ret;
> }
> 
> The idea here is that lock state is isolated from ownership information.  Any
> context has the opportunity to lock the resource (in this case the eth port)
> despite its ownership object.
> 
> In comparison, your api, which is in may ways simmilar, separates the
> creation of ownership objects to a separate api call, and that ownership
> information embodies state that is integral to the ability to get exclusive
> access to the resource.  I.E. if thread A calls your owner_new call, and then
> thread B calls owner_new, thread A will never be able to get access to any
> port unless it calls owner_new again.
> 
> Does that help clarify my position?

Now I fully understand you, thanks for your patience.

So, you are missing one of the main ideas behind my port ownership design:
there can be X>1 different, uncoordinated owners running in the same thread.

For example:
1. Think about testpmd control commands that call a failsafe port devop, which in turn calls its sub-devices' devops. testpmd is one owner (controlling the failsafe port) and failsafe is a different owner (controlling all its sub-device ports). Both run control commands in the same thread, and they are uncoordinated!
2. Interrupt callbacks: anyone can register them, and all of them are run by the DPDK host thread.

So not every potential owner becomes an owner; it depends on the specific implementation.

So if some "part of code" wants to manage a port exclusively and wants to take ownership of it to prevent other "parts of code" from using this port, it should:
1. Take ownership.
2. Ask itself: am I running in different threads\processes? If yes, it should synchronize its own port management.
3. Release ownership at the end.

Remember that different "parts of code" may be running in the same thread\threads\process\processes.
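
A toy, single-threaded model of the steps above (an illustration only:
`toy_owner_new`, `toy_owner_set` and `toy_owner_unset` are made-up stand-ins,
not the functions in the patch, and the ethdev lock is left out) shows two
components in one thread holding distinct owner ids:

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_NO_OWNER 0	/* hypothetical "unowned" id */

/* Hypothetical allocator state: ids 1..next-1 have been handed out. */
static uint64_t toy_next_owner_id = 1;

struct toy_port {
	uint64_t owner_id;
};

/* Each component (not each thread) asks for its own id once. */
static uint64_t toy_owner_new(void)
{
	return toy_next_owner_id++;
}

/* Step 1: take ownership; fails for unallocated ids or owned ports. */
static bool toy_owner_set(struct toy_port *p, uint64_t id)
{
	if (id == TOY_NO_OWNER || id >= toy_next_owner_id)
		return false;	/* id was never allocated */
	if (p->owner_id != TOY_NO_OWNER)
		return false;	/* already owned by some component */
	p->owner_id = id;
	return true;
}

/* Step 3: release ownership; only the current owner may do so. */
static bool toy_owner_unset(struct toy_port *p, uint64_t id)
{
	if (id == TOY_NO_OWNER || p->owner_id != id)
		return false;
	p->owner_id = TOY_NO_OWNER;
	return true;
}
```

With this, an application-level component and a failsafe-like component would
each call `toy_owner_new()` once and get distinct ids even if they share a
thread, whereas a pid-based check could not tell them apart.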

Thanks, Matan.
> 
> Regards
> Neil
> 
> }

Bruce Richardson Jan. 19, 2018, 9:30 a.m. UTC | #40

On Fri, Jan 19, 2018 at 07:14:17AM +0000, Matan Azrad wrote:
> 
> Hi Neil
> From: Neil Horman, Friday, January 19, 2018 3:41 AM
> > On Thu, Jan 18, 2018 at 08:21:34PM +0000, Matan Azrad wrote:
> > > Hi Neil.
> > >
> > > From: Neil Horman, Thursday, January 18, 2018 8:42 PM
> 
> <snip>
> > > 1. What exactly do you want to improve?(in details) 2. Which API
> > > specifically do you want to change(\ part of code)?
> > > 3. What is the missing in current code(you can answer it in V3 I sent if you
> > want) which should be fixed?
> > >
> > >
> > > <snip> sorry for that, I think it is not relevant continue discussion if we are
> > not fully understand each other. So let's start from the beginning "with good
> > order :)" by answering the above questions.
> > 
> > 
> > Sure, this seems like a reasonable way to level set.
> > 
> > I mentioned in another thread that perhaps some of my issue here is
> > perception regarding what is meant by ownership.  When I think of an
> > ownership api I think primarily of mutual exclusion (that is to say,
> > enforcement of a single execution context having access to a resource at any
> > given time.  In my mind the simplest form of ownership is a spinlock or a
> > mutex.  A single execution context either does or does not hold the resource
> > at any one time.  Those contexts that attempt to gain excusive access to the
> > resource call an api that (depending on
> > implementation) either block continued execution of that thread until
> > exclusive access to the resource can be granted, or returns immediately with
> > a success or error indicator to let the caller know if access is granted.
> > 
> > If I were to codify this port ownership api in pseudo code it would look
> > something like this:
> > 
> > struct rte_eth_dev {
> > 
> > 	< eth dev bits >
> > 	rte_spinlock_t owner_lock;
> > 	bool locked;
> > 	pid_t owner_pid;
> > }
> > 
As an aside, if you ensure that both locked (or "owned", I think in this
context) and owner_pid are integer values, you can do away with the lock
and use a compare-and-set to take ownership, by setting both atomically
if unmodified from the originally read values.
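
A rough sketch of what I mean, using C11 atomics rather than any DPDK call.
All names here are hypothetical, and as a simplification the "owned" flag is
folded into the id itself (0 means unowned) so that a single 64-bit
compare-and-set covers both:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define PORT_UNOWNED UINT64_C(0)	/* hypothetical: id 0 means "no owner" */

/* Hypothetical ownership word: 0 when free, otherwise the owner's id. */
struct port_owner {
	_Atomic uint64_t word;
};

/* Claim succeeds only if the word still holds PORT_UNOWNED. */
static bool port_cas_claim(struct port_owner *p, uint64_t owner_id)
{
	uint64_t expected = PORT_UNOWNED;

	if (owner_id == PORT_UNOWNED)
		return false;
	return atomic_compare_exchange_strong(&p->word, &expected, owner_id);
}

/* Release succeeds only if the caller is the current owner. */
static bool port_cas_release(struct port_owner *p, uint64_t owner_id)
{
	uint64_t expected = owner_id;

	if (owner_id == PORT_UNOWNED)
		return false;
	return atomic_compare_exchange_strong(&p->word, &expected, PORT_UNOWNED);
}

/* Snapshot of the current owner id (PORT_UNOWNED if free). */
static uint64_t port_owner_get(struct port_owner *p)
{
	return atomic_load(&p->word);
}
```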

> > 
> > bool rte_port_claim_ownership(struct rte_eth_dev *dev) {
> > 	bool ret = false;
> > 
> > 	spin_lock(dev->owner_lock);
> > 	if (dev->locked)
> > 		goto out;
> > 	dev->locked = true;
> > 	dev->owner_pid = getpid();
> > 	ret = true;
> > out:
> > 	spin_unlock(dev->owner_lock);
> > 	return ret;
> > }
> > 
> > 
> > bool rte_port_release_ownership(struct rte_eth_dev *dev) {
> > 
> > 	bool ret = false;
> > 	spin_lock(dev->owner_lock);
> > 	if (!dev->locked)
> > 		goto out;
> > 	if (dev->owner_pid != getpid())
> > 		goto out;
> > 	dev->locked = false;
> > 	dev->owner_pid = 0;
> > 	ret = true;
> > out:
> > 	spin_unlock(dev->owner_lock);
> > 	return ret;
> > }
> > 
> > bool rte_port_is_owned_by(struct rte_eth_dev *dev, pid_t pid) {
> > 	bool ret = false;
> > 
> > 	spin_lock(dev->owner_lock);
> > 	if (pid)
> > 		ret = (dev->locked && (pid == dev->owner_pid));
> > 	else
> > 		ret = dev->locked;
> > 	spin_unlock(dev->owner_lock);
> > 	return ret;
> > }
> > 
> > The idea here is that lock state is isolated from ownership information.  Any
> > context has the opportunity to lock the resource (in this case the eth port)
> > despite its ownership object.
> > 
> > In comparison, your api, which is in may ways simmilar, separates the
> > creation of ownership objects to a separate api call, and that ownership
> > information embodies state that is integral to the ability to get exclusive
> > access to the resource.  I.E. if thread A calls your owner_new call, and then
> > thread B calls owner_new, thread A will never be able to get access to any
> > port unless it calls owner_new again.
> > 
> > Does that help clarify my position?
This would have been my understanding of what was being looked for too,
from my minimal understanding of the problem. Thanks for putting that
forward on behalf of many of us!

> 
> Now I fully understand you, thanks for your patience.
> 
> So, you are missing here one of the main ideas of my port ownership intention.
> There are options for X>1 different uncoordinated owners running in the same thread.

Thanks Matan for taking time to try and explain how your idea differs,
but I for one am still a little confused. Sorry for the late questions.

Sure, Neil's example above takes the pid or thread id as the owner id
parameter, but there is no reason we can't use the same scheme with
arbitrarily assigned owner ids, so long as they are unique. We can even
have a simple mapping table mapping ids to names of components.
> 
> For example:
> 1. Think about Testpmd control commands that call to failsafe port devop which call to its sub-devices devops, while tespmd is different owner(controlling failsafe-port) and failsafe is a different owner(controlling all its sub-devices ports), There are both run control commands in the same thread and there are uncoordinated!
>  2. Interrupt callbacks that anyone can register to them and all will run by the DPDK host thread.

Can you provide a little more detail here: what is the specific issue
or conflict in each of these examples, and how does your ownership
proposal fix it when Neil's simpler approach doesn't?

> 
> So, no any optional  owner becomes an owner, it depends in the specific implementation.
> 
> So if some "part of code" wants to manage a port exclusively and wants to take ownership of it to prevent other "part of code" to use this port :
> 1. Take ownership.
> 2. It should ask itself: Am I run in different threads\processes? If yes, it should synchronize its port management. 
> 3. Release ownership in the end.
> 
> Remember that may be different "part of code"s running in the same thread\threads\process\processes.
> 
> Thanks, Matan.
> > 
> > Regards
> > Neil
> > 
> > }

Matan Azrad Jan. 19, 2018, 10:44 a.m. UTC | #41

Hi Bruce
From: Bruce Richardson, Friday, January 19, 2018 11:30 AM
> On Fri, Jan 19, 2018 at 07:14:17AM +0000, Matan Azrad wrote:
> >
> > Hi Neil
> > From: Neil Horman, Friday, January 19, 2018 3:41 AM
> > > On Thu, Jan 18, 2018 at 08:21:34PM +0000, Matan Azrad wrote:
> > > > Hi Neil.
> > > >
> > > > From: Neil Horman, Thursday, January 18, 2018 8:42 PM
> >
> > <snip>
> > > > 1. What exactly do you want to improve?(in details) 2. Which API
> > > > specifically do you want to change(\ part of code)?
> > > > 3. What is the missing in current code(you can answer it in V3 I
> > > > sent if you
> > > want) which should be fixed?
> > > >
> > > >
> > > > <snip> sorry for that, I think it is not relevant continue
> > > > discussion if we are
> > > not fully understand each other. So let's start from the beginning
> > > "with good order :)" by answering the above questions.
> > >
> > >
> > > Sure, this seems like a reasonable way to level set.
> > >
> > > I mentioned in another thread that perhaps some of my issue here is
> > > perception regarding what is meant by ownership.  When I think of an
> > > ownership api I think primarily of mutual exclusion (that is to say,
> > > enforcement of a single execution context having access to a
> > > resource at any given time.  In my mind the simplest form of
> > > ownership is a spinlock or a mutex.  A single execution context
> > > either does or does not hold the resource at any one time.  Those
> > > contexts that attempt to gain excusive access to the resource call
> > > an api that (depending on
> > > implementation) either block continued execution of that thread
> > > until exclusive access to the resource can be granted, or returns
> > > immediately with a success or error indicator to let the caller know if
> access is granted.
> > >
> > > If I were to codify this port ownership api in pseudo code it would
> > > look something like this:
> > >
> > > struct rte_eth_dev {
> > >
> > > 	< eth dev bits >
> > > 	rte_spinlock_t owner_lock;
> > > 	bool locked;
> > > 	pid_t owner_pid;
> > > }
> > >
> As an aside, if you ensure that both locked (or "owned", I think in this
> context) and owner_pid are integer values, you can do away with the lock
> and use a compare-and-set to take ownership, by setting both atomically if
> unmodified from the originally read values.
> 
> > >
> > > bool rte_port_claim_ownership(struct rte_eth_dev *dev) {
> > > 	bool ret = false;
> > >
> > > 	spin_lock(dev->owner_lock);
> > > 	if (dev->locked)
> > > 		goto out;
> > > 	dev->locked = true;
> > > 	dev->owner_pid = getpid();
> > > 	ret = true;
> > > out:
> > > 	spin_unlock(dev->owner_lock);
> > > 	return ret;
> > > }
> > >
> > >
> > > bool rte_port_release_ownership(struct rte_eth_dev *dev) {
> > >
> > > 	bool ret = false;
> > > 	spin_lock(dev->owner_lock);
> > > 	if (!dev->locked)
> > > 		goto out;
> > > 	if (dev->owner_pid != getpid())
> > > 		goto out;
> > > 	dev->locked = false;
> > > 	dev->owner_pid = 0;
> > > 	ret = true;
> > > out:
> > > 	spin_unlock(dev->owner_lock);
> > > 	return ret;
> > > }
> > >
> > > bool rte_port_is_owned_by(struct rte_eth_dev *dev, pid_t pid) {
> > > 	bool ret = false;
> > >
> > > 	spin_lock(dev->owner_lock);
> > > 	if (pid)
> > > 		ret = (dev->locked && (pid == dev->owner_pid));
> > > 	else
> > > 		ret = dev->locked;
> > > 	spin_unlock(dev->owner_lock);
> > > 	return ret;
> > > }
> > >
> > > The idea here is that lock state is isolated from ownership
> > > information.  Any context has the opportunity to lock the resource
> > > (in this case the eth port) despite its ownership object.
> > >
> > > In comparison, your api, which is in may ways simmilar, separates
> > > the creation of ownership objects to a separate api call, and that
> > > ownership information embodies state that is integral to the ability
> > > to get exclusive access to the resource.  I.E. if thread A calls
> > > your owner_new call, and then thread B calls owner_new, thread A
> > > will never be able to get access to any port unless it calls owner_new
> again.
> > >
> > > Does that help clarify my position?
> This would have been my understanding of what was being looked for too,
> from my minimal understanding of the problem. Thanks for putting that
> forward on behalf of many of us!
> 
> >
> > Now I fully understand you, thanks for your patience.
> >
> > So, you are missing here one of the main ideas of my port ownership
> intention.
> > There are options for X>1 different uncoordinated owners running in the
> same thread.
> 
> Thanks Matan for taking time to try and explain how your idea differs, but I
> for one am still a little confused. Sorry for the late questions.
> 
> Sure, Neil's example above takes the pid or thread id as the owner id
> parameter, but there is no reason we can't use the same scheme with
> arbitrarily assigned owner ids, so long as they are unique. We can even have
> a simple mapping table mapping ids to names of components.
> >
Sorry, I don't understand your point here.
My approach asks to allocate a unique ID for "any part of code that wants to manage\use a port".
What is the problem here and how do you suggest fixing it?

Neil's approach (with process ID \ thread ID) is wrong because 2 different owners can run in the same thread (as I explain in detail below).

> > For example:
> > 1. Think about Testpmd control commands that call to failsafe port devop
> which call to its sub-devices devops, while tespmd is different
> owner(controlling failsafe-port) and failsafe is a different owner(controlling
> all its sub-devices ports), There are both run control commands in the same
> thread and there are uncoordinated!
> >  2. Interrupt callbacks that anyone can register to them and all will run by
> the DPDK host thread.
> 
> Can you provide a little more details here: what is the specific issue or conflict
> in each of these examples and how does your ownership proposal fix it,
> when Neil's simpler approach doesn't?
> 
For the first example:
My approach:
Testpmd wants to manage the fail-safe port, so it should allocate a unique ID (only once) and use owner set (with its ID) to take ownership of this port.
If it succeeds in taking ownership, it can manage the port.
The failsafe PMD wants to manage its sub-device ports and goes through the same process as Testpmd.
Everything is OK.

Neil's approach:
Testpmd wants to manage the fail-safe port, so it just needs to claim ownership (set), and its pid is taken as the owner identifier.
The failsafe PMD wants to manage its sub-device ports and goes through the same process as Testpmd.
But look: these 2 entities run in the same thread, and they both can set the same pid. -> problem!

The second example just describes another scenario in which more than one DPDK entity runs in the same thread.
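
To make the collision concrete, here is a rough sketch. The names
(claim_by_pid, owned_by_caller, the reduced struct port) are made up for
illustration, and locking is left out.

```c
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

struct port {
	bool locked;
	pid_t owner_pid;
};

/* pid-keyed claim (locking omitted for brevity). */
static bool claim_by_pid(struct port *p)
{
	if (p->locked)
		return false;
	p->locked = true;
	p->owner_pid = getpid();
	return true;
}

/* The check any component would run before touching a port. */
static bool owned_by_caller(const struct port *p)
{
	return p->locked && p->owner_pid == getpid();
}
```

If Testpmd calls claim_by_pid() on the fail-safe port and the failsafe PMD
calls it on a sub-device port from the same process, both ports record the
same pid. owned_by_caller() on the sub-device port then returns true for
Testpmd code as well, so the two entities cannot be told apart.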

> >
> > So, no any optional  owner becomes an owner, it depends in the specific
> implementation.
> >
> > So if some "part of code" wants to manage a port exclusively and wants to
> take ownership of it to prevent other "part of code" to use this port :
> > 1. Take ownership.
> > 2. It should ask itself: Am I run in different threads\processes? If yes, it
> should synchronize its port management.
> > 3. Release ownership in the end.
> >
> > Remember that may be different "part of code"s running in the same
> thread\threads\process\processes.
> >
> > Thanks, Matan.
> > >
> > > Regards
> > > Neil
> > >
> > > }

Neil Horman Jan. 19, 2018, 12:55 p.m. UTC | #42

On Fri, Jan 19, 2018 at 09:30:17AM +0000, Bruce Richardson wrote:
> On Fri, Jan 19, 2018 at 07:14:17AM +0000, Matan Azrad wrote:
> > 
> > Hi Neil
> > From: Neil Horman, Friday, January 19, 2018 3:41 AM
> > > On Thu, Jan 18, 2018 at 08:21:34PM +0000, Matan Azrad wrote:
> > > > Hi Neil.
> > > >
> > > > From: Neil Horman, Thursday, January 18, 2018 8:42 PM
> > 
> > <snip>
> > > > 1. What exactly do you want to improve?(in details) 2. Which API
> > > > specifically do you want to change(\ part of code)?
> > > > 3. What is the missing in current code(you can answer it in V3 I sent if you
> > > want) which should be fixed?
> > > >
> > > >
> > > > <snip> sorry for that, I think it is not relevant continue discussion if we are
> > > not fully understand each other. So let's start from the beginning "with good
> > > order :)" by answering the above questions.
> > > 
> > > 
> > > Sure, this seems like a reasonable way to level set.
> > > 
> > > I mentioned in another thread that perhaps some of my issue here is
> > > perception regarding what is meant by ownership.  When I think of an
> > > ownership api I think primarily of mutual exclusion (that is to say,
> > > enforcement of a single execution context having access to a resource at any
> > > given time.  In my mind the simplest form of ownership is a spinlock or a
> > > mutex.  A single execution context either does or does not hold the resource
> > > at any one time.  Those contexts that attempt to gain excusive access to the
> > > resource call an api that (depending on
> > > implementation) either block continued execution of that thread until
> > > exclusive access to the resource can be granted, or returns immediately with
> > > a success or error indicator to let the caller know if access is granted.
> > > 
> > > If I were to codify this port ownership api in pseudo code it would look
> > > something like this:
> > > 
> > > struct rte_eth_dev {
> > > 
> > > 	< eth dev bits >
> > > 	rte_spinlock_t owner_lock;
> > > 	bool locked;
> > > 	pid_t owner_pid;
> > > }
> > > 
> As an aside, if you ensure that both locked (or "owned", I think in this
> context) and owner_pid are integer values, you can do away with the lock
> and use a compare-and-set to take ownership, by setting both atomically
> if unmodified from the originally read values.
> 
This is true, since the lock is released at the end of each API function
(effectively making each API function atomic).  Though, a dpdk spinlock is just
a compare_and_set operation with a built in yield().
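
For illustration, a rough sketch of the lock-free variant, assuming the owned
flag and the owner id are folded into one 32-bit word (0 meaning unowned) and
that C11 atomics are available. The names are hypothetical and nothing here is
taken from the patch.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define PORT_UNOWNED 0u /* hypothetical: 0 encodes "not owned" */

struct port {
	_Atomic uint32_t owner; /* PORT_UNOWNED, or the owner's id */
};

/* Succeeds only if the port was unowned at the instant of the swap. */
static bool port_try_claim(struct port *p, uint32_t id)
{
	uint32_t expected = PORT_UNOWNED;

	if (id == PORT_UNOWNED)
		return false;
	return atomic_compare_exchange_strong(&p->owner, &expected, id);
}

/* Succeeds only if the caller's id is still the recorded owner. */
static bool port_release(struct port *p, uint32_t id)
{
	uint32_t expected = id;

	if (id == PORT_UNOWNED)
		return false;
	return atomic_compare_exchange_strong(&p->owner, &expected,
					      PORT_UNOWNED);
}

/* id == 0 asks "is it owned at all?", otherwise "is it owned by id?". */
static bool port_is_owned_by(struct port *p, uint32_t id)
{
	uint32_t cur = atomic_load(&p->owner);

	return id != PORT_UNOWNED ? cur == id : cur != PORT_UNOWNED;
}
```

Each call is a single atomic operation on one word, so there is no separate
lock to take and drop around it.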

Neil
>

Neil Horman Jan. 19, 2018, 1:30 p.m. UTC | #43

On Fri, Jan 19, 2018 at 10:44:32AM +0000, Matan Azrad wrote:
> Hi Bruce
> From: Bruce Richardson, Friday, January 19, 2018 11:30 AM
> > On Fri, Jan 19, 2018 at 07:14:17AM +0000, Matan Azrad wrote:
> > >
> > > Hi Neil
> > > From: Neil Horman, Friday, January 19, 2018 3:41 AM
> > > > On Thu, Jan 18, 2018 at 08:21:34PM +0000, Matan Azrad wrote:
> > > > > Hi Neil.
> > > > >
> > > > > From: Neil Horman, Thursday, January 18, 2018 8:42 PM
> > >
> > > <snip>
> > > > > 1. What exactly do you want to improve?(in details) 2. Which API
> > > > > specifically do you want to change(\ part of code)?
> > > > > 3. What is the missing in current code(you can answer it in V3 I
> > > > > sent if you
> > > > want) which should be fixed?
> > > > >
> > > > >
> > > > > <snip> sorry for that, I think it is not relevant continue
> > > > > discussion if we are
> > > > not fully understand each other. So let's start from the beginning
> > > > "with good order :)" by answering the above questions.
> > > >
> > > >
> > > > Sure, this seems like a reasonable way to level set.
> > > >
> > > > I mentioned in another thread that perhaps some of my issue here is
> > > > perception regarding what is meant by ownership.  When I think of an
> > > > ownership api I think primarily of mutual exclusion (that is to say,
> > > > enforcement of a single execution context having access to a
> > > > resource at any given time.  In my mind the simplest form of
> > > > ownership is a spinlock or a mutex.  A single execution context
> > > > either does or does not hold the resource at any one time.  Those
> > > > contexts that attempt to gain excusive access to the resource call
> > > > an api that (depending on
> > > > implementation) either block continued execution of that thread
> > > > until exclusive access to the resource can be granted, or returns
> > > > immediately with a success or error indicator to let the caller know if
> > access is granted.
> > > >
> > > > If I were to codify this port ownership api in pseudo code it would
> > > > look something like this:
> > > >
> > > > struct rte_eth_dev {
> > > >
> > > > 	< eth dev bits >
> > > > 	rte_spinlock_t owner_lock;
> > > > 	bool locked;
> > > > 	pid_t owner_pid;
> > > > }
> > > >
> > As an aside, if you ensure that both locked (or "owned", I think in this
> > context) and owner_pid are integer values, you can do away with the lock
> > and use a compare-and-set to take ownership, by setting both atomically if
> > unmodified from the originally read values.
> > 
> > > >
> > > > bool rte_port_claim_ownership(struct rte_eth_dev *dev) {
> > > > 	bool ret = false;
> > > >
> > > > 	spin_lock(dev->owner_lock);
> > > > 	if (dev->locked)
> > > > 		goto out;
> > > > 	dev->locked = true;
> > > > 	dev->owner_pid = getpid();
> > > > 	ret = true;
> > > > out:
> > > > 	spin_unlock(dev->owner_lock);
> > > > 	return ret;
> > > > }
> > > >
> > > >
> > > > bool rte_port_release_ownership(struct rte_eth_dev *dev) {
> > > >
> > > > 	bool ret = false;
> > > > 	spin_lock(dev->owner_lock);
> > > > 	if (!dev->locked)
> > > > 		goto out;
> > > > 	if (dev->owner_pid != getpid())
> > > > 		goto out;
> > > > 	dev->locked = false;
> > > > 	dev->owner_pid = 0;
> > > > 	ret = true;
> > > > out:
> > > > 	spin_unlock(dev->owner_lock);
> > > > 	return ret;
> > > > }
> > > >
> > > > bool rte_port_is_owned_by(struct rte_eth_dev *dev, pid_t pid) {
> > > > 	bool ret = false;
> > > >
> > > > 	spin_lock(dev->owner_lock);
> > > > 	if (pid)
> > > > 		ret = (dev->locked && (pid == dev->owner_pid));
> > > > 	else
> > > > 		ret = dev->locked;
> > > > 	spin_unlock(dev->owner_lock);
> > > > 	return ret;
> > > > }
> > > >
> > > > The idea here is that lock state is isolated from ownership
> > > > information.  Any context has the opportunity to lock the resource
> > > > (in this case the eth port) despite its ownership object.
> > > >
> > > > In comparison, your api, which is in may ways simmilar, separates
> > > > the creation of ownership objects to a separate api call, and that
> > > > ownership information embodies state that is integral to the ability
> > > > to get exclusive access to the resource.  I.E. if thread A calls
> > > > your owner_new call, and then thread B calls owner_new, thread A
> > > > will never be able to get access to any port unless it calls owner_new
> > again.
> > > >
> > > > Does that help clarify my position?
> > This would have been my understanding of what was being looked for too,
> > from my minimal understanding of the problem. Thanks for putting that
> > forward on behalf of many of us!
> > 
> > >
> > > Now I fully understand you, thanks for your patience.
> > >
> > > So, you are missing here one of the main ideas of my port ownership
> > intention.
> > > There are options for X>1 different uncoordinated owners running in the
> > same thread.
> > 
> > Thanks Matan for taking time to try and explain how your idea differs, but I
> > for one am still a little confused. Sorry for the late questions.
> > 
> > Sure, Neil's example above takes the pid or thread id as the owner id
> > parameter, but there is no reason we can't use the same scheme with
> > arbitrarily assigned owner ids, so long as they are unique. We can even have
> > a simple mapping table mapping ids to names of components.
> > >
> Sorry, don't understand your point here.
> My approach asked to allocate unique ID for "any part of code want to manage\use a port".
> What is the problem here and how do you suggest to fix it?
> 
> Neil approach (with process iD\ thread id ) is wrong because 2 different owners can run in same thread (as I explained a lot below).
> 
So, I may be wrong here, but it would be my opinion that the ownership record
should codify something about the owning context.  The fact that you want two
different owners to run in the context of the same thread is not a problem per
se, but rather an artifact of your adherence to the statement "any part of code
to manage/use a port".  I would assert that was perhaps a statement made in
error early during the design phase.  Perhaps it would be better to state that
any execution context may take ownership of a port.

> > > For example:
> > > 1. Think about Testpmd control commands that call to failsafe port devop
> > which call to its sub-devices devops, while tespmd is different
> > owner(controlling failsafe-port) and failsafe is a different owner(controlling
> > all its sub-devices ports), There are both run control commands in the same
> > thread and there are uncoordinated!
Answered below.

> > >  2. Interrupt callbacks that anyone can register to them and all will run by
> > the DPDK host thread.
> > 
I'm sorry, I'm not clear on how your solution succeeds here where my alternate
model fails.  Both models require co-ordination such that ownership of a port is
released and re-acquired by another thread, if I'm understanding this correctly.

> > Can you provide a little more details here: what is the specific issue or conflict
> > in each of these examples and how does your ownership proposal fix it,
> > when Neil's simpler approach doesn't?
> > 
> For the first example:
> My approach:
> Testpmd want to manage the fail-safe port, therefore it should allocate unique ID(only one time) and use owner set(by its ID) to take ownership of this port.
> If it succeed to take ownership it can manage the port.
> Failsafe PMD wants to manage its sub-devices ports and does the same process as Testpmd.
> Everything is ok.
> 
> Neil  approach:
> Testpmd want to manage the fail-safe port, therefore it just need to claim ownership(set) and its pid will take as the owner identifier.
> Failsafe PMD wants to manage its sub-devices ports and does the same process as Testpmd.
> But look these 2 entities run in same threads and there both can set the same pid. -> problem!
> 
I would argue that's not an error at all.  As above, the only thing wrong with
using the same ID to claim ownership of both ports is that it violates the
statement you referred to above, which I think is somewhat erroneous.  I would
further argue that using the same ID in both scenarios is preferable because it
accurately indicates the ownership relation between the top level failsafe
device and its slaves (i.e. that the application thread owns the failsafe
device, and transitively, the slaves).  There is no real need to codify the fact
that the failsafe port actually owns the slaves, above and beyond that statement
above.

There is a convenience to having ownership be differentiated in the
master/slave model when it comes to iterating over top level vs subordinate
ports, but I would argue that's a problem that should be solved independently;
adding it here is somewhat confusing.  I would suggest adding a parent
rte_eth_dev and a children rte_eth_dev list to the rte_eth_dev structure so that
iterations can be performed over top level devices, children, children of
children, etc.  You can do this with your ownership model as well of course, but
there are other ways to skin that cat.
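
As a rough sketch of that suggestion (struct eth_dev is a hypothetical
stand-in for rte_eth_dev, and the field and function names are invented for
illustration):

```c
#include <assert.h>
#include <stddef.h>

struct eth_dev {
	const char *name;
	struct eth_dev *parent;       /* NULL for a top level device */
	struct eth_dev *first_child;  /* head of this device's child list */
	struct eth_dev *next_sibling; /* next entry in the parent's child list */
};

/* Link child under parent; a device can have at most one parent. */
static void eth_dev_add_child(struct eth_dev *parent, struct eth_dev *child)
{
	assert(child->parent == NULL);
	child->parent = parent;
	child->next_sibling = parent->first_child;
	parent->first_child = child;
}

static int eth_dev_is_top_level(const struct eth_dev *dev)
{
	return dev->parent == NULL;
}

typedef void (*eth_dev_visit_t)(struct eth_dev *dev, void *arg);

/* Depth-first walk over dev, its children, children of children, etc. */
static void eth_dev_walk(struct eth_dev *dev, eth_dev_visit_t visit, void *arg)
{
	visit(dev, arg);
	for (struct eth_dev *c = dev->first_child; c != NULL; c = c->next_sibling)
		eth_dev_walk(c, visit, arg);
}

/* Example visitor: counts the devices it is called on. */
static void count_visit(struct eth_dev *dev, void *arg)
{
	(void)dev;
	++*(int *)arg;
}
```

An application that only cares about top level ports can filter on
eth_dev_is_top_level(), independently of who owns what.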


> The second one just describe more scenario about more than one DPDK entities which run from the same thread.
> 
> > >
> > > So, no any optional  owner becomes an owner, it depends in the specific
> > implementation.
> > >
> > > So if some "part of code" wants to manage a port exclusively and wants to
> > take ownership of it to prevent other "part of code" to use this port :
> > > 1. Take ownership.
> > > 2. It should ask itself: Am I run in different threads\processes? If yes, it
> > should synchronize its port management.
> > > 3. Release ownership in the end.
> > >
> > > Remember that may be different "part of code"s running in the same
> > thread\threads\process\processes.
> > >
So it seems like the real point of contention that we need to settle here is,
what codifies an 'owner'.  Must it be a specific execution context, or can we
define any arbitrary section of code as being an owner?  I would argue against
the latter.  While in your master/slave model I can see how it seems tempting, I
would suggest alternate use cases that make that ownership model ambiguous.  If,
for example, we use your interrupt example above, and an interrupt callback is
run for a given port, how, using your example, does it know which area of
code/object/thread to co-ordinate releasing of that port with so that it can
operate exclusively?


Thanks
Neil
> > > Thanks, Matan.
> > > >
> > > > Regards
> > > > Neil
> > > >
> > > > }
>

Neil Horman Jan. 19, 2018, 1:52 p.m. UTC | #44

On Fri, Jan 19, 2018 at 07:14:17AM +0000, Matan Azrad wrote:
> 
> Hi Neil
> From: Neil Horman, Friday, January 19, 2018 3:41 AM
> > On Thu, Jan 18, 2018 at 08:21:34PM +0000, Matan Azrad wrote:
> > > Hi Neil.
> > >
> > > From: Neil Horman, Thursday, January 18, 2018 8:42 PM
> 
> <snip>
> > > 1. What exactly do you want to improve?(in details) 2. Which API
> > > specifically do you want to change(\ part of code)?
> > > 3. What is the missing in current code(you can answer it in V3 I sent if you
> > want) which should be fixed?
> > >
> > >
> > > <snip> sorry for that, I think it is not relevant continue discussion if we are
> > not fully understand each other. So let's start from the beginning "with good
> > order :)" by answering the above questions.
> > 
> > 
> > Sure, this seems like a reasonable way to level set.
> > 
> > I mentioned in another thread that perhaps some of my issue here is
> > perception regarding what is meant by ownership.  When I think of an
> > ownership api I think primarily of mutual exclusion (that is to say,
> > enforcement of a single execution context having access to a resource at any
> > given time.  In my mind the simplest form of ownership is a spinlock or a
> > mutex.  A single execution context either does or does not hold the resource
> > at any one time.  Those contexts that attempt to gain excusive access to the
> > resource call an api that (depending on
> > implementation) either block continued execution of that thread until
> > exclusive access to the resource can be granted, or returns immediately with
> > a success or error indicator to let the caller know if access is granted.
> > 
> > If I were to codify this port ownership api in pseudo code it would look
> > something like this:
> > 
> > struct rte_eth_dev {
> > 
> > 	< eth dev bits >
> > 	rte_spinlock_t owner_lock;
> > 	bool locked;
> > 	pid_t owner_pid;
> > }
> > 
> > 
> > bool rte_port_claim_ownership(struct rte_eth_dev *dev) {
> > 	bool ret = false;
> > 
> > 	spin_lock(dev->owner_lock);
> > 	if (dev->locked)
> > 		goto out;
> > 	dev->locked = true;
> > 	dev->owner_pid = getpid();
> > 	ret = true;
> > out:
> > 	spin_unlock(dev->owner_lock);
> > 	return ret;
> > }
> > 
> > 
> > bool rte_port_release_ownership(struct rte_eth_dev *dev) {
> >
> > 	bool ret = false;
> > 	spin_lock(dev->owner_lock);
> > 	if (!dev->locked)
> > 		goto out;
> > 	if (dev->owner_pid != getpid())
> > 		goto out;
> > 	dev->locked = false;
> > 	dev->owner_pid = 0;
> > 	ret = true;
> > out:
> > 	spin_unlock(dev->owner_lock);
> > 	return ret;
> > }
> > 
> > bool rte_port_is_owned_by(struct rte_eth_dev *dev, pid_t pid) {
> > 	bool ret = false;
> > 
> > 	spin_lock(dev->owner_lock);
> > 	if (pid)
> > 		ret = (dev->locked && (pid == dev->owner_pid));
> > 	else
> > 		ret = dev->locked;
> > 	spin_unlock(dev->owner_lock);
> > 	return ret;
> > }
> > 
> > The idea here is that lock state is isolated from ownership information.  Any
> > context has the opportunity to lock the resource (in this case the eth port)
> > despite its ownership object.
> > 
> > In comparison, your api, which is in may ways simmilar, separates the
> > creation of ownership objects to a separate api call, and that ownership
> > information embodies state that is integral to the ability to get exclusive
> > access to the resource.  I.E. if thread A calls your owner_new call, and then
> > thread B calls owner_new, thread A will never be able to get access to any
> > port unless it calls owner_new again.
> > 
> > Does that help clarify my position?
> 
> Now I fully understand you, thanks for your patience.
> 
> So, you are missing here one of the main ideas of my port ownership intention.
> There are options for X>1 different uncoordinated owners running in the same thread.
> 
> For example:
> 1. Think about Testpmd control commands that call to failsafe port devop which call to its sub-devices devops, while tespmd is different owner(controlling failsafe-port) and failsafe is a different owner(controlling all its sub-devices ports), There are both run control commands in the same thread and there are uncoordinated!
>  2. Interrupt callbacks that anyone can register to them and all will run by the DPDK host thread. 
> 
> So, no any optional  owner becomes an owner, it depends in the specific implementation.
> 
> So if some "part of code" wants to manage a port exclusively and wants to take ownership of it to prevent other "part of code" to use this port :
> 1. Take ownership.
> 2. It should ask itself: Am I run in different threads\processes? If yes, it should synchronize its port management. 
> 3. Release ownership in the end.
> 
> Remember that may be different "part of code"s running in the same thread\threads\process\processes.
> 
Apologies for not responding in full here, but in the interests of
de-duplication, I'm providing a larger response to the above higher up in
the thread (the branch in which Bruce commented).

Best
Neil

> Thanks, Matan.
> > 
> > Regards
> > Neil
> > 
> > }
>

Neil Horman Jan. 19, 2018, 1:57 p.m. UTC | #45

On Thu, Jan 18, 2018 at 02:52:20PM +0000, Matan Azrad wrote:
> Hi Neil
> 
> From: Neil Horman, Thursday, January 18, 2018 3:21 PM
> > On Wed, Jan 17, 2018 at 05:58:07PM +0000, Matan Azrad wrote:
> > >
> > > Hi Neil
> > >
> > >  From: Neil Horman, Wednesday, January 17, 2018 4:00 PM
> > > > On Wed, Jan 17, 2018 at 12:05:42PM +0000, Matan Azrad wrote:
> <snip>
> > > > Matan is correct here, there is no way to preform parallel set
> > > > operations using just and atomic variable here, because multiple
> > > > reads of next_owner_id need to be preformed while it is stable.
> > > > That is to say rte_eth_next_owner_id must be compared to
> > > > RTE_ETH_DEV_NO_OWNER and owner_id in rte_eth_is_valid_owner_id.
> > If
> > > > you were to only use an atomic_read on such a variable, it could be
> > > > incremented by the owner_new function between the checks and an
> > > > invalid owner value could become valid because  a third thread
> > > > incremented the next value.  The state of next_owner_id must be kept
> > > > stable during any validity checks
> > > >
> > > > That said, I really have to wonder why ownership ids are really
> > > > needed here at all.  It seems this design could be much simpler with
> > > > the addition of a per- port lock (and optional ownership record).
> > > > The API could consist of three
> > > > operations:
> > > >
> > > > ownership_set
> > > > ownership_tryset
> > > > ownership_release
> > > > ownership_get
> > > >
> > > >
> > > > The first call simply tries to take the per-port lock (blocking if
> > > > its already
> > > > locked)
> > > >
> > >
> > > Per port lock is not good because the ownership mechanism must to be
> > synchronized with the port creation\release.
> > > So the port creation and port ownership should use the same lock.
> > >
> > In what way do you need to synchronize with port creation?
> 
> The port release zeroes the data field of the port owner, so it should be synchronized with the ownership APIs.
> The port creation should be synchronized with the port release.
> 
OK, that's fair, but you can do that as long as you don't get hung up on the
necessity to zero all the port data.  Keep the state of the spinlock, and
mandate that the port be in an unowned state during release.
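
A rough sketch of that idea, with made-up names (struct port, port_reset) and
a plain int standing in for the spinlock:

```c
#include <stdbool.h>
#include <string.h>

struct port_data {         /* hypothetical per-port data */
	char name[32];
	unsigned int nb_queues;
};

struct port {
	int lock_word;         /* stand-in for the spinlock; never cleared */
	bool owned;
	struct port_data data; /* only this part is zeroed on release */
};

/* Clears the port's data on release, but only when nobody owns it.
 * The caller is assumed to hold the lock while calling this. */
static bool port_reset(struct port *p)
{
	if (p->owned)
		return false;
	memset(&p->data, 0, sizeof(p->data));
	return true;
}
```

Because the lock lives outside the zeroed region, ownership calls that race
with a release still find a valid lock to serialize on.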

> 
> >  If a port has not
> > yet been created, then by definition the owner must be the thread calling
> > the create function.
> 
> No, the owner can be any dpdk entity. (an application - multi\single threads\proccesses, a PMD, a library).
> So the port allocation(usually done from the port PMD by one thread from one process) just should to allocate a port.
> 
Again, in the interests of de-duplication, I made an argument on this point in
the part of the thread where Bruce commented.  I don't think we need to adhere
to the notion that any block of code can be declared the owner of a port.

> 
> >  If you are concerned about the mechanics of the port
> > data structure (i.e. the fact that rte_eth_devices is statically allocated, you
> > can add a lock structure to the rte_eth_dev struct and initialize it statically
> > with
> > RTE_SPINLOCK_INITAIZER()
> > 
> 
> The lock should be in shared memory to allow secondary processes entities to take owner safely.
>  
OK, that's entirely doable.

> > > I didn't find precedence for blocking function in ethdev.
> > >
> > Then perhaps we don't need that api call.  Perhaps ownership_tryset is
> > enough.
> >
> 
> As I already did :)
>  
> > > > The second call is a non-blocking version of the first
> > > >
> > > > The third unlocks the port, allowing others to take ownership
> > > >
> > > > The fourth returns whatever ownership record you want to encode with
> > > > the lock.
> > > >
> > > > The addition of all this id checking seems a bit overcomplicated
> > >
> > > You miss the identification of the owner - we want to allow info of the
> > owner for printing and easy debug.
> > > And it is makes sense to manage the owner uniqueness by unique ID.
> > >
> > I specifically pointed that out above.  There is no reason an owernship record
> > couldn't be added to the rte_eth_dev structure.
> > 
> 
> Sorry, don't understand why.
> 
Because that's the resource you're trying to protect, and the object you want to
identify ownership of, no?
 

> > > The API already discussed a lot in the previous version, Do you really want,
> > now, to open it again?
> > >
> > What I want is the most useful and elegant ownership API available.  If you
> > think what you have is that, so be it.  I only bring this up because the amount
> > of debate you and Konstantin have had over lock safety causes me to
> > wonder if this isn't an overly complex design.
> 
> I think the complex design is in secondary\primary processes, not in the current port ownership.
> I think there is some work to do there regardless port ownership.
> I think also there is some work in progress for it.
> 
> Thanks, a lot.
> 
> > 
> > Neil
> > 
> > 
> > > > Neil
> > >
> > >
>

Matan Azrad Jan. 19, 2018, 1:57 p.m. UTC | #46

Hi Neil
From: Neil Horman, Friday, January 19, 2018 3:30 PM
> On Fri, Jan 19, 2018 at 10:44:32AM +0000, Matan Azrad wrote:
> > Hi Bruce
> > From: Bruce Richardson, Friday, January 19, 2018 11:30 AM
> > > On Fri, Jan 19, 2018 at 07:14:17AM +0000, Matan Azrad wrote:
> > > >
> > > > Hi Neil
> > > > From: Neil Horman, Friday, January 19, 2018 3:41 AM
> > > > > On Thu, Jan 18, 2018 at 08:21:34PM +0000, Matan Azrad wrote:
> > > > > > Hi Neil.
> > > > > >
> > > > > > From: Neil Horman, Thursday, January 18, 2018 8:42 PM
> > > >
> > > > <snip>
> > > > > > 1. What exactly do you want to improve?(in details) 2. Which
> > > > > > API specifically do you want to change(\ part of code)?
> > > > > > 3. What is the missing in current code(you can answer it in V3
> > > > > > I sent if you
> > > > > want) which should be fixed?
> > > > > >
> > > > > >
> > > > > > <snip> sorry for that, I think it is not relevant continue
> > > > > > discussion if we are
> > > > > not fully understand each other. So let's start from the
> > > > > beginning "with good order :)" by answering the above questions.
> > > > >
> > > > >
> > > > > Sure, this seems like a reasonable way to level set.
> > > > >
> > > > > I mentioned in another thread that perhaps some of my issue here
> > > > > is perception regarding what is meant by ownership.  When I
> > > > > think of an ownership api I think primarily of mutual exclusion
> > > > > (that is to say, enforcement of a single execution context
> > > > > having access to a resource at any given time).  In my mind the
> > > > > simplest form of ownership is a spinlock or a mutex.  A single
> > > > > execution context either does or does not hold the resource at
> > > > > any one time.  Those contexts that attempt to gain exclusive
> > > > > access to the resource call an api that (depending on
> > > > > implementation) either blocks continued execution of that thread
> > > > > until exclusive access to the resource can be granted, or
> > > > > returns immediately with a success or error indicator to let the
> > > > > caller know if access is granted.
> > > > >
> > > > > If I were to codify this port ownership api in pseudo code it
> > > > > would look something like this:
> > > > >
> > > > > struct rte_eth_dev {
> > > > >
> > > > > 	< eth dev bits >
> > > > > 	rte_spinlock_t owner_lock;
> > > > > 	bool locked;
> > > > > 	pid_t owner_pid;
> > > > > };
> > > > >
> > > As an aside, if you ensure that both locked (or "owned", I think, in
> > > this context) and owner_pid are integer values, you can do away with the
> > > lock and use a compare-and-set to take ownership, by setting both
> > > atomically if unmodified from the originally read values.
> > >
> > > > >
> > > > > bool rte_port_claim_ownership(struct rte_eth_dev *dev) {
> > > > > 	bool ret = false;
> > > > >
> > > > > 	spin_lock(dev->owner_lock);
> > > > > 	if (dev->locked)
> > > > > 		goto out;
> > > > > 	dev->locked = true;
> > > > > 	dev->owner_pid = getpid();
> > > > > 	ret = true;
> > > > > out:
> > > > > 	spin_unlock(dev->owner_lock);
> > > > > 	return ret;
> > > > > }
> > > > >
> > > > >
> > > > > bool rte_port_release_ownership(struct rte_eth_dev *dev) {
> > > > >
> > > > > 	bool ret = false;
> > > > > 	spin_lock(dev->owner_lock);
> > > > > 	if (!dev->locked)
> > > > > 		goto out;
> > > > > 	if (dev->owner_pid != getpid())
> > > > > 		goto out;
> > > > > 	dev->locked = false;
> > > > > 	dev->owner_pid = 0;
> > > > > 	ret = true;
> > > > > out:
> > > > > 	spin_unlock(dev->owner_lock);
> > > > > 	return ret;
> > > > > }
> > > > >
> > > > > bool rte_port_is_owned_by(struct rte_eth_dev *dev, pid_t pid) {
> > > > > 	bool ret = false;
> > > > >
> > > > > 	spin_lock(dev->owner_lock);
> > > > > 	if (pid)
> > > > > 		ret = (dev->locked && (pid == dev->owner_pid));
> > > > > 	else
> > > > > 		ret = dev->locked;
> > > > > 	spin_unlock(dev->owner_lock);
> > > > > 	return ret;
> > > > > }
> > > > >
> > > > > The idea here is that lock state is isolated from ownership
> > > > > information.  Any context has the opportunity to lock the
> > > > > resource (in this case the eth port) despite its ownership object.
> > > > >
> > > > > In comparison, your api, which is in many ways similar,
> > > > > separates the creation of ownership objects to a separate api
> > > > > call, and that ownership information embodies state that is
> > > > > integral to the ability to get exclusive access to the resource.
> > > > > I.E. if thread A calls your owner_new call, and then thread B
> > > > > calls owner_new, thread A will never be able to get access to
> > > > > any port unless it calls owner_new again.
> > > > >
> > > > > Does that help clarify my position?
> > > This would have been my understanding of what was being looked for
> > > too, from my minimal understanding of the problem. Thanks for
> > > putting that forward on behalf of many of us!
> > >
> > > >
> > > > Now I fully understand you, thanks for your patience.
> > > >
> > > > So, you are missing one of the main ideas behind my port ownership
> > > > intention: there may be X>1 different, uncoordinated owners running
> > > > in the same thread.
> > >
> > > Thanks Matan for taking time to try and explain how your idea
> > > differs, but I for one am still a little confused. Sorry for the late questions.
> > >
> > > Sure, Neil's example above takes the pid or thread id as the owner
> > > id parameter, but there is no reason we can't use the same scheme
> > > with arbitrarily assigned owner ids, so long as they are unique. We
> > > can even have a simple mapping table mapping ids to names of
> > > components.
> > > >
> > Sorry, I don't understand your point here.
> > My approach asks to allocate a unique ID for "any part of code that wants
> > to manage/use a port".
> > What is the problem here, and how do you suggest fixing it?
> >
> > Neil's approach (with process ID / thread ID) is wrong because 2 different
> > owners can run in the same thread (as I explained at length below).
> >
> So, I may be wrong here, but it would be my opinion that the ownership
> record should codify something about the owning context.

So, the context is the allocated ID and the name.
I think pid is not necessary.

>  The fact that you
> want two different owners to run in the context of the same thread is not a
> problem per se, but rather an artifact of your adherence to the statement
> "any part of code to manage/use a port".  I would assert that was perhaps a
> statement made in error early during the design phase.  Perhaps it would be
> better to state that any execution context may take ownership of a port.
>

It is just semantics.
 
> > > > For example:
> > > > 1. Think about testpmd control commands that call a failsafe port devop,
> > > > which in turn calls its sub-devices' devops. testpmd is one owner
> > > > (controlling the failsafe port) and failsafe is a different owner
> > > > (controlling all its sub-device ports). Both run control commands in
> > > > the same thread, and they are uncoordinated!
> Answered below.
> 
> > > > 2. Interrupt callbacks: anyone can register them, and all of them are
> > > > run by the DPDK host thread.
> > >
> I'm sorry, I'm not clear on how your solution succeeds here where my
> alternate model fails.  Both models require co-ordination such that
> ownership of a port is released and re-acquired by another thread, if I'm
> understanding this correctly.
>

I think you don't understand, please see below.
 
> > > Can you provide a little more details here: what is the specific
> > > issue or conflict in each of these examples and how does your
> > > ownership proposal fix it, when Neil's simpler approach doesn't?
> > >
> > For the first example:
> > My approach:
> > Testpmd wants to manage the fail-safe port, therefore it should allocate
> > a unique ID (only once) and use owner set (with its ID) to take ownership
> > of this port.
> > If it succeeds in taking ownership, it can manage the port.
> > Failsafe PMD wants to manage its sub-device ports and follows the same
> > process as Testpmd.
> > Everything is OK.
> >
> > Neil's approach:
> > Testpmd wants to manage the fail-safe port, therefore it just needs to
> > claim ownership (set), and its pid will be taken as the owner identifier.
> > Failsafe PMD wants to manage its sub-device ports and follows the same
> > process as Testpmd.
> > But these 2 entities run in the same thread, so both would set the
> > same pid. -> Problem!
> >
> I would argue that's not an error at all.  As above, the only thing wrong with
> using the same ID to claim ownership of both ports is that it violates the
> statement you referred to above, which I think is somewhat erroneous.  I
> would further argue that using the same Id in both scenarios is preferable
> because it accurately indicates the ownership relation between the top level
> failsafe device and its slaves (i.e. that the application thread owns the failsafe
> device, and transitively, the slaves).  There is no real need to codify the fact
> that the failsafe port actually owns the slaves, above and beyond that
> statement above.

Look, the two different entities run in the same thread.
They don't even know about each other.
One sets the MTU to 1500,
the second sets the MTU to 3000.
The first one runs rx burst on port 5 while the second does exactly the same.
A crash is not far off.
How can you say that this is OK and that there is no error here?
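
To make the scenario above concrete, here is a minimal, self-contained sketch of the owner-ID idea. It is a toy model and not the code from the patch: owner_new, owner_set, owner_unset, the port_owner table and MAX_PORTS are hypothetical names chosen for illustration, and no locking is shown. It tries to show that two entities sharing one thread (and therefore one pid) still get distinct IDs, so the second one is refused the port.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NO_OWNER  UINT64_C(0)   /* assumption: 0 means "port is free" */
#define MAX_PORTS 8u            /* hypothetical table size */

/* Hypothetical stand-ins, not the ethdev structures from the patch. */
static uint64_t next_owner_id = 1;
static uint64_t port_owner[MAX_PORTS];  /* zero-initialized: all ports free */

/* Each entity (application, PMD, library) asks once for its own ID. */
static uint64_t owner_new(void)
{
	assert(next_owner_id != NO_OWNER);  /* ID space not exhausted */
	return next_owner_id++;
}

/* Take the port if it is free; succeeds again for the current owner. */
static bool owner_set(unsigned int port, uint64_t owner_id)
{
	if (port >= MAX_PORTS || owner_id == NO_OWNER)
		return false;
	if (port_owner[port] != NO_OWNER && port_owner[port] != owner_id)
		return false;
	port_owner[port] = owner_id;
	return true;
}

/* Only the current owner may release the port. */
static bool owner_unset(unsigned int port, uint64_t owner_id)
{
	if (port >= MAX_PORTS || owner_id == NO_OWNER)
		return false;
	if (port_owner[port] != owner_id)
		return false;
	port_owner[port] = NO_OWNER;
	return true;
}
```

In this model an application and a failsafe-like driver running in the same thread would each call owner_new() once. While owner_set(5, failsafe_id) is in effect, owner_set(5, app_id) fails. If the pid were the identifier, both calls would carry the same value and neither could be refused.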

> 
> There is a convenience to having ownership be differentiated in the
> master/slave model when it comes to iterating over top level vs subordinate
> ports, but I would argue that's a problem that should be solved
> independently; adding it here is somewhat confusing.  I would suggest
> adding a parent rte_eth_dev and a children rte_eth_dev list to the
> rte_eth_dev structure so that iterations can be performed over top level
> devices, children, children of children, etc.  You can do this with your
> ownership model as well of course, but there are other ways to skin that cat.
> 

Suggest a full design, I will be happy to review it if you want :)

> 
> > The second one just describes another scenario with more than one DPDK
> > entity running in the same thread.
> >
> > > >
> > > > So, not every potential owner becomes an owner; it depends on the
> > > > specific implementation.
> > > >
> > > > So if some "part of code" wants to manage a port exclusively and wants
> > > > to take ownership of it to prevent other "parts of code" from using it:
> > > > 1. It takes ownership.
> > > > 2. It asks itself: am I running in different threads/processes? If yes,
> > > > it should synchronize its own port management.
> > > > 3. It releases ownership at the end.
> > > >
> > > > Remember that different "parts of code" may be running in the same
> > > > thread/threads/process/processes.
> > > >
> So it seems like the real point of contention that we need to settle here is,
> what codifies an 'owner'.  Must it be a specific execution context, or can we
> define any arbitrary section of code as being an owner?  I would argue
> against the latter.  While in your master/slave model I can see how it seems
> tempting, I would suggest alternate use cases that make that ownership
> model ambiguous.  If, for example, we use your interrupt example above,
> and an interrupt call back is run for a given port, how, using your example,
> does it know which area of code/object/thread to co-ordinate releasing of
> that port with so that it can operate exclusively?

Example:
Some DPDK entity succeeds in taking ownership of port X.
Then it wants to register for the LINK event and configure something in the callback.
This DPDK entity has other code that may configure the same area from another thread.
Since the DPDK entity knows about all of its own code (including the callback code), it can simply synchronize these 2 configurations by itself.
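
A rough sketch of that self-synchronization, assuming the owner keeps a private lock next to the state that both code paths touch. All names here (owner_ctx, on_link_event, control_set_mtu) are hypothetical, and a C11 atomic_flag spinlock stands in for whatever lock the entity would really use. The ethdev layer is not involved at all, which is the point being made.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical private state of one owner ("DPDK entity"). */
struct owner_ctx {
	atomic_flag lock;          /* the owner's own lock, unknown to ethdev */
	uint16_t port_id;          /* port this entity already owns */
	uint16_t mtu;              /* configured by the control path */
	bool link_up;              /* updated by the callback */
	unsigned int cfg_changes;  /* the "same area" both paths modify */
};

static void owner_ctx_init(struct owner_ctx *o, uint16_t port_id)
{
	assert(o != NULL);
	atomic_flag_clear(&o->lock);
	o->port_id = port_id;
	o->mtu = 1500;
	o->link_up = false;
	o->cfg_changes = 0;
}

static void owner_lock(struct owner_ctx *o)
{
	while (atomic_flag_test_and_set_explicit(&o->lock, memory_order_acquire))
		;  /* spin until the other path leaves its critical section */
}

static void owner_unlock(struct owner_ctx *o)
{
	atomic_flag_clear_explicit(&o->lock, memory_order_release);
}

/* Path 1: link-event callback, run by the interrupt (host) thread. */
static void on_link_event(struct owner_ctx *o, bool link_up)
{
	owner_lock(o);
	o->link_up = link_up;
	o->cfg_changes++;
	owner_unlock(o);
}

/* Path 2: regular control path, possibly running on another thread. */
static void control_set_mtu(struct owner_ctx *o, uint16_t mtu)
{
	owner_lock(o);
	o->mtu = mtu;
	o->cfg_changes++;
	owner_unlock(o);
}
```

Both paths belong to the same owner, so the owner can pick the lock and the granularity itself. Nothing here prevents a different owner from touching the port; that part is what the ownership check is for.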

Thomas Monjalon Jan. 19, 2018, 2:07 p.m. UTC | #47

19/01/2018 14:57, Neil Horman:
> > > I specifically pointed that out above.  There is no reason an ownership record
> > > couldn't be added to the rte_eth_dev structure.
> >
> > Sorry, don't understand why.
> >
> Because, that's the resource you're trying to protect, and the object you want to
> identify ownership of, no?

No
The rte_eth_dev structure is the port representation in the process.
The rte_eth_dev_data structure is the port representation across multi-process.
The ownership must be in rte_eth_dev_data to cover multi-process protection.
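
As an illustration of why the placement matters, the sketch below models the split with two plain structs. This is a simplification under stated assumptions: port_local and port_shared are hypothetical stand-ins for rte_eth_dev and rte_eth_dev_data, two local structs in one process play the role of two processes, and real shared memory and locking are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for rte_eth_dev_data: one copy, visible to every process. */
struct port_shared {
	uint64_t owner_id;   /* assumption: 0 means "no owner" */
};

/* Stand-in for rte_eth_dev: one copy per process, pointing at the shared data. */
struct port_local {
	struct port_shared *data;
};

static void port_attach(struct port_local *dev, struct port_shared *shared)
{
	assert(dev != NULL && shared != NULL);
	dev->data = shared;
}

/* Returns 0 on success, -1 if the port is already owned or the ID is invalid. */
static int port_take(struct port_local *dev, uint64_t owner_id)
{
	if (owner_id == 0 || dev->data->owner_id != 0)
		return -1;
	dev->data->owner_id = owner_id;
	return 0;
}

static uint64_t port_owner(const struct port_local *dev)
{
	return dev->data->owner_id;
}
```

Because the owner field sits behind the shared pointer, an owner recorded through one local view is seen through every other view. Had it been stored in port_local, each process would carry its own private idea of who owns the port.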

Thomas Monjalon Jan. 19, 2018, 2:13 p.m. UTC | #48

19/01/2018 14:30, Neil Horman:
> So it seems like the real point of contention that we need to settle here is,
> what codifies an 'owner'.  Must it be a specific execution context, or can we
> define any arbitrary section of code as being an owner?  I would argue against
> the latter.

This is the first thing explained in the cover letter:
"2. The port usage synchronization will be managed by the port owner."
There is no intent to manage the threads synchronization for a given port.
It is the responsibility of the owner (a code object) to configure its
port via only one thread.
It is consistent with not trying to manage threads synchronization
for Rx/Tx on a given queue.

Neil Horman Jan. 19, 2018, 2:32 p.m. UTC | #49

On Fri, Jan 19, 2018 at 03:07:28PM +0100, Thomas Monjalon wrote:
> 19/01/2018 14:57, Neil Horman:
> > > > I specifically pointed that out above.  There is no reason an ownership record
> > > > couldn't be added to the rte_eth_dev structure.
> > >
> > > Sorry, don't understand why.
> > >
> > Because, that's the resource you're trying to protect, and the object you want to
> > identify ownership of, no?
> 
> No
> The rte_eth_dev structure is the port representation in the process.
> The rte_eth_dev_data structure is the port representation across multi-process.
> The ownership must be in rte_eth_dev_data to cover multi-process protection.
> 
Ok.   You get the idea though right?  That the port representation,
for some definition thereof, should embody the ownership state.
Neil

> 
> 
>

Neil Horman Jan. 19, 2018, 3:27 p.m. UTC | #50

On Fri, Jan 19, 2018 at 03:13:47PM +0100, Thomas Monjalon wrote:
> 19/01/2018 14:30, Neil Horman:
> > So it seems like the real point of contention that we need to settle here is,
> > what codifies an 'owner'.  Must it be a specific execution context, or can we
> > define any arbitrary section of code as being an owner?  I would argue against
> > the latter.
> 
> This is the first thing explained in the cover letter:
> "2. The port usage synchronization will be managed by the port owner."
> There is no intent to manage the threads synchronization for a given port.
> It is the responsibility of the owner (a code object) to configure its
> port via only one thread.
> It is consistent with not trying to manage threads synchronization
> for Rx/Tx on a given queue.
> 
> 
Yes, in his cover letter, and I contend that notion is an invalid design point.
By codifying an area of code as an 'owner', rather than an execution context,
you're defining the notion of hierarchy, not ownership. That is to say,
you want to codify the notion that there are top level ports that the
application might see, and some of those top level ports are parents to
subordinate ports, which only the parent port should access directly.  If that's
all you want to encode, there are far easier ways to do it:

struct rte_eth_shared_data {
	< existing bits >
	struct rte_eth_port_list {
		struct rte_eth_port_list *children;
		struct rte_eth_port_list *parent;
	};
};


Build an api around a structure like that, so that the parent/child relationship
is globally clear, and this would be much easier, especially if you want to
continue asserting that the notion of synchronization/exclusion is an exercise
left to the application.

Neil

Thomas Monjalon Jan. 19, 2018, 5:09 p.m. UTC | #51

19/01/2018 15:32, Neil Horman:
> On Fri, Jan 19, 2018 at 03:07:28PM +0100, Thomas Monjalon wrote:
> > 19/01/2018 14:57, Neil Horman:
> > > > > I specifically pointed that out above.  There is no reason an ownership record
> > > > > couldn't be added to the rte_eth_dev structure.
> > > >
> > > > Sorry, don't understand why.
> > > >
> > > Because, that's the resource you're trying to protect, and the object you want to
> > > identify ownership of, no?
> > 
> > No
> > The rte_eth_dev structure is the port representation in the process.
> > The rte_eth_dev_data structure is the port representation across multi-process.
> > The ownership must be in rte_eth_dev_data to cover multi-process protection.
> > 
> Ok.   You get the idea though right?  That the port representation,
> for some definition thereof, should embody the ownership state.
> Neil

Not sure to understand your question.

Thomas Monjalon Jan. 19, 2018, 5:17 p.m. UTC | #52

19/01/2018 16:27, Neil Horman:
> On Fri, Jan 19, 2018 at 03:13:47PM +0100, Thomas Monjalon wrote:
> > 19/01/2018 14:30, Neil Horman:
> > > So it seems like the real point of contention that we need to settle here is,
> > > what codifies an 'owner'.  Must it be a specific execution context, or can we
> > > define any arbitrary section of code as being an owner?  I would argue against
> > > the latter.
> > 
> > This is the first thing explained in the cover letter:
> > "2. The port usage synchronization will be managed by the port owner."
> > There is no intent to manage the threads synchronization for a given port.
> > It is the responsibility of the owner (a code object) to configure its
> > port via only one thread.
> > It is consistent with not trying to manage threads synchronization
> > for Rx/Tx on a given queue.
> > 
> > 
> Yes, in his cover letter, and I contend that notion is an invalid design point.
> By codifying an area of code as an 'owner', rather than an execution context,
> you're defining the notion of hierarchy, not ownership. That is to say,
> you want to codify the notion that there are top level ports that the
> application might see, and some of those top level ports are parents to
> subordinate ports, which only the parent port should access directly.  If that's
> all you want to encode, there are far easier ways to do it:
> 
> struct rte_eth_shared_data {
> 	< existing bits >
> 	struct rte_eth_port_list {
> 		struct rte_eth_port_list *children;
> 		struct rte_eth_port_list *parent;
> 	};
> };
> 
> 
> Build an api around a structure like that, so that the parent/child relationship
> is globally clear, and this would be much easier, especially if you want to
> continue asserting that the notion of synchronization/exclusion is an exercise
> left to the application.

Not only, Neil.
An owner can be something other than a port.
An owner can be an app process (multi-processes).
An owner can be a library.
The intent is really to solve the generic problem of which code
is managing a port.

Neil Horman Jan. 19, 2018, 5:37 p.m. UTC | #53

On Fri, Jan 19, 2018 at 06:09:47PM +0100, Thomas Monjalon wrote:
> 19/01/2018 15:32, Neil Horman:
> > On Fri, Jan 19, 2018 at 03:07:28PM +0100, Thomas Monjalon wrote:
> > > 19/01/2018 14:57, Neil Horman:
> > > > > > I specifically pointed that out above.  There is no reason an ownership record
> > > > > > couldn't be added to the rte_eth_dev structure.
> > > > >
> > > > > Sorry, don't understand why.
> > > > >
> > > > Because, that's the resource you're trying to protect, and the object you want to
> > > > identify ownership of, no?
> > > 
> > > No
> > > The rte_eth_dev structure is the port representation in the process.
> > > The rte_eth_dev_data structure is the port representation across multi-process.
> > > The ownership must be in rte_eth_dev_data to cover multi-process protection.
> > > 
> > Ok.   You get the idea though right?  That the port representation,
> > for some definition thereof, should embody the ownership state.
> > Neil
> 
> Not sure to understand your question.
> 
There is no real question here, only confirming that we are saying the same
thing.  I misspoke when I indicated ownership information should be embodied in
rte_eth_dev rather than its shared data.  But regardless, the concept is the
same

Neil

Neil Horman Jan. 19, 2018, 5:43 p.m. UTC | #54

On Fri, Jan 19, 2018 at 06:17:51PM +0100, Thomas Monjalon wrote:
> 19/01/2018 16:27, Neil Horman:
> > On Fri, Jan 19, 2018 at 03:13:47PM +0100, Thomas Monjalon wrote:
> > > 19/01/2018 14:30, Neil Horman:
> > > > So it seems like the real point of contention that we need to settle here is,
> > > > what codifies an 'owner'.  Must it be a specific execution context, or can we
> > > > define any arbitrary section of code as being an owner?  I would argue against
> > > > the latter.
> > > 
> > > This is the first thing explained in the cover letter:
> > > "2. The port usage synchronization will be managed by the port owner."
> > > There is no intent to manage the threads synchronization for a given port.
> > > It is the responsibility of the owner (a code object) to configure its
> > > port via only one thread.
> > > It is consistent with not trying to manage threads synchronization
> > > for Rx/Tx on a given queue.
> > > 
> > > 
> > Yes, in his cover letter, and I contend that notion is an invalid design point.
> > By codifying an area of code as an 'owner', rather than an execution context,
> > you're defining the notion of hierarchy, not ownership. That is to say,
> > you want to codify the notion that there are top level ports that the
> > application might see, and some of those top level ports are parents to
> > subordinate ports, which only the parent port should access directly.  If that's
> > all you want to encode, there are far easier ways to do it:
> > 
> > struct rte_eth_shared_data {
> > 	< existing bits >
> > 	struct rte_eth_port_list {
> > 		struct rte_eth_port_list *children;
> > 		struct rte_eth_port_list *parent;
> > 	};
> > };
> > 
> > 
> > Build an api around a structure like that, so that the parent/child relationship
> > is globally clear, and this would be much easier, especially if you want to
> > continue asserting that the notion of synchronization/exclusion is an exercise
> > left to the application.
> 
> Not only Neil.
> An owner can be something else than a port.
> An owner can be an app process (multi-processes).
> An owner can be a library.
> The intent is really to solve the generic problem of which code
> is managing a port.
> 
I don't see how this precludes any part of what you just said.  Define the
rte_eth_port_list externally to the shared_data struct and allow any object you
want to allocate it, then anything you want to control a hierarchy of ports can
do so without issue, and the structure is far more clear than an opaque id that
carries subtle semantic ordering with it.

Neil

Thomas Monjalon Jan. 19, 2018, 6:10 p.m. UTC | #55

19/01/2018 18:37, Neil Horman:
> On Fri, Jan 19, 2018 at 06:09:47PM +0100, Thomas Monjalon wrote:
> > 19/01/2018 15:32, Neil Horman:
> > > On Fri, Jan 19, 2018 at 03:07:28PM +0100, Thomas Monjalon wrote:
> > > > 19/01/2018 14:57, Neil Horman:
> > > > > > > I specifically pointed that out above.  There is no reason an ownership record
> > > > > > > couldn't be added to the rte_eth_dev structure.
> > > > > >
> > > > > > Sorry, don't understand why.
> > > > > >
> > > > > Because, that's the resource you're trying to protect, and the object you want to
> > > > > identify ownership of, no?
> > > > 
> > > > No
> > > > The rte_eth_dev structure is the port representation in the process.
> > > > The rte_eth_dev_data structure is the port representation across multi-process.
> > > > The ownership must be in rte_eth_dev_data to cover multi-process protection.
> > > > 
> > > Ok.   You get the idea though right?  That the port representation,
> > > for some definition thereof, should embody the ownership state.
> > > Neil
> > 
> > Not sure to understand your question.
> > 
> There is no real question here, only confirming that we are saying the same
> thing.  I misspoke when I indicated ownership information should be embodied in
> rte_eth_dev rather than its shared data.  But regardless, the concept is the
> same

Yes we agree.
And I think it is what Matan did.
The owner is in struct rte_eth_dev_data:

@@ -1789,6 +1798,7 @@ struct rte_eth_dev_data {
        int numa_node;  /**< NUMA node connection */
        struct rte_vlan_filter_conf vlan_filter_conf;
        /**< VLAN filter configuration. */
+       struct rte_eth_dev_owner owner; /**< The port owner. */
 };

Thomas Monjalon Jan. 19, 2018, 6:12 p.m. UTC | #56

19/01/2018 18:43, Neil Horman:
> On Fri, Jan 19, 2018 at 06:17:51PM +0100, Thomas Monjalon wrote:
> > 19/01/2018 16:27, Neil Horman:
> > > On Fri, Jan 19, 2018 at 03:13:47PM +0100, Thomas Monjalon wrote:
> > > > 19/01/2018 14:30, Neil Horman:
> > > > > So it seems like the real point of contention that we need to settle here is,
> > > > > what codifies an 'owner'.  Must it be a specific execution context, or can we
> > > > > define any arbitrary section of code as being an owner?  I would argue against
> > > > > the latter.
> > > > 
> > > > This is the first thing explained in the cover letter:
> > > > "2. The port usage synchronization will be managed by the port owner."
> > > > There is no intent to manage the threads synchronization for a given port.
> > > > It is the responsibility of the owner (a code object) to configure its
> > > > port via only one thread.
> > > > It is consistent with not trying to manage threads synchronization
> > > > for Rx/Tx on a given queue.
> > > > 
> > > > 
> > > Yes, in his cover letter, and I contend that notion is an invalid design point.
> > > By codifying an area of code as an 'owner', rather than an execution context,
> > > you're defining the notion of hierarchy, not ownership. That is to say,
> > > you want to codify the notion that there are top level ports that the
> > > application might see, and some of those top level ports are parents to
> > > subordinate ports, which only the parent port should access directly.  If that's
> > > all you want to encode, there are far easier ways to do it:
> > > 
> > > struct rte_eth_shared_data {
> > > 	< existing bits >
> > > 	struct rte_eth_port_list {
> > > 		struct rte_eth_port_list *children;
> > > 		struct rte_eth_port_list *parent;
> > > 	};
> > > };
> > > 
> > > 
> > > Build an api around a structure like that, so that the parent/child relationship
> > > is globally clear, and this would be much easier, especially if you want to
> > > continue asserting that the notion of synchronization/exclusion is an exercise
> > > left to the application.
> > 
> > Not only Neil.
> > An owner can be something else than a port.
> > An owner can be an app process (multi-processes).
> > An owner can be a library.
> > The intent is really to solve the generic problem of which code
> > is managing a port.
> > 
> I don't see how this precludes any part of what you just said.  Define the
> rte_eth_port_list externally to the shared_data struct and allow any object you
> want to allocate it, then anything you want to control a hierarchy of ports can
> do so without issue, and the structure is far more clear than an opaque id that
> carries subtle semantic ordering with it.

Sorry, I don't understand. Please could you rephrase?

Neil Horman Jan. 19, 2018, 7:47 p.m. UTC | #57

On Fri, Jan 19, 2018 at 07:12:36PM +0100, Thomas Monjalon wrote:
> 19/01/2018 18:43, Neil Horman:
> > On Fri, Jan 19, 2018 at 06:17:51PM +0100, Thomas Monjalon wrote:
> > > 19/01/2018 16:27, Neil Horman:
> > > > On Fri, Jan 19, 2018 at 03:13:47PM +0100, Thomas Monjalon wrote:
> > > > > 19/01/2018 14:30, Neil Horman:
> > > > > > So it seems like the real point of contention that we need to settle here is,
> > > > > > what codifies an 'owner'.  Must it be a specific execution context, or can we
> > > > > > define any arbitrary section of code as being an owner?  I would argue against
> > > > > > the latter.
> > > > > 
> > > > > This is the first thing explained in the cover letter:
> > > > > "2. The port usage synchronization will be managed by the port owner."
> > > > > There is no intent to manage the threads synchronization for a given port.
> > > > > It is the responsibility of the owner (a code object) to configure its
> > > > > port via only one thread.
> > > > > It is consistent with not trying to manage threads synchronization
> > > > > for Rx/Tx on a given queue.
> > > > > 
> > > > > 
> > > > Yes, in his cover letter, and I contend that notion is an invalid design point.
> > > > By codifying an area of code as an 'owner', rather than an execution context,
> > > > you're defining the notion of hierarchy, not ownership. That is to say,
> > > > you want to codify the notion that there are top level ports that the
> > > > application might see, and some of those top level ports are parents to
> > > > subordinate ports, which only the parent port should access directly.  If that's
> > > > all you want to encode, there are far easier ways to do it:
> > > > 
> > > > struct rte_eth_shared_data {
> > > > 	< existing bits >
> > > > 	struct rte_eth_port_list {
> > > > 		struct rte_eth_port_list *children;
> > > > 		struct rte_eth_port_list *parent;
> > > > 	};
> > > > };
> > > > 
> > > > 
> > > > Build an api around a structure like that, so that the parent/child relationship
> > > > is globally clear, and this would be much easier, especially if you want to
> > > > continue asserting that the notion of synchronization/exclusion is an exercise
> > > > left to the application.
> > > 
> > > Not only Neil.
> > > An owner can be something else than a port.
> > > An owner can be an app process (multi-processes).
> > > An owner can be a library.
> > > The intent is really to solve the generic problem of which code
> > > is managing a port.
> > > 
> > I don't see how this precludes any part of what you just said.  Define the
> > rte_eth_port_list externally to the shared_data struct and allow any object you
> > want to allocate it, then anything you want to control a hierarchy of ports can
> > do so without issue, and the structure is far more clear than an opaque id that
> > carries subtle semantic ordering with it.
> 
> Sorry, I don't understand. Please could you rephrase?
> 

Sure, I'm saying the fact that you want an owner to be an object
(library/port/process) rather than strictly an execution context
(process/thread) doesn't preclude what I'm proposing above.  You can create a
generic version of the structure I propose above like so:

struct rte_obj_hierarchy {
	struct rte_obj_hierarchy *children;
	struct rte_obj_hierarchy *parent;
	void *owner_data; /* optional */
};

And embed that structure in any object you would like to give a representative
hierarchy to; you then have a fairly simple api:

struct rte_obj_hierarchy *hierarchy_alloc(void);
bool hierarchy_set(struct rte_obj_hierarchy *parent, struct rte_obj_hierarchy *child);
void hierarchy_release(struct rte_obj_hierarchy *obj);

That gives you the privately held list relationship I think you are in part
looking for (i.e. the ability for a failsafe device to iterate over the ports it
is in control of), without the awkwardness of the ordinal priority that the
current implementation imposes.
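
For illustration only, one way the three calls proposed above could be fleshed out. This is a sketch under assumptions, not an existing DPDK API: the struct is named obj_hierarchy here, "hierarchy" is spelled conventionally, a next_sibling pointer is added so that a parent can hold more than one child, and hierarchy_child_count is an extra helper that shows the iteration. No locking is included.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct obj_hierarchy {
	struct obj_hierarchy *parent;
	struct obj_hierarchy *children;     /* head of the list of children */
	struct obj_hierarchy *next_sibling; /* assumption: chains the children */
	void *owner_data;                   /* optional */
};

/* Allocate a detached, zeroed node; returns NULL on allocation failure. */
struct obj_hierarchy *hierarchy_alloc(void)
{
	return calloc(1, sizeof(struct obj_hierarchy));
}

/* Put child under parent. Refuses a child that already has a parent,
 * and refuses links that would create a cycle. */
bool hierarchy_set(struct obj_hierarchy *parent, struct obj_hierarchy *child)
{
	const struct obj_hierarchy *p;

	if (parent == NULL || child == NULL || child->parent != NULL)
		return false;
	for (p = parent; p != NULL; p = p->parent)
		if (p == child)
			return false;
	assert(child->next_sibling == NULL); /* detached nodes have no siblings */
	child->parent = parent;
	child->next_sibling = parent->children;
	parent->children = child;
	return true;
}

/* Unlink obj from its parent, detach its children, then free it. */
void hierarchy_release(struct obj_hierarchy *obj)
{
	struct obj_hierarchy **link;
	struct obj_hierarchy *c, *next;

	if (obj == NULL)
		return;
	if (obj->parent != NULL) {
		link = &obj->parent->children;
		while (*link != NULL && *link != obj)
			link = &(*link)->next_sibling;
		if (*link == obj)
			*link = obj->next_sibling;
	}
	for (c = obj->children; c != NULL; c = next) {
		next = c->next_sibling;
		c->parent = NULL;
		c->next_sibling = NULL;
	}
	free(obj);
}

/* Walk the direct children, as a failsafe-like parent would. */
size_t hierarchy_child_count(const struct obj_hierarchy *parent)
{
	const struct obj_hierarchy *c;
	size_t n = 0;

	for (c = parent->children; c != NULL; c = c->next_sibling)
		n++;
	return n;
}
```

A failsafe-like parent would allocate one node for itself and one per sub-device, attach them with hierarchy_set, and iterate over parent->children when it needs to reach only its own ports.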

In summary, if what you want is ownership in the strictest sense of the word
(i.e. mutually exclusive access, which I think makes sense), then using a lock
and flag is really the simplest way to go.  If instead what you want is a
hierarchical relationship where you can iterate over a limited set of objects
(the failsafe child port example), then the above is what you want.
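
A hedged sketch of the first option, strict mutual exclusion. Instead of a spinlock plus a flag, it uses the compare-and-set variant mentioned in the aside quoted earlier in this thread, folding "locked" and "owner" into one atomic word. The names port_lock, port_claim, port_release and port_is_owned_by are hypothetical, and the caller decides what the identifier means (a pid, a thread id, or an allocated owner id).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define PORT_UNOWNED UINT64_C(0)  /* assumption: 0 is never a valid identifier */

struct port_lock {
	_Atomic uint64_t owner;   /* PORT_UNOWNED, or the claimant's identifier */
};

static void port_lock_init(struct port_lock *p)
{
	atomic_init(&p->owner, PORT_UNOWNED);
}

/* Succeeds only if nobody holds the port at the moment of the exchange. */
static bool port_claim(struct port_lock *p, uint64_t who)
{
	uint64_t expected = PORT_UNOWNED;

	assert(who != PORT_UNOWNED);
	return atomic_compare_exchange_strong(&p->owner, &expected, who);
}

/* Succeeds only if the caller is the current holder. */
static bool port_release(struct port_lock *p, uint64_t who)
{
	uint64_t expected = who;

	if (who == PORT_UNOWNED)
		return false;
	return atomic_compare_exchange_strong(&p->owner, &expected, PORT_UNOWNED);
}

/* who == 0 asks "is it owned by anyone?", mirroring the pseudo code's pid == 0. */
static bool port_is_owned_by(struct port_lock *p, uint64_t who)
{
	uint64_t cur = atomic_load(&p->owner);

	return who != PORT_UNOWNED ? cur == who : cur != PORT_UNOWNED;
}
```

The sketch only answers who holds the port. It says nothing about which identifier scheme is right, which is the question the rest of this thread is arguing about.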

The solution Matan is providing does some of each of these things, but comes with
very odd side effects.

It offers a level of mutual exclusion, in that only one
object can own another at a time, but does so in a way that introduces this very
atypical ordinality (once an ownership object is created with owner_new, any
previously created ownership object will be denied the ability to take ownership
of a port).

It also offers a level of filtering (in that if you can set the ownership id of
a given set of objects to the value X, you can then iterate over them by
iterating over all objects of that type, and filtering on their id), but it
offers no clear in-memory relationship between parent and children (i.e. if you
were to look at an object in a debugger and see that it was owned by owner id
X, it would provide you with no indicator of what object held the allocated
ownership object assigned id X).  My proposal trades a few bytes of data in
exchange for a globally clear, definitive hierarchy relationship.  And if you add an
api call and a spinlock, you can easily graft on mutual exclusion here, by
blocking access to objects that aren't the immediate parent of a given object.

Neil

subsequently created object

Thomas Monjalon Jan. 19, 2018, 8:19 p.m. UTC | #58

19/01/2018 20:47, Neil Horman:
> On Fri, Jan 19, 2018 at 07:12:36PM +0100, Thomas Monjalon wrote:
> > 19/01/2018 18:43, Neil Horman:
> > > On Fri, Jan 19, 2018 at 06:17:51PM +0100, Thomas Monjalon wrote:
> > > > 19/01/2018 16:27, Neil Horman:
> > > > > On Fri, Jan 19, 2018 at 03:13:47PM +0100, Thomas Monjalon wrote:
> > > > > > 19/01/2018 14:30, Neil Horman:
> > > > > > > So it seems like the real point of contention that we need to settle here is,
> > > > > > > what codifies an 'owner'.  Must it be a specific execution context, or can we
> > > > > > > define any arbitrary section of code as being an owner?  I would argue against
> > > > > > > the latter.
> > > > > > 
> > > > > > This is the first thing explained in the cover letter:
> > > > > > "2. The port usage synchronization will be managed by the port owner."
> > > > > > There is no intent to manage the threads synchronization for a given port.
> > > > > > It is the responsibility of the owner (a code object) to configure its
> > > > > > port via only one thread.
> > > > > > It is consistent with not trying to manage threads synchronization
> > > > > > for Rx/Tx on a given queue.
> > > > > > 
> > > > > > 
> > > > > Yes, in his cover letter, and I contend that notion is an invalid design point.
> > > > > By codifying an area of code as an 'owner', rather than an execution context,
> > > > > you're defining the notion of hierarchy, not ownership. That is to say,
> > > > > you want to codify the notion that there are top level ports that the
> > > > > application might see, and some of those top level ports are parents to
> > > > > subordinate ports, which only the parent port should access directly.  If that's
> > > > > all you want to encode, there are far easier ways to do it:
> > > > > 
> > > > > struct rte_eth_shared_data {
> > > > > 	< existing bits >
> > > > > 	struct rte_eth_port_list {
> > > > > 		struct rte_eth_port_list *children;
> > > > > 		struct rte_eth_port_list *parent;
> > > > > 	};
> > > > > };
> > > > > 
> > > > > 
> > > > > Build an api around a structure like that, so that the parent/child relationship
> > > > > is globally clear, and this would be much easier, especially if you want to
> > > > > continue asserting that the notion of synchronization/exclusion is an exercise
> > > > > left to the application.
> > > > 
> > > > Not only Neil.
> > > > An owner can be something else than a port.
> > > > An owner can be an app process (multi-processes).
> > > > An owner can be a library.
> > > > The intent is really to solve the generic problem of which code
> > > > is managing a port.
> > > > 
> > > I don't see how this precludes any part of what you just said.  Define the
> > > rte_eth_port_list externally to the shared_data struct and allow any object you
> > > want to allocate it, then anything you want to control a hierarchy of ports can
> > > do so without issue, and the structure is far more clear than an opaque id that
> > > carries subtle semantic ordering with it.
> > 
> > Sorry, I don't understand. Please could you rephrase?
> > 
> 
> Sure, I'm saying the fact that you want an owner to be an object
> (library/port/process) rather than strictly an execution context
> (process/thread) doesn't preclude what I'm proposing above.  You can create a
> generic version of the structure I propose above like so:
> 
> struct rte_obj_heirarchy {
> 	struct rte_obj_heirarchy *children;
> 	struct rte_obj_heirarchy *parent;
> 	void *owner_data; /* optional */
> };
> 
> And embed that structure in any object you would like to give a representative
> hierarchy to; you then have a fairly simple API:
> 
> struct rte_obj_heirarchy *heirarchy_alloc(void);
> bool heirarchy_set(struct rte_obj_heirarchy *parent, struct rte_obj_heirarchy *child);
> void heirarchy_release(struct rte_obj_heirarchy *obj);
> 
> That gives you the privately held list relationship I think you are in part
> looking for (i.e. the ability for a failsafe device to iterate over the ports it
> is in control of), without the awkwardness of the ordinal priority that the
> current implementation imposes.

What is the awkward ordinal priority?
I see you discuss it below. So let's discuss it below.

> In summary, if what you want is ownership in the strictest sense of the word
> (i.e. mutually exclusive access, which I think makes sense), then using a lock
> and flag is really the simplest way to go.  If instead what you want is a
> hierarchical relationship where you can iterate over a limited set of objects
> (the failsafe child port example), then the above is what you want.

We want only ownership. That's why it's called ownership :)
The hierarchical relationship is private to the owner.
For instance, failsafe implements its own list of sub-devices.
So we just need to expose that the ports are already owned.
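
A minimal sketch of "ownership only", assuming a per-port record that holds an owner id and a name: the first owner to claim a port wins, everyone else simply sees that the port is taken. All names here (port_owner, owner_new, owner_set, owner_unset, owner_get) are hypothetical and this is not the code of the patch; in particular the sketch has no locking and no multi-process shared memory.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NO_OWNER 0u        /* assumption: id 0 means "not owned" */
#define MAX_PORTS 4u
#define MAX_OWNER_NAME 64

struct port_owner {
	uint64_t id;                /* NO_OWNER while the port is free */
	char name[MAX_OWNER_NAME];  /* for debug/printing only */
};

static struct port_owner port_owners[MAX_PORTS];
static uint64_t next_owner_id = 1;

/* Allocate a fresh owner id; ids are never reused in this sketch. */
uint64_t owner_new(void)
{
	return next_owner_id++;
}

/* Claim a port; fails if the id was never allocated or the port is owned. */
int owner_set(unsigned int port, uint64_t id, const char *name)
{
	assert(name != NULL);
	if (port >= MAX_PORTS || id == NO_OWNER || id >= next_owner_id)
		return -1;
	if (port_owners[port].id != NO_OWNER)
		return -1;
	port_owners[port].id = id;
	snprintf(port_owners[port].name, sizeof(port_owners[port].name), "%s", name);
	return 0;
}

/* Release a port; only the current owner may do so. */
int owner_unset(unsigned int port, uint64_t id)
{
	if (port >= MAX_PORTS || id == NO_OWNER || port_owners[port].id != id)
		return -1;
	memset(&port_owners[port], 0, sizeof(port_owners[port]));
	return 0;
}

/* Expose the ownership record so other code can skip owned ports. */
const struct port_owner *owner_get(unsigned int port)
{
	return port < MAX_PORTS ? &port_owners[port] : NULL;
}
```

Any list of sub-devices stays private to the owner; the shared record only answers "is this port owned, and by whom".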

> The solution Matan is providing does some of each of these things, but comes with
> very odd side effects:
> 
> It offers a level of mutual exclusion, in that only one
> object can own another at a time, but does so in a way that introduces this very
> atypical ordinality (once an ownership object is created with owner_new, any
> previously created ownership object will be denied the ability to take ownership
> of a port)

You mean only the last owner id can take an ownership?
If yes, it looks like a bug.
Please could you show what is responsible for this effect in the patch?

> It also offers a level of filtering (in that if you can set the ownership id of
> a given set of object to the value X, you can then iterate over them by
> iterating over all objects of that type, and filtering on their id), but it
> offers no clear in-memory relationship between parent and children (i.e. if you
> were to look at an object in a debugger and see that it was owned by owner id
> X, it would provide you with no indicator of what object held the allocated
> ownership object assigned id X).

I think that is wrong. There is an owner name for debug/printing purposes.

> My proposal trades a few bytes of data in
> exchange for a globally clear, definitive hierarchy relationship.  And if you add an
> api call and a spinlock, you can easily graft on mutual exclusion here, by
> blocking access to objects that aren't the immediate parent of a given object.

For the hierarchical relationship, I think it is over-engineered.
For blocking access, it means you need a caller id parameter in every
function in order to identify whether the caller is the owner.
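
To make the cost of blocking access concrete, here is a small sketch contrasting a port function as it looks without caller identity and the variant that enforcement would require. The names port_set_mtu and port_set_mtu_checked and the arrays are hypothetical, chosen only for illustration; they are not ethdev functions.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_PORTS 4u

static uint64_t port_owner_id[MAX_PORTS]; /* 0 means the port is unowned */
static uint16_t port_mtu[MAX_PORTS];

/* Current style: nothing identifies the caller, so ownership cannot be checked. */
int port_set_mtu(unsigned int port, uint16_t mtu)
{
	if (port >= MAX_PORTS)
		return -ENODEV;
	port_mtu[port] = mtu;
	return 0;
}

/* Enforced style: the signature grows a caller_id, as would every other call. */
int port_set_mtu_checked(unsigned int port, uint64_t caller_id, uint16_t mtu)
{
	if (port >= MAX_PORTS)
		return -ENODEV;
	if (port_owner_id[port] != caller_id)
		return -EPERM; /* caller is not the owner of this port */
	return port_set_mtu(port, mtu);
}
```

The check itself is one comparison; the "huge change" is that the extra parameter has to be threaded through every port-level function and every existing caller.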

My summary:
- you think there is a bug: it needs to be shown
- you think about relationship needs that I don't see
- you think about access permissions, which would be a huge change

Neil Horman Jan. 19, 2018, 10:52 p.m. UTC | #59

On Fri, Jan 19, 2018 at 09:19:18PM +0100, Thomas Monjalon wrote:
Apologies for the top post, but I'm preparing for a trip out of the country, and
so may not have time to fully answer these questions until I get back (or at
least until I get someplace with power and internet).  If the conversation is
still going at that time, I'll chime back in.
Neil


Neil Horman Jan. 20, 2018, 3:38 a.m. UTC | #60

Writing from my phone, so sorry for typos and top posting.

I need to apologise for a misunderstanding on my part.  I had a dyslexic
moment and reversed the validity check on the port owner comparison.  What
I thought was >= was actually <=, so my concern that only the last
allocated owner could take ownership of a port was unfounded; the error was mine.

More comments as I'm able while afk.
Neil




Ananyev, Konstantin Jan. 20, 2018, 12:54 p.m. UTC | #61

Hi Neil,

> -----Original Message-----
> From: Neil Horman [mailto:nhorman@tuxdriver.com]
> Sent: Friday, January 19, 2018 7:48 PM
> To: Thomas Monjalon <thomas@monjalon.net>
> Cc: dev@dpdk.org; Matan Azrad <matan@mellanox.com>; Richardson, Bruce <bruce.richardson@intel.com>; Ananyev, Konstantin
> <konstantin.ananyev@intel.com>; Gaetan Rivet <gaetan.rivet@6wind.com>; Wu, Jingjing <jingjing.wu@intel.com>
> Subject: Re: [dpdk-dev] [PATCH v2 2/6] ethdev: add port ownership
> 
> The solution Matan is providing does some of each of these things, but comes with
> very odd side effects:
> 
> It offers a level of mutual exclusion, in that only one
> object can own another at a time, but does so in a way that introduces this very
> atypical ordinality (once an ownership object is created with owner_new, any
> previously created ownership object will be denied the ability to take ownership
> of a port)

Why is that?
As I understand the current code, any owner id between 1 and next_owner_id
is considered valid.
Konstantin
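
The range check described above can be sketched as a single predicate. This is a hedged illustration, not the function from the patch: the name owner_id_is_valid is hypothetical, id 0 is assumed to mean "no owner", and the upper bound is assumed to be exclusive because next_owner_id is the id that has not been handed out yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NO_OWNER 0u /* assumption: id 0 is reserved for "not owned" */

/*
 * An owner id is valid if it has already been allocated, i.e. it lies in
 * [1, next_owner_id). Every previously created owner therefore stays valid
 * when a newer owner is allocated.
 */
bool owner_id_is_valid(uint64_t owner_id, uint64_t next_owner_id)
{
	assert(next_owner_id >= 1); /* the allocator starts counting at 1 */
	return owner_id != NO_OWNER && owner_id < next_owner_id;
}
```

With next_owner_id equal to 4, ids 1, 2 and 3 all pass the check, so an older owner is not locked out by a newer one.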



Thomas Monjalon Jan. 20, 2018, 2:02 p.m. UTC | #62

20/01/2018 13:54, Ananyev, Konstantin:
> Hi Neil,
> 
> > -----Original Message-----
> > From: Neil Horman [mailto:nhorman@tuxdriver.com]
> > Sent: Friday, January 19, 2018 7:48 PM
> > To: Thomas Monjalon <thomas@monjalon.net>
> > Cc: dev@dpdk.org; Matan Azrad <matan@mellanox.com>; Richardson, Bruce <bruce.richardson@intel.com>; Ananyev, Konstantin
> > <konstantin.ananyev@intel.com>; Gaetan Rivet <gaetan.rivet@6wind.com>; Wu, Jingjing <jingjing.wu@intel.com>
> > Subject: Re: [dpdk-dev] [PATCH v2 2/6] ethdev: add port ownership
> > 
> > On Fri, Jan 19, 2018 at 07:12:36PM +0100, Thomas Monjalon wrote:
> > > 19/01/2018 18:43, Neil Horman:
> > > > On Fri, Jan 19, 2018 at 06:17:51PM +0100, Thomas Monjalon wrote:
> > > > > 19/01/2018 16:27, Neil Horman:
> > > > > > On Fri, Jan 19, 2018 at 03:13:47PM +0100, Thomas Monjalon wrote:
> > > > > > > 19/01/2018 14:30, Neil Horman:
> > > > > > > > So it seems like the real point of contention that we need to settle here is,
> > > > > > > > what codifies an 'owner'.  Must it be a specific execution context, or can we
> > > > > > > > define any arbitrary section of code as being an owner?  I would agrue against
> > > > > > > > the latter.
> > > > > > >
> > > > > > > This is the first thing explained in the cover letter:
> > > > > > > "2. The port usage synchronization will be managed by the port owner."
> > > > > > > There is no intent to manage the threads synchronization for a given port.
> > > > > > > It is the responsibility of the owner (a code object) to configure its
> > > > > > > port via only one thread.
> > > > > > > It is consistent with not trying to manage threads synchronization
> > > > > > > for Rx/Tx on a given queue.
> > > > > > >
> > > > > > >
> > > > > > Yes, in his cover letter, and I contend that notion is an invalid design point.
> > > > > > By codifying an area of code as an 'owner', rather than an execution context,
> > > > > > you're defining the notion of heirarchy, not ownership. That is to say,
> > > > > > you want to codify the notion that there are top level ports that the
> > > > > > application might see, and some of those top level ports are parents to
> > > > > > subordinate ports, which only the parent port should access directly.  If thats
> > > > > > all you want to encode, there are far easier ways to do it:
> > > > > >
> > > > > > struct rte_eth_shared_data {
> > > > > > 	< existing bits >
> > > > > > 	struct rte_eth_port_list {
> > > > > > 		struct rte_eth_port_list *children;
> > > > > > 		struct rte_eth_port_list *parent;
> > > > > > 	};
> > > > > > };
> > > > > >
> > > > > >
> > > > > > Build an api around a structure like that, so that the parent/child relationship
> > > > > > is globally clear, and this would be much easier, especially if you want to
> > > > > > continue asserting that the notion of synchronization/exclusion is an exercise
> > > > > > left to the application.
> > > > >
> > > > > Not only Neil.
> > > > > An owner can be something else than a port.
> > > > > An owner can be an app process (multi-processes).
> > > > > An owner can be a library.
> > > > > The intent is really to solve the generic problem of which code
> > > > > is managing a port.
> > > > >
> > > > I don't see how this precludes any part of what you just said.  Define the
> > > > rte_eth_port_list externally to the shared_data struct and allow any object you
> > > > want to allocate it, then anything you want to control a heirarchy of ports can
> > > > do so without issue, and the structure is far more clear than an opaque id that
> > > > carries subtle semantic ordering with it.
> > >
> > > Sorry, I don't understand. Please could you rephrase?
> > >
> > 
> > Sure, I'm saying the fact that you want an owner to be an object
> > (library/port/process) rather than strictly an execution context
> > (process/thread) doesn't preclude what I'm proposing above.  You can create a
> > generic version of the strcture I propose above like so:
> > 
> > struct rte_obj_heirarchy {
> > 	struct rte_obj_heirarchy *children;
> > 	struct rte_obj_heirarchy *parent;
> > 	void *owner_data; /* optional */
> > };
> > 
> > And embed that structure in any object you would like to give a representative
> > heirarchy to, you then have a fairly simple api
> > 
> > struct rte_obj_heirarchy *heirarchy_alloc();
> > bool heirarchy_set(struct rte_obj_heirarchy *parent, struct rte_obj_heirarcy *child)
> > void heirarchy_release(struct rte_obj_heirarchy *obj)
> > 
> > That gives you the privately held list relationship I think you are in part
> > looking for (i.e. the ability for a failsafe device to iterate over the ports it
> > is in control of), without the awkwardness of the ordinal priority that the
> > current implementation imposes.
> > 
> > In summary, if what you want is ownership in the strictest sense of the word
> > (i.e. mutually exclusive access, which I think makes sense), then using a lock
> > and flag is really the simplest way to go.  If instead what you want is a
> > hierarchical relationship where you can iterate over a limited set of objects
> > (the failsafe child port example), then the above is what you want.
> > 
> > 
> > The solution Matan is providing does some of each of these things, but comes with
> > very odd side effects
> > 
> > It offers a level of mutual exclusion, in that only one
> > object can own another at a time, but does so in a way that introduces this very
> > atypical ordinality (once an ownership object is created with owner_new, any
> > previously created ownership object will be denied the ability to take ownership
> > of a port)
> 
> Why is that?
> As I understand current code: any owner id between 1 and next_owner_id 
> is considered as valid.

Yes, Neil sent another email to explain it was a review mistake.
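
To make the point about valid identifiers concrete, here is a minimal single-process sketch of the check as it appears in the patch below. The names `next_owner_id`, `owner_new` and `owner_id_is_valid` are hypothetical stand-ins for `rte_eth_next_owner_id`, `rte_eth_dev_owner_new()` and `rte_eth_is_valid_owner_id()`; the shared memzone and the spinlock are left out on purpose.

```c
#include <stdint.h>

#define NO_OWNER 0 /* mirrors RTE_ETH_DEV_NO_OWNER in the patch */

/* Hypothetical stand-in for the counter the patch keeps in shared memory. */
static uint16_t next_owner_id = NO_OWNER + 1;

/* Models rte_eth_dev_owner_new(), without the ownership lock. */
static int owner_new(uint16_t *owner_id)
{
	if (next_owner_id == NO_OWNER)
		return -1; /* counter wrapped around: no identifiers left */
	*owner_id = next_owner_id++;
	return 0;
}

/* Models rte_eth_is_valid_owner_id(): ids 1 .. next_owner_id - 1 are valid. */
static int owner_id_is_valid(uint16_t owner_id)
{
	if (owner_id == NO_OWNER ||
	    (next_owner_id > NO_OWNER && next_owner_id <= owner_id))
		return 0;
	return 1;
}
```

Under this model, an identifier handed out earlier stays valid after later ones are allocated: if two calls to `owner_new()` return 1 and 2, both pass `owner_id_is_valid()`, while 0 and 3 are rejected.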

Ferruh Yigit Jan. 21, 2018, 10:12 p.m. UTC | #63

On 1/19/2018 6:10 PM, Thomas Monjalon wrote:
> 19/01/2018 18:37, Neil Horman:
>> On Fri, Jan 19, 2018 at 06:09:47PM +0100, Thomas Monjalon wrote:
>>> 19/01/2018 15:32, Neil Horman:
>>>> On Fri, Jan 19, 2018 at 03:07:28PM +0100, Thomas Monjalon wrote:
>>>>> 19/01/2018 14:57, Neil Horman:
>>>>>>>> I specifically pointed that out above.  There is no reason an ownership record
>>>>>>>> couldn't be added to the rte_eth_dev structure.
>>>>>>>
>>>>>>> Sorry, don't understand why.
>>>>>>>
>>>>>> Because that's the resource you're trying to protect, and the object you want to
>>>>>> identify ownership of, no?
>>>>>
>>>>> No
>>>>> The rte_eth_dev structure is the port representation in the process.
>>>>> The rte_eth_dev_data structure is the port representation across multi-process.
>>>>> The ownership must be in rte_eth_dev_data to cover multi-process protection.
>>>>>
>>>> Ok.   You get the idea though right?  That the port representation,
>>>> for some definition thereof, should embody the ownership state.
>>>> Neil
>>>
>>> Not sure I understand your question.
>>>
>> There is no real question here, only confirming that we are saying the same
>> thing.  I misspoke when I indicated ownership information should be embodied in
>> rte_eth_dev rather than its shared data.  But regardless, the concept is the
>> same
> 
> Yes we agree.
> And I think it is what Matan did.
> The owner is in struct rte_eth_dev_data:

Hi Thomas, Neil,

Sorry, I was not able to follow this thread. Has the discussion concluded?


Patch
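
As a reading aid for the diff that follows, the set/unset rules it introduces can be modeled in a few lines. This is a sketch under stated assumptions, not the patch itself: `MAX_PORTS`, `struct owner`, `port_owner`, `port_owner_set` and `port_owner_unset` are hypothetical names, the array stands in for the `owner` field of the shared per-port data, and locking plus the check against the next owner identifier are omitted.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define MAX_PORTS 4       /* hypothetical; the patch uses RTE_MAX_ETHPORTS */
#define NO_OWNER 0        /* mirrors RTE_ETH_DEV_NO_OWNER */
#define OWNER_NAME_LEN 64 /* mirrors RTE_ETH_MAX_OWNER_NAME_LEN */

struct owner {
	uint16_t id;
	char name[OWNER_NAME_LEN];
};

/* Stand-in for the owner record kept in the shared per-port data. */
static struct owner port_owner[MAX_PORTS];

/* A port can be taken if it is ownerless or already held by the same id. */
static int port_owner_set(uint16_t port_id, const struct owner *o)
{
	struct owner *cur;

	if (port_id >= MAX_PORTS)
		return -ENODEV;
	if (o->id == NO_OWNER)
		return -EINVAL;
	cur = &port_owner[port_id];
	if (cur->id != NO_OWNER && cur->id != o->id)
		return -EPERM;
	*cur = *o;
	return 0;
}

/* Only the current owner can release a port. */
static int port_owner_unset(uint16_t port_id, uint16_t owner_id)
{
	if (port_id >= MAX_PORTS)
		return -ENODEV;
	if (owner_id == NO_OWNER)
		return -EINVAL;
	if (port_owner[port_id].id != owner_id)
		return -EPERM;
	memset(&port_owner[port_id], 0, sizeof(port_owner[port_id]));
	return 0;
}
```

In this model a second entity that calls `port_owner_set()` on an owned port gets `-EPERM` until the first owner calls `port_owner_unset()`. The record only says who manages the port; as the documentation note in the patch states, synchronizing actual port usage remains the owner's job.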

diff --git a/doc/guides/prog_guide/poll_mode_drv.rst b/doc/guides/prog_guide/poll_mode_drv.rst
index 6a0c9f9..046cde7 100644
--- a/doc/guides/prog_guide/poll_mode_drv.rst
+++ b/doc/guides/prog_guide/poll_mode_drv.rst
@@ -156,8 +156,8 @@  concurrently on the same tx queue without SW lock. This PMD feature found in som
 
 See `Hardware Offload`_ for ``DEV_TX_OFFLOAD_MT_LOCKFREE`` capability probing details.
 
-Device Identification and Configuration
----------------------------------------
+Device Identification, Ownership and Configuration
+--------------------------------------------------
 
 Device Identification
 ~~~~~~~~~~~~~~~~~~~~~
@@ -171,6 +171,16 @@  Based on their PCI identifier, NIC ports are assigned two other identifiers:
 *   A port name used to designate the port in console messages, for administration or debugging purposes.
     For ease of use, the port name includes the port index.
 
+Port Ownership
+~~~~~~~~~~~~~~
+An Ethernet device port can be owned by a single DPDK entity (application, library, PMD, process, etc.).
+The ownership mechanism is controlled by ethdev APIs, which allow DPDK entities to set, remove and get a port owner.
+This is meant to prevent an Ethernet port from being managed by several entities at once.
+
+.. note::
+
+    It is the responsibility of the DPDK entity to set the port owner before using the port and to synchronize port usage between different threads or processes.
+
 Device Configuration
 ~~~~~~~~~~~~~~~~~~~~
 
diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
index 684e3e8..0e12452 100644
--- a/lib/librte_ether/rte_ethdev.c
+++ b/lib/librte_ether/rte_ethdev.c
@@ -70,7 +70,10 @@ 
 
 static const char *MZ_RTE_ETH_DEV_DATA = "rte_eth_dev_data";
 struct rte_eth_dev rte_eth_devices[RTE_MAX_ETHPORTS];
+/* ports data array stored in shared memory */
 static struct rte_eth_dev_data *rte_eth_dev_data;
+/* next owner identifier stored in shared memory */
+static uint16_t *rte_eth_next_owner_id;
 static uint8_t eth_dev_last_created_port;
 
 /* spinlock for eth device callbacks */
@@ -82,6 +85,9 @@ 
 /* spinlock for add/remove tx callbacks */
 static rte_spinlock_t rte_eth_tx_cb_lock = RTE_SPINLOCK_INITIALIZER;
 
+/* spinlock for eth device ownership management stored in shared memory */
+static rte_spinlock_t *rte_eth_dev_ownership_lock;
+
 /* store statistics names and its offset in stats structure  */
 struct rte_eth_xstats_name_off {
 	char name[RTE_ETH_XSTATS_NAME_SIZE];
@@ -153,14 +159,18 @@  enum {
 }
 
 static void
-rte_eth_dev_data_alloc(void)
+rte_eth_dev_share_data_alloc(void)
 {
 	const unsigned flags = 0;
 	const struct rte_memzone *mz;
+	const unsigned int data_size = RTE_MAX_ETHPORTS *
+						sizeof(*rte_eth_dev_data);
 
 	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+		/* Allocate shared memory for port data and ownership */
 		mz = rte_memzone_reserve(MZ_RTE_ETH_DEV_DATA,
-				RTE_MAX_ETHPORTS * sizeof(*rte_eth_dev_data),
+				data_size + sizeof(*rte_eth_next_owner_id) +
+				sizeof(*rte_eth_dev_ownership_lock),
 				rte_socket_id(), flags);
 	} else
 		mz = rte_memzone_lookup(MZ_RTE_ETH_DEV_DATA);
@@ -168,9 +178,17 @@  enum {
 		rte_panic("Cannot allocate memzone for ethernet port data\n");
 
 	rte_eth_dev_data = mz->addr;
-	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
-		memset(rte_eth_dev_data, 0,
-				RTE_MAX_ETHPORTS * sizeof(*rte_eth_dev_data));
+	rte_eth_next_owner_id = (uint16_t *)((uintptr_t)mz->addr +
+					     data_size);
+	rte_eth_dev_ownership_lock = (rte_spinlock_t *)
+		((uintptr_t)rte_eth_next_owner_id +
+		 sizeof(*rte_eth_next_owner_id));
+
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+		memset(rte_eth_dev_data, 0, data_size);
+		*rte_eth_next_owner_id = RTE_ETH_DEV_NO_OWNER + 1;
+		rte_spinlock_init(rte_eth_dev_ownership_lock);
+	}
 }
 
 struct rte_eth_dev *
@@ -225,7 +243,7 @@  struct rte_eth_dev *
 	}
 
 	if (rte_eth_dev_data == NULL)
-		rte_eth_dev_data_alloc();
+		rte_eth_dev_share_data_alloc();
 
 	if (rte_eth_dev_allocated(name) != NULL) {
 		RTE_PMD_DEBUG_TRACE("Ethernet Device with name %s already allocated!\n",
@@ -253,7 +271,7 @@  struct rte_eth_dev *
 	struct rte_eth_dev *eth_dev;
 
 	if (rte_eth_dev_data == NULL)
-		rte_eth_dev_data_alloc();
+		rte_eth_dev_share_data_alloc();
 
 	for (i = 0; i < RTE_MAX_ETHPORTS; i++) {
 		if (strcmp(rte_eth_dev_data[i].name, name) == 0)
@@ -278,8 +296,12 @@  struct rte_eth_dev *
 	if (eth_dev == NULL)
 		return -EINVAL;
 
-	memset(eth_dev->data, 0, sizeof(struct rte_eth_dev_data));
+	rte_spinlock_lock(rte_eth_dev_ownership_lock);
+
 	eth_dev->state = RTE_ETH_DEV_UNUSED;
+	memset(eth_dev->data, 0, sizeof(struct rte_eth_dev_data));
+
+	rte_spinlock_unlock(rte_eth_dev_ownership_lock);
 	return 0;
 }
 
@@ -294,6 +316,174 @@  struct rte_eth_dev *
 		return 1;
 }
 
+static int
+rte_eth_is_valid_owner_id(uint16_t owner_id)
+{
+	if (owner_id == RTE_ETH_DEV_NO_OWNER ||
+	    (*rte_eth_next_owner_id > RTE_ETH_DEV_NO_OWNER &&
+	     *rte_eth_next_owner_id <= owner_id)) {
+		RTE_LOG(ERR, EAL, "Invalid owner_id=%d.\n", owner_id);
+		return 0;
+	}
+	return 1;
+}
+
+uint16_t
+rte_eth_find_next_owned_by(uint16_t port_id, const uint16_t owner_id)
+{
+	while (port_id < RTE_MAX_ETHPORTS &&
+	       (rte_eth_devices[port_id].state != RTE_ETH_DEV_ATTACHED ||
+	       rte_eth_devices[port_id].data->owner.id != owner_id))
+		port_id++;
+
+	if (port_id >= RTE_MAX_ETHPORTS)
+		return RTE_MAX_ETHPORTS;
+
+	return port_id;
+}
+
+int
+rte_eth_dev_owner_new(uint16_t *owner_id)
+{
+	int ret = 0;
+
+	rte_spinlock_lock(rte_eth_dev_ownership_lock);
+
+	if (*rte_eth_next_owner_id == RTE_ETH_DEV_NO_OWNER) {
+		/* Counter wrap around. */
+		RTE_PMD_DEBUG_TRACE("Reached maximum number of Ethernet port owners.\n");
+		ret = -EUSERS;
+	} else {
+		*owner_id = (*rte_eth_next_owner_id)++;
+	}
+
+	rte_spinlock_unlock(rte_eth_dev_ownership_lock);
+	return ret;
+}
+
+int
+rte_eth_dev_owner_set(const uint16_t port_id,
+		      const struct rte_eth_dev_owner *owner)
+{
+	struct rte_eth_dev_owner *port_owner;
+	int ret = 0;
+	int sret;
+
+	rte_spinlock_lock(rte_eth_dev_ownership_lock);
+
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		RTE_PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+		ret = -ENODEV;
+		goto unlock;
+	}
+
+	if (!rte_eth_is_valid_owner_id(owner->id)) {
+		ret = -EINVAL;
+		goto unlock;
+	}
+
+	port_owner = &rte_eth_devices[port_id].data->owner;
+	if (port_owner->id != RTE_ETH_DEV_NO_OWNER &&
+	    port_owner->id != owner->id) {
+		RTE_LOG(ERR, EAL,
+			"Cannot set owner to port %d already owned by %s_%05d.\n",
+			port_id, port_owner->name, port_owner->id);
+		ret = -EPERM;
+		goto unlock;
+	}
+
+	sret = snprintf(port_owner->name, RTE_ETH_MAX_OWNER_NAME_LEN, "%s",
+			owner->name);
+	if (sret < 0 || sret >= RTE_ETH_MAX_OWNER_NAME_LEN) {
+		memset(port_owner->name, 0, RTE_ETH_MAX_OWNER_NAME_LEN);
+		RTE_LOG(ERR, EAL, "Invalid owner name.\n");
+		ret = -EINVAL;
+		goto unlock;
+	}
+
+	port_owner->id = owner->id;
+	RTE_PMD_DEBUG_TRACE("Port %d owner is %s_%05d.\n", port_id,
+			    owner->name, owner->id);
+
+unlock:
+	rte_spinlock_unlock(rte_eth_dev_ownership_lock);
+	return ret;
+}
+
+int
+rte_eth_dev_owner_unset(const uint16_t port_id, const uint16_t owner_id)
+{
+	struct rte_eth_dev_owner *port_owner;
+	int ret = 0;
+
+	rte_spinlock_lock(rte_eth_dev_ownership_lock);
+
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		RTE_PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+		ret = -ENODEV;
+		goto unlock;
+	}
+
+	if (!rte_eth_is_valid_owner_id(owner_id)) {
+		ret = -EINVAL;
+		goto unlock;
+	}
+
+	port_owner = &rte_eth_devices[port_id].data->owner;
+	if (port_owner->id != owner_id) {
+		RTE_LOG(ERR, EAL, "Cannot unset port %d owner (%s_%05d) by"
+			" a different owner with id %5d.\n", port_id,
+			port_owner->name, port_owner->id, owner_id);
+		ret = -EPERM;
+		goto unlock;
+	}
+	RTE_PMD_DEBUG_TRACE("Port %d owner %s_%05d has removed.\n", port_id,
+			    port_owner->name, port_owner->id);
+
+	memset(port_owner, 0, sizeof(struct rte_eth_dev_owner));
+
+unlock:
+	rte_spinlock_unlock(rte_eth_dev_ownership_lock);
+	return ret;
+}
+
+void
+rte_eth_dev_owner_delete(const uint16_t owner_id)
+{
+	uint16_t port_id;
+
+	rte_spinlock_lock(rte_eth_dev_ownership_lock);
+
+	if (rte_eth_is_valid_owner_id(owner_id)) {
+		RTE_ETH_FOREACH_DEV_OWNED_BY(port_id, owner_id)
+			memset(&rte_eth_devices[port_id].data->owner, 0,
+			       sizeof(struct rte_eth_dev_owner));
+		RTE_PMD_DEBUG_TRACE("All port owners owned by %05d identifier"
+				    " have removed.\n", owner_id);
+	}
+
+	rte_spinlock_unlock(rte_eth_dev_ownership_lock);
+}
+
+int
+rte_eth_dev_owner_get(const uint16_t port_id, struct rte_eth_dev_owner *owner)
+{
+	int ret = 0;
+
+	rte_spinlock_lock(rte_eth_dev_ownership_lock);
+
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		RTE_PMD_DEBUG_TRACE("Invalid port_id=%d\n", port_id);
+		ret = -ENODEV;
+	} else {
+		rte_memcpy(owner, &rte_eth_devices[port_id].data->owner,
+			   sizeof(*owner));
+	}
+
+	rte_spinlock_unlock(rte_eth_dev_ownership_lock);
+	return ret;
+}
+
 int
 rte_eth_dev_socket_id(uint16_t port_id)
 {
diff --git a/lib/librte_ether/rte_ethdev.h b/lib/librte_ether/rte_ethdev.h
index 57b61ed..88ad765 100644
--- a/lib/librte_ether/rte_ethdev.h
+++ b/lib/librte_ether/rte_ethdev.h
@@ -1760,6 +1760,15 @@  struct rte_eth_dev_sriov {
 
 #define RTE_ETH_NAME_MAX_LEN RTE_DEV_NAME_MAX_LEN
 
+#define RTE_ETH_DEV_NO_OWNER 0
+
+#define RTE_ETH_MAX_OWNER_NAME_LEN 64
+
+struct rte_eth_dev_owner {
+	uint16_t id; /**< The owner unique identifier. */
+	char name[RTE_ETH_MAX_OWNER_NAME_LEN]; /**< The owner name. */
+};
+
 /**
  * @internal
  * The data part, with no function pointers, associated with each ethernet device.
@@ -1810,6 +1819,7 @@  struct rte_eth_dev_data {
 	int numa_node;  /**< NUMA node connection */
 	struct rte_vlan_filter_conf vlan_filter_conf;
 	/**< VLAN filter configuration. */
+	struct rte_eth_dev_owner owner; /**< The port owner. */
 };
 
 /** Device supports link state interrupt */
@@ -1846,6 +1856,85 @@  struct rte_eth_dev_data {
 
 
 /**
+ * Iterates over valid ethdev ports owned by a specific owner.
+ *
+ * @param port_id
+ *   The id of the next possible valid owned port.
+ * @param	owner_id
+ *  The owner identifier.
+ *  RTE_ETH_DEV_NO_OWNER means iterate over all valid ownerless ports.
+ * @return
+ *   Next valid port id owned by owner_id, RTE_MAX_ETHPORTS if there is none.
+ */
+uint16_t rte_eth_find_next_owned_by(uint16_t port_id, const uint16_t owner_id);
+
+/**
+ * Macro to iterate over all enabled ethdev ports owned by a specific owner.
+ */
+#define RTE_ETH_FOREACH_DEV_OWNED_BY(p, o) \
+	for (p = rte_eth_find_next_owned_by(0, o); \
+	     (unsigned int)p < (unsigned int)RTE_MAX_ETHPORTS; \
+	     p = rte_eth_find_next_owned_by(p + 1, o))
+
+/**
+ * Get a new unique owner identifier.
+ * An owner identifier lets exactly one DPDK entity own an Ethernet device,
+ * to avoid management of the device by multiple entities.
+ *
+ * @param	owner_id
+ *   Owner identifier pointer.
+ * @return
+ *   Negative errno value on error, 0 on success.
+ */
+int rte_eth_dev_owner_new(uint16_t *owner_id);
+
+/**
+ * Set an Ethernet device owner.
+ *
+ * @param	port_id
+ *  The identifier of the port to own.
+ * @param	owner
+ *  The owner pointer.
+ * @return
+ *  Negative errno value on error, 0 on success.
+ */
+int rte_eth_dev_owner_set(const uint16_t port_id,
+			  const struct rte_eth_dev_owner *owner);
+
+/**
+ * Unset Ethernet device owner to make the device ownerless.
+ *
+ * @param	port_id
+ *  The identifier of port to make ownerless.
+ * @param	owner_id
+ *  The owner identifier.
+ * @return
+ *  0 on success, negative errno value on error.
+ */
+int rte_eth_dev_owner_unset(const uint16_t port_id, const uint16_t owner_id);
+
+/**
+ * Remove owner from all Ethernet devices owned by a specific owner.
+ *
+ * @param	owner_id
+ *  The owner identifier.
+ */
+void rte_eth_dev_owner_delete(const uint16_t owner_id);
+
+/**
+ * Get the owner of an Ethernet device.
+ *
+ * @param	port_id
+ *  The port identifier.
+ * @param	owner
+ *  The owner structure pointer to fill.
+ * @return
+ *  0 on success, negative errno value on error.
+ */
+int rte_eth_dev_owner_get(const uint16_t port_id,
+			  struct rte_eth_dev_owner *owner);
+
+/**
  * Get the total number of Ethernet devices that have been successfully
  * initialized by the matching Ethernet driver during the PCI probing phase
  * and that are available for applications to use. These devices must be
diff --git a/lib/librte_ether/rte_ethdev_version.map b/lib/librte_ether/rte_ethdev_version.map
index e9681ac..5d20b5f 100644
--- a/lib/librte_ether/rte_ethdev_version.map
+++ b/lib/librte_ether/rte_ethdev_version.map
@@ -198,6 +198,18 @@  DPDK_17.11 {
 
 } DPDK_17.08;
 
+DPDK_18.02 {
+	global:
+
+	rte_eth_dev_owner_delete;
+	rte_eth_dev_owner_get;
+	rte_eth_dev_owner_new;
+	rte_eth_dev_owner_set;
+	rte_eth_dev_owner_unset;
+	rte_eth_find_next_owned_by;
+
+} DPDK_17.11;
+
 EXPERIMENTAL {
 	global: