[dpdk-dev] [PATCH v7 03/11] net/failsafe: add fail-safe PMD

Jan Blunck jblunck at infradead.org
Sun Jul 9 13:10:49 CEST 2017


On Sat, Jul 8, 2017 at 9:47 PM, Gaetan Rivet <gaetan.rivet at 6wind.com> wrote:
> Introduce the fail-safe poll mode driver initialization and enable its
> build infrastructure.
>
> This PMD allows for applications to benefit from true hot-plugging
> support without having to implement it.
>
> It intercepts and manages Ethernet device removal events issued by
> slave PMDs and re-initializes them transparently when brought back.
> It also allows defining a contingency to the removal of a device, by
> designating a fail-over device that will take on transmitting operations
> if the preferred device is removed.
>
> Applications only see a fail-safe instance, without caring for
> underlying activity ensuring their continued operations.
>
> Signed-off-by: Gaetan Rivet <gaetan.rivet at 6wind.com>
> Acked-by: Olga Shern <olgas at mellanox.com>
> ---
>  MAINTAINERS                                       |   5 +
>  config/common_base                                |   6 +
>  doc/guides/nics/fail_safe.rst                     | 133 +++++
>  doc/guides/nics/features/failsafe.ini             |  24 +
>  doc/guides/nics/index.rst                         |   1 +
>  drivers/net/Makefile                              |   2 +
>  drivers/net/failsafe/Makefile                     |  76 +++
>  drivers/net/failsafe/failsafe.c                   | 231 ++++++++
>  drivers/net/failsafe/failsafe_args.c              | 331 +++++++++++
>  drivers/net/failsafe/failsafe_eal.c               | 154 +++++
>  drivers/net/failsafe/failsafe_ops.c               | 663 ++++++++++++++++++++++
>  drivers/net/failsafe/failsafe_private.h           | 227 ++++++++
>  drivers/net/failsafe/failsafe_rxtx.c              | 107 ++++
>  drivers/net/failsafe/rte_pmd_failsafe_version.map |   4 +
>  mk/rte.app.mk                                     |   1 +
>  15 files changed, 1965 insertions(+)
>  create mode 100644 doc/guides/nics/fail_safe.rst
>  create mode 100644 doc/guides/nics/features/failsafe.ini
>  create mode 100644 drivers/net/failsafe/Makefile
>  create mode 100644 drivers/net/failsafe/failsafe.c
>  create mode 100644 drivers/net/failsafe/failsafe_args.c
>  create mode 100644 drivers/net/failsafe/failsafe_eal.c
>  create mode 100644 drivers/net/failsafe/failsafe_ops.c
>  create mode 100644 drivers/net/failsafe/failsafe_private.h
>  create mode 100644 drivers/net/failsafe/failsafe_rxtx.c
>  create mode 100644 drivers/net/failsafe/rte_pmd_failsafe_version.map
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 8fb2132..b4a446f 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -336,6 +336,11 @@ F: drivers/net/enic/
>  F: doc/guides/nics/enic.rst
>  F: doc/guides/nics/features/enic.ini
>
> +Fail-safe PMD
> +M: Gaetan Rivet <gaetan.rivet at 6wind.com>
> +F: drivers/net/failsafe/
> +F: doc/guides/nics/fail_safe.rst
> +
>  Intel e1000
>  M: Wenzhuo Lu <wenzhuo.lu at intel.com>
>  F: drivers/net/e1000/
> diff --git a/config/common_base b/config/common_base
> index bb1ba8b..cf5e7f5 100644
> --- a/config/common_base
> +++ b/config/common_base
> @@ -420,6 +420,12 @@ CONFIG_RTE_LIBRTE_PMD_XENVIRT=n
>  CONFIG_RTE_LIBRTE_PMD_NULL=y
>
>  #
> +# Compile fail-safe PMD
> +#
> +CONFIG_RTE_LIBRTE_PMD_FAILSAFE=y
> +CONFIG_RTE_LIBRTE_PMD_FAILSAFE_DEBUG=n
> +
> +#
>  # Do prefetch of packet data within PMD driver receive function
>  #
>  CONFIG_RTE_PMD_PACKET_PREFETCH=y
> diff --git a/doc/guides/nics/fail_safe.rst b/doc/guides/nics/fail_safe.rst
> new file mode 100644
> index 0000000..056f85f
> --- /dev/null
> +++ b/doc/guides/nics/fail_safe.rst
> @@ -0,0 +1,133 @@
> +..  BSD LICENSE
> +    Copyright 2017 6WIND S.A.
> +
> +    Redistribution and use in source and binary forms, with or without
> +    modification, are permitted provided that the following conditions
> +    are met:
> +
> +    * Redistributions of source code must retain the above copyright
> +    notice, this list of conditions and the following disclaimer.
> +    * Redistributions in binary form must reproduce the above copyright
> +    notice, this list of conditions and the following disclaimer in
> +    the documentation and/or other materials provided with the
> +    distribution.
> +    * Neither the name of 6WIND S.A. nor the names of its
> +    contributors may be used to endorse or promote products derived
> +    from this software without specific prior written permission.
> +
> +    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> +    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> +    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> +    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> +    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> +    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> +    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> +    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> +    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> +    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> +    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> +
> +Fail-safe poll mode driver library
> +==================================
> +
> +The Fail-safe poll mode driver library (**librte_pmd_failsafe**) is a virtual
> +device that allows using any device supporting hotplug (sudden device removal
> +and plugging on its bus), without modifying other components relying on such
> +device (application, other PMDs).
> +
> +In addition to the Seamless Hotplug feature, the Fail-safe PMD offers the
> +ability to redirect operations to secondary devices when the primary has been
> +removed from the system.
> +
> +.. note::
> +
> +   The library is enabled by default. You can enable it or disable it manually
> +   by setting the ``CONFIG_RTE_LIBRTE_PMD_FAILSAFE`` configuration option.
> +
> +Features
> +--------
> +
> +The Fail-safe PMD only supports a limited set of features. If you plan to use a
> +device underneath the Fail-safe PMD with a specific feature, this feature must
> +be supported by the Fail-safe PMD to avoid throwing any error.
> +
> +Check the feature matrix for the complete set of supported features.
> +
> +Compilation options
> +-------------------
> +
> +These options can be modified in the ``$RTE_TARGET/build/.config`` file.
> +
> +- ``CONFIG_RTE_LIBRTE_PMD_FAILSAFE`` (default **y**)
> +
> +  Toggle compiling librte_pmd_failsafe itself.
> +
> +- ``CONFIG_RTE_LIBRTE_PMD_FAILSAFE_DEBUG`` (default **n**)
> +
> +  Toggle debugging code.
> +
> +Using the Fail-safe PMD from the EAL command line
> +-------------------------------------------------
> +
> +The Fail-safe PMD can be used like most other DPDK virtual devices, by passing a
> +``--vdev`` parameter to the EAL when starting the application. The device name
> +must start with the *net_failsafe* prefix, followed by numbers or letters. This
> +name must be unique for each device. Each fail-safe instance must have at least one
> +sub-device, up to ``RTE_MAX_ETHPORTS-1``.
> +
> +A sub-device can be any legal DPDK device, including possibly another fail-safe
> +instance.
> +
> +Fail-safe command line parameters
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +- **dev(<iface>)** parameter
> +
> +  This parameter allows the user to define a sub-device. The ``<iface>`` part of
> +  this parameter must be a valid device definition. It could be the argument
> +  provided to a ``-w`` PCI device specification or the argument that would be
> +  given to a ``--vdev`` parameter (including a fail-safe).
> +  Enclosing the device definition within parentheses here allows using
> +  additional sub-device parameters if need be. They will be passed on to the
> +  sub-device.
> +
> +- **mac** parameter [MAC address]
> +
> +  This parameter allows the user to set a default MAC address to the fail-safe
> +  and all of its sub-devices.
> +  If no default MAC address is provided, the fail-safe PMD will read the MAC
> +  address of the first of its sub-devices to be successfully probed and use it
> +  as its default MAC address, trying to set it on all of its other sub-devices.
> +  If no sub-device was successfully probed at initialization, then a random MAC
> +  address is generated, which will subsequently be applied to all sub-devices
> +  once they are probed.
> +
> +Usage example
> +~~~~~~~~~~~~~
> +
> +This section shows an example of using **testpmd** with a fail-safe PMD.
> +
> +#. Request huge pages:
> +
> +   .. code-block:: console
> +
> +      echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
> +
> +#. Start testpmd
> +
> +   .. code-block:: console
> +
> +      $RTE_TARGET/build/app/testpmd -c 0xff -n 4 --no-pci \
> +         --vdev='net_failsafe0,mac=de:ad:be:ef:01:02,dev(84:00.0),dev(net_ring0,nodeaction=r1:0:CREATE)' -- \
> +         -i
> +
> +Using the Fail-safe PMD from an application
> +-------------------------------------------
> +
> +This driver strives to be as seamless as possible to existing applications, in
> +order to propose the hotplug functionality in the easiest way possible.
> +
> +Care must be taken, however, to respect the **ether** API concerning device
> +access, and in particular, using the ``RTE_ETH_FOREACH_DEV`` macro to iterate
> +over ethernet devices, instead of directly accessing them or by writing one's
> +own device iterator.
> diff --git a/doc/guides/nics/features/failsafe.ini b/doc/guides/nics/features/failsafe.ini
> new file mode 100644
> index 0000000..3c52823
> --- /dev/null
> +++ b/doc/guides/nics/features/failsafe.ini
> @@ -0,0 +1,24 @@
> +;
> +; Supported features of the 'fail-safe' poll mode driver.
> +;
> +; Refer to default.ini for the full list of available PMD features.
> +;
> +[Features]
> +Link status          = Y
> +Queue start/stop     = Y
> +MTU update           = Y
> +Jumbo frame          = Y
> +Promiscuous mode     = Y
> +Allmulticast mode    = Y
> +Unicast MAC filter   = Y
> +Multicast MAC filter = Y
> +VLAN filter          = Y
> +Packet type parsing  = Y
> +Basic stats          = Y
> +Stats per queue      = Y
> +ARMv7                = Y
> +ARMv8                = Y
> +Power8               = Y
> +x86-32               = Y
> +x86-64               = Y
> +Usage doc            = Y
> diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
> index 240d082..17eaaf4 100644
> --- a/doc/guides/nics/index.rst
> +++ b/doc/guides/nics/index.rst
> @@ -64,6 +64,7 @@ Network Interface Controller Drivers
>      vhost
>      vmxnet3
>      pcap_ring
> +    fail_safe
>
>  **Figures**
>
> diff --git a/drivers/net/Makefile b/drivers/net/Makefile
> index 35ed813..d33c959 100644
> --- a/drivers/net/Makefile
> +++ b/drivers/net/Makefile
> @@ -59,6 +59,8 @@ DIRS-$(CONFIG_RTE_LIBRTE_ENA_PMD) += ena
>  DEPDIRS-ena = $(core-libs)
>  DIRS-$(CONFIG_RTE_LIBRTE_ENIC_PMD) += enic
>  DEPDIRS-enic = $(core-libs) librte_hash
> +DIRS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += failsafe
> +DEPDIRS-failsafe = $(core-libs)
>  DIRS-$(CONFIG_RTE_LIBRTE_FM10K_PMD) += fm10k
>  DEPDIRS-fm10k = $(core-libs) librte_hash
>  DIRS-$(CONFIG_RTE_LIBRTE_I40E_PMD) += i40e
> diff --git a/drivers/net/failsafe/Makefile b/drivers/net/failsafe/Makefile
> new file mode 100644
> index 0000000..c759035
> --- /dev/null
> +++ b/drivers/net/failsafe/Makefile
> @@ -0,0 +1,76 @@
> +#   BSD LICENSE
> +#
> +#   Copyright 2017 6WIND S.A.
> +#   Copyright 2017 Mellanox.
> +#
> +#   Redistribution and use in source and binary forms, with or without
> +#   modification, are permitted provided that the following conditions
> +#   are met:
> +#
> +#     * Redistributions of source code must retain the above copyright
> +#       notice, this list of conditions and the following disclaimer.
> +#     * Redistributions in binary form must reproduce the above copyright
> +#       notice, this list of conditions and the following disclaimer in
> +#       the documentation and/or other materials provided with the
> +#       distribution.
> +#     * Neither the name of 6WIND S.A. nor the names of its
> +#       contributors may be used to endorse or promote products derived
> +#       from this software without specific prior written permission.
> +#
> +#   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> +#   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> +#   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> +#   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> +#   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> +#   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> +#   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> +#   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> +#   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> +#   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> +#   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> +
> +include $(RTE_SDK)/mk/rte.vars.mk
> +
> +# Library name
> +LIB = librte_pmd_failsafe.a
> +
> +EXPORT_MAP := rte_pmd_failsafe_version.map
> +
> +LIBABIVER := 1
> +
> +# Sources are stored in SRCS-y
> +SRCS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += failsafe.c
> +SRCS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += failsafe_args.c
> +SRCS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += failsafe_eal.c
> +SRCS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += failsafe_ops.c
> +SRCS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += failsafe_rxtx.c
> +
> +# No exported include files
> +
> +# This lib depends upon:
> +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += lib/librte_eal
> +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += lib/librte_ether
> +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += lib/librte_kvargs
> +DEPDIRS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += lib/librte_mbuf
> +
> +ifneq ($(DEBUG),)
> +CONFIG_RTE_LIBRTE_PMD_FAILSAFE_DEBUG := y
> +endif
> +
> +# Basic CFLAGS:
> +CFLAGS += -std=gnu99 -Wall -Wextra
> +CFLAGS += -I.
> +CFLAGS += -D_DEFAULT_SOURCE
> +CFLAGS += -D_XOPEN_SOURCE=700
> +CFLAGS += $(WERROR_FLAGS)
> +CFLAGS += -Wno-strict-prototypes
> +CFLAGS += -pedantic -DPEDANTIC
> +
> +ifeq ($(CONFIG_RTE_LIBRTE_PMD_FAILSAFE_DEBUG),y)
> +CFLAGS += -g -UNDEBUG
> +else
> +CFLAGS += -O3
> +CFLAGS += -DNDEBUG
> +endif
> +
> +include $(RTE_SDK)/mk/rte.lib.mk
> diff --git a/drivers/net/failsafe/failsafe.c b/drivers/net/failsafe/failsafe.c
> new file mode 100644
> index 0000000..7cf33e8
> --- /dev/null
> +++ b/drivers/net/failsafe/failsafe.c
> @@ -0,0 +1,231 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright 2017 6WIND S.A.
> + *   Copyright 2017 Mellanox.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of 6WIND S.A. nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +#include <rte_alarm.h>
> +#include <rte_malloc.h>
> +#include <rte_ethdev.h>
> +#include <rte_ethdev_vdev.h>
> +#include <rte_devargs.h>
> +#include <rte_kvargs.h>
> +#include <rte_vdev.h>
> +
> +#include "failsafe_private.h"
> +
> +const char pmd_failsafe_driver_name[] = FAILSAFE_DRIVER_NAME;
> +static const struct rte_eth_link eth_link = {
> +       .link_speed = ETH_SPEED_NUM_10G,
> +       .link_duplex = ETH_LINK_FULL_DUPLEX,
> +       .link_status = ETH_LINK_UP,
> +       .link_autoneg = ETH_LINK_SPEED_AUTONEG,
> +};
> +
> +static int
> +fs_sub_device_create(struct rte_eth_dev *dev,
> +               const char *params)
> +{
> +       uint8_t nb_subs;
> +       int ret;
> +
> +       ret = failsafe_args_count_subdevice(dev, params);
> +       if (ret)
> +               return ret;
> +       if (PRIV(dev)->subs_tail > FAILSAFE_MAX_ETHPORTS) {
> +               ERROR("Cannot allocate more than %d ports",
> +                       FAILSAFE_MAX_ETHPORTS);
> +               return -ENOSPC;
> +       }
> +       nb_subs = PRIV(dev)->subs_tail;
> +       PRIV(dev)->subs = rte_zmalloc(NULL,
> +                       sizeof(struct sub_device) * nb_subs,
> +                       RTE_CACHE_LINE_SIZE);
> +       if (PRIV(dev)->subs == NULL) {
> +               ERROR("Could not allocate sub_devices");
> +               return -ENOMEM;
> +       }
> +       return 0;
> +}
> +
> +static void
> +fs_sub_device_free(struct rte_eth_dev *dev)
> +{
> +       rte_free(PRIV(dev)->subs);
> +}
> +
> +static int
> +fs_eth_dev_create(struct rte_vdev_device *vdev)
> +{
> +       struct rte_eth_dev *dev;
> +       struct ether_addr *mac;
> +       struct fs_priv *priv;
> +       struct sub_device *sdev;
> +       const char *params;
> +       unsigned int socket_id;
> +       uint8_t i;
> +       int ret;
> +
> +       dev = NULL;
> +       priv = NULL;
> +       params = rte_vdev_device_args(vdev);
> +       socket_id = rte_socket_id();
> +       INFO("Creating fail-safe device on NUMA socket %u",
> +            socket_id);
> +       dev = rte_eth_vdev_allocate(vdev, sizeof(*priv));
> +       if (dev == NULL) {
> +               ERROR("Unable to allocate rte_eth_dev");
> +               return -1;
> +       }
> +       priv = dev->data->dev_private;
> +       PRIV(dev)->dev = dev;
> +       dev->dev_ops = &failsafe_ops;
> +       TAILQ_INIT(&dev->link_intr_cbs);
> +       dev->data->dev_flags = 0x0;
> +       dev->data->mac_addrs = &PRIV(dev)->mac_addrs[0];
> +       dev->data->dev_link = eth_link;
> +       PRIV(dev)->nb_mac_addr = 1;
> +       dev->rx_pkt_burst = (eth_rx_burst_t)&failsafe_rx_burst;
> +       dev->tx_pkt_burst = (eth_tx_burst_t)&failsafe_tx_burst;
> +       if (params == NULL) {
> +               ERROR("This PMD requires sub-devices, none provided");
> +               goto free_dev;
> +       }
> +       ret = fs_sub_device_create(dev, params);
> +       if (ret) {
> +               ERROR("Could not allocate sub_devices");
> +               goto free_dev;
> +       }
> +       ret = failsafe_args_parse(dev, params);
> +       if (ret)
> +               goto free_subs;
> +       ret = failsafe_eal_init(dev);
> +       if (ret)
> +               goto free_args;
> +       mac = &dev->data->mac_addrs[0];
> +       if (mac_from_arg) {
> +               /*
> +                * If MAC address was provided as a parameter,
> +                * apply to all probed slaves.
> +                */
> +               FOREACH_SUBDEV_ST(sdev, i, dev, DEV_PROBED) {
> +                       ret = rte_eth_dev_default_mac_addr_set(PORT_ID(sdev),
> +                                                              mac);
> +                       if (ret) {
> +                               ERROR("Failed to set default MAC address");
> +                               goto free_args;
> +                       }
> +               }
> +       } else {
> +               /*
> +                * Use the ether_addr from first probed
> +                * device, either preferred or fallback.
> +                */
> +               FOREACH_SUBDEV(sdev, i, dev)
> +                       if (sdev->state >= DEV_PROBED) {
> +                               ether_addr_copy(&ETH(sdev)->data->mac_addrs[0],
> +                                               mac);
> +                               break;
> +                       }
> +               /*
> +                * If no device has been probed and no ether_addr
> +                * has been provided on the command line, use a random
> +                * valid one.
> +                * It will be applied during future slave state syncs to
> +                * probed slaves.
> +                */
> +               if (i == priv->subs_tail)
> +                       eth_random_addr(&mac->addr_bytes[0]);
> +       }
> +       INFO("MAC address is %02x:%02x:%02x:%02x:%02x:%02x",
> +               mac->addr_bytes[0], mac->addr_bytes[1],
> +               mac->addr_bytes[2], mac->addr_bytes[3],
> +               mac->addr_bytes[4], mac->addr_bytes[5]);
> +       return 0;
> +free_args:
> +       failsafe_args_free(dev);
> +free_subs:
> +       fs_sub_device_free(dev);
> +free_dev:
> +       rte_eth_dev_release_port(dev);
> +       return -1;
> +}
> +
> +static int
> +fs_rte_eth_free(const char *name)
> +{
> +       struct rte_eth_dev *dev;
> +       int ret;
> +
> +       dev = rte_eth_dev_allocated(name);
> +       if (dev == NULL)
> +               return -ENODEV;
> +       ret = failsafe_eal_uninit(dev);
> +       if (ret)
> +               ERROR("Error while uninitializing sub-EAL");
> +       failsafe_args_free(dev);
> +       fs_sub_device_free(dev);
> +       rte_free(PRIV(dev));
> +       rte_eth_dev_release_port(dev);
> +       return ret;
> +}
> +
> +static int
> +rte_pmd_failsafe_probe(struct rte_vdev_device *vdev)
> +{
> +       const char *name;
> +
> +       name = rte_vdev_device_name(vdev);
> +       if (name == NULL)
> +               return -EINVAL;
> +       INFO("Initializing " FAILSAFE_DRIVER_NAME " for %s",
> +                       name);
> +       return fs_eth_dev_create(vdev);
> +}
> +
> +static int
> +rte_pmd_failsafe_remove(struct rte_vdev_device *vdev)
> +{
> +       const char *name;
> +
> +       name = rte_vdev_device_name(vdev);
> +       if (name == NULL)
> +               return -EINVAL;
> +       INFO("Uninitializing " FAILSAFE_DRIVER_NAME " for %s", name);
> +       return fs_rte_eth_free(name);
> +}
> +
> +static struct rte_vdev_driver failsafe_drv = {
> +       .probe = rte_pmd_failsafe_probe,
> +       .remove = rte_pmd_failsafe_remove,
> +};
> +
> +RTE_PMD_REGISTER_VDEV(net_failsafe, failsafe_drv);
> +RTE_PMD_REGISTER_ALIAS(net_failsafe, eth_failsafe);
> +RTE_PMD_REGISTER_PARAM_STRING(net_failsafe, PMD_FAILSAFE_PARAM_STRING);
> diff --git a/drivers/net/failsafe/failsafe_args.c b/drivers/net/failsafe/failsafe_args.c
> new file mode 100644
> index 0000000..f07d26e
> --- /dev/null
> +++ b/drivers/net/failsafe/failsafe_args.c
> @@ -0,0 +1,331 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright 2017 6WIND S.A.
> + *   Copyright 2017 Mellanox.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of 6WIND S.A. nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +#include <string.h>
> +#include <errno.h>
> +
> +#include <rte_devargs.h>
> +#include <rte_malloc.h>
> +#include <rte_kvargs.h>
> +
> +#include "failsafe_private.h"
> +
> +#define DEVARGS_MAXLEN 4096
> +
> +/* Callback used when a new device is found in devargs */
> +typedef int (parse_cb)(struct rte_eth_dev *dev, const char *params,
> +               uint8_t head);
> +
> +int mac_from_arg;
> +
> +const char *pmd_failsafe_init_parameters[] = {
> +       PMD_FAILSAFE_MAC_KVARG,
> +       NULL,
> +};
> +
> +/*
> + * input: text.
> + * output: 0: if text[0] != '(',
> + *         0: if there are no corresponding ')'
> + *         n: distance to corresponding ')' otherwise
> + */
> +static size_t
> +closing_paren(const char *text)
> +{
> +       int nb_open = 0;
> +       size_t i = 0;
> +
> +       while (text[i] != '\0') {
> +               if (text[i] == '(')
> +                       nb_open++;
> +               if (text[i] == ')')
> +                       nb_open--;
> +               if (nb_open == 0)
> +                       return i;
> +               i++;
> +       }
> +       return 0;
> +}
> +
> +static int
> +fs_parse_device(struct sub_device *sdev, char *args)
> +{
> +       struct rte_devargs *d;
> +       int ret;
> +
> +       d = &sdev->devargs;
> +       DEBUG("%s", args);
> +       ret = rte_eal_devargs_parse(args, d);
> +       if (ret) {
> +               DEBUG("devargs parsing failed with code %d", ret);
> +               return ret;
> +       }
> +       sdev->bus = d->bus;
> +       sdev->state = DEV_PARSED;

You seem to be mostly interested in the bus name for the device. Why
don't you track this via your sub_device structure instead of using
rte_devargs?
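
A minimal sketch of what tracking the bus name directly in the sub-device
could look like. Everything here is an assumption made for illustration: the
struct, the field names, the length limit and the helper are hypothetical
stand-ins, not the patch's actual struct sub_device or any DPDK API.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical upper bound on a bus name; not a DPDK constant. */
#define SKETCH_BUS_NAME_LEN 32

/* Hypothetical subset of the sub-device states used in the patch. */
enum sketch_dev_state {
	SKETCH_DEV_UNDEFINED = 0,
	SKETCH_DEV_PARSED,
};

/*
 * Hypothetical stand-in for struct sub_device: the bus name is stored
 * in the sub-device itself instead of being reached through an
 * embedded rte_devargs.
 */
struct sketch_sub_device {
	char bus_name[SKETCH_BUS_NAME_LEN];
	enum sketch_dev_state state;
};

/*
 * Record the bus name and mark the sub-device as parsed.
 * Returns 0 on success, -EINVAL on bad input, and -ENAMETOOLONG if the
 * name does not fit, in which case the stored name is left empty.
 */
static int
sketch_sub_device_set_bus(struct sketch_sub_device *sdev,
		const char *bus_name)
{
	int n;

	if (sdev == NULL || bus_name == NULL || bus_name[0] == '\0')
		return -EINVAL;
	n = snprintf(sdev->bus_name, sizeof(sdev->bus_name), "%s", bus_name);
	if (n < 0 || (size_t)n >= sizeof(sdev->bus_name)) {
		sdev->bus_name[0] = '\0';
		return -ENAMETOOLONG;
	}
	sdev->state = SKETCH_DEV_PARSED;
	return 0;
}
```

With something along these lines, fs_parse_device() would only need to hand
over the bus name it resolved, and the sub-device would not have to keep a
full rte_devargs around just for that one field.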


> +       return 0;
> +}
> +
> +static int
> +fs_parse_device_param(struct rte_eth_dev *dev, const char *param,
> +               uint8_t head)
> +{
> +       struct fs_priv *priv;
> +       struct sub_device *sdev;
> +       char *args = NULL;
> +       size_t a, b;
> +       int ret;
> +
> +       priv = PRIV(dev);
> +       a = 0;
> +       b = 0;
> +       ret = 0;
> +       while  (param[b] != '(' &&
> +               param[b] != '\0')
> +               b++;
> +       a = b;
> +       b += closing_paren(&param[b]);
> +       if (a == b) {
> +               ERROR("Dangling parenthesis");
> +               return -EINVAL;
> +       }
> +       a += 1;
> +       args = strndup(&param[a], b - a);
> +       if (args == NULL) {
> +               ERROR("Not enough memory for parameter parsing");
> +               return -ENOMEM;
> +       }
> +       sdev = &priv->subs[head];
> +       if (strncmp(param, "dev", 3) == 0) {
> +               ret = fs_parse_device(sdev, args);
> +               if (ret)
> +                       goto free_args;
> +       } else {
> +               ERROR("Unrecognized device type: %.*s", (int)b, param);
> +               ret = -EINVAL;
> +       }
> +free_args:
> +       free(args);
> +       return ret;
> +}
> +
> +static int
> +fs_parse_sub_devices(parse_cb *cb,
> +               struct rte_eth_dev *dev, const char *params)
> +{
> +       size_t a, b;
> +       uint8_t head;
> +       int ret;
> +
> +       a = 0;
> +       head = 0;
> +       ret = 0;
> +       while (params[a] != '\0') {
> +               b = a;
> +               while (params[b] != '(' &&
> +                      params[b] != ',' &&
> +                      params[b] != '\0')
> +                       b++;
> +               if (b == a) {
> +                       ERROR("Invalid parameter");
> +                       return -EINVAL;
> +               }
> +               if (params[b] == ',') {
> +                       a = b + 1;
> +                       continue;
> +               }
> +               if (params[b] == '(') {
> +                       size_t start = b;
> +
> +                       b += closing_paren(&params[b]);
> +                       if (b == start) {
> +                               ERROR("Dangling parenthesis");
> +                               return -EINVAL;
> +                       }
> +                       ret = (*cb)(dev, &params[a], head);
> +                       if (ret)
> +                               return ret;
> +                       head += 1;
> +                       b += 1;
> +                       if (params[b] == '\0')
> +                               return 0;
> +               }
> +               a = b + 1;
> +       }
> +       return 0;
> +}
> +
> +static int
> +fs_remove_sub_devices_definition(char params[DEVARGS_MAXLEN])
> +{
> +       char buffer[DEVARGS_MAXLEN] = {0};
> +       size_t a, b;
> +       int i;
> +
> +       a = 0;
> +       i = 0;
> +       while (params[a] != '\0') {
> +               b = a;
> +               while (params[b] != '(' &&
> +                      params[b] != ',' &&
> +                      params[b] != '\0')
> +                       b++;
> +               if (b == a) {
> +                       ERROR("Invalid parameter");
> +                       return -EINVAL;
> +               }
> +               if (params[b] == ',' || params[b] == '\0')
> +                       i += snprintf(&buffer[i], b - a + 1, "%s", &params[a]);
> +               if (params[b] == '(') {
> +                       size_t start = b;
> +                       b += closing_paren(&params[b]);
> +                       if (b == start)
> +                               return -EINVAL;
> +                       b += 1;
> +                       if (params[b] == '\0')
> +                               goto out;
> +               }
> +               a = b + 1;
> +       }
> +out:
> +       snprintf(params, DEVARGS_MAXLEN, "%s", buffer);
> +       return 0;
> +}
> +
> +static int
> +fs_get_mac_addr_arg(const char *key __rte_unused,
> +               const char *value, void *out)
> +{
> +       struct ether_addr *ea = out;
> +       int ret;
> +
> +       if ((value == NULL) || (out == NULL))
> +               return -EINVAL;
> +       ret = sscanf(value, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
> +               &ea->addr_bytes[0], &ea->addr_bytes[1],
> +               &ea->addr_bytes[2], &ea->addr_bytes[3],
> +               &ea->addr_bytes[4], &ea->addr_bytes[5]);
> +       return ret != ETHER_ADDR_LEN;
> +}
> +
> +int
> +failsafe_args_parse(struct rte_eth_dev *dev, const char *params)
> +{
> +       struct fs_priv *priv;
> +       char mut_params[DEVARGS_MAXLEN] = "";
> +       struct rte_kvargs *kvlist = NULL;
> +       unsigned int arg_count;
> +       size_t n;
> +       int ret;
> +
> +       if (dev == NULL || params == NULL)
> +               return -EINVAL;
> +       priv = PRIV(dev);
> +       ret = 0;
> +       priv->subs_tx = FAILSAFE_MAX_ETHPORTS;
> +       /* default parameters */
> +       mac_from_arg = 0;
> +       n = snprintf(mut_params, sizeof(mut_params), "%s", params);
> +       if (n >= sizeof(mut_params)) {
> +               ERROR("Parameter string too long (>=%zu)",
> +                               sizeof(mut_params));
> +               return -ENOMEM;
> +       }
> +       ret = fs_parse_sub_devices(fs_parse_device_param,
> +                                  dev, params);
> +       if (ret < 0)
> +               return ret;
> +       ret = fs_remove_sub_devices_definition(mut_params);
> +       if (ret < 0)
> +               return ret;
> +       if (strnlen(mut_params, sizeof(mut_params)) > 0) {
> +               kvlist = rte_kvargs_parse(mut_params,
> +                               pmd_failsafe_init_parameters);
> +               if (kvlist == NULL) {
> +                       ERROR("Error parsing parameters, usage:\n"
> +                               PMD_FAILSAFE_PARAM_STRING);
> +                       return -1;
> +               }
> +               /* MAC addr */
> +               arg_count = rte_kvargs_count(kvlist,
> +                               PMD_FAILSAFE_MAC_KVARG);
> +               if (arg_count == 1) {
> +                       ret = rte_kvargs_process(kvlist,
> +                                       PMD_FAILSAFE_MAC_KVARG,
> +                                       &fs_get_mac_addr_arg,
> +                                       &dev->data->mac_addrs[0]);
> +                       if (ret < 0)
> +                               goto free_kvlist;
> +                       mac_from_arg = 1;
> +               }
> +       }
> +free_kvlist:
> +       rte_kvargs_free(kvlist);
> +       return ret;
> +}
> +
> +void
> +failsafe_args_free(struct rte_eth_dev *dev)
> +{
> +       struct sub_device *sdev;
> +       uint8_t i;
> +
> +       FOREACH_SUBDEV(sdev, i, dev) {
> +               free(sdev->devargs.args);
> +               sdev->devargs.args = NULL;
> +       }
> +}
> +
> +static int
> +fs_count_device(struct rte_eth_dev *dev, const char *param,
> +               uint8_t head __rte_unused)
> +{
> +       size_t b = 0;
> +
> +       while  (param[b] != '(' &&
> +               param[b] != '\0')
> +               b++;
> +       if (strncmp(param, "dev", b) &&
> +           strncmp(param, "exec", b)) {
> +               ERROR("Unrecognized device type: %.*s", (int)b, param);
> +               return -EINVAL;
> +       }
> +       PRIV(dev)->subs_tail += 1;
> +       return 0;
> +}
> +
> +int
> +failsafe_args_count_subdevice(struct rte_eth_dev *dev,
> +                       const char *params)
> +{
> +       return fs_parse_sub_devices(fs_count_device,
> +                                   dev, params);
> +}
> diff --git a/drivers/net/failsafe/failsafe_eal.c b/drivers/net/failsafe/failsafe_eal.c
> new file mode 100644
> index 0000000..6c3a811
> --- /dev/null
> +++ b/drivers/net/failsafe/failsafe_eal.c
> @@ -0,0 +1,154 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright 2017 6WIND S.A.
> + *   Copyright 2017 Mellanox.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of 6WIND S.A. nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#include <rte_malloc.h>
> +
> +#include "failsafe_private.h"
> +
> +static struct rte_eth_dev *
> +fs_find_ethdev(const struct rte_device *dev)
> +{
> +       struct rte_eth_dev *eth_dev;
> +       uint8_t i;
> +
> +       RTE_ETH_FOREACH_DEV(i) {
> +               eth_dev = &rte_eth_devices[i];
> +               if (eth_dev->device == dev)
> +                       return eth_dev;
> +       }
> +       return NULL;
> +}

Why don't you use rte_eth_dev_allocated() here instead of open-coding
the lookup over rte_eth_devices[]?
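
For illustration only, a minimal sketch of the name-based lookup that
rte_eth_dev_allocated() offers. Everything prefixed with mock_ is a
stand-in invented for this example so that it compiles without DPDK;
none of it is the real ethdev code. It also assumes the ethdev name
matches the name the sub-device was plugged with, which may not hold
for every PMD.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in definitions (hypothetical, NOT the DPDK ones). */
#define MOCK_MAX_PORTS 4
#define MOCK_NAME_LEN 64

struct mock_eth_dev {
	char name[MOCK_NAME_LEN]; /* empty string: slot unused */
};

static struct mock_eth_dev mock_eth_devices[MOCK_MAX_PORTS];

/* Demo helper: mark a slot as allocated under the given name. */
static void
mock_register(size_t slot, const char *name)
{
	assert(slot < MOCK_MAX_PORTS);
	strncpy(mock_eth_devices[slot].name, name, MOCK_NAME_LEN - 1);
	mock_eth_devices[slot].name[MOCK_NAME_LEN - 1] = '\0';
}

/*
 * Mimics a lookup by name: return the matching device,
 * or NULL when no allocated slot carries that name.
 */
static struct mock_eth_dev *
mock_eth_dev_allocated(const char *name)
{
	size_t i;

	if (name == NULL || name[0] == '\0')
		return NULL;
	for (i = 0; i < MOCK_MAX_PORTS; i++) {
		if (strcmp(mock_eth_devices[i].name, name) == 0)
			return &mock_eth_devices[i];
	}
	return NULL;
}
```

With a helper of that shape, the caller would pass the sub-device name
rather than the rte_device pointer, and fs_find_ethdev() would no
longer be needed.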


> +
> +static int
> +fs_bus_init(struct rte_eth_dev *dev)
> +{
> +       struct sub_device *sdev;
> +       struct rte_device *rdev;
> +       struct rte_devargs *da;
> +       uint8_t i;
> +       int ret;
> +
> +       FOREACH_SUBDEV(sdev, i, dev) {
> +               if (sdev->state != DEV_PARSED)
> +                       continue;
> +               da = &sdev->devargs;
> +               rdev = rte_eal_hotplug_add(da->bus->name,
> +                                          da->name,
> +                                          da->args);

Why don't you track the bus through your sub_device structure instead
of going through the devargs? fs_bus_uninit() below already uses
sdev->bus->name.
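
One possible shape for this, as a rough sketch. The mock_bus,
mock_devargs and mock_sub_device types and both helpers are
hypothetical stand-ins made up for the example; they are not the
structures from failsafe_private.h or from the EAL.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in types (hypothetical, NOT the DPDK ones). */
struct mock_bus {
	const char *name;
};

struct mock_devargs {
	struct mock_bus *bus;
	const char *name;
};

struct mock_sub_device {
	struct mock_devargs devargs;
	struct mock_bus *bus; /* cached once, used to plug and unplug */
};

/* Cache the bus once the devargs have been parsed. */
static int
mock_sub_device_set_bus(struct mock_sub_device *sdev)
{
	if (sdev == NULL || sdev->devargs.bus == NULL)
		return -1;
	sdev->bus = sdev->devargs.bus;
	return 0;
}

/* Single accessor shared by the init and uninit paths. */
static const char *
mock_sub_device_bus_name(const struct mock_sub_device *sdev)
{
	assert(sdev != NULL);
	if (sdev->bus == NULL)
		return NULL;
	return sdev->bus->name;
}
```

Both the hotplug add and the hotplug remove paths would then read the
bus name from the same place.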


> +               ret = rdev ? 0 : -rte_errno;
> +               if (ret) {
> +                       ERROR("sub_device %d probe failed %s%s%s", i,
> +                             errno ? "(" : "",
> +                             errno ? strerror(rte_errno) : "",
> +                             errno ? ")" : "");
> +                       continue;
> +               }
> +               ETH(sdev) = fs_find_ethdev(rdev);
> +               if (ETH(sdev) == NULL) {
> +                       ERROR("sub_device %d init went wrong", i);
> +                       return -ENODEV;
> +               }
> +               sdev->dev = ETH(sdev)->device;
> +               ETH(sdev)->state = RTE_ETH_DEV_DEFERRED;
> +               sdev->state = DEV_PROBED;
> +       }
> +       return 0;
> +}
> +
> +int
> +failsafe_eal_init(struct rte_eth_dev *dev)
> +{
> +       struct sub_device *sdev;
> +       uint8_t i;
> +       int ret;
> +
> +       ret = fs_bus_init(dev);
> +       if (ret)
> +               return ret;
> +       /*
> +        * We only update TX_SUBDEV if we are not started.
> +        * If a sub_device is emitting, we will switch the TX_SUBDEV to the
> +        * preferred port only upon starting it, so that the switch is smoother.
> +        */
> +       if (PREFERRED_SUBDEV(dev)->state >= DEV_PROBED) {
> +               if (TX_SUBDEV(dev) != PREFERRED_SUBDEV(dev) &&
> +                   (TX_SUBDEV(dev) == NULL ||
> +                    (TX_SUBDEV(dev) && TX_SUBDEV(dev)->state < DEV_STARTED))) {
> +                       DEBUG("Switching tx_dev to preferred sub_device");
> +                       PRIV(dev)->subs_tx = 0;
> +               }
> +       } else {
> +               if ((TX_SUBDEV(dev) && TX_SUBDEV(dev)->state < DEV_PROBED) ||
> +                   TX_SUBDEV(dev) == NULL) {
> +                       /* Using first probed device */
> +                       FOREACH_SUBDEV_ST(sdev, i, dev, DEV_PROBED) {
> +                               DEBUG("Switching tx_dev to sub_device %d",
> +                                     i);
> +                               PRIV(dev)->subs_tx = i;
> +                               break;
> +                       }
> +               }
> +       }
> +       return 0;
> +}
> +
> +static int
> +fs_bus_uninit(struct rte_eth_dev *dev)
> +{
> +       struct sub_device *sdev = NULL;
> +       uint8_t i;
> +       int ret;
> +
> +       FOREACH_SUBDEV_ST(sdev, i, dev, DEV_PROBED) {
> +               ret = rte_eal_hotplug_remove(sdev->bus->name,
> +                                            sdev->dev->name);
> +               if (ret) {
> +                       ERROR("Failed to remove requested device %s",
> +                             sdev->dev->name);
> +                       continue;
> +               }
> +               sdev->state = DEV_PROBED - 1;
> +       }
> +       return 0;
> +}
> +
> +int
> +failsafe_eal_uninit(struct rte_eth_dev *dev)
> +{
> +       int ret;
> +
> +       ret = fs_bus_uninit(dev);
> +       if (ret)
> +               return ret;
> +       return 0;
> +}
> diff --git a/drivers/net/failsafe/failsafe_ops.c b/drivers/net/failsafe/failsafe_ops.c
> new file mode 100644
> index 0000000..693162e
> --- /dev/null
> +++ b/drivers/net/failsafe/failsafe_ops.c
> @@ -0,0 +1,663 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright 2017 6WIND S.A.
> + *   Copyright 2017 Mellanox.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of 6WIND S.A. nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#include <assert.h>
> +#include <stdint.h>
> +#include <rte_ethdev.h>
> +#include <rte_malloc.h>
> +
> +#include "failsafe_private.h"
> +
> +static struct rte_eth_dev_info default_infos = {
> +       .driver_name = pmd_failsafe_driver_name,
> +       /* Max possible number of elements */
> +       .max_rx_pktlen = UINT32_MAX,
> +       .max_rx_queues = RTE_MAX_QUEUES_PER_PORT,
> +       .max_tx_queues = RTE_MAX_QUEUES_PER_PORT,
> +       .max_mac_addrs = FAILSAFE_MAX_ETHADDR,
> +       .max_hash_mac_addrs = UINT32_MAX,
> +       .max_vfs = UINT16_MAX,
> +       .max_vmdq_pools = UINT16_MAX,
> +       .rx_desc_lim = {
> +               .nb_max = UINT16_MAX,
> +               .nb_min = 0,
> +               .nb_align = 1,
> +               .nb_seg_max = UINT16_MAX,
> +               .nb_mtu_seg_max = UINT16_MAX,
> +       },
> +       .tx_desc_lim = {
> +               .nb_max = UINT16_MAX,
> +               .nb_min = 0,
> +               .nb_align = 1,
> +               .nb_seg_max = UINT16_MAX,
> +               .nb_mtu_seg_max = UINT16_MAX,
> +       },
> +       /* Set of understood capabilities */
> +       .rx_offload_capa = 0x0,
> +       .tx_offload_capa = 0x0,
> +       .flow_type_rss_offloads = 0x0,
> +};
> +
> +static int
> +fs_dev_configure(struct rte_eth_dev *dev)
> +{
> +       struct sub_device *sdev;
> +       uint8_t i;
> +       int ret;
> +
> +       FOREACH_SUBDEV(sdev, i, dev) {
> +               if (sdev->state != DEV_PROBED)
> +                       continue;
> +               DEBUG("Configuring sub-device %d", i);
> +               ret = rte_eth_dev_configure(PORT_ID(sdev),
> +                                       dev->data->nb_rx_queues,
> +                                       dev->data->nb_tx_queues,
> +                                       &dev->data->dev_conf);
> +               if (ret) {
> +                       ERROR("Could not configure sub_device %d", i);
> +                       return ret;
> +               }
> +               sdev->state = DEV_ACTIVE;
> +       }
> +       return 0;
> +}
> +
> +static int
> +fs_dev_start(struct rte_eth_dev *dev)
> +{
> +       struct sub_device *sdev;
> +       uint8_t i;
> +       int ret;
> +
> +       FOREACH_SUBDEV(sdev, i, dev) {
> +               if (sdev->state != DEV_ACTIVE)
> +                       continue;
> +               DEBUG("Starting sub_device %d", i);
> +               ret = rte_eth_dev_start(PORT_ID(sdev));
> +               if (ret)
> +                       return ret;
> +               sdev->state = DEV_STARTED;
> +       }
> +       if (PREFERRED_SUBDEV(dev)->state == DEV_STARTED) {
> +               if (TX_SUBDEV(dev) != PREFERRED_SUBDEV(dev)) {
> +                       DEBUG("Switching tx_dev to preferred sub_device");
> +                       PRIV(dev)->subs_tx = 0;
> +               }
> +       } else {
> +               if ((TX_SUBDEV(dev) && TX_SUBDEV(dev)->state < DEV_STARTED) ||
> +                   TX_SUBDEV(dev) == NULL) {
> +                       FOREACH_SUBDEV_ST(sdev, i, dev, DEV_STARTED) {
> +                               DEBUG("Switching tx_dev to sub_device %d", i);
> +                               PRIV(dev)->subs_tx = i;
> +                               break;
> +                       }
> +               }
> +       }
> +       return 0;
> +}
> +
> +static void
> +fs_dev_stop(struct rte_eth_dev *dev)
> +{
> +       struct sub_device *sdev;
> +       uint8_t i;
> +
> +       FOREACH_SUBDEV_ST(sdev, i, dev, DEV_STARTED) {
> +               rte_eth_dev_stop(PORT_ID(sdev));
> +               sdev->state = DEV_STARTED - 1;
> +       }
> +}
> +
> +static int
> +fs_dev_set_link_up(struct rte_eth_dev *dev)
> +{
> +       struct sub_device *sdev;
> +       uint8_t i;
> +       int ret;
> +
> +       FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
> +               DEBUG("Calling rte_eth_dev_set_link_up on sub_device %d", i);
> +               ret = rte_eth_dev_set_link_up(PORT_ID(sdev));
> +               if (ret) {
> +                       ERROR("Operation rte_eth_dev_set_link_up failed for sub_device %d"
> +                             " with error %d", i, ret);
> +                       return ret;
> +               }
> +       }
> +       return 0;
> +}
> +
> +static int
> +fs_dev_set_link_down(struct rte_eth_dev *dev)
> +{
> +       struct sub_device *sdev;
> +       uint8_t i;
> +       int ret;
> +
> +       FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
> +               DEBUG("Calling rte_eth_dev_set_link_down on sub_device %d", i);
> +               ret = rte_eth_dev_set_link_down(PORT_ID(sdev));
> +               if (ret) {
> +                       ERROR("Operation rte_eth_dev_set_link_down failed for sub_device %d"
> +                             " with error %d", i, ret);
> +                       return ret;
> +               }
> +       }
> +       return 0;
> +}
> +
> +static void fs_dev_free_queues(struct rte_eth_dev *dev);
> +static void
> +fs_dev_close(struct rte_eth_dev *dev)
> +{
> +       struct sub_device *sdev;
> +       uint8_t i;
> +
> +       FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
> +               DEBUG("Closing sub_device %d", i);
> +               rte_eth_dev_close(PORT_ID(sdev));
> +               sdev->state = DEV_ACTIVE - 1;
> +       }
> +       fs_dev_free_queues(dev);
> +}
> +
> +static void
> +fs_rx_queue_release(void *queue)
> +{
> +       struct rte_eth_dev *dev;
> +       struct sub_device *sdev;
> +       uint8_t i;
> +       struct rxq *rxq;
> +
> +       if (queue == NULL)
> +               return;
> +       rxq = queue;
> +       dev = rxq->priv->dev;
> +       FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE)
> +               SUBOPS(sdev, rx_queue_release)
> +                       (ETH(sdev)->data->rx_queues[rxq->qid]);
> +       dev->data->rx_queues[rxq->qid] = NULL;
> +       rte_free(rxq);
> +}
> +
> +static int
> +fs_rx_queue_setup(struct rte_eth_dev *dev,
> +               uint16_t rx_queue_id,
> +               uint16_t nb_rx_desc,
> +               unsigned int socket_id,
> +               const struct rte_eth_rxconf *rx_conf,
> +               struct rte_mempool *mb_pool)
> +{
> +       struct sub_device *sdev;
> +       struct rxq *rxq;
> +       uint8_t i;
> +       int ret;
> +
> +       rxq = dev->data->rx_queues[rx_queue_id];
> +       if (rxq != NULL) {
> +               fs_rx_queue_release(rxq);
> +               dev->data->rx_queues[rx_queue_id] = NULL;
> +       }
> +       rxq = rte_zmalloc(NULL, sizeof(*rxq),
> +                         RTE_CACHE_LINE_SIZE);
> +       if (rxq == NULL)
> +               return -ENOMEM;
> +       rxq->qid = rx_queue_id;
> +       rxq->socket_id = socket_id;
> +       rxq->info.mp = mb_pool;
> +       rxq->info.conf = *rx_conf;
> +       rxq->info.nb_desc = nb_rx_desc;
> +       rxq->priv = PRIV(dev);
> +       dev->data->rx_queues[rx_queue_id] = rxq;
> +       FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
> +               ret = rte_eth_rx_queue_setup(PORT_ID(sdev),
> +                               rx_queue_id,
> +                               nb_rx_desc, socket_id,
> +                               rx_conf, mb_pool);
> +               if (ret) {
> +                       ERROR("RX queue setup failed for sub_device %d", i);
> +                       goto free_rxq;
> +               }
> +       }
> +       return 0;
> +free_rxq:
> +       fs_rx_queue_release(rxq);
> +       return ret;
> +}
> +
> +static void
> +fs_tx_queue_release(void *queue)
> +{
> +       struct rte_eth_dev *dev;
> +       struct sub_device *sdev;
> +       uint8_t i;
> +       struct txq *txq;
> +
> +       if (queue == NULL)
> +               return;
> +       txq = queue;
> +       dev = txq->priv->dev;
> +       FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE)
> +               SUBOPS(sdev, tx_queue_release)
> +                       (ETH(sdev)->data->tx_queues[txq->qid]);
> +       dev->data->tx_queues[txq->qid] = NULL;
> +       rte_free(txq);
> +}
> +
> +static int
> +fs_tx_queue_setup(struct rte_eth_dev *dev,
> +               uint16_t tx_queue_id,
> +               uint16_t nb_tx_desc,
> +               unsigned int socket_id,
> +               const struct rte_eth_txconf *tx_conf)
> +{
> +       struct sub_device *sdev;
> +       struct txq *txq;
> +       uint8_t i;
> +       int ret;
> +
> +       txq = dev->data->tx_queues[tx_queue_id];
> +       if (txq != NULL) {
> +               fs_tx_queue_release(txq);
> +               dev->data->tx_queues[tx_queue_id] = NULL;
> +       }
> +       txq = rte_zmalloc("ethdev TX queue", sizeof(*txq),
> +                         RTE_CACHE_LINE_SIZE);
> +       if (txq == NULL)
> +               return -ENOMEM;
> +       txq->qid = tx_queue_id;
> +       txq->socket_id = socket_id;
> +       txq->info.conf = *tx_conf;
> +       txq->info.nb_desc = nb_tx_desc;
> +       txq->priv = PRIV(dev);
> +       dev->data->tx_queues[tx_queue_id] = txq;
> +       FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
> +               ret = rte_eth_tx_queue_setup(PORT_ID(sdev),
> +                               tx_queue_id,
> +                               nb_tx_desc, socket_id,
> +                               tx_conf);
> +               if (ret) {
> +                       ERROR("TX queue setup failed for sub_device %d", i);
> +                       goto free_txq;
> +               }
> +       }
> +       return 0;
> +free_txq:
> +       fs_tx_queue_release(txq);
> +       return ret;
> +}
> +
> +static void
> +fs_dev_free_queues(struct rte_eth_dev *dev)
> +{
> +       uint16_t i;
> +
> +       for (i = 0; i < dev->data->nb_rx_queues; i++) {
> +               fs_rx_queue_release(dev->data->rx_queues[i]);
> +               dev->data->rx_queues[i] = NULL;
> +       }
> +       dev->data->nb_rx_queues = 0;
> +       for (i = 0; i < dev->data->nb_tx_queues; i++) {
> +               fs_tx_queue_release(dev->data->tx_queues[i]);
> +               dev->data->tx_queues[i] = NULL;
> +       }
> +       dev->data->nb_tx_queues = 0;
> +}
> +
> +static void
> +fs_promiscuous_enable(struct rte_eth_dev *dev)
> +{
> +       struct sub_device *sdev;
> +       uint8_t i;
> +
> +       FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE)
> +               rte_eth_promiscuous_enable(PORT_ID(sdev));
> +}
> +
> +static void
> +fs_promiscuous_disable(struct rte_eth_dev *dev)
> +{
> +       struct sub_device *sdev;
> +       uint8_t i;
> +
> +       FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE)
> +               rte_eth_promiscuous_disable(PORT_ID(sdev));
> +}
> +
> +static void
> +fs_allmulticast_enable(struct rte_eth_dev *dev)
> +{
> +       struct sub_device *sdev;
> +       uint8_t i;
> +
> +       FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE)
> +               rte_eth_allmulticast_enable(PORT_ID(sdev));
> +}
> +
> +static void
> +fs_allmulticast_disable(struct rte_eth_dev *dev)
> +{
> +       struct sub_device *sdev;
> +       uint8_t i;
> +
> +       FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE)
> +               rte_eth_allmulticast_disable(PORT_ID(sdev));
> +}
> +
> +static int
> +fs_link_update(struct rte_eth_dev *dev,
> +               int wait_to_complete)
> +{
> +       struct sub_device *sdev;
> +       uint8_t i;
> +       int ret;
> +
> +       FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
> +               DEBUG("Calling link_update on sub_device %d", i);
> +               ret = (SUBOPS(sdev, link_update))(ETH(sdev), wait_to_complete);
> +               if (ret && ret != -1) {
> +                       ERROR("Link update failed for sub_device %d with error %d",
> +                             i, ret);
> +                       return ret;
> +               }
> +       }
> +       if (TX_SUBDEV(dev)) {
> +               struct rte_eth_link *l1;
> +               struct rte_eth_link *l2;
> +
> +               l1 = &dev->data->dev_link;
> +               l2 = &ETH(TX_SUBDEV(dev))->data->dev_link;
> +               if (memcmp(l1, l2, sizeof(*l1))) {
> +                       *l1 = *l2;
> +                       return 0;
> +               }
> +       }
> +       return -1;
> +}
> +
> +static void
> +fs_stats_get(struct rte_eth_dev *dev,
> +            struct rte_eth_stats *stats)
> +{
> +       memset(stats, 0, sizeof(*stats));
> +       if (TX_SUBDEV(dev) == NULL)
> +               return;
> +       rte_eth_stats_get(PORT_ID(TX_SUBDEV(dev)), stats);
> +}
> +
> +static void
> +fs_stats_reset(struct rte_eth_dev *dev)
> +{
> +       struct sub_device *sdev;
> +       uint8_t i;
> +
> +       FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE)
> +               rte_eth_stats_reset(PORT_ID(sdev));
> +}
> +
> +/**
> + * Fail-safe dev_infos_get rules:
> + *
> + * No sub_device:
> + *   Numerables:
> + *      Use the maximum possible values for any field, so as not
> + *      to impede any further configuration effort.
> + *   Capabilities:
> + *      Limits capabilities to those that are understood by the
> + *      fail-safe PMD. This understanding stems from the fail-safe
> + *      being capable of verifying that the related capability is
> + *      expressed within the device configuration (struct rte_eth_conf).
> + *
> + * At least one probed sub_device:
> + *   Numerables:
> + *      Uses values from the active probed sub_device
> + *      The rationale here is that if any sub_device is less capable
> + *      (for example concerning the number of queues) than the active
> + *      sub_device, then its subsequent configuration will fail.
> + *      It is impossible to foresee this failure when the failing sub_device
> + *      is supposed to be plugged-in later on, so the configuration process
> + *      is the single point of failure and error reporting.
> + *   Capabilities:
> + *      Uses a logical AND of RX capabilities among
> + *      all sub_devices and the default capabilities.
> + *      Uses a logical AND of TX capabilities among
> + *      the active probed sub_device and the default capabilities.
> + *
> + */
> +static void
> +fs_dev_infos_get(struct rte_eth_dev *dev,
> +                 struct rte_eth_dev_info *infos)
> +{
> +       struct sub_device *sdev;
> +       uint8_t i;
> +
> +       sdev = TX_SUBDEV(dev);
> +       if (sdev == NULL) {
> +               DEBUG("No probed device, using default infos");
> +               rte_memcpy(&PRIV(dev)->infos, &default_infos,
> +                          sizeof(default_infos));
> +       } else {
> +               uint32_t rx_offload_capa;
> +
> +               rx_offload_capa = default_infos.rx_offload_capa;
> +               FOREACH_SUBDEV_ST(sdev, i, dev, DEV_PROBED) {
> +                       rte_eth_dev_info_get(PORT_ID(sdev),
> +                                       &PRIV(dev)->infos);
> +                       rx_offload_capa &= PRIV(dev)->infos.rx_offload_capa;
> +               }
> +               sdev = TX_SUBDEV(dev);
> +               rte_eth_dev_info_get(PORT_ID(sdev), &PRIV(dev)->infos);
> +               PRIV(dev)->infos.rx_offload_capa = rx_offload_capa;
> +               PRIV(dev)->infos.tx_offload_capa &=
> +                                       default_infos.tx_offload_capa;
> +               PRIV(dev)->infos.flow_type_rss_offloads &=
> +                                       default_infos.flow_type_rss_offloads;
> +       }
> +       rte_memcpy(infos, &PRIV(dev)->infos, sizeof(*infos));
> +}
> +
> +static const uint32_t *
> +fs_dev_supported_ptypes_get(struct rte_eth_dev *dev)
> +{
> +       struct sub_device *sdev;
> +       struct rte_eth_dev *edev;
> +
> +       sdev = TX_SUBDEV(dev);
> +       if (sdev == NULL)
> +               return NULL;
> +       edev = ETH(sdev);
> +       /* ENOTSUP: counts as no supported ptypes */
> +       if (SUBOPS(sdev, dev_supported_ptypes_get) == NULL)
> +               return NULL;
> +       /*
> +        * The API does not permit to do a clean AND of all ptypes,
> +        * It is also incomplete by design and we do not really care
> +        * to have a best possible value in this context.
> +        * We just return the ptypes of the device of highest
> +        * priority, usually the PREFERRED device.
> +        */
> +       return SUBOPS(sdev, dev_supported_ptypes_get)(edev);
> +}
> +
> +static int
> +fs_mtu_set(struct rte_eth_dev *dev, uint16_t mtu)
> +{
> +       struct sub_device *sdev;
> +       uint8_t i;
> +       int ret;
> +
> +       FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
> +               DEBUG("Calling rte_eth_dev_set_mtu on sub_device %d", i);
> +               ret = rte_eth_dev_set_mtu(PORT_ID(sdev), mtu);
> +               if (ret) {
> +                       ERROR("Operation rte_eth_dev_set_mtu failed for sub_device %d"
> +                             " with error %d", i, ret);
> +                       return ret;
> +               }
> +       }
> +       return 0;
> +}
> +
> +static int
> +fs_vlan_filter_set(struct rte_eth_dev *dev, uint16_t vlan_id, int on)
> +{
> +       struct sub_device *sdev;
> +       uint8_t i;
> +       int ret;
> +
> +       FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
> +               DEBUG("Calling rte_eth_dev_vlan_filter on sub_device %d", i);
> +               ret = rte_eth_dev_vlan_filter(PORT_ID(sdev), vlan_id, on);
> +               if (ret) {
> +                       ERROR("Operation rte_eth_dev_vlan_filter failed for sub_device %d"
> +                             " with error %d", i, ret);
> +                       return ret;
> +               }
> +       }
> +       return 0;
> +}
> +
> +static int
> +fs_flow_ctrl_get(struct rte_eth_dev *dev,
> +               struct rte_eth_fc_conf *fc_conf)
> +{
> +       struct sub_device *sdev;
> +
> +       sdev = TX_SUBDEV(dev);
> +       if (sdev == NULL)
> +               return 0;
> +       if (SUBOPS(sdev, flow_ctrl_get) == NULL)
> +               return -ENOTSUP;
> +       return SUBOPS(sdev, flow_ctrl_get)(ETH(sdev), fc_conf);
> +}
> +
> +static int
> +fs_flow_ctrl_set(struct rte_eth_dev *dev,
> +               struct rte_eth_fc_conf *fc_conf)
> +{
> +       struct sub_device *sdev;
> +       uint8_t i;
> +       int ret;
> +
> +       FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
> +               DEBUG("Calling rte_eth_dev_flow_ctrl_set on sub_device %d", i);
> +               ret = rte_eth_dev_flow_ctrl_set(PORT_ID(sdev), fc_conf);
> +               if (ret) {
> +                       ERROR("Operation rte_eth_dev_flow_ctrl_set failed for sub_device %d"
> +                             " with error %d", i, ret);
> +                       return ret;
> +               }
> +       }
> +       return 0;
> +}
> +
> +static void
> +fs_mac_addr_remove(struct rte_eth_dev *dev, uint32_t index)
> +{
> +       struct sub_device *sdev;
> +       uint8_t i;
> +
> +       /* No check: already done within the rte_eth_dev_mac_addr_remove
> +        * call for the fail-safe device.
> +        */
> +       FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE)
> +               rte_eth_dev_mac_addr_remove(PORT_ID(sdev),
> +                               &dev->data->mac_addrs[index]);
> +       PRIV(dev)->mac_addr_pool[index] = 0;
> +}
> +
> +static int
> +fs_mac_addr_add(struct rte_eth_dev *dev,
> +               struct ether_addr *mac_addr,
> +               uint32_t index,
> +               uint32_t vmdq)
> +{
> +       struct sub_device *sdev;
> +       int ret;
> +       uint8_t i;
> +
> +       assert(index < FAILSAFE_MAX_ETHADDR);
> +       FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE) {
> +               ret = rte_eth_dev_mac_addr_add(PORT_ID(sdev), mac_addr, vmdq);
> +               if (ret) {
> +                       ERROR("Operation rte_eth_dev_mac_addr_add failed for sub_device %"
> +                             PRIu8 " with error %d", i, ret);
> +                       return ret;
> +               }
> +       }
> +       if (index >= PRIV(dev)->nb_mac_addr) {
> +               DEBUG("Growing mac_addrs array");
> +               PRIV(dev)->nb_mac_addr = index;
> +       }
> +       PRIV(dev)->mac_addr_pool[index] = vmdq;
> +       return 0;
> +}
> +
> +static void
> +fs_mac_addr_set(struct rte_eth_dev *dev, struct ether_addr *mac_addr)
> +{
> +       struct sub_device *sdev;
> +       uint8_t i;
> +
> +       FOREACH_SUBDEV_ST(sdev, i, dev, DEV_ACTIVE)
> +               rte_eth_dev_default_mac_addr_set(PORT_ID(sdev), mac_addr);
> +}
> +
> +const struct eth_dev_ops failsafe_ops = {
> +       .dev_configure = fs_dev_configure,
> +       .dev_start = fs_dev_start,
> +       .dev_stop = fs_dev_stop,
> +       .dev_set_link_down = fs_dev_set_link_down,
> +       .dev_set_link_up = fs_dev_set_link_up,
> +       .dev_close = fs_dev_close,
> +       .promiscuous_enable = fs_promiscuous_enable,
> +       .promiscuous_disable = fs_promiscuous_disable,
> +       .allmulticast_enable = fs_allmulticast_enable,
> +       .allmulticast_disable = fs_allmulticast_disable,
> +       .link_update = fs_link_update,
> +       .stats_get = fs_stats_get,
> +       .stats_reset = fs_stats_reset,
> +       .dev_infos_get = fs_dev_infos_get,
> +       .dev_supported_ptypes_get = fs_dev_supported_ptypes_get,
> +       .mtu_set = fs_mtu_set,
> +       .vlan_filter_set = fs_vlan_filter_set,
> +       .rx_queue_setup = fs_rx_queue_setup,
> +       .tx_queue_setup = fs_tx_queue_setup,
> +       .rx_queue_release = fs_rx_queue_release,
> +       .tx_queue_release = fs_tx_queue_release,
> +       .flow_ctrl_get = fs_flow_ctrl_get,
> +       .flow_ctrl_set = fs_flow_ctrl_set,
> +       .mac_addr_remove = fs_mac_addr_remove,
> +       .mac_addr_add = fs_mac_addr_add,
> +       .mac_addr_set = fs_mac_addr_set,
> +};
> diff --git a/drivers/net/failsafe/failsafe_private.h b/drivers/net/failsafe/failsafe_private.h
> new file mode 100644
> index 0000000..e7a7592
> --- /dev/null
> +++ b/drivers/net/failsafe/failsafe_private.h
> @@ -0,0 +1,227 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright 2017 6WIND S.A.
> + *   Copyright 2017 Mellanox.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of 6WIND S.A. nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#ifndef _RTE_ETH_FAILSAFE_PRIVATE_H_
> +#define _RTE_ETH_FAILSAFE_PRIVATE_H_
> +
> +#include <rte_dev.h>
> +#include <rte_ethdev.h>
> +#include <rte_devargs.h>
> +
> +#define FAILSAFE_DRIVER_NAME "Fail-safe PMD"
> +
> +#define PMD_FAILSAFE_MAC_KVARG "mac"
> +#define PMD_FAILSAFE_PARAM_STRING      \
> +       "dev(<ifc>),"                   \
> +       "mac=mac_addr"                  \
> +       ""
> +
> +#define FAILSAFE_PLUGIN_DEFAULT_TIMEOUT_MS 2000
> +
> +#define FAILSAFE_MAX_ETHPORTS 2
> +#define FAILSAFE_MAX_ETHADDR 128
> +
> +/* TYPES */
> +
> +struct rxq {
> +       struct fs_priv *priv;
> +       uint16_t qid;
> +       /* id of last sub_device polled */
> +       uint8_t last_polled;
> +       unsigned int socket_id;
> +       struct rte_eth_rxq_info info;
> +};
> +
> +struct txq {
> +       struct fs_priv *priv;
> +       uint16_t qid;
> +       unsigned int socket_id;
> +       struct rte_eth_txq_info info;
> +};
> +
> +enum dev_state {
> +       DEV_UNDEFINED = 0,
> +       DEV_PARSED,
> +       DEV_PROBED,
> +       DEV_ACTIVE,
> +       DEV_STARTED,
> +};
> +
> +struct sub_device {
> +       /* Exhaustive DPDK device description */
> +       struct rte_devargs devargs;
> +       struct rte_bus *bus;
> +       struct rte_device *dev;
> +       struct rte_eth_dev *edev;
> +       /* Device state machine */
> +       enum dev_state state;
> +};
> +
> +struct fs_priv {
> +       struct rte_eth_dev *dev;
> +       /*
> +        * Set of sub_devices.
> +        * subs[0] is the preferred device
> +        * any other is just another slave
> +        */
> +       struct sub_device *subs;
> +       uint8_t subs_head; /* if head == tail, no subs */
> +       uint8_t subs_tail; /* first invalid */
> +       uint8_t subs_tx; /* current emitting device */
> +       uint8_t current_probed;
> +       /* current number of mac_addr slots allocated. */
> +       uint32_t nb_mac_addr;
> +       struct ether_addr mac_addrs[FAILSAFE_MAX_ETHADDR];
> +       uint32_t mac_addr_pool[FAILSAFE_MAX_ETHADDR];
> +       /* current capabilities */
> +       struct rte_eth_dev_info infos;
> +};
> +
> +/* RX / TX */
> +
> +uint16_t failsafe_rx_burst(void *rxq,
> +               struct rte_mbuf **rx_pkts, uint16_t nb_pkts);
> +uint16_t failsafe_tx_burst(void *txq,
> +               struct rte_mbuf **tx_pkts, uint16_t nb_pkts);
> +
> +/* ARGS */
> +
> +int failsafe_args_parse(struct rte_eth_dev *dev, const char *params);
> +void failsafe_args_free(struct rte_eth_dev *dev);
> +int failsafe_args_count_subdevice(struct rte_eth_dev *dev, const char *params);
> +
> +/* EAL */
> +
> +int failsafe_eal_init(struct rte_eth_dev *dev);
> +int failsafe_eal_uninit(struct rte_eth_dev *dev);
> +
> +/* GLOBALS */
> +
> +extern const char pmd_failsafe_driver_name[];
> +extern const struct eth_dev_ops failsafe_ops;
> +extern int mac_from_arg;
> +
> +/* HELPERS */
> +
> +/* dev: (struct rte_eth_dev *) fail-safe device */
> +#define PRIV(dev) \
> +       ((struct fs_priv *)(dev)->data->dev_private)
> +
> +/* sdev: (struct sub_device *) */
> +#define ETH(sdev) \
> +       ((sdev)->edev)
> +
> +/* sdev: (struct sub_device *) */
> +#define PORT_ID(sdev) \
> +       (ETH(sdev)->data->port_id)
> +
> +/**
> + * Stateful iterator construct over fail-safe sub-devices:
> + * s:     (struct sub_device *), iterator
> + * i:     (uint8_t), increment
> + * dev:   (struct rte_eth_dev *), fail-safe ethdev
> + * state: (enum dev_state), minimum acceptable device state
> + */
> +#define FOREACH_SUBDEV_ST(s, i, dev, state)                            \
> +       for (i = fs_find_next((dev), 0, state);                         \
> +            i < PRIV(dev)->subs_tail && (s = &PRIV(dev)->subs[i]);     \
> +            i = fs_find_next((dev), i + 1, state))
> +
> +/**
> + * Iterator construct over fail-safe sub-devices:
> + * s:   (struct sub_device *), iterator
> + * i:   (uint8_t), increment
> + * dev: (struct rte_eth_dev *), fail-safe ethdev
> + */
> +#define FOREACH_SUBDEV(s, i, dev)                      \
> +       FOREACH_SUBDEV_ST(s, i, dev, DEV_UNDEFINED)
> +
> +/* dev: (struct rte_eth_dev *) fail-safe device */
> +#define PREFERRED_SUBDEV(dev) \
> +       (&PRIV(dev)->subs[0])
> +
> +/* dev: (struct rte_eth_dev *) fail-safe device */
> +#define TX_SUBDEV(dev)                                                   \
> +       (PRIV(dev)->subs_tx >= PRIV(dev)->subs_tail                ? NULL \
> +        : (PRIV(dev)->subs[PRIV(dev)->subs_tx].state < DEV_PROBED ? NULL \
> +        : &PRIV(dev)->subs[PRIV(dev)->subs_tx]))
> +
> +/**
> + * s:   (struct sub_device *)
> + * ops: (struct eth_dev_ops) member
> + */
> +#define SUBOPS(s, ops) \
> +       (ETH(s)->dev_ops->ops)
> +
> +#ifndef NDEBUG
> +#include <stdio.h>
> +#define DEBUG__(m, ...)                                                \
> +       (fprintf(stderr, "%s:%d: %s(): " m "%c",                \
> +                __FILE__, __LINE__, __func__, __VA_ARGS__),    \
> +        (void)0)
> +#define DEBUG_(...)                            \
> +       (errno = ((int []){                     \
> +               *(volatile int *)&errno,        \
> +               (DEBUG__(__VA_ARGS__), 0)       \
> +       })[0])
> +#define DEBUG(...) DEBUG_(__VA_ARGS__, '\n')
> +#define INFO(...) DEBUG(__VA_ARGS__)
> +#define WARN(...) DEBUG(__VA_ARGS__)
> +#define ERROR(...) DEBUG(__VA_ARGS__)
> +#else
> +#define DEBUG(...) ((void)0)
> +#define LOG__(level, m, ...) \
> +       RTE_LOG(level, PMD, "net_failsafe: " m "%c", __VA_ARGS__)
> +#define LOG_(level, ...) LOG__(level, __VA_ARGS__, '\n')
> +#define INFO(...) LOG_(INFO, __VA_ARGS__)
> +#define WARN(...) LOG_(WARNING, "WARNING: " __VA_ARGS__)
> +#define ERROR(...) LOG_(ERR, "ERROR: " __VA_ARGS__)
> +#endif
> +
> +/* inlined functions */
> +
> +static inline uint8_t
> +fs_find_next(struct rte_eth_dev *dev, uint8_t sid,
> +               enum dev_state min_state)
> +{
> +       while (sid < PRIV(dev)->subs_tail) {
> +               if (PRIV(dev)->subs[sid].state >= min_state)
> +                       break;
> +               sid++;
> +       }
> +       if (sid >= PRIV(dev)->subs_tail)
> +               return PRIV(dev)->subs_tail;
> +       return sid;
> +}
> +
> +#endif /* _RTE_ETH_FAILSAFE_PRIVATE_H_ */
> diff --git a/drivers/net/failsafe/failsafe_rxtx.c b/drivers/net/failsafe/failsafe_rxtx.c
> new file mode 100644
> index 0000000..a45b4e5
> --- /dev/null
> +++ b/drivers/net/failsafe/failsafe_rxtx.c
> @@ -0,0 +1,107 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright 2017 6WIND S.A.
> + *   Copyright 2017 Mellanox.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of 6WIND S.A. nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#include <rte_mbuf.h>
> +#include <rte_ethdev.h>
> +
> +#include "failsafe_private.h"
> +
> +/*
> + * TODO: write fast version,
> + * without additional checks, to be activated once
> + * everything has been verified to comply.
> + */
> +uint16_t
> +failsafe_rx_burst(void *queue,
> +                 struct rte_mbuf **rx_pkts,
> +                 uint16_t nb_pkts)
> +{
> +       struct fs_priv *priv;
> +       struct sub_device *sdev;
> +       struct rxq *rxq;
> +       void *sub_rxq;
> +       uint16_t nb_rx;
> +       uint8_t nb_polled, nb_subs;
> +       uint8_t i;
> +
> +       rxq = queue;
> +       priv = rxq->priv;
> +       nb_subs = priv->subs_tail - priv->subs_head;
> +       nb_polled = 0;
> +       for (i = rxq->last_polled; nb_polled < nb_subs; nb_polled++) {
> +               i++;
> +               if (i == priv->subs_tail)
> +                       i = priv->subs_head;
> +               sdev = &priv->subs[i];
> +               if (unlikely(ETH(sdev) == NULL))
> +                       continue;
> +               if (unlikely(ETH(sdev)->rx_pkt_burst == NULL))
> +                       continue;
> +               if (unlikely(sdev->state != DEV_STARTED))
> +                       continue;
> +               sub_rxq = ETH(sdev)->data->rx_queues[rxq->qid];
> +               nb_rx = ETH(sdev)->
> +                       rx_pkt_burst(sub_rxq, rx_pkts, nb_pkts);
> +               if (nb_rx) {
> +                       rxq->last_polled = i;
> +                       return nb_rx;
> +               }
> +       }
> +       return 0;
> +}
> +
> +/*
> + * TODO: write fast version,
> + * without additional checks, to be activated once
> + * everything has been verified to comply.
> + */
> +uint16_t
> +failsafe_tx_burst(void *queue,
> +                 struct rte_mbuf **tx_pkts,
> +                 uint16_t nb_pkts)
> +{
> +       struct sub_device *sdev;
> +       struct txq *txq;
> +       void *sub_txq;
> +
> +       txq = queue;
> +       sdev = TX_SUBDEV(txq->priv->dev);
> +       if (unlikely(sdev == NULL))
> +               return 0;
> +       if (unlikely(ETH(sdev) == NULL))
> +               return 0;
> +       if (unlikely(ETH(sdev)->tx_pkt_burst == NULL))
> +               return 0;
> +       sub_txq = ETH(sdev)->data->tx_queues[txq->qid];
> +       return ETH(sdev)->tx_pkt_burst(sub_txq, tx_pkts, nb_pkts);
> +}
> diff --git a/drivers/net/failsafe/rte_pmd_failsafe_version.map b/drivers/net/failsafe/rte_pmd_failsafe_version.map
> new file mode 100644
> index 0000000..b6d2840
> --- /dev/null
> +++ b/drivers/net/failsafe/rte_pmd_failsafe_version.map
> @@ -0,0 +1,4 @@
> +DPDK_17.08 {
> +
> +       local: *;
> +};
> diff --git a/mk/rte.app.mk b/mk/rte.app.mk
> index dbd3614..d7581b7 100644
> --- a/mk/rte.app.mk
> +++ b/mk/rte.app.mk
> @@ -120,6 +120,7 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_E1000_PMD)      += -lrte_pmd_e1000
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_ENA_PMD)        += -lrte_pmd_ena
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_ENIC_PMD)       += -lrte_pmd_enic
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_FM10K_PMD)      += -lrte_pmd_fm10k
> +_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE)   += -lrte_pmd_failsafe
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_I40E_PMD)       += -lrte_pmd_i40e
>  _LDLIBS-$(CONFIG_RTE_LIBRTE_IXGBE_PMD)      += -lrte_pmd_ixgbe
>  ifeq ($(CONFIG_RTE_LIBRTE_KNI),y)
> --
> 2.1.4
>
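
The sub-device iterator in failsafe_private.h (FOREACH_SUBDEV_ST driven by
fs_find_next) may be easier to follow in isolation. Below is a minimal
standalone sketch of the same pattern, not the driver code itself. It assumes
a simplified model: `struct model_sub`, `struct model_priv`,
`model_find_next`, `MODEL_FOREACH_SUBDEV_ST` and `model_count_at_least` are
hypothetical names introduced here for illustration, and the macro takes the
private structure directly instead of going through PRIV(dev).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same ordering as enum dev_state in the quoted header. */
enum dev_state {
	DEV_UNDEFINED = 0,
	DEV_PARSED,
	DEV_PROBED,
	DEV_ACTIVE,
	DEV_STARTED,
};

/* Hypothetical stand-in for struct sub_device (state only). */
struct model_sub {
	enum dev_state state;
};

/* Hypothetical stand-in for struct fs_priv (array and tail only). */
struct model_priv {
	struct model_sub *subs;
	uint8_t subs_tail; /* first invalid index */
};

/*
 * Return the first index >= sid whose state is at least min_state,
 * or subs_tail when no remaining sub-device qualifies.
 */
static inline uint8_t
model_find_next(const struct model_priv *priv, uint8_t sid,
		enum dev_state min_state)
{
	while (sid < priv->subs_tail) {
		if (priv->subs[sid].state >= min_state)
			break;
		sid++;
	}
	return sid < priv->subs_tail ? sid : priv->subs_tail;
}

/* Visit only the sub-devices whose state is at least st. */
#define MODEL_FOREACH_SUBDEV_ST(s, i, priv, st)				\
	for (i = model_find_next((priv), 0, st);			\
	     i < (priv)->subs_tail && (s = &(priv)->subs[i]);		\
	     i = model_find_next((priv), (uint8_t)(i + 1), st))

/* Count the sub-devices the iterator visits for a given minimum state. */
static unsigned int
model_count_at_least(struct model_priv *priv, enum dev_state st)
{
	struct model_sub *s;
	unsigned int n = 0;
	uint8_t i;

	assert(priv != NULL);
	MODEL_FOREACH_SUBDEV_ST(s, i, priv, st) {
		(void)s; /* a real caller would act on the sub-device here */
		n++;
	}
	return n;
}
```

Because the filter compares with >=, passing DEV_UNDEFINED visits every slot
below subs_tail, which is how FOREACH_SUBDEV is built on top of
FOREACH_SUBDEV_ST in the quoted header.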

