[dpdk-dev] [PATCH v3 1/4] net/softnic: add softnic PMD

Singh, Jasvinder jasvinder.singh at intel.com
Fri Sep 8 11:30:28 CEST 2017


Hi Ferruh,

Thank you for the review and feedback. Please see my inline responses.

> -----Original Message-----
> From: Yigit, Ferruh
> Sent: Tuesday, September 5, 2017 3:53 PM
> To: Singh, Jasvinder <jasvinder.singh at intel.com>; dev at dpdk.org
> Cc: Dumitrescu, Cristian <cristian.dumitrescu at intel.com>;
> thomas at monjalon.net
> Subject: Re: [PATCH v3 1/4] net/softnic: add softnic PMD
> 
> On 8/11/2017 1:49 PM, Jasvinder Singh wrote:
> > Add SoftNIC PMD to provide SW fall-back for ethdev APIs.
> >
> > Signed-off-by: Cristian Dumitrescu <cristian.dumitrescu at intel.com>
> > Signed-off-by: Jasvinder Singh <jasvinder.singh at intel.com>
> > ---
> > v3 changes:
> > - rebase to dpdk17.08 release
> >
> > v2 changes:
> > - fix build errors
> > - rebased to TM APIs v6 plus dpdk master
> >
> >  MAINTAINERS                                     |   5 +
> >  config/common_base                              |   5 +
> >  drivers/net/Makefile                            |   5 +
> >  drivers/net/softnic/Makefile                    |  56 +++
> >  drivers/net/softnic/rte_eth_softnic.c           | 609
> ++++++++++++++++++++++++
> >  drivers/net/softnic/rte_eth_softnic.h           |  54 +++
> >  drivers/net/softnic/rte_eth_softnic_internals.h | 114 +++++
> >  drivers/net/softnic/rte_eth_softnic_version.map |   7 +
> >  mk/rte.app.mk                                   |   5 +-
> 
> Also documentation updates are required:
> - .ini file
> - PMD documentation .rst file
> - I believe it is good to update release note about new PMD
> - release notes library version info, since this has public API

Will send documentation patch.

> <...>
> 
> > +EXPORT_MAP := rte_eth_softnic_version.map
> 
> rte_pmd_... to be consistent.
> 
> <...>

Will do.

> > +#
> > +# Export include files
> > +#
> > +SYMLINK-y-include +=rte_eth_softnic.h
> 
> space after +=
> 

Will add space.
> 
> > diff --git a/drivers/net/softnic/rte_eth_softnic.c
> > b/drivers/net/softnic/rte_eth_softnic.c
> <...>
> > +
> > +static struct rte_vdev_driver pmd_drv;
> 
> Why is this required? It is already defined below.
> And for naming, pmd=poll mode driver, drv=driver, makes "poll mode driver
> driver"
> 

Ok. Will correct this.

> <...>
> 
> > +static int
> > +pmd_rx_queue_setup(struct rte_eth_dev *dev,
> > +	uint16_t rx_queue_id,
> > +	uint16_t nb_rx_desc __rte_unused,
> > +	unsigned int socket_id,
> > +	const struct rte_eth_rxconf *rx_conf __rte_unused,
> > +	struct rte_mempool *mb_pool __rte_unused) {
> > +	struct pmd_internals *p = dev->data->dev_private;
> > +
> > +	if (p->params.soft.intrusive == 0) {
> > +		struct pmd_rx_queue *rxq;
> > +
> > +		rxq = rte_zmalloc_socket(p->params.soft.name,
> > +			sizeof(struct pmd_rx_queue), 0, socket_id);
> > +		if (rxq == NULL)
> > +			return -1;
> 
> return -ENOMEM ?

Ok.
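
For illustration only, a minimal sketch of the agreed error convention: allocation failure is reported as -ENOMEM instead of a bare -1. The type and function names are hypothetical stand-ins, and calloc() replaces rte_zmalloc_socket() so that the fragment compiles without DPDK.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct pmd_rx_queue; not the driver's type. */
struct demo_rx_queue {
	uint16_t port_id;
	uint16_t rx_queue_id;
};

/*
 * Allocate a queue and report failure with a negative errno value
 * rather than -1. calloc() stands in for rte_zmalloc_socket().
 */
static int
demo_rx_queue_setup(struct demo_rx_queue **slot, uint16_t port_id,
	uint16_t rx_queue_id)
{
	struct demo_rx_queue *rxq;

	if (slot == NULL)
		return -EINVAL;

	rxq = calloc(1, sizeof(*rxq));
	if (rxq == NULL)
		return -ENOMEM;

	rxq->port_id = port_id;
	rxq->rx_queue_id = rx_queue_id;
	*slot = rxq;
	return 0;
}
```

A caller can then pass the negative errno straight up to the ethdev layer without translating it.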
 
> > +
> > +		rxq->hard.port_id = p->hard.port_id;
> > +		rxq->hard.rx_queue_id = rx_queue_id;
> > +		dev->data->rx_queues[rx_queue_id] = rxq;
> > +	} else {
> > +		struct rte_eth_dev *hard_dev =
> > +			&rte_eth_devices[p->hard.port_id];
> > +		void *rxq = hard_dev->data->rx_queues[rx_queue_id];
> > +
> > +		if (rxq == NULL)
> > +			return -1;
> > +
> > +		dev->data->rx_queues[rx_queue_id] = rxq;
> 
> This assigns the underlying hw queue as this soft PMD queue. What happens if
> two different cores poll, one the actual hw device and the other this
> virtual device, since both are in fact the same queue?

Once the soft device is created and attached to the hard device, the application has to read packets from and write packets to the "soft" port instead of the "hard" port, as the soft device is a feature-rich
version of the hard device (see the cover letter notes). The RX and TX queues of the "soft" port are as thread safe as those of any ethdev.
 
> > +	}
> > +	return 0;
> > +}
> > +
> 
> <...>
> 
> > +static __rte_always_inline int
> > +rte_pmd_softnic_run_default(struct rte_eth_dev *dev) {
> > +	struct pmd_internals *p = dev->data->dev_private;
> > +
> > +	/* Persistent context: Read Only (update not required) */
> > +	struct rte_mbuf **pkts = p->soft.def.pkts;
> > +	uint16_t nb_tx_queues = dev->data->nb_tx_queues;
> > +
> > +	/* Persistent context: Read - Write (update required) */
> > +	uint32_t txq_pos = p->soft.def.txq_pos;
> > +	uint32_t pkts_len = p->soft.def.pkts_len;
> > +	uint32_t flush_count = p->soft.def.flush_count;
> > +
> > +	/* Not part of the persistent context */
> > +	uint32_t pos;
> > +	uint16_t i;
> > +
> > +	/* Soft device TXQ read, Hard device TXQ write */
> > +	for (i = 0; i < nb_tx_queues; i++) {
> > +		struct rte_ring *txq = dev->data->tx_queues[txq_pos];
> > +
> > +		/* Read soft device TXQ burst to packet enqueue buffer */
> > +		pkts_len += rte_ring_sc_dequeue_burst(txq,
> > +			(void **) &pkts[pkts_len],
> > +			DEFAULT_BURST_SIZE,
> > +			NULL);
> > +
> > +		/* Increment soft device TXQ */
> > +		txq_pos++;
> > +		if (txq_pos >= nb_tx_queues)
> > +			txq_pos = 0;
> > +
> > +		/* Hard device TXQ write when complete burst is available */
> > +		if (pkts_len >= DEFAULT_BURST_SIZE) {
> 
> Three questions:
> 1- When there are multiple tx_queues of softnic, and assume all will be
> processed by a core, this core will be reading from all into single HW queue,
> won't this create a bottleneck?

I am not sure I understand correctly. As per the QoS sched library implementation, the number of softnic tx queues depends on the number of users sending traffic and is configurable via one of the input
arguments for device creation. There is no mapping between the softnic tx queues and the hard device tx queues. The softnic device receives packets in its scheduling queues (tx queues), prioritizes their transmission,
and transmits them to a specific queue of the hard device (which can be specified as an input argument). It would be redundant for the thread implementing the QoS scheduler to distribute packets among the hard device tx queues, as that would not serve any purpose.
 
> 2- This logic reads from all queues as BURST_SIZE and merges them, if
> queues split with a RSS or similar, that classification will be lost, will it be
> a problem?

I don't think so. The QoS scheduler sits on the tx side just before the transmission stage and receives the packet burst destined for the specific network interface to which it is attached.
Thus, it schedules the packets egressing through the specific port instead of merging the packets going to different interfaces.
 
> 3- If there are not enough packets in the queues ( < DEFAULT_BURST_SIZE)
> those packets won't be transmitted unless more are coming, will this create
> latency for those cases?

When the traffic rate is low, packets are flushed automatically at a fixed interval, as discussed below.
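
To make the two transmit paths concrete, here is a small stand-alone model of the logic in the quoted function: packets are sent as soon as a full burst is buffered, and otherwise after a number of idle iterations. All names and constants are hypothetical, the threshold is deliberately tiny (the patch uses 1 << 17), and a counter stands in for rte_eth_tx_burst().

```c
#include <stdint.h>

#define DEMO_BURST_SIZE       32u
#define DEMO_FLUSH_THRESHOLD  4u	/* tiny demo value, not the patch's */

struct demo_ctx {
	uint32_t pkts_len;	/* packets buffered, waiting for a full burst */
	uint32_t flush_count;	/* iterations since the last transmit */
	uint32_t sent;		/* packets handed to the "hard" device */
};

/* One run iteration: buffer `arrived` packets, then transmit if due. */
static void
demo_softnic_run(struct demo_ctx *c, uint32_t arrived)
{
	c->pkts_len += arrived;

	/* Path 1: a complete burst is available, send immediately. */
	if (c->pkts_len >= DEMO_BURST_SIZE) {
		c->sent += c->pkts_len;
		c->pkts_len = 0;
		c->flush_count = 0;
	}

	/* Path 2: low traffic, send whatever is buffered after a timeout. */
	if (c->flush_count >= DEMO_FLUSH_THRESHOLD) {
		c->sent += c->pkts_len;
		c->pkts_len = 0;
		c->flush_count = 0;
	}

	c->flush_count++;
}
```

With these demo values, three buffered packets stay queued for four iterations and are flushed on the fifth, so the added latency is bounded by the threshold rather than by the arrival of more traffic.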

> 
> > +			for (pos = 0; pos < pkts_len; )
> > +				pos += rte_eth_tx_burst(p->hard.port_id,
> > +					p->params.hard.tx_queue_id,
> > +					&pkts[pos],
> > +					(uint16_t) (pkts_len - pos));
> > +
> > +			pkts_len = 0;
> > +			flush_count = 0;
> > +			break;
> > +		}
> > +	}
> > +
> > +	if (flush_count >= FLUSH_COUNT_THRESHOLD) {
> 
> FLUSH_COUNT_THRESHOLD is (1 << 17), and if no packet is sent, flush count
> incremented by one, just want to confirm the threshold value?
> 
> And why this flush exists?

The flush mechanism comes into play when the traffic rate is very low. In that case, a packet flush is triggered once the threshold value is reached. For example, with a CPU core spinning at 2.0 GHz and the current
setting, the packet flush happens at ~65 us intervals when the accumulated packet burst is smaller than the set value.
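
As a rough check of that figure, the interval is the threshold multiplied by the cost of one iteration and divided by the clock rate. The sketch below assumes one CPU cycle per iteration, which is an idealization that reproduces the ~65 us number; the helper name and the cycles_per_iter parameter are hypothetical, and a real iteration costs more cycles, which lengthens the interval proportionally.

```c
#include <stdint.h>

/* FLUSH_COUNT_THRESHOLD as quoted in the review: 1 << 17 iterations. */
#define DEMO_FLUSH_COUNT_THRESHOLD (1u << 17)

/*
 * Flush interval in microseconds under low traffic, assuming each
 * iteration of the run function costs cycles_per_iter CPU cycles
 * (an assumption; the thread does not state the per-iteration cost).
 */
static double
demo_flush_interval_us(double cpu_hz, double cycles_per_iter)
{
	return (double)DEMO_FLUSH_COUNT_THRESHOLD * cycles_per_iter
		/ cpu_hz * 1e6;
}
```

For example, demo_flush_interval_us(2.0e9, 1.0) evaluates 131072 / 2.0e9 s, which is about 65.5 us.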

> > +		for (pos = 0; pos < pkts_len; )
> > +			pos += rte_eth_tx_burst(p->hard.port_id,
> > +				p->params.hard.tx_queue_id,
> > +				&pkts[pos],
> > +				(uint16_t) (pkts_len - pos));
> > +
> > +		pkts_len = 0;
> > +		flush_count = 0;
> > +	}
> > +
> > +	p->soft.def.txq_pos = txq_pos;
> > +	p->soft.def.pkts_len = pkts_len;
> > +	p->soft.def.flush_count = flush_count + 1;
> > +
> > +	return 0;
> > +}
> > +
> > +int
> > +rte_pmd_softnic_run(uint8_t port_id)
> > +{
> > +	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
> 
> It can be possible to create a macro for this.

Ok. Will do.
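
One possible shape for such a macro, shown as a sketch only: the macro name, the stand-in device type, and the array below are hypothetical and merely mimic the rte_eth_devices[] lookup so that the fragment compiles on its own.

```c
#include <errno.h>
#include <stdint.h>

#define DEMO_MAX_PORTS 4

/* Hypothetical stand-in for struct rte_eth_dev. */
struct demo_eth_dev {
	uint32_t run_calls;
};

/* Hypothetical stand-in for the global rte_eth_devices[] array. */
static struct demo_eth_dev demo_eth_devices[DEMO_MAX_PORTS];

/* Map a port id to its device entry; the argument is parenthesized. */
#define DEMO_PORT_TO_DEV(port_id) (&demo_eth_devices[(port_id)])

/* Example user of the macro, with a range check before the lookup. */
static int
demo_pmd_run(uint8_t port_id)
{
	struct demo_eth_dev *dev;

	if (port_id >= DEMO_MAX_PORTS)
		return -EINVAL;

	dev = DEMO_PORT_TO_DEV(port_id);
	dev->run_calls++;
	return 0;
}
```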
 
> <...>
> 
> > +static int
> > +default_init(struct pmd_internals *p,
> 
> default_mbufs_init()? default_init() on its own is not that clear.
> 
> <...>
> 
> > +static void
> > +default_free(struct pmd_internals *p)
> 
> default_mbufs_free()?

The generic name was chosen in case we later initialize and free more parameters than just the mbufs.
 
> <...>
> 
> > +static void *
> > +pmd_init(struct pmd_params *params, int numa_node) {
> > +	struct pmd_internals *p;
> > +	int status;
> > +
> > +	p = rte_zmalloc_socket(params->soft.name,
> > +		sizeof(struct pmd_internals),
> > +		0,
> > +		numa_node);
> > +	if (p == NULL)
> > +		return NULL;
> > +
> > +	memcpy(&p->params, params, sizeof(p->params));
> > +	rte_eth_dev_get_port_by_name(params->hard.name, &p-
> >hard.port_id);
> 
> You may want to check return value of this.

Will add check.
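
A minimal sketch of what the check could look like. The lookup function here is a hypothetical stand-in for rte_eth_dev_get_port_by_name(), assumed to return 0 on success and a negative errno value otherwise; the device names are invented for the example.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for rte_eth_dev_get_port_by_name():
 * returns 0 and fills *port_id on success, negative errno otherwise.
 */
static int
demo_get_port_by_name(const char *name, uint8_t *port_id)
{
	static const char *const names[] = { "hard0", "hard1" };
	uint8_t i;

	if (name == NULL || port_id == NULL)
		return -EINVAL;

	for (i = 0; i < 2; i++) {
		if (strcmp(names[i], name) == 0) {
			*port_id = i;
			return 0;
		}
	}
	return -ENODEV;
}

/* Resolve the hard device and propagate a lookup failure to the caller. */
static int
demo_resolve_hard_port(const char *hard_name, uint8_t *hard_port_id)
{
	int status = demo_get_port_by_name(hard_name, hard_port_id);

	if (status != 0)
		return status;	/* caller releases anything it allocated */

	return 0;
}
```

In pmd_init() the equivalent failure path would also need to free p before returning NULL, as is already done for the default_init() failure.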
 
> > +
> > +	/* Default */
> > +	status = default_init(p, params, numa_node);
> > +	if (status) {
> > +		rte_free(p);
> > +		return NULL;
> > +	}
> > +
> > +	return p;
> > +}
> > +
> > +static void
> > +pmd_free(struct pmd_internals *p)
> > +{
> > +	default_free(p);
> 
> p->hard.name also needs to be freed here.

No, we don't allocate any memory for this variable, as it points to the value retrieved from rte_eth_dev_get_port_by_name().
 
> > +
> > +	rte_free(p);
> > +}
> > +
> > +static int
> > +pmd_ethdev_register(struct rte_vdev_device *vdev,
> > +	struct pmd_params *params,
> > +	void *dev_private)
> > +{
> > +	struct rte_eth_dev_info hard_info;
> > +	struct rte_eth_dev *soft_dev;
> > +	struct rte_eth_dev_data *soft_data;
> > +	uint32_t hard_speed;
> > +	int numa_node;
> > +	uint8_t hard_port_id;
> > +
> > +	rte_eth_dev_get_port_by_name(params->hard.name,
> &hard_port_id);
> > +	rte_eth_dev_info_get(hard_port_id, &hard_info);
> > +	hard_speed = eth_dev_speed_max_mbps(hard_info.speed_capa);
> > +	numa_node = rte_eth_dev_socket_id(hard_port_id);
> > +
> > +	/* Memory allocation */
> > +	soft_data = rte_zmalloc_socket(params->soft.name,
> > +		sizeof(*soft_data), 0, numa_node);
> > +	if (!soft_data)
> > +		return -ENOMEM;
> > +
> > +	/* Ethdev entry allocation */
> > +	soft_dev = rte_eth_dev_allocate(params->soft.name);
> > +	if (!soft_dev) {
> > +		rte_free(soft_data);
> > +		return -ENOMEM;
> > +	}
> > +
> > +	/* Connect dev->data */
> > +	memmove(soft_data->name,
> > +		soft_dev->data->name,
> > +		sizeof(soft_data->name));
> 
> I guess this is redundant here, allocating soft_data and rest, it is possible to
> use soft_dev->data directly.

Yes, will correct this.
 
> > +	soft_data->port_id = soft_dev->data->port_id;
> > +	soft_data->mtu = soft_dev->data->mtu;
> > +	soft_dev->data = soft_data;
> > +
> > +	/* dev */
> > +	soft_dev->rx_pkt_burst = (params->soft.intrusive) ?
> > +		NULL : /* set up later */
> > +		pmd_rx_pkt_burst;
> > +	soft_dev->tx_pkt_burst = pmd_tx_pkt_burst;
> > +	soft_dev->tx_pkt_prepare = NULL;
> > +	soft_dev->dev_ops = &pmd_ops;
> > +	soft_dev->device = &vdev->device;
> > +
> > +	/* dev->data */
> > +	soft_dev->data->dev_private = dev_private;
> > +	soft_dev->data->dev_link.link_speed = hard_speed;
> > +	soft_dev->data->dev_link.link_duplex = ETH_LINK_FULL_DUPLEX;
> > +	soft_dev->data->dev_link.link_autoneg = ETH_LINK_SPEED_FIXED;
> > +	soft_dev->data->dev_link.link_status = ETH_LINK_DOWN;
> 
> For simplicity, it is possible to have a static struct rte_eth_link, and assign it to
> data->dev_link, as done in null pmd.

The device speed is determined from that of the hard device, so I thought it better to assign the values explicitly here.
 
> > +	soft_dev->data->mac_addrs = &eth_addr;
> > +	soft_dev->data->promiscuous = 1;
> > +	soft_dev->data->kdrv = RTE_KDRV_NONE;
> > +	soft_dev->data->numa_node = numa_node;
> 
> If pmd is detachable, need following flag:
> data->dev_flags = RTE_ETH_DEV_DETACHABLE;

Ok. Will do that.
 
> > +
> > +	return 0;
> > +}
> > +
> 
> <...>
> 
> > +static int
> > +pmd_probe(struct rte_vdev_device *vdev) {
> > +	struct pmd_params p;
> > +	const char *params;
> > +	int status;
> > +
> > +	struct rte_eth_dev_info hard_info;
> > +	uint8_t hard_port_id;
> > +	int numa_node;
> > +	void *dev_private;
> > +
> > +	if (!vdev)
> > +		return -EINVAL;
> 
> This check is not required, eal won't call this function with NULL vdev.

Ok. Will correct this.
 
> <...>
> 
> > diff --git a/drivers/net/softnic/rte_eth_softnic.h
> > b/drivers/net/softnic/rte_eth_softnic.h
> <...>
> > +int
> > +rte_pmd_softnic_run(uint8_t port_id);
> 
> Since this is public API, this needs to be commented properly, with doxygen
> comment.
>
> Btw, since there is API in this PMD perhaps api documentation also needs to
> be updated to include this.

Yes, will add documentation.
> <...>

