[PATCH v2 0/5] net/mlx5: introduce Tx datapath tracing

Slava Ovsiienko viacheslavo at nvidia.com
Tue Jun 27 13:24:14 CEST 2023



> -----Original Message-----
> From: Thomas Monjalon <thomas at monjalon.net>
> Sent: Tuesday, June 27, 2023 3:46 AM
> To: Slava Ovsiienko <viacheslavo at nvidia.com>
> Cc: dev at dpdk.org; Raslan Darawsheh <rasland at nvidia.com>;
> rjarry at redhat.com; jerinj at marvell.com
> Subject: Re: [PATCH v2 0/5] net/mlx5: introduce Tx datapath tracing
> 
> 20/06/2023 14:00, Raslan Darawsheh:
> > Hi,
> >
> > > -----Original Message-----
> > > From: Viacheslav Ovsiienko <viacheslavo at nvidia.com>
> > > Sent: Tuesday, June 13, 2023 7:59 PM
> > > To: dev at dpdk.org
> > > Subject: [PATCH v2 0/5] net/mlx5: introduce Tx datapath tracing
> > >
> > > > The mlx5 PMD provides send scheduling at a specific moment in time,
> > > > and for applications relying on it, it would be extremely useful to
> > > > have extra debug information: when and how packets were scheduled
> > > > and when the actual sending was completed by the NIC hardware (this
> > > > helps the application track internal delay issues).
> > >
> > > > Because the DPDK Tx datapath API does not provide for any feedback
> > > > from the driver, and the feature appears to be mlx5-specific, it
> > > > seems reasonable to engage the existing DPDK datapath tracing
> > > > capability.
> > >
> > > The work cycle is supposed to be:
> > > >   - compile the application with tracing enabled
> > >   - run application with EAL parameters configuring the tracing in mlx5
> > >     Tx datapath
> > >   - store the dump file with gathered tracing information
> > > >   - run the analyzing script (in Python) to combine related events
> > > >     (packet firing and completion) and see the data in human-readable
> > > >     view
> > >
> > > Below is the detailed instruction "how to" with mlx5 NIC to gather
> > > all the debug data including the full timings information.
> > >
> > >
> > > 1. Build DPDK application with enabled datapath tracing
> > >
> > > > The meson option should be specified:
> > > >    -Denable_trace_fp=true
> > > >
> > > > The c_args should be specified:
> > >    -DALLOW_EXPERIMENTAL_API
> > >
> > > The DPDK configuration examples:
> > >
> > > >   meson configure --buildtype=debug -Denable_trace_fp=true
> > > >         -Dc_args='-DRTE_LIBRTE_MLX5_DEBUG -DRTE_ENABLE_ASSERT -DALLOW_EXPERIMENTAL_API' build
> > > >
> > > >   meson configure --buildtype=debug -Denable_trace_fp=true
> > > >         -Dc_args='-DRTE_ENABLE_ASSERT -DALLOW_EXPERIMENTAL_API' build
> > > >
> > > >   meson configure --buildtype=release -Denable_trace_fp=true
> > > >         -Dc_args='-DRTE_ENABLE_ASSERT -DALLOW_EXPERIMENTAL_API' build
> > >
> > >   meson configure --buildtype=release -Denable_trace_fp=true
> > >         -Dc_args='-DALLOW_EXPERIMENTAL_API' build
> > >
> > >
> > > 2. Configuring the NIC
> > >
> > > > If the send completion timings are important, the NIC should be
> > > > configured to provide real-time timestamps: the
> > > > REAL_TIME_CLOCK_ENABLE NV settings parameter should be set to TRUE,
> > > > for example with the following command (followed by a FW/driver
> > > > reset):
> > > >
> > > >   sudo mlxconfig -d /dev/mst/mt4125_pciconf0 s REAL_TIME_CLOCK_ENABLE=1
> > >
> > >
> > > 3. Run DPDK application to gather the traces
> > >
> > > > EAL parameters controlling the trace capability at runtime:
> > > >
> > > >   --trace=pmd.net.mlx5.tx - a regular expression enabling the tracepoints
> > > >                             with matching names. At least "pmd.net.mlx5.tx"
> > > >                             must be enabled to gather all events needed
> > > >                             to analyze the mlx5 Tx datapath and its timings.
> > > >                             By default all tracepoints are disabled.
> > >
> > >   --trace-dir=/var/log - trace storing directory
> > >
> > >   --trace-bufsz=<val>B|<val>K|<val>M - optional, trace data buffer size
> > >                                        per thread. The default is 1MB.
> > >
> > > >   --trace-mode=overwrite|discard  - optional, selects trace data buffer mode.
> > >
> > >
> > > 4. Installing or Building Babeltrace2 Package
> > >
> > > > The gathered trace data can be analyzed with the provided Python
> > > > script. To parse the trace data, the script uses the Babeltrace2
> > > > library. The package should be either installed or built from source
> > > > code as shown below:
> > >
> > >   git clone https://github.com/efficios/babeltrace.git
> > >   cd babeltrace
> > >   ./bootstrap
> > > >   ./configure --help
> > > >   ./configure --disable-api-doc --disable-man-pages
> > > >               --disable-python-bindings-doc --enable-python-plugins
> > > >               --enable-python-bindings
> > >
> > > 5. Running the Analyzing Script
> > >
> > > > The analyzing script is located in the folder
> > > > ./drivers/net/mlx5/tools. It requires Python 3.6 and the Babeltrace2
> > > > package, and it takes a single parameter, the trace data file. For
> > > > example:
> > >
> > >    ./mlx5_trace.py /var/log/rte-2023-01-23-AM-11-52-39
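When several runs have been traced, picking the most recent trace to pass to the script could look roughly like the sketch below. It assumes the traces land under the --trace-dir location in entries whose names start with "rte-", as in the example path above; the helper name `latest_trace_dir` is hypothetical.

```python
from pathlib import Path


def latest_trace_dir(trace_root="/var/log", prefix="rte-"):
    """Return the most recently modified trace directory, or None if absent.

    The "rte-" prefix is an assumption taken from the example path.
    """
    root = Path(trace_root)
    candidates = [p for p in root.glob(prefix + "*") if p.is_dir()]
    if not candidates:
        return None
    # The newest modification time is taken as the latest run
    return max(candidates, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    print(latest_trace_dir())
```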
> > >
> > >
> > > 6. Interpreting the Script Output Data
> > >
> > > All the timings are given in nanoseconds.
> > > > The list of Tx (and, in the future, Rx) bursts per port/queue is
> > > > presented in the output.
> > > > Each list element contains the list of built WQEs with specific
> > > > opcodes, and each WQE contains the list of the encompassed packets
> > > > to send.
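To make the nesting described above concrete (burst per port/queue, then WQEs, then packets), here is a minimal data-model sketch. All class and field names are hypothetical and are not taken from mlx5_trace.py; timestamps are assumed to be in nanoseconds, as in the script output.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Packet:
    length: int  # hypothetical field: packet length in bytes


@dataclass
class Wqe:
    opcode: int
    posted_ns: int                       # when the WQE was built and fired
    completed_ns: Optional[int] = None   # when the NIC reported completion
    packets: List[Packet] = field(default_factory=list)

    def latency_ns(self) -> Optional[int]:
        """Delay between firing and completion, or None if not completed."""
        if self.completed_ns is None:
            return None
        return self.completed_ns - self.posted_ns


@dataclass
class Burst:
    port: int
    queue: int
    wqes: List[Wqe] = field(default_factory=list)


# Example values are made up for illustration only.
burst = Burst(port=0, queue=0)
burst.wqes.append(Wqe(opcode=0x0E, posted_ns=1000, completed_ns=1750,
                      packets=[Packet(length=64)]))
for wqe in burst.wqes:
    print(f"opcode={wqe.opcode:#x} packets={len(wqe.packets)} "
          f"latency={wqe.latency_ns()} ns")
```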
> 
> This information should be in the documentation.
OK, should we make this cover-letter part of mlx5.rst?

> 
> I think we should request a review of the Python script from people familiar
> with tracing and from people more familiar with Python scripting for user
> tools.
Would be very helpful, could you recommend/ask someone?

With best regards,
Slava


