[v4,2/2] doc: add guide for debug and troubleshoot

Message ID 20190116145452.53835-3-vipin.varghese@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Thomas Monjalon
Series doc/howto: add debug and troubleshoot guide

Checks

Context Check Description
ci/checkpatch success coding style OK
ci/Intel-compilation success Compilation OK

Commit Message

Varghese, Vipin Jan. 16, 2019, 2:54 p.m. UTC
  Add a user guide on debugging and troubleshooting common issues and
bottlenecks found in the sample application model.

Signed-off-by: Vipin Varghese <vipin.varghese@intel.com>
Acked-by: Marko Kovacevic <marko.kovacevic@intel.com>
---
 doc/guides/howto/debug_troubleshoot_guide.rst | 375 ++++++++++++++++++
 doc/guides/howto/index.rst                    |   1 +
 2 files changed, 376 insertions(+)
 create mode 100644 doc/guides/howto/debug_troubleshoot_guide.rst
  

Comments

Kovacevic, Marko Jan. 18, 2019, 3:28 p.m. UTC | #1
After checking the patch again I found a few spelling mistakes

> Add user guide on debug and troubleshoot for common issues and
> bottleneck found in sample application model.
> 
> Signed-off-by: Vipin Varghese <vipin.varghese@intel.com>
> Acked-by: Marko Kovacevic <marko.kovacevic@intel.com>
> ---
>  doc/guides/howto/debug_troubleshoot_guide.rst | 375
> ++++++++++++++++++
>  doc/guides/howto/index.rst                    |   1 +
>  2 files changed, 376 insertions(+)
>  create mode 100644 doc/guides/howto/debug_troubleshoot_guide.rst
>

<...>
 
receieve / receive

> +    -  If stats for RX and drops updated on same queue? check receieve
> thread
> +    -  If packet does not reach PMD? check if offload for port and queue
> +       matches to traffic pattern send.
> +

<...>

Offlaod/ offload
 
> +    -  Is the packet multi segmented? Check if port and queue offlaod is set.
> +
> +Are there object drops in producer point for ring?
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

<...>

sufficent / sufficient 

> +    -  Are drops on specific socket? If yes check if there are sufficent
> +       objects by rte_mempool_get_count() or rte_mempool_avail_count()
> +    -  Is 'rte_mempool_get_count() or rte_mempool_avail_count()' zero?
> +       application requires more objects hence reconfigure number of
> +       elements in rte_mempool_create().
> +    -  Is there single RX thread for multiple NIC? try having multiple
> +       lcore to read from fixed interface or we might be hitting cache
> +       limit, so increase cache_size for pool_create().
> +

Sceanrios/ scenarios
 
> +#. Is performance low for some sceanrios?
> +    -  Check if sufficient objects in mempool by rte_mempool_avail_count()
> +    -  Is failure seen in some packets? we might be getting packets with
> +       'size > mbuf data size'.
> +    -  Is NIC offload or application handling multi segment mbuf? check the
> +       special packets are continuous with rte_pktmbuf_is_contiguous().
> +    -  If there separate user threads used to access mempool objects, use
> +       rte_mempool_cache_create() for non DPDK threads.

debuging / debugging 

> +    -  Is the error reproducible with 1GB hugepage? If no, then try debuging
> +       the issue with lookup table or objects with rte_mem_lock_page().
> +
> +.. note::
> +  Stall in release of MBUF can be because

<...>

softwre / software

> +    -  If softwre crypto is in use, check if the CRYPTO Library is build with
> +       right (SIMD) flags or check if the queue pair using CPU ISA for
> +       feature_flags AVX|SSE|NEON using rte_cryptodev_info_get()

Assited/ assisted 

> +    -  If its hardware assited crypto showing performance variance? Check if
> +       hardware is on same NUMA socket as queue pair and session pool.
> +

<...>

exceeeding / exceeding 

> +       core? registered functions may be exceeeding the desired time slots
> +       while running on same service core.
> +    -  Is function is running on RTE core? check if there are conflicting
> +       functions running on same CPU core by rte_thread_get_affinity().
> +

<...>

> +#. Where to capture packets?
> +    -  Enable pdump in primary to allow secondary to access queue-pair for
> +       ports. Thus packets are copied over in RX|TX callback by secondary
> +       process using ring buffers.
> +    -  To capture packet in middle of pipeline stage, user specific hooks
> +       or callback are to be used to copy the packets. These packets can

secodnary / secondary 

> +       be shared to secodnary process via user defined custom rings.
> +
> +Issue still persists?
> +~~~~~~~~~~~~~~~~~~~~~
> +
> +#. Are there custom or vendor specific offload meta data?
> +    -  From PMD, then check for META data error and drops.
> +    -  From application, then check for META data error and drops.
> +#. Is multiprocess is used configuration and data processing?
> +    -  Check enabling or disabling features from secondary is supported or
> not?

Obejcts/ objects 

> +#. Is there drops for certain scenario for packets or obejcts?
> +    -  Check user private data in objects by dumping the details for debug.
> +
<...>

Thanks,
Marko K
  
Varghese, Vipin Jan. 21, 2019, 3:38 a.m. UTC | #2
Thanks Marko, I will spin v5 with the changes asap.

Note: Just wondering why 'devtools/checkpatches.sh' did not report any error.

Thanks
Vipin Varghese

  

Patch

diff --git a/doc/guides/howto/debug_troubleshoot_guide.rst b/doc/guides/howto/debug_troubleshoot_guide.rst
new file mode 100644
index 000000000..f2e337bb1
--- /dev/null
+++ b/doc/guides/howto/debug_troubleshoot_guide.rst
@@ -0,0 +1,375 @@ 
+..  SPDX-License-Identifier: BSD-3-Clause
+    Copyright(c) 2018 Intel Corporation.
+
+.. _debug_troubleshoot_via_pmd:
+
+Debug & Troubleshoot guide via PMD
+==================================
+
+DPDK applications can be designed to run as a single-threaded simple stage or
+as multiple threads with complex pipeline stages. These applications can use
+poll mode devices, which help in offloading CPU cycles. A few models are
+
+  *  single primary
+  *  multiple primary
+  *  single primary single secondary
+  *  single primary multiple secondary
+
+In all the above cases, it is a tedious task to isolate, debug and understand
+odd behaviour which occurs randomly or periodically. The goal of this guide is
+to share and explore a few commonly seen patterns and behaviours, and then to
+isolate and identify the root cause via step by step debug at various
+processing stages.
+
+Application Overview
+--------------------
+
+Let us take up an example application as a reference for explaining the issues
+and patterns commonly seen. The sample application in discussion makes use of
+the single primary model with various pipeline stages, and uses PMDs and
+libraries such as service cores, mempool, pkt mbuf, event, crypto, QoS and
+eth.
+
+The overview of an application modeled using PMD is shown in
+:numref:`dtg_sample_app_model`.
+
+.. _dtg_sample_app_model:
+
+.. figure:: img/dtg_sample_app_model.*
+
+   Overview of pipeline stage of an application
+
+Bottleneck Analysis
+-------------------
+
+To debug bottleneck and performance issues, the application under test is run
+in an environment matching the following
+
+#. Linux 64-bit|32-bit
+#. DPDK PMD and libraries are used
+#. Libraries and PMD are either static or shared, but not both
+#. Machine flag optimizations of gcc or the compiler are kept constant
+
+Is there mismatch in packet rate (received < sent)?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+RX Port and associated core :numref:`dtg_rx_rate`.
+
+.. _dtg_rx_rate:
+
+.. figure:: img/dtg_rx_rate.*
+
+   RX send rate compared against Received rate
+
+#. Is the generic configuration correct?
+    -  What is the port speed, duplex? rte_eth_link_get()
+    -  Are packets of larger sizes dropped? rte_eth_dev_get_mtu()
+    -  Are only specific MACs received? rte_eth_promiscuous_get()
+
+#. Are there NIC specific drops?
+    -  Check rte_eth_rx_queue_info_get() for nb_desc and scattered_rx
+    -  Is RSS enabled? rte_eth_dev_rss_hash_conf_get()
+    -  Are packets spread on all queues? rte_eth_stats_get()
+    -  Are stats for RX and drops updated on the same queue? check the
+       receive thread
+    -  Does the packet not reach the PMD? check if the offload for port and
+       queue matches the traffic pattern sent.
+
+#. If the problem still persists, it might be at the RX lcore thread
+    -  Is the RX thread, distributor or event RX adapter processing less
+       than required?
+    -  Is the application built as a processing pipeline with an RX stage? If
+       there are multiple port-pairs tied to a single RX core, try to debug
+       using rte_prefetch_non_temporal(), which hints that the mbuf in cache
+       is temporary. A sketch of the first-level port checks follows this
+       list.
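+
+The checks above map to a handful of ethdev calls. A minimal sketch, assuming
+an initialized EAL and a valid, started port_id (the function name is
+illustrative), could be
+
+.. code-block:: c
+
+    #include <inttypes.h>
+    #include <stdio.h>
+    #include <rte_ethdev.h>
+
+    /* port_id is a placeholder for an already configured and started port */
+    static void
+    check_port_basics(uint16_t port_id)
+    {
+        struct rte_eth_link link;
+        struct rte_eth_stats stats;
+        uint16_t mtu;
+        int q;
+
+        rte_eth_link_get(port_id, &link);
+        printf("speed %u Mbps, full duplex %d\n",
+               link.link_speed, link.link_duplex);
+
+        if (rte_eth_dev_get_mtu(port_id, &mtu) == 0)
+            printf("mtu %d\n", mtu);
+        printf("promiscuous %d\n", rte_eth_promiscuous_get(port_id));
+
+        if (rte_eth_stats_get(port_id, &stats) != 0)
+            return;
+        printf("imissed %" PRIu64 " ierrors %" PRIu64 " rx_nombuf %" PRIu64 "\n",
+               stats.imissed, stats.ierrors, stats.rx_nombuf);
+        /* are packets spread across the per-queue counters? */
+        for (q = 0; q < RTE_ETHDEV_QUEUE_STAT_CNTRS; q++)
+            printf("queue %d: rx %" PRIu64 " dropped %" PRIu64 "\n",
+                   q, stats.q_ipackets[q], stats.q_errors[q]);
+    }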
+
+Are there packet drops (receive|transmit)?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+RX-TX Port and associated cores :numref:`dtg_rx_tx_drop`.
+
+.. _dtg_rx_tx_drop:
+
+.. figure:: img/dtg_rx_tx_drop.*
+
+   RX-TX drops
+
+#. At RX
+    -  Get the RX queue count? nb_rx_queues using rte_eth_dev_info_get()
+    -  Are there misses, errors, queue errors? rte_eth_stats_get() for
+       imissed, ierrors, q_errors, rx_nombuf, and the mbuf refcount
+
+#. At TX
+    -  Are you doing TX in bulk? check the application for TX descriptor
+       overhead.
+    -  Are there TX errors? rte_eth_stats_get() for oerrors and q_errors
+    -  Are specific scenarios not releasing mbufs? check the refcount of
+       those packets with rte_mbuf_refcnt_read().
+    -  Is the packet multi-segmented? Check if the port and queue offload is
+       set. A sketch of a TX-side check follows this list.
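+
+A sketch of such a TX-side check is shown below; port_id, queue_id and the
+pkts[] array are placeholders for the application's own transmit path
+
+.. code-block:: c
+
+    #include <inttypes.h>
+    #include <stdio.h>
+    #include <rte_ethdev.h>
+    #include <rte_mbuf.h>
+
+    static void
+    tx_burst_with_drop_check(uint16_t port_id, uint16_t queue_id,
+                             struct rte_mbuf **pkts, uint16_t nb_pkts)
+    {
+        struct rte_eth_stats stats;
+        uint16_t sent, i;
+
+        sent = rte_eth_tx_burst(port_id, queue_id, pkts, nb_pkts);
+        if (sent == nb_pkts)
+            return;
+
+        if (rte_eth_stats_get(port_id, &stats) == 0)
+            printf("oerrors %" PRIu64 "\n", stats.oerrors);
+
+        /* unsent mbufs stay owned by the application; inspect and free */
+        for (i = sent; i < nb_pkts; i++) {
+            printf("mbuf %d refcnt %d\n", i, rte_mbuf_refcnt_read(pkts[i]));
+            rte_pktmbuf_free(pkts[i]);
+        }
+    }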
+
+Are there object drops in producer point for ring?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Producer point for ring :numref:`dtg_producer_ring`.
+
+.. _dtg_producer_ring:
+
+.. figure:: img/dtg_producer_ring.*
+
+   Producer point for Rings
+
+#. Performance for Producer
+    -  Fetch the type of the ring with rte_ring_dump() for flags such as
+       RING_F_SP_ENQ
+    -  If '(burst enqueue - actual enqueue) > 0' check rte_ring_count() or
+       rte_ring_free_count()
+    -  If burst or single enqueue always returns 0 and rte_ring_full() is
+       true, then the next stage is not pulling the content at the desired
+       rate (see the sketch below).
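+
+A sketch of the producer-side check, with r and objs[] as placeholders for
+the application's ring and objects, could be
+
+.. code-block:: c
+
+    #include <stdio.h>
+    #include <rte_ring.h>
+
+    static void
+    enqueue_with_drop_check(struct rte_ring *r, void **objs, unsigned int n)
+    {
+        unsigned int free_space;
+        unsigned int enq;
+
+        enq = rte_ring_enqueue_burst(r, objs, n, &free_space);
+        if (enq < n) {
+            printf("enqueue dropped %u, used %u, free %u, full %d\n",
+                   n - enq, rte_ring_count(r), rte_ring_free_count(r),
+                   rte_ring_full(r));
+            /* the dump also shows flags such as RING_F_SP_ENQ */
+            rte_ring_dump(stdout, r);
+            /* freeing the rejected objects is application specific */
+        }
+    }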
+
+Are there object drops in consumer point for ring?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Consumer point for ring :numref:`dtg_consumer_ring`.
+
+.. _dtg_consumer_ring:
+
+.. figure:: img/dtg_consumer_ring.*
+
+   Consumer point for Rings
+
+#. Performance for Consumer
+    -  Fetch the type of the ring with rte_ring_dump() for flags such as
+       RING_F_SC_DEQ
+    -  If '(burst dequeue - actual dequeue) > 0' check rte_ring_count() or
+       rte_ring_free_count()
+    -  If burst or single dequeue always returns 0, check whether the ring is
+       empty via rte_ring_empty() (see the sketch below).
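+
+A corresponding consumer-side sketch, again with r as a placeholder for the
+application's ring, could be
+
+.. code-block:: c
+
+    #include <stdio.h>
+    #include <rte_common.h>
+    #include <rte_ring.h>
+
+    static void
+    dequeue_with_check(struct rte_ring *r)
+    {
+        void *objs[32];
+        unsigned int avail;
+        unsigned int deq;
+
+        deq = rte_ring_dequeue_burst(r, objs, RTE_DIM(objs), &avail);
+        if (deq == 0 && rte_ring_empty(r))
+            printf("ring empty, producer stage is not keeping up\n");
+        else
+            printf("dequeued %u, still available %u\n", deq, avail);
+        /* process and release the dequeued objects here */
+    }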
+
+Are packets or objects not processed at the desired rate?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Memory objects close to NUMA :numref:`dtg_mempool`.
+
+.. _dtg_mempool:
+
+.. figure:: img/dtg_mempool.*
+
+   Memory objects have to be close to the device, per NUMA
+
+#. Is the performance low?
+    -  Are packets received from multiple NICs? rte_eth_dev_count_all()
+    -  Are NIC interfaces on different sockets? use rte_eth_dev_socket_id()
+    -  Is the mempool created on the right socket? rte_mempool_create() or
+       rte_pktmbuf_pool_create()
+    -  Are drops on a specific socket? If yes, check if there are sufficient
+       objects with rte_mempool_avail_count()
+    -  Is rte_mempool_avail_count() zero? The application requires more
+       objects, hence reconfigure the number of elements in
+       rte_mempool_create().
+    -  Is there a single RX thread for multiple NICs? try having multiple
+       lcores read from fixed interfaces, or we might be hitting the cache
+       limit, so increase cache_size at pool creation.
+
+#. Is performance low for some scenarios?
+    -  Check if there are sufficient objects in the mempool with
+       rte_mempool_avail_count()
+    -  Is failure seen in some packets? we might be getting packets with
+       'size > mbuf data size'.
+    -  Is the NIC offload or the application handling multi-segment mbufs?
+       check whether the special packets are contiguous with
+       rte_pktmbuf_is_contiguous().
+    -  If there are separate user threads used to access mempool objects, use
+       rte_mempool_cache_create() for non-DPDK threads.
+    -  Is the error reproducible with 1GB hugepages? If no, then try debugging
+       the issue with a lookup table or objects with rte_mem_lock_page().
+       A sketch of the mempool checks follows this list.
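+
+A sketch of the mempool checks above, with mp and m as placeholders for the
+application's pool and a suspicious packet, could be
+
+.. code-block:: c
+
+    #include <stdio.h>
+    #include <rte_mbuf.h>
+    #include <rte_mempool.h>
+
+    static void
+    check_pool(struct rte_mempool *mp)
+    {
+        unsigned int avail = rte_mempool_avail_count(mp);
+        unsigned int in_use = rte_mempool_in_use_count(mp);
+
+        printf("pool %s: avail %u, in use %u, cache_size %u, socket %d\n",
+               mp->name, avail, in_use, mp->cache_size, mp->socket_id);
+        if (avail == 0)
+            printf("pool exhausted, increase the number of elements\n");
+    }
+
+    /* m is a packet suspected of being larger than the mbuf data size */
+    static void
+    check_contiguous(struct rte_mbuf *m)
+    {
+        if (!rte_pktmbuf_is_contiguous(m))
+            printf("multi-segment mbuf: pkt_len %u, nb_segs %d\n",
+                   m->pkt_len, m->nb_segs);
+    }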
+
+.. note::
+  A stall in the release of mbufs can happen because
+
+  *  the processing pipeline is too heavy
+  *  the number of stages is too many
+  *  TX is not transferred at the desired rate
+  *  multi-segment is not offloaded at the TX device
+  *  the application misuses mbufs, for example by
+      -  not freeing packets
+      -  invalid rte_pktmbuf_refcnt_set
+      -  invalid rte_pktmbuf_prefree_seg
+
+Is there difference in performance for crypto?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Crypto device and PMD :numref:`dtg_crypto`.
+
+.. _dtg_crypto:
+
+.. figure:: img/dtg_crypto.*
+
+   CRYPTO and interaction with PMD device
+
+#. Is the generic configuration correct?
+    -  Get the total crypto devices – rte_cryptodev_count()
+    -  Cross check that software or hardware flags are configured properly:
+       rte_cryptodev_info_get() for feature_flags
+
+#. If enqueue request > actual enqueue (drops)?
+    -  Is the queue pair set up for the right NUMA node? check the socket_id
+       used in rte_cryptodev_queue_pair_setup().
+    -  Is the session_pool created on the same socket_id as the queue pair?
+       If no, then create it on the same NUMA node.
+    -  Is the enqueue thread on the same socket_id as the object? If no, then
+       try to put it on the same NUMA node.
+    -  Are there errors and drops? check the error counts using
+       rte_cryptodev_stats_get()
+    -  Do multiple threads enqueue or dequeue from the same queue pair? Try
+       debugging with separate threads.
+
+#. If enqueue rate > dequeue rate?
+    -  Is the dequeue lcore thread on the same socket_id?
+    -  If software crypto is in use, check if the crypto library is built with
+       the right (SIMD) flags, or check if the queue pair is using the CPU ISA
+       for feature_flags AVX|SSE|NEON using rte_cryptodev_info_get()
+    -  Is hardware assisted crypto showing performance variance? Check if the
+       hardware is on the same NUMA socket as the queue pair and session pool.
+       A sketch of these crypto checks follows this list.
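+
+A sketch of these crypto checks, assuming dev_id refers to an already
+configured crypto device, could be
+
+.. code-block:: c
+
+    #include <inttypes.h>
+    #include <stdio.h>
+    #include <rte_cryptodev.h>
+
+    static void
+    check_cryptodev(uint8_t dev_id)
+    {
+        struct rte_cryptodev_info info;
+        struct rte_cryptodev_stats stats;
+
+        printf("crypto devices %u\n", (unsigned int)rte_cryptodev_count());
+
+        rte_cryptodev_info_get(dev_id, &info);
+        printf("driver %s, hardware accelerated %d\n", info.driver_name,
+               !!(info.feature_flags & RTE_CRYPTODEV_FF_HW_ACCELERATED));
+
+        if (rte_cryptodev_stats_get(dev_id, &stats) == 0)
+            printf("enqueue errors %" PRIu64 ", dequeue errors %" PRIu64 "\n",
+                   stats.enqueue_err_count, stats.dequeue_err_count);
+    }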
+
+Worker functions not giving performance?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Custom worker function :numref:`dtg_distributor_worker`.
+
+.. _dtg_distributor_worker:
+
+.. figure:: img/dtg_distributor_worker.*
+
+   Custom worker function performance drops
+
+#. Performance
+    -  Are thread context switches more frequent? Identify the lcore with
+       rte_lcore_id() and the lcore index mapping with rte_lcore_index().
+       Best performance is when the mapping of thread and core is 1:1.
+    -  What is the lcore role (type or state)? fetch the roles like RTE, OFF
+       and SERVICE using rte_eal_lcore_role().
+    -  Does the application have multiple functions running on the same
+       service core? registered functions may be exceeding the desired time
+       slots while running on the same service core.
+    -  Is the function running on an RTE core? check if there are conflicting
+       functions running on the same CPU core with rte_thread_get_affinity().
+
+#. Debug
+    -  What is the mode of operation? The master core, lcore, service core
+       and NUMA count can be fetched with rte_eal_get_configuration().
+    -  Is it occurring in a special scenario? Analyze the run logic with
+       rte_dump_stack(), rte_dump_registers() and rte_memdump() for more
+       insights.
+    -  Is 'perf' showing data processing or memory stalls in functions? check
+       the instructions being generated for the functions using objdump.
+       A sketch of the lcore placement checks follows this list.
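+
+A sketch of the lcore placement checks, called from the worker thread itself,
+could be
+
+.. code-block:: c
+
+    #include <stdio.h>
+    #include <rte_lcore.h>
+
+    /* assumes the usual DPDK build flags (_GNU_SOURCE) so that the
+     * CPU_* macros are available */
+    static void
+    check_worker_placement(void)
+    {
+        unsigned int lcore = rte_lcore_id();
+        rte_cpuset_t cpuset;
+        unsigned int cpu;
+
+        printf("lcore %u, index %d, role %d, socket %u\n", lcore,
+               rte_lcore_index(lcore), rte_eal_lcore_role(lcore),
+               rte_lcore_to_socket_id(lcore));
+
+        rte_thread_get_affinity(&cpuset);
+        for (cpu = 0; cpu < CPU_SETSIZE; cpu++)
+            if (CPU_ISSET(cpu, &cpuset))
+                printf("may run on CPU %u\n", cpu);
+    }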
+
+Service functions are not frequent enough?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+service functions on service cores :numref:`dtg_service`.
+
+.. _dtg_service:
+
+.. figure:: img/dtg_service.*
+
+   functions running on service cores
+
+#. Performance
+    -  Get the service core count using rte_service_lcore_count() and compare
+       it with the result of rte_eal_get_configuration()
+    -  Check the registered service is available using
+       rte_service_get_by_name(), rte_service_get_count() and
+       rte_service_get_name()
+    -  Is the given service running in parallel on multiple lcores?
+       rte_service_probe_capability() and rte_service_map_lcore_get()
+    -  Is the service running? rte_service_runstate_get()
+
+#. Debug
+    -  Find how many services are running on a specific service lcore with
+       rte_service_lcore_count_services()
+    -  Generic debug via rte_service_dump() (see the sketch below)
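+
+A sketch of these service checks; the service name "my_service" is a
+placeholder for whatever the application registered
+
+.. code-block:: c
+
+    #include <stdio.h>
+    #include <rte_service.h>
+
+    static void
+    check_service(void)
+    {
+        uint32_t id;
+
+        if (rte_service_get_by_name("my_service", &id) != 0) {
+            printf("service not registered\n");
+            return;
+        }
+        printf("service %s, running %d, service lcores %d\n",
+               rte_service_get_name(id), rte_service_runstate_get(id),
+               rte_service_lcore_count());
+        rte_service_dump(stdout, id);   /* per-service statistics */
+    }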
+
+Is there bottleneck in eventdev?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+#. Is the generic configuration correct?
+    -  Get the event_dev devices? rte_event_dev_count()
+    -  Are they created on the correct socket_id? - rte_event_dev_socket_id()
+    -  Check for HW or SW capabilities? - rte_event_dev_info_get() for
+       event_qos, queue_all_types, burst_mode, multiple_queue_port,
+       max_event_queue|dequeue_depth
+    -  Is a packet stuck in a queue? check for stages (event queues) where
+       packets are looped back to the same or previous stages.
+
+#. Performance drops in enqueue (event count > actual enqueue)?
+    -  Dump the event_dev information? rte_event_dev_dump()
+    -  Check the stats for the eventdev queues and ports
+    -  Check the inflight and current queue elements for enqueue|dequeue
+       (see the sketch below)
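+
+A sketch of these eventdev checks, assuming dev_id refers to a configured
+event device, could be
+
+.. code-block:: c
+
+    #include <stdio.h>
+    #include <rte_eventdev.h>
+
+    static void
+    check_eventdev(uint8_t dev_id)
+    {
+        struct rte_event_dev_info info;
+
+        printf("event devices %u, socket of dev %d: %d\n",
+               (unsigned int)rte_event_dev_count(), dev_id,
+               rte_event_dev_socket_id(dev_id));
+
+        if (rte_event_dev_info_get(dev_id, &info) == 0)
+            printf("driver %s, max queues %d, max ports %d\n",
+                   info.driver_name, info.max_event_queues,
+                   info.max_event_ports);
+
+        /* dumps queue/port state and inflight counts */
+        rte_event_dev_dump(dev_id, stdout);
+    }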
+
+How to debug QoS via TM?
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+TM on TX interface :numref:`dtg_qos_tx`.
+
+.. _dtg_qos_tx:
+
+.. figure:: img/dtg_qos_tx.*
+
+   Traffic Manager just before TX
+
+#. Is the configuration right?
+    -  Get the current capabilities for the DPDK port for max nodes, level,
+       shaper private, shaper shared, sched_n_children and stats_mask using
+       rte_tm_capabilities_get()
+    -  Check if the current leaf nodes are configured identically by fetching
+       leaf_nodes_identical using rte_tm_capabilities_get()
+    -  Get the leaf nodes for a DPDK port - rte_tm_get_number_of_leaf_nodes()
+    -  Check level capabilities by rte_tm_level_capabilities_get for n_nodes
+        -  Max, nonleaf_max, leaf_max
+        -  identical, non_identical
+        -  Shaper_private_supported
+        -  Stats_mask
+        -  Cman wred packet|byte supported
+        -  Cman head drop supported
+    -  Check node capabilities by rte_tm_node_capabilities_get for n_nodes
+        -  Shaper_private_supported
+        -  Stats_mask
+        -  Cman wred packet|byte supported
+        -  Cman head drop supported
+    -  Debug via stats - rte_tm_node_stats_update() and
+       rte_tm_node_stats_read() (see the sketch below)
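+
+A sketch of the capability checks, assuming port_id supports the traffic
+management API, could be
+
+.. code-block:: c
+
+    #include <stdio.h>
+    #include <rte_tm.h>
+
+    static void
+    check_tm_caps(uint16_t port_id)
+    {
+        struct rte_tm_capabilities cap;
+        struct rte_tm_error error;
+        uint32_t n_leaf = 0;
+
+        if (rte_tm_capabilities_get(port_id, &cap, &error) != 0) {
+            printf("TM capabilities not available: %s\n",
+                   error.message ? error.message : "unknown");
+            return;
+        }
+        printf("max nodes %u, max levels %u, leaf nodes identical %d\n",
+               cap.n_nodes_max, cap.n_levels_max, cap.leaf_nodes_identical);
+
+        if (rte_tm_get_number_of_leaf_nodes(port_id, &n_leaf, &error) == 0)
+            printf("leaf nodes %u\n", n_leaf);
+    }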
+
+Packet is not of right format?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Packet capture before and after processing :numref:`dtg_pdump`.
+
+.. _dtg_pdump:
+
+.. figure:: img/dtg_pdump.*
+
+   Capture points of Traffic at RX-TX
+
+#. Where to capture packets?
+    -  Enable pdump in the primary process to allow the secondary process to
+       access the queue-pairs for ports. Packets are then copied over in the
+       RX|TX callbacks by the secondary process using ring buffers.
+    -  To capture packets in the middle of a pipeline stage, user specific
+       hooks or callbacks are to be used to copy the packets. These packets
+       can be shared with the secondary process via user defined custom rings
+       (see the sketch below).
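+
+A sketch of such a user hook is an RX callback that clones packets into a
+custom ring; capture_ring, clone_pool and the context struct are placeholders
+defined by the application
+
+.. code-block:: c
+
+    #include <rte_common.h>
+    #include <rte_ethdev.h>
+    #include <rte_mbuf.h>
+    #include <rte_ring.h>
+
+    struct capture_ctx {
+        struct rte_ring *capture_ring;   /* may be shared with secondary */
+        struct rte_mempool *clone_pool;  /* pool for indirect mbufs */
+    };
+
+    static uint16_t
+    capture_cb(uint16_t port, uint16_t queue, struct rte_mbuf *pkts[],
+               uint16_t nb_pkts, uint16_t max_pkts, void *user_param)
+    {
+        struct capture_ctx *ctx = user_param;
+        uint16_t i;
+
+        RTE_SET_USED(port);
+        RTE_SET_USED(queue);
+        RTE_SET_USED(max_pkts);
+        for (i = 0; i < nb_pkts; i++) {
+            struct rte_mbuf *copy;
+
+            copy = rte_pktmbuf_clone(pkts[i], ctx->clone_pool);
+            if (copy != NULL &&
+                rte_ring_enqueue(ctx->capture_ring, copy) != 0)
+                rte_pktmbuf_free(copy); /* capture ring full, drop the copy */
+        }
+        return nb_pkts;
+    }
+
+    /* registered once at initialization time, for example
+     * rte_eth_add_rx_callback(port_id, queue_id, capture_cb, &ctx); */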
+
+Issue still persists?
+~~~~~~~~~~~~~~~~~~~~~
+
+#. Are there custom or vendor specific offload meta data?
+    -  If from the PMD, check for meta data errors and drops.
+    -  If from the application, check for meta data errors and drops.
+#. Is multi-process used for configuration and data processing?
+    -  Check whether enabling or disabling features from the secondary
+       process is supported.
+#. Are there drops in certain scenarios for packets or objects?
+    -  Check the user private data in the objects by dumping the details for
+       debug.
+
+How to develop custom code to debug?
+------------------------------------
+
+-  For a single process - the debug functionality is to be added in the same
+   process
+-  For multiple processes - the debug functionality can be added to a
+   secondary process
+
+.. note::
+
+  The primary's debug functions can be invoked via
+    #. Timer call-back
+    #. Service function under a service core
+    #. USR1 or USR2 signal handler (see the sketch below)
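+
+A sketch of the signal-handler variant; app_pool and app_ring are
+placeholders for the application's own objects, and the handler is a
+debug-only hook (not async-signal-safe)
+
+.. code-block:: c
+
+    #include <signal.h>
+    #include <stdio.h>
+    #include <rte_mempool.h>
+    #include <rte_ring.h>
+
+    static struct rte_mempool *app_pool;
+    static struct rte_ring *app_ring;
+
+    static void
+    debug_dump(int signum)
+    {
+        if (signum != SIGUSR1)
+            return;
+        if (app_pool != NULL)
+            rte_mempool_dump(stdout, app_pool);
+        if (app_ring != NULL)
+            rte_ring_dump(stdout, app_ring);
+    }
+
+    /* installed once at initialization time, after the objects exist:
+     *     signal(SIGUSR1, debug_dump);
+     */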
diff --git a/doc/guides/howto/index.rst b/doc/guides/howto/index.rst
index a642a2be1..9527fa84d 100644
--- a/doc/guides/howto/index.rst
+++ b/doc/guides/howto/index.rst
@@ -18,3 +18,4 @@  HowTo Guides
     virtio_user_as_exceptional_path
     packet_capture_framework
     telemetry
+    debug_troubleshoot_guide