Bug 1352 - Port driver rebinding
Summary: Port driver rebinding
Status: UNCONFIRMED
Alias: None
Product: DPDK
Classification: Unclassified
Component: DTS
Version: unspecified
Hardware: All All
Importance: Normal normal
Target Milestone: ---
Assignee: dev
URL:
Depends on:
Blocks:
 
Reported: 2024-01-10 10:51 CET by Juraj Linkeš
Modified: 2024-04-26 11:13 CEST
CC List: 4 users

Description Juraj Linkeš 2024-01-10 10:51:27 CET
We're currently using port rebinding in the temporary os_udp test suite, which will be removed. The rebinding is basically a workaround that keeps the test suite functional until it is removed.

We need to settle on a DTS port rebinding policy: do we want to rebind at all, and if so, under what circumstances?

The current considerations are:
1. Don't modify the DUT/server configuration unless explicitly asked to. Port binding could be made configurable, in which case we could do the rebinding instead of just checking that each port is bound to the right driver in the smoke tests (either in the smoke tests or separately beforehand).
2. We want to be able to run both functional and performance tests in one DTS run. In this case, we must be able to rebind if the two traffic generators require different drivers (as is the case with Scapy for functional testing and T-Rex/DPDK tgen for performance testing). This clashes with the requirement for explicit consent to rebinding, but if we document in all places (the docs, the config, the code) that func+perf testing implies DTS may rebind for this purpose, it could be a sensible exception. The rebinding would be done only twice: once between the functional and performance test cases, and once at the end to revert to the original drivers.
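A minimal sketch of the policy in points 1 and 2, assuming a hypothetical `allow_rebind` config flag (no such option exists in DTS today) and assuming that, in a func+perf run, the two traffic generators need different drivers:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RebindPolicyInput:
    # Hypothetical fields; none of these names come from the DTS config schema.
    allow_rebind: bool  # explicit user consent to change port bindings
    func_tests: bool    # functional test cases enabled in this run
    perf_tests: bool    # performance test cases enabled in this run


def may_rebind(cfg: RebindPolicyInput) -> bool:
    """Return True if DTS may change port driver bindings in this run."""
    # Point 1: explicit consent always permits rebinding.
    if cfg.allow_rebind:
        return True
    # Point 2: documented exception, where a combined func+perf run implies
    # consent because the two traffic generators may need different drivers.
    return cfg.func_tests and cfg.perf_tests


def implied_rebind_points(cfg: RebindPolicyInput) -> List[str]:
    """Rebinds implied by a func+perf run (at most two, per point 2)."""
    if cfg.func_tests and cfg.perf_tests:
        # Once between the functional and performance test cases,
        # and once at the end to revert to the original drivers.
        return ["between_func_and_perf", "revert_to_original"]
    return []
```

Keeping the consent check separate from the list of rebind points makes it easy to document exactly when and how often DTS touches the bindings.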
Comment 1 Juraj Linkeš 2024-01-10 13:31:21 CET
A related item: the smoke tests should check that the TG driver is bound properly. We have this check for SUT nodes, but not for the TG node. The driver to check should be the one required by the capturing TG if functional tests are enabled; if they are disabled, check the driver required by the non-capturing TG.
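One way to express that selection, as a sketch. The TG names and the name-to-driver mapping are assumptions based on the Scapy/T-Rex example in this bug, not an existing DTS table:

```python
from enum import Enum
from typing import Dict


class DriverKind(Enum):
    OS = "os"      # kernel (OS) driver, e.g. needed by Scapy
    DPDK = "dpdk"  # DPDK-compatible driver, e.g. needed by T-Rex


# Hypothetical mapping from traffic generator name to the driver kind it needs.
TG_DRIVER_KIND: Dict[str, DriverKind] = {
    "scapy": DriverKind.OS,   # capturing TG, used for functional tests
    "trex": DriverKind.DPDK,  # non-capturing TG, used for performance tests
}


def expected_tg_driver(func_enabled: bool, capturing_tg: str,
                       non_capturing_tg: str) -> DriverKind:
    """Driver kind the TG smoke test should expect on the TG ports."""
    # Functional tests need the capturing TG; otherwise only the
    # non-capturing (performance) TG is used.
    tg = capturing_tg if func_enabled else non_capturing_tg
    return TG_DRIVER_KIND[tg]
```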
Comment 2 Thomas Monjalon 2024-01-10 14:44:54 CET
Why would performance and functional tests need a different driver?

Why not make binding an option of the port initialization?
Comment 3 Juraj Linkeš 2024-01-11 13:05:31 CET
(In reply to Thomas Monjalon from comment #2)
> Why would performance and functional tests need a different driver?
> 

A different driver could be required if different traffic generators are used for performance and functional testing. The most common setup is Scapy for functional tests and T-Rex for performance tests. Functional tests require the traffic generator to capture individual packets, which Scapy provides. However, Scapy is not fast enough for performance testing, which is where T-Rex comes in. T-Rex, in turn, can't be used for functional testing, as it doesn't capture individual packets. Scapy requires the kernel driver, while T-Rex requires a DPDK driver.

> Why not make binding an option of the port initialization?

What sort of option do you have in mind? Would users pass this option?

We could do basically anything as long as we agree on what's acceptable behavior.
Comment 4 Jeremy Spewock 2024-02-19 18:20:16 CET
Something else to note about rebinding: it isn't used only by the os_udp test case. It is also used at the end of DTS runs to rebind the NIC to the kernel driver as part of cleanup. I don't think this is a terrible policy, because the kernel driver is configurable in the config file, so we could establish that the driver you set there is always considered the "default state". That said, I don't see the harm in simply rebinding back to whatever driver the port was bound to before we changed anything. That might be a better assumption about the "default state" of the NIC on that particular host.
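A minimal sketch of the "rebind to whatever was there before" approach. `current_driver` and `bind` are hypothetical callables standing in for whatever DTS uses to query and change a port's driver (for example, a wrapper around `dpdk-devbind.py`); they are not existing DTS functions:

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical helper signatures: query a port's driver / bind it to a driver.
DriverQuery = Callable[[str], str]
DriverBind = Callable[[str, str], None]


def snapshot_drivers(ports: Iterable[str], current_driver: DriverQuery) -> Dict[str, str]:
    """Record each port's driver before DTS changes anything."""
    return {port: current_driver(port) for port in ports}


def restore_drivers(snapshot: Dict[str, str], current_driver: DriverQuery,
                    bind: DriverBind) -> List[str]:
    """Rebind ports whose driver differs from the snapshot; return them."""
    restored = []
    for port, original in snapshot.items():
        if current_driver(port) != original:
            bind(port, original)
            restored.append(port)
    return restored
```

Restoring only the ports that actually changed also makes cleanup a no-op on hosts where nothing was rebound.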

I have also noticed something that might be better split into a separate bug, but is related to this one: during cleanup, when DTS attempts to bind the NIC back to the kernel driver, if DTS is stopping because an exception was thrown, the first devbind call times out. As a result, only one port is rebound to the kernel driver and the other remains bound to the DPDK driver. On clean runs without an exception, however, the binding works as expected.
Comment 5 Patrick Robb 2024-02-28 17:00:45 CET
In general, the unbind/rebind can be handled as default behavior driven by the configuration, but users can also do any binding they need from within a test suite.
Comment 6 Juraj Linkeš 2024-04-26 11:07:30 CEST
We also need to think about the order of port binding and other port operations. We should bind before we run a traffic generator and before the smoke test suite, which checks the drivers. We're also getting the MAC address and logical name of a port right after connecting to a node, but this should likewise be done after binding to the proper driver (if we do it before, the port may not be bound to the OS driver, which would prevent us from getting the logical name when it's needed, such as for Scapy).

In general, we should do the binding first and only then do anything else with the ports. We may want to wait until after the smoke suite, because it checks the ports (so the order would be: bind the ports, run the smoke suite, then we're free to do other things with the ports).
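The proposed ordering, sketched with hypothetical callables (`bind_ports`, `run_smoke_suite` and `discover_port_info` are placeholders, not existing DTS functions):

```python
from typing import Callable


def prepare_ports(bind_ports: Callable[[], None],
                  run_smoke_suite: Callable[[], bool],
                  discover_port_info: Callable[[], None]) -> None:
    """Bind first, verify with the smoke suite, then query port details."""
    # 1. Bind every port to the driver its consumer needs
    #    (e.g. the OS driver for Scapy on the TG node).
    bind_ports()
    # 2. The smoke suite checks the drivers, so it runs right after binding.
    if not run_smoke_suite():
        raise RuntimeError("smoke tests failed; ports are not in a usable state")
    # 3. Only now read the MAC address and logical name; the logical
    #    (interface) name may only be available while the port is on the OS driver.
    discover_port_info()
```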

Additionally, the smoke test suite should check the drivers on the TG node as well. We can copy the binding script to the TG node to achieve this.
Comment 7 Juraj Linkeš 2024-04-26 11:13:00 CEST
(In reply to Jeremy Spewock from comment #4)
> I have also noticed something that might be better to be broken into a
> separate bug, but is also related to this bug which is: During the cleanup
> when DTS attempts to bind the NIC back to the kernel driver, if DTS is
> stopping because an exception was thrown the first devbind call times out
> which leads to only one port being rebound back to the kernel driver and the
> other still being bound to the DPDK driver. On clean runs where there is no
> exception however, this binding works as expected.

I also noticed this while debugging my local setup. I think it's because we don't properly clean up testpmd in the scatter test suite. Testpmd is left running no matter what, and in case of a failure, forwarding is not stopped either, which is likely what causes the rebinding to fail. If this is the cause (my testing suggests it is), then https://bugs.dpdk.org/show_bug.cgi?id=1404 is the bug for this.
