Bug 1360 - DTS Config improvements
Summary: DTS Config improvements
Status: IN_PROGRESS
Alias: None
Product: DPDK
Classification: Unclassified
Component: DTS
Version: unspecified
Hardware: All
OS: All
Importance: High normal
Target Milestone: ---
Assignee: Nicholas Pratte
URL:
Depends on:
Blocks:
 
Reported: 2024-01-10 13:13 CET by Juraj Linkeš
Modified: 2024-04-24 15:11 CEST
CC: 4 users

Description Juraj Linkeš 2024-01-10 13:13:07 CET
We're not using the arch, os and cpu options in the build target config. These were meant for cross compilation and we may not even use them in the future, so remove them.

Remove the use_first_core option in execution config and modify the logic as if use_first_core=True if the lcore option is not set. Also properly document this (in code and docs) and emit an info log when skipping the first core and a warning when using the core (something like only do this if you know what you're doing).
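
For illustration, a minimal sketch of how this could look in the execution config (the option name and values are just examples; the exact default behavior is refined later in this thread):

```
# lcores not set (or empty): DTS skips the first core and emits an info log.
lcores: ""
# lcores explicitly includes core 0: DTS emits a warning along the lines of
# "only do this if you know what you're doing".
# lcores: "0-3"
```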

There are a lot of supported devices in the schema; trim the list down.

We may also remove the OS/Arch config of SUT nodes, as we should be able to discover that.

We may want to do additional config validation, as the schema doesn't cover everything.
Comment 1 Juraj Linkeš 2024-01-10 13:27:15 CET
Another possible feature to think about is caching.
Comment 2 Juraj Linkeš 2024-01-10 13:52:19 CET
We can also move the vdev config:
Vdevs in the config are under sut:, but in ExecutionConfiguration they're not. Each execution contains only one SUT, so we don't have to configure them under system_under_test_node; that's probably more readable (when not under it). We also have other SUT-specific config not under system_under_test_node (memory channels), so this should be unified. If we want a sub-section, it could be dpdk_config or something similar.
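
As a sketch of what that unification might look like (all names are illustrative, with dpdk_config being the tentative name mentioned above):

```
executions:
- system_under_test_node: "SUT 1"
  dpdk_config:            # tentative sub-section for SUT/DPDK-specific options
    memory_channels: 4    # currently lives outside system_under_test_node
    vdevs:                # currently lives under sut:
    - "crypto_openssl"
```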
Comment 3 Thomas Monjalon 2024-01-10 15:11:38 CET
We should also remove the list of devices.
Comment 4 Juraj Linkeš 2024-01-16 17:44:14 CET
One more improvement is to make lcores and use_first_core SUT-specific, possibly moving them into a new section (named dpdk). The reason is that the traffic generator (such as Scapy) may not need this configuration, making it superfluous. If a traffic generator needs it, it should be listed in the traffic generator section.
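A possible shape for this, assuming a new dpdk section tied to the SUT (section and key names are illustrative):

```
executions:
- system_under_test_node: "SUT 1"
  traffic_generator_node: "TG 1"  # e.g. Scapy; needs no lcores config
  dpdk:                           # tentative SUT-specific section
    lcores: "2-10"
    use_first_core: false
```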
Comment 5 Luca Vizzarro 2024-02-28 17:29:58 CET
Switch lcores to use a keyword like ALL to indicate to DPDK to use all the cores available. An empty string is not particularly obvious.

Perhaps introduce better management of port topologies. Right now nodes have lists of NICs, and then for each entry they have their peer specified. This leads to unnecessary duplication and makes user error more likely. Also the lists are unbound, we can set any items we want without DTS complaining, but what DTS really works with right now is a fixed amount of 2 NICs, specifically in the paired port topology. I think it would be more than ideal to make nodes have lists of all the NICs we can use, each having a unique identifier (!). Moreover, separately in the configuration we can have a `port_topologies` property which would effectively be a pool of links at the tests' disposal. This way if we introduce device capability checking, we can make it so a test can request a port topology from the pool with "only Intel drivers" for example, and if none is available automatically skip. An example of this:

```
nodes:
- id: SUT
  ports:
  - id: SUT1
    pci_address: 0000:00:00.00
    dpdk_driver: vfio-pci # if set, the dpdk-devbind process happens, otherwise ignore
    # hopefully we can omit the "normal conditions" driver and let DTS just rollback to what was set before if we changed it
  - id: SUT2
    pci_address: 0000:00:00.01
    dpdk_driver: vfio-pci
- id: TG
  ports:
  - id: TG1
    pci_address: 0000:00:00.00
  - id: TG2
    pci_address: 0000:00:00.01
port_topologies:
- type: pair
  name: Generic pair # maybe specify a name to use for easier-to-read logs
  links: # this way of linking is a bit ugly, but just an example
  - left: SUT1
    right: TG1
  - left: SUT2
    right: TG2
```
Comment 6 Nicholas Pratte 2024-03-21 19:05:13 CET
(In reply to Juraj Linkeš from comment #0)
> Remove the use_first_core option in execution config and modify the logic as
> if use_first_core=True if the lcore option is not set. Also properly
> document this (in code and docs) and emit an info log when skipping the
> first core and a warning when using the core (something like only do this if
> you know what you’re doing).

After consulting Patrick about the comment above, we are unsure about the logic you have provided here. In retrospect, the way Patrick understood this is that use_first_core would be removed from the conf.yaml file and instead be used internally within the code to provide a safety net for users that do not set the lcore option (i.e. if the user does not set anything for lcores, then set use_first_core=False to prevent problems for the user). Otherwise, the use_first_core option is set to true, and an info log warning is displayed if a defined lcore is part of core 0.

I have added a basic implementation for this feature based on the above explanation. If the user does not set the lcores option, use_first_core=False by default and an info log message is emitted. If the user defines lcores, and any of those lcores are part of core 0, then a warning is shown to the user.
Comment 7 Patrick Robb 2024-03-27 14:47:14 CET
(In reply to Luca Vizzarro from comment #5)
> Switch lcores to use a keyword like ALL to indicate to DPDK to use all the
> cores available. An empty string is not particularly obvious.

Maybe "any" instead of "ALL"? Otherwise I think this is fine; we can have the default config file which DTS ships with use "any" by default, but we should still maintain the same behavior if an empty string is used (in my opinion).
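
For example (keyword spelling still open, with the empty string kept for backward compatibility):

```
lcores: "any"   # proposed keyword: let DPDK use any available cores
# lcores: ""    # empty string would keep today's behavior
```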

> 
> Perhaps introduce better management of port topologies. Right now nodes have
> lists of NICs, and then for each entry they have their peer specified. This
> leads to unnecessary duplication and makes user error more likely. Also the
> lists are unbound, we can set any items we want without DTS complaining, but
> what DTS really works with right now is a fixed amount of 2 NICs,
> specifically in the paired port topology. I think it would be more than
> ideal to make nodes have lists of all the NICs we can use, each having a
> unique identifier (!). Moreover, separately in the configuration we can have
> a `port_topologies` property which would effectively be a pool of links at
> the tests' disposal. This way if we introduce device capability checking, we
> can make it so a test can request a port topology from the pool with "only
> Intel drivers" for example, and if none is available automatically skip. An
> example of this:
> 

This all sounds fine to me. I am also wondering now whether test suites can/should have some valid list of port topologies. Then the test suite checks that settings.topology is in the valid list and, if so, modifies testpmd according to that input from settings. An example is the Scatter suite which is now merged - that test can run with paired topology, loop topology, or anything else (I think). This better supports DUTs with only 1 interface. Or, we can change the majority of func tests to loop topology only and skip this step.
Comment 8 Juraj Linkeš 2024-04-02 16:12:59 CEST
(In reply to Nicholas Pratte from comment #6)
> (In reply to Juraj Linkeš from comment #0)
> > Remove the use_first_core option in execution config and modify the logic
> > as if use_first_core=True if the lcore option is not set. Also properly
> > document this (in code and docs) and emit an info log when skipping the
> > first core and a warning when using the core (something like only do this
> > if you know what you're doing).
> 
> After consulting Patrick about the comment above, we are unsure about the
> logic you have provided here. In retrospect, the way Patrick understood
> this is that use_first_core would be removed from the conf.yaml file and
> instead be used internally within the code to provide a safety net for users
> that do not set the lcore option (i.e. if the user does not set anything for
> lcores, then set use_first_core=False to prevent problems for the user).
> Otherwise, the use_first_core option is set to true, and an info log warning
> is displayed if a defined lcore is part of core 0.
> 
> I have added a basic implementation for this feature based on the above
> explanation. If the user does not set the lcores option,
> use_first_core=False by default and an info log message is emitted. If the
> user defines lcores, and any of those lcores are part of core 0, then a
> warning is shown to the user.

I think we discussed this behavior with Patrick and I made an error here: "use_first_core=True if the lcore option is not set".

The behavior you outlined matches what I remember from the discussion and it makes sense.
Comment 9 Nicholas Pratte 2024-04-04 19:21:45 CEST
(In reply to Juraj Linkeš from comment #0) 
> We may also remove the OS/Arch config of SUT nodes, as we should be able to
> discover that.

With regard to the suggestions above, getting the architecture information for any given node is easy given the current object hierarchy DTS is using; it is easy to just create a new method in linuxsession.py, for example, and update accordingly. As far as the OS is concerned, I think it makes sense to keep this key/value pair in the config. linux_session inherits from the OSSession object, and depending on the OS you provide, the corresponding session object is created. We cannot establish the proper session object unless we know the OS from the outset.

That said, given that there is no Windows support currently, we could make an assumption that users are using Linux just to clean up the config for the time being.
Comment 10 Juraj Linkeš 2024-04-05 10:07:29 CEST
(In reply to Nicholas Pratte from comment #9)
> (In reply to Juraj Linkeš from comment #0) 
> > We may also remove the OS/Arch config of SUT nodes, as we should be able to
> > discover that.
> 
> With regard to the suggestions above, getting the architecture information
> for any given node is easy given the current object hierarchy DTS is using;
> it is easy to just create a new method in linuxsession.py, for example, and
> update accordingly. As far as the OS is concerned, I think it makes sense to
> keep this key/value pair in the config. linux_session inherits from the
> OSSession object, and depending on the OS you provide, the corresponding
> session object is created. We cannot establish the proper session object
> unless we know the OS from the outset.
> 

We can, it's just needlessly complicated. :-)

We can move the creation of the OS-unaware ssh session into node.create_session(), figure out the OS in that ssh session and then create the proper OS-aware session from that (passing the OS-unaware session to it). How we figure out the OS would then be the tricky part. I guess it won't be that hard to distinguish between Linux/Windows and then we'd need to think about Posix/non-Posix. Making it foolproof could be complicated in the future when the support for different distros widens.

> That said, given that there is no Windows support currently, we could make
> an assumption that users are using Linux just to clean up the config for the
> time being.

Let's leave it explicit as an illustration that this is the place to specify the OS. Unless you want to look into my suggestion above. :-)
Comment 11 Patrick Robb 2024-04-09 17:03:27 CEST
(In reply to Juraj Linkeš from comment #10)
> (In reply to Nicholas Pratte from comment #9)
> > (In reply to Juraj Linkeš from comment #0) 
> > > We may also remove the OS/Arch config of SUT nodes, as we should be able
> > > to discover that.
> > 
> > With regard to the suggestions above, getting the architecture information
> > for any given node is easy given the current object hierarchy DTS is using;
> > it is easy to just create a new method in linuxsession.py, for example, and
> > update accordingly. As far as the OS is concerned, I think it makes sense
> > to keep this key/value pair in the config. linux_session inherits from the
> > OSSession object, and depending on the OS you provide, the corresponding
> > session object is created. We cannot establish the proper session object
> > unless we know the OS from the outset.
> > 
> 
> We can, it's just needlessly complicated. :-)
> 
> We can move the creation of the OS-unaware ssh session into
> node.create_session(), figure out the OS in that ssh session and then create
> the proper OS-aware session from that (passing the OS-unaware session to
> it). How we figure out the OS would then be the tricky part. I guess it
> won't be that hard to distinguish between Linux/Windows and then we'd need
> to think about Posix/non-Posix. Making it foolproof could be complicated in
> the future when the support for different distros widens.
> 

My view is that this is a significant effort for not much gain (this can just be one of the things we do leave to the user to set in their conf). I think you were saying the same, but just checking. 

So Nick, we do not have to dynamically determine the os in this way. Thanks.
Comment 12 Juraj Linkeš 2024-04-10 12:05:24 CEST
(In reply to Patrick Robb from comment #11)
> (In reply to Juraj Linkeš from comment #10)
> > (In reply to Nicholas Pratte from comment #9)
> > > (In reply to Juraj Linkeš from comment #0) 
> > > > We may also remove the OS/Arch config of SUT nodes, as we should be
> > > > able to discover that.
> > > 
> > > With regard to the suggestions above, getting the architecture
> > > information for any given node is easy given the current object
> > > hierarchy DTS is using; it is easy to just create a new method in
> > > linuxsession.py, for example, and update accordingly. As far as the OS
> > > is concerned, I think it makes sense to keep this key/value pair in the
> > > config. linux_session inherits from the OSSession object, and depending
> > > on the OS you provide, the corresponding session object is created. We
> > > cannot establish the proper session object unless we know the OS from
> > > the outset.
> > 
> > We can, it's just needlessly complicated. :-)
> > 
> > We can move the creation of the OS-unaware ssh session into
> > node.create_session(), figure out the OS in that ssh session and then
> > create the proper OS-aware session from that (passing the OS-unaware
> > session to it). How we figure out the OS would then be the tricky part. I
> > guess it won't be that hard to distinguish between Linux/Windows and then
> > we'd need to think about Posix/non-Posix. Making it foolproof could be
> > complicated in the future when the support for different distros widens.
> > 
> 
> My view is that this is a significant effort for not much gain (this can
> just be one of the things we do leave to the user to set in their conf). I
> think you were saying the same, but just checking. 
> 

Yes. Doable, but not really worth it.

> So Nick, we do not have to dynamically determine the os in this way. Thanks.
Comment 13 Patrick Robb 2024-04-24 07:35:26 CEST
(In reply to Luca Vizzarro from comment #5)
> Switch lcores to use a keyword like ALL to indicate to DPDK to use all the
> cores available. An empty string is not particularly obvious.
> 
> Perhaps introduce better management of port topologies. Right now nodes have
> lists of NICs, and then for each entry they have their peer specified. This
> leads to unnecessary duplication and makes user error more likely. Also the
> lists are unbound, we can set any items we want without DTS complaining, but
> what DTS really works with right now is a fixed amount of 2 NICs,
> specifically in the paired port topology. I think it would be more than
> ideal to make nodes have lists of all the NICs we can use, each having a
> unique identifier (!). Moreover, separately in the configuration we can have
> a `port_topologies` property which would effectively be a pool of links at
> the tests' disposal. This way if we introduce device capability checking, we
> can make it so a test can request a port topology from the pool with "only
> Intel drivers" for example, and if none is available automatically skip. An
> example of this:
> 
> ```
> nodes:
> - id: SUT
>   ports:
>   - id: SUT1
>     pci_address: 0000:00:00.00
>     dpdk_driver: vfio-pci # if set, the dpdk-devbind process happens, otherwise ignore
>     # hopefully we can omit the "normal conditions" driver and let DTS just rollback to what was set before if we changed it
>   - id: SUT2
>     pci_address: 0000:00:00.01
>     dpdk_driver: vfio-pci
> - id: TG
>   ports:
>   - id: TG1
>     pci_address: 0000:00:00.00
>   - id: TG2
>     pci_address: 0000:00:00.01
> port_topologies:
> - type: pair
>   name: Generic pair # maybe specify a name to use for easier-to-read logs
>   links: # this way of linking is a bit ugly, but just an example
>   - left: SUT1
>     right: TG1
>   - left: SUT2
>     right: TG2
> ```

Hi, I chatted with Nick and Jeremy more about the ports definition process today. We actually didn't end up with 100% agreement (hah) but I figured I'd write up my thoughts now since we have the DTS meeting tomorrow.

So, at a minimum it's clear that we want to eliminate the duplication of port definitions from the conf.yaml, and the above accomplishes that. But I also prefer that the port links/topology in use be explicitly defined by the user per execution. I guess this gets at what we want a specific execution to yield... in my mind it is supposed to contain results for a specific NIC, but this is something we can discuss.

So, an idea (which I like) is to eliminate all definition of ports in the node sections, and move the entire ports definition to the Execution config block. Then it can be written to the nodes when they're instantiated (like we do with vdevs now) or handled in some similar way.
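
A rough sketch of that execution-level variant (key names are hypothetical):

```
executions:
- system_under_test_node: "SUT 1"
  port_links:                  # hypothetical: links defined per execution
  - sut_port: 0000:00:00.00    # written to the nodes at instantiation,
    tg_port: 0000:00:00.00     # similar to how vdevs are handled now
  - sut_port: 0000:00:00.01
    tg_port: 0000:00:00.01
```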

Or another option is to move the port addresses to a separate topology section like Luca drew up above, and then select a particular topology in the execution using the topology's unique identifier. But I still like the idea of specifying the ports in use in the execution somehow. Another thing to note is I don't think specifying the port "type" is needed - if we specify 1 port link and a topology runs via loop in testpmd - great. If we specify 1 port link and it requires paired topology, then skip the test suite. If we specify 2 port links and the test suite uses loop topology, then take the 2 port links and simply use only the first physical link.
Comment 14 Juraj Linkeš 2024-04-24 10:48:41 CEST
I'll offer some of my thoughts.

> Hi, I chatted with Nick and Jeremy more about the ports definition process
> today. We actually didn't end up with 100% agreement (hah) but I figured I'd
> write up my thoughts now since we have the DTS meeting tomorrow.
> 
> So, at a minimum it's clear that we want to eliminate the duplication of
> port definitions from the conf.yaml, and the above accomplishes that. But I
> also prefer that the port links/topology in use be explicitly defined by the
> user per execution. I guess this gets at what we want a specific execution
> to yield... in my mind it is supposed to contain results for a specific NIC,
> but this is something we can discuss.
> 

I think we've talked about this before and we've agreed that we want one execution to test one NIC.

> So, an idea (which I like) is to eliminate all definition of ports in the
> node sections, and move the entire ports definition to the Execution config
> block.

I like listing them in the node section, as that's node-specific information. What makes the most sense to me is to list the available hardware in the nodes section and then just pick what we want to test in an execution (we would reference just the NIC name or some other unique id - like we do with system_under_test_node and traffic_generator_node).
And, if we moved the definition to the execution section, we may be duplicating the same data if there are two executions using the same NIC, but something else is different (like a different DPDK build in the version with pre-built DPDK support, where we wanted to remove the nested DPDK builds).
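
A sketch of that referencing scheme (identifiers illustrative):

```
nodes:
- name: "SUT 1"
  ports:
  - id: SUT1                  # unique id, as in Luca's earlier example
    pci_address: 0000:00:00.00
executions:
- system_under_test_node: "SUT 1"
  ports: [SUT1]               # reference by unique id, like we already do
                              # with system_under_test_node
```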

> Then it can be written to the nodes when they're instantiated (like
> we do with vdevs now) or handled in some similar way.
> 

This could be done, but the difference here is that vdevs should work on all SUTs, and thus are execution-specific, but NICs are SUT-specific.
If we take this further to the extreme, then we don't even need the node section and we can just put everything into the execution section, but I like the referencing we do now, as it delineates what's node-specific and what's execution-specific (or at least that should be the delineation).

> Or another option is to move the port addresses to a separate topology
> section like Luca drew up above, and then select a particular topology in
> the execution using the topology's unique identifier. But I still like the
> idea of specifying the ports in use in the execution somehow. Another thing
> to note is I don't think specifying the port "type" is needed - if we
> specify 1 port link and a topology runs via loop in testpmd - great. If we
> specify 1 port link and it requires paired topology, then skip the test
> suite. If we specify 2 port links and the test suite uses loop topology,
> then take the 2 port links and simply use only the first physical link.

Yes, the topology type is not needed. DTS should figure out what topologies are supported by the port configuration and then run/skip tests based on that (the capabilities feature will accomplish this).
Comment 15 Patrick Robb 2024-04-24 15:11:59 CEST
(In reply to Juraj Linkeš from comment #14)
> I'll offer some of my thoughts.
> 
> > Hi, I chatted with Nick and Jeremy more about the ports definition process
> > today. We actually didn't end up with 100% agreement (hah) but I figured
> > I'd write up my thoughts now since we have the DTS meeting tomorrow.
> > 
> > So, at a minimum it's clear that we want to eliminate the duplication of
> > port definitions from the conf.yaml, and the above accomplishes that. But
> > I also prefer that the port links/topology in use be explicitly defined
> > by the user per execution. I guess this gets at what we want a specific
> > execution to yield... in my mind it is supposed to contain results for a
> > specific NIC, but this is something we can discuss.
> > 
> 
> I think we've talked about this before and we've agreed that we want one
> execution to test one NIC.
> 
> > So, an idea (which I like) is to eliminate all definition of ports in the
> > node sections, and move the entire ports definition to the Execution config
> > block.
> 
> I like listing them in the node section, as that's node-specific
> information. What makes the most sense to me is to list the available
> hardware in the nodes section and then just pick what we want to test in an
> execution (we would reference just the NIC name or some other unique id -
> like we do with system_under_test_node and traffic_generator_node).
> And, if we moved the definition to the execution section, we may be
> duplicating the same data if there are two executions using the same NIC,
> but something else is different (like a different DPDK build in the version
> with pre-built DPDK support, where we wanted to remove the nested DPDK
> builds).
> 
> > Then it can be written to the nodes when they're instantiated (like
> > we do with vdevs now) or handled in some similar way.
> > 
> 
> This could be done, but the difference here is that vdevs should work on
> all SUTs, and thus are execution-specific, but NICs are SUT-specific.
> If we take this further to the extreme, then we don't even need the node
> section and we can just put everything into the execution section, but I
> like the referencing we do now, as it delineates what's node-specific and
> what's execution-specific (or at least that should be the delineation).
> 

Okay, the idea of building a list of devices within the nodes and then referencing a specific device key within the execution config block sounds okay.

My confusion/concern, though, is about whether this information is really node-specific. The nature of how we build the port links on the nodes is that they require the peer addresses, which are not part of the node. So the information really seems testbed-specific, not node-specific. Now, using this method, we either have to duplicate information in the two node configs to include the addresses for both TG and DUT within each node config section (the current approach), or we only include the node's local PCI address(es) and have to reference the "other" node's NodeConfiguration when instantiating the nodes.
