[dpdk-dev] Question about unsupported transceivers

Alex Forster alex at alexforster.com
Mon Oct 19 03:06:14 CEST 2015


On 10/15/15, 3:53 PM, "Alexander Duyck" <alexander.duyck at gmail.com> wrote:


>>> It looks like you are probably seeing interfaces be unbound and then
>>> rebound.  As such you are likely pushing things outside of the array
>>> boundary.  One solution might just be to at more ",1"s if you are only
>>> going to be doing this kind of thing at boot up.  The upper limit for
>>> the array is 32 entries so as long as you only are setting this up once
>>> you could probably get away with that.
>>>
>>> An alternative would be to modify the definition of the parameter in
>>> ixgbe_param.c.  If you look through the file you should fine several
>>> likes like below:
>>> 	struct ixgbe_option opt = {
>>> 			.type = enable_option,
>>> 			.name = "allow_unsupported_sfp",
>>> 			.err  = "defaulting to Disabled",
>>> 			.def  = OPTION_DISABLED
>>> 		};
>>>
>>> If you modify the .def value to "OPTION_ENABLED", and then rebuild and
>>> reinstall your driver you should be able have it install without any
>>> issues.
>>>
>>> - Alex
>>>
>> Yeah, I've had roughly the same thought process since you mentioned the
>> args array. My first idea was "maybe the driver can't fit all of my 1's"
>> but I saw it was defined at 32. Then I decided to just patch the whole
>> enable_unsupported_sfp option out
>> https://gist.github.com/AlexForster/112fd822704caf804849 but I'm still
>> failing.
>>
>> I've been digging a bit, and I'm failing here in ixgbe_main.c...
>>
>> /* reset_hw fills in the perm_addr as well */
>> hw->phy.reset_if_overtemp = true;
>> err = hw->mac.ops.reset_hw(hw);
>> hw->phy.reset_if_overtemp = false;
>> if (err == IXGBE_ERR_SFP_NOT_PRESENT) {
>> 	err = IXGBE_SUCCESS;
>> } else if (err == IXGBE_ERR_SFP_NOT_SUPPORTED) {
>> 	e_dev_err("failed to load because an unsupported SFP+ or QSFP "
>> 		  "module type was detected.\n");
>> 	e_dev_err("Reload the driver after installing a supported "
>> 		  "module.\n");
>> 	goto err_sw_init;
>> } else if (err) {
>> 	e_dev_err("HW Init failed: %d\n", err);
>> 	goto err_sw_init;
>> }
>>
>>
>> I've attempted a hand-stacktrace and came up with the following...
>>
>> ixgbe_82599.c at 1016
>>   * ixgbe_reset_hw_82599() is defined
>>   * calls phy->ops.init() which potentially returns
>> IXGBE_ERR_SFP_NOT_SUPPORTED
>>
>> ixgbe_82599.c at 102
>>   * ixgbe_init_phy_ops_82599() is defined
>>   * IXGBE_ERR_SFP_NOT_SUPPORTED is returned after calling
>> phy->ops.identify()
>>
>> ixgbe_82599.c at 2085
>>   * ixgbe_identify_phy_82599() is defined
>>   * calls ixgbe_identify_module_generic()
>>
>> ixgbe_phy.c at 1281
>>   * ixgbe_identify_module_generic() is defined
>>   * calls ixgbe_identify_qsfp_module_generic()
>>
>> ixgbe_phy.c at 1663
>>   * ixgbe_identify_qsfp_module_generic() is defined
>>   * We fail somewhere before the ending call to ixgbe_get_device_caps()
>> which does take allow_unsupported_sfp into account
>>
>>   * Possibility: hw->phy.ops.read_i2c_eeprom(hw, IXGBE_SFF_IDENTIFIER,
>> &identifier) != IXGBE_SFF_IDENTIFIER_QSFP_PLUS
>>   * Possibility: active_cable != true
>>
>> And then I'm over my head. Should I assume from here that the most
>>likely
>> explanation is a bad transceiver or bad fiber?
>>
>> Alex Forster
>
>Are you able to swap transceiver or fiber between the 4 ports that work
>and the 4 that don't?  If you could do that then you should be able to
>tell if the issue is following the NIC ports, or if it is an issue with
>the external connections.  If it is issue is following the transceiver
>or fiber then it is probably what is causing the issue.
>
>The other thing you could try doing is adding a printk to the spots
>where the status is set to SFP_NOT_SUPPORTED so that you could figure
>out exactly which spot is triggering the rejection of the module.
>
>- Alex

I had remote hands swap fibers on the QSFP side and the issue moved to the
first card, so I'm going to have the fibers cleaned and tested. This
appears to be my issue.

I'd like to submit a patch for ixgbe_identify_qsfp_module_generic() to
print more helpful errors in the two cases mentioned above, so that
hopefully nobody ever has to deal with this again. Would I be wasting my
time, or does something like this have a likelihood of being accepted?

Thank you for all of your help! I wouldn't have figured this out nearly as
quickly without it.

Alex Forster



More information about the dev mailing list