Bug 344 - Broken CPU feature discovery for armv8
Summary: Broken CPU feature discovery for armv8
Status: CONFIRMED
Alias: None
Product: DPDK
Classification: Unclassified
Component: meson (show other bugs)
Version: unspecified
Hardware: ARM All
: Normal normal
Target Milestone: ---
Assignee: Ilya Maximets
URL:
Depends on:
Blocks:
 
Reported: 2019-09-04 14:43 CEST by Ilya Maximets
Modified: 2020-09-16 23:40 CEST (History)
5 users (show)



Attachments

Description Ilya Maximets 2019-09-04 14:43:56 CEST
To check available cpuflags both build systems (make and meson) uses compiler
invocations with provided machine flags which is '-march=armv8-a+crc'.
It makes no much sense getting defines from the cc with non-native arch as
it will have nothing common with real cpu features available.
This leads to inability using dpdk on armv8 platform without crc support due
to illegal instructions.

'-march=native' is only available since GCC 8. Also, DPDK uses '-march=armv8-a+crc'
as a default armv8 machine regardless of fact that crc extension is optional.

Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2019-September/362301.html
Comment 1 Jerin 2019-09-04 15:13:16 CEST
it seems only armv8 CPU which does not have CRC support is OLD xgene1.
Please use config/defconfig_arm64-xgene1-linux-gcc for xgene1 target.

We could remove CRC from mk/machine/armv8a/ but it will affect the binaries
for generic arm64 build in distributions. In the benefit all latest CPUs(except OLD xgene1), it is decided to add CRC for armv8 distribution build.
Comment 2 Ilya Maximets 2019-09-04 17:57:15 CEST
(In reply to Jerin from comment #1)
> it seems only armv8 CPU which does not have CRC support is OLD xgene1.
> Please use config/defconfig_arm64-xgene1-linux-gcc for xgene1 target.
> 
> We could remove CRC from mk/machine/armv8a/ but it will affect the binaries
> for generic arm64 build in distributions. In the benefit all latest
> CPUs(except OLD xgene1), it is decided to add CRC for armv8 distribution
> build.

IIUC, the issue happened on Launchpad package build server that uses xgene1:
https://mail.openvswitch.org/pipermail/ovs-dev/2019-September/362339.html
So, you're suggesting to just stop using them for package building?

BTW, It looks like meson doesn't support xgene1 target.
Comment 3 Jerin 2019-09-04 19:03:44 CEST
> BTW, It looks like meson doesn't support xgene1 target.

Yes. No contribution from APM. Is xgene1 in production? Does not looks like.

After looking at the CRC code carefully, selection of CRC instruction is based on runtime check. It looks like segfault is coming from OVS code, not from DPDK.

static inline void
rte_hash_crc_set_alg(uint8_t alg)
{
        switch (alg) {
        case CRC32_ARM64:
                if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_CRC32))
                        alg = CRC32_SW;
                /* fall-through */
        case CRC32_SW:
                crc32_alg = alg;
                /* fall-through */
        default:
                break;
        }
}

/* Setting the best available algorithm */
RTE_INIT(rte_hash_crc_init_alg)
{
        rte_hash_crc_set_alg(CRC32_ARM64);
}

static inline uint32_t
rte_hash_crc_8byte(uint64_t data, uint32_t init_val)
{
        if (likely(crc32_alg == CRC32_ARM64))
                return crc32c_arm64_u64(data, init_val);

        return crc32c_2words(data, init_val);
}
Comment 4 Ilya Maximets 2019-09-04 19:22:59 CEST
(In reply to Jerin from comment #3)
> > BTW, It looks like meson doesn't support xgene1 target.
> 
> Yes. No contribution from APM. Is xgene1 in production? Does not looks like.

It seems that Lounchpad build servers uses it, this is the origin of this issue.

> 
> After looking at the CRC code carefully, selection of CRC instruction is
> based on runtime check. It looks like segfault is coming from OVS code, not
> from DPDK.

Yes, the crash goes from OVS code, but it happens because DPDK defines
__ARM_FEATURE_CRC32 by forcing incorrect -march value.
Comment 5 Ajit Khaparde 2019-09-04 20:51:21 CEST
Jerin, I am parking this bug with you since you are already interacting with Ilya. Please reassign if needed. Thanks
Comment 6 Honnappa Nagarahalli 2019-09-05 06:15:47 CEST
Is it possible to add a run time check for CRC32 support in OVS?
Comment 7 Ilya Maximets 2019-09-05 07:34:15 CEST
(In reply to Honnappa Nagarahalli from comment #6)
> Is it possible to add a run time check for CRC32 support in OVS?

I'd say no for several reasons:

1. OVS hash library is based on compile time implementation choosing.
   Runtime checks will require significant re-work.

2. Almost all parts of OVS heavily uses hash library. Any runtime checks
   will impact performance significantly.
Comment 8 Jerin 2019-09-05 08:31:04 CEST
> 2. Almost all parts of OVS heavily uses hash library. Any runtime checks
   will impact performance significantly.

Removing CRC from machine flags also will impact the performance of the systems
which HW CRC as OVS has to fallback to SW implementation blindly. So runtime check is the ideal solution.

or use xgene1 config provided by DPDK.
Comment 9 Ilya Maximets 2019-09-05 09:05:13 CEST
(In reply to Jerin from comment #8)
> > 2. Almost all parts of OVS heavily uses hash library. Any runtime checks
>    will impact performance significantly.
> 
> Removing CRC from machine flags also will impact the performance of the
> systems
> which HW CRC as OVS has to fallback to SW implementation blindly. So runtime
> check is the ideal solution.

Removing crc from machine flags will allow OVS to work on machine
without CRC support. No performance questions here. Works or not.
For machines with HW CRC there should be armv8-a+crc march and no
performance difference.

Adding the runtime checks will allow OVS to work on machine without
CRC support, but will slow it down for both cases either CRC supported
or not.

BTW, "runtime check + fallback to sw hash" will be slower anyway
than just "fallback to sw hash" in compile time.

IMHO, changing OVS is not an option. And the main reason is that
OVS code is fully correct.

> or use xgene1 config provided by DPDK.

As we discussed previously, meson doesn't support that.
Comment 10 Jerin 2019-09-05 10:38:53 CEST
> BTW, "runtime check + fallback to sw hash" will be slower anyway
than just "fallback to sw hash" in compile time.

This is on xgene1. Right? How about the CPU with CRC support?
so it is "runtime check + HW CRC" vs "fallback to sw hash" case.

> As we discussed previously, meson doesn't support that.

It is trivial to add that support.
Comment 11 Ilya Maximets 2019-09-05 12:45:43 CEST
(In reply to Jerin from comment #10)
> > BTW, "runtime check + fallback to sw hash" will be slower anyway
> than just "fallback to sw hash" in compile time.
> 
> This is on xgene1. Right? How about the CPU with CRC support?
> so it is "runtime check + HW CRC" vs "fallback to sw hash" case.

For CPU with CRC support DPDK should use armv8-a+crc, so it'll be
"runtime check + HW CRC" vs "HW CRC".
Comment 12 Ilya Maximets 2019-09-05 12:51:25 CEST
In common, one of solutions could look like this:
1. Add support for xgene1 target to meson.
2. Forbid building default target on xgene1 machine.

i.e. if user will want to build default target on xgen1 machine
using legacy make he/she should get an error like:
  "default target is not supported on xgene1,
   please, use special xgene1 target".
meson could detect xgene1 and choose right target by itself.
Comment 13 Jerin 2019-09-05 13:26:39 CEST
Sounds reasonable.

Ajit/Ilya, Are you planning to take it up? I don't have xgene1 board and 
I don't have a plan to work on this.
Comment 14 Ilya Maximets 2019-09-05 13:44:19 CEST
(In reply to Jerin from comment #13)
> Sounds reasonable.
> 
> Ajit/Ilya, Are you planning to take it up? I don't have xgene1 board and 
> I don't have a plan to work on this.

I don't have xgene1 too.
Maybe someone from Canonical could as they have xgene1 on their build servers?
Comment 15 Honnappa Nagarahalli 2019-09-05 21:32:16 CEST
We still need to fix the ability to select functions during runtime. As additional versions of Arm ISA are released, the new features need to be enabled during run time using the generic configuration (the generic configuration cannot just remain at v8.0).

Looking at the performance argument here, why would one use the software algorithm if the underlying architectures already support the algorithm in HW? IMO, the performance of the software algorithm is irrelevant.
Comment 16 Ajit Khaparde 2020-09-16 23:40:58 CEST
Ilya, Is this still a concern? Thanks

Note You need to log in before you can comment on or make changes to this bug.