Bug 959 - tsc hz not matching kernel reported tsc frequency
Summary: tsc hz not matching kernel reported tsc frequency
Status: UNCONFIRMED
Alias: None
Product: DPDK
Classification: Unclassified
Component: core
Version: 21.11
Hardware: x86 Linux
Importance: Normal normal
Target Milestone: ---
Assignee: dev
URL:
Depends on:
Blocks:
 
Reported: 2022-03-16 13:58 CET by Maria Lingemark
Modified: 2022-05-24 17:00 CEST

Description Maria Lingemark 2022-03-16 13:58:18 CET
On some machines the TSC frequency reported by DPDK (rte_get_tsc_hz()) does not match the TSC frequency reported by the kernel.

The effect can be observed with a small test program that measures the same interval using two different time sources.

The program reads clock_gettime() and rte_get_tsc_cycles(), sleeps, and then reads both again. The TSC cycle delta is converted to time using rte_get_tsc_hz().

Comparing the elapsed time from the two sources shows a difference of ~2.3 ms per second.
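
The measurement described above could look roughly like the following sketch. It is not the reporter's program. The names measure, cycles_to_us, elapsed_pair and simulated_tsc are hypothetical, and simulated_tsc is only a stand-in so the sketch builds without DPDK. In a DPDK application one would presumably pass rte_get_tsc_cycles and the value of rte_get_tsc_hz() instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Any cycle counter with the same shape as rte_get_tsc_cycles(). */
typedef uint64_t (*cycle_source_fn)(void);

struct elapsed_pair {
    uint64_t clock_us; /* elapsed time according to clock_gettime() */
    uint64_t tsc_us;   /* elapsed time according to cycles / hz */
};

/* Convert a cycle count to microseconds without overflowing 64 bits. */
static uint64_t cycles_to_us(uint64_t cycles, uint64_t hz)
{
    assert(hz != 0);
    return (cycles / hz) * 1000000ULL + (cycles % hz) * 1000000ULL / hz;
}

static uint64_t timespec_to_us(const struct timespec *ts)
{
    return (uint64_t)ts->tv_sec * 1000000ULL + (uint64_t)ts->tv_nsec / 1000ULL;
}

/* Hypothetical stand-in counter: about 2394 cycles per microsecond. */
static uint64_t simulated_tsc(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return timespec_to_us(&ts) * 2394ULL;
}

/* Sample both sources, sleep, then sample both again. */
static struct elapsed_pair measure(clockid_t clk, cycle_source_fn read_cycles,
                                   uint64_t hz, unsigned int sleep_seconds)
{
    struct timespec t0, t1;
    struct timespec delay = { .tv_sec = sleep_seconds, .tv_nsec = 0 };
    uint64_t c0, c1;

    clock_gettime(clk, &t0);
    c0 = read_cycles();
    nanosleep(&delay, NULL);
    clock_gettime(clk, &t1);
    c1 = read_cycles();

    struct elapsed_pair r = {
        .clock_us = timespec_to_us(&t1) - timespec_to_us(&t0),
        .tsc_us   = cycles_to_us(c1 - c0, hz),
    };
    return r;
}
```

Calling measure(CLOCK_MONOTONIC, simulated_tsc, 2400000000ULL, 1) and printing clock_us, tsc_us and their difference gives output in the same shape as the test run in this report. The counter is passed as a function pointer so the same code can be exercised with or without DPDK.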

Example from a test run with 1 second sleep:

rte_get_tsc_hz=2400000000
clock_gettime=1000056 us, tsc_time=997708 us, diff=2348 us

Info from dmesg:
tsc: Refined TSC clocksource calibration: 2394.365 MHz
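
As a cross-check, the two frequencies quoted above can be put into the conversion directly. The sketch below assumes the counter really ticks at the kernel's refined value (2394.365 MHz) while the cycle count is divided by the 2400 MHz value from rte_get_tsc_hz. The function name apparent_tsc_us is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Time that a cycles/assumed_khz conversion would report for an interval of
 * real_us microseconds if the counter really ticks at actual_khz.
 * The result is rounded to the nearest microsecond.
 */
static uint64_t apparent_tsc_us(uint64_t real_us, uint64_t actual_khz,
                                uint64_t assumed_khz)
{
    assert(assumed_khz != 0);
    return (real_us * actual_khz + assumed_khz / 2) / assumed_khz;
}
```

With the numbers from this report, apparent_tsc_us(1000056, 2394365, 2400000) evaluates to 997708, the same value as tsc_time in the test run, so the 2348 us difference is what a 2400 MHz assumption applied to a 2394.365 MHz counter would produce.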

We don't see the time difference on:
Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
Linux kernel 4.13.0-16-generic

but we do observe it on:
Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
Linux kernel 5.4.0-100-generic
Comment 1 Morten Brørup 2022-05-24 16:43:27 CEST
This is probably a Linux kernel problem. I have an old AMD APU motherboard where the kernel's initial clock calibration function (which uses the PIT as reference) sometimes gets the CPU frequency slightly wrong. We found that the kernel's calibration routine in tsc.c times out very quickly and accepts quite a lot of inaccuracy, at least in older kernel versions.

If possible, could you compare to an external reference clock, to tell which of the two clocks is wrong (clock_gettime or tsc_time)? Let the test run for an hour, and compare to your wrist watch. 2.3 permille is about 8 seconds per hour.

PS: I assume you are using CLOCK_MONOTONIC_RAW, so your results are not affected by NTP or similar.
Comment 2 Maria Lingemark 2022-05-24 17:00:07 CEST
(In reply to Morten Brørup from comment #1)
> This is probably a Linux kernel problem. I have an old AMD APU motherboard
> where the kernel's initial clock calibration function (which uses the PIT as
> reference) sometimes gets the CPU frequency slightly wrong. We found out
> that kernel's calibration routine in tsc.c very quickly times out, and
> accepts quite a lot of inaccuracy - at least in older kernel versions.
> 
> If possible, could you compare to an external reference clock, to tell which
> of the two clocks are wrong (clock_gettime or tsc_time)? Let the test run
> for an hour, and compare to your wrist watch. 2.3 permille is 8 seconds per
> hour.
> 
> PS: I assume you are using CLOCK_MONOTONIC_RAW, so your results are not
> affected by NTP or similar.

We did a test, running for 30 minutes with a phone stopwatch.

clock_gettime=1800000060 us, tsc_time=1795773149 us, diff=4226911 us

Phone time: 30.00.25

This points to clock_gettime being the correct one of the two.

This test was done with CLOCK_MONOTONIC, but we have done the shorter tests with both CLOCK_MONOTONIC and CLOCK_MONOTONIC_RAW with the same results.
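
The comparison against the phone can be made explicit with a small helper. This is only a sketch with hypothetical function names. It assumes that "30.00.25" means 30 minutes and 0.25 seconds, i.e. 1800250000 us, which the report does not state outright.

```c
#include <assert.h>
#include <stdint.h>

/* Signed deviation of a measured interval from a reference, in microseconds. */
static int64_t deviation_us(uint64_t measured_us, uint64_t reference_us)
{
    return (int64_t)measured_us - (int64_t)reference_us;
}

/* The same deviation in parts per million of the reference (truncated). */
static int64_t deviation_ppm(uint64_t measured_us, uint64_t reference_us)
{
    assert(reference_us != 0);
    return deviation_us(measured_us, reference_us) * 1000000
           / (int64_t)reference_us;
}
```

Under that assumption, clock_gettime is about 0.25 s (138 ppm) short of the phone, which manual start and stop of a stopwatch could plausibly explain, while tsc_time is about 4.48 s (2486 ppm) short. Comparing tsc_time to clock_gettime directly gives -2348 ppm, in line with the ~2.3 ms per second seen in the 1 second runs.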
