Bug #711

closed

dntpd

Added by pavalos over 17 years ago. Updated over 17 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
Start date:
Due date:
% Done:

0%

Estimated time:

Description

This is crazy:

  # grep ntpd /var/log/all.log
    Jun 24 02:06:39 ylem dntpd629: issuing frequency adjustment: 338.189ppm
    Jun 24 02:22:27 ylem dntpd629: issuing offset adjustment: 3.743491
    Jun 24 02:53:35 ylem dntpd629: issuing offset adjustment: 2.301601
    Jun 24 06:45:12 ylem dntpd629: issuing frequency adjustment: 554.987ppm
    Jun 24 11:09:01 ylem dntpd629: issuing frequency adjustment: 339.188ppm
    Jun 24 14:13:07 ylem dntpd629: issuing COARSE offset adjustment: -3587.777365, 24-Jun-2007 14:13:07.027
    Jun 24 14:37:39 ylem dntpd629: issuing frequency adjustment: 1447.123ppm
    Jun 24 15:48:12 ylem dntpd629: issuing COARSE offset adjustment: 3598.513848, 24-Jun-2007 15:48:12.709
    Jun 24 15:53:26 ylem dntpd629: issuing frequency adjustment: 1358.734ppm
    Jun 24 15:58:43 ylem dntpd629: issuing frequency adjustment: 1290.414ppm
    Jun 24 16:04:00 ylem dntpd629: issuing frequency adjustment: 1240.480ppm
    Jun 24 16:09:15 ylem dntpd629: issuing frequency adjustment: 1200.177ppm
    Jun 24 16:14:31 ylem dntpd629: issuing frequency adjustment: 1166.282ppm

There's no servers directive, so here's what dntpd.conf looks like:

  # cat /etc/dntpd.conf
    server us.pool.ntp.org
    server us.pool.ntp.org
    server us.pool.ntp.org
    server us.pool.ntp.org

Operating on -HEAD sources as of June 22.

--Peter

Actions #1

Updated by dillon over 17 years ago

:This is crazy:
:
:# grep ntpd /var/log/all.log
:Jun 24 02:06:39 ylem dntpd629: issuing frequency adjustment: 338.189ppm
:Jun 24 02:22:27 ylem dntpd629: issuing offset adjustment: 3.743491
:...
:
:There's no servers directive, so here's what dntpd.conf looks like:
:
:# cat /etc/dntpd.conf
:server us.pool.ntp.org
:server us.pool.ntp.org
:server us.pool.ntp.org
:server us.pool.ntp.org
:
:Operating on -HEAD sources as of June 22.

Does your machine have a history of problems keeping time or is this
something new? Have you changed your video settings or operation
recently?... video bioses are known to mess up 8254 based timing
(which also means you are running without ACPI). What kind of machine
is it?

It kinda sounds like the bios is interfering with the 8254, and not
something broken with dntpd. If that is the case then you can tell
the kernel to use a different 8254 timer with:

    hw.i8254.walltimer=1

in your /boot/loader.conf.  Sometimes this works, sometimes this
locks up the box. If it locks up the box you will have to reboot,
break out of the boot menu and unset the variable in the loader before
booting so you can boot your kernel and then remove the line.
-Matt
Matthew Dillon
Actions #2

Updated by dillon over 17 years ago

:This is crazy:
:
:# grep ntpd /var/log/all.log
:Jun 24 02:06:39 ylem dntpd629: issuing frequency adjustment: 338.189ppm
:Jun 24 02:22:27 ylem dntpd629: issuing offset adjustment: 3.743491
:Jun 24 02:53:35 ylem dntpd629: issuing offset adjustment: 2.301601
:Jun 24 06:45:12 ylem dntpd629: issuing frequency adjustment: 554.987ppm
:Jun 24 11:09:01 ylem dntpd629: issuing frequency adjustment: 339.188ppm
:Jun 24 14:13:07 ylem dntpd629: issuing COARSE offset adjustment: -3587.777365, 24-Jun-2007 14:13:07.027
:Jun 24 14:37:39 ylem dntpd629: issuing frequency adjustment: 1447.123ppm
:Jun 24 15:48:12 ylem dntpd629: issuing COARSE offset adjustment: 3598.513848, 24-Jun-2007 15:48:12.709
:Jun 24 15:53:26 ylem dntpd629: issuing frequency adjustment: 1358.734ppm
:Jun 24 15:58:43 ylem dntpd629: issuing frequency adjustment: 1290.414ppm
:Jun 24 16:04:00 ylem dntpd629: issuing frequency adjustment: 1240.480ppm
:Jun 24 16:09:15 ylem dntpd629: issuing frequency adjustment: 1200.177ppm
:Jun 24 16:14:31 ylem dntpd629: issuing frequency adjustment: 1166.282ppm
:
:There's no servers directive, so here's what dntpd.conf looks like:

Hmm.  Wait a second... 3587?  3598?  That's 3600 seconds == one hour.
It sounds like one or more servers is off by an hour. But the ppm
adjustments are also pretty crazy. It's over 1 millisecond per second!

Did you reboot in between any of those reports?  It could be you told
the kernel the real time clock was operating in local time instead of
GMT and the kernel isn't writing it out properly. Do you have
a /etc/wall_cmos_clock file? If you do, that could be the problem.

It's possible that I am not parsing something properly, but insofar as
I know the NTP protocol just passes a raw timestamp.
-Matt
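
The arithmetic behind those two observations can be checked with a short sketch. The function names and the 60-second tolerance are illustrative assumptions, not anything taken from dntpd itself.

```python
def ppm_to_ms_per_second(ppm):
    """Clock drift in milliseconds per second for a frequency error in ppm."""
    return ppm * 1e-6 * 1000.0


def ppm_to_seconds_per_day(ppm):
    """Clock drift in seconds per day for a frequency error in ppm."""
    return ppm * 1e-6 * 86400.0


def looks_like_hour_error(offset_seconds, tolerance=60.0):
    """True if |offset| is within `tolerance` seconds of a whole,
    non-zero number of hours (tolerance is an arbitrary assumption)."""
    magnitude = abs(offset_seconds)
    hours = round(magnitude / 3600.0)
    return hours >= 1 and abs(magnitude - hours * 3600.0) <= tolerance


if __name__ == "__main__":
    # Values copied from the log excerpt in the report.
    print(ppm_to_ms_per_second(1447.123))      # drift per second, in ms
    print(ppm_to_seconds_per_day(1447.123))    # drift per day, in seconds
    for offset in (-3587.777365, 3598.513848, 3.743491):
        print(offset, looks_like_hour_error(offset))
```

A 1447 ppm correction works out to roughly 1.4 ms of drift per second, or about two minutes per day, and both COARSE adjustments fall within a few seconds of one hour.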
Actions #3

Updated by pavalos over 17 years ago

This is new. The only change I made was I replaced one of my hard drives
with a bigger one.

The machine is a dual Xeon. I'm not using ACPI. I suppose I'll try the
walltimer.

--Peter

Actions #4

Updated by dillon over 17 years ago

:This is new. The only change I made was I replaced one of my hard drives
:with a bigger one.
:
:The machine is a dual Xeon. I'm not using ACPI. I suppose I'll try the
:walltimer.
:
:--Peter

Try specifying a single NTP server.  It could be that one of the
servers in the pool is broken and causing dntpd to flip back and forth.
-Matt
Matthew Dillon
Actions #5

Updated by pavalos over 17 years ago

Like I said...crazy.

No reboots and I don't have a wall_cmos_clock. Basically...nothing changed
afaict.

Here's a grep of the past week. That disk was replaced on the 22nd.

Jun 24 02:06:39 ylem dntpd629: issuing frequency adjustment: 338.189ppm
Jun 24 02:22:27 ylem dntpd629: issuing offset adjustment: 3.743491
Jun 24 02:53:35 ylem dntpd629: issuing offset adjustment: 2.301601
Jun 24 06:45:12 ylem dntpd629: issuing frequency adjustment: 554.987ppm
Jun 24 11:09:01 ylem dntpd629: issuing frequency adjustment: 339.188ppm
Jun 24 14:13:07 ylem dntpd629: issuing COARSE offset adjustment: -3587.777365, 24-Jun-2007 14:13:07.027
Jun 24 14:37:39 ylem dntpd629: issuing frequency adjustment: 1447.123ppm
Jun 24 15:48:12 ylem dntpd629: issuing COARSE offset adjustment: 3598.513848, 24-Jun-2007 15:48:12.709
Jun 24 15:53:26 ylem dntpd629: issuing frequency adjustment: 1358.734ppm
Jun 24 15:58:43 ylem dntpd629: issuing frequency adjustment: 1290.414ppm
Jun 24 16:04:00 ylem dntpd629: issuing frequency adjustment: 1240.480ppm
Jun 24 16:09:15 ylem dntpd629: issuing frequency adjustment: 1200.177ppm
Jun 24 16:14:31 ylem dntpd629: issuing frequency adjustment: 1166.282ppm
Jun 24 16:46:08 ylem dntpd629: issuing frequency adjustment: 1082.083ppm

Jun 23 00:02:40 ylem dntpd629: issuing frequency adjustment: 26792.744ppm
Jun 23 04:31:50 ylem dntpd629: issuing COARSE offset adjustment: -346.983869, 23-Jun-2007 04:31:50.889
Jun 23 04:31:50 ylem dntpd629: issuing frequency adjustment: 141.977ppm
Jun 23 04:52:45 ylem dntpd629: issuing offset adjustment: -58.732077
Jun 23 09:03:38 ylem dntpd629: issuing offset adjustment: -0.081487
Jun 23 09:24:38 ylem dntpd629: issuing offset adjustment: -0.006328
Jun 23 09:45:41 ylem dntpd629: issuing offset adjustment: 0.010649
Jun 23 11:41:10 ylem dntpd629: issuing offset adjustment: -0.006669
Jun 23 13:31:10 ylem dntpd629: issuing offset adjustment: 0.003460
Jun 23 13:46:52 ylem dntpd629: issuing offset adjustment: 0.004132
Jun 23 15:00:28 ylem dntpd629: issuing offset adjustment: -0.008398
Jun 23 16:50:32 ylem dntpd629: issuing offset adjustment: 0.011628
Jun 23 17:47:58 ylem dntpd629: issuing offset adjustment: -0.003545
Jun 23 18:08:53 ylem dntpd629: issuing offset adjustment: 0.005953
Jun 23 19:22:22 ylem dntpd629: issuing offset adjustment: -0.009702
Jun 23 21:12:38 ylem dntpd629: issuing offset adjustment: 0.010340

Jun 22 16:26:31 ylem dntpd73041: issuing frequency adjustment: 3731.489ppm
Jun 22 16:31:49 ylem dntpd73041: issuing frequency adjustment: 3798.854ppm
Jun 22 16:37:08 ylem dntpd73041: issuing frequency adjustment: 3862.556ppm
Jun 22 16:42:29 ylem dntpd73041: issuing frequency adjustment: 3936.830ppm
Jun 22 16:47:46 ylem dntpd73041: issuing frequency adjustment: 4010.071ppm
Jun 22 16:53:06 ylem dntpd73041: issuing frequency adjustment: 4071.539ppm
Jun 22 17:04:55 ylem kernel: Starting dntpd.
Jun 22 17:04:55 ylem dntpd616: dntpd version 1.0 started
Jun 22 17:05:12 ylem dntpd616: issuing offset adjustment: 29.684924
Jun 22 17:10:45 ylem dntpd616: issuing frequency adjustment: 148.342ppm
Jun 22 19:45:46 ylem kernel: Starting dntpd.
Jun 22 19:45:46 ylem dntpd614: dntpd version 1.0 started
Jun 22 19:46:03 ylem dntpd614: issuing offset adjustment: 1.346819
Jun 22 19:53:13 ylem kernel: Starting dntpd.
Jun 22 19:53:13 ylem dntpd612: dntpd version 1.0 started
Jun 22 19:53:30 ylem dntpd612: issuing offset adjustment: 1.449639
Jun 22 19:59:04 ylem dntpd612: issuing frequency adjustment: 145.648ppm
Jun 22 20:16:35 ylem kernel: Starting dntpd.
Jun 22 20:16:36 ylem dntpd613: dntpd version 1.0 started
Jun 22 20:16:52 ylem dntpd613: issuing offset adjustment: 1.421246
Jun 22 20:24:44 ylem kernel: Starting dntpd.
Jun 22 20:24:44 ylem dntpd618: dntpd version 1.0 started
Jun 22 20:25:01 ylem dntpd618: issuing offset adjustment: 1.743558
Jun 22 20:30:35 ylem dntpd618: issuing frequency adjustment: 312.421ppm
Jun 22 20:41:05 ylem dntpd618: issuing frequency adjustment: 277.720ppm
Jun 22 21:44:44 ylem kernel: Starting dntpd.
Jun 22 21:44:44 ylem dntpd615: dntpd version 1.0 started
Jun 22 21:45:01 ylem dntpd615: issuing offset adjustment: 2.932254
Jun 22 21:53:59 ylem kernel: Starting dntpd.
Jun 22 21:53:59 ylem dntpd616: dntpd version 1.0 started
Jun 22 21:54:15 ylem dntpd616: issuing offset adjustment: 4.018087
Jun 22 22:04:30 ylem kernel: Starting dntpd.
Jun 22 22:04:31 ylem dntpd629: dntpd version 1.0 started
Jun 22 22:04:47 ylem dntpd629: issuing offset adjustment: 4.841953
Jun 22 22:10:19 ylem dntpd629: issuing frequency adjustment: 474.179ppm
Jun 22 22:57:41 ylem dntpd629: issuing frequency adjustment: 25446.670ppm
Jun 22 23:08:29 ylem dntpd629: issuing frequency adjustment: 30413.801ppm
Jun 22 23:13:53 ylem dntpd629: issuing frequency adjustment: 31116.993ppm
Jun 22 23:19:18 ylem dntpd629: issuing frequency adjustment: 31014.725ppm
Jun 22 23:24:45 ylem dntpd629: issuing frequency adjustment: 30421.985ppm
Jun 22 23:30:13 ylem dntpd629: issuing frequency adjustment: 29545.499ppm

Jun 21 00:48:49 ylem dntpd73041: issuing frequency adjustment: 465.512ppm
Jun 21 02:50:29 ylem dntpd73041: issuing frequency adjustment: 629.095ppm
Jun 21 05:07:37 ylem dntpd73041: issuing offset adjustment: 5.351704
Jun 21 05:07:37 ylem dntpd73041: issuing frequency adjustment: 444.871ppm
Jun 21 05:44:10 ylem dntpd73041: issuing offset adjustment: -1.266207
Jun 21 06:21:11 ylem dntpd73041: issuing offset adjustment: -0.644232
Jun 21 07:51:09 ylem dntpd73041: issuing frequency adjustment: 387.807ppm
Jun 21 08:12:14 ylem dntpd73041: issuing frequency adjustment: 297.753ppm
Jun 21 08:28:01 ylem dntpd73041: issuing offset adjustment: -1.392859
Jun 21 11:48:13 ylem dntpd73041: issuing frequency adjustment: 232.549ppm
Jun 21 11:48:51 ylem dntpd73041: issuing frequency adjustment: 264.763ppm
Jun 21 13:24:00 ylem dntpd73041: issuing frequency adjustment: 231.455ppm
Jun 21 16:18:03 ylem dntpd73041: issuing frequency adjustment: 158.302ppm
Jun 21 18:40:52 ylem dntpd73041: issuing offset adjustment: -1.510266
Jun 21 19:06:29 ylem dntpd73041: issuing offset adjustment: -0.355233

Jun 20 02:42:43 ylem dntpd73041: issuing offset adjustment: 0.002615
Jun 20 03:03:51 ylem dntpd73041: issuing offset adjustment: -0.001543
Jun 20 03:51:22 ylem dntpd73041: issuing offset adjustment: 0.003253
Jun 20 05:52:59 ylem dntpd73041: issuing offset adjustment: -0.004461
Jun 20 07:54:33 ylem dntpd73041: issuing offset adjustment: -0.001060
Jun 20 08:15:32 ylem dntpd73041: issuing offset adjustment: -0.005544
Jun 20 09:08:23 ylem dntpd73041: issuing offset adjustment: 0.007024
Jun 20 11:09:51 ylem dntpd73041: issuing offset adjustment: -0.005646
Jun 20 13:11:34 ylem dntpd73041: issuing offset adjustment: 0.006287
Jun 20 13:32:27 ylem dntpd73041: issuing offset adjustment: -0.002789
Jun 20 14:25:22 ylem dntpd73041: issuing offset adjustment: 0.002805
Jun 20 19:52:41 ylem dntpd73041: issuing frequency adjustment: 289.730ppm
Jun 20 21:38:19 ylem dntpd73041: issuing frequency adjustment: 333.651ppm
Jun 20 23:08:22 ylem dntpd73041: issuing frequency adjustment: 292.546ppm
Jun 20 23:13:39 ylem dntpd73041: issuing frequency adjustment: 329.224ppm
Jun 20 23:40:03 ylem dntpd73041: issuing frequency adjustment: 305.610ppm

Jun 18 02:01:14 ylem dntpd73041: issuing offset adjustment: -2.026739
Jun 18 02:28:23 ylem dntpd73041: issuing offset adjustment: -0.591158
Jun 18 03:05:25 ylem dntpd73041: issuing offset adjustment: 0.003780
Jun 18 05:12:17 ylem dntpd73041: issuing offset adjustment: -0.003959
Jun 18 17:17:45 ylem dntpd73041: issuing offset adjustment: 0.013625

Jun 17 02:08:53 ylem dntpd73041: issuing offset adjustment: 0.346366
Jun 17 02:36:34 ylem dntpd73041: issuing offset adjustment: -0.153339
Jun 17 09:18:10 ylem dntpd73041: issuing offset adjustment: -0.034912
Jun 17 09:40:26 ylem dntpd73041: issuing offset adjustment: -0.007264
Jun 17 15:09:15 ylem dntpd73041: issuing frequency adjustment: 351.922ppm
Jun 17 16:29:21 ylem dntpd73041: issuing frequency adjustment: 406.836ppm
Jun 17 18:51:17 ylem dntpd73041: issuing offset adjustment: 3.181001
Jun 17 19:18:45 ylem dntpd73041: issuing offset adjustment: -0.341387
Jun 17 21:15:01 ylem dntpd73041: issuing frequency adjustment: 285.856ppm
Jun 17 23:37:42 ylem dntpd73041: issuing frequency adjustment: 163.266ppm

Jun 16 02:13:41 ylem dntpd73041: issuing offset adjustment: 0.577109
Jun 16 03:13:23 ylem dntpd73041: issuing offset adjustment: -0.114155
Jun 16 04:53:48 ylem dntpd73041: issuing offset adjustment: -0.005487
Jun 16 06:50:05 ylem dntpd73041: issuing offset adjustment: 0.005330
Jun 16 07:12:46 ylem dntpd73041: issuing offset adjustment: -0.007755
Jun 16 08:00:22 ylem dntpd73041: issuing offset adjustment: 0.007686
Jun 16 09:40:45 ylem dntpd73041: issuing offset adjustment: -0.006718
Jun 16 16:36:24 ylem dntpd73041: issuing frequency adjustment: 658.908ppm
Jun 16 18:43:20 ylem dntpd73041: issuing frequency adjustment: 548.854ppm
Jun 16 19:15:08 ylem dntpd73041: issuing frequency adjustment: 524.173ppm
Jun 16 21:22:10 ylem dntpd73041: issuing frequency adjustment: 164.604ppm

Actions #6

Updated by pavalos over 17 years ago

Doesn't dntpd sense that a server is broken and not use it?

Does openntpd do that?

--Peter

Actions #7

Updated by dillon over 17 years ago

:Doesn't dntpd sense that a server is broken and not use it?
:
:Does openntpd do that?
:
:--Peter

How does one sense that a server is broken?  Maybe with enough
sources we can detect that one is seriously out of line with the
others. In fact, that would be pretty easy to do, I'll look into
it.
Try running it with a single server line.  I ran a test on my test
box with 5 server us.pool.ntp.org lines and this is what I got:

Jun 24 14:28:39 test28 dntpd858: dntpd version 1.0 started
Jun 24 14:28:56 test28 dntpd858: issuing offset adjustment: -0.005709
Jun 24 14:34:29 test28 dntpd858: issuing frequency adjustment: -111.292ppm
Jun 24 14:45:01 test28 dntpd858: issuing offset adjustment: -0.017869
Jun 24 15:05:58 test28 dntpd858: issuing offset adjustment: 0.028012
Jun 24 15:26:57 test28 dntpd858: issuing offset adjustment: -0.005782
Jun 24 15:48:01 test28 dntpd858: issuing offset adjustment: 0.011286
Jun 24 17:27:57 test28 dntpd858: issuing offset adjustment: -0.010092
Jun 24 19:34:00 test28 dntpd858: issuing offset adjustment: 0.013780
Jun 24 19:54:58 test28 dntpd858: issuing offset adjustment: -0.025465
Jun 24 20:16:00 test28 dntpd858: issuing offset adjustment: 0.021708
Jun 24 22:21:59 test28 dntpd858: issuing offset adjustment: -0.006082
Jun 25 00:01:37 test28 dntpd858: issuing offset adjustment: -0.021242
Jun 25 00:22:41 test28 dntpd858: issuing offset adjustment: 0.016988
Jun 25 01:09:56 test28 dntpd858: issuing offset adjustment: 0.008061
Jun 25 03:15:57 test28 dntpd858: issuing offset adjustment: 0.001252
Jun 25 04:29:16 test28 dntpd858: issuing offset adjustment: -0.024935
Jun 25 04:50:15 test28 dntpd858: issuing offset adjustment: -0.002704
Jun 25 06:03:40 test28 dntpd858: issuing offset adjustment: 0.024320
Jun 25 08:09:44 test28 dntpd858: issuing offset adjustment: -0.027622
Jun 25 08:56:38 test28 dntpd858: issuing offset adjustment: 0.012390
Jun 25 09:17:37 test28 dntpd858: issuing offset adjustment: 0.008970

Normally, with a single server line, it issues a frequency correction
and one or two offset adjustments and that's it for the next day or two.
With several server lines it will select the one with the best
stability once or twice an hour, which can ping pong around. That's
ok... the above lines are not bad at all, usually corrections are
only a few milliseconds either way.
-Matt
Matthew Dillon
Actions #8

Updated by dillon over 17 years ago

:No reboots and I don't have a wall_cmos_clock. Basically...nothing changed
:afaict.
:
:Here's a grep of the past week. That disk was replaced on the 22nd.

Well, your machine's timebase is not very stable, even looking at
the earliest reports. It's not totally bogus, but it definitely looks
like the frequency is thermally unstable. What is it using as
the system clock? Do this:

    sysctl kern.cputimer

I really can't be sure until you run dntpd with just a single server
line, but usually the frequency adjustment is server-agnostic (won't
ping pong around with different servers), even if the time adjustment
does ping pong around a few milliseconds here and there.

A stable timebase should stick to less than a 50 ppm range.

:Jun 20 19:52:41 ylem dntpd73041: issuing frequency adjustment: 289.730ppm
:Jun 20 21:38:19 ylem dntpd73041: issuing frequency adjustment: 333.651ppm
:Jun 20 23:08:22 ylem dntpd73041: issuing frequency adjustment: 292.546ppm
:Jun 20 23:13:39 ylem dntpd73041: issuing frequency adjustment: 329.224ppm
:Jun 20 23:40:03 ylem dntpd73041: issuing frequency adjustment: 305.610ppm
:Jun 17 15:09:15 ylem dntpd73041: issuing frequency adjustment: 351.922ppm
:Jun 17 16:29:21 ylem dntpd73041: issuing frequency adjustment: 406.836ppm
:Jun 17 21:15:01 ylem dntpd73041: issuing frequency adjustment: 285.856ppm
:Jun 17 23:37:42 ylem dntpd73041: issuing frequency adjustment: 163.266ppm
:Jun 16 16:36:24 ylem dntpd73041: issuing frequency adjustment: 658.908ppm
:Jun 16 18:43:20 ylem dntpd73041: issuing frequency adjustment: 548.854ppm
:Jun 16 19:15:08 ylem dntpd73041: issuing frequency adjustment: 524.173ppm
:Jun 16 21:22:10 ylem dntpd73041: issuing frequency adjustment: 164.604ppm

:Jun 18 15:19:28 test28 dntpd611: issuing frequency adjustment: -99.871ppm
:Jun 18 15:49:00 test28 dntpd611: issuing frequency adjustment: -107.887ppm
:Jun 18 16:11:40 test28 dntpd611: issuing frequency adjustment: -99.721ppm
:Jun 18 18:27:28 test28 dntpd612: issuing frequency adjustment: -104.664ppm
:Jun 17 02:04:02 test28 dntpd611: issuing frequency adjustment: -104.562ppm
:Jun 17 02:15:20 test28 dntpd611: issuing frequency adjustment: -105.193ppm
:Jun 17 02:23:51 test28 dntpd611: issuing frequency adjustment: -104.339ppm
:Jun 17 02:41:41 test28 dntpd612: issuing frequency adjustment: -90.960ppm
:Jun 17 03:36:29 test28 dntpd611: issuing frequency adjustment: -104.870ppm
:Jun 17 15:48:14 test28 dntpd612: issuing frequency adjustment: -106.046ppm
:Jun 16 18:11:07 test28 dntpd611: issuing frequency adjustment: -100.191ppm
:Jun 16 19:09:55 test28 dntpd611: issuing frequency adjustment: -116.056ppm
:Jun 16 19:58:46 test28 dntpd611: issuing frequency adjustment: -101.668ppm

I'm going to do some work on dntpd to try to correct two unrelated
issues. First I'll try to have it detect a bad time source when
several are available. Second, I'll have it re-run the DNS lookup
if a server stops responding and I'll have it detect duplicate IPs.
-Matt
Matthew Dillon
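
As a rough way to apply the 50 ppm rule of thumb to logs like the ones quoted above, the sketch below pulls the frequency adjustments out of syslog lines and reports their spread. The regular expression matches the log format as it appears in this thread; the helper names are hypothetical.

```python
import re

# Matches e.g. "issuing frequency adjustment: -107.887ppm"
FREQ_RE = re.compile(r"issuing frequency adjustment: (-?\d+(?:\.\d+)?)ppm")


def frequency_adjustments(log_lines):
    """Return every frequency adjustment (in ppm) found in the lines."""
    values = []
    for line in log_lines:
        match = FREQ_RE.search(line)
        if match:
            values.append(float(match.group(1)))
    return values


def ppm_range(values):
    """Spread between the largest and smallest adjustment, in ppm."""
    if not values:
        raise ValueError("no frequency adjustments found")
    return max(values) - min(values)


# Log lines copied from this thread (Jun 16 on ylem, Jun 18 on test28).
YLEM = """\
Jun 16 16:36:24 ylem dntpd73041: issuing frequency adjustment: 658.908ppm
Jun 16 18:43:20 ylem dntpd73041: issuing frequency adjustment: 548.854ppm
Jun 16 19:15:08 ylem dntpd73041: issuing frequency adjustment: 524.173ppm
Jun 16 21:22:10 ylem dntpd73041: issuing frequency adjustment: 164.604ppm
"""

TEST28 = """\
Jun 18 15:19:28 test28 dntpd611: issuing frequency adjustment: -99.871ppm
Jun 18 15:49:00 test28 dntpd611: issuing frequency adjustment: -107.887ppm
Jun 18 16:11:40 test28 dntpd611: issuing frequency adjustment: -99.721ppm
Jun 18 18:27:28 test28 dntpd612: issuing frequency adjustment: -104.664ppm
"""

if __name__ == "__main__":
    for name, text in (("ylem", YLEM), ("test28", TEST28)):
        spread = ppm_range(frequency_adjustments(text.splitlines()))
        print(f"{name}: {spread:.3f} ppm spread")
```

On these excerpts ylem spreads over several hundred ppm within a single day while test28 stays under 10 ppm, which is the contrast being drawn between the two machines.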
Actions #9

Updated by corecode over 17 years ago

How about taking the median of offsets to all servers which are considered "good"? Good would be something like "off less than two times the standard deviation" or so.

cheers
simon
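
A minimal sketch of that suggestion, assuming one offset per server and a population standard deviation; the sample offsets are made up for illustration and the function name is hypothetical.

```python
import statistics


def median_of_good_offsets(offsets, k=2.0):
    """Median of the offsets that lie within k standard deviations
    of the mean of all offsets (k=2 follows the suggestion above)."""
    mean = statistics.mean(offsets)
    sd = statistics.pstdev(offsets)
    good = [o for o in offsets if abs(o - mean) <= k * sd]
    return statistics.median(good)


if __name__ == "__main__":
    # Hypothetical offsets in seconds: three sane servers, one an hour off.
    offsets = [0.002, -0.004, 0.001, 3598.5]
    print(median_of_good_offsets(offsets))
```

One caveat worth noting: with only four sources a single outlier can never be more than 1.5 population standard deviations from the mean, so the two-sigma filter keeps it. The median still lands between the agreeing servers in that case, so the result is sane, but the filter itself only starts rejecting a lone outlier once there are six or more sources.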

Actions #10

Updated by dillon over 17 years ago

:Matthew Dillon wrote:
:> I'm going to do some work on dntpd to try to correct two unrelated
:> issues. First I'll try to have it detect a bad time source when
:> several are available. Second, I'll have it re-run the DNS lookup
:> if a server stops responding and I'll have it detect duplicate IPs.
:
:How about taking the median of offsets to all servers which are considered "good"? Good would be something like "off less than two times the standard deviation" or so.
:
:cheers
: simon

I don't think you can quantify the accuracy of a median offset versus
a statistically good offset returned by a single server, so no.

What I did, among other things, was add a quorum calculation to the
code. Now it chooses the best server and then requires that a quorum
of servers agree that the offset reported by that server is reasonable.

There might be a way to algorithmically reduce offset errors (due to
packet lag) relative to real time given multiple good sources, but I
haven't researched it.

Another thing we can do is 'lock' to a particular server as long as
its accuracy does not deviate from the current best available by too
much. I have not coded something like that yet.
-Matt
Matthew Dillon
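
The quorum idea described above might look roughly like the following. This is not the dntpd implementation (which is written in C); the majority rule, the fallback to the next candidate, and all names here are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Source:
    name: str
    offset: float  # measured offset in seconds
    stddev: float  # stability measure in seconds, lower is better


def select_with_quorum(sources, max_deviation):
    """Try candidates from most to least stable and return the first one
    whose offset a majority of sources (itself included) agrees with.

    Assumptions: "agrees" means within max_deviation seconds, a quorum is
    a simple majority, and a failed candidate falls through to the next.
    """
    needed = len(sources) // 2 + 1
    for candidate in sorted(sources, key=lambda s: s.stddev):
        agreeing = sum(
            1 for s in sources
            if abs(s.offset - candidate.offset) <= max_deviation
        )
        if agreeing >= needed:
            return candidate
    return None  # no candidate reached a quorum


if __name__ == "__main__":
    # Hypothetical data: the most stable source is an hour off.
    sources = [
        Source("bad", 3598.5, 0.00001),
        Source("a", 0.002, 0.0004),
        Source("b", -0.004, 0.0002),
        Source("c", 0.001, 0.001),
    ]
    chosen = select_with_quorum(sources, max_deviation=0.5)
    print(chosen.name if chosen else "no quorum")
```

The point of the example is that stability alone would pick "bad", while the agreement check rejects it because no other source reports a similar offset.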
Actions #11

Updated by pavalos over 17 years ago

  # sysctl kern.cputimer
    kern.cputimer.freq: 1193182
    kern.cputimer.clock: 419209229
    kern.cputimer.name: i8254_timer2
    kern.cputimer.select: i8254_timer2 dummy

--Peter

Actions #12

Updated by corecode over 17 years ago

I don't understand you there. What is the definition of a "good offset"? Without knowledge of the real time, we have a set of offsets, in the best case all about the same. There the median does not hurt. If we have offsets which are spaced more, which one should we choose? Probably one not on the outer edges of the distribution -- the median again gives good results.

Or maybe I am misunderstanding what you're telling me.

Ah, I've read your code now. I see we might be talking about different metrics for "good offsets":

1. stable offset from the server, i.e. little jitter
2. correct time from the server

you are checking for (1) in client_check(). The quorum check might be sufficient for (2), however when declaring a server as "insane", we might be losing information. Of course the +/- 30 seconds tolerance right now is way too high, best would be sub-second.

The point is, the best jitter-free time source is worth nothing if it is off one second. Yes, when running an ntpd which also does frequency corrections, I want to have an exact time: otherwise, I could simply run ntpdate every hour from cron.

So how do we select the best time source? First, it needs to be the right offset, and this needs to be jitter-free. So what I think we could do is the following:

1. Strip insane servers, i.e. those who are way off the average.
2. Select the median offset of the remaining servers' samples (not averages)
3. Now however, we might have selected a jittery source, so search up
and down to find a "better" server: one which has the best samples. I'd try
a sum of quadratic differences to the selected "best tentative offset (2)"
to select the best server (jittery servers will more likely drop out unless
they are significantly closer to the offset).
4. using this selected server, I'd take the median (never the average, averages
smear errors) of the samples of this server.

Maybe we should run some traces on the received packets and then evaluate different algorithms (best tracking the real time using a radio clock in parallel).

cheers
simon
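
The four steps above can be sketched as follows. Two things are my own substitutions rather than part of the proposal: step 1 compares each server against the median of the per-server medians instead of the average (a plain average gets dragged toward a server that is an hour off), and step 3 uses the mean rather than the sum of squared differences so that servers with more samples are not penalised. The data and names are hypothetical.

```python
import statistics


def pick_offset(samples_by_server, insane_limit):
    """Return (server_name, offset) following the four proposed steps."""
    # Step 1: strip servers whose typical offset is far from the rest.
    typical = {n: statistics.median(s) for n, s in samples_by_server.items()}
    centre = statistics.median(list(typical.values()))
    sane = {
        n: s for n, s in samples_by_server.items()
        if abs(typical[n] - centre) <= insane_limit
    }

    # Step 2: tentative offset = median over all remaining raw samples.
    pooled = [x for samples in sane.values() for x in samples]
    tentative = statistics.median(pooled)

    # Step 3: prefer the server whose samples sit closest to that offset.
    def spread(samples):
        return sum((x - tentative) ** 2 for x in samples) / len(samples)

    best = min(sane, key=lambda n: spread(sane[n]))

    # Step 4: use the median (not the average) of that server's samples.
    return best, statistics.median(sane[best])


if __name__ == "__main__":
    samples = {
        "steady": [0.0010, 0.0012, 0.0011],
        "jittery": [-0.020, 0.030, 0.004],
        "shifted": [0.0150, 0.0151, 0.0149],
        "hour_off": [3598.51, 3598.52, 3598.50],
    }
    print(pick_offset(samples, insane_limit=0.5))
```

With this made-up data the hour-off server is dropped in step 1, the pooled median comes from the jittery server, and step 3 then moves the choice to the steady server nearby.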

Actions #13

Updated by dillon over 17 years ago

:you are checking for (1) in client_check(). The quorum check might be sufficient
:for (2), however when declaring a server as "insane", we might be losing
:information. Of course the +/- 30 seconds tolerance right now is way too high,
:best would be sub-second.

And so I say in the comment.  Sub-second is definitely what we want
there, but I don't want to blow it up for people on modem connections
so I'm thinking something like 0.5 seconds. It's just a sanity check,
really.

:The point is, the best jitter-free time source is worth nothing if it is off
:one second. Yes, when running an ntpd which also does frequency corr
:...
:2. Select the median offset of the remaining servers' samples (not averages)
:3. Now however, we might have selected a jittery source, so search up
: and down to find a "better" server: one which has the best samples. I'd try
: a sum of quadratic differences to the selected "best tentative offset (2)"
: to select the best server (jittery servers will more likely drop out unless
: they are significantly closer to the offset).
:4. using this selected server, I'd take the median (never the average, averages
: smear errors) of the samples of this server.
:
:Maybe we should run some traces on the received packets and then evaluate
:different algorithms (best tracking the real time using a radio clock in
:parallel).
:
:cheers
: simon

When I run dntpd in debug mode with 8 of us.pool.ntp.org's sources
and extract the offsets, this is what I get after 10 minutes of sampling:

dntpd -F -d -l 4 -f /etc/dntpd.conf

(hacked up output)
off=+0.015458 slope +0.000106 yint +0.03 corr +0.996176 freq_ppm +106.04 stddev 0.000393
off=+0.008142 slope +0.000102 yint +0.02 corr +0.998817 freq_ppm +102.14 stddev 0.001171
off=+0.003510 slope +0.000108 yint +0.02 corr +0.995592 freq_ppm +107.65 stddev 0.000233
off=-0.000043 slope +0.000102 yint +0.02 corr +0.999995 freq_ppm +102.26 stddev 0.000016
off=-0.003430 slope +0.000107 yint +0.01 corr +0.999781 freq_ppm +107.44 stddev 0.001419
off=-0.007099 slope +0.000105 yint +0.01 corr +0.997070 freq_ppm +104.53 stddev 0.000057
off=-0.014571 slope +0.000105 yint +0.00 corr +0.995904 freq_ppm +104.86 stddev 0.000013
off=-0.015090 slope +0.000102 yint +0.00 corr +0.999548 freq_ppm +102.28 stddev 0.000222

That's the raw output from the linear regression sorted by offset.
On the bright side, dntpd has no problem calculating the frequency
drift after 10 minutes. Look at those wonderful numbers... it
has it down to 3 or 4 parts per million (3 or 4 uS per second).
Just for the hell of it I put all 15 of pool.ntp.org and all 15 of
us.pool.ntp.org's time sources in my dntpd.conf. Ready? Here is
what it looks like after 10 minutes:

off=+0.025647 uoff=+0.089388 slope +0.000103 yint +0.02 corr +0.999987 freq_ppm +103.28 stddev 0.000274
off=+0.022657 uoff=+0.086397 slope +0.000103 yint +0.02 corr +0.999953 freq_ppm +102.76 stddev 0.000139
off=+0.017995 uoff=+0.081774 slope +0.000104 yint +0.01 corr +0.999668 freq_ppm +103.53 stddev 0.000046
off=+0.016187 uoff=+0.079927 slope +0.000104 yint +0.01 corr +0.999180 freq_ppm +103.58 stddev 0.000894
off=+0.013014 uoff=+0.076755 slope +0.000103 yint +0.01 corr +0.999982 freq_ppm +102.97 stddev 0.000240
off=+0.010906 uoff=+0.074646 slope +0.000102 yint +0.00 corr +0.999972 freq_ppm +102.10
off=+0.008651 uoff=+0.072431 slope +0.000104 yint +0.00 corr +0.999991 freq_ppm +104.22 stddev 0.000426
off=+0.008274 uoff=+0.072014 slope +0.000105 yint -0.00 corr +0.998512 freq_ppm +105.09 stddev 0.000651
off=+0.006000 uoff=+0.069741 slope +0.000103 yint -0.00 corr +0.999976 freq_ppm +103.15 stddev 0.000235
off=+0.003832 uoff=+0.067588 slope +0.000103 yint -0.00 corr +0.999980 freq_ppm +102.56 stddev 0.000057
off=+0.003539 uoff=+0.067318 slope +0.000104 yint -0.00 corr +0.999965 freq_ppm +104.00 stddev 0.000100
off=+0.003278 uoff=+0.067058 slope +0.000103 yint -0.00 corr +0.999981 freq_ppm +103.13 stddev 0.000139
off=+0.002149 uoff=+0.065890 slope +0.000106 yint -0.01 corr +0.999830 freq_ppm +105.83 stddev 0.000110
off=+0.001512 uoff=+0.065250 slope +0.000103 yint -0.01 corr +0.999239 freq_ppm +102.58 stddev 0.000450
off=+0.001466 uoff=+0.065193 slope +0.000081 yint +0.00 corr +0.626335 freq_ppm +081.34 stddev 0.002604
off=+0.000198 uoff=+0.063939 slope +0.000102 yint -0.01 corr +0.999970 freq_ppm +102.28 stddev 0.000426
off=-0.000091 uoff=+0.063650 slope +0.000101 yint -0.00 corr +0.999390 freq_ppm +100.66 stddev 0.002487
off=-0.001574 uoff=+0.062166 slope +0.000106 yint -0.01 corr +0.998864 freq_ppm +105.67 stddev 0.000349
off=-0.001781 uoff=+0.061960 slope +0.000076 yint +0.00 corr +0.589265 freq_ppm +076.36 stddev 0.000309
off=-0.002044 uoff=+0.061695 slope +0.000103 yint -0.01 corr +0.999863 freq_ppm +103.21 stddev 0.000192
off=-0.003443 uoff=+0.060299 slope +0.000103 yint -0.01 corr +0.999986 freq_ppm +102.62 stddev 0.000037
off=-0.003540 uoff=+0.060217 slope +0.000108 yint -0.01 corr +0.998497 freq_ppm +108.23 stddev 0.001595
off=-0.006299 uoff=+0.057443 slope +0.000102 yint -0.01 corr +0.999570 freq_ppm +102.01 stddev 0.000057
off=-0.007387 uoff=+0.056393 slope +0.000106 yint -0.01 corr +0.997239 freq_ppm +105.99 stddev 0.004052
off=-0.007648 uoff=+0.056093 slope +0.000102 yint -0.01 corr +0.999995 freq_ppm +102.30 stddev 0.000188
off=-0.008628 uoff=+0.055114 slope +0.000103 yint -0.02 corr +0.999968 freq_ppm +103.14 stddev 0.000087
off=-0.013917 uoff=+0.049825 slope +0.000104 yint -0.02 corr +0.898742 freq_ppm +103.65 stddev 0.004901

That's a lot of data.  Again, the frequency drift calculation is
perfect (the two 'dead' samples have awful correlations so dntpd knows
they are bad).

The offsets go from +25ms to -14ms.  I see a pattern with the yintercept
again, but it could just be an artifact of the way the linear regression
operates relative to current real time, and while the pattern is a U
shaped curve in the 27 time sources output, it's a straight line in the
8 time sources output so ... maybe not something that I can use.

I'll have to get a better idea of the meaning of the y-intercept in a
linear regression. When I wrote the manual page for dntpd I described
a non-zero y-intercept as indicating a 'shift' in the time source. There
might be some relationship to network lag.

Note that the standard deviation for the +25ms server is very low, and
also has a very good correlation. No help there... it's a very stable
time source.
-Matt
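
The slope, yint, corr and freq_ppm columns in the debug output above are consistent with an ordinary least-squares fit of measured offset against elapsed time, with freq_ppm being the slope scaled by one million. The sketch below shows that calculation under that assumption; it is not taken from the dntpd source, and the sample data is synthetic.

```python
import math


def fit_offsets(times, offsets):
    """Least-squares line through (time, offset) samples.

    Needs at least two samples taken at distinct times.
    Returns slope (s/s), y-intercept (s), correlation and slope in ppm.
    """
    n = len(times)
    mean_t = sum(times) / n
    mean_o = sum(offsets) / n
    s_tt = sum((t - mean_t) ** 2 for t in times)
    s_oo = sum((o - mean_o) ** 2 for o in offsets)
    s_to = sum((t - mean_t) * (o - mean_o) for t, o in zip(times, offsets))

    slope = s_to / s_tt
    yint = mean_o - slope * mean_t
    # Correlation is undefined for a perfectly flat offset; report 0 then.
    corr = s_to / math.sqrt(s_tt * s_oo) if s_oo > 0 else 0.0
    return {"slope": slope, "yint": yint, "corr": corr,
            "freq_ppm": slope * 1e6}


if __name__ == "__main__":
    # Synthetic samples: one per minute for 10 minutes, drifting 106 ppm.
    times = [60.0 * i for i in range(11)]
    offsets = [0.02 + 0.000106 * t for t in times]
    print(fit_offsets(times, offsets))
```

A correlation close to 1 means the offsets fall almost exactly on a line, which is why the two sources with correlations around 0.6 stand out as unreliable for the frequency estimate.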
Actions #14

Updated by dillon over 17 years ago

Ok, I made a couple more adjustments. I changed the default
insane deviation to 0.5 seconds and I added an option to allow it
to be specified. So, e.g. in my test run I can set it to 10ms and
it actually works with 8 servers specified, sorta.

dntpd -F -d -l 5 -f /etc/dntpd.conf -i 0.010

Note however that servers will get mapped out for 60 minutes, so
setting too low a value for the insane deviation test could wind
up mapping out ALL of your server sources! I do not recommend
setting it any lower than 25ms or so (-i 0.025) and even that is
probably fairly dangerous since the range with 27 servers was around
38ms. 50ms should be safe (-i 0.050).

I have set the default to 500ms == 0.5 in order to catch servers with
broken seconds and daylight savings handling.

p.s. this also has the tendency to cause the servers on the edges
of the offset range to not be chosen, since they can't meet the
quorum requirements, and it also tends to force the linear regression
to run more samples before dntpd decides it can do an offset
correction.
-Matt
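
To see why a tight insane deviation squeezes out the servers at the edges, the sketch below takes the eight offsets from the earlier debug run and counts which servers could still gather a quorum at different thresholds. It assumes a simple-majority quorum and a plain offset-to-offset comparison, neither of which is confirmed by the thread.

```python
# Offsets (seconds) from the 8-server debug run quoted earlier in the thread.
OFFSETS = [0.015458, 0.008142, 0.003510, -0.000043,
           -0.003430, -0.007099, -0.014571, -0.015090]


def eligible(offsets, insane_deviation):
    """Offsets that a majority of sources (the source itself included)
    agree with to within insane_deviation seconds."""
    needed = len(offsets) // 2 + 1
    result = []
    for candidate in offsets:
        agreeing = sum(
            1 for o in offsets if abs(o - candidate) <= insane_deviation
        )
        if agreeing >= needed:
            result.append(candidate)
    return result


if __name__ == "__main__":
    for limit in (0.010, 0.025, 0.050):
        ok = eligible(OFFSETS, limit)
        print(f"-i {limit:.3f}: {len(ok)} of {len(OFFSETS)} can meet quorum")
```

Under these assumptions only two servers near the middle of the range qualify at 10ms, while all eight qualify at 25ms and 50ms.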
Actions #15

Updated by dillon over 17 years ago

:# sysctl kern.cputimer
:kern.cputimer.freq: 1193182
:kern.cputimer.clock: 419209229
:kern.cputimer.name: i8254_timer2
:kern.cputimer.select: i8254_timer2 dummy
:
:--Peter

Insofar as I can tell your timebase is just sloppy, but otherwise not
the issue.
I ran a ton of pool.ntp.org servers in test mode and didn't come
across any with seriously broken results, but I'm guessing the ntp
folks fix mistakes like that quickly. In any case, if that was the
problem the latest dntpd in HEAD will detect it.
-Matt
Matthew Dillon
Actions #16

Updated by pavalos over 17 years ago

Latest code seems to work better.
