Bug #2951: clock_gettime always repots 0 sec, 0 nsec - DragonFlyBSD - DragonFlyBSD bugtracker

Actions

Copy link

Bug #2951

closed

clock_gettime always repots 0 sec, 0 nsec

Added by zhtw about 9 years ago. Updated over 6 years ago.

Status:

Resolved

Priority:

Normal

Assignee:

Category:

Target version:

Start date:

09/16/2016

Due date:

% Done:

Estimated time:

Description

When called with CLOCK_PROF, clock_gettime always returns 0 sec, 0 nsec:

$ uname -a
DragonFly kl.zta.lk 4.6-RELEASE DragonFly v4.6.0.10.g16fba-RELEASE #10: Wed Aug 17 14:26:31 CEST 2016 root@kl.zta.lk:/usr/obj/usr/src/sys/X86_64_GENERIC x86_64
$ cat time.c
#include <sys/time.h>
#include <stdio.h>

int main() {
struct timespec ts;
struct timespec *tp = &ts;

for (int i = 0; i != 1000000; ++i)
    for (int j = 0; j != 1000; ++j)
      ;

int err = clock_gettime(CLOCK_PROF, tp);

printf("err: %d\n", err);

printf("sec: %ld\n", ts.tv_sec);
  printf("nsec: %ld\n", ts.tv_nsec);
}
$ cc -pedantic -Wall time.c 
$ ./a.out 
err: 0
sec: 0
nsec: 0

The same program on a FreeBSD machine seems to work:
$ uname -a
FreeBSD fbsd 10.1-RELEASE FreeBSD 10.1-RELEASE #0 r274401: Tue Nov 11 21:02:49 UTC 2014 root@releng1.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64
$ cc -pedantic -Wall time.c
$ ./a.out
err: 0
sec: 1
nsec: 604478000

History
Notes

Actions

Copy link

Updated by zhtw about 9 years ago

I just noticed that in the header sys/time.h there is a whole lot of CLOCK_* constants. Some of them are FreeBSD-specific, some Linux-specific. So technically, but since (at least) CLOCK_PROF is mentioned in the man page, it should work.

Btw, I checked CLOCK_THREAD_CPUTIME_ID & CLOCK_PROCESS_CPUTIME_ID (which I actually need) and they don't work either, but it's not mentioned in the man page, so technically not a bug.

Actions

Copy link

Updated by swildner about 9 years ago

CLOCK_THREAD_CPUTIME_ID isn't broken here, as far as I can see. Are you sure it returns 0 for you?

CLOCK_PROF and CLOCK_VIRTUAL always return 0 here. Not sure what's the cause.

Please check this patch for CLOCK_PROCESS_CPUTIME_ID -> https://leaf.dragonflybsd.org/~swildner/process_cputime.diff

Not sure if that's the best fix.

Actions

Copy link

Updated by zhtw about 9 years ago

swildner wrote:

CLOCK_THREAD_CPUTIME_ID isn't broken here, as far as I can see. Are you sure it returns 0 for you?

I'm sorry, didn't check it thoroughly. It indeed works with CLOCK_THREAD_CPUTIME_ID correctly.

CLOCK_PROF and CLOCK_VIRTUAL always return 0 here. Not sure what's the cause.

Please check this patch for CLOCK_PROCESS_CPUTIME_ID -> https://leaf.dragonflybsd.org/~swildner/process_cputime.diff

Not sure if that's the best fix.

Thank you for the patch.

I was unable to use it as is because the kernel didn't compile: there is no field "ru" in the proc struct. (I tried 4.4 & 4.6 kernels, see below why). But after I renamed ru to p_ru it did compile.

Unfortunately, I can't test it right now, because I only have remote access to both my machines (I'm on a conference now and will return only in 2 weeks). I'm trying to test it with vkernel, but for some reason it doesn't start (with or without your patch).

The message I see is:

a@hk:~/tmp$ sudo /var/vkernel/4.4/boot/kernel/kernel -m 2g -r root.img -d -p /var/tmp/vkernel.vhost-dev.pid
Using memory file: /var/vkernel/memimg.000000
KVM mapped at 0x8000000000-0x10000000000
Copyright (c) 2003-2015 The DragonFly Project.
Copyright (c) 1992-2003 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
DragonFly v4.4.3.10.gfcff0-RELEASE #2: Wed Sep 21 14:47:58 CEST 2016
root@hk.zta.lk:/usr/obj/usr/src/sys/VKERNEL64
real memory = 2147483648 (2097152K bytes)
avail memory = 2044215296 (1996304K bytes)

Fatal trap 12: page fault while in kernel mode
cpuid = 0
fault virtual address = 0x9
fault code = supervisor read, page not present
instruction pointer = 0x2b:0x6ea789
stack pointer = 0x10:0x7fffffffe9f0
frame pointer = 0x10:0x1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 0 ()
current thread = pri 12
<- SMP: XXX
kernel: type 12 trap, code=0

CPU0 stopping CPUs: 0x0000000000000000
stopped
Stopped at 0x6ea789: movq 0x8(%rbp),%rdi
db>

I'm trying 4.4 because I was never able to run 4.6 vkernel. 4.4 used to work, but I forgot which revision I compiled.

I will file a separate bug report for this.

Once I have physical access to my desktop I will try your patch again on the real kernel.

Actions

Copy link

Updated by zhtw about 9 years ago

I was able to run 4.4 vkernel with your patch (but with ru replaced with p_ru).
Unfortunately it did not help, I still see 0 nsec both in case of CLOCK_PROCESS_CPUTIME_ID and CLOCK_PROF.

(Still can't run 4.6 vkernel.)

Actions

Copy link

Updated by zhtw about 9 years ago

Antonio Huete (tuxillo) just gave me the fix for 4.6 vkernel.
Still no luck, CLOCK_PROCESS_CPUTIME_ID returns 0.

Actions

Copy link

Updated by swildner about 9 years ago

Yes, my bad, sorry. I messed up the patch with some last minute changes which I didn't test properly.

Please remove the changes to kern_time.c in your tree and apply this diff instead -> https://leaf.dragonflybsd.org/~swildner/process_cputime2.diff

I've tested it and it seems to work here.

It will only fix CLOCK_PROCESS_CPUTIME_ID.

Actions

Copy link

Updated by zhtw about 9 years ago

Thank you. Just tried it. The result is still 0. But again, I'm testing in under a vkernel.
Did you test it on a real kernel?

Is there anything I could do wrong here?

Actions

Copy link

Updated by swildner about 9 years ago

Yeah I tested on a real kernel.

Actions

Copy link

Updated by zhtw about 9 years ago

This is interesting. When you asked me to doublecheck if clock_gettime works with CLOCK_THREAD_CPUTIME_ID (because you said it works for you), I actually tested it on real kernel. So it does work on real kernel.

But now I'm proceeding with testing ChezScheme (this is how I originally found the bug) on a vkernel, I found out again that it clock_gettime doesn't really work even with CLOCK_THREAD_CPUTIME_ID. It always returns 0 as I reported.

So this there is definitely a bug.

Also, when I proceeded testing ChezScheme, I found another bug, also in clock_gettime, but this time it's present in the real kernel: turned out clock_gettime can return nsec greater than one million:

a@kl:~/tmp$ cat t.c
#include <sys/time.h>
#include <stdio.h>

void busywait() {
for (int i = 0; i != 1000000; ++i)
for (int j = 0; j != 10; ++j)
;
}

int main() {
struct timespec ts;

for (int i = 0; i != 1000; ++i) {
    busywait();
    int err = clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
    if (err) {
      printf("err: %d\n", err);
      return 1;
    }
    if (ts.tv_nsec > 1000000000) {
      printf("fatal! nsec = %ld\n", ts.tv_nsec);
      return 1;
    }
  }
}

a@kl:~/tmp$ cc t.c
a@kl:~/tmp$ ./a.out
fatal! nsec = 1007424000

Should I create a separate ticket for this?

Actions

Copy link

#10

Updated by zhtw about 9 years ago

When I said million, I of course meant billion. But the example is correct anyway.
I am also looking at the code, but my experience with kernel programming is nearly non-existing.

Actions

Copy link

#11

Updated by swildner about 9 years ago

Yes, get_curthread_cputime() was wrongly adding so that tv_nsec could become >= 1000000000.

Please check https://leaf.dragonflybsd.org/~swildner/process_cputime3.diff on a real kernel. I've not yet looked at the vkernel issues.

This patch adds a fix for the overflow issue, using timespecadd() now. It also fixes clock_getres() for CLOCK_{PROCESS,THREAD}_CPUTIME_ID.

Actions

Copy link

#12

Updated by swildner about 9 years ago

Sorry, there was one more typo in the overflow fix, fixed in https://leaf.dragonflybsd.org/~swildner/process_cputime4.diff

Actions

Copy link

#13

Updated by zhtw about 9 years ago

Thanks!

It should definitely solve the problem with the nsec overflow. But turned out I was very lucky with this test, because can't reproduce the overflow anymore. But I still have this chezscheme test that fails.

In any case, I will be able to confirm that the bug is fixed only in a few days when I have access to the real machine.

But thanks for your help anyway! I appreciate it.

Actions

Copy link

#14