Project

General

Profile

Bug #2951

clock_gettime always repots 0 sec, 0 nsec

Added by zhtw 7 months ago. Updated 7 months ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
Start date:
09/16/2016
Due date:
% Done:

0%


Description

When called with CLOCK_PROF, clock_gettime always returns 0 sec, 0 nsec:

$ uname -a
DragonFly kl.zta.lk 4.6-RELEASE DragonFly v4.6.0.10.g16fba-RELEASE #10: Wed Aug 17 14:26:31 CEST 2016 :/usr/obj/usr/src/sys/X86_64_GENERIC x86_64
$ cat time.c
#include <sys/time.h>
#include <stdio.h>

int main()
{
struct timespec ts;
struct timespec *tp = &ts;

for (int i = 0; i != 1000000; ++i)
for (int j = 0; j != 1000; ++j)
;

int err = clock_gettime(CLOCK_PROF, tp);

printf("err: %d\n", err);

printf("sec: %ld\n", ts.tv_sec);
printf("nsec: %ld\n", ts.tv_nsec);
}
$ cc -pedantic -Wall time.c
$ ./a.out
err: 0
sec: 0
nsec: 0

The same program on a FreeBSD machine seems to work:
$ uname -a
FreeBSD fbsd 10.1-RELEASE FreeBSD 10.1-RELEASE #0 r274401: Tue Nov 11 21:02:49 UTC 2014 :/usr/obj/usr/src/sys/GENERIC amd64
$ cc -pedantic -Wall time.c
$ ./a.out
err: 0
sec: 1
nsec: 604478000

History

#1 Updated by zhtw 7 months ago

I just noticed that in the header sys/time.h there is a whole lot of CLOCK_* constants. Some of them are FreeBSD-specific, some Linux-specific. So technically, but since (at least) CLOCK_PROF is mentioned in the man page, it should work.

Btw, I checked CLOCK_THREAD_CPUTIME_ID & CLOCK_PROCESS_CPUTIME_ID (which I actually need) and they don't work either, but it's not mentioned in the man page, so technically not a bug.

#2 Updated by swildner 7 months ago

CLOCK_THREAD_CPUTIME_ID isn't broken here, as far as I can see. Are you sure it returns 0 for you?

CLOCK_PROF and CLOCK_VIRTUAL always return 0 here. Not sure what's the cause.

Please check this patch for CLOCK_PROCESS_CPUTIME_ID -> https://leaf.dragonflybsd.org/~swildner/process_cputime.diff

Not sure if that's the best fix.

#3 Updated by zhtw 7 months ago

swildner wrote:
> CLOCK_THREAD_CPUTIME_ID isn't broken here, as far as I can see. Are you sure it returns 0 for you?

I'm sorry, didn't check it thoroughly. It indeed works with CLOCK_THREAD_CPUTIME_ID correctly.

> CLOCK_PROF and CLOCK_VIRTUAL always return 0 here. Not sure what's the cause.
>
> Please check this patch for CLOCK_PROCESS_CPUTIME_ID -> https://leaf.dragonflybsd.org/~swildner/process_cputime.diff
>
> Not sure if that's the best fix.

Thank you for the patch.

I was unable to use it as is because the kernel didn't compile: there is no field "ru" in the proc struct. (I tried 4.4 & 4.6 kernels, see below why). But after I renamed ru to p_ru it did compile.

Unfortunately, I can't test it right now, because I only have remote access to both my machines (I'm on a conference now and will return only in 2 weeks). I'm trying to test it with vkernel, but for some reason it doesn't start (with or without your patch).

The message I see is:

a@hk:~/tmp$ sudo /var/vkernel/4.4/boot/kernel/kernel -m 2g -r root.img -d -p /var/tmp/vkernel.vhost-dev.pid
Using memory file: /var/vkernel/memimg.000000
KVM mapped at 0x8000000000-0x10000000000
Copyright (c) 2003-2015 The DragonFly Project.
Copyright (c) 1992-2003 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
DragonFly v4.4.3.10.gfcff0-RELEASE #2: Wed Sep 21 14:47:58 CEST 2016
:/usr/obj/usr/src/sys/VKERNEL64
real memory = 2147483648 (2097152K bytes)
avail memory = 2044215296 (1996304K bytes)

Fatal trap 12: page fault while in kernel mode
cpuid = 0
fault virtual address = 0x9
fault code = supervisor read, page not present
instruction pointer = 0x2b:0x6ea789
stack pointer = 0x10:0x7fffffffe9f0
frame pointer = 0x10:0x1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 0 ()
current thread = pri 12
<- SMP: XXX
kernel: type 12 trap, code=0

CPU0 stopping CPUs: 0x0000000000000000
stopped
Stopped at 0x6ea789: movq 0x8(%rbp),%rdi
db>

I'm trying 4.4 because I was never able to run 4.6 vkernel. 4.4 used to work, but I forgot which revision I compiled.

I will file a separate bug report for this.

Once I have physical access to my desktop I will try your patch again on the real kernel.

#4 Updated by zhtw 7 months ago

I was able to run 4.4 vkernel with your patch (but with ru replaced with p_ru).
Unfortunately it did not help, I still see 0 nsec both in case of CLOCK_PROCESS_CPUTIME_ID and CLOCK_PROF.

(Still can't run 4.6 vkernel.)

#5 Updated by zhtw 7 months ago

Antonio Huete (tuxillo) just gave me the fix for 4.6 vkernel.
Still no luck, CLOCK_PROCESS_CPUTIME_ID returns 0.

#6 Updated by swildner 7 months ago

Yes, my bad, sorry. I messed up the patch with some last minute changes which I didn't test properly.

Please remove the changes to kern_time.c in your tree and apply this diff instead -> https://leaf.dragonflybsd.org/~swildner/process_cputime2.diff

I've tested it and it seems to work here.

It will only fix CLOCK_PROCESS_CPUTIME_ID.

#7 Updated by zhtw 7 months ago

Thank you. Just tried it. The result is still 0. But again, I'm testing in under a vkernel.
Did you test it on a real kernel?

Is there anything I could do wrong here?

#8 Updated by swildner 7 months ago

Yeah I tested on a real kernel.

#9 Updated by zhtw 7 months ago

This is interesting. When you asked me to doublecheck if clock_gettime works with CLOCK_THREAD_CPUTIME_ID (because you said it works for you), I actually tested it on real kernel. So it does work on real kernel.

But now I'm proceeding with testing ChezScheme (this is how I originally found the bug) on a vkernel, I found out again that it clock_gettime doesn't really work even with CLOCK_THREAD_CPUTIME_ID. It always returns 0 as I reported.

So this there is definitely a bug.

Also, when I proceeded testing ChezScheme, I found another bug, also in clock_gettime, but this time it's present in the real kernel: turned out clock_gettime can return nsec greater than one million:

a@kl:~/tmp$ cat t.c
#include <sys/time.h>
#include <stdio.h>

void busywait()
{
for (int i = 0; i != 1000000; ++i)
for (int j = 0; j != 10; ++j)
;
}

int main()
{
struct timespec ts;

for (int i = 0; i != 1000; ++i) {
busywait();
int err = clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
if (err) {
printf("err: %d\n", err);
return 1;
}
if (ts.tv_nsec > 1000000000) {
printf("fatal! nsec = %ld\n", ts.tv_nsec);
return 1;
}
}
}

a@kl:~/tmp$ cc t.c
a@kl:~/tmp$ ./a.out
fatal! nsec = 1007424000

Should I create a separate ticket for this?

#10 Updated by zhtw 7 months ago

When I said million, I of course meant billion. But the example is correct anyway.
I am also looking at the code, but my experience with kernel programming is nearly non-existing.

#11 Updated by swildner 7 months ago

Yes, get_curthread_cputime() was wrongly adding so that tv_nsec could become >= 1000000000.

Please check https://leaf.dragonflybsd.org/~swildner/process_cputime3.diff on a real kernel. I've not yet looked at the vkernel issues.

This patch adds a fix for the overflow issue, using timespecadd() now. It also fixes clock_getres() for CLOCK_{PROCESS,THREAD}_CPUTIME_ID.

#12 Updated by swildner 7 months ago

Sorry, there was one more typo in the overflow fix, fixed in https://leaf.dragonflybsd.org/~swildner/process_cputime4.diff

#13 Updated by zhtw 7 months ago

Thanks!

It should definitely solve the problem with the nsec overflow. But turned out I was very lucky with this test, because can't reproduce the overflow anymore. But I still have this chezscheme test that fails.

In any case, I will be able to confirm that the bug is fixed only in a few days when I have access to the real machine.

But thanks for your help anyway! I appreciate it.

#14 Updated by swildner 7 months ago

I've pushed a fix to master fixing these issues on real kernels.

See https://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/c1b9c0932e482bb6a37eb190a5b504b270d2166d

Why it doesn't work for the vkernel, I haven't had much time investigating.

Can you try the chezscheme tests with it (if you need it on the release branch, you can cherry-pick c1b9c0932e482bb6a37eb190a5b504b270d2166d)?

#15 Updated by zhtw 7 months ago

Checked on my real machine using 4.6 kernel with your patch. This specific test passes now.

Also available in: Atom PDF