Bug #2951
closedclock_gettime always repots 0 sec, 0 nsec
Added by zhtw about 8 years ago. Updated over 5 years ago.
0%
Description
When called with CLOCK_PROF, clock_gettime always returns 0 sec, 0 nsec:
$ uname -a
DragonFly kl.zta.lk 4.6-RELEASE DragonFly v4.6.0.10.g16fba-RELEASE #10: Wed Aug 17 14:26:31 CEST 2016 root@kl.zta.lk:/usr/obj/usr/src/sys/X86_64_GENERIC x86_64
$ cat time.c
#include <sys/time.h>
#include <stdio.h>
int main()
{
struct timespec ts;
struct timespec *tp = &ts;
for (int i = 0; i != 1000000; ++i)
for (int j = 0; j != 1000; ++j)
;
int err = clock_gettime(CLOCK_PROF, tp);
printf("err: %d\n", err);
printf("sec: %ld\n", ts.tv_sec);
printf("nsec: %ld\n", ts.tv_nsec);
}
$ cc -pedantic -Wall time.c
$ ./a.out
err: 0
sec: 0
nsec: 0
The same program on a FreeBSD machine seems to work:
$ uname -a
FreeBSD fbsd 10.1-RELEASE FreeBSD 10.1-RELEASE #0 r274401: Tue Nov 11 21:02:49 UTC 2014 root@releng1.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64
$ cc -pedantic -Wall time.c
$ ./a.out
err: 0
sec: 1
nsec: 604478000
Updated by zhtw about 8 years ago
I just noticed that in the header sys/time.h there is a whole lot of CLOCK_* constants. Some of them are FreeBSD-specific, some Linux-specific. So technically, but since (at least) CLOCK_PROF is mentioned in the man page, it should work.
Btw, I checked CLOCK_THREAD_CPUTIME_ID & CLOCK_PROCESS_CPUTIME_ID (which I actually need) and they don't work either, but it's not mentioned in the man page, so technically not a bug.
Updated by swildner about 8 years ago
CLOCK_THREAD_CPUTIME_ID isn't broken here, as far as I can see. Are you sure it returns 0 for you?
CLOCK_PROF and CLOCK_VIRTUAL always return 0 here. Not sure what's the cause.
Please check this patch for CLOCK_PROCESS_CPUTIME_ID -> https://leaf.dragonflybsd.org/~swildner/process_cputime.diff
Not sure if that's the best fix.
Updated by zhtw about 8 years ago
swildner wrote:
CLOCK_THREAD_CPUTIME_ID isn't broken here, as far as I can see. Are you sure it returns 0 for you?
I'm sorry, didn't check it thoroughly. It indeed works with CLOCK_THREAD_CPUTIME_ID correctly.
CLOCK_PROF and CLOCK_VIRTUAL always return 0 here. Not sure what's the cause.
Please check this patch for CLOCK_PROCESS_CPUTIME_ID -> https://leaf.dragonflybsd.org/~swildner/process_cputime.diff
Not sure if that's the best fix.
Thank you for the patch.
I was unable to use it as is because the kernel didn't compile: there is no field "ru" in the proc struct. (I tried 4.4 & 4.6 kernels, see below why). But after I renamed ru to p_ru it did compile.
Unfortunately, I can't test it right now, because I only have remote access to both my machines (I'm on a conference now and will return only in 2 weeks). I'm trying to test it with vkernel, but for some reason it doesn't start (with or without your patch).
The message I see is:
a@hk:~/tmp$ sudo /var/vkernel/4.4/boot/kernel/kernel -m 2g -r root.img -d -p /var/tmp/vkernel.vhost-dev.pid
Using memory file: /var/vkernel/memimg.000000
KVM mapped at 0x8000000000-0x10000000000
Copyright (c) 2003-2015 The DragonFly Project.
Copyright (c) 1992-2003 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
DragonFly v4.4.3.10.gfcff0-RELEASE #2: Wed Sep 21 14:47:58 CEST 2016
root@hk.zta.lk:/usr/obj/usr/src/sys/VKERNEL64
real memory = 2147483648 (2097152K bytes)
avail memory = 2044215296 (1996304K bytes)
Fatal trap 12: page fault while in kernel mode
cpuid = 0
fault virtual address = 0x9
fault code = supervisor read, page not present
instruction pointer = 0x2b:0x6ea789
stack pointer = 0x10:0x7fffffffe9f0
frame pointer = 0x10:0x1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 0 ()
current thread = pri 12
<- SMP: XXX
kernel: type 12 trap, code=0
CPU0 stopping CPUs: 0x0000000000000000
stopped
Stopped at 0x6ea789: movq 0x8(%rbp),%rdi
db>
I'm trying 4.4 because I was never able to run 4.6 vkernel. 4.4 used to work, but I forgot which revision I compiled.
I will file a separate bug report for this.
Once I have physical access to my desktop I will try your patch again on the real kernel.
Updated by zhtw about 8 years ago
I was able to run 4.4 vkernel with your patch (but with ru replaced with p_ru).
Unfortunately it did not help, I still see 0 nsec both in case of CLOCK_PROCESS_CPUTIME_ID and CLOCK_PROF.
(Still can't run 4.6 vkernel.)
Updated by zhtw about 8 years ago
Antonio Huete (tuxillo) just gave me the fix for 4.6 vkernel.
Still no luck, CLOCK_PROCESS_CPUTIME_ID returns 0.
Updated by swildner about 8 years ago
Yes, my bad, sorry. I messed up the patch with some last minute changes which I didn't test properly.
Please remove the changes to kern_time.c in your tree and apply this diff instead -> https://leaf.dragonflybsd.org/~swildner/process_cputime2.diff
I've tested it and it seems to work here.
It will only fix CLOCK_PROCESS_CPUTIME_ID.
Updated by zhtw about 8 years ago
Thank you. Just tried it. The result is still 0. But again, I'm testing in under a vkernel.
Did you test it on a real kernel?
Is there anything I could do wrong here?
Updated by zhtw about 8 years ago
This is interesting. When you asked me to doublecheck if clock_gettime works with CLOCK_THREAD_CPUTIME_ID (because you said it works for you), I actually tested it on real kernel. So it does work on real kernel.
But now I'm proceeding with testing ChezScheme (this is how I originally found the bug) on a vkernel, I found out again that it clock_gettime doesn't really work even with CLOCK_THREAD_CPUTIME_ID. It always returns 0 as I reported.
So this there is definitely a bug.
Also, when I proceeded testing ChezScheme, I found another bug, also in clock_gettime, but this time it's present in the real kernel: turned out clock_gettime can return nsec greater than one million:
a@kl:~/tmp$ cat t.c
#include <sys/time.h>
#include <stdio.h>
void busywait()
{
for (int i = 0; i != 1000000; ++i)
for (int j = 0; j != 10; ++j)
;
}
int main()
{
struct timespec ts;
for (int i = 0; i != 1000; ++i) {
busywait();
int err = clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
if (err) {
printf("err: %d\n", err);
return 1;
}
if (ts.tv_nsec > 1000000000) {
printf("fatal! nsec = %ld\n", ts.tv_nsec);
return 1;
}
}
}
a@kl:~/tmp$ cc t.c
a@kl:~/tmp$ ./a.out
fatal! nsec = 1007424000
Should I create a separate ticket for this?
Updated by zhtw about 8 years ago
When I said million, I of course meant billion. But the example is correct anyway.
I am also looking at the code, but my experience with kernel programming is nearly non-existing.
Updated by swildner about 8 years ago
Yes, get_curthread_cputime() was wrongly adding so that tv_nsec could become >= 1000000000.
Please check https://leaf.dragonflybsd.org/~swildner/process_cputime3.diff on a real kernel. I've not yet looked at the vkernel issues.
This patch adds a fix for the overflow issue, using timespecadd() now. It also fixes clock_getres() for CLOCK_{PROCESS,THREAD}_CPUTIME_ID.
Updated by swildner about 8 years ago
Sorry, there was one more typo in the overflow fix, fixed in https://leaf.dragonflybsd.org/~swildner/process_cputime4.diff
Updated by zhtw about 8 years ago
Thanks!
It should definitely solve the problem with the nsec overflow. But turned out I was very lucky with this test, because can't reproduce the overflow anymore. But I still have this chezscheme test that fails.
In any case, I will be able to confirm that the bug is fixed only in a few days when I have access to the real machine.
But thanks for your help anyway! I appreciate it.
Updated by swildner about 8 years ago
I've pushed a fix to master fixing these issues on real kernels.
See https://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/c1b9c0932e482bb6a37eb190a5b504b270d2166d
Why it doesn't work for the vkernel, I haven't had much time investigating.
Can you try the chezscheme tests with it (if you need it on the release branch, you can cherry-pick c1b9c0932e482bb6a37eb190a5b504b270d2166d)?
Updated by zhtw about 8 years ago
Checked on my real machine using 4.6 kernel with your patch. This specific test passes now.
Updated by liweitianux over 5 years ago
- Status changed from New to Resolved
This issue has been fixed in 2016.