Bug #745: the evil interrupt stats monster is still around! - DragonFlyBSD - DragonFlyBSD bugtracker

Actions

Copy link

Bug #745

closed

the evil interrupt stats monster is still around!

Added by corecode over 18 years ago. Updated over 3 years ago.

Status:

Closed

Priority:

Normal

Assignee:

tuxillo

Category:

Unverifiable

Target version:

6.4

Start date:

Due date:

% Done:

Estimated time:

Description

hey,

while doing the pkgsrc bulk build, I get weird numbers in top. I am pretty sure this has something to do with the multitude of shell scripts being used as wrappers in pkgsrc. command execution is also slower than I'd expect (for example wrapper execution).

load averages: 8.38, 7.96, 5.53 up 0+03:48:32 22:34:35
151 processes: 151 running
CPU0 states: 13.7% user, 0.0% nice, 27.5% system, 0.0% interrupt, 58.8% idle
CPU1 states: 21.4% user, 0.0% nice, 28.3% system, 0.0% interrupt, 50.4% idle
CPU2 states: 22.1% user, 0.0% nice, 22.9% system, 0.0% interrupt, 55.0% idle
CPU3 states: 34.7% user, 0.0% nice, 11.1% system, 42.0% interrupt, 12.2% idle
Mem: 71M Active, 509M Inact, 268M Wired, 199M Buf, 2665M Free
Swap:

PID USERNAME PRI NICE  SIZE    RES STATE  C   TIME   WCPU    CPU COMMAND
36155 root     227   0  3060K  2736K wait   2   0:00 36.82%  8.15% bmake
37456 root     229   0   828K   704K RUN    1   0:00 16.14%  2.93% sh
37677 root     197   0  1576K  1216K wait   2   0:00 14.53%  2.64% bmake
40985 root     182   0   824K   700K RUN    3   0:00 40.00%  1.95% sh
41135 root     193   0   820K   696K wait   1   0:00 39.00%  1.90% sh
37807 root     172   0   828K   704K wait   0   0:00  9.69%  1.76% sh
37310 root     157   0   792K   668K wait   0   0:00  9.15%  1.66% sh
37961 root     210   0  1600K  1240K wait   0   0:00 10.85%  1.51% bmake
40898 root     157   0   784K   660K wait   2   0:00 30.00%  1.46% sh
37563 root     227   0  1464K  1104K wait   1   0:00  5.92%  1.07% bmake
37391 root     168   0   788K   664K wait   0   0:00  4.84%  0.88% sh
37316 root     162   0  1668K  1296K wait   0   0:00  4.57%  0.83% bmake
40213 root     182   0  1216K  1092K wait   1   0:00 16.00%  0.78% sh
39756 root     229   0   824K   700K wait   0   0:00  2.05%  0.20% sh
23163 root     163   0  1444K  1084K wait   2   0:00  0.26%  0.15% bmake
  679 corecode 152   0  5948K  3116K select 0   0:05  0.00%  0.00% sshd
87928 corecode 152   0  2340K  1632K RUN    0   0:03  0.00%  0.00% top
 5266 root     152   0  6548K  6120K kqread 0   0:02  0.00%  0.00% pbulk-build
 2346 root     152   0  2384K  2052K select 2   0:01  0.00%  0.00% screen-4.0.3
  672 corecode 152   0  5948K  3112K select 2   0:01  0.00%  0.00% sshd
 6357 corecode 152   0  5948K  3124K select 0   0:00  0.00%  0.00% sshd
83074 root     157   0  3236K  2908K wait   0   0:00  0.00%  0.00% bmake
61160 root     152   0  2340K  1632K CPU0   0   0:00  0.00%  0.00% top
84410 root     192   0  3152K  2828K wait   0   0:00  0.00%  0.00% bmake
60999 root     157   0  3388K  3072K wait   1   0:00  0.00%  0.00% bmake

that's what systat -vm tells:

11 users    Load  9.22  8.60  6.50                  Jul 26 22:39

Mem:KB REAL VIRTUAL VN PAGER SWAP PAGER
Tot Share Tot Share Free in out in out
Act 28560 5260 67456 9252 2717488 count
All 880132 8524 1793432 13480 pages
13651 zfod Interrupts
Proc:r p d s w Csw Trp Sys Int Sof Flt 16467 cow 556 total
10 45 727964259438586 475 6638986 284684 wire 429 clk
70624 act 13 uhci0
21.6%Sys 12.8%Intr 30.8%User 0.0%Nice 34.8%Idl 524220 inact 69 atapci1 | | | | | | | | | | cache 11 em0
===========++++++>>>>>>>>>>>>>>>> 2717552 free ata0
daefr irq19
Namei Name-cache Dir-cache 21446 prcfr 34 irq21
Calls hits % hits % react
81740 79364 97 189 0 pdwake
pdpgs
Disks ad4 ad6 acd0 cd0 pass0 md0 intrn
KB/t 0.00 36.23 0.00 0.00 0.00 0.00 204096 buf
tps 0 34 0 0 0 0 1533 dirtybuf
MB/s 0.00 1.21 0.00 0.00 0.00 0.00 231015 desiredvnodes
% busy 0 3 0 0 0 0 81881 numvnodes
1602 freevnodes

cheers
simon

Actions

Copy link

Updated by corecode over 19 years ago

Okay, joerg and me could speed pkgsrc building up a bit by telling pkgsrc to use the builtin for test and [.

However, look at this:

avg fork parent: 0.000148
avg fork child: 0.004091

(rough rough numbers, will be off if there are multiple processes running concurrently)

5ms seems to be a good estimate for fork performance. but mind you, that's the child which is taking longer. This means the parent returns first, then starts wait4'ing (usually, or blocks on a read or so), then some time passes and then later the child returns from fork. Seems wrong to me.

so much statistics for now, cheers
simon

Actions

Copy link

Updated by corecode over 19 years ago

Okay, less rough numbers:

122991 104/s fork

avg fork parent: 0.000157
avg fork child: 0.002563

so if we assume that the parent always waits for the child, it'll take 0.267 seconds per second (in average) to fork? That means 1/4 of the time is just spent until the child start running.

the above numbers include vfork, these are without:

avg fork parent: 0.000144
avg fork child: 0.002556

same, basically. Maybe teaching sh(1) to use vfork could give some performance improvement?

cheers
simon

Actions

Copy link

Updated by dillon over 19 years ago

:Okay, less rough numbers:
:
:122991 104/s fork
:
:avg fork parent: 0.000157
:avg fork child: 0.002563
:
:so if we assume that the parent always waits for the child, it'll take 0.267 seconds per second (in average) to fork? That means 1/4 of the time is just spent until the child start running.
:
:the above numbers include vfork, these are without:
:
:avg fork parent: 0.000144
:avg fork child: 0.002556
:
:same, basically. Maybe teaching sh(1) to use vfork could give some performance improvement?
:
:cheers
: simon

There is always something running somewhere when you fork.  Either the
    child is running or the parent is running.  Either way both have to run
    to their completion before the next program can be execed when running
    a shell script since the script serializes everything.

It tends to be more efficient to allow the parent to continue to run
    (that is, to return from the fork) because there are fewer context
    switches.

parent        child
    ----------        -----------
    fork()        -
    wait        -
    -            run
    -            run
    -            run
    -            run
    -            exit
    waitreturn

parent        child
    ----------        -----------
    fork()        -
    -            run
    -            run
    -            run
    wait        -        <<< 2 extra context switches
    -            run
    -            run
    -            exit
    waitreturn

I would rather not try to optimize programs to use vfork().  There
    just isn't a whole lot of difference between fork and vfork.

Most of the time spent by the child is not waiting for the parent
    to switch it in, but instead is being spent in execl().  It may be
    possible to speed up execution by making certain dynamic binaries 
    resident (with the 'resident' command).  Everything in /bin is
    statically linked alredy, though, and already execs as fast as it is
    possible to exec a program.  But perhaps pkgsrc is using certain
    binaries internally that are dynamically linked.  I suggest taking
    a good look at what pkgsrc runs.

There is a fork test program in /usr/src/test/sysperf/fork1.c.
    (make /tmp/fork1 from /usr/src/test/sysperf).  I get around 
    154 uS for a complete fork/exit/wait sequence on my test box.
    A time of 2 mS indicates that something else is going on.. either
    other processes are running or whatever you are execing is taking
    a long time to initialize (which would not be surprising if we are
    talking about /bin/sh here).

The GNU configure script crap is just that... crap.  In a typical
    pkgsrc build that script will probably be run thousands of times,
    doing the same tests with slow shell scripts over and over and over
    and over and over and over again.  It's really the stupidest gnu thing
    I've ever seen in my life.  It was stupid a decade ago, and its still
    stupid today.

-Matt

Actions

Copy link

Updated by corecode over 19 years ago

No, my point was that the child takes some time to even return from the fork, despite the parent already wait()ing. There is some gap involved where the system seemingly just schedules somebody else instead or so.

Ah, I see the point. This however does not explain why the child takes some time to be scheduled.

I think our vfork is not completely optimized. We could for instance make vfork not COW the vmspace, because the user is only allowed to call _exit and exec(). I don't think we are doing this right now.

That's actually not true:

Total runtime 1171.642231 sec

122991 104/s 0.002616 321.756458 fork_child
122991 104/s 0.000148 18.181783 fork_parent
112576 96/s 0.000235 26.489667 execve
239985 204/s 0.000080 19.174908 open
(etc)

total exec time is not very high.

I'm not talking about fork/exec performance. This is from a build kdump, taking the CALL and RET times and calculating the duration of the syscall.

cheers
simon

Actions

Copy link

Updated by dillon over 19 years ago

Those numbers don't make any sense. What program are you using to
try to measure the child fork return time ?

-Matt

Actions

Copy link

Updated by corecode over 19 years ago

ktrace. These are real numbers, not a microbenchmark. There might be a little bit of load interfering, dunno. You can grab the dump from <http://chlamydia.fs.ei.tum.de/~corecode/unsorted/kdump.out.bz2>

cheers
simon

Actions

Copy link

Updated by dillon over 19 years ago

:ktrace. These are real numbers, not a microbenchmark. There might be a little bit of load interfering, dunno. You can grab the dump from <http://chlamydia.fs.ei.tum.de/~corecode/unsorted/kdump.out.bz2>
:
:cheers
: simon

Use kdump -R !!!

In anycase, it does look odd.  But I can't reproduce the effect
    when I ktrace a similar bmake from pkgsrc and there are cases 
    in your dump where you don't get large timeouts:

66255 sh 1185522939.738309 CALL fork
66255 sh 1185522939.738419 RET fork 66264/0x102d8
66255 sh 1185522939.738665 CALL getpgrp
66255 sh 1185522939.738676 RET getpgrp 5268/0x1494
66255 sh 1185522939.738730 CALL wait4(0xffffffff,0xbfbff318,0x2,0)
66264 sh 1185522939.738745 RET fork 0
<----------- 15 uS (326 uS from original
fork)
66264 sh 1185522939.739292 CALL open(0x280de1c8,0x601,0x1b6)
66264 sh 1185522939.739306 NAMI "/dev/null"
66264 sh 1185522939.739346 RET open 5

What it looks like is happening is that there are other processes
    running that get the cpu after the fork/wait4, before the child
    gets the cpu.  Insofar as I can tell it looks like there is a cpu-bound
    program running that is simply not making any system calls to be traced,
    or is not being traced.

Are you running anything in parallel during this test or running
    a parallel build of some sort?

-Matt

Actions

Copy link

Updated by corecode over 19 years ago

heh. well, the absolute ones help me process the time automatically.

Yah, quite possible. However, why'd that only happen for for fork and not for other possibly blocking syscalls (or vfork)? I think we need some better, performance-counter based tracing tools, rather than plain kdump output.

I don't remember anymore, probably there was a build going on in parallel. But again, I'm not trying to benchmark, I'm trying to find a cause for the slow build speed.

cheers
simon

Actions

Copy link

Updated by dillon over 19 years ago

:Yah, quite possible. However, why'd that only happen for for fork and not for other possibly blocking syscalls (or vfork)? I think we need some better, performance-counter based tracing tools, rather than plain kdump output.

There really isn't a whole lot of difference between fork and vfork,
    but I noticed at the end of your dump that both fork and vfork had
    much shorter delays then what I saw at the beginning of your dump.

:> Are you running anything in parallel during this test or running
:> a parallel build of some sort?
:
:I don't remember anymore, probably there was a build going on in parallel. But again, I'm not trying to benchmark, I'm trying to find a cause for the slow build speed.
:
:cheers
: simon

Its called a script titled 'gnu configure'.  Do a systat -vm 1 and look
    at the name cache hit stats.  Everything bmake, gcc, and especially the
    gnu configure script do is extremely filesystem intensive.  And in the
    case of gnu configure, also mostly redundant.

-Matt

Actions

Copy link

#10

Updated by joerg over 19 years ago

When NetBSD implemented a real vfork back around NetBSD 1.4 or so, it
made difference in the percent range for a full buildworld. Basically,
you can avoid the duplication of the VM space and the COW faulting,
which sums up.

While a lot of bad things can be said about autoconf, it was the only
usable, semi-generic solution for over a decade. There is still only
one good alternative (and that is using C++...). The biggest complain
most maintainers on non-default (from the upstream POV) platform had in
all that time was that it is hard to use correctly. E.g. many tests can
be done a lot more intelligently than most configure scripts do. Given
that autoconf follows the "stand-alone" paradigm (as opposed to imake
for some parts of the problem domain), it actually implements it quite
well.

Joerg

Actions

Copy link

#11

Updated by tuxillo over 19 years ago

Description updated (diff)
Status changed from New to In Progress
Assignee deleted (0)

Hi all,

I've not seen that much intr% time during bulk builds, at least when I did some a couple months ago.
ftigeot/marino/dillon, can you please confirm this doesn't happen during poudriere builds now?

Thanks,
Antonio Huete

Actions

Copy link

#12

Updated by tuxillo over 19 years ago

Description updated (diff)
Category set to Other
Assignee set to tuxillo
Target version set to 3.8

Grab.

Actions

Copy link

#13

Updated by tuxillo over 19 years ago

Category changed from Other to Unverifiable
Status changed from In Progress to Feedback
Target version changed from 3.8 to Unverifiable

Hi,

Moving to Unverifiable because:
- Many VM related work done to x64 at least.
- vfork/exec work was done by Matt in the end of 2013.
- We don't do pkgsrc bulkbuilds anymore but we do poudriere builds which are very fork-heavy and as far as I know this issue hasn't showed up.

Feel free to resurrect/provide feedback.

Cheers,
Antonio Huete

Actions

Copy link

#14

Updated by tuxillo over 3 years ago

Status changed from Feedback to Closed
Target version changed from Unverifiable to 6.4

Over a decade marked as Unverifiable.

Actions

Copy link

Also available in: Atom PDF

Project

General

Profile

DragonFlyBSD

Bug #745

the evil interrupt stats monster is still around!

Updated by corecode over 19 years ago

Updated by corecode over 19 years ago

Updated by dillon over 19 years ago

Updated by corecode over 19 years ago

Updated by dillon over 19 years ago

Updated by corecode over 19 years ago

Updated by dillon over 19 years ago

Updated by corecode over 19 years ago

Updated by dillon over 19 years ago

Updated by joerg over 19 years ago

Updated by tuxillo over 19 years ago

Updated by tuxillo over 19 years ago

Updated by tuxillo over 19 years ago

Updated by tuxillo over 3 years ago