Bug #3331
dsynth timeout and limits (open)
Description
Hello.
Watching dsynth, I often see that it wants to limit the number of running jobs, but in practice it only prevents new jobs from starting. Also, under heavy load (for example, when other package builds are sitting in swap), jobs can time out.
For example, the build reaches chromium while something huge like firefox is building and there is almost no free memory. Extracting chromium then means writing a lot of data to swap while reading it from disk, and the other build will be swapping heavily too. For me this sometimes takes more than 15 minutes.
It would be really nice if this timeout were configurable.
Also, on the limits: it's really easy to suspend any job with `kill -17 -JOB_PID` (17 is SIGSTOP on DragonFly), which stops the whole process group for the time being. Later it can be resumed with `kill -19 -JOB_PID` (SIGCONT).
Big thanks in advance; I hope this is interesting as an improvement.
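The suspend/resume trick from the report can be sketched in C with `killpg()`, which sends a signal to every process in a process group. This is a minimal sketch, not dsynth code. `suspend_job()`, `resume_job()` and `spawn_demo_job()` are hypothetical helpers. The sketch uses the symbolic names SIGSTOP and SIGCONT rather than 17 and 19, because those numbers are BSD-specific and differ on Linux.

```c
#define _XOPEN_SOURCE 700   /* expose killpg() and WCONTINUED on strict builds */
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stop every process in the job's process group (like `kill -STOP -PGID`). */
static int suspend_job(pid_t pgid)
{
    return killpg(pgid, SIGSTOP);
}

/* Let the stopped process group run again (like `kill -CONT -PGID`). */
static int resume_job(pid_t pgid)
{
    return killpg(pgid, SIGCONT);
}

/* Hypothetical stand-in for a build job: a child that leads its own
 * process group and just sleeps until signalled. Returns pid (== pgid). */
static pid_t spawn_demo_job(void)
{
    pid_t pid = fork();

    if (pid == 0) {
        setpgid(0, 0);
        for (;;)
            pause();
    }
    if (pid > 0)
        setpgid(pid, pid);  /* set from both sides to avoid a race */
    return pid;
}
```

A stopped job still holds its memory, which the kernel can page out under pressure, but it no longer competes for CPU or disk. It also writes nothing to its log while stopped, so an inactivity watchdog like the one discussed in this thread would need to be paused for that job.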
Updated by tuxillo about 2 years ago
- Status changed from New to In Progress
What is the timeout you're mentioning? Can you be more specific?
Updated by arcade@b1t.name about 2 years ago
There's a 15-minute timeout on job activity: if no new lines appear in the log during that period, the job is dropped.
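The activity rule described here can be illustrated with a small C sketch. `struct log_watch` and `log_watch_check()` are hypothetical names, not dsynth's. The sketch treats any growth of the log file as activity, as the reporter describes, and compares the silent period against a timeout given in minutes.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-job state for an inactivity watchdog. */
struct log_watch {
    long long last_size;  /* log size (bytes) when it last changed */
    time_t last_change;   /* time at which the log last grew */
};

/* Feed the current log size; returns true once the log has not changed
 * for at least timeout_min minutes. Any change restarts the timer. */
static bool log_watch_check(struct log_watch *w, long long cur_size,
                            time_t now, int timeout_min)
{
    if (cur_size != w->last_size) {
        w->last_size = cur_size;
        w->last_change = now;
        return false;
    }
    return now - w->last_change >= (time_t)timeout_min * 60;
}
```

With `timeout_min = 15`, a log that stays the same size for 899 seconds is still considered alive, and at 900 seconds the check fires.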
Updated by daftaupe about 2 years ago
Hi,
After digging around a bit, I think there are several things to take into account in your case.
Several timeouts are defined in dsynth.h, WDOG1 through WDOG9. If you see a timeout of exactly 15 minutes, the first possibility is that it comes from WDOG3 being passed in one of the calls to dophase(). That is the case in the extract_depends, extract, and configure phases.
The timeout can also be scaled up. dsynth divides the (adjusted) 15-minute load average by your number of cores. If that quotient is greater than or equal to your number of cores, the timeout is scaled. If I'm not mistaken, that is the same as saying the 15-minute load average is at least your number of cores squared. In that case the new timeout is computed in the else branch of the following code. The quotient is capped at 4 * NumCores, so the timeout grows to at most four times its base value.
```c
/*
 * Watchdog scaling
 */
getloadavg(dload, 3);
adjloadavg(dload);
dv = dload[2] / NumCores;
if (dv < (double)NumCores) {
    wdog_scaled = wdog;
} else {
    if (dv > 4.0 * NumCores)
        dv = 4.0 * NumCores;
    wdog_scaled = wdog * dv / NumCores;
}
```
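To check the numbers, the scaling rule can be restated as a standalone function. `scaled_wdog()` is a hypothetical name. The arithmetic mirrors the excerpt above, including the cap at 4 * NumCores. The result is truncated to whole minutes, which is what assigning to an int would do (`wdog_scaled` is printed with `%d`).

```c
/* Mirrors the excerpt: returns the watchdog in minutes after load scaling.
 * load15 is the (adjusted) 15-minute load average. */
static int scaled_wdog(int wdog, double load15, int ncores)
{
    double dv = load15 / ncores;

    if (dv < (double)ncores)
        return wdog;                 /* load below ncores^2: no scaling */
    if (dv > 4.0 * ncores)
        dv = 4.0 * ncores;           /* cap: at most 4x the base value */
    return (int)(wdog * dv / ncores);
}
```

Take the 15-minute base value reported in this thread and 4 cores. A load average of 8 or 16 leaves the timeout at 15 minutes, 32 doubles it to 30, and anything from 64 upward caps it at 60. On a 16-core machine the load average would have to reach 256 before any scaling happens. On machines with many cores, the unscaled value is therefore probably what applies.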
```c
/*
 * Watchdog
 */
if (next_time - wdog_time >= wdog_scaled * 60) {
    snprintf(buf, sizeof(buf),
        "\n--------\n"
        "WATCHDOG TIMEOUT FOR %s in %s "
        "after %d minutes\n"
        "Killing pid %d\n"
        "--------\n",
        pkg->portdir, phase, wdog_scaled, pid);
    if (fdlog >= 0)
        write(fdlog, buf, strlen(buf));
    dlog(DLOG_ALL,
        "[%03d] %s WATCHDOG TIMEOUT in %s "
        "after %d minutes (%d min scaled)\n",
        work->index, pkg->portdir, phase,
        wdog, wdog_scaled);
    kill(pid, SIGKILL);
    ++work->accum_error;
    break;
}
```
But if the log reports exactly the WDOG3 value and the timeout happened during the extract phase, I think that means the default, unscaled value of WDOG3 is being used.
Maybe you could try increasing WDOG3 and see whether that lets chromium get past this step? You would need to recompile dsynth for that.
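The original report asked for the timeout to be configurable. A runtime override would avoid recompiling, and could look roughly like the sketch below. It is hypothetical throughout. `parse_wdog_minutes()` and the environment variable `DSYNTH_WDOG3_MINUTES` are invented for illustration, and nothing in this thread says dsynth reads such a setting. The function validates the value and falls back to the compiled-in default otherwise. The 1..1440 minute range is an arbitrary sanity bound.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical override: parse a minutes value, e.g. from
 *   parse_wdog_minutes(getenv("DSYNTH_WDOG3_MINUTES"), 15)
 * Falls back to `dflt` if the string is missing, not a plain number,
 * or outside an (arbitrary) 1..1440 minute range. */
static int parse_wdog_minutes(const char *s, int dflt)
{
    char *end;
    long v;

    if (s == NULL || *s == '\0')
        return dflt;
    errno = 0;
    v = strtol(s, &end, 10);
    if (errno != 0 || *end != '\0' || v < 1 || v > 1440)
        return dflt;
    return (int)v;
}
```

Falling back silently keeps a typo from disabling the watchdog. Logging a warning in that case would be a reasonable addition.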