Bug #3331

open

dsynth timeout and limits

Added by arcade@b1t.name about 2 years ago. Updated almost 2 years ago.

Status: In Progress
Priority: Normal
Assignee: -
Category: -
Target version:
Start date: 09/26/2022
Due date:
% Done: 0%
Estimated time:

Description

Hello.

While watching dsynth I often see that it wants to limit the number of running jobs, but in practice it only prevents new jobs from starting. Also, under heavy load, for example when other packages being built are sitting in swap, jobs can time out.

For example, the build reaches chromium while something huge like firefox is still building and there is almost no free memory. Extracting chromium then means writing a lot of data to swap while reading it from disk, and the other build will also be swapping heavily. For me this can sometimes take more than 15 minutes.

It would be really nice if this timeout were configurable.

Also, on the limits: it's really easy to suspend any job with `kill -17 -JOB_PID`, which stops the whole process group for the time being (17 is SIGSTOP on BSD). It can be resumed later with -19 (SIGCONT).
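
As a minimal sketch of the same idea from C, assuming the job runs in its own process group: passing a negative pid to kill() signals the whole group. The helper name `suspend_resume_group_demo` is made up for illustration and is not part of dsynth; it forks a throwaway child rather than touching a real build job.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo helper (not dsynth code): put a child in its own
 * process group, stop the group, resume it, then clean up.
 * Returns 0 if the child was observed stopped and then continued.
 */
int suspend_resume_group_demo(void)
{
    int status;
    int ok = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        setpgid(0, 0);          /* child becomes its own group leader */
        for (;;)
            pause();            /* idle until a signal arrives */
    }
    setpgid(pid, pid);          /* also set from the parent to avoid a race */
    assert(pid > 1);            /* guard: -pid must never become -1 */

    /* A negative pid targets the whole group, like `kill -17 -JOB_PID`. */
    if (kill(-pid, SIGSTOP) == 0 &&
        waitpid(pid, &status, WUNTRACED) == pid && WIFSTOPPED(status) &&
        kill(-pid, SIGCONT) == 0 &&
        waitpid(pid, &status, WCONTINUED) == pid && WIFCONTINUED(status))
        ok = 1;

    /* Clean up the demo child regardless of the outcome. */
    kill(-pid, SIGKILL);
    waitpid(pid, &status, 0);
    return ok ? 0 : -1;
}
```

Using the symbolic names SIGSTOP and SIGCONT avoids depending on the numeric values, which differ between BSD and Linux.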

Big thanks in advance. I hope this is interesting as an improvement.

Actions #1

Updated by tuxillo about 2 years ago

  • Status changed from New to In Progress

What is the timeout you're mentioning? Can you be more specific?

Actions #2

Updated by arcade@b1t.name about 2 years ago

There's a 15-minute timeout on job activity: if no new lines appear in the log during that period, the job is dropped.

Actions #3

Updated by daftaupe almost 2 years ago

Hi,

after trying to find some information, I think there are several things to take into account in your case.

Several timeouts are defined in dsynth.h, named WDOG1 through WDOG9. If you see a timeout of precisely 15 minutes, the first possibility is that it comes from WDOG3 being used in one of the calls to dophase. That is the case for the extract_depends, extract, and configure phases.

The timeout can also be scaled up if the 15-minute load average divided by your number of cores is greater than or equal to your number of cores. If I'm not mistaken, this can be summed up as: the 15-minute load is at least the square of your number of cores. In that case the new timeout is computed in the else branch of the following piece of code, where the scale factor is capped at 4.

                        /*
                         * Watchdog scaling
                         */
                        getloadavg(dload, 3);
                        adjloadavg(dload);
                        dv = dload[2] / NumCores;
                        if (dv < (double)NumCores) {
                                wdog_scaled = wdog;
                        } else {
                                if (dv > 4.0 * NumCores)
                                        dv = 4.0 * NumCores;
                                wdog_scaled = wdog * dv / NumCores;
                        }

                        /*
                         * Watchdog
                         */
                        if (next_time - wdog_time >= wdog_scaled * 60) {
                                snprintf(buf, sizeof(buf),
                                         "\n--------\n" 
                                         "WATCHDOG TIMEOUT FOR %s in %s " 
                                         "after %d minutes\n" 
                                         "Killing pid %d\n" 
                                         "--------\n",
                                         pkg->portdir, phase, wdog_scaled, pid);
                                if (fdlog >= 0)
                                        write(fdlog, buf, strlen(buf));
                                dlog(DLOG_ALL,
                                     "[%03d] %s WATCHDOG TIMEOUT in %s " 
                                     "after %d minutes (%d min scaled)\n",
                                     work->index, pkg->portdir, phase,
                                     wdog, wdog_scaled);
                                kill(pid, SIGKILL);
                                ++work->accum_error;
                                break;
                        }
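
To make the scaling arithmetic concrete, here is a small standalone sketch that mirrors the branch quoted above. The function name `scale_watchdog` and its parameters are made up for illustration. It takes the 15-minute load directly instead of calling getloadavg(), it skips adjloadavg() (not shown in this thread), and it assumes the result is truncated to an integer number of minutes.

```c
#include <assert.h>

/*
 * Hypothetical helper (not dsynth code): reproduce the watchdog scaling
 * from the quoted snippet. load15 stands in for dload[2].
 */
int scale_watchdog(int wdog, double load15, int ncores)
{
    double dv;

    assert(ncores > 0);
    dv = load15 / ncores;

    /* Below ncores * ncores of load, the timeout is left unscaled. */
    if (dv < (double)ncores)
        return wdog;

    /* Cap the factor so the timeout grows to at most 4x. */
    if (dv > 4.0 * ncores)
        dv = 4.0 * ncores;
    return (int)(wdog * dv / ncores);
}
```

With 4 cores and a 15-minute base timeout, a load of 8 leaves the timeout at 15 minutes, a load of 32 doubles it to 30, and anything from 64 upward hits the cap at 60.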

But I think that if you see an exact number of minutes in the log, it means the default value of WDOG3 is being used, provided this happens during the extract phase.

Maybe you could try changing WDOG3 to something bigger and see whether that lets you get past this step for chromium? You would need to recompile dsynth for that.
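
For the configurable timeout requested in the description, one possible shape is an environment override that falls back to the compiled-in default. This is only a sketch: nothing in this thread shows that dsynth has such an option, and both the helper name `watchdog_minutes` and any variable name passed to it are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not dsynth code): read a watchdog timeout in
 * minutes from the environment variable `envname`, returning `fallback`
 * when the variable is unset, empty, non-numeric, or not positive.
 */
int watchdog_minutes(const char *envname, int fallback)
{
    const char *s;
    char *end;
    long v;

    assert(envname != NULL);
    s = getenv(envname);
    if (s == NULL || *s == '\0')
        return fallback;

    errno = 0;
    v = strtol(s, &end, 10);
    /* Reject overflow, trailing garbage, and values outside 1..INT_MAX. */
    if (errno != 0 || *end != '\0' || v <= 0 || v > INT_MAX)
        return fallback;
    return (int)v;
}
```

A caller could then use something like `watchdog_minutes("SOME_WDOG_VAR", WDOG3)` where WDOG3 is passed today, so an unset variable keeps the current behaviour.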
