https://bugs.dragonflybsd.org/ 2005-12-27T03:20:10Z DragonFlyBSD bugtracker
DragonFlyBSD - Bug #24: waitpid() with WUNTRACED flag? (was Re: Hang on ctrl+Z after the MPSAFE tsleep/wakeup commit) https://bugs.dragonflybsd.org/issues/24?journal_id=91 2005-12-27T03:20:10Z dillon
<ul></ul><p>:While this has been fixed, I realized that I overlooked another<br />:similar problem; try running vipw, suspend it with ctrl-Z and<br />:continue with `fg'; the shell immediately reports that vipw has been<br />:suspended (processes are marked as `TL' by the ps command); another `fg'<br />:seems to continue it, but vi (or whatever editor is invoked by vipw)<br />:doesn't handle the screen properly, even though it's responding to<br />:the keystrokes.<br />:Attached is a small code fragment that demonstrates the problem.<br />:BTW, it behaves differently on -CURRENT and 1.2.6-RELEASE;<br />:on -CURRENT, waitpid() returns after `fg' command followed by a ctrl-Z,<br />:while on 1.2.6-RELEASE it never returns when suspended or continued.<br />:Without the WUNTRACED flag, the code seems to work the same way on both<br />:versions of the OS.<br />:Too bad I couldn't find it before the new release.</p>
<pre><code>We haven't released yet! I can reproduce the same problem so I'll<br /> track it down and get it fixed before the release.</code></pre>
<pre><code>-Matt<br /> Matthew Dillon <br /> &lt;<a class="email" href="mailto:dillon@backplane.com">dillon@backplane.com</a>&gt;</code></pre> DragonFlyBSD - Bug #24: waitpid() with WUNTRACED flag? (was Re: Hang on ctrl+Z after the MPSAFE tsleep/wakeup commit) https://bugs.dragonflybsd.org/issues/24?journal_id=92 2005-12-27T08:23:10Z qhwt+dfly
<ul></ul><p>On Mon, Dec 26, 2005 at 11:16:10AM -0800, Matthew Dillon wrote:</p>
<blockquote>
<p>:While this has been fixed, I realized that I overlooked another<br />:similar problem; try running vipw, suspend it with ctrl-Z and<br />:continue with `fg'; the shell immediately reports that vipw has been<br />:suspended (processes are marked as `TL' by the ps command); another `fg'<br />:seems to continue it, but vi (or whatever editor is invoked by vipw)<br />:doesn't handle the screen properly, even though it's responding to<br />:the keystrokes.<br />:Attached is a small code fragment that demonstrates the problem.<br />:BTW, it behaves differently on -CURRENT and 1.2.6-RELEASE;<br />:on -CURRENT, waitpid() returns after `fg' command followed by a ctrl-Z,<br />:while on 1.2.6-RELEASE it never returns when suspended or continued.<br />:Without the WUNTRACED flag, the code seems to work the same way on both<br />:versions of the OS.<br />:Too bad I couldn't find it before the new release.</p>
<p>We haven't released yet! I can reproduce the same problem so I'll<br />track it down and get it fixed before the release.</p>
</blockquote>
<p>Oops, I was supposed to say `before branching.' Anyway, I'll play with<br />signals a bit more but hopefully this is the last one.<br />By the way, here's the code I forgot to attach to the previous message.<br />It behaves consistently on the Linux, FreeBSD, and NetBSD boxes around me,<br />but differently on the recent DragonFly.</p> DragonFlyBSD - Bug #24: waitpid() with WUNTRACED flag? (was Re: Hang on ctrl+Z after the MPSAFE tsleep/wakeup commit) https://bugs.dragonflybsd.org/issues/24?journal_id=93 2005-12-27T08:32:30Z qhwt+dfly
<ul></ul><p><strong>blush</strong></p> DragonFlyBSD - Bug #24: waitpid() with WUNTRACED flag? (was Re: Hang on ctrl+Z after the MPSAFE tsleep/wakeup commit) https://bugs.dragonflybsd.org/issues/24?journal_id=95 2005-12-27T15:31:40Z dillon
<ul></ul><p>:Content-Type: text/plain; charset=us-ascii<br />:Content-Disposition: attachment; filename="a.c" <br />:...</p>
<pre><code>I think it's the same issue. The ^Z affects all processes in<br /> the process group. waitpid in the parent tries to return the<br /> stopped state of the child, but the parent itself is stopped<br /> and the status is not returned until you 'fg'. This starts<br /> both parent and child, but then the parent processes the<br /> return from the waitpid and stops itself again.</code></pre>
<pre><code>I'm not sure if there's a good solution to the problem, or how<br /> much code uses the same sort of construct. Even the original code<br /> still had this race condition, it was just a much smaller window<br /> of opportunity.</code></pre>
<pre><code>One possible solution is to have the kernel wait4() check for a pending<br /> stop request on the parent process after being woken up but before it <br /> checks the state of other processes. This would reduce (but not<br /> eliminate) the window of opportunity back to what it was before<br /> the tsleep work went in.</code></pre>
<pre><code>-Matt</code></pre>
<p>: for (;;) {<br />: caught = waitpid(pid, &st, WUNTRACED);<br />: warnx("waitpid returned %d\n", caught);<br />: errno = WEXITSTATUS(st);<br />: if (caught == -1)<br />: return 1;<br />: else if (WIFSTOPPED(st)) {<br />: sig = WSTOPSIG(st);<br />: warnx("WSTOPSIG(%d) = %d\n", st, sig);<br />: sig = SIGSTOP;<br />: raise(sig);<br />: warnx("after raise(%d)\n", sig);<br />: }</p> DragonFlyBSD - Bug #24: waitpid() with WUNTRACED flag? (was Re: Hang on ctrl+Z after the MPSAFE tsleep/wakeup commit) https://bugs.dragonflybsd.org/issues/24?journal_id=96 2005-12-27T16:28:10Z qhwt+dfly
<ul></ul><p>On Mon, Dec 26, 2005 at 11:27:44PM -0800, Matthew Dillon wrote:</p>
<blockquote>
<p>:Content-Type: text/plain; charset=us-ascii<br />:Content-Disposition: attachment; filename="a.c" <br />:...</p>
<p>I think it's the same issue. The ^Z affects all processes in<br />the process group. waitpid in the parent tries to return the<br />stopped state of the child, but the parent itself is stopped<br />and the status is not returned until you 'fg'. This starts<br />both parent and child, but then the parent processes the<br />return from the waitpid and stops itself again.</p>
</blockquote>
<p>Ok, the code doesn't really reflect my question here; on computers here<br />running other OSes, including DragonFly 1.2.x-RELEASE, waitpid() (or wait4)<br />doesn't return after ^Z+`fg' even with WUNTRACED (I can confirm this by the<br />fact that warnx() doesn't print "WSTOPSIG(%d) = %d\n" message), thus no such<br />problem exists. I don't know if it's a (POSIX-ly or SUS-ly?) expected<br />behavior, but are we going to change that?</p>
<blockquote>
<p>: for (;;) {<br />: caught = waitpid(pid, &st, WUNTRACED);<br />: warnx("waitpid returned %d\n", caught);<br />: errno = WEXITSTATUS(st);<br />: if (caught == -1)<br />: return 1;<br />: else if (WIFSTOPPED(st)) {<br />: sig = WSTOPSIG(st);<br />: warnx("WSTOPSIG(%d) = %d\n", st, sig);<br />: sig = SIGSTOP;<br />: raise(sig);<br />: warnx("after raise(%d)\n", sig);<br />: }</p>
</blockquote> DragonFlyBSD - Bug #24: waitpid() with WUNTRACED flag? (was Re: Hang on ctrl+Z after the MPSAFE tsleep/wakeup commit) https://bugs.dragonflybsd.org/issues/24?journal_id=99 2005-12-28T03:02:30Z dillon
<ul></ul><p>:Ok, the code doesn't really reflect my question here; on computers here<br />:running other OSes, including DragonFly 1.2.x-RELEASE, waitpid() (or wait4)<br />:doesn't return after ^Z+`fg' even with WUNTRACED (I can confirm this by the<br />:fact that warnx() doesn't print "WSTOPSIG(%d) = %d\n" message), thus no such<br />:problem exists. I don't know if it's a (POSIX-ly or SUS-ly?) expected<br />:behavior, but are we going to change that?</p>
<pre><code>I don't know what the expected behavior is, but the only reason the<br /> parent isn't seeing the STOP signal on the child in FreeBSD is due to<br /> a fluke in the way the parent's own STOP signal is handled, which is<br /> causing the parent to stop before its wait*() scans the list of<br /> child processes. If you were to wake up the parent on FreeBSD with a<br /> kill -CONT from another shell, it will see the vi's stop signal.</code></pre>
<pre><code>When you 'fg' the process group, both parent and child are being<br /> restarted at the same time. The parent then starts its scan of the list<br /> of child processes and of course no longer sees the child in a stopped<br /> state, so it only reports the SIGCONT, instead of a SIGSTOP followed by<br /> a SIGCONT. (The stop/cont state is held in a single variable in the<br /> proc structure so multiple stops and conts are not queued to the parent's<br /> wait*()).</code></pre>
<pre><code>DragonFly is currently scanning the list of child processes <strong>BEFORE</strong> it<br /> stops, so it sees the stopped child and tries to return its status,<br /> but then the parent process stops before it returns to user mode.<br /> When you 'fg' the process group, DragonFly proceeds to return the<br /> original STOP status that it has already processed (so it doesn't get<br /> overwritten by the new CONT status), then the parent program loops on<br /> the waitpid() and DragonFly sees the CONT status.</code></pre>
<pre><code>Which is more correct? I don't know. I think the userland program<br /> itself (the parent, as in your test program and in vipw) might be<br /> incorrect.</code></pre>
<pre><code>-Matt</code></pre> DragonFlyBSD - Bug #24: waitpid() with WUNTRACED flag? (was Re: Hang on ctrl+Z after the MPSAFE tsleep/wakeup commit) https://bugs.dragonflybsd.org/issues/24?journal_id=100 2005-12-28T05:21:10Z corecode
<ul></ul><p>On 27.12.2005, at 19:57, Matthew Dillon wrote:</p>
<blockquote>
<p>Which is more correct? I don't know. I think the userland program<br />itself (the parent, as in your test program and in vipw) might be<br />incorrect.</p>
</blockquote>
<p>I think that from the outside each of these should usually be an atomic operation: <br />first stopping all foreground processes and later resuming all of them at <br />the same time. So in the common case I think the parent shouldn't <br />notice either, but I know that it's not possible in the way we process <br />suspend signals now. Except, of course, if we add another sleep point <br /><strong>in</strong> wait. Or am I talking nonsense? (Little sleep at the moment)</p>
<p>cheers<br /> simon</p> DragonFlyBSD - Bug #24: waitpid() with WUNTRACED flag? (was Re: Hang on ctrl+Z after the MPSAFE tsleep/wakeup commit) https://bugs.dragonflybsd.org/issues/24?journal_id=104 2005-12-28T06:00:10Z dillon
<ul></ul><p>:the same time. So in the common case I think the parent shouldn't<br />:notice either, but I know that it's not possible in the way we process<br />:suspend signals now. Except, of course, if we add another sleep point<br />:*in* wait. Or am I talking nonsense? (Few sleep at the moment)<br />:<br />:cheers<br />: simon</p>
<pre><code>That is precisely what we would have to do, as I indicated earlier<br /> (I worded it differently but it amounts to the same thing). We would<br /> have to put a P_STOPPED check in the kernel wait*() code.</code></pre>
<pre><code>Note that even on FreeBSD the SIGCONT is still picked up. It's in fact<br /> the reason why the SIGSTOP is NOT picked up by the parent, because<br /> the SIGCONT overwrites the SIGSTOP in the child's proc.</code></pre>
<pre><code>Making wait*() work like it did before is a two-line fix. Basically just<br /> this after the loop: label in kern_wait():</code></pre>
<pre><code>while (q->p_flag & P_STOPPED) <br /> tstop(q);</code></pre>
<pre><code>But I don't know if I want to actually do that. I think the original<br /> userland code is what was broken, not the kernel.</code></pre>
<pre><code>-Matt</code></pre> DragonFlyBSD - Bug #24: waitpid() with WUNTRACED flag? (was Re: Hang on ctrl+Z after the MPSAFE tsleep/wakeup commit) https://bugs.dragonflybsd.org/issues/24?journal_id=110 2005-12-30T13:18:16Z corecode
<ul></ul><p>fixed by dillon</p>