DragonFlyBSD bugtracker | https://bugs.dragonflybsd.org/ | 2007-11-01T02:53:01Z
DragonFlyBSD - Bug #833: cache_lock: blocked on 0xe29c3b08 "" | https://bugs.dragonflybsd.org/issues/833?journal_id=3788 | 2007-11-01T02:53:01Z | dillon
<ul></ul><p>:Following [diagnostic] cache_lock: blocked on 0xe29c3b08 "quota.user" in dmesg and /var/log/messages. System almost completely hangs; I am unable to create a new shell, enter commands, etc (or even reboot). Kernel options are SMP, IO_APIC, and QUOTA. The bug appears to manifest itself after applying userquota to a filesystem, mounting it, and then issuing a few commands like mkdir/chmod/etc on it. System is a dual processor Opteron 248.</p>
<pre><code>I'll try to reproduce this one today. It sounds like it ought to be<br /> easy to figure out.</code></pre>
<pre><code>-Matt<br /> Matthew Dillon <br /> &lt;<a class="email" href="mailto:dillon@backplane.com">dillon@backplane.com</a>&gt;</code></pre>
DragonFlyBSD - Bug #833: cache_lock: blocked on 0xe29c3b08 "" | https://bugs.dragonflybsd.org/issues/833?journal_id=3791 | 2007-11-01T13:35:01Z | dillon
<ul></ul><p>:<br />:<br />::Following [diagnostic] cache_lock: blocked on 0xe29c3b08 "quota.user" in dmesg and /var/log/messages. System almost completely hangs; I am unable to create a new shell, enter commands, etc (or even reboot). Kernel options are SMP, IO_APIC, and QUOTA. The bug appears to manifest itself after applying userquota to a filesystem, mounting it, and then issuing a few commands like mkdir/chmod/etc on it. System is a dual processor Opteron 248.<br />:<br />: I'll try to reproduce this one today. It sounds like it ought to be<br />: easy to figure out.<br />:<br />: -Matt</p>
<pre><code>I haven't had any luck reproducing it yet. Could you give me a<br /> test script to run that will reproduce the problem? Also, are you<br /> doing your tests as root or as a user?</code></pre>
<pre><code>Alternatively if you can get a kernel core and a kgdb backtrace of one<br /> of the stuck processes I can probably figure out what is going on from<br /> there.</code></pre>
<pre><code>-Matt</code></pre>
DragonFlyBSD - Bug #833: cache_lock: blocked on 0xe29c3b08 "" | https://bugs.dragonflybsd.org/issues/833?journal_id=3795 | 2007-11-02T05:41:03Z | dpwalters
<ul></ul><p>After trying to recreate this problem in a virtual machine, I seem to have<br />trouble recreating it too. I apologize for this bug report; after investigating quota.user<br />with the file command, it seems it had gotten corrupted somehow, as the file command<br />reported it as DATA. I'm not sure how it got corrupted, however, as fsck showed it as<br />clean. Anyway, why does the system become unresponsive after this file becomes<br />corrupted? Also, other than the obvious reasons, how might this file become corrupted?<br />I appreciate all the work you do.<br /> -David</p>
DragonFlyBSD - Bug #833: cache_lock: blocked on 0xe29c3b08 "" | https://bugs.dragonflybsd.org/issues/833?journal_id=3796 | 2007-11-02T06:16:01Z | dpwalters
<ul></ul><p>I seem to have spoken too soon, as I can now reproduce this in a virtual machine. This is how I<br />was able to reproduce it: qemu -smp 2 with kernel options SMP and QUOTA. It seems that after<br />running touch /home/quota.user as root (maybe after rebooting or not) things start to get<br />weird (/home is a userquota filesystem). I get sigreturn: eflags 0x206 messages in the console<br />as well as others like 0x80207 and 0x80203. Also, root's shell begins to lock up and subsequent<br />logins as root lock up after simple commands like "ls".</p>
DragonFlyBSD - Bug #833: cache_lock: blocked on 0xe29c3b08 "" | https://bugs.dragonflybsd.org/issues/833?journal_id=3797 | 2007-11-03T07:20:02Z | dpwalters
<ul></ul><p>It seems that these sigreturn: eflags messages and subsequent lockups are possibly due to<br />a bug in qemu (even in the latest CVS sources that I pulled on 10/30). Upon further<br />research of these messages it seems VirtualBox is affected and I can only assume qemu is<br />as well (as VirtualBox is based on qemu). It just so happened that these messages occurred<br />just after I had tried to recreate the issue I was having on my Opteron-based system, and the<br />same sort of effect was generated.</p>
DragonFlyBSD - Bug #833: cache_lock: blocked on 0xe29c3b08 "" | https://bugs.dragonflybsd.org/issues/833?journal_id=3798 | 2007-11-04T11:22:03Z | dpwalters
<ul></ul><p>OK, after having dumped qemu as a debugging solution for SMP kernels, <br />I have discovered vkernels. I am able to reproduce the bug in a vkernel<br />with options QUOTA and SMP (without annoying qemu bugs). After attaching<br />gdb to the vkernel I am unable to get a decent backtrace. GDB has trouble<br />accessing the memory and gives Device Busy errors. Anyway I thought this<br />information might be of some help. For what it's worth, I have a picture of the <br />backtrace at <a class="external" href="http://woe.likewhoa.net/~david/vkernel_bt.jpg">http://woe.likewhoa.net/~david/vkernel_bt.jpg</a> This is after the<br />vkernel gets hung up. (Note: I still use qemu, just not its smp functionality<br />as this picture is of qemu running DragonFly GENERIC and a vkernel inside<br />of that)</p>
DragonFlyBSD - Bug #833: cache_lock: blocked on 0xe29c3b08 "" | https://bugs.dragonflybsd.org/issues/833?journal_id=3799 | 2007-11-04T20:27:02Z | corecode
<ul></ul><p>I guess a simple shell script should be sufficient so that Matt can<br />reproduce the bug. Unless of course you want to get your hands dirty<br />yourself. In this case I guess gdb needs to be extended to be able to<br />deal with multiple threads :)</p>
<p>cheers<br /> simon</p>
DragonFlyBSD - Bug #833: cache_lock: blocked on 0xe29c3b08 "" | https://bugs.dragonflybsd.org/issues/833?journal_id=3800 | 2007-11-04T22:40:02Z | dpwalters
<ul></ul><p>Attached is a script that probably works 100% of the time. You may want<br />to back up your vkernel filesystem before running the script as it may<br />become trashed due to the inability to sync the filesystem after becoming<br />locked up. Also, there is some preparation work to do before running this<br />script inside of a vkernel. Here are some example commands (assuming your<br />vkernel filesystem is a lot like the one in the vkernel man page) to prepare<br />the vkernel and then trigger the bug inside of the vkernel.</p>
<p>1. dd if=/dev/zero of=/mnt.img bs=1m count=5<br />2. vnconfig -c -s labels vn0 /mnt.img<br />3. disklabel -r -w vn0s0 auto<br />4. disklabel -e vn0s0 #edit the label to create a vn0s0a partition<br />5. newfs /dev/vn0s0a<br />6. echo "/dev/vn0s0a /mnt ufs rw,userquota 1 1" >> /etc/fstab<br />7. mount /mnt<br />8. ./bug.sh /mnt</p>
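<p>For readers without the attachment, the kind of workload described in this thread (an empty quota.user created with touch, followed by a few mkdir/chmod calls on the userquota filesystem) might look roughly like the sketch below. This is a hypothetical stand-in and not the attached bug.sh; the function name exercise_fs, the loop count, and the directory names are assumptions made for illustration.</p>

```shell
#!/bin/sh
# Hypothetical stand-in for a reproduction workload; NOT the attached bug.sh.
# exercise_fs, the loop count, and the dirN names are illustrative assumptions.
exercise_fs() {
    target=$1
    # Create quota.user with touch, as described earlier in the thread.
    touch "$target/quota.user" || return 1
    i=0
    while [ "$i" -lt 10 ]; do
        # A few ordinary metadata operations on the filesystem under test.
        mkdir -p "$target/dir$i" || return 1
        chmod 700 "$target/dir$i" || return 1
        i=$((i + 1))
    done
}
```

<p>It would be invoked as exercise_fs /mnt after step 7. On an affected kernel this may hang the filesystem, so it is best tried only inside a disposable vkernel image.</p>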
<p>After having run the script myself, it would seem you can ctrl-z out of the<br />script and just have a zombied process. I'm not sure, but when /home<br />is the filesystem and you're trying to log in remotely as a user this is<br />very bad (as sshd gets zombied repeatedly and this is what I was experiencing).<br />Also, you cannot unmount the filesystem after locking it up. You also may or<br />may not see the "cache_lock: blocked" message(s).</p>
DragonFlyBSD - Bug #833: cache_lock: blocked on 0xe29c3b08 "" | https://bugs.dragonflybsd.org/issues/833?journal_id=3801 | 2007-11-07T00:25:02Z | dillon
<ul></ul><p>:Attached is a script that probably works 100% of the time. You may want<br />:to backup your vkernel filesystem before running the script as it may<br />:become trashed due to the inability to sync the filesystem after becoming<br />:locked up. Also, there is some preparation work to do before running this<br />:..</p>
<pre><code>I've reproduced at least one lockup with your script. I'm tracking it<br /> down now.</code></pre>
<pre><code>-Matt</code></pre>
DragonFlyBSD - Bug #833: cache_lock: blocked on 0xe29c3b08 "" | https://bugs.dragonflybsd.org/issues/833?journal_id=3802 | 2007-11-07T00:47:00Z | dillon
<ul></ul><p>:<br />::Attached is a script that probably works 100% of the time. You may want<br />::to backup your vkernel filesystem before running the script as it may<br />::become trashed due to the inability to sync the filesystem after becoming<br />::locked up. Also, there is some preparation work to do before running this<br />::..<br />:<br />: I've reproduced at least one lockup with your script. I'm tracking it<br />: down now.<br />:<br />: -Matt</p>
<pre><code>I think the problem is that the quota.user file is not being pre-populated,<br /> and the filesystem is recursing trying to allocate blocks for the file.<br /> i.e. it is trying to check quotas on the quota file itself.</code></pre>
<pre><code>If you run quotacheck on the filesystem before turning on quotas the<br /> quota.user file will be properly created.</code></pre>
<pre><code>That said, we certainly do not want it to crash. I'll adjust the code<br /> to generate a kernel warning.</code></pre>
<pre><code>-Matt<br /> Matthew Dillon <br /> &lt;<a class="email" href="mailto:dillon@backplane.com">dillon@backplane.com</a>&gt;</code></pre>
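<p>The workaround above (run quotacheck before turning quotas on) could be wrapped in a small guard like the following sketch. The helper names quota_file_ready and enable_userquota are hypothetical, and treating a missing or zero-length quota.user as "not pre-populated" is an assumption for illustration, not something the kernel itself checks this way.</p>

```shell
#!/bin/sh
# Hypothetical helpers; names and the "non-empty file" heuristic are assumptions.

# Succeeds only if <mountpoint>/quota.user exists and is non-empty.
quota_file_ready() {
    [ -s "$1/quota.user" ]
}

# Run quotacheck(8) first when the quota file looks unpopulated,
# then turn quotas on with quotaon(8).
enable_userquota() {
    mnt=$1
    if ! quota_file_ready "$mnt"; then
        quotacheck "$mnt" || return 1
    fi
    quotaon "$mnt"
}
```

<p>With the example setup from earlier in the thread this would be called as enable_userquota /mnt, as root, with /mnt listed in /etc/fstab with the userquota option.</p>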