Bug #1039

hammer_alloc_data panic

Added by bastyaelvtars over 6 years ago. Updated about 4 years ago.

Status: Closed
Priority: Low
Assignee: -
% Done: 0%
Category: -
Target version: -

Description

panic: assertion: *errorp == 0 in hammer_alloc_data
Trace beginning at frame 0xcb4f1888
panic(cb4f18ac,10000,cb4f18fc,cb4f18f4,cb4f18c0) at panic+0x8c
panic(c0517637,c0539efc,c04ff098,c1c12700,cb4f190c) at panic+0x8c
hammer_alloc_data(cb4f1a8c,10000,10,cb4f18f4,cb4f1904) at hammer_alloc_data+0xda
hammer_ioc_reblock(cb4f1a8c,c9cf9fd0,c1614140,cb4f19f0,c041edfe) at hammer_ioc_reblock+0x39e
hammer_ioctl(c9cf9fd0,c0e06803,c1614140,1,c1526578) at hammer_ioctl+0x605
hammer_vop_ioctl(cb4f1ae4,c05ce5e0,c1665e50,c1690268,0) at hammer_vop_ioctl+0x20
vop_ioctl(c1665e50,c1690268,c0e06803,c1614140,1) at vop_ioctl+0x38
vn_ioctl(c8e74f28,c0e06803,c1614140,c1526578,c8e74f28) at vn_ioctl+0xba
mapped_ioctl(3,c0e06803,bfbff468,0,cb4f1d34) at mapped_ioctl+0x3e1
sys_ioctl(cb4f1cf0,6,0,0,c9b0a518) at sys_ioctl+0x16
syscall2(cb4f1d40) at syscall2+0x1ec
Xint0x80_syscall() at Xint0x80_syscall+0x36
Debugger("panic")
Stopped at Debugger+0x34: movb $0,in_Debugger.3945

I got this while doing a reblock. The dump is available at:

http://medica.semi.szote.u-szeged.hu/~szg/crash080702.tar.bz2

History

#1 Updated by bastyaelvtars over 6 years ago

Actually it seems to occur on every reblock.

#2 Updated by dillon over 6 years ago

:Actually it seems to occur on every reblock.
:
:--
:Gergo Szakal MD
:University Of Szeged, HU
:Faculty Of General Medicine
:
:/* Please do not CC me with replies, thank you. */

(kgdb) print *errorp
$1 = 28 (ENOSPC)

Ok, it is failing on ENOSPC. But the root volume is reporting
158 big blocks free (1264 MB free). The reason it is a panic is
that there is supposed to be a higher-level check for available
free space BEFORE the actual allocation is attempted. That clearly
is not happening in the reblocker.

(kgdb) frame 12
(kgdb) print trans->rootvol->ondisk->vol0_stat_bigblocks
$5 = 1758
(kgdb) print trans->rootvol->ondisk->vol0_stat_freebigblocks
$6 = 158
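As a sanity check on the figures above, the counters can be turned into sizes. This is only a sketch: the 8 MB big-block size is an assumption here (it is not stated in this message, but it is what makes 158 big blocks come out as 1264 MB), and the variable names are illustrative.

```python
# Assumption: one HAMMER big-block is 8 MB, which matches "158 big blocks
# free (1264 MB free)" in the message above.
BIGBLOCK_MB = 8

total_bigblocks = 1758   # vol0_stat_bigblocks from the kgdb session
free_bigblocks = 158     # vol0_stat_freebigblocks from the kgdb session

free_mb = free_bigblocks * BIGBLOCK_MB
total_mb = total_bigblocks * BIGBLOCK_MB
free_fraction = free_bigblocks / total_bigblocks

print(f"free: {free_mb} MB of {total_mb} MB ({free_fraction:.1%})")
```

So roughly 9% of the volume was still free by the on-disk counters when the allocation failed with ENOSPC.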

I think I may know what is going on. Does the reblocker always seem
to start working ok, run for a little while, but then panic? Or does
it panic immediately?

Second question... if you do this, how big a file can you create before
the filesystem runs out of space (or crashes)?

dd if=/dev/zero of=<some_new_file_on_hammer> bs=32k

If the dd is able to create a large file (> 500MB) then I have a pretty
good idea what the problem is, and it should be easy to fix. Either
the free space estimator is broken, or the reblocker is
building up a lot of reserved big-blocks during its operation without
flushing.

I'll explain that last bit. When the data associated with a record is
freed, the space can't actually be reused for *TWO* flush cycles
(60 seconds, approximately). The reason it can't be immediately reused
is because no UNDOs are generated for data, only for meta-data. We
don't want the crash recovery code to re-associate data that was
previously deleted, but which may have also gotten overwritten.

The reblocker is constantly allocating new space and freeing the old
space. I'll bet the problem is simply that the reblocker is able to
reallocate 1264MB worth of data without building up enough meta-data
changes to force two flush cycles to actually free up the areas it
had deallocated.

I'm hoping that's the problem. If it is then the solution is simple.
I just need to track the delayed-free space and also do a space check
in the reblocker loop, aborting the reblock if space runs too low.
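A minimal sketch of that idea, as a toy model rather than the actual HAMMER code: freed big-blocks sit in a delayed-free queue for two flush cycles, and the reblock loop checks for usable space before each move and returns ENOSPC instead of asserting. The class and function names (SpaceTracker, reblock, flush_every) are hypothetical.

```python
import errno
from collections import deque

FLUSH_CYCLES_BEFORE_REUSE = 2  # from the explanation above


class SpaceTracker:
    """Toy model counting big-blocks (hypothetical, not HAMMER's API)."""

    def __init__(self, free_blocks):
        self.free_blocks = free_blocks
        # One slot per pending flush cycle; the oldest slot is released next.
        self.delayed = deque([0] * FLUSH_CYCLES_BEFORE_REUSE)

    def alloc(self):
        if self.free_blocks == 0:
            return errno.ENOSPC
        self.free_blocks -= 1
        return 0

    def free(self):
        # Freed space is only reserved; it becomes usable after two flushes.
        self.delayed[-1] += 1

    def flush(self):
        self.free_blocks += self.delayed.popleft()
        self.delayed.append(0)


def reblock(tracker, blocks_to_move, flush_every=None):
    """Move blocks one at a time; return (error, moved) instead of panicking."""
    for moved in range(blocks_to_move):
        if flush_every and moved and moved % flush_every == 0:
            tracker.flush()
        if tracker.alloc() != 0:      # space check inside the loop
            return errno.ENOSPC, moved
        tracker.free()                # old copy goes to the delayed queue
    return 0, blocks_to_move
```

In this model, starting with 158 free big-blocks and never flushing, the loop stops with ENOSPC after 158 moves even though every move also frees a block. Flushing every 50 moves lets the same reblock run to completion.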

I'll work on a fix today. I want HAMMER to be graceful when disk space
is low.

-Matt
Matthew Dillon

#3 Updated by bastyaelvtars over 6 years ago

On Wed, 2 Jul 2008 16:20:12 -0700 (PDT)
Matthew Dillon wrote:

It panics after a little while, not immediately.

dd can fill up the device with no crash.

dd: test: No space left on device
29706159+0 records in
29706158+0 records out
950597056 bytes transferred in 451.974314 secs (2103210 bytes/sec)
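The dd figures above can be cross-checked with a little arithmetic. This is just a worked calculation on the reported numbers; the variable names are illustrative.

```python
records_out = 29706158        # "records out" reported by dd
bytes_written = 950597056     # "bytes transferred" reported by dd
elapsed_secs = 451.974314     # elapsed time reported by dd

bytes_per_record = bytes_written // records_out
mib_written = bytes_written / (1024 * 1024)
throughput = bytes_written / elapsed_secs   # bytes per second

print(bytes_per_record, round(mib_written, 1), round(throughput))
```

The record size works out to 32 bytes, and the file reached about 950 MB (roughly 907 MiB) before ENOSPC, which is above the 500MB threshold asked about.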

#4 Updated by dillon over 6 years ago

:> I think I may know what is going on. Does the reblocker always
:> seem to start working ok, run for a little while, but then panic ?
:> Or does it panic immediately?
:
:It panics after a little while, not immediately.
:
:...
:dd can fill up the device with no crash.
:
:> dd if=/dev/zero of=test bs=32
:dd: test: No space left on device
:--
:Gergo Szakal MD
:University Of Szeged, HU

Excellent. Commit 60A/B should fix it then.

-Matt
Matthew Dillon

#5 Updated by bastyaelvtars over 6 years ago

On Thu, 3 Jul 2008 10:26:59 -0700 (PDT)
Matthew Dillon wrote:

It does not.

panic: assertion: *errorp == 0 in hammer_alloc_data
Trace beginning at frame 0xcb006888
panic(cb0068ac,10000,cb006900,cb0068f8,cb0068c0) at panic+0x8c
panic(c05187b7,c053b07c,c0500210,c1a30500,cb006910) at panic+0x8c
hammer_alloc_data(cb006a8c,10000,10,cb0068f8,cb006908) at hammer_alloc_data+0xda
hammer_ioc_reblock(cb006a8c,c9cf9fd0,c16135e0,c041f062,558) at hammer_ioc_reblock+0x3e9
hammer_ioctl(c9cf9fd0,c0e06803,c16135e0,1,c1526038) at hammer_ioctl+0x5de
hammer_vop_ioctl(cb006ae4,c05cf780,c1665e50,c1690268,0) at hammer_vop_ioctl+0x20
vop_ioctl(c1665e50,c1690268,c0e06803,c16135e0,1) at vop_ioctl+0x38
vn_ioctl(c8e6d9e0,c0e06803,c16135e0,c1526038,c8e6d9e0) at vn_ioctl+0xba
mapped_ioctl(3,c0e06803,bfbff928,0,cb006d34) at mapped_ioctl+0x3e1
sys_ioctl(cb006cf0,6,0,0,c9b08a18) at sys_ioctl+0x16
syscall2(cb006d40) at syscall2+0x1ec
Xint0x80_syscall() at Xint0x80_syscall+0x36
Debugger("panic")
Stopped at Debugger+0x34: movb $0,in_Debugger.3945

Would you like a new kernel & dump?

#6 Updated by dillon over 6 years ago

:On Thu, 3 Jul 2008 10:26:59 -0700 (PDT)
:Matthew Dillon wrote:
:
:> Excellent. Commit 60A/B should fix it then.
:
:It does not.
:
:(panic trace as posted in #5)
:
:Would you like a new kernel & dump?
:
:
:--
:Gergo Szakal MD
:University Of Szeged, HU
:Faculty Of General Medicine

Grr.

I have a better idea. Make a copy of the raw filesystem. It's only 14GB.
See how much you can compress it and I'll download it and reproduce
the problem on my test box.

-Matt
Matthew Dillon

#7 Updated by bastyaelvtars over 6 years ago

On Thu, 3 Jul 2008 12:49:41 -0700 (PDT)
Matthew Dillon wrote:

OK, I am going to do it, I'll post back tomorrow at about this time.

#8 Updated by bastyaelvtars over 6 years ago

I got a similar panic with the same filesystem almost filled-up with
different files when reblocking. I do not know how helpful this is, but
I report it anyway. ;-)

#9 Updated by dillon over 6 years ago

Ok, Gergo and I have been working through the two issues he reported
and with kind access to his machine I have figured out what is going
on. I am going to post this to the thread so we have a record of it,
because it is quite interesting.

Gergo found two problems:

(1) 'hammer reblock' can lose track of space reservations and cause
hammer_alloc_data() to run out of space on the media and panic.

I hope to fix this today. At worst we want the reblocker to
return an error if there is insufficient free space on the disk
to reblock it, not panic the machine :-).

I have found that 'dd' can do the same thing. It is the same bug.

(2) On his small 14G test partition, using nohistory, it turns out
that a huge amount of fragmentation can build up if the partition
is not reblocked. I ain't talking 10% here, I'm talking 65%
fragmentation or worse. The 14G partition only had 5G worth
of files on it but it was 99% full, with only 300MB free in df.

It was so fragmented that trying to reblock it using the default
fill level (aka 'hammer reblock /home') failed because there was
not enough media space free to reblock into.

This is really a documentation issue. HAMMER partitions must be
reblocked occasionally, preferably via cron and preferably before
you actually run out of disk space.
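The fragmentation figure in (2) can be roughly reproduced from the numbers given. This is a back-of-the-envelope sketch; treating fragmentation as the share of allocated space that holds no live file data is an assumption made here, not a definition taken from HAMMER.

```python
partition_gb = 14.0   # size of the test partition
live_gb = 5.0         # data actually held in files
free_gb = 0.3         # "only 300MB free in df"

allocated_gb = partition_gb - free_gb
wasted_gb = allocated_gb - live_gb
# Assumed definition: share of allocated space that holds no live data.
fragmentation = wasted_gb / allocated_gb

print(f"{wasted_gb:.1f} GB wasted, {fragmentation:.0%} of allocated space")
```

Under that assumption about 8.7 GB of the allocated space is dead weight, in the same range as the 65% quoted above.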

It is possible to reblock when the media is highly fragmented. You do
it by telling hammer to only reblock nearly-empty blocks first, in
order to get them freed up and available for reuse as quickly as possible.
This is done by specifying a <fill_percentage> argument.

For example, this command will only reblock blocks that are at most 5% full
(and hence at least 95% empty):

hammer reblock /home 5

It works because it doesn't cost much to move the small amounts of data
out of those highly fragmented blocks and thus be able to free the blocks.
You then increase the fill percentage until you have freed enough space
to do the remainder with no limitations:

hammer reblock /home 25
hammer reblock /home 50
hammer reblock /home 75
hammer reblock /home 90
hammer reblock /home
whew...
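The sequence above could be wrapped in a small driver. The following is a sketch only: it uses the 'hammer reblock <filesystem> [fill_percentage]' form shown above, and it assumes that the utility exits non-zero when a pass fails, which is not confirmed anywhere in this thread. The function names are hypothetical.

```python
import subprocess

# Fill percentages from the sequence above; None means "no argument",
# i.e. the default full reblock.
STAGES = [5, 25, 50, 75, 90, None]


def reblock_command(filesystem, fill_percentage=None):
    """Build the argv for one 'hammer reblock' pass."""
    argv = ["hammer", "reblock", filesystem]
    if fill_percentage is not None:
        argv.append(str(fill_percentage))
    return argv


def staged_reblock(filesystem, stages=STAGES, run=subprocess.call):
    """Run each pass in order; stop at the first non-zero exit status.

    Assumption: a failed pass is reported through the exit status.
    Returns (failed_stage, status), or (None, 0) if every pass succeeded.
    """
    for fill in stages:
        status = run(reblock_command(filesystem, fill))
        if status != 0:
            return fill, status
    return None, 0
```

Calling staged_reblock("/home") would run the six passes in order. The run parameter exists so the command sequence can be checked with a stand-in function before pointing it at a real filesystem.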

It's a bit complex so what I am going to do is add some foot-shooting
protection to the 'hammer reblock' utility and maybe also have it
print out a warning, a reminder, if you attempt to reblock a hammer
partition that is too full.

-Matt

#10 Updated by swildner over 6 years ago

I suggest that until we have some automatic reblocking mechanism, we
should add an etc/periodic/daily script which reblocks all mounted hammer
file systems.

Sascha

#11 Updated by dillon over 6 years ago

:Matthew Dillon wrote:
:> This is really a documentation issue. HAMMER partitions must be
:> reblocked occasionally, preferably via cron and preferably before
:> you actually run out of disk space.
:
:I suggest that until we have some automatic reblocking mechanism, we
:should add an etc/periodic/daily script which reblocks all mounted hammer
:file systems.
:
:Sascha
:
:--
:http://yoyodyne.ath.cx

I do want to automate it, but I don't think an automatic cron job is
a good idea for this release cycle. It could easily result in user
foot shooting worse than just running out of filesystem space.

-Matt
Matthew Dillon

#12 Updated by bastyaelvtars over 6 years ago

On Wed, 16 Jul 2008 00:42:41 +0200
Sascha Wildner wrote:

Yes but that has to be done intelligently via a shell script, i.e. to
try until HAMMER returns success, because small partitions can fill up
in no time. So try first with the default, then 90, 80, 70 etc.
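Roughly what I have in mind, as a sketch rather than a finished periodic script. It assumes 'hammer reblock' reports failure through a non-zero exit status (not confirmed in this thread), and the function name and the 10-point step are only illustrative.

```python
import subprocess


def reblock_until_success(filesystem, run=subprocess.call):
    """Try the default pass first, then 90, 80, ... 10 until one succeeds.

    Returns the fill percentage that worked (None for the default pass),
    or raises RuntimeError if every attempt failed.
    Assumption: a failed pass shows up as a non-zero exit status.
    """
    attempts = [None] + list(range(90, 0, -10))
    for fill in attempts:
        argv = ["hammer", "reblock", filesystem]
        if fill is not None:
            argv.append(str(fill))
        if run(argv) == 0:
            return fill
    raise RuntimeError("no fill percentage succeeded for " + filesystem)
```

Once a low fill level has succeeded, the script would presumably step back up through the higher levels to finish the job.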

#13 Updated by corecode over 6 years ago

We absolutely need this. People will complain left and right if it
isn't "just" working.

Probably I'm missing something, but can't we reserve N blocks for
reblocking purposes, so that the reblocker always can make progress?

Why wouldn't the reblocker anyways start with the almost empty blocks?
Which order is it picking right now?

cheers
simon

#14 Updated by justin over 6 years ago

On Tue, July 15, 2008 6:09 pm, Matthew Dillon wrote:

What penalty would there be (time taken, perhaps?) to have this be the
default iterative behavior when reblocking?

#15 Updated by aoiko over 6 years ago

Er. Documentation issue? Selecting a 'perfect' line in crontab so that you
stay ahead of fragmentation is hard to do. I'd argue it shouldn't be left
up to the admin. At some point the system should take matters into its own
hands. I mean, if the admin can schedule reblocking so that things never
get out of hand, great, but let's be reasonable: that is not going to happen.

IMHO, that's a job that the machine should do. Why not create a thread/daemon
that babysits a hammer fs, and just let the user choose among a set of
predefined policies? This should take care of both issues.

As things are, I fear people are going to hit some of these issues even if
they do read the docs; you just ask too much of them. At the very least, we'd
need a Getting Started With HAMMER doc that explains what the admin should do
if they run into trouble and guides them through setting up an appropriate
crontab line. Another idea would be to warn about all known hammer issues
*in the release announcement*.

Also, the rate of hammer bug reports does not appear to have slowed down this
week. From where I'm standing there are two unresolved bugs and we're 5 days
before the release. I know you've been testing hammer to death but you can't
test everything. People who try the release /are/ going to hit bugs (that's
the point, right?). Perhaps there should at least be a warning about
backups at mount time (or require -o I_have_backups :)?

I do not intend to sound discouraging; I'm just worried that the cries of
those who have hit the reblocking issues and/or some stray bugs are going
to cover the positive feedback.

Aggelos

#16 Updated by dillon over 6 years ago

:> space to do the remainder with no limitations:
:>
:> hammer reblock /home 25
:> hammer reblock /home 50
:> hammer reblock /home 75
:> hammer reblock /home 90
:> hammer reblock /home
:> whew...
:
:What penalty would there be (time taken, perhaps?) to have this be the
:default iterative behavior when reblocking?

Basically just time and disk bandwidth, since it would have to scan
the B-Tree multiple times. It's a good idea. To really make it work
well the filesystem needs to be able to provide feedback to the
hammer utility as to the level of fragmentation.

Ultimately some form of automation will make this a non-issue. My
preference is a thread initiated by the filesystem itself which
does a slow iteration over the B-Tree. At the very least it could
measure the fragmentation and provide the feedback that the hammer
utility needs. Some careful thought is needed to come up with the
proper solution.

There's just no time to think it through before the release, so we will
have to do it afterwards. First adopters are going to be well aware of the
issues so there's no rush. I think our best bet for the release
is simply to document the issue as fully as possible.

--

HAMMER creates somewhat of a new paradigm, requiring people to think
about storage a bit differently than they would normally. It fits
very well with the massive amounts of storage that are now becoming
available. What do you do with a terabyte drive that, short of piracy,
would take years to fill up? With HAMMER the whole point is to run
it on large storage media and in full historical mode (hence why that
is the default), so you get free snapshots and fine-grained backups.

-Matt
Matthew Dillon

#17 Updated by dillon over 6 years ago

:I do not intend to sound discouraging; I'm just worried that the cries of
:those who have hit the reblocking issues and/or some stray bugs are going
:to cover the positive feedback.
:
:Aggelos

Actually I think we are doing very well, though I can see why
you might be a little rattled looking at it from the outside. I
apologize for that, and I will try to explain what is going on to
alleviate any concerns. In fact, I am going to go into great detail;
this is as much a philosophical document as it is an explanation :-)
The development style described below is virtually the only one
possible for a one-man project, or even a two or three-man project.

Nearly all the bug flow is due to the continued work being pushed into
the filesystem. That work essentially ended last weekend with the
last major mirroring infrastructure commit.

Virtually none of the bug flow is related to the older HAMMER code
pertaining to basic filesystem operation. For example, the UNDO
crash recovery and filesystem corruption bugs stopped occurring almost
a month ago. Basic filesystem operations... read, write, open, close,
readdir, chmod, etc.. have been stable for well over 2 months.
Historical lookups and snapshots have been stable for over 3 months.
I purposefully destabilized truncation for a few days last week,
and I purposefully destabilized the deadlock handling for a few days
last weekend, all in order to get the mirroring code operational (and in
the case of truncation to fix UNDO FIFO issues related to the
limited UNDO space in small HAMMER filesystems).

When I said 2 weeks ago that I wasn't sure I would be able to get the
mirroring and PFS code in, this is what I was talking about. It isn't
just coding and committing, it is also getting the basic testing done,
the utility support done, and fixing the bugs introduced when surgery
is required on other parts of the filesystem to support the new feature.

What do I mean by purposeful destabilization? Let me give you another
example. Taking the mirroring code again. In order to propagate a
transaction id up the B-Tree to support incremental mirroring I couldn't
abort half way through with an EDEADLK and have the high level code
retry, because the governing insertion or deletion had already occurred.
So what I did was implement the propagation *without* deadlock handling,
got it working, then worked through the deadlocks (the 'purposeful
destabilization') that I had created. I knew I was introducing some
deadlock issues when I did that, but it was still the fastest way to
get it implemented.

So what you are seeing is not really new crops of unexpected bugs, but
instead mostly expected bugs whose flow is carefully managed so they
will be fixed by the release, and a few I left on the backburner
(mostly related to filesystem-full issues but also a few related to the
handling of I/O errors), because I knew I could fix them in a day or two.
80% of the bug flow is from purposeful destabilization, and about 20%
is in the 'unexpected bug' category.

HAMMER is a really complex project, and the complexity is somewhat
of a moving target because all the myriad theory does not always fit
together seamlessly. It is not possible to implement each subsystem
independently of the other subsystems to the point where it is perfect.
Invariably working on a later subsystem requires going back to the
earlier ones and making (sometimes major) changes to the algorithms,
with massive debugging in between each major piece of subsystem work so
the bugs would not create geometrically complex (and hard to debug)
failures.

The constant flow of bugs is the intended outcome for this sort of
development style. It is the ONLY single-person development style
that has even a half chance of working for a complex project,
something I have learned through the years with various large projects
such as Diablo, various embedded OSs, DICE (The C compiler I wrote for
the Amiga many years ago), numerous other projects, and now HAMMER.

In any case this week is crunch time for the remaining bugs and I'm still
on schedule! I'm quite happy that I get to dedicate this week just to
fixing bugs, and won't be introducing any new algorithms to start the
endless bug cycle going again :-). Even I was feeling a bit flustered
last week, trying to squeeze that massive, massive mirroring
implementation in. That was literally a 100-hour work week for me.
I was stressing out big-time last week. This week is smooth sailing.

-Matt
Matthew Dillon

#18 Updated by dillon over 6 years ago

:We absolutely need this. People will complain left and right if it
:isn't "just" working.
:
:Probably I'm missing something, but can't we reserve N blocks for
:reblocking purposes, so that the reblocker always can make progress?
:
:Why wouldn't the reblocker anyways start with the almost empty blocks?
:Which order is it picking right now?
:
:cheers
: simon

Yes, but if the filesystem has 60% fragmentation we are talking close
to a gigabyte needing to be reserved (from my testing with Gergo's
wonderfully fragmented 14G filesystem). That clearly will not work very
well for small filesystems. Ok, I admit I really don't want to support
small filesystems anyway; personally I think ~50G is the minimum
reasonable size for a HAMMER filesystem. But I know in my heart that
people are going to want to use HAMMER on smaller filesystems so I am
pushing to make it work as well as possible.

I am open to the many proposed solutions and after the release if someone
wants to take on that task, freeing me up for other continuing work,
it would be even better!

-Matt
Matthew Dillon

#19 Updated by corecode over 6 years ago

Why do you need to reserve 1GB? Wouldn't (in theory), one block be
enough to be able to make progress?

cheers
simon

#20 Updated by dillon over 6 years ago

:Why do you need to reserve 1GB? Wouldn't (in theory), one block be
:enough to be able to make progress?
:
:cheers
: simon

Not without a more sophisticated algorithm, no. The reason is simple.
Let's say you have a filesystem with 32 blocks and 16 of them are half
full. Now you reblock.

But the reblocking doesn't scan by block, it scans the B-Tree. So,
due to fragmentation, reblocking by scanning the B-Tree might free up
*some* of the space in each of those 16 blocks while filling up a new
block, but the new block may become completely full before any one
of those 16 blocks becomes completely empty. So another new block needs
to be allocated.

Eventually all of the original 16 blocks will be compacted and (since
they were originally half full), 8 of them will be freed up. But it
could have required numerous new blocks to be filled before any of the
original 16 became completely free and reusable.
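The effect can be illustrated with a toy simulation. This is a sketch under simplifying assumptions, not the real reblocker: the B-Tree is modelled as a flat list of records in key order, each block holds a fixed number of equal-sized records, an emptied block counts as free immediately (the delayed reuse of freed space is ignored), and only the 16 half-full blocks are modelled. All names are hypothetical.

```python
def peak_spare_blocks(block_of_key, capacity):
    """Simulate a reblock that walks records in B-Tree (key) order.

    block_of_key[i] is the old block holding the i-th record in key order.
    Returns the largest value reached by (new blocks allocated minus old
    blocks already emptied), i.e. how many spare blocks the run needed.
    """
    remaining = {}
    for blk in block_of_key:
        remaining[blk] = remaining.get(blk, 0) + 1

    allocated = freed = room = peak = 0
    for blk in block_of_key:
        if room == 0:                 # current new block is full
            allocated += 1
            room = capacity
            peak = max(peak, allocated - freed)
        room -= 1                     # copy the record into the new block
        remaining[blk] -= 1
        if remaining[blk] == 0:       # old block is now completely empty
            freed += 1
    return peak


CAPACITY = 8                              # records per block (toy value)
N_BLOCKS, PER_BLOCK = 16, CAPACITY // 2   # 16 blocks, each half full
n_records = N_BLOCKS * PER_BLOCK

# Worst case: consecutive keys live in different old blocks.
interleaved = [k % N_BLOCKS for k in range(n_records)]
# Best case: consecutive keys share an old block.
contiguous = [k // PER_BLOCK for k in range(n_records)]

print(peak_spare_blocks(interleaved, CAPACITY))
print(peak_spare_blocks(contiguous, CAPACITY))
```

Under these toy assumptions the interleaved layout needs 7 spare blocks before the first old block empties, while the contiguous layout gets by with 1. Both end up freeing the same 8 blocks in total.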

If you reblock with a low fill level, like 20%, then there is a much
MUCH higher chance that the reblocker will be able to clear out
partially empty blocks before having to allocate additional new
blocks. Once you accomplish that you can use a larger fill level,
and repeat, until you get to 100%.

--

Is it possible to reblock with one free block? Yes, but it requires
a different algorithm. The algorithm would have to scan the B-Tree
for all elements using that block we are trying to free. Then scan
the tree again for the next block we are trying to free, and so forth.
It would be very very expensive. Some optimizations could be made
(e.g. to free a limited number of blocks simultaneously with one
B-Tree scan), but no matter what it will be far more time and
disk-intensive than the current algorithm. We might want to implement
it anyway as an emergency measure, but frankly at that point I think
the person's best bet is to free up some space by removing files
and pruning before doing a standard reblocking.
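To give a feel for the cost, here is a toy count of B-Tree elements visited by such a per-block algorithm. It is a sketch only: the B-Tree is modelled as a flat list with one entry per record naming the block that holds it, and the function name and batch parameter are hypothetical.

```python
def per_block_evacuation(block_of_key, batch=1):
    """Free `batch` target blocks per full scan of the (toy) B-Tree.

    Returns (scans, visits, moved): full scans performed, elements
    visited in total, and records relocated.
    """
    targets = sorted(set(block_of_key))
    scans = visits = moved = 0
    for start in range(0, len(targets), batch):
        wanted = set(targets[start:start + batch])
        scans += 1
        for blk in block_of_key:      # one full pass over every element
            visits += 1
            if blk in wanted:
                moved += 1            # this record would be relocated
    return scans, visits, moved


# 16 old blocks with 4 records each, keys interleaved across blocks.
layout = [k % 16 for k in range(64)]

print(per_block_evacuation(layout))           # one block per scan
print(per_block_evacuation(layout, batch=4))  # four blocks per scan
```

With one block per scan the toy run visits 1024 elements to move 64 records, against 64 visits for a single pass in key order. Batching four blocks per scan cuts that to 256 visits, at the price of needing more spare space at once.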

-Matt
Matthew Dillon

#21 Updated by corecode over 6 years ago

Oh I see, that's in clear contrast to LFS. Is it possible to find all
b-tree elements (or rather, their parents) when looking at one block?

cheers
simon

#22 Updated by dillon over 6 years ago

:...
:> *some* of the space in each of those 16 blocks while filling up a new
:> block, but the new block may become completely full before any one of those
:> 16 blocks becomes completely empty. So another new block needs
:> to be allocated.
:
:Oh I see, that's in clear contrast to LFS. Is it possible to find all
:b-tree elements (or rather, their parents) when looking at one block?
:
:cheers
: simon

... by scanning the B-Tree yes. With some sort of direct lookup? No.

This is the trade-off I made using a coarse-grained storage layer versus
the A-list based fine-grained one I originally envisioned. Though
nothing prevents HAMMER from being able to use a fine-grained storage
layer, if we can swing using the coarse-grained layer we reap some
phenomenal advantages, not the least of which being that the management
overhead of the storage layer outside the reblocking function is about
as close to 0 as it is possible to get.

-Matt
Matthew Dillon

#23 Updated by alexh about 4 years ago

So, is there still some intention of having a kernel thread monitoring the FS? Or
do we even have that already? (Not as far as I know).

Cheers,
Alex
