Bug #1223

Hammer boot cannot load modules anymore

Added by mneumann almost 6 years ago. Updated almost 6 years ago.

Status: Closed
Start date:
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: -
Target version: -

Description

Hi,

I'm using the pure Hammer bootloader.

At first it worked, but now, after using the system for a while I cannot
load modules at boot time anymore. So I always have to disable acpi,
otherwise it will hang shortly after showing some progress trying to
load the module (it just shows a few "/", "|", backslash, "-"
characters, then it hangs).

The kernel itself boots just fine. I am using the GENERIC kernel.

Regards,

Michael

History

#1 Updated by corecode almost 6 years ago

can you ctrl-alt-del the machine at that point or is it locked?

I'm not sure how to debug this... If your file system is small enough,
you could put it up somewhere for me to have a look. Alternatively you
can build a standalone hammer reader from the boot source in
lib/libstand/hammerread.c and try to debug why it can't load the modules.
I guess it has something to do with the btree layout.

cheers
simon

#2 Updated by mneumann almost 6 years ago

Yes that works.

I have problems compiling hammerread:

cd lib/libstand
make -DTESTING -DDEBUG=3 hammerread

/usr/libexec/binutils217/elf/ld: errno: TLS definition in
/usr/lib/libc.so section .tbss mismatches non-TLS reference in
/tmp//ccVzucFt.o
/usr/lib/libc.so: could not read symbols: Bad value

What am I doing wrong?

Regards,

Michael

#3 Updated by corecode almost 6 years ago

use cc:

cc -DTESTING -DDEBUG=3 hammerread.c

cheers
simon

#4 Updated by mneumann almost 6 years ago

Okay that works (using -std=c99). It can successfully read and list the
directory contents.

In the boot prompt I tried the following:

ls works
ls / works
ls -l immediately reboots
ls /boot works
ls /boot/modules works, but after pressing SPACE a few times (i.e.
scrolling down a page), it hangs.

Regards,

Michael

#6 Updated by mneumann almost 6 years ago

hammerread /dev/ad0s1a /boot/modules

It hits the right boundary ("hit right boundary (52), resetting search
to"), and goes into an infinite loop.

Regards,

Michael

#7 Updated by corecode almost 6 years ago

ah, good. could you send me the log so that i can trace what is going wrong?

thanks
simon

#8 Updated by dillon almost 6 years ago

:Michael Neumann wrote:
:> hammerread /dev/ad0s1a /boot/modules
:>
:> It hits the right boundary ("hit right boundary (52), resetting search
:> to"), and goes into an infinite loop.
:
:ah, good. could you send me the log so that i can trace what is going wrong?
:
:thanks
: simon

I see some possible issues in hammerread.c. In the B-Tree search
code, e.g. starting at around line 439, 'n' and 'e' can get out of
sync. For example, if node->count is 0 e can wind up bogus.

The while loop here is also bogus. If n == node->count then 'e'
is invalid at a leaf. The right-hand boundary element at
elm[node->count] only exists for internal nodes, not for leaf
nodes.

    // Skip deleted elements
    while (n <= node->count && e->base.delete_tid != 0) {

What I recommend is to properly track 'n' through that entire
bit of code. So for example, instead of:

    // unless we stopped right on the left side, we need to back off a bit
    if (n > 0)
        e = &node->elms[n - 1];

You would have:

    if (n > 0)
        e = &node->elms[--n];

Then the if (r > 1) check lower down makes more sense, as the worst
that can happen is that n becomes equal to node->count (and 'e'
becomes invalid at that point, but presumably the rest of the checks
would take care of it).

Then this bit of code:

    // Skip deleted elements
    while (n <= node->count && e->base.delete_tid != 0) {
        e++;
        n++;
    }

should use '<', not '<=', because 'e' is invalid when n == node->count:

    // Skip deleted elements
    while (n < node->count && e->base.delete_tid != 0) {
        e++;
        n++;
    }

And this bit of code:

    if (n > node->count) {
        ...
        hit right boundary
    }

should be:

    KKASSERT(n <= node->count);
    if (n == node->count) {
        ...
        hit right boundary
    }
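
The suggestions above amount to driving the scan through the index 'n' alone, so that the index and the element pointer cannot drift apart. Below is a minimal sketch of that idea. The struct layouts, MAX_ELMS, and the function names skip_deleted and hit_right_boundary are simplified stand-ins made up for illustration; they are not the definitions used in hammerread.c.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_ELMS 8  /* hypothetical capacity, chosen for the example */

/* Simplified stand-in for a B-Tree element. */
struct elm {
    int64_t  key;        /* simplified search key */
    uint64_t delete_tid; /* non-zero means the element was deleted */
};

/* Simplified stand-in for a leaf node: elms[0..count-1] are valid. */
struct node {
    int        count;
    struct elm elms[MAX_ELMS];
};

/*
 * Starting at index n, return the index of the first element that is
 * not deleted, or node->count if the scan ran off the right edge.
 * The element is always reached through n, never through a separately
 * maintained pointer.
 */
static int
skip_deleted(const struct node *node, int n)
{
    assert(n >= 0 && n <= node->count);
    /* '<', not '<=': a leaf has no element at elms[count]. */
    while (n < node->count && node->elms[n].delete_tid != 0)
        n++;
    assert(n <= node->count);
    return n;
}

/* The right boundary is reached exactly when n equals count. */
static int
hit_right_boundary(const struct node *node, int n)
{
    return n == node->count;
}
```

With this shape an empty node (count == 0) needs no special case: skip_deleted returns 0 without touching any element, and hit_right_boundary reports the boundary immediately.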

There may also be an issue on the right boundary code. What you have
to do is recurse up the tree, returning to the parent element and then
moving on to the next parent index. This recursion can go all the
way to root if the 'next' index in the parent also hits the right
hand boundary. You exhaust the space in the search when you loop up
to get the next element in the parent and the parent is root and
it is already on the right hand boundary, meaning there is no next
element.
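
The climb described above can be sketched on a toy in-memory tree. Everything here is an assumption made for illustration: struct tnode, its parent and parent_index fields, MAX_CHILDREN, and next_leaf are hypothetical. The real reader works on on-disk nodes and would have to locate the parent differently.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 4  /* hypothetical fan-out, chosen for the example */

/* Hypothetical in-memory node with explicit parent links. */
struct tnode {
    struct tnode *parent;        /* NULL at the root */
    int           parent_index;  /* index of this node in parent->children */
    int           count;         /* number of children (internal nodes) */
    int           is_leaf;
    struct tnode *children[MAX_CHILDREN]; /* used by internal nodes only */
};

/*
 * Called after a search ran off the right edge of 'node'.
 * Climb while the current node is the last child of its parent, then
 * step to the next sibling and descend to its leftmost leaf.
 * Returns NULL when the climb reaches the root with nothing to the
 * right, i.e. the search space is exhausted.
 */
static struct tnode *
next_leaf(struct tnode *node)
{
    while (node->parent != NULL &&
           node->parent_index + 1 >= node->parent->count)
        node = node->parent;

    if (node->parent == NULL)
        return NULL;  /* right boundary of the root: stop searching */

    node = node->parent->children[node->parent_index + 1];
    while (!node->is_leaf) {
        assert(node->count > 0);  /* assumes internal nodes are never empty */
        node = node->children[0];
    }
    return node;
}
```

Returning NULL at the root gives the caller a definite termination condition, rather than a reset search position that could be hit again. A caller that also has to cope with leaves containing only deleted elements would call next_leaf repeatedly until it finds a live element or gets NULL.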

-Matt

#9 Updated by corecode almost 6 years ago

Can you try updating loader (and if that works boot2 (disklabel64 -B)) and
try if it works now? You can of course also use hammerread.c as a test
before (probably better :). I can't reproduce the same setting you have
on your HDD, but I think I've fixed the issue.

cheers
simon

#10 Updated by mneumann almost 6 years ago

On 12.01.2009 01:15, Simon 'corecode' Schubert wrote:
> Michael Neumann wrote:
>> At first it worked, but now, after using the system for a while I
>> cannot load modules at boot time anymore. So I always have to disable
>> acpi, otherwise it will hang shortly after showing some progress
>> trying to load the module (it just shows a few "/", "|", backslash,
>> "-" characters, then it hangs).
>>
>> The kernel itself boots just fine. I am using the GENERIC kernel.
>
> Can you try updating loader (and if that works boot2 (disklabel64 -B))
> and try if it works now? You can of course also use hammerread.c as a
> test before (probably better :). I can't reproduce the same setting you
> have on your HDD, but I think I've fixed the issue.

hammerread does work now. But I haven't had success with disklabel64 -B
yet. Btw, the manpage of disklabel64 doesn't mention that it can be used
to write the bootcode. So what exactly would I need to do?

disklabel64 -r -B -s /boot/boot1_64 -b /boot/boot2_64

Anything else?

Regards,

Michael

#11 Updated by dillon almost 6 years ago

:hammerread does work now. But I haven't had success with disklabel64 -B
:yet. Btw, the manpage of disklabel64 doesn't mention that it can be used
:to write the bootcode. So what exactly would I need to do?
:
: disklabel64 -r -B -s /boot/boot1_64 -b /boot/boot2_64
:
:Anything else?
:
:Regards,
:
: Michael

Just disklabel64 -B ... it should select the correct boot1 and
boot2 images automatically.

The disklabel64 boot code only supports HAMMER the way it is
currently set up. It wouldn't be too hard to make it support
both UFS and HAMMER as there is plenty of room available for
boot2 to do both, but some reworking of the ABI would be needed
to avoid symbol duplication.

-Matt
Matthew Dillon

#12 Updated by mneumann almost 6 years ago

On 12.01.2009 09:07, Matthew Dillon wrote:
> :hammerread does work now. But I haven't had success with disklabel64 -B
> :yet. Btw, the manpage of disklabel64 doesn't mention that it can be used
> :to write the bootcode. So what exactly would I need to do?
> :
> : disklabel64 -r -B -s /boot/boot1_64 -b /boot/boot2_64
> :
> :Anything else?
> :
> :Regards,
> :
> : Michael
>
> Just disklabel64 -B ... it should select the correct boot1 and
> boot2 images automatically.

I forgot to install libstand before compiling boot1/boot2. Now it works
great. Thanks!

Regards,

Michael
