Bug #912

panic: pmap_page_protect: illegal for unmanaged page

Added by corecode over 6 years ago. Updated over 5 years ago.

Status:ClosedStart date:
Priority:NormalDue date:
Assignee:-% Done:

0%

Category:-
Target version:-

Description

hey,

I think I found a bug somewhere in the code chain of msync(..., MS_INVALIDATE), if refering to mmap()ed device memory.

device memory has pages of type PG_FICTITIOUS, but pmap_remove_all() complains:

#if defined(PMAP_DIAGNOSTIC)
/*
* XXX this makes pmap_page_protect(NONE) illegal for non-managed
* pages!
*/
if (!pmap_initialized || (m->flags & PG_FICTITIOUS)) {
panic("pmap_page_protect: illegal for unmanaged page, va: 0x%08llx", (long long)VM_PAGE_TO_PHYS(m));
}
#endif

I guess the XXX is correct and we need to either a) avoid calling pmap_remove_all() or b) return without error.

cheers
simon

History

#1 Updated by dillon over 6 years ago

:hey,
:
:I think I found a bug somewhere in the code chain of msync(..., MS_INVALIDATE), if refering to mmap()ed device memory.
:
:device memory has pages of type PG_FICTITIOUS, but pmap_remove_all() complains:
:
:#if defined(PMAP_DIAGNOSTIC)
: /*
: * XXX this makes pmap_page_protect(NONE) illegal for non-managed
: * pages!
: */
: if (!pmap_initialized || (m->flags & PG_FICTITIOUS)) {
: panic("pmap_page_protect: illegal for unmanaged page, va: 0x%08llx", (long long)VM_PAGE_TO_PHYS(m));
: }
:#endif
:
:I guess the XXX is correct and we need to either a) avoid calling pmap_remove_all() or b) return without error.
:
:cheers
: simon

The comment itself is correct in that fictitious pages are not managed
by PV lists, so its impossible for pmap_remove_all() to function on
a fictitiuos page. pmap_remove_all() is typically called to remove
user mappings of pages undergoing write I/O to prevent the pages from
being modified by userland while the I/O is in progress.

If there is a code path calling that function on fictitious pages we
have to examine and fix it (and the 'fix' might be as you suggest, just
ignoring it for such pages). I don't think its possible to put fictitious
pages under PV management due to the physical-to-virtual translations
that might be needed in that codepath... device pages tend to be at
physical addresses that just don't work with the way the translation code
was written.

madvise() or file I/O with such mapped pages specified as the address
might cause it to hit the code path.

-Matt

#2 Updated by corecode over 6 years ago

It happens if you call msync() with an argument of MS_INVALIDATE.

So you suggest to simply return?

What I wonder is, will *any* page of a OBJT_DEVICE not be PG_FICTITIOUS? In other words: shouldn't we simply avoid the whole invalidate code path for OBJT_DEVICE objects?

I'm quite confused about the intentions of this, anyways:

from vm_map_clean:
int clean_only =
(object->type == OBJT_DEVICE) ? FALSE : TRUE;
vm_object_reference(object);
switch(current->maptype) {
case VM_MAPTYPE_NORMAL:
vm_object_page_remove(object,
OFF_TO_IDX(offset),
OFF_TO_IDX(offset + size + PAGE_MASK),
clean_only);

Obviously OBJT_DEVICE is special cased, and it seems that somebody explicitly wanted that OBJT_DEVICE mappings will be completely destroyed on this call.

But then, down this code path, pmap_remove_all() will panic on PG_FICTITOUS pages. Interesting enough, pmap_clearbit() will simply ignore PG_FICTITIOUS pages. I think I'll try the same for pmap_remove_all() now.

cheers
simon

#3 Updated by dillon over 6 years ago

:> pages under PV management due to the physical-to-virtual translations
:> that might be needed in that codepath... device pages tend to be at
:> physical addresses that just don't work with the way the translation code
:> was written.
:
:It happens if you call msync() with an argument of MS_INVALIDATE.
:
:So you suggest to simply return?
:
:What I wonder is, will *any* page of a OBJT_DEVICE not be PG_FICTITIOUS? In other words: shouldn't we simply avoid the whole invalidate code path for OBJT_DEVICE objects?
:
:I'm quite confused about the intentions of this, anyways:
:...

I *THINK* so. I am not entirely sure with regards to block devices like
raw disks that you might mmap. I think those are handled as OBJT_VNODE.

:from vm_map_clean:
: int clean_only =
: (object->type == OBJT_DEVICE) ? FALSE : TRUE;
: vm_object_reference(object);
: switch(current->maptype) {
: case VM_MAPTYPE_NORMAL:
: vm_object_page_remove(object,
: OFF_TO_IDX(offset),
: OFF_TO_IDX(offset + size + PAGE_MASK),
: clean_only);
:
:Obviously OBJT_DEVICE is special cased, and it seems that somebody explicitly wanted that OBJT_DEVICE mappings will be completely destroyed on this call.
:...
:But then, down this code path, pmap_remove_all() will panic on PG_FICTITOUS pages. Interesting enough, pmap_clearbit() will simply ignore PG_FICTITIOUS pages. I think I'll try the same for pmap_remove_all() now.
:
:cheers
: simon

Yes, I agree that's definitely the desire there, but there's no way to
do it there because fictitious pages are not tracked with PV lists.
There's no way to do it at all until the VM object itself is destroyed
in the vm_object_terminate() path.

At the very least I think we have to check for PG_MANAGED in
vm_object_page_remove_callback() and just return if the page is
not managed. PG_MANAGED is always clear if PG_FICTITIOUS is set, so
that handles the fictitious pages while at the same time allows us to
potentially manage them in the future without the new check blowing us up.

-Matt

#4 Updated by dillon over 6 years ago

p.s. what are doing that is creating a device mapping in the first place?

-Matt

#5 Updated by corecode over 6 years ago

I'm trying and porting the nvidia binary graphics driver. Right now I'm stuck debugging a memory overwrite error which seems to happen on the physical page level, but I'm not sure.

cheers
simon

#6 Updated by dillon over 6 years ago

:Matthew Dillon wrote:
:> p.s. what are doing that is creating a device mapping in the first place?
:
:I'm trying and porting the nvidia binary graphics driver. Right now I'm stuck debugging a memory overwrite error which seems to happen on the physical page level, but I'm not sure.
:
:cheers
: simon

That's gonna be really hard to port. Those NVidia binary-only drivers
are known to be full of bugs and without source there's no way to track
them down or fix them.

-Matt

#7 Updated by corecode over 6 years ago

Well, turned out it was my fault. mmap_ops carries a_result as page frame number, not as byte offset. I think that should be changed, but I have the feeling that you only recently changed this in the other way? Using byte offsets would be easier to reason about, tho.

cheers
simon

#8 Updated by dillon over 6 years ago

:Well, turned out it was my fault. mmap_ops carries a_result as page frame number, not as byte offset. I think that should be changed, but I have the feeling that you only recently changed this in the other way? Using byte offsets would be easier to reason about, tho.
:
:cheers
: simon

mmap_ops? Do you mean dev_ops/d_mmap ? Hmm... yah, looking at the
BKTR driver it looks like it's returning a page number rather then a
physical address.

I would personally prefer a physical address, and we could make that
change fairly easily. There are only a few drivers that actually
use the feature and all the device pager does is convert it back
to a physical address anyhow :-)

-Matt

Also available in: Atom PDF