Bug #1774

New IP header cleanup branch available for testing

Added by dillon over 4 years ago. Updated over 4 years ago.

Status: New
Start date:
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: -
Target version: -

Description

My leaf repo now has a branch called 'iphdr'. Here's the git remote
setup:

[remote "dillon"]
url = git://leaf.dragonflybsd.org/~dillon/dragonfly.git
fetch = +refs/heads/*:refs/remotes/dillon/*

git fetch dillon
git branch iphdr dillon/iphdr
git checkout iphdr

I have done basic testing with ipfw, pf, ipv4 fragmentation,
ipv4 TCP, a little bridging, if_tap, and ipv4 UDP.

Tons of other things haven't been tested yet: ipv6, bpf filters,
carp, more extensive ipfw2 and pf, and so forth. Don't bother
with IPSEC; it has other issues which will have to be solved in
the next stage.

Only do testing if you have some experience with the interfaces
in question.

This branch contains the network byte ordering and adjustment fixes for
ip_len, ip_off, and a large chunk of the protosw cleanup. I am going
to leave the branch alone except for bug fixes.
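
To illustrate what is at stake with ip_len and ip_off, here is a minimal
userland sketch. It assumes the cleanup means both fields stay in network
byte order inside the header and are converted only where they are used;
the post does not spell this out. The struct, macros, and demo_* helpers
are hypothetical stand-ins, not the kernel's struct ip or any code from
the branch.

```c
#include <arpa/inet.h>   /* ntohs(), htons() */
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified start of an IPv4 header (not the kernel's struct ip). */
struct demo_iphdr {
	uint8_t  vhl;   /* version (high nibble), header length in 32-bit words (low nibble) */
	uint8_t  tos;
	uint16_t len;   /* total length, kept in network byte order */
	uint16_t id;
	uint16_t off;   /* flags + fragment offset, kept in network byte order */
};

#define DEMO_IP_MF      0x2000  /* "more fragments" flag */
#define DEMO_IP_OFFMASK 0x1fff  /* fragment offset, in 8-byte units */

/* Header length in bytes. */
static unsigned
demo_hdr_len(const struct demo_iphdr *ip)
{
	return (ip->vhl & 0x0f) * 4u;
}

/* Payload length: convert at the point of use, never modify the header. */
static unsigned
demo_payload_len(const struct demo_iphdr *ip)
{
	return ntohs(ip->len) - demo_hdr_len(ip);
}

/* A packet is a fragment if MF is set or the offset is non-zero. */
static int
demo_is_fragment(const struct demo_iphdr *ip)
{
	return (ntohs(ip->off) & (DEMO_IP_MF | DEMO_IP_OFFMASK)) != 0;
}

/* Fragment offset in bytes. */
static unsigned
demo_frag_offset(const struct demo_iphdr *ip)
{
	return (ntohs(ip->off) & DEMO_IP_OFFMASK) * 8u;
}

/* Small self-check: a 1500-byte second fragment starting at byte 1480. */
static void
demo_selfcheck(void)
{
	struct demo_iphdr ip = {
		.vhl = 0x45,
		.len = htons(1500),
		.off = htons(DEMO_IP_MF | 185)
	};

	assert(demo_payload_len(&ip) == 1480);
	assert(demo_is_fragment(&ip));
	assert(demo_frag_offset(&ip) == 1480);
}
```

The point of the sketch is only that every consumer sees the same
representation, so no code path has to guess whether ip_len was already
swapped or had the header length subtracted.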

--

I am now starting work on another branch that will contain a rewrite
of the whole ether/ip demux and dispatch mechanic. All protocols
will become per-cpu and there will be just one protocol support
thread for each cpu. NETISRs will go away and be replaced with a
more direct netmsg queueing operation. The toeplitz hash will be
integrated into the demux mechanic (probably integrated with
ip_lengthcheck()), so the cpu switch will occur even before a
protocol is handed the packet. Fast-forwarding will still occur
in the demux and be able to bypass netmsg queueing.
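
As a rough illustration of the hash-then-dispatch idea described above,
here is a minimal sketch of a generic Toeplitz hash followed by a cpu
selection step. It is not the planned DragonFly code: the key bytes, the
demo_* names, and the modulo mapping from hash to cpu are all assumptions
made for the example, and the integration with ip_lengthcheck() is not
shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Arbitrary demo key (an assumption); must be at least datalen + 4 bytes. */
static const uint8_t demo_key[16] = {
	0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
	0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0
};

/*
 * Generic Toeplitz hash: for every set bit in the input, XOR in the
 * 32-bit window of the key that starts at that bit position.
 */
static uint32_t
demo_toeplitz(const uint8_t *key, size_t keylen,
    const uint8_t *data, size_t datalen)
{
	uint32_t hash = 0;
	uint32_t window;
	size_t keybit = 32;     /* index of the next key bit to shift in */
	size_t i;
	int b;

	if (keylen < 4)
		return 0;
	window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
	    ((uint32_t)key[2] << 8) | (uint32_t)key[3];

	for (i = 0; i < datalen; i++) {
		for (b = 7; b >= 0; b--) {
			if (data[i] & (1u << b))
				hash ^= window;
			/* Slide the window one bit along the key. */
			window <<= 1;
			if (keybit < keylen * 8 &&
			    (key[keybit / 8] & (0x80u >> (keybit % 8))))
				window |= 1;
			keybit++;
		}
	}
	return hash;
}

/* Hypothetical mapping from hash to cpu; a real demux may use a table. */
static unsigned
demo_pick_cpu(uint32_t hash, unsigned ncpus)
{
	return ncpus ? hash % ncpus : 0;
}

/* Self-check: a lone leading 1 bit selects the first 32 key bits. */
static void
demo_selfcheck(void)
{
	const uint8_t one[12] = { 0x80 };

	assert(demo_toeplitz(demo_key, sizeof(demo_key), one, sizeof(one)) ==
	    0x6d5a56da);
}
```

Feeding the source address, destination address, and ports (12 bytes for
IPv4 TCP/UDP) through demo_toeplitz() and then demo_pick_cpu() gives the
same cpu for every packet of a flow, which is the property that lets the
cpu switch happen before a protocol ever sees the packet.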

Hopefully when all is said and done the combination of the stage 1
fixes now available and this stage 2 work will give us a huge degree
of flexibility with regard to managing packet mbufs, including an
ability to trivially requeue them (something IPSEC and other tunnel
implementations need to be able to do).

This second stage will probably take a week at least.

-Matt

History

#1 Updated by pavalos over 4 years ago

I'm seeing differing behavior, but basically the machine deadlocks after I
login via SSH. Sometimes I get a panic, and sometimes it just deadlocks.
Here's a panic I got:

LWKT_WAIT_IPIQ WARNING! 0 wait 1 (-1)
panic: LWKT_WAIT_IPIQ
mp_lock = 00000001; cpuid = 0
Trace beginning at frame 0xd7f1ec70
panic(ffffffff) at panic+0x14f
panic(c0382e3d,297,ff808000,d7cd2e10,ff800000) at panic+0x14f
lwkt_wait_ipiq(ff808000,8e6a,ff808000,c01df72c,d7cd2e10,0) at lwkt_wait_ipiq+0xd7
callout_stop(d7cd2e10,ff800000,246,d7ba78d0,d7ec83cc) at callout_stop+0xd8
ahd_done(d7e2b4b8,d7cd2db8) at ahd_done+0xaf
ahd_run_qoutfifo(d7e2b4b8,0,ff800000,d7f1ed84,c01a463d) at ahd_run_qoutfifo+0xd5
ahd_platform_intr(d7e2b4b8,0) at ahd_platform_intr+0x10d
ithread_handler(a,0,0,0,0) at ithread_handler+0x171
lwkt_exit() at lwkt_exit
boot() called on cpu#0
Uptime: 1m33s

It froze right after that, and I was unable to get a vmcore. After that, I
rebooted and it froze again right after I logged in via SSH. I had to power
cycle to get it back. The machine is completely unusable running your iphdr
branch.

#2 Updated by dillon over 4 years ago

:Peter Avalos added the comment:
:
:I'm seeing differing behavior, but basically the machine deadlocks after I
:login via SSH. Sometimes I get a panic, and sometimes it just deadlocks.
:Here's a panic I got:

Ugh. This panic is completely outside the networking paths, which
kinda implies maybe memory corruption somewhere. This is going to
be difficult to track down.

-Matt
Matthew Dillon

#3 Updated by pavalos over 4 years ago

On Sun, May 30, 2010 at 09:00:36PM -0700, Matthew Dillon wrote:
> :Peter Avalos added the comment:
> :
> :I'm seeing differing behavior, but basically the machine deadlocks after I
> :login via SSH. Sometimes I get a panic, and sometimes it just deadlocks.
> :Here's a panic I got:
>
> Ugh. This panic is completely outside the networking paths, which
> kinda implies maybe memory corruption somewhere. This is going to
> be difficult to track down.
>

Just to be clear, with a normal master, the box runs fine. Let me know
what I can do to help.

--Peter

#4 Updated by thomas.nikolajsen over 4 years ago

Works here.

I have tested it on UP & SMP systems, using e.g. NFS & SSH,
and haven't seen any issues.

A minor thing is that the kernel doesn't build on the iphdr branch:
git cherry-pick 3e2720774f6ba4b654a4212f2b8d9dc65585fba7 fixes this
(http://leaf.dragonflybsd.org/mailarchive/commits/2010-05/msg00082.html).

#5 Updated by dillon over 4 years ago

:Thomas Nikolajsen added the comment:
:
:Works here.
:
:I have tested it on UP & SMP systems, using e.g NFS & SSH,
:haven't seen any issues.
:
:A minor thing is that kernels doesn't build on iphdr branch:
:git cherry-pick 3e2720774f6ba4b654a4212f2b8d9dc65585fba7 fixes this
:(http://leaf.dragonflybsd.org/mailarchive/commits/2010-05/msg00082.html).

Ok, I'll merge the latest crater/master into iphdr so it is synced
up.

-Matt
Matthew Dillon

#6 Updated by dillon over 4 years ago

I have re-merged master into my iphdr branch on leaf. It could use
some testing again.

I haven't found anything that would explain the corruption Peter
originally reported in this bug. Is it possible that it was a
kernel misbuild?

In any case, the branch is synchronized with head now and I would like
to have another go at testing.

-Matt
Matthew Dillon

#7 Updated by thomas.nikolajsen over 4 years ago

Works here;
minor testing, build(7) using NFS, w/ SMP kernel.

-thomas

#8 Updated by sjg over 4 years ago

This has been on my test box for a couple of days now, with no failures or
odd behavior to report.

dhclient exercises bpf, so it works (the basics, anyway).
