Bug #2845
Hangs using dsched policies other than noop
Status: closed, 100% done
Description
When using fq or bfq with heavy disk I/O on one SATA HDD (where "heavy" means running make -j10 buildworld plus building chromium and firefox all at the same time: heavy for a single-user el-cheapo Satellite D55 laptop, really light for a server), occasionally all disk I/O stops and no new I/O can be started, not even a panic dump from the debugger(!). If top, ps, etc. happen to be in the cache, the processes all appear stuck in vnode wait status, and there seems to be no way to kick them out of that state.
The as policy, on the other hand, takes only about 3 I/Os to hang. I'm not sure yet whether processes end up in vnode wait status there as well.
Filing this as low priority, since the default (noop) seems to work perfectly. Target should read 4.3.CURRENT, but that isn't a choice.
Updated by deef over 9 years ago
Similar experience here... When using bfq during a parallel buildworld, the kernel produced several hundred of these messages:
kernel: dsched_thread_io: destroy race tdio=0xffffffe0f4c6ce00
After about an hour I found the system unresponsive.
When trying fq during dports compiling, the kernel produced just one such message, and the currently running pkg(8) process got stuck. On reboot the system "gave up on 232 buffers", which resulted in a corrupted pkg(8) database (ouch ;-)).
When using the noop scheduler, the system doesn't report any races and runs with no problems.
Updated by dillon about 9 years ago
- Status changed from New to Feedback
- Assignee set to dillon
- % Done changed from 0 to 100
We are going to remove dsched entirely. It doesn't work well with SSDs, and the complexity has made finding its bugs too painful; several people have tried over the years. We will need to rethink the whole disk-fairness concept/problem.
-Matt