Bug #2845

closed

Hangs using dsched policies other than noop

Added by jkolodzi over 9 years ago. Updated about 9 years ago.

Status: Closed
Priority: Low
Assignee:
Category: VFS subsystem
Target version:
Start date: 10/15/2015
Due date: 12/31/2015
% Done: 100%
Estimated time: 8:00 h

Description

When using fq or bfq with heavy disk IO on one SATA HDD, occasionally all disk IO will stop and no new IO can be started, including a panic dump from the debugger(!). Here "heavy" means running make -j10 buildworld while building chromium and firefox at the same time: heavy for a single-user el-cheapo Satellite D55 laptop, really light for a server. If top, ps, etc. happen to be in the cache, it looks like the processes are all stuck in vnode wait status, and there seems to be no way to kick them out of that status.

The "as" policy, on the other hand, hangs after about 3 IOs. I'm not sure yet whether processes are also in vnode wait status there.

I'm filing this as low priority because the default policy (noop) seems to work perfectly. The target version should read 4.3.CURRENT, but that isn't a choice.

Actions #1

Updated by deef over 9 years ago

Similar experience here... When using bfq while running a parallel buildworld, the kernel produced several hundred of these messages:

kernel: dsched_thread_io: destroy race tdio=0xffffffe0f4c6ce00

After about an hour I found the system unresponsive.

When trying fq while compiling dports, the kernel produced just one such message and the currently running pkg(8) process got stuck. On reboot the system "gave up on 232 buffers", which resulted in a corrupted pkg(8) database (ouch ;-)).

When using the noop scheduler, the system doesn't report any races and runs with no problems.

Actions #2

Updated by dillon about 9 years ago

  • Status changed from New to Feedback
  • Assignee set to dillon
  • % Done changed from 0 to 100

We are going to remove dsched entirely. It doesn't work well with SSDs and the complexity has made finding its bugs too painful. Several people have tried over the years. We will need to rethink the whole disk fairness concept/problem.

-Matt

Actions #3

Updated by marino about 9 years ago

  • Status changed from Feedback to Closed

dsched removed in master
