Submit #2775

[PATCH] sbin/newfs_hammer: optimize blockmap initialization loop

Added by tkusumi over 6 years ago. Updated over 6 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: Userland
Target version: -
Start date: 01/21/2015
Due date:
% Done: 0%
Estimated time:

Description

sbin/newfs_hammer: optimize blockmap initialization loop

This patch optimizes the blockmap initialization loop for the freemap zone in the newfs_hammer command.

newfs_hammer does not need to read the layer 1 structure (either through an actual read(2) or by getting a cached malloc(3)ed buffer) on every iteration of the loop that fills in the layer 2 structures. The layer 1 entry that covers the current phys_offset needs to be updated only when the layer 1 bits of phys_offset have changed. In other words, changes within the lower 42 bits of phys_offset (the lower 4TB of address space) do not affect layer 1, so newfs_hammer can reuse (keep the pointer to) the buffer until phys_offset reaches the next layer 1 entry, provided that the buffer is not released while this loop runs.
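The idea can be sketched in isolation as follows. This is only an illustration, not the patch itself: the constant names, layer1_index() and count_layer1_fetches() are hypothetical, and the sizes (8MB per layer 2 step, 4TB covered by one layer 1 entry) are taken from the figures in this description.

```c
#include <stdint.h>

/* Assumed sizes, taken from the figures quoted in this description. */
#define BIGBLOCK_SIZE  (UINT64_C(1) << 23)  /* 8MB step per layer 2 entry */
#define LAYER1_SHIFT   42                   /* lower 42 bits = 4TB */

/* Hypothetical helper: which layer 1 entry covers phys_offset. */
static uint64_t layer1_index(uint64_t phys_offset)
{
    return phys_offset >> LAYER1_SHIFT;
}

/*
 * Walk [0, volume_size) in 8MB steps and count how often layer 1 has
 * to be fetched when the previous entry is kept until the layer 1 bits
 * of phys_offset change.
 */
static uint64_t count_layer1_fetches(uint64_t volume_size)
{
    uint64_t fetches = 0;
    uint64_t cached = UINT64_MAX;  /* sentinel: nothing cached yet */
    uint64_t phys_offset;

    for (phys_offset = 0; phys_offset < volume_size;
         phys_offset += BIGBLOCK_SIZE) {
        uint64_t idx = layer1_index(phys_offset);

        if (idx != cached) {       /* layer 1 bits changed */
            cached = idx;
            ++fetches;
        }
        /* ... the layer 2 entry for phys_offset would be filled here ... */
    }
    return fetches;
}
```

With these assumptions, a 1TB volume needs a single fetch, and a second fetch happens only once phys_offset crosses the 4TB boundary.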

This reduces the number of layer 1 reads from approximately (size_of_volume / 8MB + 1) to approximately (size_of_volume / 4TB + 1). For example, if the volume is smaller than 4TB, layer 1 is read only once for that volume, which is sufficient because a single layer 1 structure covers 4TB of address space.
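To put numbers on the two estimates, the formulas can be written out directly. The function names below are hypothetical and the results are the approximate counts from the formulas above, not measurements.

```c
#include <stdint.h>

#define SIZE_8MB  (UINT64_C(1) << 23)
#define SIZE_4TB  (UINT64_C(1) << 42)

/* Approximate layer 1 reads without the patch: one per 8MB step. */
static uint64_t layer1_reads_before(uint64_t size_of_volume)
{
    return size_of_volume / SIZE_8MB + 1;
}

/* Approximate layer 1 reads with the patch: one per 4TB of address space. */
static uint64_t layer1_reads_after(uint64_t size_of_volume)
{
    return size_of_volume / SIZE_4TB + 1;
}
```

For a 1TB volume this gives 131073 reads before and 1 after; for a 10TB volume, 1310721 before and 3 after.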

----
The following test shows that there is no difference between the blockmap layout produced by /sbin/newfs_hammer and the one produced with this patch. The patch doesn't make newfs much faster, since newfs is basically an I/O-bound process rather than a CPU-bound one, but the shortcut should still make sense.

  1. /sbin/newfs_hammer -L TEST /dev/ad1 /dev/ad2 /dev/ad3 > /dev/null
  2. hammer -f /dev/ad1:/dev/ad2:/dev/ad3 blockmap > out1
  3. ~/work/dragonfly/sbin/newfs_hammer/newfs_hammer -L TEST /dev/ad1 /dev/ad2 /dev/ad3 > /dev/null
  4. hammer -f /dev/ad1:/dev/ad2:/dev/ad3 blockmap > out2
  5. diff out1 out2 #

Since I don't have storage large enough to test the whole 60 bits of address space, I've written a unit test that mimics the layer 1 offset calculation within the loop and asserts that it is correct (attached to this issue). It can take a while to finish because it runs through the whole 60 bits.

  1. gcc -Wall -g unittest.c -o unittest
  2. ./unittest
    ................................................................................................................................................................................................................................................................
    success

History

#1

Updated by dillon over 6 years ago

  • Status changed from New to In Progress

There are a few problems here. First, look at get_buffer_data() carefully... it will automatically reuse the passed-in buffer if it matches. Second, you don't want to use modulo when you can use a mask. Third, the optimization makes a few too many assumptions about the structure of layer1_offset. They are correct, but if a programmer were to rewrite some of the code above, it might break the optimization in the future.

So, basically... no, it's better to keep the old code in this case because get_buffer_data() is already heavily optimized to handle the situation. The additional CPU overhead is a few nanoseconds... nothing compared to the multiple microseconds-to-milliseconds the disk writes eat.

-Matt
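
As an illustration of the second point in the comment above (mask instead of modulo), here is a minimal sketch. It assumes the divisor is the 4TB (2^42) span mentioned in the description; the names are hypothetical and not taken from the patch or from the HAMMER headers.

```c
#include <stdint.h>

#define LAYER1_SPAN  (UINT64_C(1) << 42)   /* 4TB, a power of two */
#define LAYER1_MASK  (LAYER1_SPAN - 1)     /* lower 42 bits set */

/* Offset within the current 4TB span, computed with modulo. */
static uint64_t span_offset_mod(uint64_t phys_offset)
{
    return phys_offset % LAYER1_SPAN;
}

/* Same value computed with a mask; valid only for power-of-two spans. */
static uint64_t span_offset_mask(uint64_t phys_offset)
{
    return phys_offset & LAYER1_MASK;
}
```

Both functions return the same value for any unsigned phys_offset; the mask form also makes the power-of-two assumption explicit in the code.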

#2

Updated by tkusumi over 6 years ago

  • Status changed from In Progress to Closed

hi Matt,
Thanks for your review.
Changing the status to Closed.
