Submit #2775: [PATCH] sbin/newfs_hammer: optimize blockmap initialization loop - DragonFlyBSD - DragonFlyBSD bugtracker


Submit #2775

closed

[PATCH] sbin/newfs_hammer: optimize blockmap initialization loop

Added by tkusumi almost 11 years ago. Updated almost 11 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: Userland
Target version:
Start date: 01/21/2015
Due date:
% Done:
Estimated time:

Description

sbin/newfs_hammer: optimize blockmap initialization loop

This patch optimizes the blockmap initialization loop for the freemap zone in the newfs_hammer command.

newfs_hammer does not need to read the layer 1 structure (whether through an actual read(2) or by fetching a cached malloc(3)ed buffer) on every iteration of the loop that fills in the layer 2 structures. The layer 1 entry that covers the current phys_offset needs to be updated only when the layer 1 bits of phys_offset change. In other words, changes within the lower 42 bits of phys_offset (the lower 4TB of address space) do not affect layer 1, so newfs_hammer can reuse the buffer (keep the pointer to it) until phys_offset reaches the next layer 1 entry, provided that the buffer is not released while the loop is running.

This reduces the number of layer 1 reads from approximately (size_of_volume / 8MB + 1) to (size_of_volume / 4TB + 1). For example, if the volume is smaller than 4TB, layer 1 is read only once for that volume, which is sufficient because a single layer 1 structure covers 4TB of address space.
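The reduction described above can be illustrated with a small counting sketch. It is not the patch itself: the constant names and the count_layer1_lookups() helper are hypothetical, and only the 8MB step and the 4TB (42-bit) span are taken from the description.

```c
#include <stdint.h>

/* Hypothetical names; only the sizes come from the description above. */
#define BIGBLOCK_SIZE  (UINT64_C(1) << 23)  /* 8MB step of the loop */
#define LAYER1_SHIFT   42                   /* one layer 1 entry covers 4TB */

/*
 * Count how many times layer 1 would be looked up while walking a
 * volume in 8MB steps. With cached == 0 every iteration looks it up;
 * with cached != 0 a lookup happens only when the layer 1 index changes.
 */
static uint64_t
count_layer1_lookups(uint64_t volume_size, int cached)
{
	uint64_t lookups = 0;
	uint64_t cur_index = UINT64_MAX;	/* no entry held yet */
	uint64_t off;

	for (off = 0; off < volume_size; off += BIGBLOCK_SIZE) {
		uint64_t index = off >> LAYER1_SHIFT;

		if (!cached || index != cur_index) {
			lookups++;
			cur_index = index;
		}
	}
	return lookups;
}
```

Under these assumptions, a 1TB volume gives 131072 lookups without caching and 1 with it, and a 9TB volume gives 3 cached lookups, one per 4TB span touched.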

----
The following test shows that there is no difference between the blockmap layout produced by /sbin/newfs_hammer and the one produced with this patch. The patch doesn't make newfs much faster, since newfs is basically an I/O-bound process rather than a CPU-bound one, but the shortcut should still make sense.

/sbin/newfs_hammer -L TEST /dev/ad1 /dev/ad2 /dev/ad3 > /dev/null
hammer -f /dev/ad1:/dev/ad2:/dev/ad3 blockmap > out1
~/work/dragonfly/sbin/newfs_hammer/newfs_hammer -L TEST /dev/ad1 /dev/ad2 /dev/ad3 > /dev/null
hammer -f /dev/ad1:/dev/ad2:/dev/ad3 blockmap > out2
diff out1 out2 #

Since I don't have storage large enough to test the whole 60 bits of address space, I've written a unit test that mimics and asserts the layer 1 offset calculation within the loop (attached to this issue). It can take a while to finish, as it runs through the whole 60 bits.

gcc -Wall -g unittest.c -o unittest
./unittest
................................................................................................................................................................................................................................................................
success
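The attached unittest.c is not reproduced on this page. As a rough idea of what such a check might look like, the sketch below walks a range in 8MB steps, refreshes a cached layer 1 index only at 4TB boundaries, and compares it against the directly computed index. The function name and constants are hypothetical and the range is a parameter, so it does not have to cover all 60 bits.

```c
#include <stdint.h>

#define BIGBLOCK_SHIFT  23	/* 8MB step (assumed from the description) */
#define LAYER1_SHIFT    42	/* 4TB per layer 1 entry (assumed) */

/*
 * Return 1 if a layer 1 index that is refreshed only when the lower
 * 42 bits of the offset are zero always equals the index computed
 * directly from the offset, for every 8MB step in [start, end).
 * Return 0 on the first mismatch.
 */
static int
check_cached_layer1_index(uint64_t start, uint64_t end)
{
	const uint64_t step = UINT64_C(1) << BIGBLOCK_SHIFT;
	const uint64_t mask = (UINT64_C(1) << LAYER1_SHIFT) - 1;
	uint64_t cached = start >> LAYER1_SHIFT;
	uint64_t off;

	for (off = start; off < end; off += step) {
		if ((off & mask) == 0)		/* crossed into a new 4TB span */
			cached = off >> LAYER1_SHIFT;
		if (cached != (off >> LAYER1_SHIFT))
			return 0;
	}
	return 1;
}
```

The check passes only when start is aligned to 8MB. With an unaligned start the loop never lands exactly on a 4TB boundary and the cached index goes stale, which shows how much the shortcut depends on the layout of the offsets.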

Files

0001-sbin-newfs_hammer-optimize-blockmap-initialization-l.patch (2.62 KB), tkusumi, 01/21/2015 06:44 AM
unittest.c (3.66 KB), tkusumi, 01/21/2015 06:44 AM


Updated by dillon almost 11 years ago

Status changed from New to In Progress

There are a few problems here. First, look at get_buffer_data() carefully... it will automatically reuse the passed-in buffer if it matches. Secondly, you don't want to use modulo when you can use a mask. Third, the optimization makes a few too many assumptions about the structure of layer1_offset. They are correct, but if a programmer were to rewrite some of the code above it might break the optimization in the future.

So, basically... no, it's better to keep the old code in this case because get_buffer_data() is already heavily optimized to handle the situation. The additional CPU overhead is a few nanoseconds... nothing compared to the multiple microseconds-to-milliseconds the disk writes eat.

-Matt
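The first two points in the review can be sketched as follows. This is not the real get_buffer_data(); struct cache_buf, get_data(), and the 16KB buffer size are hypothetical stand-ins that show only the general pattern of reusing a passed-in buffer when it already covers the requested offset, and of using a mask instead of modulo for a power-of-two size.

```c
#include <stdint.h>

/* Hypothetical buffer size; must be a power of two for the mask to work. */
#define BUF_SIZE  (UINT64_C(1) << 14)
#define BUF_MASK  (BUF_SIZE - 1)

struct cache_buf {
	uint64_t base;		/* buffer-aligned offset currently held */
	int      valid;		/* nonzero once the buffer has been loaded */
	unsigned loads;		/* number of simulated loads, for illustration */
	char     data[BUF_SIZE];
};

/*
 * Return a pointer to the byte at 'offset'. The caller's buffer is
 * reused when it already covers the offset; otherwise a load is
 * simulated by retargeting the buffer.
 */
static void *
get_data(struct cache_buf *buf, uint64_t offset)
{
	uint64_t base = offset & ~BUF_MASK;	/* mask, not offset - offset % BUF_SIZE */

	if (!buf->valid || buf->base != base) {
		buf->base = base;
		buf->valid = 1;
		buf->loads++;
	}
	return buf->data + (offset & BUF_MASK);	/* same as offset % BUF_SIZE */
}
```

With a helper of this shape, repeated calls for offsets inside the same buffer cost one comparison each, which is why caching the pointer in the caller saves very little.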


Updated by tkusumi almost 11 years ago

Status changed from In Progress to Closed

Hi Matt,
Thanks for your review.
Changing the status to Closed.
