nmalloc returns addresses without regard to whether they will cause
cache bank conflicts. Such conflicts waste a fair amount of a CPU's
load bandwidth, as seen in the Himeno matrix benchmark, among others.

Commit 8120f5e2a46e669c06a7afdd7de60fa6d6996f9d added simple cache
coloring to nmalloc for 32KB allocations, offsetting them by 4KB. While
it does work (and restores Himeno's performance), it can be substantially
improved on. We should look at doing so.
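
As a starting point for discussion, here is a minimal sketch of one possible
direction: rotating the offset across allocations instead of using a single
fixed one. This is not the code from the commit above and not nmalloc's
internals. It wraps plain malloc(), and the names colored_alloc,
colored_free, colored_offset, COLOR_STRIDE and COLOR_COUNT are hypothetical.
The 64-byte stride and the 64 colors are assumptions that would need tuning
per CPU.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed values, not taken from nmalloc: 64-byte cache lines and
 * 64 colors, so offsets range from 64 to 4096 bytes. */
#define COLOR_STRIDE 64u
#define COLOR_COUNT  64u

static unsigned next_color;   /* hypothetical global; not thread-safe */

/* Allocate size bytes, shifted by a color offset that changes on
 * every call so that successive buffers start on different lines. */
void *colored_alloc(size_t size)
{
    const size_t pad = (size_t)COLOR_STRIDE * COLOR_COUNT;
    unsigned char *base, *p;
    size_t off;

    if (size > SIZE_MAX - pad)
        return NULL;
    base = malloc(size + pad);
    if (base == NULL)
        return NULL;

    /* Offset is 1..COLOR_COUNT strides; the first stride always
     * leaves room for the back-pointer stored just below p. */
    off = (size_t)(1 + next_color % COLOR_COUNT) * COLOR_STRIDE;
    next_color++;
    p = base + off;
    memcpy(p - sizeof(base), &base, sizeof(base));
    return p;
}

/* Distance in bytes between the returned pointer and the start of
 * the underlying malloc() block. */
size_t colored_offset(const void *p)
{
    unsigned char *base;

    memcpy(&base, (const unsigned char *)p - sizeof(base), sizeof(base));
    return (size_t)((const unsigned char *)p - base);
}

/* Release a buffer obtained from colored_alloc(). */
void colored_free(void *p)
{
    if (p != NULL)
        free((unsigned char *)p - colored_offset(p));
}
```

Two back-to-back calls such as colored_alloc(32768) return pointers whose
offsets from their underlying blocks differ by one stride. The cost is up to
4KB of padding per allocation, so a real implementation would likely apply
this only to large allocations and keep the color counter per-thread or
per-zone.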

