I wrote a simple radix sort, and I'm puzzled by the large number of branch mispredictions it produces.
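Here is the core of the radix sort:

```cpp
// 1) count items per bucket
int i = 0;
do { bucket[getBucket()]++; } while(++i < n);
// 2) prefix sum over the bucket counts
i = 1;
do { bucket[i] += bucket[i - 1]; } while(++i < BASEN);
// 3) scatter items into b, back to front
i = n - 1;
do { b[--bucket[getBucket()]] = a[i]; } while(--i >= 0);
```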
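For reference, the three loops above form one stable counting-sort pass over a single digit. The sketch below is a self-contained version using ordinary `for` loops and `std::vector`; the name `countingPass` and the fixed 8-bit digit are illustrative assumptions, not the code from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: one stable counting-sort pass on the 8-bit digit
// selected by `pass` (0 = least significant byte).
std::vector<unsigned> countingPass(const std::vector<unsigned>& a, unsigned pass) {
    constexpr std::size_t kBuckets = 256;   // plays the role of BASEN
    const unsigned shift = 8 * pass;        // plays the role of shift * BASE
    std::vector<std::size_t> bucket(kBuckets, 0);

    // 1) Count how many items fall into each bucket.
    for (unsigned x : a) ++bucket[(x >> shift) & 0xFFu];

    // 2) Inclusive prefix sum: bucket[k] is one past the last slot for digit k.
    for (std::size_t k = 1; k < kBuckets; ++k) bucket[k] += bucket[k - 1];

    // 3) Scatter back to front so items with equal digits keep their order.
    std::vector<unsigned> b(a.size());
    for (std::size_t i = a.size(); i-- > 0;) {
        b[--bucket[(a[i] >> shift) & 0xFFu]] = a[i];
    }
    return b;
}
```

Running the pass once per byte, lowest byte first, yields a fully sorted array because each pass is stable.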
`n` is the number of items, and `BASEN` is the radix (the number of buckets).
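They are ordinary `for` loops; I only rewrote them as `do`-`while` loops so the assembly looks cleaner (each loop then has just one jump instruction).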
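A `do`-`while` loop runs its body once before testing the condition, so the rewrite is only equivalent to the `for` loop when `n >= 1`. The sketch below, with hypothetical function names and `0xFF` standing in for the real mask, shows both forms and the guard that restores the `for` semantics. GCC and Clang often perform this kind of loop rotation themselves at `-O2`, so the rewrite may not change the generated code at all.

```cpp
#include <cassert>
#include <vector>

// Plain for loop: the condition is tested first, so n == 0 is safe.
void countFor(const unsigned* a, int n, unsigned* bucket) {
    for (int i = 0; i < n; ++i) bucket[a[i] & 0xFFu]++;
}

// do-while form as in the question. Without the guard it would read a[0]
// even when n == 0; with the guard it matches countFor exactly.
void countDoWhile(const unsigned* a, int n, unsigned* bucket) {
    if (n <= 0) return;
    int i = 0;
    do { bucket[a[i] & 0xFFu]++; } while (++i < n);
}
```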
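The last of the three loops is the main culprit, and its mispredictions grow as `n` grows. It seems to happen only the first time this loop runs.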
The assembly of the last loop (all three look alike):
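```asm
#NO_APP
    subl    $1, %r10d              # i = n - 1
    movslq  %r10d, %rax
    leaq    (%rsi,%rax,4), %rax    # rax = &a[i]
    .p2align 4,,10
    .p2align 3
.L4:
    movl    (%rax), %esi           # esi = a[i]
    andl    %r8d, %esi             # & mask
    shrl    %cl, %esi              # >> (shift * BASE): bucket index
    movl    -120(%rsp,%rsi,4), %edi
    subl    $1, %edi               # --bucket[index]
    movl    %edi, -120(%rsp,%rsi,4)
    movl    (%rax), %esi           # reload a[i]
    subq    $4, %rax               # pointer now at a[i - 1]
    subl    $1, %r10d              # --i, sets the sign flag
    movl    %esi, (%rdx,%rdi,4)    # b[bucket[index]] = a[i]
    jns     .L4                    # loop while i >= 0
```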
I would have expected this jump to be easy to predict.
Here is the measurement (time block) for a complete sort with n = 100000:

Time: 3712513 Instructions: 7205663 CacheFault: 1 BranchMispredictions: 337
Here is the full code:
```cpp
#include "RadixSort.h"
#include <iostream>
#include <math.h>

#define BASE 8      // bits per digit
#define BASEN 256   // number of buckets = 2^BASE

RadixSort::RadixSort(int n) {
    b = new unsigned int[n];  // scratch buffer: allocated, not yet written
}

RadixSort::~RadixSort() { delete[] b; }  // release the scratch buffer

unsigned int* RadixSort::start(unsigned int* a, unsigned int* b, int n, int mask, int shift) {
#define getBucket() ((a[i] & mask) >> (shift * BASE))
    unsigned int bucket[BASEN] = { 0 };
    // assumes n >= 1: each do-while runs its body once before testing
    //asm(">> count buckets <<");
    int i = 0;
    do { bucket[getBucket()]++; } while(++i < n);
    //asm(">> add count prefix <<");
    i = 1;
    do { bucket[i] += bucket[i - 1]; } while(++i < BASEN);
    //asm(">> reorganize items <<");
    i = n - 1;
    do { b[--bucket[getBucket()]] = a[i]; } while(--i >= 0);
    //asm(">> return items <<");
    return b;
}

unsigned int* RadixSort::Sort(unsigned int* a, int n) {
    unsigned shift = 0, mask = (unsigned int)pow(2, BASE) - 1;
    b = start(a, b, n, mask, shift);   // pass 1: first write to the scratch buffer
    mask <<= BASE; shift++;
    a = start(b, a, n, mask, shift);
    mask <<= BASE; shift++;
    b = start(a, b, n, mask, shift);
    mask <<= BASE; shift++;
    a = start(b, a, n, mask, shift);   // 4 passes, so the result ends up in a
    return a;
}
```
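In `Sort`, the mask is shifted left by `BASE` bits per pass while `shift` counts passes, so `getBucket()` extracts byte number `shift` of each key. The sketch below (hypothetical names, 32-bit `unsigned` assumed) checks that this mask-then-shift form matches the more common shift-then-mask form, which only needs the constant `0xFF`. It also makes the bucket count explicit: 8-bit digits give 2^8 = 256 buckets.

```cpp
#include <cassert>

constexpr unsigned kBase = 8;                       // bits per digit, like BASE
constexpr unsigned kDigitMask = (1u << kBase) - 1;  // 0xFF, like pow(2, BASE) - 1

// What getBucket() computes in pass `pass`, given how Sort() updates mask and shift.
unsigned bucketAsInQuestion(unsigned x, unsigned pass) {
    unsigned mask = kDigitMask << (kBase * pass);   // mask <<= BASE after each pass
    return (x & mask) >> (pass * kBase);            // (a[i] & mask) >> (shift * BASE)
}

// Equivalent shift-then-mask form.
unsigned bucketShiftFirst(unsigned x, unsigned pass) {
    return (x >> (kBase * pass)) & kDigitMask;
}
```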
Here are more detailed per-loop statistics (n = 100000):
```text
Time Block: Time: 338752 Instructions: 700082 CacheFault: 0 BranchMispredictions: 5
Time Block: Time: 2951 Instructions: 1358 CacheFault: 0 BranchMispredictions: 5
Time Block: Time: 731609 Instructions: 1100179 CacheFault: 1 BranchMispredictions: 296
Time Block: Time: 338577 Instructions: 700082 CacheFault: 0 BranchMispredictions: 5
Time Block: Time: 2917 Instructions: 1358 CacheFault: 0 BranchMispredictions: 5
Time Block: Time: 608963 Instructions: 1100083 CacheFault: 0 BranchMispredictions: 8
Time Block: Time: 338616 Instructions: 700082 CacheFault: 0 BranchMispredictions: 5
Time Block: Time: 2920 Instructions: 1358 CacheFault: 0 BranchMispredictions: 5
Time Block: Time: 595094 Instructions: 1100082 CacheFault: 0 BranchMispredictions: 5
Time Block: Time: 338611 Instructions: 700082 CacheFault: 0 BranchMispredictions: 5
Time Block: Time: 2957 Instructions: 1358 CacheFault: 0 BranchMispredictions: 5
Time Block: Time: 591652 Instructions: 1100082 CacheFault: 0 BranchMispredictions: 5
```
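As a sanity check on these counters (plain arithmetic on the numbers above, not an extra measurement): the scatter loop in the assembly has 11 instructions per item, and the per-block counts line up with that once a constant of about 82 instructions per block is subtracted, presumably measurement or setup overhead. The count loop's assembly is not shown, so its 7 instructions per item is inferred, and the prefix loop comes out at roughly 5 instructions per iteration over its 255 iterations.

```latex
\begin{aligned}
\text{scatter:} \quad & 11 \times 100000 + 82 = 1100082 \\
\text{count:}   \quad & 7 \times 100000 + 82 = 700082 \\
\text{prefix:}  \quad & 1358 - 82 = 1276 \approx 5 \times 255
\end{aligned}
```

Because the instruction counts are essentially the same in every pass, the extra mispredictions in the first scatter block do not come from that loop executing more of its own code.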
It seems to learn after the first run.
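These numbers alone do not show whether the predictor is "learning" or whether the first pass is special for another reason. One hypothesis worth testing: `b` is allocated with `new` in the constructor and first written by the first scatter loop, so the OS may have to map its pages (100000 × 4 bytes, about 98 pages of 4 KiB) during that loop; if the counter also includes kernel-mode execution, the page-fault handler's branches would be counted there. That first scatter block is also the only one reporting a CacheFault. A minimal sketch of the experiment, assuming a hypothetical helper that replaces the constructor's allocation, is to touch the whole buffer before the timed sort:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical replacement for the constructor's allocation: write every
// element once so the pages are mapped before the first timed sort,
// instead of during the first scatter loop.
unsigned* allocateScratch(std::size_t n) {
    unsigned* b = new unsigned[n];  // default-initialized: pages may not be mapped yet
    std::fill(b, b + n, 0u);        // first touch happens here, outside the timed region
    return b;
}
```

If the first-pass mispredictions disappear with a pre-touched buffer, they come from first-touch page faults rather than from the loop branch itself.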