Radix Sort Branch Prediction C++

Date: 2014-03-16 22:16:40

Tags: c++ optimization branch cpu prediction

I wrote a simple radix sort, but I'm puzzled by the large number of branch mispredictions it causes.

This is the core of the radix sort:
int i = 0;
do { bucket[getBucket()]++; } while(++i < n);

i = 1;
do { bucket[i] += bucket[i - 1]; } while(++i < BASEN);

i = n -1;
do { b[--bucket[getBucket()]] = a[i]; } while(--i >= 0);

n is the number of items and BASEN is the radix (the number of buckets).
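To make the relationship between BASE, BASEN and the bucket index concrete, here is a small sketch. It assumes 32-bit unsigned keys and 8-bit digits, as in the full code; the helper names bucketMasked and bucketShifted are hypothetical. With 8-bit digits there are 2^8 = 256 buckets, and the bucket of a key in a given pass is simply that pass's byte of the key.

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned kBase = 8;             // bits per digit (BASE)
constexpr unsigned kBaseN = 1u << kBase;  // 2^BASE = 256 buckets (BASEN)

// Same computation as getBucket() in the question: mask first, then shift.
inline unsigned bucketMasked(std::uint32_t x, unsigned pass) {
    std::uint32_t mask = (kBaseN - 1u) << (pass * kBase);
    return (x & mask) >> (pass * kBase);
}

// Equivalent form: shift first, then mask with the constant 0xFF.
inline unsigned bucketShifted(std::uint32_t x, unsigned pass) {
    return (x >> (pass * kBase)) & (kBaseN - 1u);
}
```

Shifting first keeps the mask constant, so it does not need to be recomputed for each pass.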

They are ordinary for loops; I just rewrote them so they look nicer in assembly (each loop has only one jump instruction).
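For comparison, here is a sketch of the same three loops written as ordinary for loops, for a single pass over one 8-bit digit. The function name countingPass is hypothetical, the keys are assumed to be 32-bit, and std::vector stands in for the raw arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One counting-sort pass over digit `pass` (8-bit digits, 256 buckets).
std::vector<std::uint32_t> countingPass(const std::vector<std::uint32_t>& a,
                                        unsigned pass) {
    const unsigned shift = pass * 8;
    std::size_t bucket[256] = {0};

    // 1) Count how many keys fall into each bucket.
    for (std::uint32_t x : a) bucket[(x >> shift) & 0xFFu]++;

    // 2) Inclusive prefix sum: bucket[d] = number of keys with digit <= d.
    for (std::size_t d = 1; d < 256; ++d) bucket[d] += bucket[d - 1];

    // 3) Scatter from the back so keys with equal digits keep their order.
    std::vector<std::uint32_t> b(a.size());
    for (std::size_t i = a.size(); i-- > 0;)
        b[--bucket[(a[i] >> shift) & 0xFFu]] = a[i];
    return b;
}
```

Unlike the do/while form, which assumes n >= 1, this version also handles an empty input. How many jumps the compiler emits per loop for it depends on the compiler and flags.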

The last of the three loops is the big sinner, and its misses grow as n increases. It seems to happen only the first time this loop runs.

The assembly of the last loop (all three look the same):

#NO_APP
        subl    $1, %r10d
        movslq  %r10d, %rax
        leaq    (%rsi,%rax,4), %rax
        .p2align 4,,10
        .p2align 3
.L4:
        movl    (%rax), %esi
        andl    %r8d, %esi
        shrl    %cl, %esi
        movl    -120(%rsp,%rsi,4), %edi
        subl    $1, %edi
        movl    %edi, -120(%rsp,%rsi,4)
        movl    (%rax), %esi
        subq    $4, %rax
        subl    $1, %r10d
        movl    %esi, (%rdx,%rdi,4)
        jns     .L4

I would think it is easy to predict when this jump is taken.
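The intuition that the loop-closing `jns .L4` should be easy to predict can be checked against a textbook model. The sketch below simulates a 2-bit saturating counter; this is an assumption made purely for illustration, since modern CPUs use far more elaborate predictors. TwoBitCounter and loopMispredictions are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Textbook 2-bit saturating counter: states 0-1 predict "not taken",
// states 2-3 predict "taken".
struct TwoBitCounter {
    int state = 1;  // start weakly not-taken (an arbitrary choice)

    // Records one branch outcome; returns true if it was mispredicted.
    bool update(bool taken) {
        bool predictedTaken = state >= 2;
        if (taken && state < 3) ++state;
        if (!taken && state > 0) --state;
        return predictedTaken != taken;
    }
};

// Feeds the counter the outcomes of a do/while loop-closing branch:
// taken n-1 times, then not taken once when the loop exits.
std::size_t loopMispredictions(TwoBitCounter& c, std::size_t n) {
    std::size_t misses = 0;
    for (std::size_t i = 0; i < n; ++i) {
        bool taken = (i + 1 < n);
        if (c.update(taken)) ++misses;
    }
    return misses;
}
```

Under this model the loop-closing branch costs two mispredictions on a cold predictor (first iteration and exit) and one once warmed up (exit only), independent of n. So the loop branch on its own would not be expected to produce misses that grow with n.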

Here is a run with n = 100000:

Time Block: Time: 3712513 Instructions: 7205663 CacheFault: 1 BranchMispredictions: 337

Here is the full code:

#include "RadixSort.h"
#include <iostream>
#include <math.h>


#define BASE 8    // bits per digit
#define BASEN 256 // 2^BASE buckets

RadixSort::RadixSort(int n) {
    b = new unsigned int[n];
}

RadixSort::~RadixSort() { delete[] b; } // free the buffer allocated in the constructor

unsigned int* RadixSort::start(unsigned int* a, unsigned int* b, int n, int mask, int shift) {
    #define getBucket() ((a[i] & mask) >> (shift * BASE))

    unsigned int bucket[BASEN] = { 0 };

    //asm(">> count buckets <<");
    int i = 0;
    do { bucket[getBucket()]++; } while(++i < n);

    //asm(">> add count prefix <<");
    i = 1;
    do { bucket[i] += bucket[i - 1]; } while(++i < BASEN);

    //asm(">> reorganize items <<");
    i = n -1;
    do { b[--bucket[getBucket()]] = a[i]; } while(--i >= 0);

    //asm(">> return items <<");
    return b;
}

unsigned int* RadixSort::Sort(unsigned int* a, int n) {
    unsigned shift = 0, mask = (unsigned int)pow(2, BASE) - 1;

    b = start(a, b, n, mask, shift);
    mask <<= BASE; shift++;
    a = start(b, a, n, mask, shift);
    mask <<= BASE; shift++;
    b = start(a, b, n, mask, shift);
    mask <<= BASE; shift++;
    a = start(b, a, n, mask, shift);

    return a;
}
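Since RadixSort.h is not shown, here is a self-contained sketch of the same LSD radix sort for anyone who wants to reproduce the measurements: four passes of 8 bits over 32-bit keys, alternating between two buffers as Sort() does. The function name radixSort32 and the use of std::vector are assumptions, not the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// LSD radix sort of 32-bit keys: 4 passes, 8 bits (256 buckets) per pass.
void radixSort32(std::vector<std::uint32_t>& a) {
    std::vector<std::uint32_t> b(a.size());
    for (unsigned pass = 0; pass < 4; ++pass) {
        const unsigned shift = pass * 8;
        std::size_t bucket[256] = {0};
        // Count keys per bucket for this digit.
        for (std::uint32_t x : a) bucket[(x >> shift) & 0xFFu]++;
        // Inclusive prefix sum gives each bucket's end position.
        for (std::size_t d = 1; d < 256; ++d) bucket[d] += bucket[d - 1];
        // Stable scatter from the back into b.
        for (std::size_t i = a.size(); i-- > 0;)
            b[--bucket[(a[i] >> shift) & 0xFFu]] = a[i];
        // Make this pass's output the next pass's input.
        a.swap(b);
    }
}
```

Swapping the vectors after each pass replaces the explicit a/b alternation in Sort(), so the sorted keys always end up in the caller's vector.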

Here are more detailed statistics for each loop (n = 100000):

Time Block: Time: 338752 Instructions: 700082 CacheFault: 0 BranchMispredictions: 5
Time Block: Time: 2951 Instructions: 1358 CacheFault: 0 BranchMispredictions: 5
Time Block: Time: 731609 Instructions: 1100179 CacheFault: 1 BranchMispredictions: 296

Time Block: Time: 338577 Instructions: 700082 CacheFault: 0 BranchMispredictions: 5
Time Block: Time: 2917 Instructions: 1358 CacheFault: 0 BranchMispredictions: 5
Time Block: Time: 608963 Instructions: 1100083 CacheFault: 0 BranchMispredictions: 8

Time Block: Time: 338616 Instructions: 700082 CacheFault: 0 BranchMispredictions: 5
Time Block: Time: 2920 Instructions: 1358 CacheFault: 0 BranchMispredictions: 5
Time Block: Time: 595094 Instructions: 1100082 CacheFault: 0 BranchMispredictions: 5

Time Block: Time: 338611 Instructions: 700082 CacheFault: 0 BranchMispredictions: 5
Time Block: Time: 2957 Instructions: 1358 CacheFault: 0 BranchMispredictions: 5
Time Block: Time: 591652 Instructions: 1100082 CacheFault: 0 BranchMispredictions: 5

After the first run, it seems to have learned.
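To check whether the first-call versus later-call difference reproduces, a minimal timing harness can run the same scatter pass several times on identical input. This is a sketch under stated assumptions: it measures wall-clock time with std::chrono::steady_clock rather than branch mispredictions (counting those requires hardware counters, for example `perf stat -e branch-misses` on Linux), it buckets only on the low byte of each key, and timeRepeatedPasses is a hypothetical name.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Times k identical counting-sort passes (low byte only) over n random keys.
std::vector<std::chrono::nanoseconds> timeRepeatedPasses(std::size_t n, int k) {
    assert(n > 0);
    std::mt19937 rng(12345);  // fixed seed so every run sees the same input
    std::vector<std::uint32_t> a(n), b(n);
    for (auto& x : a) x = rng();

    static volatile std::uint32_t sink;  // keeps the scatter stores observable
    std::vector<std::chrono::nanoseconds> times;
    for (int run = 0; run < k; ++run) {
        auto t0 = std::chrono::steady_clock::now();
        std::size_t bucket[256] = {0};
        for (std::uint32_t x : a) bucket[x & 0xFFu]++;
        for (std::size_t d = 1; d < 256; ++d) bucket[d] += bucket[d - 1];
        for (std::size_t i = n; i-- > 0;) b[--bucket[a[i] & 0xFFu]] = a[i];
        auto t1 = std::chrono::steady_clock::now();
        sink = b[n - 1];
        times.push_back(std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0));
    }
    return times;
}
```

If the first entry is consistently larger than the rest, the effect reproduces outside the original measurement setup.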
