Loop unrolling over a multidimensional array

Date: 2013-11-16 00:39:55

Tags: c performance optimization loop-unrolling

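I recently tried to unroll the inner i and j loops over this multidimensional array, but filter->get(i, j) always messes up the texture of the image. Can anyone help me unroll the i and j loops? Thanks.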

My attempt:

double
applyFilter(struct Filter *filter, cs1300bmp *input, cs1300bmp *output)
{
    long long cycStart, cycStop;

    cycStart = rdtscll();

    output -> width = input -> width;
    output -> height = input -> height;
    int a = filter -> getDivisor();
    int n = filter -> getSize();
    for (int plane = 0; plane < 3; plane++) {
        for (int row = 1; row < (input -> height) - 1; row = row + 1) {
            for (int col = 1; col < (input -> width) - 1; col = col + 1) {
                int value = 0;
                int val1, val2;
                for (int j = 0; j < n; j++) {
                    for (int i = 0; i < n; i += 2) {
                        val1 = val1 + input -> color[plane][row + i - 1][col + j - 1]
                            * filter -> get(i, j);
                        val2 = val2 + input -> color[plane][row + i][col + j - 1]
                            * filter -> get(i + 1, j);
                    }
                }
                value = (val1 + val2) / a;
                if (value < 0) { value = 0; }
                if (value > 255) { value = 255; }
                output -> color[plane][row][col] = value;
            }
        }
    }

    cycStop = rdtscll();
    double diff = cycStop - cycStart;
    double diffPerPixel = diff / (output -> width * output -> height);
    fprintf(stderr, "Took %f cycles to process, or %f cycles per pixel\n",
            diff, diff / (output -> width * output -> height));

    return diffPerPixel;
}

Original:

int a = filter -> getDivisor();
int n = filter -> getSize();    
for (int plane = 0; plane < 3; plane++){
    for(int row = 1; row < (input -> height) - 1 ; row = row + 1) {
        for(int col = 1; col < (input -> width) - 1; col = col + 1) {
            int value = 0;
            for (int j = 0; j < n; j++) {
                for (int i = 0; i < n; i++) {
                    value = value + input -> color[plane][row + i - 1][col + j - 1]
                    * filter -> get(i, j);
                }
            }
            value = value / a;
            if ( value  < 0 ) { value = 0; }
            if ( value  > 255 ) { value = 255; }
            output -> color[plane][row][col] = value;

2 Answers:

Answer 0 (score: 0)

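Your approach is only correct if n is a multiple of 2. Otherwise the last pass over i mishandles the leftover row: with the condition i < n it reads filter->get(n, j), one past the last valid index.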
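To make the odd-n case concrete, here is a minimal sketch that records which filter row indices a pair-unrolled loop with the condition i < n would touch. The helper name touchedRows is hypothetical and exists only for illustration; it is not part of the question's code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: lists the filter row indices accessed by a loop
// of the form
//   for (i = 0; i < n; i += 2) { use(i); use(i + 1); }
// Valid indices are 0 .. n-1.
std::vector<int> touchedRows(int n) {
    assert(n > 0);
    std::vector<int> rows;
    for (int i = 0; i < n; i += 2) {
        rows.push_back(i);
        rows.push_back(i + 1);  // equals n on the last pass when n is odd
    }
    return rows;
}
```

For n = 3 the list ends with index 3, which is outside the filter; for n = 4 every index stays within 0..3.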

Addition:

First, I just realized that you forgot to initialize val1 and val2, which is probably the main cause of your problem.

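Second, it seems to me that your code is written specifically for a filter of size 3:

  • For smaller filters, you never touch the border at all.
  • For larger filters, you access positions outside the picture, e.g. [row + i - 1] becomes greater than or equal to input->height.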
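The two points above follow from the index arithmetic: with the question's indexing, a filter of size n reads image rows row-1 through row+n-2. A small sketch of that check, assuming this indexing (the helper name readsStayInside is hypothetical):

```cpp
#include <cassert>

// Hypothetical helper, assuming the question's access pattern
// color[plane][row + i - 1][...] for i in 0 .. n-1.
// Returns true when every row index read for this `row` lies in 0 .. height-1.
bool readsStayInside(int row, int height, int n) {
    assert(height > 0 && n > 0);
    int first = row - 1;      // smallest row index read (i = 0)
    int last  = row + n - 2;  // largest row index read (i = n - 1)
    return first >= 0 && last < height;
}
```

With height = 10, row = 8 is fine for n = 3 but not for n = 5, which is why the loop bounds `row < height - 1` only suit a 3x3 filter. The same reasoning applies to columns.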

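If you only want to use filters of size 3, I would simply unroll the inner loops completely. Otherwise, check the bounds of the row and column values.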
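As a rough sketch of what complete unrolling could look like for a fixed 3x3 filter: the Plane type and the f array below are stand-ins (assumptions) for input->color[plane] and filter->get(i, j), since the real cs1300bmp and Filter types are not shown here.

```cpp
#include <cassert>
#include <vector>

// Stand-in for input->color[plane]; an assumption for this sketch.
using Plane = std::vector<std::vector<int>>;

// Fully unrolled 3x3 weighted sum for one pixel.
// f[i][j] stands in for filter->get(i, j); i is the row offset, j the column offset.
int apply3x3(const Plane& p, const int f[3][3], int row, int col) {
    assert(row >= 1 && col >= 1);
    assert(row + 1 < (int)p.size() && col + 1 < (int)p[row].size());
    return p[row-1][col-1] * f[0][0] + p[row-1][col] * f[0][1] + p[row-1][col+1] * f[0][2]
         + p[row  ][col-1] * f[1][0] + p[row  ][col] * f[1][1] + p[row  ][col+1] * f[1][2]
         + p[row+1][col-1] * f[2][0] + p[row+1][col] * f[2][1] + p[row+1][col+1] * f[2][2];
}
```

The caller would still divide by the divisor and clamp to 0..255, exactly as in the original loop body.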

As for loop unrolling itself, I suggest searching the web, since you can find plenty of examples of how to do it correctly. One can be found on the Wikipedia page.

In your case, the simplest solution would be:

int value = 0;
int val1=0, val2=0;
for (int j = 0; j < n; j++) {
    for (int i = 0; i < n-1; i+=2) {
        val1 = val1 + input->color[plane][row+i-1][col+j-1] * filter->get(i  ,j);
        val2 = val2 + input->color[plane][row+i  ][col+j-1] * filter->get(i+1,j);
    }
    if (n%2 !=0) {
        val1 = val1 + input->color[plane][row+n-2][col+j-1] * filter->get(n-1,j);
    }
}
value = (val1 + val2) / a;

If you want to unroll the loop further, a more general way (for example by 4) would be:

int value = 0;
int val1=0, val2=0, val3=0, val4=0;
for (int j = 0; j < n; j++) {

    for (int i = 0; i < n-3; i+=4) {
        val1 = val1 + input->color[plane][row+i-1][col+j-1] * filter->get(i  ,j);
        val2 = val2 + input->color[plane][row+i  ][col+j-1] * filter->get(i+1,j);
        val3 = val3 + input->color[plane][row+i+1][col+j-1] * filter->get(i+2,j);
        val4 = val4 + input->color[plane][row+i+2][col+j-1] * filter->get(i+3,j);
    }
    // Remainder: the last n % 4 rows. The cases fall through on purpose.
    switch (n % 4) {
        case 3: val1+=input->color[plane][row+n-4][col+j-1] * filter->get(n-3,j);
        case 2: val1+=input->color[plane][row+n-3][col+j-1] * filter->get(n-2,j);
        case 1: val1+=input->color[plane][row+n-2][col+j-1] * filter->get(n-1,j);
    }
}
// Combine the partial sums once, after all columns have been processed.
value = (val1 + val2 + val3 + val4) / a;

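Note:
Depending on the filter size, the compiler and compiler options used, and your system, the solutions above may not speed up your code and may even slow it down. You should also be aware that the compiler can usually unroll loops for you (e.g. with the -funroll-loops option in gcc) when it makes sense.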
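Because the outcome depends on compiler and flags, it is worth measuring rather than assuming. Below is a minimal timing sketch on a plain dot product, not the image filter itself; the names dotRolled, dotUnrolled2 and timeNs are hypothetical helpers for illustration, and a single timed call is noisy, so repeat it in practice.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Rolled reference: plain dot product.
long long dotRolled(const std::vector<int>& a, const std::vector<int>& b) {
    assert(a.size() == b.size());
    long long sum = 0;
    for (std::size_t i = 0; i < a.size(); i++) sum += (long long)a[i] * b[i];
    return sum;
}

// Unrolled by 2 with a remainder step for odd lengths.
long long dotUnrolled2(const std::vector<int>& a, const std::vector<int>& b) {
    assert(a.size() == b.size());
    long long s1 = 0, s2 = 0;
    std::size_t i = 0;
    for (; i + 1 < a.size(); i += 2) {
        s1 += (long long)a[i] * b[i];
        s2 += (long long)a[i + 1] * b[i + 1];
    }
    if (i < a.size()) s1 += (long long)a[i] * b[i];  // leftover element
    return s1 + s2;
}

// Times one call of fn in nanoseconds using a monotonic clock.
template <class Fn>
long long timeNs(Fn fn) {
    auto start = std::chrono::steady_clock::now();
    volatile long long sink = fn();  // keep the result from being optimized away
    (void)sink;
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

One way to use it is to call timeNs on both variants with the same large input, once built at your usual optimization level and once with -funroll-loops added, and compare the numbers.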

Answer 1 (score: 0)

Try replacing the inner loops with:

int value = 0;
int val1 = 0, val2 = 0;
for (int j = 0; j < n; j++) { 
    int i;
    for (i = 0; i + 1 < n; i+=2) {  // stop while i+1 is still a valid index
        val1 += input->color[plane][row+i-1][col+j-1] * filter->get(i,j);
        val2 += input->color[plane][row+i  ][col+j-1] * filter->get(i+1,j);
    } 
    if (i < n)
        val1 += input->color[plane][row+i-1][col+j-1] * filter->get(i,j);
} 
value = (val1 + val2) / a;