
Time: 2016-05-30 23:14:42

Tags: c performance optimization



    int     j;

    /* Original loop: 8 elements per step, one accumulator. */
    for (j = 0; j < ARRAY_SIZE; j += 8) {
        sum += array[j] + array[j+1] + array[j+2] + array[j+3] + array[j+4] + array[j+5] + array[j+6] + array[j+7];
    }



    int     j;

    /* Same 8 elements per step, split across four accumulators. */
    for (j = 0; j < ARRAY_SIZE; j += 8) {
        sum0 += array[j]   + array[j+1];
        sum1 += array[j+2] + array[j+3];
        sum2 += array[j+4] + array[j+5];
        sum3 += array[j+6] + array[j+7];
    }



All I know about the machine I run it on (it is a service the school pays for) is that it is a 32-bit, remote, Intel-based Linux virtual server; I think it runs Red Hat.




    #include <stdio.h>
    #include <stdlib.h>

    // You are only allowed to make changes to this code as specified by the comments in it.

    // The code you submit must have these two values.
    #define N_TIMES     600000
    #define ARRAY_SIZE   10000

    int main(void)
    {
        double  *array = calloc(ARRAY_SIZE, sizeof(double));
        double  sum = 0;
        int     i;

        // You can add variables between this comment ...

    //  double sum0 = 0;
    //  double sum1 = 0;
    //  double sum2 = 0;
    //  double sum3 = 0;

        // ... and this one.

        // Please change 'your name' to your actual name.
        printf("CS201 - Asgmt 4 - ACTUAL NAME\n");

        for (i = 0; i < N_TIMES; i++) {

            // You can change anything between this comment ...

            int     j;

            for (j = 0; j < ARRAY_SIZE; j += 8) {
                sum += array[j] + array[j+1] + array[j+2] + array[j+3] + array[j+4] + array[j+5] + array[j+6] + array[j+7];
            }

            // ... and this one. But your inner loop must do the same
            // number of additions as this one does.
        }

        // You can add some final code between this comment ...
    //  sum = sum0 + sum1 + sum2 + sum3;
        // ... and this one.

        return 0;
    }


    #include <stdio.h>
    #include <stdlib.h>

    // You are only allowed to make changes to this code as specified by the comments in it.

    // The code you submit must have these two values.
    #define N_TIMES     600000
    #define ARRAY_SIZE   10000

    int main(void)
    {
        double  *array = calloc(ARRAY_SIZE, sizeof(double));
        double  sum = 0;
        int     i;

        // You can add variables between this comment ...

        double sum0 = 0;
        double sum1 = 0;
        double sum2 = 0;
        double sum3 = 0;

        // ... and this one.

        // Please change 'your name' to your actual name.
        printf("CS201 - Asgmt 4 - ACTUAL NAME\n");

        for (i = 0; i < N_TIMES; i++) {

            // You can change anything between this comment ...

            int     j;

            for (j = 0; j < ARRAY_SIZE; j += 8) {
                sum0 += array[j]   + array[j+1];
                sum1 += array[j+2] + array[j+3];
                sum2 += array[j+4] + array[j+5];
                sum3 += array[j+6] + array[j+7];
            }

            // ... and this one. But your inner loop must do the same
            // number of additions as this one does.
        }

        // You can add some final code between this comment ...
        sum = sum0 + sum1 + sum2 + sum3;
        // ... and this one.

        return 0;
    }
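A likely reason the four-accumulator version can help: with a single accumulator, every addition into it has to wait for the previous one to finish, so the loop is limited by floating-point add latency rather than throughput. Independent accumulators give an out-of-order CPU several chains it can keep in flight at once. Because floating-point addition is not associative, GCC and Clang will not reorder the additions this way on their own at -O2 or -O3 unless told they may (for example with -ffast-math). The sketch below puts the two shapes side by side; the function names are hypothetical, and the four-chain version assumes n is a multiple of 8, as ARRAY_SIZE is in the assignment.

```c
#include <assert.h>
#include <stddef.h>

/* Single accumulator: each addition depends on the result of the
   previous one, forming one long dependency chain. */
double sum_single_chain(const double *a, size_t n)
{
    double sum = 0.0;
    for (size_t j = 0; j < n; j++)
        sum += a[j];
    return sum;
}

/* Four accumulators: the four chains do not depend on each other,
   so their additions can overlap in the pipeline.
   Assumes n is a multiple of 8 (hypothetical precondition, checked). */
double sum_four_chains(const double *a, size_t n)
{
    double sum0 = 0.0, sum1 = 0.0, sum2 = 0.0, sum3 = 0.0;

    assert(n % 8 == 0);
    for (size_t j = 0; j < n; j += 8) {
        sum0 += a[j]   + a[j+1];
        sum1 += a[j+2] + a[j+3];
        sum2 += a[j+4] + a[j+5];
        sum3 += a[j+6] + a[j+7];
    }
    /* Combine the partial sums once, after the loop. */
    return (sum0 + sum1) + (sum2 + sum3);
}
```

With general data the two functions can differ in the last bits, since they add in a different order; with the all-zero array that calloc produces in the assignment, both return 0.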



    int     j;

    for (j = 0; j < ARRAY_SIZE; j += 50) {
        sum +=(((((((array[j] + array[j+1]) + (array[j+2] + array[j+3])) +
                ((array[j+4] + array[j+5]) + (array[j+6] + array[j+7]))) +
                (((array[j+8] + array[j+9]) + (array[j+10] + array[j+11])) +
                ((array[j+12] + array[j+13]) + (array[j+14] + array[j+15])))) +
                ((((array[j+16] + array[j+17]) + (array[j+18] + array[j+19]))))) +
                (((((array[j+20] + array[j+21]) + (array[j+22] + array[j+23])) +
                ((array[j+24] + array[j+25]) + (array[j+26] + array[j+27]))) +
                (((array[j+28] + array[j+29]) + (array[j+30] + array[j+31])) +
                ((array[j+32] + array[j+33]) + (array[j+34] + array[j+35])))) +
                ((((array[j+36] + array[j+37]) + (array[j+38] + array[j+39])))))) +
                ((((array[j+40] + array[j+41]) + (array[j+42] + array[j+43])) +
                ((array[j+44] + array[j+45]) + (array[j+46] + array[j+47]))) +
                (array[j+48] + array[j+49])));
    }
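The unrolled loops above step by 8 and by 50 and stay in bounds only because ARRAY_SIZE = 10000 is an exact multiple of both; with any other length the last iteration would read past the end of the array. A common pattern for arbitrary lengths is an unrolled main loop followed by a short cleanup loop. A minimal sketch, with a hypothetical function name, an unroll factor of 8, and the same pairwise grouping inside the body:

```c
#include <stddef.h>

/* Unrolled by 8 with pairwise grouping inside the body, plus a cleanup
   loop so that n does not have to be a multiple of 8. */
double sum_unrolled_with_tail(const double *a, size_t n)
{
    double sum = 0.0;
    size_t j = 0;

    /* Main loop: runs only while a full block of 8 elements remains. */
    for (; j + 8 <= n; j += 8) {
        sum += ((a[j]   + a[j+1]) + (a[j+2] + a[j+3]))
             + ((a[j+4] + a[j+5]) + (a[j+6] + a[j+7]));
    }

    /* Cleanup loop: the remaining 0 to 7 elements, one at a time. */
    for (; j < n; j++)
        sum += a[j];

    return sum;
}
```

Writing the condition as j + 8 <= n rather than j < n - 8 avoids unsigned underflow when n is smaller than 8.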

2 Answers

Answer 0 (score: 2)


    for (j = 0; j < ARRAY_SIZE; j += 16) {
        sum = sum +
              (array[j   ] + array[j+ 1]) +
              (array[j+ 2] + array[j+ 3]) +
              (array[j+ 4] + array[j+ 5]) +
              (array[j+ 6] + array[j+ 7]) +
              (array[j+ 8] + array[j+ 9]) +
              (array[j+10] + array[j+11]) +
              (array[j+12] + array[j+13]) +
              (array[j+14] + array[j+15]);
    }





    int     j1, j2;

    j1 = 0;
    do {
        j2 = j1 + 20;          /* start of the second 20-element half */
        sum = sum +
              (array[j1   ] + array[j1+ 1]) +
              (array[j1+ 2] + array[j1+ 3]) +
              (array[j1+ 4] + array[j1+ 5]) +
              (array[j1+ 6] + array[j1+ 7]) +
              (array[j1+ 8] + array[j1+ 9]) +
              (array[j1+10] + array[j1+11]) +
              (array[j1+12] + array[j1+13]) +
              (array[j1+14] + array[j1+15]) +
              (array[j1+16] + array[j1+17]) +
              (array[j1+18] + array[j1+19]);
        j1 = j2 + 20;          /* advance to the next 40-element block */
        sum = sum +
              (array[j2   ] + array[j2+ 1]) +
              (array[j2+ 2] + array[j2+ 3]) +
              (array[j2+ 4] + array[j2+ 5]) +
              (array[j2+ 6] + array[j2+ 7]) +
              (array[j2+ 8] + array[j2+ 9]) +
              (array[j2+10] + array[j2+11]) +
              (array[j2+12] + array[j2+13]) +
              (array[j2+14] + array[j2+15]) +
              (array[j2+16] + array[j2+17]) +
              (array[j2+18] + array[j2+19]);
    } while (j1 < ARRAY_SIZE);
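A quick check that each rewrite still meets the assignment's rule that the inner loop do the same number of additions: count the floating-point additions on array elements in one iteration of the loop body (index arithmetic such as j += 8 or j2 = j1 + 20 excluded), then multiply by the number of iterations over ARRAY_SIZE = 10000 elements.

$$
\begin{aligned}
\text{original, step } 8:&\quad (7+1)\times\frac{10000}{8} = 8 \times 1250 = 10000\\
\text{four accumulators, step } 8:&\quad (4 \times 2)\times\frac{10000}{8} = 8 \times 1250 = 10000\\
\text{Answer 0, step } 16:&\quad (8+8)\times\frac{10000}{16} = 16 \times 625 = 10000\\
\text{do-while, step } 40:&\quad 2\times(10+10)\times\frac{10000}{40} = 40 \times 250 = 10000\\
\text{50-way loop, step } 50:&\quad (49+1)\times\frac{10000}{50} = 50 \times 200 = 10000
\end{aligned}
$$

In each case the "+1" or second term is the addition into the running sum: for example, sum = sum + (a+b) + ... with eight parenthesized pairs does 8 pair additions plus 8 additions chaining them onto sum. The four-accumulator version's final sum0 + sum1 + sum2 + sum3 adds three more, but that happens outside the inner loop.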


Answer 1 (score: 0)


  • No optimization, a plain sum += loop with an integer index incremented by 1: 16.4 seconds on my 64-bit 2011 MacBook Pro.

  • gcc -O2, same code: down to 5.46 seconds.

  • gcc -O3, same code: down to 5.45 seconds.

  • I tried code with 8-way addition into the sum variable. That brought it down to 2.03 seconds.

  • I doubled that to 16-way addition into the sum variable, which brought it down to 1.91 seconds.

  • I doubled it again to 32-way addition into the sum variable. The time went up to 2.08 seconds.

  • I switched to a pointer-based approach, as @kcraigie suggested. With -O3 the time was 6.01 seconds. (Very surprising to me!)

    register double * p;
    for (p = array; p < array + ARRAY_SIZE; ++p) {
        sum += *p;
    }
  • I changed the for loop to a while loop using sum += *p++, which cut the time to 5.64 seconds.

  • I changed the while loop to count down instead of up, and the time rose to 5.88 seconds.

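  • I went back to a for loop with an integer index incremented by 8, added eight register double variables sum0 through sum7, and added _array[j+N] to sumN for N in [0, 7]. Declaring _array as a register double * const initialized to array made a significant difference. This brought the time down to 1.86 seconds.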

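  • I switched to a macro that expands to 10,000 copies of + _array[N], with N a constant. Then I did sum = tnKX(addsum), and the compiler crashed with a segmentation fault. So a purely inlined approach isn't going to work.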

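  • I switched to a macro that expands to 10,000 copies of sum += _array[N], with N a constant. That ran in 6.63 seconds! Apparently the overhead of loading all that code cancels out the benefit of inlining.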

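  • I tried declaring static double _array[ARRAY_SIZE]; and then using __builtin_memcpy to copy array into it before the first loop. With 8-way parallel addition, this gave a time of 2.96 seconds. I don't think a static array is the way to go. (Sad; I was hoping the constant address would be a winner.)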
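The 1.86-second variant in the list above is described only in prose. A minimal sketch of what its inner loop might look like, written as a standalone function instead of inside the assignment's main: the function name is hypothetical, the names sum0 to sum7 and _array follow the description, the pointer is const-qualified here because the parameter is, and it assumes n is a multiple of 8, as ARRAY_SIZE is.

```c
#include <assert.h>
#include <stddef.h>

/* Eight independent accumulators, one per position within each block
   of 8 elements, read through a fixed base pointer. */
double sum_eight_accumulators(const double *array, size_t n)
{
    register const double *const _array = array;  /* base never changes */
    register double sum0 = 0, sum1 = 0, sum2 = 0, sum3 = 0;
    register double sum4 = 0, sum5 = 0, sum6 = 0, sum7 = 0;

    assert(n % 8 == 0);  /* the body reads 8 elements per step */
    for (size_t j = 0; j < n; j += 8) {
        sum0 += _array[j];
        sum1 += _array[j+1];
        sum2 += _array[j+2];
        sum3 += _array[j+3];
        sum4 += _array[j+4];
        sum5 += _array[j+5];
        sum6 += _array[j+6];
        sum7 += _array[j+7];
    }
    /* Combine the partial sums pairwise, once, after the loop. */
    return ((sum0 + sum1) + (sum2 + sum3)) + ((sum4 + sum5) + (sum6 + sum7));
}
```

Each += is one addition, so the body still performs 8 additions per 8 elements, the same count as the original loop.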

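From all of this, it looks like the choice is either 16-way inline addition or 8-way parallel variables. You will have to try this on your own platform to be sure; I don't know what a wider architecture will do to the numbers.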
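To compare variants on your own platform under the same conditions, a small harness along these lines may help. It is a sketch under stated assumptions: the helper names are hypothetical, clock() measures processor time rather than wall-clock time, and passing each variant in through a function pointer is just one way to avoid rewriting main for every experiment. Because the assignment's program never uses sum, an optimizing compiler may be allowed to drop the computation entirely; this sketch hands the total back to the caller so the work stays observable, and reads the array pointer through a volatile so the call cannot be hoisted out of the timing loop as loop-invariant.

```c
#include <stddef.h>
#include <time.h>

/* Signature shared by the summation variants being compared. */
typedef double (*sum_fn)(const double *, size_t);

/* Baseline variant: one element per step, single accumulator. */
double sum_simple(const double *a, size_t n)
{
    double sum = 0.0;
    for (size_t j = 0; j < n; j++)
        sum += a[j];
    return sum;
}

/* Runs fn over the array `reps` times and returns the processor time in
   seconds as reported by clock(). The grand total is stored in *result. */
double time_sum(sum_fn fn, const double *a, size_t n, long reps, double *result)
{
    const double *volatile src = a;  /* re-read each pass, blocks hoisting */
    double total = 0.0;

    clock_t start = clock();
    for (long i = 0; i < reps; i++)
        total += fn(src, n);
    clock_t stop = clock();

    *result = total;
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

A caller would time each variant with the same array, the same reps (for example N_TIMES), and the same compiler flags, and print both the seconds and the result.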



    int ntimes = 0;

    // ... and this one.

        // You can change anything between this comment ...

        if (ntimes++ == 0) {
            /* ... original inner loop, unchanged ... */
        }

This reduces the run time to < 0.01 seconds ;-) If you don't get hit with the F-stick, it's a winner.