Question

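I am running the code shown below.

When I run this code with 1 child process, I get the following timing information:

(I run it with /usr/bin/time ./job 1)

5.489u 0.090s 0:05.58 99.8% (1 job running)

When I run it with 6 child processes, I get the following:

74.731u 0.692s 0:12.59 599.0% (6 jobs in parallel)

The machine I am running the experiment on has 6 cores and 198 GB of RAM, and nothing else is running on it.

I expected the reported user time to be 6 times as large with 6 jobs running in parallel. But it is far more than that (13.6 times). My question is: where does this increase in user time come from? Is it because, with 6 jobs running in parallel, multiple cores jump from one memory location to another more frequently? Or is there something else I am missing?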
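The arithmetic behind the 13.6x figure can be checked with a short sketch. The function names are hypothetical; the numbers are the ones reported by /usr/bin/time above.

```c
#include <stdio.h>

/* Ratio of the total user time with N jobs to the user time with one job. */
double user_time_ratio(double user_n_jobs, double user_one_job)
{
    return user_n_jobs / user_one_job;
}

/* Average user time consumed by each of the N jobs. */
double user_time_per_job(double user_n_jobs, int jobs)
{
    return user_n_jobs / jobs;
}

/* Print the comparison for the measurements quoted in the question. */
void print_user_time_summary(void)
{
    const double one = 5.489;   /* user seconds, 1 job  */
    const double six = 74.731;  /* user seconds, 6 jobs */

    printf("ratio: %.1f (ideal would be 6.0)\n", user_time_ratio(six, one));
    printf("per job: %.3f s (vs %.3f s alone)\n",
           user_time_per_job(six, 6), one);
}
```

Calling print_user_time_summary() shows that each job used roughly 2.3 times as much CPU time as the single job did when running alone, which is in line with the elapsed time growing from 5.58 s to 12.59 s.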

Thanks

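#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_SIZE 7000000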
#define LOOP_COUNTER 100

#define simple_struct struct _simple_struct
simple_struct {
    int n;
    simple_struct *next;
};

#define ALLOCATION_SPLIT 5
#define CHAIN_LENGTH 1
void do_function3(void)
{
    int i = 0, j = 0, k = 0, l = 0;
    simple_struct **big_array = NULL;
    simple_struct *temp = NULL;

    big_array = calloc(MAX_SIZE + 1, sizeof(simple_struct*));


    for(k = 0; k < ALLOCATION_SPLIT; k ++) {
        for(i =k ; i < MAX_SIZE; i +=ALLOCATION_SPLIT) {
            big_array[i] = calloc(1, sizeof(simple_struct));
            if((CHAIN_LENGTH-1)) {
                for(l = 1; l < CHAIN_LENGTH; l++) {
                    temp = calloc(1, sizeof(simple_struct));
                    temp->next = big_array[i];
                    big_array[i] = temp;
                }
            }
        }
    }

    for (j = 0; j < LOOP_COUNTER; j++) {
        for(i=0 ; i < MAX_SIZE; i++) {
            if(big_array[i] == NULL) {
                big_array[i] = calloc(1, sizeof(simple_struct));
            }
            big_array[i]->n = i * 13;
            temp = big_array[i]->next;
            while(temp) {
                temp->n = i*13;
                temp = temp->next;
            }
        }
    }
}

int main(int argc, char **argv)
{
    int i, no_of_processes = 0;
    pid_t pid, wpid;
    int child_done = 0;
    int status;
    if(argc != 2) {
        printf("usage: this_binary number_of_processes\n");
        return 0;
    }

    no_of_processes = atoi(argv[1]);

    for(i = 0; i < no_of_processes; i ++) {
        pid = fork();

        switch(pid) {
            case -1:
                printf("error forking");
                exit(-1);
            case 0:
                do_function3();
                return 0;
            default:
                printf("\nchild %d launched with pid %d\n", i, pid);
                break;
        }
    }

    while(child_done != no_of_processes) {
        wpid = wait(&status);
        child_done++;
        printf("\nchild done with pid %d\n", wpid);
    }

    return 0;
}

Answer 1

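First, your benchmark is somewhat unusual. Normally, when benchmarking a concurrent application, you compare two implementations:

a single-threaded version solving a problem of size S;
a multithreaded version with N threads cooperating to solve the problem of size S; in your case, each would solve a problem of size S/N.

You then divide the execution times to obtain the speedup.

If your speedup is:

around 1: the parallel implementation performs about the same as the single-threaded one;
above 1 (usually between 1 and N): parallelizing the application improves performance;
below 1: parallelizing the application hurts performance.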
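The speedup calculation and the three cases above can be written down as a minimal sketch. The names speedup, classify_speedup and the tolerance parameter are hypothetical and not part of the original answer; the tolerance only decides how close to 1 counts as "around 1".

```c
/* Possible outcomes of comparing a parallel run to a single-threaded run. */
enum speedup_class {
    SPEEDUP_SIMILAR,  /* around 1 */
    SPEEDUP_BETTER,   /* above 1  */
    SPEEDUP_WORSE     /* below 1  */
};

/* Speedup = single-threaded time / parallel time, for the same problem size. */
double speedup(double t_single, double t_parallel)
{
    return t_single / t_parallel;
}

/* Classify a speedup value; values within `tolerance` of 1.0 count as similar. */
enum speedup_class classify_speedup(double s, double tolerance)
{
    if (s > 1.0 + tolerance)
        return SPEEDUP_BETTER;
    if (s < 1.0 - tolerance)
        return SPEEDUP_WORSE;
    return SPEEDUP_SIMILAR;
}
```

For example, classify_speedup(speedup(10.0, 2.5), 0.05) yields SPEEDUP_BETTER, since the parallel run is four times faster on the same problem.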

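The impact on performance depends on several factors:

How well your algorithm can be parallelized. See Amdahl's law. Does not apply here.
The overhead of inter-thread communication. Does not apply here.
The overhead of inter-thread synchronization. Does not apply here.
Contention for CPU resources. Should not apply here (since the number of threads equals the number of cores). However, HyperThreading might hurt.
Contention for the memory caches. Since the threads do not share memory, this will decrease performance.
Contention for access to main memory. This will decrease performance.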
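To get a feel for why the last two factors matter for this program, the size of the data each process touches can be estimated with a rough sketch. It assumes CHAIN_LENGTH is 1 as in the question, ignores allocator bookkeeping overhead, and uses a hypothetical stand-in struct with the same layout as simple_struct.

```c
#include <stddef.h>

#define MAX_SIZE 7000000

/* Stand-in with the same fields as simple_struct in the question. */
struct node_sketch {
    int n;
    struct node_sketch *next;
};

/* Lower bound on the bytes one process walks in every pass of the main loop:
 * the pointer array plus one node per slot (allocator overhead not counted). */
size_t working_set_bytes(void)
{
    size_t pointer_array = (size_t)(MAX_SIZE + 1) * sizeof(struct node_sketch *);
    size_t nodes = (size_t)MAX_SIZE * sizeof(struct node_sketch);

    return pointer_array + nodes;
}
```

On a typical 64-bit system (8-byte pointers, 16-byte struct) this comes to about 168 MB per process, so six processes together sweep through roughly 1 GB on each pass. That is likely far more than the caches can hold, which means the processes compete for cache space and for bandwidth to main memory.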

You can measure the last two with a profiler. Look for cache misses and stalled instructions.
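A profiler is needed to see cache misses, but a coarser first check is possible without one: measure the user time of each child separately and compare it with a child running alone. The sketch below is one possible way to do that with getrusage(RUSAGE_CHILDREN); run_children and spin_briefly are hypothetical names, and spin_briefly is only a placeholder workload standing in for do_function3.

```c
#include <sys/resource.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Convert a struct timeval to seconds. */
static double timeval_seconds(struct timeval tv)
{
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Placeholder workload (hypothetical); substitute the real job here. */
void spin_briefly(void)
{
    volatile unsigned long x = 0;
    unsigned long i;

    for (i = 0; i < 10000000UL; i++)
        x += i;
}

/* Fork n children that run work() concurrently, then reap them one by one.
 * user_seconds[i] receives the user CPU time of the i-th child reaped.
 * RUSAGE_CHILDREN only grows when a child is waited for, so the difference
 * around each wait() belongs to exactly that child.
 * Returns 0 on success, -1 if any fork/wait/getrusage call failed. */
int run_children(int n, void (*work)(void), double *user_seconds)
{
    struct rusage before, after;
    int launched = 0, failed = 0, status, i;

    for (i = 0; i < n; i++) {
        pid_t pid = fork();

        if (pid == -1) {
            failed = 1;
            break;
        }
        if (pid == 0) {
            work();
            _exit(0);
        }
        launched++;
    }

    for (i = 0; i < launched; i++) {
        int before_ok = (getrusage(RUSAGE_CHILDREN, &before) == 0);

        if (wait(&status) == -1) {
            failed = 1;
            break;
        }
        if (!before_ok || getrusage(RUSAGE_CHILDREN, &after) != 0) {
            failed = 1;
            user_seconds[i] = -1.0;
            continue;
        }
        user_seconds[i] = timeval_seconds(after.ru_utime)
                        - timeval_seconds(before.ru_utime);
    }

    return failed ? -1 : 0;
}
```

Running it once with n = 1 and once with n = 6 gives per-child figures that can be compared directly. If every child in the 6-way run takes about the same, larger amount of user time, that points to shared-resource contention rather than to one slow process.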

User time increase in multi-CPU job
