Question

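My CPU has four cores and I am on macOS. I use 4 threads to process an array, but the computation time does not decrease. Without multithreading, the computation takes about 52 seconds. With 4 threads, or with 2, the time does not change.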

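(I now know why this happens. The problem is that I used clock() to measure the time. That is wrong in a multithreaded program, because clock() reports the CPU time of the whole process, summed over all threads, so the value scales with the number of threads instead of tracking real time. When I measure with time() instead, the result is correct.) Output with 2 threads: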

id 1 use time = 43 sec to finish 
id 0 use time = 51 sec to finish 
time for round 1 = 51 sec
id 1 use time = 44 sec to finish 
id 0 use time = 52 sec to finish 
time for round 2 = 52 sec

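id 1 and id 0 are thread 1 and thread 0. "time for round" is the time it takes for both threads to finish. Without multithreading, time for round is also about 52 seconds. This is the part that launches the 4 threads: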

for(i=1;i<=round;i++)
{
    time_round_start=clock();
    for(j=0;j<THREAD_NUM;j++)
    {
        cal_arg[j].roundth=i;
        pthread_create(&thread_t_id[j], NULL, Multi_Calculate, &cal_arg[j]);
    }
    for(j=0;j<THREAD_NUM;j++)
    {
        pthread_join(thread_t_id[j], NULL);
    }
    time_round_end=clock();
    int round_time=(int)((time_round_end-time_round_start)/CLOCKS_PER_SEC);

    printf("time for round %d = %d sec\n",i,round_time);
}

This is the code inside the thread function:

void *Multi_Calculate(void *arg)
{
    struct multi_cal_data cal=*((struct multi_cal_data *)arg);
    int p_id=cal.thread_id;
    int i=0;
    int root_level=0;
    int leaf_addr=0;
    int neighbor_root_level=0;
    int neighbor_leaf_addr=0;
    // only walks the existing neighbor list, so nothing is allocated here
    Neighbor *locate_neighbor=NULL;

    printf("id:%d, start:%d end:%d,round:%d\n",p_id,cal.start_num,cal.end_num,cal.roundth);

    for(i=cal.start_num;i<=cal.end_num;i++)
    {
        root_level=i/NUM_OF_EACH_LEVEL;
        leaf_addr=i%NUM_OF_EACH_LEVEL;

        if(root_addr[root_level][leaf_addr].node_value!=i)
        {
            // ignore: this index is a gap, there is no such node
        }
        else
        {
            int k=0;
            locate_neighbor=root_addr[root_level][leaf_addr].head;
            double tmp_credit=0;

            // sum each neighbor's credit from the previous round, divided by its degree
            for(k=0;k<root_addr[root_level][leaf_addr].degree;k++)
            {
                neighbor_root_level=locate_neighbor->neighbor_value/NUM_OF_EACH_LEVEL;
                neighbor_leaf_addr=locate_neighbor->neighbor_value%NUM_OF_EACH_LEVEL;

                tmp_credit += root_addr[neighbor_root_level][neighbor_leaf_addr].g_credit[cal.roundth-1]/root_addr[neighbor_root_level][neighbor_leaf_addr].degree;

                locate_neighbor=locate_neighbor->next;
            }
            root_addr[root_level][leaf_addr].g_credit[cal.roundth]=tmp_credit;
        }
    }

    return NULL;
}

The array is very large, and each thread computes one part of it. Is there something wrong with my code?
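The question does not show how start_num and end_num are chosen for each thread. As a minimal sketch of one common way to do it, assuming an even split over a contiguous index range, the helper below computes an inclusive range per worker. The name split_range and its parameters are hypothetical and do not come from the original program.

```c
#include <assert.h>

/* Hypothetical helper (not from the question): computes the inclusive
   index range [*start, *end] for worker `idx` out of `parts` workers
   over `total` elements. The first (total % parts) workers each get
   one extra element, so range sizes differ by at most one. */
void split_range(int total, int parts, int idx, int *start, int *end)
{
    assert(total >= 0 && parts > 0 && idx >= 0 && idx < parts);

    int base  = total / parts;   /* minimum elements per worker */
    int extra = total % parts;   /* leftover elements to spread out */

    int begin = idx * base + (idx < extra ? idx : extra);
    int len   = base + (idx < extra ? 1 : 0);

    *start = begin;
    *end   = begin + len - 1;    /* inclusive, like i<=cal.end_num */
}
```

With inclusive bounds, a worker that receives no elements ends up with end smaller than start, so a loop of the form i<=end simply does not run. An even split by index only balances the work if the cost per index is roughly uniform, which may not hold when node degrees vary widely.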

Answer 1

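This may be a bug, but if you believe the code is correct, then the overhead of parallelization, mutexes and so on may mean that the overall performance (run time) is the same as for the non-parallelized code, for the number of elements being computed.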

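It might be an interesting study to run the loop over a very large array (100k elements?) in both the single-threaded and the threaded code, and see whether the parallel/threaded code starts to come out faster.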

Amdahl's law, also known as Amdahl's argument,[1] is used to find the maximum expected improvement to an overall system when only part of the system is improved. It is often used in parallel computing to predict the theoretical maximum speedup using multiple processors.

https://en.wikipedia.org/wiki/Amdahl%27s_law
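To make the quoted statement concrete, here is a minimal sketch of the usual form of Amdahl's law, speedup = 1 / ((1 - p) + p / n), where p is the fraction of the run time that can be parallelized and n is the number of workers. The function name amdahl_speedup is an illustrative choice, not something from the question.

```c
#include <assert.h>

/* Illustrative helper: upper bound on speedup under Amdahl's law.
   p is the parallelizable fraction of the run time (0.0 to 1.0),
   n is the number of workers (threads or cores). */
double amdahl_speedup(double p, int n)
{
    assert(p >= 0.0 && p <= 1.0);
    assert(n >= 1);

    /* serial part (1 - p) is unchanged, parallel part shrinks by n */
    return 1.0 / ((1.0 - p) + p / (double)n);
}
```

For instance, under this formula, if half of the run time is serial (p = 0.5), four threads give at most a 1.6x speedup, and no thread count can exceed 2x.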

Answer 2

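Multithreading a program does not always make it faster. Threads bring a certain amount of overhead. Unless the non-threaded code has enough inefficiency to make up for that overhead, you will not see an improvement. Even if the program you write ends up running slower, you can still learn how multithreading works.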

Answer 3

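I know why this happens. The problem is that I used clock() to measure the time. That is wrong in a multithreaded program, because clock() reports the CPU time of the whole process, summed over all threads, so the value scales with the number of threads instead of tracking real time. When I measure with time() instead, the result is correct.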
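As a sketch of the difference, the two helpers below read wall-clock time and process CPU time side by side. This swaps in clock_gettime(CLOCK_MONOTONIC) where the answer uses time(), assuming a POSIX system that provides it, because it gives sub-second resolution. The names wall_seconds and cpu_seconds are illustrative.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Elapsed real (wall-clock) time in seconds, from a monotonic clock.
   Assumes a POSIX system where CLOCK_MONOTONIC is available. */
double wall_seconds(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;  /* avoid an unused-variable warning when NDEBUG is set */
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* CPU time in seconds consumed by the whole process so far.
   On POSIX systems this adds up the time of every thread. */
double cpu_seconds(void)
{
    return (double)clock() / (double)CLOCKS_PER_SEC;
}
```

In the round loop from the question, this would mean calling wall_seconds() before the pthread_create loop and again after the pthread_join loop. The difference is the elapsed real time, while the difference between two cpu_seconds() readings can approach THREAD_NUM times that value when all threads are busy.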

Using multiple threads to compute data, but the time does not decrease
