Question

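I am trying to compare the performance of merge sort in a single-threaded and a multi-threaded program. Sorting an array of about 50,000 elements takes 0.01x seconds with a single thread, while sorting an array of the same size with 2/4/8 threads takes 0.02-0.03 seconds. I know the difference is small, but I would like to know why the multi-threaded program is slower. Here is the code for the single-threaded program (the relevant part of main):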

 srand(clock());            //to seed-random numbers
 readData(A,n);
 clock_t start=clock();
 mergeSort(A,0,n-1);
 clock_t end=clock();

And for the multi-threaded program:

int n=50000;        //n is the size
int no_of_threads=4;
limit S;              //structure containing array,start and end index
srand(clock());         //to seed-random numbers
generateData(&S,n);
pthread_t id[no_of_threads];
int i=0,size=0,k=n/no_of_threads;
clock_t start=clock();
for(i=0; i<no_of_threads; i++)
{
        S.start=size,S.end=size+k-1;
        pthread_create(&id[i],NULL, sorter ,&S);
        size=size + k;
}
for(i=0; i<no_of_threads; i++)
        pthread_join(id[i],NULL);
mergeSort(S.A,0,n-1);
clock_t end=clock();

The sorter function:

void* sorter(void *s)
{
    limit *S=(limit*)s;
    int start=S->start,end=S->end;
    mergeSort(S->A,start,end);
    return NULL;        //a void* thread function must return a value
}

Answer 1

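Instead of dividing the work, you are doing extra work. With x threads, each thread sorts 1/x of the array. After all the threads have finished, you call merge sort again on the whole array, which recursively partitions the array all the way down and merges it back up, ignoring the fact that the subparts are already sorted.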

One way around this: instead of calling mergeSort() again, just merge the already sorted subparts, which can be done in O(nx) time.
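The merge of the sorted subparts can be sketched roughly as follows. This is only an illustration, not the asker's code: the name merge_chunks, the scratch buffer tmp, and the rule that the last chunk absorbs any remainder of n / parts are assumptions made for the example. For every output element it scans the head of each chunk, which gives the O(nx) cost mentioned above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Merge `parts` sorted chunks of a[0..n) into sorted order, using `tmp`
 * (at least n ints) as scratch space. Chunk p covers [p*k, (p+1)*k) with
 * k = n / parts; the last chunk also takes the remainder.
 * Each output element scans every chunk head: O(n * parts) comparisons. */
static void merge_chunks(int *a, int *tmp, int n, int parts)
{
    assert(parts > 0 && n >= 0);
    int k = n / parts;
    int *pos = malloc((size_t)parts * sizeof *pos);  /* next unread index per chunk */
    int *lim = malloc((size_t)parts * sizeof *lim);  /* one past the end of each chunk */
    if (!pos || !lim)
        abort();

    for (int p = 0; p < parts; p++) {
        pos[p] = p * k;
        lim[p] = (p == parts - 1) ? n : (p + 1) * k;
    }

    for (int out = 0; out < n; out++) {
        int best = -1;                               /* chunk holding the smallest head */
        for (int p = 0; p < parts; p++)
            if (pos[p] < lim[p] && (best < 0 || a[pos[p]] < a[pos[best]]))
                best = p;
        tmp[out] = a[pos[best]++];
    }

    memcpy(a, tmp, (size_t)n * sizeof *a);
    free(pos);
    free(lim);
}
```

In the question's program, a call such as merge_chunks(S.A, tmp, n, no_of_threads) would take the place of the final mergeSort(S.A,0,n-1), assuming tmp is a separately allocated buffer of n ints.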

Answer 2

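It looks like you are using a single shared structure S, which may be getting updated while the threads are still being created. Consider making S an array of no_of_threads structures and passing S[i] to each thread you create.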

#define no_of_threads 4
limit S[no_of_threads];
// ...
    for(i=0; i<no_of_threads; i++)
    {
        S[i].start=size,S[i].end=size+k-1;
        pthread_create(&id[i], NULL, sorter, &S[i]);
        size=size + k;
    }
// ...   after the joins, do a k-way merge (not a merge sort).

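I did this a while back using a bottom-up merge sort, as opposed to the top-down merge sort in your example, but the idea is the same. For k threads, split the array into k parts (in my simple example I assumed the array size was a multiple of k), then merge sort the k parts in parallel. Up to this point it is the same as your code (apart from the shared S structure). My version then uses k/2 threads to merge pairs of runs of size n/k, each thread doing a 2-way merge, then k/4 threads to merge pairs of runs of size 2n/k, again with 2-way merges, and so on. Before testing I expected little gain, because I thought the tight loop in the merge step (compare two elements, move the smaller one) would be limited by memory bandwidth, but it turned out that the loop is CPU bound. On an Intel 3770K at 3.5 GHz with 4 cores, the merge sort with k = 4 was about 3 times as fast as a single-threaded merge sort, and with k = 8 it was 3.9 times as fast. Most of the speed-up comes from the local L1 and L2 caches in each core. Below is a link to an earlier thread of mine on this. It is a Windows example with a standalone main thread function, but consider it a proof of concept that a multi-threaded merge sort can be faster than a single-threaded one.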

https://codereview.stackexchange.com/questions/148025/multithreaded-bottom-up-merge-sort
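For readers on POSIX, the scheme described in this answer (sort k runs in parallel, then merge pairs of runs with half as many threads in each round) might look roughly like the sketch below. It is not the code from the linked thread: the names merge_job, sort_run, merge_pair and parallel_sort are made up for the example, qsort stands in for the per-run mergeSort(), and the sketch assumes that k is a power of two no larger than 64 and that n is a multiple of k. Return values of pthread_create are not checked, to keep it short.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

typedef struct {          /* hypothetical per-thread job, one per thread */
    int *a, *tmp;
    int lo, mid, hi;      /* sort a[lo..hi), or merge a[lo..mid) with a[mid..hi) */
} merge_job;

static int cmp_int(const void *x, const void *y)
{
    int a = *(const int *)x, b = *(const int *)y;
    return (a > b) - (a < b);
}

/* Phase 1 worker: sort one run; qsort stands in for mergeSort(). */
static void *sort_run(void *arg)
{
    merge_job *j = arg;
    qsort(j->a + j->lo, (size_t)(j->hi - j->lo), sizeof(int), cmp_int);
    return NULL;
}

/* Phase 2 worker: 2-way merge of two adjacent runs via tmp, then copy back. */
static void *merge_pair(void *arg)
{
    merge_job *j = arg;
    int i = j->lo, r = j->mid, o = j->lo;
    while (i < j->mid && r < j->hi)
        j->tmp[o++] = (j->a[r] < j->a[i]) ? j->a[r++] : j->a[i++];
    while (i < j->mid) j->tmp[o++] = j->a[i++];
    while (r < j->hi)  j->tmp[o++] = j->a[r++];
    memcpy(j->a + j->lo, j->tmp + j->lo, (size_t)(j->hi - j->lo) * sizeof(int));
    return NULL;
}

/* Sort a[0..n) with k threads. Assumes k is a power of two, k <= 64, n % k == 0. */
static void parallel_sort(int *a, int *tmp, int n, int k)
{
    pthread_t id[64];
    merge_job job[64];    /* separate job per thread, so nothing is shared */
    assert(k >= 1 && k <= 64 && (k & (k - 1)) == 0 && n % k == 0);

    int run = n / k;
    for (int t = 0; t < k; t++) {
        job[t] = (merge_job){a, tmp, t * run, 0, (t + 1) * run};
        pthread_create(&id[t], NULL, sort_run, &job[t]);
    }
    for (int t = 0; t < k; t++)
        pthread_join(id[t], NULL);

    /* k/2 threads merge runs of size n/k, then k/4 threads merge runs of 2n/k, ... */
    for (; run < n; run *= 2) {
        int pairs = n / (2 * run);
        for (int t = 0; t < pairs; t++) {
            int lo = t * 2 * run;
            job[t] = (merge_job){a, tmp, lo, lo + run, lo + 2 * run};
            pthread_create(&id[t], NULL, merge_pair, &job[t]);
        }
        for (int t = 0; t < pairs; t++)
            pthread_join(id[t], NULL);
    }
}
```

A call like parallel_sort(A, tmp, n, 4) needs tmp to hold n ints, and the program is typically built with -pthread. When timing it, keep in mind that clock() reports processor time for the whole process, which on POSIX systems adds up the time used by all threads, so a wall-clock source such as clock_gettime(CLOCK_MONOTONIC, ...) is the one that would show a speed-up.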

Multithreaded merge sort for performance measurement
