Question

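I am trying to make a calculator for prime numbers.

The program divides n by every number other than n. If the remainder is 0 for only one of those divisions (not counting the division by 1), the number is prime. At the start of the program the user is asked to type a number, and the calculation is then done for every number up to the one the user entered.

This is a non-parallel task, but I am trying to make it parallel by dividing the numbers between the cores.

This is the code segment that divides the task between the threads.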

void division(int number)
{
    int ithread[8]{};
    int sum = 0;
    cout << "Preparation..";
    /* Calculating how many numbers the checker will check. */
    ithread[0] = (int)number*0.125;
    ithread[1] = (int)number*0.125;
    ithread[2] = (int)number*0.125;
    ithread[3] = (int)number*0.125;
    ithread[4] = (int)number*0.125;
    ithread[5] = (int)number*0.125;
    ithread[6] = (int)number*0.125;
    ithread[7] = (int)number*0.125;

    /* Calculating from what number the checkers will begin.
    The first thread will begin from 0. The second checker will begin from the last number the first
    did. The third will begin from the sum of numbers checked by first and second and so on. */

    thread thread0(noprint, ithread[0], 0);
    sum += ithread[0];
    thread thread1(noprint, ithread[1], sum);
    sum += ithread[1];
    thread thread2(noprint, ithread[2], sum);
    sum += ithread[2];
    thread thread3(noprint, ithread[3], sum);
    sum += ithread[3];
    thread thread4(noprint, ithread[4], sum);
    sum += ithread[4];
    thread thread5(noprint, ithread[5], sum);
    sum += ithread[5];
    thread thread6(noprint, ithread[6], sum);
    sum += ithread[6];
    thread thread7(noprint, ithread[7], sum);
    thread0.join();
    cout << "thread1";
    thread1.join();
    cout << "thread2";
    thread2.join();
    cout << "thread3";
    thread3.join();
    cout << "thread4";
    thread4.join();
    cout << "thread5";
    thread5.join();
    cout << "thread6";
    thread6.join();
    cout << "thread7";
    thread7.join();
    cout << "thread8";

}

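The problem is that some threads finish before others, and for large numbers this can be a big problem. For example, checking 4 takes twice as many divisions as checking 2, and checking 8 takes twice as many as checking 4. So if I ask the program to check all numbers up to 1 million, the first thread will check from 0 to 125000, which is a very easy task for a modern CPU. The second will check from 125000 to 250000, which is about twice as hard, and so on.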
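To put rough numbers on the imbalance, here is a small sketch. It assumes that checking n costs about n divisions (the scheme described above); `blockCost` is a hypothetical helper name, not part of the original program.

```cpp
#include <cstdint>

// Assumed cost model: checking n by dividing it by every smaller number
// costs about n divisions, so the block [lo, hi) costs roughly
// lo + (lo + 1) + ... + (hi - 1) divisions.
std::uint64_t blockCost(std::uint64_t lo, std::uint64_t hi) {
    if (hi <= lo) return 0;
    std::uint64_t count = hi - lo;
    // Arithmetic series: count * (first + last) / 2.
    // count * (first + last) is always even, so the division is exact.
    return count * (lo + (hi - 1)) / 2;
}
```

Under this model `blockCost(125000, 250000)` is about three times `blockCost(0, 125000)`, and `blockCost(875000, 1000000)` is about fifteen times, so equal-sized ranges are far from equal work.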

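I am looking for two answers: 1. If you know, please tell me how to divide the load equally between the threads. 2. Please explain how to let the user choose the number of threads. I can already imagine how to make the thread selection work for up to 64 threads (in fact it could even work for 1 trillion threads; it would just need a lot of IFs and an array of 1 trillion ints). The problem is not in the code, it is in the math itself. I don't know how to split the work equally across 8 cores, let alone a variable number of cores.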

Answer 1

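Don't try to divide up all the work in one go at the start; threads are unpredictable. You have no control over what other load the operating system will put on each core, and workloads that you might consider "equal" based on the code may in practice differ a great deal. Instead, split the workload into a large number of much smaller units, and have each thread start on the next unit when it finishes the previous one, until all units are done.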

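As for letting the user specify the number of threads, what exactly is holding you back? Asking the user for a number and then spawning that many threads seems simple enough. However, most multithreaded programs don't do this. It is better to query the system for how many threads it can run (e.g. std::thread::hardware_concurrency) and use that.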
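A minimal sketch of both options, assuming C++11 or later. `chooseThreadCount` and `runOnThreads` are hypothetical helper names introduced for illustration. Note that `std::thread::hardware_concurrency()` is allowed to return 0 when the value cannot be determined, so a fallback is needed.

```cpp
#include <thread>
#include <vector>

// Uses the user's request if it is nonzero, otherwise asks the system.
// hardware_concurrency() may return 0 when the count is unknown.
unsigned chooseThreadCount(unsigned requested) {
    if (requested != 0) return requested;
    unsigned hw = std::thread::hardware_concurrency();
    return hw != 0 ? hw : 1;
}

// Starts `count` threads, each running work(threadIndex), then joins them.
// Storing the threads in a vector replaces thread0 ... thread7 and works
// for any count without extra IFs.
template <typename Work>
void runOnThreads(unsigned count, Work work) {
    std::vector<std::thread> threads;
    threads.reserve(count);
    for (unsigned i = 0; i < count; ++i)
        threads.emplace_back(work, i);
    for (auto& t : threads)
        t.join();
}
```

For example, `runOnThreads(chooseThreadCount(0), [](unsigned i) { /* work for thread i */ });` would start one thread per hardware thread reported by the system.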

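As a side note, your algorithm for checking primes is very inefficient; perhaps this is just a learning exercise rather than serious code? If not, you may want to look at other algorithms. Primality checking is a well-studied problem.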
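As one example of a cheaper check (a common textbook improvement, not something the answer prescribes), trial division only needs to try divisors up to the square root of n. `isPrime` is a hypothetical name for this sketch.

```cpp
#include <cstdint>

// Trial division by 2 and then odd divisors d with d * d <= n.
// The loop condition is written as d <= n / d to avoid overflow.
bool isPrime(std::uint64_t n) {
    if (n < 2) return false;
    if (n % 2 == 0) return n == 2;
    for (std::uint64_t d = 3; d <= n / d; d += 2) {
        if (n % d == 0) return false;
    }
    return true;
}
```

This cuts the cost per number from about n divisions to about sqrt(n) / 2, although later numbers still cost more than earlier ones, so the load-balancing question remains.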

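But JBentley, if I understand you correctly, you are saying the operations won't be simultaneous. Yes, the application will use different threads, but one after another, so what is the point? Wouldn't that be the same as using just one thread? I am very new to this, so sorry if I'm wrong. - Alex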

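No, it is still parallel. You have a shared data structure that keeps track of the last chunk of work that was handed out. This can be as simple as an int holding the last number checked. When a thread runs out of work, it starts on the next N numbers and increases the int by the appropriate amount. When doing this, you need to make sure that multiple threads cannot use the shared variable at the same time; there are various mechanisms in C++ and in third-party libraries for managing that.

Pseudocode: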

lastChecked = 0
thread 1: lock lastChecked
thread 1: lastChecked = 10
thread 1: unlock lastChecked
thread 1: start working on numbers 1 to 10
thread 2: lock lastChecked
thread 2: lastChecked = 20
thread 2: unlock lastChecked
thread 2: start working on numbers 11 to 20
thread 1: complete work
thread 1: lock lastChecked
thread 1: lastChecked = 30
thread 1: unlock lastChecked
thread 1: start working on numbers 21 to 30
// etc.

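Note: choose the size of each unit of work carefully. Make it too large and you start drifting back to the original problem, where some threads may finish much later than others. Make it too small and you increase the risk of threads waiting too long to access the shared state while other threads are using it, and you spend too much time on the overhead of handing out each unit.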
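The scheme in the pseudocode could be sketched in C++ roughly as follows, using std::mutex to protect the shared counter. `isPrimeSlow` and `countPrimesChunked` are hypothetical names; `isPrimeSlow` merely stands in for the asker's per-number check, and counting primes is an assumption made so the example has a result to return.

```cpp
#include <algorithm>
#include <mutex>
#include <thread>
#include <vector>

// Stand-in for the asker's check: try every divisor below n.
bool isPrimeSlow(unsigned n) {
    if (n < 2) return false;
    for (unsigned d = 2; d < n; ++d)
        if (n % d == 0) return false;
    return true;
}

// Counts the primes in [1, limit]. Each worker repeatedly takes the next
// chunkSize numbers from a shared counter until none are left.
unsigned countPrimesChunked(unsigned limit, unsigned threadCount,
                            unsigned chunkSize) {
    threadCount = std::max(threadCount, 1u);
    chunkSize = std::max(chunkSize, 1u);

    std::mutex m;              // guards lastChecked and total
    unsigned lastChecked = 0;  // last number already handed out
    unsigned total = 0;        // primes found so far

    auto worker = [&] {
        for (;;) {
            unsigned first, last;
            {
                std::lock_guard<std::mutex> lock(m);
                if (lastChecked >= limit) return;  // no work left
                first = lastChecked + 1;
                last = lastChecked + std::min(chunkSize, limit - lastChecked);
                lastChecked = last;
            }
            // The expensive part runs without holding the lock.
            unsigned found = 0;
            for (unsigned n = first; n <= last; ++n)
                if (isPrimeSlow(n)) ++found;

            std::lock_guard<std::mutex> lock(m);
            total += found;
        }
    };

    std::vector<std::thread> threads;
    for (unsigned i = 0; i < threadCount; ++i)
        threads.emplace_back(worker);
    for (auto& t : threads)
        t.join();
    return total;
}
```

A call such as `countPrimesChunked(1000000, 8, 1000)` would hand out one thousand chunks; threads that draw cheap early chunks simply come back for more, which is what evens out the load. The chunk size of 1000 is only an illustrative value and would need tuning as described in the note above.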

Answer 2

Take a look at the pthreads library on this site:

https://computing.llnl.gov/tutorials/pthreads/

Answer 3

See the section on the boss-worker model.

http://www-01.ibm.com/software/network/dce/library/publications/appdev/html/APPDEV14.HTM#HDRWQ168

This threading model may be a good fit for your problem.

Answer 4

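One way to check for primes is to use three phases. The initial phase uses a single thread to generate an array of the primes up to some value p. The next phase uses that array to check for primes up to p^2 (p squared), dividing the range from p to p^2 evenly between the threads. Each thread builds its own array of newly found primes. Once the threads have finished, their arrays are appended to the original array and p is set to the highest prime found. This cycle is repeated until p^2 >= n. The final phase then uses the array to check whether n is prime.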
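The three phases might look roughly like this. It is an illustrative sketch only: the seed value p = 7, the type alias `u64`, and the names `isPrimeUsing` and `isPrimeThreePhase` are all assumptions, and the range is not capped, so it is practical only for moderate n.

```cpp
#include <cstdint>
#include <thread>
#include <vector>

using u64 = std::uint64_t;

// True if no prime q in `primes` with q * q <= m divides m. Only valid
// when `primes` holds every prime up to sqrt(m), in increasing order.
bool isPrimeUsing(const std::vector<u64>& primes, u64 m) {
    if (m < 2) return false;
    for (u64 q : primes) {
        if (q > m / q) break;
        if (m % q == 0) return false;
    }
    return true;
}

bool isPrimeThreePhase(u64 n, unsigned threadCount) {
    if (n < 2) return false;
    if (threadCount == 0) threadCount = 1;

    // Phase 1: a single thread supplies the primes up to p (here p = 7).
    std::vector<u64> primes{2, 3, 5, 7};
    u64 p = 7;

    // Phase 2: repeat until p * p >= n (written to avoid overflow).
    while (p <= (n - 1) / p) {
        u64 hi = p * p;                 // safe: p * p < n inside the loop
        u64 chunk = (hi - p) / threadCount;
        std::vector<std::vector<u64>> found(threadCount);
        std::vector<std::thread> threads;
        for (unsigned t = 0; t < threadCount; ++t) {
            u64 first = p + 1 + chunk * t;
            // The last thread also takes the remainder, up to hi inclusive.
            u64 last = (t + 1 == threadCount) ? hi + 1 : first + chunk;
            threads.emplace_back([&primes, &found, t, first, last] {
                // `primes` is only read here; each thread writes found[t].
                for (u64 m = first; m < last; ++m)
                    if (isPrimeUsing(primes, m)) found[t].push_back(m);
            });
        }
        for (auto& th : threads) th.join();
        // Append the per-thread arrays in range order, then advance p.
        for (const auto& part : found)
            primes.insert(primes.end(), part.begin(), part.end());
        p = primes.back();
    }

    // Phase 3: the array now covers every prime up to sqrt(n).
    return isPrimeUsing(primes, n);
}
```

Splitting the range from p to p^2 evenly is reasonable here because every candidate in a round is tested against the same array of primes, so per-number costs within one round are much closer to each other than in the original scheme.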

C++: how to divide CPU work evenly in a non-parallel task

4 answers: