Question

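I want to investigate how much faster a task completes with multithreading than on a single thread, depending on the task size.

The graph I plotted shows:

x-axis: the time the task takes on a single thread.
y-axis: how much faster the same task runs on two threads.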

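What I expected to happen:

As the task gets longer, the overhead of creating the threads matters less, so the ratio (t_single / t_multi) increases.
Since I use two threads, I expected the ratio (t_single / t_multi) to converge to 2 (two threads => twice as fast as one thread).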
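
This expectation can be written as a simple model. It assumes (hypothetically, nothing here is measured) that one task takes time T on one core and that starting and joining the threads costs a fixed overhead c:

```latex
t_{\text{single}} = 2T, \qquad
t_{\text{multi}} = T + c, \qquad
\frac{t_{\text{single}}}{t_{\text{multi}}} = \frac{2T}{T + c}
\;\xrightarrow{\;T \to \infty\;}\; 2
```

Under this model the ratio rises monotonically with T and never exceeds 2, which is why any value above 2 needs a separate explanation.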

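What I got:

A peak at a single-thread task time of 10e-2 seconds.
The peak is at 2.5 (the two-thread run is 2.5 times faster than the single-threaded run).

How can this be explained?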

The plotted values are averaged over 10 measurements. I ran it on a 24-core Linux machine.

Code:

#include <string>
#include <iostream>
#include <thread>
#include <vector>
#include <ctime>
#include <math.h>
#include <chrono>

using namespace std;
using namespace std::chrono;

// sums the integers 0 .. number-1 as a stand-in workload;
// the result is discarded
void task(int number)
{
    int s = 0;
    for(int i=0; i<number; i++){
        s = s + i;
    }
    // cout << "the sum is " << s << endl;
}

double get_time_single(int m){

    // init
    int n_threads = 2;
    int n = pow(10, m);

    high_resolution_clock::time_point start = high_resolution_clock::now();

    for(int jobs = 0; jobs < n_threads; jobs++){
        task(n);
    }

    high_resolution_clock::time_point end = high_resolution_clock::now();
    double time_single = duration<double, std::milli>(end - start).count();

    return time_single;
}

double get_time_multi(int m){

    // init
    int n_threads = 2;
    int n = pow(10, m);
    vector<thread> threads;

    high_resolution_clock::time_point start = high_resolution_clock::now();

    // launch the threads
    for( int i = 1; i < n_threads + 1; i++ ){
        threads.push_back(thread(task, n));
    }

    // join the threads
    for( int i = 0; i < n_threads; i++ ){
        threads.at(i).join();
    }

    high_resolution_clock::time_point end = high_resolution_clock::now();
    double time_multi = duration<double, std::milli>(end - start).count();

    return time_multi;
}


int main()
{

    // print header of magnitude - multi-proc-time - single-proc-time table
    cout << "mag" << "\t" << "time multi" << "  \t" << "time single" << endl;
    cout << "-------------------------------------" << endl;

    // iterate through different task magnitudes
    for(int m = 3; m<10; m++){

        double t_single = 0;
        double t_multi  = 0;

        // get the mean over 10 runs
        for(int i = 0; i < 10; i++){
            t_multi = t_multi + get_time_multi(m);
            t_single = t_single + get_time_single(m);
        }

        t_multi = t_multi / 10;
        t_single = t_single / 10;

        cout << m << "\t" << t_multi << "  \t" << t_single << endl;

    }
}

Output (times in milliseconds):

mag     time multi      time single
-------------------------------------
3       0.133946        0.0082684
4       0.0666891       0.0393378
5       0.30651         0.681517
6       1.92084         5.19607
7       18.8701         41.1431
8       195.002         381.745
9       1866.32         3606.08
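
The ratio t_single / t_multi follows directly from the two time columns. A minimal sketch that recomputes it from the posted numbers; `Row`, `posted_rows`, `speedup`, and `peak_index` are illustrative names and not part of the original program:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One row of the posted table: magnitude m (task size 10^m)
// and the mean times in milliseconds.
struct Row {
    int mag;
    double t_multi_ms;
    double t_single_ms;
};

// The measurements copied from the output table.
inline std::vector<Row> posted_rows() {
    return {
        {3, 0.133946, 0.0082684},
        {4, 0.0666891, 0.0393378},
        {5, 0.30651, 0.681517},
        {6, 1.92084, 5.19607},
        {7, 18.8701, 41.1431},
        {8, 195.002, 381.745},
        {9, 1866.32, 3606.08},
    };
}

// Speedup of the two-thread run over the sequential run.
inline double speedup(const Row& r) {
    return r.t_single_ms / r.t_multi_ms;
}

// Index of the row with the largest speedup.
inline std::size_t peak_index(const std::vector<Row>& rows) {
    assert(!rows.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < rows.size(); ++i) {
        if (speedup(rows[i]) > speedup(rows[best])) {
            best = i;
        }
    }
    return best;
}
```

For this table the ratio is roughly 0.06, 0.59, 2.22, 2.71, 2.18, 1.96, and 1.93 for magnitudes 3 through 9, so the largest value sits at magnitude 6, where the sequential run takes about 5.2 ms.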

Answer 1

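So you get peak multithreaded performance when your task completes in about 5 ms? On Linux, the maximum time slice is sysctl_sched_latency, typically 6 ms, which might be related.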

A few more remarks about your setup.

When micro-benchmarking, people usually take the fastest value, not the average.
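
A minimal sketch of taking the fastest of several runs instead of the mean. The helper name `fastest_ms` is made up for illustration, and `std::chrono::steady_clock` is used here as an assumption because it is monotonic; the original code uses `high_resolution_clock`:

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <limits>

// Runs `work` `repeats` times and returns the fastest
// wall-clock time of a single run, in milliseconds.
template <typename F>
double fastest_ms(F&& work, int repeats) {
    assert(repeats > 0);
    using clock = std::chrono::steady_clock;
    double best = std::numeric_limits<double>::infinity();
    for (int i = 0; i < repeats; ++i) {
        const auto start = clock::now();
        work();
        const auto end = clock::now();
        const double ms =
            std::chrono::duration<double, std::milli>(end - start).count();
        best = std::min(best, ms);  // keep only the fastest run
    }
    return best;
}
```

With the code from the question, something like `fastest_ms([&]{ task(n); task(n); }, 10)` would replace the averaged single-thread measurement. The minimum is less sensitive to runs that were disturbed by other processes or by the scheduler.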

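It is also a bad idea to write those outer loops in C++, because of the CPU caches (both the data caches and the micro-op cache). It is better to pass the parameter as a command-line argument, write a script that invokes your application many times, and collect the results somewhere.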
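
A rough sketch of the one-measurement-per-process layout, assuming the magnitude is passed as the only command-line argument. `parse_magnitude` and `run_once` are hypothetical helpers, not part of the original program, and the upper bound of 9 is chosen so that 10^m still fits in an `int`:

```cpp
#include <cassert>
#include <cstdlib>
#include <iostream>

// Parses the task magnitude m from a command-line argument.
// Returns false unless the argument is an integer in [min_m, max_m].
bool parse_magnitude(const char* arg, int& m, int min_m = 0, int max_m = 9) {
    assert(arg != nullptr);
    char* end = nullptr;
    const long value = std::strtol(arg, &end, 10);
    if (end == arg || *end != '\0') {
        return false;  // empty string or trailing garbage
    }
    if (value < min_m || value > max_m) {
        return false;  // out of the supported range
    }
    m = static_cast<int>(value);
    return true;
}

// Performs exactly one measurement for the magnitude given in argv[1]
// and prints "m<TAB>time" so that an external script can collect it.
// `measure_ms` is any function that returns a time in milliseconds.
int run_once(int argc, char** argv, double (*measure_ms)(int)) {
    int m = 0;
    if (argc != 2 || !parse_magnitude(argv[1], m)) {
        std::cerr << "usage: " << (argc > 0 ? argv[0] : "bench")
                  << " <magnitude 0..9>\n";
        return 1;
    }
    std::cout << m << "\t" << measure_ms(m) << "\n";
    return 0;
}
```

A `main` that simply returns `run_once(argc, argv, get_time_multi)` would then produce one line per invocation, and the repetition over magnitudes and runs moves into the calling script.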

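Update: in general, the ideal task time per thread is the longest one you can afford while still keeping all CPU cores busy at all times and meeting your other requirements (e.g. latency).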

Ideal task time per thread?
