Question

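My understanding is that lock-free structures do better when there is a lot of contention, and lock-based data structures do better when contention is low.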

To test this, I wrote the following code:

#include <thread>
#include <chrono>
#include <iostream>
#include <vector>
#include <stack>
#include <mutex>
#include <fstream>
#include <boost/lockfree/stack.hpp>
using namespace std;
mutex mut;

const static int totalNumberOfWorkItems = 100000;
const static int maxNumberOfThreads = 2000;
const static int threadIncrement = 5;

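// Baseline: time spent creating and joining n threads that do almost no work.
chrono::milliseconds calcRawSpawnTime(int n) {
    auto start = chrono::high_resolution_clock::now();
    vector<thread> ts;
    for (int i = 0; i < n; i++)
        // Capture i by value and write to a thread-local sink so the threads share no state.
        ts.push_back(thread([i]() { volatile int sink = i; (void)sink; }));
    for (auto&& t : ts)
        t.join();
    auto end = chrono::high_resolution_clock::now();
    return chrono::duration_cast<chrono::milliseconds>(end - start);
}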


// Times n threads that each push worksize items onto a mutex-protected std::stack.
// The measured interval includes thread creation.
chrono::milliseconds timeNThreadsLock(int n, int worksize){
    stack<int> data;
    vector<thread> ts;
    auto startSpawn = chrono::high_resolution_clock::now();
    for (int i = 0; i < n; i++)
        ts.push_back(thread([&]() {
            for (int j = 0; j < worksize; j++){
                lock_guard<mutex> guard(mut);  // released at the end of each iteration
                data.push(7);
            }
        }));
    for (auto&& t : ts)
        t.join();
    auto endWait = chrono::high_resolution_clock::now();
    return chrono::duration_cast<chrono::milliseconds>(endWait - startSpawn);
}

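// Times n threads that each push worksize items onto a boost::lockfree::stack.
// The measured interval includes thread creation.
chrono::milliseconds timeNThreadsLockFree(int n, int worksize)
{
    // The size_type constructor preallocates freelist nodes; the default
    // constructor is meant for stacks declared with a compile-time capacity<>.
    // The value 128 is an arbitrary starting size.
    boost::lockfree::stack<int> data(128);
    vector<thread> ts;
    auto startSpawn = chrono::high_resolution_clock::now();
    for (int i = 0; i < n; i++)
        ts.push_back(thread([&](){
            for (int j = 0; j < worksize; j++)
                data.push(7);
        }));
    for (auto&& t : ts)
        t.join();
    auto endWait = chrono::high_resolution_clock::now();
    return chrono::duration_cast<chrono::milliseconds>(endWait - startSpawn);
}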
int main(int argc, char* argv [])
{
    ofstream lockFile("locklog.log");
    ofstream lockFreeFile("lockfreelog.log");
    ofstream spawnTimes("spawnTimes.log");
    for (int i = 1; i < maxNumberOfThreads; i += threadIncrement){
        cout << i << endl;
        spawnTimes << i << ",\t" << calcRawSpawnTime(i).count() << endl;
        lockFreeFile << i << ",\t" << timeNThreadsLockFree(i, totalNumberOfWorkItems / i).count() << endl;
        lockFile << i << ",\t" << timeNThreadsLock(i, totalNumberOfWorkItems / i).count() << endl;
    }
    return 0;
}

The problem is that the timings for my lock-free data structure come out like this: [image]

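I suspected the problem was thread creation time (which is clearly not constant as the number of threads grows), but subtracting the thread creation time gives this plot: [image]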

That is clearly wrong.

Any ideas on how to benchmark this properly?

Answer 1

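I would suggest that you measure the number of operations rather than the time. Start all the threads, then have the main thread sleep for a while (I think 1 second or more is acceptable). In the test, each thread should have an integer private to that thread, which it increments for every operation it performs on the data structure. (You can keep a separate counter for each kind of operation if you want.)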

Then you can run the test with something like:

int x = random() % 1000; //just some granularity
if(x > 500) {
    some_test_on_the_data_structure();
} else { //you can adjust the limits to perform different number of each operation.
    other_test_on_the_data_structure();
}

By the way, this is the test I usually run on my own data structures and in other multithreaded benchmarks.
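A minimal sketch of the counting approach described above, under some assumptions: `LockedStack`, `OpCounts` and `countOperations` are names made up for illustration, the structure under test is a mutex-guarded `std::stack`, and a per-thread `std::mt19937` stands in for `random()`. Each worker keeps its counters in a local variable and publishes them once before it exits.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <mutex>
#include <random>
#include <stack>
#include <thread>
#include <vector>

// Hypothetical structure under test: a std::stack guarded by a mutex.
struct LockedStack {
    std::mutex m;
    std::stack<int> s;
    void push(int v) { std::lock_guard<std::mutex> g(m); s.push(v); }
    bool pop(int& out) {
        std::lock_guard<std::mutex> g(m);
        if (s.empty()) return false;
        out = s.top();
        s.pop();
        return true;
    }
};

// Counters owned by a single thread.
struct OpCounts {
    std::uint64_t pushes = 0;
    std::uint64_t pops = 0;  // counts attempts, including pops on an empty stack
};

// Runs `threads` workers for roughly `duration` and returns their counters.
std::vector<OpCounts> countOperations(int threads, std::chrono::milliseconds duration) {
    assert(threads > 0);
    LockedStack data;
    std::atomic<bool> stop{false};
    std::vector<OpCounts> counts(threads);
    std::vector<std::thread> ts;
    for (int t = 0; t < threads; ++t) {
        ts.emplace_back([&, t]() {
            std::mt19937 rng(t + 1);  // per-thread generator with a fixed seed
            std::uniform_int_distribution<int> pick(0, 999);
            OpCounts local;           // private to this thread
            int out = 0;
            while (!stop.load(std::memory_order_relaxed)) {
                if (pick(rng) > 500) { data.push(7); ++local.pushes; }
                else                 { data.pop(out); ++local.pops; }
            }
            counts[t] = local;        // published once; join() makes it visible
        });
    }
    std::this_thread::sleep_for(duration);  // main thread just waits
    stop.store(true, std::memory_order_relaxed);
    for (auto& th : ts) th.join();
    return counts;
}
```

Calling `countOperations(4, std::chrono::seconds(1))` and summing the counters gives the number of operations completed in about one second; the same harness could be pointed at a lock-free stack for comparison.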

Answer 2

Have you considered structuring this test as a throughput benchmark rather than a speedup benchmark?

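For each thread count, repeat the operation as many times as possible for a fixed period, and compute the throughput as operations/period. This has the advantage that the experiment time is easy to adjust (number of thread counts to examine * period ~= experiment time).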

You can use a barrier to hold the created threads back until everything is ready.
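One way this might look, as a sketch rather than a definitive harness: `StartGate` and `measureThroughput` are hypothetical names, the gate is built from a mutex and a condition variable, and `op` is assumed to be safe to call from several threads at once. The main thread waits until every worker has arrived at the gate, so thread creation stays outside the timed window.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// One-shot start gate (hypothetical helper): workers block in wait() until open().
class StartGate {
public:
    void wait() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return open_; });
    }
    void open() {
        { std::lock_guard<std::mutex> lock(m_); open_ = true; }
        cv_.notify_all();
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    bool open_ = false;
};

// Returns operations per second for `threads` workers running `op` over `period`.
template <typename Op>
double measureThroughput(int threads, std::chrono::milliseconds period, Op op) {
    assert(threads > 0);
    StartGate gate;
    std::atomic<int> ready{0};
    std::atomic<bool> stop{false};
    std::vector<std::uint64_t> done(threads, 0);
    std::vector<std::thread> ts;
    for (int t = 0; t < threads; ++t) {
        ts.emplace_back([&, t]() {
            ready.fetch_add(1);   // announce arrival
            gate.wait();          // held here until the gate opens
            std::uint64_t local = 0;
            while (!stop.load(std::memory_order_relaxed)) { op(); ++local; }
            done[t] = local;      // published once; join() makes it visible
        });
    }
    // Wait until every worker has been created and has arrived.
    while (ready.load() < threads) std::this_thread::yield();
    auto start = std::chrono::steady_clock::now();
    gate.open();
    std::this_thread::sleep_for(period);
    stop.store(true, std::memory_order_relaxed);
    auto end = std::chrono::steady_clock::now();
    for (auto& th : ts) th.join();
    std::uint64_t total = 0;
    for (auto n : done) total += n;
    double seconds = std::chrono::duration<double>(end - start).count();
    return static_cast<double>(total) / seconds;
}
```

Running this once per thread count, with the push on the locked stack and then on the lock-free stack as `op`, would give two throughput curves to compare.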

One more point: nothing in this benchmark lets you control contention, and that is probably something you want.

How you do that depends on the use cases you have in mind and on the data structure you are testing.

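In this particular case, one approach is to create an array of the data structures under test. When a thread does an insert, randomly vary which element of the array it hits. The longer the array, the less contention. This models a data structure where you need a "reduce" operation to get the final value; for some tasks that may be perfectly adequate, especially where consistency requirements are loose.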
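A rough sketch of that idea, with assumptions stated up front: `Shard` and `pushAcrossShards` are illustrative names, each array element is a mutex-guarded `std::stack` standing in for whichever structure is under test, and the "reduce" step simply sums the sizes of the shards.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <random>
#include <stack>
#include <thread>
#include <vector>

// One array element: a mutex-guarded std::stack standing in for the structure under test.
struct Shard {
    std::mutex m;
    std::stack<int> s;
};

// Pushes threads * pushesPerThread items spread at random over `shards` stacks,
// then "reduces" by summing the shard sizes. More shards means less contention.
std::size_t pushAcrossShards(std::size_t shards, int threads, int pushesPerThread) {
    assert(shards > 0 && threads > 0);
    std::vector<Shard> data(shards);
    std::vector<std::thread> ts;
    for (int t = 0; t < threads; ++t) {
        ts.emplace_back([&, t]() {
            std::mt19937 rng(t + 1);  // per-thread generator with a fixed seed
            std::uniform_int_distribution<std::size_t> pick(0, shards - 1);
            for (int j = 0; j < pushesPerThread; ++j) {
                Shard& target = data[pick(rng)];  // random shard for each insert
                std::lock_guard<std::mutex> g(target.m);
                target.s.push(7);
            }
        });
    }
    for (auto& th : ts) th.join();
    std::size_t total = 0;  // the "reduce" step
    for (auto& shard : data) total += shard.s.size();
    return total;
}
```

With `shards == 1` every thread fights over the same stack, which corresponds to the original benchmark; raising `shards` while keeping the thread count fixed lowers contention without changing the total amount of work.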

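In many data structures beyond a simple stack/vector, contention ends up being a function of the input. For a HashMap, for example, it depends on the range of the keys and their probability distribution, both of which you control as the benchmarker.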

Proper way to benchmark lock vs. lock-free data structures
