Question

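I have a program that performs the same function on a large array. I split the array into equal chunks and pass them to threads. The threads currently run the function and return what they should, but the more threads I add, the longer each thread takes to run, which completely defeats the purpose of concurrency. I have tried both std::thread and std::async, with the same result. In the images below, all child threads and the main thread process the same amount of data (main has 6 more points), but what main runs in ~12 seconds takes the child threads roughly 12 x the number of threads, as if they were running one after another. Yet they all start at the same time, and if I print output from each thread I can see that they are running concurrently. Does this have something to do with how they are joined? I have tried everything I can think of, and any help or advice is greatly appreciated! In the sample code, main does not run the function until the child threads have finished; if I put the joins after main's run, it still does not run until the child threads have finished. Below you can see the run times with 3 and 5 threads. These times were measured on a scaled-down data set used for testing.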

void foo(char* arg1, long arg2, std::promise<std::vector<std::vector<std::vector<std::vector<std::vector<long>>>>>> & ftrV) {
  std::vector<std::vector<std::vector<std::vector<std::vector<long>>>>> Grid;

  // does stuff....
  // fills in "Grid"

  ftrV.set_value(Grid);
}


int main(){

  int thnmb = 3;    // # of threads
  std::vector<long> buffers;    // fill in buffers
  std::vector<char*> pointers;  //fill in pointers 

  std::vector<std::promise<std::vector<std::vector<std::vector<std::vector<std::vector<long>>>>>>> PV(thnmb); // vector of promise grids
  std::vector<std::future<std::vector<std::vector<std::vector<std::vector<std::vector<long>>>>>>> FV(thnmb);    // vector of futures grids
  std::vector<std::thread> th(thnmb);   // vector of threads
  std::vector<std::vector<std::vector<std::vector<std::vector<std::vector<long>>>>>> vt1(thnmb);    // vector to store thread grids

  for (int i = 0; i < thnmb; i++) {

    th[i] = std::thread(&foo, pointers[i], buffers[i], std::ref(PV[i]));
  }
  for (int i = 0; i < thnmb; i++) {
    FV[i] = PV[i].get_future();
  }

  for (int i = 0; i < thnmb; i++) {
    vt1[i] = FV[i].get();
  }

  for (int i = 0; i < thnmb; i++) {
    th[i].join();
  }

  // main performs same function as foo here

  // combine data
  // do other stuff..

  return(0);
}

Answer 1

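Without knowing what foo does it is hard to give a definite answer, but you are probably running into memory access problems. Each access to your 5-dimensional array requires 5 memory lookups, and it only takes 2 or 3 threads doing memory accesses to saturate what a typical system can deliver.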
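To illustrate the lookup cost, here is a minimal sketch of one possible alternative layout: a single contiguous buffer indexed with row-major arithmetic, so that each element access is one lookup instead of five chained ones. The name FlatGrid5 and its members are hypothetical and do not come from the question; whether this helps depends on what foo actually does.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical replacement for the nested vector<vector<...<long>>> type.
// All elements live in one contiguous buffer.
struct FlatGrid5 {
    std::array<std::size_t, 5> dims;  // extent of each of the 5 dimensions
    std::vector<long> data;           // dims[0]*...*dims[4] elements, zero-filled

    explicit FlatGrid5(const std::array<std::size_t, 5>& d)
        : dims(d), data(d[0] * d[1] * d[2] * d[3] * d[4], 0L) {}

    // Row-major index: the last coordinate varies fastest in memory.
    long& at(std::size_t a, std::size_t b, std::size_t c,
             std::size_t d, std::size_t e) {
        return data[(((a * dims[1] + b) * dims[2] + c) * dims[3] + d) * dims[4] + e];
    }
};
```

Iterating with the last index in the innermost loop walks memory sequentially, which tends to be friendlier to the cache than chasing pointers through nested vectors.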

main should do its share of the foo work after creating the threads but before getting the values from the promises.
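A minimal sketch of that ordering, using std::async for brevity: the workers are launched first, the calling thread then processes its own chunk, and only afterwards are the futures read. The functions process_chunk and process_all are hypothetical stand-ins (here they simply sum numbers), not the asker's foo.

```cpp
#include <cstddef>
#include <functional>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical stand-in for foo: sums the half-open range [begin, end).
long process_chunk(const std::vector<long>& input, std::size_t begin, std::size_t end) {
    return std::accumulate(input.begin() + static_cast<std::ptrdiff_t>(begin),
                           input.begin() + static_cast<std::ptrdiff_t>(end), 0L);
}

long process_all(const std::vector<long>& input, std::size_t workers) {
    const std::size_t parts = workers + 1;  // the workers plus the calling thread
    const std::size_t chunk = input.size() / parts;

    // 1. Launch the workers first.
    std::vector<std::future<long>> futures;
    for (std::size_t i = 0; i < workers; ++i) {
        futures.push_back(std::async(std::launch::async, process_chunk,
                                     std::cref(input), i * chunk, (i + 1) * chunk));
    }

    // 2. The calling thread handles the last chunk (plus any remainder)
    //    while the workers are still running.
    long total = process_chunk(input, workers * chunk, input.size());

    // 3. Only now block on the workers' results.
    for (auto& f : futures) {
        total += f.get();
    }
    return total;
}
```

If the futures were read before step 2, the calling thread would sit idle until every worker finished, which matches the behaviour described in the question.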

And foo should end with {{1}} so that a copy of that array does not have to be made.
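The exact statement is missing from the text above. One common way to avoid a copy when fulfilling a promise, shown here as an assumption rather than as the answerer's wording, is to move the local object into set_value. In this sketch the Grid alias, fill_grid, and run_worker are hypothetical, and the grid is only two levels deep to keep the example short.

```cpp
#include <cstddef>
#include <functional>
#include <future>
#include <thread>
#include <utility>
#include <vector>

using Grid = std::vector<std::vector<long>>;  // hypothetical, shallower than the question's type

// Builds a grid and hands it to the promise without copying it.
void fill_grid(std::size_t rows, std::size_t cols, std::promise<Grid>& out) {
    Grid grid(rows, std::vector<long>(cols, 1L));
    out.set_value(std::move(grid));  // moves the buffers; grid is not used afterwards
}

Grid run_worker(std::size_t rows, std::size_t cols) {
    std::promise<Grid> p;
    std::future<Grid> f = p.get_future();  // obtained before the thread starts
    std::thread t(fill_grid, rows, cols, std::ref(p));
    Grid result = f.get();                 // get() on std::future moves the stored value out
    t.join();
    return result;
}
```

With set_value(grid) the promise would store a deep copy of every nested vector; with std::move only the top-level vector's internal pointers are transferred.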

C++ concurrency issue with scaling thread runtimes
