Question

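I have more than 10,000 text files on disk. My task is to find the most common three-word sequence that appears in these files. I read each file word by word and, for every sequence in every file, increment a counter in a global std::map<std::string, int>. At the end I sort the map's entries and pick the most common one. I managed to write code for this, but I read that I could speed it up by reading each file in a separate thread.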
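For reference, the single-threaded counting step described above could look roughly like the sketch below. CountTriples, MostCommon and the Counts alias are hypothetical names used only for illustration, and it assumes that "words" are simply whitespace-separated tokens.

```cpp
#include <algorithm>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <utility>

using Counts = std::map<std::string, int>;  // hypothetical alias for the map in the question

// Adds every run of three consecutive whitespace-separated words to `counts`.
void CountTriples(std::istream& in, Counts& counts)
{
    std::string a, b, c;
    if (!(in >> a >> b))
        return;                              // fewer than two words: nothing to count
    while (in >> c)
    {
        ++counts[a + ' ' + b + ' ' + c];
        a = b;                               // slide the three-word window by one word
        b = c;
    }
}

// Returns the entry with the highest count, or ("", 0) for an empty map.
std::pair<std::string, int> MostCommon(const Counts& counts)
{
    if (counts.empty())
        return std::make_pair(std::string(), 0);
    auto best = std::max_element(counts.begin(), counts.end(),
        [](const Counts::value_type& l, const Counts::value_type& r)
        { return l.second < r.second; });
    return std::make_pair(best->first, best->second);
}
```

Since only the single most frequent entry is needed, std::max_element avoids sorting the whole map; a full sort by count would only matter if a ranked list were required.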

My first idea was to run as many threads as there are files, but that slowed my program down from 6 s to 80 s.

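My second idea was to create a kind of thread pool that runs, for example, 20 threads, waits for them to finish (.join()), and then starts the next 20 threads, over and over. This improved my program's run time from 6 s to 4 s.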

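But this is not the fastest way, because the main thread waits until all 20 threads have finished their work, even though some of them (say, threads 1-19) are already done and could make room for new work.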

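My question is: how can I implement a thread pool in which a thread starts on the next file as soon as it has finished the previous one?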

Code for my second idea:

std::vector<std::thread> threads;

const int byThreadPool = 20;   // number of threads started per batch
const int nFileCount = 10495;

for (int i = 0; i < nFileCount; i += byThreadPool)
{
    // Start one thread per file in the current batch.
    for (int j = i; j < i + byThreadPool && j < nFileCount; j++)
    {
        std::string fileName = path + std::to_string(j) + PAGE_EXTENSION;
        threads.emplace_back(&CWordParserFileSystem::FetchFile, this, fileName);
    }

    // Wait for the whole batch to finish before starting the next one.
    for (std::thread& t : threads)
        t.join();

    threads.clear();
}
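One possible way to keep every thread busy, sketched under assumptions rather than offered as a definitive answer: start a fixed number of workers once and let each one claim the next file index from a shared std::atomic counter as soon as it finishes the previous file. RunPool, Counts and the job callback are hypothetical names, not part of the code above; the callback stands in for the per-file work (FetchFile in the question). Each worker counts into its own local map and merges it once at the end, because a single global std::map is not safe to modify from several threads without locking.

```cpp
#include <atomic>
#include <cstddef>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

using Counts = std::map<std::string, int>;  // hypothetical alias for the result map

// Runs job(i, local) for every i in [0, jobCount) on threadCount worker threads.
// `job` is an assumed callback that adds the counts for file i to `local`.
template <typename Job>
Counts RunPool(std::size_t jobCount, unsigned threadCount, Job job)
{
    if (threadCount == 0)
        threadCount = 1;                     // e.g. hardware_concurrency() may return 0

    std::atomic<std::size_t> next{0};        // index of the next unclaimed file
    std::mutex mergeMutex;                   // protects `total` during the merge
    Counts total;

    auto worker = [&]
    {
        Counts local;                        // per-thread map, no locking needed
        // Claim indices until none are left; no thread waits for a batch to end.
        for (std::size_t i = next.fetch_add(1); i < jobCount; i = next.fetch_add(1))
            job(i, local);

        // Merge once per thread instead of locking on every word sequence.
        std::lock_guard<std::mutex> lock(mergeMutex);
        for (const auto& entry : local)
            total[entry.first] += entry.second;
    };

    std::vector<std::thread> threads;
    for (unsigned t = 0; t < threadCount; ++t)
        threads.emplace_back(worker);
    for (std::thread& t : threads)
        t.join();

    return total;
}
```

With this shape, the file name would be built from the index inside the callback, and the thread count could be taken from std::thread::hardware_concurrency() instead of a fixed 20. Whether that beats the 4 s result depends on how much of the time is spent on disk reads rather than on counting.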

C++11 multithreading - reading multiple files into one place

0 answers: