Question

I have just started using OpenMP with C++. My serial C++ code looks like this:

#include <iostream>
#include <string>
#include <sstream>
#include <vector>
#include <fstream>
#include <stdlib.h>

int main(int argc, char* argv[]) {
    std::string line;
    std::ifstream inputfile(argv[1]);

    if(inputfile.is_open()) {
        while(getline(inputfile, line)) {
            // Line gets processed and written into an output file
        }
    }
}

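Since each line is processed independently, I am trying to parallelize this with OpenMP, because the input file is several gigabytes in size. My guess is that I first need to get the number of lines in the input file and then parallelize the code as shown below. Can someone help me?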

#include <iostream>
#include <string>
#include <sstream>
#include <vector>
#include <fstream>
#include <stdlib.h>

#ifdef _OPENMP
#include <omp.h>
#endif

int main(int argc, char* argv[]) {
    std::string line;
    std::ifstream inputfile(argv[1]);

    if(inputfile.is_open()) {
        int lines_in_file = 0; // Placeholder: calculate number of lines in file?
        //Set an output filename and open an ofstream
        #pragma omp parallel num_threads(8)
        {
            #pragma omp for schedule(dynamic, 1000)
            for(int i = 0; i < lines_in_file; i++) {
                 //What do I do here? I cannot just read any line because it requires random access
            }
        }
    }
}

Edit

Important notes

Each line is processed independently
The order of the results does not matter

Answer 1

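This is not a direct OpenMP answer, but what you are probably looking for is a Map/Reduce approach. Take a look at Hadoop: it is written in Java, but there are at least some C++ APIs.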

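In general, you want to process this amount of data on different machines rather than in multiple threads of the same process (virtual address space limits, lack of physical memory, swapping, etc.). The kernel also has to read the file from disk sequentially anyway, which is what you want: otherwise the hard drive just has to do extra seeks for each of your threads.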
