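std::thread - read a file line by line

Date: 2014-11-06 13:01:24

Tags: c++ multithreading file-io line-by-line

I want to read a file line by line in parallel. Each thread reads one line and then processes the data. Meanwhile, the next thread should read the next line.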

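#include <fstream>
#include <iostream>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

std::ifstream infile("test.txt");
std::mutex mtx;

void read(int id_thread){
   std::string sLine;
   while(true){
     std::lock_guard<std::mutex> lock(mtx);   // the whole iteration runs under the lock
     if(!std::getline(infile, sLine)) break;  // stop at end of file (or on a read error)
     std::cout << "Read by thread: " << id_thread << " ";
     std::cout << sLine << std::endl;
   }
}

int main(){
  const int num = 4; // number of threads (not specified in the question)
  std::vector<std::thread> threads;
  for(int i = 0; i < num; i++){
     threads.emplace_back([i]{ read(i); });
  }

  for(auto& thread : threads){
      thread.join();
  }
  return 0;
}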

When I run this code, I get this: the first thread reads all the lines. How can I make each thread read one line?


EDIT

As mentioned in the comments, all I needed was a bigger test file. Thanks, everyone!

3 answers:

Answer 0 (score: 5)

I would change the loop to

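while(true){
     std::string sLine;
     {
       std::lock_guard<std::mutex> lock(mtx);   // hold the lock only while reading
       if(!std::getline(infile, sLine)) break;  // end of file
     }
     // the (slow) output now runs without the lock, so other threads can read meanwhile
     std::cout << "Read by thread: " << id_thread << " ";
     std::cout << sLine << std::endl;
   }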

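Your std::cout calls are the busy part of the test loop, the part you will later swap for real code, so keep them outside the lock; that gives the other threads time to start. Also, make your test file bigger. It is not unusual for thread start-up to take long enough that the first thread has consumed all the data before the others get going.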

Answer 1 (score: 2)

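If you want 5 threads to each read exactly every 5th line, you have to synchronize the reads, so each thread must know that the previous thread has finished reading its part. This requirement can cause a large inefficiency, because some threads may have to wait a long time before they get to run.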

Concept code, untested; use at your own risk.

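First, let's create a small class to handle the atomic lock. We align it to the cache-line size to avoid false sharing and the associated cache-line ping-pong.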

#include <atomic>
#include <cstddef>
#include <fstream>
#include <iostream>
#include <string>
#include <thread>
#include <vector>

constexpr std::size_t CACHELINESIZE = 64; // could differ on your architecture
template<class dType>
class alignas(CACHELINESIZE) lockstep {
  std::atomic<dType> lock{dType(0)};

public:
  // Spinlock spins until the current value is `prev` and then tries to set it to `next`;
  // if another thread changes the value first, it goes back to spinning.
  dType Spinlock(dType prev = dType(0), dType next = dType(1)) {
     dType expected = prev;
     while (!lock.compare_exchange_weak(expected, next)) { // locked read-modify-write, ~100 cycles?
       expected = prev;  // we want to keep waiting for `prev`
       do {
         std::this_thread::yield(); // the answer used an x86 pause here (e.g. _mm_pause())
       } while(lock.load(std::memory_order_relaxed) != prev);  // only one cache miss per change
     }
     return expected;
  }

  void store(dType value) {
    lock.store(value);
  }
};

lockstep<int> lock; // starts at 0: the number of the next line to be read

constexpr int NoThreads = 5;

std::ifstream infile("test.txt");

void read(int id_thread) {
   int next = id_thread;  // the first line (0-based) this thread is responsible for

   while (true) {
     // wait until the shared counter reaches our line number
     lock.Spinlock(next, next);

     // the stream is only touched while we "own" the counter
     std::string sLine;
     bool ok = static_cast<bool>(std::getline(infile, sLine));

     // release the next thread; on EOF this lets it see the failure and exit too
     lock.store(next + 1);
     if (!ok) return;

     // do work asynchronously
     // ...

     // debug log: a chain of << calls is not guaranteed to be written in one piece,
     // and lines can appear in "random" order relative to other threads' lines
     std::cout << "Read by thread: " << id_thread << " line no. " << next
               << " text:" << sLine << std::endl;
     next += NoThreads;  // our next expected line to process
   }
}

int main() {
  std::vector<std::thread> threads;
  for(int i = 0; i < NoThreads; i++) {
     threads.emplace_back([i]{ read(i); });
  }

  for(auto& thread : threads){
      thread.join();
  }
  return 0;
}

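Answer 2 (score: 1)

If you want each thread to read exactly one line (as your description suggests), remove the while loop; you then need to make sure you have as many threads as there are lines in the file.

To get rid of that limitation, you could use a Boost thread pool.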