Question

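I'm new here, and I'm trying to do something that I thought would be easy but that I can't get to work. I have two files containing only simple data:

FileA

and FileB

I want to read both files and print the values they have in common, ignoring the header. In this case I would get 1435467 and 3123191, and both would be written to a new file. So far I have: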

#include <cmath>
#include <cstdlib>
#include <string>
#include <iomanip>
#include <iostream>
#include <fstream>
#include <ctime>

using namespace std;

// Globals, to allow being called from several functions

// main program

int main() {
    float A, B;

    ifstream inA("FileA"); // input stream
    ifstream inB("FileB"); // second instream
    ofstream outA("OutA.txt"); // output stream

    while (inA >> A) {
        while (inB >> B) {

            if (A == B) {
                outA << A << "\t" << B << endl;
            }
        }
    }
    return 0;
}

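This only produces an empty OutA file. I thought it would read one line of FileA, loop through FileB until it found a match, send that to OutA, and then move on to the next line of FileA. Any help would be greatly appreciated.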

Answer 1

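You need to put

inB.seekg(0, inB.beg);

at the end of the outer while loop. Otherwise you stay at the end of inB, and nothing more is read after the first entry of inA has been processed. Because the failed read at end of file also sets the stream's error flags, call inB.clear() before seeking.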
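
As a minimal sketch of that rewind step, the helper below reads a stream to the end and then resets it so it can be read again. The name readAllAndRewind is hypothetical and only used for illustration; it takes any std::istream, so it can be tried on a std::istringstream as well as on an std::ifstream.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads every whitespace-separated token from `in`,
// then rewinds the stream so the next caller can read it from the start.
std::vector<std::string> readAllAndRewind(std::istream& in) {
    std::vector<std::string> tokens;
    std::string token;
    while (in >> token) {
        tokens.push_back(token);
    }
    in.clear();                 // reset eofbit/failbit left by the failed read
    in.seekg(0, std::ios::beg); // move the read position back to the start
    return tokens;
}
```

Calling the helper twice on the same stream should return the same tokens both times. Without the clear() call, the second pass would be expected to read nothing, because the stream is still in a failed state.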

Answer 2

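Another problem may be that you are using float for A and B. Try int (or string) instead, because comparing floats with == may not behave as you expect. For details, see this question: What is the most effective way for float and double comparison?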
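
There is a second reason the element type matters when the files start with a text header: extracting a float from non-numeric text fails at once, so a loop such as while (inA >> A) never runs. The sketch below illustrates this. The helper name countExtracted and the header text "Value" are assumptions made for the example, since the question does not show the actual file contents.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: counts how many values of type T can be extracted
// from `text` before the first failed read.
template <typename T>
std::size_t countExtracted(const std::string& text) {
    std::istringstream in(text);
    T value;
    std::size_t count = 0;
    while (in >> value) {
        ++count;
    }
    return count;
}
```

With the input "Value\n1435467\n3123191\n", countExtracted<float> stops at the header and reports 0 values, while countExtracted<std::string> reports 3 tokens (the header plus the two numbers).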

This code works on my platform:

...
while (inA >> A) {
  inB.clear();
  inB.seekg(0, inB.beg);
  while (inB >> B) {
    if (A == B) {
      outA << A << "\t" << B << endl;
    }
  }
}

Note the inB.clear() and inB.seekg(...) calls, and that A and B are strings.

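By the way, this approach is only suitable for a quick-and-dirty implementation and is not optimal for large files, because its complexity is O(N * M), where N is the size of FileA and M is the size of FileB. By using a hash set you can get close to linear complexity, O(N + M).

Example of a hash set implementation (C++11):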

#include <string>
#include <iostream>
#include <fstream>
#include <unordered_set>

using namespace std;

int main() {
  string A, B;

  ifstream inA("FileA"); // input stream
  ifstream inB("FileB"); // second instream
  ofstream outA("OutA.txt"); // output stream

  unordered_set<string> setA;

  while (inA >> A) {
    setA.insert(A);
  }

  while (inB >> B) {
    if (setA.count(B)) {
      // B is the matching value; A is no longer valid after the first loop
      outA << B << "\t" << B << endl;
    }
  }

  return 0;
}
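
The hash set version above treats every token the same way, so a header that appears in both files would be reported as a match. A possible variant that skips the header is sketched below. It assumes the header occupies exactly the first line of each file, and the function name commonValues is hypothetical. It takes streams rather than file names so that it can be tried without any files on disk.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: returns the tokens of `b` that also occur in `a`,
// in the order they appear in `b`. The first line of each stream is
// assumed to be a header and is discarded.
std::vector<std::string> commonValues(std::istream& a, std::istream& b) {
    std::string header;
    std::getline(a, header); // skip header of the first stream
    std::getline(b, header); // skip header of the second stream

    std::unordered_set<std::string> seen;
    std::string token;
    while (a >> token) {
        seen.insert(token);
    }

    std::vector<std::string> matches;
    while (b >> token) {
        if (seen.count(token) != 0) {
            matches.push_back(token);
        }
    }
    return matches;
}
```

A value that occurs several times in the second stream is reported once per occurrence. If that is not wanted, the matches could be collected in a set instead of a vector.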

Answer 3

Are both files small enough to read into memory?

You could try something like the following:

#include <algorithm>
#include <fstream>
#include <string>
#include <vector>

int main()
{
    std::vector<std::string> a;
    std::vector<std::string> b;

    std::ofstream outA("OutA.txt"); // output stream
    std::ifstream inA("FileA");     // input stream
    std::ifstream inB("FileB");     // second input stream

    std::string value;

    inA >> value;                                 // read the first token and discard it (single-word header)
    while (inA >> value) { a.push_back(value); }  // populate first vector
    inB >> value;                                 // read the first token and discard it (single-word header)
    while (inB >> value) { b.push_back(value); }  // populate second vector

    // std::sort will perform a pretty efficient sort
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());

    // now that both are sorted, comparing is easier
    std::vector<std::string>::iterator ita = a.begin();
    std::vector<std::string>::iterator itb = b.begin();
    while (ita != a.end() && itb != b.end())
    {
        if (*ita > *itb)
            ++itb;
        else if (*ita < *itb)
            ++ita;
        else
        {
            outA << *ita << '\n';
            ++ita; // advance both past the match, otherwise the loop never ends
            ++itb;
        }
    }
    return 0;
}

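Read both files into memory, sort them, and then compare. The comparison only needs to traverse each file once, which greatly reduces its complexity: O(a+b) instead of O(a*b). Of course the sort adds overhead, but for larger files this should be more efficient, and for short files it should be fast enough (unless you are comparing lots and lots of small files). I believe the worst case for std::sort is O(a log a + b log b), which is better than O(a*b).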
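
The merge-style comparison on sorted data is also available in the standard library as std::set_intersection. The sketch below uses it in place of the hand-written iterator loop. The function name sortedIntersection is hypothetical, and the inputs are taken by value so the caller's vectors are left unsorted.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: returns the values present in both inputs,
// in sorted (lexicographic) order.
std::vector<std::string> sortedIntersection(std::vector<std::string> a,
                                            std::vector<std::string> b) {
    // std::set_intersection requires both ranges to be sorted
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());

    std::vector<std::string> common;
    std::set_intersection(a.begin(), a.end(),
                          b.begin(), b.end(),
                          std::back_inserter(common));
    return common;
}
```

Because the values are compared as strings, the result is ordered lexicographically rather than numerically, which does not affect which values are reported as common.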

Answer 4

In the end I got it fixed:

#include <cmath>
#include <cstdlib>
#include <string>
#include <iomanip>
#include <iostream>
#include <fstream>
#include <ctime>

using namespace std;

//Globals, to allow being called from several functions


//main program

int main() {
    string A, B;

    ifstream inA("FileA.txt"); // input stream
    ifstream inB("FileB.txt"); // second input stream
    ofstream outA("OutA.txt"); // output stream

    while (inA >> A) { // take in first stream
        while (inB >> B) { // while that's happening, take in second stream
            if (A == B) { // do they match? If so, send out the value
                outA << A << "\t" << B << endl; // this just shows that A does equal B
            }
        } // end of B loop
        inB.clear();           // now clear the error flags of the second stream (B)
        inB.seekg(0, inB.beg); // return to start of stream B
    } // move on to the next input in stream A, and repeat
    return 0;
}

Compare two files and output the equal values
