Need to create a word matcher in C++

Time: 2014-03-20 18:39:08

Tags: c++

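I need to create a word matcher that counts how many times specific words are mentioned in a text file. This is what I have so far, and I'm not sure what I'm doing wrong. One text file contains a long paragraph, and the other contains just a few words. I need to compare the two files: for example, if the word "and" is in the short file, I need to check it against the long paragraph to see how many times it occurs, and then display a report at the end of the program.

E.g. and - 6 times, but - 0 times, it - 23 times.

^^ Something like that. Not sure how to start making this.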

#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main()
{
    ifstream infile("text1.txt");
    if(!infile)
    {
        cout << "Error";
    }
    string words[250];
    int counter = 0;
    while (!infile.eof() )
    {
        infile >> words[counter];

        counter++;
    }
    ifstream infile2("banned.txt");
    if(!infile2)
    {
        cout << "Error";
    }
    string bannedwords[250];
    counter = 0;
    while (!infile2.eof() )
    {
        infile2 >> words[counter];
        counter++;
    }
    int eatcount= 0;
    int orcount = 0;
    int hellocount = 0;
    int number;
    for(int i=0; i<200; i++)
    {
        for(int j = 0; j < 8; j++)
        {
            if ( words[i] == bannedwords[j])
            {
                cout << words[i] << " ";
                if (words[i]=="eat")
                {
                    eatcount++;
                }
                else if (words[i] == "or")
                {
                    orcount++;
                }
                else if (words[i]== "hello")
                {
                    hellocount++;
                }

            }

        }

    }
    cout << endl;
    cout<< "eat was found "<<eatcount<<" times";
    cout << endl;
    cout<< "or was found "<<orcount<<" times";
    cout << endl;
    cout<< "hello was found "<<hellocount<<" times";
    system("pause");
}

2 answers:

Answer 0: (score: 0)

Why not use std::multiset?

ifstream infile("text1.txt");
if(!infile)
{
    cout << "Error";
}
std::multiset<string> words;
string tmp;
// Loop on the extraction itself, so a failed read at end of file is never inserted
while (infile >> tmp)
{
    words.insert(tmp);
}

Then use a map for the banned words as well:

ifstream infile2("banned.txt");
if(!infile2)
{
    cout << "Error";
}
std::map<string, int> bannedwords;
string bannedword;
while (infile2 >> bannedword)
{
    bannedwords[bannedword] = 0; // the count is filled in by the loop below
}

Then you can use std::multiset::count(string) to look up the words without all the extra loops. You only need one loop to go through your list of banned words, e.g.:

std::map<string, int>::iterator bannedwordIter = bannedwords.begin();
for( ; bannedwordIter != bannedwords.end(); ++bannedwordIter )
{
  bannedwordIter->second = words.count(bannedwordIter->first);

  // you could print here as you process, or have another loop that prints it all after you finish
  cout << bannedwordIter->first << " - " << bannedwordIter->second << " times." << endl;
}

Answer 1: (score: 0)

The most minimal approach is to use a regular expression, like this:

#include <iostream>
#include <fstream>
#include <string>
#include <regex>
#include <iterator> // std::distance

using namespace std;

unsigned countMatches(std::istream &is, std::string const &word)
{
    string text;
    unsigned count(0);
    // Note: the word is used as the regex pattern as-is, so it also matches inside longer words
    std::regex  const expression(word);
    while (getline(is, text)) {
        count += distance(sregex_iterator(
            text.begin(), text.end(), expression), sregex_iterator());
    }
    return count;
}

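So you just pass it an input stream (an input file stream, in your case), and it counts the occurrences of the specified word after creating a regular expression that matches that word.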

int main()
{
    ifstream ifs;
    ifs.open("example_text_file.txt");
    cout << countMatches(ifs, "word_you_want_to_search_for") << endl;
    return 0;
}
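
Because a bare word used as a pattern also matches inside longer words (for example "or" inside "word"), one possible variation is to wrap the pattern in `\b` word-boundary assertions. The sketch below is an assumption-laden illustration rather than part of the original answer: the name `countWholeWord` is hypothetical, and it assumes the search word contains only letters, digits, or underscores, so no regex metacharacters need escaping.

```cpp
#include <cstddef>
#include <istream>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical variant of countMatches that counts whole-word matches only.
// Assumes `word` contains no regex metacharacters.
std::size_t countWholeWord(std::istream& is, const std::string& word)
{
    // \b asserts a word boundary in the default ECMAScript grammar.
    const std::regex expression("\\b" + word + "\\b");
    std::string line;
    std::size_t count = 0;
    while (std::getline(is, line)) {
        // The distance between the first match iterator and the
        // end-of-sequence iterator is the number of matches on this line.
        count += static_cast<std::size_t>(std::distance(
            std::sregex_iterator(line.begin(), line.end(), expression),
            std::sregex_iterator()));
    }
    return count;
}
```

It is called the same way as countMatches above: pass an open input stream and the word to look for.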