Question

Can anyone help me remove the stop words? I can't ... they still show up after running!

#include <iostream>
#include <cmath>
#include <fstream>
#include <cstdlib>
using namespace std;

int main()
{
    char filename[50];    //open file
    ifstream example;
    cin.getline(filename , 50);
    example.open(filename);
    if(!example.is_open())
    {
        exit(EXIT_FAILURE);
    }
    char word[50];
    example>>word;
    while (example.good()&&word!="a"&& word!="an"&&word!="be"&& word!="at"&&  word!="the")
    {
        cout <<word<<" "; // remove stopwords
        example>>word;
    }

    system("PAUSE");
    return 0;
}

Answer 1

You cannot compare C strings with the == operator (or with !=, as your code does). The simplest way to solve your problem is to use std::string:

string word;
example >> word;
while (example.good() && word != "a" && word != "an" && word != "be" && word != "at" && word != "the")
{
    cout << word << " "; // remove stopwords
    example >> word;
}

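On the other hand, this will not actually remove all the stop words, as you put it. It only prints words until the first stop word is read, and then the whole loop stops.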

Answer 2

The problem is that you are using C-style strings, which are hard to use correctly. The simplest option is to use the C++ string library:

#include <string>

std::string word;

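and the rest of your program should work as expected. This also prevents the nasty stack corruption bug your program would hit when an input word is too long.

If you really want to use character arrays for educational purposes, then you need to use the C string library to compare them: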

#include <cstring>

if (std::strcmp(word, "a") != 0 && ...)

Your code compares the address of the array holding the input word with the addresses of the string literals; these will never be equal.
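If you stay with character arrays, the chain of strcmp calls can be wrapped in a small helper so the loop condition stays readable. This is only a sketch: the name isStopWord is hypothetical, and the word list is the one from the question.

```cpp
#include <cstring>

// Hypothetical helper (name chosen for this sketch): returns true when
// `word` has the same contents as one of the stop words.
bool isStopWord(const char* word)
{
    static const char* const stopWords[] = {"a", "an", "be", "at", "the"};
    for (const char* stop : stopWords)
    {
        // strcmp compares the characters, not the addresses,
        // and returns 0 when the two strings are equal.
        if (std::strcmp(word, stop) == 0)
        {
            return true;
        }
    }
    return false;
}
```

The reading loop can then test !isStopWord(word) in place of the != comparisons. To avoid overrunning the 50-character buffer, the extraction can be limited with std::setw from <iomanip>, for example example >> std::setw(50) >> word;.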

Answer 3

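When removing stop words, you should not remove only some of them.

In addition, you should apply the Porter algorithm to your code snippet.

If you want to see the filtered text, you have to apply the Porter Stemmer with regard to string similarity.

Yes, it is in C, but handling only a few words (as in your question) does not make an adequate stop word removal program. If you really want to stem in addition to removing stop words, the C code will give you an idea. It depends on the purpose.

I did this kind of filtering on two text snippets back in 2008. The two are related.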

HTH

Answer 4

A decent compiler with warnings turned on will diagnose the problem for you. Here is what mine says:

warning: result of comparison against a string literal is unspecified (use strncmp instead)
      [-Wstring-compare]
    while (example.good()&&word!="a"&& word!="an"&&word!="be"&& word!="at"&&  word!="the")
                               ^ ~~~

Stop word removal in C++ code
