Question

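I have a problem. In my project I read a dataset file line by line, and each line holds one sentence. After that I need to split each sentence into words, but I can't figure out how to do it.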

Here is the class that reads the dataset:

class Input{
...
public:
string *word;
string *sentence;
Couple *couple;    // (int x, int y): index of the sentence, and of the word within it
int number;
int line;
...
void readInput(string input);
};

Here is the read method:

void Input::readInput(string input)
{
    cout << "Reading the " << input << endl;

    ifstream infile;
    infile.open(input.c_str());

    if(!infile.is_open()){
        cerr << "Unable to open file: " << input << endl << endl;
        exit(-1);
    }

    for(int i=0; i<line; i++){
        getline(infile, sentence[i]);
        //infile >> sentence[i];
    }

    for(int j=0; j<line; j++){
        // I want to separate sentences into words
    }

    infile.close();
    cout << "Finished Reading the " << input << endl;
}

Answer 1

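for(int j=0; j<line; ++j)
{
    // Wrap the sentence in a stream (needs #include <sstream>);
    // >> then extracts one whitespace-delimited word at a time.
    std::istringstream iss(sentence[j]);
    for (std::string w; iss >> w; )
    {
        word[number] = w;   // assumes number starts at 0 and word has enough room
        ++number;
    }
}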

You will need to do something about punctuation if you don't want it attached to your words. That part should be straightforward.

Answer 2

If I were doing it your way, inside this loop:

for(int j=0;j<line ;j++){
    // I want to separate sentences into words
}

I would use a regular expression to match all the words in sentence[j]. Boost.Regex is a library I have used very successfully in the past.

Answer 3

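You could walk through the std::string that holds each line, using std::string::find_first_of() to locate the end of each word. The argument to find_first_of is the set of characters that separate words in your text file (spaces, periods, and so on).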

How to split sentence[i] (a string) into words (strings) in C++
