Question

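This is another question I can't seem to find an answer to, because every example I can find uses vectors, and my teacher won't let us use vectors for this class.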

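I need to read in a plain-text version of a book one word at a time, using (any number of) blank spaces ' ' and (any number of) non-letter characters as delimiters. In other words, any amount of whitespace or punctuation needs to separate words. Here's how I did it when I only needed blank spaces as the delimiter: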

while(getline(inFile, line)) {
    istringstream iss(line);

    while (iss >> word) {
        table1.addItem(word);
    }
}

Edit: Here is an example of the text being read in, and how I need it split up.

"If they had known;; you ... wished it, the entertainment.would have"

Here's how the first line would need to be split:

If
they
had
known
you
wished
it
the
entertainment
would
have

The text contains at least all of the standard punctuation marks, but also ellipses (...), double dashes (--), etc.

As always, thanks in advance.

Edit:

So would using a second stringstream look something like this?

while(getline(inFile, line)) {
    istringstream iss(line);

    while (iss >> word) {
        istringstream iss2(word);

        while(iss2 >> letter)  {
            if (!isalpha(static_cast<unsigned char>(letter))) {
                // do something?
            }
        }
        // do something else?
        table1.addItem(word);
    }
}

Answer 1

I haven't tested this, since I don't have a g++ compiler in front of me right now, but it should work (barring minor C++ syntax errors).

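#include <algorithm>   // std::remove_if
#include <cctype>      // std::isalnum

while (getline(inFile, line))
{
    istringstream iss(line);

    while (iss >> word)
    {
        // Remove every non-alphanumeric character from the word.
        // The lambda takes unsigned char, because passing a negative char
        // to isalnum is undefined behavior.
        word.erase(std::remove_if(word.begin(), word.end(),
                                  [](unsigned char c){ return !std::isalnum(c); }),
                   word.end());
        if (!word.empty())
            table1.addItem(word);
    }
}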

Answer 2

If you're free to use Boost, you can do the following:

$ cat kk.txt
If they had known;; you ... wished it, the entertainment.would have

You can customize the tokenizer's behavior if needed, but the defaults should be sufficient here.

#include <iostream>
#include <fstream>
#include <string>

#include <boost/tokenizer.hpp>

int main()
{
  std::ifstream is("./kk.txt");
  std::string line;

  while (std::getline(is, line)) {
    boost::tokenizer<> tokens(line);

    for (const auto& word : tokens)
      std::cout << word << '\n';
  }

  return 0;
}

And finally:

$ ./a.out
If
they
had
known
you
wished
it
the
entertainment
would
have

stringstream with multiple delimiters
