Question

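In C++, I want to read words sequentially from a text file and store each word in an array. After that, I will perform some operations on this array. But I don't know how to handle the first stage: reading the words sequentially from the text file and storing each one in the array. I should skip punctuation, including ".", ",", and "?".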

Answer 1

You need to use streams for this. Take a look at the examples here: Input/Output with files

Answer 2

Here is a complete program that reads words from a file named "filename", stores them in a std::vector, and removes punctuation from the words.

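#include <algorithm>  // std::transform, std::remove_if
#include <fstream>    // std::ifstream
#include <iterator>   // std::istream_iterator
#include <string>
#include <vector>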

struct is_punct {
    bool operator()(char c) const {
        static const std::string punct(",.:;!?");
        return punct.find(c) != std::string::npos;
    }
};

int main(int argc, char* argv[])
{
    std::ifstream in("filename");
    std::vector<std::string> vec((std::istream_iterator<std::string>(in)),
                                 std::istream_iterator<std::string>());
    std::transform(vec.begin(), vec.end(),
                   vec.begin(),
                   [](std::string s) {
                       s.erase(std::remove_if(s.begin(), s.end(), is_punct()),
                               s.end());
                       return s;
                   });
    // manipulate vec
}

Answer 3

This sounds like homework. If it is, please be upfront about it.

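First, using a raw array in C++ is almost always a bad idea; a vector is a much better choice. As for your question about punctuation, that is up to your customer, but my inclination is to split on whitespace.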

Anyway, here is a simple way to do it that takes advantage of the fact that operator>>(istream&, string&) splits on whitespace by default.

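// needs <algorithm>, <fstream>, <iterator>, <string>, <vector>
std::ifstream infile("/path/to/file.txt");
std::vector<std::string> words;
std::copy(std::istream_iterator<std::string>(infile),  // the stream is named infile, not file
          std::istream_iterator<std::string>(),
          std::back_inserter(words));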
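
To make the whitespace-splitting behavior concrete, here is a minimal sketch of the same copy wrapped in a function that accepts any input stream. The name read_words is a hypothetical helper introduced for illustration, not part of the answer above.

```cpp
#include <algorithm>
#include <istream>
#include <iterator>
#include <sstream>   // std::istringstream, handy for trying this without a file
#include <string>
#include <vector>

// Hypothetical helper (an assumption, not from the answer): collect the
// whitespace-separated tokens of any input stream, e.g. a std::ifstream
// or a std::istringstream.
std::vector<std::string> read_words(std::istream& in)
{
    std::vector<std::string> words;
    std::copy(std::istream_iterator<std::string>(in),
              std::istream_iterator<std::string>(),
              std::back_inserter(words));
    return words;
}
```

Called on a std::istringstream holding "Hello, world. How are you?", this returns five tokens, and the punctuation stays attached ("Hello,", "world.", "you?"). Stripping it is a separate step.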

Answer 4

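Do you know how many words you will be reading? If not, you will need to grow the array as you read more and more words. The easiest way to do that is to let a standard container handle it for you: std::vector. Reading words separated by whitespace is easy, because that is the default behavior of operator>> when it reads a std::string from a std::ifstream. Removing punctuation takes some extra work, so we will get to that later.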

The basic workflow for reading words from a file looks like this:

#include <fstream>
#include <string>
#include <vector>

int main()
{
    std::vector<std::string> words;
    std::string w;
    std::ifstream file("words.txt");  // opens the file for reading

    while (file >> w)  // read one word from the file, stops at end-of-file
    {
        // do some work here to remove punctuation marks            
        words.push_back(w);
    }

    return 0;
}

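Assuming you are doing homework here, the real key is learning how to remove the punctuation from w before adding it to the vector. I would look into the following concepts to help you: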

The erase-remove idiom. Note that a std::string behaves like a container of char.
std::remove_if
The ispunct function in the cctype library

Feel free to post more questions if you run into trouble.
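
For readers who want to see how those three pieces fit together, here is one possible sketch. The function name strip_punctuation is hypothetical, and it assumes that everything std::ispunct reports as punctuation in the default "C" locale should be dropped, which includes apostrophes and hyphens.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper (an assumption, not from the answer): return a copy of
// `word` with every punctuation character removed.
std::string strip_punctuation(std::string word)
{
    // std::remove_if shifts the characters we keep to the front and returns
    // an iterator to the new logical end; erase() then drops the leftover tail.
    word.erase(std::remove_if(word.begin(), word.end(),
                              // take unsigned char: passing a negative char
                              // to std::ispunct is undefined behavior
                              [](unsigned char c) { return std::ispunct(c) != 0; }),
               word.end());
    return word;
}
```

In the loop above this would be used as words.push_back(strip_punctuation(w)). A token made up only of punctuation comes back empty, so you may want to skip empty results.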

Answer 5

Yet another possibility, using (as I usually do) a special facet:

#include <cstddef>
#include <fstream>
#include <iterator>
#include <locale>
#include <string>
#include <vector>

class my_ctype : public std::ctype<char> {
public:
    // static, because it is called before the base class has been constructed
    static mask const *get_table() {
        // this copies the "classic" table used by <ctype.h>:
        static std::vector<std::ctype<char>::mask>
            table(classic_table(), classic_table()+table_size);

        // Anything we want to separate tokens, we mark its spot in the table as 'space'.
        table[','] = (mask)space;
        table['.'] = (mask)space;
        table['?'] = (mask)space;

        // and return a pointer to the table:
        return &table[0];
    }
    my_ctype(std::size_t refs=0) : std::ctype<char>(get_table(), false, refs) { }
};

With that in place, reading the words is quite simple:

int main(int argc, char **argv) {
    if (argc < 2) return 1;          // expects the file name as the first argument
    std::ifstream infile(argv[1]);   // open the file.

    infile.imbue(std::locale(std::locale(), new my_ctype()));  // use our classifier

    // Create a vector containing the words from the file:
    std::vector<std::string> words(
        (std::istream_iterator<std::string>(infile)),
        std::istream_iterator<std::string>());

    // and now we're ready to process the words in the vector
    // though it might be worth considering using `std::transform`, to take
    // the input from the file and process it directly.
}
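
The std::transform suggestion in the closing comment could look roughly like the sketch below. Both read_processed and to_lower are hypothetical names, and lower-casing is only a stand-in for whatever per-word processing is actually needed.

```cpp
#include <algorithm>
#include <cctype>
#include <istream>
#include <iterator>
#include <sstream>   // std::istringstream, handy for trying this without a file
#include <string>
#include <vector>

// Hypothetical processing step (an assumption): lower-case a single word.
std::string to_lower(std::string word)
{
    std::transform(word.begin(), word.end(), word.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return word;
}

// Hypothetical helper: read words from `in` and process each one on its way
// into the vector, instead of storing the raw words and looping a second time.
std::vector<std::string> read_processed(std::istream& in)
{
    std::vector<std::string> words;
    std::transform(std::istream_iterator<std::string>(in),
                   std::istream_iterator<std::string>(),
                   std::back_inserter(words),
                   to_lower);
    return words;
}
```

Because the function only takes a std::istream&, it should work equally well on a file stream that has already been imbued with a custom facet, so token splitting and per-word processing stay independent of each other.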
