Trying to read from a file and skip punctuation in C++, tips?

Time: 2011-02-28 23:02:03

Tags: c++ text

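I'm trying to read from a file and build a vector of all the words in it. What I tried to do below is have the user input the filename, then have the code open the file, skip any characters that aren't alphanumeric, and then input the result to a file.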

Right now it just closes immediately when I input the filename. Any idea what I could be doing wrong?

#include <vector>
#include <string>
#include <iostream>
#include <iomanip>
#include <fstream>
using namespace std;

int main() 
{

string line; //for storing words
vector<string> words; //unspecified size vector
string whichbook;
cout << "Welcome to the book analysis program. Please input the filename of the book you would like to analyze: ";
cin >> whichbook;
cout << endl;

ifstream bookread;
//could be issue
//ofstream bookoutput("results.txt"); 

bookread.open(whichbook.c_str());
//assert(!bookread.fail());

if(bookread.is_open()){
    while(bookread.good()){
        getline(bookread, line);
        cout << line;
        while(isalnum(bookread)){
            words.push_back(bookread);
        }
    }
}
cout << words[];
}

2 answers:

Answer 0 (score: 2)

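I think I'd do the job a bit differently. Since you want to ignore everything except alphanumeric characters, I'd start by defining a locale that treats all other characters as whitespace: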

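#include <algorithm>  // std::fill, std::copy
#include <iostream>
#include <iterator>   // std::istream_iterator, std::ostream_iterator
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Despite its name, this facet keeps letters as well as digits;
// every other character is classified as whitespace.
struct digits_only: std::ctype<char> {
    digits_only(): std::ctype<char>(get_table()) {}

    static std::ctype_base::mask const* get_table() {
        // start with every character classified as whitespace
        static std::vector<std::ctype_base::mask> 
            rc(std::ctype<char>::table_size,std::ctype_base::space);

        // std::fill takes a half-open range, so each end pointer is one past the last character
        std::fill(&rc['0'], &rc['9'] + 1, std::ctype_base::digit);
        std::fill(&rc['a'], &rc['z'] + 1, std::ctype_base::lower);
        std::fill(&rc['A'], &rc['Z'] + 1, std::ctype_base::upper);
        return &rc[0];
    }
};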

This makes reading words/numbers from a stream quite trivial. For example:

int main() {
    char const test[] = "This is a bunch=of-words and 2@numbers#4(with)stuff to\tseparate,them, I think.";
    std::istringstream infile(test);
    infile.imbue(std::locale(std::locale(), new digits_only));

    std::copy(std::istream_iterator<std::string>(infile),
              std::istream_iterator<std::string>(),
              std::ostream_iterator<std::string>(std::cout, "\n"));

    return 0;
}

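For the moment I've copied the words/numbers to standard output, but copying to a vector just means giving std::copy a different iterator. For real use, we'd undoubtedly want to get the data from an std::ifstream as well, but (again) that's just a matter of supplying the right iterator. Just open the file, imbue it with the locale, and read your words/numbers. All the punctuation and so on will be ignored automatically.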

Answer 1 (score: 0)

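The following reads each line, skips the non-alphanumeric characters, and adds each line as an item to the output vector. You can adapt it to output words instead of lines. I didn't want to provide the entire solution, as this looks a bit like a homework problem.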

#include <cctype> // for isalnum
#include <vector>
#include <sstream>
#include <string>
#include <iostream>
#include <iomanip>
#include <fstream>
using namespace std;


int main() // standard entry point; _tmain/_TCHAR are MSVC-specific and need <tchar.h>
{   
    string line; //for storing words
    vector<string> words; //unspecified size vector
    string whichbook;
    cout << "Welcome to the book analysis program. Please input the filename of the book you would like to analyze: ";
    cin >> whichbook;
    cout << endl;

    ifstream bookread;
    //could be issue
    //ofstream bookoutput("results.txt"); 

    bookread.open(whichbook.c_str());
    //assert(!bookread.fail());

    if(bookread.is_open()){
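        // loop on getline itself: testing eof() first would push one extra
        // empty entry after the last line of a file that ends with a newline
        while(getline(bookread, line)){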


            string lineToAdd = "";

            for(string::size_type i = 0 ; i < line.size(); ++i)
            {
                // cast to unsigned char: passing a negative char to isalnum is undefined behavior
                unsigned char c = static_cast<unsigned char>(line[i]);
                if(isalnum(c) || c == ' ')
                    lineToAdd += line[i]; // keep letters, digits and spaces; drop everything else
            }

            words.push_back(lineToAdd);

        }
    }
    for(string::size_type i = 0 ; i < words.size(); ++i)
        cout << words[i] << " ";


    return 0;
}
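
The answer leaves the adaptation from lines to words to the reader. One possible way to do it is sketched below. The helper name `append_words` is hypothetical and not part of the answer; the idea is to call it once per line read with getline instead of pushing the whole cleaned line.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper (not from the answer): splits one line into runs of
// alphanumeric characters and appends each run to `out`.
void append_words(const std::string& line, std::vector<std::string>& out) {
    std::string current; // the word being built
    for (std::string::size_type i = 0; i < line.size(); ++i) {
        // Cast first: passing a negative char to isalnum is undefined behavior.
        unsigned char c = static_cast<unsigned char>(line[i]);
        if (std::isalnum(c)) {
            current += line[i];
        } else if (!current.empty()) {
            // Any non-alphanumeric character ends the current word.
            out.push_back(current);
            current.clear();
        }
    }
    if (!current.empty())
        out.push_back(current); // word that runs to the end of the line
}
```

Because every non-alphanumeric character acts as a separator here, a contraction such as "it's" is split into "it" and "s". Whether that is acceptable depends on what the book analysis needs.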