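I am trying to read all the words from a file and strip the punctuation as I go. Here is the logic that strips the punctuation:
Edit: to be clear, the program actually stops running entirely.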
ifstream file("text.txt");
string str;
string::iterator cur;
for(file>>str; !file.eof(); file>>str){
    for(cur = str.begin(); cur != str.end(); cur++){
        if (!(isalnum(*cur))){
            cur = str.erase(cur);
        }
    }
    cout << str << endl;
    ...
}
Suppose I have a text file containing:
This is a program. It has trouble with (non alphanumeric chars)
But it's my own and I love it...
When I print my string with cout and endl right after this logic, I get:
This
is
a
program
It
has
trouble
with
non
alphanumeric
And that is all I get. Is something wrong with my iterator logic? How can I fix it?
Thanks.
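Answer 0 (score: 4)
The main logic problem I see with the iterator is that it is advanced twice for every non-alphanumeric character: erase already returns an iterator to the next character, and then the cur++ in the for loop advances it again, so the character right after each erased one is skipped. If erase removes the last character of the string, it returns str.end() and the cur++ then moves past the end, which is undefined behavior and would likely explain why the program stops at "chars)".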
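To make the skipping visible, here is a minimal sketch that reproduces the loop from the question inside a function. The name strip_skipping is made up for this illustration, and the function is deliberately left buggy.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper that mirrors the question's loop, bug included.
// Caution: if erase() removes the last character it returns end(), and the
// ++cur that follows steps past end(), which is undefined behavior. Input
// whose final character is alphanumeric avoids that case.
std::string strip_skipping(std::string s) {
    for (auto cur = s.begin(); cur != s.end(); ++cur) {
        if (!std::isalnum(static_cast<unsigned char>(*cur))) {
            // erase() returns an iterator to the character after the erased one
            cur = s.erase(cur);
            // ...and the loop's ++cur then skips that character unchecked
        }
    }
    return s;
}
```

With this sketch, strip_skipping("a..b") returns "a.b": the second dot is never tested because the loop increment steps over it.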
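So something like the following might work:
string next;  // the word to clean, e.g. filled by file >> next
string::iterator cur = next.begin();
while (cur != next.end()) {
    // cast to unsigned char: isalnum is undefined for negative char values
    if (!isalnum(static_cast<unsigned char>(*cur))) {
        cur = next.erase(cur);  // erase already returns the next position
    } else {
        ++cur;                  // advance only when nothing was erased
    }
}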
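This only removes the non-alphanumeric characters. If you need to tokenize your input, you will have to implement a bit more, i.e. remember whether you are inside a word (have read at least one alphanumeric character) and act accordingly.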
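The "remember whether you are inside a word" idea could look roughly like this. It is only a sketch: tokenize_alnum is a hypothetical name, and it assumes every non-alphanumeric character should act as a separator, which is a different rule from the question's code.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: splits text into words, treating every
// non-alphanumeric character as a separator.
std::vector<std::string> tokenize_alnum(const std::string& text) {
    std::vector<std::string> words;
    std::string current;  // non-empty exactly while we are inside a word
    for (unsigned char c : text) {
        if (std::isalnum(c)) {
            current.push_back(static_cast<char>(c));  // still inside a word
        } else if (!current.empty()) {
            words.push_back(current);                 // a separator ends the word
            current.clear();
        }
    }
    if (!current.empty()) {
        words.push_back(current);                     // flush the final word
    }
    return words;
}
```

Note the design consequence: because the apostrophe is a separator here, "it's" becomes the two tokens "it" and "s", whereas erasing the apostrophe in place gives "its". Pick whichever behavior matches what you need.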
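Answer 1 (score: 2)
How about not copying the punctuation in the first place while building the transformed list? OK, this is probably a bit overkill.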
#include <iostream>
#include <fstream>
#include <iterator>
#include <vector>
#include <string>
#include <algorithm>
#include <cctype>
#include <cstdlib>
using namespace std;
// takes the file being processed as only command line param
int main(int argc, char *argv[])
{
    if (argc != 2)
        return EXIT_FAILURE;
    ifstream inf(argv[1]);
    vector<string> res;
    // read whitespace-separated words and keep only their alphanumeric characters
    std::transform(istream_iterator<string>(inf),
                   istream_iterator<string>(),
                   back_inserter(res),
                   [](const string& s) {
                       string tmp;
                       // unsigned char avoids undefined behavior in isalnum for negative values
                       copy_if(s.begin(), s.end(), back_inserter(tmp),
                               [](unsigned char c) { return std::isalnum(c) != 0; });
                       return tmp;
                   });
    // optional dump to output
    copy(res.begin(), res.end(), ostream_iterator<string>(cout, "\n"));
    return EXIT_SUCCESS;
}
Input
All the world's a stage,
And all the men and women merely players:
They have their exits and their entrances;
And one man in his time plays many parts,
His acts being seven ages. At first, the infant,
Mewling and puking in the nurse's arms.
Output
All
the
worlds
a
stage
And
all
the
men
and
women
merely
players
They
have
their
exits
and
their
entrances
And
one
man
in
his
time
plays
many
parts
His
acts
being
seven
ages
At
first
the
infant
Mewling
and
puking
in
the
nurses
arms
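Answer 2 (score: 1)
You should use ispunct to test for punctuation characters. If you also want to filter out control characters, use iscntrl.
Once you have filtered out the punctuation, you can split on spaces and newlines to get the words.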
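A minimal sketch of that approach, assuming the default "C" locale and that the whole text is already in a string. The name words_without_punct is hypothetical, and the erase-remove idiom is swapped in for the hand-written iterator loop.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: drop punctuation first, then split on whitespace.
std::vector<std::string> words_without_punct(std::string text) {
    // Erase-remove idiom: remove_if shifts the kept characters to the front,
    // then erase trims the leftover tail in a single step.
    text.erase(std::remove_if(text.begin(), text.end(),
                              [](unsigned char c) { return std::ispunct(c) != 0; }),
               text.end());

    // operator>> skips spaces and newlines, so it does the splitting for us.
    std::vector<std::string> words;
    std::istringstream in(text);
    std::string word;
    while (in >> word) {
        words.push_back(word);
    }
    return words;
}
```

Because no iterator is advanced by hand, the double-increment problem from the question cannot occur here. Adding std::iscntrl(c) to the predicate would not be safe as written, since newlines are control characters and are needed as separators; that case would need extra handling.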