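I have the following code, which prints each unique word and its count from a text file (containing >= 30k words). However, it splits words on whitespace only, so I get results like this:
How can I modify the code to specify the delimiters I want?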
template <class KTy, class Ty>
void PrintMap(const map<KTy, Ty>& words)
{
    // `typename` is required because const_iterator is a dependent type.
    typedef typename map<KTy, Ty>::const_iterator iterator;
    for (iterator p = words.begin(); p != words.end(); ++p)
        cout << p->first << ": " << p->second << endl;
}
void UniqueWords(const string& fileName) {
    // Will store the word and count.
    map<string, unsigned int> wordsCount;
    // Begin reading from file:
    ifstream fileStream(fileName);
    // Check if we've opened the file (as we should have).
    if (fileStream.is_open())
    {
        // Read one whitespace-separated word at a time. The loop ends as soon as
        // extraction fails, so no empty word is counted at end of file.
        string word;
        while (fileStream >> word)
        {
            // operator[] inserts a zero count the first time a word is seen.
            ++wordsCount[word];
        }
    }
    else // We couldn't open the file. Report the error in the error stream.
    {
        cerr << "Couldn't open the file." << endl;
    }
    // Print the words map.
    PrintMap(wordsCount);
}
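Answer 0 (score: 2)
You can use a stream that has been imbue()d with a std::ctype<char> facet which treats whichever characters you want as whitespace. Doing so would look something like this: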
#include <iostream>
#include <locale>
#include <string>

struct myctype_table {
    std::ctype_base::mask table[std::ctype<char>::table_size];
    // Zero the table, then mark only the listed characters as whitespace.
    myctype_table(char const* spaces) : table() {
        for (; *spaces; ++spaces) {
            table[static_cast<unsigned char>(*spaces)] = std::ctype_base::space;
        }
    }
};

// myctype_table is listed first so the table is built before
// std::ctype<char> is handed a pointer to it.
class myctype
    : private myctype_table
    , public std::ctype<char> {
public:
    myctype(char const* spaces)
        : myctype_table(spaces)
        , std::ctype<char>(table) {
    }
};

int main() {
    std::locale myloc(std::locale(), new myctype(" \t\n\r?:.,!"));
    std::cin.imbue(myloc);
    for (std::string word; std::cin >> word; ) {
        // words are separated by the extended list of spaces
    }
}
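This code is untested; I'm typing it on a mobile device. I may have misused some of the std::ctype<char> interface, but after fixing names and the like, something along these lines should work.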
Answer 1 (score: 1)
When you find a forbidden character at the end of a word, you can remove it before inserting the word into wordsCount:
// Guard against an empty word before indexing its last character.
if (!word.empty() && (word[word.length()-1] == ';' || word[word.length()-1] == ',' || ....)) {
    word.erase(word.length()-1);
}
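The chain of comparisons above can also be written with std::string::find_last_not_of, which handles several trailing punctuation characters in one step. This is only a sketch: the function name stripTrailing and its default list of forbidden characters are assumptions, not something taken from the answer.

```cpp
#include <string>

// Hypothetical helper: removes every trailing character that appears in
// `forbidden`. An empty or all-punctuation word becomes "".
std::string stripTrailing(std::string word, const std::string& forbidden = ";,.!?:") {
    const std::string::size_type last = word.find_last_not_of(forbidden);
    // If no character is kept, last == npos and last + 1 wraps to 0,
    // so the whole string is erased.
    word.erase(last + 1);
    return word;
}
```

In the question's loop this would be called as word = stripTrailing(word); right after the extraction. Punctuation inside a token, as in "a,b", is left untouched by this approach.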
Answer 2 (score: 0)
After fileStream >> word; you can call this function. See if it is clear:
// Returns a copy of `word` with every forbidden character removed.
string adapt(const string& word) {
    const string forbidden = "!?,.[];";
    string ret;
    for (string::size_type i = 0; i < word.size(); i++) {
        bool ok = true;
        for (string::size_type j = 0; j < forbidden.size(); j++) {
            if (word[i] == forbidden[j]) {
                ok = false;
                break;
            }
        }
        if (ok)
            ret.push_back(word[i]);
    }
    return ret;
}
Something like this:
fileStream >> word;
word = adapt(word);
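The same filtering can be expressed more compactly with the erase-remove idiom. The sketch below assumes C++11 (for the lambda); the name removeForbidden is hypothetical, and the default character list simply mirrors the one used in adapt.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: returns `word` with every character found in
// `forbidden` removed, wherever it occurs in the word.
std::string removeForbidden(std::string word, const std::string& forbidden = "!?,.[];") {
    word.erase(std::remove_if(word.begin(), word.end(),
                              [&forbidden](char c) {
                                  return forbidden.find(c) != std::string::npos;
                              }),
               word.end());
    return word;
}
```

Two caveats apply to any approach that filters after extraction: a token such as "a,b" becomes the single word "ab" rather than two words, and a token made only of punctuation becomes an empty string, which should probably be skipped before counting.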