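I am trying to read a text into a vector of strings and then build a dictionary that keeps a count of how many times each word occurs. So far it only loads the first word of the text, and I don't know how to proceed. I know I'm a bit unclear on how to use these member functions properly.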
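#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

// Declared before main() so the call below compiles.
void make_dictionary(istream& file, vector<string>& words, vector<int>& count);

int main()
{
    ifstream input1;
    input1.open("Base_text.txt");
    vector<string> base_file;
    vector<int> base_count;
    if (input1.fail())
    {
        cout<<"Input file 1 opening failed."<<endl;
        exit(1);
    }
    make_dictionary(input1, base_file, base_count);
}

void make_dictionary(istream& file, vector<string>& words, vector<int>& count)
{
    string line;
    while (file>>line)
    {
        words.push_back(line);
    }
    cout<<words[0];
}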
Expected output:
This is some simple base text to use for comparison with other files.
You may use your own if you so choose; your program shouldn't actually care.
For getting interesting results, longer passages of text may be useful.
In theory, a full novel might work, although it will likely be somewhat slow.
Actual output:
This
Answer 0 (score: 1)
Well, you only print the first word (the idea here is to show you why you've got to love the STL):
cout<<words[0];
You can use
for(string& word : words) cout<<word;
or
for(size_t i=0; i<words.size(); ++i) cout<<words[i];
to print them all. A very simple solution for counting the words is to use a map instead of a vector:
map<string,size_t> words;
...
string word;
while (file>>word) ++words[word];
...
for(const auto& w : words) cout<<endl<<w.first<<":"<<w.second;
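If the two parallel vectors from the question have to stay, the map can still do the counting and be copied out afterwards. The following is only a sketch under that assumption: it reuses the question's make_dictionary signature, and the local name tally is made up for illustration.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Sketch: same signature as in the question. Counts with a map, then
// copies the result into two parallel vectors (words[i] occurs count[i] times).
void make_dictionary(std::istream& file,
                     std::vector<std::string>& words,
                     std::vector<int>& count)
{
    std::map<std::string, int> tally;   // hypothetical local name
    std::string word;
    while (file >> word)
        ++tally[word];                  // operator[] starts missing keys at 0

    words.clear();
    count.clear();
    for (const auto& entry : tally) {   // std::map iterates in key order
        words.push_back(entry.first);
        count.push_back(entry.second);
    }
}
```

Called on a std::istringstream holding "to be or not to be", this leaves words as be, not, or, to and count as 2, 1, 1, 2.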
WhozCraig posed a challenge: order the words by frequency.
multimap<int,string,greater<int>> byFreq;
for(const auto& w : words) byFreq.insert( make_pair(w.second, w.first));
for(const auto& w : byFreq) cout<<endl<<w.second <<":"<<w.first;
#include <iostream>
#include <map>
#include <string>
#include <functional>
#include <utility>
#include <cctype>
using namespace std;
int main()
{
    map<string,size_t> words;
    string word;
    while (cin>>word)
    {
        // Lowercase the word; the cast avoids undefined behaviour for negative char values.
        for(char&c:word)c=static_cast<char>(tolower(static_cast<unsigned char>(c)));
        ++words[word];
    }
    cout<<" ----- By word: ------" ;
    for(const auto& w : words) cout<<endl<<w.first<<":"<<w.second;
    cout<<endl<<endl<<" ----- By frequency: ------";
    // The comparator type matches the key type, so the highest counts come first.
    multimap<size_t,string,greater<size_t>> byFreq;
    for(const auto& w : words) byFreq.insert( make_pair(w.second, w.first) );
    for(const auto& w : byFreq) cout<<endl<<w.second <<":"<<w.first;
    return 0;
}
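Answer 1 (score: 1)
I guess you have to move cout << words[0] inside the loop; otherwise it is only called once, after the loop has ended. However, that would just print the first word on every iteration. So, to print the most recently read word each time: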
while (file>>line)
{
words.push_back(line);
cout<<words.back(); // or cout << line, same thing really
}
One last thing: while (file >> line) reads word by word, not line by line as the variable name suggests. If you want whole lines, use while (getline(file, line)).
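To make the difference concrete, here is a minimal sketch with both loops side by side. The helper names read_words and read_lines are made up for this illustration and do not appear in the question.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// operator>> skips any whitespace (spaces and newlines alike),
// so every extraction yields a single word.
std::vector<std::string> read_words(std::istream& in)
{
    std::vector<std::string> words;
    std::string word;
    while (in >> word)
        words.push_back(word);
    return words;
}

// std::getline stops at each '\n' and keeps the spaces inside the line.
std::vector<std::string> read_lines(std::istream& in)
{
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line))
        lines.push_back(line);
    return lines;
}
```

Fed the two-line text "This is some\nsimple base text\n" through a std::istringstream, read_words returns six elements and read_lines returns two.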
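Answer 2 (score: 1)
Reading the words of a text file into a vector of strings is fairly straightforward. The code below assumes that the name of the file to parse is the first command-line argument.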
#include <cstdlib>
#include <iostream>
#include <fstream>
#include <iterator>
#include <vector>
#include <string>
#include <map>
using namespace std;
int main(int argc, char *argv[])
{
    if (argc < 2)
        return EXIT_FAILURE;
    // open file and read all words into the vector.
    ifstream inf(argv[1]);
    istream_iterator<string> inf_it(inf), inf_eof;
    vector<string> words(inf_it, inf_eof);
    // for populating a word-count dictionary:
    map<string, unsigned int> dict;
    for (auto &it : words)
        ++dict[it];
    // print the dictionary
    for (auto &it : dict)
        cout << it.first << ':' << it.second << endl;
    return EXIT_SUCCESS;
}
However, you should (or at least can) combine the two operations into a single loop and avoid the intermediate vector entirely:
#include <cstdlib>
#include <iostream>
#include <fstream>
#include <string>
#include <map>
using namespace std;
int main(int argc, char *argv[])
{
    if (argc < 2)
        return EXIT_FAILURE;
    // open file and count each word as it is read; no vector is needed.
    ifstream inf(argv[1]);
    map<string, unsigned int> dict;
    string str;
    while (inf >> str)
        ++dict[str];
    // print the dictionary
    for (auto &it : dict)
        cout << it.first << ':' << it.second << endl;
    return EXIT_SUCCESS;
}
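Sorting this from highest to lowest count is not quite as simple, but it can be done by copying the entries into a vector and sorting that with std::sort(). Stripping leading and trailing non-alphabetic characters (punctuation) would be a further enhancement. Another is to reduce each word to all lowercase before inserting it into the map, which lets Ball and ball share a single dictionary slot with a count of 2.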
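A rough sketch of those three enhancements together might look like the following. The function names normalize, count_words and sorted_by_frequency are made up for this example, and breaking ties alphabetically is an assumption of the sketch rather than something the answer specifies.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Trims leading and trailing non-letters, then lowercases what is left.
// Characters inside the word (such as an inner apostrophe) are kept.
std::string normalize(const std::string& raw)
{
    auto is_alpha = [](unsigned char c) { return std::isalpha(c) != 0; };
    auto first = std::find_if(raw.begin(), raw.end(), is_alpha);
    auto last = std::find_if(raw.rbegin(), raw.rend(), is_alpha).base();
    if (first >= last)
        return std::string();           // no letters at all
    std::string word(first, last);
    for (char& c : word)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return word;
}

// Counts normalized words; tokens that normalize to nothing are skipped.
std::map<std::string, std::size_t> count_words(std::istream& in)
{
    std::map<std::string, std::size_t> dict;
    std::string token;
    while (in >> token) {
        const std::string word = normalize(token);
        if (!word.empty())
            ++dict[word];
    }
    return dict;
}

// Copies the map into a vector and sorts it: highest count first,
// alphabetical order among equal counts (an assumption of this sketch).
std::vector<std::pair<std::string, std::size_t>>
sorted_by_frequency(const std::map<std::string, std::size_t>& dict)
{
    std::vector<std::pair<std::string, std::size_t>> entries(dict.begin(), dict.end());
    std::sort(entries.begin(), entries.end(),
              [](const std::pair<std::string, std::size_t>& a,
                 const std::pair<std::string, std::size_t>& b) {
                  if (a.second != b.second)
                      return a.second > b.second;
                  return a.first < b.first;
              });
    return entries;
}
```

To use it, pass an open std::ifstream (or a std::istringstream for a quick check) to count_words, hand the result to sorted_by_frequency, and print each pair's first and second members.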
Answer 3 (score: 0)
I have the following implementation, which attempts to convert the words to lowercase and strip punctuation.
#include<iostream>
#include<iterator>
#include<algorithm>
#include<cctype>
#include<fstream>
#include<string>
#include<vector>
#include<unordered_map>
int main() {
    std::vector<std::string> words;
    {
        // Read every whitespace-separated token of the file into the vector.
        std::ifstream fp("file.txt", std::ios::in);
        std::copy(std::istream_iterator<std::string>(fp),
                  std::istream_iterator<std::string>(),
                  std::back_insert_iterator<std::vector<std::string>>(words));
    }
    std::unordered_map<std::string, int> frequency;
    for(auto it=words.begin(); it!=words.end(); ++it) {
        std::string word;
        // Keep only the letters, then lowercase them. Taking unsigned char
        // keeps the <cctype> calls well defined for every char value.
        std::copy_if(it->begin(), it->end(),
                     std::back_insert_iterator<std::string>(word),
                     [](unsigned char c) { return std::isalpha(c) != 0; });
        std::transform(word.begin(), word.end(), word.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        frequency[word]++;
    }
    for(auto p:frequency) {
        std::cout<<p.first<<" => "<<p.second<<std::endl;
    }
    return 0;
}
If file.txt has the following contents:
hello hello hello bye BYE dog DOG' dog.
word Word worD w'ord
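the program will produce (std::unordered_map has no defined iteration order, so the lines may come out in a different order):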
word => 4
dog => 3
bye => 2
hello => 3