Reading a passage of text into a vector of strings

Time: 2013-04-26 19:37:06

Tags: c++ string vector

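I'm trying to read a passage of text into a vector of strings and then build a dictionary that keeps a count of how many times each word occurs. So far it only loads the first word of the text, and I don't know how to proceed. I know I'm a bit unclear on how to use these member functions properly.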

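#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

void make_dictionary(istream& file, vector<string>& words, vector<int>& count);

int main()
{
    ifstream input1;
    input1.open("Base_text.txt");

    vector<string> base_file;
    vector<int> base_count;

    if (input1.fail())
    {
        cout << "Input file 1 opening failed." << endl;
        exit(1);
    }

    make_dictionary(input1, base_file, base_count);
}

void make_dictionary(istream& file, vector<string>& words, vector<int>& count)
{
    string line;

    while (file >> line)
    {
        words.push_back(line);
    }

    cout << words[0];  // only words[0] is printed
}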

Expected output:

This is some simple base text to use for comparison with other files.
You may use your own if you so choose; your program shouldn't actually care.
For getting interesting results, longer passages of text may be useful.
In theory, a full novel might work, although it will likely be somewhat slow.

Actual output:

This 
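The answers below switch to std::map, but the question's make_dictionary takes two parallel vectors. As a minimal sketch that keeps that original design (same signature as in the question; print_dictionary is a hypothetical helper added here), a new word can be appended with a count of 1, and a repeated word can have the count at its index incremented. The lookup is a linear std::find, so this is only reasonable for small texts.

```cpp
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <sstream>  // only included so the function can be tried on in-memory text
#include <string>
#include <vector>

// Reads whitespace-separated words from `file`. Each distinct word is stored
// once in `words`; its number of occurrences is kept at the same index in
// `count`, so the two vectors always have the same size.
void make_dictionary(std::istream& file, std::vector<std::string>& words,
                     std::vector<int>& count)
{
    std::string word;
    while (file >> word)  // operator>> extracts one word, not one line
    {
        auto it = std::find(words.begin(), words.end(), word);
        if (it == words.end())
        {
            words.push_back(word);  // first occurrence: new entry with count 1
            count.push_back(1);
        }
        else
        {
            // repeated word: bump the counter stored at the same index
            ++count[static_cast<std::size_t>(it - words.begin())];
        }
    }
}

// Hypothetical helper: prints every distinct word and its count, one per line.
void print_dictionary(const std::vector<std::string>& words,
                      const std::vector<int>& count)
{
    for (std::size_t i = 0; i < words.size(); ++i)
        std::cout << words[i] << ": " << count[i] << '\n';
}
```

Called from the question's main as make_dictionary(input1, base_file, base_count); followed by print_dictionary(base_file, base_count);, this should print each distinct word of Base_text.txt once, together with its count.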

4 answers:

Answer 0 (score: 1)

Well, you only print the first word (the point here is to show you why you should love the STL):

cout<<words[0];

You can use either

for(string& word : words)             cout<<word<<' ';

or

for(size_t i=0; i<words.size(); ++i)  cout<<words[i]<<' ';

to print them all. A very simple way to count the words is to use a map instead of a vector:

map<string,size_t> words;
...
string word;
while (file>>word)           ++words[word];
...
for(const auto& w : words)  cout<<endl<<w.first<<":"<<w.second;

WhozCraig raised a challenge: order the words by frequency:

multimap<size_t,string,greater<size_t>> byFreq;
for(const auto& w : words)  byFreq.insert( make_pair(w.second, w.first));
for(const auto& w : byFreq) cout<<endl<<w.second <<":"<<w.first;

Putting it all together (ideone):

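#include <cctype>
#include <functional>
#include <iostream>
#include <map>
#include <string>
#include <utility>
using namespace std;

int main()
{
   map<string,size_t> words;
   string word;

   while (cin>>word)
   {
       // lower-case each word so "The" and "the" are counted together;
       // the unsigned char cast avoids undefined behavior for negative chars
       for(char&c:word) c=static_cast<char>(tolower(static_cast<unsigned char>(c)));
       ++words[word];
   }
   cout<<"  ----- By word: ------" ;
   for(const auto& w : words)  cout<<endl<<w.first<<":"<<w.second;
   cout<<endl<<endl<<" ----- By frequency: ------";
   // greater<size_t> orders the keys (the counts) from highest to lowest
   multimap<size_t,string,greater<size_t>> byFreq;
   for(const auto& w : words)  byFreq.insert( make_pair(w.second, w.first) );
   for(const auto& w : byFreq) cout<<endl<<w.second <<":"<<w.first;
   cout<<endl;
   return 0;
}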

Answer 1 (score: 1)

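I guess you need to move cout << words[0] inside the loop; otherwise it is called only once, after the loop has finished. But then it would print the first word on every iteration. So print the last word each time instead: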

while (file>>line)
{
     words.push_back(line);
     cout<<words.back()<<' '; // or cout << line, same thing really
}

One last thing: while (file >> line) reads word by word, not line by line as the variable name suggests. If you need whole lines, use while (getline(file, line)).
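If line boundaries matter, one possible sketch (read_words_by_line is a hypothetical name) combines both approaches: it reads each line with std::getline and then splits that line into words with a std::istringstream, so the words stay grouped by the line they came from.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Reads `in` one line at a time with std::getline, then splits each line into
// whitespace-separated words with a std::istringstream. The result holds one
// vector of words per input line, so line boundaries are preserved.
std::vector<std::vector<std::string>> read_words_by_line(std::istream& in)
{
    std::vector<std::vector<std::string>> lines;
    std::string line;
    while (std::getline(in, line))   // one full line, newline removed
    {
        std::istringstream line_stream(line);
        std::vector<std::string> words;
        std::string word;
        while (line_stream >> word)  // one word at a time within this line
            words.push_back(word);
        lines.push_back(std::move(words));
    }
    return lines;
}
```

Passing the question's input1 stream instead of std::cin or a string stream works the same way, since std::ifstream derives from std::istream.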

Answer 2 (score: 1)

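Reading the words of a text file into a vector of strings is very simple. The code below assumes that the name of the file being parsed is the first command-line argument.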

#include <cstdlib>
#include <fstream>
#include <iostream>
#include <iterator>
#include <map>
#include <string>
#include <vector>
using namespace std;

int main(int argc, char *argv[])
{
    if (argc < 2)
        return EXIT_FAILURE;

    // open file and read all words into the vector.
    ifstream inf(argv[1]);
    if (!inf)
        return EXIT_FAILURE;
    istream_iterator<string> inf_it(inf), inf_eof;
    vector<string> words(inf_it, inf_eof);

    // for populating a word-count dictionary:
    map<string, unsigned int> dict;
    for (const auto &it : words)
        ++dict[it];

    // print the dictionary
    for (const auto &it : dict)
        cout << it.first << ':' << it.second << endl;

    return EXIT_SUCCESS;
}

However, you should (or at least could) combine the two operations into a single loop and avoid the intermediate vector entirely:

#include <cstdlib>
#include <fstream>
#include <iostream>
#include <map>
#include <string>
using namespace std;

int main(int argc, char *argv[])
{
    if (argc < 2)
        return EXIT_FAILURE;

    // open the file and count each word as it is read (no vector needed).
    ifstream inf(argv[1]);
    if (!inf)
        return EXIT_FAILURE;
    map<string, unsigned int> dict;
    string str;
    while (inf >> str)
        ++dict[str];

    // print the dictionary
    for (const auto &it : dict)
        cout << it.first << ':' << it.second << endl;

    return EXIT_SUCCESS;
}

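Sorting the result from highest to lowest count is not quite as simple, but it can be done with a sort-bed vector and std::sort(). Another enhancement would be stripping leading and trailing non-alphabetic characters (punctuation). Yet another is reducing each word to all-lowercase before inserting it into the map, which lets Ball and ball occupy a single dictionary slot with a count of 2.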
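The three enhancements described above could be sketched as follows. The names normalize, count_words, sort_by_count and Entry are hypothetical. Skipping tokens that contain no letters and breaking ties in the sorted output alphabetically are choices made for this sketch; the answer does not specify either.

```cpp
#include <algorithm>
#include <cctype>
#include <istream>
#include <map>
#include <sstream>  // only included so the functions can be tried on in-memory text
#include <string>
#include <utility>
#include <vector>

using Entry = std::pair<std::string, unsigned int>;

// Strips leading and trailing non-letters and lower-cases what is left, so
// "Ball," and "ball" become the same key. Inner characters such as the
// apostrophe in "don't" are kept. Returns "" if the token has no letters.
std::string normalize(const std::string& raw)
{
    auto is_letter = [](char c) {
        return std::isalpha(static_cast<unsigned char>(c)) != 0;
    };
    auto first = std::find_if(raw.begin(), raw.end(), is_letter);
    auto last = std::find_if(raw.rbegin(), raw.rend(), is_letter).base();
    if (first >= last)
        return std::string();
    std::string word(first, last);
    for (char& c : word)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return word;
}

// Builds the word-count dictionary from `in`, normalizing every token and
// skipping tokens that normalize to an empty string (pure punctuation).
std::map<std::string, unsigned int> count_words(std::istream& in)
{
    std::map<std::string, unsigned int> dict;
    std::string token;
    while (in >> token)
    {
        std::string word = normalize(token);
        if (!word.empty())
            ++dict[word];
    }
    return dict;
}

// Copies the dictionary into a vector (the "sort bed") and sorts it by count,
// highest first; equal counts fall back to alphabetical order.
std::vector<Entry> sort_by_count(const std::map<std::string, unsigned int>& dict)
{
    std::vector<Entry> bed(dict.begin(), dict.end());
    std::sort(bed.begin(), bed.end(), [](const Entry& a, const Entry& b) {
        if (a.second != b.second)
            return a.second > b.second;
        return a.first < b.first;
    });
    return bed;
}
```

To use it on a file, pass an std::ifstream to count_words and print the first and second members of each element returned by sort_by_count.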

Answer 3 (score: 0)

I have the following implementation, which converts words to lowercase and removes punctuation.

#include <algorithm>
#include <cctype>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <unordered_map>
#include <vector>

int main() {
  std::vector<std::string> words;
  {
    // read every whitespace-separated token of file.txt into `words`
    std::ifstream fp("file.txt", std::ios::in);
    std::copy(std::istream_iterator<std::string>(fp),
              std::istream_iterator<std::string>(),
              std::back_inserter(words));
  }

  std::unordered_map<std::string, int> frequency;
  for(auto it=words.begin(); it!=words.end(); ++it) {
    std::string word;
    // keep only alphabetic characters (drops punctuation such as ' and .)
    std::copy_if(it->begin(), it->end(), std::back_inserter(word),
                 [](unsigned char c) { return std::isalpha(c) != 0; });
    // lower-case the result so "Word" and "worD" share one entry
    std::transform(word.begin(), word.end(), word.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    if (!word.empty())  // skip tokens that were pure punctuation
      frequency[word]++;
  }

  for(const auto& p : frequency) {
    std::cout<<p.first<<" => "<<p.second<<std::endl;
  }
  return 0;
}

If file.txt contains the following:

hello hello hello bye BYE dog DOG' dog.

word Word worD w'ord

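the program produces the output below (std::unordered_map iterates in an unspecified order, so the lines may appear in a different order):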

word => 4
dog => 3
bye => 2
hello => 3