Getting the three most frequent words using a vector and the counts in an unordered_map

Time: 2013-08-31 10:53:04

Tags: c++ vector

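My code below gives me the most frequent word in a string. I want to get the three most frequent words, together with their counts. Any help?

I used a vector and an unordered_map. In the last part of the code, I get the most frequent word(s) from the vector.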

#include <iostream>
#include <sstream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

int main(int argc, char *argv[])
{
    // The text to analyse is expected as the first command-line argument
    if (argc < 2)
        return 1;

    typedef std::unordered_map<std::string, int> occurrences;
    occurrences s1;
    std::string input = argv[1];

    std::istringstream iss(std::move(input));
    std::vector<std::string> most;
    int max_count = 0, second = 0, third = 0;

    // Here I get max_count and the 2nd and 3rd highest count values
    while (iss >> input)
    {
        int tmp = ++s1[input];
        if (tmp == max_count)
        {
            most.push_back(input);
        }
        else if (tmp > max_count)
        {
            max_count = tmp;
            most.clear();
            most.push_back(input);
            third = second;
            second = max_count;
        }
        else if (tmp > second)
        {
            third = second;
            second = tmp;
        }
        else if (tmp > third)
        {
            third = tmp;
        }
    }

    // I have not used max_count, second and third below. I don't know how to use them for my purpose.

    // Print each word with its occurrence count. This works fine.
    for (occurrences::const_iterator it = s1.cbegin(); it != s1.cend(); ++it)
        std::cout << it->first << " : " << it->second << std::endl;

    // Prints the word(s) that occur most often. **Here I want to print the 1st, 2nd and 3rd most frequent words with their counts. How can I do that?**
    std::cout << std::endl << "Maximum Occurrences" << std::endl;
    for (std::vector<std::string>::const_iterator it = most.cbegin(); it != most.cend(); ++it)
        std::cout << *it << std::endl;

    return 0;
}

Any ideas for getting the 3 most frequent words?

3 answers:

Answer 0: (score: 3)

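I would prefer to use std::map<std::string, int> instead.

Use it as the source map and fill it with the values from the std::vector<std::string>.

Now create a multimap, a flipped version of the source map (count as key, word as value), with std::greater<int> as the comparator.

The first three values of this final map are the most frequent words.

Example: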

#include <algorithm>
#include <functional>
#include <iostream>
#include <iterator>
#include <map>
#include <string>
#include <utility>
#include <vector>

int main()
{
    std::vector<std::string> most { "lion", "tiger", "kangaroo",
                                    "donkey", "lion", "tiger",
                                    "lion", "donkey", "tiger" };

    // Source map: word -> number of occurrences
    std::map<std::string, int> src;
    for (const auto &x : most)
        ++src[x];

    // Flipped map: count -> word, ordered from the highest count to the lowest
    std::multimap<int, std::string, std::greater<int> > dst;

    std::transform(src.begin(), src.end(), std::inserter(dst, dst.begin()),
                   [] (const std::pair<const std::string, int> &p) {
                       return std::pair<int, std::string>(p.second, p.first);
                   });

    // Print at most the first three entries, i.e. the three highest counts
    auto it = dst.begin();
    for (int count = 0; count < 3 && it != dst.end(); ++it, ++count)
        std::cout << it->second << ":" << it->first << std::endl;
}


Answer 1: (score: 1)

Using a heap to store the three most frequent words is easier and cleaner. It also extends easily to a larger number of most frequent words.
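
The answer gives no code, so the following is only a minimal sketch of one way the heap idea could look, assuming the words are already split into a std::vector<std::string>. The names WordCount and top_n_with_heap are hypothetical and chosen for illustration. The sketch keeps a min-heap of at most n (count, word) pairs, so the weakest candidate is always on top and can be dropped cheaply; ties in count are broken by the word itself, which is an arbitrary choice.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical alias for illustration: (count, word)
typedef std::pair<int, std::string> WordCount;

// Hypothetical helper: returns up to n (count, word) pairs, most frequent first.
std::vector<WordCount> top_n_with_heap(const std::vector<std::string> &words,
                                       std::size_t n)
{
    // Count every word once
    std::unordered_map<std::string, int> counts;
    for (const std::string &w : words)
        ++counts[w];

    // Min-heap ordered on (count, word): the weakest current candidate is on top
    std::priority_queue<WordCount, std::vector<WordCount>,
                        std::greater<WordCount> > heap;
    for (const auto &entry : counts)
    {
        heap.push(WordCount(entry.second, entry.first));
        if (heap.size() > n)
            heap.pop(); // drop the least frequent candidate
    }
    assert(heap.size() <= n); // the heap never holds more than n candidates

    // Popping yields ascending order, so fill the result from the back
    std::vector<WordCount> result(heap.size());
    for (std::size_t i = result.size(); i-- > 0; )
    {
        result[i] = heap.top();
        heap.pop();
    }
    return result;
}
```

Calling top_n_with_heap(words, 3) on the lion/tiger/kangaroo/donkey list from answer 0 would return three pairs. Each push or pop costs O(log n), so the whole pass is O(m log n) for m distinct words, which is why this approach scales well to larger n.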

Answer 2: (score: 1)

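If I wanted the n most frequent words, I would keep an n-element array, iterate over the word list, and store in the array those words that make it into my top n (removing the lowest one from the array).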
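
A rough sketch of this idea, under the assumption that the words are counted first and the array is kept sorted while scanning the counts. The names Entry, ranks_before and top_n_with_array are hypothetical, and breaking ties alphabetically is an assumption made only so that the result is deterministic.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical alias for illustration: (word, count)
typedef std::pair<std::string, int> Entry;

// True if a should be listed before b: higher count first, then alphabetical
bool ranks_before(const Entry &a, const Entry &b)
{
    return a.second > b.second || (a.second == b.second && a.first < b.first);
}

// Hypothetical helper: returns up to n (word, count) pairs, most frequent first.
std::vector<Entry> top_n_with_array(const std::vector<std::string> &words,
                                    std::size_t n)
{
    // Count every word once
    std::unordered_map<std::string, int> counts;
    for (const std::string &w : words)
        ++counts[w];

    std::vector<Entry> top; // kept sorted, never longer than n
    top.reserve(n + 1);
    for (const auto &kv : counts)
    {
        Entry candidate(kv.first, kv.second);

        // Find the first slot whose current entry ranks below the candidate
        std::size_t pos = 0;
        while (pos < top.size() && !ranks_before(candidate, top[pos]))
            ++pos;
        if (pos == n)
            continue; // the candidate did not make it into the top n

        top.insert(top.begin() + pos, candidate);
        if (top.size() > n)
            top.pop_back(); // remove the lowest entry from the array
    }
    assert(top.size() <= n);
    return top;
}
```

For small n such as 3, the linear scan over the array is cheap and the code stays simple; for large n the insertion cost grows to O(n) per word, which is where a heap becomes the better fit.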