C++: identifying the frequency of words that occur in a sentence

Time: 2010-11-06 09:04:10

Tags: c++

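What is the best STL container for this task? I have been using a map, but I can't get it to work. I'm not sure how to check how many times the same word appears in a sentence, for example: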

  

I love him, I love her, he love her.

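So I want the program to prompt the user for an integer. Say I enter 3: the output would be love, because that word appears 3 times in the sentence. What approach should I use to write such a program?

Currently my program prompts the user for a word and returns the number of times that word appears, which for the word love is 3. But now I want it the other way around. Is that possible? Which STL container would be better?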

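4 answers:

Answer 0: (score: 3)

I assume you are using a map to store the number of occurrences. The first thing to understand is that, because you are using a map, the keys are unique while the stored values may not be. Consider a map x with the contents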

x["I"]=3
x["Love"]=3
x["C"]=5

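This is a unique mapping from key to value, not the other way around. If you want a one-to-one mapping, I would suggest a different data structure. If you want to keep using the map and still search for an element by its value, use an STL search function or write your own search():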

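int check;                       // the count the user wants to look up
cin >> check;
// A map is indexed by key, so searching by value is a linear scan.
map<string,int>::iterator ser;
for (ser = x.begin(); ser != x.end(); ++ser)
{
    if (ser->second == check)
    {
        cout << "Word: " << ser->first << endl;
        break;                   // stops at the first match only
    }
}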
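Because of the break, the loop above reports only the first word with the requested count, even though several keys can share the same value. A minimal sketch of a variant that collects every match is shown below; the helper name words_with_count is hypothetical and not part of the original answer.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical helper (not from the original answer): collect every word
// whose stored count equals `target`, instead of stopping at the first hit.
std::vector<std::string> words_with_count(const std::map<std::string, int>& counts,
                                          int target)
{
    std::vector<std::string> result;
    for (const auto& entry : counts)   // std::map iterates in key order
    {
        if (entry.second == target)
            result.push_back(entry.first);
    }
    return result;
}
```

For the map x shown earlier in this answer, words_with_count(x, 3) would return both I and Love.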

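Answer 1: (score: 3)

First build a map from word to count, then build a reverse multimap from it. Finally, you can look up which words occur with a given frequency: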

#include <algorithm>
#include <iostream>
#include <iterator>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <utility>

int main()
{
    std::string str("I love him, I love her, he love her");
    std::istringstream ss(str);
    std::istream_iterator<std::string> begin(ss);
    std::istream_iterator<std::string> end;

    std::map<std::string, int> word_count;
    std::for_each(begin, end, [&](const std::string& s)
    {
        ++word_count[s];
    });

    std::multimap<int, std::string> count_words;
    std::for_each(word_count.begin(), word_count.end(),
                  [&](const std::pair<std::string, int>& p)
    {
        count_words.insert(std::make_pair(p.second, p.first));
    });

    auto its = count_words.equal_range(3);
    std::for_each(its.first, its.second,
                  [](const std::pair<int, std::string>& p)
    {
        std::cout << p.second << std::endl;
    });
}
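One thing to be aware of: reading with operator>> splits only on whitespace, so in the program above "him," and "her," keep their commas and are counted separately from "her". A possible refinement, sketched here under the assumption that punctuation should be dropped and case ignored, normalizes each token before counting. The helpers normalize and words_occurring are hypothetical names, and stripping all punctuation also removes apostrophes inside words.

```cpp
#include <cctype>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: lowercase a token and drop punctuation characters.
std::string normalize(const std::string& token)
{
    std::string out;
    for (unsigned char ch : token)
    {
        if (!std::ispunct(ch))
            out += static_cast<char>(std::tolower(ch));
    }
    return out;
}

// Hypothetical helper: return all words that occur exactly `n` times.
std::vector<std::string> words_occurring(const std::string& sentence, int n)
{
    // Step 1: word -> count, as in the answer above.
    std::map<std::string, int> word_count;
    std::istringstream ss(sentence);
    std::string token;
    while (ss >> token)
    {
        const std::string word = normalize(token);
        if (!word.empty())
            ++word_count[word];
    }

    // Step 2: reverse multimap, count -> word.
    std::multimap<int, std::string> count_words;
    for (const auto& p : word_count)
        count_words.insert(std::make_pair(p.second, p.first));

    // Step 3: every entry whose key equals n.
    std::vector<std::string> result;
    const auto range = count_words.equal_range(n);
    for (auto it = range.first; it != range.second; ++it)
        result.push_back(it->second);
    return result;
}
```

With the sentence from the question, words_occurring(sentence, 3) should yield just love.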

Answer 2: (score: 2)

/******************************************************************
Name  :  Paul Rodgers
Source : HW1.CPP
Compiler :  Visual C++ .NET
Action : Program will read in from standard input and determine the
         frequency of word lengths found in input.  An appropriate
         table is also displayed.  Maximum word length is 15 characters
         words greater than 15 are counted as length 15.
         Average word length also displayed.

Note   : Words include hyphenated and ones with apostrophes.  Words with
         apostrophes, i.e. Jim's, will count the apostrophe as part of the
         word length. Hyphen is counted if word on same line, else not.

         Also an int array is used to hold the number of words with
         length associated with matching subscript, with subscript 0
         not being used.  So subscript 1 corresponds to word length of 1,
         subscript 2 to word length of 2 and so on.
------------------------------------------------------------------------*/
#include <iostream>
#include <ctype.h>
#include <iomanip>
using namespace std;

int NextWordLength(void);                    // function prototypes
void DisplayFrequencyTable(const int Words[]);

const int WORD_LENGTH = 16;                // global constant for array

int main()
{
  int WordLength;                         // actual length of word 0 to X
  int NumOfWords[WORD_LENGTH] = {0};     // array holds # of lengths of words

  WordLength = NextWordLength();
  while (WordLength)                   // continue to loop until no word, i.e. 0
    {                                 // increment length counter
      (WordLength <= 14) ? (++NumOfWords[WordLength]) : (++NumOfWords[15]);
      WordLength = NextWordLength();
    }

  DisplayFrequencyTable(NumOfWords);
}

/**********************  NextWordLength  ********************************
Action  : Will determine the length of the next word. Hyphenated words and
          words with apostrophes are counted as one word accordingly
Parameters : none
Returns   : the length of word, 0 if none, i.e. end of file
-----------------------------------------------------------------------*/
int NextWordLength(void)
{
  char Ch;
  int EndOfWord = 0,       //tells when we have read in one word
      LengthOfWord = 0;

  Ch = cin.get();                           // get first character
  while (!cin.eof() && !EndOfWord)
   {
     while (isspace(Ch) || ispunct(Ch))      // Skips leading white spaces
        Ch = cin.get();                      // and leading punctuation marks

     if (isalnum(Ch))          // if character is a letter or number
        ++LengthOfWord;        // then increment word length

     Ch = cin.get();           // get next character

     if ((Ch == '-') && (cin.peek() == '\n')) //check for hyphenated word over two lines
       {
         Ch = cin.get();       // don't count hyphen and remove the newline char
         Ch = cin.get();       // get next character then on next line
       }

     if ((Ch == '-') && (isalpha(cin.peek()))) //check for hyphenated word in one line
     {
         ++LengthOfWord;       // count the hyphen as part of word
         Ch = cin.get();       // get next character
     }

     if ((Ch == '\'') && (isalpha(cin.peek()))) // check for apostrophe in word
      {
        ++LengthOfWord;        // count apostrophe in word length
        Ch = cin.get();        // and get next letter
      }

     if (isspace(Ch) || ispunct(Ch) || cin.eof())  // is it end of word
       EndOfWord++;
   }

  return LengthOfWord;
}

/***********************  DisplayFrequencyTable  **************************
Action      :  Will display the frequency of length of words along with the
               average word length
Parameters
  IN        : Pointer to array holding the frequency of the lengths
Returns     : Nothing
Precondition: for loop does not go beyond WORD_LENGTH
------------------------------------------------------------------------*/
void DisplayFrequencyTable(const int Words[])
{
  int TotalWords = 0, TotalLength = 0;

  cout << "\nWord Length      Frequency\n";
  cout << "------------     ----------\n";

  for (int i = 1; i <= WORD_LENGTH-1; i++)
    {
     cout << setw(4) << i << setw(18) << Words[i] << endl;
     TotalLength += (i*Words[i]);
     TotalWords += Words[i];
    }

  cout << "\nAverage word length is ";

  if (TotalLength)
     cout << float(TotalLength)/TotalWords << endl;
  else
    cout << 0 << endl;
}
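For comparison, the same word-length tally can be sketched much more compactly with a std::map and a string stream. This is a simplified illustration, not a drop-in replacement: it assumes words are plain whitespace-separated tokens, so punctuation counts toward the length and the hyphen and apostrophe rules above are not reproduced. The name word_length_histogram and its max_length parameter are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: map each word length to the number of words with
// that length. Lengths above max_length are counted as max_length,
// mirroring the 15-character cap described in the program above.
std::map<std::size_t, int> word_length_histogram(const std::string& text,
                                                 std::size_t max_length = 15)
{
    std::map<std::size_t, int> histogram;
    std::istringstream ss(text);
    std::string word;
    while (ss >> word)   // whitespace-separated tokens only
        ++histogram[std::min(word.size(), max_length)];
    return histogram;
}
```

A map is used instead of a fixed array so that only lengths that actually occur are stored; the array version in the answer prints a row for every length from 1 to 15.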

Answer 3: (score: -1)
