Question

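I am trying to build a program that counts the repeated words in a .txt file and outputs each repeated word along with the number of times it is repeated. I have a method that counts how many words there are, but it does not count repeated words. Here is the code: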

#include <iostream>
#include <string>
#include <math.h>
#include <iomanip>
#include <fstream>
#include <vector>
#include "ProcessStatistics.h"

using namespace std;

ProcessStatistics::ProcessStatistics()
{
    //constructor
}

// Counts how many words consist of each specific number of characters.

void ProcessStatistics::Length(std::vector<std::string> ArrayOfWords, int numberOfWords)
{
    cout << "=== COMPUTING WORD S LENGTH ==== " << endl;
    int vectorLength[30] = {0};

    for(int i = 0; i < numberOfWords; i++)
    {
        for(int j = 0; j<20; j++)
        {
            if (ArrayOfWords [i].length()-1 == j)
                vectorLength[j] = vectorLength[j]+1;
        }
    }

    ofstream varlocal;
    remove("WORDS_LENGTH.txt");
    varlocal.open("WORDS_LENGTH.txt");
    if(varlocal.is_open())
    {
        varlocal << "Total: " << numberOfWords << endl;
        for(int i=0; i < 30; i++)
        {
            if(vectorLength[i] != 0)
            {
                varlocal << vectorLength[i] << " W " << i+1 << " CHAR " << " % " << setprecision(3) << vectorLength[i]*100/numberOfWords << endl;
            }
        }
    }
    varlocal.close();
}

Answer 1

I'm in a good mood, so here is some sample code that uses std::map to demonstrate word statistics for a text file.

#include <fstream>
#include <iostream>
#include <map>
#include <string>

using std::ifstream;
using std::cout;
using std::string;
using std::cin;
using std::map;

int main()
{
  static const char filename[] = "my_data.txt";
  ifstream input(filename);
  if (!input)
  {
    cout << "Error opening data file " << filename << "\n";
    return 1;
  }
  map<string, unsigned int> word_data;
  string word;
  while (input >> word)
  {
     if (word_data.find(word) != word_data.end())
     {
       word_data[word]++;
     }
     else
     {
       word_data[word] = 1;
     }
  }
  map<string, unsigned int>::iterator iter;
  for (iter = word_data.begin(); iter != word_data.end(); ++iter)
  {
    cout << iter->second << "\t" << iter->first << "\n";
  }
  return 0;
}

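In the code above, the word is the key in the map. The number of occurrences of the word (its count, or frequency) is the value in the map.

If the word already exists in the map, its count is incremented. If the word does not exist, it is added to the map with a count of 1.

After the file has been read, the statistics are printed: the count followed by the word.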

Answer 2

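First, I'm not sure your algorithm works for counting repeated words. It looks like you are counting the number of words that have the same length. Keep in mind that this breaks down when the vector contains two different words of the same size: "cat" and "dog" would be treated as the same word. If you are trying to count repeated words, a good approach is to use a set and a map.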

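// rest of your code (needs #include <set> and #include <map>)
std::set<std::string> a;        // words seen so far
std::map<std::string, int> b;   // word -> number of occurrences
// rest of code
for (int i = 0; i < numberOfWords; i++) {
    // check whether the word is already in the set
    std::set<std::string>::iterator it = a.find(ArrayOfWords[i]);
    if (it == a.end()) {
        // the word is new
        a.insert(ArrayOfWords[i]);
        b[ArrayOfWords[i]] = 1;
    }
    else {
        // the word is a repeat
        b[ArrayOfWords[i]] += 1;
    }
}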

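The find() method returns an iterator to the end of the set if the element is not in the set yet, so you can use it to check whether a word is new. The map works like an array, except that you can put other types of data inside the [] (any key type that can be ordered with <), including strings, which is what lets you count how many times each word occurs. Finally, you can iterate over the map and print the entries whose value is greater than 1.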
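To make that last step concrete, here is a rough, self-contained sketch of the whole idea: count with a set and a map, then keep only the entries whose count is above 1. The function name find_repeated and its signature are assumptions of mine for illustration; the question's ProcessStatistics class is not used.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper (the name is an assumption): returns the words that
// occur more than once, mapped to their number of occurrences.
std::map<std::string, int> find_repeated(const std::vector<std::string>& words)
{
    std::set<std::string> seen;          // words encountered so far
    std::map<std::string, int> counts;   // word -> number of occurrences

    for (std::size_t i = 0; i < words.size(); ++i) {
        if (seen.find(words[i]) == seen.end()) {
            // first time this word appears
            seen.insert(words[i]);
            counts[words[i]] = 1;
        } else {
            // the word has been seen before
            counts[words[i]] += 1;
        }
    }

    // Keep only the entries whose count is greater than 1.
    std::map<std::string, int> repeated;
    for (std::map<std::string, int>::const_iterator it = counts.begin();
         it != counts.end(); ++it) {
        if (it->second > 1) {
            repeated[it->first] = it->second;
        }
    }
    return repeated;
}
```

You could call it with the ArrayOfWords vector from the question and then print it->first and it->second for each entry of the result. Strictly speaking the set is optional, since counts.find() could answer the same "is this word new" question, but the sketch keeps both containers to mirror the approach described above.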

C++ program to count repeated words in a .txt file
