Question

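I am reading in a file of characters, and each word gets its own position in a vector. I then need to keep track of each word and find out how many times it occurs, so that: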

There are three trees trees trees

should output:

There 1 are 1 three 1 trees 3

I'd like to know how to use a vector of strings to keep track of each word. Would I make a vector of strings, with each string having a vector of ints?

Answer 1

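Don't use a screwdriver to drive nails. std::vector is not particularly useful for the most basic form of this task: a simple frequency count. Arbitrary input from standard input is best served by an associative container, where the key is the input string and the value is its accumulated frequency.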

Unordered Frequency Count

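The unordered map class std::unordered_map, keyed on std::string and mapping each string to a frequency counter, can be used to track basic frequencies. For example: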

#include <iostream>
#include <vector>
#include <string>
#include <unordered_map>

int main()
{
    std::unordered_map<std::string, unsigned> m;
    std::string word;
    while (std::cin >> word)
        ++m[word]; // increment the count for this word

    for (auto const& pr : m)
        std::cout << pr.first << ':' << pr.second << '\n';
}

Lexicographically Ordered Frequency Count

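Note: the associative container std::unordered_map has no particular order (hence the name). If you need lexicographic ordering, you can simply use a regular std::map instead. For example: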

#include <iostream>
#include <vector>
#include <string>
#include <map>

int main()
{
    std::map<std::string, unsigned> m;
    std::string word;
    while (std::cin >> word)
        ++m[word];

    for (auto const& pr : m)
        std::cout << pr.first << ':' << pr.second << '\n';
}

Position-Preserving Frequency Count

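Keeping each word's positions in the input stream while also counting frequencies is possible too, and takes only a little more code. Choose an unordered or ordered associative container as before, but instead of mapping to unsigned, map to std::vector<unsigned>, into which we push the running word counter as we consume each input word. The size of each vector still gives the frequency count, while the vector's elements record the positions in the input stream where the associated word occurred. For example: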

#include <iostream>
#include <vector>
#include <string>
#include <map>

int main()
{
    std::map<std::string, std::vector<unsigned int>> m;
    std::string word;
    unsigned ctr = 0;
    while (std::cin >> word)
        m[word].push_back(++ctr);

    for (auto const& pr : m)
    {
        std::cout << pr.first << ':' << pr.second.size() << " { ";
        for (auto pos : pr.second)
            std::cout << pos << ' ';
        std::cout << "}\n";
    }
}

This produces output of the form:

word : frequency { n1 n2 n3... }

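where word is a distinct word, frequency is its overall frequency in the input stream, and n1, n2, n3, ... are the positions (starting at 1) at which that word appeared during processing.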

Hopefully one of these approaches is useful.

Answer 2

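You can use the multiset class in C++, which keeps track of how many times you have added each word to the set. Also remember that in C++ you can read whole words from a stream with operator>>, which automatically skips any whitespace characters.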

I'll read from stdin in this example (note that I haven't compiled it; it's just meant to show the idea).

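#include <iostream>
#include <set>
#include <string>
using namespace std;

int main(){
  string word;
  multiset<string> ocurrences;
  while(cin >> word){
    ocurrences.insert(word);
  }
  // A multiset stores every copy, so a plain range-for would visit "trees"
  // three times. upper_bound(*it) jumps past all copies of the current word.
  for(auto it = ocurrences.begin(); it != ocurrences.end();
      it = ocurrences.upper_bound(*it)){
    cout<<*it<<" "<<ocurrences.count(*it)<<" ";
  }
}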

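As mentioned in the comments, if you want to print the words in the order of their first appearance, just keep a vector<string>, add each word you read to it if it isn't already in the set, and then iterate over that vector instead of the set.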

#include <iostream>
#include <set>
#include <string>
#include <vector>
using namespace std;

int main(){
  string word;
  vector<string> words;
  multiset<string> ocurrences;
  while(cin >> word){
    if(ocurrences.count(word) == 0) //Is this the first time we see this word?
      words.push_back(word);
    ocurrences.insert(word);
  }
  for(const string& w : words){ //Iterate over the words in the order
                                //they appeared in the input.
    cout<<w<<" "<<ocurrences.count(w)<<" ";
  }
}

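On the other hand, even though multiset suits this particular problem well, what you are asking about in your question is called a map: a data structure that associates keys with values (possibly of different types). C++ already has a map implementation. In this case you would want a map<string, int> to associate each word with the number of times it occurs.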

Answer 3

This can be done by accumulating a dictionary over a stream of words with std::accumulate and using C++17 structured bindings:

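#include <iostream>
#include <iterator>
#include <numeric>
#include <sstream>
#include <string>
#include <unordered_map>

int main()
{
    std::istringstream words( "There are three trees trees trees" );

    auto dic = std::accumulate(
        std::istream_iterator< std::string >( words ) ,
        std::istream_iterator< std::string >( ) ,
        std::unordered_map< std::string , int >( ) ,
        // Take and return the map by value so accumulate never assigns the
        // map to itself. Since C++20 accumulate moves the accumulator in,
        // so no copies are made; before C++20 it is copied on every step.
        []( std::unordered_map< std::string , int > map ,
            const std::string & word )
        {
            // try_emplace inserts { word, 0 } only if word is not yet a key;
            // either way `it` points at the word's entry.
            auto [ it , inserted ] = map.try_emplace( word , 0 );
            ( void ) inserted; // not needed here; silences unused warnings

            ++ it->second;

            return map;
        } );

    for ( const auto & [ key , value ] : dic )
    {
        std::cout << key << ": " << value << '\n';
    }
}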

Live at Coliru (albeit with some warnings)

> trees: 3
> three: 1
> There: 1
> are: 1

