Question

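I have some code that finds new words in a container and adds them to the class's private member dict. The function Learner::Learn needs to be optimized so that it runs faster. Elements in the dict vector may repeat each other, but 'newWords' should always return the count of new (non-repeated) words.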

#include <algorithm>
#include <string>
#include <vector>

using namespace std;

class Learner {
private:
  vector<string> dict;

public:
  int Learn(const vector<string>& words) {
    int newWords = 0;
    for (const auto& word : words) {
      if (find(dict.begin(), dict.end(), word) == dict.end()) {
        ++newWords;
        dict.push_back(word);
      }
    }
    return newWords;
  }
};

I tried it this way, but the execution time is the same:

class Learner {
 private:
  vector<string> dict;

 public:
  int Learn(const vector<string>& words) {
    std::size_t index = dict.size();
    dict.resize(dict.size() + words.size());
    vector<string>::iterator nth = dict.begin() + index;
    int newWords = 0;
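    for (const auto& word : words) {
      // Search only the filled part; the resized tail holds empty placeholders.
      if (find(dict.begin(), nth, word) == nth) {
        ++newWords;
        *nth++ = word;
      }
    }
    dict.erase(nth, dict.end());  // drop the unused placeholder slots
    return newWords;
  }
};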

I think I should somehow avoid using the push_back() method.

Answer 1

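If you keep the stored words sorted at all times, you can use binary search, which makes the lookups O(n log n) in total. However, inserting in the middle of a vector means shifting every element after the insertion point, which brings the total back to O(n^2).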
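A minimal sketch of the sorted-vector idea, assuming a hypothetical class name SortedLearner (not from the question) and that it is acceptable for dict to hold each word only once. It uses std::lower_bound for the binary search; the insert still shifts elements, as noted above.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical variant of Learner that keeps dict sorted (an assumption,
// not the asker's class).
class SortedLearner {
 private:
  std::vector<std::string> dict;  // invariant: always sorted ascending

 public:
  int Learn(const std::vector<std::string>& words) {
    int newWords = 0;
    for (const auto& word : words) {
      // Binary search: first position whose element is not less than word.
      auto pos = std::lower_bound(dict.begin(), dict.end(), word);
      if (pos == dict.end() || *pos != word) {
        // Worst case O(n): every element after pos is shifted right.
        dict.insert(pos, word);
        ++newWords;
      }
    }
    return newWords;
  }
};
```

Whether this beats the linear search in practice depends on how often new words arrive, since each new word still pays for the shift.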

Nonetheless, you should switch to a different container for a significant improvement:

std::set (O(log n) lookup, O(log n) insertion)
std::map (O(log n) lookup, O(log n) insertion)
std::unordered_set (average O(1) lookup, average O(1) insertion)
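As a rough illustration of the container switch, here is a sketch using std::unordered_set. The class name HashLearner is hypothetical, and the sketch assumes dict only needs to record which words have been seen, not their order.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical variant of Learner backed by a hash set (an assumption,
// not the asker's class).
class HashLearner {
 private:
  std::unordered_set<std::string> dict;

 public:
  int Learn(const std::vector<std::string>& words) {
    int newWords = 0;
    for (const auto& word : words) {
      // insert() returns a pair; .second is true only if the word was
      // actually inserted, i.e. it was not in dict before.
      if (dict.insert(word).second) {
        ++newWords;
      }
    }
    return newWords;
  }
};
```

The single insert() call does the lookup and the insertion together, so no separate find is needed. Swapping in std::set should work with the same loop if sorted iteration matters.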

Answer 2

A trie is an efficient alternative, but it is not in std, so you will have to write it yourself or use an external library.
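For a sense of what writing it yourself might involve, here is a small hand-rolled sketch. The names TrieLearner, Node and Insert are hypothetical, and storing children in a std::map per node is just one simple choice, not a tuned implementation.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical trie-backed variant of Learner (an assumption, not the
// asker's class).
class TrieLearner {
 private:
  struct Node {
    std::map<char, std::unique_ptr<Node>> children;  // one edge per character
    bool isWord = false;  // true if a stored word ends at this node
  };
  Node root;

  // Walks or creates the path for word; returns true if word is new.
  bool Insert(const std::string& word) {
    Node* node = &root;
    for (char c : word) {
      std::unique_ptr<Node>& child = node->children[c];
      if (!child) {
        child.reset(new Node());
      }
      node = child.get();
    }
    const bool isNew = !node->isWord;
    node->isWord = true;
    return isNew;
  }

 public:
  int Learn(const std::vector<std::string>& words) {
    int newWords = 0;
    for (const auto& word : words) {
      if (Insert(word)) {
        ++newWords;
      }
    }
    return newWords;
  }
};
```

Lookup cost here depends on word length rather than on the number of stored words, at the price of more memory per stored character.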

Within the standard library, std::set / std::map and their unordered versions (std::unordered_set / std::unordered_map) may help.

Alternative to std::find that runs faster
