Question

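I'm new to C++ and understood that std::unordered_set is implemented as a hash table and should offer O(1) average-case lookups. However, in the problem I'm working on, the find operation appears to take linear time.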

bool isWordInSet(string word, unordered_set<string> set) {
    return set.find(word) != set.end();
}

int main() {
...
unordered_set<string> wordSets[fileCount]; // pre-filled array of unordered_set<string>
...

for (int i = 0; i < fileCount; i++) {
    unordered_set<string>::const_iterator it = wordSets[i].begin();
    while (it != wordSets[i].end()) {
        haystack.insert(*it);
        // haystack.insert(*it + "ss"); // added in order to double set's size
        // haystack.insert(*it + "ss2"); // added in order to triple set's size
        it++;
    }
}

int common = 0;
double t = omp_get_wtime();
unordered_set<string>::const_iterator it = needle.begin();
// traverse needle set to find items that are common with haystack
while (it != needle.end()) {
    // if (haystack.find(*it) != haystack.end()) -> this takes O(1), but below is linear
    if (isWordInSet(*it, haystack)) // takes proportional time to haystack's size
        common++;
    it++;
}
t = omp_get_wtime() - t;
cout << "SizeHay: " << haystack.size() << " Time: " << t * 10'000 << "\n";
}

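When I uncomment the lines haystack.insert(*it + "ss"); and haystack.insert(*it + "ss2");, the time needed to traverse the needle set and look up each item grows in proportion to the size of haystack. Is this expected behavior, or is there something wrong with the way I'm calling find?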

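Edit: It turns out the time only grows proportionally when I call the isWordInSet function. When I inline the check as haystack.find(*it) != haystack.end(), it does not. This seems really strange to me. I would expect the function to be called only needle.size() times, regardless of haystack.size().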
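
One thing worth checking, given the difference described in the edit: isWordInSet takes its unordered_set<string> parameter by value, so every call copy-constructs the whole set before find runs, and that copy is linear in the set's size. The inlined version makes no copy. Below is a minimal sketch of the contrast, assuming that copy is what dominates the timing. The names isWordInSetByValue, isWordInSetByRef, countCommon, and checkHelpersAgree are hypothetical and used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>

using std::string;
using std::unordered_set;

// By value, as in the question: each call copies the entire set,
// which costs O(set.size()) before find() is even reached.
bool isWordInSetByValue(string word, unordered_set<string> set) {
    return set.find(word) != set.end();
}

// By const reference (hypothetical variant): no copy is made, so each call
// is just the average-case O(1) hash lookup.
bool isWordInSetByRef(const string& word, const unordered_set<string>& set) {
    return set.find(word) != set.end();
}

// Counts the needle words that also appear in haystack, using the
// by-reference helper so the cost does not depend on haystack.size().
std::size_t countCommon(const unordered_set<string>& needle,
                        const unordered_set<string>& haystack) {
    std::size_t common = 0;
    for (const string& word : needle) {
        if (isWordInSetByRef(word, haystack)) {
            ++common;
        }
    }
    return common;
}

// Sanity check: both helpers give the same answers; only their cost differs.
void checkHelpersAgree() {
    const unordered_set<string> haystack = {"apple", "pear"};
    assert(isWordInSetByValue("apple", haystack) ==
           isWordInSetByRef("apple", haystack));
    assert(!isWordInSetByRef("plum", haystack));
}
```

If this is the cause, changing the signature in the question to take const unordered_set<string>& (and const string&) should make the timed loop independent of haystack.size() again, matching the inlined version.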

unordered_set::find operation takes linear time

0 answers: