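I'm new to C++ and assumed that unordered_set is implemented as a hash table and offers O(1) constant-time lookup, but in the problem I'm solving, the find operation seems to take linear time.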
bool isWordInSet(string word, unordered_set<string> set) {
    return set.find(word) != set.end();
}
int main() {
    ...
    unordered_set<string> wordSets[fileCount]; // pre-filled array of unordered_set<string>
    ...
    for (int i = 0; i < fileCount; i++) {
        unordered_set<string>::const_iterator it = wordSets[i].begin();
        while (it != wordSets[i].end()) {
            haystack.insert(*it);
            // haystack.insert(*it + "ss"); // added in order to double set's size
            // haystack.insert(*it + "ss2"); // added in order to triple set's size
            it++;
        }
    }
    int common = 0;
    double t = omp_get_wtime();
    unordered_set<string>::const_iterator it = needle.begin();
    // traverse needle set to find items that are common with haystack
    while (it != needle.end()) {
        // if (haystack.find(*it) != haystack.end()) -> this takes O(1), but below is linear
        if (isWordInSet(*it, haystack)) // takes proportional time to haystack's size
            common++;
        it++;
    }
    t = omp_get_wtime() - t;
    cout << "SizeHay: " << haystack.size() << " Time: " << t * 10'000 << "\n";
}
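When I uncomment the lines haystack.insert(*it + "ss"); and haystack.insert(*it + "ss2");, the time needed to traverse the needle set and search the haystack grows in proportion to the haystack's size. Is this expected behavior, or is something wrong with the way I'm using find?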
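Edit: It turns out that the time grows proportionally only when I call the isWordInSet function; when I inline the check as haystack.find(*it) != haystack.end(), it does not. This seems really strange to me. I believed the function would only be called needle.size() times, regardless of haystack.size().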