Why does unordered_set mix up the values

Time: 2019-02-27 13:46:22

Tags: c++ unordered-set

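I am trying to remove duplicates from a vector by using an unordered_set. But my design creates an unordered_set that does not maintain the order correctly. In this example, "z" is not at the end. What am I doing wrong? Thank you in advance.

EDIT: Sorry if I was not clear about what I was looking for. I want the output to be "e d a b c z": I want to keep the original order but remove the duplicates. I currently have it working with about 3 different for loops and an extra copy of the init vector. I was just looking for a possibly cleaner STL function.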

Output produced:

e d a b c a a a b b b b c z
printing unordered set
e d a z b c

#include <iostream>
#include <iterator>
#include <algorithm>
#include <string>
#include <vector>
#include <unordered_set>
using namespace std;

int main() {
    vector<string>terminals = { "e", "d", "a", "b", "c", "a", "a", "a", "a", "b","b", "b", "b", "c", "z" };
    for (vector<string>::iterator it = terminals.begin(); it != terminals.end(); it++) // print given vector
        cout << *it << " ";
    cout << endl;
    unordered_set<string> newSet;
    copy(terminals.begin(), terminals.end(), inserter(newSet, newSet.end()));
    cout << "printing unordered set" << endl;
    for (unordered_set<string>::iterator it = newSet.begin(); it != newSet.end(); it++)
        cout << *it << " ";
    cout << endl;
    //system("pause");
    return 0;
}

5 answers:

Answer 0 (score: 5)

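std::unordered_set

"Internally, the elements are not sorted in any particular order, but organized into buckets. Which bucket an element is placed into depends entirely on the hash of its value. This allows fast access to individual elements, since once a hash is computed, it refers to the exact bucket the element is placed into."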
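To make the bucket idea concrete, here is a minimal sketch that reports which bucket each element landed in. The helper name bucket_layout is made up for illustration; bucket() and bucket_count() are standard members of std::unordered_set. The actual bucket numbers and the iteration order depend on the standard library implementation and its hash function, so no particular output should be expected.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical helper: for each element, record the index of the bucket
// the container placed it in. The elements are visited in the container's
// own iteration order, which follows the bucket layout rather than the
// order of insertion.
std::vector<std::pair<std::string, std::size_t>>
bucket_layout(const std::unordered_set<std::string>& s) {
    std::vector<std::pair<std::string, std::size_t>> layout;
    layout.reserve(s.size());
    for (const std::string& value : s)
        layout.emplace_back(value, s.bucket(value));
    return layout;
}
```

Printing the pairs for the terminals from the question would likely show "z" somewhere in the middle, simply because its hash maps to a bucket that is visited before others.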

If you need the unique terminals in sorted order, use std::set

#include <iostream>
#include <vector>
#include <string>
#include <set>

int main() {
    std::vector<std::string>terminals = { "e", "d", "a", "b", "c", "a", "a", "a", "a", "b","b", "b", "b", "c", "z" };

    for(const std::string& terminal : terminals) // print given vector
        std::cout << terminal << " ";
    std::cout << "\n";

    // populate the set directly from the vector's iterators:
    std::set<std::string> newSet(terminals.begin(), terminals.end());

    std::cout << "printing the (ordered) set:" << "\n";
    for(const std::string& terminal : newSet)
        std::cout << terminal << " ";
    std::cout << "\n";
}

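If you want to keep the original order, you cannot use either of them as the final storage, but you can use a std::unordered_set as a cache/blacklist of the values that have already been inserted into your final storage.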

#include <iostream>
#include <vector>
#include <string>
#include <algorithm>
#include <unordered_set>

int main() {
    std::vector<std::string>terminals = { "e", "d", "a", "b", "c", "a", "a", "a", "a", "b","b", "b", "b", "c", "z" };

    for(const std::string& terminal : terminals) // print given vector
        std::cout << terminal << " ";
    std::cout << "\n";

    std::vector<std::string> newSet; // not really a set anymore
    std::unordered_set<std::string> cache; // blacklist

    // try to insert all terminals and only when an insert is successful,
    // put the terminal in newSet

    std::for_each(terminals.begin(), terminals.end(),
        [&](const std::string& terminal) {
            auto [it, inserted] = cache.insert(terminal);
            if(inserted)
                newSet.push_back(terminal);
        }
    );

    std::cout << "printing the vector of unique terminals:" << "\n";
    for(const std::string& terminal : newSet)
        std::cout << terminal << " ";
    std::cout << "\n";
}

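If you want the original order and don't mind making the changes directly in the original terminals vector, you can use std::remove_if combined with an unordered_set, which is nice since it does not require a new vector. This is an annotated variant of @Marek R's answer:

Please read this first: Erase–remove idiom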

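#include <iostream>
#include <vector>
#include <string>
#include <algorithm>
#include <iterator>
#include <unordered_set>

int main() {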
    std::vector<std::string>terminals = { "e", "d", "a", "b", "c", "a", "a", "a", "a", "b","b", "b", "b", "c", "z" };

    for(const std::string& terminal : terminals) // print given vector
        std::cout << terminal << " ";
    std::cout << "\n";

    std::unordered_set<std::string> cache; // blacklist

    // remove_if() shifts every entry for which the UnaryPredicate(*) returns
    // false towards the front of the container, preserving their relative
    // order. It returns an iterator to the new logical end; the entries from
    // there to end() are left with unspecified values, so that iterator is a
    // suitable starting point for a subsequent erase().
    //
    // (*) UnaryPredicate: A callable that returns true or false given a single
    //                     value.

    // auto is deduced as std::vector<std::string>::iterator here
    auto past_new_end = std::remove_if(terminals.begin(), terminals.end(),
        // this lambda is the UnaryPredicate
        [&](const std::string& terminal) {
            // insert returns a std::pair<Iterator, bool>
            // where the bool (.second in the pair) is false
            // if the value was not inserted (=it was already present)
            return cache.insert(terminal).second == false;
        }
    );

    std::cout << "display all the entries (now with unspecified values) "
                 "that will be erased:\n";
    std::copy(past_new_end, terminals.end(),
                            std::ostream_iterator<std::string>(std::cout, "<"));
    std::cout << "\n";

    // erase the leftover entries past the new logical end
    terminals.erase(past_new_end, terminals.end());

    std::cout << "printing the unique terminals:" << "\n";
    for(const std::string& terminal : terminals)
        std::cout << terminal << " ";
    std::cout << "\n";
}

Answer 1 (score: 2)

It looks like you want to use an (ordered) set

EDIT: Actually, it looks like you don't. A std::vector would work, but that may not be the cleanest solution.

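Answer 2 (score: 2)

If you want to preserve the original order but enforce uniqueness, you probably want to:

  1. Read an item.
  2. Try to insert it into a set.
  3. If that succeeds, it was not in the set before, so copy it to the output as well.
  4. Repeat.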
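The steps above can be sketched as a small reusable function. The name unique_in_order is hypothetical, and the choice of std::unordered_set for the "set" in step 2 is an assumption (a std::set would work the same way here).

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: returns the first occurrence of every item,
// in the order in which the items appear in the input.
std::vector<std::string> unique_in_order(const std::vector<std::string>& input) {
    std::unordered_set<std::string> seen;  // only tracks what was already output
    std::vector<std::string> output;
    for (const std::string& item : input) {   // 1. read an item
        if (seen.insert(item).second)         // 2. try to insert it into the set
            output.push_back(item);           // 3. new item: copy it to the output
    }                                         // 4. repeat
    return output;
}
```

Called with the terminals vector from the question, this should return e, d, a, b, c, z.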

If you want the output sorted (so in your example, the output would be "a b c d e z"), then you can either insert the items into a std::set, or you can use std::sort followed by std::unique to get exactly one of each unique element in the input.
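A minimal sketch of the std::sort plus std::unique variant follows. The function name sorted_unique is made up for illustration. std::unique only collapses adjacent duplicates, which is why the sort has to come first, and it has to be followed by erase() to actually shrink the vector.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: takes the items by value so the caller's vector
// is left untouched, then sorts the copy and drops the duplicates.
std::vector<std::string> sorted_unique(std::vector<std::string> items) {
    std::sort(items.begin(), items.end());
    // unique() moves one copy of each run of equal elements to the front
    // and returns the new logical end; erase() removes the rest.
    items.erase(std::unique(items.begin(), items.end()), items.end());
    return items;
}
```

For the terminals from the question this should give a, b, c, d, e, z. The original order is lost, so this only fits when sorted output is acceptable.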

Answer 3 (score: 0)

You can also use an unordered map, storing the item as the key of the map and its index as the corresponding value of that key.
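The answer does not say how to get the ordered result back out of the map, so the following is only one possible reading: record the index of each item's first occurrence, then sort the unique items by that index. The function name unique_by_first_index is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper: deduplicates via a map from item to the index of
// its first occurrence, then restores the original order from the indices.
std::vector<std::string> unique_by_first_index(const std::vector<std::string>& input) {
    std::unordered_map<std::string, std::size_t> first_index;
    for (std::size_t i = 0; i < input.size(); ++i)
        first_index.emplace(input[i], i);  // no effect if the key already exists

    // Collect (index, item) pairs so that sorting orders them by index.
    std::vector<std::pair<std::size_t, std::string>> ordered;
    ordered.reserve(first_index.size());
    for (const auto& entry : first_index)
        ordered.emplace_back(entry.second, entry.first);
    std::sort(ordered.begin(), ordered.end());

    std::vector<std::string> output;
    output.reserve(ordered.size());
    for (auto& entry : ordered)
        output.push_back(std::move(entry.second));
    return output;
}
```

This needs an extra sort compared to the set-based approaches in the other answers, so it is mostly worth considering when the indices are needed for something else anyway.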

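Answer 4 (score: 0)

"I am trying to remove duplicates from a vector by using an unordered_set."

Why do you assume that unordered_set preserves any order? The name clearly says that there is no particular order.

You should use the unordered_set only to keep track of whether an item has already been found in the sequence. Based on that, you can remove items from the sequence, so it should look like this: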

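#include <algorithm>
#include <string>
#include <unordered_set>
#include <vector>

// Assumption: Data is not defined in this snippet; a vector of strings
// matches the question.
using Data = std::vector<std::string>;

void removeDuplicates(Data &data)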
{
    std::unordered_set<std::string> foundItems;
    auto newEnd = std::remove_if(data.begin(), data.end(), [&foundItems](const auto &s)
                                 {
                                     return !foundItems.insert(s).second;
                                 });
    data.erase(newEnd, data.end());
}

https://wandbox.org/permlink/T24UfiLQep0XUQhQ