Question

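I have a series of names, but I only need the unique ones. I used std::set to get rid of the duplicates. However, I need the names to appear in the same order as in the input. That is, if my input is: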

Mary
Mary
John
John
John
Apple
Apple
Apple

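[Edit]: After going through the comments/answers, I want to stress that each name appears in a group and does not show up again later in the input. In the example, Mary appears twice in a row and then never appears again. [/Edit]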

I want my output to be:

Mary
John
Apple

With std::set, I get a sorted result:

Apple
John
Mary

I found that there is unordered_set (from cplusplus.com). That one does not preserve the input order either.

Questions:

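Is there a way to stop std::set from sorting?
I have read that one can write one's own sorting method for std::set. So, if I cannot stop the set from sorting, how about writing my own sorting method that always returns the first element of the input as the smallest one? (If only I knew how to do that...)
Or is there something else in std that reduces a collection of strings to a unique set without sorting it?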

Thanks!

Answer 1

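The simplest approach is to keep two collections: a vector and a set (or unordered_set). This uses more memory, but the set is used to check for duplicates (in O(log N) time for set) while the vector maintains the order.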
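A minimal sketch of this two-container idea, assuming an unordered_set is used for the duplicate check. The function name uniqueInOrder is hypothetical and not part of the original answer:

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper (not from the original answer): returns the names from
// `input` with duplicates removed, keeping the order of first appearance.
std::vector<std::string> uniqueInOrder(const std::vector<std::string>& input)
{
    std::unordered_set<std::string> seen;  // answers "have we seen this name?"
    std::vector<std::string> ordered;      // remembers the insertion order

    for (const std::string& name : input)
    {
        // insert() returns a pair; its bool member is true only for new keys.
        if (seen.insert(name).second)
            ordered.push_back(name);
    }
    return ordered;
}
```

Called with the names from the question, this would return Mary, John, Apple in that order. Unlike an approach based on adjacent duplicates, it does not require equal names to be grouped together.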

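Alternatively, the set can hold positions into the vector of items, with a predicate {{1}}. This is slightly more complicated, because the special predicate has to store the vector (or a pointer to it). But it can be done, and it will probably use less memory, because only one of the collections holds strings and the other holds ints. It also acts as an index that lets you quickly find the position of a particular item.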
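One way this index-based variant might look is sketched below. The names ByReferencedString and OrderedUniqueNames are hypothetical, and the comparator shown is an assumption about what the elided predicate could be, not the original answer's code:

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical comparator: orders indices by the strings they refer to.
struct ByReferencedString
{
    const std::vector<std::string>* items;  // non-owning pointer to the storage

    bool operator()(std::size_t a, std::size_t b) const
    {
        return (*items)[a] < (*items)[b];
    }
};

// Hypothetical class: stores each string once, in insertion order, and keeps
// a std::set of positions to detect duplicates.
class OrderedUniqueNames
{
public:
    OrderedUniqueNames() : index_(ByReferencedString{&items_}) {}

    // Not copyable: the comparator points at this object's own vector.
    OrderedUniqueNames(const OrderedUniqueNames&) = delete;
    OrderedUniqueNames& operator=(const OrderedUniqueNames&) = delete;

    // Returns true if the name was new and has been stored.
    bool add(const std::string& name)
    {
        items_.push_back(name);  // store tentatively so the set can compare it
        const std::size_t pos = items_.size() - 1;
        if (!index_.insert(pos).second)  // an equal string is already indexed
        {
            items_.pop_back();
            return false;
        }
        return true;
    }

    const std::vector<std::string>& items() const { return items_; }

private:
    std::vector<std::string> items_;                   // strings, in order
    std::set<std::size_t, ByReferencedString> index_;  // positions into items_
};
```

The set stores only integers, so each string is kept once. The price is the extra indirection in every comparison and the care needed to keep the comparator's pointer valid.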

Answer 2

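You are trying to change a fundamental part of the container's design. Rather than working against the standard library, you should rethink your own design.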

My solution is to use std::vector<std::string> and, depending on the goals of your program, either:

check for duplicates before pushing into the vector

or

create a function that returns a new vector of the unique names

Either of these implementations preserves insertion order, and you can handle duplicates on your own terms.

Here is the second version:

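#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

std::vector<std::string> collection;

// Returns the names in order of first appearance, with duplicates removed.
std::vector<std::string> getUniques(const std::vector<std::string>& names)
{
    std::vector<std::string> uniques;
    for (const std::string& name : names)
    {
        // std::find (from <algorithm>) is a linear search, so this is O(N^2) overall.
        if (std::find(uniques.begin(), uniques.end(), name) == uniques.end())
            uniques.push_back(name);
    }

    return uniques;
}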

int main()
{
    collection.push_back("John");
    collection.push_back("John");
    collection.push_back("Sally");
    collection.push_back("Kent");
    collection.push_back("Jim");
    collection.push_back("Sally");

    std::vector<std::string> uniques = getUniques(collection);

    for (std::string name : uniques)
        std::cout << name << std::endl;
}

Yields:

John
Sally
Kent
Jim
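For comparison, the first option mentioned in this answer (checking for duplicates before pushing) could be sketched roughly as follows. The helper name pushIfAbsent is hypothetical and not part of the original answer:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: appends `name` only if it is not already present.
// Returns true when the name was actually pushed.
bool pushIfAbsent(std::vector<std::string>& names, const std::string& name)
{
    // Linear search: each call costs O(N) in the current size of the vector.
    if (std::find(names.begin(), names.end(), name) != names.end())
        return false;

    names.push_back(name);
    return true;
}
```

With this variant the vector never holds duplicates in the first place, so no second pass is needed. The cost is the same linear search on every insertion.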

Answer 3

First question: no. According to cplusplus.com:

Sets are containers that store unique elements following a specific order.

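Second question: you need two pieces of data to do that. The first is your actual string, and the second is an 'insertion index' that records the insertion order.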

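So basically, you could do this by putting a std::pair in your std::set and incrementing the number you put in the std::pair on every insertion. However, once you do that, every std::pair is unique, which means the benefit of using std::set is gone.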
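A small sketch that illustrates the problem, assuming the index is placed first in the pair so that the set orders by insertion. The function name countIndexedEntries is hypothetical:

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: tags every name with its insertion index, stores the
// pairs in a std::set, and returns how many entries the set ends up with.
std::size_t countIndexedEntries(const std::vector<std::string>& input)
{
    std::set<std::pair<std::size_t, std::string>> indexed;
    std::size_t next = 0;

    for (const std::string& name : input)
        indexed.insert(std::make_pair(next++, name));  // index makes each pair distinct

    return indexed.size();
}
```

For an input of Mary, Mary, John the function returns 3 rather than 2: the set keeps the insertion order, but it no longer removes the repeated name.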

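The above already sounds too complicated, so why not pick a more suitable container? For example, you could use a std::vector and drop duplicates on insertion.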

If that is too slow (O(N) per insertion), you could use a std::vector for the ordered storage and keep a std::set next to it to check uniqueness quickly.

Answer 4

From your example, it seems that equal values always come together.

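If that is the case, nothing complicated is needed: you can start filling a new array and copy the elements one by one, skipping any element that is the same as the previous one. This is a simple O(N) process.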
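The copy loop described above might be sketched like this. The function name dropAdjacentRepeats is hypothetical:

```cpp
#include <string>
#include <vector>

// Hypothetical helper: copies each element of `input` unless it is equal to
// the element that was copied just before it.
std::vector<std::string> dropAdjacentRepeats(const std::vector<std::string>& input)
{
    std::vector<std::string> result;

    for (const std::string& name : input)
    {
        // Only the most recently copied element has to be compared.
        if (result.empty() || result.back() != name)
            result.push_back(name);
    }
    return result;
}
```

This relies on equal names being adjacent. For an input such as Mary, John, Mary, both occurrences of Mary would be kept.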

Answer 5

Instead of std::set, use std::unique:

#include <iostream>
#include <algorithm>
#include <vector>
#include <cstring>

using namespace std;

// Returns true when the two C strings have the same contents.
bool myfunction (const char *i, const char *j)
{
    return strcmp(i, j) == 0;
}

int main () 
{
  char mywords[][10] = {"Mary","Mary","John","John","John","Apple","Apple","Apple"};
  vector<char*> myvector (mywords,mywords+8);
  vector<char*>::iterator it;
  it = unique (myvector.begin(), myvector.end(), myfunction);
  myvector.resize(distance(myvector.begin(),it));

  cout << "Output:";
  for (it=myvector.begin(); it!=myvector.end(); ++it)
    cout << ' ' << *it;
  cout << endl;

  return 0;
}

Answer 6

After reading all the comments and answers, I think the most straightforward way to answer my own question is to use std::vector and std::unique.

Points to note:

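I have a small list of names. It should not exceed 2000 names.
Each name appears in a cluster. If Mary appears 2 times, it does not appear again in the rest of the list.
I only need to get a set of unique names, keeping the initial order.
Once I have that unique set, I do not need to do anything further with it (insert/delete/etc.).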

So here is my code:

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

int main()
{
    std::vector<std::string> names;
    std::vector<std::string>::iterator last;
    std::vector<std::string>::iterator it;

    names.push_back("Mary");
    names.push_back("Mary");
    names.push_back("John");
    names.push_back("John");
    names.push_back("John");
    names.push_back("Apple");
    names.push_back("Apple");
    names.push_back("Apple");

    // std::unique keeps the first element of each run of equal neighbors and
    // returns an iterator one past the last element that was kept.
    last = std::unique(names.begin(), names.end());
    for (it = names.begin(); it != last; ++it)
        std::cout << *it << std::endl;
}

So the output will be (what I wanted):

Mary
John
Apple

That's it. Thanks to everyone who contributed. Feel free to comment, especially about efficiency.
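One possible refinement, offered as a sketch rather than as part of the original answer: std::unique does not shrink the vector, so the elements after the returned iterator are still there in an unspecified state. If the vector itself should end up holding only the unique names, the tail can be erased. The helper name compactAdjacentDuplicates is hypothetical:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: removes adjacent duplicates in place, so that the
// vector's size matches the number of names that were kept.
void compactAdjacentDuplicates(std::vector<std::string>& names)
{
    // std::unique returns the new logical end; erase() drops everything after it.
    names.erase(std::unique(names.begin(), names.end()), names.end());
}
```

After this call the whole vector can be iterated from begin() to end() without keeping the iterator returned by std::unique around.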

How do I stop std::set from sorting?
