Question

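How can I determine the differences between two vectors?

I have vector<int> v1 and vector<int> v2;

What I am looking for is a vector<int> vDifferences that contains only the elements found in v1 or in v2, but not in both.

Is there a standard way to do this?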

Answer 1

Here is the complete and correct answer. Before the set_symmetric_difference algorithm can be used, the source ranges must be sorted:

    #include <algorithm> // sort, set_symmetric_difference
    #include <iterator>  // back_inserter
    #include <vector>

    using namespace std; // For brevity, don't do this in your own code...

    int main()
    {
        vector<int> v1;
        vector<int> v2;

        // ... Populate v1 and v2

        // For the set_symmetric_difference algorithm to work,
        // the source ranges must be ordered!
        vector<int> sortedV1(v1);
        vector<int> sortedV2(v2);

        sort(sortedV1.begin(),sortedV1.end());
        sort(sortedV2.begin(),sortedV2.end());

        // Now that we have sorted ranges (i.e., containers), find the differences
        vector<int> vDifferences;

        set_symmetric_difference(
            sortedV1.begin(), sortedV1.end(),
            sortedV2.begin(), sortedV2.end(),
            back_inserter(vDifferences));

        // ... do something with the differences
    }

It should be noted that sorting is an expensive operation (i.e., O(n log n) for common STL implementations). When one or both of the containers are very large (i.e., millions of elements or more), a different algorithm that uses hash tables may be preferable on the grounds of algorithmic complexity. Here is a high-level description of that algorithm:



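1. Load each container into a hash table.

2. If the two containers differ in size, the hash table corresponding to the smaller one will be used for traversal in step 3. Otherwise, the first of the two hash tables will be used.

3. Traverse the hash table chosen in step 2, checking whether each item is present in both hash tables. If it is, remove it from both of them. The smaller hash table is preferred for traversal because hash table lookups are O(1) on average regardless of container size. Hence, the traversal time is a linear function of n (i.e., O(n)), where n is the size of the hash table being traversed.

4. Take the union of the items remaining in the two hash tables and store the result in a differences container.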


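C++11 gives us some of what this solution needs by standardizing the unordered_multiset container. I have also used the new meaning of the auto keyword for explicit initializations, to make the following hash-table-based solution more concise: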

    #include <unordered_set> // unordered_multiset
    #include <vector>

    using namespace std; // For brevity, don't do this in your own code...

    // The remove_common_items function template removes some and / or all of the
    // items that appear in both of the multisets that are passed to it. It uses the
    // items in the first multiset as the criteria for the multi-presence test.
    template <typename tVal>
    void remove_common_items(unordered_multiset<tVal> &ms1,
                             unordered_multiset<tVal> &ms2)
    {
        // Go through the first hash table
        for (auto cims1=ms1.cbegin();cims1!=ms1.cend();)
        {
            // Find the current item in the second hash table
            auto cims2=ms2.find(*cims1);

            // Is it present?
            if (cims2!=ms2.end())
            {
                // If so, remove it from both hash tables
                cims1=ms1.erase(cims1);
                ms2.erase(cims2);
            }
            else // If not
                ++cims1; // Move on to the next item
        }
    }

    int main()
    {
        vector<int> v1;
        vector<int> v2;

        // ... Populate v1 and v2

        // Create two hash tables that contain the values
        // from their respective initial containers
        unordered_multiset<int> ms1(v1.begin(),v1.end());
        unordered_multiset<int> ms2(v2.begin(),v2.end());

        // Remove common items from both containers based on the smallest
        // (the original called v2.size without parentheses, which does not compile)
        if (v1.size()<=v2.size())
            remove_common_items(ms1,ms2);
        else
            remove_common_items(ms2,ms1);

        // Create a vector of the union of the remaining items
        vector<int> vDifferences(ms1.begin(),ms1.end());

        vDifferences.insert(vDifferences.end(),ms2.begin(),ms2.end());

        // ... do something with the differences
    }

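To determine which solution is better for a particular situation, profiling both algorithms would be a wise course of action. Although the hash-table-based solution is O(n), it requires more code and does more work per duplicate found (i.e., the hash table removals). It also (sadly) uses a custom differencing function rather than a standard STL algorithm.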
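As a starting point for such a comparison, a small timing helper might look like the sketch below. The name elapsed_ms is hypothetical (it is not part of the standard library), and a single wall-clock measurement is only a rough indication; a real comparison would repeat each run several times on representative data.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper (not a standard facility): runs `work` once and
// returns the elapsed wall-clock time in milliseconds.
template <typename Work>
double elapsed_ms(Work&& work)
{
    const auto start = std::chrono::steady_clock::now();
    std::forward<Work>(work)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Each solution can then be wrapped in a lambda and passed to elapsed_ms, with both fed the same input vectors so that the timings are comparable.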

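It should also be noted that both solutions present the differences in an order that is most likely quite different from the order in which the elements appear in the original containers. A variant of the hash table solution gets around this. A high-level description follows (it differs from the preceding solution only in step 4):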



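1. Load each container into a hash table.

2. If the two containers differ in size, the smaller hash table will be used for traversal in step 3. Otherwise, the first of the two will be used.

3. Traverse the hash table chosen in step 2, checking whether each item is present in both hash tables. If it is, remove it from both of them.

4. To form the differences container, traverse the original containers in order (i.e., the first container before the second). Look up each item in its container's own hash table. If it is found, add the item to the differences container and remove it from that hash table. Items that are not present in their hash table are skipped. Thus, only items that remain in the hash tables end up in the differences container, and they appear in the same order as in the original containers, because those containers dictate the order of the final traversal.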


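To maintain the original order, step 4 has become more expensive than in the previous solution, especially if the number of items removed is high. This is because:

1. All items are tested a second time, via a presence test in their respective hash tables, to see whether they are eligible to appear in the differences container.

2. The hash tables have their remaining items removed one at a time as the differences container is formed, as part of the presence test in item 1.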
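The order-preserving variant described above can be sketched as follows. This is one possible reading of the four steps rather than code from the original answer: the names append_survivors and ordered_differences are assumptions introduced for illustration, and when a value has surplus copies, this sketch keeps the earliest occurrences.

```cpp
#include <unordered_set>
#include <vector>

// Steps 2-3: for every value the two tables share, erase one copy from each.
// The first argument is the table that gets traversed.
template <typename T>
void remove_common_items(std::unordered_multiset<T>& traversed,
                         std::unordered_multiset<T>& other)
{
    for (auto it = traversed.begin(); it != traversed.end();) {
        auto match = other.find(*it);
        if (match != other.end()) {
            it = traversed.erase(it); // erase returns the next iterator
            other.erase(match);
        } else {
            ++it;
        }
    }
}

// Step 4 (hypothetical helper): walk `source` in its original order and
// append each item that is still present in `remaining`, consuming it.
template <typename T>
void append_survivors(const std::vector<T>& source,
                      std::unordered_multiset<T>& remaining,
                      std::vector<T>& out)
{
    for (const T& item : source) {
        auto it = remaining.find(item);
        if (it != remaining.end()) {
            out.push_back(item);
            remaining.erase(it);
        }
    }
}

// Hypothetical entry point combining steps 1-4.
template <typename T>
std::vector<T> ordered_differences(const std::vector<T>& v1,
                                   const std::vector<T>& v2)
{
    // Step 1: load each container into a hash table.
    std::unordered_multiset<T> ms1(v1.begin(), v1.end());
    std::unordered_multiset<T> ms2(v2.begin(), v2.end());

    // Steps 2-3: traverse the smaller table (the first one on a tie).
    if (ms1.size() <= ms2.size())
        remove_common_items(ms1, ms2);
    else
        remove_common_items(ms2, ms1);

    // Step 4: v1's survivors first, then v2's, each in original order.
    std::vector<T> out;
    out.reserve(ms1.size() + ms2.size());
    append_survivors(v1, ms1, out);
    append_survivors(v2, ms2, out);
    return out;
}
```

For example, with v1 = {5, 1, 3, 2} and v2 = {4, 3, 6, 5}, the shared values 5 and 3 are dropped and the result is {1, 2, 4, 6}, which follows the original ordering of each input.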

Answer 2

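Do you want the elements from v1 and v2 that are unique, i.e. not in the other sequence? That sounds like std::set_symmetric_difference to me.

Copies the elements of the range [first1, last1) that are not present in the range [first2, last2), and the elements of the range [first2, last2) that are not present in the range [first1, last1), to the range beginning at result. The elements in the constructed range are sorted.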
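To make the quoted description concrete, here is a minimal sketch that sorts copies of the inputs and then applies the algorithm. The wrapper name sorted_symmetric_difference is an assumption made for this example; only std::sort, std::set_symmetric_difference and std::back_inserter come from the standard library.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical wrapper: takes the vectors by value so that the callers'
// originals keep their order, sorts the copies, then collects the elements
// that appear in only one of the two ranges.
std::vector<int> sorted_symmetric_difference(std::vector<int> a,
                                             std::vector<int> b)
{
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());

    std::vector<int> out;
    std::set_symmetric_difference(a.begin(), a.end(),
                                  b.begin(), b.end(),
                                  std::back_inserter(out));
    return out;
}
```

With a = {3, 1, 2, 2} and b = {4, 2, 3}, the result is {1, 2, 4}: duplicates are treated as in a multiset, so one of the two 2s in a is cancelled by the single 2 in b and the other is kept.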
