How can I efficiently copy unique objects from one vector to another (which consists of a subset of the same objects)?

Asked: 2016-08-21 19:11:25

Tags: c++ algorithm vector

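How can I efficiently copy objects (or a range of objects) from vector A into vector B, where vector B already contains some of the same objects as vector A, so that no object copied from vector A is already present in vector B?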

I have a graph stored as a vector of edges in std::vector<MinTreeEdge> minTreeInput.

I have a minimum spanning tree created from this graph, stored in std::vector<MinTreeEdge> minTreeOutput.

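I'm trying to add a feature that randomly adds a certain number of edges back into minTreeOutput. To do this, I want to copy elements from minTreeInput back into minTreeOutput until the latter contains the required number of edges. Of course, each edge object that is copied over must not already be stored in minTreeOutput. There can't be duplicate edges in this graph.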

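Below is what I've come up with so far. It works, but it's really long, and I know the loop will have to run many times depending on the graph and the tree. I'd like to know how to do this properly: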

    // Edge class
    struct MinTreeEdge
    {
        // For std::unique() between objects
        bool operator==(MinTreeEdge const &rhs) const noexcept
        {
            return lhs == rhs.lhs;
        }
        int lhs;

        int node1ID;
        int node2ID;
        int weight;
        ......
    };

             ......

    // The usage
    int currentSize = minTreeOutput.size();
    int targetSize = currentSize + numberOfEdgesToReturn;
    int sizeDistance = targetSize - currentSize;
    while(sizeDistance != 0)
    {
        //Probably really inefficient

        for(std::vector<MinTreeEdge>::iterator it = minTreeInput.begin(); it != minTreeInput.begin()+sizeDistance; ++it)
            minTreeOutput.push_back(*it);

        std::vector<MinTreeEdge>::iterator mto_it;
        mto_it = std::unique (minTreeOutput.begin(), minTreeOutput.end());

        currentSize = minTreeOutput.size();
        sizeDistance = targetSize - currentSize;
    }

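Alternatively, is there a way to simply list all the edges in minTreeInput (the graph) that are not in minTreeOutput (the tree), without having to check each individual element of the former against the latter?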

3 Answers:

Answer 0 (score: 5)

  

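"How can I efficiently copy objects (or a range of objects) from vector A into vector B, where vector B already contains some of the same objects as vector A, so that no object copied from vector A is already present in vector B?"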

If I've understood the question correctly, it can be paraphrased as "how do I create the set union of two vectors?".

Answer: std::set_union

set_union, where MinTreeEdge is cheap to copy

Note that for this to work, both vectors need to be sorted. This is for efficiency reasons, as you've already touched upon.

#include <vector>
#include <algorithm>
#include <cassert>
#include <iterator>
#include <utility>

struct MinTreeEdge
{
    // For std::unique() between objects
    bool operator==(MinTreeEdge const &rhs) const noexcept
    {
        return lhs == rhs.lhs;
    }
    int lhs;

    int node1ID;
    int node2ID;
    int weight;
};

struct lower_lhs
{
  bool operator()(const MinTreeEdge& l, const MinTreeEdge& r) const noexcept
  {
    return l.lhs < r.lhs;
  }
};

std::vector<MinTreeEdge> merge(std::vector<MinTreeEdge> a, 
                               std::vector<MinTreeEdge> b)
{
  // let's pessimistically assume that the inputs are not sorted
  // we could simply assert that they are if the caller is aware of
  // the requirement

  std::sort(a.begin(), a.end(), lower_lhs());
  std::sort(b.begin(), b.end(), lower_lhs());

  // alternatively...
  // assert(std::is_sorted(a.begin(), a.end(), lower_lhs()));
  // assert(std::is_sorted(b.begin(), b.end(), lower_lhs()));

  // optional step if the inputs are not already `unique`
  a.erase(std::unique(a.begin(), a.end()), a.end());
  b.erase(std::unique(b.begin(), b.end()), b.end());

  std::vector<MinTreeEdge> result;
  result.reserve(a.size() + b.size());

  std::set_union(a.begin(), a.end(),
                        b.begin(), b.end(),
                        std::back_inserter(result), 
                        lower_lhs());

  return result;
}

int main()
{
  // example use case

  auto a = std::vector<MinTreeEdge>{};
  auto b = std::vector<MinTreeEdge>{};

  b = merge(std::move(a), std::move(b));
}
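
To see what std::set_union does with duplicate keys, here is a minimal sketch on small data. The trimmed-down `Edge` type, the `by_lhs` comparator and the `union_keys` helper are hypothetical names used only for this illustration; the only assumption carried over from the answer is that `lhs` is the key.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical, trimmed-down edge: only the key (lhs) and a weight.
struct Edge
{
    int lhs;
    int weight;
};

// Orders edges by their key, mirroring lower_lhs in the answer.
struct by_lhs
{
    bool operator()(const Edge& l, const Edge& r) const noexcept
    {
        return l.lhs < r.lhs;
    }
};

// Returns the keys of the union of two ranges that are already sorted by lhs.
std::vector<int> union_keys(const std::vector<Edge>& a, const std::vector<Edge>& b)
{
    // std::set_union requires both inputs to be sorted with the same comparator
    assert(std::is_sorted(a.begin(), a.end(), by_lhs()));
    assert(std::is_sorted(b.begin(), b.end(), by_lhs()));

    std::vector<Edge> merged;
    merged.reserve(a.size() + b.size());
    std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                   std::back_inserter(merged), by_lhs());

    std::vector<int> keys;
    keys.reserve(merged.size());
    for (const Edge& e : merged) keys.push_back(e.lhs);
    return keys;
}
```

When both ranges contain an equivalent element, std::set_union takes it from the first range, so the argument order decides which copy of a duplicate edge ends up in the result.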

set_union, where MinTreeEdge is expensive to copy

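There has been some mention of sets to accomplish this. It's fair to say that if:

  1. MinTreeEdge is expensive to copy, and
  2. there are a great many of them,

then we could expect to see a performance benefit from using an unordered_set. However, if the objects are expensive to copy, then we would probably want to store them in our temporary set by reference.

This is how I might do it: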

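    #include <cstddef>        // std::size_t
    #include <functional>     // std::reference_wrapper, std::hash
    #include <unordered_set>
    #include <utility>        // std::move
    #include <vector>

    // utility class which converts unary and binary operations on
    // a reference_wrapper into unary and binary operations on the 
    // referred-to objects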
    template<class unary, class binary>
    struct reference_as_object
    {
        template<class U>
        decltype(auto) operator()(const std::reference_wrapper<U>& l) const {
            return _unary(l.get());
        }
    
        template<class U, class V>
        decltype(auto) operator()(const std::reference_wrapper<U>& l,
                                  const std::reference_wrapper<V>& r) const {
            return _binary(l.get(), r.get());
        }
    
        unary _unary;
        binary _binary;
    };
    
    // utility to help prevent typos when defining a set of references
    template<class K, class H, class C> using unordered_reference_set =
    std::unordered_set<
    std::reference_wrapper<K>,
    reference_as_object<H, C>,
    reference_as_object<H, C>
    >;
    
    // define unary and binary operations for our set. This way we can
    // avoid polluting MinTreeEdge with artificial relational operators
    
    struct mte_hash
    {
        std::size_t operator()(const MinTreeEdge& mte) const
        {
            return std::hash<int>()(mte.lhs);
        }
    };
    
    struct mte_equal
    {
        bool operator()(MinTreeEdge const& l, MinTreeEdge const& r) const
        {
            return l.lhs == r.lhs;
        }
    };
    
    // merge function. arguments by value since we will be moving
    // *expensive to copy* objects out of them, and the vectors themselves
    // can be *moved* into our function very cheaply
    
    std::vector<MinTreeEdge> merge2(std::vector<MinTreeEdge> a,
                                    std::vector<MinTreeEdge> b)
    {
        using temp_map_type = unordered_reference_set<MinTreeEdge, mte_hash, mte_equal>;
    
        // build a set of references to existing objects in b
        temp_map_type tmap;
        tmap.reserve(b.capacity());
    
        // b first, since the requirements mentioned 'already in B'
        for (auto& ob : b) { tmap.insert(ob); }
    
        // now add missing references in a
        for (auto& oa : a) { tmap.insert(oa); }
    
        // now build the result, moving objects from a and b as required
        std::vector<MinTreeEdge> result;
        result.reserve(tmap.size());
    
        for (auto r : tmap) {
            result.push_back(std::move(r.get()));
        }
    
        return result;
    
        // a and b now have elements which are valid but in an undefined state
        // The elements which are defined are the duplicates we don't need
        // in summary, they are of no use to us so we drop them.
    }
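
A simpler variant of the same idea, offered only as a sketch and not as the method used above: when the key is a plain int, a std::unordered_set<int> of keys already seen is enough, and the edges can be moved straight into the result. The `Edge` type and the `merge_by_key` name are hypothetical.

```cpp
#include <cassert>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical stand-in for MinTreeEdge; only lhs is used as the key.
struct Edge
{
    int lhs;
    int weight;
};

// Returns the union of b and a by key, moving edges out of both inputs.
// Neither input needs to be sorted.
std::vector<Edge> merge_by_key(std::vector<Edge> a, std::vector<Edge> b)
{
    std::unordered_set<int> seen;
    seen.reserve(a.size() + b.size());

    std::vector<Edge> result;
    result.reserve(a.size() + b.size());

    // Moves every edge whose key has not been seen yet into the result.
    auto take_new = [&](std::vector<Edge>& src) {
        for (Edge& e : src) {
            // insert().second is true only when the key was not in the set
            if (seen.insert(e.lhs).second) result.push_back(std::move(e));
        }
    };

    take_new(b);  // b first, so an edge already in b wins over its duplicate in a
    take_new(a);

    assert(result.size() == seen.size());  // one edge per distinct key
    return result;
}
```

Unlike iterating over an unordered_set, whose order is unspecified, this keeps the edges in order of first appearance: all of b, followed by the new edges from a.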
    

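Trimmings - MinTreeEdge is expensive to copy but very cheap to move

Let's say we wanted to stick with the vector method (we almost always should), but that MinTreeEdge is somewhat expensive to copy. Say it uses the pimpl idiom for internal polymorphism, which will inevitably mean a memory allocation on every copy. But let's say it is cheap to move. Let's also imagine that the caller cannot be expected to sort or uniquify the data before sending it to us.

We can still achieve good efficiency with standard algorithms and vectors: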

    std::vector<MinTreeEdge> merge(std::vector<MinTreeEdge> a,
                                   std::vector<MinTreeEdge> b)
    {
        // sorts a range if not already sorted
        // @return a reference to the range
        auto maybe_sort = [] (auto& c) -> decltype(auto)
        {
            auto begin = std::begin(c);
            auto end = std::end(c);
            if (not std::is_sorted(begin, end, lower_lhs()))
                std::sort(begin, end, lower_lhs());
            return c;
        };
    
        // uniqueify a range, returning the new 'end' of
        // valid data
        // @pre c is sorted
        // @return result of std::unique(...)
        auto unique = [](auto& c) -> decltype(auto)
        {
            auto begin = std::begin(c);
            auto end = std::end(c);
            return std::unique(begin, end);
        };
    
        // turn an iterator into a move-iterator        
        auto mm = [](auto iter) { return std::make_move_iterator(iter); };
    
    
        std::vector<MinTreeEdge> result;
        result.reserve(a.size() + b.size());
    
        // create a set_union from two input containers.
        // @post a and b shall be in a valid but undefined state
    
        std::set_union(mm(a.begin()), mm(unique(maybe_sort(a))),
                       mm(b.begin()), mm(unique(maybe_sort(b))),
                       std::back_inserter(result),
                       lower_lhs());
    
        return result;
    }
    

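If one provides a free function void swap(MinTreeEdge& l, MinTreeEdge& r) noexcept, then this function will require exactly N moves, where N is the size of the result set. Since a move in a pimpl class is simply a pointer swap, this algorithm remains efficient.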
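
To illustrate why moving a pimpl class is cheap, here is a minimal sketch of such a type. `PimplEdge`, its `Impl` struct and the `moved_from()` helper are hypothetical and are not part of the original MinTreeEdge.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical pimpl-style edge; Impl and its fields are illustrative only.
class PimplEdge
{
public:
    PimplEdge(int lhs, int weight)
        : impl_(std::make_unique<Impl>(Impl{lhs, weight})) {}

    // Copying allocates a new Impl, which is the expensive path.
    PimplEdge(const PimplEdge& other)
        : impl_(other.impl_ ? std::make_unique<Impl>(*other.impl_) : nullptr) {}
    PimplEdge& operator=(const PimplEdge& other)
    {
        if (this != &other)
            impl_ = other.impl_ ? std::make_unique<Impl>(*other.impl_) : nullptr;
        return *this;
    }

    // Moving only transfers the pointer; no allocation takes place.
    PimplEdge(PimplEdge&&) noexcept = default;
    PimplEdge& operator=(PimplEdge&&) noexcept = default;

    int lhs() const { assert(impl_); return impl_->lhs; }
    bool moved_from() const noexcept { return impl_ == nullptr; }

    // Free swap found by argument-dependent lookup; exchanges the two pointers.
    friend void swap(PimplEdge& l, PimplEdge& r) noexcept
    {
        l.impl_.swap(r.impl_);
    }

private:
    struct Impl { int lhs; int weight; };
    std::unique_ptr<Impl> impl_;
};
```

Because the move operations are noexcept, std::vector can also move elements instead of copying them when it reallocates.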

Answer 1 (score: 1)

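Since the output vector must not contain duplicates, one way to avoid storing them is to change the output container to std::set<MinTreeEdge> instead of std::vector<MinTreeEdge>. The reason is that a std::set does not store duplicates, so you don't have to write the code to check for them yourself.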

First, you need to define operator< for your MinTreeEdge class:

struct MinTreeEdge
{
    // For std::unique() between objects
    bool operator==(MinTreeEdge const &rhs) const noexcept
    {
        return lhs == rhs.lhs;
    }
    // For ordering within std::set
    bool operator<(MinTreeEdge const &rhs) const noexcept
    {
        return lhs < rhs.lhs;
    }
    //...
};

Once you've done that, the while loop can be replaced with the following:

#include <set>
#include <vector>
#include <iterator>
#include <algorithm>
//...
std::vector<MinTreeEdge> minTreeInput;
//...
std::set<MinTreeEdge> minTreeOutput;
//...
std::copy(minTreeInput.begin(), minTreeInput.end(), 
          std::inserter(minTreeOutput, minTreeOutput.begin())); 

There is no need to call std::unique at all, since std::set checks for duplicates.

If the output container has to stay a std::vector, you can still do the above using a temporary std::set and then copy the std::set into the output vector:

std::vector<MinTreeEdge> minTreeInput;
std::vector<MinTreeEdge> minTreeOutput;
//... 
std::set<MinTreeEdge> tempSet;
std::copy(minTreeInput.begin(), minTreeInput.end(), 
          std::inserter(tempSet, tempSet.begin())); 

std::copy(tempSet.begin(), tempSet.end(),std::back_inserter(minTreeOutput));
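
The snippet above copies every edge of minTreeInput and appends the result to whatever minTreeOutput already holds. If minTreeOutput already contains the tree edges and only a fixed number of extra edges is wanted, one possible adaptation is sketched below. The `Edge` type and the `add_edges_back` name are hypothetical, and the sketch assumes the output has no duplicate keys to begin with.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical stand-in for MinTreeEdge, ordered by its key lhs.
struct Edge
{
    int lhs;
    int weight;
    bool operator<(const Edge& rhs) const noexcept { return lhs < rhs.lhs; }
};

// Adds edges from `input` to `output` until `output` holds `target_size`
// edges or `input` is exhausted; edges already present are skipped.
void add_edges_back(const std::vector<Edge>& input,
                    std::vector<Edge>& output,
                    std::size_t target_size)
{
    // Seed the set with the edges that are already in the output.
    std::set<Edge> present(output.begin(), output.end());
    assert(present.size() == output.size());  // precondition: no duplicate keys

    for (const Edge& e : input) {
        if (output.size() >= target_size) break;
        // insert().second is true only when the edge was not in the set
        if (present.insert(e).second) output.push_back(e);
    }
}
```

This takes the extra edges in the order they appear in the input; shuffling the input first would be one way to make the selection random.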

Answer 2 (score: 0)

You can use something like the following:

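#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <random>
#include <vector>

struct MinTreeEdge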
{
    bool operator<(MinTreeEdge const &rhs) const noexcept
    {
        return id < rhs.id;
    }
    int id;

    int node1ID;
    int node2ID;
    int weight;
};

std::vector<MinTreeEdge> CreateRandomGraph(const std::vector<MinTreeEdge>& minSpanningTree,
                                           const std::vector<MinTreeEdge>& wholeTree,
                                           std::mt19937& rndEng,
                                           std::size_t expectedSize)
{
    assert(std::is_sorted(minSpanningTree.begin(), minSpanningTree.end())); 
    assert(std::is_sorted(wholeTree.begin(), wholeTree.end())); 
    assert(minSpanningTree.size() <= expectedSize);
    assert(expectedSize <= wholeTree.size());

    std::vector<MinTreeEdge> res;
    std::set_difference(wholeTree.begin(), wholeTree.end(),
                        minSpanningTree.begin(), minSpanningTree.end(),
                        std::back_inserter(res));

    std::shuffle(res.begin(), res.end(), rndEng);
    res.resize(expectedSize - minSpanningTree.size());
    res.insert(res.end(), minSpanningTree.begin(), minSpanningTree.end());
    // std::sort(res.begin(), res.end());
    return res;
}