Question

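I am trying to find a sensible algorithm to combine multiple lists/vectors/arrays as defined below.

Each element contains a float declaring the start of its range of validity and a constant that is used over this range. If ranges from different lists overlap, their constants need to be added to produce one global list.

I've attempted an illustration below to give a good idea of what I mean:

First List:      0.5---------------2------------3.2--------4
                          a1             a2           a3

Second List:            1----------2----------3---------------4.5
                             b1         b2           b3

Desired Output:  0.5----1----------2----------3-3.2--------4--4.5
                    a1     a1+b1      a2+b2    ^    a3+b3   b3
                                             b3+a2

I can't think of a sensible way to approach this in the case of n lists. With just 2 it is fairly easy to brute force.

Any hints or ideas would be welcome. Each list is represented as a C++ std::vector (so feel free to use standard algorithms) and is sorted by the start-of-range value. Cheers!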

Edit: Thanks for the advice, I've come up with a naive implementation, not sure why I couldn't get here on my own first. To my mind the obvious improvement would be to store an iterator for each vector, since they are already sorted, so that each vector does not have to be re-traversed for every point. Given that most vectors will contain fewer than 100 elements, but there may be many vectors, this may or may not be worthwhile. I would have to profile to see.

Any thoughts on this?

#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>

struct DataType
{
    double intervalStart;
    int data;
    // More data here, the data is not just a single int, but that
    // works for our demonstration
};

int main(void)
{
    // The final "data" of each vector is meaningless as it refers to
    // the coming range which won't be used as this is only for
    // bounded ranges
    std::vector<std::vector<DataType> > input = {{{0.5, 1}, {2.0, 3}, {3.2, 3}, {4.0, 4}},
                                                 {{1.0, 5}, {2.0, 6}, {3.0, 7}, {4.5, 8}},
                                                 {{-34.7895, 15}, {-6.0, -2}, {1.867, 5}, {340, 7}}};

    // Setup output vector
    std::vector<DataType> output;
    std::size_t inputSize = 0;
    for (const auto& internalVec : input)
        inputSize += internalVec.size();
    output.reserve(inputSize);

    // Fill output vector
    for (const auto& internalVec : input)
        std::copy(internalVec.begin(), internalVec.end(), std::back_inserter(output));

    // Sort output vector by intervalStart
    std::sort(output.begin(), output.end(),
              [](const DataType& data1, const DataType& data2)
              {
                  return data1.intervalStart < data2.intervalStart;
              });

    // Remove DataTypes with same intervalStart - each interval can only start once
    output.erase(std::unique(output.begin(), output.end(),
                             [](const DataType& dt1, const DataType& dt2)
                             {
                                 return dt1.intervalStart == dt2.intervalStart;
                             }),
                 output.end());

    // Output now contains all the right intersections, just not with the right data

    // Lambda to find the data value associated with an
    // intervalStart value in a vector
    auto FindDataValue = [&](const std::vector<DataType>& v, double startValue)
    {
        auto iter = std::find_if(v.begin(), v.end(),
                                 [startValue](const DataType& data)
                                 {
                                     return data.intervalStart > startValue;
                                 });
        if (iter == v.begin() || iter == v.end())
        {
            return 0;
        }
        return (iter-1)->data;
    };

    // For each interval in the output traverse the input and sum the
    // data constants
    for (auto& val : output)
    {
        int sectionData = 0;
        for (const auto& iv : input)
            sectionData += FindDataValue(iv, val.intervalStart);
        val.data = sectionData;
    }

    for (const auto& i : output)
        std::cout << "loc: " << i.intervalStart << " data: " << i.data << std::endl;

    return 0;
}

Edit 2: @Stas's code is a very good way to approach this problem. I've just tested it on all the edge cases I could think of. Here is my merge_intervals implementation, in case anyone is interested. The only slight change I had to make to the snippets Stas provided was:

for (auto& v : input)
    v.back().data = 0;

before combining the vectors as suggested. Thanks!

Answer 1

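Unfortunately, your algorithm is inherently slow. Profiling it or applying C++-specific tweaks makes no sense, because neither will help. It will not finish in any reasonable time even on fairly small inputs, such as merging 1000 lists of 1000 elements each.

Let's try to evaluate the time complexity of your algorithm. For simplicity, assume we only merge lists of the same length.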

L - length of a list
N - number of lists to be merged
T = L * N - length of the whole concatenated list

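Complexity of the steps of your algorithm:

create the output vector - O(T)
sort the output vector - O(T*log(T))
filter the output vector - O(T)
fix the data in the output vector - O(T*T)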

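As you can see, the last step determines the complexity of the whole algorithm: O(T*T) = O(L^2*N^2). That is unacceptable for practical use: to merge 1000 lists of 10000 elements each, T is 10^7, so the algorithm would have to run about 10^14 cycles.

The task is actually fairly complex, so do not try to solve it in one step. Divide and conquer!

1. Write an algorithm that merges two lists into one
2. Use it to merge a list of lists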

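Merging two lists into one

This is relatively easy to implement (but be careful with corner cases). The algorithm should have linear time complexity: O(2*L). Take a look at how std::merge is implemented. You just need to write a custom variation of std::merge; let's call it merge_intervals.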
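The answer leaves merge_intervals unwritten, so here is a minimal sketch of what such a two-list merge could look like. It is an illustration rather than the code the asker ended up with, and it rests on assumptions that are not spelled out above: DataType is the struct from the question, each list is sorted with strictly increasing intervalStart values, and the trailing data of each list has been set to 0 (as in the question's Edit 2) so that a list contributes nothing past its last point. The helper combine_two is a hypothetical name.

```cpp
#include <cassert>
#include <iterator>
#include <vector>

struct DataType {
    double intervalStart;  // where this constant starts to apply
    int data;              // constant used until the next intervalStart
};

// Sweeps both sorted ranges once, like std::merge, and writes one element
// per distinct intervalStart holding the sum of the two current constants.
template <class InIt1, class InIt2, class OutIt>
OutIt merge_intervals(InIt1 first1, InIt1 last1,
                      InIt2 first2, InIt2 last2, OutIt out)
{
    int current1 = 0;  // constant of list 1 at the sweep position
    int current2 = 0;  // constant of list 2 at the sweep position
    while (first1 != last1 || first2 != last2) {
        // The next breakpoint is the smaller of the two pending starts.
        double x;
        if (first2 == last2 ||
            (first1 != last1 && first1->intervalStart <= first2->intervalStart))
            x = first1->intervalStart;
        else
            x = first2->intervalStart;

        // Consume the element of each list that starts exactly at x.
        if (first1 != last1 && first1->intervalStart == x)
            current1 = (first1++)->data;
        if (first2 != last2 && first2->intervalStart == x)
            current2 = (first2++)->data;

        *out++ = DataType{x, current1 + current2};
    }
    return out;
}

// Hypothetical convenience wrapper for two vectors.
std::vector<DataType> combine_two(const std::vector<DataType>& a,
                                  const std::vector<DataType>& b)
{
    // Assumption: the trailing constant of each list was zeroed beforehand.
    assert(a.empty() || a.back().data == 0);
    assert(b.empty() || b.back().data == 0);
    std::vector<DataType> result;
    result.reserve(a.size() + b.size());
    merge_intervals(a.begin(), a.end(), b.begin(), b.end(),
                    std::back_inserter(result));
    return result;
}
```

Each input element is visited exactly once, so the cost is linear in the combined length of the two lists. The last element of the result again carries 0, so the output can be fed back into another merge.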

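Applying the merge algorithm to a list of lists

This is a little trickier, but again, divide and conquer! The idea is to merge recursively: split the list of lists into two halves, merge each half, and then combine the two results.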

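#include <iterator>     // std::distance, std::advance
#include <stdexcept>    // std::invalid_argument
#include <type_traits>  // std::remove_reference

// Recursively merges the lists in [first, last) by halving the range and
// combining the two partial results with comb.
template<class It, class Combine>
auto merge_n(It first, It last, Combine comb)
    -> typename std::remove_reference<decltype(*first)>::type
{
    if (first == last)
        throw std::invalid_argument("Empty range");

    auto count = std::distance(first, last);

    // A single list is already merged.
    if (count == 1)
        return *first;

    // Split in the middle, merge each half, then combine the halves.
    auto it = first;
    std::advance(it, count / 2);
    auto left = merge_n(first, it, comb);
    auto right = merge_n(it, last, comb);
    return comb(left, right);
}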

Usage:

auto combine = [](const std::vector<DataType>& a, const std::vector<DataType>& b)
{
   std::vector<DataType> result;
   merge_intervals(a.begin(), a.end(), b.begin(), b.end(),
         std::back_inserter(result));
   return result;
};

auto output = merge_n(input.begin(), input.end(), combine);

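The nice thing about this recursive approach is its time complexity: O(L*N*log(N)) for the whole algorithm. So, to merge 1000 lists of 10000 elements each, the algorithm should run 10000 * 1000 * 9.966 = 99,660,000 cycles. That is roughly 1,000,000 times faster than the original algorithm.

Moreover, this algorithm is inherently parallelizable. Writing a parallel version of merge_n and running it on a thread pool is not a big deal.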

Answer 2

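I know I'm a bit late to the party, but when I started writing this you did not yet have a suitable answer, and my solution should have relatively good time complexity, so here you go:

I think the most straightforward approach is to treat each of your sorted lists as a stream of events: at a given time, the value (of that stream) changes to a new value: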

template<typename T>
struct Point {
  using value_type = T;
  float time;
  T value;
};

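You want to superimpose those streams into a single stream (i.e. sum up their values at any given point). To do that, you take the earliest event from all streams and apply its effect to the result stream. This means you first need to "undo" the effect that the previous value of that stream had on the result stream, and then add the new value to the current value of the result stream.

To be able to do that, you need to remember, for each stream, its last value, its next value, and when the stream is empty: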

std::vector<std::tuple<Value, StreamIterator, StreamIterator>> streams;

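The first element of the tuple is the last effect of that stream on the result stream, the second is an iterator pointing to the stream's next event, and the last is the end iterator of that stream: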

transform(from, to, inserter(streams, begin(streams)),
    [] (auto & stream) {
      return make_tuple(static_cast<Value>(0), begin(stream), end(stream));
    });

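To always be able to get the earliest event of all the streams, it helps to keep the (information about the) streams in a (min) heap, where the top element is the stream with the next (earliest) event. That is the purpose of the following comparator: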

auto heap_compare = [] (auto const & lhs, auto const & rhs) {
       bool less = (*get<1>(lhs)).time < (*get<1>(rhs)).time;
       return (not less);
     };

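Then, as long as there are still some events (i.e. some stream that is not empty), first (re)build the heap, take the top element and apply its next event to the result stream, and then remove that event from the stream. Finally, if the stream is now empty, remove it.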

// The current value of the result stream.
Value current = 0;
while (streams.size() > 0) {
  // Reorder the stream information to get the one with the earliest next
  // value into top ...
  make_heap(begin(streams), end(streams), heap_compare);
  // .. and select it.
  auto & earliest = streams[0];
  // New value is the current one, minus the previous effect of the selected
  // stream plus the new value from the selected stream
  current = current - get<0>(earliest) + (*get<1>(earliest)).value;
  // Store the new time point with the new value and the time of the used
  // time point from the selected stream
  *out++ = Point<Value>{(*get<1>(earliest)).time, current};
  // Update the effect of the selected stream
  get<0>(earliest) = (*get<1>(earliest)).value;
  // Advance selected stream to its next time point
  ++(get<1>(earliest));
  // Remove stream if empty
  if (get<1>(earliest) == get<2>(earliest)) {
    swap(streams[0], streams[streams.size() - 1u]);
    streams.pop_back();
  }
}

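This returns a stream in which there may be multiple points with the same time but different values. That happens when several "events" occur at the same time. If you only want the last value, i.e. the value after all of these events have happened, you need to combine them: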

merge_point_lists(begin(input), end(input), inserter(merged, begin(merged)));
// returns points with the same time, but with different values. remove these
// duplicates, by first making them REALLY equal, i.e. setting their values
// to the last value ...
for (auto write = begin(merged), read = begin(merged), stop = end(merged);
    write != stop;) {
  for (++read; (read != stop) and (read->time == write->time); ++read) {
    write->value = read->value;
  }
  for (auto const cached = (write++)->value; write != read; ++write) {
    write->value = cached;
  }
}
// ... and then removing them.
merged.erase(
    unique(begin(merged), end(merged),
        [](auto const & lhs, auto const & rhs) {
          return (lhs.time == rhs.time);}),
    end(merged));

(Live example here)

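Regarding time complexity: this iterates over all "events", so it depends on the number of events e. The very first make_heap call has to build a complete new heap, which has a worst-case complexity of 3 * s, where s is the number of streams the function has to merge. On subsequent calls only the first element of the heap is out of place, and putting it right takes log(s') in the worst case. Note, however, that std::make_heap itself is only specified as linear (at most 3 * s' comparisons per call), so to actually reach the logarithmic bound the loop should repair the heap with std::pop_heap and std::push_heap instead of calling make_heap every time. I write s' because the number of streams (that need to be considered) will decrease to zero. This gives

3s + (e-1) * log(s')

as complexity. Assuming the worst case, where s' decreases slowly (this happens when the events are evenly distributed across the streams, i.e. all streams have the same number of events):

3s + (e - 1 - s) * log(s) + (sum of log(i) for i = 1 to s)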
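To make the per-event cost of log(s') concrete, here is a rough sketch of the same event-stream idea in which the heap is built once and then repaired with std::pop_heap and std::push_heap. It is a simplified variant, not the answer's code: Point is non-templated, StreamState is a hypothetical struct standing in for the tuple, points that share a time are collapsed on the fly, and every stream is assumed to be sorted by time.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Point {
    double time;
    int value;
};

// Bookkeeping for one input stream (plays the role of the tuple above).
struct StreamState {
    int effect;                               // last value added to the result
    std::vector<Point>::const_iterator next;  // next pending event
    std::vector<Point>::const_iterator end;   // end of the stream
};

bool sorted_by_time(const std::vector<Point>& s)
{
    return std::is_sorted(s.begin(), s.end(),
        [](const Point& l, const Point& r) { return l.time < r.time; });
}

std::vector<Point> merge_point_lists(const std::vector<std::vector<Point>>& input)
{
    std::vector<StreamState> streams;
    for (const auto& s : input) {
        assert(sorted_by_time(s));  // assumption: each stream is sorted
        if (!s.empty())
            streams.push_back({0, s.begin(), s.end()});
    }

    // "Greater than" comparator, so the heap top is the earliest event.
    auto later = [](const StreamState& l, const StreamState& r) {
        return l.next->time > r.next->time;
    };
    std::make_heap(streams.begin(), streams.end(), later);  // linear, done once

    std::vector<Point> out;
    int current = 0;  // current value of the result stream
    while (!streams.empty()) {
        // Move the stream with the earliest event to the back: O(log s').
        std::pop_heap(streams.begin(), streams.end(), later);
        StreamState& earliest = streams.back();

        // Undo the stream's previous effect and apply its new value.
        current += earliest.next->value - earliest.effect;
        earliest.effect = earliest.next->value;
        const double t = earliest.next->time;
        ++earliest.next;

        // Keep only the final value when several events share a time.
        if (!out.empty() && out.back().time == t)
            out.back().value = current;
        else
            out.push_back({t, current});

        if (earliest.next == earliest.end)
            streams.pop_back();  // stream exhausted, drop it
        else
            std::push_heap(streams.begin(), streams.end(), later);  // O(log s')
    }
    return out;
}
```

As in the loop above, a stream that runs out keeps its last effect in the running total, so this only gives bounded ranges if each stream ends with a value of 0.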

Answer 3

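Do you really need a data structure as the result? I don't think so. In effect, you are defining several functions that can be added together. The examples you give are encoded as 'start, value (, implicit end)' tuples. The basic building block is a function that looks up its value at a certain point: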

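// Assumes edge has members x (start of the range) and value (the constant
// that applies from x up to the next edge).
double valueAt(const vector<edge> &starts, float point) {
    // Find the neighbouring pair of edges whose range [e1.x, e2.x) holds point.
    auto it = std::adjacent_find(begin(starts), end(starts),
        [&](const edge &e1, const edge &e2) {
            return e1.x <= point && point < e2.x;
    });
    // Outside every range the series contributes nothing.
    return it != end(starts) ? it->value : 0.0;
}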

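The value of the function at a point is the sum of the values of all the series at that point.

If you really need a list in the end, you can join and sort all the edge.x values of all the series, and create the list from that.

Unless performance is a problem :)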
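As a rough illustration of the "sum of functions" view, the lookup can also be written as a binary search, followed by a sum over all the series. This is only a sketch under assumptions: the edge layout (members x and value) and the name totalAt are hypothetical, each series is taken to be sorted by x, and the last edge of a series is treated as closing the final range.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

struct edge {      // hypothetical layout
    double x;      // start of the range
    double value;  // constant that applies from x up to the next edge
};

bool sorted_by_x(const std::vector<edge>& s)
{
    return std::is_sorted(s.begin(), s.end(),
        [](const edge& l, const edge& r) { return l.x < r.x; });
}

// Value of one series at point, found by binary search.
double valueAt(const std::vector<edge>& starts, double point)
{
    assert(sorted_by_x(starts));  // assumption: the series is sorted
    // First edge that starts strictly after point.
    auto it = std::upper_bound(starts.begin(), starts.end(), point,
        [](double p, const edge& e) { return p < e.x; });
    // Before the first edge, or at/after the last one, nothing applies.
    if (it == starts.begin() || it == starts.end())
        return 0.0;
    return std::prev(it)->value;
}

// Value of the combined function: the sum over all series.
double totalAt(const std::vector<std::vector<edge>>& series, double point)
{
    double sum = 0.0;
    for (const auto& s : series)
        sum += valueAt(s, point);
    return sum;
}
```

A single query then costs one binary search per series, which is cheap when only a few points are ever evaluated.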

Answer 4

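If you can combine two of these structures, you can combine many.

First, encapsulate your std::vector in a class. Implement what you know as operator+= (and define operator+ in terms of it, if you want). With that in place, you can combine as many as you like, just by repeated addition. You could even use std::accumulate to combine a collection of them.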
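A minimal sketch of that idea could look as follows. The class name StepFunction and the helper sum_all are hypothetical, DataType is the struct from the question, and the sketch assumes each list is sorted with strictly increasing starts and a trailing data of 0 that closes the last range.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

struct DataType {
    double intervalStart;
    int data;
};

// Hypothetical wrapper around one sorted list of (start, constant) points.
class StepFunction {
public:
    StepFunction() = default;
    explicit StepFunction(std::vector<DataType> points)
        : points_(std::move(points))
    {
        // Assumption: the trailing constant is 0 and closes the last range.
        assert(points_.empty() || points_.back().data == 0);
    }

    // Adds other to this function with one linear sweep over both lists.
    StepFunction& operator+=(const StepFunction& other)
    {
        const auto& a = points_;
        const auto& b = other.points_;
        std::vector<DataType> merged;
        merged.reserve(a.size() + b.size());
        std::size_t i = 0, j = 0;
        int mine = 0, theirs = 0;  // current constant of each operand
        while (i < a.size() || j < b.size()) {
            const bool takeMine = j == b.size() ||
                (i < a.size() && a[i].intervalStart <= b[j].intervalStart);
            const double x = takeMine ? a[i].intervalStart : b[j].intervalStart;
            if (i < a.size() && a[i].intervalStart == x) mine = a[i++].data;
            if (j < b.size() && b[j].intervalStart == x) theirs = b[j++].data;
            merged.push_back({x, mine + theirs});
        }
        points_ = std::move(merged);
        return *this;
    }

    const std::vector<DataType>& points() const { return points_; }

private:
    std::vector<DataType> points_;
};

// operator+ defined in terms of operator+=.
inline StepFunction operator+(StepFunction lhs, const StepFunction& rhs)
{
    lhs += rhs;
    return lhs;
}

// Combines a whole collection by repeated addition.
inline StepFunction sum_all(const std::vector<StepFunction>& parts)
{
    return std::accumulate(parts.begin(), parts.end(), StepFunction{});
}
```

Keep in mind that a left-to-right fold sweeps over the growing result on every addition, so for a large number of lists the halving scheme of merge_n in Answer 1 should scale better.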

Combining arrays/lists in a specific fashion

4 answers: