Question

Consider a list of intervals of the form (start, end). The intervals are already sorted in the list by their start component.

My question is whether there is a way to compute, in O(n) time, how many intervals overlap each distinct interval.

I can think of several variants that work in O(n lg n) time, but I'm curious about the O(n) constraint.

For example, an O(n lg n) solution is:

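1. Add all start and end values to an array A and sort it (so we are now in O(n lg n) territory*), removing any duplicates in the process.
2. From this array A, create an array R of regions (N-1 regions, with R[i] = (A[i], A[i+1])).
3. Now just iterate over the array of intervals and increment the value of every region each interval covers. This is O(n).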

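*Well, if we knew the intervals were densely packed in a small range, we could use counting sort, which would bring us back to O(n), but that doesn't seem like a good assumption for the general case.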
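A minimal Python sketch of the three steps above, assuming closed numeric intervals. The name region_coverage is hypothetical, and locating the regions an interval spans with binary search is an assumption, since the question doesn't say how that is done. Note that step 3 visits every region an interval spans, so its cost depends on how many regions each interval covers.

```python
from bisect import bisect_left

def region_coverage(intervals):
    """Return [((lo, hi), count), ...]: how many intervals cover each elementary region."""
    # Step 1: collect every start/end value, sort, and drop duplicates: O(n log n)
    A = sorted({x for s, e in intervals for x in (s, e)})
    # Step 2: N - 1 regions with R[i] = (A[i], A[i + 1])
    R = [(A[i], A[i + 1]) for i in range(len(A) - 1)]
    counts = [0] * len(R)
    # Step 3: interval (s, e) spans regions bisect_left(A, s) .. bisect_left(A, e) - 1
    # (assumption: binary search is used to find those region indices)
    for s, e in intervals:
        for r in range(bisect_left(A, s), bisect_left(A, e)):
            counts[r] += 1
    return list(zip(R, counts))
```

For example, region_coverage([(0, 2), (1, 3)]) gives [((0, 1), 1), ((1, 2), 2), ((2, 3), 1)].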

Is there any way to improve this to O(n)?

Answer 1

Let each interval i be denoted s_i, e_i (start_i, end_i).

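I can show an O(n log k) algorithm, where k is the maximum, over all i, of the number of intervals that intersect (s_i, s_{i+1}).
In the worst case k is O(n), but it does improve performance for sparser intervals.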

We will use a min-heap to store intervals while iterating over the list; the min-heap is ordered by end value (e_i).

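The idea is to iterate over the list in ascending order of start and, for each interval, count the intervals already seen whose end value is not lower than the current interval's start.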

Pseudocode (with explanation):

h = new min heap  // ordered by end value
h.push(infinity)  // end value of a dummy interval (-infinity, infinity), so the heap is never empty
res = 0
for each interval (s_i, e_i) in ascending order of s_i:
    // pop all already "expired" intervals:
    while (h.min() < s_i):
        h.pop()
    // at this point, every interval in the heap:
    //    1. started at or before s_i
    //    2. finishes at or after s_i
    // thus each of them intersects the current interval.
    res = res + h.size() - 1  // -1 for the dummy interval (-inf, inf)
    h.push(e_i)
return res
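The pseudocode above returns the total number of overlapping pairs, while the question asks for a count per interval. A minimal sketch of one way to get per-interval counts, assuming closed intervals sorted by start: the heap supplies the earlier intervals that overlap, and a binary search over the already sorted start values supplies the later ones. The binary-search step and the name overlap_counts are assumptions added here, not part of this answer, and they make the total O(n log n) rather than O(n log k).

```python
import heapq
from bisect import bisect_right

def overlap_counts(intervals):
    """Per-interval overlap counts; intervals are closed and sorted by start."""
    starts = [s for s, _ in intervals]
    heap = []    # end values of earlier intervals that may still overlap
    counts = []
    for i, (s, e) in enumerate(intervals):
        # pop earlier intervals that ended before this one starts
        while heap and heap[0] < s:
            heapq.heappop(heap)
        earlier = len(heap)  # earlier intervals j with e_j >= s_i
        # later intervals j > i overlap iff s_j <= e_i (assumption: bisect on starts)
        later = bisect_right(starts, e) - (i + 1)
        counts.append(earlier + later)
        heapq.heappush(heap, e)
    return counts
```

On the example used in the Python code of this answer, overlap_counts([(0, 3.5), (1, 2), (1.5, 2.5), (2.1, 3), (4, 5)]) returns [3, 2, 3, 2, 0]; the sum is twice the number of overlapping pairs.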

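Time complexity:

The heap size at each step is at most k (as defined above).
Each interval is pushed once and popped at most once, each operation costing O(log k).
This totals O(n log k).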

Correctness:

First, a claim:

Two intervals (s_i, e_i) and (s_j, e_j) intersect if and only if:

s_i <= s_j <= e_i   OR s_j <= s_i <= e_j

The proof is simple: check every possible ordering of the endpoints of the two intervals. Assuming all four endpoints are distinct, and since s_i < e_i and s_j < e_j, there are 4!/(2!2!) = 6 orderings:

(1) s_i < e_i < s_j < e_j           - no overlap, and the condition fails
(2) s_j < e_j < s_i < e_i           - no overlap, and the condition fails
(3) s_j < s_i < e_i < e_j           - overlap, and the condition holds
(4) s_i < s_j < e_j < e_i           - overlap, and the condition holds
(5) s_j < s_i < e_j < e_i           - overlap, and the condition holds
(6) s_i < s_j < e_i < e_j           - overlap, and the condition holds
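The claim can also be checked mechanically. A small sketch that compares it exhaustively, on small integer endpoints and including ties, against the usual test for closed intervals sharing a point, max(s_i, s_j) <= min(e_i, e_j); the function name claim_holds is hypothetical.

```python
from itertools import product

def claim_holds(max_coord=6):
    """Exhaustively compare the claim with 'the intervals share a point' on small integers."""
    for si, ei, sj, ej in product(range(max_coord), repeat=4):
        if si > ei or sj > ej:
            continue  # not a valid pair of intervals
        claim = si <= sj <= ei or sj <= si <= ej
        # closed intervals share a point iff the later start is <= the earlier end
        shares_point = max(si, sj) <= min(ei, ej)
        if claim != shares_point:
            return False
    return True
```

claim_holds() returns True, so under closed-interval semantics the claim also holds when endpoints coincide.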

Back to the proof:

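So, if two intervals intersect, then when the second of them (call it (s_i, e_i)) is processed, the first one (s_j, e_j) is still in the heap because s_i <= e_j, and the pair is added to the count. Every counted pair is also a genuine intersection: for any (s_j, e_j) left in the heap we have already seen s_j and the interval was not popped, so s_j <= s_i <= e_j, which by the claim above means the two intervals intersect.

Moreover, for every intersecting pair (s_i, e_i), (s_j, e_j) with s_i <= s_j, the interval (s_i, e_i) is guaranteed to still be in the heap when (s_j, e_j) is processed: it is never popped, because every interval k processed in the meantime has s_k <= s_j <= e_i, i.e. e_i >= s_k. Hence the intersection of (s_j, e_j) and (s_i, e_i) is counted when the heap size is added while processing the second interval.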

QED

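A small caveat: I'm not sure this handles duplicates well; the < and <= comparisons should be examined carefully to handle those edge cases.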

Python code:

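import heapq

def findNumOverlapping(intervals):
    """Count pairs of overlapping (closed) intervals; intervals must be sorted by start."""
    h = [float('inf')]  # dummy "infinite" end value so the heap is never empty
    res = 0
    for (s, e) in intervals:
        # pop every interval that ended before the current one starts;
        # h[0] is the heap minimum in O(1) (heapq.nsmallest would scan the whole heap)
        while h[0] < s:
            heapq.heappop(h)
        res += len(h) - 1  # -1 for the dummy entry
        heapq.heappush(h, e)
    return res

intervals = [(0, 3.5), (1, 2), (1.5, 2.5), (2.1, 3), (4, 5)]
print(findNumOverlapping(intervals))  # 5 overlapping pairs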
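To probe the duplicate and endpoint edge cases mentioned in the caveat, a sketch that compares the heap count with an O(n^2) brute force on random inputs with small integer endpoints, so repeated starts, repeated ends and identical intervals are common. It assumes closed intervals (intervals that merely touch count as overlapping), which matches the strict < in the pop condition. The helper names are hypothetical, and the heap function repeats the code above without the dummy entry so that this block runs on its own.

```python
import heapq
import random
from itertools import combinations

def count_pairs_heap(intervals):
    """Heap method from this answer; intervals sorted by start, treated as closed."""
    h, res = [], 0
    for s, e in intervals:
        while h and h[0] < s:
            heapq.heappop(h)
        res += len(h)
        heapq.heappush(h, e)
    return res

def count_pairs_brute(intervals):
    """O(n^2) reference: closed intervals overlap iff max(start) <= min(end)."""
    return sum(1 for (a, b), (c, d) in combinations(intervals, 2)
               if max(a, c) <= min(b, d))

def random_check(trials=500, n=8, seed=1):
    """Return the first input where the two counts disagree, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        # endpoints in 0..6 force many ties and duplicate intervals
        ivs = sorted(tuple(sorted((rng.randint(0, 6), rng.randint(0, 6))))
                     for _ in range(n))
        if count_pairs_heap(ivs) != count_pairs_brute(ivs):
            return ivs
    return None
```

Under these closed-interval semantics random_check() should return None; if touching intervals must not count as overlapping, both the pop condition and the brute-force test would need to change together.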

Answer 2

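Not with a comparison-based algorithm. (Most likely this argument can be extended to algebraic decision trees of fixed degree, but that would make it more technical.)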

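As with sorting, the key observation is that with n! possible outputs, we cannot do better than lg n! = Θ(n log n) comparisons, each of which yields one bit (assuming non-degenerate input; since we are arguing a lower bound, the input is under our control, so this is not really an assumption at all). The encoding and decoding algorithms follow. (The formal proof is left as an exercise.)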
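For reference, the standard estimate behind lg n! = Θ(n log n), together with the size of the encoder's input space (each a_j below has n + 1 - j possible values):

```latex
\prod_{j=1}^{n} (n + 1 - j) = n!, \qquad
\lg n! = \sum_{i=1}^{n} \lg i \le n \lg n, \qquad
\lg n! \ge \sum_{i=\lceil n/2 \rceil}^{n} \lg i \ge \frac{n}{2} \lg \frac{n}{2}
```

The lower sum has at least n/2 terms, each at least lg(n/2), so both bounds are Θ(n log n).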

Encoding: input is a_1, ..., a_n such that a_j in {1, ..., n + 1 - j}
          output is the overlap counts c_1, ..., c_n of some instance

For j = 1 to n,
    Add the interval (j, j + a_j - 1/2)
For j = 1 to n,
    Output the overlap count c_j of the interval beginning at j

Decoding: input is the overlap counts c_1, ..., c_n
          output is a_1, ..., a_n

For j = 1 to n,
    Initialize c'_j = c_j
For j = 1 to n,
    Set a_j = c'_j + 1
    For k = j + 1 to j + c'_j,
        c'_k = c'_k - 1
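A small round-trip check of the encoding and decoding steps, as a sketch: it computes the overlap counts by brute force (treating the intervals as closed) and confirms that decoding recovers every one of the n! valid inputs for small n. The function names are hypothetical; the interval construction and the decoding loop follow the pseudocode above.

```python
from itertools import product

def encode(a):
    """a[j-1] is in {1, ..., n + 1 - j}; returns the overlap counts c_1..c_n."""
    n = len(a)
    # interval j is (j, j + a_j - 1/2); halves are exact in floating point
    ivs = [(j, j + a[j - 1] - 0.5) for j in range(1, n + 1)]
    counts = []
    for i, (s, e) in enumerate(ivs):
        # brute-force count of the other intervals sharing a point with this one
        counts.append(sum(1 for k, (s2, e2) in enumerate(ivs)
                          if k != i and max(s, s2) <= min(e, e2)))
    return counts

def decode(c):
    """Recover a_1..a_n from the overlap counts, following the decoding loop."""
    c = list(c)  # working copy: c'_j starts as c_j
    a = []
    for j in range(len(c)):
        # earlier intervals were already discounted, so c'_j = a_j - 1
        a.append(c[j] + 1)
        # interval j overlaps the next c'_j intervals; remove those overlaps
        for k in range(j + 1, j + 1 + c[j]):
            c[k] -= 1
    return a

def check_round_trip(n):
    """Encode and decode every valid input of size n; return (#inputs, all_ok)."""
    choices = [range(1, n + 2 - j) for j in range(1, n + 1)]
    inputs = [list(a) for a in product(*choices)]
    return len(inputs), all(decode(encode(a)) == a for a in inputs)
```

For example, encode([3, 1, 1]) gives [2, 1, 1], and check_round_trip(4) returns (24, True).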

With a small modification, this argument proves that amit's algorithm is asymptotically optimal in the parameter k.

Count overlapping intervals in O(n) time?
