Finding overlapping time ranges

Date: 2019-10-15 13:53:45

Tags: python

I have the following data, sorted by start time in ascending order:

----------------------------
Name | start | end  | count
A    | 3:00  | 4:00 | 6
B    | 3:00  | 4:00 | 6
C    | 3:00  | 4:00 | 6
D    | 3:00  | 3:30 | 6
E    | 3:32  | 4:00 | 6
F    | 4:01  | 5:00 | 6
----------------------------

I am using the following logic to find the overlaps:

max(start1,start2) < min(end1,end2)

I need to generate output that basically lists all the overlapping and non-overlapping data.
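For reference, the overlap test quoted in the question can be written as a small helper. This is only a sketch: the name `ranges_overlap` and the integer inputs are assumptions made for illustration, not part of the original post.

```python
def ranges_overlap(start1, end1, start2, end2):
    """Return True when the two ranges share more than a single point.

    This is the question's test: max(start1, start2) < min(end1, end2).
    """
    return max(start1, start2) < min(end1, end2)


# Times given as integers, e.g. 3:00 -> 180 and 3:30 -> 210.
print(ranges_overlap(180, 240, 180, 210))  # rows A and D
print(ranges_overlap(180, 210, 212, 240))  # rows D and E
```

Because the comparison is strict, two ranges that merely touch at an endpoint are not counted as overlapping.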

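1 answer:

Answer 0 (score: 1)

If I understand your question correctly, and you represent the data as a graph with one vertex per row and an edge between two rows whenever their time slots overlap, then what you are looking for are the maximal cliques of that graph. With your input data, and using networkx to find the cliques: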

import networkx as nx
G = nx.Graph()
G.add_nodes_from(['A', 'B', 'C', 'D', 'E', 'F'])
G.add_edges_from([['A', 'B'], ['A', 'C'], ['A', 'D'], ['A', 'E'], ['B', 'C'], ['B', 'D'], ['B', 'E'], ['C', 'D'], ['C', 'E']])

print(list(nx.find_cliques(G)))
# Output: [['A', 'C', 'B', 'E'], ['A', 'C', 'B', 'D'], ['F']]

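You mentioned in the comments that your data is actually in seconds, so let me assume that the input is given as integer times. You can then use the following: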

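def overlap(df):
    # One vertex per row, one edge per pair of overlapping rows.
    G = nx.Graph()
    G.add_nodes_from(df.Name)
    for i in range(len(df)):
        a = df.iloc[i]
        for j in range(i + 1, len(df)):
            b = df.iloc[j]
            # Closed-interval test: ranges that touch at an endpoint count as overlapping.
            if (a.start <= b.start and a.end >= b.start) or (b.start <= a.start and b.end >= a.start):
                G.add_edge(a.Name, b.Name)
    # Each maximal clique is a group of mutually overlapping rows; sum their counts.
    for clique in nx.find_cliques(G):
        yield clique, df.set_index('Name').loc[clique]['count'].sum()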

With your example:

In [53]: df
Out[53]:
  Name  start  end  count
0    A    180  240      6
1    B    180  240      6
2    C    180  240      6
3    D    180  210      6
4    E    212  240      6
5    F    241  300      6

In [54]: list(overlap(df))
Out[54]: [(['F'], 6), (['B', 'C', 'A', 'D'], 24), (['B', 'C', 'A', 'E'], 24)]
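The frame above holds the question's clock times as integers (3:00 becomes 180, which matches counting in minutes). A minimal sketch of that conversion follows; `to_minutes` and `records` are hypothetical names introduced here, and the unit is an assumption read off the values shown.

```python
def to_minutes(hhmm):
    """Convert an 'H:MM' string such as '3:30' to an integer number of minutes."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


# The rows from the question: (Name, start, end, count).
raw_rows = [
    ("A", "3:00", "4:00", 6),
    ("B", "3:00", "4:00", 6),
    ("C", "3:00", "4:00", 6),
    ("D", "3:00", "3:30", 6),
    ("E", "3:32", "4:00", 6),
    ("F", "4:01", "5:00", 6),
]

# Same rows with integer start and end times.
records = [(name, to_minutes(start), to_minutes(end), count)
           for name, start, end, count in raw_rows]
print(records)
```

The resulting tuples can be passed to `pd.DataFrame(records, columns=['Name', 'start', 'end', 'count'])` to obtain a frame like the one shown.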

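Alternatively, you may be interested in which overlaps exist at any given point in time (which is different from the above). Since the only times that need to be considered are the start and end times, these are also easy to find: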

In [69]: set(tuple(df[(df.start <= t) & (df.end >= t)].Name) for t in set(df.start).union(df.end))
Out[69]: {('A', 'B', 'C', 'D'), ('A', 'B', 'C', 'E'), ('F',)}

This can be used in the same way as the clique-finding approach:

def overlap2(df):
    for overlap in set(tuple(df[(df.start <= t) & (df.end >= t)].Name) for t in set(df.start).union(df.end)):
        yield overlap, df.set_index('Name').loc[list(overlap)]['count'].sum()

For example:

In [88]: list(overlap2(df))
Out[88]: [(('F',), 6), (('A', 'B', 'C', 'E'), 24), (('A', 'B', 'C', 'D'), 24)]

To see how the two approaches differ, consider what happens when a row starting at 200 and ending at 220 is added:

In [90]: df
Out[90]:
  Name  start  end  count
0    A    180  240      6
1    B    180  240      6
2    C    180  240      6
3    D    180  210      6
4    E    212  240      6
5    F    241  300      6
6    G    200  220      3

In [94]: list(overlap(df))
Out[94]: [(['F'], 6), (['G', 'B', 'C', 'A', 'D'], 27), (['G', 'B', 'C', 'A', 'E'], 27)]

In [95]: list(overlap2(df))
Out[95]:
[(('A', 'B', 'C', 'E', 'G'), 27),
 (('F',), 6),
 (('A', 'B', 'C', 'D', 'G'), 27),
 (('A', 'B', 'C', 'E'), 24),
 (('A', 'B', 'C', 'D'), 24)]
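The extra entries from overlap2 appear because it reports every distinct set of rows that is active at some start or end time, for instance ('A', 'B', 'C', 'E') at t = 240, after G has ended, whereas the clique approach only reports maximal groups. If only the maximal groups are wanted, one option is to drop every group that is a strict subset of another. The helper below is a sketch under that assumption; `keep_maximal` is a hypothetical name, not part of pandas or networkx.

```python
def keep_maximal(groups):
    """Return the groups that are not a strict subset of any other group."""
    sets = [frozenset(g) for g in groups]
    return [g for g, s in zip(groups, sets)
            if not any(s < other for other in sets)]


# The groups reported by overlap2 in the last example, without the sums.
groups = [
    ('A', 'B', 'C', 'E', 'G'),
    ('F',),
    ('A', 'B', 'C', 'D', 'G'),
    ('A', 'B', 'C', 'E'),
    ('A', 'B', 'C', 'D'),
]
print(keep_maximal(groups))
# [('A', 'B', 'C', 'E', 'G'), ('F',), ('A', 'B', 'C', 'D', 'G')]
```

On this example the filtered groups contain the same members as the cliques returned by overlap, although the order of the groups and of the names within them may differ.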