I have the following data, sorted by start time in ascending order:
---------------------------------
Name | start | end  | count |
A    | 3:00  | 4:00 | 6     |
B    | 3:00  | 4:00 | 6     |
C    | 3:00  | 4:00 | 6     |
D    | 3:00  | 3:30 | 6     |
E    | 3:32  | 4:00 | 6     |
F    | 4:01  | 5:00 | 6     |
---------------------------------
I am using the following logic to decide whether two rows overlap:
max(start1,start2) < min(end1,end2)
I need to generate output that identifies all of the overlapping and non-overlapping data.
Answer 0 (score: 1)
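If I understand your question correctly: if you represent the data as a graph with one vertex per row, and with an edge between two rows whenever their time slots overlap, then what you are looking for are the maximal cliques of that graph. With your input data, and using networkx to find the cliques: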
import networkx as nx
G = nx.Graph()
G.add_nodes_from(['A', 'B', 'C', 'D', 'E', 'F'])
G.add_edges_from([['A', 'B'], ['A', 'C'], ['A', 'D'], ['A', 'E'], ['B', 'C'], ['B', 'D'], ['B', 'E'], ['C', 'D'], ['C', 'E']])
print(list(nx.find_cliques(G)))
# Output: [['A', 'C', 'B', 'E'], ['A', 'C', 'B', 'D'], ['F']]
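The edge list above is typed in by hand. As a minimal sketch, it could instead be derived from the rows with the strict overlap test given in the question. The `intervals` dictionary and the `overlaps` helper are illustrative names that are not part of the original answer, and the times are assumed to be encoded as integers (3:00 becomes 180).

```python
from itertools import combinations

# The question's rows as (start, end) pairs in integer units (3:00 -> 180).
# This encoding is an assumption made for illustration.
intervals = {
    "A": (180, 240),
    "B": (180, 240),
    "C": (180, 240),
    "D": (180, 210),
    "E": (212, 240),
    "F": (241, 300),
}


def overlaps(first, second):
    """Strict overlap test from the question: max of starts < min of ends."""
    return max(first[0], second[0]) < min(first[1], second[1])


# One edge per pair of rows whose time slots overlap.
edges = [
    [name_a, name_b]
    for name_a, name_b in combinations(intervals, 2)
    if overlaps(intervals[name_a], intervals[name_b])
]
print(edges)
```

For this data the resulting list can be passed to `G.add_edges_from` in place of the literal one; F has no edges and stays an isolated node.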
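You mentioned in the comments that your data is actually in seconds, so let me assume that the input you provided consists of integer times. You can then use the following approach: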
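def overlap(df):
    # One node per row; an edge joins two rows whose time slots overlap.
    G = nx.Graph()
    G.add_nodes_from(df.Name)
    for i in range(len(df)):
        a = df.iloc[i]
        for j in range(i + 1, len(df)):
            b = df.iloc[j]
            # Inclusive test: slots that only touch at an endpoint count as overlapping.
            if (a.start <= b.start and a.end >= b.start) or (b.start <= a.start and b.end >= a.start):
                G.add_edge(a.Name, b.Name)
    # Each maximal clique is a group of mutually overlapping rows; sum their counts.
    for clique in nx.find_cliques(G):
        yield clique, df.set_index('Name').loc[clique]['count'].sum()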
With your example:
In [53]: df
Out[53]:
  Name  start  end  count
0    A    180  240      6
1    B    180  240      6
2    C    180  240      6
3    D    180  210      6
4    E    212  240      6
5    F    241  300      6
In [54]: list(overlap(df))
Out[54]: [(['F'], 6), (['B', 'C', 'A', 'D'], 24), (['B', 'C', 'A', 'E'], 24)]
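The example frame above uses integer times, while the question lists them as strings such as 3:00. The following is a minimal sketch of one way to do that conversion, assuming each time has the form X:YY and maps to X * 60 + YY (so 3:00 becomes 180, as in the frame above). The helper name `to_integer_time` and the `raw_rows` list are hypothetical and not part of the original answer.

```python
def to_integer_time(clock):
    """Convert an 'X:YY' string such as '3:30' to the integer X * 60 + YY."""
    major, minor = clock.split(":")
    return int(major) * 60 + int(minor)


# The rows exactly as they appear in the question.
raw_rows = [
    ("A", "3:00", "4:00", 6),
    ("B", "3:00", "4:00", 6),
    ("C", "3:00", "4:00", 6),
    ("D", "3:00", "3:30", 6),
    ("E", "3:32", "4:00", 6),
    ("F", "4:01", "5:00", 6),
]

# One dictionary per row, with the same column names as the example frame.
records = [
    {"Name": name, "start": to_integer_time(start), "end": to_integer_time(end), "count": count}
    for name, start, end, count in raw_rows
]
print(records)
```

Passing `records` to `pandas.DataFrame` should give a frame with the columns Name, start, end and count that `overlap` expects.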
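Alternatively, you may be interested in the overlaps that exist at any given point in time (which is not the same thing as the above). Since the only times that need to be considered are the start and end times, these are easy to find as well: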
In [69]: set(tuple(df[(df.start <= t) & (df.end >= t)].Name) for t in set(df.start).union(df.end))
Out[69]: {('A', 'B', 'C', 'D'), ('A', 'B', 'C', 'E'), ('F',)}
This can be used in the same way as the clique-finding approach:
def overlap2(df):
    for overlap in set(tuple(df[(df.start <= t) & (df.end >= t)].Name) for t in set(df.start).union(df.end)):
        yield overlap, df.set_index('Name').loc[list(overlap)]['count'].sum()
For example:
In [88]: list(overlap2(df))
Out[88]: [(('F',), 6), (('A', 'B', 'C', 'E'), 24), (('A', 'B', 'C', 'D'), 24)]
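The same point-in-time idea can be sketched without pandas, which may make the steps easier to follow: collect every start and end time, record which rows are active at each of them, and remove duplicate groups. The `rows` list and the function name `overlaps_at_endpoints` are illustrative and not part of the original answer.

```python
# The example rows as (Name, start, end, count) tuples.
rows = [
    ("A", 180, 240, 6),
    ("B", 180, 240, 6),
    ("C", 180, 240, 6),
    ("D", 180, 210, 6),
    ("E", 212, 240, 6),
    ("F", 241, 300, 6),
]


def overlaps_at_endpoints(rows):
    """Map each distinct group of rows active at some start/end time to its total count."""
    # Only start and end times need checking: the active set changes nowhere else.
    endpoints = {t for _, start, end, _ in rows for t in (start, end)}
    groups = set()
    for t in endpoints:
        # Inclusive bounds, matching (df.start <= t) & (df.end >= t).
        active = tuple(name for name, start, end, _ in rows if start <= t <= end)
        groups.add(active)
    counts = {name: count for name, _, _, count in rows}
    return {group: sum(counts[name] for name in group) for group in groups}


result = overlaps_at_endpoints(rows)
print(result)
```

Because the groups are collected in a set, the order in which they come out is not guaranteed, just as with `overlap2`.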
To see how the two approaches differ, consider what happens when a row starting at 200 and ending at 220 is added:
In [90]: df
Out[90]:
  Name  start  end  count
0    A    180  240      6
1    B    180  240      6
2    C    180  240      6
3    D    180  210      6
4    E    212  240      6
5    F    241  300      6
6    G    200  220      3
In [94]: list(overlap(df))
Out[94]: [(['F'], 6), (['G', 'B', 'C', 'A', 'D'], 27), (['G', 'B', 'C', 'A', 'E'], 27)]
In [95]: list(overlap2(df))
Out[95]:
[(('A', 'B', 'C', 'E', 'G'), 27),
(('F',), 6),
(('A', 'B', 'C', 'D', 'G'), 27),
(('A', 'B', 'C', 'E'), 24),
(('A', 'B', 'C', 'D'), 24)]
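The two results differ because `overlap2` reports what is active at each individual time. At t = 180, G has not started yet, so A, B, C, D appears as its own group, whereas the clique approach only reports maximal groups and folds it into A, B, C, D, G. If only the maximal groups are wanted from the point-in-time result, one possible post-processing step is to drop every group that is a strict subset of another. This is a sketch and not part of the original answer; `keep_maximal` is a hypothetical helper name.

```python
def keep_maximal(groups):
    """Return only the groups that are not a strict subset of another group."""
    # Assumes the input groups are distinct, as they are when taken from a set.
    sets = [frozenset(group) for group in groups]
    return [sorted(s) for s in sets if not any(s < other for other in sets)]


# The groups reported by overlap2 once row G has been added.
point_in_time_groups = [
    ("A", "B", "C", "E", "G"),
    ("F",),
    ("A", "B", "C", "D", "G"),
    ("A", "B", "C", "E"),
    ("A", "B", "C", "D"),
]

maximal = keep_maximal(point_in_time_groups)
print(maximal)
```

For this example the filter leaves the same three groups that `overlap` returns.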