Given a DataFrame df:
import pandas
df = pandas.DataFrame(data=[1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1], columns=['A'])
df
Out[20]:
    A
0   1
1   0
2   0
3   1
4   1
5   1
6   0
7   1
8   0
9   1
10  1
11  1
I want to find the start and end indices of every run of consecutive 1s whose length is at least 3. In this case I expect (3, 5) and (9, 11).
Answer 0 (score: 2)
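Use the shift-and-cumsum trick to label groups of consecutive equal values, then use groupby
to get the first and last index of each group and filter by your condition.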
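# Label each run: the counter increases whenever the value differs from the previous row.
v = (df['A'] != df['A'].shift()).cumsum()
# Per run: are all values truthy (a run of 1s), and how long is it?
u = df.groupby(v)['A'].agg(['all', 'count'])
# Keep only runs of 1s with at least 3 elements.
m = u['all'] & u['count'].ge(3)
# First and last index of every run, filtered by the mask.
df.groupby(v).apply(lambda x: (x.index[0], x.index[-1]))[m]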
A
3     (3, 5)
7    (9, 11)
dtype: object
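To make the labelling step easier to follow, here is a minimal sketch that prints the run labels and wraps the same idea in a helper. The name find_runs and its min_len parameter are illustrative assumptions, not part of the answer above or of pandas.

```python
import pandas as pd


def find_runs(series, min_len=3):
    """Return (start, end) index pairs for runs of truthy values of at least min_len rows.

    find_runs and min_len are hypothetical names used only for this sketch.
    """
    # A new run label starts wherever a value differs from the previous row.
    run_id = (series != series.shift()).cumsum()
    runs = []
    for _, run in series.groupby(run_id):
        # Every value in a run is identical, so checking the first one is enough.
        if run.iloc[0] and len(run) >= min_len:
            runs.append((run.index[0], run.index[-1]))
    return runs


df = pd.DataFrame({'A': [1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1]})

run_id = (df['A'] != df['A'].shift()).cumsum()
print(run_id.tolist())  # [1, 2, 2, 3, 3, 3, 4, 5, 6, 7, 7, 7]

print(find_runs(df['A']))
```

For the frame in the question, the labels 3 and 7 mark the two qualifying runs, so the helper should return the pairs (3, 5) and (9, 11), matching the output above.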
Answer 1 (score: 0)
I don't really know Pandas, but I do know Python, so I treated this as a small challenge:
def find_sub_in_list(my_list, sublist, greedy=True):
    # Collect every [start, end] span where sublist occurs in my_list (end is exclusive).
    matches = []
    results = []
    for item in range(len(my_list)):
        aux_list = my_list[item:]
        # Stop once the remaining tail is too short to contain sublist.
        if len(sublist) > len(aux_list) or len(aux_list) == 0:
            break
        start_match = None
        end_pos = None
        if sublist[0] == my_list[item]:
            start_match = item
            for sub_item in range(len(sublist)):
                if sublist[sub_item] != my_list[item + sub_item]:
                    end_pos = False  # mismatch, so this is not a full match
        if end_pos is None and start_match is not None:
            end_pos = start_match + len(sublist)
            matches.append([start_match, end_pos])
    if greedy:
        # Merge overlapping matches into single maximal spans.
        results = []
        for start, end in matches:
            if results and start < results[-1][1]:
                results[-1][1] = end
            else:
                results.append([start, end])
    else:
        results = matches
    return results
my_list = [1,1,1,0,1,1,0,1,1,1,1]
interval = 3
sublist = [1]*interval
matches = find_sub_in_list(my_list, sublist)
print(matches)
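Note that find_sub_in_list reports exclusive end positions, while the question asks for inclusive ones. As a shorter alternative, here is a minimal pure-Python sketch built on itertools.groupby that returns inclusive (start, end) positions. The name runs_of_ones and its min_len parameter are assumptions made for illustration.

```python
from itertools import groupby


def runs_of_ones(values, min_len=3):
    """Return inclusive (start, end) positions of runs of 1s at least min_len long.

    runs_of_ones and min_len are hypothetical names used only for this sketch.
    """
    runs = []
    pos = 0  # position of the first element of the current run
    for value, group in groupby(values):
        length = sum(1 for _ in group)
        if value == 1 and length >= min_len:
            runs.append((pos, pos + length - 1))
        pos += length
    return runs


# Data from the question.
print(runs_of_ones([1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1]))  # [(3, 5), (9, 11)]
# Data from this answer.
print(runs_of_ones([1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1]))     # [(0, 2), (7, 10)]
```

Because groupby already yields maximal runs of equal values, no separate merging step is needed here.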