I have the following dataframe (1.2 million rows):
df_test_2 = pd.DataFrame({"A": ["end", "beginn", "end", "end", "beginn", "beginn", "end", "end", "end", "beginn", "end"], "B": [1, 10, 50, 60, 70, 80, 90, 100, 110, 111, 112]})
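Now I'm trying to find sequences. Every "beginn" should be matched with the first "end" whose distance from it, measured in column B, is at least 40. For the provided dataframe, this means:
The problem that bothers me is … Thanks a lot for your help.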
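Read literally, the rule above pairs every "beginn" with an "end" independently, which differs slightly from the loop in the answer below (that loop ignores a "beginn" that appears while another sequence is still open). A minimal vectorized sketch of the literal rule, swapping in `numpy.searchsorted` (a binary search) for a row loop, assuming column B is sorted ascending as in the example and the minimum distance is positive; the function name `match_each_beginn` and the `min_dist` parameter are hypothetical:

```python
import numpy as np
import pandas as pd


def match_each_beginn(df: pd.DataFrame, min_dist: int = 40) -> pd.DataFrame:
    """Pair every 'beginn' with the first 'end' whose B is >= beginn B + min_dist.

    Assumes column B is sorted ascending and min_dist > 0, so the first
    qualifying 'end' by value is also the first one after the 'beginn' row.
    Unmatched 'beginn' rows get NaN in the 'end' column.
    """
    begins = df.loc[df["A"] == "beginn", "B"].to_numpy()
    ends = df.loc[df["A"] == "end", "B"].to_numpy()
    # Binary search: position of the first 'end' value >= begin + min_dist
    idx = np.searchsorted(ends, begins + min_dist, side="left")
    matched = idx < len(ends)  # idx == len(ends) means no qualifying 'end'
    end_values = np.full(len(begins), np.nan)
    end_values[matched] = ends[idx[matched]]
    return pd.DataFrame({"beginn": begins, "end": end_values})


df_test_2 = pd.DataFrame({"A": ["end", "beginn", "end", "end", "beginn", "beginn", "end", "end", "end", "beginn", "end"],
                          "B": [1, 10, 50, 60, 70, 80, 90, 100, 110, 111, 112]})
result = match_each_beginn(df_test_2)
print(result)
```

With the example data, the "beginn" rows at 10 and 70 pair with 50 and 110; the ones at 80 and 111 have no "end" at least 40 further along, so they get NaN. Sorting the "end" values once and binary-searching avoids a Python-level loop over all 1.2 million rows.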
Answer (score: 2)
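I'll assume you want a list of sequences, each with a begin value and an end value, as your output. The second sequence you identified in the picture has a distance of less than 40, so I'm also assuming that one is a mistake.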
import pandas as pd
from collections import namedtuple

df_test_2 = pd.DataFrame({"A": ["end", "beginn", "end", "end", "beginn", "beginn", "end", "end", "end", "beginn", "end"], "B": [1, 10, 50, 60, 70, 80, 90, 100, 110, 111, 112]})

sequence_list = []
Sequence = namedtuple('Sequence', ['beginn', 'end'])

beginn_flag = False  # True while a sequence is open
beginn_value = 0     # B value of the open sequence's 'beginn'

for _, row in df_test_2.iterrows():
    state = row['A']
    value = row['B']
    if not beginn_flag and state == 'beginn':
        # Open a new sequence; further 'beginn' rows are ignored until it closes
        beginn_flag = True
        beginn_value = value
    elif beginn_flag and state == 'end':
        # Close it at the first 'end' that is at least 40 past the 'beginn'
        if value >= beginn_value + 40:
            new_seq = Sequence(beginn_value, value)
            sequence_list.append(new_seq)
            beginn_flag = False

print(sequence_list)
This code outputs the following:
[Sequence(beginn=10, end=50), Sequence(beginn=70, end=110)]
That is two sequences: one begins at 10 and ends at 50, the other begins at 70 and ends at 110.
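With about 1.2 million rows, `DataFrame.iterrows()` is likely to be the bottleneck, since it builds a new Series object for every row. A sketch of the same state machine that iterates over plain Python lists with `zip()` instead; the function name `find_sequences` and the `min_dist` parameter are illustrative, not part of the answer above:

```python
import pandas as pd


def find_sequences(df: pd.DataFrame, min_dist: int = 40) -> list[tuple[int, int]]:
    """Same logic as the iterrows() loop, but over raw column values."""
    sequences = []
    open_begin = None  # B value of the currently open 'beginn', or None
    for state, value in zip(df["A"].tolist(), df["B"].tolist()):
        if open_begin is None and state == "beginn":
            open_begin = value  # open a new sequence
        elif open_begin is not None and state == "end" and value >= open_begin + min_dist:
            sequences.append((open_begin, value))
            open_begin = None  # close it; a later 'beginn' may open the next one
    return sequences


df_test_2 = pd.DataFrame({"A": ["end", "beginn", "end", "end", "beginn", "beginn", "end", "end", "end", "beginn", "end"],
                          "B": [1, 10, 50, 60, 70, 80, 90, 100, 110, 111, 112]})
print(find_sequences(df_test_2))  # [(10, 50), (70, 110)]
```

Using `None` as the "no open sequence" marker replaces the separate `beginn_flag`/`beginn_value` pair; the matching behavior is unchanged.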