Python Pandas: Finding patterns in a DataFrame

Time: 2018-10-06 12:41:02

Tags: python pandas

I have the following DataFrame (1.2 million rows):

df_test_2 = pd.DataFrame({"A":["end","beginn","end","end","beginn","beginn","end","end","end","beginn","end"],"B":[1,10,50,60,70,80,90,100,110,111,112]})

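Now I am trying to find sequences. Each "beginn" should be matched with the first "end" whose distance, based on column B, is at least 40. For the DataFrame above, this means: (image not reproduced here)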

The tricky part is […]. Thank you very much for your help.

1 answer:

Answer 0: (score: 2)

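I will assume that you want a list of sequences, each with a start value and an end value, as your output. The second sequence you marked in the picture has a distance of less than 40, so I also assume that one is a mistake.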

import pandas as pd
from collections import namedtuple

df_test_2 = pd.DataFrame({"A":["end","beginn","end","end","beginn","beginn","end","end","end","beginn","end"],"B":[1,10,50,60,70,80,90,100,110,111,112]})

sequence_list = []
Sequence = namedtuple('Sequence', ['beginn', 'end'])

beginn_flag = False  # True while a "beginn" is waiting for its "end"
beginn_value = 0     # column B value of the currently open "beginn"
for i, row in df_test_2.iterrows():
    state = row['A']
    value = row['B']

    if not beginn_flag and state == 'beginn':
        # Open a new sequence; later "beginn" rows are ignored until it closes
        beginn_flag = True
        beginn_value = value
    elif beginn_flag and state == 'end':
        # Close the sequence at the first "end" that is at least 40 away
        if value >= beginn_value + 40:
            new_seq = Sequence(beginn_value, value)
            sequence_list.append(new_seq)
            beginn_flag = False

print(sequence_list)

This code outputs the following:

[Sequence(beginn=10, end=50), Sequence(beginn=70, end=110)]

That is two sequences: one that begins at 10 and ends at 50, and another that begins at 70 and ends at 110.
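
Since the question mentions 1.2 million rows, the row-by-row loop may be slow, because iterrows builds a Series object for every row. Below is a minimal sketch of the same matching rule that iterates over the two columns as plain Python lists instead. The function name find_sequences and the parameter min_distance are my own names for illustration, not part of the question, and I have not timed this on data of that size.

```python
from collections import namedtuple

import pandas as pd

Sequence = namedtuple("Sequence", ["beginn", "end"])


def find_sequences(df, min_distance=40):
    """Match each open 'beginn' with the first 'end' at least min_distance away.

    Hypothetical helper: assumes column "A" holds the labels 'beginn'/'end'
    and column "B" holds increasing numeric positions, as in the question.
    """
    sequences = []
    beginn_value = None  # None means no sequence is currently open
    # Iterating plain lists avoids creating one Series per row
    for state, value in zip(df["A"].tolist(), df["B"].tolist()):
        if beginn_value is None:
            if state == "beginn":
                beginn_value = value
        elif state == "end" and value >= beginn_value + min_distance:
            sequences.append(Sequence(beginn_value, value))
            beginn_value = None
    return sequences


df_test_2 = pd.DataFrame({
    "A": ["end", "beginn", "end", "end", "beginn", "beginn",
          "end", "end", "end", "beginn", "end"],
    "B": [1, 10, 50, 60, 70, 80, 90, 100, 110, 111, 112],
})

print(find_sequences(df_test_2))
# [Sequence(beginn=10, end=50), Sequence(beginn=70, end=110)]
```

The logic is unchanged: a "beginn" that arrives while another one is still open is ignored, and an "end" that is too close leaves the current sequence open.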