Compare the preceding two rows and the following two rows in each group, until the last record

Time: 2019-08-27 07:06:36

Tags: python python-3.x pandas dataframe pandas-groupby

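I asked an earlier version of this question, which has since been deleted. I have rewritten it here in a less verbose form so that it is easier to read.

I have a dataframe like the one shown below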

df = pd.DataFrame({'subject_id' :[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2],'day':[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20] , 'PEEP' :[7,5,10,10,11,11,14,14,17,17,21,21,23,23,25,25,22,20,26,26,5,7,8,8,9,9,13,13,15,15,12,12,15,15,19,19,19,22,22,15]})
df['fake_flag'] = ''

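I would like to fill in the fake_flag column according to the following rules

1) If the preceding two rows are constant (e.g. 5, 5) or decreasing (e.g. 7, 5), take the higher of the two values. Here that is 7 for (7, 5) and 5 for (5, 5).

2) Check whether the current row is at least 3 points greater (>= 3) than the rule 1 output, and whether the same value repeats in the next row (the value occurs twice in a row). With a rule 1 output of 5, the value can be 8 or greater. For example: 8 in row n and 8 in row n+1, or 10 in row n and 10 in row n+1. If so, enter 'fake VAC' in the fake_flag column.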
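The two rules can be sketched for a single row as a small predicate. This is only one reading of the rules, matching the loop attempted in this question; the function name `is_fake_vac` and the `jump` parameter are hypothetical and not part of the original post.

```python
def is_fake_vac(prev2, prev1, cur, nxt, jump=3):
    """Return True if `cur` should be flagged under one reading of rules 1 and 2.

    prev2, prev1: the values two rows and one row before the current row.
    cur, nxt:     the current value and the value in the next row.
    jump:         minimum increase over the rule 1 output (assumed to be 3).
    """
    # Rule 1: the two preceding rows must be constant or decreasing.
    if prev2 < prev1:
        return False
    baseline = max(prev2, prev1)  # the higher of the two preceding values
    # Rule 2: current value is at least `jump` above the baseline
    # and is repeated in the next row.
    return cur >= baseline + jump and cur == nxt


print(is_fake_vac(7, 5, 10, 10))  # decreasing pair, 10 >= 7 + 3, repeated
print(is_fake_vac(5, 7, 8, 8))    # increasing pair, rule 1 fails
```

The first call satisfies both rules; the second fails rule 1 because the preceding pair (5, 7) is increasing.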

Here is what I have tried

t1 = df['PEEP']  # assumption: t1 was not defined in the post; taken here to be the PEEP column
for i in t1.index:
    if 2 <= i < len(t1) - 1:  # need two preceding rows and one following row
        print("current value is  ", t1[i])
        print("preceding 1st (n-1) ", t1[i-1])
        print("preceding 2nd (n-2) ", t1[i-2])
        if t1[i-2] >= t1[i-1]:  # rule 1 check: constant or decreasing
            r1_output = t1[i-2]  # the max of the two values; when they are constant, either one works
            print("rule 1 output is ", r1_output)
            if t1[i] >= r1_output + 3:
                print("found a value for rule 2", t1[i])
                print("check for next value is same as current value", t1[i+1])
                if t1[i] == t1[i+1]:  # rule 2 check
                    print("fake flag is being set")
                    df.loc[i, 'fake_flag'] = 'fake_vac'  # .loc avoids chained assignment

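This check should be applied to all records of each subject_id, one after another. My dataset contains millions of records, so I cannot loop over them row by row. Any efficient and elegant solution would be helpful.

I expect my output to look like this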
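For the per-subject requirement, a short sketch of how `groupby(...).shift()` differs from a plain `shift()` may help. The toy frame below is made up for illustration and is not the dataset from this question.

```python
import pandas as pd

# Toy data: two subjects with three rows each (illustrative only).
toy = pd.DataFrame({
    'subject_id': [1, 1, 1, 2, 2, 2],
    'PEEP':       [7, 5, 10, 5, 7, 8],
})

# A plain shift slides values across the subject boundary.
toy['prev_plain'] = toy['PEEP'].shift(1)

# A grouped shift restarts for every subject_id, leaving NaN
# in the first row of each group.
toy['prev_grouped'] = toy.groupby('subject_id')['PEEP'].shift(1)

print(toy)
```

In the first row of subject 2, `prev_plain` holds the last value of subject 1, while `prev_grouped` is NaN, so comparisons against it evaluate to False instead of mixing subjects.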

subject_id = 1

[image: expected output for subject_id = 1]

subject_id = 2

[image: expected output for subject_id = 2]

1 answer:

Answer 0 (score: 3)

import numpy as np
import pandas as pd
df = pd.DataFrame({'subject_id' :[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2],'day':[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20] , 'PEEP' :[7,5,10,10,11,11,14,14,17,17,21,21,23,23,25,25,22,20,26,26,5,7,8,8,9,9,13,13,15,15,12,12,15,15,19,19,19,22,22,15]})

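# PEEP values one and two rows earlier (shifted over the whole frame, not per subject_id)
df['shift1'] = df['PEEP'].shift(1)
df['shift2'] = df['PEEP'].shift(2)

# Flag rows whose two preceding values are constant or decreasing
df['fake_flag'] = np.where((df['shift1'] == df['shift2']) | (df['shift1'] < df['shift2']), 'fake VAC', '')
df = df.drop(['shift1', 'shift2'], axis=1)  # drop returns a new frame, so reassign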
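The snippet above tests only the two preceding rows and shifts across subject boundaries. As a hedged sketch, not the answerer's method, the loop from the question could be vectorized per subject_id roughly as follows. The helper name `flag_fake_vac` and its parameters are hypothetical, and the result has not been checked against the expected-output images, which are not available here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'subject_id': [1] * 20 + [2] * 20,
    'day': list(range(1, 21)) * 2,
    'PEEP': [7, 5, 10, 10, 11, 11, 14, 14, 17, 17, 21, 21, 23, 23, 25, 25, 22, 20, 26, 26,
             5, 7, 8, 8, 9, 9, 13, 13, 15, 15, 12, 12, 15, 15, 19, 19, 19, 22, 22, 15],
})


def flag_fake_vac(frame, group_col='subject_id', value_col='PEEP', jump=3):
    """Vectorized version of the question's loop, applied within each group."""
    grouped = frame.groupby(group_col)[value_col]
    prev1 = grouped.shift(1)   # value one row earlier, same subject
    prev2 = grouped.shift(2)   # value two rows earlier, same subject
    nxt = grouped.shift(-1)    # value in the next row, same subject
    cur = frame[value_col]

    # Rule 1: preceding pair is constant or decreasing, so prev2 is the higher value.
    rule1 = prev2 >= prev1
    # Rule 2: current value is at least `jump` above it and repeats in the next row.
    rule2 = (cur >= prev2 + jump) & (cur == nxt)

    # Comparisons against NaN (group edges) are False, so those rows stay unflagged.
    return pd.Series(np.where(rule1 & rule2, 'fake VAC', ''), index=frame.index)


df['fake_flag'] = flag_fake_vac(df)
print(df[df['fake_flag'] != ''])
```

On the sample data this reading flags days 3, 7, 9, 11 and 19 for subject 1 and days 7, 13, 15 and 18 for subject 2. Because every step is a column-wise operation, no Python-level loop over rows is needed.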

Output of the np.where snippet (columns: index, subject_id, day, PEEP, fake_flag)

0   1   1   7   
1   1   2   5   
2   1   3   10  fake VAC
3   1   4   10  
4   1   5   11  fake VAC
5   1   6   11  
6   1   7   14  fake VAC
7   1   8   14  
8   1   9   17  fake VAC
9   1   10  17  
10  1   11  21  fake VAC
11  1   12  21  
12  1   13  23  fake VAC
13  1   14  23  
14  1   15  25  fake VAC
15  1   16  25  
16  1   17  22  fake VAC
17  1   18  20  fake VAC
18  1   19  26  fake VAC
19  1   20  26  
20  2   1   5   fake VAC
21  2   2   7   fake VAC
22  2   3   8   
23  2   4   8   
24  2   5   9   fake VAC
25  2   6   9   
26  2   7   13  fake VAC