Question

I asked a previous question that has since been deleted; this is a less verbose rewrite to make it easier to read.

I have a dataframe as shown below:

import pandas as pd

df = pd.DataFrame({'subject_id': [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2], 'day': [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20], 'PEEP': [7,5,10,10,11,11,14,14,17,17,21,21,23,23,25,25,22,20,26,26,5,7,8,8,9,9,13,13,15,15,12,12,15,15,19,19,19,22,22,15]})
df['fake_flag'] = ''

I would like to fill the fake_flag column according to the following rules:

1) If the preceding two rows are constant (e.g. 5, 5) or decreasing (7, 5), pick the higher of the two values. Here that is 7 for (7, 5) and 5 for (5, 5).

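2) Check whether the current row is at least 3 points (>= 3) above the rule 1 output and whether the same value repeats in the next row (two consecutive occurrences). With a rule 1 output of 5, this means 8 or greater: for example, 8 in row n and 8 in row n+1, or 10 in row n and 10 in row n+1. If so, write fake VAC in the fake_flag column.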
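As a small worked example of the two rules, here is a sketch in plain Python for a single window of four values. The function name `is_fake_vac` and its parameter names are hypothetical, and reading rule 1 as "previous value <= the value before it" is an assumption based on the examples given.

```python
def is_fake_vac(prev2, prev1, current, nxt):
    """Apply rules 1 and 2 to one window: two preceding rows, current row, next row."""
    # Rule 1: the two preceding rows must be constant or decreasing.
    if prev1 > prev2:
        return False
    baseline = max(prev2, prev1)  # the higher of the two (equal to prev2 here)
    # Rule 2: current row is at least baseline + 3 and repeats in the next row.
    return current >= baseline + 3 and current == nxt


print(is_fake_vac(7, 5, 10, 10))  # True: baseline 7, 10 >= 10, repeated
print(is_fake_vac(5, 5, 8, 8))    # True: baseline 5, 8 >= 8, repeated
print(is_fake_vac(5, 5, 7, 7))    # False: 7 is only 2 above the baseline
print(is_fake_vac(5, 7, 10, 10))  # False: preceding rows are increasing
```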

This is what I tried:

t1 = df['PEEP']  # assumed: t1 is the PEEP column (it was not defined in the original snippet)
for i in t1.index:
    if 2 <= i < len(t1) - 1:  # need two preceding rows and one following row
        print("current value is  ", t1[i])
        print("preceding 1st (n-1) ", t1[i-1])
        print("preceding 2nd (n-2) ", t1[i-2])
        if t1[i-2] >= t1[i-1]:  # rule 1 check: constant or decreasing
            r1_output = t1[i-2]  # the max of the two values; when they are constant either one works
            print("rule 1 output is ", r1_output)
            if t1[i] >= r1_output + 3:
                print("found a value for rule 2", t1[i])
                print("check for next value is same as current value", t1[i+1])
                if t1[i] == t1[i+1]:  # rule 2 check
                    print("fake flag is being set")
                    df.loc[i, 'fake_flag'] = 'fake_vac'  # .loc avoids chained assignment

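This check should run over all records of each subject_id, one subject at a time. My dataset has millions of records, so I can't loop over them row by row; any efficient and elegant solution would be helpful.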
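One possible vectorized approach, sketched under my reading of the two rules: use `groupby(...).shift` to line up the two preceding rows and the next row within each subject, then combine the conditions with boolean masks. The function name `add_fake_flag` and its parameters are hypothetical, and the sketch assumes the rows are already sorted by day within each subject_id. It is not confirmed against the expected output, which is not shown here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "subject_id": [1] * 20 + [2] * 20,
    "day": list(range(1, 21)) * 2,
    "PEEP": [7, 5, 10, 10, 11, 11, 14, 14, 17, 17, 21, 21, 23, 23, 25, 25, 22, 20, 26, 26,
             5, 7, 8, 8, 9, 9, 13, 13, 15, 15, 12, 12, 15, 15, 19, 19, 19, 22, 22, 15],
})


def add_fake_flag(frame, group_col="subject_id", value_col="PEEP"):
    """Return a copy of frame with a fake_flag column (hypothetical helper)."""
    grouped = frame.groupby(group_col)[value_col]
    prev1 = grouped.shift(1)   # value one row back, within the same subject
    prev2 = grouped.shift(2)   # value two rows back, within the same subject
    nxt = grouped.shift(-1)    # value one row ahead, within the same subject
    cur = frame[value_col]

    # Rule 1: preceding two rows constant or decreasing, so the baseline is prev2.
    rule1 = prev1 <= prev2
    # Rule 2: current value is >= baseline + 3 and repeats in the next row.
    rule2 = (cur >= prev2 + 3) & (cur == nxt)

    out = frame.copy()
    # Comparisons against NaN (group edges) evaluate to False, so edges stay unflagged.
    out["fake_flag"] = np.where(rule1 & rule2, "fake VAC", "")
    return out


result = add_fake_flag(df)
flagged = result.index[result["fake_flag"] == "fake VAC"].tolist()
print(flagged)  # [2, 6, 8, 10, 18, 26, 32, 34, 37]
```

Because every step is a column-wise operation, no Python-level loop over rows is needed, which is the main point for a dataset with millions of records.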

I expect my output to look like this:

subject_id = 1

subject_id = 2

Answer 1

import numpy as np
import pandas as pd
df = pd.DataFrame({'subject_id' :[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2],'day':[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20] , 'PEEP' :[7,5,10,10,11,11,14,14,17,17,21,21,23,23,25,25,22,20,26,26,5,7,8,8,9,9,13,13,15,15,12,12,15,15,19,19,19,22,22,15]})

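# Previous value and the value two rows back (note: not grouped by subject_id)
df['shift1'] = df['PEEP'].shift(1)
df['shift2'] = df['PEEP'].shift(2)

# Flag rows whose two preceding values are constant or decreasing
df['fake_flag'] = np.where((df['shift1'] == df['shift2']) | (df['shift1'] < df['shift2']), 'fake VAC', '')
df = df.drop(['shift1', 'shift2'], axis=1)  # drop returns a new frame, so assign it back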

Output

    subject_id  day  PEEP fake_flag
0            1    1     7
1            1    2     5
2            1    3    10  fake VAC
3            1    4    10
4            1    5    11  fake VAC
5            1    6    11
6            1    7    14  fake VAC
7            1    8    14
8            1    9    17  fake VAC
9            1   10    17
10           1   11    21  fake VAC
11           1   12    21
12           1   13    23  fake VAC
13           1   14    23
14           1   15    25  fake VAC
15           1   16    25
16           1   17    22  fake VAC
17           1   18    20  fake VAC
18           1   19    26  fake VAC
19           1   20    26
20           2    1     5  fake VAC
21           2    2     7  fake VAC
22           2    3     8
23           2    4     8
24           2    5     9  fake VAC
25           2    6     9
26           2    7    13  fake VAC
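Note that in this output rows 20 and 21 (the first two rows of subject_id 2) are flagged, because a plain `shift` compares them with the last rows of subject_id 1. A minimal sketch of the difference, assuming per-subject comparison is what is wanted; the column names `shift1_global` and `shift1_grouped` are made up for illustration:

```python
import pandas as pd

demo = pd.DataFrame({
    "subject_id": [1, 1, 2, 2],
    "PEEP": [26, 26, 5, 7],
})

# Plain shift: the first row of subject 2 sees the last value of subject 1.
demo["shift1_global"] = demo["PEEP"].shift(1)

# Grouped shift: each subject starts with NaN, so nothing leaks across subjects.
demo["shift1_grouped"] = demo.groupby("subject_id")["PEEP"].shift(1)

print(demo)
```

With the grouped version, the first rows of each subject have no preceding values to compare against, so a comparison such as `shift1 <= shift2` is False there and those rows are left unflagged.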

