Condition based on the day number within each group

Time: 2018-04-25 13:34:26

Tags: python pandas conditional

            A                   B       C   D   E
0  2002-01-12 2018-04-25 10:00:00    John  19  19
1  2002-01-12 2018-04-25 11:00:00    John   6  25
2  2002-01-13 2018-04-25 09:00:00    John   5  30
3  2002-01-13 2018-04-25 11:00:00    John -25   5
4  2002-01-14 2018-04-25 11:00:00    John   1   6
5  2002-01-14 2018-04-25 12:00:00    John  44  50
6  2002-01-25 2018-04-25 11:00:00  George  18  18
7  2002-01-25 2018-04-25 12:00:00  George  12  30
8  2002-01-26 2018-04-25 11:00:00  George  -8  22
9  2002-01-26 2018-04-25 12:00:00  George -10  12
10 2002-01-27 2018-04-25 10:00:00  George  13  25
11 2002-01-27 2018-04-25 11:00:00  George   1  26

df['A'] = pd.to_datetime(df['A'])
df['B'] = pd.to_datetime(df['B'])
df["E"] = df.groupby("C")["D"].cumsum()

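I want to select one row for each C group, subject to the following conditions:

  • The first row where E >= 20 and B == 11:00:00, applied only from the second day of A onward within each C group.
  • If no row satisfies that condition, take the first row of the C group.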

The output should be:

            A                   B       C   D   E
0  2002-01-12 2018-04-25 10:00:00    John  19  19
8  2002-01-26 2018-04-25 11:00:00  George  -8  22

I tried:

def eleven(g):
    cond = g[g.B==time(11)].E.ge(20)
    if cond.any():
        return g[cond].iloc[0]
    else:
        return g.iloc[1]

r = df.groupby('C', as_index=False).apply(eleven)

1 answer:

Answer 0 (score: 1)

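I think you need to change the condition: chain the conditions together, compare E with ge(20), and number the days of A within each group with factorize, keeping only codes > 0 (the second day onward).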
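As a small illustration of the day-numbering step, the sketch below shows what `pd.factorize` returns for the dates of a single C group. The dates and the `E >= 20` flags are copied from the George rows above; the variable names are only illustrative. Codes are assigned in order of first appearance, so `codes > 0` is true from the second distinct date onward. When that comparison is combined with other masks using `&`, it needs its own parentheses, because `&` binds more tightly than `>` in Python.

```python
import pandas as pd

# Dates of one C group, in order of appearance (George's rows).
a = pd.Series(pd.to_datetime([
    "2002-01-25", "2002-01-25",
    "2002-01-26", "2002-01-26",
    "2002-01-27", "2002-01-27",
]))

# factorize returns (codes, uniques); the codes number the distinct
# values 0, 1, 2, ... in order of first appearance.
codes, uniques = pd.factorize(a)
day_codes = codes.tolist()

# True from the second distinct date onward.
from_second_day = (codes > 0).tolist()

# E >= 20 for George's rows (E = 18, 30, 22, 12, 25, 26).
e_ok = pd.Series([False, True, True, False, True, True])

# Parenthesize the comparison before combining it with another mask.
combined = (e_ok & (codes > 0)).tolist()
```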

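def eleven(g):
    # 11:00 rows with E >= 20, from the second distinct A date onward
    # (factorize numbers the dates 0, 1, 2, ... in order of appearance)
    cond = (g.B.dt.hour == 11) & g.E.ge(20) & (pd.factorize(g.A)[0] > 0)
    if cond.any():
        return g[cond].iloc[0]
    else:
        # no match: fall back to the first row of the group
        return g.iloc[0]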

r = df.groupby('C', as_index=False, sort=False).apply(eleven)
print (r)
           A                   B       C   D   E
0 2002-01-12 2018-04-25 10:00:00    John  19  19
1 2002-01-26 2018-04-25 11:00:00  George  -8  22
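For anyone who wants to reproduce this end to end, here is a self-contained sketch that rebuilds the sample frame from the question and selects the same rows without `groupby().apply`. This is an alternative, not the approach used in the answer: it assumes that "not the first day" can be tested by comparing each row's A with the first A of its C group (equivalent to a factorize code > 0), and it uses `drop_duplicates` to pick the first candidate per group. Like the answer, it matches on the hour of B rather than the exact time. The names `times`, `days`, `candidates`, `fallback` and `result` are illustrative.

```python
import pandas as pd

times = ["10:00", "11:00", "09:00", "11:00", "11:00", "12:00",
         "11:00", "12:00", "11:00", "12:00", "10:00", "11:00"]
days = [12, 12, 13, 13, 14, 14, 25, 25, 26, 26, 27, 27]

df = pd.DataFrame({
    "A": pd.to_datetime([f"2002-01-{d}" for d in days]),
    "B": pd.to_datetime(["2018-04-25 " + t for t in times]),
    "C": ["John"] * 6 + ["George"] * 6,
    "D": [19, 6, 5, -25, 1, 44, 18, 12, -8, -10, 13, 1],
})
df["E"] = df.groupby("C")["D"].cumsum()

# First A date seen in each C group, broadcast back to every row.
first_day = df.groupby("C")["A"].transform("first")

# 11:00 rows with E >= 20 that are not on the group's first day.
mask = (df["B"].dt.hour == 11) & df["E"].ge(20) & (df["A"] != first_day)

candidates = df[mask].drop_duplicates("C")  # first matching row per group
fallback = df.drop_duplicates("C")          # first row of every group

# Candidates come first, so they win over the fallback row of their group.
result = (
    pd.concat([candidates, fallback])
    .drop_duplicates("C")
    .sort_index()
)
print(result)
```

Unlike the `apply` version, this keeps the original row labels (0 and 8), which matches the output requested in the question.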