Select rows based on a condition on a column value

Date: 2018-03-18 21:15:07

Tags: python pandas

            A         B    C
0  2002-01-13  15:00:00  120
1  2002-01-13  15:30:00  110
2  2002-01-13  16:00:00  130
3  2002-01-13  16:30:00  140
4  2002-01-14  15:00:00  180
5  2002-01-14  15:30:00  165
6  2002-01-14  16:00:00  150
7  2002-01-14  16:30:00  170

For each group in column A, I want to select one row using the following conditions:

  • Select the row whose C value equals the group's minimum C value + 10
  • If no row has exactly "minimum C + 10", take the next larger C value
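To make the rule concrete, here is a minimal pure-Python sketch, assuming "the next C value" means the smallest C at or above min + 10 within the group. The helper name `pick_rows` and its `offset` parameter are hypothetical, chosen for illustration only; groups with no qualifying row are simply skipped here, which the question does not specify.

```python
from collections import defaultdict

# Sample rows from the question as (A, B, C), keyed by their original row label.
rows = {
    0: ("2002-01-13", "15:00:00", 120),
    1: ("2002-01-13", "15:30:00", 110),
    2: ("2002-01-13", "16:00:00", 130),
    3: ("2002-01-13", "16:30:00", 140),
    4: ("2002-01-14", "15:00:00", 180),
    5: ("2002-01-14", "15:30:00", 165),
    6: ("2002-01-14", "16:00:00", 150),
    7: ("2002-01-14", "16:30:00", 170),
}

def pick_rows(rows, offset=10):
    """Hypothetical helper: {group: label} of the smallest C >= min(C) + offset."""
    groups = defaultdict(list)
    for label, (a, _b, c) in rows.items():
        groups[a].append((c, label))
    picked = {}
    for a, items in groups.items():
        threshold = min(c for c, _ in items) + offset
        candidates = [(c, label) for c, label in items if c >= threshold]
        if candidates:  # assumption: groups with no qualifying row are skipped
            picked[a] = min(candidates)[1]  # smallest qualifying C wins
    return picked

print(pick_rows(rows))  # {'2002-01-13': 0, '2002-01-14': 5}
```

For 2002-01-13 the threshold is 110 + 10 = 120, which exists (row 0); for 2002-01-14 it is 150 + 10 = 160, which does not, so the next larger value 165 (row 5) is chosen.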

The output should be:

            A         B    C
0  2002-01-13  15:00:00  120
5  2002-01-14  15:30:00  165

2 answers:

Answer 0 (score: 2)

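As @Anton vBR commented, first remove the rows that fail the condition within each group, then get the row with the minimal C per group using idxmin and select it with loc: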

df = df[df.groupby('A')['C'].transform(lambda x: x >= x.min() + 10)]
# alternative: filter with transform('min') only, no lambda
#df = df[df.groupby('A')['C'].transform('min') + 10 <= df['C']]
print(df)
            A         B    C
0  2002-01-13  15:00:00  120
2  2002-01-13  16:00:00  130
3  2002-01-13  16:30:00  140
4  2002-01-14  15:00:00  180
5  2002-01-14  15:30:00  165
7  2002-01-14  16:30:00  170

df = df.loc[df.groupby('A')['C'].idxmin()]
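A self-contained sketch of the two steps above, rebuilding the sample data (assumption: A and B are stored as plain strings, since the question does not show their dtypes). It also checks that the lambda mask and the commented-out `transform('min')` mask select the same rows.

```python
import pandas as pd

# Sample data from the question; A and B kept as strings (an assumption).
df = pd.DataFrame({
    "A": ["2002-01-13"] * 4 + ["2002-01-14"] * 4,
    "B": ["15:00:00", "15:30:00", "16:00:00", "16:30:00"] * 2,
    "C": [120, 110, 130, 140, 180, 165, 150, 170],
})

# Step 1a: mask built with a per-group lambda (the answer's main form).
mask_lambda = df.groupby("A")["C"].transform(lambda x: x >= x.min() + 10)
# Step 1b: same mask from the built-in 'min', broadcast back to every row.
mask_min = df.groupby("A")["C"].transform("min") + 10 <= df["C"]
assert (mask_lambda == mask_min).all()

filtered = df[mask_min]

# Step 2: per group, the label of the smallest surviving C, then select by label.
res = filtered.loc[filtered.groupby("A")["C"].idxmin()]
print(res)
```

Because idxmin returns index labels rather than positions, `.loc` picks the original rows 0 and 5, matching the expected output in the question.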

Equivalently, starting from the original (unfiltered) df:

idx = df.sort_values(['A','C']).groupby('A')['C'].apply(lambda x: (x >= x.min() + 10).idxmax())
df = df.loc[idx]
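The one-liner works because idxmax on a boolean Series returns the label of the first True (True > False, and ties resolve to the first occurrence), and after sorting by C that is the smallest qualifying value. The sketch below illustrates this and one edge case, using made-up groups "x" and "y" that are not from the question: when no value in a group reaches min + 10, the mask is all False and idxmax falls back to the first label, whereas the filter-based version above would drop such a group entirely.

```python
import pandas as pd

# idxmax on a boolean Series: label of the first True.
mask = pd.Series([False, True, True], index=[1, 0, 2])
print(mask.idxmax())  # 0

# Hypothetical data where no row in either group reaches min + 10.
df = pd.DataFrame({"A": ["x", "x", "y"], "C": [100, 105, 200]})
idx = (
    df.sort_values(["A", "C"])
    .groupby("A")["C"]
    .apply(lambda x: (x >= x.min() + 10).idxmax())
)
# All-False masks make idxmax return each group's first (minimum) row.
print(idx.tolist())  # [0, 2]
```

If such groups can occur in real data, it may be worth checking the mask with `.any()` before trusting the idxmax result.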

Alternative solution with sort_values and drop_duplicates:

df = df.sort_values(['A','C'])
df = df[df.groupby('A')['C'].transform(lambda x: x >= x.min() + 10)].drop_duplicates(['A'])
print(df)
            A         B    C
0  2002-01-13  15:00:00  120
5  2002-01-14  15:30:00  165
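A runnable sketch of the sort_values + drop_duplicates variant (assumption: same string-typed sample data as in the question). It uses the vectorised `transform('min')` mask instead of the lambda; after sorting by C, the first surviving row in each group is the smallest qualifying C, which `drop_duplicates(keep="first")` retains.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["2002-01-13"] * 4 + ["2002-01-14"] * 4,
    "B": ["15:00:00", "15:30:00", "16:00:00", "16:30:00"] * 2,
    "C": [120, 110, 130, 140, 180, 165, 150, 170],
})

# Order each group's rows by ascending C.
df_sorted = df.sort_values(["A", "C"])

# Keep rows with C >= group minimum + 10 (mask aligns on the index).
mask = df_sorted.groupby("A")["C"].transform("min") + 10 <= df_sorted["C"]

# The first survivor per group is the smallest qualifying C.
res = df_sorted[mask].drop_duplicates(subset="A", keep="first")
print(res)
```

The correctness of this variant depends on the sort happening before drop_duplicates; without it, "first" would just mean first in the original row order.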

Answer 1 (score: 1)

This is a vectorised solution. Sometimes a helper column is more efficient than an inline lambda-based solution.

df['Floor'] = df['C'] - (df.groupby('A')['C'].transform('min') + 10)

res = df.loc[df[df['Floor'] >= 0].groupby('A')['Floor'].idxmin()]

Result:

            A         B    C  Floor
0  2002-01-13  15:00:00  120      0
5  2002-01-14  15:30:00  165      5
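A self-contained sketch of the helper-column approach (assumption: same string-typed sample data as in the question). Floor is the distance of C above the per-group threshold min + 10, so the smallest non-negative Floor is the closest match at or above it. The final `drop` of the helper column is an optional extra, not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["2002-01-13"] * 4 + ["2002-01-14"] * 4,
    "B": ["15:00:00", "15:30:00", "16:00:00", "16:30:00"] * 2,
    "C": [120, 110, 130, 140, 180, 165, 150, 170],
})

# Helper column: negative means below the group's threshold (min + 10).
df["Floor"] = df["C"] - (df.groupby("A")["C"].transform("min") + 10)

# Among rows at or above the threshold, the smallest Floor per group wins.
candidates = df[df["Floor"] >= 0]
res = df.loc[candidates.groupby("A")["Floor"].idxmin()]
print(res)

# Optional: drop the helper column once it has served its purpose.
clean = res.drop(columns="Floor")
```

Note that this adds Floor to df in place; working on a copy (or using a standalone Series) avoids that side effect if df is reused later.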