I have the following DataFrame:
import pandas as pd
import numpy as np
d ={
'ID1':['abc1','abc2','abc3','abc4','abc5','abc1','abc1','abc1','abc1','abc1','abc2','abc2','abc2','abc3'],
'Item':['orange','mango','jack','cucumber','banana','pineapple','sapota','grapes','papaya','watermelon','guava','pomogranate','mosambi','apple'],
'Type':['A','B','A','B','A','B','A','B','A','B','A','B','A','B'],
'Price':[25,30,15,20,25,30,15,20,25,30,15,20,25,30]
}
df = pd.DataFrame(data = d)
df
For a single group-level condition, the following code works:
df.groupby('ID1').filter(lambda s: s.Price.sum()>=80).sort_values(by='ID1',ascending = True)
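To show how this filter behaves, here is a minimal sketch that rebuilds the same sample data and prints the per-ID1 Price totals that the lambda compares with 80. GroupBy.filter calls the lambda once per group and keeps every row of the groups for which it returns True, so only abc1 (total 145) and abc2 (total 90) survive. The variable names totals and kept are illustrative only.

```python
import pandas as pd

# Same sample data as in the question
d = {
    'ID1': ['abc1', 'abc2', 'abc3', 'abc4', 'abc5', 'abc1', 'abc1', 'abc1',
            'abc1', 'abc1', 'abc2', 'abc2', 'abc2', 'abc3'],
    'Item': ['orange', 'mango', 'jack', 'cucumber', 'banana', 'pineapple',
             'sapota', 'grapes', 'papaya', 'watermelon', 'guava',
             'pomogranate', 'mosambi', 'apple'],
    'Type': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B'],
    'Price': [25, 30, 15, 20, 25, 30, 15, 20, 25, 30, 15, 20, 25, 30],
}
df = pd.DataFrame(data=d)

# Per-group totals that the filter lambda compares against 80
totals = df.groupby('ID1')['Price'].sum()
print(totals)

# filter() evaluates the lambda once per group and keeps all rows
# of the groups for which it returns True
kept = df.groupby('ID1').filter(lambda s: s.Price.sum() >= 80)
print(sorted(kept['ID1'].unique()))
```

Because filter works on whole groups, the result keeps all 10 rows of abc1 and abc2 rather than picking out individual rows.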
How can I filter the IDs under the following multiple conditions:
Expected output:
ID1 Item Type Price
0 abc1 orange A 25
5 abc1 pineapple B 30
6 abc1 sapota A 15
7 abc1 grapes B 20
8 abc1 papaya A 25
9 abc1 watermelon B 30
Answer:
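You can use GroupBy.transform with sum. The first condition sums Price per ID1 directly. For the second and subsequent conditions, build a boolean mask with Series.eq, Series.ge or Series.between and count its True values per ID1 the same way. Finally, chain all the masks with & (bitwise AND) and filter with boolean indexing: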
m1 = df.groupby('ID1')['Price'].transform('sum') > 90
m2 = df['Type'].eq('A').groupby(df['ID1']).transform('sum') == 3
m3 = df['Type'].eq('B').groupby(df['ID1']).transform('sum') == 3
m4 = df['Price'].between(15, 20).groupby(df['ID1']).transform('sum') == 2
m5 = df['Price'].ge(25).groupby(df['ID1']).transform('sum') == 4
Or, equivalently, with the methods gt and eq instead of the > and == operators:
m1 = df.groupby('ID1')['Price'].transform('sum').gt(90)
m2 = df['Type'].eq('A').groupby(df['ID1']).transform('sum').eq(3)
m3 = df['Type'].eq('B').groupby(df['ID1']).transform('sum').eq(3)
m4 = df['Price'].between(15, 20).groupby(df['ID1']).transform('sum').eq(2)
m5 = df['Price'].ge(25).groupby(df['ID1']).transform('sum').eq(4)
df = df[m1 & m2 & m3 & m4 & m5]
print(df)
ID1 Item Type Price
0 abc1 orange A 25
5 abc1 pineapple B 30
6 abc1 sapota A 15
7 abc1 grapes B 20
8 abc1 papaya A 25
9 abc1 watermelon B 30
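Two points in this answer may be easier to follow with a small sketch. First, transform('sum') on a boolean Series counts the True values per ID1 and writes that count back onto every row of the group, so each mask has the same index as df and can be combined with & for boolean indexing. Second, the same five conditions can also be written as one GroupBy.filter predicate, in the style of the question's original code. That filter variant is an alternative added here for comparison, not part of the answer, and the helper name meets_all is hypothetical. The transform version is vectorized and is typically faster with many groups, whereas filter calls a Python function once per group.

```python
import pandas as pd

# Same sample data as in the question
d = {
    'ID1': ['abc1', 'abc2', 'abc3', 'abc4', 'abc5', 'abc1', 'abc1', 'abc1',
            'abc1', 'abc1', 'abc2', 'abc2', 'abc2', 'abc3'],
    'Item': ['orange', 'mango', 'jack', 'cucumber', 'banana', 'pineapple',
             'sapota', 'grapes', 'papaya', 'watermelon', 'guava',
             'pomogranate', 'mosambi', 'apple'],
    'Type': ['A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B', 'A', 'B'],
    'Price': [25, 30, 15, 20, 25, 30, 15, 20, 25, 30, 15, 20, 25, 30],
}
df = pd.DataFrame(data=d)

# transform('sum') on a boolean mask: number of Type == 'A' rows per ID1,
# broadcast back to every row of that group (same index as df)
count_a = df['Type'].eq('A').groupby(df['ID1']).transform('sum')
print(pd.concat([df['ID1'], count_a.rename('count_A')], axis=1))

# Boolean-indexing version from the answer
m1 = df.groupby('ID1')['Price'].transform('sum').gt(90)
m2 = df['Type'].eq('A').groupby(df['ID1']).transform('sum').eq(3)
m3 = df['Type'].eq('B').groupby(df['ID1']).transform('sum').eq(3)
m4 = df['Price'].between(15, 20).groupby(df['ID1']).transform('sum').eq(2)
m5 = df['Price'].ge(25).groupby(df['ID1']).transform('sum').eq(4)
via_transform = df[m1 & m2 & m3 & m4 & m5]


def meets_all(g):
    """Hypothetical helper: True if group g satisfies all five conditions."""
    return bool(
        g['Price'].sum() > 90
        and g['Type'].eq('A').sum() == 3
        and g['Type'].eq('B').sum() == 3
        and g['Price'].between(15, 20).sum() == 2
        and g['Price'].ge(25).sum() == 4
    )


# filter() keeps every row of the groups for which meets_all returns True
via_filter = df.groupby('ID1').filter(meets_all)
print(via_filter)
print(via_transform.equals(via_filter))
```

Both variants select the six abc1 rows shown above; filter returns the kept rows in their original order, so the two results compare equal.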