Conditional filtering on multiple columns with groupby when a condition in another column is met

Time: 2019-01-13 12:10:04

Tags: python pandas dataframe group-by pandas-groupby

I have a df:

         ID  YEART   Commdate        Cat  Category
0   LVI6AE2   1993 2017-03-24  LVI6AE2_1        56
1   LVI6BE2   1994 2017-03-24  LVI6BE2_1        67
2   APJ5LEV   1975 2017-03-13  APJ5LEV_1        78
3   LQL0AE3   1986 2017-03-16  LQL0AE3_1        87
4   BLR3UEV   1982 2017-03-15  BLR3UEV_1        90
5   BRL1NEV   1981 2017-03-15  BRL1NEV_1        90
6   BRL1NEV   1981 2017-03-16  BRL1NEV_1        90
7   BRL1NEV   1981 2017-03-22  Ungrouped       190
8   BRL1NEV   1981 2017-03-17  Ungrouped       190
9   BRL1NEV   1981 2017-03-17  Ungrouped       190
10  BRL1NEV   1981 2017-03-22  Ungrouped       190
11  BRL1NEV   1981 2017-03-20  BRL1NEV_1        90
12  BRL1NEV   1981 2017-02-01  BRL1NEV_1        90
13  UEE6JSV   2000 2017-03-15  UEE6JSV_1        34
14  UGQ4VE2   1993 2014-07-25  UGQ4VE2_1        45
15  UTU6BE1   1986 2017-03-13  UTU6BE1_1        12
16      NVT   1999 2017-03-10      NVT_1        12
17  OTL3JE1   2001 2017-02-01  OTL3JE1_1        12
18  OTL5XS1   2003 2017-03-01  OTL5XS1_1        12
19  OTL6AE1   2001 2017-03-01  OTL6AE1_1        12
20  JVU6AE1   1999 2017-03-31  JVU6AE1_1        12
21  JVU6AE2   1993 2017-03-31  Ungrouped       120

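I want to compute the earliest 'Commdate' in each group of rows that share the same 'ID' and 'YEART', but only for rows where 'Cat' is Ungrouped or 'Category' > 100.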

I came up with the following one-liner:

#To Datetime
df['Commdate'] =pd.to_datetime(df['Commdate'])

#groupby
df["EarliestD"] =df.groupby(['ID', 'YEART']).filter(lambda x : x['Category'].count()>=90)['Commdate'].min()

The result returns 'NaT' for 'EarliestD':

         ID  YEART   Commdate        Cat  Category EarliestD
0   LVI6AE2   1993 2017-03-24  LVI6AE2_1        56       NaT
1   LVI6BE2   1994 2017-03-24  LVI6BE2_1        67       NaT
2   APJ5LEV   1975 2017-03-13  APJ5LEV_1        78       NaT
3   LQL0AE3   1986 2017-03-16  LQL0AE3_1        87       NaT
4   BLR3UEV   1982 2017-03-15  BLR3UEV_1        90       NaT

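Questions:

1. Is it possible to group by multiple columns conditionally, that is, only when a condition in a different column is met?
2. Is it possible to do the grouping with multiple conditions through a def function?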

Thanks

1 Answer:

Answer 0: (score: 0)

You can use a Boolean filter together with groupby + transform:

# convert Commdate to datetime if necessary
df['Commdate'] = pd.to_datetime(df['Commdate'])

# calculate mask for splitting dataframe
cat_mask = (df['Cat'] == 'Ungrouped') | (df['Category'] > 100)

# groupby uncategorised / category > 100
df.loc[cat_mask, 'Commdate'] = df.loc[cat_mask].groupby(['ID', 'YEART'])['Commdate'].transform('min')

Result:

print(df)

         ID  YEART   Commdate        Cat  Category
0   LVI6AE2   1993 2017-03-24  LVI6AE2_1        56
1   LVI6BE2   1994 2017-03-24  LVI6BE2_1        67
2   APJ5LEV   1975 2017-03-13  APJ5LEV_1        78
3   LQL0AE3   1986 2017-03-16  LQL0AE3_1        87
4   BLR3UEV   1982 2017-03-15  BLR3UEV_1        90
5   BRL1NEV   1981 2017-03-15  BRL1NEV_1        90
6   BRL1NEV   1981 2017-03-16  BRL1NEV_1        90
7   BRL1NEV   1981 2017-03-17  Ungrouped       190
8   BRL1NEV   1981 2017-03-17  Ungrouped       190
9   BRL1NEV   1981 2017-03-17  Ungrouped       190
10  BRL1NEV   1981 2017-03-17  Ungrouped       190
11  BRL1NEV   1981 2017-03-20  BRL1NEV_1        90
12  BRL1NEV   1981 2017-02-01  BRL1NEV_1        90
13  UEE6JSV   2000 2017-03-15  UEE6JSV_1        34
14  UGQ4VE2   1993 2014-07-25  UGQ4VE2_1        45
15  UTU6BE1   1986 2017-03-13  UTU6BE1_1        12
16      NVT   1999 2017-03-10      NVT_1        12
17  OTL3JE1   2001 2017-02-01  OTL3JE1_1        12
18  OTL5XS1   2003 2017-03-01  OTL5XS1_1        12
19  OTL6AE1   2001 2017-03-01  OTL6AE1_1        12
20  JVU6AE1   1999 2017-03-31  JVU6AE1_1        12
21  JVU6AE2   1993 2017-03-31  Ungrouped       120
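
On the second question (doing this through a def function), here is a minimal sketch. It assumes you would rather write the result to a separate 'EarliestD' column than overwrite 'Commdate'. The function name earliest_for_flagged and its parameters are made up for illustration, and the data is a hand-typed subset of the rows from the question. It also checks why the original one-liner gives NaT: x['Category'].count() counts the rows in each group rather than comparing the values in 'Category', so no group reaches 90 and filter returns an empty frame.

```python
import pandas as pd

# Hand-typed subset of the question's data (illustrative only).
df = pd.DataFrame({
    "ID": ["BRL1NEV", "BRL1NEV", "BRL1NEV", "BRL1NEV", "BRL1NEV", "JVU6AE2"],
    "YEART": [1981, 1981, 1981, 1981, 1981, 1993],
    "Commdate": ["2017-03-15", "2017-03-16", "2017-03-22",
                 "2017-03-17", "2017-02-01", "2017-03-31"],
    "Cat": ["BRL1NEV_1", "BRL1NEV_1", "Ungrouped",
            "Ungrouped", "BRL1NEV_1", "Ungrouped"],
    "Category": [90, 90, 190, 190, 90, 120],
})
df["Commdate"] = pd.to_datetime(df["Commdate"])

# Why the original attempt yields NaT: count() is the group size,
# so the ">= 90" test fails for every group and nothing survives the filter.
filtered = df.groupby(["ID", "YEART"]).filter(
    lambda x: x["Category"].count() >= 90
)
print(len(filtered))  # 0 rows, so the min of 'Commdate' is NaT


def earliest_for_flagged(frame, keys=("ID", "YEART"),
                         date_col="Commdate", threshold=100):
    """Earliest date per key group, computed over flagged rows only.

    A row is flagged when Cat == 'Ungrouped' or Category > threshold.
    Unflagged rows get NaT. (Hypothetical helper, not a pandas API.)
    """
    mask = (frame["Cat"] == "Ungrouped") | (frame["Category"] > threshold)
    earliest = (
        frame.loc[mask]
        .groupby(list(keys))[date_col]
        .transform("min")
    )
    # Align back to the full index; rows outside the mask become NaT.
    return earliest.reindex(frame.index)


df["EarliestD"] = earliest_for_flagged(df)
print(df)
```

With this version 'Commdate' stays untouched, the flagged BRL1NEV rows share their group's earliest flagged date, and rows that do not meet the condition keep NaT in 'EarliestD'.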