I have a DataFrame like the one below:
Contract_ID Unit_ID Start_date End_Date Status
1 A 2014-05-01 2015-05-01 Closed
2 A 2016-05-01 2017-05-01 Expired
3 A 2018-05-01 2020-05-01 Active
4 B 2014-05-01 2015-05-01 Closed
5 B 2015-05-01 2016-05-01 Closed
6 C 2016-05-01 2017-05-01 Closed
7 C 2017-05-01 2018-05-01 Expired
8 D 2016-05-01 2017-05-01 Closed
9 D 2017-06-01 2018-05-01 Expired
10 D 2018-07-01 2020-08-01 Active
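To reproduce this, the sample table can be rebuilt as a DataFrame. Keeping the date columns as plain strings is an assumption, since the question does not say whether they are parsed as datetimes. This is a minimal sketch:

```python
import pandas as pd

# Rebuild the sample table from the question (dates kept as plain strings)
df = pd.DataFrame({
    'Contract_ID': list(range(1, 11)),
    'Unit_ID': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'D', 'D', 'D'],
    'Start_date': ['2014-05-01', '2016-05-01', '2018-05-01', '2014-05-01', '2015-05-01',
                   '2016-05-01', '2017-05-01', '2016-05-01', '2017-06-01', '2018-07-01'],
    'End_Date': ['2015-05-01', '2017-05-01', '2020-05-01', '2015-05-01', '2016-05-01',
                 '2017-05-01', '2018-05-01', '2017-05-01', '2018-05-01', '2020-08-01'],
    'Status': ['Closed', 'Expired', 'Active', 'Closed', 'Closed',
               'Closed', 'Expired', 'Closed', 'Expired', 'Active'],
})
print(df)
```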
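From this data, I want to find the units that have no Active status.
In the table above, units A and D have an Active contract, so they should be excluded.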
Expected output:
Contract_ID Unit_ID Start_date End_Date Status
4 B 2014-05-01 2015-05-01 Closed
5 B 2015-05-01 2016-05-01 Closed
6 C 2016-05-01 2017-05-01 Closed
7 C 2017-05-01 2018-05-01 Expired
Answer 0 (score: 2)
The first idea is to keep only the groups that contain no Active value at all, using GroupBy.transform with GroupBy.all:
df1 = df[df.assign(New=df['Status'].ne('Active')).groupby('Unit_ID')['New'].transform('all')]
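It can help to print the intermediate mask of the first approach: ne('Active') is True on every row whose own status is not Active, and transform('all') computes all() per unit and broadcasts that result back to every row of the unit, so the mask has the same index as df. A sketch on the sample data, with the date columns omitted for brevity (the name `keep` is just illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Contract_ID': list(range(1, 11)),
    'Unit_ID': list('AAABBCCDDD'),
    'Status': ['Closed', 'Expired', 'Active', 'Closed', 'Closed',
               'Closed', 'Expired', 'Closed', 'Expired', 'Active'],
})

# Helper column New: True for every row that is not 'Active';
# transform('all') then gives one row-aligned bool per unit
keep = (df.assign(New=df['Status'].ne('Active'))
          .groupby('Unit_ID')['New']
          .transform('all'))

print(keep.tolist())
# [False, False, False, True, True, True, True, False, False, False]

df1 = df[keep]
print(df1['Unit_ID'].unique().tolist())  # ['B', 'C']
```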
Or first use DataFrame.loc to get the units that have at least one Active row, then filter with Series.isin and an inverted mask to keep only the units without Active:
df1 = df[~df['Unit_ID'].isin(df.loc[df['Status'].eq('Active'), 'Unit_ID'])]
print (df1)
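Splitting the isin version into its two steps shows what is being inverted: the DataFrame.loc selection yields the Unit_ID of every Active row, and ~ turns the membership test into "unit never Active". A sketch on the sample data, with the date columns omitted; `active_units` is an illustrative name:

```python
import pandas as pd

df = pd.DataFrame({
    'Contract_ID': list(range(1, 11)),
    'Unit_ID': list('AAABBCCDDD'),
    'Status': ['Closed', 'Expired', 'Active', 'Closed', 'Closed',
               'Closed', 'Expired', 'Closed', 'Expired', 'Active'],
})

# Units that have at least one 'Active' contract
active_units = df.loc[df['Status'].eq('Active'), 'Unit_ID']
print(active_units.tolist())  # ['A', 'D']

# Invert the membership test to keep only rows of units never marked 'Active'
df1 = df[~df['Unit_ID'].isin(active_units)]
print(df1['Contract_ID'].tolist())  # [4, 5, 6, 7]
```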
Contract_ID Unit_ID Start_date End_Date Status
3 4 B 2014-05-01 2015-05-01 Closed
4 5 B 2015-05-01 2016-05-01 Closed
5 6 C 2016-05-01 2017-05-01 Closed
6 7 C 2017-05-01 2018-05-01 Expired
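A further variant, not taken from either answer, is GroupBy.filter, which keeps or drops whole groups based on a function that returns one bool per group. Because the lambda runs once per group, it is typically slower than the vectorised transform/isin versions when there are many units, so treat this as a readability-oriented sketch (date columns omitted):

```python
import pandas as pd

df = pd.DataFrame({
    'Contract_ID': list(range(1, 11)),
    'Unit_ID': list('AAABBCCDDD'),
    'Status': ['Closed', 'Expired', 'Active', 'Closed', 'Closed',
               'Closed', 'Expired', 'Closed', 'Expired', 'Active'],
})

# Keep a unit's rows only if none of its contracts is 'Active'
df1 = df.groupby('Unit_ID').filter(lambda g: not g['Status'].eq('Active').any())
print(df1['Contract_ID'].tolist())  # [4, 5, 6, 7]
```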
Answer 1 (score: 2)
Another approach, using pd.crosstab and Series.map:
new_df = df[df['Unit_ID'].map(pd.crosstab(df['Unit_ID'],df['Status'])['Active'].eq(0))]
Or, with GroupBy.transform and GroupBy.all:
new_df = df[df['Status'].ne('Active').groupby(df['Unit_ID']).transform('all')]
Output:
Contract_ID Unit_ID Start_date End_Date Status
3 4 B 2014-05-01 2015-05-01 Closed
4 5 B 2015-05-01 2016-05-01 Closed
5 6 C 2016-05-01 2017-05-01 Closed
6 7 C 2017-05-01 2018-05-01 Expired
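The crosstab version builds a unit-by-status count table, turns its Active column into one bool per unit, and then uses Series.map to look each row's Unit_ID up in that per-unit Series. Note that the Active column only exists if at least one row is Active; on data with no Active rows at all, counts['Active'] would raise a KeyError. A sketch on the sample data, with the date columns omitted; `counts` and `no_active` are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({
    'Contract_ID': list(range(1, 11)),
    'Unit_ID': list('AAABBCCDDD'),
    'Status': ['Closed', 'Expired', 'Active', 'Closed', 'Closed',
               'Closed', 'Expired', 'Closed', 'Expired', 'Active'],
})

# Count contracts per unit (rows) and status (columns)
counts = pd.crosstab(df['Unit_ID'], df['Status'])
print(counts)

# One bool per unit: True when the unit has zero 'Active' contracts
no_active = counts['Active'].eq(0)
print(no_active[no_active].index.tolist())  # ['B', 'C']

# map() looks each row's Unit_ID up in the per-unit Series
new_df = df[df['Unit_ID'].map(no_active)]
print(new_df['Contract_ID'].tolist())  # [4, 5, 6, 7]
```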