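Assign priorities to multiple columns and select rows based on those priorities
For each ID, I want to select one row, giving priority to RC in the "Category" column and then to Pending in the "Status" column.
Example: input DataFrame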
ID Category Status Date
1 GC Pending 01-03-2015
1 RC Resolved 05-10-2016
1 GC Resolved 06-03-2017
2 RC Pending 09-08-2016
2 RC Resolved 10-05-2014
3 GC Resolved 10-08-2018
3 RC Pending 13-05-2019
4 GC Pending 10-06-2018
4 GC Resolved 15-09-2014
Output DataFrame
ID Category Status Date
1 RC Resolved 05-10-2016
2 RC Pending 09-08-2016
3 RC Pending 13-05-2019
4 GC Pending 10-06-2018
Answer 0 (score: 2)
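Convert the columns to ordered categoricals that encode the priority by passing a list to the categories parameter (lowest priority first, highest last). Then sort by the three columns with DataFrame.sort_values, and finally remove duplicates with DataFrame.drop_duplicates, keeping the last row for each ID.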
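import pandas as pd
# Categories are listed from lowest to highest priority.
df['Category'] = pd.Categorical(df['Category'], ordered=True, categories=['GC','RC'])
df['Status'] = pd.Categorical(df['Status'], ordered=True, categories=['Resolved','Pending'])
# After sorting, the last row within each ID has the highest priority.
df = df.sort_values(['ID','Category','Status']).drop_duplicates('ID', keep='last')
print(df)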
ID Category Status Date
1 1 RC Resolved 05-10-2016
3 2 RC Pending 09-08-2016
6 3 RC Pending 13-05-2019
7 4 GC Pending 10-06-2018
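For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the question's input by hand (the Date values are kept as plain strings, which is an assumption, since the question does not say how they are stored) and uses a separate variable, result, which is only an illustrative name.

```python
import pandas as pd

# Rebuild the question's input frame so the example is self-contained.
# Dates are left as strings (assumption: the question does not state their dtype).
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 3, 3, 4, 4],
    "Category": ["GC", "RC", "GC", "RC", "RC", "GC", "RC", "GC", "GC"],
    "Status": ["Pending", "Resolved", "Resolved", "Pending", "Resolved",
               "Resolved", "Pending", "Pending", "Resolved"],
    "Date": ["01-03-2015", "05-10-2016", "06-03-2017", "09-08-2016",
             "10-05-2014", "10-08-2018", "13-05-2019", "10-06-2018",
             "15-09-2014"],
})

# Ordered categoricals: categories run from lowest to highest priority,
# so RC sorts after GC and Pending sorts after Resolved.
df["Category"] = pd.Categorical(df["Category"], categories=["GC", "RC"], ordered=True)
df["Status"] = pd.Categorical(df["Status"], categories=["Resolved", "Pending"], ordered=True)

# Sort by ID, then by priority; the last row of each ID group wins.
result = (
    df.sort_values(["ID", "Category", "Status"])
      .drop_duplicates("ID", keep="last")
)
print(result)
```

Because Category comes before Status in the sort keys, Category takes precedence: for ID 1 the RC/Resolved row is kept even though a GC/Pending row exists. Swap the order of the two columns in sort_values if Status should win instead.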