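What is an efficient way to extract, after a groupby, the second index level (the sales officer) with the maximum customer count in each group?
Suppose a DataFrame df covers various states, and each state has 10 sales officers (named Officer 1 through Officer 10). The column Current Status always has the value Customer: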
  State List Sales Officer Current Status
0         UP     Officer 4       Customer
1         MH     Officer 5       Customer
2         AP     Officer 6       Customer
3         AN     Officer 2       Customer
4         GJ     Officer 3       Customer
... and so on
The expected output lists, for each state, the sales officer(s) with the highest customer count:
State List  Sales Officer
AN          Officer 6        403
AP          Officer 1        266
            Officer 8        266
... and so on
So far I have done the following:
df.groupby(['State List', 'Sales Officer'])['Current Status'].count()#.reset_index()
which gives me the following:
State List  Sales Officer
AN          Officer 1        376
            Officer 10       401
            Officer 2        353
            Officer 3        373
            Officer 4        375
            Officer 5        382
            Officer 6        403
            Officer 7        400
            Officer 8        385
            Officer 9        378
AP          Officer 1        266
            Officer 10       228
            Officer 2        240
            Officer 3        248
            Officer 4        235
            Officer 5        229
            Officer 6        242
            Officer 7        238
            Officer 8        266
            Officer 9        243
Now I am stuck on how to get, for each State List, the Sales Officer with the maximum number of customers. Any ideas?
Answer 0 (score: 4)
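Use boolean indexing together with transform('max'), which returns a Series of the same size as the original, so each count can be compared against its own state's maximum: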
s = df.groupby(['State List', 'Sales Officer'])['Current Status'].count()
df = s[s == s.groupby('State List').transform('max')]
print(df)
State List  Sales Officer
AN          Officer 6        403
AP          Officer 1        266
            Officer 8        266
Name: a, dtype: int64
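For readers who want to try the approach end to end, here is a minimal self-contained sketch. The toy rows and the variable names state_max and top are assumptions made for illustration; the real df in the question has far more rows per officer.

```python
import pandas as pd

# Hypothetical toy data standing in for the question's df.
df = pd.DataFrame({
    "State List": ["AN", "AN", "AN", "AP", "AP", "AP", "AP"],
    "Sales Officer": ["Officer 1", "Officer 6", "Officer 6",
                      "Officer 1", "Officer 1", "Officer 8", "Officer 8"],
    "Current Status": ["Customer"] * 7,
})

# Number of customers per (state, officer) pair.
s = df.groupby(["State List", "Sales Officer"])["Current Status"].count()

# Per-state maximum, broadcast back onto every row of s.
state_max = s.groupby(level="State List").transform("max")

# Keep only the officers whose count equals their state's maximum.
# Ties are preserved, so AP keeps both Officer 1 and Officer 8.
top = s[s == state_max]
print(top)
```

Storing the result under a new name such as top, rather than reassigning df, keeps the original DataFrame available for further work.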
Details:
print(s.groupby('State List').transform('max'))
State List  Sales Officer
AN          Officer 1        403
            Officer 10       403
            Officer 2        403
            Officer 3        403
            Officer 4        403
            Officer 5        403
            Officer 6        403
            Officer 7        403
            Officer 8        403
            Officer 9        403
AP          Officer 1        266
            Officer 10       266
            Officer 2        266
            Officer 3        266
            Officer 4        266
            Officer 5        266
            Officer 6        266
            Officer 7        266
            Officer 8        266
            Officer 9        266
Name: a, dtype: int64
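The reason the comparison works is that the transformed Series shares its index with s, so the equality test lines up row by row. The sketch below illustrates this on a few counts taken from the output above; the Series name Customers and the variables broadcast, mask, and result are assumptions for illustration only. It also shows one possible way to flatten the result into ordinary columns with reset_index.

```python
import pandas as pd

# A few (state, officer) counts standing in for s from the answer.
idx = pd.MultiIndex.from_tuples(
    [("AN", "Officer 6"), ("AN", "Officer 7"),
     ("AP", "Officer 1"), ("AP", "Officer 8")],
    names=["State List", "Sales Officer"],
)
s = pd.Series([403, 400, 266, 266], index=idx, name="Customers")

# Each row receives the maximum of its own state; the index is unchanged.
broadcast = s.groupby(level="State List").transform("max")

# Element-wise comparison yields a boolean mask aligned with s.
mask = s == broadcast

# Filter with the mask, then turn the index levels into regular columns.
result = s[mask].reset_index()
print(result)
```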