I have a DataFrame that I'm trying to sort in a particular way.
Input:
CompanyName count assignee_name CallType recvd_dttm
Company3 4 Jill Machine1 8/28/2015 13:46
Company3 4 Jill Machine1 8/27/2015 13:26
Company3 4 Jack Machine2 8/27/2015 11:46
Company3 4 Jill Machine1 8/25/2015 9:56
Company2 3 Brad Machine1 8/29/2015 12:43
Company2 3 Lee Machine2 8/28/2015 13:44
Company2 3 Lee Machine1 8/22/2015 19:45
Company1 2 Lee Machine1 8/12/2015 14:47
Company1 2 Lee Machine2 8/11/2015 13:44
Company0 1 Tracy Machine2 8/31/2015 13:32
What I want:
Company3 Company2 Company1 Company0
4 3 2 1
Jill Lee Lee Tracy
Machine1 Machine1 Machine1 Machine2
8/28/2015 8/29/2015 8/12/2015 8/31/2015
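It should output the company names ordered by how often each one appears in the DataFrame, most frequent first. Next it should show the person who took the most calls for that company. The CallType and recvd_dttm values should then come from the most recent call.
I have used this: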
mode = (lambda ts: ts.value_counts(sort=True).index[0]
if len(ts.value_counts(sort=True)) else None)
cols = df['CompanyName'].value_counts().index
df = df.groupby('CompanyName')[['count','assignee_name', 'CallType', 'recvd_dttm']].agg(mode).T.reindex(columns=cols)
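It outputs the company names and counts correctly, but for the other columns it picks seemingly arbitrary call information instead of the most recent call.
I was also considering something like df.groupby(['CompanyName','count']).agg(lambda x:x.value_counts().index[0])
but that raises UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 21285: ordinal not in range(128)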
Answer 0 (score: 1)
How about this:
In [121]: most = df.groupby('CompanyName')['assignee_name'].transform(lambda x: x.value_counts().idxmax())
In [122]: df = df[df['assignee_name'] == most]
In [123]: df = df.sort_values(['CompanyName', 'recvd_dttm'])
In [124]: df = df.groupby('CompanyName').last()
In [125]: df
Out[125]:
             count assignee_name  CallType          recvd_dttm
CompanyName
Company0         1         Tracy  Machine2 2015-08-31 13:32:00
Company1         2           Lee  Machine1 2015-08-12 14:47:00
Company2         3           Lee  Machine2 2015-08-28 13:44:00
Company3         4          Jill  Machine1 2015-08-28 13:46:00
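For anyone who wants to reproduce the steps above end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question and assumes a pandas version where `sort_values` is available; the names `rows`, `top` and `result` are only illustrative. Note that this approach keeps only the rows of the most frequent assignee before taking the latest call, so for Company2 it reports Lee's latest call (Machine2, 8/28) rather than the company's latest call overall.

```python
import pandas as pd

# Sample data copied from the question.
rows = [
    ("Company3", 4, "Jill", "Machine1", "8/28/2015 13:46"),
    ("Company3", 4, "Jill", "Machine1", "8/27/2015 13:26"),
    ("Company3", 4, "Jack", "Machine2", "8/27/2015 11:46"),
    ("Company3", 4, "Jill", "Machine1", "8/25/2015 9:56"),
    ("Company2", 3, "Brad", "Machine1", "8/29/2015 12:43"),
    ("Company2", 3, "Lee", "Machine2", "8/28/2015 13:44"),
    ("Company2", 3, "Lee", "Machine1", "8/22/2015 19:45"),
    ("Company1", 2, "Lee", "Machine1", "8/12/2015 14:47"),
    ("Company1", 2, "Lee", "Machine2", "8/11/2015 13:44"),
    ("Company0", 1, "Tracy", "Machine2", "8/31/2015 13:32"),
]
df = pd.DataFrame(
    rows,
    columns=["CompanyName", "count", "assignee_name", "CallType", "recvd_dttm"],
)
# Parse the timestamps so that sorting is chronological, not lexical.
df["recvd_dttm"] = pd.to_datetime(df["recvd_dttm"], format="%m/%d/%Y %H:%M")

# Most frequent assignee per company, broadcast back onto every row.
most = df.groupby("CompanyName")["assignee_name"].transform(
    lambda x: x.value_counts().idxmax()
)
# Keep only the rows handled by that assignee.
top = df[df["assignee_name"] == most]
# Sort chronologically and keep the last (most recent) row per company.
result = top.sort_values(["CompanyName", "recvd_dttm"]).groupby("CompanyName").last()
print(result)
```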
Answer 1 (score: 1)
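import pandas as pd

# Convert the datetime strings to pd.Timestamp
df['recvd_dttm'] = pd.to_datetime(df['recvd_dttm'], format='%m/%d/%Y %H:%M')
def func(g):
    # Take the row with the most recent timestamp in this group
    temp = g[g['recvd_dttm'] == g['recvd_dttm'].max()].iloc[0].copy()
    # Overwrite the assignee with the one who appears most often
    temp['assignee_name'] = g['assignee_name'].value_counts().index[0]
    # errors='ignore' because newer pandas may not pass the grouping column to func
    return temp.drop('CompanyName', errors='ignore')
# sort_values replaces the older DataFrame.sort
df.groupby('CompanyName').apply(func).sort_values('count', ascending=False).T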
CompanyName               Company3             Company2             Company1             Company0
count                            4                    3                    2                    1
assignee_name                 Jill                  Lee                  Lee                Tracy
CallType                  Machine1             Machine1             Machine1             Machine2
recvd_dttm     2015-08-28 13:46:00  2015-08-29 12:43:00  2015-08-12 14:47:00  2015-08-31 13:32:00
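If `groupby().apply()` is awkward in your pandas version, the same idea can be sketched without it: pick the latest row per company with `idxmax`, compute the most frequent assignee separately, then combine the two. This is only one possible variant, not the code from the answer above; `rows`, `latest`, `top_assignee` and `result` are illustrative names, and the data is the sample from the question.

```python
import pandas as pd

# Sample data copied from the question.
rows = [
    ("Company3", 4, "Jill", "Machine1", "8/28/2015 13:46"),
    ("Company3", 4, "Jill", "Machine1", "8/27/2015 13:26"),
    ("Company3", 4, "Jack", "Machine2", "8/27/2015 11:46"),
    ("Company3", 4, "Jill", "Machine1", "8/25/2015 9:56"),
    ("Company2", 3, "Brad", "Machine1", "8/29/2015 12:43"),
    ("Company2", 3, "Lee", "Machine2", "8/28/2015 13:44"),
    ("Company2", 3, "Lee", "Machine1", "8/22/2015 19:45"),
    ("Company1", 2, "Lee", "Machine1", "8/12/2015 14:47"),
    ("Company1", 2, "Lee", "Machine2", "8/11/2015 13:44"),
    ("Company0", 1, "Tracy", "Machine2", "8/31/2015 13:32"),
]
df = pd.DataFrame(
    rows,
    columns=["CompanyName", "count", "assignee_name", "CallType", "recvd_dttm"],
)
df["recvd_dttm"] = pd.to_datetime(df["recvd_dttm"], format="%m/%d/%Y %H:%M")

# Row label of the latest timestamp per company, then select those rows.
latest_idx = df.groupby("CompanyName")["recvd_dttm"].idxmax()
latest = df.loc[latest_idx].set_index("CompanyName")

# Most frequent assignee per company, computed over all of its rows.
top_assignee = df.groupby("CompanyName")["assignee_name"].agg(
    lambda s: s.value_counts().idxmax()
)

# Replace the assignee of the latest call with the most frequent one
# (assignment aligns on the CompanyName index).
latest["assignee_name"] = top_assignee

# Order companies by count, then transpose so companies become columns.
result = latest.sort_values("count", ascending=False).T
print(result)
```

On the sample data this yields the columns in the order Company3, Company2, Company1, Company0, with Company2 showing Lee as assignee but the CallType and timestamp of the 8/29 call, which matches the layout requested in the question.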