Sorting data with Pandas DataFrames in Python

Time: 2015-09-04 16:20:51

Tags: python sorting pandas count dataframe

I have a dataframe that I am trying to sort in a particular way.

Input:

CompanyName   count    assignee_name   CallType        recvd_dttm
Company3       4         Jill           Machine1       8/28/2015 13:46
Company3       4         Jill           Machine1       8/27/2015 13:26
Company3       4         Jack           Machine2       8/27/2015 11:46
Company3       4         Jill           Machine1       8/25/2015 9:56
Company2       3         Brad           Machine1       8/29/2015 12:43
Company2       3         Lee            Machine2       8/28/2015 13:44
Company2       3         Lee            Machine1       8/22/2015 19:45
Company1       2         Lee            Machine1       8/12/2015 14:47
Company1       2         Lee            Machine2       8/11/2015 13:44
Company0       1         Tracy          Machine2       8/31/2015 13:32

What I want:

Company3         Company2       Company1        Company0
4                3              2               1
Jill             Lee            Lee             Tracy
Machine1         Machine1       Machine1        Machine2
8/28/2015        8/29/2015      8/12/2015       8/31/2015

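It should list the company names in order of how often each company appears in the dataframe, most frequent first. Under each company it should show the person who took the MOST calls, and the CallType and recvd_dttm should come from that company's most recent call.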
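One way to express these three rules directly, sketched here assuming a reasonably recent pandas and with the sample table typed in by hand: take each company's most recent row with idxmax, overwrite its assignee with the most frequent one, then order the columns by value_counts(). The variable names order, latest and result are illustrative only.

```python
import pandas as pd

# Sample data typed in from the input table above
df = pd.DataFrame({
    'CompanyName': ['Company3'] * 4 + ['Company2'] * 3 + ['Company1'] * 2 + ['Company0'],
    'count': [4] * 4 + [3] * 3 + [2] * 2 + [1],
    'assignee_name': ['Jill', 'Jill', 'Jack', 'Jill', 'Brad', 'Lee', 'Lee',
                      'Lee', 'Lee', 'Tracy'],
    'CallType': ['Machine1', 'Machine1', 'Machine2', 'Machine1', 'Machine1',
                 'Machine2', 'Machine1', 'Machine1', 'Machine2', 'Machine2'],
    'recvd_dttm': ['8/28/2015 13:46', '8/27/2015 13:26', '8/27/2015 11:46',
                   '8/25/2015 9:56', '8/29/2015 12:43', '8/28/2015 13:44',
                   '8/22/2015 19:45', '8/12/2015 14:47', '8/11/2015 13:44',
                   '8/31/2015 13:32'],
})
# Parse the strings so that "latest" means chronologically latest
df['recvd_dttm'] = pd.to_datetime(df['recvd_dttm'], format='%m/%d/%Y %H:%M')

# Companies ordered by number of rows, most first
order = df['CompanyName'].value_counts().index

# Full row of each company's most recent call, indexed by company
latest = df.loc[df.groupby('CompanyName')['recvd_dttm'].idxmax()].set_index('CompanyName')

# Replace the assignee with the one who took the most calls (aligned on CompanyName)
latest['assignee_name'] = (df.groupby('CompanyName')['assignee_name']
                             .agg(lambda s: s.value_counts().index[0]))

# One column per company, in frequency order
result = latest[['count', 'assignee_name', 'CallType', 'recvd_dttm']].reindex(order).T
print(result)
```

Taking the latest row first and patching only assignee_name keeps CallType and recvd_dttm from the same call, which a per-column mode cannot guarantee.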

I used this:

# most frequent value of a Series, or None if the Series is empty
mode = (lambda ts: ts.value_counts(sort=True).index[0]
                   if len(ts.value_counts(sort=True)) else None)
# company names ordered by how many rows each one has
cols = df['CompanyName'].value_counts().index

df = df.groupby('CompanyName')[['count', 'assignee_name', 'CallType', 'recvd_dttm']].agg(mode).T.reindex(columns=cols)

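It outputs the company names in the right order with the right counts, but the other columns are picked seemingly at random instead of coming from the most recent call.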
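The "random" picks follow from applying the same mode function to every column: when every value in a column is unique, value_counts() returns only ties, so index[0] is whichever tied value happens to come first rather than the latest call, and each column's pick can come from a different row. A small illustration with Company3's four timestamps, assuming they are parsed with pd.to_datetime first:

```python
import pandas as pd

# Company3's four timestamps from the sample, parsed as datetimes
calls = pd.Series(pd.to_datetime(['8/28/2015 13:46', '8/27/2015 13:26',
                                  '8/27/2015 11:46', '8/25/2015 9:56'],
                                 format='%m/%d/%Y %H:%M'))

counts = calls.value_counts(sort=True)
# Every timestamp occurs exactly once, so index[0] is only a tie-break
print(counts)
print('picked by mode:', counts.index[0])
print('most recent:   ', calls.max())
```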

I also considered something like

df.groupby(['CompanyName','count']).agg(lambda x: x.value_counts().index[0])

but that raised UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 21285: ordinal not in range(128).

2 Answers:

Answer 0 (score: 1)

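How about this (assuming recvd_dttm has already been converted with pd.to_datetime, as the timestamps in the output suggest):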

In [121]: most = df.groupby('CompanyName')['assignee_name'].transform(lambda x: x.value_counts().idxmax())

In [122]: df = df[df['assignee_name'] == most]

In [123]: df = df.sort_values(['CompanyName', 'recvd_dttm'])

In [124]: df = df.groupby('CompanyName').last()

In [125]: df
Out[125]: 
             count assignee_name  CallType          recvd_dttm
CompanyName                                                   
Company0         1         Tracy  Machine2 2015-08-31 13:32:00
Company1         2           Lee  Machine1 2015-08-12 14:47:00
Company2         3           Lee  Machine2 2015-08-28 13:44:00
Company3         4          Jill  Machine1 2015-08-28 13:46:00
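To try Answer 0 on the sample as a self-contained script, here is a sketch that types the input table in by hand and uses sort_values (DataFrame.sort was removed in later pandas releases). Because it first keeps only the top assignee's rows, Company2's CallType and recvd_dttm come from Lee's latest call (8/28/2015) rather than the company's latest call (Brad, 8/29/2015), and the rows come out sorted by company name; both match Out[125] above but differ from the requested layout.

```python
import pandas as pd

# Sample data typed in from the question's input table
df = pd.DataFrame({
    'CompanyName': ['Company3'] * 4 + ['Company2'] * 3 + ['Company1'] * 2 + ['Company0'],
    'count': [4] * 4 + [3] * 3 + [2] * 2 + [1],
    'assignee_name': ['Jill', 'Jill', 'Jack', 'Jill', 'Brad', 'Lee', 'Lee',
                      'Lee', 'Lee', 'Tracy'],
    'CallType': ['Machine1', 'Machine1', 'Machine2', 'Machine1', 'Machine1',
                 'Machine2', 'Machine1', 'Machine1', 'Machine2', 'Machine2'],
    'recvd_dttm': ['8/28/2015 13:46', '8/27/2015 13:26', '8/27/2015 11:46',
                   '8/25/2015 9:56', '8/29/2015 12:43', '8/28/2015 13:44',
                   '8/22/2015 19:45', '8/12/2015 14:47', '8/11/2015 13:44',
                   '8/31/2015 13:32'],
})
df['recvd_dttm'] = pd.to_datetime(df['recvd_dttm'], format='%m/%d/%Y %H:%M')

# Tag every row with its company's most frequent assignee
most = df.groupby('CompanyName')['assignee_name'].transform(lambda x: x.value_counts().idxmax())

# Keep only that assignee's calls, sort oldest to newest, take the last row per company
res = (df[df['assignee_name'] == most]
       .sort_values(['CompanyName', 'recvd_dttm'])
       .groupby('CompanyName')
       .last())
print(res)
```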

Answer 1 (score: 1)

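import pandas as pd

# convert the datetime strings to pandas Timestamps
df['recvd_dttm'] = pd.to_datetime(df['recvd_dttm'], format='%m/%d/%Y %H:%M')

def func(g):
    # full row of this company's most recent call
    temp = g.loc[g['recvd_dttm'].idxmax()].copy()
    # replace the assignee with whoever took the most calls
    temp['assignee_name'] = g['assignee_name'].value_counts().index[0]
    # the group key becomes the index; errors='ignore' covers pandas versions
    # that no longer pass the grouping column to apply
    return temp.drop('CompanyName', errors='ignore')

df.groupby('CompanyName').apply(func).sort_values('count', ascending=False).T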

CompanyName               Company3             Company2             Company1             Company0
count                            4                    3                    2                    1
assignee_name                 Jill                  Lee                  Lee                Tracy
CallType                  Machine1             Machine1             Machine1             Machine2
recvd_dttm     2015-08-28 13:46:00  2015-08-29 12:43:00  2015-08-12 14:47:00  2015-08-31 13:32:00
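A self-contained run of Answer 1's approach on the sample, as a sketch: selecting the four value columns before apply means the grouping column is never passed to the function, so no drop is needed on any pandas version. The function name latest_with_top_assignee is illustrative. Unlike Answer 0, this takes CallType and recvd_dttm from each company's latest call overall, which is what the requested table shows for Company2 (Machine1, 8/29/2015).

```python
import pandas as pd

# Sample data typed in from the question's input table
df = pd.DataFrame({
    'CompanyName': ['Company3'] * 4 + ['Company2'] * 3 + ['Company1'] * 2 + ['Company0'],
    'count': [4] * 4 + [3] * 3 + [2] * 2 + [1],
    'assignee_name': ['Jill', 'Jill', 'Jack', 'Jill', 'Brad', 'Lee', 'Lee',
                      'Lee', 'Lee', 'Tracy'],
    'CallType': ['Machine1', 'Machine1', 'Machine2', 'Machine1', 'Machine1',
                 'Machine2', 'Machine1', 'Machine1', 'Machine2', 'Machine2'],
    'recvd_dttm': ['8/28/2015 13:46', '8/27/2015 13:26', '8/27/2015 11:46',
                   '8/25/2015 9:56', '8/29/2015 12:43', '8/28/2015 13:44',
                   '8/22/2015 19:45', '8/12/2015 14:47', '8/11/2015 13:44',
                   '8/31/2015 13:32'],
})
df['recvd_dttm'] = pd.to_datetime(df['recvd_dttm'], format='%m/%d/%Y %H:%M')

def latest_with_top_assignee(g):
    # full row of the company's most recent call (first one if tied)
    row = g.loc[g['recvd_dttm'].idxmax()].copy()
    # overwrite the assignee with whoever took the most calls
    row['assignee_name'] = g['assignee_name'].value_counts().index[0]
    return row

cols = ['count', 'assignee_name', 'CallType', 'recvd_dttm']
result = (df.groupby('CompanyName')[cols]
            .apply(latest_with_top_assignee)
            .sort_values('count', ascending=False)
            .T)
print(result)
```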