pivot_table with sorting

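Time: 2018-02-17 13:31:38

Tags: python pandas

I'm having trouble presenting my data in the required way. My DataFrame is formatted and then sorted by "Site ID". I need to display the data by site ID, with all the date instances for each site grouped together.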

I'm 90% of the way to what I want using pivot_table:
df_pivot = pd.pivot_table(df, index=['Site Ref','Site Name', 'Date'])

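But the Date column is not sorted. (The tiny sample output mostly looks sorted, but the ****Thu Jan 11 2018 10:43:20**** entry illustrates the problem I have on the large dataset.)

I can't figure out how to present it as below and also have the dates sorted within each site ID.

Any help is appreciated.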

import pandas as pd

df = pd.DataFrame.from_dict([
    {'Site Ref': '1234567', 'Site Name': 'Building A', 'Date': 'Mon Jan 08 2018 10:43:20', 'Duration': 120},
    {'Site Ref': '1245678', 'Site Name': 'Building B', 'Date': 'Mon Jan 08 2018 10:43:20', 'Duration': 120},
    {'Site Ref': '1245678', 'Site Name': 'Building B', 'Date': 'Tue Jan 09 2018 10:43:20', 'Duration': 70},
    {'Site Ref': '1245678', 'Site Name': 'Building B', 'Date': 'Wed Jan 10 2018 10:43:20', 'Duration': 120},
    {'Site Ref': '1212345', 'Site Name': 'Building C', 'Date': 'Fri Jan 12 2018 10:43:20', 'Duration': 100},
    {'Site Ref': '1123456', 'Site Name': 'Building D', 'Date': 'Thu Jan 11 2018 10:43:20', 'Duration': 80},
    {'Site Ref': '1123456', 'Site Name': 'Building D', 'Date': 'Fri Jan 12 2018 12:22:20', 'Duration': 80},
    {'Site Ref': '1123456', 'Site Name': 'Building D', 'Date': 'Mon Jan 15 2018 11:43:20', 'Duration': 90},
    {'Site Ref': '1123456', 'Site Name': 'Building D', 'Date': 'Wed Jan 17 2018 10:43:20', 'Duration': 220},
])

# Fix the column order, then sort the rows by site reference
df = pd.DataFrame(df, columns=['Site Ref', 'Site Name', 'Date', 'Duration'])
df = df.sort_values(by=['Site Ref'])
df

    Site Ref    Site Name   Date                        Duration
5   1123456     Building D  Thu Jan 11 2018 10:43:20    80
6   1123456     Building D  Fri Jan 12 2018 12:22:20    80
7   1123456     Building D  Mon Jan 15 2018 11:43:20    90
8   1123456     Building D  Wed Jan 17 2018 10:43:20    220
4   1212345     Building C  Fri Jan 12 2018 10:43:20    100
0   1234567     Building A  Mon Jan 08 2018 10:43:20    120
1   1245678     Building B  Mon Jan 08 2018 10:43:20    120
2   1245678     Building B  Tue Jan 09 2018 10:43:20    70
3   1245678     Building B  Wed Jan 10 2018 10:43:20    120

df_pivot = pd.pivot_table(df, index=['Site Ref','Site Name', 'Date'])
df_pivot

Site Ref    Site Name   Date    
1123456     Building D  Fri Jan 12 2018 12:22:20    80
                        Mon Jan 15 2018 11:43:20    90
                        ****Thu Jan 11 2018 10:43:20    80****
                        Wed Jan 17 2018 10:43:20    220
1212345     Building C  Fri Jan 12 2018 10:43:20    100
1234567     Building A  Mon Jan 08 2018 10:43:20    120
1245678     Building B  Mon Jan 08 2018 10:43:20    120
                        Tue Jan 09 2018 10:43:20    70
                        Wed Jan 10 2018 10:43:20    120

3 answers:

Answer 0: (score: 3)

It is sorted lexicographically, because Date has object (string) dtype.
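To see why the Thu entry lands in the wrong place, here is a minimal sketch using the four Building D dates from the question. The explicit format string is my assumption about how those strings should be parsed; plain pd.to_datetime(dates) may well infer the same thing.

```python
import pandas as pd

dates = pd.Series([
    "Thu Jan 11 2018 10:43:20",
    "Fri Jan 12 2018 12:22:20",
    "Mon Jan 15 2018 11:43:20",
    "Wed Jan 17 2018 10:43:20",
])

# Sorting the raw strings compares characters, so the weekday
# abbreviation ("Fri" < "Mon" < "Thu" < "Wed") decides the order.
as_strings = sorted(dates)

# Parsing into real timestamps (format string is an assumption) lets
# the values sort chronologically; we then map back to the original text.
parsed = pd.to_datetime(dates, format="%a %b %d %Y %H:%M:%S")
as_datetimes = dates.loc[parsed.sort_values().index].tolist()

print(as_strings)
print(as_datetimes)
```

The first list starts with the Fri entry, the second with the Thu entry, which is the order the question is asking for.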

Workaround: add a new column of datetime dtype, use it in pivot_table ahead of Date, then drop it:

In [74]: (df.assign(x=pd.to_datetime(df['Date']))
            .pivot_table(index=['Site Ref','Site Name', 'x', 'Date'])
            .reset_index(level='x', drop=True))
Out[74]:
                                              Duration
Site Ref Site Name  Date
1123456  Building D Thu Jan 11 2018 10:43:20        80
                    Fri Jan 12 2018 12:22:20        80
                    Mon Jan 15 2018 11:43:20        90
                    Wed Jan 17 2018 10:43:20       220
1212345  Building C Fri Jan 12 2018 10:43:20       100
1234567  Building A Mon Jan 08 2018 10:43:20       120
1245678  Building B Mon Jan 08 2018 10:43:20       120
                    Tue Jan 09 2018 10:43:20        70
                    Wed Jan 10 2018 10:43:20       120

Answer 1: (score: 1)

Sort the values by Site Ref, then use groupby with sort=False.
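The code for this answer did not survive, so the following is only one possible reading of it, not the answerer's original. It assumes the rows are also ordered by a parsed timestamp (the helper column `_when` is hypothetical), since sorting by Site Ref alone would not put the dates in order.

```python
import pandas as pd

df = pd.DataFrame({
    "Site Ref": ["1123456", "1123456", "1123456", "1212345"],
    "Site Name": ["Building D", "Building D", "Building D", "Building C"],
    "Date": ["Thu Jan 11 2018 10:43:20", "Fri Jan 12 2018 12:22:20",
             "Mon Jan 15 2018 11:43:20", "Fri Jan 12 2018 10:43:20"],
    "Duration": [80, 80, 90, 100],
})

# Hypothetical helper column with parsed timestamps, used only for ordering.
df["_when"] = pd.to_datetime(df["Date"], format="%a %b %d %Y %H:%M:%S")
ordered = df.sort_values(["Site Ref", "_when"])

# sort=False keeps the groups in order of first appearance, so the
# string Date keys are not re-sorted lexicographically.
result = (ordered
          .groupby(["Site Ref", "Site Name", "Date"], sort=False)["Duration"]
          .mean())

print(result)
```

The result is a Series with the same three-level index as the pivot table, but with the Date level in the order the rows were pre-sorted.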

Answer 2: (score: 0)

You need to convert the dates to datetime values instead of strings. The following works with your current pivot table:

df_pivot.reset_index(inplace=True)
df_pivot['Date'] = pd.to_datetime(df_pivot['Date'])
df_pivot.sort_values(by=['Site Ref', 'Date'], inplace=True)
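A variation on the same idea, as a rough sketch: if the conversion is done before pivoting, pivot_table sorts the Date level chronologically by itself and the hierarchical index is kept. The sample rows are deliberately out of order, and the format string is an assumption about the date strings.

```python
import pandas as pd

rows = [
    ("1123456", "Building D", "Wed Jan 17 2018 10:43:20", 220),
    ("1123456", "Building D", "Thu Jan 11 2018 10:43:20", 80),
    ("1123456", "Building D", "Fri Jan 12 2018 12:22:20", 80),
    ("1245678", "Building B", "Mon Jan 08 2018 10:43:20", 120),
]
df = pd.DataFrame(rows, columns=["Site Ref", "Site Name", "Date", "Duration"])

# Convert up front so the pivot index holds timestamps, not strings.
fmt = "%a %b %d %Y %H:%M:%S"
df["Date"] = pd.to_datetime(df["Date"], format=fmt)

# pivot_table sorts its index keys; datetime keys sort chronologically.
df_pivot = pd.pivot_table(df, index=["Site Ref", "Site Name", "Date"])

# Optional: recover the original text form of each date for display.
labels = df_pivot.index.get_level_values("Date").strftime(fmt)

print(df_pivot)
```

The trade-off is that the Date level now prints as timestamps such as 2018-01-11 10:43:20 unless you format it back yourself.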