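I'm having trouble presenting my data the way I need it. My DataFrame is formatted and then sorted by "Site Ref". I need to display the data by site, with all of that site's dates grouped together.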
I'm about 90% of the way there with pivot_table:
df_pivot = pd.pivot_table(df, index=['Site Ref','Site Name', 'Date'])
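However, the Date column is not sorted correctly. (The small sample output is mostly in order, but the Thu Jan 11 2018 10:43:20 entry, marked with ****, shows the problem I see on the large dataset.)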
I can't work out how to present it grouped as below, but with the dates also sorted within each Site Ref.
Any help is appreciated.
import pandas as pd
df = pd.DataFrame.from_dict([{'Site Ref': '1234567', 'Site Name': 'Building A', 'Date': 'Mon Jan 08 2018 10:43:20', 'Duration': 120}, {'Site Ref': '1245678', 'Site Name':'Building B', 'Date': 'Mon Jan 08 2018 10:43:20', 'Duration': 120}, {'Site Ref': '1245678', 'Site Name':'Building B', 'Date': 'Tue Jan 09 2018 10:43:20', 'Duration': 70}, {'Site Ref': '1245678', 'Site Name':'Building B', 'Date': 'Wed Jan 10 2018 10:43:20', 'Duration': 120}, {'Site Ref': '1212345', 'Site Name':'Building C', 'Date': 'Fri Jan 12 2018 10:43:20', 'Duration': 100}, {'Site Ref': '1123456', 'Site Name':'Building D', 'Date': 'Thu Jan 11 2018 10:43:20', 'Duration': 80}, {'Site Ref': '1123456', 'Site Name':'Building D', 'Date': 'Fri Jan 12 2018 12:22:20', 'Duration': 80}, {'Site Ref': '1123456', 'Site Name':'Building D', 'Date': 'Mon Jan 15 2018 11:43:20', 'Duration': 90}, {'Site Ref': '1123456', 'Site Name':'Building D', 'Date': 'Wed Jan 17 2018 10:43:20', 'Duration': 220}])
df = pd.DataFrame(df, columns=['Site Ref', 'Site Name', 'Date', 'Duration'])  # fix the column order
df = df.sort_values(by=['Site Ref'])
df
Site Ref Site Name Date Duration
5 1123456 Building D Thu Jan 11 2018 10:43:20 80
6 1123456 Building D Fri Jan 12 2018 12:22:20 80
7 1123456 Building D Mon Jan 15 2018 11:43:20 90
8 1123456 Building D Wed Jan 17 2018 10:43:20 220
4 1212345 Building C Fri Jan 12 2018 10:43:20 100
0 1234567 Building A Mon Jan 08 2018 10:43:20 120
1 1245678 Building B Mon Jan 08 2018 10:43:20 120
2 1245678 Building B Tue Jan 09 2018 10:43:20 70
3 1245678 Building B Wed Jan 10 2018 10:43:20 120
df_pivot = pd.pivot_table(df, index=['Site Ref','Site Name', 'Date'])
df_pivot
                                              Duration
Site Ref Site Name  Date
1123456  Building D Fri Jan 12 2018 12:22:20        80
                    Mon Jan 15 2018 11:43:20        90
                    ****Thu Jan 11 2018 10:43:20        80****
                    Wed Jan 17 2018 10:43:20       220
1212345  Building C Fri Jan 12 2018 10:43:20       100
1234567  Building A Mon Jan 08 2018 10:43:20       120
1245678  Building B Mon Jan 08 2018 10:43:20       120
                    Tue Jan 09 2018 10:43:20        70
                    Wed Jan 10 2018 10:43:20       120
Answer 0 (score: 3)
It is sorted lexicographically, because Date has object (string) dtype.
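To see the difference, here is a small sketch using four of the Building D timestamps from the question. The format string `%a %b %d %Y %H:%M:%S` is an assumption based on the sample values, and the `key=` argument of `sort_values` needs pandas 1.1 or later.

```python
import pandas as pd

FMT = "%a %b %d %Y %H:%M:%S"  # assumed format of the Date strings

# Four Building D timestamps from the question, deliberately out of order
dates = pd.Series([
    "Wed Jan 17 2018 10:43:20",
    "Thu Jan 11 2018 10:43:20",
    "Mon Jan 15 2018 11:43:20",
    "Fri Jan 12 2018 12:22:20",
])

# object (string) dtype: compared character by character, so the weekday name wins
lexicographic = dates.sort_values().tolist()
print(lexicographic)  # Fri ..., Mon ..., Thu ..., Wed ...

# Parsed to datetime64 only for comparison: ordered by the actual point in time
chronological = dates.sort_values(key=lambda s: pd.to_datetime(s, format=FMT)).tolist()
print(chronological)  # Thu Jan 11 ..., Fri Jan 12 ..., Mon Jan 15 ..., Wed Jan 17 ...
```

The string order is exactly the Fri, Mon, Thu, Wed order seen in the pivot output in the question.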
Workaround: add a new column with datetime dtype, place it before Date in the pivot_table index, and drop it afterwards:
In [74]: (df.assign(x=pd.to_datetime(df['Date']))
           .pivot_table(index=['Site Ref','Site Name', 'x', 'Date'])
           .reset_index(level='x', drop=True))
Out[74]:
                                              Duration
Site Ref Site Name  Date
1123456  Building D Thu Jan 11 2018 10:43:20        80
                    Fri Jan 12 2018 12:22:20        80
                    Mon Jan 15 2018 11:43:20        90
                    Wed Jan 17 2018 10:43:20       220
1212345  Building C Fri Jan 12 2018 10:43:20       100
1234567  Building A Mon Jan 08 2018 10:43:20       120
1245678  Building B Mon Jan 08 2018 10:43:20       120
                    Tue Jan 09 2018 10:43:20        70
                    Wed Jan 10 2018 10:43:20       120
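A variant that is not part of this answer: on pandas 1.1 or later, `sort_index` accepts a `key` callable that is applied to each index level separately, so you can parse only the Date level while sorting the existing pivot and skip the helper column. This is a sketch on a subset of the question's rows; `FMT` and `by_time` are hypothetical names, and the format string is assumed from the sample values.

```python
import pandas as pd

# Assumed format of the question's Date strings, e.g. "Thu Jan 11 2018 10:43:20"
FMT = "%a %b %d %Y %H:%M:%S"

# Subset of the question's rows (Buildings B and D)
df = pd.DataFrame({
    "Site Ref": ["1245678"] * 3 + ["1123456"] * 4,
    "Site Name": ["Building B"] * 3 + ["Building D"] * 4,
    "Date": [
        "Mon Jan 08 2018 10:43:20", "Tue Jan 09 2018 10:43:20",
        "Wed Jan 10 2018 10:43:20", "Thu Jan 11 2018 10:43:20",
        "Fri Jan 12 2018 12:22:20", "Mon Jan 15 2018 11:43:20",
        "Wed Jan 17 2018 10:43:20",
    ],
    "Duration": [120, 70, 120, 80, 80, 90, 220],
})

# Same pivot as in the question: the Date level is ordered as plain strings
df_pivot = pd.pivot_table(df, index=["Site Ref", "Site Name", "Date"])

def by_time(level):
    # sort_index calls this once per index level; only the Date level is parsed,
    # the other levels are returned unchanged and sort as strings
    if level.name == "Date":
        return pd.to_datetime(level, format=FMT)
    return level

# The key only affects the sort order; the index still holds the original strings
df_sorted = df_pivot.sort_index(key=by_time)
print(df_sorted)
```

Because the key is only used for ordering, the Date labels keep their original string form in the output.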
Answer 1 (score: 1)
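Sort the values by Site Ref, then group by and take the mean using sort=False, so that groupby keeps the groups in the order they first appear instead of re-sorting the Date strings.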
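The code for this answer did not survive, so the following is only a sketch of the idea, not the original. It also sorts by a parsed timestamp within each site, because groupby(sort=False) simply keeps whatever row order it is given: sorting by Site Ref alone only helps if the rows within each site are already in chronological order. `FMT` and the `_ts` helper column are hypothetical names, and the format string is assumed from the sample values.

```python
import pandas as pd

FMT = "%a %b %d %Y %H:%M:%S"  # assumed format of the Date strings

# Subset of the question's rows (Buildings B and D)
df = pd.DataFrame({
    "Site Ref": ["1245678"] * 3 + ["1123456"] * 4,
    "Site Name": ["Building B"] * 3 + ["Building D"] * 4,
    "Date": [
        "Mon Jan 08 2018 10:43:20", "Tue Jan 09 2018 10:43:20",
        "Wed Jan 10 2018 10:43:20", "Thu Jan 11 2018 10:43:20",
        "Fri Jan 12 2018 12:22:20", "Mon Jan 15 2018 11:43:20",
        "Wed Jan 17 2018 10:43:20",
    ],
    "Duration": [120, 70, 120, 80, 80, 90, 220],
})

result = (
    df.assign(_ts=pd.to_datetime(df["Date"], format=FMT))  # hypothetical helper column
      .sort_values(["Site Ref", "_ts"])
      # sort=False: groups come out in order of first appearance, i.e. the sorted order
      .groupby(["Site Ref", "Site Name", "Date"], sort=False)[["Duration"]]
      .mean()
)
print(result)
```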
Answer 2 (score: 0)
You need to convert the dates to datetime values instead of strings. The following works with your current pivot table:
df_pivot.reset_index(inplace=True)
df_pivot['Date'] = pd.to_datetime(df_pivot['Date'])
df_pivot.sort_values(by=['Site Ref', 'Date'], inplace=True)
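The result above is a flat table. If you want the grouped layout from the question back, one option (an addition, not part of this answer) is to set the three columns as the index again after sorting. This is a sketch on a subset of the question's rows with an assumed format string; note the Date level then holds datetimes, so it prints as 2018-01-11 10:43:20 rather than the original strings.

```python
import pandas as pd

FMT = "%a %b %d %Y %H:%M:%S"  # assumed format of the Date strings

# Subset of the question's rows (Buildings B and D)
df = pd.DataFrame({
    "Site Ref": ["1245678"] * 3 + ["1123456"] * 4,
    "Site Name": ["Building B"] * 3 + ["Building D"] * 4,
    "Date": [
        "Mon Jan 08 2018 10:43:20", "Tue Jan 09 2018 10:43:20",
        "Wed Jan 10 2018 10:43:20", "Thu Jan 11 2018 10:43:20",
        "Fri Jan 12 2018 12:22:20", "Mon Jan 15 2018 11:43:20",
        "Wed Jan 17 2018 10:43:20",
    ],
    "Duration": [120, 70, 120, 80, 80, 90, 220],
})
df_pivot = pd.pivot_table(df, index=["Site Ref", "Site Name", "Date"])

# The answer's three steps, written without inplace=True
df_pivot = df_pivot.reset_index()
df_pivot["Date"] = pd.to_datetime(df_pivot["Date"], format=FMT)
df_pivot = df_pivot.sort_values(by=["Site Ref", "Date"])

# Extra step: rebuild the MultiIndex so each site's dates are shown grouped again
result = df_pivot.set_index(["Site Ref", "Site Name", "Date"])
print(result)
```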