I am trying to read data that is initially out of order using Pandas 0.13.0 (and numpy 1.8.0). For example, the sample data looks like this:
date_time, weeks, score
9/16/2013 14:05:00,73,160.9358
10/4/2013 13:20:00,75,159.61304
10/20/2013 13:44:00,78,158.06982
11/9/2013 17:18:00,80,156.30614
12/17/2013 14:20:00,86,158.5123664
9/19/2012 14:18:00,21,155.20384
7/7/2012 14:08:00,10,165.56546
7/11/2012 12:23:00,11,162.0381
7/14/2012 11:30:00,11,162.25856
7/17/2012 14:15:00,12,160.71534
Note that the rows are not in date order: the later dates and weeks come first.
When I read this data in, Pandas keeps the original order:
In [9]: df=pd.read_csv('2_decimated.csv')
In [10]: df
Out[10]:
date_time weeks score
0 2013-09-16 14:05:00 73 160.935800
1 2013-10-04 13:20:00 75 159.613040
2 2013-10-20 13:44:00 78 158.069820
3 2013-11-09 17:18:00 80 156.306140
4 2013-12-17 14:20:00 86 158.512366
5 2012-09-19 14:18:00 21 155.203840
6 2012-07-07 14:08:00 10 165.565460
7 2012-07-11 12:23:00 11 162.038100
8 2012-07-14 11:30:00 11 162.258560
9 2012-07-17 14:15:00 12 160.715340
When I call df.sort(columns='date_time', inplace=True), I get:
date_time weeks score
6 2012-07-07 14:08:00 10 165.565460
7 2012-07-11 12:23:00 11 162.038100
8 2012-07-14 11:30:00 11 162.258560
9 2012-07-17 14:15:00 12 160.715340
5 2012-09-19 14:18:00 21 155.203840
0 2013-09-16 14:05:00 73 160.935800
1 2013-10-04 13:20:00 75 159.613040
2 2013-10-20 13:44:00 78 158.069820
3 2013-11-09 17:18:00 80 156.306140
4 2013-12-17 14:20:00 86 158.512366
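DataFrame.sort is no longer available in newer pandas versions; sort_values is its replacement. A minimal sketch of the same chronological sort, assuming the CSV is parsed with parse_dates so that date_time holds real datetimes (the header is written here without spaces after the commas, and only a few of the question's rows are used):

```python
import io

import pandas as pd

# A few rows of the sample data from the question, kept in memory for the demo.
CSV = """date_time,weeks,score
9/16/2013 14:05:00,73,160.9358
12/17/2013 14:20:00,86,158.5123664
9/19/2012 14:18:00,21,155.20384
7/7/2012 14:08:00,10,165.56546
"""

# parse_dates turns the date_time column into datetime64 values,
# so sorting is chronological rather than lexicographic on strings.
df = pd.read_csv(io.StringIO(CSV), parse_dates=["date_time"])

# sort_values replaces the old df.sort(columns=...) call.
df = df.sort_values(by="date_time")
print(df)
```

As in the question's output, the original integer labels travel with their rows, so the printed index reads 3, 2, 0, 1.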
This is what I want, but I want date_time to be the index so that this can be treated as time-series data. Calling df2.set_index('date_time') seems to do what I want, i.e.:
weeks score
date_time
2012-07-07 14:08:00 10 165.565460
2012-07-11 12:23:00 11 162.038100
2012-07-14 11:30:00 11 162.258560
2012-07-17 14:15:00 12 160.715340
2012-09-19 14:18:00 21 155.203840
2013-09-16 14:05:00 73 160.935800
2013-10-04 13:20:00 75 159.613040
2013-10-20 13:44:00 78 158.069820
2013-11-09 17:18:00 80 156.306140
2013-12-17 14:20:00 86 158.512366
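Then calling df.plot() shows the same plot as before, and when I inspect df again, it has lost the new index and reverted to its integer index. Essentially, set_index does not seem to behave the way I expect.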
Answer 0 (score: 1)
set_index() returns a new DataFrame and leaves the original df unchanged, so you need to call df.set_index('date_time', inplace=True) or assign the result with df = df.set_index('date_time').
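A minimal sketch of the difference, using a hypothetical two-row frame in place of the question's data: discarding the return value of set_index leaves df as it was, while rebinding the name keeps the DatetimeIndex.

```python
import pandas as pd

# Hypothetical two-row frame standing in for the question's data.
df = pd.DataFrame(
    {
        "date_time": pd.to_datetime(["2012-07-07 14:08:00", "2012-07-11 12:23:00"]),
        "weeks": [10, 11],
    }
)

# The result is discarded, so df itself is not modified.
df.set_index("date_time")
print("date_time" in df.columns)  # True: still a regular column

# Rebinding the name keeps the new index
# (df.set_index("date_time", inplace=True) has the same effect).
df = df.set_index("date_time")
print(isinstance(df.index, pd.DatetimeIndex))  # True
```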
Answer 1 (score: 0)
I like to use the from_csv method of the DataFrame class:
In [1]: from pandas import DataFrame as df
In [2]: df.from_csv('2_decimated.csv')
Out[2]:
weeks score
date_time
2013-09-16 14:05:00 73 160.935800
2013-10-04 13:20:00 75 159.613040
2013-10-20 13:44:00 78 158.069820
2013-11-09 17:18:00 80 156.306140
2013-12-17 14:20:00 86 158.512366
2012-09-19 14:18:00 21 155.203840
2012-07-07 14:08:00 10 165.565460
2012-07-11 12:23:00 11 162.038100
2012-07-14 11:30:00 11 162.258560
2012-07-17 14:15:00 12 160.715340
Compare this with pd.read_csv:
In [3]: import pandas as pd
In [4]: pd.read_csv('2_decimated.csv')
Out[4]:
date_time weeks score
0 9/16/2013 14:05:00 73 160.935800
1 10/4/2013 13:20:00 75 159.613040
2 10/20/2013 13:44:00 78 158.069820
3 11/9/2013 17:18:00 80 156.306140
4 12/17/2013 14:20:00 86 158.512366
5 9/19/2012 14:18:00 21 155.203840
6 7/7/2012 14:08:00 10 165.565460
7 7/11/2012 12:23:00 11 162.038100
8 7/14/2012 11:30:00 11 162.258560
9 7/17/2012 14:15:00 12 160.715340
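DataFrame.from_csv was later deprecated (pandas 0.21) and removed (pandas 1.0). Its defaults were index_col=0 and parse_dates=True, so passing those two arguments to pd.read_csv gives the same datetime-indexed result. A minimal sketch, assuming a shortened in-memory copy of the sample file with no spaces after the header commas:

```python
import io

import pandas as pd

# Shortened, in-memory copy of the question's CSV.
CSV = """date_time,weeks,score
9/16/2013 14:05:00,73,160.9358
9/19/2012 14:18:00,21,155.20384
7/7/2012 14:08:00,10,165.56546
"""

# index_col=0 uses the first column (date_time) as the index, and
# parse_dates=True parses that index into datetimes, mirroring the
# defaults of the old DataFrame.from_csv.
df = pd.read_csv(io.StringIO(CSV), index_col=0, parse_dates=True)
print(df)
```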
After reading the CSV with df.from_csv, you can sort by the index with sort_index():
In [5]: df.from_csv('2_decimated.csv').sort_index()
Out[5]:
weeks score
date_time
2012-07-07 14:08:00 10 165.565460
2012-07-11 12:23:00 11 162.038100
2012-07-14 11:30:00 11 162.258560
2012-07-17 14:15:00 12 160.715340
2012-09-19 14:18:00 21 155.203840
2013-09-16 14:05:00 73 160.935800
2013-10-04 13:20:00 75 159.613040
2013-10-20 13:44:00 78 158.069820
2013-11-09 17:18:00 80 156.306140
2013-12-17 14:20:00 86 158.512366
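Once the DatetimeIndex is sorted, the frame behaves as time-series data; for example, partial date strings select whole periods. A minimal sketch using read_csv (from_csv is gone in current pandas) on five of the question's rows held in memory:

```python
import io

import pandas as pd

# Five rows of the question's data, deliberately out of order.
CSV = """date_time,weeks,score
9/16/2013 14:05:00,73,160.9358
10/4/2013 13:20:00,75,159.61304
12/17/2013 14:20:00,86,158.5123664
9/19/2012 14:18:00,21,155.20384
7/7/2012 14:08:00,10,165.56546
"""

# Parse date_time into the index, then sort chronologically.
df = pd.read_csv(io.StringIO(CSV), index_col="date_time", parse_dates=True).sort_index()

# With a sorted DatetimeIndex, partial date strings select time ranges.
print(df.loc["2012"])               # all rows from 2012
print(df.loc["2013-10":"2013-12"])  # October through December 2013
```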
That should help. Let me know if I have completely misunderstood your question.