How do I read out-of-order data with Pandas and sort it?

Asked: 2014-01-16 07:14:30

Tags: python pandas

I am trying to use Pandas 0.13.0 (with numpy 1.8.0) to read data that is initially out of order. For example, the sample data looks like this:

date_time, weeks, score
9/16/2013 14:05:00,73,160.9358
10/4/2013 13:20:00,75,159.61304
10/20/2013 13:44:00,78,158.06982
11/9/2013 17:18:00,80,156.30614
12/17/2013 14:20:00,86,158.5123664
9/19/2012 14:18:00,21,155.20384
7/7/2012 14:08:00,10,165.56546
7/11/2012 12:23:00,11,162.0381
7/14/2012 11:30:00,11,162.25856
7/17/2012 14:15:00,12,160.71534

Note that the rows are not in chronological order: the rows with later dates and week numbers come first.

When I read this data, Pandas preserves the original order:

In [9]: df=pd.read_csv('2_decimated.csv')
In [10]: df
Out[10]: 
             date_time   weeks       score
0  2013-09-16 14:05:00      73  160.935800
1  2013-10-04 13:20:00      75  159.613040
2  2013-10-20 13:44:00      78  158.069820
3  2013-11-09 17:18:00      80  156.306140
4  2013-12-17 14:20:00      86  158.512366
5  2012-09-19 14:18:00      21  155.203840
6  2012-07-07 14:08:00      10  165.565460
7  2012-07-11 12:23:00      11  162.038100
8  2012-07-14 11:30:00      11  162.258560
9  2012-07-17 14:15:00      12  160.715340

When I call df.sort(columns='date_time', inplace=True), I get:

            date_time   weeks       score
6 2012-07-07 14:08:00      10  165.565460
7 2012-07-11 12:23:00      11  162.038100
8 2012-07-14 11:30:00      11  162.258560
9 2012-07-17 14:15:00      12  160.715340
5 2012-09-19 14:18:00      21  155.203840
0 2013-09-16 14:05:00      73  160.935800
1 2013-10-04 13:20:00      75  159.613040
2 2013-10-20 13:44:00      78  158.069820
3 2013-11-09 17:18:00      80  156.306140
4 2013-12-17 14:20:00      86  158.512366

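This is what I want, but I would like date_time to be the index so that this can be treated as time series data. Calling df2.set_index('date_time') seems to do what I want, i.e.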

                      weeks       score
date_time                              
2012-07-07 14:08:00      10  165.565460
2012-07-11 12:23:00      11  162.038100
2012-07-14 11:30:00      11  162.258560
2012-07-17 14:15:00      12  160.715340
2012-09-19 14:18:00      21  155.203840
2013-09-16 14:05:00      73  160.935800
2013-10-04 13:20:00      75  159.613040
2013-10-20 13:44:00      78  158.069820
2013-11-09 17:18:00      80  156.306140
2013-12-17 14:20:00      86  158.512366

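However, calling df.plot() afterwards shows the same plot as before, and when I inspect df again, it has lost track of the new index and is back to its integer index. Essentially, set_index does not seem to behave the way I expect.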

2 answers:

Answer 0 (score: 1)

set_index() returns a new DataFrame by default and leaves df itself unchanged, so you need to call either df.set_index('date_time', inplace=True) or df = df.set_index('date_time').
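
A minimal sketch of this behaviour, using a few rows of the question's sample data inlined through io.StringIO so it runs without 2_decimated.csv. The names raw, indexed, and csv_text are illustrative only, and passing parse_dates to read_csv is an assumption about how the dates were parsed in the question:

```python
import io
import pandas as pd

# A few rows from the question's sample, inlined so no file is needed.
csv_text = """date_time,weeks,score
9/16/2013 14:05:00,73,160.9358
7/7/2012 14:08:00,10,165.56546
9/19/2012 14:18:00,21,155.20384
"""

raw = pd.read_csv(io.StringIO(csv_text), parse_dates=["date_time"])

# set_index returns a new DataFrame; raw keeps its integer index.
indexed = raw.set_index("date_time")
print(raw.index.name)      # still the default, unnamed integer index
print(indexed.index.name)  # the new date_time index

# Assign the result back to keep the new index, then sort by it.
df = raw.set_index("date_time").sort_index()
print(df)
```

After the assignment, df keeps the date_time index across later calls such as df.plot().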

Answer 1 (score: 0)

I like to use the from_csv method of the DataFrame class:

In [1]: from pandas import DataFrame as df

In [2]: df.from_csv('2_decimated.csv')
Out[2]: 
                      weeks       score
date_time                              
2013-09-16 14:05:00      73  160.935800
2013-10-04 13:20:00      75  159.613040
2013-10-20 13:44:00      78  158.069820
2013-11-09 17:18:00      80  156.306140
2013-12-17 14:20:00      86  158.512366
2012-09-19 14:18:00      21  155.203840
2012-07-07 14:08:00      10  165.565460
2012-07-11 12:23:00      11  162.038100
2012-07-14 11:30:00      11  162.258560
2012-07-17 14:15:00      12  160.715340

Compare that with pd.read_csv:

In [3]: import pandas as pd

In [4]: pd.read_csv('2_decimated.csv')
Out[4]: 
             date_time   weeks       score
0   9/16/2013 14:05:00      73  160.935800
1   10/4/2013 13:20:00      75  159.613040
2  10/20/2013 13:44:00      78  158.069820
3   11/9/2013 17:18:00      80  156.306140
4  12/17/2013 14:20:00      86  158.512366
5   9/19/2012 14:18:00      21  155.203840
6    7/7/2012 14:08:00      10  165.565460
7   7/11/2012 12:23:00      11  162.038100
8   7/14/2012 11:30:00      11  162.258560
9   7/17/2012 14:15:00      12  160.715340

After reading the csv with df.from_csv, you can sort by the index with sort_index():

In [5]: df.from_csv('2_decimated.csv').sort_index()
Out[5]: 
                      weeks       score
date_time                              
2012-07-07 14:08:00      10  165.565460
2012-07-11 12:23:00      11  162.038100
2012-07-14 11:30:00      11  162.258560
2012-07-17 14:15:00      12  160.715340
2012-09-19 14:18:00      21  155.203840
2013-09-16 14:05:00      73  160.935800
2013-10-04 13:20:00      75  159.613040
2013-10-20 13:44:00      78  158.069820
2013-11-09 17:18:00      80  156.306140
2013-12-17 14:20:00      86  158.512366
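
For readers on newer pandas: DataFrame.from_csv was deprecated in later releases and removed in pandas 1.0. A rough equivalent with read_csv is sketched below, assuming a few rows of the question's data inlined via io.StringIO. The variable names csv_text and ts are illustrative, and skipinitialspace=True is added here only to strip the spaces after the commas in the header row:

```python
import io
import pandas as pd

# Sample rows from the question, including the header with spaces after commas.
csv_text = """date_time, weeks, score
9/16/2013 14:05:00,73,160.9358
10/4/2013 13:20:00,75,159.61304
9/19/2012 14:18:00,21,155.20384
7/7/2012 14:08:00,10,165.56546
"""

ts = pd.read_csv(
    io.StringIO(csv_text),
    index_col="date_time",   # use the date column as the index
    parse_dates=True,        # parse the index as datetimes
    skipinitialspace=True,   # drop the blanks that follow each comma
).sort_index()               # order the rows chronologically

print(ts)
```

With a real file, the io.StringIO(csv_text) argument would be replaced by the file path, for example '2_decimated.csv'.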

That should help. Let me know if I have completely misunderstood your question.