Reindexing a stacked DataFrame

Time: 2016-08-03 15:27:09

Tags: python pandas

I want to stack a DataFrame and then reindex it.

The initial DataFrame looks like this:

          00:00  00:30  01:00  01:30  02:00  02:30  03:00  03:30  04:00  
Date                                                                        
2015-09-30   1.18   1.18   1.21   1.20   1.14   1.22   1.17   1.16   1.18   
2015-01-10   1.19   1.22   1.19   1.21   1.15   1.19   1.22   1.18   1.93   
2015-02-10   1.19   1.19   1.14   1.16   1.20   1.19   1.13   1.16   1.41   
2015-03-10   1.16   1.19   1.16   1.15   1.16   1.16   1.18   1.12   1.16   
2015-04-10   1.21   1.22   1.15   1.18   1.21   1.15   1.21   1.17   1.14   
2015-05-10   1.18   1.20   1.14   1.19   1.13   1.23   1.18   1.13   1.98   
2015-06-10   2.19   1.90   1.25   1.21   1.25   1.22   1.18   1.22   1.26 

After stacked = df.stack() I get the following:

Date             
2015-09-30  00:00     1.18
            00:30     1.18
            01:00     1.21
            01:30     1.20
            02:00     1.14
            02:30     1.22
            03:00     1.17
            03:30     1.16
            04:00     1.18
            04:30     3.21
            05:00    13.70
            05:30    10.55
            06:00     6.77
            06:30     4.69
            07:00     3.52
            07:30     3.04
            08:00     5.42
            08:30     4.92
            09:00     5.31
            09:30     5.89
            10:00     5.61
            10:30     5.48
            11:00     4.15
            11:30     4.13
            12:00     5.40
            12:30     6.22
            13:00     4.98
            13:30     4.12
            14:00     4.32
            14:30     5.29

2016-01-07  09:00     6.36
            09:30     6.74

This is what I expected, but I then want to reindex so that the index holds proper timestamps of the form YYYY-MM-DD HH:mm:ss. I tried:

date_index = pd.date_range(data.index[0], end='2016-01-07 23:30:00', freq='30T')
stacked.reindex(date_index)

But I get this error:

ValueError: cannot include dtype 'M' in a buffer

Any ideas? Thanks.

1 answer:

Answer 0: (score: 3)

Let's create a DataFrame similar to the OP's:

import pandas as pd
cols = ['00:00', '00:30', '01:00', '01:30', '02:00', '02:30', '03:00', '03:30', '04:00']
df = pd.DataFrame(columns=cols)
# .ix was removed in pandas 1.0; .loc does the same label-based row assignment
df.loc['2015-09-30'] = [1.18, 1.18, 1.21, 1.20, 1.14, 1.22, 1.17, 1.16, 1.18]
df.loc['2015-01-10'] = [1.18, 1.18, 1.21, 1.20, 1.14, 1.22, 1.17, 1.16, 1.18]
df.loc['2015-02-10'] = [1.18, 1.18, 1.21, 1.20, 1.14, 1.22, 1.17, 1.16, 1.18]
df.loc['2015-03-10'] = [1.18, 1.18, 1.21, 1.20, 1.14, 1.22, 1.17, 1.16, 1.18]
df.loc['2015-04-10'] = [1.18, 1.18, 1.21, 1.20, 1.14, 1.22, 1.17, 1.16, 1.18]
df.loc['2015-05-30'] = [1.18, 1.18, 1.21, 1.20, 1.14, 1.22, 1.17, 1.16, 1.18]

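Stacking this DataFrame produces a two-level MultiIndex (date, then time of day). 'reset_index' flattens it into a single-level index and exposes the former levels as ordinary columns, like so: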

stacked = df.stack().reset_index()
print(stacked.head())

      level_0 level_1     0
0  2015-09-30   00:00  1.18
1  2015-09-30   00:30  1.18
2  2015-09-30   01:00  1.21
3  2015-09-30   01:30  1.20
4  2015-09-30   02:00  1.14
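
The default column names level_0, level_1 and 0 are easy to mix up. As a minimal sketch, assuming a made-up two-day sample (not the OP's data) and the illustrative names date, time and value, the levels can be named before flattening:

```python
import pandas as pd

# Hypothetical sample; values and dates are made up for illustration.
df = pd.DataFrame(
    [[1.18, 1.18, 1.21], [1.19, 1.22, 1.19]],
    index=["2015-09-30", "2015-10-01"],
    columns=["00:00", "00:30", "01:00"],
)

stacked = df.stack()
print(stacked.index.nlevels)  # number of index levels after stacking

# Name both index levels, then flatten; name= labels the value column.
flat = stacked.rename_axis(["date", "time"]).reset_index(name="value")
print(flat.columns.tolist())
```

The rest of the answer works the same way with these names; only the column labels in the later steps change.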

Now we can combine level_0 and level_1, convert the result to a datetime index, and drop the level_0 and level_1 columns to get the desired result:

# Join date and time strings, e.g. '2015-09-30' + ' ' + '00:30', and parse them
new_index = pd.to_datetime(stacked.level_0.astype('str') + ' ' + stacked.level_1.astype('str'))
stacked.set_index(new_index, inplace=True)
stacked.drop(['level_0', 'level_1'], axis=1, inplace=True)
print(stacked.head())

                        0
2015-09-30 00:00:00  1.18
2015-09-30 00:30:00  1.18
2015-09-30 01:00:00  1.21
2015-09-30 01:30:00  1.20
2015-09-30 02:00:00  1.14
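
Once the index is a real DatetimeIndex, the reindex the OP was after should work. The following is a rough sketch rather than part of the original answer: it uses a made-up two-day sample, builds the timestamps directly from the MultiIndex instead of going through reset_index, and assumes that missing half-hour slots should show up as NaN.

```python
import pandas as pd

# Hypothetical sample; values and dates are made up for illustration.
df = pd.DataFrame(
    [[1.18, 1.18, 1.21], [1.19, 1.22, 1.19]],
    index=["2015-09-30", "2015-10-01"],
    columns=["00:00", "00:30", "01:00"],
)

stacked = df.stack()

# Each MultiIndex entry is a (date, time) tuple; join and parse them.
stacked.index = pd.to_datetime([f"{d} {t}" for d, t in stacked.index])
stacked = stacked.sort_index()

# Reindex onto a complete half-hourly grid; slots without data become NaN.
full_index = pd.date_range(stacked.index.min(), stacked.index.max(), freq="30min")
regular = stacked.reindex(full_index)
print(regular.head())
```

The frequency is written as "30min" here because the "30T" alias used in the question is deprecated in recent pandas releases.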