I want to stack a DataFrame and then reindex it.
The initial DataFrame looks like this:
00:00 00:30 01:00 01:30 02:00 02:30 03:00 03:30 04:00
Date
2015-09-30 1.18 1.18 1.21 1.20 1.14 1.22 1.17 1.16 1.18
2015-01-10 1.19 1.22 1.19 1.21 1.15 1.19 1.22 1.18 1.93
2015-02-10 1.19 1.19 1.14 1.16 1.20 1.19 1.13 1.16 1.41
2015-03-10 1.16 1.19 1.16 1.15 1.16 1.16 1.18 1.12 1.16
2015-04-10 1.21 1.22 1.15 1.18 1.21 1.15 1.21 1.17 1.14
2015-05-10 1.18 1.20 1.14 1.19 1.13 1.23 1.18 1.13 1.98
2015-06-10 2.19 1.90 1.25 1.21 1.25 1.22 1.18 1.22 1.26
stacked = df.stack()
After that I get the following:
Date
2015-09-30 00:00 1.18
           00:30 1.18
           01:00 1.21
           01:30 1.20
           02:00 1.14
           02:30 1.22
           03:00 1.17
           03:30 1.16
           04:00 1.18
           04:30 3.21
           05:00 13.70
           05:30 10.55
           06:00 6.77
           06:30 4.69
           07:00 3.52
           07:30 3.04
           08:00 5.42
           08:30 4.92
           09:00 5.31
           09:30 5.89
           10:00 5.61
           10:30 5.48
           11:00 4.15
           11:30 4.13
           12:00 5.40
           12:30 6.22
           13:00 4.98
           13:30 4.12
           14:00 4.32
           14:30 5.29
2016-01-07 09:00 6.36
           09:30 6.74
This is what I expected, but now I want to reindex it so that the index is a proper timestamp such as YYYY-MM-DD HH:mm:ss. I tried:
date_index = pd.date_range(data.index[0], end='2016-01-07 23:30:00', freq='30T')
stacked.reindex(date_index)
but I get the error:
ValueError: cannot include dtype 'M' in a buffer
Any ideas? Thanks.
Answer (score: 3)
Let's create a DataFrame similar to the OP's:
import pandas as pd
cols = ['00:00', '00:30', '01:00', '01:30', '02:00', '02:30', '03:00', '03:30', '04:00']
df = pd.DataFrame(columns = cols)
# .ix was removed in pandas 1.0; .loc adds each new row label (setting with enlargement)
df.loc['2015-09-30'] = [1.18, 1.18, 1.21, 1.20, 1.14, 1.22, 1.17, 1.16, 1.18]
df.loc['2015-01-10'] = [1.18, 1.18, 1.21, 1.20, 1.14, 1.22, 1.17, 1.16, 1.18]
df.loc['2015-02-10'] = [1.18, 1.18, 1.21, 1.20, 1.14, 1.22, 1.17, 1.16, 1.18]
df.loc['2015-03-10'] = [1.18, 1.18, 1.21, 1.20, 1.14, 1.22, 1.17, 1.16, 1.18]
df.loc['2015-04-10'] = [1.18, 1.18, 1.21, 1.20, 1.14, 1.22, 1.17, 1.16, 1.18]
df.loc['2015-05-30'] = [1.18, 1.18, 1.21, 1.20, 1.14, 1.22, 1.17, 1.16, 1.18]
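As a side note, here is a minimal sketch of building the same sample data in a single constructor call instead of adding rows one at a time. It assumes the rows only need to be test data (every date repeats the same values, as above). Adding rows one by one can be slow for larger frames, and passing all values up front makes the columns float64 explicitly instead of relying on dtype inference during enlargement.

```python
import pandas as pd

cols = ['00:00', '00:30', '01:00', '01:30', '02:00', '02:30', '03:00', '03:30', '04:00']
row = [1.18, 1.18, 1.21, 1.20, 1.14, 1.22, 1.17, 1.16, 1.18]
dates = ['2015-09-30', '2015-01-10', '2015-02-10', '2015-03-10', '2015-04-10', '2015-05-30']

# One constructor call: the same row repeated for every date label
df = pd.DataFrame([row] * len(dates), index=dates, columns=cols)
print(df.head(2))
```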
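When we stack this DataFrame, we get a MultiIndex. We can use 'reset_index' to turn it into a single-level index; 'reset_index' also moves the index levels into ordinary columns that we can access directly, like this: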
stacked = df.stack().reset_index()
print (stacked.head())
level_0 level_1 0
0 2015-09-30 00:00 1.18
1 2015-09-30 00:30 1.18
2 2015-09-30 01:00 1.21
3 2015-09-30 01:30 1.20
4 2015-09-30 02:00 1.14
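To see where the level_0 and level_1 names come from, here is a small sketch on a three-column stand-in frame (the values are arbitrary). reset_index labels unnamed index levels level_0, level_1, and the values column gets the label 0. In the question the row index is named Date, so there the first column would most likely be Date rather than level_0, and the code that follows would need to use that name.

```python
import pandas as pd

# Small stand-in frame; the values are arbitrary
cols = ['00:00', '00:30', '01:00']
df = pd.DataFrame([[1.18, 1.18, 1.21], [1.19, 1.22, 1.19]],
                  index=['2015-09-30', '2015-01-10'], columns=cols)

stacked = df.stack()
# stack() moves the column labels into a second index level
print(type(stacked.index).__name__, stacked.index.nlevels)

# Unnamed levels come back as 'level_0'/'level_1'; the values column is labelled 0
unnamed_cols = stacked.reset_index().columns.tolist()
print(unnamed_cols)

# With a named row index (like 'Date' in the question), that name is used instead
df.index.name = 'Date'
named_cols = df.stack().reset_index().columns.tolist()
print(named_cols)
```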
Now we can concatenate level_0 and level_1, convert the result into a datetime index, and drop the level_0 and level_1 columns to get the desired result:
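# Join the date (level_0) and time (level_1) strings, then parse them as timestamps
new_index = pd.to_datetime(stacked.level_0.astype('str') + ' ' + stacked.level_1.astype('str'))
stacked.set_index(new_index, inplace=True)
# The helper columns are no longer needed once they form the index
stacked.drop(['level_0', 'level_1'], axis=1, inplace=True)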
print (stacked.head())
0
2015-09-30 00:00:00 1.18
2015-09-30 00:30:00 1.18
2015-09-30 01:00:00 1.21
2015-09-30 01:30:00 1.20
2015-09-30 02:00:00 1.14
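The question's original goal was to reindex onto a regular 30-minute grid. Once the index is a plain DatetimeIndex, reindex accepts the output of pd.date_range. The sketch below builds the timestamps directly from the MultiIndex tuples instead of going through reset_index; the two-day sample frame is made up for illustration. It uses '30min' rather than '30T' because the 'T' alias is deprecated in recent pandas, and index.min() rather than index[0] as the start, because the question's dates are not in chronological order. Note that reindex also requires the index to be unique.

```python
import pandas as pd

# Hypothetical two-day sample with three half-hour columns
cols = ['00:00', '00:30', '01:00']
df = pd.DataFrame([[1.18, 1.18, 1.21], [1.19, 1.22, 1.19]],
                  index=['2015-09-30', '2015-10-01'], columns=cols)

stacked = df.stack()

# Each MultiIndex entry is a (date, time) tuple; join and parse them into timestamps
stacked.index = pd.to_datetime([f'{d} {t}' for d, t in stacked.index])

# With a DatetimeIndex, reindexing onto a regular 30-minute grid works;
# half-hours with no source value (01:30 to 23:30 on each day here) become NaN
full_range = pd.date_range(stacked.index.min(), end='2015-10-01 23:30:00', freq='30min')
regular = stacked.reindex(full_range)
print(regular.head(6))
```

The string round-trip mirrors the answer's approach; if the time labels were already timedeltas, adding them to the parsed dates would be an alternative.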