I have the following pandas Series:
import datetime
import pandas as pd

data = {(pd.Timestamp('2016-01-01 00:00:00'), datetime.time(0, 0)): 6.885,
        (pd.Timestamp('2016-01-01 00:00:00'), datetime.time(0, 5)): 6.363,
        (pd.Timestamp('2016-01-01 00:00:00'), datetime.time(0, 10)): 6.093,
        (pd.Timestamp('2016-01-01 00:00:00'), datetime.time(0, 15)): 6.768,
        (pd.Timestamp('2016-01-01 00:00:00'), datetime.time(0, 20)): 7.11}
s = pd.Series(data)
2016-01-01  00:00:00    6.885
            00:05:00    6.363
            00:10:00    6.093
            00:15:00    6.768
            00:20:00    7.110
dtype: float64
How can I combine the two index levels into a single DatetimeIndex, like this:
2016-01-01 00:00:00 6.885
2016-01-01 00:05:00 6.363
2016-01-01 00:10:00 6.093
2016-01-01 00:15:00 6.768
2016-01-01 00:20:00 7.110
dtype: float64
Answer 0 (score: 4)
Convert the second level of the MultiIndex with to_timedelta and add it to the first level:
s.index = (s.index.get_level_values(0)
           + pd.to_timedelta(s.index.get_level_values(1).astype(str)))
print(s)
2016-01-01 00:00:00 6.885
2016-01-01 00:05:00 6.363
2016-01-01 00:10:00 6.093
2016-01-01 00:15:00 6.768
2016-01-01 00:20:00 7.110
Freq: 5T, dtype: float64
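The astype(str) step is what bridges the datetime.time objects to timedeltas: each time is first rendered as an 'HH:MM:SS' string, which to_timedelta can parse. A minimal sketch of that conversion for a single value (the variable names here are illustrative, not part of the answer):

```python
import datetime
import pandas as pd

t = datetime.time(0, 5)

# str() of a datetime.time gives an 'HH:MM:SS' string
as_text = str(t)

# pandas parses that string as a duration measured from midnight
delta = pd.to_timedelta(as_text)

# adding the duration to the date gives the combined timestamp
stamp = pd.Timestamp('2016-01-01') + delta
print(stamp)  # 2016-01-01 00:05:00
```

The one-liner in the answer does the same thing for the whole index at once instead of value by value.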
Answer 1 (score: 4)
Intuitive answer
Use pd.Index.map and pd.Timedelta:
s.index = s.index.map(lambda t: t[0] + pd.Timedelta(str(t[1])))
s
2016-01-01 00:00:00 6.885
2016-01-01 00:05:00 6.363
2016-01-01 00:10:00 6.093
2016-01-01 00:15:00 6.768
2016-01-01 00:20:00 7.110
dtype: float64
Fast answer
If speed is what you're after, try this:
t = np.array(
    [t.hour * 60 + t.minute for t in s.index.get_level_values(1)],
    'timedelta64[m]'
)
s.index = s.index.get_level_values(0) + t
2016-01-01 00:00:00 6.885
2016-01-01 00:05:00 6.363
2016-01-01 00:10:00 6.093
2016-01-01 00:15:00 6.768
2016-01-01 00:20:00 7.110
dtype: float64
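The snippet above counts only hours and minutes, so any seconds in the time level would be dropped. That is fine for the 5-minute data in the question. If the times did carry seconds, one possible variant (a sketch, using made-up sample values that are not from the question) is to build the offsets in seconds instead:

```python
import datetime
import numpy as np
import pandas as pd

# Hypothetical sample with non-zero seconds, for illustration only
data = {
    (pd.Timestamp('2016-01-01'), datetime.time(0, 0, 30)): 6.885,
    (pd.Timestamp('2016-01-01'), datetime.time(0, 5, 15)): 6.363,
}
s = pd.Series(data)

# Offset from midnight in whole seconds for each time in the second level
seconds = np.array(
    [t.hour * 3600 + t.minute * 60 + t.second
     for t in s.index.get_level_values(1)],
    'timedelta64[s]'
)
s.index = s.index.get_level_values(0) + seconds
print(s)
```

Sub-second precision would still be lost here; the string-based approaches keep it at the cost of speed.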
Note that the timings below only matter if you care about optimization. Otherwise, use whichever option you prefer.
from timeit import timeit

# s must be the original series with the two-level (date, time) index
jez = lambda s: s.index.get_level_values(0) + pd.to_timedelta(s.index.get_level_values(1).astype(str))
pir1 = lambda s: s.index.map(lambda t: t[0] + pd.Timedelta(str(t[1])))
pir2 = lambda s: s.index.get_level_values(0) + np.array(
    [t.hour * 60 + t.minute for t in s.index.get_level_values(1)],
    'timedelta64[m]'
)

# rows: how many times the 5-row sample is repeated; columns: the three approaches
res = pd.DataFrame(
    np.nan, [10, 30, 100, 300, 1000, 3000, 10000, 30000],
    'jez pir1 pir2'.split()
)
for i in res.index:
    s_ = pd.concat([s] * i)
    for j in res.columns:
        stmt = f'{j}(s_)'
        setp = f'from __main__ import {j}, s_'
        res.at[i, j] = timeit(stmt, setp, number=100)

res.plot(loglog=True)
# time of each approach relative to the fastest one in that row
res.div(res.min(axis=1), axis=0)
             jez       pir1  pir2
10      2.400808   3.530032   1.0
30      4.045287   8.378484   1.0
100     6.337601  18.610263   1.0
300     8.664829  30.363422   1.0
1000   11.593935  44.210358   1.0
3000   11.899037  47.425953   1.0
10000  12.226166  49.546467   1.0
30000  12.543602  50.730653   1.0
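The ratios above are only meaningful if the three functions really are interchangeable. A quick sanity check, rebuilding the sample series from the question and comparing each result against the expected 5-minute timestamps (the names day, expected, results and all_match are introduced here for illustration):

```python
import datetime
import numpy as np
import pandas as pd

# Rebuild the sample series from the question (two-level index: date, time of day)
day = pd.Timestamp('2016-01-01')
s = pd.Series({
    (day, datetime.time(0, 0)): 6.885,
    (day, datetime.time(0, 5)): 6.363,
    (day, datetime.time(0, 10)): 6.093,
    (day, datetime.time(0, 15)): 6.768,
    (day, datetime.time(0, 20)): 7.11,
})

# The three approaches, as defined for the timing run
jez = lambda s: s.index.get_level_values(0) + pd.to_timedelta(s.index.get_level_values(1).astype(str))
pir1 = lambda s: s.index.map(lambda t: t[0] + pd.Timedelta(str(t[1])))
pir2 = lambda s: s.index.get_level_values(0) + np.array(
    [t.hour * 60 + t.minute for t in s.index.get_level_values(1)],
    'timedelta64[m]'
)

# Five timestamps, 5 minutes apart, starting at midnight
expected = list(pd.date_range('2016-01-01', periods=5, freq='5min'))

# Compare element by element so differences in index metadata do not matter
results = {name: list(f(s)) for name, f in [('jez', jez), ('pir1', pir1), ('pir2', pir2)]}
all_match = all(r == expected for r in results.values())
print(all_match)
```

Comparing plain lists of timestamps rather than the index objects keeps the check independent of details such as the inferred frequency.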