pandas: combine two index levels

Time: 2018-03-11 07:38:47

Tags: python pandas

I have the following pandas Series:

import datetime
import pandas as pd

data = {(pd.Timestamp('2016-01-01 00:00:00'), datetime.time(0, 0)): 6.885,
        (pd.Timestamp('2016-01-01 00:00:00'), datetime.time(0, 5)): 6.363,
        (pd.Timestamp('2016-01-01 00:00:00'), datetime.time(0, 10)): 6.093,
        (pd.Timestamp('2016-01-01 00:00:00'), datetime.time(0, 15)): 6.768,
        (pd.Timestamp('2016-01-01 00:00:00'), datetime.time(0, 20)): 7.11}
s = pd.Series(data)

2016-01-01  00:00:00    6.885
            00:05:00    6.363
            00:10:00    6.093
            00:15:00    6.768
            00:20:00    7.110
dtype: float64

How can I combine the two index levels to create a DatetimeIndex, like this:

2016-01-01 00:00:00    6.885
2016-01-01 00:05:00    6.363
2016-01-01 00:10:00    6.093
2016-01-01 00:15:00    6.768
2016-01-01 00:20:00    7.110
dtype: float64

2 answers:

Answer 0: (score: 4)

Convert the second level of the MultiIndex with to_timedelta and add it to the first level:

s.index = s.index.get_level_values(0) + pd.to_timedelta(s.index.get_level_values(1).astype(str))
print(s)
2016-01-01 00:00:00    6.885
2016-01-01 00:05:00    6.363
2016-01-01 00:10:00    6.093
2016-01-01 00:15:00    6.768
2016-01-01 00:20:00    7.110
Freq: 5T, dtype: float64
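
To make the one-liner above easier to follow, here is a minimal sketch that splits it into its intermediate steps. It rebuilds a shortened version of the question's Series; the names `dates`, `times`, `offsets` and `combined` are only illustrative and do not come from the answer.

```python
import datetime
import pandas as pd

# Shortened version of the Series from the question: (date, time-of-day) MultiIndex.
data = {
    (pd.Timestamp('2016-01-01'), datetime.time(0, 0)): 6.885,
    (pd.Timestamp('2016-01-01'), datetime.time(0, 5)): 6.363,
    (pd.Timestamp('2016-01-01'), datetime.time(0, 10)): 6.093,
}
s = pd.Series(data)

dates = s.index.get_level_values(0)              # first level: the dates
times = s.index.get_level_values(1).astype(str)  # second level as strings such as '00:05:00'
offsets = pd.to_timedelta(times)                 # strings parsed into timedeltas

# Date plus offset gives one timestamp per row; build a new Series on that index.
combined = pd.Series(s.to_numpy(), index=dates + offsets)
print(combined)
```

Building a new Series instead of assigning to `s.index` leaves the original object untouched, which may be convenient when comparing approaches.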

Answer 1: (score: 4)

Intuitive answer
Use pd.Index.map and pd.Timedelta

s.index = s.index.map(lambda t: t[0] + pd.Timedelta(str(t[1])))
s

2016-01-01 00:00:00    6.885
2016-01-01 00:05:00    6.363
2016-01-01 00:10:00    6.093
2016-01-01 00:15:00    6.768
2016-01-01 00:20:00    7.110
dtype: float64

Fast answer
If speed is what you are after, try this

import numpy as np

# Minutes since midnight for each time of day (seconds are ignored)
t = np.array(
    [t.hour * 60 + t.minute for t in s.index.get_level_values(1)],
    'timedelta64[m]'
)

s.index = s.index.get_level_values(0) + t

2016-01-01 00:00:00    6.885
2016-01-01 00:05:00    6.363
2016-01-01 00:10:00    6.093
2016-01-01 00:15:00    6.768
2016-01-01 00:20:00    7.110
dtype: float64
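
The fast version above counts whole minutes, so any seconds in the time level are dropped. That is fine for the 5-minute data in the question. If the times did carry seconds, a variant along these lines might work; it is a sketch on made-up sample data, and `secs` and `new_index` are hypothetical names.

```python
import datetime
import numpy as np
import pandas as pd

# Made-up sample data whose time level includes seconds.
idx = pd.MultiIndex.from_tuples([
    (pd.Timestamp('2016-01-01'), datetime.time(0, 5, 30)),
    (pd.Timestamp('2016-01-01'), datetime.time(1, 0, 15)),
])
s = pd.Series([1.0, 2.0], index=idx)

# Seconds since midnight for each time of day, as a NumPy timedelta array.
secs = np.array(
    [t.hour * 3600 + t.minute * 60 + t.second for t in s.index.get_level_values(1)],
    'timedelta64[s]'
)

new_index = s.index.get_level_values(0) + secs
print(new_index)
```

Microseconds would still be lost here; the string-based approaches keep them because they parse the full time representation.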

Timing

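Note that this only matters if you care about optimization, in which case pir2 is the fastest in the timings below. Otherwise, use whichever option you think is right.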

import numpy as np
from timeit import timeit

jez = lambda s: s.index.get_level_values(0) + pd.to_timedelta(s.index.get_level_values(1).astype(str))
pir1 = lambda s: s.index.map(lambda t: t[0] + pd.Timedelta(str(t[1])))
pir2 = lambda s: s.index.get_level_values(0) + np.array([t.hour * 60 + t.minute for t in s.index.get_level_values(1)], 'timedelta64[m]')

res = pd.DataFrame(
    np.nan, [10, 30, 100, 300, 1000, 3000, 10000, 30000],
    'jez pir1 pir2'.split()
)

for i in res.index:
    s_ = pd.concat([s] * i)
    for j in res.columns:
        stmt = f'{j}(s_)'
        setp = f'from __main__ import {j}, s_'
        res.at[i, j] = timeit(stmt, setp, number=100)

res.plot(loglog=True)


res.div(res.min(1), 0)

             jez       pir1  pir2
10      2.400808   3.530032   1.0
30      4.045287   8.378484   1.0
100     6.337601  18.610263   1.0
300     8.664829  30.363422   1.0
1000   11.593935  44.210358   1.0
3000   11.899037  47.425953   1.0
10000  12.226166  49.546467   1.0
30000  12.543602  50.730653   1.0
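
A timing comparison is only meaningful if the candidates produce the same result. As a quick sanity check, the sketch below runs the three functions from the timing code on a small sample shaped like the question's data and compares the resulting timestamps. The sample data and the name `results` are assumptions made for illustration.

```python
import datetime
import numpy as np
import pandas as pd

# Small sample with the same shape as the Series in the question.
data = {
    (pd.Timestamp('2016-01-01'), datetime.time(0, 0)): 6.885,
    (pd.Timestamp('2016-01-01'), datetime.time(0, 5)): 6.363,
    (pd.Timestamp('2016-01-01'), datetime.time(0, 10)): 6.093,
}
s = pd.Series(data)

# The three candidates, copied from the timing code.
jez = lambda s: s.index.get_level_values(0) + pd.to_timedelta(s.index.get_level_values(1).astype(str))
pir1 = lambda s: s.index.map(lambda t: t[0] + pd.Timedelta(str(t[1])))
pir2 = lambda s: s.index.get_level_values(0) + np.array(
    [t.hour * 60 + t.minute for t in s.index.get_level_values(1)], 'timedelta64[m]'
)

# Compare element by element so that differences in index metadata do not matter.
results = {name: list(f(s)) for name, f in [('jez', jez), ('pir1', pir1), ('pir2', pir2)]}
print(results['jez'] == results['pir1'] == results['pir2'])
```

The three agree on this sample because every time falls on a whole minute; with seconds in the time level, pir2 would differ from the other two.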