Make a duplicated time index unique by adding milliseconds

Date: 2016-07-27 13:54:29

Tags: python pandas

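I want to use the datetime as the main index, but it contains many duplicates. What I want is to add artificial milliseconds that act as a "counter" within each group of identical timestamps.

For example, the original DataFrame looks like this: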

                         Bid  BidVol
2016-06-27 13:00:10  4183.50       0
2016-06-27 13:00:10  4183.50       0
2016-06-27 13:00:10  4183.50       0
2016-06-28 13:00:10  4249.25       1
2016-06-28 13:00:10  4249.25       1
2016-06-28 13:00:10  4249.00       1
2016-06-28 13:00:10  4248.75       1
2016-06-28 13:00:10  4248.75       2
2016-06-28 13:00:10  4248.75       1
2016-06-28 13:00:10  4248.75       2
2016-06-28 13:00:12  4248.50       0
2016-06-28 13:00:12  4248.50       0
2016-06-29 13:00:12  4353.75       0
2016-06-29 13:00:12  4353.75       0
2016-06-29 13:00:12  4353.75       0
2016-06-29 13:00:12  4354.00       1
2016-06-29 13:00:12  4354.00       1
2016-06-29 13:00:12  4353.75       0
2016-06-29 13:00:12  4354.00       1
2016-06-29 13:00:12  4354.00       1
2016-06-29 13:00:12  4354.00       1
2016-06-29 13:00:12  4354.00       1
2016-06-30 13:00:10  4394.00       0
2016-06-30 13:00:11  4394.25       1
2016-06-30 13:00:11  4394.00       0

My goal is to change the duplicated rows (here, the 2016-06-28 13:00:10 group) to:

2016-06-28 13:00:10
2016-06-28 13:00:10.001000
2016-06-28 13:00:10.002000
2016-06-28 13:00:10.003000
2016-06-28 13:00:10.004000
2016-06-28 13:00:10.005000
2016-06-28 13:00:10.006000

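I tried using groupby, and with a loop I can print the timestamps shifted by milliseconds:

import pandas as pd

for name, group in test.groupby(test.index):
    print('------')
    i = 0
    for idx, values in group.iterrows():
        # shift the i-th duplicate within this group by i milliseconds
        print(idx + pd.Timedelta(milliseconds=i))
        i += 1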
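The loop above only prints the shifted timestamps. A minimal sketch of how the same loop-based idea could actually build and assign a new index is shown below, for comparison only; the small `test` frame is a hypothetical subset of the sample data. It uses a per-timestamp counter instead of `groupby`, so it gives the right order even if equal timestamps are not contiguous. Being a pure Python loop, it will be slow on a very large dataset.

```python
import pandas as pd

# Hypothetical subset of the question's sample data
idx = pd.to_datetime(["2016-06-28 13:00:10"] * 3 + ["2016-06-28 13:00:12"] * 2)
test = pd.DataFrame({"Bid": [4249.25, 4249.25, 4249.00, 4248.50, 4248.50]}, index=idx)

counts = {}      # how many times each timestamp has been seen so far
new_index = []
for ts in test.index:
    i = counts.get(ts, 0)
    new_index.append(ts + pd.Timedelta(milliseconds=i))  # i-th duplicate gets +i ms
    counts[ts] = i + 1

test.index = pd.DatetimeIndex(new_index)
print(test)
```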

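But I don't know how to actually change the index. What is the most efficient way to get the result I need, especially since the main dataset is very large?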

1 Answer

Answer 0 (score: 2)

You can use cumcount to create the millisecond counter, convert it with to_timedelta, and add it to the index:
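A self-contained sketch of the same cumcount + to_timedelta approach, assuming a small hypothetical frame built from a few of the sample timestamps. Passing `.to_numpy()` makes the addition purely positional, so no index alignment between the two operands is involved; this is a defensive choice, not something the answer requires.

```python
import pandas as pd

# Hypothetical subset of the question's sample timestamps
idx = pd.to_datetime(
    ["2016-06-27 13:00:10"] * 3
    + ["2016-06-28 13:00:10"] * 7
    + ["2016-06-30 13:00:11"] * 2
)
df = pd.DataFrame({"Bid": range(len(idx))}, index=idx)

# 0, 1, 2, ... within each group of identical index values
counter = df.groupby(level=0).cumcount()

# Turn the counter into millisecond offsets and add them element-wise to the index
df.index = df.index + pd.to_timedelta(counter.to_numpy(), unit="ms")
print(df.index.is_unique)
```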

a = df.groupby(level=0).cumcount()
print (a)
2016-06-27 13:00:10    0
2016-06-27 13:00:10    1
2016-06-27 13:00:10    2
2016-06-28 13:00:10    0
2016-06-28 13:00:10    1
2016-06-28 13:00:10    2
2016-06-28 13:00:10    3
2016-06-28 13:00:10    4
2016-06-28 13:00:10    5
2016-06-28 13:00:10    6
2016-06-28 13:00:12    0
2016-06-28 13:00:12    1
2016-06-29 13:00:12    0
2016-06-29 13:00:12    1
2016-06-29 13:00:12    2
2016-06-29 13:00:12    3
2016-06-29 13:00:12    4
2016-06-29 13:00:12    5
2016-06-29 13:00:12    6
2016-06-29 13:00:12    7
2016-06-29 13:00:12    8
2016-06-29 13:00:12    9
2016-06-30 13:00:10    0
2016-06-30 13:00:11    0
2016-06-30 13:00:11    1
dtype: int64
df.index = df.index + pd.to_timedelta(a, unit='ms')
print (df)
                             Bid  BidVol
2016-06-27 13:00:10.000  4183.50       0
2016-06-27 13:00:10.001  4183.50       0
2016-06-27 13:00:10.002  4183.50       0
2016-06-28 13:00:10.000  4249.25       1
2016-06-28 13:00:10.001  4249.25       1
2016-06-28 13:00:10.002  4249.00       1
2016-06-28 13:00:10.003  4248.75       1
2016-06-28 13:00:10.004  4248.75       2
2016-06-28 13:00:10.005  4248.75       1
2016-06-28 13:00:10.006  4248.75       2
2016-06-28 13:00:12.000  4248.50       0
2016-06-28 13:00:12.001  4248.50       0
2016-06-29 13:00:12.000  4353.75       0
2016-06-29 13:00:12.001  4353.75       0
2016-06-29 13:00:12.002  4353.75       0
2016-06-29 13:00:12.003  4354.00       1
2016-06-29 13:00:12.004  4354.00       1
2016-06-29 13:00:12.005  4353.75       0
2016-06-29 13:00:12.006  4354.00       1
2016-06-29 13:00:12.007  4354.00       1
2016-06-29 13:00:12.008  4354.00       1
2016-06-29 13:00:12.009  4354.00       1
2016-06-30 13:00:10.000  4394.00       0
2016-06-30 13:00:11.000  4394.25       1
2016-06-30 13:00:11.001  4394.00       0
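One caveat worth checking on a large dataset: with a millisecond counter, a group of 1000 or more rows sharing one timestamp spills into the next second, where it can collide with genuine timestamps. A minimal sketch of a guard is below; `add_counter` is a hypothetical helper name, and the "below one second" test assumes the original timestamps are whole seconds, as in the sample. The final uniqueness check catches any other collision.

```python
import pandas as pd

def add_counter(df, unit="ms"):
    """Return a copy of df whose duplicated timestamps are shifted by 0, 1, 2, ... units.

    Hypothetical helper; assumes df has a DatetimeIndex.
    """
    counter = df.groupby(level=0).cumcount().to_numpy()
    offsets = pd.to_timedelta(counter, unit=unit)
    # With whole-second timestamps, the largest offset must stay below 1 s,
    # otherwise shifted rows could run into the next second
    if offsets.max() >= pd.Timedelta(seconds=1):
        raise ValueError(f"too many duplicates for unit={unit!r}; try a finer unit such as 'us'")
    out = df.copy()
    out.index = df.index + offsets
    # Catch any remaining collisions (e.g. if the original index had sub-second values)
    if not out.index.is_unique:
        raise ValueError("shifted index still contains duplicates")
    return out
```

For example, 1500 rows sharing one second would raise with `unit="ms"`, while `add_counter(df, unit="us")` keeps all offsets within that second.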