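I've been tearing my hair out over this. I think my solution is right, but I haven't quite convinced myself, so I'd like to share it and have others tell me whether it is correct.
I want to join several DataFrames along their time-series index. They all share the same start and end time, but each one has a different length. Then I want to reindex the combined time series so that every gap gets a row for the missing timestamp, and forward-fill the missing values from the data in the original DataFrames.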
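A minimal sketch of the reindexing step, assuming each DataFrame already has a `DatetimeIndex` at hourly frequency: build a complete hourly range between the first and last timestamp and reindex onto it, so each missing hour becomes an all-NaN row that can be filled later. The helper name `reindex_hourly` and the toy values are hypothetical.

```python
import pandas as pd


def reindex_hourly(df: pd.DataFrame, freq: str = "h") -> pd.DataFrame:
    """Hypothetical helper: add a row for every timestamp missing from
    df's DatetimeIndex between its min and max; new rows are all NaN."""
    full_index = pd.date_range(df.index.min(), df.index.max(), freq=freq)
    return df.reindex(full_index)


# Toy frame with 02:00 missing (illustrative values only).
idx = pd.to_datetime(["2010-01-04 00:00", "2010-01-04 01:00", "2010-01-04 03:00"])
df = pd.DataFrame({"Close": [1.0, 2.0, 3.0]}, index=idx)
print(reindex_hourly(df))
```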
DataFrame1:
Time O H L C Symbol
00:00:00 2 3 1 1 XXX/XXX
01:00:00 1 4 1 1 XXX/XXX
02:00:00 1 4 1 1 XXX/XXX
03:00:00 1 4 1 1 XXX/XXX
04:00:00 2 3 1 1 XXX/XXX
05:00:00 1 3 1 1 XXX/XXX
06:00:00 1 3 1 1 XXX/XXX
07:00:00 2 4 1 1 XXX/XXX
08:00:00 2 3 1 1 XXX/XXX
09:00:00 1 4 1 1 XXX/XXX
10:00:00 1 3 1 1 XXX/XXX
11:00:00 2 4 1 1 XXX/XXX
12:00:00 1 4 1 1 XXX/XXX
13:00:00 2 3 1 1 XXX/XXX
14:00:00 2 4 1 1 XXX/XXX
Len = 15
DataFrame2:
Time O H L C Symbol
00:00:00 2 3 1 1 XXX/YYY
01:00:00 1 4 1 1 XXX/YYY
02:00:00 1 4 1 1 XXX/YYY
03:00:00 1 4 1 1 XXX/YYY
04:00:00 2 3 1 1 XXX/YYY
06:00:00 1 3 1 1 XXX/YYY
07:00:00 1 3 1 1 XXX/YYY
08:00:00 2 4 1 1 XXX/YYY
09:00:00 2 3 1 1 XXX/YYY
10:00:00 1 4 1 1 XXX/YYY
12:00:00 1 3 1 1 XXX/YYY
13:00:00 2 4 1 1 XXX/YYY
14:00:00 1 4 1 1 XXX/YYY
Len = 13
DataFrame3:
Time O H L C Symbol
00:00:00 2 3 1 1 XXX/ZZZ
02:00:00 1 4 1 1 XXX/ZZZ
03:00:00 1 4 1 1 XXX/ZZZ
04:00:00 1 4 1 1 XXX/ZZZ
05:00:00 2 3 1 1 XXX/ZZZ
06:00:00 1 3 1 1 XXX/ZZZ
07:00:00 1 3 1 1 XXX/ZZZ
08:00:00 2 4 1 1 XXX/ZZZ
10:00:00 1 4 1 1 XXX/ZZZ
11:00:00 1 3 1 1 XXX/ZZZ
12:00:00 2 4 1 1 XXX/ZZZ
14:00:00 1 4 1 1 XXX/ZZZ
Len = 12
The final result should be an aligned DataFrame that shows all of the data before forward-filling:
Time O H L C Symbol Time O H L C Symbol Time O H L C Symbol
00:00:00 2 3 1 1 XXX/XXX 00:00:00 2 3 1 1 XXX/YYY 00:00:00 2 3 1 1 XXX/ZZZ
01:00:00 1 4 1 1 XXX/XXX 01:00:00 1 4 1 1 XXX/YYY 01:00:00 nan nan nan nan nan
02:00:00 1 4 1 1 XXX/XXX 02:00:00 1 4 1 1 XXX/YYY 02:00:00 1 4 1 1 XXX/ZZZ
03:00:00 1 4 1 1 XXX/XXX 03:00:00 1 4 1 1 XXX/YYY 03:00:00 1 4 1 1 XXX/ZZZ
04:00:00 2 3 1 1 XXX/XXX 04:00:00 2 3 1 1 XXX/YYY 04:00:00 1 4 1 1 XXX/ZZZ
05:00:00 1 3 1 1 XXX/XXX 05:00:00 nan nan nan nan nan 05:00:00 2 3 1 1 XXX/ZZZ
06:00:00 1 3 1 1 XXX/XXX 06:00:00 1 3 1 1 XXX/YYY 06:00:00 1 3 1 1 XXX/ZZZ
07:00:00 2 4 1 1 XXX/XXX 07:00:00 1 3 1 1 XXX/YYY 07:00:00 1 3 1 1 XXX/ZZZ
08:00:00 2 3 1 1 XXX/XXX 08:00:00 2 4 1 1 XXX/YYY 08:00:00 2 4 1 1 XXX/ZZZ
09:00:00 1 4 1 1 XXX/XXX 09:00:00 2 3 1 1 XXX/YYY 09:00:00 nan nan nan nan nan
10:00:00 1 3 1 1 XXX/XXX 10:00:00 1 4 1 1 XXX/YYY 10:00:00 1 4 1 1 XXX/ZZZ
11:00:00 2 4 1 1 XXX/XXX 11:00:00 nan nan nan nan nan 11:00:00 1 3 1 1 XXX/ZZZ
12:00:00 1 4 1 1 XXX/XXX 12:00:00 1 3 1 1 XXX/YYY 12:00:00 2 4 1 1 XXX/ZZZ
13:00:00 2 3 1 1 XXX/XXX 13:00:00 2 4 1 1 XXX/YYY 13:00:00 nan nan nan nan nan
14:00:00 2 4 1 1 XXX/XXX 14:00:00 1 4 1 1 XXX/YYY 14:00:00 1 4 1 1 XXX/ZZZ
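For comparison, `pd.concat(..., axis=1)` outer-aligns frames on their index, which reproduces the NaN pattern of the table above, with a single shared `Time` index instead of one `Time` column per block. This sketch rebuilds only the timestamps and the `Symbol` column of the three example frames (O/H/L/C would align the same way) and, for brevity, uses the time strings themselves as the index; real data would normally use a `DatetimeIndex`.

```python
import pandas as pd

# Timestamps present in each of the three example frames (15, 13 and 12 rows).
ALL_HOURS = [f"{h:02d}:00:00" for h in range(15)]
TIMES = {
    "XXX/XXX": ALL_HOURS,
    "XXX/YYY": [t for t in ALL_HOURS if t not in ("05:00:00", "11:00:00")],
    "XXX/ZZZ": [t for t in ALL_HOURS if t not in ("01:00:00", "09:00:00", "13:00:00")],
}

# Only the Symbol column is kept to keep the sketch short.
frames = {
    sym: pd.DataFrame({"Symbol": sym}, index=pd.Index(times, name="Time"))
    for sym, times in TIMES.items()
}

# Outer-align on the Time index; each frame's columns sit under its symbol.
aligned = pd.concat(frames, axis=1).sort_index()
print(aligned)
```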
My approach is to join the DataFrames along the time index:
> table = (DataTableEurUsd.reset_index("Time")
> .join(DataTableAudUsd.reset_index("Time"), lsuffix="_y", rsuffix="_x")
> .join(DataTableEurChf.reset_index("Time"), lsuffix="_y", rsuffix="_x"))
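One thing worth checking in this approach: after `reset_index("Time")`, `Time` becomes an ordinary column and each frame gets a default 0..n-1 `RangeIndex`, so `join` pairs rows by position rather than by timestamp. With frames of different lengths, the rows drift out of step after the first gap. The toy frames below are hypothetical; the sketch contrasts that behaviour with joining on the `DatetimeIndex` itself using `how="outer"`.

```python
import pandas as pd

# Two toy frames (illustrative values) with different missing hours.
a = pd.DataFrame(
    {"Close": [1.0, 2.0, 3.0]},
    index=pd.DatetimeIndex(["2010-01-04 00:00", "2010-01-04 01:00", "2010-01-04 03:00"], name="Time"),
)
b = pd.DataFrame(
    {"Close": [10.0, 20.0, 30.0]},
    index=pd.DatetimeIndex(["2010-01-04 00:00", "2010-01-04 02:00", "2010-01-04 03:00"], name="Time"),
)

# reset_index moves Time into a column, so join aligns on the RangeIndex:
# row 1 pairs a's 01:00 with b's 02:00.
positional = a.reset_index().join(b.reset_index(), lsuffix="_a", rsuffix="_b")

# Joining on the DatetimeIndex pairs rows by timestamp; how="outer" keeps
# the union of timestamps and leaves NaN where a frame has no row.
by_time = a.join(b, how="outer", lsuffix="_a", rsuffix="_b")

print(positional)
print(by_time)
```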
where:
DataTableEurUsd =
Open High Low Close RealVolume Spread TickVolume Symbol
Time
2010.01.04 00:00:00 1.43259 1.43336 1.43151 1.43153 0.0 12.0 969.0 EURUSD
2010.01.04 01:00:00 1.43151 1.43153 1.42879 1.42886 0.0 15.0 2098.0 EURUSD
2010.01.04 02:00:00 1.42885 1.42885 1.42569 1.42705 0.0 15.0 2082.0 EURUSD
2010.01.04 03:00:00 1.42702 1.42989 1.42700 1.42939 0.0 14.0 1544.0 EURUSD
2010.01.04 05:00:00 1.42938 1.42968 1.42718 1.42848 0.0 15.0 1131.0 EURUSD
DataTableAudUsd =
Open High Low Close RealVolume Spread TickVolume Symbol
Time
2010.01.04 00:00:00 0.89938 0.89953 0.89709 0.89711 0.0 30.0 1144.0 AUDUSD
2010.01.04 01:00:00 0.89712 0.89795 0.89612 0.89632 0.0 35.0 1735.0 AUDUSD
2010.01.04 02:00:00 0.89634 0.89645 0.89372 0.89500 0.0 30.0 1771.0 AUDUSD
2010.01.04 04:00:00 0.89502 0.89653 0.89502 0.89613 0.0 35.0 1242.0 AUDUSD
2010.01.04 05:00:00 0.89611 0.89648 0.89479 0.89633 0.0 30.0 663.0 AUDUSD
DataTableEurChf =
Open High Low Close RealVolume Spread TickVolume Symbol
Time
2010.01.04 00:00:00 1.48238 1.48354 1.48227 1.48334 0.0 36.0 1232.0 EURCHF
2010.01.04 02:00:00 1.48327 1.48470 1.48087 1.48250 0.0 34.0 2186.0 EURCHF
2010.01.04 03:00:00 1.48251 1.48311 1.48150 1.48294 0.0 34.0 1939.0 EURCHF
2010.01.04 04:00:00 1.48292 1.48317 1.48114 1.48239 0.0 34.0 1510.0 EURCHF
2010.01.04 05:00:00 1.48235 1.48245 1.48150 1.48181 0.0 34.0 1230.0 EURCHF
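The `Time` values above are written as `2010.01.04 00:00:00`. If they are stored as strings, they will not match the timestamps of a `pd.date_range`, so it may help to convert the index first. A minimal sketch, assuming the dotted date is year.month.day; the helper name `parse_time_index` is hypothetical, and the Close values are copied from the EURUSD table above.

```python
import pandas as pd


def parse_time_index(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df whose index is a DatetimeIndex.
    Assumes index strings look like '2010.01.04 00:00:00' (year.month.day)."""
    out = df.copy()
    out.index = pd.to_datetime(out.index, format="%Y.%m.%d %H:%M:%S")
    out.index.name = "Time"
    return out


df = pd.DataFrame(
    {"Close": [1.43153, 1.42886, 1.42848]},
    index=pd.Index(
        ["2010.01.04 00:00:00", "2010.01.04 01:00:00", "2010.01.04 05:00:00"],
        name="Time",
    ),
)
parsed = parse_time_index(df)
print(parsed)
```

An explicit `format` avoids having pandas guess the order of the dotted date parts.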
Then I forward-fill the NaNs:
table = table.ffill()
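I want to make sure that all of the original data stays in the right place and that the time-series index is filled in with the missing hours, as shown in the Excel screenshot I posted.
If anything is unclear, I'm happy to post more information to help explain.
Best regards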
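One way to check both requirements (original values stay put, missing hours get filled) is sketched below. It assumes the three tables above, reduced to `Close` and `Symbol` and given a `DatetimeIndex`; it outer-aligns them with `pd.concat`, reindexes onto a full hourly range, forward-fills with `ffill()` (the non-deprecated equivalent of `fillna(method='ffill')`), and then confirms that every original row is unchanged at its original timestamp. The helper name `frame` is hypothetical.

```python
import pandas as pd


def frame(hours, closes, symbol):
    """Hypothetical helper: small frame from the question's data (Close and Symbol only)."""
    index = pd.DatetimeIndex(pd.to_datetime(["2010-01-04 " + h for h in hours]), name="Time")
    return pd.DataFrame({"Close": closes, "Symbol": symbol}, index=index)


frames = {
    "EURUSD": frame(["00:00", "01:00", "02:00", "03:00", "05:00"],
                    [1.43153, 1.42886, 1.42705, 1.42939, 1.42848], "EURUSD"),
    "AUDUSD": frame(["00:00", "01:00", "02:00", "04:00", "05:00"],
                    [0.89711, 0.89632, 0.89500, 0.89613, 0.89633], "AUDUSD"),
    "EURCHF": frame(["00:00", "02:00", "03:00", "04:00", "05:00"],
                    [1.48334, 1.48250, 1.48294, 1.48239, 1.48181], "EURCHF"),
}

# Outer-align on the timestamp index; columns become (symbol, field) pairs.
wide = pd.concat(frames, axis=1)

# Insert any hour missing from every frame, then carry values forward.
full = pd.date_range(wide.index.min(), wide.index.max(), freq="h", name="Time")
padded = wide.reindex(full).ffill()

# Every original row should be unchanged at its original timestamp.
all_preserved = all(padded[sym].loc[df.index].equals(df) for sym, df in frames.items())
print(padded)
print("originals preserved:", all_preserved)
```

`ffill` only fills forward, so a frame whose first timestamp came later than the others would still begin with NaN rows.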