pandas: How to merge two DataFrames on two different, overlapping time series

Date: 2020-01-01 14:13:01

Tags: pandas dataframe merge time-series

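I created two DataFrames that both cover the whole of 2012, with a datetime column (Time) as the overlapping column.
The rows/samples in df1 are at millisecond resolution, while df2 has one row every 15 minutes.
They clearly overlap, but how do I merge them so that the df2 rows are inserted into df1 at the right positions based on Time?
I tried an outer merge, which should be the right choice, but I also tried 'inner', 'left' and even 'right'.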

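The merge adds the columns from df2, but it does not seem to add the rows from df2.
I am only including a sample of the data, because df1 has more than 100 million rows.
Here are CSV files with 10,000 rows each:
df1
df2
Any help would be much appreciated :-)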

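import pandas as pd
import datetime

def mergeDF(lowTF, highTF):
    # Outer-join both frames on the shared 'Time' column
    tf_merge = pd.merge(lowTF, highTF, on='Time', how='outer')
    # Forward-fill the NaNs left by rows that have no match in the other frame
    # (.ffill() replaces the deprecated fillna(method='ffill'))
    fill_merge = tf_merge.ffill()
    return fill_merge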

df1:

Time,Year,Month,Day,Hour
2012-01-09 00:00:00.653,2012,1,9,0
2012-01-09 00:00:01.388,2012,1,9,0
2012-01-09 00:00:01.739,2012,1,9,0
2012-01-09 00:00:02.265,2012,1,9,0
2012-01-09 00:00:03.349,2012,1,9,0
2012-01-09 00:00:03.489,2012,1,9,0
2012-01-09 00:00:04.311,2012,1,9,0
2012-01-09 00:00:04.719,2012,1,9,0
2012-01-09 00:00:05.384,2012,1,9,0
2012-01-09 00:00:05.800,2012,1,9,0

df2:

Time,DayOfWeak,ext_Volume,15_Absorption Volume,15_Bag Holding
2012-01-09 00:00:00,1,679,0,0
2012-01-09 00:15:00,1,988,0,0
2012-01-09 00:30:00,1,718,0,0
2012-01-09 00:45:00,1,583,0,0
2012-01-09 01:00:00,1,885,0,0
2012-01-09 01:15:00,1,589,0,0
2012-01-09 01:30:00,1,611,0,0
2012-01-09 01:45:00,1,620,0,0
2012-01-09 02:00:00,1,657,0,0
2012-01-09 02:15:00,1,691,0,0


merged = mergeDF(df1,df2)
merged

Time,Year,Month,Day,Hour,DayOfWeak,ext_Volume,15_Absorption Volume,15_Bag Holding
2012-01-09 00:00:00.653,2012,1,9,0,,,,
2012-01-09 00:00:01.388,2012,1,9,0,,,,
2012-01-09 00:00:01.739,2012,1,9,0,,,,
2012-01-09 00:00:02.265,2012,1,9,0,,,,
2012-01-09 00:00:03.349,2012,1,9,0,,,,
2012-01-09 00:00:03.489,2012,1,9,0,,,,
2012-01-09 00:00:04.311,2012,1,9,0,,,,
2012-01-09 00:00:04.719,2012,1,9,0,,,,
2012-01-09 00:00:05.384,2012,1,9,0,,,,
2012-01-09 00:00:05.800,2012,1,9,0,,,,

1 Answer:

Answer 0 (score: 0)

I think the most intuitive approach is:

pd.merge_asof(DF1, DF2, on='Time')

To make the example more instructive, I changed the minute to 15 in the last two rows of DF1 and got:

                     Time  Year  Month  Day  Volume  DayOfWeek_x  ext_Volume  15_Absorption Volume
0 2012-01-09 00:00:00.653  2012      1    9       3            1         679                     0
1 2012-01-09 00:00:01.388  2012      1    9       2            1         679                     0
2 2012-01-09 00:00:01.739  2012      1    9       2            1         679                     0
3 2012-01-09 00:15:02.265  2012      1    9       2            1         988                     0
4 2012-01-09 00:15:03.349  2012      1    9       2            1         988                     0

As you can see, the rows with index 0, 1 and 2 were matched with Time == 00:00:00, while the last 2 were matched with Time == 00:15:00, which is easy to verify in the ext_Volume column.
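For anyone who wants to reproduce this, here is a minimal self-contained sketch of the same idea. It rests on a few assumptions that are not in the question: the data is a trimmed inline copy of the samples (only ext_Volume is kept from df2), the last two df1 timestamps are moved to minute 15 as in the example above, and both Time columns are explicitly converted to the same datetime dtype and sorted, since merge_asof expects sorted keys of matching type. direction="backward" is the default and is only spelled out for clarity.

```python
import io
import pandas as pd

# Trimmed sample data (assumption: a small inline subset of the question's CSVs).
df1_csv = """Time,Year,Month,Day,Hour
2012-01-09 00:00:00.653,2012,1,9,0
2012-01-09 00:00:01.388,2012,1,9,0
2012-01-09 00:00:01.739,2012,1,9,0
2012-01-09 00:15:02.265,2012,1,9,0
2012-01-09 00:15:03.349,2012,1,9,0
"""
df2_csv = """Time,DayOfWeak,ext_Volume
2012-01-09 00:00:00,1,679
2012-01-09 00:15:00,1,988
2012-01-09 00:30:00,1,718
"""

df1 = pd.read_csv(io.StringIO(df1_csv))
df2 = pd.read_csv(io.StringIO(df2_csv))

# Give both key columns the same datetime dtype, then sort by the key.
for df in (df1, df2):
    df["Time"] = pd.to_datetime(df["Time"]).astype("datetime64[ns]")
df1 = df1.sort_values("Time")
df2 = df2.sort_values("Time")

# For each df1 row, take the most recent df2 row whose Time is <= the df1 Time.
merged = pd.merge_asof(df1, df2, on="Time", direction="backward")

print(merged[["Time", "ext_Volume"]])
```

Note that merge_asof behaves like a left join: the result has one row per df1 row, each carrying the values of the matching 15-minute df2 row. It does not insert the df2 rows themselves as extra rows.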