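I have created two data frames that both cover the whole of 2012, with the datetime as the overlapping column.
df1 has rows/samples at millisecond resolution, while df2 has one row every 15 minutes.
They obviously overlap, but how do I merge them so that the df2 rows are inserted into df1 at the position given by Time?
I tried an outer merge, which should be the right choice, but I also tried 'inner', 'left' and even 'right'.
It adds the columns from df2, but does not seem to add the rows from df2.
I have only included a sample of the data set, because df1 has over 100 million samples.
Here are CSVs with 10,000 rows each: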
df1
df2
Any help would be much appreciated :-)
import pandas as pd
import datetime

def mergeDF(lowTF, highTF):
    tf_merge = pd.merge(lowTF, highTF, on='Time', how='outer')
    fill_merge = tf_merge.fillna(method='ffill')
    return fill_merge
df1:
Time,Year,Month,Day,Hour
2012-01-09 00:00:00.653,2012,1,9,0
2012-01-09 00:00:01.388,2012,1,9,0
2012-01-09 00:00:01.739,2012,1,9,0
2012-01-09 00:00:02.265,2012,1,9,0
2012-01-09 00:00:03.349,2012,1,9,0
2012-01-09 00:00:03.489,2012,1,9,0
2012-01-09 00:00:04.311,2012,1,9,0
2012-01-09 00:00:04.719,2012,1,9,0
2012-01-09 00:00:05.384,2012,1,9,0
2012-01-09 00:00:05.800,2012,1,9,0
df2:
Time,DayOfWeak,ext_Volume,15_Absorption Volume,15_Bag Holding
2012-01-09 00:00:00,1,679,0,0
2012-01-09 00:15:00,1,988,0,0
2012-01-09 00:30:00,1,718,0,0
2012-01-09 00:45:00,1,583,0,0
2012-01-09 01:00:00,1,885,0,0
2012-01-09 01:15:00,1,589,0,0
2012-01-09 01:30:00,1,611,0,0
2012-01-09 01:45:00,1,620,0,0
2012-01-09 02:00:00,1,657,0,0
2012-01-09 02:15:00,1,691,0,0
-
merged = mergeDF(df1,df2)
merged
Time,Year,Month,Day,Hour,DayOfWeak,ext_Volume,15_Absorption Volume,15_Bag Holding
2012-01-09 00:00:00.653,2012,1,9,0,,,,
2012-01-09 00:00:01.388,2012,1,9,0,,,,
2012-01-09 00:00:01.739,2012,1,9,0,,,,
2012-01-09 00:00:02.265,2012,1,9,0,,,,
2012-01-09 00:00:03.349,2012,1,9,0,,,,
2012-01-09 00:00:03.489,2012,1,9,0,,,,
2012-01-09 00:00:04.311,2012,1,9,0,,,,
2012-01-09 00:00:04.719,2012,1,9,0,,,,
2012-01-09 00:00:05.384,2012,1,9,0,,,,
2012-01-09 00:00:05.800,2012,1,9,0,,,,
Answer 0 (score: 0)
I think the most intuitive approach is:
pd.merge_asof(DF1, DF2, on='Time')
To show a more instructive example, I changed the minute in the last two rows of DF1 to 15 and got:
                     Time  Year  Month  Day  Volume  DayOfWeek_x  ext_Volume  15_Absorption Volume
0 2012-01-09 00:00:00.653  2012      1    9       3            1         679                     0
1 2012-01-09 00:00:01.388  2012      1    9       2            1         679                     0
2 2012-01-09 00:00:01.739  2012      1    9       2            1         679                     0
3 2012-01-09 00:15:02.265  2012      1    9       2            1         988                     0
4 2012-01-09 00:15:03.349  2012      1    9       2            1         988                     0
As you can see, the rows with index 0, 1 and 2 were matched with Time == 00:00:00, whereas the last 2 were matched with Time == 00:15:00, which is easy to verify in the ext_Volume column.
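For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes the data is read from CSV text shaped like the samples in the question; the inline strings and the `load` helper are made up for illustration and are not the asker's real files or code. `pd.merge_asof` needs both frames sorted on the key and the key columns to have the same datetime dtype, so the sketch parses `Time` explicitly and sorts before merging.

```python
import io
import pandas as pd

# Illustrative data in the shape of the question's samples; the last two
# df1 timestamps are moved to minute 15, as in the example above.
df1_csv = """Time,Year,Month,Day,Hour
2012-01-09 00:00:00.653,2012,1,9,0
2012-01-09 00:00:01.388,2012,1,9,0
2012-01-09 00:00:01.739,2012,1,9,0
2012-01-09 00:15:02.265,2012,1,9,0
2012-01-09 00:15:03.349,2012,1,9,0
"""
df2_csv = """Time,DayOfWeak,ext_Volume,15_Absorption Volume,15_Bag Holding
2012-01-09 00:00:00,1,679,0,0
2012-01-09 00:15:00,1,988,0,0
2012-01-09 00:30:00,1,718,0,0
"""

def load(csv_text):
    """Hypothetical helper: read CSV text and return it sorted by a datetime Time column."""
    df = pd.read_csv(io.StringIO(csv_text))
    # Parse strings to real timestamps and force one common dtype on both frames.
    df["Time"] = pd.to_datetime(df["Time"]).astype("datetime64[ns]")
    # merge_asof requires the key to be sorted.
    return df.sort_values("Time").reset_index(drop=True)

df1 = load(df1_csv)
df2 = load(df2_csv)

# Default direction is 'backward': each df1 row gets the last df2 row
# whose Time is less than or equal to its own Time.
merged = pd.merge_asof(df1, df2, on="Time")
print(merged[["Time", "ext_Volume"]])
```

The result keeps one row per df1 row, with the df2 columns attached from the enclosing 15-minute bar. If a different matching rule is needed, `merge_asof` also accepts `direction` ('backward', 'forward' or 'nearest') and `tolerance` (for example `pd.Timedelta('15min')`).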