There are two dataframes whose datetime objects increase in increments of 5 minutes (df_05min) or 15 minutes (df_15min).
import pandas as pd

df_05min = pd.DataFrame({'dt': ['2008-10-24 04:12:30',
                                '2008-10-24 04:12:35',
                                '2008-10-24 04:12:40',
                                '2008-10-24 04:12:45',
                                '2008-10-24 04:12:50',
                                '2008-10-24 04:13:00',
                                '2008-10-24 04:13:05']})
df_15min = pd.DataFrame([['2008-10-24 04:12:15', 'L'],
                         ['2008-10-24 04:12:30', 'r'],
                         ['2008-10-24 04:12:45', 'S'],
                         ['2008-10-24 04:13:00', 'L'],
                         ['2008-10-24 04:13:15', 'L']], columns=['dt', 'col'])
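The goal is to merge the df_15min dataframe into the df_05min dataframe on the datetime column dt, and to copy the accompanying data into the appropriate rows. This is an alternative to an outer merge, in which unmatched values get NaN. For example, in df_15min, "2008-10-24 04:12:30" has the value r, which I want to copy to the corresponding rows of df_05min. This means the values for 12:30, 12:35, and 12:40 would all be r.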
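To make the NaN problem concrete, here is a minimal sketch of what a plain outer merge produces on the sample data. The variable name merged is only illustrative, and the dt values are left as strings as in the question.

```python
import pandas as pd

df_05min = pd.DataFrame({'dt': ['2008-10-24 04:12:30',
                                '2008-10-24 04:12:35',
                                '2008-10-24 04:12:40',
                                '2008-10-24 04:12:45',
                                '2008-10-24 04:12:50',
                                '2008-10-24 04:13:00',
                                '2008-10-24 04:13:05']})
df_15min = pd.DataFrame([['2008-10-24 04:12:15', 'L'],
                         ['2008-10-24 04:12:30', 'r'],
                         ['2008-10-24 04:12:45', 'S'],
                         ['2008-10-24 04:13:00', 'L'],
                         ['2008-10-24 04:13:15', 'L']], columns=['dt', 'col'])

# Plain outer merge: timestamps that exist only in df_05min get NaN in 'col'.
merged = df_05min.merge(df_15min, on='dt', how='outer')

# Show the rows that were left without a value.
print(merged[merged['col'].isna()])
```

The rows for 04:12:35, 04:12:40, 04:12:50, and 04:13:05 are the ones that still need a value carried over from the previous df_15min timestamp.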
The desired final product is as follows:
[expected output table not preserved in the source]
Answer 0 (score: 1)
Try combining merge with how='outer', ffill, and sort_values:
print(df_05min.merge(df_15min,how='outer').ffill().sort_values('dt'))
Output:
                    dt col
7  2008-10-24 04:12:15   L
0  2008-10-24 04:12:30   r
1  2008-10-24 04:12:35   r
2  2008-10-24 04:12:40   r
3  2008-10-24 04:12:45   S
4  2008-10-24 04:12:50   S
5  2008-10-24 04:13:00   L
6  2008-10-24 04:13:05   L
8  2008-10-24 04:13:15   L
If you care about the index, use:
print(df_05min.merge(df_15min,how='outer').ffill().sort_values('dt').reset_index(drop=True))
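One caveat worth keeping in mind: ffill copies values down the current row order, so filling before sorting only works when the merged rows already happen to be in a suitable order. The sketch below is a variation on the one-liner above, not the answerer's own code. It sorts first and then fills, and the reversed left frame is a hypothetical input chosen to show that the result does not depend on the original row order.

```python
import pandas as pd

df_05min = pd.DataFrame({'dt': ['2008-10-24 04:12:30',
                                '2008-10-24 04:12:35',
                                '2008-10-24 04:12:40',
                                '2008-10-24 04:12:45',
                                '2008-10-24 04:12:50',
                                '2008-10-24 04:13:00',
                                '2008-10-24 04:13:05']})
df_15min = pd.DataFrame([['2008-10-24 04:12:15', 'L'],
                         ['2008-10-24 04:12:30', 'r'],
                         ['2008-10-24 04:12:45', 'S'],
                         ['2008-10-24 04:13:00', 'L'],
                         ['2008-10-24 04:13:15', 'L']], columns=['dt', 'col'])

# Hypothetical input: the left frame in reverse chronological order.
reversed_05min = df_05min.iloc[::-1]

result = (reversed_05min
          .merge(df_15min, on='dt', how='outer')
          .sort_values('dt')        # order chronologically first
          .ffill()                  # then carry the last seen value forward
          .reset_index(drop=True))
print(result)
```

The dt values are still strings here; sorting works because the ISO-style format sorts lexicographically in chronological order.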
Answer 1 (score: 1)
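What is needed here is merge_asof with an outer join, but that is not implemented yet. A possible solution is DataFrame.merge, followed by sorting with DataFrame.sort_values, forward filling the missing values, and finally creating a default index with reset_index(drop=True), as in the code below: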
import pandas as pd

df_05min = pd.DataFrame({'dt': ['2008-10-24 04:12:30',
                                '2008-10-24 04:12:35',
                                '2008-10-24 04:12:40',
                                '2008-10-24 04:12:45',
                                '2008-10-24 04:12:50',
                                '2008-10-24 04:13:00',
                                '2008-10-24 04:13:05']})
df_15min = pd.DataFrame([['2008-10-24 04:12:15', 'L'],
                         ['2008-10-24 04:12:30', 'r'],
                         ['2008-10-24 04:12:45', 'S'],
                         ['2008-10-24 04:13:00', 'L'],
                         ['2008-10-24 04:13:15', 'L']], columns=['dt', 'col'])

# Convert the string timestamps to real datetimes before merging.
df_05min['dt'] = pd.to_datetime(df_05min['dt'])
df_15min['dt'] = pd.to_datetime(df_15min['dt'])

# Outer merge, sort chronologically, forward fill, then reset the index.
df = (pd.merge(df_05min, df_15min, how='outer')
      .sort_values('dt')
      .ffill()
      .reset_index(drop=True))
print(df)
                   dt col
0 2008-10-24 04:12:15   L
1 2008-10-24 04:12:30   r
2 2008-10-24 04:12:35   r
3 2008-10-24 04:12:40   r
4 2008-10-24 04:12:45   S
5 2008-10-24 04:12:50   S
6 2008-10-24 04:13:00   L
7 2008-10-24 04:13:05   L
8 2008-10-24 04:13:15   L
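If the extra 04:12:15 and 04:13:15 rows from df_15min are not wanted, pd.merge_asof with its default direction='backward' may be enough on its own. The sketch below assumes that only the rows of df_05min should be kept, which the question does not state explicitly, and that both frames are already sorted by dt, which merge_asof requires. The name asof_result is only illustrative.

```python
import pandas as pd

df_05min = pd.DataFrame({'dt': ['2008-10-24 04:12:30',
                                '2008-10-24 04:12:35',
                                '2008-10-24 04:12:40',
                                '2008-10-24 04:12:45',
                                '2008-10-24 04:12:50',
                                '2008-10-24 04:13:00',
                                '2008-10-24 04:13:05']})
df_15min = pd.DataFrame([['2008-10-24 04:12:15', 'L'],
                         ['2008-10-24 04:12:30', 'r'],
                         ['2008-10-24 04:12:45', 'S'],
                         ['2008-10-24 04:13:00', 'L'],
                         ['2008-10-24 04:13:15', 'L']], columns=['dt', 'col'])

# merge_asof needs an ordered key (datetime or numeric), not plain strings.
df_05min['dt'] = pd.to_datetime(df_05min['dt'])
df_15min['dt'] = pd.to_datetime(df_15min['dt'])

# For each left row, take the last right row whose dt is <= the left dt.
asof_result = pd.merge_asof(df_05min, df_15min, on='dt', direction='backward')
print(asof_result)
```

This behaves like a left join, so it keeps exactly the seven rows of df_05min and never produces rows that exist only in df_15min.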