I have two DataFrames.
DF1 = pd.DataFrame({'date': {10: pd.Timestamp('2019-01-01 10:00:00'), 52: pd.Timestamp('2019-01-03 04:00:00'), 54: pd.Timestamp('2019-01-03 06:00:00'), 72: pd.Timestamp('2019-01-04 00:00:00'), 74: pd.Timestamp('2019-01-04 02:00:00')}, 'value_1': {10: 4380.0, 52: 4440.0, 54: 4630.0, 72: 4540.0, 74: 4460.0}, 'value_2': {10: 5, 52: 5, 54: 1, 72: 5, 74: 1}})
DF1
date value_1 value_2
10 2019-01-01 10:00:00 4380.0 5
52 2019-01-03 04:00:00 4440.0 5
54 2019-01-03 06:00:00 4630.0 1
72 2019-01-04 00:00:00 4540.0 5
74 2019-01-04 02:00:00 4460.0 1
DF2 has the same date column as DF1, starting at 2019-01-01 00:00:00 and ending at 2019-12-31 00:00:00, plus other columns that the two frames do not share.
When a date in DF1 matches a date in DF2, I put the value_1 value from DF1 into DF2, like this:
DF2['value_1'] = DF2['date'].map(DF1.set_index('date')['value_1'])
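For context, this exact-date lookup fills value_1 only where the timestamps match exactly; a minimal sketch on a hypothetical small DF2 (the real DF2 spans the whole year):

```python
import pandas as pd

# first two rows of DF1 from the question
DF1 = pd.DataFrame({
    'date': [pd.Timestamp('2019-01-01 10:00:00'),
             pd.Timestamp('2019-01-03 04:00:00')],
    'value_1': [4380.0, 4440.0],
    'value_2': [5, 5],
})

# hypothetical small DF2 for illustration
DF2 = pd.DataFrame({'date': pd.date_range('2019-01-01 08:00:00',
                                          '2019-01-01 12:00:00', freq='2h')})

# exact match only: rows of DF2 whose date appears in DF1 get value_1, others NaN
DF2['value_1'] = DF2['date'].map(DF1.set_index('date')['value_1'])
print(DF2)
```

Only the 10:00:00 row receives 4380.0; the 08:00:00 and 12:00:00 rows stay NaN because `Series.map` performs no tolerance-based matching.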
Now I am trying to also fill DF2 with the same value for the 30 minutes before each matching date. In other words, if the matching date and time is 2019-01-01 10:00:00 and value_1 is 4380.0, then for the DF2 dates from 2019-01-01 09:30:00 to 2019-01-01 10:00:00 the value_1 column should also be 4380.0.
How can I do this?
Answer 0 (score: 1)
I think you need merge_asof, first with the default direction='backward' and then with direction='forward', and then combine both results with DataFrame.combine_first:
import pandas as pd

DF1 = pd.DataFrame({'date': {10: pd.Timestamp('2019-01-01 10:00:00'), 52: pd.Timestamp('2019-01-03 04:00:00'), 54: pd.Timestamp('2019-01-03 06:00:00'), 72: pd.Timestamp('2019-01-04 00:00:00'), 74: pd.Timestamp('2019-01-04 02:00:00')}, 'value_1': {10: 4380.0, 52: 4440.0, 54: 4630.0, 72: 4540.0, 74: 4460.0}, 'value_2': {10: 5, 52: 5, 54: 1, 72: 5, 74: 1}})
# small DF2 for testing
DF2 = pd.DataFrame({'date': pd.date_range('2019-01-01 08:00:00',
                                          '2019-01-01 12:00:00', freq='20Min')})
print (DF2)
date
0 2019-01-01 08:00:00
1 2019-01-01 08:20:00
2 2019-01-01 08:40:00
3 2019-01-01 09:00:00
4 2019-01-01 09:20:00
5 2019-01-01 09:40:00
6 2019-01-01 10:00:00
7 2019-01-01 10:20:00
8 2019-01-01 10:40:00
9 2019-01-01 11:00:00
10 2019-01-01 11:20:00
11 2019-01-01 11:40:00
12 2019-01-01 12:00:00
# backward (default): match the last DF1 date at or before each DF2 date, within 30 minutes
df1 = pd.merge_asof(DF2, DF1, on='date', tolerance=pd.Timedelta('30Min'))
# forward: match the first DF1 date at or after each DF2 date, within 30 minutes
df2 = pd.merge_asof(DF2, DF1, on='date', tolerance=pd.Timedelta('30Min'), direction='forward')
# prefer the backward match, fall back to the forward match
df = df1.combine_first(df2)
print (df)
date value_1 value_2
0 2019-01-01 08:00:00 NaN NaN
1 2019-01-01 08:20:00 NaN NaN
2 2019-01-01 08:40:00 NaN NaN
3 2019-01-01 09:00:00 NaN NaN
4 2019-01-01 09:20:00 NaN NaN
5 2019-01-01 09:40:00 4380.0 5.0
6 2019-01-01 10:00:00 4380.0 5.0
7 2019-01-01 10:20:00 4380.0 5.0
8 2019-01-01 10:40:00 NaN NaN
9 2019-01-01 11:00:00 NaN NaN
10 2019-01-01 11:20:00 NaN NaN
11 2019-01-01 11:40:00 NaN NaN
12 2019-01-01 12:00:00 NaN NaN
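Note that if only the 30 minutes before each matching timestamp should be filled, as the question literally asks, the forward merge alone is enough; a sketch on the same kind of small test frames:

```python
import pandas as pd

DF1 = pd.DataFrame({'date': [pd.Timestamp('2019-01-01 10:00:00')],
                    'value_1': [4380.0], 'value_2': [5]})
DF2 = pd.DataFrame({'date': pd.date_range('2019-01-01 09:00:00',
                                          '2019-01-01 11:00:00', freq='20min')})

# forward: each DF2 row takes the first DF1 date at or after it,
# but only when that date is within the 30-minute tolerance,
# so exactly the rows from 09:30:00 to 10:00:00 receive 4380.0
out = pd.merge_asof(DF2, DF1, on='date',
                    tolerance=pd.Timedelta('30min'), direction='forward')
print(out)
```

Here the 09:40:00 and 10:00:00 rows get 4380.0, while 09:20:00 (40 minutes before the match) and everything after 10:00:00 stay NaN.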