df
Id timestamp data Date
30424 30665 2020-01-04 19:40:23.827 17.5 2020-01-04
31054 31295 2020-01-05 22:26:39.860 17.0 2020-01-05
32150 32391 2020-01-06 23:00:14.607 18.0 2020-01-06
33236 33477 2020-01-07 22:52:56.757 18.0 2020-01-07
34314 34555 2020-01-08 20:45:48.927 18.0 2020-01-08
35592 35833 2020-01-09 20:56:21.320 18.0 2020-01-09
36528 36769 2020-01-10 20:41:36.323 19.5 2020-01-10
37054 37295 2020-01-11 19:35:50.553 18.5 2020-01-11
37652 37893 2020-01-12 19:28:22.823 17.0 2020-01-12
38828 39069 2020-01-13 23:48:12.533 21.5 2020-01-13
40004 40245 2020-01-14 22:50:56.873 18.5 2020-01-14
df1
Date data
0 2020-01-04 NaN
1 2020-01-07 NaN
2 2020-01-08 19.0
3 2020-01-09 NaN
4 2020-01-11 NaN
5 2020-01-12 NaN
6 2020-01-16 NaN
7 2020-01-17 NaN
8 2020-01-24 18.5
I want to replace the values in df['data'] with the values from df1['data'] for the same Date, but only where the df1 value is not NaN.
Expected result:
Id timestamp data Date
30424 30665 2020-01-04 19:40:23.827 17.5 2020-01-04
31054 31295 2020-01-05 22:26:39.860 17.0 2020-01-05
32150 32391 2020-01-06 23:00:14.607 18.0 2020-01-06
33236 33477 2020-01-07 22:52:56.757 18.0 2020-01-07
34314 34555 2020-01-08 20:45:48.927 19.0 2020-01-08 # This row changed
35592 35833 2020-01-09 20:56:21.320 18.0 2020-01-09
36528 36769 2020-01-10 20:41:36.323 19.5 2020-01-10
37054 37295 2020-01-11 19:35:50.553 18.5 2020-01-11
37652 37893 2020-01-12 19:28:22.823 17.0 2020-01-12
38828 39069 2020-01-13 23:48:12.533 21.5 2020-01-13
40004 40245 2020-01-14 22:50:56.873 18.5 2020-01-14
This answer is similar to my problem, but the situation is not exactly the same.
I tried:
pd.merge(df, df1, how='left', on='Date')
which returned:
Id timestamp data_x Date data_y
0 30665 2020-01-04 19:40:23.827 17.5 2020-01-04 NaN
1 31295 2020-01-05 22:26:39.860 17.0 2020-01-05 NaN
2 32391 2020-01-06 23:00:14.607 18.0 2020-01-06 NaN
3 33477 2020-01-07 22:52:56.757 18.0 2020-01-07 NaN
4 34555 2020-01-08 20:45:48.927 18.0 2020-01-08 19.0
5 35833 2020-01-09 20:56:21.320 18.0 2020-01-09 NaN
6 36769 2020-01-10 20:41:36.323 19.5 2020-01-10 NaN
7 37295 2020-01-11 19:35:50.553 18.5 2020-01-11 NaN
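A left merge can also get there if it is followed by a fill step. The sketch below is a minimal illustration that is not part of the original post. It assumes Date is a datetime64 column in both frames and uses a hand-built five-row excerpt of the data. Passing suffixes=("", "_new") keeps the original column named data, and reset_index/set_index restores df's index labels, which merge otherwise replaces with 0, 1, 2 and so on, as the output above shows.

```python
import numpy as np
import pandas as pd

# Hypothetical excerpt of the question's frames (Id and timestamp omitted)
df = pd.DataFrame(
    {
        "data": [17.5, 17.0, 18.0, 18.0, 18.0],
        "Date": pd.to_datetime(
            ["2020-01-04", "2020-01-05", "2020-01-06", "2020-01-07", "2020-01-08"]
        ),
    }
)
df.index = [30424, 31054, 32150, 33236, 34314]
df1 = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2020-01-04", "2020-01-07", "2020-01-08", "2020-01-09"]),
        "data": [np.nan, np.nan, 19.0, np.nan],
    }
)

merged = (
    df.reset_index()  # keep df's index labels as an "index" column
    .merge(df1, how="left", on="Date", suffixes=("", "_new"))
    .set_index("index")  # put the original labels back
    .rename_axis(None)
)
# Take df1's value where it is not NaN, otherwise keep the original value
merged["data"] = merged["data_new"].fillna(merged["data"])
merged = merged.drop(columns="data_new")
print(merged)
```

One caveat with this route: if df1 contained the same Date more than once, the left merge would produce one row per match and so duplicate rows of df.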
Update:
I also tried:
df['data'] = df['Date'].map(df1.set_index('Date')['data']).fillna(df['Date'])
but the data column comes out wrong:
Id timestamp data Date
30424 30665 2020-01-04 19:40:23.827 1.578096e+18 2020-01-04
31054 31295 2020-01-05 22:26:39.860 1.578182e+18 2020-01-05
32150 32391 2020-01-06 23:00:14.607 1.578269e+18 2020-01-06
33236 33477 2020-01-07 22:52:56.757 1.578355e+18 2020-01-07
34314 34555 2020-01-08 20:45:48.927 1.900000e+01 2020-01-08
35592 35833 2020-01-09 20:56:21.320 1.578528e+18 2020-01-09
36528 36769 2020-01-10 20:41:36.323 1.578614e+18 2020-01-10
Answer:
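First use Series.map on the Date column. Dates with no match in df1 (or whose df1 value is NaN) come out as missing values, so replace those with the original data using Series.fillna: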
df['data'] = df['Date'].map(df1.set_index('Date')['data']).fillna(df['data'])
print (df)
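For anyone who wants to run the answer end to end, here is a self-contained sketch under a few assumptions: the frames are rebuilt by hand from the printouts, Id and timestamp are left out because the lookup does not use them, and Date is taken to be a datetime64 column in both frames. The 1.578096e+18 values in the question's update are 2020-01-04 and the following dates expressed as nanoseconds since the epoch, which points that way. The only change from the attempt in the update is fillna(df['data']) instead of fillna(df['Date']).

```python
import numpy as np
import pandas as pd

# Hand-built reconstruction of df (Id and timestamp omitted)
df = pd.DataFrame(
    {
        "data": [17.5, 17.0, 18.0, 18.0, 18.0, 18.0, 19.5, 18.5, 17.0, 21.5, 18.5],
        "Date": pd.to_datetime([f"2020-01-{d:02d}" for d in range(4, 15)]),
    }
)
df.index = [30424, 31054, 32150, 33236, 34314, 35592, 36528, 37054, 37652, 38828, 40004]

# Hand-built reconstruction of df1
df1 = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2020-01-04", "2020-01-07", "2020-01-08", "2020-01-09", "2020-01-11",
             "2020-01-12", "2020-01-16", "2020-01-17", "2020-01-24"]
        ),
        "data": [np.nan, np.nan, 19.0, np.nan, np.nan, np.nan, np.nan, np.nan, 18.5],
    }
)

# Series indexed by Date, so map() can look up each value of df['Date'] in it
lookup = df1.set_index("Date")["data"]
# Unmatched dates and NaN entries in df1 both map to NaN; fillna then restores
# the original value, aligned row by row on df's index
df["data"] = df["Date"].map(lookup).fillna(df["data"])
print(df)
```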
Id timestamp data Date
30424 30665 2020-01-04 19:40:23.827 17.5 2020-01-04
31054 31295 2020-01-05 22:26:39.860 17.0 2020-01-05
32150 32391 2020-01-06 23:00:14.607 18.0 2020-01-06
33236 33477 2020-01-07 22:52:56.757 18.0 2020-01-07
34314 34555 2020-01-08 20:45:48.927 19.0 2020-01-08
35592 35833 2020-01-09 20:56:21.320 18.0 2020-01-09
36528 36769 2020-01-10 20:41:36.323 19.5 2020-01-10
37054 37295 2020-01-11 19:35:50.553 18.5 2020-01-11
37652 37893 2020-01-12 19:28:22.823 17.0 2020-01-12
38828 39069 2020-01-13 23:48:12.533 21.5 2020-01-13
40004 40245 2020-01-14 22:50:56.873 18.5 2020-01-14
Details:
print (df['Date'].map(df1.set_index('Date')['data']))
30424 NaN
31054 NaN
32150 NaN
33236 NaN
34314 19.0
35592 NaN
36528 NaN
37054 NaN
37652 NaN
38828 NaN
40004 NaN
Name: Date, dtype: float64
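The intermediate Series above is NaN in two different situations, and fillna treats them the same way: a date that is in df1 with a NaN value (2020-01-04) and a date that is not in df1 at all (2020-01-05). The sketch below isolates those two cases plus one match on a hypothetical three-row excerpt, assuming datetime64 Date columns as before. Because map keeps df's index, the fillna step lines up with df['data'] row by row.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row excerpt of df, keeping the question's index labels
df = pd.DataFrame(
    {"data": [17.5, 17.0, 18.0],
     "Date": pd.to_datetime(["2020-01-04", "2020-01-05", "2020-01-08"])}
)
df.index = [30424, 31054, 34314]

# Excerpt of df1: 2020-01-04 is present but NaN, 2020-01-08 has a value,
# and 2020-01-05 does not appear at all
df1 = pd.DataFrame(
    {"Date": pd.to_datetime(["2020-01-04", "2020-01-08"]), "data": [np.nan, 19.0]}
)

mapped = df["Date"].map(df1.set_index("Date")["data"])
print(mapped)  # values NaN, NaN, 19.0 on index 30424, 31054, 34314

result = mapped.fillna(df["data"])
print(result)  # values 17.5, 17.0, 19.0
```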