我有2个数据框。 df1
保存了我的数据,df2
进行了一些更正。我想用speedup
中的值替换df1
中的df2
值,其中df2
中的其他列指定df1
中的值来替换该值
df1 = pd.DataFrame({
'subject': ['English', 'Maths', 'Physics', 'English', 'Arts', 'Physics', 'English', 'PE'],
'grade': ['D', 'A', 'A', 'C', 'F', 'B', 'C', 'A'],
'date': pd.bdate_range(end='2019-12-12', periods=8)
})
df1['speedup'] = 1.0
df2 = pd.DataFrame({
'subject': ['Maths', 'Physics'],
'date': ['2019-12-04', '2019-12-10'],
'speedup': [1.1, 0.7]
})
上面的代码将生成如下所示的DataFrame:
Out[1]:
subject grade date speedup
0 English D 2019-12-03 1.0
1 Maths A 2019-12-04 1.0
2 Physics A 2019-12-05 1.0
3 English C 2019-12-06 1.0
4 Arts F 2019-12-09 1.0
5 Physics B 2019-12-10 1.0
6 English C 2019-12-11 1.0
7 PE A 2019-12-12 1.0
df2
Out[2]:
subject date speedup
0 Maths 2019-12-04 1.1
1 Physics 2019-12-10 0.7
为避免混淆,我想使df1
与df2
合并后看起来像这样:
df1 = pd.DataFrame({
'subject': ['English', 'Maths', 'Physics', 'English', 'Arts', 'Physics', 'English', 'PE'],
'grade': ['D', 'A', 'A', 'C', 'F', 'B', 'C', 'A'],
'date': pd.bdate_range(end='2019-12-12', periods=8),
'speedup': [1, 1.1, 1, 1, 1, 0.7, 1, 1]
})
我尝试了这个不起作用
df1[(df1['date'].isin(df2['date'])) & (df1['subject'].isin(df2['subject']))]['speedup'] = df2['speedup']
由于合并键中的数据时间组件,因此合并无效。
df1.merge(df2, left_on=['subject', 'date'], right_on=['subject', 'date'], suffixes=('', '_y'))
答案 0 :(得分:1)
将字符串日期转换为datetime
对象,然后执行merge
df2['date'] = pd.to_datetime(df2['date'], format='%Y-%m-%d')
df1.merge(df2, how='left', on=['subject', 'date']).fillna(method='ffill', axis=1)
答案 1 :(得分:0)
我认为不是合并两个数据帧,而是一种更有效的方法是将第二个数据帧df2
用作具有两个索引的字典。
代码如下:
##Setting the index of df2 to(date,subject)
df2.set_index(['date','subject'],inplace = True)
##This step is to make sure that the indexed date of df2 matches the date of df1
df1['date'] =df1['date'].apply(lambda x: x.strftime('%Y-%m-%d'))
##Iterating over the rows in the df1 and finding the value for speedup if it exists in the second dataframe
for i,val in df1.iterrows():
#Check for a value for the tuple (date,subject) in the map
rep_speedup = df2.loc[df1.loc[i,['date','subject']],'speedup'].values
#If the map contained a speedup, then replace the existing speedup
if len(rep_speedup) >0:
df1.loc[i,'speedup'] =rep_speedup
这将打印出df1
,如下所示:
subject grade date speedup
0 English D 2019-12-03 1.0
1 Maths A 2019-12-04 1.1
2 Physics A 2019-12-05 1.0
3 English C 2019-12-06 1.0
4 Arts F 2019-12-09 1.0
5 Physics B 2019-12-10 0.7
6 English C 2019-12-11 1.0
7 PE A 2019-12-12 1.0