I have two DataFrames, and I need to fill in the value column of df2 based on df1.
DF1
col1 col2 col3 value
Chicago M 26 54
NY M 20 21
...
DF2
col1 col2 col3 value
NY M 20 ? (should be 21 based on above dataframe)
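I am using the loop below, which is very slow:
for index, row in df2.iterrows():
    # look up the df1 row whose three key columns match the current df2 row
    df1[(df1['col1'] == row['col1'])
        & (df1['col2'] == row['col2'])
        & (df1['col3'] == row['col3'])]['value'].values[0]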
How can I do this more efficiently/faster?
Answer 0 (score: 0)
You need merge.
A left join matches the rows on the key columns:
print (df2)
col1 col2 col3 value
0 LA M 20 20
1 NY M 20 ?
df = pd.merge(df2, df1, on=['col1','col2','col3'], how='left', suffixes=('','_'))
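This creates a new column value_ holding the matched values from df1 (NaN where df1 has no matching row). Next, use fillna to fall back to the original value wherever value_ is NaN, and finally drop the helper column value_: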
print (df)
col1 col2 col3 value value_
0 LA M 20 20 NaN
1 NY M 20 ? 21.0
df['value'] = df['value_'].fillna(df['value'])
df = df.drop('value_', axis=1)
print (df)
col1 col2 col3 value
0 LA M 20 20
1 NY M 20 21
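The steps above can be put together into one runnable sketch. The sample frames are rebuilt from the rows shown in this thread, and the unknown "?" is represented as NaN, which is an assumption about how the missing value is stored. Because of that NaN, the value column comes out as float rather than int.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the rows shown above
df1 = pd.DataFrame({
    "col1": ["Chicago", "NY"],
    "col2": ["M", "M"],
    "col3": [26, 20],
    "value": [54, 21],
})
df2 = pd.DataFrame({
    "col1": ["LA", "NY"],
    "col2": ["M", "M"],
    "col3": [20, 20],
    "value": [20, np.nan],  # assumption: the "?" placeholder is stored as NaN
})

keys = ["col1", "col2", "col3"]

# Left join keeps every row of df2; df1's value arrives as the helper column "value_"
merged = pd.merge(df2, df1, on=keys, how="left", suffixes=("", "_"))

# Prefer the value looked up in df1, fall back to df2's own value where no match exists
merged["value"] = merged["value_"].fillna(merged["value"])

# Remove the helper column
result = merged.drop(columns="value_")
print(result)
```

Note that if df1 contains several rows with the same key combination, the left join returns one output row per match, so df1 may need de-duplicating on the key columns first.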