How to efficiently do a lookup in a DataFrame based on multiple columns

Time: 2017-08-09 06:50:36

Tags: python pandas dataframe

I have two DataFrames, and I need to fill in the value column of df2 based on df1.

DF1

col1     col2  col3  value
Chicago  M     26    54
NY       M     20    21
...

DF2

col1     col2  col3  value
NY       M     20    ? (should be 21 based on the DataFrame above)

I am currently doing the loop below, which is very slow:

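for index, row in df2.iterrows():
    # Find the df1 row whose three key columns match this df2 row
    # (compare against row[...], not df1[...] with itself)
    df2.loc[index, 'value'] = df1[(df1['col1'] == row['col1'])
                                  & (df1['col2'] == row['col2'])
                                  & (df1['col3'] == row['col3'])]['value'].values[0]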

How can I do this more efficiently/faster?

1 Answer:

Answer 0: (score: 0)

You need merge with a left join, matching on the key columns:

print (df2)
  col1 col2  col3 value
0   LA    M    20    20
1   NY    M    20     ?

df = pd.merge(df2, df1, on=['col1','col2','col3'], how='left', suffixes=('','_'))

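This creates a new column value_ holding the matched values from df1. Then use fillna to fall back to the original value wherever value_ is NaN, and finally drop the helper column value_: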

print (df)
  col1 col2  col3 value  value_
0   LA    M    20    20     NaN
1   NY    M    20     ?    21.0

df['value'] = df['value_'].fillna(df['value'])
df = df.drop('value_', axis=1)
print (df)
  col1 col2  col3 value
0   LA    M    20    20
1   NY    M    20    21
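
For reference, the steps above can be put together into one self-contained sketch. The sample data is hypothetical, modelled on the tables in the question, and the unknown '?' is represented as NaN so the column stays numeric. The validate='many_to_one' argument is an addition not in the original answer: it assumes each key combination appears at most once in df1, and it makes merge raise an error otherwise, since duplicate keys in df1 would silently duplicate rows of df2 in the result.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data modelled on the tables in the question.
df1 = pd.DataFrame({
    'col1': ['Chicago', 'NY'],
    'col2': ['M', 'M'],
    'col3': [26, 20],
    'value': [54, 21],
})
df2 = pd.DataFrame({
    'col1': ['LA', 'NY'],
    'col2': ['M', 'M'],
    'col3': [20, 20],
    'value': [20, np.nan],  # NaN stands in for the unknown '?'
})

keys = ['col1', 'col2', 'col3']

# Left join keeps every row of df2. validate raises MergeError if df1
# contains duplicate key combinations (an assumption added here).
merged = pd.merge(df2, df1, on=keys, how='left',
                  suffixes=('', '_'), validate='many_to_one')

# Prefer the looked-up value; fall back to the original where no match exists.
merged['value'] = merged['value_'].fillna(merged['value'])
result = merged.drop(columns='value_')
print(result)
```

Because the join is done in a single vectorized call rather than one boolean filter per row, this generally scales much better than the iterrows loop as df2 grows.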