pandas dataframe merge lookup - multiple columns in result

Time: 2017-09-15 19:06:40

Tags: pandas dataframe merge

I am trying to do a simple "lookup" of missing values from another dataframe:

import pandas as pd

somedict = {'col1':['a1','b2','c3','d4','d5','d6'], 'Col2':['a','b','c','b','e','a'], 'Col3':[33,56,74,55,99,86], 'Col4':['','',3,'',5,'']}
dfa = pd.DataFrame(somedict)

otherdic = {'Col2':['a','b'], 'Col4':['NEW', 'ALSONEW']}
dfb = pd.DataFrame(otherdic)

This gives me dfb and dfa:

 Col2   Col4
0   a   NEW
1   b   ALSONEW

 Col2   Col3    Col4    col1
0   a     33            a1
1   b     56            b2
2   c     74       3    c3
3   b     55            d4
4   e     99       5    d5
5   a     86            d6

What I am looking for is:

 Col2   Col3    Col4    col1
0   a     33     NEW    a1
1   b     56  ALSONEW   b2
2   c     74       3    c3
3   b     55  ALSONEW   d4
4   e     99       5    d5
5   a     86     NEW    d6

I tried:

pd.merge(dfa, dfb, on='Col2', how='left')

which produces

    Col2    Col3    Col4_x  col1    Col4_y
0   a       33                a1    NEW
1   b       56                b2    ALSONEW
2   c       74          3     c3    NaN
3   b       55                d4    ALSONEW
4   e       99          5     d5    NaN
5   a       86                d6    NEW

Am I wrong to assume that merge should "know" that the Col4 columns match by name?
Any help is appreciated. Thanks.

2 answers:

Answer 0: (score: 1)

A one-liner: replace the blank '' values in Col4 with the result of mapping Col2 through dfb's Col2-to-Col4 lookup.

In [499]: dfa.loc[dfa['Col4']=='', 'Col4'] = dfa['Col2'].map(dfb.set_index('Col2')['Col4'])

In [500]: dfa
Out[500]:
  Col2  Col3     Col4 col1
0    a    33      NEW   a1
1    b    56  ALSONEW   b2
2    c    74        3   c3
3    b    55  ALSONEW   d4
4    e    99        5   d5
5    a    86      NEW   d6

Details

In [485]: mapping = dfb.set_index('Col2')['Col4']

In [486]: mapping
Out[486]:
Col2
a        NEW
b    ALSONEW
Name: Col4, dtype: object

In [487]: dfa['Col2'].map(mapping)
Out[487]:
0        NEW
1    ALSONEW
2        NaN
3    ALSONEW
4        NaN
5        NEW
Name: Col2, dtype: object

In [488]: dfa.loc[dfa['Col4'] == '', 'Col4'] = dfa['Col2'].map(mapping)

In [489]: dfa
Out[489]:
  Col2  Col3     Col4 col1
0    a    33      NEW   a1
1    b    56  ALSONEW   b2
2    c    74        3   c3
3    b    55  ALSONEW   d4
4    e    99        5   d5
5    a    86      NEW   d6
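
On the merge question itself: as far as I understand it, merge only matches rows on the columns given in `on`. Any other column that exists in both frames is kept from both sides and disambiguated with suffixes, which is where Col4_x and Col4_y come from. A minimal sketch of finishing the job with merge, assuming dfb has one row per Col2 value (the '_new' suffix and the names merged, blank and result are my own, not from the question):

```python
import pandas as pd

dfa = pd.DataFrame({
    'col1': ['a1', 'b2', 'c3', 'd4', 'd5', 'd6'],
    'Col2': ['a', 'b', 'c', 'b', 'e', 'a'],
    'Col3': [33, 56, 74, 55, 99, 86],
    'Col4': ['', '', 3, '', 5, ''],
})
dfb = pd.DataFrame({'Col2': ['a', 'b'], 'Col4': ['NEW', 'ALSONEW']})

# Left-join on Col2. dfa's Col4 keeps its name; dfb's becomes Col4_new.
merged = dfa.merge(dfb, on='Col2', how='left', suffixes=('', '_new'))

# Copy the looked-up value only into rows whose original Col4 is blank.
blank = merged['Col4'].eq('')
merged.loc[blank, 'Col4'] = merged.loc[blank, 'Col4_new']

# The helper column is no longer needed.
result = merged.drop(columns='Col4_new')
print(result)
```

With the sample data, result['Col4'] holds NEW, ALSONEW, 3, ALSONEW, 5, NEW, which matches the output asked for in the question.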

Answer 1: (score: 0)

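# Where Col4 is blank, take the value looked up via Col2.
# dict(dfb.values) turns dfb's (Col2, Col4) rows into a mapping.
new = dfa.Col4.mask(
    dfa.Col4.eq(''),
    dfa.Col2.map(dict(dfb.values))
)
dfa.assign(Col4=new)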

  Col2  Col3     Col4 col1
0    a    33      NEW   a1
1    b    56  ALSONEW   b2
2    c    74        3   c3
3    b    55  ALSONEW   d4
4    e    99        5   d5
5    a    86      NEW   d6
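
A variant of the same idea, sketched under the assumption that some blank rows might have no match in dfb. The key 'z' and the names lookup, looked_up and replace are made up for illustration. It builds the mapping from named columns rather than dict(dfb.values), which depends on dfb having exactly two columns in key, value order, and it leaves unmatched blanks as '' instead of turning them into NaN:

```python
import pandas as pd

dfa = pd.DataFrame({
    'col1': ['a1', 'b2', 'c3', 'x9'],
    'Col2': ['a', 'b', 'c', 'z'],   # 'z' is a made-up key with no match in dfb
    'Col4': ['', '', 3, ''],
})
dfb = pd.DataFrame({'Col2': ['a', 'b'], 'Col4': ['NEW', 'ALSONEW']})

# Name the key and value columns explicitly.
lookup = dict(zip(dfb['Col2'], dfb['Col4']))

# Keys missing from the dict map to NaN.
looked_up = dfa['Col2'].map(lookup)

# Replace only rows that are blank AND have a looked-up value.
replace = dfa['Col4'].eq('') & looked_up.notna()
new = dfa['Col4'].mask(replace, looked_up)

# assign() returns a new frame; dfa itself is left unchanged.
result = dfa.assign(Col4=new)
print(result)
```

Here the row with Col2 == 'z' keeps its blank Col4, and the existing value 3 is left alone.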