I'm struggling to do a simple 'lookup' of missing values from another dataframe:
import pandas as pd
somedict = {'col1':['a1','b2','c3','d4','d5','d6'], 'Col2':['a','b','c','b','e','a'], 'Col3':[33,56,74,55,99,86], 'Col4':['','',3,'',5,'']}
dfa = pd.DataFrame(somedict)
and
otherdic = {'Col2':['a','b'], 'Col4':['NEW', 'ALSONEW']}
dfb = pd.DataFrame(otherdic)
So dfb and dfa look like this:
   Col2     Col4
0     a      NEW
1     b  ALSONEW

  Col2  Col3 Col4 col1
0    a    33        a1
1    b    56        b2
2    c    74    3   c3
3    b    55        d4
4    e    99    5   d5
5    a    86        d6
What I'm looking for is:
  Col2  Col3     Col4 col1
0    a    33      NEW   a1
1    b    56  ALSONEW   b2
2    c    74        3   c3
3    b    55  ALSONEW   d4
4    e    99        5   d5
5    a    86      NEW   d6
I tried:
pd.merge(dfa, dfb, on='Col2', how='left')
which produces:
  Col2  Col3 Col4_x col1   Col4_y
0    a    33          a1      NEW
1    b    56          b2  ALSONEW
2    c    74      3   c3      NaN
3    b    55          d4  ALSONEW
4    e    99      5   d5      NaN
5    a    86          d6      NEW
Am I wrong to assume that merge should know that the Col4 columns match by name?
Any help appreciated. Thanks.
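For reference, merge only aligns rows on the key passed in `on`; it does not combine same-named value columns, it keeps both and adds the default `_x`/`_y` suffixes. A minimal sketch of finishing the merge attempt, assuming the goal is "keep the left Col4 unless it is the empty string": pick between the two suffixed columns with `Series.mask`, then drop them.

```python
import pandas as pd

dfa = pd.DataFrame({'col1': ['a1', 'b2', 'c3', 'd4', 'd5', 'd6'],
                    'Col2': ['a', 'b', 'c', 'b', 'e', 'a'],
                    'Col3': [33, 56, 74, 55, 99, 86],
                    'Col4': ['', '', 3, '', 5, '']})
dfb = pd.DataFrame({'Col2': ['a', 'b'], 'Col4': ['NEW', 'ALSONEW']})

# Left merge on the key only; the overlapping Col4 columns come back as
# Col4_x (from dfa) and Col4_y (from dfb). Left row order is preserved.
merged = pd.merge(dfa, dfb, on='Col2', how='left')

# Where the left value is the empty string, take the looked-up value instead.
merged['Col4'] = merged['Col4_x'].mask(merged['Col4_x'].eq(''), merged['Col4_y'])

# Drop the helper columns so only the combined Col4 remains.
result = merged.drop(columns=['Col4_x', 'Col4_y'])
print(result)
```

This works here because dfb has one row per Col2 key; duplicate keys in dfb would multiply rows in the merge.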
Answer 0 (score: 1)
A one-liner: replace the blank '' entries in Col4 using a mapping from dfb's Col2 to Col4.
In [499]: dfa.loc[dfa['Col4']=='', 'Col4'] = dfa['Col2'].map(dfb.set_index('Col2')['Col4'])
In [500]: dfa
Out[500]:
  Col2  Col3     Col4 col1
0    a    33      NEW   a1
1    b    56  ALSONEW   b2
2    c    74        3   c3
3    b    55  ALSONEW   d4
4    e    99        5   d5
5    a    86      NEW   d6
In detail:
In [485]: mapping = dfb.set_index('Col2')['Col4']
In [486]: mapping
Out[486]:
Col2
a NEW
b ALSONEW
Name: Col4, dtype: object
In [487]: dfa['Col2'].map(mapping)
Out[487]:
0 NEW
1 ALSONEW
2 NaN
3 ALSONEW
4 NaN
5 NEW
Name: Col2, dtype: object
In [488]: dfa.loc[dfa['Col4'] == '', 'Col4'] = dfa['Col2'].map(mapping)
In [489]: dfa
Out[489]:
  Col2  Col3     Col4 col1
0    a    33      NEW   a1
1    b    56  ALSONEW   b2
2    c    74        3   c3
3    b    55  ALSONEW   d4
4    e    99        5   d5
5    a    86      NEW   d6
Answer 1 (score: 0)
new = dfa.Col4.mask(
dfa.Col4.eq(''),
dfa.Col2.map(dict(dfb.values))
)
dfa.assign(Col4=new)
  Col2  Col3     Col4 col1
0    a    33      NEW   a1
1    b    56  ALSONEW   b2
2    c    74        3   c3
3    b    55  ALSONEW   d4
4    e    99        5   d5
5    a    86      NEW   d6