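I noticed that many areas do not appear in the area column but in the City column instead, so I want to use the City value to fill in the missing Pincode in df1.
I have two DataFrames: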
df1 =
City       area       Pincode
Pune       Bibwewadi  159963
Mumbai     Bandra(W)  123456
Bibwewadi
Bandra(E)
Badlapur   Badlapur   752147
Bhiwandi   Bhiwandi   784512
df2 =
Area       Pincode
Bibwewadi  159963
Badlapur   752147
Parvati    784596
Baner      411007
Bandra(E)  326598
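In df1, some areas are stored in the City column. Using pandas, I want to fill the NaN values in df1's Pincode column with the help of df2.
Expected output: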
df1 =
City       area       Pincode
Pune       Bibwewadi  159963
Mumbai     Bandra(W)  123456
Bibwewadi             159963
Bandra(E)             326598
Badlapur   Badlapur   752147
Bhiwandi   Bhiwandi   784512
Answer 0 (score: 1)
You can use pandas.Series.map. This option can be useful if the missing values are not actual NaN values (for example, empty strings):
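# Rows whose Pincode is missing: either NaN or an empty string
c = df1['Pincode'].isnull() | df1['Pincode'].eq('')
# Normalize the label so it matches the spelling used in df2['Area']
df1 = df1.replace('Bandra – E', 'Bandra(E)')
# Look up the missing pincodes by treating the City value as the area name
df1.loc[c, 'Pincode'] = df1.loc[c, 'City'].map(df2.set_index('Area')['Pincode'])
print(df1)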
        City       area   Pincode
0       Pune  Bibwewadi  159963.0
1     Mumbai  Bandra(W)  123456.0
2  Bibwewadi       None  159963.0
3  Bandra(E)       None  326598.0
4   Badlapur   Badlapur  752147.0
5   Bhiwandi   Bhiwandi  784512.0
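For reference, here is a minimal self-contained sketch of the map approach. The sample frames are rebuilt from the question, and it assumes the blank cells are real NaN/None values rather than empty strings; the names lookup, missing, and filled are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; blank cells are assumed to be NaN/None.
df1 = pd.DataFrame({
    "City": ["Pune", "Mumbai", "Bibwewadi", "Bandra(E)", "Badlapur", "Bhiwandi"],
    "area": ["Bibwewadi", "Bandra(W)", None, None, "Badlapur", "Bhiwandi"],
    "Pincode": [159963, 123456, np.nan, np.nan, 752147, 784512],
})
df2 = pd.DataFrame({
    "Area": ["Bibwewadi", "Badlapur", "Parvati", "Baner", "Bandra(E)"],
    "Pincode": [159963, 752147, 784596, 411007, 326598],
})

# Lookup Series mapping each Area label to its Pincode.
lookup = df2.set_index("Area")["Pincode"]

# Boolean mask of the rows whose Pincode is missing.
missing = df1["Pincode"].isna()

# For those rows only, treat the City value as an area name and look it up.
df1.loc[missing, "Pincode"] = df1.loc[missing, "City"].map(lookup)

# Every gap is filled here, so the column can be cast back to integers.
filled = df1["Pincode"].astype("int64")
print(filled.tolist())
```

If a City value has no match in df2, map leaves that row as NaN, and the final integer cast would then fail, so check for remaining gaps before casting.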
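Alternatively, if the missing values are real NaN, set City as the index and let Series.fillna align on it:
# Index df1 by City so that fillna matches labels against df2's Area index
df1 = df1.replace('Bandra – E', 'Bandra(E)').set_index('City')
df1['Pincode'] = df1['Pincode'].fillna(df2.set_index('Area')['Pincode'])
# Restore City as a regular column
df1.reset_index(inplace=True)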
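A runnable sketch of the fillna variant, again with sample frames rebuilt from the question and the assumption that the gaps are real NaN values. The names lookup, indexed, and result are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; blank cells are assumed to be NaN/None.
df1 = pd.DataFrame({
    "City": ["Pune", "Mumbai", "Bibwewadi", "Bandra(E)", "Badlapur", "Bhiwandi"],
    "area": ["Bibwewadi", "Bandra(W)", None, None, "Badlapur", "Bhiwandi"],
    "Pincode": [159963, 123456, np.nan, np.nan, 752147, 784512],
})
df2 = pd.DataFrame({
    "Area": ["Bibwewadi", "Badlapur", "Parvati", "Baner", "Bandra(E)"],
    "Pincode": [159963, 752147, 784596, 411007, 326598],
})

# Lookup Series indexed by Area.
lookup = df2.set_index("Area")["Pincode"]

# With City as the index, fillna aligns the lookup on matching labels
# and only touches the positions that are NaN.
indexed = df1.set_index("City")
indexed["Pincode"] = indexed["Pincode"].fillna(lookup)

# Bring City back as an ordinary column.
result = indexed.reset_index()
print(result)
```

Unlike the mask-based map version, this only fills true NaN values; empty strings in Pincode would be left untouched.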
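Note:
Check what type of missing values your DataFrame actually contains, and also check the column labels and the value being replaced: 'Bandra – E'.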
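The checks in the note could look roughly like this. The frame below is hypothetical: it deliberately mixes a real NaN, an empty string, a whitespace-only string, and padded column labels to show how each case can be detected and normalized.

```python
import numpy as np
import pandas as pd

# Hypothetical messy frame, not the data from the question.
df = pd.DataFrame({
    " City": ["Pune", "Bibwewadi", "Bandra – E", "Badlapur"],
    "Pincode ": ["159963", np.nan, "", "  "],
})

# Column labels: strip stray whitespace so that df["Pincode"] works.
df.columns = df.columns.str.strip()

# Count real NaN values separately from blank strings.
n_nan = int(df["Pincode"].isna().sum())
blank = df["Pincode"].str.strip().eq("")
n_blank = int(blank.sum())

# Turn blank strings into NaN so that isna/fillna treat them uniformly.
df["Pincode"] = df["Pincode"].mask(blank)

# Align the City spelling with the one used in the lookup table.
df["City"] = df["City"].replace("Bandra – E", "Bandra(E)")

print(n_nan, n_blank, int(df["Pincode"].isna().sum()))
```

After this normalization, a plain isna mask is enough and the extra eq('') test becomes unnecessary.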