I have two dataframes of different sizes.
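df1 has addresses but no zip codes. df2 has both addresses and zip codes. I am trying to use np.where to match the addresses from df1 to df2 and, if they match, bring the corresponding zip code into df1. But I have just realised that this does not work for dataframes of different sizes.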
The first dataframe, without zip codes:
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'address1':['1 o\'toole st','2 main st','3 high street','5 foo street','10 foo street'],
                    'address2':['town1',np.nan,np.nan,'Bartown',np.nan],
                    'address3':[np.nan,'village','city','county2','county3']})
df1['zipcode'] = ''
print(df1)
        address1  address2  address3  zipcode
0   1 o'toole st     town1       NaN
1      2 main st       NaN   village
2  3 high street       NaN      city
3   5 foo street   Bartown   county2
4  10 foo street       NaN   county3
The second dataframe, from which I want to get the zip codes:
df2 = pd.DataFrame({'address1':['1 o\'toole st','2 main st','7 mill street','5 foo street','10 foo street','asda'],
                    'address2':['town1','village','city','Bartown','county3','efsefs'],
                    'address3':[np.nan,np.nan,np.nan,'county2','USA','asdasd'],
                    'zipcode': ['er45','qw23','rt67','yu89','yu83','aedsa']})
print(df2)
        address1  address2  address3  zipcode
0   1 o'toole st     town1       NaN     er45
1      2 main st   village       NaN     qw23
2  7 mill street      city       NaN     rt67
3   5 foo street   Bartown   county2     yu89
4  10 foo street   county3       USA     yu83
5           asda    efsefs    asdasd    aedsa
I want to use np.where to populate the df1['zipcode'] column: if both addresses match, return df2['zipcode'], otherwise return 'no_match'.
Is it possible to do this with np.where and dataframes of different sizes? Or is there a better way to search for matches and bring over the zip code?
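For context, here is a minimal sketch of why a row-by-row comparison breaks when the two frames differ in length, and how a membership test avoids the problem. The two small Series are made-up stand-ins for df1['address1'] and df2['address1'], not the full data above.

```python
import numpy as np
import pandas as pd

# Illustrative stand-ins for df1['address1'] and df2['address1'] (different lengths)
shorter = pd.Series(["1 o'toole st", '2 main st', '3 high street'])
longer = pd.Series(["1 o'toole st", '2 main st', '7 mill street', '5 foo street'])

try:
    # Element-wise == requires both Series to carry identical index labels
    mask = shorter == longer
    error = None
except ValueError as exc:
    error = str(exc)

# A membership test does not depend on row position or on the lengths matching
found = shorter.isin(longer)
result = np.where(found, 'match', 'no_match')
print(error)
print(result)
```

np.where itself is fine once the condition has the same length as df1; the difficulty is only in building that condition, and in picking the right zip code for each matched row, which is what a merge or a map lookup does.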
Answer 0 (score: 1)
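You can use merge (note that this matches on address1 only):
# Drop the empty placeholder column first, otherwise merge creates zipcode_x / zipcode_y
df_new = df1.drop(columns='zipcode').merge(df2[['address1', 'zipcode']], on='address1', how='left')
# Fill only the zipcode column, so missing address parts stay NaN
df_new['zipcode'] = df_new['zipcode'].fillna('no_match')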
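As a hedged illustration of the left-merge approach, the sketch below adds two optional merge arguments: indicator=True, which records whether each row found a partner, and validate='many_to_one', which raises if an address appears more than once in the lookup frame (that would otherwise silently duplicate rows). The small frames are illustrative only.

```python
import pandas as pd

# Illustrative data: addresses to look up, and a lookup table with zip codes
df1 = pd.DataFrame({'address1': ["1 o'toole st", '2 main st', '3 high street']})
df2 = pd.DataFrame({'address1': ["1 o'toole st", '2 main st', '7 mill street', 'asda'],
                    'zipcode': ['er45', 'qw23', 'rt67', 'aedsa']})

# Left merge keeps every row of df1; unmatched rows get NaN in 'zipcode'
merged = df1.merge(df2, on='address1', how='left',
                   indicator=True, validate='many_to_one')

# Replace missing zip codes only, leaving other columns untouched
merged['zipcode'] = merged['zipcode'].fillna('no_match')

# '_merge' is 'both' for matched rows and 'left_only' for unmatched ones
matched = merged['_merge'] == 'both'
print(merged)
```

Because the merge is a left join against a lookup with unique keys, the result has exactly as many rows as df1, regardless of how long df2 is.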
Answer 1 (score: 1)
Use Series.map with a new key column, built with the help of fillna. Rows without a match end up as missing values, so add fillna('no_match') at the end:
df1['key'] = df1['address1'] + df1['address2'].fillna(df1['address3'])
df2['key'] = df2['address1'] + df2['address2'].fillna(df2['address3'])
df1['zipcode'] = df1['key'].map(df2.set_index('key')['zipcode']).fillna('no_match')
print(df1)
        address1  address2  address3                   key   zipcode
0   1 o'toole st     town1       NaN     1 o'toole sttown1      er45
1      2 main st       NaN   village      2 main stvillage      qw23
2  3 high street       NaN      city     3 high streetcity  no_match
3   5 foo street   Bartown   county2   5 foo streetBartown      yu89
4  10 foo street       NaN   county3  10 foo streetcounty3      yu83
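One possible refinement of the key-based lookup, sketched under assumptions: the helper name make_key, the '|' separator, and the strip/lower normalisation are illustrative choices, not part of the answer above. A separator keeps the two address parts from running together, and drop_duplicates guards against repeated keys in the lookup, which Series.map cannot handle when the mapping Series has a non-unique index.

```python
import numpy as np
import pandas as pd

def make_key(df):
    """Hypothetical helper: address1 plus address2 (falling back to address3)."""
    second = df['address2'].fillna(df['address3'])
    # Normalise whitespace and case, and separate the parts with '|'
    return df['address1'].str.strip().str.lower() + '|' + second.str.strip().str.lower()

# Illustrative data, shaped like the frames in the question
df1 = pd.DataFrame({'address1': ["1 o'toole st", '2 Main St ', '3 high street'],
                    'address2': ['town1', np.nan, np.nan],
                    'address3': [np.nan, 'village', 'city']})
df2 = pd.DataFrame({'address1': ["1 o'toole st", '2 main st', '7 mill street'],
                    'address2': ['town1', 'village', 'city'],
                    'address3': [np.nan, np.nan, np.nan],
                    'zipcode': ['er45', 'qw23', 'rt67']})

# Build a key -> zipcode lookup with unique keys (first occurrence wins)
lookup = (df2.assign(key=make_key(df2))
             .drop_duplicates('key')
             .set_index('key')['zipcode'])

# Map each df1 key to its zip code; unmatched keys become 'no_match'
df1['zipcode'] = make_key(df1).map(lookup).fillna('no_match')
print(df1)
```

Whether lower-casing and trimming are appropriate depends on how clean the real addresses are; they are shown here only to illustrate where such normalisation would go.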