I have a dictionary (key, value) and a pandas DataFrame.
mydict = {'KULAR LUMPUR' : 'MY',
'SINGAPORE' : 'SG',
'HONG KONG' : 'HK',
'VIETNAM': 'VN'}
and a DataFrame with an ['Address'] column:
Address
0 234 JALAN ST KULAR LUMPUR MALAYSIA
1 123 BUILDING STREET SINGAPORE
2 67 CANNING VALE, HONG KONG
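How can I search the DataFrame and, whenever a dictionary key occurs as a substring of the address, get the corresponding value from the dictionary?
For example: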
Address Code
0 234 JALAN ST KULAR LUMPUR MALAYSIA MY
1 123 BUILDING STREET SINGAPORE SG
2 67 CANNING VALE, HONG KONG HK
Answer 0 (score: 1)
Use str.extract with a regex built from the dictionary keys, then map the matched key to its value:
import pandas as pd

df = pd.DataFrame({'Address': ['234 JALAN ST KULAR LUMPUR MALAYSIA',
'123 BUILDING STREET SINGAPORE',
'67 CANNING VALE, HONG KONG']})
print (df)
Address
0 234 JALAN ST KULAR LUMPUR MALAYSIA
1 123 BUILDING STREET SINGAPORE
2 67 CANNING VALE, HONG KONG
mydict = {'KULAR LUMPUR' : 'MY',
'SINGAPORE' : 'SG',
'HONG KONG' : 'HK',
'VIETNAM': 'VN'}
pat = '|'.join(r"\b{}\b".format(x) for x in mydict.keys())
df['Code'] = df['Address'].str.extract('('+ pat + ')', expand=False).map(mydict)
print (df)
Address Code
0 234 JALAN ST KULAR LUMPUR MALAYSIA MY
1 123 BUILDING STREET SINGAPORE SG
2 67 CANNING VALE, HONG KONG HK
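The one-liner above does two things in a row, and it may help to see them separately. The following is only a sketch: the third address is a made-up row that matches no key, added to show that an unmatched address ends up as NaN rather than raising an error.

```python
import pandas as pd

mydict = {'KULAR LUMPUR': 'MY',
          'SINGAPORE': 'SG',
          'HONG KONG': 'HK',
          'VIETNAM': 'VN'}

# The last address is hypothetical and contains none of the keys.
df = pd.DataFrame({'Address': ['234 JALAN ST KULAR LUMPUR MALAYSIA',
                               '123 BUILDING STREET SINGAPORE',
                               '1 UNKNOWN ROAD, TOKYO']})

pat = '|'.join(r"\b{}\b".format(x) for x in mydict)

# Step 1: extract the first dictionary key found in each address (NaN if none).
matched = df['Address'].str.extract('(' + pat + ')', expand=False)

# Step 2: look the matched key up in the dictionary; NaN stays NaN.
df['Code'] = matched.map(mydict)

print(df)
```

If unmatched rows should carry a placeholder instead of NaN, chaining .fillna(...) after .map(mydict) is one option.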
Explanation:
print (pat)
\bKULAR LUMPUR\b|\bSINGAPORE\b|\bHONG KONG\b|\bVIETNAM\b
\b is a word boundary; wrapping each key in \b ... \b means the key is matched only as a whole word.
| is the regex OR.
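To see what the word boundaries buy you, here is a small sketch using the standard re module on a made-up address. Note that re.escape is an addition that is not part of the answer above: it is assumed here as a safeguard in case a key ever contains regex metacharacters such as "." or "(".

```python
import re

keys = ['KULAR LUMPUR', 'SINGAPORE', 'HONG KONG', 'VIETNAM']

# Same pattern shape as in the answer, with each key escaped (an assumption, see above).
with_boundary = '|'.join(r"\b{}\b".format(re.escape(k)) for k in keys)
# For comparison: the same alternation without word boundaries.
without_boundary = '|'.join(re.escape(k) for k in keys)

text = '12 SINGAPOREAN FOOD COURT'  # hypothetical address

# With \b, SINGAPORE is not matched inside the longer word SINGAPOREAN.
hit_with = re.search(with_boundary, text)
# Without \b, the key matches as a plain substring.
hit_without = re.search(without_boundary, text)

print(hit_with)
print(hit_without.group())
```

Whether the substring behaviour or the whole-word behaviour is the right one depends on the data; the question asks for substrings, and the answer narrows that to whole words.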