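Pandas: check whether a string contains a dictionary key as a substring and return the value

Date: 2018-11-12 06:40:13

Tags: python string pandas dictionary

I have a dictionary (key, value) and a pandas DataFrame.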

mydict = {'KULAR LUMPUR' : 'MY',
            'SINGAPORE' : 'SG',
            'HONG KONG' : 'HK',
            'VIETNAM': 'VN'}

and a DataFrame with an ['Address'] column:

                              Address
0  234 JALAN ST KULAR LUMPUR MALAYSIA
1       123 BUILDING STREET SINGAPORE
2          67 CANNING VALE, HONG KONG

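How can I search the DataFrame so that, when a dictionary key occurs as a substring of an address, I get the corresponding value from the dictionary?

For example: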

                              Address Code
0  234 JALAN ST KULAR LUMPUR MALAYSIA   MY
1       123 BUILDING STREET SINGAPORE   SG
2          67 CANNING VALE, HONG KONG   HK

1 answer:

Answer 0 (score: 1)

Build a regex from the dictionary keys, pull the matching key out of each address with str.extract, then map it to its value:

import pandas as pd

df = pd.DataFrame({'Address': ['234 JALAN ST KULAR LUMPUR MALAYSIA', 
                               '123 BUILDING STREET SINGAPORE', 
                               '67 CANNING VALE, HONG KONG']})

print (df)
                              Address
0  234 JALAN ST KULAR LUMPUR MALAYSIA
1       123 BUILDING STREET SINGAPORE
2          67 CANNING VALE, HONG KONG

mydict = {'KULAR LUMPUR' : 'MY',
            'SINGAPORE' : 'SG',
            'HONG KONG' : 'HK',
            'VIETNAM': 'VN'}

pat = '|'.join(r"\b{}\b".format(x) for x in mydict.keys())
df['Code'] = df['Address'].str.extract('('+ pat + ')', expand=False).map(mydict)

print (df)
                              Address Code
0  234 JALAN ST KULAR LUMPUR MALAYSIA   MY
1       123 BUILDING STREET SINGAPORE   SG
2          67 CANNING VALE, HONG KONG   HK
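
If the keys might contain regex metacharacters, or if one key could be contained in another, a slightly more defensive variant may help. The following is only a sketch: the helper name add_code_column, its parameters, and the extra '1 UNKNOWN ROAD' row are made up for illustration and are not part of the answer above. It escapes each key with re.escape, tries longer keys first, and leaves NaN where no key is found.

```python
import re
import pandas as pd

def add_code_column(df, mapping, source="Address", target="Code"):
    """Hypothetical helper: add a column with the value of the first key found."""
    # Try longer keys first so a long key wins over a shorter key it contains.
    keys = sorted(mapping, key=len, reverse=True)
    # re.escape keeps characters such as '.' or '(' in a key from acting as regex syntax.
    pat = "|".join(r"\b{}\b".format(re.escape(k)) for k in keys)
    out = df.copy()
    # str.extract returns the matched key (or NaN); map turns the key into its value.
    out[target] = out[source].str.extract("(" + pat + ")", expand=False).map(mapping)
    return out

mydict = {'KULAR LUMPUR': 'MY',
          'SINGAPORE': 'SG',
          'HONG KONG': 'HK',
          'VIETNAM': 'VN'}

df = pd.DataFrame({'Address': ['234 JALAN ST KULAR LUMPUR MALAYSIA',
                               '123 BUILDING STREET SINGAPORE',
                               '67 CANNING VALE, HONG KONG',
                               '1 UNKNOWN ROAD']})  # illustrative row with no key

result = add_code_column(df, mydict)
print(result)
```

Rows where none of the keys occur end up with NaN in the new column, so they can be filled or filtered afterwards (for example with fillna).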

Explanation:

print (pat)
\bKULAR LUMPUR\b|\bSINGAPORE\b|\bHONG KONG\b|\bVIETNAM\b

\b is a word boundary; wrapping a key in \b ... \b matches it only as whole words
| is the regex OR (alternation)
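
To see what the word boundaries change, here is a small sketch using only the standard re module. The address '12 SINGAPOREAN WAY' is an invented example, not data from the question.

```python
import re

# Two of the keys joined with | and wrapped in word boundaries.
pat = r"\bSINGAPORE\b|\bHONG KONG\b"

# A key that appears as whole words is found.
m1 = re.search(pat, "123 BUILDING STREET SINGAPORE")
print(m1.group(0))

# A key embedded in a longer word is rejected by the trailing \b.
m2 = re.search(pat, "12 SINGAPOREAN WAY")
print(m2)

# Without \b the same address matches, which is usually not wanted.
m3 = re.search("SINGAPORE|HONG KONG", "12 SINGAPOREAN WAY")
print(m3.group(0))
```

The second search returns None because the character after SINGAPORE is a letter, so there is no word boundary at that position.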