Extract postal code from an address line

Time: 2015-12-07 09:17:24

Tags: regex r dataframe extract fuzzy-comparison

I have a data frame in which one of the columns is an address. The address sometimes contains a ZIP/PIN code and sometimes does not.

Data frame:

BANK                        ADDRESS                                                    
ABU DHABI COMMERCIAL BANK   REHMAT MANZIL, V. N. ROAD,CURCHGATE, MUMBAI - 400020     
VIJAYA BANK                 BOKARO CITY JHARKHAND,15/D1 HOTEL BLUE-,DIAMOND COMPLEX,BOKARO CITY,JHARKHAND,JHARKHAND
ALLAHABAD BANK              DANKIN GANJ DIST. MIRZAPUR - 231 001 UTTAR PRADESH

How can I extract only the ZIP/PIN code, given the following information:

 1. ZIP/PIN codes are 6 digits long (Indian ZIP/PIN codes)
 2. ZIP codes are sometimes split into two groups of 3 digits, e.g. 560 015
 3. ZIP codes are sometimes separated by a hyphen, e.g. 560-015

Here is my code so far:

 df$zip <- stri_extract_all_regex(df$ADDRESS, "(?<!\\d)\\d{6}(?!\\d)")

However, the code above does not handle points 2 and 3 of my logic, i.e. ZIP codes that are split by " " or "-".

1 answer:

Answer 0: (score: 0)

  

  However, the code above does not handle points 2 and 3 of my logic, i.e. ZIP codes that are split by " " or "-".

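 # regexpr() returns the start of the first match (-1 if none); \\< and \\> are word boundaries
 m = regexpr("\\<\\d{3}[- ]?\\d{3}\\>", df$ADDRESS)
 # rows without a match yield "" here; the matched text keeps any space or hyphen separator
 df$zip = substr(df$ADDRESS, m, m + attr(m, "match.length") - 1)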
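
If a uniform 6-digit value is wanted, with NA for rows that have no code, the same idea can be sketched as below. This is a minimal base R sketch, not part of the original answer: the helper name extract_pin is hypothetical, and it combines the lookarounds from the question with the optional separator from the answer. It assumes the first 3+3 digit group in an address is the PIN, so two unrelated 3-digit numbers separated by a space or hyphen would be a false positive.

```r
# Hypothetical helper (not from the original post): pull out a 6-digit PIN
# written as "400020", "231 001" or "560-015" and return it without the separator.
extract_pin <- function(address) {
  # (?<!\d) and (?!\d) stop the match from starting or ending inside a longer digit run;
  # [- ]? allows one optional space or hyphen between the two 3-digit groups.
  pattern <- "(?<!\\d)\\d{3}[- ]?\\d{3}(?!\\d)"
  m <- regexpr(pattern, address, perl = TRUE)

  # Start with NA everywhere, then fill in only the rows that matched.
  pin <- rep(NA_character_, length(address))
  found <- !is.na(m) & m != -1
  pin[found] <- regmatches(address, m)

  # Drop the separator so every result is exactly 6 digits.
  gsub("[- ]", "", pin)
}

# The three sample rows from the question.
df <- data.frame(
  BANK = c("ABU DHABI COMMERCIAL BANK", "VIJAYA BANK", "ALLAHABAD BANK"),
  ADDRESS = c(
    "REHMAT MANZIL, V. N. ROAD,CURCHGATE, MUMBAI - 400020",
    "BOKARO CITY JHARKHAND,15/D1 HOTEL BLUE-,DIAMOND COMPLEX,BOKARO CITY,JHARKHAND,JHARKHAND",
    "DANKIN GANJ DIST. MIRZAPUR - 231 001 UTTAR PRADESH"
  ),
  stringsAsFactors = FALSE
)

df$zip <- extract_pin(df$ADDRESS)
df$zip
# expected values: "400020", NA, "231001"
```

Only the first match per address is returned, which mirrors the behaviour of regexpr() in the answer above; gregexpr() would be needed if an address could contain more than one code.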