I have a data frame in which one of the columns is an address. The address sometimes contains a ZIP/PIN code and sometimes does not.
Data frame:
BANK ADDRESS
ABU DHABI COMMERCIAL BANK REHMAT MANZIL, V. N. ROAD,CURCHGATE, MUMBAI - 400020
VIJAYA BANK BOKARO CITY JHARKHAND,15/D1 HOTEL BLUE-,DIAMOND COMPLEX,BOKARO CITY,JHARKHAND,JHARKHAND
ALLAHABAD BANK DANKIN GANJ DIST. MIRZAPUR - 231 001 UTTAR PRADESH
How can I extract only the ZIP/PIN code, given the following:
1. ZIP/PIN codes are 6 digits (Indian ZIP/PIN code)
2. ZIPs are sometimes split into two 3-digit groups by a space, e.g. 560 015
3. ZIPs are sometimes separated by -, e.g. 560-015
Here is my current code:
df$zip <- stri_extract_all_regex(df$ADDRESS, "(?<!\\d)\\d{6}(?!\\d)")
But the code above does not handle points 2 and 3 of my logic, i.e. a ZIP split by " " or "-".
Answer 0 (score: 0)
To handle points 2 and 3 of your logic (a ZIP split by " " or "-"), allow an optional space or hyphen between the two 3-digit groups:
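# Match a 6-digit PIN, optionally split 3+3 by a single space or hyphen
m = regexpr("\\<\\d{3}[- ]?\\d{3}\\>", df$ADDRESS)
# Rows without a match come back as an empty string
df$zip = substr(df$ADDRESS, m, m + attr(m, "match.length") - 1)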
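The approach above keeps the separator in the result (for example "231 001") and returns an empty string for rows without a code. A minimal base R sketch of one way to normalise that is shown below. The helper name `extract_pin` is hypothetical, the sample rows from the question are used whole as the address text, and the Perl-style lookarounds from the question's original pattern replace the word-boundary anchors.

```r
# Sample rows from the question, used whole as the address text
df <- data.frame(
  ADDRESS = c(
    "ABU DHABI COMMERCIAL BANK REHMAT MANZIL, V. N. ROAD,CURCHGATE, MUMBAI - 400020",
    "VIJAYA BANK BOKARO CITY JHARKHAND,15/D1 HOTEL BLUE-,DIAMOND COMPLEX,BOKARO CITY,JHARKHAND,JHARKHAND",
    "ALLAHABAD BANK DANKIN GANJ DIST. MIRZAPUR - 231 001 UTTAR PRADESH"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the first 6-digit PIN in each string,
# allowing a 3+3 split by one space or hyphen, or NA when none is found
extract_pin <- function(x) {
  pattern <- "(?<!\\d)\\d{3}[- ]?\\d{3}(?!\\d)"
  m <- regexpr(pattern, x, perl = TRUE)
  hit <- !is.na(m) & m != -1          # rows that actually matched
  out <- rep(NA_character_, length(x))
  out[hit] <- regmatches(x, m)        # regmatches returns matched rows only
  gsub("[- ]", "", out)               # drop the separator; NA stays NA
}

df$zip <- extract_pin(df$ADDRESS)
print(df$zip)
```

Returning NA rather than "" makes missing codes easy to filter with `is.na()`. Note that `regexpr` only reports the first match per row, so an address containing two candidate codes would need `gregexpr` instead.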