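I need to extract the zip code from the results of this loop.

Asked: 2018-10-04 02:45:03

Tags: python regex python-3.x

How can I get only the zip code instead of the whole address? Right now the loop prints a full address that contains the zip code. Is it possible to extract just the zip code?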

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from tabulate import tabulate
from geopy.geocoders import Nominatim

my_data = pd.read_csv('dt/TrafficCounts_OpenData_wm.csv')


geolocator = Nominatim(user_agent="my_application")
sub_set = my_data[["POINT_Y","POINT_X"]]
count = 0
for y in sub_set.itertuples() :
    mypoint = str(y[1]) + ' ,' + str(y[2])
    print(mypoint)
    location = geolocator.reverse(mypoint)
    print(location)
    if count == 5 : break
    count +=1

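2 answers:

Answer 0 (score: 0)

Because the zip code is always the last 5 digits (or 5 plus 4 digits) in the address, you can use the following regular expression to extract it from the address stored in the location variable: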

import re
# geolocator.reverse() returns a Location object, so match against its address string
zipcode = re.search(r'\d{5}(?:-\d{4})?(?=\D*$)', location.address).group()
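
A minimal sketch of how this pattern behaves on plain address strings, with a guard for addresses that contain no zip code. The helper name `extract_zip` and the sample addresses are assumptions made up for illustration (they only mimic the format Nominatim prints); with geopy you would pass in the address text of the returned location.

```python
import re

# Five digits, optionally followed by "-" and four digits,
# with no further digits between the match and the end of the string.
ZIP_RE = re.compile(r'\d{5}(?:-\d{4})?(?=\D*$)')

def extract_zip(address):
    """Return the trailing zip code in `address`, or None if there is none."""
    match = ZIP_RE.search(address)
    return match.group() if match else None

# Hypothetical sample addresses
samples = [
    "2345, Commonwealth Street, Houston, Harris County, Texas, 77006, USA",
    "6750, Cook Road, Houston, Harris County, Texas, 77072-1234, USA",
    "12345, Main Street, Houston, Harris County, Texas, 77006, USA",
    "Somewhere without a postal code, USA",
]
zips = [extract_zip(s) for s in samples]
print(zips)
```

The lookahead `(?=\D*$)` is what keeps a five-digit house number such as `12345` from being mistaken for the zip code, and returning `None` avoids the `AttributeError` that calling `.group()` on a failed search would raise.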

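Answer 1 (score: 0)

If you don't know regular expressions, I suppose you could do something like the following. You should still learn them, though, because they will give you more robust behavior.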

data ='''29.607416999999998 ,-95.114007 Pinebrook KinderCare, 4422,Clear Lake City Boulevard, Houston, Harris County, Texas,77059,USA
29.74770501 ,-95.39656199 2345, Commonwealth Street, Houston, Harris County, Texas, 77006, USA
29.707028 ,-95.59624701 Hastings Ninth Grade Center, 6750, Cook Road, Houston, Harris County, Texas, 77072, USA 
29.59038673 ,-95.47975719 6333, Court Road, Houston, Fort Bend County, Texas, 77053, USA
29.67591366 ,-95.32867835 7084, Crestmont Street, Houston, Harris County, Texas, 77033, USA'''

dl = data.split('USA')
# print(dl)


# 1)

zip_code_lst = []
for addrs in dl:
    zip_found = addrs.rstrip(', ')[-5:]  # strip trailing commas/spaces: '... 77006, ' --> '77006'
    if len(zip_found) == 5:
        zip_code_lst.append(zip_found)

print(zip_code_lst) # ['77059', '77006', '77072', '77053', '77033']


# 2)

zip_code_lst_comp =  [ addrs.rstrip(', ')[-5:] for addrs in dl ]

print(zip_code_lst_comp) # ['77059', '77006', '77072', '77053', '77033', '']
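
Variant 2) keeps an empty string because the text after the last 'USA' is empty. Below is a small sketch of one way to drop such pieces while still using a comprehension, assuming the same 'USA'-delimited layout. The shortened `data` string and the `isdigit()` check are additions for illustration and are not part of the original answer.

```python
# Shortened sample in the same format as the `data` string above (two rows only).
data = '''29.74770501 ,-95.39656199 2345, Commonwealth Street, Houston, Harris County, Texas, 77006, USA
29.59038673 ,-95.47975719 6333, Court Road, Houston, Fort Bend County, Texas, 77053, USA'''

# Take the last five characters of each piece after stripping trailing commas/spaces.
candidates = (piece.rstrip(', ')[-5:] for piece in data.split('USA'))

# Keep only results that are exactly five digits; this drops the empty trailing piece.
zip_codes = [c for c in candidates if len(c) == 5 and c.isdigit()]

print(zip_codes)
```

This still relies on every address ending in a five-digit code followed by 'USA', so it shares the fragility of the slicing approach; the digit check only filters out pieces that clearly are not zip codes.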