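As part of an API response I receive an address as a single string, which I need to split up so it can be stored in our own database. The addresses look like this: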
'teststreet 1, 1234 AZ City, Country'
'teststreet 9C, 1235 AZ City, Country'
'J. teststreet 1, 1243 AZ City, Country'
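I'm having trouble splitting the string into its separate parts.
What trips me up most is that the street name itself can consist of two parts, and that the house number may include a letter, which should be split off when present.
I have tried a few approaches to solve this: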
adressdetails = row['Adresgegevens'].split(",")
adress2 = [x.replace(",", "") for x in adress2]
split_house_number = re.split(r'(\d)', adress2[2])
house_number = split_house_number[1]
house_number_extension = split_house_number[2]
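A small sketch of why the `re.split(r'(\d)', ...)` attempt is awkward, assuming the street part has already been isolated from the rest of the address (the variable names here are illustrative, not from the original code). The capture group matches one digit at a time, so a multi-digit house number is broken apart; capturing the whole run of digits plus an optional letter avoids that.

```python
import re

street_part = "teststreet 45"  # street portion of the third example below

# Splitting on a single captured digit breaks "45" into two pieces.
pieces = re.split(r'(\d)', street_part)
print(pieces)  # ['teststreet ', '4', '', '5', '']

# Capturing the full run of digits and an optional trailing letter instead.
match = re.search(r'(\d+)([A-Za-z]*)$', street_part)
number, extension = match.group(1), match.group(2)
print(number, repr(extension))  # 45 ''
```

With the split version, index 1 holds only `'4'` and index 2 is an empty string, so the house number would be truncated.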
I need to split the address from the response into the following variables:
streetname
house_number
house_number_extension
zip_code
city
country
Example:
"teststreet 1C, 1234 AZ New York, Australia"
becomes ->
teststreet
1
C
1234 AZ
New York
Australia
Example 2:
"Jh. teststreet 1B, 9870 GH Amsterdam, Canada"
becomes ->
Jh. teststreet
1
B
9870 GH
Amsterdam
Canada
Example 3:
"teststreet 45, 9867 HJ Rotterdam, Germany"
becomes ->
teststreet
45
null
9867 HJ
Rotterdam
Germany
Answer 0 (score: 2)
You can use a regular expression (regex101), but I don't know what all your other strings look like, so it may need adjusting:
import re

lst = [
    "teststreet 1C, 1234 AZ New York, Australia",
    "Jh. teststreet 1B, 9870 GH Amsterdam, Canada",
    "teststreet 45, 9867 HJ Rotterdam, Germany"
]

for test_case in lst:
    # Groups: street name, house number, optional letter extension, zip code, city, country
    m = re.findall(r'(.*)\s+(\d+)([A-Z]*)\s*,\s*(\d+\s+[A-Z]+)\s*,?\s*(.*?)\s*,\s*(.*)\s*', test_case)
    if m:
        # findall returns a list of tuples; the first tuple holds the six groups
        streetname, house_number, house_number_extension, zip_code, city, country = m[0]
        print('Streetname:', streetname)
        print('House Number:', house_number)
        print('House Number Ext.:', house_number_extension)
        print('Zip Code:', zip_code)
        print('City:', city)
        print('Country:', country)
        print('*' * 80)
This prints:
Streetname: teststreet
House Number: 1
House Number Ext.: C
Zip Code: 1234 AZ
City: New York
Country: Australia
********************************************************************************
Streetname: Jh. teststreet
House Number: 1
House Number Ext.: B
Zip Code: 9870 GH
City: Amsterdam
Country: Canada
********************************************************************************
Streetname: teststreet
House Number: 45
House Number Ext.:
Zip Code: 9867 HJ
City: Rotterdam
Country: Germany
********************************************************************************
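If the parts need to go into a database, one possible variation is to wrap the same pattern in a function that uses named groups and turns an empty extension into `None`, which corresponds to the `null` in the third example of the question. This is only a sketch: the names `ADDRESS_RE` and `parse_address` are made up for illustration, and the pattern carries the same caveat as above, so it may need adjusting for other address formats.

```python
import re

# Same pattern as above, rewritten with named groups (group names follow the question).
ADDRESS_RE = re.compile(
    r'(?P<streetname>.*)\s+(?P<house_number>\d+)(?P<house_number_extension>[A-Z]*)\s*,'
    r'\s*(?P<zip_code>\d+\s+[A-Z]+)\s*,?\s*(?P<city>.*?)\s*,\s*(?P<country>.*)\s*'
)

def parse_address(address):
    """Return a dict of address parts, or None if the string does not match."""
    m = ADDRESS_RE.fullmatch(address)
    if m is None:
        return None
    parts = m.groupdict()
    # An empty extension becomes None, matching the "null" in the question.
    parts['house_number_extension'] = parts['house_number_extension'] or None
    return parts

print(parse_address("teststreet 45, 9867 HJ Rotterdam, Germany"))
```

Returning `None` for strings that do not match makes it easy to log or skip addresses the pattern does not cover instead of failing on tuple unpacking.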