我有一堆像这样的地址:
df['street'] =
5311 Whitsett Ave 34
355 Sawyer St
607 Hampshire Rd #358
342 Old Hwy 1
267 W Juniper Dr 402
我想要做的是删除地址街道末端的那些数字,以获得:
df['street'] =
5311 Whitsett Ave
355 Sawyer St
607 Hampshire Rd
342 Old Hwy 1
267 W Juniper Dr
我有这样的正则表达式:
df['street'] = df.street.str.replace(r"""\s(?:dr|ave|rd)[^a-zA-Z]\D*\d+$""", '', case=False)
给了我这个:
df['street'] =
5311 Whitsett
355 Sawyer St
607 Hampshire
342 Old Hwy 1
267 W Juniper
它从原来的街道地址中删除了“Ave”,“Rd”和“Dr”字样。有没有办法保留正则表达式模式的一部分(在我的情况下,这是'Ave','Rd','Dr'并替换其余的?
修改
注意地址342 Old Hwy 1
。我不想在这种情况下取出数字。这就是为什么我指定了模式('Ave','Rd','Dr'等)以更好地控制谁被改变。
答案 0 :(得分:1)
df_street = '''
5311 Whitsett Ave 34
355 Sawyer St
607 Hampshire Rd #358
342 Old Hwy 1
267 W Juniper Dr 402
'''
# digits on the end are preceded by one of ( Ave, Rd, Dr), space,
# may be preceded by a #, and followed by a possible space, and by the newline
df_street = re.sub(r'(Ave|Rd|Dr)\s+#?\d+\s*\n',r'\1\n', df_street,re.MULTILINE|re.IGNORECASE)
print(df_street)
5311 Whitsett Ave
355 Sawyer St
607 Hampshire Rd
342 Old Hwy 1
267 W Juniper Dr
答案 1 :(得分:0)
您应该使用以下正则表达式:
>>> import re
>>> example_str = "607 Hampshire Rd #358"
>>> re.sub(r"\s*\#?[^\D]+\s*$", r"", example_str)
'607 Hampshire Rd'