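I'm trying to filter street names and keep only the part I want. The names come in several formats. Here are some examples, each followed by what I want to get out of it.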
Car Cycle 5 B Ap 1233 < what I have
Car Cycle 5 B < what I want
Potato street 13 1 AB < what I have
Potato street 13 < what I want
Chrome Safari 41 Ap 765 < what I have
Chrome Safari 41 < what I want
Highstreet 53 Ap 2632/BH < what I have
Highstreet 53 < what I want
Something street 91/Daniel < what I have
Something street 91 < what I want
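In general, what I want is the street name (1-4 words), followed by the street number (if there is one), followed by the street letter (a single letter, if there is one). I just can't get it to work properly.
Here is my code (I know, it's bad):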
import re

def address_regex(address):
    # Patterns are tried from most specific to least specific.
    regex1 = re.compile(r"(\w+ ){1,4}(\d{1,4} ){1}(\w{1} )")
    regex2 = re.compile(r"(\w+ ){1,4}(\d{1,4} ){1}")
    regex3 = re.compile(r"(\w+ ){1,4}(\d){1,4}")
    regex4 = re.compile(r"(\w+ ){1,4}(\w+)")
    # Search the function argument (the posted code used an undefined name, text).
    s1 = regex1.search(address)
    s2 = regex2.search(address)
    s3 = regex3.search(address)
    s4 = regex4.search(address)
    regex_address = ""
    if s1 is not None:
        regex_address = s1.group()
    elif s2 is not None:
        regex_address = s2.group()
    elif s3 is not None:
        regex_address = s3.group()
    elif s4 is not None:
        regex_address = s4.group()
    else:
        regex_address = address
    return regex_address
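I'm using Python 3.4.
Answer 0 (score: 3)
I'm going to go out on a limb here and assume that in your last example you really do want to capture the 91, because it makes no sense not to.
Here is a solution that captures all of your examples (including the 91 in the last one):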
^([\p{L} ]+ \d{1,4}(?: ?[A-Za-z])?\b)
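^ : at the start of the string
[\p{L} ]+ : a character class of spaces or Unicode characters in the "letter" category, one or more times
\d{1,4} : a digit, 1 to 4 times
(?: ?[A-Za-z])? : a non-capturing group of an optional space and a single letter, 0 or 1 times
Capture group 1 is the entire address. I don't fully understand the logic behind your grouping, but you can regroup it however you like.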
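As a rough sketch of how this pattern might be used from Python 3: the standard-library re module does not support \p{L} (the third-party regex module does), so the version below substitutes [^\W\d_], which matches Unicode letters in re. That substitution, the function name trim_address, and the fallback of returning the input unchanged are assumptions made for illustration, not part of the answer.

```python
import re

# Assumption: [^\W\d_] stands in for \p{L}, which the standard-library
# re module does not support. Everything else follows the answer's pattern.
ADDRESS_RE = re.compile(r"^((?:[^\W\d_]| )+ \d{1,4}(?: ?[A-Za-z])?\b)")


def trim_address(address):
    """Return street name, number and optional letter (hypothetical helper).

    Falls back to the unchanged input when the pattern does not match.
    """
    match = ADDRESS_RE.match(address)
    return match.group(1) if match else address


samples = [
    "Car Cycle 5 B Ap 1233",
    "Potato street 13 1 AB",
    "Chrome Safari 41 Ap 765",
    "Highstreet 53 Ap 2632/BH",
    "Something street 91/Daniel",
]
for sample in samples:
    print(trim_address(sample))
```

The trailing \b is what keeps "Ap" from being mistaken for a street letter: after matching " A" the engine finds no word boundary before "p", so it backtracks and drops the optional group.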
Answer 1 (score: 0)
This works for the 5 samples you provided:
^([a-z]+\s+)*(\d*(?=\s))?(\s+[a-z])*\b
Turn on multiline mode and case-insensitive matching. If your regex flavor supports inline flags, that is (?im).
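A minimal sketch of applying this pattern in Python, assuming the two flags are passed as re.IGNORECASE and re.MULTILINE and each address is matched on its own. The helper name leading_address and the strip() call are illustrative additions, not part of the answer.

```python
import re

# The answer's pattern with both flags enabled (equivalent to the inline (?im)).
PATTERN = re.compile(
    r"^([a-z]+\s+)*(\d*(?=\s))?(\s+[a-z])*\b",
    re.IGNORECASE | re.MULTILINE,
)


def leading_address(line):
    """Return the matched prefix of one address line (hypothetical helper).

    strip() removes a trailing space the pattern can leave behind;
    the input is returned unchanged when nothing matches.
    """
    match = PATTERN.match(line)
    return match.group(0).strip() if match else line


for sample in ["Car Cycle 5 B Ap 1233", "Something street 91/Daniel"]:
    print(leading_address(sample))
```

In this sketch the last sample comes back as "Something street" without its number, because the lookahead (?=\s) requires whitespace directly after the digits and 91 is followed by a slash.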
Answer 2 (score: 0)
Maybe you would prefer a more readable Python version (without regular expressions):
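import string

names = [
    "Car Cycle 5 B Ap 1233",
    "Potato street 13 1 AB",
    "Chrome Safari 41 Ap 765",
    "Highstreet 53 Ap 2632/BH",
    "Something street 91/Daniel",
]

for name in names:
    result = []
    words = name.split()
    # Street name: leading words made up only of ASCII letters.
    while words and all(c in string.ascii_letters for c in words[0]):
        result += [words[0]]
        words = words[1:]
    # Street number: the next word, if it is all digits.
    if words and all(c in string.digits for c in words[0]):
        result += [words[0]]
        words = words[1:]
    # Street letter: a single uppercase letter (the length check rules out substrings such as "AB").
    if words and len(words[0]) == 1 and words[0] in string.ascii_uppercase:
        result += [words[0]]
        words = words[1:]
    print(" ".join(result))  # print() call, since the question uses Python 3.4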
Output:
Car Cycle 5 B
Potato street 13
Chrome Safari 41
Highstreet 53
Something street