Question

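I am trying to get the following as the output of the regular expression match below:

all the sectors, e.g. ['Sector-34, Noida', 'Sec 434 Gurgaon', 'sec100']

P.S. 'sec47,\n gurgaon' is a special case.

But the output I get is very strange: [('', 'tor')]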

import re

string = "Sector-34, Noida is found to be awesome place I went to eat burgers there and Sec 434 Gurgoan is also good sec100 is one the finest places for outing."

match =  re.findall(r"Sec(tor)?-?\d+\s+?\w+|Sec(tor)?\s+?\d+", string, re.IGNORECASE)

print(match)

Thanks in advance!

Answer 1

Here is one way to get the expected output, but it is not a general solution (because you have not given us the general conditions):

>>> re.findall(r'(?:[sS]ec(?:tor)?(?:-|\s+)?\d+\W?\s+[A-Z][a-z]+)|[sS]ec(?:tor)?\d+', string)
['Sector-34, Noida', 'Sec 434 Gurgoan', 'sec100']

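Notes:

Here I used \W (a non-word character) to match the , character in the first alternative. If you think other non-word characters might get matched by mistake, change \W to a literal ,.
We have 2 alternatives here: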
1. (?:[sS]ec(?:tor)?(?:-|\s+)?\d+\W?\s+[A-Z][a-z]+)
2. [sS]ec(?:tor)?\d+

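As you can see, in the second alternative I did not account for a word after the sector and its number. If you think a word may follow, you can append (?:\s+[A-Z][a-z]+)? to that alternative.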
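
To make the two alternatives easier to read, the same pattern can be spelled out with re.VERBOSE and a comment per piece. This is only a sketch: the name sector_re is chosen here for illustration, and the string is the one from the question.

```python
import re

# Same pattern as above, laid out with re.VERBOSE so each piece can be commented.
sector_re = re.compile(r"""
    (?:                     # alternative 1: sector, number and place name
        [sS]ec(?:tor)?      #   'Sec', 'sec', 'Sector' or 'sector'
        (?:-|\s+)?          #   optional hyphen or whitespace
        \d+                 #   the sector number
        \W?\s+              #   optional non-word character, then whitespace
        [A-Z][a-z]+         #   a capitalised word such as 'Noida'
    )
    |                       # alternative 2: sector and number only
    [sS]ec(?:tor)?\d+
""", re.VERBOSE)

string = "Sector-34, Noida is found to be awesome place I went to eat burgers there and Sec 434 Gurgoan is also good sec100 is one the finest places for outing."

result = sector_re.findall(string)
print(result)
# ['Sector-34, Noida', 'Sec 434 Gurgoan', 'sec100']
```

Because every group is non-capturing, findall returns the whole matched substrings rather than tuples of groups.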

Answer 2

You could go for:

import re

rx = re.compile(r'(\b[Ss]ec(?:tor)?[- ]?\d+\b[,\s]*\b\w+\b)')

string = """
Sector-34, Noida is found to be awesome place I went to eat burgers there and Sec 434 Gurgoan is also good sec47, 
gurgaon is one the finest places for outing.
"""

# Drop the embedded newline so each match is a single-line string.
sectors = [match.group(1).replace("\n", "")
           for match in rx.finditer(string)]
print(sectors)
# ['Sector-34, Noida', 'Sec 434 Gurgoan', 'sec47, gurgaon']

Otherwise, please provide more information or more sample sectors.
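
As for the odd tuples in the question: when a pattern contains capturing groups, re.findall returns the groups rather than the whole match. A minimal sketch of the difference, using a shortened sample text and simplified patterns that are assumptions made for illustration:

```python
import re

# Shortened sample text (an assumption, for illustration only).
text = "Sector-34 and sec100"

# With a capturing group, findall returns the group's content, not the match.
captured = re.findall(r"Sec(tor)?-?\d+", text, re.IGNORECASE)
print(captured)   # ['tor', '']

# With a non-capturing group (?:...), findall returns the full matches.
full = re.findall(r"Sec(?:tor)?-?\d+", text, re.IGNORECASE)
print(full)       # ['Sector-34', 'sec100']

# finditer with group(0) yields the full match even if groups are capturing.
via_iter = [m.group(0)
            for m in re.finditer(r"Sec(tor)?-?\d+", text, re.IGNORECASE)]
print(via_iter)   # ['Sector-34', 'sec100']
```

Either switching to (?:...) or iterating with finditer avoids the tuple output.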

Python regex to find all specific substrings
