Question

I have some sentences like these:

1:

"RLB shows Oubre Jr. (WAS) legally ties up Nurkic (POR), and a held ball is correctly called."

2:

"Nurkic (POR) maintains legal guarding position and makes incidental contact with Wall (WAS) that does not affect his driving shot attempt."

I need to use a Python regular expression to find the names "Oubre Jr." and "Nurkic" in sentence 1, and "Nurkic" and "Wall" in sentence 2.

p = r'\s*(\w+?)\s[(]'

With this pattern I can find ['Nurkic', 'Wall'] in sentence 2, but in sentence 1 I only find ['Nurkic'] and miss "Oubre Jr.".

Can anyone help me?
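A quick reproduction of the problem (a sketch; the variable names s1 and s2 are only for illustration). The pattern allows nothing but word characters between the optional leading whitespace and " (", so the period in "Jr." blocks the match entirely, and even without the period only the last word, "Jr", would be captured.

```python
import re

p = r'\s*(\w+?)\s[(]'

s1 = "RLB shows Oubre Jr. (WAS) legally ties up Nurkic (POR), and a held ball is correctly called."
s2 = ("Nurkic (POR) maintains legal guarding position and makes incidental contact "
      "with Wall (WAS) that does not affect his driving shot attempt.")

# Only a single run of word characters directly before " (" can be captured
print(re.findall(p, s1))  # ['Nurkic']: "Jr." fails because "." is not \w
print(re.findall(p, s2))  # ['Nurkic', 'Wall']

# Without the period the match succeeds, but it captures only the last word
print(re.findall(p, "Oubre Jr (WAS)"))  # ['Jr']
```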

Answer 1

Here is one approach:

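import re

line = "RLB shows Oubre Jr (WAS) legally ties up Nurkic (POR), and a held ball is correctly called."
# [\w']+ : one or more word characters or apostrophes after the capital letter
# (?: [JS]r\.?)? : optional " Jr" / " Sr", with or without a trailing period
# (?= \([A-Z]+\)) : must be followed by a space and an uppercase code in parentheses
results = re.findall(r"([A-Z][\w']+(?: [JS]r\.?)?)(?= \([A-Z]+\))", line)
print(results)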

['Oubre Jr', 'Nurkic']

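The logic above tries to match a name that starts with a capital letter, is optionally followed by the suffix Jr. or Sr., and is then followed by a space and an uppercase team code in parentheses, such as (WAS).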
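A minimal sketch wrapping this pattern in a helper (find_names and NAME_RE are hypothetical names) and running it on both sentences from the question. Here sentence 1 keeps the period after "Jr", and the optional \.? lets the same pattern return "Oubre Jr." including the period.

```python
import re

# Capitalized word, optional " Jr"/" Sr" (with optional "."), then a lookahead
# for a space and an uppercase team code in parentheses (not part of the match)
NAME_RE = re.compile(r"([A-Z][\w']+(?: [JS]r\.?)?)(?= \([A-Z]+\))")


def find_names(sentence):
    """Return the names that directly precede a "(TEAM)" tag."""
    return NAME_RE.findall(sentence)


s1 = "RLB shows Oubre Jr. (WAS) legally ties up Nurkic (POR), and a held ball is correctly called."
s2 = ("Nurkic (POR) maintains legal guarding position and makes incidental contact "
      "with Wall (WAS) that does not affect his driving shot attempt.")

print(find_names(s1))  # ['Oubre Jr.', 'Nurkic']
print(find_names(s2))  # ['Nurkic', 'Wall']
```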

Answer 2

You can use the following regular expression:

(?:[A-Z][a-z][\s\.a-z]*)+(?=\s\()

|-----Main Pattern-----|

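Details:

(?:) - creates a non-capturing group
[A-Z] - matches 1 uppercase letter
[a-z] - matches 1 lowercase letter
[\s\.a-z]* - matches whitespace, a period ('.') or a lowercase letter, 0 or more times
(?=\s\() - lookahead: the main pattern matches only if it is followed by the string ' ('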
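The same pattern can be written with re.VERBOSE so each component carries its comment from the list above (a sketch; NAME_RE and the second test sentence are hypothetical). It also shows a caveat worth knowing: because [\s.a-z]* spans lowercase words, a capitalized word earlier in the sentence can be pulled into the match.

```python
import re

# Same pattern in verbose mode: whitespace outside character classes is ignored
# and "#" starts a comment
NAME_RE = re.compile(r"""
    (?:              # non-capturing group, repeated one or more times
        [A-Z]        # one uppercase letter
        [a-z]        # one lowercase letter
        [\s.a-z]*    # whitespace, periods or lowercase letters, 0 or more times
    )+
    (?=\s\()         # only if followed by whitespace and an opening parenthesis
""", re.VERBOSE)

text = ("RLB shows Oubre Jr. (WAS) legally ties up Nurkic (POR), and a held ball is correctly called. "
        "Nurkic (POR) maintains legal guarding position and makes incidental contact "
        "with Wall (WAS) that does not affect his driving shot attempt.")
print(NAME_RE.findall(text))  # ['Oubre Jr.', 'Nurkic', 'Nurkic', 'Wall']

# Caveat (hypothetical sentence): the leading capitalized word is swallowed too
print(NAME_RE.findall("Replay shows Nurkic (POR) was set."))  # ['Replay shows Nurkic']
```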

import re

# "text" instead of "str" so the built-in str type is not shadowed
text = '''RLB shows Oubre Jr. (WAS) legally ties up Nurkic (POR), and a held ball is correctly called. 

Nurkic (POR) maintains legal guarding position and makes incidental contact with Wall (WAS) that does not affect his driving shot attempt.'''

res = re.findall(r'(?:[A-Z][a-z][\s\.a-z]*)+(?=\s\()', text)

print(res)

Demo: https://repl.it/@RahulVerma8/OvalRequiredAdvance?language=python3

Matches: https://regex101.com/r/OsLTrY/1

Answer 3

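You need a pattern that matches what comes right before each (XXX) in the sentence, plus a list of the "suffixes" a name may contain. You will have to collect those suffixes from your source data.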

import re

suffs = ["Jr."]  # append more suffixes to this list

# Optional suffix group (escaped so "." matches literally), then an optional space
rsu = r"(?:" + "|".join(re.escape(s) for s in suffs) + r")? ?"

# combine with suffixes: a word, a space, the optional suffix, then "(XXX)"
regex = r"(\w+ " + rsu + r")\(\w{3}\)"

test_str = "RLB shows Oubre Jr. (WAS) legally ties up Nurkic (POR), and a held ball is correctly called. Nurkic (POR) maintains legal guarding position and makes incidental contact with Wall (WAS) that does not affect his driving shot attempt."

# group(1) includes the space before "(", so strip it off
names = [match.group(1).strip() for match in re.finditer(regex, test_str)]

print(names)

Output:

['Oubre Jr.', 'Nurkic', 'Nurkic', 'Wall']

This should work as long as none of the names contain non-\w characters. If you need to adjust the regex, use https://regex101.com/r/pRr9ZU/1 as a starting point.
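To illustrate the non-\w limitation, here is a sketch on a hypothetical sentence: with \w+ an apostrophe or hyphen cuts the name short. Widening the name class to [\w'-]+ is one possible adjustment (an assumption, not part of the original answer); test it against your real data before relying on it.

```python
import re

# Hypothetical sentence containing an apostrophe and a hyphen in the names
test_str = "Shaquille O'Neal (LAL) fouls Gilgeous-Alexander (OKC)."

# This answer's pattern, with the suffix list reduced to "Jr."
regex = r"(\w+ (?:Jr\.)? ?)\(\w{3}\)"
print([m.group(1).strip() for m in re.finditer(regex, test_str)])
# ['Neal', 'Alexander']: "'" and "-" are not \w

# Variant: also allow apostrophes and hyphens inside the name
wider = r"([\w'-]+ (?:Jr\.)? ?)\(\w{3}\)"
print([m.group(1).strip() for m in re.finditer(wider, test_str)])
# ["O'Neal", 'Gilgeous-Alexander']
```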

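Explanation:

r"(?:"+"|".join(suffs)+")? ?" -> all items in the list suffs are joined with | (or) inside a non-capturing group (?:...), which is made optional and followed by an optional space. Suffixes that contain regex metacharacters, such as the "." in "Jr.", should be passed through re.escape so they match literally.
r"(\w+ "+rsu+")\(\w{3}\)" -> the regex looks for one or more word characters and a space, then the optional suffix group built above, then a literal (, then exactly three word characters, then a literal ).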
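A sketch of how the two string pieces expand when the suffix list grows (build_name_regex, the suffix list and the sentence with its names and team codes are all hypothetical). Printing the built pattern is a convenient way to check what actually reaches the regex engine.

```python
import re


def build_name_regex(suffixes):
    """Hypothetical helper: assemble the pattern from a list of suffixes."""
    # re.escape turns "Jr." into "Jr\." so the period is literal
    rsu = r"(?:" + "|".join(re.escape(s) for s in suffixes) + r")? ?"
    return r"(\w+ " + rsu + r")\(\w{3}\)"


regex = build_name_regex(["Jr.", "Sr.", "III"])
print(regex)  # (\w+ (?:Jr\.|Sr\.|III)? ?)\(\w{3}\)

# Hypothetical sentence with made-up names and three-letter codes
text = "Smith Sr. (ABC) fouls Jones III (XYZ)."
print([m.group(1).strip() for m in re.finditer(regex, text)])  # ['Smith Sr.', 'Jones III']
```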

Regex to find names in a sentence
