Given a text, I want to find the words that occur right before the word "unknown".
import re
text = "the women marathon unknown introduced at the summer olympics los angeles usa and unknown won"
items = re.finditer('unknown', text)  # there are 2 occurrences of "unknown"
for i in items:
    print(i.start())  # start index of each "unknown"
The output is:
19
81
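Now, how do I extract the two words that occur before each "unknown" separately?
For the first "unknown" I should get "women" and "marathon".
For the second "unknown" I should get "usa" and "and".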
Answer 0 (score: 1)
This expression may be close to what is needed here:
([\s\S]*?)(\bunknown\b)
import re
regex = r"([\s\S]*?)(unknown)"
test_str = "the women marathon unknown introduced at the summer olympics los angeles usa and unknown won"
print(re.findall(regex, test_str, re.MULTILINE))
import re
regex = r"([\s\S]*?)(unknown)"
test_str = "the women marathon unknown introduced at the summer olympics los angeles usa and unknown won"
matches = re.finditer(regex, test_str, re.MULTILINE)
for matchNum, match in enumerate(matches, start=1):
    # Report the span and text of the whole match
    print("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum = matchNum, start = match.start(), end = match.end(), match = match.group()))
    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1  # capture groups are numbered from 1
        print("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = match.start(groupNum), end = match.end(groupNum), group = match.group(groupNum)))
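The expression is explained in the top-right panel of this demo, if you wish to explore, simplify, or modify it, and in this link you can watch step by step how it matches against some sample inputs, if you like.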
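The expression above captures everything before each "unknown", not just the two preceding words. As a minimal sketch (not part of the original answer), the captured chunk can be split on whitespace and trimmed to its last two words. The names `pattern` and `preceding` are only illustrative, and the `\b` anchors follow the first form of the expression shown above.

```python
import re

text = "the women marathon unknown introduced at the summer olympics los angeles usa and unknown won"

# Group 1 lazily captures the text between the previous match (or the
# start of the string) and the next whole-word "unknown".
pattern = re.compile(r"([\s\S]*?)\bunknown\b")

# Keep only the last two whitespace-separated words of each captured chunk.
preceding = [m.group(1).split()[-2:] for m in pattern.finditer(text)]
print(preceding)
```

If fewer than two words precede an occurrence, the slice simply returns whatever words are available.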
Answer 1 (score: 1)
A short approach:
import re
text = "the women marathon unknown introduced at the summer olympics los angeles usa and unknown won"
# Two "word + whitespace" pairs, followed (via lookahead) by "unknown"
matches = re.finditer(r'(\S+\s+){2}(?=unknown)', text)
for m in matches:
    print(m.group())
Output:
women marathon
usa and
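The same lookahead idea can be wrapped in a small helper so that the target word and the number of preceding words become parameters. This is only a sketch: the function name `words_before_regex` and its parameters `target` and `n` are hypothetical, and the trailing `\b` is an added assumption so that words merely starting with the target are not matched.

```python
import re

def words_before_regex(text, target, n=2):
    """Return, for each occurrence of `target`, the n words right before it."""
    # (?:\S+\s+){n} matches n words, each followed by whitespace;
    # the lookahead requires `target` to follow, ending at a word boundary.
    pattern = re.compile(r"(?:\S+\s+){%d}(?=%s\b)" % (n, re.escape(target)))
    # split() drops the trailing whitespace and yields the words as a list.
    return [m.group().split() for m in pattern.finditer(text)]

text = "the women marathon unknown introduced at the summer olympics los angeles usa and unknown won"
print(words_before_regex(text, "unknown"))
print(words_before_regex(text, "unknown", n=1))
```

Note that an occurrence with fewer than `n` words in front of it produces no match at all with this pattern.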
Answer 2 (score: 1)
A version without re, using itertools.groupby (doc):
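from itertools import groupby
text = "the women marathon unknown introduced at the summer olympics los angeles usa and unknown won"
# Group consecutive words by whether they equal 'unknown'
for v, g in groupby(text.split(), lambda k: k == 'unknown'):
    if v:
        continue  # skip the groups made of 'unknown' itself
    l = [*g]
    if len(l) > 1:
        print(l[-2:])  # last two words of the run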
Prints:
['women', 'marathon']
['usa', 'and']
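One caveat about the groupby version: it also prints the last two words of a trailing run that is not followed by "unknown", and it skips runs of fewer than two words. It happens to give the expected result here only because the final run is the single word "won". As a rough alternative sketch, also without re, the word list can be indexed directly. The function name `words_before_split` and the parameters `target` and `n` are hypothetical, chosen only for this illustration.

```python
def words_before_split(text, target, n=2):
    """Return, for each word equal to `target`, up to n words right before it."""
    words = text.split()
    # For every position i holding the target, take the slice of up to n
    # words that ends just before i (clamped at the start of the list).
    return [words[max(i - n, 0):i] for i, w in enumerate(words) if w == target]

text = "the women marathon unknown introduced at the summer olympics los angeles usa and unknown won"
for pair in words_before_split(text, "unknown"):
    print(pair)
```

Unlike the groupby version, this sketch only reports words that are actually followed by the target, and it returns a shorter (possibly empty) list when fewer than `n` words precede it.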