Suppose I have the string
'apples are red. this apple is green. pears are sometimes red, but not usually. pears are green. apples are yummy. lizards are green.'
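I want to use a regular expression to pull out the sentences in that string that first mention apples or pears and then a color, red or green. So basically I want this list returned:
["apples are red.", "this apple is green.", "pears are sometimes red, but not usually.", "pears are green."]
I can write a regex for apples and pears, or one for green and red, separately, e.g.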
re.findall(r'([^.]*?apple[^.]*|[^.]*?pear[^.]*)', string)
and
re.findall(r'([^.]*?red[^.]*|[^.]*?green[^.]*)', string)
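But how do I put the two together when I want the fruit (apple/pear) to appear first in the sentence, followed by a color at some later point in the same sentence?
Answer 0: (score: 0)
You can use parentheses to group the alternatives as subexpressions, here as non-capturing groups (?:...):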
re.findall(r"[^.]*\b(?:apple|pear)[^.]*\b(?:red|green)\b[^.]*\.", string)
For example:
>>> import re
>>> a = 'apples are red. this apple is green. pears are sometimes red, but not usually. pears are green. apples are yummy. lizards are green.'
>>> re.findall(r"[^.]*\b(?:apple|pear)[^.]*\b(?:red|green)\b[^.]*\.", a)
['apples are red.', ' this apple is green.',
' pears are sometimes red, but not usually.', ' pears are green.']
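Note that every match after the first keeps the space that follows the previous period, because [^.]* also consumes whitespace. If the list from the question (without leading spaces) is what you want, one possible sketch is to strip each match afterwards. The names text, pattern and sentences are only illustrative:

```python
import re

text = ("apples are red. this apple is green. "
        "pears are sometimes red, but not usually. pears are green. "
        "apples are yummy. lizards are green.")

# Same pattern as above: a fruit first, then a color, all within one sentence.
pattern = re.compile(r"[^.]*\b(?:apple|pear)[^.]*\b(?:red|green)\b[^.]*\.")

# Strip the whitespace left over from the gap between sentences.
sentences = [match.strip() for match in pattern.findall(text)]
print(sentences)
```

Since [^.] cannot cross a period, the fruit and the color have to occur in the same sentence, which is why "apples are yummy." and "lizards are green." are skipped.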
Answer 1: (score: 0)
Use this pattern: (?:^|\b)(?=[^.]*(?:apple|pear)[^.]*(?:red|green))([^.]+\.)
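As a rough sketch of how this pattern could be used from Python (the variable names are assumptions, not part of the answer): the lookahead checks that a fruit and then a color occur before the next period, and the capture group returns the sentence itself. Because the pattern contains exactly one capture group, re.findall returns the contents of that group.

```python
import re

text = ("apples are red. this apple is green. "
        "pears are sometimes red, but not usually. pears are green. "
        "apples are yummy. lizards are green.")

# (?:^|\b) anchors each match at the start of the string or at a word boundary,
# so the captured sentence begins at its first word rather than at a space.
lookahead_pattern = r"(?:^|\b)(?=[^.]*(?:apple|pear)[^.]*(?:red|green))([^.]+\.)"

matches = re.findall(lookahead_pattern, text)
print(matches)
```

One caveat: unlike the pattern in answer 0, the keywords here are not wrapped in \b, so "red" would also match inside a longer word such as "shred". Adding \b around the alternations is one way to tighten that.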
Answer 2: (score: 0)
I suggest you read up on NLTK (the Natural Language Toolkit). It is a Python package for text processing.