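I am trying to figure out how to find the sentence that contains a certain word. Say the word is "wow"; then feeding in the following four strings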
\nOkay hold on. This is pretty wow in here. Okay.\n
\nThis is super wow. Doesn't get much more wow than that.\n
\nHold up. wow.\n
\nOkay wow. Just wow!\n
should produce the following, respectively:
This is pretty wow in here
This is super wow.
wow.
Okay wow.
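I am doing this in Python 3, so I could write if statements, but that gets messy and I would like to avoid it. Here is my code, which works in some cases but has started to fail in others. Maybe I am just bad at regex and have overcomplicated this.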
import re

# name and node are defined elsewhere in my program.
# Try the strictest pattern first, then fall back to looser ones.
m = re.search(r'(?:(\.\s[A-Z]))(?=(.*)' + name + r'([^a-z^A-Z]))([^.]*)(\.\s[A-Z])', node.getIntroText())
if m is None:
    m = re.search(r'(?:(\.\s[A-Z]))(?=(.*)' + name + r'([^a-z^A-Z]))(.*)(\.\s[A-Z])', node.getIntroText())
    if m is None:
        m = re.search(r'(?:([\r\n]))(?=(.*)' + name + r'([^a-z^A-Z]))([^.]*)(\.\s[A-Z])', node.getIntroText())
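Basically, I want to capture from the nearest period or newline before name up to the next period that is followed by either something other than a space and a letter, or a newline.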
Answer 0 (score: 1)
Converting my comment to an answer. You can use this regex:
>>> reg = re.compile(r"^(?:(?:(?!\bwow\b)[^.\n])*\. +)*((?:[a-z][^.\n]*?)?\bwow\b[^.\n]*)(?=\.)", re.MULTILINE | re.IGNORECASE)
>>> test_str = ("\n"
... "Okay hold on. This is pretty wow in here. Okay.\n\n"
... "This is super wow. Doesn't get much more wow than that.\n\n"
... "Hold up. wow.\n\n"
... "Okay wow. Just Wow!\n")
>>> print ( reg.findall(test_str) )
['This is pretty wow in here', 'This is super wow', 'wow', 'Okay wow']
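If the search word is not fixed, the same pattern can be built around a variable. The following is only a sketch: the function name first_sentence_with is made up for illustration, and it assumes the word should be matched literally, hence the re.escape call.

```python
import re

def first_sentence_with(word, text):
    """For each line of text, return the first sentence containing word."""
    w = re.escape(word)  # treat the word literally, not as a regex
    pattern = re.compile(
        # skip leading sentences on the line that do not contain the word
        r"^(?:(?:(?!\b" + w + r"\b)[^.\n])*\. +)*"
        # capture the sentence containing the word, up to (not including) the dot
        r"((?:[a-z][^.\n]*?)?\b" + w + r"\b[^.\n]*)(?=\.)",
        re.MULTILINE | re.IGNORECASE,
    )
    return pattern.findall(text)
```

Calling first_sentence_with("wow", test_str) on the test string above should give the same list as reg.findall(test_str). Note that, like the original pattern, it only matches sentences that end in a period.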
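Regex explanation:
^ : start of a line
(?:(?:(?!\bwow\b)[^.\n])*\. +)* : match 0 or more sentences that do not contain wow
((?:[a-z][^.\n]*?)?\bwow\b[^.\n]*) : match the sentence that contains the word wow
(?=\.) : assert that there is a dot at the next position
re.MULTILINE | re.IGNORECASE : enable multiline mode and case-insensitive matching
Answer 1 (score: 1)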
Let re.sub() make life simple:
wowSentence = re.sub(r'.*?(?:^|\. *)([^.]*\bwow\b[^.]*).*', r'\1', paragraph)
See the live demo.
Add (?i) to the front of the regex to match case-insensitively.
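Put together as a small helper, this could look roughly like the sketch below. The name wow_sentence and the word parameter are illustrative additions, not part of the answer above, and it assumes paragraph is a single line, since . does not match newlines by default.

```python
import re

def wow_sentence(paragraph, word="wow"):
    """Reduce a one-line paragraph to its first sentence containing word."""
    # (?i) at the very start makes the whole pattern case-insensitive.
    pattern = r"(?i).*?(?:^|\. *)([^.]*\b" + re.escape(word) + r"\b[^.]*).*"
    # Replace the whole match with capture group 1 (the sentence itself).
    return re.sub(pattern, r"\1", paragraph)
```

For example, wow_sentence("Hold up. wow.") returns "wow". Keep in mind that if the word does not occur at all, re.sub leaves the paragraph unchanged, so you may want to check for that case separately.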