Regex to find the whole sentence that contains a word

Time: 2018-08-03 18:14:08

Tags: regex python-3.x

I'm trying to figure out how to find the sentence that contains a certain word. Say the word is "wow"; then the following four input strings

\nOkay hold on. This is pretty wow in here. Okay.\n

\nThis is super wow. Doesn't get much more wow than that.\n

\nHold up. wow.\n

\nOkay wow. Just wow!\n

would produce the following, respectively:

This is pretty wow in here

This is super wow.

wow.

Okay wow.

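I'm doing this in Python 3, so I could write a chain of if statements, but that gets messy and I'd like to avoid it. Here is my code, which worked for some cases but has started to fail. Maybe I'm just bad at regex and am overcomplicating this.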

    m = re.search('(?:(\.\s[A-Z]))(?=(.*)' + name+ '([^a-z^A-Z]))([^.]*)(\.\s[A-Z])', node.getIntroText())
    if m == None:
        m = re.search('(?:(\.\s[A-Z]))(?=(.*)' + name+ '([^a-z^A-Z]))(.*)(\.\s[A-Z])', node.getIntroText())
    if m == None:
        m = re.search('(?:([\r\n]))(?=(.*)' + name+ '([^a-z^A-Z]))([^.]*)(\.\s[A-Z])', node.getIntroText())

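Basically, I want to capture everything from the first period or newline before `name` up to the next period that is followed by either (a space and something other than a letter) or a newline.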

2 Answers:

Answer 0: (score: 1)

Converting my comment to an answer. You can use this regex:

>>> import re
>>> reg = re.compile(r"^(?:(?:(?!\bwow\b)[^.\n])*\. +)*((?:[a-z][^.\n]*?)?\bwow\b[^.\n]*)(?=\.)", re.MULTILINE | re.IGNORECASE)
>>> test_str = ("\n"
...     "Okay hold on. This is pretty wow in here. Okay.\n\n"
...     "This is super wow. Doesn't get much more wow than that.\n\n"
...     "Hold up. wow.\n\n"
...     "Okay wow. Just Wow!\n")
>>> print(reg.findall(test_str))
['This is pretty wow in here', 'This is super wow', 'wow', 'Okay wow']

RegEx Demo

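RegEx explanation:

  • ^: start of a line
  • (?:(?:(?!\bwow\b)[^.\n])*\. +)*: matches 0 or more leading sentences that do not contain wow.
  • ((?:[a-z][^.\n]*?)?\bwow\b[^.\n]*): matches the sentence that contains the word wow and captures it in group 1
  • (?=\.): asserts that the next character is a period
  • The flags re.MULTILINE | re.IGNORECASE make ^ match at the start of every line and make the match case-insensitive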
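
The question builds its pattern from a variable (`name`) rather than a fixed word, so here is a minimal sketch of the same regex parameterized on the word. The helper name `first_sentence_with` is hypothetical, and using `re.escape` and `re.VERBOSE` is my own choice, not part of the answer. Because whitespace is ignored in verbose mode, the literal space in `\. +` is written as `[ ]+`.

```python
import re

def first_sentence_with(word, text):
    """For each line of text, return the first sentence containing word."""
    w = re.escape(word)  # treat any regex metacharacters in the word literally
    pattern = re.compile(
        rf"""
        ^                                    # start of a line
        (?:(?:(?!\b{w}\b)[^.\n])*\.[ ]+)*    # skip leading sentences without the word
        (                                    # group 1: the sentence we want
            (?:[a-z][^.\n]*?)?               #   optional text before the word
            \b{w}\b                          #   the word itself, as a whole word
            [^.\n]*                          #   the rest of the sentence
        )
        (?=\.)                               # the sentence must end with a period
        """,
        re.MULTILINE | re.IGNORECASE | re.VERBOSE,
    )
    return pattern.findall(text)

if __name__ == "__main__":
    sample = (
        "Okay hold on. This is pretty wow in here. Okay.\n"
        "Hold up. wow.\n"
    )
    print(first_sentence_with("wow", sample))
```

Like the original pattern, this returns at most one sentence per line, and only sentences that end in a period.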

Answer 1: (score: 1)

Let re.sub() make life easy:

    wowSentence = re.sub(r'.*?(?:^|\. *)([^.]*\bwow\b[^.]*).*', r'\1', paragraph)

See the live demo.

Add (?i) to the front of the regex to match case-insensitively.
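
As a runnable sketch of this answer's substitution approach in Python: the replacement is done with `re.sub`, the group reference is written `\1` (Python does not use `$1`), and the pattern is a raw string so that `\b` means a word boundary rather than a backspace. The helper name `wow_sentence` and the call to `strip()` are assumptions added for illustration.

```python
import re

# (?i) at the front makes the whole pattern case-insensitive.
PATTERN = re.compile(r"(?i).*?(?:^|\. *)([^.]*\bwow\b[^.]*).*")

def wow_sentence(paragraph):
    """Replace the paragraph with its first sentence containing 'wow'."""
    # strip() drops the surrounding newlines, which '.' would not match.
    return PATTERN.sub(r"\1", paragraph.strip())

if __name__ == "__main__":
    samples = [
        "\nOkay hold on. This is pretty wow in here. Okay.\n",
        "\nThis is super wow. Doesn't get much more wow than that.\n",
        "\nHold up. wow.\n",
        "\nOkay wow. Just wow!\n",
    ]
    for s in samples:
        print(repr(wow_sentence(s)))
```

If the paragraph has no sentence containing the word, `re.sub` finds no match and returns the paragraph unchanged, so callers may want to check for that case separately.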