How do I write a regular expression for the following cases?

Time: 2019-05-06 10:48:13

Tags: regex python-3.x

I would like to write a regular expression that satisfies all of the examples below

without using ^

[^ i|like|to|drink][\w+](\s*)(\w*)

Regular expression: ?

Example 1:

sentence = i like to drink black tea
output = black tea

Example 2:

sentence: drink tea
output : tea

Example 3:

sentence = drink pomegranate juice
output = pomegranate juice

3 answers:

Answer 0: (score: 0)

Try the pattern (?<=\bdrink\b)\s*(.*$), which uses a lookbehind.

For example:

import re

data = ["i like to drink black tea", "drink tea", "drink pomegranate juice"]
for sentence in data:
    m = re.search(r"(?<=\bdrink\b)\s*(.*$)", sentence)
    if m:
        print(m.group(1))

Output:

black tea
tea
pomegranate juice
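A note on why the \s* sits outside the lookbehind: Python's re module only accepts lookbehinds that match a fixed number of characters. The sketch below checks this with a small helper; the helper name `compiles` and the second pattern are made up for illustration and are not part of the answer above.

```python
import re

def compiles(pattern):
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# The pattern from the answer: the lookbehind always spans exactly "drink"
fixed_width = r"(?<=\bdrink\b)\s*(.*$)"

# Hypothetical variant: \s* inside the lookbehind makes its width variable
variable_width = r"(?<=\bdrink\s*)(.*$)"

print(compiles(fixed_width))     # True
print(compiles(variable_width))  # False
```

Keeping the optional whitespace outside the lookbehind, as the answer does, avoids the error.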

Answer 1: (score: 0)

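Your pattern [^ i|like|to|drink][\w+](\s*)(\w*) starts with a negated character class, which matches any single character that is not listed in the class. Likewise, [\w+] is a character class that matches one word character or a literal +.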

I think you meant to use a grouping construct with alternation (the | operator), but that would not give you the matches you want either.
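To see what the negated character class actually does, here is a small sketch that runs the question's pattern against the three example sentences. The variable names are only for illustration.

```python
import re

# The pattern from the question
pattern = r"[^ i|like|to|drink][\w+](\s*)(\w*)"

# The class excludes individual characters (space, i, |, l, k, e, t, o, d, r, n),
# not the words "i", "like", "to" and "drink"
allowed_chars = re.findall(r"[^ i|like|to|drink]", "drink tea")
print(allowed_chars)  # ['a']

sentences = ["i like to drink black tea", "drink tea", "drink pomegranate juice"]
results = {}
for sentence in sentences:
    m = re.search(pattern, sentence)
    results[sentence] = m.group(0) if m else None
    print(repr(sentence), "->", results[sentence])
```

The pattern stops at the first word that happens to start with an allowed character ("black", "pomegranate") and finds nothing at all in "drink tea", because the only allowed character there is the final "a" and nothing follows it.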

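It seems that you want whatever comes after drink. In that case you don't need a lookbehind; a single capturing group is enough, and your value is in the first group: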

\bdrink\b\s+(.*)$

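A minimal sketch of how the capturing-group pattern could be used from Python; the helper name `after_drink` is hypothetical.

```python
import re

pattern = re.compile(r"\bdrink\b\s+(.*)$")

def after_drink(sentence):
    """Return the text following the word 'drink', or None if it is absent."""
    m = pattern.search(sentence)
    return m.group(1) if m else None

for sentence in ["i like to drink black tea", "drink tea",
                 "drink pomegranate juice", "testdrinktest"]:
    print(after_drink(sentence))
# black tea
# tea
# pomegranate juice
# None
```

The last sentence gives None because the word boundaries \b keep "drink" from matching inside a longer word.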

If more than one word can come before the part you want to match, you can use an alternation:

\b(?:drink|have)\b\s+(.*)$

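Here is a short sketch of the alternation pattern in use. The sentences containing "have" are invented for illustration, since the question only gives examples with "drink".

```python
import re

pattern = re.compile(r"\b(?:drink|have)\b\s+(.*)$")

def after_keyword(sentence):
    """Return the text after the first 'drink' or 'have', or None."""
    m = pattern.search(sentence)
    return m.group(1) if m else None

print(after_keyword("i like to drink black tea"))       # black tea
print(after_keyword("i would like to have green tea"))  # green tea
# The leftmost keyword wins, which may or may not be what you want:
print(after_keyword("i have to drink tea"))             # to drink tea
```

The non-capturing group (?:...) keeps the captured value in group 1, so the calling code stays the same as with the single-word pattern.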

Another option is to split on the word "drink" between word boundaries \b:

import re
strings = ["i like to drink black tea", "drink tea", "drink pomegranate juice", "testdrinktest"]
for s in strings:  # avoid shadowing the built-in str
    parts = re.split(r"\bdrink\b", s)
    if len(parts) > 1:
        print(parts[1])

Result:

 black tea
 tea
 pomegranate juice
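Note that each result of the split keeps the space that followed "drink". A possible variation, sketched below, strips that whitespace and splits only once so that a later "drink" in the same sentence is left alone. The helper name `rest_after_drink` and the last test sentence are assumptions for illustration.

```python
import re

def rest_after_drink(sentence):
    """Split once on the whole word 'drink' and return the trimmed remainder."""
    parts = re.split(r"\bdrink\b", sentence, maxsplit=1)
    if len(parts) > 1:
        return parts[1].strip()
    return None  # 'drink' does not occur as a whole word

print(rest_after_drink("i like to drink black tea"))   # black tea
print(rest_after_drink("testdrinktest"))               # None
print(rest_after_drink("drink tea and drink juice"))   # tea and drink juice
```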

Answer 2: (score: 0)

This expression may help you by simply creating two capturing groups:

(drink)\s(.+)
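One thing to keep in mind with this expression: it has no word boundaries and \s matches exactly one whitespace character. The sketch below compares it with a stricter variant; the stricter pattern and the two test strings are assumptions added for illustration.

```python
import re

loose = re.compile(r"(drink)\s(.+)")        # the expression from this answer
strict = re.compile(r"\bdrink\b\s+(.+)")    # hypothetical stricter variant

# Without \b, "drink" also matches at the end of a longer word
m_loose = loose.search("testdrink tea")
m_strict = strict.search("testdrink tea")
print(m_loose.group(2))  # tea
print(m_strict)          # None

# With a single \s, extra spaces end up in the captured group
spaced_loose = loose.search("drink  tea").group(2)
spaced_strict = strict.search("drink  tea").group(1)
print(repr(spaced_loose))   # ' tea'
print(repr(spaced_strict))  # 'tea'
```

For the three sentences in the question both patterns capture the same text, so the difference only matters for inputs like the ones above.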

The original answer included a graph showing how the expression works, along with a link for visualizing other expressions.

Code

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"(drink)\s(.+)"

test_str = ("i like to drink black tea\n"
    "drink tea\n"
    "drink pomegranate juice\n")

matches = re.finditer(regex, test_str, re.MULTILINE)

for matchNum, match in enumerate(matches, start=1):

    print("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum=matchNum, start=match.start(), end=match.end(), match=match.group()))

    # Groups are numbered from 1; group 0 is the whole match
    for groupNum in range(1, len(match.groups()) + 1):
        print("Group {groupNum} found at {start}-{end}: {group}".format(groupNum=groupNum, start=match.start(groupNum), end=match.end(groupNum), group=match.group(groupNum)))

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.

Output

Match 1 was found at 10-25: drink black tea
Group 1 found at 10-15: drink
Group 2 found at 16-25: black tea
Match 2 was found at 26-35: drink tea
Group 1 found at 26-31: drink
Group 2 found at 32-35: tea
Match 3 was found at 36-59: drink pomegranate juice
Group 1 found at 36-41: drink
Group 2 found at 42-59: pomegranate juice