Unable to get a customized result using a regular expression

Time: 2019-07-07 15:03:50

Tags: python regex python-3.x

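I'm trying to figure out how to modify my existing regex pattern, or create a new one, to get all the lines that contain dear. For each matching line, the script should print everything from after the : to the end of that line. String manipulation is not an option for getting the result here.

I've tried: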

import re

instr = """
Expression: It's been a while man.
Expression: How have you been moron?
Expression: Good to see you dear.
Greeting: How is everything dear?
Greeting: Hi dear, how are you?
"""
pattern = r'.*(?<=dear)'

for item in instr.splitlines():
    if re.search(pattern, item):
        print(item)

The result I'm getting:

Expression: Good to see you dear.
Greeting: How is everything dear?
Greeting: Hi dear, how are you?

The result I'd like to get:

Good to see you dear.
How is everything dear?
Hi dear, how are you?

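How can I get the customized result using a regular expression?

4 answers:

Answer 0 (score: 2)

You can use a positive lookbehind to capture only what comes after the colon. Something like this should work: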

(?<=:).*\bdear\b.*

Demo

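I used the word boundary assertion \b to avoid matching words that merely contain dear, such as dearest. If that's not the behavior you want, feel free to remove them.

Answer 1 (score: 2)

Another option could be to use the anchor ^ and a capture group: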
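As a minimal sketch of how this pattern behaves: because the lookbehind only skips the colon itself, the match keeps the space that follows it. The second pattern below is an assumed variant, not part of the answer, that moves one space into the lookbehind; Python lookbehinds must be fixed-width, so it only fits input with exactly one space after the colon. The helper name extract is hypothetical.

```python
import re

# Pattern from the answer: everything after the colon, if the line contains the word "dear".
WITH_SPACE = re.compile(r'(?<=:).*\bdear\b.*')
# Assumed variant: the lookbehind also covers exactly one space after the colon.
NO_SPACE = re.compile(r'(?<=: ).*\bdear\b.*')

def extract(pattern, line):
    """Return the matched text, or None if the line does not match (hypothetical helper)."""
    m = pattern.search(line)
    return m.group(0) if m else None

line = "Expression: Good to see you dear."
print(repr(extract(WITH_SPACE, line)))  # keeps the leading space
print(repr(extract(NO_SPACE, line)))    # no leading space
# \b prevents a match on words that only contain "dear"
print(extract(NO_SPACE, "Expression: You are dearest to me."))
```

If the amount of whitespace after the colon varies, a capture group (as in the other answers) is probably the simpler route.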

^[^:]*:\s*(.*\bdear\b.*)

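Explanation

  • ^ Start of the string
  • [^:]*: Match 0+ times any character except :, then match :
  • \s* Match 0+ whitespace characters
  • ( Capture group
    • .*\bdear\b.* Match dear between word boundaries, with any characters to its left and right
  • ) Close the capture group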

Regex demo | Python demo

For example:

import re

instr = """
Expression: It's been a while man.
Expression: How have you been moron?
Expression: Good to see you dear.
Greeting: How is everything dear?
Greeting: Hi dear, how are you?
"""
pattern = r'^[^:]*:\s*(.*\bdear\b.*)'

for item in instr.splitlines():
    res = re.search(pattern, item)
    if res:
        print(res.group(1))

Result

Good to see you dear.
How is everything dear?
Hi dear, how are you?
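The breakdown of this pattern can also be kept next to the pattern itself. The sketch below rewrites the same expression with re.VERBOSE so each piece carries its own comment; the name PATTERN and the sample lines are only illustrative.

```python
import re

# Same expression as above, spread out with re.VERBOSE so each part is annotated.
PATTERN = re.compile(r"""
    ^                # start of the string
    [^:]*:           # 0+ characters other than a colon, then the colon itself
    \s*              # 0+ whitespace characters
    (                # open capture group 1
      .*\bdear\b.*   # the word dear, with any characters on either side
    )                # close capture group 1
""", re.VERBOSE)

for line in ["Greeting: Hi dear, how are you?", "Expression: It's been a while man."]:
    m = PATTERN.search(line)
    print(m.group(1) if m else None)
```

In verbose mode, whitespace in the pattern is ignored outside character classes, so the layout does not change what is matched.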

Answer 2 (score: 1)

>>> for m in re.finditer(r'^[^:]+:\s*(.*dear.*)', instr, flags=re.M):
...     print(m[1])
... 
Good to see you dear.
How is everything dear?
Hi dear, how are you?
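  • re.finditer iterates over all the matches
  • flags=re.M makes the ^ and $ anchors match at every line, rather than only at the start and end of the whole string
  • ^[^:]+:\s* covers the text from the start of the line up to the :, plus any optional whitespace
  • (.*dear.*) matches the rest of the line if it contains dear (note that . does not match a newline by default)
  • Since the wanted string is inside the capture group, m[1] gives only that part rather than the whole line
    • If your Python version is below 3.6, use m.group(1)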
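To see what flags=re.M contributes, here is a small sketch that runs the same pattern with and without the flag. The variable names and the shortened sample text are illustrative assumptions.

```python
import re

text = (
    "Expression: It's been a while man.\n"
    "Expression: Good to see you dear.\n"
    "Greeting: Hi dear, how are you?\n"
)
pattern = r'^[^:]+:\s*(.*dear.*)'

# Without re.M, ^ only matches at the very start of the string,
# so only the first line is ever tried, and it does not contain dear.
without_flag = re.findall(pattern, text)

# With re.M, ^ also matches right after each newline.
with_flag = re.findall(pattern, text, flags=re.M)

print(without_flag)
print(with_flag)
```

One caveat: [^:]+ can also match newline characters, so on input that contains lines without a colon the prefix may span more than one line.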

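Answer 3 (score: 1)

This expression

(?=:.*\bdear\b):\s*(.*)

might work here.

The expression is explained in the top-right panel of this demo, in case you want to explore or modify it further, and in this link you can watch step by step how it matches against some sample inputs, if you like.

Test with re.findall
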
import re

regex = r"(?=:.*\bdear\b):\s*(.*)"

test_str = ("Expression: It's been a while man.\n"
    "Expression: How have you been moron?\n"
    "Expression: Good to see you dear.\n"
    "Greeting:      How is everything dear?\n"
    "Greeting: Hi dear, how are you?\n"
    "Greeting:   Hi dear, how are you?\n"
    "dear: Hi there, how are you?")

print(re.findall(regex, test_str))

Test with re.finditer

import re

regex = r"(?=:.*\bdear\b):\s*(.*)"

test_str = ("Expression: It's been a while man.\n"
    "Expression: How have you been moron?\n"
    "Expression: Good to see you dear.\n"
    "Greeting:      How is everything dear?\n"
    "Greeting: Hi dear, how are you?\n"
    "Greeting:   Hi dear, how are you?\n"
    "dear: Hi there, how are you?")

matches = re.finditer(regex, test_str, re.MULTILINE)

for match_num, match in enumerate(matches, start=1):
    # Report the full match together with its span in test_str.
    print("Match {num} was found at {start}-{end}: {text}".format(
        num=match_num, start=match.start(), end=match.end(), text=match.group()))

    # Report each capture group; groups are numbered from 1.
    for group_num in range(1, len(match.groups()) + 1):
        print("Group {num} found at {start}-{end}: {text}".format(
            num=group_num, start=match.start(group_num),
            end=match.end(group_num), text=match.group(group_num)))
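To make the role of the lookahead a bit more concrete, the sketch below applies the same expression to single lines. The lookahead is checked at the colon, so dear has to appear after the colon for the line to match. The helper name text_after_colon is hypothetical.

```python
import re

regex = re.compile(r"(?=:.*\bdear\b):\s*(.*)")

def text_after_colon(line):
    """Return group 1 if the line matches, otherwise None (hypothetical helper)."""
    m = regex.search(line)
    return m.group(1) if m else None

# dear appears after the colon: the extra spaces are consumed by \s*.
print(text_after_colon("Greeting:      How is everything dear?"))
# dear appears only before the colon: the lookahead fails, so there is no match.
print(text_after_colon("dear: Hi there, how are you?"))
```

This is why the last line of test_str above is not expected to show up in the results.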