Question

I have a text that contains many sentence fragments and complete sentences. Every line has the following format:

Text=Some aliens try to run into the fields
Text=Some aliens try
Text=Some aliens try to run

I want to match exactly the text held in the QUERY variable (in this example, 'Some aliens try').

I am using the following code:

my_query_reg = re.compile("".join(['Text=', QUERY, '$']))
my_query_reg.findall(TEXT)

However, my regular expression seems to be wrong: findall() returns no results. Why?

Answer 1

If I understand what you want, try this:

import re

txt='''\
Text=Some aliens try to run into the fields
Text=Some aliens try
Text=Some aliens try to run'''

QUERY='Some aliens try'
print(re.findall(r'^(Text={}\s*)$'.format(QUERY), txt, re.M))
# ['Text=Some aliens try']

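It uses the re.M flag, so the start-of-line anchor ^ and the end-of-line anchor $ match at every line break instead of only at the start and end of the whole string.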
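
A minimal sketch of the difference the flag makes, reusing the sample text from the question (the names pattern, without_flag and with_flag are only for illustration):

```python
import re

txt = (
    "Text=Some aliens try to run into the fields\n"
    "Text=Some aliens try\n"
    "Text=Some aliens try to run"
)
pattern = r'^Text=Some aliens try$'

# Without re.M, ^ and $ anchor only to the start and end of the whole string,
# so the middle line can never satisfy both anchors.
without_flag = re.findall(pattern, txt)

# With re.M, ^ and $ also anchor at every line boundary.
with_flag = re.findall(pattern, txt, re.M)

print(without_flag)  # []
print(with_flag)     # ['Text=Some aliens try']
```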

You can also use "traditional" Python string formatting to insert the QUERY string into the pattern:

re.findall(r'^(Text=%s\s*)$' % QUERY, txt, re.M)

You can add \s* wherever needed to compensate for noise in the text.

Don't forget Python's string methods for a simple case like this one:

print([line for line in txt.splitlines() if line.strip().endswith(QUERY)])
# ['Text=Some aliens try']
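
Note that endswith() would also accept a line such as 'Text=Not Some aliens try'. If the whole line has to match, plain equality is stricter. A small sketch, assuming every line of interest starts with the literal prefix 'Text=' (the names target and exact are illustrative):

```python
txt = """\
Text=Some aliens try to run into the fields
Text=Some aliens try
Text=Some aliens try to run"""
QUERY = 'Some aliens try'

# Build the full expected line once, then compare each stripped line to it.
target = 'Text=' + QUERY
exact = [line for line in txt.splitlines() if line.strip() == target]

print(exact)  # ['Text=Some aliens try']
```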

Answer 2

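By default, $ in the pattern matches only at the end of the whole string (or just before a trailing newline at the very end).

Use the re.MULTILINE option to make $ match at the end of every line: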

my_query_reg = re.compile("".join(['Text=', QUERY, '$']), re.MULTILINE)
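
One caveat that the question does not raise: if QUERY can contain regex metacharacters such as . or ?, they are interpreted as pattern syntax rather than literal text. A hedged sketch of guarding against that with re.escape; the TEXT and QUERY values here are invented for illustration and are not from the question:

```python
import re

# Hypothetical sample data: the query contains '.' and '?'.
TEXT = "Text=Is it 3.5?\nText=Is it 3.5? Yes\nText=Is it 3x5"
QUERY = 'Is it 3.5?'

# Unescaped: '.' matches any character and '5?' makes the 5 optional,
# so the wrong line is returned.
naive = re.compile('Text=' + QUERY + '$', re.MULTILINE).findall(TEXT)

# Escaped: every character of QUERY is matched literally.
my_query_reg = re.compile('Text=' + re.escape(QUERY) + '$', re.MULTILINE)
matches = my_query_reg.findall(TEXT)

print(naive)    # ['Text=Is it 3x5']
print(matches)  # ['Text=Is it 3.5?']
```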

Answer 3

import re

TEXT = """Text=Some aliens try to run into the fields
Text=Some aliens try
Text=Some aliens try to run"""

QUERY = 'Some aliens try'

my_query_reg = re.compile(r'Text=\s*(%s)\s*$' % QUERY, re.M)
print(my_query_reg.findall(TEXT))

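The \s* is there in case there is whitespace between = and the query string, or between the query string and the end of the line.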
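
To see the effect of the optional whitespace, here is a small sketch run against deliberately noisy input. The padded first line is an assumption made up for this example; the question's own data has no such padding:

```python
import re

# Hypothetical noisy input: extra spaces after '=' and before the line end.
TEXT = (
    "Text=  Some aliens try   \n"
    "Text=Some aliens try to run\n"
    "Text=Some aliens try"
)
QUERY = 'Some aliens try'

# The capture group returns only the query text, without the padding.
my_query_reg = re.compile(r'Text=\s*(%s)\s*$' % re.escape(QUERY), re.M)
found = my_query_reg.findall(TEXT)

print(found)  # ['Some aliens try', 'Some aliens try']
```

Because the pattern contains one capture group, findall() returns the captured query text rather than the whole matched line.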

Regular expression for exact sentence matching in Python
