Question

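I've gone through many of the regex questions here and tried the suggestions in them, but I can't seem to get my code working. I have a list of strings, and I'm trying to find the entries in that list that contain one of the following patterns: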

a BLANK of a BLANK
an BLANK of an BLANK
a BLANK of an BLANK
an BLANK of a BLANK
that BLANK of a BLANK
that BLANK of an BLANK
the BLANK of a BLANK
the BLANK of an BLANK

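For example, I should be able to find sentences that contain phrases such as "an idiot of a doctor" or "the hard-worker of a student."

Once found, I want to build a list of the sentences that satisfy this criterion. Here is my code so far: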

for sentence in sentences:
    matched = re.search(r"a [.*]of a " \
                        r"an [.*]of an " \
                        r"a [.*]of an" \
                        r"an [.*]of a " \
                        r"that [.*]of a " \
                        r"that [.*]of an " \
                        r"the [.*]of a " \
                        r"the [.*]of an ", sentence)
    if matched:
        bnp.append(matched)

#Below two lines for testing purposes only
print(matched)
print(bnp)

Even though some phrases should meet the criteria in the list, this code produces no results.

Answer 1

[.*] is a character class, so you are asking the regex to match a literal dot or asterisk character. Quoting the re documentation:

[]

Used to indicate a set of characters. In a set:

Characters can be listed individually, e.g. [amk] will match 'a', 'm', or 'k'.

...
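
To make the character-class behaviour concrete, here is a minimal sketch using the first pattern from the question. The two test strings are made up for illustration.

```python
import re

# Inside square brackets, "." and "*" lose their special meaning:
# [.*] matches exactly one character that is either a dot or an asterisk.
char_class = re.compile(r"a [.*]of a ")

# No match: the text between "a " and "of a " is not a single "." or "*".
no_match = char_class.search("an idiot of a doctor")

# Match: a literal "*" sits exactly where the character class is.
literal_match = char_class.search("what a *of a day")

print(no_match)               # None
print(literal_match.group())  # a *of a 
```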

So, here is one way to do it:

(th(at|e)|a[n]?)\b.*\b(a[n]?)\b.*

This expression tries to match that, the, a, or an, then any characters up to a or an.

This link shows it in action.

Here is a demonstration:

>>> import re
>>>
>>> regex = r"(th(at|e)|a[n]?)\b.*\b(a[n]?)\b.*"
>>> test_str = ("an idiot of a doctor\n"
    "the hard-worker of a student.\n"
    "an BLANK of an BLANK\n"
    "a BLANK of an BLANK\n"
    "an BLANK of a BLANK\n"
    "that BLANK of a BLANK\n"
    "the BLANK of a BLANK\n"
    "the BLANK of an BLANK\n")
>>>
>>> matches =  re.finditer(regex, test_str, re.MULTILINE | re.IGNORECASE) 
>>> 
>>> for m in matches:
        print(m.group())


an idiot of a doctor
the hard-worker of a student.
an BLANK of an BLANK
a BLANK of an BLANK
an BLANK of a BLANK
that BLANK of a BLANK
the BLANK of a BLANK
the BLANK of an BLANK
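
Since the question asks for a list of the matching sentences, the pattern above could be applied to a list roughly as follows. This is a sketch only: the `sentences` values are hypothetical sample data, and the list comprehension is just one way to collect the results.

```python
import re

# Pattern from the answer above, compiled once for reuse.
pattern = re.compile(r"(th(at|e)|a[n]?)\b.*\b(a[n]?)\b.*", re.IGNORECASE)

# Hypothetical sample data standing in for the asker's `sentences` list.
sentences = [
    "He is an idiot of a doctor.",
    "She is the hard-worker of a student.",
    "Nothing to see here.",
]

# Keep the sentence itself rather than the match object.
bnp = [s for s in sentences if pattern.search(s)]
print(bnp)
```

Note that this pattern does not require the word "of", so it will also accept sentences such as "I have a cat and a dog"; add a literal " of " to the pattern if that is too permissive.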

Answer 2

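Currently, this code concatenates your pattern arguments into one long string, with no operators between them. So in effect you are searching for the regex "a [.*]of a an [.*]of an a [.*]of an..."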
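
The concatenation is easy to verify by printing the pattern string. A small sketch using the first two fragments from the question:

```python
# Adjacent string literals are joined by Python with nothing inserted
# between them, so this is one long pattern, not a set of alternatives.
combined = (r"a [.*]of a "
            r"an [.*]of an ")

print(repr(combined))  # 'a [.*]of a an [.*]of an '
```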

You are missing the "or" operator: |. A simpler regex that accomplishes this task would be:

(a|an|that|the) \b.*\b of (a|an) \b.*\b
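
As a rough sketch of how this pattern might slot into the loop from the question, the version below stores both the full sentence and the matched phrase. The sample sentences and the tuple layout are assumptions made for illustration.

```python
import re

# Pattern from this answer, compiled once outside the loop.
pattern = re.compile(r"(a|an|that|the) \b.*\b of (a|an) \b.*\b")

# Hypothetical sample data standing in for the asker's `sentences` list.
sentences = [
    "My neighbour is an idiot of a doctor.",
    "The weather is nice today.",
]

bnp = []
for sentence in sentences:
    matched = pattern.search(sentence)
    if matched:
        # matched.group() is the matched phrase; sentence is the full string.
        bnp.append((sentence, matched.group()))

print(bnp)
```

Appending `sentence` (or `matched.group()`) rather than the match object itself gives a list of plain strings, which is what the question asks for.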

Using regex to search a list of strings for a substring in Python

2 answers: