Question

I am trying to write a regex (in a Python program) to match strings that look like this:

             """(book "Moby Dick" (MLA) #foo ?bar baz)

             """(book "Moby Dick" (MLA))

             """(book "Moby Dick")

My regex is currently:

(?P<indent>\s*)("""|\'\'\'|blockquote:)(\((?P<type>\w*)\s*(["\'](?P<citation>.+?)["\'])?\s*(\((?P<format>\w+?)\))?(?P<other>.+?)\))?

The desired result is:

indent  [0-8]   `        `
type    [12-16] `book`
citation    [18-27] `Moby Dick`
format  [30-33] `MLA`
other   [34-44] ` #foo ?bar baz`

This is what I get for the first version of the string. For the shorter versions, however, the "other" group captures text that belongs to earlier groups. For the second version I get:

indent  [0-8]   `        `
type    [12-16] `book`
citation    [18-27] `Moby Dick`
other   [29-33] `(MLA`

The third version goes wrong in the same way. So my question is: why does the "other" group match ahead of the earlier groups, and what can I do to make the "citation" and "format" parts of the pattern match the expected text in the second and third cases?

Answer 1

You need to make the (?P<other>.+?) group optional as well to get the desired result:

reg = r'(?P<indent>\s*)("""|\'\'\'|blockquote:)(\((?P<type>\w*)\s*(["\'](?P<citation>.+?)["\'])?\s*(\((?P<format>\w+?)\))?(?P<other>.+?)?\))?'

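Because that group is not optional, the regex engine needs at least one character, other than the final closing parenthesis, for the other group before the match can succeed: the pattern is .+? followed by a closing parenthesis. In the two shorter strings nothing is left over once citation and format have matched, so the engine backtracks, skips an optional earlier group, and lets other consume that text instead. That is why, for the last two strings, other is matched in place of citation and format.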
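The backtracking is easier to see on a cut-down pattern. The sketch below keeps only the tail of the regex (the optional format group, the other group, and the closing parenthesis) and runs it on a shortened input, '(MLA))'. The names mandatory, optional, broken and fixed are made up for this illustration.

```python
import re

# Tail of the original pattern: optional "(format)", then a mandatory
# lazy "other" group, then the closing parenthesis.
mandatory = re.compile(r'(\((?P<format>\w+?)\))?(?P<other>.+?)\)')

# Same tail with "other" made optional.
optional = re.compile(r'(\((?P<format>\w+?)\))?(?P<other>.+?)?\)')

text = '(MLA))'

# With a mandatory "other", the engine has to give up the format group
# so that "other" has at least one character to consume.
broken = mandatory.match(text).groupdict()

# With an optional "other", format keeps its text and "other" stays unmatched.
fixed = optional.match(text).groupdict()

print(broken)
print(fixed)
```

In the first result format is None and other holds '(MLA'; in the second, format holds 'MLA' and other is None.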

>>> import re
>>> 
>>> reg = re.compile(r'(?P<indent>\s*)("""|\'\'\'|blockquote:)(\((?P<type>\w*)\s*(["\'](?P<citation>.+?)["\'])?\s*(\((?P<format>\w+?)\))?(?P<other>.+?)?\))?')
>>> 
>>> 
>>> s2 = '             """(book "Moby Dick" (MLA))'
>>> 
>>> m2 = reg.match(s2)
>>> m2.groupdict()
{'indent': '             ', 'citation': 'Moby Dick', 'type': 'book', 'other': None, 'format': 'MLA'}
>>> 
>>> s3 = '             """(book "Moby Dick")'
>>> m3 = reg.match(s3)
>>> 
>>> m3.groupdict()
{'indent': '             ', 'citation': 'Moby Dick', 'type': 'book', 'other': None, 'format': None}
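The session above covers only the two shorter strings. As a rough check that the longest string still parses the same way, the following script runs the pattern from the question and the pattern from this answer over all three inputs. The helper matched_fields and the variable names are made up for this sketch, and the inputs use an 8-space indent to line up with the offsets shown in the question.

```python
import re

# Pattern from the question: "other" is mandatory inside the parentheses.
original = re.compile(r'(?P<indent>\s*)("""|\'\'\'|blockquote:)(\((?P<type>\w*)\s*(["\'](?P<citation>.+?)["\'])?\s*(\((?P<format>\w+?)\))?(?P<other>.+?)\))?')

# Pattern from this answer: "other" is optional.
fixed = re.compile(r'(?P<indent>\s*)("""|\'\'\'|blockquote:)(\((?P<type>\w*)\s*(["\'](?P<citation>.+?)["\'])?\s*(\((?P<format>\w+?)\))?(?P<other>.+?)?\))?')

samples = [
    '        """(book "Moby Dick" (MLA) #foo ?bar baz)',
    '        """(book "Moby Dick" (MLA))',
    '        """(book "Moby Dick")',
]


def matched_fields(pattern, text):
    """Return the named groups (except indent) that took part in the match."""
    groups = pattern.match(text).groupdict()
    return {name: value for name, value in groups.items()
            if name != 'indent' and value is not None}


for text in samples:
    print(text.strip())
    print('  original:', matched_fields(original, text))
    print('  fixed:   ', matched_fields(fixed, text))
```

A group missing from a printed dict did not take part in the match. With the original pattern, the third string ends up with the quoted citation inside other; with the fixed pattern, citation is filled and other is absent.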
