Question

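A component in Python's docutils module uses the regular expression below in a mechanism that is meant to convert text wrapped in asterisks into italic text:

Original: Most people know what is meant by the latin phrase *Carpe Diem*.

Translated: Most people know what is meant by the latin phrase Carpe Diem.

It's a very simple pattern: match an asterisk if it is not preceded by a space, a newline, or a null character. What I'm wondering is what is gained by appending an empty unicode string (u'') to the pattern? It is appended in many other patterns that can also be found in docutils, but I don't see what difference it makes to whether a given text matches.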

import re

non_whitespace_escape_before = r'(?<![ \n\x00])'
end_string_suffix = u''

emphasis = re.compile(non_whitespace_escape_before + r'(\*)' + end_string_suffix, re.U)
# emphasis.pattern -> u'(?<![ \\n\\x00])(\\*)'

Answer 1

You missed that the string is not always empty; from the relevant source code:

if getattr(settings, 'character_level_inline_markup', False):
    start_string_prefix = u'(^|(?<!\x00))'
    end_string_suffix = u''
else:
    start_string_prefix = (u'(^|(?<=\\s|[%s%s]))' %
                           (punctuation_chars.openers,
                            punctuation_chars.delimiters))
    end_string_suffix = (u'($|(?=\\s|[\x00%s%s%s]))' %
                         (punctuation_chars.closing_delimiters,
                          punctuation_chars.delimiters,
                          punctuation_chars.closers))

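The gain is that the variable is defined in either case, not that it is empty. When it is empty it indeed makes no difference, but when the character_level_inline_markup feature is not enabled (the else branch above), the compiled pattern gets a lookahead suffix, which changes its behaviour compared to the empty string.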
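A minimal sketch of that difference, assuming heavily simplified stand-ins for the punctuation classes. The `closers` and `delimiters` values and the `build_emphasis_end` helper are hypothetical and are not the real docutils definitions:

```python
import re

non_whitespace_escape_before = r'(?<![ \n\x00])'

# Hypothetical, simplified stand-ins for the punctuation_chars attributes;
# the real docutils values contain many more characters.
closers = re.escape(')]}>')
delimiters = re.escape('-/:.,;!?')

empty_suffix = u''
lookahead_suffix = u'($|(?=\\s|[\x00%s%s]))' % (delimiters, closers)

def build_emphasis_end(end_string_suffix):
    """Compile the end-of-emphasis pattern with the given suffix."""
    return re.compile(
        non_whitespace_escape_before + r'(\*)' + end_string_suffix, re.U)

char_level = build_emphasis_end(empty_suffix)      # suffix is u''
word_level = build_emphasis_end(lookahead_suffix)  # suffix is a lookahead

# Search from index 1 so the opening asterisk is skipped.
inside_word = u'*Carpe Diem*s'
before_punct = u'*Carpe Diem*.'

print(char_level.search(inside_word, 1) is not None)   # closing * accepted
print(word_level.search(inside_word, 1) is not None)   # rejected: 's' follows
print(word_level.search(before_punct, 1) is not None)  # accepted: '.' follows
```

With the empty suffix the closing asterisk matches even when a letter follows it. With the lookahead suffix it matches only at the end of the string or before whitespace or punctuation.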

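The docutils project is a little sloppy about mixing byte strings and Unicode strings in Python 2; they get away with it because all the byte strings that are concatenated with Unicode strings happen to be ASCII-clean and can therefore be decoded implicitly.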
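To illustrate why ASCII-clean byte strings are harmless here, the sketch below mimics in Python 3 what Python 2 did implicitly when a byte string was concatenated with a Unicode string: it decoded the bytes with the default ASCII codec. The helper `concat_like_py2` is hypothetical and exists only for this illustration:

```python
def concat_like_py2(byte_part, unicode_part):
    """Mimic Python 2's implicit coercion: decode bytes as ASCII, then join."""
    return byte_part.decode('ascii') + unicode_part

# ASCII-only bytes decode cleanly, so the concatenation just works.
pattern = concat_like_py2(b'(?<![ \\n\\x00])(\\*)', u'')

# Bytes outside the ASCII range cannot be decoded this way.
try:
    concat_like_py2(b'caf\xc3\xa9', u'')
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

print(pattern)
print(decode_failed)
```

In Python 3 there is no implicit coercion at all, so adding `bytes` to `str` raises a `TypeError` and the decode step has to be written out explicitly as above.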
