python - Regex to match text between two characters while ignoring backslash-escaped characters

Posted: 2016-09-22 10:20:28

Tags: python regex escaping

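I'm trying to use Python to get the text between two dollar signs ($), but a dollar sign preceded by a backslash, i.e. \$, should not count as a delimiter (this is for a LaTeX rendering program). So given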

$\$x + \$y = 5$ and $3$ 

the output should be:

['\$x + \$y = 5', ' and ', '3']

Here is my code so far:

import re

def parse_latex(text):
    return re.findall(r'(^|[^\\])\$.*?[^\\]\$', text)

print(parse_latex(r'$\$x + \$y = 5$ and $3$'))

But this is what I get:

['', ' ']

I'm not sure how to proceed from here.
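To see where ['', ' '] comes from, here is a small illustration (not part of the original post). When a pattern contains exactly one capturing group, re.findall returns only that group's text, and in this pattern the group captures just the character before the opening $ (or the empty string at the start of the input). Printing the whole match (group 0) next to group 1 with re.finditer makes this visible. Each match also consumes both of its $ signs, so a segment such as " and ", which shares its delimiters with its neighbours, is never matched on its own.

```python
import re

text = r'$\$x + \$y = 5$ and $3$'
pattern = r'(^|[^\\])\$.*?[^\\]\$'

# With one capturing group, findall returns only group 1: the single
# character (or empty start-of-string) that precedes the opening $.
print(re.findall(pattern, text))  # ['', ' ']

# finditer exposes the full match (group 0) as well as group 1.
full_matches = []
for m in re.finditer(pattern, text):
    full_matches.append(m.group(0))
    print(repr(m.group(0)), repr(m.group(1)))
```

The second loop prints the full matches '$\\$x + \\$y = 5$' and ' $3$', showing that the desired text is matched but never captured.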

1 Answer:

Answer 0 (score: 1)

You can use this lookaround-based regex to skip over escaped characters:

>>> import re
>>> text = r'$\$x + \$y = 5$ and $3$'
>>> re.findall(r'(?<=\$)([^$\\]*(?:\\.[^$\\]*)*)(?=\$)', text)
['\\$x + \\$y = 5', ' and ', '3']
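The same regex can be dropped into the question's parse_latex function; the sketch below precompiles it (the constant name LATEX_SEGMENT is just an illustrative choice). Because the lookbehind and lookahead only assert that a $ is present without consuming it, the closing $ of one segment can serve as the opening $ of the next, which is why " and " shows up in the result. The \\. alternative consumes a backslash together with whatever follows it, so an escaped backslash right before a real delimiter (\\$) is also handled.

```python
import re

# Lookbehind/lookahead assert a $ on each side without consuming it;
# \\. consumes a backslash plus the character it escapes (e.g. \$ or \\).
LATEX_SEGMENT = re.compile(r'(?<=\$)([^$\\]*(?:\\.[^$\\]*)*)(?=\$)')

def parse_latex(text):
    """Return the text between each pair of adjacent unescaped $ signs."""
    return LATEX_SEGMENT.findall(text)

print(parse_latex(r'$\$x + \$y = 5$ and $3$'))  # ['\\$x + \\$y = 5', ' and ', '3']
print(parse_latex(r'$a\\$b$'))                  # ['a\\\\', 'b']
```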


RegEx breakdown:

(?<=\$)           # Lookbehind to assert the previous character is $
(                 # start capture group
   [^$\\]*        # match 0 or more characters that are neither $ nor \
   (?:            # start non-capturing group
      \\.         # match \ followed by any escaped character
      [^$\\]*     # match 0 or more characters that are neither $ nor \
   )*             # end non-capturing group; repeat it 0 or more times
)                 # end capture group
(?=\$)            # Lookahead to assert the next character is $
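The annotated layout above can be used almost verbatim with the re.VERBOSE flag, under which unescaped whitespace is ignored and # starts a comment. The sketch below is a compiled version of it (the name LATEX_SEGMENT_VERBOSE is illustrative); it should behave identically to the one-line pattern.

```python
import re

# re.VERBOSE ignores whitespace outside character classes and treats
# '#' to end of line as a comment, so the breakdown can be kept inline.
LATEX_SEGMENT_VERBOSE = re.compile(r"""
    (?<=\$)            # lookbehind: the previous character is $
    (                  # start capture group
        [^$\\]*        # 0 or more chars that are neither $ nor a backslash
        (?:            # start non-capturing group
            \\.        # a backslash followed by any escaped character
            [^$\\]*    # 0 or more chars that are neither $ nor a backslash
        )*             # end non-capturing group; repeat 0 or more times
    )                  # end capture group
    (?=\$)             # lookahead: the next character is $
""", re.VERBOSE)

print(LATEX_SEGMENT_VERBOSE.findall(r'$\$x + \$y = 5$ and $3$'))
```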