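I'm trying to use Python to extract the text between two dollar signs ($), where a dollar sign only counts as a delimiter if it is not preceded by a backslash, i.e. it is not \$ (this is for a LaTeX renderer). So given the input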
$\$x + \$y = 5$ and $3$
this is what should be output:
['\$x + \$y = 5', ' and ', '3']
This is my code so far:
import re

def parse_latex(text):
    return re.findall(r'(^|[^\\])\$.*?[^\\]\$', text)

print(parse_latex(r'$\$x + \$y = 5$ and $3$'))
But this is what I get:
['', ' ']
I don't know where to go from here.
Answer 0 (score: 1)
You can use this lookaround-based regex, which skips over escaped characters:
>>> import re
>>> text = r'$\$x + \$y = 5$ and $3$'
>>> re.findall(r'(?<=\$)([^$\\]*(?:\\.[^$\\]*)*)(?=\$)', text)
['\\$x + \\$y = 5', ' and ', '3']
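For comparison, here is a small sketch of why the attempt in the question returns `['', ' ']`. When a pattern contains exactly one capturing group, `re.findall` returns that group rather than the whole match, and the only group in the question's pattern is the character in front of the opening `$`. That pattern also consumes both delimiters, so the ` and ` between the two formulas is never left sitting between two available `$` signs. The variable names below are only illustrative.

```python
import re

text = r'$\$x + \$y = 5$ and $3$'
attempt = r'(^|[^\\])\$.*?[^\\]\$'  # pattern from the question

# With exactly one capturing group, findall returns that group, not the whole match.
groups = re.findall(attempt, text)

# finditer exposes the full matches, which shows what the pattern really consumed.
full_matches = [m.group(0) for m in re.finditer(attempt, text)]

print(groups)        # ['', ' ']
print(full_matches)  # ['$\\$x + \\$y = 5$', ' $3$']
```

The lookbehind and lookahead in the regex above avoid both problems: they check for `$` without consuming it, and the single capture group wraps exactly the text you want back.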
RegEx breakdown:
(?<=\$)        # Lookbehind: assert that the previous character is $
(              # Start capture group
  [^$\\]*      # Match 0 or more characters that are neither $ nor \
  (?:          # Start non-capturing group
    \\.        # Match \ followed by any character (an escaped character)
    [^$\\]*    # Match 0 or more characters that are neither $ nor \
  )*           # End non-capturing group; repeat it 0 or more times
)              # End capture group
(?=\$)         # Lookahead: assert that the next character is $
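If you want to keep this breakdown next to the code, one option is to compile the same pattern with `re.VERBOSE`, which ignores whitespace and `#` comments inside the pattern. This is a minimal sketch: the name `BETWEEN_DOLLARS` is made up for illustration, and `parse_latex` simply reuses the function name from the question. The comments inside the pattern spell out "backslash" in words, because a literal backslash inside a verbose-mode comment is still read as the start of an escape.

```python
import re

# Same regex as above, split across lines with re.VERBOSE.
# BETWEEN_DOLLARS is an illustrative name, not part of any library.
BETWEEN_DOLLARS = re.compile(r"""
    (?<=\$)        # previous character is a dollar sign
    (              # start capture group
      [^$\\]*      # 0+ characters that are neither dollar nor backslash
      (?:          # start non-capturing group
        \\.        # a backslash followed by any character
        [^$\\]*    # 0+ characters that are neither dollar nor backslash
      )*           # repeat the group 0 or more times
    )              # end capture group
    (?=\$)         # next character is a dollar sign
""", re.VERBOSE)


def parse_latex(text):
    """Return each stretch of text found between two dollar signs."""
    return BETWEEN_DOLLARS.findall(text)


print(parse_latex(r'$\$x + \$y = 5$ and $3$'))
# ['\\$x + \\$y = 5', ' and ', '3']
```

One caveat worth checking against your own input: the lookbehind only tests that the previous character is `$`, not that this `$` is itself unescaped. Text with an escaped `\$` before the first real delimiter, such as `cost \$5 is $x$`, therefore yields an extra leading match (`'5 is '`).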