What is the correct syntax for escaping and capturing in a perl one-liner?

Time: 2019-08-23 03:18:46

Tags: regex perl escaping latex doxygen

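I am trying to use pandoc to convert a LaTeX file (generated automatically by doxygen) to .docx format. I am running into an error, perhaps a bug in doxygen, whereby some characters (_ and %) that should be escaped inside the DoxyCode LaTeX environment are left unescaped. Some underscores appear in filenames and are inside curly braces. Those should not be escaped.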

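I wrote a perl one-liner that finds every underscore or percent sign that is not between braces and is not already preceded by a backslash, and replaces it with a backslash followed by the same character:

perl -i -pe 's/(?<!\\)([_%])(?![^{]+})/\\$1/g' test.tex

This works as expected. However, I then discovered that within the DoxyCode environment some files contain, for example, initialiser lists inside curly braces, and some of the variables there contain underscores. So I need a perl script that recognises when an underscore or percent sign lies between \begin{DoxyCode} and \end{DoxyCode}, and inserts a backslash if there is not one already.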

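The regex for this command works; see https://regex101.com/r/gsQm2L/2

It only gets the first match, though. I was hoping perl would pick up the remaining matches, but I may be mistaken.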

My command is

perl -i -pe 's/(?<=begin\{DoxyCode})([\s\S]+?[^\\])([_%])([\s\S]+?)(?=end\{DoxyCode})/$1\\$2$3/g' test.tex

but it fails to make any changes. (I tried not escaping the left brace, but got the error: Unescaped left brace in regex is deprecated, passed through in regex; etc.) I cannot tell whether no match is found, or whether the replacement fails because my capture syntax is incorrect.

For both the first and the second example, the original content of test.tex is as follows:

\begin{DoxyCode}
17 This is some code that contains an_undersc_ore and
18 an escaped\_underscore. Plus another unescaped_unders_core
19 for good measure.
20 As if that was not "bad" enough, it also contains a %percent sign
21 that is unescaped.
\end{DoxyCode}

Here is some other stuff that may contain \index{things_not_to_be_escaped}.

\begin{DoxyCode}
17 This is some code that contains an_underscore and
18 an escaped\_underscore. Plus another unescaped_underscore
19 for good measure.
20 As if that was not "bad" enough, it also contains a \%percent sign
21 that is escaped.
\end{DoxyCode}

After running the perl command, the desired content of test.tex is as follows:

\begin{DoxyCode}
17 This is some code that contains an\_undersc\_ore and
18 an escaped\_underscore. Plus another unescaped\_unders\_core
19 for good measure.
20 As if that was not "bad" enough, it also contains a \%percent sign
21 that is unescaped.
\end{DoxyCode}

Here is some other stuff that may contain \index{things_not_to_be_escaped}.

\begin{DoxyCode}
17 This is some code that contains an\_underscore and
18 an escaped\_underscore. Plus another unescaped\_underscore
19 for good measure.
20 As if that was not "bad" enough, it also contains a \%percent sign
21 that is escaped.
\end{DoxyCode}

Why does my perl one-liner fail? And how can I get the desired output? I am by no means a perl or regex expert, so I welcome feedback on other mistakes as well.

In case it is relevant, I am working on Debian Stretch.

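1 answer:

Answer 0: (score: 1)

This is fairly easy. Although the "correct" way would be to use a regex-based parser, the task is still simple enough that you can get away with a one-liner. The key is to do a two-stage replacement. I have added a use case for literal backslashes (\\) that are not escaping a _ or %. If there can be other embedded {}, you can exclude them using the same paradigm.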

$text = <<'EOF';
\begin{DoxyCode}
17 This is some code that contains an_undersc_ore and
18 an escaped\_underscore. Plus another unescaped_unders_core
19 for good measure. A literal \ and a literal \\_.
20 As if that was not "bad" enough, it also contains a %percent sign
21 that is unescaped.
\end{DoxyCode}

Here is some other stuff that may contain \index{things_not_to_be_escaped}.

\begin{DoxyCode}
17 This is some code that contains an_underscore and
18 an escaped\_underscore. Plus another unescaped_underscore
19 for good measure. A literal \\%.
20 As if that was not "bad" enough, it also contains a \%percent sign
21 that is escaped.
\end{DoxyCode}
EOF

print "before:\n$text\n\n";
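# Stage 1: match the body of each DoxyCode environment (\K keeps the \begin line out of the match).
# Stage 2: inside that body, leave \\ and already escaped \_ or \% alone; escape a bare _ or %.
$text =~ s{\Q\begin{DoxyCode}\E\K(.+?)(\Q\end{DoxyCode}\E)}{
    my($t,$e) = ($1,$2);
    $t =~ s{(\\\\ | \\?[_%])}{1==length $1 ? "\\$1" : $1}egsx; "$t$e";
}egs;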
print "after:\n$text\n";

Output:

before:
\begin{DoxyCode}
17 This is some code that contains an_undersc_ore and
18 an escaped\_underscore. Plus another unescaped_unders_core
19 for good measure. A literal \ and a literal \\_.
20 As if that was not "bad" enough, it also contains a %percent sign
21 that is unescaped.
\end{DoxyCode}

Here is some other stuff that may contain \index{things_not_to_be_escaped}.

\begin{DoxyCode}
17 This is some code that contains an_underscore and
18 an escaped\_underscore. Plus another unescaped_underscore
19 for good measure. A literal \\%.
20 As if that was not "bad" enough, it also contains a \%percent sign
21 that is escaped.
\end{DoxyCode}


after:
\begin{DoxyCode}
17 This is some code that contains an\_undersc\_ore and
18 an escaped\_underscore. Plus another unescaped\_unders\_core
19 for good measure. A literal \ and a literal \\\_.
20 As if that was not "bad" enough, it also contains a \%percent sign
21 that is unescaped.
\end{DoxyCode}

Here is some other stuff that may contain \index{things_not_to_be_escaped}.

\begin{DoxyCode}
17 This is some code that contains an\_underscore and
18 an escaped\_underscore. Plus another unescaped\_underscore
19 for good measure. A literal \\\%.
20 As if that was not "bad" enough, it also contains a \%percent sign
21 that is escaped.
\end{DoxyCode}
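The two-stage replacement above works on a single string that holds the whole file. One likely reason the one-liner in the question changes nothing is that perl -p hands the substitution one line at a time, so a pattern that has to reach from \begin{DoxyCode} to \end{DoxyCode} never sees both ends; the -0777 switch makes perl read each file in one piece. As a rough sketch of the same idea in script form (the names escape_body and escape_doxycode are made up for illustration, and it assumes DoxyCode environments are never nested):

```perl
use strict;
use warnings;

# Literal \begin{DoxyCode} and \end{DoxyCode} markers.
my $begin_re = qr/\\begin\{DoxyCode\}/;
my $end_re   = qr/\\end\{DoxyCode\}/;

# Stage 2 (hypothetical helper): escape bare _ and % in one environment body.
sub escape_body {
    my ($body) = @_;
    # Alternation order matters: an escaped backslash (\\) is consumed first,
    # then an optionally escaped _ or %. Only a one-character match is bare.
    $body =~ s/(\\\\|\\?[_%])/ length($1) == 1 ? "\\$1" : $1 /ge;
    return $body;
}

# Stage 1 (hypothetical helper): find every environment and rewrite its body.
sub escape_doxycode {
    my ($text) = @_;
    $text =~ s/($begin_re)(.*?)($end_re)/
        my ($begin, $body, $end) = ($1, $2, $3);
        $begin . escape_body($body) . $end;
    /gse;
    return $text;
}

# When file names are given, slurp the input so one match can span many lines.
if (@ARGV) {
    local $/;            # no record separator: <> returns everything at once
    my $input = <>;
    print escape_doxycode($input);
}
```

Saved under a name of your choice, it could be run as perl script.pl test.tex > fixed.tex. Text outside the environments, such as \index{things_not_to_be_escaped}, is never touched because the inner substitution only ever sees the captured body.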

Also read http://perldoc.perl.org/perlre.html and http://perldoc.perl.org/perlop.html#Regexp-Quote-Like-Operators. Pay particular attention to the \G assertion and the /gc flags. With those you can write a proper parser for this task.
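To give an idea of what such a parser might look like, here is a minimal sketch of a scanner built on \G and /gc. It is not a full LaTeX parser: the function name escape_with_scanner is made up, and it assumes DoxyCode environments are not nested. Each /\G.../gc match picks up exactly where the previous one stopped, and a failed attempt leaves the position alone so the next alternative can be tried.

```perl
use strict;
use warnings;

# Hypothetical scanner: walk the text token by token and escape bare _ and %
# only while inside a DoxyCode environment.
sub escape_with_scanner {
    my ($text) = @_;
    my $out     = '';
    my $in_code = 0;
    pos($text) = 0;
    while (pos($text) < length $text) {
        if    ($text =~ /\G(\\begin\{DoxyCode\})/gc) { $in_code = 1; $out .= $1 }
        elsif ($text =~ /\G(\\end\{DoxyCode\})/gc)   { $in_code = 0; $out .= $1 }
        # A backslash plus the character after it is copied as a pair, so
        # \_ and \% stay as they are and \\ cannot escape what follows it.
        elsif ($text =~ /\G(\\.)/gcs)                { $out .= $1 }
        # A bare _ or % gets a backslash only inside the environment.
        elsif ($text =~ /\G([_%])/gc)                { $out .= $in_code ? "\\$1" : $1 }
        # Any run of ordinary characters is copied unchanged.
        elsif ($text =~ /\G([^\\_%]+)/gc)            { $out .= $1 }
        # What remains is a lone backslash at the very end of the text.
        else                                         { $text =~ /\G(.)/gcs; $out .= $1 }
    }
    return $out;
}
```

Calling print escape_with_scanner($text) on a string holding the whole file should give the same result as the two-stage substitution for input like the sample, and further rules (for example, skipping other brace groups) can be added as extra branches.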

HTH