Complicated lookahead / non-greedy regex

Date: 2013-12-24 10:47:07

Tags: regex perl lookahead

I have the following string:

(all on one line)

<IMG SRC="/include/images/moredetails.png" WIDTH="8" HEIGHT="7" ONMOUSEOVER="return createPopup('<b>[scan_name@user:home]:</b> <!-- #EscapedName# --><br><b>[organization@user:home]:</b><br><!-- #EscapedOrganizationPath# --><br><b>[total@user:home]:</b> <!-- #EscapedTotal# --><br><b>[high@user:home]:</b> <!-- #EscapedHigh# --><br><b>[medium@user:home]:</b> <!-- #EscapedMedium# --><br><b>[low@user:home]:</b> <!-- #EscapedLow# --><br><b>[date_last_scanned@user:home]:</b> <!-- #EscapedDate# -->');" ONMOUSEOUT="return nd(1000);"><!-- #Name# --></TD>

And a second string:

<IMG SRC="/include/images/moredetails.png" WIDTH="8" HEIGHT="7" ONMOUSEOVER="return createPopup('<b>[scan_name@user:home]:</b> <!-- #EscapedName# --><br><b>조직/부서 경로:</b><br><!-- #EscapedOrganizationPath# --><br><b>[total@user:home]:</b> <!-- #EscapedTotal# --><br><b>[high@user:home]:</b> <!-- #EscapedHigh# --><br><b>[medium@user:home]:</b> <!-- #EscapedMedium# --><br><b>[low@user:home]:</b> <!-- #EscapedLow# --><br><b>[date_last_scanned@user:home]:</b> <!-- #EscapedDate# -->');" ONMOUSEOUT="return nd(1000);"><!-- #Name# --></TD>

I want to find all the [..] placeholders in the first string and look up their Korean translations in the second string.

I wrote this code to do it:

while($stringA =~ /(.*?)(\[[^\]]+?\])(.*?)/g) {
 my $prefix = $1;
 my $tag = $2;
 my $suffix = $3;

Then I build a regex from $prefix and $suffix and run it against the second string:

if ($stringB =~ /\Q$prefix\E(.*)\Q$suffix\E/g) {

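Note that the examples copied below do not escape the " characters; I left them that way to keep things readable.

The problems:

A. $prefix and $suffix do not contain everything before and after the placeholder, because I am using non-greedy matching. For example: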

$prefix = "<IMG SRC="/include/images/moredetails.png" WIDTH="8" HEIGHT="7" ONMOUSEOVER="return createPopup('<b>"
$tag = "[scan_name@user:home]"
$suffix = ""
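A minimal sketch of why $suffix comes out empty, using a short made-up string ($text and @matches are hypothetical names, not part of the original code). A lazy (.*?) at the very end of a pattern has nothing after it that forces it to grow, so it settles for the empty string:

```perl
use strict;
use warnings;

# Hypothetical short input standing in for the long HTML string.
my $text = 'aaa[one]bbb[two]ccc';

my @matches;
while ($text =~ /(.*?)(\[[^\]]+?\])(.*?)/g) {
    # The trailing lazy group is satisfied by zero characters,
    # so $3 is always the empty string here.
    push @matches, [$1, $2, $3];
}

printf "prefix='%s' tag='%s' suffix='%s'\n", @$_ for @matches;
```

Each tag is found, but every suffix is empty and each prefix only reaches back to the end of the previous match.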

B. If I use the greedy (.*)(\[[^\]]+?\])(.*) instead, everything is captured "correctly", but only the last tag is matched. For example:

$prefix = "<IMG SRC="/include/images/moredetails.png" WIDTH="8" HEIGHT="7" ONMOUSEOVER="return createPopup('<b>[scan_name@user:home]:</b> <!-- #EscapedName# --><br><b>[organization@user:home]:</b><br><!-- #EscapedOrganizationPath# --><br><b>[total@user:home]:</b> <!-- #EscapedTotal# --><br><b>[high@user:home]:</b> <!-- #EscapedHigh# --><br><b>[medium@user:home]:</b> <!-- #EscapedMedium# --><br><b>[low@user:home]:</b> <!-- #EscapedLow# --><br><b>"
$tag = "[date_last_scanned@user:home]"
$suffix = ":</b> <!-- #EscapedDate# -->');" ONMOUSEOUT="return nd(1000);"><!-- #Name# --></TD>"
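The greedy behaviour can be reproduced on the same kind of short made-up string ($text and @greedy are hypothetical names). The leading (.*) first swallows the whole string and then backtracks only as far as the last "[", so the loop body runs once:

```perl
use strict;
use warnings;

# Hypothetical short input standing in for the long HTML string.
my $text = 'aaa[one]bbb[two]ccc';

my @greedy;
while ($text =~ /(.*)(\[[^\]]+?\])(.*)/g) {
    # The greedy prefix consumes the earlier tags, and the greedy suffix
    # runs to the end of the string, leaving nothing for a second pass.
    push @greedy, [$1, $2, $3];
}

printf "prefix='%s' tag='%s' suffix='%s'\n", @$_ for @greedy;
```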

What I want

I want to capture all the tags, compare them against the translated string, and get back something like:

'[state@user:home]' = '상태'

Thanks for your help.

1 answer:

Answer 0 (score: 2)

How about:

my $strA = q~<IMG SRC="/include/images/moredetails.png" WIDTH="8" HEIGHT="7" ONMOUSEOVER="return createPopup('<b>[scan_name@user:home]:</b> <!-- #EscapedName# --><br><b>[organization@user:home]:</b><br><!-- #EscapedOrganizationPath# --><br><b>[total@user:home]:</b> <!-- #EscapedTotal# --><br><b>[high@user:home]:</b> <!-- #EscapedHigh# --><br><b>[medium@user:home]:</b> <!-- #EscapedMedium# --><br><b>[low@user:home]:</b> <!-- #EscapedLow# --><br><b>[date_last_scanned@user:home]:</b> <!-- #EscapedDate# -->');" ONMOUSEOUT="return nd(1000);"><!-- #Name# --></TD>~;
my $strB = q~<IMG SRC="/include/images/moredetails.png" WIDTH="8" HEIGHT="7" ONMOUSEOVER="return createPopup('<b>[scan_name@user:home]:</b> <!-- #EscapedName# --><br><b>조직/부서 경로:</b><br><!-- #EscapedOrganizationPath# --><br><b>[total@user:home]:</b> <!-- #EscapedTotal# --><br><b>[high@user:home]:</b> <!-- #EscapedHigh# --><br><b>[medium@user:home]:</b> <!-- #EscapedMedium# --><br><b>[low@user:home]:</b> <!-- #EscapedLow# --><br><b>[date_last_scanned@user:home]:</b> <!-- #EscapedDate# -->');" ONMOUSEOUT="return nd(1000);"><!-- #Name# --></TD>~;

while($strA =~ /(.*?)\[([^\]]+?)\](.)/g) {
    my $prefix = $1;
    my $tag = $2;
    my $suffix = $3;
    print "prefix=$prefix\ntag=$tag\nsuffix=$suffix\n";
    print "found it $1\n\n" if ($strB =~ /\Q$prefix\E\[?([^\[\]]+)\]?\Q$suffix\E/g);
}
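One detail of the loop above that may be worth spelling out: the match against $strB also carries /g, and in scalar context a /g match resumes at pos() of the string, that is, where the previous successful match ended. A small sketch of that behaviour on a made-up string ($text, @seen and @restart are hypothetical names):

```perl
use strict;
use warnings;

my $text = 'x=1;x=2;x=3';

# With /g in scalar context, each attempt continues after the previous match.
my @seen;
for my $round (1 .. 3) {
    push @seen, $1 if $text =~ /x=(\d)/g;
}

# Without /g, every attempt starts again at the beginning of the string.
my @restart;
for my $round (1 .. 3) {
    push @restart, $1 if $text =~ /x=(\d)/;
}

print "with /g:    @seen\n";
print "without /g: @restart\n";
```

This is why the prefixes, which only reach back to the previous tag, can still be located in order in $strB.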

If you want a longer suffix to avoid overlaps, you can use this instead:

while($strA =~ /(.*?)\[([^\]]+?)\]([^[]*)/g) {

Output:

prefix=<IMG SRC="/include/images/moredetails.png" WIDTH="8" HEIGHT="7" ONMOUSEOVER="return createPopup('<b>
tag=scan_name@user:home
suffix=:
found it scan_name@user:home

prefix=</b> <!-- #EscapedName# --><br><b>
tag=organization@user:home
suffix=:
found it 조직/부서 경로

prefix=</b><br><!-- #EscapedOrganizationPath# --><br><b>
tag=total@user:home
suffix=:
found it total@user:home

prefix=</b> <!-- #EscapedTotal# --><br><b>
tag=high@user:home
suffix=:
found it high@user:home

prefix=</b> <!-- #EscapedHigh# --><br><b>
tag=medium@user:home
suffix=:
found it medium@user:home

prefix=</b> <!-- #EscapedMedium# --><br><b>
tag=low@user:home
suffix=:
found it low@user:home

prefix=</b> <!-- #EscapedLow# --><br><b>
tag=date_last_scanned@user:home
suffix=:
found it date_last_scanned@user:home
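To get from this output to the 'tag' = 'translation' form asked for in the question, one possible sketch is to collect only the tags whose captured text differs from the tag itself. The strings below are shortened stand-ins for the real $strA and $strB, and %translation is a hypothetical name; treating "captured text equals the tag" as "not translated" is an assumption, not something the original code does:

```perl
use strict;
use warnings;
use utf8;                               # the source contains literal Korean text
binmode STDOUT, ':encoding(UTF-8)';

# Shortened, hypothetical stand-ins for the full strings.
my $strA = q~<b>[scan_name@user:home]:</b> <!-- #EscapedName# --><br><b>[organization@user:home]:</b><br><!-- #EscapedOrganizationPath# --><br><b>[total@user:home]:</b> <!-- #EscapedTotal# -->~;
my $strB = q~<b>[scan_name@user:home]:</b> <!-- #EscapedName# --><br><b>조직/부서 경로:</b><br><!-- #EscapedOrganizationPath# --><br><b>[total@user:home]:</b> <!-- #EscapedTotal# -->~;

my %translation;
while ($strA =~ /(.*?)\[([^\]]+?)\](.)/g) {
    my ($prefix, $tag, $suffix) = ($1, $2, $3);
    # Scalar-context /g continues from where the previous $strB match ended.
    if ($strB =~ /\Q$prefix\E\[?([^\[\]]+)\]?\Q$suffix\E/g) {
        # Assumption: an unchanged tag means no translation was supplied.
        $translation{"[$tag]"} = $1 unless $1 eq $tag;
    }
}

printf "'%s' = '%s'\n", $_, $translation{$_} for sort keys %translation;
```

With these inputs only the organization tag has been replaced in $strB, so it is the only pair that ends up in %translation.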