Ultimately, I'm trying to wrap every non-empty element of an XML file in '<![CDATA[...]]>'.
Here is the sample I'm testing my code against:
<currentTime4 dsi="user 2009/06/02 10:43">10:36</currentTime4>
<todayDate dsi="user 2009/06/02 10:43">06/02/2009</todayDate>
<todayDate3 dsi="user 2009/06/02 10:43">06/02/2009</todayDate3>
<todayDate4 dsi="user 2009/06/02 10:43">06/02/2009</todayDate4>
<currentTime dsi="user 2009/06/02 10:43">10:36</currentTime>
<Relationship dsi="user 2009/06/02 10:43"></Relationship>
<PatSignatureIII dsi="user 2009/06/02 10:43"></PatSignatureIII>
<PatSignatureIV dsi="user 2009/06/02 10:43"></PatSignatureIV>
<PatSignature dsi="user 2009/06/02 10:43">313031320D0A3</PatSignature>
<Relationship dsi="user 2009/06/02 10:43">Mother</Relationship>
<currentTime3 dsi="user 2009/06/02 10:43">10:36</currentTime3>
</consent_to_treat>
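It mimics the XML I have to process, but in reality some elements contain multi-line text, which makes this adventure more interesting...
I built a regular expression that works as long as no element name is repeated: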
$text =~ s/(<(\w+) +[" \w\/\-=:]+?>)(?!\n)(.+?)(?<!\n)(<\/\2>)/$1<!\[CDATA\[$3\]\]>$4/gs;
But it fails on this sample, producing the following:
<consent_to_treat dsi="user 2009/06/02 10:43" version="">
<currentTime4 dsi="user 2009/06/02 10:43"><![CDATA[10:36]]></currentTime4>
<todayDate dsi="user 2009/06/02 10:43"><![CDATA[06/02/2009]]></todayDate>
<todayDate3 dsi="user 2009/06/02 10:43"><![CDATA[06/02/2009]]></todayDate3>
<todayDate4 dsi="user 2009/06/02 10:43"><![CDATA[06/02/2009]]></todayDate4>
<currentTime dsi="user 2009/06/02 10:43"><![CDATA[10:36]]></currentTime>
<Relationship dsi="user 2009/06/02 10:43"><![CDATA[</Relationship>
<PatSignatureIII dsi="user 2009/06/02 10:43"></PatSignatureIII>
<PatSignatureIV dsi="user 2009/06/02 10:43"></PatSignatureIV>
<PatSignature dsi="user 2009/06/02 10:43">313031320D0A3</PatSignature>
<Relationship dsi="user 2009/06/02 10:43">Mother]]></Relationship>
<currentTime3 dsi="user 2009/06/02 10:43"><![CDATA[10:36]]></currentTime3>
</consent_to_treat>
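The failure can be reproduced on a smaller input. The likely cause is that (.+?) must consume at least one character, so on an empty element it cannot stop at the adjacent closing tag; it keeps extending, lazily, until it reaches the next </Relationship>. The sketch below is a reduced test case that uses the original pattern unchanged:

```perl
use strict;
use warnings;

# Two elements share a name: the first is empty, the second is not.
my $text = <<'END_XML';
<Relationship dsi="user 2009/06/02 10:43"></Relationship>
<PatSignature dsi="user 2009/06/02 10:43">313031320D0A3</PatSignature>
<Relationship dsi="user 2009/06/02 10:43">Mother</Relationship>
END_XML

# The pattern from the question, unchanged. Because of /s, "." also matches
# newlines, so (.+?) is free to run across lines to the second closing tag.
$text =~ s/(<(\w+) +[" \w\/\-=:]+?>)(?!\n)(.+?)(?<!\n)(<\/\2>)/$1<!\[CDATA\[$3\]\]>$4/gs;

print $text;
```

A single CDATA section now starts inside the first Relationship element and ends inside the second, swallowing everything in between.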
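What is the best way to make it non-greedy, or is there perhaps a better solution than mine?
Thanks in advance.
P.S. I believe I finally figured it out. The following code seems to do the trick: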
$text =~ s/(<(\w+) +[" \w\/\-=:]+?>)(?!(\n|\s*<\/\2>))(.+?)(?<!\n)(<\/\2>)/$1<!\[CDATA\[$4\]\]>$5/gs;
Thanks again to everyone who answered my question; I'm still open to better solutions...
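One possible alternative, offered only as a sketch: replace (.+?) with the negated character class ([^<]+), which cannot run past the next tag and still matches newlines inside multi-line text. It assumes that element text never contains '<' (no nested elements, no existing CDATA sections) and that attribute values never contain '>'. The Notes element is a hypothetical addition to show the multi-line case.

```perl
use strict;
use warnings;

my $text = <<'END_XML';
<consent_to_treat dsi="user 2009/06/02 10:43" version="">
<Relationship dsi="user 2009/06/02 10:43"></Relationship>
<PatSignature dsi="user 2009/06/02 10:43">313031320D0A3</PatSignature>
<Relationship dsi="user 2009/06/02 10:43">Mother</Relationship>
<Notes dsi="user 2009/06/02 10:43">first line
second line</Notes>
</consent_to_treat>
END_XML

# $1 = opening tag, $2 = element name, $3 = text content, $4 = closing tag.
# [^<]+ needs at least one character that is not "<", so empty elements are
# skipped, and it can never cross into another tag.
$text =~ s{(<(\w+)\b[^>]*>)([^<]+)(</\2>)}{$1<![CDATA[$3]]>$4}g;

print $text;
```

Under these assumptions the empty Relationship element is left alone, while PatSignature, the second Relationship, and the multi-line Notes element are each wrapped. Whitespace-only content would still be wrapped, so an extra check is needed if that matters.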
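Answer 0 (score: 0)
This regex should do what you need (note that, as the output further down shows, it also wraps empty elements):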
s/(<[^>]+>)(.*?)(<\/[^>]+>)/$1<![CDATA[$2]]>$3/gi
Code:
#!/usr/bin/perl
use strict;
use warnings;

# Sample input, copied from the question.
my $xml = <<'END_XML';
<currentTime4 dsi="user 2009/06/02 10:43">10:36</currentTime4>
<todayDate dsi="user 2009/06/02 10:43">06/02/2009</todayDate>
<todayDate3 dsi="user 2009/06/02 10:43">06/02/2009</todayDate3>
<todayDate4 dsi="user 2009/06/02 10:43">06/02/2009</todayDate4>
<currentTime dsi="user 2009/06/02 10:43">10:36</currentTime>
<Relationship dsi="user 2009/06/02 10:43"></Relationship>
<PatSignatureIII dsi="user 2009/06/02 10:43"></PatSignatureIII>
<PatSignatureIV dsi="user 2009/06/02 10:43"></PatSignatureIV>
<PatSignature dsi="user 2009/06/02 10:43">313031320D0A3</PatSignature>
<Relationship dsi="user 2009/06/02 10:43">Mother</Relationship>
<currentTime3 dsi="user 2009/06/02 10:43">10:36</currentTime3>
</consent_to_treat>
END_XML
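# Wrap whatever sits between an opening tag and a closing tag in CDATA.
# Without /s, "." does not match a newline, so each match stays on one line.
$xml =~ s/(<[^>]+>)(.*?)(<\/[^>]+>)/$1<![CDATA[$2]]>$3/gi;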
print $xml;
Output:
<currentTime4 dsi="user 2009/06/02 10:43"><![CDATA[10:36]]></currentTime4>
<todayDate dsi="user 2009/06/02 10:43"><![CDATA[06/02/2009]]></todayDate>
<todayDate3 dsi="user 2009/06/02 10:43"><![CDATA[06/02/2009]]></todayDate3>
<todayDate4 dsi="user 2009/06/02 10:43"><![CDATA[06/02/2009]]></todayDate4>
<currentTime dsi="user 2009/06/02 10:43"><![CDATA[10:36]]></currentTime>
<Relationship dsi="user 2009/06/02 10:43"><![CDATA[]]></Relationship>
<PatSignatureIII dsi="user 2009/06/02 10:43"><![CDATA[]]></PatSignatureIII>
<PatSignatureIV dsi="user 2009/06/02 10:43"><![CDATA[]]></PatSignatureIV>
<PatSignature dsi="user 2009/06/02 10:43"><![CDATA[313031320D0A3]]></PatSignature>
<Relationship dsi="user 2009/06/02 10:43"><![CDATA[Mother]]></Relationship>
<currentTime3 dsi="user 2009/06/02 10:43"><![CDATA[10:36]]></currentTime3>
</consent_to_treat>
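In the output above, the empty elements also receive an empty CDATA section, which is not what the question asked for. One way to leave them untouched, shown here as a hedged variation rather than as part of the original answer, is to keep the same pattern and use the /e modifier so that the replacement is built by a small expression:

```perl
use strict;
use warnings;

my $xml = <<'END_XML';
<currentTime dsi="user 2009/06/02 10:43">10:36</currentTime>
<Relationship dsi="user 2009/06/02 10:43"></Relationship>
<Relationship dsi="user 2009/06/02 10:43">Mother</Relationship>
END_XML

# Same pattern as above. /e evaluates the replacement as Perl code, so the
# CDATA wrapper is added only when the captured text ($2) is non-empty.
$xml =~ s{(<[^>]+>)(.*?)(</[^>]+>)}{
    length($2) ? "$1<![CDATA[$2]]>$3" : "$1$3"
}ge;

print $xml;
```

This still only handles elements whose text sits on a single line, because "." does not match a newline without /s. Simply adding /s would let a match start at the root element's opening tag and run across lines, so multi-line content needs a different pattern.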