Perl script to generate XML ids incrementally?

Time: 2014-01-30 18:31:24

Tags: xml regex perl

I have an XML file like this:

 <ce:para id="p0010">xxx</ce:para>**<ce:para>xxx</ce:para>**

 **<ce:para>vvv</ce:para>**

 <ce:para id="p0015">vvv</ce:para>

 <ce:para id="p0020">vv</ce:para>

 **<ce:para>vvvv</ce:para><ce:para>xxxxxxx</ce:para>**

 <ce:para id="p0070">vvddd</ce:para>

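Now I want to generate id="pxxxx" for the tags that do not have one (the ones I have bolded here), on the condition that no two ids are the same and that ids only increase in steps of 5.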

2 answers:

Answer 0: (score: 1)

A quick and dirty Perl solution:

 use strict;
 use warnings;

 $/ = undef;

 my $str = <DATA>;
 my $i = 0;

 $str =~ s/(<ce:para) (?=\s|>) (.*?)>/"$1 id=\"p" . sprintf("%04d",$i+=5) . "\">"/xsge;

 print $str;

 __DATA__

 <ce:para id="p0010">xxx</ce:para>**<ce:para>xxx</ce:para**>

  **<ce:para>vvv</ce:para>**

  <ce:para id="p0015">vvv</ce:para>

  <ce:para id="p0020">vv</ce:para>

  **<ce:para>vvvv</ce:para><ce:para>xxxxxxx</ce:para>**

  <ce:para id="p0070">vvddd</ce:para>

Output >>

 <ce:para id="p0005">xxx</ce:para>**<ce:para id="p0010">xxx</ce:para**>

  **<ce:para id="p0015">vvv</ce:para>**

  <ce:para id="p0020">vvv</ce:para>

  <ce:para id="p0025">vv</ce:para>

  **<ce:para id="p0030">vvvv</ce:para><ce:para id="p0035">xxxxxxx</ce:para>**

  <ce:para id="p0040">vvddd</ce:para>

Edit - change only the ones that have no id:

 use strict;
 use warnings;

 $/ = undef;

 my $str = <DATA>;
 my $i = 0;

 $str =~
  s/
      ( <ce:para )        # (1)
      (?= \s | > )
      \s*
      (?:
           id=
           "p
           ( \d{1,} )     # (2)
           "
        |  .*?
      )
      >
   /
      defined $2 and $i=$2-5;
      "$1 id=\"p" . sprintf("%04d",$i+=5) . "\">"
   /xsge;

 print $str;

 __DATA__

 <ce:para id="p0010">xxx</ce:para>**<ce:para>xxx</ce:para**>

 **<ce:para>vvv</ce:para>**

 <ce:para id="p0015">vvv</ce:para>

 <ce:para id="p0020">vv</ce:para>

 **<ce:para>vvvv</ce:para><ce:para>xxxxxxx</ce:para>**

 <ce:para id="p0070">vvddd</ce:para>

Output >>

 <ce:para id="p0010">xxx</ce:para>**<ce:para id="p0015">xxx</ce:para**>

 **<ce:para id="p0020">vvv</ce:para>**

 <ce:para id="p0015">vvv</ce:para>

 <ce:para id="p0020">vv</ce:para>

 **<ce:para id="p0025">vvvv</ce:para><ce:para id="p0030">xxxxxxx</ce:para>**

 <ce:para id="p0070">vvddd</ce:para>
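
Note that the output above still contains p0015 and p0020 twice, which the question rules out. One possible way around that is to collect the ids already present before substituting, and to skip them while counting. The following is only a minimal sketch under stated assumptions: `fill_ids` is a hypothetical helper name, and it assumes every tag is either a bare `<ce:para>` or has exactly the form `<ce:para id="pNNNN">` with no other attributes.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the answer above): keep existing ids,
# resume counting from the most recent one, and skip any id that is
# already used elsewhere in the input.
sub fill_ids {
    my ($xml) = @_;

    # First pass: remember every numeric id already present.
    my %taken;
    $taken{ $1 + 0 } = 1 while $xml =~ /<ce:para\s+id="p(\d+)"/g;

    # Second pass: a tag with an id resets the counter to that id;
    # a tag without one gets the next free value in steps of 5.
    my $i = 0;
    $xml =~ s{(<ce:para)(?:\s+id="p(\d+)")?>}{
        if (defined $2) { $i = $2 + 0 }
        else { do { $i += 5 } while $taken{$i}; $taken{$i} = 1 }
        sprintf '%s id="p%04d">', $1, $i;
    }ge;

    return $xml;
}

my $sample = '<ce:para id="p0010">a</ce:para><ce:para>b</ce:para>'
           . '<ce:para id="p0015">c</ce:para><ce:para>d</ce:para>';
print fill_ids($sample), "\n";
```

With this sample the two tags without an id should receive p0020 and p0025, because p0015 is already taken by a later tag. Whether new ids should continue from the preceding id or fill the lowest free slot is not specified in the question, so that choice is an assumption here.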

Answer 1: (score: 0)

 use strict;
 use warnings;
 $/ = undef;
 my $str = '<ce:para><ce:para><ce:para><ce:para><ce:para id="p0010">';
 my $i = 0;
 first:
 $i = $i + 5;
 if ($str =~ /<ce:para>/) {
     my $id = sprintf("%04d", $i);
     if ($str =~ /<ce:para id="p$id">/) {
         goto first;
     }
     else {
         $str =~ s/(<ce:para)>/"$1 id=\"p" . sprintf("%04d",$i) . "\">"/xse;
     }
 }
 if ($str =~ /<ce:para>/) {
     goto first;
 }
 print $str;

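Here I first generate the id and check whether it already exists in the data. If the id exists, it is skipped; otherwise it is substituted into the next ce:para that has no id, and the id is incremented.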
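
The same skip-if-taken idea can also be written as a plain loop instead of `goto`. This is just a sketch of the approach described above, not the answerer's code; `number_missing` is a hypothetical name, and it assumes tags without an id appear exactly as `<ce:para>`.

```perl
use strict;
use warnings;

# Hypothetical helper: while a bare <ce:para> remains, try the next
# multiple of 5 and use it only if that id is not already in the string.
sub number_missing {
    my ($str) = @_;
    my $i = 0;
    while ($str =~ /<ce:para>/) {
        $i += 5;
        my $id = sprintf '%04d', $i;
        next if $str =~ /<ce:para id="p$id">/;      # id already used, skip it
        $str =~ s/<ce:para>/<ce:para id="p$id">/;   # fill the first bare tag only
    }
    return $str;
}

print number_missing('<ce:para><ce:para><ce:para><ce:para><ce:para id="p0010">'), "\n";
```

For the input string used in the answer, this should assign p0005, p0015, p0020 and p0025 to the four bare tags and leave p0010 where it is.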