XML::Twig script for a beginner

Date: 2014-10-14 15:42:52

Tags: xml perl xml-twig

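I am starting to learn XML parsing with XML::Twig, and I have run into two small problems. My XML (a collection of bibliographic records) has the following structure: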

<?xml version="1.0" encoding="UTF-8"?>

<collection xmlns="http://www.loc.gov/MARC21/slim">
 <!-- FIRST INCREMENTAL -->
 <!-- INSTANCE:sfxudn -->
 <record>
  <leader>-----nas-a2200000z--4500</leader>
  <controlfield tag="008">140922uuuuuuuuuxx-uu-|------u|----|eng-d</controlfield>
  <datafield tag="010" ind1="" ind2="">
   <subfield code="a">01015589</subfield>
  </datafield>
  <datafield tag="245" ind1="" ind2="0">
   <subfield code="a">Publishers weekly</subfield>
  </datafield>
  <datafield tag="260" ind1="" ind2="">
   <subfield code="a">New York, NY</subfield>
   <subfield code="b">Reed Business Information</subfield>
  </datafield>
  <datafield tag="022" ind1="" ind2="">
   <subfield code="a">0000-0019</subfield>
  </datafield>
  <datafield tag="776" ind1="" ind2="">
   <subfield code="x">2150-4008</subfield>
  </datafield>
  <datafield tag="090" ind1="" ind2="">
   <subfield code="a">954921332001</subfield>
  </datafield>
  <datafield tag="866" ind1="" ind2="">
   <subfield code="a">Available from 1997. </subfield>
   <subfield code="s">1000000000001224</subfield>
   <subfield code="t">1000000000000630</subfield>
   <subfield code="x">EBSCOhost Business Source Complete:Full Text</subfield>
   <subfield code="z">1000000000125212</subfield>
  </datafield>
 </record>

 ....more records...
  </collection>

I want to perform two operations:

1) Add a single constant line (with constant content) at a precise position, next to "/collection/record/datafield[@tag='866']/subfield[@code='a']". For example,

<datafield tag="866" ind1="" ind2="">
   <subfield code="a">Available from 1997. </subfield>
   <subfield code="s">1000000000001224</subfield>
   <subfield code="t">1000000000000630</subfield>
   <subfield code="x">EBSCOhost Business Source Complete:Full Text</subfield>
   <subfield code="z">1000000000125212</subfield>
  </datafield>

should become:

  <datafield tag="866" ind1="" ind2="">
   <subfield code="a">Available from 1997. </subfield>
   ****add the following line with "code" attribute in alphabetical order, after "a" and before "s"****
    <subfield code="i">DEFAULT</subfield>
   <subfield code="s">1000000000001224</subfield>
   <subfield code="t">1000000000000630</subfield>
   <subfield code="x">EBSCOhost Business Source Complete:Full Text</subfield>
   <subfield code="z">1000000000125212</subfield>
  </datafield>

2) Find ALL and ONLY the record titles (the title is the content of /collection/record/datafield[@tag='245']/subfield[@code='a']) for which:

a) the value of "/collection/record/datafield[@tag='866']/subfield[@code='x']" equals "Elsevier SD Freedom Collection:Full Text", and
b) "/collection/record/datafield[@tag='866']/subfield[@code='a']" is either missing entirely or, if present, empty. For example:

 <datafield tag="866" ind1="" ind2="">
   <subfield code="s">1000000000000992</subfield>
   <subfield code="t">1000000000000473</subfield>
   <subfield code="x">Elsevier SD Freedom Collection:Full Text</subfield>
   <subfield code="z">1000000000043233</subfield>
  </datafield>

OR

<datafield tag="866" ind1="" ind2="">
   <subfield code="a"></subfield>
   <subfield code="s">1000000000000992</subfield>
   <subfield code="t">1000000000000473</subfield>
   <subfield code="x">Elsevier SD Freedom Collection:Full Text</subfield>
   <subfield code="z">1000000000043233</subfield>
  </datafield> 

Many thanks for your replies,

fabianope

1 answer:

Answer 0: (score: 1)

For the first one, here is a "naive" solution (i.e. it loads the whole document into memory; if needed, you can use twig_roots to avoid this):

#!/usr/bin/perl

use strict;
use warnings;

use XML::Twig;

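my $coll= "colls.xml";   # input file holding the record collection

my $tag_nb= 866;
my $new_subfield= { code => 'i', content => 'DEFAULT' };   # constant subfield to add

# the handler fires for every datafield whose tag attribute matches
my $trigger= qq{datafield[\@tag="$tag_nb"]};

# parse the whole file, then print the modified document to STDOUT
XML::Twig->new( twig_handlers => { 
                   $trigger => sub{ add_subfield( @_, $new_subfield); }
                   },
                pretty_print => 'indented',
              )
         ->parsefile( $coll)
         ->print;

# insert the new subfield, then re-sort all subfields on their code attribute
sub add_subfield {
    my( $t, $datafield, $subfield)= @_;
    $datafield->insert_new_elt( first_child => subfield 
                                               => { code => $subfield->{code}, },
                                                  $subfield->{content}
                              );
    $datafield->sort_children_on_att( 'code');
}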
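
The twig_roots variant mentioned above could look roughly like the sketch below. It assumes that only the 866 datafields need to be built as twigs, and that everything else can be copied through unchanged with twig_print_outside_roots. The helper name add_subfield_streaming, the shortened sample record, and the in-memory output handle are illustrative choices, not part of the original answer.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: copies $xml to $out_fh, adding <subfield code="i">
# to every 866 datafield. Only those datafields are built in memory.
sub add_subfield_streaming {
    my ($xml, $out_fh) = @_;
    my $t = XML::Twig->new(
        twig_roots => {
            'datafield[@tag="866"]' => sub {
                my ($twig, $datafield) = @_;
                $datafield->insert_new_elt(
                    first_child => subfield => { code => 'i' }, 'DEFAULT');
                $datafield->sort_children_on_att('code');
                $datafield->print($out_fh);    # emit the modified element
            },
        },
        # everything outside the roots is printed as is to the same handle
        twig_print_outside_roots => $out_fh,
    );
    $t->parse($xml);
    return;
}

# Shortened sample, assumed for illustration only.
my $sample = <<'XML';
<collection>
 <record>
  <datafield tag="245" ind1="" ind2="0">
   <subfield code="a">Publishers weekly</subfield>
  </datafield>
  <datafield tag="866" ind1="" ind2="">
   <subfield code="a">Available from 1997. </subfield>
   <subfield code="s">1000000000001224</subfield>
  </datafield>
 </record>
</collection>
XML

open my $fh, '>', \my $result or die "cannot open in-memory handle: $!";
add_subfield_streaming($sample, $fh);
close $fh;
print $result;
```

For a real file, passing \*STDOUT (or a handle opened on an output file) as the second argument and calling parsefile instead of parse should work the same way. Note that pretty_print is left out here, so the rewritten datafield is printed on a single line.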
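
The answer covers only the first operation. For the second one (titles of records whose 866 datafield has the Elsevier $x value and a missing or empty $a), a minimal sketch might look like this. The helper name titles_missing_866a and the sample records with titles "Journal A" to "Journal C" are made up for illustration. The sketch relies on first_child_text returning an empty string when the child does not exist, so "missing" and "empty" are handled by the same test.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

my $WANTED_X = 'Elsevier SD Freedom Collection:Full Text';

# Hypothetical helper: returns the 245$a titles of records that have an 866
# datafield whose $x equals $WANTED_X and whose $a is missing or empty.
sub titles_missing_866a {
    my ($xml) = @_;
    my @titles;
    XML::Twig->new(
        twig_handlers => {
            record => sub {
                my ($twig, $record) = @_;
                for my $df ($record->children('datafield[@tag="866"]')) {
                    next unless $df->first_child_text('subfield[@code="x"]') eq $WANTED_X;
                    # first_child_text gives '' when there is no such child
                    next if $df->first_child_text('subfield[@code="a"]') =~ /\S/;
                    my $title_df = $record->first_child('datafield[@tag="245"]');
                    push @titles, $title_df
                        ? $title_df->first_child_text('subfield[@code="a"]')
                        : '';
                    last;    # report each record at most once
                }
                $twig->purge;    # release the record once it has been checked
            },
        },
    )->parse($xml);
    return @titles;
}

# Made-up sample: A has no $a, B has an empty $a, C has a filled $a.
my $sample = <<'XML';
<collection xmlns="http://www.loc.gov/MARC21/slim">
 <record>
  <datafield tag="245" ind1="" ind2="0"><subfield code="a">Journal A</subfield></datafield>
  <datafield tag="866" ind1="" ind2="">
   <subfield code="x">Elsevier SD Freedom Collection:Full Text</subfield>
  </datafield>
 </record>
 <record>
  <datafield tag="245" ind1="" ind2="0"><subfield code="a">Journal B</subfield></datafield>
  <datafield tag="866" ind1="" ind2="">
   <subfield code="a"></subfield>
   <subfield code="x">Elsevier SD Freedom Collection:Full Text</subfield>
  </datafield>
 </record>
 <record>
  <datafield tag="245" ind1="" ind2="0"><subfield code="a">Journal C</subfield></datafield>
  <datafield tag="866" ind1="" ind2="">
   <subfield code="a">Available from 1997. </subfield>
   <subfield code="x">Elsevier SD Freedom Collection:Full Text</subfield>
  </datafield>
 </record>
</collection>
XML

print "$_\n" for titles_missing_866a($sample);
```

To run it against the real file, parsefile("colls.xml") can replace parse($xml). Handling one record at a time and purging afterwards keeps memory use low on large collections.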