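I have started learning XML parsing with XML::Twig, and I am stuck on two small problems. My XML (a collection of bibliographic records) has the following structure: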
<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
<!-- FIRST INCREMENTAL -->
<!-- INSTANCE:sfxudn -->
<record>
<leader>-----nas-a2200000z--4500</leader>
<controlfield tag="008">140922uuuuuuuuuxx-uu-|------u|----|eng-d</controlfield>
<datafield tag="010" ind1="" ind2="">
<subfield code="a">01015589</subfield>
</datafield>
<datafield tag="245" ind1="" ind2="0">
<subfield code="a">Publishers weekly</subfield>
</datafield>
<datafield tag="260" ind1="" ind2="">
<subfield code="a">New York, NY</subfield>
<subfield code="b">Reed Business Information</subfield>
</datafield>
<datafield tag="022" ind1="" ind2="">
<subfield code="a">0000-0019</subfield>
</datafield>
<datafield tag="776" ind1="" ind2="">
<subfield code="x">2150-4008</subfield>
</datafield>
<datafield tag="090" ind1="" ind2="">
<subfield code="a">954921332001</subfield>
</datafield>
<datafield tag="866" ind1="" ind2="">
<subfield code="a">Available from 1997. </subfield>
<subfield code="s">1000000000001224</subfield>
<subfield code="t">1000000000000630</subfield>
<subfield code="x">EBSCOhost Business Source Complete:Full Text</subfield>
<subfield code="z">1000000000125212</subfield>
</datafield>
</record>
....more records...
</collection>
I want to perform two operations:
1) Add a single constant line (with constant content) at a precise position in "/collection/record/datafield[@tag='866']/subfield[@code='a']". That is,
<datafield tag="866" ind1="" ind2="">
<subfield code="a">Available from 1997. </subfield>
<subfield code="s">1000000000001224</subfield>
<subfield code="t">1000000000000630</subfield>
<subfield code="x">EBSCOhost Business Source Complete:Full Text</subfield>
<subfield code="z">1000000000125212</subfield>
</datafield>
should be converted to:
<datafield tag="866" ind1="" ind2="">
<subfield code="a">Available from 1997. </subfield>
****add the following line with "code" attribute in alphabetical order, after "a" and before "s"****
<subfield code="i">DEFAULT</subfield>
<subfield code="s">1000000000001224</subfield>
<subfield code="t">1000000000000630</subfield>
<subfield code="x">EBSCOhost Business Source Complete:Full Text</subfield>
<subfield code="z">1000000000125212</subfield>
</datafield>
2) Find ALL and ONLY the record titles (the content of /collection/record/datafield[@tag='245']/subfield[@code='a']) for which:
a) the value of "/collection/record/datafield[@tag='866']/subfield[@code='x']" equals "Elsevier SD Freedom Collection:Full Text", and
b) "/collection/record/datafield[@tag='866']/subfield[@code='a']" is either absent altogether or, if present, empty. That is:
<datafield tag="866" ind1="" ind2="">
<subfield code="s">1000000000000992</subfield>
<subfield code="t">1000000000000473</subfield>
<subfield code="x">Elsevier SD Freedom Collection:Full Text</subfield>
<subfield code="z">1000000000043233</subfield>
</datafield>
OR
<datafield tag="866" ind1="" ind2="">
<subfield code="a"></subfield>
<subfield code="s">1000000000000992</subfield>
<subfield code="t">1000000000000473</subfield>
<subfield code="x">Elsevier SD Freedom Collection:Full Text</subfield>
<subfield code="z">1000000000043233</subfield>
</datafield>
Thanks a lot for any reply,
fabianope
Answer 0 (score: 1)
For the first one, here is a "naive" solution (i.e. it loads the whole document into memory; if that is a problem, you could use twig_roots to avoid it):
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

my $coll= "colls.xml";
my $tag_nb= 866;
my $new_subfield= { code => 'i', content => 'DEFAULT' };

# handler trigger: every datafield whose tag attribute is 866
my $trigger= qq{datafield[\@tag="$tag_nb"]};

XML::Twig->new( twig_handlers => {
                    $trigger => sub { add_subfield( @_, $new_subfield); },
                },
                pretty_print => 'indented',
              )
         ->parsefile( $coll)
         ->print;

# insert the new subfield, then re-sort all subfields on their code attribute
sub add_subfield {
    my( $t, $datafield, $subfield)= @_;
    $datafield->insert_new_elt( first_child => subfield
                                  => { code => $subfield->{code}, },
                                $subfield->{content}
                              );
    $datafield->sort_children_on_att( 'code');
}
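
For the second one, something along these lines might work. It is only a minimal sketch, under two assumptions: conditions (a) and (b) are checked within the same 866 datafield, and a subfield "a" holding only whitespace counts as empty. The helper names (record_matches, record_title, find_titles) and the placeholder titles in the sample data are invented for illustration; they do not come from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

my $target= 'Elsevier SD Freedom Collection:Full Text';

# True when the record has an 866 datafield whose subfield x equals $wanted
# and whose subfield a is missing or holds only whitespace.
sub record_matches {
    my( $record, $wanted)= @_;
    foreach my $df ( $record->children( 'datafield[@tag="866"]')) {
        my $sub_x= $df->first_child( 'subfield[@code="x"]') or next;
        next unless $sub_x->text eq $wanted;
        my $sub_a= $df->first_child( 'subfield[@code="a"]');
        return 1 if !$sub_a || $sub_a->text !~ /\S/;
    }
    return 0;
}

# Text of datafield 245, subfield a, or undef when the record has no title.
sub record_title {
    my( $record)= @_;
    my $df= $record->first_child( 'datafield[@tag="245"]') or return;
    my $sub_a= $df->first_child( 'subfield[@code="a"]') or return;
    return $sub_a->text;
}

# Parses an XML string one record at a time and returns the matching titles.
sub find_titles {
    my( $xml, $wanted)= @_;
    my @titles;
    XML::Twig->new(
        twig_handlers => {
            record => sub {
                my( $t, $record)= @_;
                if( record_matches( $record, $wanted)) {
                    my $title= record_title( $record);
                    push @titles, $title if defined $title;
                }
                $t->purge;    # release the records parsed so far
            },
        },
    )->parse( $xml);
    return @titles;
}

# Sample data: placeholder titles, made up for this example.
my $sample= <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
<record>
<datafield tag="245" ind1="" ind2="0"><subfield code="a">Placeholder title one</subfield></datafield>
<datafield tag="866" ind1="" ind2="">
<subfield code="x">Elsevier SD Freedom Collection:Full Text</subfield>
</datafield>
</record>
<record>
<datafield tag="245" ind1="" ind2="0"><subfield code="a">Placeholder title two</subfield></datafield>
<datafield tag="866" ind1="" ind2="">
<subfield code="a"></subfield>
<subfield code="x">Elsevier SD Freedom Collection:Full Text</subfield>
</datafield>
</record>
<record>
<datafield tag="245" ind1="" ind2="0"><subfield code="a">Placeholder title three</subfield></datafield>
<datafield tag="866" ind1="" ind2="">
<subfield code="a">Available from 1997. </subfield>
<subfield code="x">Elsevier SD Freedom Collection:Full Text</subfield>
</datafield>
</record>
</collection>
XML

print "$_\n" for find_titles( $sample, $target);
```

To run this against the real file, replacing parse( $xml) with parsefile( $coll) should be enough. Because the handler purges after each record, memory use should stay roughly flat even for a large collection.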