How to get the values of a repeated inner tag when the whole element is on a single line with no line breaks
<BOOK-REF ID="Kyle-ch001-bib036"><AUTHOR-REF><SURNAME>Neinstein</SURNAME>, <GIVEN-NAME>L. S.</GIVEN-NAME></AUTHOR-REF>, <AUTHOR-REF><SURNAME>Gordon</SURNAME>, <GIVEN-NAME>C. G.</GIVEN-NAME></AUTHOR-REF>, <AUTHOR-REF><SURNAME>Katzman</SURNAME>, <GIVEN-NAME>D.</GIVEN-NAME></AUTHOR-REF>, <AUTHOR-REF><SURNAME>Rosen</SURNAME>, <GIVEN-NAME>D.</GIVEN-NAME></AUTHOR-REF>, & <AUTHOR-REF><SURNAME>Woods</SURNAME>, <GIVEN-NAME>E.</GIVEN-NAME></AUTHOR-REF> (<YEAR-REF>2007</YEAR-REF>). <BOOK-TITLE-REF>Adolescent health care: A practical guide</BOOK-TITLE-REF> (<EDITION-REF>5th ed.</EDITION-REF>). <PLACE-OF-PUBLICATION-REF>Philadelphia</PLACE-OF-PUBLICATION-REF>: <PUBLISHER-REF>Lippincott Williams and Wilkins</PUBLISHER-REF>.</BOOK-REF>
I only want the contents of the SURNAME tags (just the names) inside the BOOK-REF tag. SURNAME can occur any number of times, and I want the results in an array.
my (@arr2);
while ($str =~ /<BOOK-REF ID="([^"]*)">(?:[^\)]*)<SURNAME>(.*?)<\/SURNAME>.*?<YEAR-REF>(\d+\w+)<\/YEAR-REF>.*?<\/BOOK-REF>/sgi){
    my $id = $1;
    my $sname = $2;
    my $year = $3;
    push (@arr2,[$id,$sname,$year]);
}
Thanks in advance.
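The attempt above yields one row per record, and that row holds the last author (Woods), not all of them. The greedy `(?:[^\)]*)` runs up to the first `)` (just after `</YEAR-REF>`) and then backtracks only as far as the last `<SURNAME>`. A minimal sketch that keeps the regex approach captures each record's body first and then collects every SURNAME in it with a list-context `//g` match. It assumes BOOK-REF elements are never nested and that entities such as `&amp;` need no decoding. Regexes stay fragile for XML, which is why the answers below use real parsers.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample input: the single-line BOOK-REF record from the question.
my $str = <<'XML';
<BOOK-REF ID="Kyle-ch001-bib036"><AUTHOR-REF><SURNAME>Neinstein</SURNAME>, <GIVEN-NAME>L. S.</GIVEN-NAME></AUTHOR-REF>, <AUTHOR-REF><SURNAME>Gordon</SURNAME>, <GIVEN-NAME>C. G.</GIVEN-NAME></AUTHOR-REF>, <AUTHOR-REF><SURNAME>Katzman</SURNAME>, <GIVEN-NAME>D.</GIVEN-NAME></AUTHOR-REF>, <AUTHOR-REF><SURNAME>Rosen</SURNAME>, <GIVEN-NAME>D.</GIVEN-NAME></AUTHOR-REF>, & <AUTHOR-REF><SURNAME>Woods</SURNAME>, <GIVEN-NAME>E.</GIVEN-NAME></AUTHOR-REF> (<YEAR-REF>2007</YEAR-REF>). <BOOK-TITLE-REF>Adolescent health care: A practical guide</BOOK-TITLE-REF> (<EDITION-REF>5th ed.</EDITION-REF>). <PLACE-OF-PUBLICATION-REF>Philadelphia</PLACE-OF-PUBLICATION-REF>: <PUBLISHER-REF>Lippincott Williams and Wilkins</PUBLISHER-REF>.</BOOK-REF>
XML

my @arr2;

# Outer loop: capture each record's ID and its whole body. The non-greedy
# (.*?) stops at the first </BOOK-REF>, so several records on one line
# are matched one at a time.
while ($str =~ /<BOOK-REF\s+ID="([^"]*)">(.*?)<\/BOOK-REF>/sg) {
    my ($id, $body) = ($1, $2);

    # The year occurs once per record.
    my ($year) = $body =~ /<YEAR-REF>(\d+)<\/YEAR-REF>/;

    # In list context, //g returns the capture from every match,
    # i.e. every SURNAME in this record.
    my @snames = $body =~ /<SURNAME>(.*?)<\/SURNAME>/g;

    # One [id, surname, year] row per author, the layout asked for above.
    push @arr2, map { [ $id, $_, $year ] } @snames;
}

print join("\t", @$_), "\n" for @arr2;
```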
Answer 0 (score: 2)
Use XML::Twig. I added a `books` wrapper around the BOOK-REF in case you have more than one in the file; the code is the same without it.
#!/usr/bin/perl
use strict;
use warnings;

use YAML;
use XML::Twig;

my @by_name;

# book_ref() is called each time a complete BOOK-REF element has been parsed
XML::Twig->new( twig_handlers => { 'BOOK-REF' => sub { book_ref( @_, \@by_name); } })
         -> parse( \*DATA);

print Dump \@by_name;

sub book_ref
  { my( $t, $bookref, $by_name)= @_;
    # one record per SURNAME found anywhere below this BOOK-REF
    foreach my $surname ($bookref->descendants( 'SURNAME'))
      { push @$by_name, { name => $surname->text, id => $bookref->att( 'ID'), year => $bookref->field( 'YEAR-REF') }; }
    $t->purge; # if the file can be too big to fit in memory
  }
__DATA__
<books>
<BOOK-REF ID="Kyle-ch001-bib036"><AUTHOR-REF><SURNAME>Neinstein</SURNAME>, <GIVEN-NAME>L. S.</GIVEN-NAME></AUTHOR-REF>, <AUTHOR-REF><SURNAME>Gordon</SURNAME>, <GIVEN-NAME>C. G.</GIVEN-NAME></AUTHOR-REF>, <AUTHOR-REF><SURNAME>Katzman</SURNAME>, <GIVEN-NAME>D.</GIVEN-NAME></AUTHOR-REF>, <AUTHOR-REF><SURNAME>Rosen</SURNAME>, <GIVEN-NAME>D.</GIVEN-NAME></AUTHOR-REF>, &amp; <AUTHOR-REF><SURNAME>Woods</SURNAME>, <GIVEN-NAME>E.</GIVEN-NAME></AUTHOR-REF> (<YEAR-REF>2007</YEAR-REF>). <BOOK-TITLE-REF>Adolescent health care: A practical guide</BOOK-TITLE-REF> (<EDITION-REF>5th ed.</EDITION-REF>). <PLACE-OF-PUBLICATION-REF>Philadelphia</PLACE-OF-PUBLICATION-REF>: <PUBLISHER-REF>Lippincott Williams and Wilkins</PUBLISHER-REF>.</BOOK-REF>
</books>
Answer 1 (score: 1)
Using XML::XSH2:
#!/usr/bin/perl
use warnings;
use strict;

use Data::Dumper;
use XML::XSH2;

xsh << 'end xsh;';
  open 1.xml ;
  for //SURNAME {
      $y = string(../../YEAR-REF) ;
      $s = string(.) ;
      $i = string(ancestor::BOOK-REF/@ID) ;
      perl { push @arr, [$i, $s, $y] } }
end xsh;

print Dumper \@XML::XSH2::Map::arr;
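Answer 2 (score: 0)
Use XPath queries to extract the values you are interested in. These three XPath queries should return the values you are looking for: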
//BOOK-REF/@ID
//BOOK-REF/AUTHOR-REF/SURNAME
//BOOK-REF/YEAR-REF
To run the XPath queries, use a module such as XML::LibXML. A complete example:
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;
my $xml = XML::LibXML->load_xml(string => q{<?xml version="1.0" encoding="utf-8"?>
<BOOK-REF ID="Kyle-ch001-bib036"><AUTHOR-REF><SURNAME>Neinstein</SURNAME>, <GIVEN-NAME>L. S.</GIVEN-NAME></AUTHOR-REF>, <AUTHOR-REF><SURNAME>Gordon</SURNAME>, <GIVEN-NAME>C. G.</GIVEN-NAME></AUTHOR-REF>, <AUTHOR-REF><SURNAME>Katzman</SURNAME>, <GIVEN-NAME>D.</GIVEN-NAME></AUTHOR-REF>, <AUTHOR-REF><SURNAME>Rosen</SURNAME>, <GIVEN-NAME>D.</GIVEN-NAME></AUTHOR-REF>, &amp; <AUTHOR-REF><SURNAME>Woods</SURNAME>, <GIVEN-NAME>E.</GIVEN-NAME></AUTHOR-REF> (<YEAR-REF>2007</YEAR-REF>). <BOOK-TITLE-REF>Adolescent health care: A practical guide</BOOK-TITLE-REF> (<EDITION-REF>5th ed.</EDITION-REF>). <PLACE-OF-PUBLICATION-REF>Philadelphia</PLACE-OF-PUBLICATION-REF>: <PUBLISHER-REF>Lippincott Williams and Wilkins</PUBLISHER-REF>.</BOOK-REF>
});
my $xc = XML::LibXML::XPathContext->new($xml);
# findvalue returns the string value, not a NodeList object
my $id = $xc->findvalue('//BOOK-REF/@ID');
my @snames = map $_->textContent => $xc->findnodes('//BOOK-REF/AUTHOR-REF/SURNAME');
my $year = $xc->findvalue('//BOOK-REF/YEAR-REF');
print "$id\n";
print join(', ' => @snames), "\n";
print "$year\n";
# prints:
# Kyle-ch001-bib036
# Neinstein, Gordon, Katzman, Rosen, Woods
# 2007
You can store the results neatly in an array like this:
push @some_array, +{
    id     => $id,
    snames => \@snames,
    year   => $year
};
If you want to stick to your original plan and repeat the ID and year for each surname, it is simply:
push @arr2, map [ $id, $_, $year ] => @snames;
Another potentially useful way to store them is a hash keyed on the id field:
$some_hash{$id} = +{
    id     => $id,
    snames => \@snames,
    year   => $year
};
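The queries above are absolute (`//BOOK-REF/...`), so with more than one BOOK-REF in a file the IDs, surnames and years of different records would be lumped together. A minimal sketch loops over each BOOK-REF and evaluates relative XPath expressions against it, using the optional context-node argument of XPathContext's `findnodes` and `findvalue`. It assumes the records share a common root such as the `<books>` wrapper from the XML::Twig answer, and it fills both the flat `@arr2` and a hash keyed on the ID (named `%by_id` here for illustration). It runs on the single record from the question; a file with several records is handled the same way. The `&` is written as `&amp;` so the document is well-formed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# The question's record under an assumed <books> root element.
my $doc = XML::LibXML->load_xml(string => <<'XML');
<books>
<BOOK-REF ID="Kyle-ch001-bib036"><AUTHOR-REF><SURNAME>Neinstein</SURNAME>, <GIVEN-NAME>L. S.</GIVEN-NAME></AUTHOR-REF>, <AUTHOR-REF><SURNAME>Gordon</SURNAME>, <GIVEN-NAME>C. G.</GIVEN-NAME></AUTHOR-REF>, <AUTHOR-REF><SURNAME>Katzman</SURNAME>, <GIVEN-NAME>D.</GIVEN-NAME></AUTHOR-REF>, <AUTHOR-REF><SURNAME>Rosen</SURNAME>, <GIVEN-NAME>D.</GIVEN-NAME></AUTHOR-REF>, &amp; <AUTHOR-REF><SURNAME>Woods</SURNAME>, <GIVEN-NAME>E.</GIVEN-NAME></AUTHOR-REF> (<YEAR-REF>2007</YEAR-REF>). <BOOK-TITLE-REF>Adolescent health care: A practical guide</BOOK-TITLE-REF> (<EDITION-REF>5th ed.</EDITION-REF>). <PLACE-OF-PUBLICATION-REF>Philadelphia</PLACE-OF-PUBLICATION-REF>: <PUBLISHER-REF>Lippincott Williams and Wilkins</PUBLISHER-REF>.</BOOK-REF>
</books>
XML

my $xc = XML::LibXML::XPathContext->new($doc);

my (@arr2, %by_id);

# Visit each record; every query below is relative to $book, so values
# from different BOOK-REF elements never get mixed.
for my $book ($xc->findnodes('//BOOK-REF')) {
    my $id     = $xc->findvalue('@ID', $book);
    my $year   = $xc->findvalue('YEAR-REF', $book);
    my @snames = map { $_->textContent } $xc->findnodes('AUTHOR-REF/SURNAME', $book);

    # Flat rows: one [id, surname, year] per author.
    push @arr2, map { [ $id, $_, $year ] } @snames;

    # One hash entry per record; @snames is a fresh array each iteration,
    # so storing a reference to it is safe.
    $by_id{$id} = { id => $id, snames => \@snames, year => $year };
}

for my $id (sort keys %by_id) {
    printf "%s (%s): %s\n", $id, $by_id{$id}{year}, join(', ', @{ $by_id{$id}{snames} });
}
```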