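Perl version: This is perl, v5.10.1 (*) built for x86_64-linux-thread-multi
I'm a relative newcomer to Perl. I've tried looking at the various XML processing modules for Perl: XML::Simple, XML::Parser, XML::LibXML, XML::DOM, XML::Twig, XML::XPath, etc.
I'm trying to process some XML that has quotation marks in the value part. I specifically want to extract the title from the XML below, but I've been stumbling over this for a while now and would appreciate some help, if possible.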
$VAR1 = {
'issue' => {
'priority' => {
'fid' => '11',
'content' => '3 - Best Effort'
},
'transNum' => {
'fid' => '2',
'content' => '170'
},
'dueDate' => {
'fid' => '17',
'content' => '1327944695'
},
'status' => {
'fid' => '18',
'content' => 'Open - Unassigned'
},
'createdBy' => {
'fid' => '15',
'content' => '32'
},
'title' => {
'fid' => '20',
'content' => 'Testing on spider - issue with "quotation marks"'
},
'description' => {
'fid' => '22',
'content' => 'Noticed issue with title having quotes in title'
},
'issueNum' => {
'fid' => '1',
'content' => '33'
}
}
};
Using XML::LibXML and the following code (note: the contents of the $issueXML variable are what is printed above):
my $parser = XML::LibXML->new();
my $doc = $parser->parse_string($issueXML);
print $doc->toString;
This prints:
<?xml version="1.0" encoding="utf-8"?>
<issues>
<issue>
<issueNum fid="1">33</issueNum>
<transNum fid="2">170</transNum>
<createdBy fid="15">32</createdBy>
<status fid="18">Open - Unassigned</status>
<title fid="20">Testing on spider - issue with "quotation marks"</title>
<priority fid="11">3 - Best Effort</priority>
<description fid="22">Noticed issue with submission of Documentation issue #40 on accurev with quotes in title. </description>
<dueDate fid="17">1327944695</dueDate>
</issue>
</issues>
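I want to extract the value of the title tag specifically.
When I process this with XML::Parser, I end up with only the final quotation mark. I'd like to keep the string in the same format as shown:
Testing on spider - issue with "quotation marks"
At the moment I'm overwhelmed by the variety of XML processing options. I've been trying to solve this for a while and I'm seriously spinning my wheels.
TIA, and thanks for any help,
Regards, Scott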
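Answer 0 (score: 2)
I'm not sure what problem you're having with the quotes. They are just characters like any other, except in attribute values, where you may have to use an entity if the quote character is also used as the value delimiter. Are you sure the "problem" isn't just the way Data::Dumper displays the data structure generated by XML::Simple?
In any case, stay away from XML::Parser, which is too low-level; use XML::LibXML or XML::Twig instead. XML::Simple seems to generate a lot of questions, especially from people who are new to Perl, so I'm not sure it's a good fit here.
Here is a solution using XML::Twig, but there are other ways to do this, depending on what you want to do with the title.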
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;
my $issueXML=q{<?xml version="1.0" encoding="utf-8"?>
<issues>
<issue>
<issueNum fid="1">33</issueNum>
<transNum fid="2">170</transNum>
<createdBy fid="15">32</createdBy>
<status fid="18">Open - Unassigned</status>
<title fid="20">Testing on spider - issue with "quotation marks"</title>
<priority fid="11">3 - Best Effort</priority>
<description fid="22">Noticed issue with submission of Documentation issue #40 on accurev with quotes in title. </description>
<dueDate fid="17">1327944695</dueDate>
</issue>
</issues>
};
my $t= XML::Twig->new( twig_handlers => { title => sub { print $_->text, "\n"; } })
->parse( $issueXML);
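To illustrate the Data::Dumper point above, here is a minimal sketch that uses only core modules. It assumes the dump in the question came from XML::Simple, and the `$data` hash is a hand-copied subset of that dump rather than real parser output. It shows that the title string already holds both quotation marks, and that changing how Data::Dumper renders strings (via its `$Data::Dumper::Useqq` setting) affects only the display.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Hand-copied subset of the structure dumped in the question
# (assumption: this is what XML::Simple returned for the document).
my $data = {
    issue => {
        title => {
            fid     => '20',
            content => 'Testing on spider - issue with "quotation marks"',
        },
        issueNum => { fid => '1', content => '33' },
    },
};

# The stored string is reached with plain hash dereferencing.
my $title = $data->{issue}{title}{content};
print "$title\n";

# Asking Data::Dumper for double-quoted output changes only how the
# dump is rendered; the string held in $title is untouched.
local $Data::Dumper::Useqq = 1;
my $dump = Dumper($data->{issue}{title});
print $dump;
```

If the printed `$title` shows both quotation marks, the data is intact and any odd-looking escaping comes from the dump format, not from the parser.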
Answer 1 (score: 2)
Another one, using XML::LibXML. Quotes inside a text node should be no problem.
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;
use Data::Dumper;
my $xml = XML::LibXML->load_xml(string => q{<?xml version="1.0" encoding="utf-8"?>
<issues>
<issue>
<issueNum fid="1">33</issueNum>
<transNum fid="2">170</transNum>
<createdBy fid="15">32</createdBy>
<status fid="18">Open - Unassigned</status>
<title fid="20">Testing on spider - issue with "quotation marks"</title>
<priority fid="11">3 - Best Effort</priority>
<description fid="22">Noticed issue with submission of Documentation issue #40 on accurev with quotes in title. </description>
<dueDate fid="17">1327944695</dueDate>
</issue>
</issues>
});
my $title = $xml->find('/issues/issue/title');
print $title->get_node(0)->textContent;
Answer 2 (score: 0)
I usually use XML::XSH2 for XML manipulation. Your problem reduces to:
open FILE.xml ;
for //title echo (.) ;
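Answer 3 (score: 0)
The best way to extract bits from XML is with an XPath query.
In this case, you are looking for the element 'title' inside the element 'issue', inside the element 'issues'.
So your XPath query is simply '//issues/issue/title'.
In two lines of code, you can use XML::LibXML::XPathContext to run the XPath query for you, which returns the content of the element you are looking for.
This snippet demonstrates a simple way to run an XPath query. The important part is the two lines after the comment "Relevant bit here".
For more details, see the documentation for XML::LibXML::XPathContext.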
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;
my $xml = XML::LibXML->load_xml(string => q{<?xml version="1.0" encoding="utf-8"?>
<issues>
<issue>
<issueNum fid="1">33</issueNum>
<transNum fid="2">170</transNum>
<createdBy fid="15">32</createdBy>
<status fid="18">Open - Unassigned</status>
<title fid="20">Testing on spider - issue with "quotation marks"</title>
<priority fid="11">3 - Best Effort</priority>
<description fid="22">Noticed issue with submission of Documentation issue #40 on accurev with quotes in title. </description>
<dueDate fid="17">1327944695</dueDate>
</issue>
</issues>
});
# Relevant bit here
my $xc = XML::LibXML::XPathContext->new($xml);
my $title = $xc->find('//issues/issue/title');
print "$title\n";
# prints:
# Testing on spider - issue with "quotation marks"
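As a possible variation on the snippet above, the following sketch uses `findvalue` to get a plain string and `findnodes` to walk several matches. The two-issue document is invented for illustration (only the first title comes from the question), and it assumes an XML::LibXML version recent enough to provide `load_xml`.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Sample document made up for this sketch; the second <issue> is hypothetical.
my $doc = XML::LibXML->load_xml(string => <<'XML');
<?xml version="1.0" encoding="utf-8"?>
<issues>
  <issue>
    <issueNum fid="1">33</issueNum>
    <title fid="20">Testing on spider - issue with "quotation marks"</title>
  </issue>
  <issue>
    <issueNum fid="1">34</issueNum>
    <title fid="20">Second issue, no quotes</title>
  </issue>
</issues>
XML

my $xc = XML::LibXML::XPathContext->new($doc);

# findvalue() returns the string value of the XPath result directly.
my $first = $xc->findvalue('/issues/issue[1]/title');
print "$first\n";

# findnodes() in list context returns every matching element.
my @titles = map { $_->textContent } $xc->findnodes('/issues/issue/title');
print scalar(@titles), " titles found\n";

# Attributes are read from an element node with getAttribute().
my ($node) = $xc->findnodes('/issues/issue[1]/title');
my $fid = $node->getAttribute('fid');
print "fid=$fid\n";
```

`findvalue` is convenient when only the text is needed, while `findnodes` keeps the element objects around in case attributes or child nodes are needed later.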