Processing strings containing quotes in XML

Date: 2012-01-24 14:26:40

Tags: perl

Perl version: perl, v5.10.1 (*) built for x86_64-linux-thread-multi

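I'm a relative newcomer to Perl. I've tried looking at the various XML processing utilities for Perl: XML::Simple, XML::Parser, XML::LibXML, XML::DOM, XML::Twig, XML::XPath, and so on.

I'm trying to process some XML that has quotation marks in the value portion. Specifically, I want to extract the title from the XML below, but I've been stumbling over this for a while now and would appreciate some help if possible.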

$VAR1 = {
   'issue' => {
       'priority' => {
             'fid' => '11',
             'content' => '3 - Best Effort'
           },
       'transNum' => {
             'fid' => '2',
             'content' => '170'
           },
       'dueDate' => {
             'fid' => '17',
             'content' => '1327944695'
           },
       'status' => {
             'fid' => '18',
             'content' => 'Open - Unassigned'
           },
       'createdBy' => {
             'fid' => '15',
             'content' => '32'
           },
       'title' => {
             'fid' => '20',
             'content' => 'Testing on spider - issue with "quotation marks"'
           },
       'description' => {
             'fid' => '22',
             'content' => 'Noticed issue with title having quotes in title'
           },
       'issueNum' => {
             'fid' => '1',
             'content' => '33'
           }
   }
};

Using XML::LibXML and the following code (note: the dump above shows the contents of the $issueXML variable):

my $parser = XML::LibXML->new();
my $doc = $parser->parse_string($issueXML);
print $doc->toString;

This prints:

<?xml version="1.0" encoding="utf-8"?>
<issues>
 <issue>
   <issueNum fid="1">33</issueNum>
   <transNum fid="2">170</transNum>
   <createdBy fid="15">32</createdBy>
   <status fid="18">Open - Unassigned</status>
   <title fid="20">Testing on spider - issue with "quotation marks"</title>
   <priority fid="11">3 - Best Effort</priority>
   <description fid="22">Noticed issue with submission of Documentation issue #40 on accurev with quotes in title. </description>
  <dueDate fid="17">1327944695</dueDate>
 </issue>
</issues>

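I want to extract the value of the title tag specifically. When I process the XML with XML::Parser, I end up with only the final quotation mark. I'd like to keep the string exactly as displayed:
Testing on spider - issue with "quotation marks"

At the moment I'm overwhelmed by the range of XML processing options. I've been trying to solve this for a while, and I'm really spinning my wheels.

TIA, any help is appreciated.

Regards, Scott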

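4 answers:

Answer 0: (score: 2)

I'm not sure what problem you are having with quotes. They are just a character like any other, except in attribute values, where you may have to use an entity if the same quote character is already used as the value delimiter. Are you sure the "problem" isn't simply how Data::Dumper displays the data structure generated by XML::Simple?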
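As a minimal sketch of that point, the following parses a small sample with XML::LibXML. The sample XML and the `note` attribute are made up for illustration and are not part of the question's data. The quotes in the text content are written literally, while the quotes inside the double-quoted attribute value have to be written as `&quot;`; the parser hands both back as plain quote characters.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample: literal quotes in the text content, and escaped
# quotes (&quot;) inside a double-quoted attribute value.
my $doc = XML::LibXML->load_xml(string =>
    q{<issue><title note="say &quot;hi&quot;">with "quotation marks"</title></issue>});

my ($title) = $doc->findnodes('/issue/title');

my $text = $title->textContent;           # text node: quotes kept as written
my $note = $title->getAttribute('note');  # attribute: entity decoded by the parser

print "$text\n";
print "$note\n";
```

Neither value needs any special handling after parsing; the entity only matters in the serialized XML.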

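In any case, stay away from XML::Parser: it is too low-level. Use XML::LibXML or XML::Twig instead. XML::Simple seems to generate a lot of questions, especially from people who are new to Perl, so I'm not sure it is a good fit here.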
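If the structure dumped in the question really is what XML::Simple returned, the title is already there with its quotes intact, and plain hash access reaches it. The sketch below assumes exactly that: it hard-codes a subset of the dumped hash (the variable name `$data` is illustrative) rather than calling XML::Simple.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Subset of the structure shown in the question's Data::Dumper output.
my $data = {
    issue => {
        title => {
            fid     => '20',
            content => 'Testing on spider - issue with "quotation marks"',
        },
    },
};

# Data::Dumper wraps the value in single quotes for display only;
# the stored string contains the double quotes unchanged.
my $title = $data->{issue}{title}{content};
print "$title\n";
```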

Here is a solution using XML::Twig, though there are other ways to do this depending on what you want to do with the title.

#!/usr/bin/perl

use strict;
use warnings;

use XML::Twig;

my $issueXML=q{<?xml version="1.0" encoding="utf-8"?>
<issues>
 <issue>
   <issueNum fid="1">33</issueNum>
   <transNum fid="2">170</transNum>
   <createdBy fid="15">32</createdBy>
   <status fid="18">Open - Unassigned</status>
   <title fid="20">Testing on spider - issue with "quotation marks"</title>
   <priority fid="11">3 - Best Effort</priority>
   <description fid="22">Noticed issue with submission of Documentation issue #40 on accurev with quotes in title. </description>
  <dueDate fid="17">1327944695</dueDate>
 </issue>
</issues>
};

my $t= XML::Twig->new( twig_handlers => { title => sub { print $_->text, "\n"; } })
                ->parse( $issueXML);

Answer 1: (score: 2)

Another option, using XML::LibXML. Quotes inside a text node should not cause any problems.

#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;
use Data::Dumper;

my $xml = XML::LibXML->load_xml(string => q{<?xml version="1.0" encoding="utf-8"?>
<issues>
 <issue>
   <issueNum fid="1">33</issueNum>
   <transNum fid="2">170</transNum>
   <createdBy fid="15">32</createdBy>
   <status fid="18">Open - Unassigned</status>
   <title fid="20">Testing on spider - issue with "quotation marks"</title>
   <priority fid="11">3 - Best Effort</priority>
   <description fid="22">Noticed issue with submission of Documentation issue #40 on accurev with quotes in title. </description>
  <dueDate fid="17">1327944695</dueDate>
 </issue>
</issues>
});

my $title = $xml->find('/issues/issue/title');
# get_node positions are 1-based, as in XPath
print $title->get_node(1)->textContent, "\n";

Answer 2: (score: 0)

I usually use XML::XSH2 for XML manipulation. Your problem reduces to:

open FILE.xml ;
for //title echo (.) ;

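Answer 3: (score: 0)

The best way to extract pieces from XML is to use an XPath query.

In this case, you are looking for the element 'title' inside the element 'issue', which is inside the element 'issues'.

So your XPath query is simply '//issues/issue/title'.

In two lines of code, you can have XML::LibXML::XPathContext run the XPath query for you, and it will return the content of the element you are looking for.

This snippet demonstrates a simple way to run an XPath query. The important part is the two lines after the comment "Relevant bit here".

For details, see the documentation for XML::LibXML::XPathContext.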

#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

my $xml = XML::LibXML->load_xml(string => q{<?xml version="1.0" encoding="utf-8"?>
<issues>
 <issue>
   <issueNum fid="1">33</issueNum>
   <transNum fid="2">170</transNum>
   <createdBy fid="15">32</createdBy>
   <status fid="18">Open - Unassigned</status>
   <title fid="20">Testing on spider - issue with "quotation marks"</title>
   <priority fid="11">3 - Best Effort</priority>
   <description fid="22">Noticed issue with submission of Documentation issue #40 on accurev with quotes in title. </description>
  <dueDate fid="17">1327944695</dueDate>
 </issue>
</issues>
});

# Relevant bit here
my $xc = XML::LibXML::XPathContext->new($xml);
my $title = $xc->find('//issues/issue/title');
print "$title\n";

# prints:
# Testing on spider - issue with "quotation marks"
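When only the text is needed, a shorter variant is possible. This is a sketch, not part of the answer above: it assumes the same document shape (trimmed here to the title element) and calls `findvalue` directly on the parsed document, which returns the string value of the XPath result without going through a separate XPathContext object.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Trimmed copy of the question's XML, keeping only the title element.
my $doc = XML::LibXML->load_xml(string => q{<?xml version="1.0" encoding="utf-8"?>
<issues>
 <issue>
   <title fid="20">Testing on spider - issue with "quotation marks"</title>
 </issue>
</issues>
});

# findvalue returns the string value of the matched node(s).
my $title = $doc->findvalue('/issues/issue/title');
print "$title\n";
```

An XPathContext is still the better choice when the document uses namespaces that need to be registered before querying.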