How can we separate the values fetched by XML::LibXML's findvalue with a delimiter?

Asked: 2015-01-07 23:35:06

Tags: perl

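I have an XML file that I want to parse. I am able to fetch the values, but I cannot separate them with a delimiter for further processing. Please advise. My code is below: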

use XML::LibXML;

my $filename = 'Test.xml';

my $parser = XML::LibXML->new();
my $dom = $parser->parse_file($filename);
my $root = $dom->documentElement();
my $xpc = XML::LibXML::XPathContext->new($root);

foreach my $id ($xpc->findnodes('/dataset/chapter'))
{
    print $xpc->findvalue('mono/route-list', $id);
    print join ",", $xpc->findvalue('mono/route-list', $id);
}

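Both "print" statements give me the same result:

ophthalmicoraltopicalnasalinjectionoraloraloraloral

whereas the expected result is:

ophthalmic,oral,topical,nasal,injection,oral,oral,oral,oral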

The structure of the XML file is as follows:

<dataset id="5"><title>NDC 11</title>
<chapter id="9"><title>NDC 11</title>
<mono id="310694" mid="145787">
<nam>00173074200</nam>
<route-list>
    <list-set-field dbId="25413">
        <name>ophthalmic</name>
    </list-set-field>
</route-list>   
</mono>
<mono id="4128683" mid="536890">
<nam>51079020406</nam>
<route-list>
    <list-set-field dbId="25413">
        <name>oral</name>
    </list-set-field>
</route-list>
</mono>
<mono id="4128743" mid="536930">
<nam>65862007360</nam>
<route-list>
    <list-set-field dbId="25413">
        <name>topical</name>
    </list-set-field></route-list>
</mono>
<mono id="3419599" mid="469070">
<nam>49702021718</nam>
<route-list>
    <list-set-field dbId="25413">
        <name>nasal</name>
    </list-set-field>
</route-list>
</mono>
<mono id="2990346" mid="440470">
<nam>49702022118</nam>
<route-list>
    <list-set-field dbId="25413">
        <name>injection</name>
    </list-set-field>
</route-list>
</mono>
<mono id="2990347" mid="440470">
<nam>49702022144</nam>
<route-list>
    <list-set-field dbId="25413">
        <name>oral</name>
    </list-set-field>
</route-list>
</mono>
<mono id="2990357" mid="440491">
<nam>49702022248</nam>
<route-list>
    <list-set-field dbId="25413">
        <name>oral</name>
    </list-set-field>
</route-list>
</mono>
<mono id="3808911" mid="513570">
<nam>00378410591</nam>
<route-list>
    <list-set-field dbId="25413">
        <name>oral</name>
    </list-set-field>
</route-list>
</mono>
<mono id="4128724" mid="536910">
<nam>60505358306</nam>
<route-list>
    <list-set-field dbId="25413">
        <name>oral</name>
    </list-set-field>
</route-list>
</mono>
</chapter>
</dataset>

1 answer:

Answer 0 (score: 1)

If you try this code (note the last line in the for loop):

use strict;
use warnings;
use 5.016;
use XML::LibXML;

my $filename = 'Test.xml';

my $dom = XML::LibXML->load_xml(
    location => $filename,
);

my $xpc = XML::LibXML::XPathContext->new($dom);

CHAPTER:
for my $chapter ($xpc->findnodes('/dataset/chapter')) {
    my $string = $xpc->findvalue('mono/route-list', $chapter);
    print $string;

    last CHAPTER;  #<*****NOTE THIS
}

you will get this output:

          ophthalmic



      oral



          topical



       nasal



       injection



       oral



       oral



       oral



       oral

The documentation says:

    findvalue()
    ...returns the literal value of the results.

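There is more than one result here, and a single result is all of the text between the matched tags.

The XML has a hidden newline character at the end of every line: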

  <route-list>\n
    <list-set-field dbId="25413">\n
        <name>ophthalmic</name>\n
    </list-set-field>\n
  </route-list>\n  

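...as well as several spaces/tabs at the start of each line. Those spaces/tabs and newlines count as text, and they sit between the <route-list> tags. So the text of one result also contains all of that whitespace.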
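To see that whitespace for yourself, here is a minimal sketch. It assumes a one-record inline document shaped like the one in the question; the variable names ($raw, $visible, $clean) are only illustrative. It prints the findvalue() result with the newlines made visible, then uses XPath's normalize-space() function to trim it.

```perl
use strict;
use warnings;
use XML::LibXML;

# Inline stand-in for Test.xml (an assumption, trimmed to one record).
my $xml = <<'XML';
<mono>
<route-list>
    <list-set-field dbId="25413">
        <name>oral</name>
    </list-set-field>
</route-list>
</mono>
XML

my $dom = XML::LibXML->load_xml(string => $xml);

# The string value of <route-list> includes the indentation and newlines
# that surround the nested tags.
my $raw = $dom->findvalue('/mono/route-list');

# Show each newline as a literal \n so the padding is easy to spot.
(my $visible = $raw) =~ s/\n/\\n/g;
print "[$visible]\n";

# normalize-space() strips leading/trailing whitespace and collapses runs.
my $clean = $dom->findvalue('normalize-space(/mono/route-list)');
print "[$clean]\n";
```

Note that normalize-space() only looks at the first node of a node-set, so it helps for a single record but not for a whole chapter at once.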

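findvalue() returns the text from all of the results as one single string. You could split that string apart with a regex to get the individual values; but rather than creating more work for yourself, you can do this: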

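CHAPTER:
for my $chapter ($xpc->findnodes('/dataset/chapter')) {
    # No leading '//': the path is evaluated relative to $chapter
    # instead of searching the whole document again.
    for my $name ($xpc->findnodes('mono/route-list//name', $chapter)) {
        say $name->textContent;
        last CHAPTER;  # stop after the first name; remove to print them all
    }
}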

--output:--
ophthalmic

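...or even this:

CHAPTER:
for my $chapter ($xpc->findnodes('/dataset/chapter')) {
    # text() selects the text nodes themselves; the path is relative to $chapter.
    for my $name_text ($xpc->findnodes('mono/route-list//name/text()', $chapter)) {
        say $name_text;
        last CHAPTER;  # stop after the first name; remove to print them all
    }
}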
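
To get the comma-separated line the question asks for, one option is to collect the text of every <name> node and join the list. The following is a sketch under some assumptions: the inline XML is a two-record excerpt of the question's file, and @routes, @lines and @tokens are illustrative names. It also shows the "split the findvalue() string" alternative for comparison.

```perl
use strict;
use warnings;
use XML::LibXML;

# Two-record excerpt of the question's XML (an assumption, to keep this short).
my $xml = <<'XML';
<dataset id="5"><title>NDC 11</title>
<chapter id="9"><title>NDC 11</title>
<mono id="310694" mid="145787">
<route-list>
    <list-set-field dbId="25413">
        <name>ophthalmic</name>
    </list-set-field>
</route-list>
</mono>
<mono id="4128683" mid="536890">
<route-list>
    <list-set-field dbId="25413">
        <name>oral</name>
    </list-set-field>
</route-list>
</mono>
</chapter>
</dataset>
XML

my $dom = XML::LibXML->load_xml(string => $xml);
my $xpc = XML::LibXML::XPathContext->new($dom);

# Approach 1: one node per <name>, then join the text of each node.
my @lines;
for my $chapter ($xpc->findnodes('/dataset/chapter')) {
    my @routes = map { $_->textContent }
                 $xpc->findnodes('mono/route-list//name', $chapter);
    push @lines, join(',', @routes);
}
print "$_\n" for @lines;    # prints: ophthalmic,oral

# Approach 2: take the single findvalue() string and split it on whitespace.
# This breaks if a route name itself contains a space.
my $string = $xpc->findvalue('/dataset/chapter/mono/route-list');
my @tokens = split ' ', $string;
print join(',', @tokens), "\n";    # prints: ophthalmic,oral
```

The first approach is usually the safer one, because each value comes from its own node and never has to be recovered from concatenated text.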