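I have an XML file that I need to parse. I can retrieve the values, but I can't separate them with a delimiter for further processing. Please advise. My code is below: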
use XML::LibXML;
my $filename = 'Test.xml';
my $parser = XML::LibXML->new();
my $dom = $parser->parse_file($filename);
my $root = $dom->documentElement();
my $xpc = XML::LibXML::XPathContext->new($root);
foreach my $id ($xpc->findnodes('/dataset/chapter'))
{
print $xpc->findvalue('mono/route-list', $id);
print join ",", $xpc->findvalue('mono/route-list', $id);
}
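For the "print join" the expected result is:
ophthalmic,oral,topical,nasal,injection,oral,oral,oral,oral
but both print statements give me the same result:
ophthalmicoraltopicalnasalinjectionoraloraloraloral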
The XML file is structured as follows:
<dataset id="5"><title>NDC 11</title>
<chapter id="9"><title>NDC 11</title>
<mono id="310694" mid="145787">
<nam>00173074200</nam>
<route-list>
<list-set-field dbId="25413">
<name>ophthalmic</name>
</list-set-field>
</route-list>
</mono>
<mono id="4128683" mid="536890">
<nam>51079020406</nam>
<route-list>
<list-set-field dbId="25413">
<name>oral</name>
</list-set-field>
</route-list>
</mono>
<mono id="4128743" mid="536930">
<nam>65862007360</nam>
<route-list>
<list-set-field dbId="25413">
<name>topical</name>
</list-set-field></route-list>
</mono>
<mono id="3419599" mid="469070">
<nam>49702021718</nam>
<route-list>
<list-set-field dbId="25413">
<name>nasal</name>
</list-set-field>
</route-list>
</mono>
<mono id="2990346" mid="440470">
<nam>49702022118</nam>
<route-list>
<list-set-field dbId="25413">
<name>injection</name>
</list-set-field>
</route-list>
</mono>
<mono id="2990347" mid="440470">
<nam>49702022144</nam>
<route-list>
<list-set-field dbId="25413">
<name>oral</name>
</list-set-field>
</route-list>
</mono>
<mono id="2990357" mid="440491">
<nam>49702022248</nam>
<route-list>
<list-set-field dbId="25413">
<name>oral</name>
</list-set-field>
</route-list>
</mono>
<mono id="3808911" mid="513570">
<nam>00378410591</nam>
<route-list>
<list-set-field dbId="25413">
<name>oral</name>
</list-set-field>
</route-list>
</mono>
<mono id="4128724" mid="536910">
<nam>60505358306</nam>
<route-list>
<list-set-field dbId="25413">
<name>oral</name>
</list-set-field>
</route-list>
</mono>
</chapter>
</dataset>
Answer 0 (score: 1)
If you try this code (note the last line in the for loop):
use strict;
use warnings;
use 5.016;
use XML::LibXML;
my $filename = 'Test.xml';
my $dom = XML::LibXML->load_xml(
location => $filename,
);
my $xpc = XML::LibXML::XPathContext->new($dom);
CHAPTER:
for my $chapter ($xpc->findnodes('/dataset/chapter')) {
my $string = $xpc->findvalue('mono/route-list', $chapter);
print $string;
last CHAPTER; #<*****NOTE THIS
}
you will get the output:
ophthalmic
oral
topical
nasal
injection
oral
oral
oral
oral
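The docs say:
findvalue()
... returns the literal value of the result.
Here there is more than one result, and a single result is all of the text between the matching tags.
The XML has a hidden character, a newline, at the end of each line: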
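This is also why the join "," in the question has no effect: findvalue() hands back one scalar, so join has nothing to put a comma between. A minimal sketch, assuming a cut-down two-entry copy of the XML written on one line (the inline document and the variable names are illustrative, not from the question):

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical cut-down document, kept on one line so that no
# whitespace text nodes are involved.
my $xml = '<dataset><chapter>'
        . '<mono><route-list><list-set-field><name>ophthalmic</name></list-set-field></route-list></mono>'
        . '<mono><route-list><list-set-field><name>oral</name></list-set-field></route-list></mono>'
        . '</chapter></dataset>';

my $dom = XML::LibXML->load_xml(string => $xml);
my $xpc = XML::LibXML::XPathContext->new($dom);
my ($chapter) = $xpc->findnodes('/dataset/chapter');

# Even in list context, findvalue() yields a single string.
my @values = $xpc->findvalue('mono/route-list', $chapter);
print scalar(@values), "\n";    # number of elements handed to join

# With only one element, join() inserts no separators.
my $joined = join ",", @values;
print "$joined\n";
```

The array holds exactly one element, the concatenated text of both matches, so the joined string contains no comma.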
<route-list>\n
<list-set-field dbId="25413">\n
<name>ophthalmic</name>\n
</list-set-field>\n
</route-list>\n
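...as well as several spaces/tabs at the start of each line. Spaces, tabs and newlines count as text, and they sit between the <route-list> tags. So the text of one result also contains all of those spaces, tabs and newlines.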
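To see that whitespace for yourself, the sketch below prints the string value of a single <route-list> with the newlines made visible. It assumes a small inline fragment indented with two spaces per level (my own layout, not necessarily that of Test.xml), and it also shows normalize-space(), a standard XPath 1.0 function, as one way to strip the padding:

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical fragment, one tag per line, indented with spaces.
my $xml = <<'XML';
<route-list>
  <list-set-field dbId="25413">
    <name>ophthalmic</name>
  </list-set-field>
</route-list>
XML

my $dom = XML::LibXML->load_xml(string => $xml);

# String value of <route-list>: every descendant text node,
# including the whitespace-only ones between the tags.
my $raw = $dom->findvalue('/route-list');

# Replace real newlines with a visible "\n" marker before printing.
(my $visible = $raw) =~ s/\n/\\n/g;
print "[$visible]\n";

# normalize-space() trims both ends and collapses internal whitespace runs.
my $clean = $dom->findvalue('normalize-space(/route-list)');
print "[$clean]\n";
```

The first line printed shows the name surrounded by newlines and indentation; the second shows only the name.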
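findvalue() returns the text from all of the results as a single string. You could split that string with a regex to get the individual values, but rather than creating more work for yourself, you can do this: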
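CHAPTER:
for my $chapter ($xpc->findnodes('/dataset/chapter')) {
    # Relative path: a leading // would search the whole document, not just $chapter
    for my $name ($xpc->findnodes('mono/route-list//name', $chapter)) {
        say $name->textContent;
        last CHAPTER;  # exits both loops after the first name; remove to print them all
    }
}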
--output:--
ophthalmic
...or even this:
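CHAPTER:
for my $chapter ($xpc->findnodes('/dataset/chapter')) {
    # Relative path again; text() selects the text nodes inside each <name>
    for my $name_text ($xpc->findnodes('mono/route-list//name/text()', $chapter)) {
        say $name_text;
        last CHAPTER;  # exits both loops after the first name; remove to print them all
    }
}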
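Putting this together for the comma-separated output the question asks for: collect the text of each <name> node into a list and join that list. This is a sketch, assuming a shortened inline copy of the XML with three of the <mono> entries (the @routes and @lines names are illustrative):

```perl
use strict;
use warnings;
use XML::LibXML;

# Shortened copy of the question's XML (first three <mono> entries only).
my $xml = <<'XML';
<dataset id="5"><title>NDC 11</title>
<chapter id="9"><title>NDC 11</title>
<mono id="310694" mid="145787">
<nam>00173074200</nam>
<route-list>
<list-set-field dbId="25413">
<name>ophthalmic</name>
</list-set-field>
</route-list>
</mono>
<mono id="4128683" mid="536890">
<nam>51079020406</nam>
<route-list>
<list-set-field dbId="25413">
<name>oral</name>
</list-set-field>
</route-list>
</mono>
<mono id="4128743" mid="536930">
<nam>65862007360</nam>
<route-list>
<list-set-field dbId="25413">
<name>topical</name>
</list-set-field></route-list>
</mono>
</chapter>
</dataset>
XML

my $dom = XML::LibXML->load_xml(string => $xml);
my $xpc = XML::LibXML::XPathContext->new($dom);

my @lines;
for my $chapter ($xpc->findnodes('/dataset/chapter')) {
    # One list element per <name> node found inside this chapter.
    my @routes = map { $_->textContent }
                 $xpc->findnodes('mono/route-list//name', $chapter);
    push @lines, join(",", @routes);
}
print "$_\n" for @lines;
```

Splitting the findvalue() string on whitespace would also work for this particular file, but it breaks as soon as a name contains a space or the XML is written without line breaks, which is why the node-by-node version is the safer choice.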