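Nodes not retrieved with the XML package in R

Time: 2015-09-29 15:28:34

Tags: xml r xpath

I am using the XML package in R to extract nodes from a file.

Unfortunately, I get the impression that R does not recognize single (self-closing) tags, i.e. tags with this structure: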

<event currentPlaytime="600000" score_away="0"
score_home="0" tickerstateid="1" tickerstate="Not started"
minute="10" event_code="Players introduction" event_code_id="171"
event_number="5" matchid="210941"/> 

I am using, for example:

doc <- xmlInternalTreeParse(file, encoding = "UTF-8")
xpathApply(doc, "event", xmlGetAttr, "matchid")

but I get no results.

For ordinary XML, where the file ends with a closing tag such as </match>, everything works fine.

The complete file is:

<?xml version='1.0' encoding='UTF-8'?>
<event_list status="event" replytype="error" timestamp="1441886226356" xmlns="http://rball.com/eventpusher/data/xmltcpbeans">
<event matchid="269679" event_number="0" event_code_id="514" event_code="Scout in stadium" timestamp="1357486150967" minute="10" tickerstate="Not started" tickerstateid="1" score_home="0" score_away="0" currentPlaytime="600000" clockRunning="false"/>
<event matchid="269679" event_number="1" event_code_id="517" event_code="Transmission online" timestamp="1357486166310" minute="10" tickerstate="Not started" tickerstateid="1" score_home="0" score_away="0" currentPlaytime="600000" clockRunning="false"/>
</event_list>

1 answer:

Answer 0 (score: 1)

Change the XPath expression. I don't have your actual data, but if I run your code like this:

library(XML)
q <- '<event  currentPlaytime="600000" score_away="0" score_home="0" tickerstateid="1" tickerstate="Not started" minute="10" event_code="Players introduction" event_code_id="171" event_number="5" matchid="210941"/>'
doc <- xmlInternalTreeParse(q , encoding = "UTF-8")
xpathApply(doc, "/event", xmlGetAttr, "matchid")

This prints:

> xpathApply(doc, "/event", xmlGetAttr, "matchid")
[[1]]
[1] "210941"

(Note that the XPath is /event, i.e. it starts with a /.)

Likewise, for two nodes (now wrapped in a root node):

library(XML)
q <- '<event  currentPlaytime="600000" score_away="0" score_home="0" tickerstateid="1" tickerstate="Not started" minute="10" event_code="Players introduction" event_code_id="171" event_number="5" matchid="210941"/>'
q <- paste('<root>',q,'<event matchid="210942" />','</root>')
doc <- xmlInternalTreeParse(q , encoding = "UTF-8")
xpathApply(doc, "/root/event", xmlGetAttr, "matchid")

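This prints:

> xpathApply(doc, "/root/event", xmlGetAttr, "matchid")
[[1]]
[1] "210941"

[[2]]
[1] "210942"

Update (9 March 2016): your XML now has a namespace defined (the xmlns attribute on event_list), so you need to use it in the XPath expression: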

library(XML)

q <- '<?xml version="1.0" encoding="UTF-8"?>
<event_list status="event" replytype="error" timestamp="1441886226356" xmlns="http://rball.com/eventpusher/data/xmltcpbeans">
<event matchid="269679" event_number="0" event_code_id="514" event_code="Scout in stadium" timestamp="1357486150967" minute="10" tickerstate="Not started" tickerstateid="1" score_home="0" score_away="0" currentPlaytime="600000" clockRunning="false"/>
<event matchid="269679" event_number="1" event_code_id="517" event_code="Transmission online" timestamp="1357486166310" minute="10" tickerstate="Not started" tickerstateid="1" score_home="0" score_away="0" currentPlaytime="600000" clockRunning="false"/>
</event_list>'

ns <- c(ns="http://rball.com/eventpusher/data/xmltcpbeans")
doc <- xmlInternalTreeParse(q , encoding = "UTF-8")
xpathApply(doc, "/ns:event_list/ns:event", xmlGetAttr  , "matchid", namespaces = ns)
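
If you would rather not register a prefix, one possible alternative is to match on local-name(), which compares only the local part of the element name and ignores the namespace. The following is a minimal sketch on a shortened copy of the XML above; the variable names (xml_text, ids_ns, ids_local) are illustrative only, and xpathSApply is used instead of xpathApply simply to get a character vector back.

```r
library(XML)

# Shortened copy of the document above: a default namespace on the root
# element and two self-closing <event/> children.
xml_text <- '<?xml version="1.0" encoding="UTF-8"?>
<event_list xmlns="http://rball.com/eventpusher/data/xmltcpbeans">
<event matchid="269679" event_number="0"/>
<event matchid="269679" event_number="1"/>
</event_list>'

doc <- xmlInternalTreeParse(xml_text, asText = TRUE)

# Option 1: bind a prefix of your choice to the namespace URI
# (same idea as in the answer above).
ns <- c(ns = "http://rball.com/eventpusher/data/xmltcpbeans")
ids_ns <- xpathSApply(doc, "//ns:event", xmlGetAttr, "matchid", namespaces = ns)

# Option 2: ignore the namespace and test only the local element name.
ids_local <- xpathSApply(doc, "//*[local-name()='event']", xmlGetAttr, "matchid")

print(ids_ns)
print(ids_local)
```

The prefix form is usually the safer choice when a document mixes several namespaces, since local-name() would also match an event element from any other namespace.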