Applying a function to an xmlNodeList in R (not the whole XML file)

Asked: 2014-06-29 00:12:29

Tags: xml r xpath xml-parsing

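I am trying to parse information out of XML files using R. Each file can contain multiple records, and I would like to end up with a list of objects representing those records.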

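Taking this file as an example, I would like to apply a function to the nodes under each PubmedArticle. When I try to do this with xpathApply from the XML library, every record ends up containing information from every PubmedArticle in the file (rather than the function being applied only to the given PubmedArticle). A simple example to illustrate: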

library(XML)
library(RCurl)

raw_record <- getURI("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?&db=pubmed&id=20203609,11959827,19409887&rettype=xml")
parsed <- xmlTreeParse(raw_record, useInternalNodes=TRUE)

get_title <- function(node) xpathApply(node, "//ArticleTitle", xmlValue)
xpathApply(parsed, "/PubmedArticleSet/PubmedArticle", get_title)
#[[1]]
#[[1]][[1]]
#[1] "Changes in Hox genes' structure and function during the evolution of the squamate body plan."
#
#[[1]][[2]]
#[1] "Cdx1 and Cdx2 have overlapping functions in anteroposterior patterning and posterior axis elongation."
#
#[[1]][[3]]
#[1] "Axial patterning in snakes and caecilians: evidence for an alternative interpretation of the Hox code."
#
#
#[[2]]
#[[2]][[1]]
#[1] "Changes in Hox genes' structure and function during the evolution of the squamate body plan."
#
#[[2]][[2]]
#[1] "Cdx1 and Cdx2 have overlapping functions in anteroposterior patterning and posterior axis elongation."
#[SNIP]

What is the right way to extract information from each node created by xpathApply or getNodeSet?

1 answer:

Answer 0 (score: 2)

You just want to use a relative path in your get_title function. Try:

get_title <- function(node) xpathApply(node, ".//ArticleTitle", xmlValue)
titles <- xpathApply(parsed, "/PubmedArticleSet/PubmedArticle", get_title)
unlist(titles)

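The .// means the search starts at the current node and looks anywhere beneath it. A path that begins with // is absolute: it searches from the document root even when it is evaluated on a sub-node, which is why every article came back with all three titles. With the relative path you get: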

[1] "Changes in Hox genes' structure and function during the evolution of the squamate body plan."          
[2] "Cdx1 and Cdx2 have overlapping functions in anteroposterior patterning and posterior axis elongation." 
[3] "Axial patterning in snakes and caecilians: evidence for an alternative interpretation of the Hox code."
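
For anyone who wants to see the difference without hitting the NCBI server, here is a minimal sketch on a small inline document. The titles, PMIDs, and the shape of the `records` list are made up for illustration and are not taken from the real PubMed response; only the element names mirror it.

```r
library(XML)

# Tiny stand-in for the PubMed response (hypothetical titles and PMIDs).
doc_text <- '<PubmedArticleSet>
  <PubmedArticle><MedlineCitation><PMID>1</PMID>
    <Article><ArticleTitle>First title</ArticleTitle></Article>
  </MedlineCitation></PubmedArticle>
  <PubmedArticle><MedlineCitation><PMID>2</PMID>
    <Article><ArticleTitle>Second title</ArticleTitle></Article>
  </MedlineCitation></PubmedArticle>
</PubmedArticleSet>'
doc <- xmlParse(doc_text, asText = TRUE)

articles <- getNodeSet(doc, "/PubmedArticleSet/PubmedArticle")

# "//" is absolute: it searches from the document root,
# so each article node sees every title in the file.
absolute <- lapply(articles, function(node)
  xpathSApply(node, "//ArticleTitle", xmlValue))

# ".//" is relative: it searches only below the node passed in.
relative <- lapply(articles, function(node)
  xpathSApply(node, ".//ArticleTitle", xmlValue))

lengths(absolute)  # 2 2
lengths(relative)  # 1 1

# One possible way to build a list with one object per record
# (the field names pmid and title are an assumption).
records <- lapply(articles, function(node) {
  list(
    pmid  = xpathSApply(node, "./MedlineCitation/PMID", xmlValue),
    title = xpathSApply(node, ".//ArticleTitle", xmlValue)
  )
})
str(records)
```

The same relative-path rule applies to every lookup inside the per-record function, so each extra field you pull out should start with `.` as well.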