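I am trying to use R to parse information out of XML files. Each file can contain multiple records, and I would like to end up with a list of objects representing those records.
Taking this file as an example, I want to apply a function to the nodes under each PubmedArticle. When I try to do this with xpathApply from the XML library, each record ends up containing information from every published article in the file, rather than the function being applied only to the given PubmedArticle. A simple example to illustrate: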
library(XML)
library(RCurl)
raw_record <- getURI("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?&db=pubmed&id=20203609,11959827,19409887&rettype=xml")
parsed <- xmlTreeParse(raw_record, useInternalNodes=TRUE)
get_title <- function(node) xpathApply(node, "//ArticleTitle", xmlValue)
xpathApply(parsed, "/PubmedArticleSet/PubmedArticle", get_title)
#[[1]]
#[[1]][[1]]
#[1] "Changes in Hox genes' structure and function during the evolution of the squamate body plan."
#
#[[1]][[2]]
#[1] "Cdx1 and Cdx2 have overlapping functions in anteroposterior patterning and posterior axis elongation."
#
#[[1]][[3]]
#[1] "Axial patterning in snakes and caecilians: evidence for an alternative interpretation of the Hox code."
#
#
#[[2]]
#[[2]][[1]]
#[1] "Changes in Hox genes' structure and function during the evolution of the squamate body plan."
#
#[[2]][[2]]
#[1] "Cdx1 and Cdx2 have overlapping functions in anteroposterior patterning and posterior axis elongation."
#[SNIP]
What is the correct way to extract information only from within each node returned by xpathApply or getNodeSet?
Answer 0 (score: 2)
You just need to use a relative path in your get_title function. Try:
get_title <- function(node) xpathApply(node, ".//ArticleTitle", xmlValue)
titles <- xpathApply(parsed, "/PubmedArticleSet/PubmedArticle", get_title)
unlist(titles)
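The leading .// makes the search start at the current node and match anywhere beneath it, whereas a bare // starts from the document root even when xpathApply is called on a sub-node. This gives you: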
[1] "Changes in Hox genes' structure and function during the evolution of the squamate body plan."
[2] "Cdx1 and Cdx2 have overlapping functions in anteroposterior patterning and posterior axis elongation."
[3] "Axial patterning in snakes and caecilians: evidence for an alternative interpretation of the Hox code."
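To see the difference without hitting the network, here is a minimal sketch that runs both path styles against a small inline document. The two-record XML string and the names doc_text, articles, absolute and relative are made up for illustration; only the XML package calls come from the code above.

```r
library(XML)

# Hypothetical two-record document standing in for the PubMed response
doc_text <- "<PubmedArticleSet>
  <PubmedArticle><ArticleTitle>First title</ArticleTitle></PubmedArticle>
  <PubmedArticle><ArticleTitle>Second title</ArticleTitle></PubmedArticle>
</PubmedArticleSet>"
doc <- xmlParse(doc_text, asText = TRUE)

# One node per record
articles <- getNodeSet(doc, "/PubmedArticleSet/PubmedArticle")

# Absolute path: evaluated from the document root, so every record sees all titles
absolute <- lapply(articles, function(node) xpathSApply(node, "//ArticleTitle", xmlValue))

# Relative path: evaluated from the context node, so each record sees only its own title
relative <- lapply(articles, function(node) xpathSApply(node, ".//ArticleTitle", xmlValue))

lengths(absolute)  # number of titles found per record with the absolute path
lengths(relative)  # number of titles found per record with the relative path
unlist(relative)
```

The same relative-path rule applies whether the nodes come from getNodeSet, as here, or are handed to your function by xpathApply.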