Question

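I'm trying to parse information out of XML files with R. Each file can contain many records, and I'd like to end up with a list of objects, one per record.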

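Using this file as an example, I want to apply a function to each node under PubmedArticle. When I try this with xpathApply from the XML package, each result contains information from every PubmedArticle in the file, rather than the function being applied only to the given PubmedArticle. A simple example to illustrate: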

library(XML)
library(RCurl)

raw_record <- getURI("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?&db=pubmed&id=20203609,11959827,19409887&rettype=xml")
parsed <- xmlTreeParse(raw_record, useInternalNodes=TRUE)

get_title <- function(node) xpathApply(node, "//ArticleTitle", xmlValue)
xpathApply(parsed, "/PubmedArticleSet/PubmedArticle", get_title)
#[[1]]
#[[1]][[1]]
#[1] "Changes in Hox genes' structure and function during the evolution of the squamate body plan."
#
#[[1]][[2]]
#[1] "Cdx1 and Cdx2 have overlapping functions in anteroposterior patterning and posterior axis elongation."
#
#[[1]][[3]]
#[1] "Axial patterning in snakes and caecilians: evidence for an alternative interpretation of the Hox code."
#
#
#[[2]]
#[[2]][[1]]
#[1] "Changes in Hox genes' structure and function during the evolution of the squamate body plan."
#
#[[2]][[2]]
#[1] "Cdx1 and Cdx2 have overlapping functions in anteroposterior patterning and posterior axis elongation."
#[SNIP]

What is the correct way to extract information from only the current node, for each node returned by xpathApply or getNodeSet?

Answer 1

You just need to use a relative path in your get_title function. Try

get_title <- function(node) xpathApply(node, ".//ArticleTitle", xmlValue)
titles<-xpathApply(parsed, "/PubmedArticleSet/PubmedArticle", get_title)
unlist(titles)

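The leading . makes the path relative to the current node, so .// looks for ArticleTitle anywhere below the current PubmedArticle. A path that starts with // is absolute: it searches from the document root no matter which node is passed in, which is why every record got all three titles. This gives you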

[1] "Changes in Hox genes' structure and function during the evolution of the squamate body plan."          
[2] "Cdx1 and Cdx2 have overlapping functions in anteroposterior patterning and posterior axis elongation." 
[3] "Axial patterning in snakes and caecilians: evidence for an alternative interpretation of the Hox code."
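To see the difference without fetching anything from NCBI, here is a minimal offline sketch, assuming the XML package is installed. The two-record document is a made-up stand-in for the efetch response (only the element names PubmedArticleSet, PubmedArticle, MedlineCitation, PMID, Article and ArticleTitle mirror the real format), and the `records` list at the end is one hypothetical way to get one object per record, as the question asks.

```r
library(XML)

# Hypothetical stand-in for the efetch response: two records with made-up
# PMIDs and titles, using the same element names as PubMed XML.
xml_text <- '<PubmedArticleSet>
  <PubmedArticle><MedlineCitation><PMID>1</PMID>
    <Article><ArticleTitle>First title</ArticleTitle></Article>
  </MedlineCitation></PubmedArticle>
  <PubmedArticle><MedlineCitation><PMID>2</PMID>
    <Article><ArticleTitle>Second title</ArticleTitle></Article>
  </MedlineCitation></PubmedArticle>
</PubmedArticleSet>'

doc <- xmlParse(xml_text, asText = TRUE)
articles <- getNodeSet(doc, "/PubmedArticleSet/PubmedArticle")

# Absolute path: "//" starts at the document root, so each record
# sees every ArticleTitle in the file.
absolute <- lapply(articles, function(node)
  xpathSApply(node, "//ArticleTitle", xmlValue))

# Relative path: ".//" searches only beneath the current PubmedArticle.
relative <- lapply(articles, function(node)
  xpathSApply(node, ".//ArticleTitle", xmlValue))

# One named list per record, built with relative paths.
records <- lapply(articles, function(node) list(
  pmid  = xpathSApply(node, ".//PMID", xmlValue),
  title = xpathSApply(node, ".//ArticleTitle", xmlValue)
))
```

Here `absolute[[1]]` holds both titles, while `relative[[1]]` holds only "First title". The same relative-path pattern extends to any other field you want to pull into each record.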

Apply a function to an xmlNodeList in R (not the entire XML file)
