I want to extract data from the XML file at http://www.uniprot.org/uniprot/P43405.xml
into a data frame. I only get empty strings back, although I think the XPath queries are fine.
library(RCurl)
library(XML)
url <- "http://www.uniprot.org/uniprot/P43405.xml"
urldata <- getURL(url)
xmlfile <- xmlParse(urldata)
# some xpath queries
xmlfile["//entry/comment[@type='function']/text"]
xmlfile["//entry/comment[@type='PTM']/text"]
xpathSApply(xmlfile,"//uniprot/entry",xmlGetAttr, 'dataset')
xpathSApply(xmlfile,"//uniprot/entry",xmlValue)
Can anyone help me solve this problem?
Thanks, Frank
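Answer 0: (score: 1)
The namespace is missing. The file declares a default namespace, so every element name in the XPath query needs a prefix bound to that namespace: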
library(RCurl)
library(XML)
url <- "http://www.uniprot.org/uniprot/P43405.xml"
urldata <- getURL(url)
xmlfile <- xmlParse(urldata)
# no namespace prefix in the query, so nothing matches
getNodeSet(xmlfile, "//entry//comment")
namespaces <- c(ns="http://uniprot.org/uniprot")
getNodeSet(xmlfile, "//ns:entry//ns:comment", namespaces)
getNodeSet(xmlfile, "//ns:entry//ns:comment[@type='PTM']/ns:text", namespaces)
xpathSApply(xmlfile,"//ns:uniprot/ns:entry",xmlGetAttr, 'dataset', namespaces=namespaces)
xpathSApply(xmlfile,"//ns:uniprot/ns:entry",xmlValue, namespaces=namespaces)
Reference:
?xpathApply
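The difference between the unprefixed and the prefixed query can be checked offline on a small inline document. This is only a sketch: the XML string below is a made-up stand-in for the UniProt file (its text values are not real data), and the prefix `ns` is an arbitrary choice. Only the namespace URI is taken from the answer above.

```r
library(XML)

# Made-up miniature of the UniProt layout; note the default namespace
# declared on the root element.
xml_text <- '<uniprot xmlns="http://uniprot.org/uniprot">
  <entry dataset="Swiss-Prot">
    <accession>P43405</accession>
    <comment type="function"><text>Example function text</text></comment>
    <comment type="PTM"><text>Example PTM text</text></comment>
  </entry>
</uniprot>'
doc <- xmlParse(xml_text, asText = TRUE)

# Unprefixed names only match elements in no namespace, so nothing is found.
no_ns <- getNodeSet(doc, "//entry/comment")

# Bind a prefix to the namespace URI and use it on every step of the path.
namespaces <- c(ns = "http://uniprot.org/uniprot")
with_ns <- getNodeSet(doc, "//ns:entry/ns:comment", namespaces)

# Attributes are not in the namespace, so @type needs no prefix.
ptm_text <- xpathSApply(doc, "//ns:comment[@type='PTM']/ns:text", xmlValue,
                        namespaces = namespaces)
dataset <- xpathSApply(doc, "//ns:entry", xmlGetAttr, "dataset",
                       namespaces = namespaces)
```

Comparing `length(no_ns)` with `length(with_ns)` shows the effect of the prefix directly.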
Answer 1: (score: 0)
Thanks for your help! Yes, the namespace was missing. I have added some extra code; maybe it will help others get familiar with XML.
library(RCurl)
library(XML)
url <- "http://www.uniprot.org/uniprot/P43405.xml"
urldata <- getURL(url)
xmlfile <- xmlParse(urldata)
# no namespace prefix in the query, so nothing matches
getNodeSet(xmlfile, "//entry//comment")
# the namespace is required here: bind a prefix to its URI
namespaces <- c(ns="http://uniprot.org/uniprot")
# extract all comments, make a data frame
comments.uniprot <- getNodeSet(xmlfile, "//ns:entry//ns:comment", namespaces)
comments.dataframe <- as.data.frame(sapply(comments.uniprot, xmlValue))
comments.attributes <- as.data.frame(sapply(comments.uniprot, xmlGetAttr,'type'))
comments.all <- cbind(comments.attributes,comments.dataframe)
# only extract PTM comments
PTMs <- getNodeSet(xmlfile, "//ns:entry//ns:comment[@type='PTM']/ns:text", namespaces)
PTMs2 <- sapply(PTMs, xmlValue)
PTMs2.dataframe <- as.data.frame(PTMs2)
xpathSApply(xmlfile,"//ns:uniprot/ns:entry",xmlGetAttr, 'dataset', namespaces=namespaces)
xpathSApply(xmlfile,"//ns:uniprot/ns:entry/ns:accession",xmlValue, namespaces=namespaces)
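As a possible variation on the data frame step, the type attribute and the comment text can go into one data frame with explicit column names, which makes later filtering easier. This is a sketch under assumptions: it runs on a made-up inline document rather than the real download, and the names `comment_nodes`, `comments`, `type`, and `text` are chosen here for illustration.

```r
library(XML)

# Made-up miniature of the UniProt layout (not real data).
xml_text <- '<uniprot xmlns="http://uniprot.org/uniprot">
  <entry dataset="Swiss-Prot">
    <comment type="function"><text>Example function text</text></comment>
    <comment type="PTM"><text>Example PTM text</text></comment>
  </entry>
</uniprot>'
doc <- xmlParse(xml_text, asText = TRUE)
namespaces <- c(ns = "http://uniprot.org/uniprot")

comment_nodes <- getNodeSet(doc, "//ns:entry/ns:comment", namespaces)

# One row per comment, with named columns instead of auto-generated ones.
comments <- data.frame(
  type = sapply(comment_nodes, xmlGetAttr, "type"),
  text = sapply(comment_nodes, xmlValue),
  stringsAsFactors = FALSE
)

# Filter in R instead of writing a second XPath query.
ptm_comments <- comments[comments$type == "PTM", ]
```

Filtering the data frame avoids repeating the XPath query for each comment type, at the cost of extracting all comments first.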