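I'm new to XML, and I understand that when working with large files in R it is best to use xmlEventParse() and getNodeSet(). However, my code does not work and I don't know why. Maybe the path is defined incorrectly?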
A dummy XML file similar to the original can be found at this link: dummy xml file
My dummy R code looks like this:
library(XML)
FOOid_traverse <- function() {
  uids <- c()
  refs <- c()
  REC <- function(x) {
    uid <- xpathSApply(x, "//N8:EntityList/N8:Entity/N2:OrganisationName",
                       xmlValue)
    ref <- xpathSApply(x, "//N8:EntityList/N8:Entity/N5:Identifiers/N5:Identifier/N5:IdentifierElement", xmlValue)
    if (length(uid) > 0) {
      if (length(ref) == 0) {
        uids <<- c(uids, uid)
        refs <<- c(refs, NA_character_)
      } else {
        uids <<- c(uids, rep(uid, length(ref)))
        refs <<- c(refs, ref)
      }
    }
  }
  list(
    REC = REC,
    FOOid_df = function() {
      data.frame(uid = uids, ref = refs, stringsAsFactors = FALSE)
    }
  )
}
FOOid_f <- FOOid_traverse()
invisible(xmlEventParse(file = path.expand("companies_xml_extract_20170703.xml"), branches = FOOid_f["REC"]))
FOOid_f$FOOid_df()
Thanks
Answer 0 (score: 0)
Hope this helps!
library(xml2)
library(dplyr)
xml_doc <- read_xml("test.xml")
OrganisationName <- xml_doc %>%
  xml_find_all("//N2:OrganisationName/N2:NameElement", ns = xml_ns(xml_doc)) %>%
  xml_text()
IdentifierElement <- xml_doc %>%
  xml_find_all("//N5:Identifiers/N5:Identifier/N5:IdentifierElement", ns = xml_ns(xml_doc)) %>%
  xml_text()
df <- data.frame(OrganisationName, IdentifierElement)
df
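One caveat with the code above: it collects names and identifiers in two separate vectors, so data.frame() only lines them up correctly when every organisation has exactly one identifier. The question's code repeats the name for each identifier and fills in NA when there is none. A minimal sketch of that per-entity pairing with xml2 might look like the following. The inline XML, its namespace URIs, and the company names are made up for illustration; only the element names and prefixes come from the XPath expressions in this thread.

```r
library(xml2)

# Hypothetical miniature input: namespace URIs and values are invented,
# element names follow the XPath expressions used in the question and answer.
xml_txt <- '<N8:EntityList xmlns:N8="urn:example:n8" xmlns:N2="urn:example:n2" xmlns:N5="urn:example:n5">
  <N8:Entity>
    <N2:OrganisationName><N2:NameElement>Alpha Ltd</N2:NameElement></N2:OrganisationName>
    <N5:Identifiers>
      <N5:Identifier><N5:IdentifierElement>A-1</N5:IdentifierElement></N5:Identifier>
      <N5:Identifier><N5:IdentifierElement>A-2</N5:IdentifierElement></N5:Identifier>
    </N5:Identifiers>
  </N8:Entity>
  <N8:Entity>
    <N2:OrganisationName><N2:NameElement>Beta GmbH</N2:NameElement></N2:OrganisationName>
  </N8:Entity>
</N8:EntityList>'

xml_doc <- read_xml(xml_txt)
ns <- xml_ns(xml_doc)

# Work entity by entity so each name stays attached to its own identifiers.
entities <- xml_find_all(xml_doc, "//N8:Entity", ns = ns)

rows <- lapply(entities, function(entity) {
  # Relative paths (leading "./") search inside the current entity only.
  uid <- xml_text(xml_find_first(entity, "./N2:OrganisationName/N2:NameElement", ns = ns))
  ref <- xml_text(xml_find_all(entity, "./N5:Identifiers/N5:Identifier/N5:IdentifierElement", ns = ns))
  # Keep organisations without identifiers as a single row with NA.
  if (length(ref) == 0) ref <- NA_character_
  # uid has length 1 and is recycled across all identifiers of this entity.
  data.frame(uid = uid, ref = ref, stringsAsFactors = FALSE)
})

df <- do.call(rbind, rows)
df
```

This reads the whole document into memory, so it is not a replacement for xmlEventParse() on very large files; it only shows the pairing logic.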
Don't forget to let us know if it solved your problem :)