Question

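I'm new to XML, and I understand that when working with large files in R it is better to use xmlEventParse() and getNodeSet(). However, my code doesn't work (it returns an empty data frame) and I don't know why. Maybe the path is defined incorrectly?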

A dummy XML file similar to the original can be found at this link: dummy xml file

My dummy R code looks like this:

library(XML)

FOOid_traverse <- function() {

  # Accumulators shared by the closures below
  uids <- c()
  refs <- c()

  # Branch handler passed to xmlEventParse(); x is the parsed branch node
  REC <- function(x) {

    uid <- xpathSApply(x, "//N8:EntityList/N8:Entity/N2:OrganisationName",
                       xmlValue)
    ref <- xpathSApply(x, "//N8:EntityList/N8:Entity/N5:Identifiers/N5:Identifier/N5:IdentifierElement",
                       xmlValue)

    if (length(uid) > 0) {
      if (length(ref) == 0) {
        # No identifier found: keep the name with an NA reference
        uids <<- c(uids, uid)
        refs <<- c(refs, NA_character_)
      } else {
        # Repeat uid so it lines up with each identifier
        uids <<- c(uids, rep(uid, length(ref)))
        refs <<- c(refs, ref)
      }
    }
  }

  list(
    REC = REC,
    # Collect the accumulated values into a data frame
    FOOid_df = function() {
      data.frame(uid = uids, ref = refs, stringsAsFactors = FALSE)
    }
  )
}

FOOid_f <- FOOid_traverse()

invisible(xmlEventParse(file = path.expand("companies_xml_extract_20170703.xml"), branches = FOOid_f["REC"]))

FOOid_f$FOOid_df()
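Since the question suspects the XPath definitions, one way to test the expressions separately from xmlEventParse() is to run them with XML::xpathSApply() on a small in-memory document, passing an explicit prefix-to-URI mapping through the `namespaces` argument. This is a minimal sketch under assumptions: the document structure is inferred from the XPath strings above (plus an assumed N2:NameElement child), and the `urn:example:*` namespace URIs are hypothetical placeholders. The real URIs would have to be taken from the xmlns declarations in the actual file.

```r
library(XML)

# Hypothetical miniature document; the namespace URIs are placeholders,
# not the ones used in companies_xml_extract_20170703.xml
txt <- paste0(
  '<root xmlns:N8="urn:example:n8" xmlns:N2="urn:example:n2" xmlns:N5="urn:example:n5">',
  '<N8:EntityList><N8:Entity>',
  '<N2:OrganisationName><N2:NameElement>Foo Ltd</N2:NameElement></N2:OrganisationName>',
  '<N5:Identifiers><N5:Identifier>',
  '<N5:IdentifierElement>123</N5:IdentifierElement>',
  '</N5:Identifier></N5:Identifiers>',
  '</N8:Entity></N8:EntityList>',
  '</root>'
)
doc <- xmlParse(txt, asText = TRUE)

# Map every prefix used in the XPath expressions to its namespace URI
ns <- c(N8 = "urn:example:n8", N2 = "urn:example:n2", N5 = "urn:example:n5")

# Same expressions as in REC, evaluated on the whole document
uid <- xpathSApply(doc, "//N8:EntityList/N8:Entity/N2:OrganisationName",
                   xmlValue, namespaces = ns)
ref <- xpathSApply(doc,
                   "//N8:EntityList/N8:Entity/N5:Identifiers/N5:Identifier/N5:IdentifierElement",
                   xmlValue, namespaces = ns)
print(uid)
print(ref)
```

If both expressions return values on a document shaped like the real one, the paths themselves are at least valid for that structure, and the problem is more likely in how the branch handler is invoked.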

Thanks

Answer 1

Hope this helps!

library(xml2)
library(dplyr)  # loaded here for the %>% pipe
xml_doc <- read_xml("test.xml")

# Text of every NameElement under an OrganisationName; xml_ns() supplies the prefix mappings
OrganisationName <- xml_doc %>% 
  xml_find_all("//N2:OrganisationName/N2:NameElement", ns=xml_ns(xml_doc)) %>% 
  xml_text()
# Text of every IdentifierElement
IdentifierElement <- xml_doc %>% 
  xml_find_all("//N5:Identifiers/N5:Identifier/N5:IdentifierElement", ns=xml_ns(xml_doc)) %>% 
  xml_text()
df <- data.frame(OrganisationName, IdentifierElement)
df
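`data.frame(OrganisationName, IdentifierElement)` assumes the two vectors line up one-to-one: with unequal lengths it either errors or silently recycles the shorter vector. When an entity can have no identifier or several, a sketch that walks each Entity node and pairs its name with its own identifiers (mirroring the rep()/NA logic of REC in the question) may be safer. The sample document and its `urn:example:*` namespace URIs below are hypothetical.

```r
library(xml2)

# Hypothetical sample: entity B has two identifiers, entity C has none
txt <- paste0(
  '<root xmlns:N8="urn:example:n8" xmlns:N2="urn:example:n2" xmlns:N5="urn:example:n5">',
  '<N8:EntityList>',
  '<N8:Entity><N2:OrganisationName><N2:NameElement>A</N2:NameElement></N2:OrganisationName>',
  '<N5:Identifiers><N5:Identifier><N5:IdentifierElement>1</N5:IdentifierElement></N5:Identifier>',
  '</N5:Identifiers></N8:Entity>',
  '<N8:Entity><N2:OrganisationName><N2:NameElement>B</N2:NameElement></N2:OrganisationName>',
  '<N5:Identifiers><N5:Identifier><N5:IdentifierElement>2</N5:IdentifierElement></N5:Identifier>',
  '<N5:Identifier><N5:IdentifierElement>3</N5:IdentifierElement></N5:Identifier>',
  '</N5:Identifiers></N8:Entity>',
  '<N8:Entity><N2:OrganisationName><N2:NameElement>C</N2:NameElement></N2:OrganisationName></N8:Entity>',
  '</N8:EntityList>',
  '</root>'
)
xml_doc <- read_xml(txt)
ns <- xml_ns(xml_doc)

entities <- xml_find_all(xml_doc, "//N8:EntityList/N8:Entity", ns = ns)

rows <- lapply(entities, function(entity) {
  # Paths starting with "./" are evaluated relative to the current entity only
  uid <- xml_text(xml_find_first(entity, "./N2:OrganisationName/N2:NameElement", ns = ns))
  ref <- xml_text(xml_find_all(entity, "./N5:Identifiers/N5:Identifier/N5:IdentifierElement", ns = ns))
  # Keep entities without identifiers, with an NA reference
  if (length(ref) == 0) ref <- NA_character_
  # uid (length 1) is recycled once per identifier
  data.frame(uid = uid, ref = ref, stringsAsFactors = FALSE)
})
df <- do.call(rbind, rows)
print(df)
```

Because each name is matched only against identifiers inside the same Entity element, the pairing no longer depends on the two global vectors happening to have the same order and length.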

Don't forget to let us know if it solved your problem :)

xmlEventParse returns an empty data frame
