xmlEventParse returns an empty data frame

Date: 2017-09-20 04:44:54

Tags: r xml xml-parsing

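I am new to XML, and I understand that when working with large files in R it is best to use xmlEventParse() and getNodeSet(). However, my code is not working properly (it returns an empty data frame), and I don't know why. Maybe the paths are defined incorrectly?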

A dummy XML file similar to the original can be found at this link: dummy xml file

My dummy R code looks like this:

library(XML)

FOOid_traverse <- function() {

  # Accumulators, filled in by REC() via <<-
  uids <- c()
  refs <- c()

  # Branch handler: receives one node and extracts names and identifiers
  REC <- function(x) {

    uid <- xpathSApply(x, "//N8:EntityList/N8:Entity/N2:OrganisationName", xmlValue)
    ref <- xpathSApply(x, "//N8:EntityList/N8:Entity/N5:Identifiers/N5:Identifier/N5:IdentifierElement", xmlValue)

    if (length(uid) > 0) {
      if (length(ref) == 0) {
        # No identifiers found: keep the name with a missing reference
        uids <<- c(uids, uid)
        refs <<- c(refs, NA_character_)
      } else {
        # Repeat the name once per identifier
        uids <<- c(uids, rep(uid, length(ref)))
        refs <<- c(refs, ref)
      }
    }
  }

  list(
    REC = REC,
    FOOid_df = function() {
      data.frame(uid = uids, ref = refs, stringsAsFactors = FALSE)
    }
  )
}

FOOid_f <- FOOid_traverse()

invisible(
  xmlEventParse(
    file = path.expand("companies_xml_extract_20170703.xml"),
    branches = FOOid_f["REC"]
  )
)

FOOid_f$FOOid_df()

Thanks

1 Answer:

Answer 0: (score: 0)

Hope this helps!

library(xml2)
library(dplyr)
xml_doc <- read_xml("test.xml")

OrganisationName <- xml_doc %>% 
  xml_find_all("//N2:OrganisationName/N2:NameElement", ns=xml_ns(xml_doc)) %>% 
  xml_text()
IdentifierElement <- xml_doc %>% 
  xml_find_all("//N5:Identifiers/N5:Identifier/N5:IdentifierElement", ns=xml_ns(xml_doc)) %>% 
  xml_text()
df <- data.frame(OrganisationName, IdentifierElement)
df
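
If the file is too large to load in one go with read_xml(), the xmlEventParse() route from the question can still work. One thing worth checking there: each name in `branches` is matched against an element name in the file, so `branches = FOOid_f["REC"]` only fires if the file actually contains `<REC>` elements. Below is a minimal sketch of the pattern, assuming a made-up, namespace-free document; `Entity`, `Name`, `Id` and `entity_collector` are hypothetical names, not taken from the real file. For a namespaced file like the real one, how prefixed names such as N8:Entity have to be written in `branches` is something I have not verified.

```r
library(XML)

# Hypothetical sample data (assumption: not the asker's real file, no namespaces).
xml_text <- paste0(
  "<EntityList>",
  "<Entity><Name>Acme</Name><Id>A1</Id><Id>A2</Id></Entity>",
  "<Entity><Name>Globex</Name></Entity>",
  "</EntityList>"
)

entity_collector <- function() {
  names_seen <- character()
  ids_seen <- character()

  # Invoked once per <Entity> element, with that element's subtree as `node`.
  Entity <- function(node, ...) {
    kids <- xmlChildren(node)
    name <- xmlValue(kids[["Name"]])
    ids <- vapply(kids[names(kids) == "Id"], xmlValue, character(1))
    # Keep entities without identifiers, with a missing id.
    if (length(ids) == 0) ids <- NA_character_
    names_seen <<- c(names_seen, rep(name, length(ids)))
    ids_seen <<- c(ids_seen, unname(ids))
  }

  list(
    Entity = Entity,
    as_df = function() {
      data.frame(name = names_seen, id = ids_seen, stringsAsFactors = FALSE)
    }
  )
}

collector <- entity_collector()

# The list name "Entity" is what ties the handler to <Entity> elements.
invisible(xmlEventParse(xml_text, asText = TRUE, branches = collector["Entity"]))

result <- collector$as_df()
```

`result` then holds one row per name/identifier pair, with NA in `id` for entities that have no identifier. Because each handler call only sees a single entity's subtree, the whole document never has to sit in memory at once.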


Don't forget to let us know if it solved your problem :)