Question

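I have a very large XML file (> 70 GB) from which I only need to read some sections. However, I don't know the structure of the file, and because of its size I can't extract the structure either.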

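I don't need to read the whole file or convert it into a data frame, only to extract specific parts. But because I don't have the structure, I don't know the exact format of those sections.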
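Since the structure is the unknown part, a cheap first step is to look at only the first few kilobytes of the file. An XML file starts with its root element and usually the first records, which often reveals the name of the repeating element. A minimal sketch in base R (the function name `peek_xml` and the 5000-byte default are hypothetical choices):

```r
# Read only the first n_bytes bytes of a (possibly huge) file as text,
# so the opening tags can be inspected without loading the whole file.
peek_xml <- function(path, n_bytes = 5000L) {
  # useBytes = TRUE: nchars is treated as a byte count, so the read size
  # stays bounded regardless of the file's character encoding
  readChar(path, nchars = n_bytes, useBytes = TRUE)
}
```

For example, `cat(peek_xml("Final.xml"))` prints the beginning of the file. This works even if the whole file is on a single line, which `readLines()` would not handle well.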

I tried xmlParse and, following the advice here, xmlEventParse: How to read large (~20 GB) xml file in R?
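`xmlEventParse()` can also be used just to discover the structure: instead of building a document tree, a `startElement` handler is called with each element name as the parser streams through the file. The sketch below (the function name `count_element_names` is hypothetical) counts how often each element name occurs. Memory use grows only with the number of distinct names, although the parser still has to read through all 70 GB once.

```r
library(XML)

# Count occurrences of each element name using SAX callbacks,
# so the document is never held in memory as a tree.
count_element_names <- function(path) {
  counts <- new.env()  # environment used as a mutable name -> count map
  handlers <- list(
    # `...` absorbs any extra arguments the parser passes (e.g. namespaces)
    startElement = function(name, attrs, ...) {
      prev <- counts[[name]]
      counts[[name]] <- if (is.null(prev)) 1L else prev + 1L
    }
  )
  xmlEventParse(path, handlers = handlers)
  # named integer vector, most frequent element names first
  sort(unlist(as.list(counts)), decreasing = TRUE)
}
```

The element names with the largest counts are usually the repeating records; their names are what the `branches` argument of `xmlEventParse()` expects.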

The suggested code returns an empty data frame:

library(XML)

xmlDoc <- "Final.xml"
result <- NULL

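# function to use with xmlEventParse: returns a named list of branch handlers.
# Each handler receives one complete element whose name matches its list name (here <ROW>).
row.sax <- function() {
  ROW <- function(node) {
    children <- xmlChildren(node)
    # drop whitespace text nodes so only child elements remain
    children[which(names(children) == "text")] <- NULL
    result <<- rbind(result, sapply(children, xmlValue))
  }
  branches <- list(ROW = ROW)
  return(branches)
}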

#call the xmlEventParse
xmlEventParse(xmlDoc, handlers = list(), branches = row.sax(),
              saxVersion = 2, trim = FALSE)

#and here is your data.frame
result <- as.data.frame(result, stringsAsFactors = FALSE)

I have no experience with XML, so I don't fully understand the solution I'm trying to use.
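The code from the linked answer most likely returns an empty data frame because the names in `branches` must be element names that actually occur in the file: `branches = list(ROW = ROW)` fires only on `<ROW>` elements, so if the file has none, `result` stays `NULL` and `as.data.frame(NULL)` has no rows. Once the repeating element name is known, the same approach can be pointed at it. A sketch, assuming a hypothetical element name `record` and a hypothetical helper `extract_branch`:

```r
library(XML)

# Stream through the file and, for every complete <element> node,
# keep the text of its child elements; only the extracted values are stored.
# `element = "record"` is a hypothetical default: use a name found in the file.
extract_branch <- function(path, element = "record") {
  rows <- list()
  handler <- function(node) {
    kids <- xmlChildren(node)
    kids <- kids[names(kids) != "text"]  # drop whitespace text nodes
    rows[[length(rows) + 1L]] <<- vapply(kids, xmlValue, character(1))
  }
  # branches must be a list whose names are element names
  branches <- setNames(list(handler), element)
  xmlEventParse(path, handlers = list(), branches = branches,
                saxVersion = 2, trim = FALSE)
  rows
}
```

For example, `rows <- extract_branch("Final.xml", "record")`; if every record has the same children, `do.call(rbind, rows)` turns the list into a matrix. Collecting into a list and binding once at the end avoids copying `result` again on every `rbind()` call, as the original handler does.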

Thanks for your help!

Reading from a very large XML file in R
