Question

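I relied on these posts (1 and 2) to come up with the closure code below. The code works on an XML file of 1.3 GB compressed (13.5 GB uncompressed). However, it takes about 10 hours to produce the final result. I timed the code, and the closure function accounts for roughly 9.5 of those 10 hours (so I am posting only the relevant closure part of the code). Given this, is there a way to speed this code up further? Can I parallelize anything here? Here is a very small data sample.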

Update: links to the 25% sample data and the 100% population.

library(XML)

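branchFunction <- function() {
  # Environment that accumulates one entry per <person> element
  store <- new.env()
  # Branch handler: called with the fully built <person> node
  func <- function(x, ...) {
    # The person itself plus every node under the selected plan, except <route>
    ns <- getNodeSet(x, path = "//person[@id]|//plan[@selected='yes']//*[not(self::route)]")
    value <- lapply(ns, xmlAttrs)
    # The first match is the person; take its id explicitly so that the key
    # is a single string even if the person carries other attributes
    id <- value[[1]][["id"]]
    store[[id]] <- value
  }
  # Accessor that returns everything collected so far as a named list
  getStore <- function() { as.list(store) }
  list(person = func, getStore = getStore)
}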

myfunctions <- branchFunction()

xmlEventParse(file = "plansfull.xml", handlers = NULL, branches = myfunctions)

# Inspect what was collected
l <- myfunctions$getStore()
l <- unlist(l, recursive = FALSE)
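
To make the setup reproducible without the large file, here is a minimal sketch that runs the same closure on a tiny inline document and times the parse with system.time(). The sample XML (a population root with person, plan, act, leg and route elements), the temporary file, and the variable names are assumptions made up for illustration; they are not the linked sample data, and the timing on such a small input says nothing about the 10-hour run.

```r
library(XML)

# Hypothetical miniature input; element and attribute names are assumed
sample_xml <- c(
  '<population>',
  '  <person id="1">',
  '    <plan selected="yes">',
  '      <act type="home" x="0" y="0"/>',
  '      <leg mode="car"><route>1 2 3</route></leg>',
  '      <act type="work" x="5" y="5"/>',
  '    </plan>',
  '    <plan selected="no">',
  '      <act type="home" x="0" y="0"/>',
  '    </plan>',
  '  </person>',
  '  <person id="2">',
  '    <plan selected="yes">',
  '      <act type="home" x="1" y="1"/>',
  '    </plan>',
  '  </person>',
  '</population>'
)
tmp <- tempfile(fileext = ".xml")
writeLines(sample_xml, tmp)

# Same closure as in the question, keyed by the person's id attribute
branchFunction <- function() {
  store <- new.env()
  func <- function(x, ...) {
    ns <- getNodeSet(x, path = "//person[@id]|//plan[@selected='yes']//*[not(self::route)]")
    value <- lapply(ns, xmlAttrs)
    id <- value[[1]][["id"]]
    store[[id]] <- value
  }
  getStore <- function() { as.list(store) }
  list(person = func, getStore = getStore)
}

handlers <- branchFunction()

# Time only the event parse, which is where the closure is invoked
elapsed <- system.time(
  xmlEventParse(file = tmp, handlers = NULL, branches = handlers)
)

res <- handlers$getStore()
str(res)                      # attribute vectors grouped per person id
print(elapsed[["elapsed"]])   # wall-clock seconds for the parse

unlink(tmp)
```

Wrapping only the xmlEventParse() call in system.time() isolates the closure's cost from the later unlist() step, which is the split described above.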

Can this R closure code be made more efficient?

0 answers: