Speeding up phylogenetic tree traversal in R

Date: 2016-03-30 20:58:12

Tags: r performance optimization parallel-processing tree

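I have the following code, which is supposed to take a phylogenetic tree (in phylo4 format) and return, for each internal node, the sum of the branch lengths of its two children (the tree is binary). I'm not sure why, but it is very slow compared with other functions I have written for trees of similar size (about 45,000 leaves).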

require(phylobase)
require(parallel)

# df.ilr is the ILR transformed data
# tr is the corresponding tree
calculate.blw <- function(tr, n_cores=1){
  # Note that some of the terminal branches of the tree have zero length.
  # In these cases I will replace those zero values with
  # the minimum branch length (greater than 0) found in the tree.
  min.nonzero <- min(edgeLength(tr)[edgeLength(tr)>0 & !is.na(edgeLength(tr))])
  edgeLength(tr)[edgeLength(tr) == 0 & !is.na(edgeLength(tr))] <- min.nonzero

  coords.names <- nodeLabels(tr)
  #blw <- rep(NA, length(coords.names)) # [b]ranch-[l]ength [w]eights
  fxn <- function(x){
    childs <- names(descendants(tr, x, type='children'))
    return(sum(edgeLength(tr, childs)))
  }

  cl <- makeCluster(n_cores, 'FORK')
  blw <- parSapply(cl, coords.names, fxn)
  stopCluster(cl)
  names(blw) <- coords.names

  return(blw)
}
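
For comparison, the quantity described above (the sum of the child branch lengths for each internal node) can also be computed in one grouped pass over an edge table, with no per-node lookup. The following is only a minimal sketch under assumptions, and it has not been benchmarked against the function above: `sum_child_lengths` is a hypothetical helper, and it expects a plain two-column ancestor/descendant matrix plus a length vector aligned row by row with it. In ape's `phylo` format these are `tr$edge` and `tr$edge.length`; for a phylo4 object they would presumably come from `edges(tr)` and `edgeLength(tr)`, but whether the two are in the same order is an assumption that should be checked first.

```r
# Hypothetical helper (not part of phylobase or the original post).
# edge: two-column matrix, column 1 = ancestor id, column 2 = descendant id
# len : numeric vector of branch lengths, one per row of `edge`
sum_child_lengths <- function(edge, len) {
  stopifnot(nrow(edge) == length(len))

  # As in the original function: replace zero-length branches with the
  # smallest positive branch length; NA lengths are left as they are.
  is_zero <- !is.na(len) & len == 0
  is_pos  <- !is.na(len) & len > 0
  len[is_zero] <- min(len[is_pos])

  # Drop a root edge if present (assumed to be coded as ancestor 0 or NA).
  anc  <- edge[, 1]
  keep <- !is.na(anc) & anc > 0

  # Group the branch lengths by ancestor id and sum each group.
  res <- tapply(len[keep], anc[keep], sum)
  setNames(as.vector(res), names(res))
}

# Toy tree: tips 1-3, root 4, internal node 5.
edge <- rbind(c(4, 5), c(5, 1), c(5, 2), c(4, 3))
len  <- c(0.5, 1.0, 0.0, 2.0)
blw  <- sum_child_lengths(edge, len)
print(blw)
```

The result is named by ancestor node id rather than by node label, so mapping ids back to the labels returned by `nodeLabels(tr)` is left to the caller.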

Suggestions on memory efficiency are also welcome!

0 answers:

No answers yet.