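I want to bootstrap a large two-way dataset that contains multiple column and row variables. I must preserve both the row and the column variables. The result should be a list containing, for each row variable, the bootstraps of all column variables. I have provided code that does what I need, but I think it is inelegant. I would appreciate better and faster code. Below is a simplified recreation of the two-way dataset: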
rm(list=ls())
data <- 1:72
Create the two-way matrix data:
charDataDiff <- matrix(data, nrow=9,ncol=8)
varNames <- c("A", "B", "C","A", "B", "C","A", "B", "C")
Add the character column to the charDataDiff matrix:
charDataDiff <- cbind(varNames ,data.frame(charDataDiff))
Add the column names:
colnames(charDataDiff) <- c("patchId","s380","s390","s400","s410","s420","s430","s440","s450")
Split the data using the row variable "patchId" as the condition. This creates a list of three data frames, one per patchId value:
idColor <- c("A", "B", "C")
(patchSpectrum <- lapply(idColor, function(idColor) charDataDiff[charDataDiff$patchId==idColor,]))
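The same split can also be sketched with base R's split(), which returns a named list keyed by patchId. This is only an illustrative alternative, not part of the original code; the name patchSpectrumSplit is hypothetical, and the data set is rebuilt so that the sketch runs on its own.

```r
# Rebuild the simplified data set from above so this sketch is self-contained
charDataDiff <- data.frame(matrix(1:72, nrow = 9, ncol = 8))
colnames(charDataDiff) <- paste0("s", seq(380, 450, by = 10))
charDataDiff <- cbind(patchId = rep(c("A", "B", "C"), times = 3), charDataDiff)

# split() groups the rows by patchId and names each element after its group
patchSpectrumSplit <- split(charDataDiff, charDataDiff$patchId)  # hypothetical name

names(patchSpectrumSplit)         # group labels
sapply(patchSpectrumSplit, nrow)  # number of rows in each group
```

Because the elements are named, a patch can be selected by label (for example patchSpectrumSplit$A) rather than by position.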
Create the function sampleBoot to resample patchSpectrum:
sampleBoot <- function(nbootstrap=2, patch=3){
return(lapply(1:nbootstrap, function(i)
{patchSpectrum[[patch]][sample(1:nrow(patchSpectrum[[patch]]),replace=TRUE),]}))}
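The list "k" answers my question. However, I think my code is slow for large datasets and large numbers of bootstrap replicates. Here I run only 10 bootstrap iterations for three row variables. Faster and more elegant code would be appreciated.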
numBoots <- 10
for (i in 1: numBoots)
k <- lapply(1:3, function(n)
do.call(rbind, lapply(sampleBoot(i, n), function(x) apply(x[-1], 2, median))))
k
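Note that each pass of the loop above overwrites k, so only the final pass, with i equal to numBoots, determines the result. A minimal sketch of the equivalent single call follows; it repeats the definitions above so that it runs on its own, and the name kSingle is hypothetical.

```r
# Simplified data set and helpers, repeated from above
charDataDiff <- data.frame(matrix(1:72, nrow = 9, ncol = 8))
colnames(charDataDiff) <- paste0("s", seq(380, 450, by = 10))
charDataDiff <- cbind(patchId = rep(c("A", "B", "C"), times = 3), charDataDiff)
idColor <- c("A", "B", "C")
patchSpectrum <- lapply(idColor, function(id) charDataDiff[charDataDiff$patchId == id, ])

sampleBoot <- function(nbootstrap = 2, patch = 3) {
  lapply(seq_len(nbootstrap), function(i) {
    # Resample the row indices of the chosen patch with replacement
    rows <- sample(seq_len(nrow(patchSpectrum[[patch]])), replace = TRUE)
    patchSpectrum[[patch]][rows, ]
  })
}

numBoots <- 10
# One call with numBoots replicates per patch; no outer for loop is needed
kSingle <- lapply(seq_along(patchSpectrum), function(n)
  do.call(rbind, lapply(sampleBoot(numBoots, n), function(x) apply(x[-1], 2, median))))

dim(kSingle[[1]])  # rows = bootstrap replicates, columns = wavelengths
```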
Answer 0 (score: 4)
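All I can offer is a clean rewrite of your code: do not add the varNames column, but instead use by to split your data, as @KenWilliams suggested in your previous question; use replicate instead of lapply; and drop the for loop. It should run faster this way; if it does not, I suggest you try profiling it.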
charDataDiff <- matrix(1:72, nrow = 9, ncol = 8)
colnames(charDataDiff) <- c("s380", "s390", "s400", "s410",
"s420", "s430", "s440", "s450")
varNames <- c("A", "B", "C", "A", "B", "C", "A", "B", "C")
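# Split the rows by patch; by() returns one data frame per level of varNames
patchSpectrum <- by(charDataDiff, varNames, data.frame)
# Draw one bootstrap resample of the rows of x, with replacement
sampleOne <- function(x) x[sample(seq_len(nrow(x)), replace = TRUE), ]
# Draw n resamples of x and keep them as a list
sampleBoot <- function(x, n) replicate(n, sampleOne(x), simplify = FALSE)
# Column medians of each resample, stacked into one matrix (one row per resample)
applyMedian <- function(l) do.call(rbind, lapply(l, apply, 2, median))
# For each patch: 10 resamples, then a 10 x 8 matrix of medians
k <- lapply(lapply(patchSpectrum, sampleBoot, n = 10), applyMedian)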
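To check the shape of the result and get a first timing before reaching for a profiler, something like the following sketch may help. It repeats the definitions above so that it runs on its own; the seed value and the use of system.time are illustrative assumptions, not part of the original answer.

```r
# Definitions repeated from the answer above
charDataDiff <- matrix(1:72, nrow = 9, ncol = 8)
colnames(charDataDiff) <- paste0("s", seq(380, 450, by = 10))
varNames <- rep(c("A", "B", "C"), times = 3)
patchSpectrum <- by(charDataDiff, varNames, data.frame)
sampleOne <- function(x) x[sample(seq_len(nrow(x)), replace = TRUE), ]
sampleBoot <- function(x, n) replicate(n, sampleOne(x), simplify = FALSE)
applyMedian <- function(l) do.call(rbind, lapply(l, apply, 2, median))

set.seed(1)  # arbitrary seed, only so repeated runs draw the same resamples
timing <- system.time(
  k <- lapply(lapply(patchSpectrum, sampleBoot, n = 10), applyMedian)
)

length(k)          # one element per patch
dim(k[[1]])        # one row per bootstrap replicate, one column per wavelength
timing["elapsed"]  # wall-clock seconds for the whole bootstrap
```

On the real data, increasing n and comparing the elapsed time against the original loop gives a rough indication of whether the rewrite helps; Rprof() can then show where the remaining time goes.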