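I have a very large data.frame. For each row, I want to subtract the row mean of columns 37-2574 from those columns and then divide by the row standard deviation. Then I need to multiply columns 1-18 by the standard deviation of the same row. Finally, I need to subtract the row mean (of columns 37-2574) multiplied by the rescaled columns 1-18 from columns 19-36. I'm currently doing this with a for loop, but it takes forever. Is there a way to do this with apply, or with a faster for loop? This is what I currently have: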
for (i in 1:nrow(samples)) {
  # unlist() turns the one-row data.frame into a numeric vector for mean()/sd()
  theta.mean <- mean(unlist(samples[i, 37:2574]))
  theta.sd <- sd(unlist(samples[i, 37:2574]))
  samples[i, 37:2574] <- (samples[i, 37:2574] - theta.mean) / theta.sd
  # then multiply columns 1-18 by SD of theta at each iteration
  samples[i, 1:18] <- samples[i, 1:18] * theta.sd
  # subtract theta-mean * column 1-18 from columns 19-36
  for (j in 1:18) {
    theta.mean.beta <- theta.mean * samples[i, j]
    samples[i, j + 18] <- samples[i, j + 18] - theta.mean.beta
  }
}
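Written out per row, the loop above amounts to the following. The notation is introduced only for this summary: $x_{ik}$ is the entry in row $i$, column $k$, and on the right-hand side of the last line $x_{ij}$ is the value before the rescaling in the line above it.

```latex
\begin{aligned}
\mu_i &= \frac{1}{K}\sum_{k=37}^{2574} x_{ik}, \qquad
\sigma_i = \sqrt{\frac{1}{K-1}\sum_{k=37}^{2574}\left(x_{ik}-\mu_i\right)^2}, \qquad K = 2538,\\
x_{ik} &\leftarrow \frac{x_{ik}-\mu_i}{\sigma_i}, \qquad k = 37,\dots,2574,\\
x_{ij} &\leftarrow \sigma_i\, x_{ij}, \qquad j = 1,\dots,18,\\
x_{i,j+18} &\leftarrow x_{i,j+18} - \mu_i\,\sigma_i\, x_{ij}, \qquad j = 1,\dots,18.
\end{aligned}
```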
Answer (score: 5)
The trick is to use apply() to compute all of the row statistics at once, and then do the remaining operations column by column, like this:
# calculate the row means and SDs using apply()
theta.means <- apply(samples[, 37:2574], # the object to summarize
                     1,                  # summarize over the rows (MARGIN = 1)
                     mean)               # the summary function
theta.sds <- apply(samples[, 37:2574], 1, sd)
# define a function to apply to each row
standardize <- function(x)
  (x - mean(x)) / sd(x)
# apply it over each row (MARGIN = 1); apply() returns one column per row,
# so t() transposes the result back into rows
samples[, 37:2574] <- t(apply(samples[, 37:2574], 1, standardize))
# multiply columns 1-18 by the row SDs, then subtract
# theta-mean * (rescaled) column j from column j + 18
for (j in 1:18) {
  samples[, j] <- samples[, j] * theta.sds
  theta.mean.beta <- theta.means * samples[, j]
  samples[, j + 18] <- samples[, j + 18] - theta.mean.beta
}
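Be sure to double-check that this code is equivalent to your original by taking a subset of the rows (e.g. `samples <- samples[1:100,]`) and checking that both versions give the same result. (I would have done this myself, but no example dataset was posted...)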
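A minimal sketch of that check, assuming a scaled-down, hypothetical layout (columns 1-18, 19-36, then 50 "theta" columns instead of 37:2574) filled with random numbers. `loop_version()` and `apply_version()` are illustrative wrappers around the loop from the question and the apply() code above; `unlist()` is added so that `mean()` and `sd()` accept a one-row data.frame.

```r
# Hypothetical toy data: 20 rows, columns 1-36 plus n_theta theta columns
set.seed(42)
n_theta <- 50
theta_cols <- 37:(36 + n_theta)
samples <- as.data.frame(matrix(rnorm(20 * (36 + n_theta)), nrow = 20))

# The loop from the question, with the column range swapped for theta_cols
loop_version <- function(samples) {
  for (i in 1:nrow(samples)) {
    theta.mean <- mean(unlist(samples[i, theta_cols]))
    theta.sd <- sd(unlist(samples[i, theta_cols]))
    samples[i, theta_cols] <- (samples[i, theta_cols] - theta.mean) / theta.sd
    samples[i, 1:18] <- samples[i, 1:18] * theta.sd
    for (j in 1:18) {
      samples[i, j + 18] <- samples[i, j + 18] - theta.mean * samples[i, j]
    }
  }
  samples
}

# The apply()-based version from the answer, same column range
apply_version <- function(samples) {
  theta.means <- apply(samples[, theta_cols], 1, mean)
  theta.sds <- apply(samples[, theta_cols], 1, sd)
  standardize <- function(x) (x - mean(x)) / sd(x)
  samples[, theta_cols] <- t(apply(samples[, theta_cols], 1, standardize))
  for (j in 1:18) {
    samples[, j] <- samples[, j] * theta.sds
    samples[, j + 18] <- samples[, j + 18] - theta.means * samples[, j]
  }
  samples
}

# Compare as plain matrices so only the numbers matter
all.equal(as.matrix(loop_version(samples)), as.matrix(apply_version(samples)))
```

On the real data, the same comparison can be run on `samples[1:100, ]` with `theta_cols <- 37:2574`.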
Update
Based on David Arenburg's comment, here is a more efficient implementation:
# calculate the row means via rowMeans()
theta.means <- rowMeans(as.matrix(samples[, 37:2574]))
# redefine SD to be vectorized with respect to the rows of a matrix
rowSD <- function(x)
  sqrt(rowSums((x - rowMeans(x))^2) / (dim(x)[2] - 1))
# calculate the row SDs using the vectorized version of SD
theta.sds <- rowSD(as.matrix(samples[, 37:2574]))
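To confirm that `rowSD()` agrees with `sd()` applied row by row, a quick check on a small random matrix (hypothetical toy data) might look like this:

```r
# Hypothetical toy data: 5 rows, 100 columns
set.seed(1)
x <- matrix(rnorm(5 * 100), nrow = 5)

# rowSD() as defined in the answer
rowSD <- function(x)
  sqrt(rowSums((x - rowMeans(x))^2) / (dim(x)[2] - 1))

# x - rowMeans(x) recycles the length-nrow(x) vector down the columns,
# so entry [i, j] becomes x[i, j] minus the mean of row i
all.equal(rowSD(x), apply(x, 1, sd))       # TRUE
all.equal(rowMeans(x), apply(x, 1, mean))  # TRUE
```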
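Now use the fact that when you subtract a vector x from a data.frame df, R recycles the values of x, so when length(x) == nrow(df) the result is the same as subtracting x from each column of df: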
# standardize columns 37 through 2574
samples[,37:2574] <- (samples[,37:2574] - theta.means)/theta.sds
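The recycling rule used in the line above can be seen on a tiny, made-up data.frame. Note that it only lines up row by row because length(x) equals nrow(df):

```r
# Toy data.frame and vector (made-up values), with length(x) == nrow(df)
df <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))
x <- c(1, 2, 3)

# x is recycled down each column, so every column loses x[i] in row i
df - x
#   a  b
# 1 0  9
# 2 0 18
# 3 0 27
```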
Now for columns 1:18 and 19:36:
# multiply columns 1-18 by the row SDs, then subtract
# theta-mean * (already rescaled) columns 1-18 from columns 19-36
samples[, 1:18] <- samples[, 1:18] * theta.sds
samples[, 19:36] <- samples[, 19:36] - theta.means * samples[, 1:18]
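As a further sketch, not part of the original answer: if every column of `samples` is numeric, doing all three steps on a plain matrix avoids repeated data.frame indexing. `transform_samples()` is a hypothetical helper; its column ranges follow the question, and the toy call below uses only 10 theta columns.

```r
# Hypothetical helper: the three steps from the question on a numeric matrix
transform_samples <- function(m, theta_cols = 37:2574) {
  theta <- m[, theta_cols, drop = FALSE]
  mu <- rowMeans(theta)                                   # row means
  s <- sqrt(rowSums((theta - mu)^2) / (ncol(theta) - 1))  # row SDs
  m[, theta_cols] <- (theta - mu) / s         # standardize each row
  m[, 1:18] <- m[, 1:18] * s                  # rescale columns 1-18 by the row SD
  m[, 19:36] <- m[, 19:36] - mu * m[, 1:18]   # uses the rescaled columns 1-18
  m
}

# Toy example: 4 rows, 36 + 10 columns of random numbers
set.seed(7)
m <- matrix(rnorm(4 * 46), nrow = 4)
out <- transform_samples(m, theta_cols = 37:46)
```

On the real data this would be called as `samples <- as.data.frame(transform_samples(as.matrix(samples)))`.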