Question

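I have a very large data.frame. What I want to do is subtract the row mean of columns 37-2574 from those columns and then divide by the row standard deviation. Then I need to multiply columns 1-18 by the standard deviation of the same row. Finally, I need to subtract the row mean, multiplied by columns 1-18, from columns 19-36. I am currently trying to do this with a for loop, but it takes forever. Is there a way to do this with apply, or even with a faster for loop? This is what I have at the moment: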

for (i in 1:nrow(samples)){
  # a one-row data.frame slice is not numeric, so flatten it before mean()/sd()
  theta <- unlist(samples[i, 37:2574])
  theta.mean <- mean(theta)
  theta.sd <- sd(theta)
  samples[i, 37:2574] <- (samples[i, 37:2574] - theta.mean)/ theta.sd
  # then multiply columns 1-18 by SD of theta at each iteration 
  samples[i, 1:18] <- samples[i, 1:18] * theta.sd
  # subtract theta-mean * column 1-18 from columns 19-36
  for (j in 1:18){
    theta.mean.beta <- theta.mean * samples[i, j]
    samples[i, j + 18] <- samples[i, j + 18] - theta.mean.beta
  }
}

Answer 1

The trick is to compute all the row statistics at once with apply(), and then operate column by column, like this:

# calculate the row means and SDs using apply()
theta.means  <-  apply(samples[,37:2574],  # the object to summarize
                       1,                  # summarize over the rows (MARGIN = 1)
                       mean)               # the summary function
theta.sds  <-  apply(samples[,37:2574],1,sd)

# define a function to apply to each row
standardize  <-  function(x)
    (x - mean(x))/sd(x)
# apply it to each row (MARGIN = 1); apply() returns the rows as columns,
# so transpose the result back with t()
samples[,37:2574]  <-  t(apply(samples[,37:2574],1,standardize))

# subtract theta-mean * column 1-18 from columns 19-36
for (j in 1:18){
    samples[, j] <- samples[,j] * theta.sds
    theta.mean.beta <- theta.means * samples[, j]
    samples[, j + 18] <- samples[, j + 18] - theta.mean.beta
}

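Be sure to double-check that this code is equivalent to your original code by taking a subset of the rows (e.g. `samples <- samples[1:100, ]`) and checking that the results are the same. (I would have done this myself, but no example data set was posted...)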
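A minimal sketch of such a check, assuming a small synthetic data set in place of the real one. The toy dimensions, the column ranges 1:3, 4:6 and 7:16 (standing in for 1:18, 19:36 and 37:2574), and the helper names loop_version and apply_version are all made up for illustration:

```r
set.seed(1)
# toy stand-in for `samples`: 3 "a" columns, 3 "b" columns, 10 theta columns
toy <- as.data.frame(matrix(rnorm(20 * 16), nrow = 20))
a.cols <- 1:3; b.cols <- 4:6; theta.cols <- 7:16

# row-by-row reference, following the logic of the loop in the question
loop_version <- function(df) {
  for (i in seq_len(nrow(df))) {
    theta <- unlist(df[i, theta.cols])
    m <- mean(theta); s <- sd(theta)
    df[i, theta.cols] <- (theta - m) / s
    a <- unlist(df[i, a.cols]) * s          # scale the "a" columns by the row SD
    df[i, a.cols] <- a
    df[i, b.cols] <- unlist(df[i, b.cols]) - m * a
  }
  df
}

# apply()-based version, following the code above
apply_version <- function(df) {
  theta.means <- apply(df[, theta.cols], 1, mean)
  theta.sds   <- apply(df[, theta.cols], 1, sd)
  df[, theta.cols] <- t(apply(df[, theta.cols], 1,
                              function(x) (x - mean(x)) / sd(x)))
  for (j in a.cols) {
    df[, j] <- df[, j] * theta.sds
    df[, j + length(a.cols)] <- df[, j + length(a.cols)] - theta.means * df[, j]
  }
  df
}

# compare as plain numeric matrices so that stray names do not matter
res.loop  <- unname(as.matrix(loop_version(toy)))
res.apply <- unname(as.matrix(apply_version(toy)))
stopifnot(isTRUE(all.equal(res.loop, res.apply)))
```

If the two versions disagree, stopifnot() raises an error; otherwise the script finishes silently.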

Update

Per David Arenburg's comment, here is a more efficient implementation:

# calculate the row means via rowMeans()
theta.means  <-  rowMeans(as.matrix(samples[,37:2574]))

# redefine SD to be vectorized with respect to rows in the data.frame 
rowSD <- function(x)  
    sqrt(rowSums((x - rowMeans(x))^2)/(dim(x)[2] - 1)) 

# calculate the row SDs using the vectorized version of SD
theta.sds  <-  rowSD(as.matrix(samples[,37:2574]))
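
As a quick sanity check, assuming a small random matrix in place of the real columns (the toy matrix m is made up for illustration), the vectorized rowSD() can be compared against a per-row sd():

```r
# same definition as above, repeated so the snippet runs on its own
rowSD <- function(x)
    sqrt(rowSums((x - rowMeans(x))^2) / (ncol(x) - 1))

set.seed(42)
m <- matrix(rnorm(5 * 8), nrow = 5)   # toy data: 5 rows, 8 columns

# x - rowMeans(x) subtracts row-wise because matrices are stored column by
# column, so a vector of length nrow(x) is recycled down each column
sd.vectorized <- rowSD(m)
sd.per.row    <- apply(m, 1, sd)
stopifnot(isTRUE(all.equal(sd.vectorized, sd.per.row)))
```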

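Now use the fact that when a vector (x) is subtracted from a data.frame (df), R recycles the values of x; when length(x) == nrow(df), the result is the same as subtracting x from each column of df: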

 # standardize columns 37 through 2574
 samples[,37:2574] <-  (samples[,37:2574] - theta.means)/theta.sds
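
The recycling behaviour can be seen on a tiny example. This is only an illustrative sketch; the data frame df and the vector x here are made up:

```r
df <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))
x  <- c(1, 2, 3)          # length(x) == nrow(df)

res <- df - x

# every column has x subtracted from it element by element
stopifnot(identical(res$a, df$a - x),
          identical(res$b, df$b - x))
```

Note that the recycling runs down the columns, so this only lines up with the rows when the vector has exactly one value per row.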

Now perform similar calculations for columns 1:18 and 19:36:

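# multiply columns 1-18 by the row SDs
samples[, 1:18] <- samples[,1:18] * theta.sds
# subtract theta-mean * column 1-18 from columns 19-36
# (columns 1-18 are already scaled by theta.sds here, so do not multiply again)
samples[, 1:18 + 18] <- samples[, 1:18 + 18] - theta.means * samples[,1:18]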
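
Putting the steps together, here is a minimal sketch under stated assumptions. The function name standardize_samples and its column arguments are hypothetical, and the toy data frame stands in for the real samples. It works on a matrix copy internally, which assumes every column is numeric, and it multiplies the first block of columns by the row SD only once, as the loop in the question does:

```r
standardize_samples <- function(df, a.cols, b.cols, theta.cols) {
  m <- as.matrix(df)                        # assumes every column is numeric
  theta <- m[, theta.cols, drop = FALSE]
  mu <- rowMeans(theta)
  s  <- sqrt(rowSums((theta - mu)^2) / (ncol(theta) - 1))
  m[, theta.cols] <- (theta - mu) / s       # standardize each row of theta
  m[, a.cols] <- m[, a.cols] * s            # scale by the row SD (once)
  m[, b.cols] <- m[, b.cols] - mu * m[, a.cols]
  as.data.frame(m)
}

set.seed(7)
toy <- as.data.frame(matrix(rnorm(6 * 12), nrow = 6))
out <- standardize_samples(toy, a.cols = 1:2, b.cols = 3:4, theta.cols = 5:12)

# after standardizing, every theta row should have mean 0 and SD 1
theta.out <- as.matrix(out[, 5:12])
stopifnot(max(abs(rowMeans(theta.out))) < 1e-12,
          max(abs(apply(theta.out, 1, sd) - 1)) < 1e-12)

# column 3 should equal the original column 3 minus mean * (column 1 * SD)
mu.toy <- rowMeans(as.matrix(toy[, 5:12]))
sd.toy <- apply(as.matrix(toy[, 5:12]), 1, sd)
expected.b <- toy[[3]] - unname(mu.toy) * toy[[1]] * unname(sd.toy)
stopifnot(isTRUE(all.equal(out[[3]], expected.b)))
```

Working on a matrix avoids the per-column overhead of data.frame arithmetic, at the cost of one extra copy of the data.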

