Question

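I'm working with a very large data set of about 3 million observations, and I want to go through it and essentially combine certain observations if they meet specific requirements. I wrote the for loop below to do this, but it is very inefficient. Is there a more efficient way to do this, such as an apply function or something else?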

nobs <- nrow(acsdata)

for (i in 2:nobs) {
  # Rows i-1 and i are both flagged in column 6: fold row i into row i-1
  if (acsdata[i, 6] == 1 && acsdata[i - 1, 6] == 1) {
    acsdata[i - 1, 3] <- 2
    acsdata[i - 1, 21:30] <- acsdata[i - 1, 21:30] + acsdata[i, 21:30]
    acsdata[i, 31] <- 1
  }
}

Any help is greatly appreciated. Thanks!

Answer 1

Just vectorize. Don't mess around with loops or apply functions. Something like this (untested):

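# Rows whose flag (column 6) is 1 and whose next row's flag is also 1
to.fix <- which(acsdata[2:nobs, 6] == 1 & acsdata[1:(nobs - 1), 6] == 1)
acsdata[to.fix, 3] <- 2
# Add each following row's values (columns 21:30) into the matched row
acsdata[to.fix, 21:30] <- acsdata[to.fix, 21:30] + acsdata[to.fix + 1, 21:30]
# Set column 31 on the following rows
acsdata[to.fix + 1, 31] <- 1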
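Since the code above is untested, here is a small sketch for checking it against the original loop on made-up data. The data frame `toy` and the helper names `combine_loop` and `combine_vec` are hypothetical and only mirror the column positions from the question (6, 3, 21:30, 31). It is a sanity check on a tiny example, not a benchmark.

```r
# Hypothetical stand-in for acsdata: 8 rows, 31 numeric columns.
toy <- as.data.frame(matrix(0, nrow = 8, ncol = 31))
toy[, 6] <- c(1, 1, 0, 1, 1, 1, 0, 1)               # flag column
toy[, 21:30] <- matrix(as.numeric(1:80), nrow = 8)  # values to be summed

# Row-by-row version, same logic as the loop in the question.
combine_loop <- function(d) {
  for (i in 2:nrow(d)) {
    if (d[i, 6] == 1 && d[i - 1, 6] == 1) {
      d[i - 1, 3] <- 2
      d[i - 1, 21:30] <- d[i - 1, 21:30] + d[i, 21:30]
      d[i, 31] <- 1
    }
  }
  d
}

# Vectorized version, same logic as the answer.
combine_vec <- function(d) {
  n <- nrow(d)
  to.fix <- which(d[2:n, 6] == 1 & d[1:(n - 1), 6] == 1)
  d[to.fix, 3] <- 2
  d[to.fix, 21:30] <- d[to.fix, 21:30] + d[to.fix + 1, 21:30]
  d[to.fix + 1, 31] <- 1
  d
}

loop_result <- combine_loop(toy)
vec_result <- combine_vec(toy)

# Stops with an error if the two approaches disagree on the toy data.
stopifnot(isTRUE(all.equal(loop_result, vec_result)))
```

The two versions should agree even for runs of three or more flagged rows. The loop updates row i-1 before row i is itself modified, so every sum uses the original values of the following row, and the vectorized right-hand side is likewise evaluated in full before anything is assigned.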

Vectorizing a for loop in R
