Question

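I have a panel data set and want to assign some dummy variables that flag a change/increase in a variable (x) for each id. I wrote a function that does exactly what I want. Unfortunately it is very slow, because the data is very large and the function has to loop through thousands of ids.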

I was wondering whether there is a more efficient way to produce these results, possibly by avoiding the loops.

It is worth mentioning that not all ids are observed over the whole time period.

AddDummies <- function(ids, startYear = 2003, endYear = 2013){
  # Checks for ids over a time span if there has been an increase/change in variable x for a single id from one year to the next
  for(i in 1:length(ids)){
    cat("Progress: ", round((i/length(ids)), 4)*100, "% \n")
    for(k in startYear:endYear){
      x.curr <- fullDat[id_d == ids[i] & year == k, x] # x in year k
      x.last <- fullDat[id_d == ids[i] & year == k-1, x] # x in year k-1
      if(length(x.curr) == 0 | length(x.last) == 0){ # if id has not been in the data the year before or in the current year
        # skip and go to next iteration, since change can not be determined
        next
      } else if(x.curr != x.last){ # if there has been a change in x
        fullDat[id_d == ids[i] & year == k, changeDummy := 1] # dummy for change
        fullDat[id_d == ids[i] & year == k, changeAbsolute := (x.curr - x.last)] # absolute change
        if(x.curr > x.last){
          fullDat[id_d == ids[i] & year == k, increase := 1] # dummy for increase
        }
        if(x.curr == 1 & x.last == 0){
          fullDat[id_d == ids[i] & year == k, zeroToOne := 1] # dummy for ids with an increase in x from 0 to 1
        }
      } else {
        next
      }
    }
  }
}

Thanks in advance.

Answer 1

I think you can do each of these in a single data.table line:

# Assumes rows are sorted by year within each id (e.g. via setkey); compares adjacent rows
fullDat[, increase := c(FALSE, x[-1] > x[-.N]), by = id_d]
fullDat[, zerotoone := c(FALSE, x[-1] == 1 & x[-.N] == 0), by = id_d]
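
The one-liners above compare each row with the row before it, so they implicitly treat adjacent rows as consecutive years. Since the question notes that not every id is observed in every year, the following is a minimal sketch of how all four columns from the question could be built with data.table's shift() while ignoring comparisons across a gap. The toy table dt and the helper columns x_last and gap are made up for illustration. Unlike the loop in the question, this sketch writes 0 instead of leaving NA in the dummy columns, and changeAbsolute is 0 (not NA) when x did not change.

```r
library(data.table)

# Hypothetical toy data: id "b" has no observation for 2005
dt <- data.table(
  id_d = c("a", "a", "a", "b", "b", "b"),
  year = c(2003, 2004, 2005, 2003, 2004, 2006),
  x    = c(0, 1, 1, 2, 1, 3)
)
setkey(dt, id_d, year)  # sort by id and year so shift() sees the previous observation

# Value of x in the previous row of the same id, and the distance in years to that row
dt[, x_last := shift(x), by = id_d]
dt[, gap := year - shift(year), by = id_d]

# If the previous row is not exactly one year earlier, no change can be determined
dt[!is.na(gap) & gap != 1, x_last := NA]

# Derive the columns in one vectorised pass each; no loop over ids or years
dt[, changeAbsolute := x - x_last]
dt[, changeDummy := as.integer(!is.na(x_last) & x != x_last)]
dt[, increase    := as.integer(!is.na(x_last) & x > x_last)]
dt[, zeroToOne   := as.integer(!is.na(x_last) & x == 1 & x_last == 0)]

dt[, gap := NULL]  # drop the helper column
print(dt)
```

Here the 2006 row of id "b" gets no change flag, because 2005 is missing and the previous observation is from 2004.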

Some data:

library(data.table)
fullDat = data.table(
  id_d = rep(letters[1:3], each = 10),
  year = rep(1:10, 3),
  x = sample(10, 30, replace = TRUE)
  )
setkey(fullDat, 'id_d', 'year')
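
As a quick sanity check on this sample data, the grouped one-liner for increase can be compared against an explicit row-by-row loop. This is only a sketch: the seed and the reference vector ref are assumptions added for the check and are not part of the original answer.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed only so the check is repeatable
fullDat <- data.table(
  id_d = rep(letters[1:3], each = 10),
  year = rep(1:10, 3),
  x    = sample(10, 30, replace = TRUE)
)
setkey(fullDat, id_d, year)

# Vectorised version, evaluated once per id.
# x[-.N] drops the last element, like x[1:(.N-1)] for groups with two or more rows.
fullDat[, increase := c(FALSE, x[-1] > x[-.N]), by = id_d]

# Reference result from an explicit loop over rows, used for checking only
ref <- logical(nrow(fullDat))
for (r in 2:nrow(fullDat)) {
  same_id <- fullDat$id_d[r] == fullDat$id_d[r - 1]
  ref[r] <- same_id && fullDat$x[r] > fullDat$x[r - 1]
}

identical(fullDat$increase, ref)
```

Because every id in this sample has all ten years, adjacent rows are always consecutive years, so both versions agree here.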
