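I have a panel dataset and want to assign some dummy variables for each id that flag a change or increase in a variable (x). I wrote a function that does exactly what I want. Unfortunately it is very slow, because the data is very large and the function has to loop over thousands of ids.
I would like to know whether there is a more efficient way to produce these results, ideally one that avoids the loops.
It is worth mentioning that not every id is observed over the whole time span.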
AddDummies <- function(ids, startYear = 2003, endYear = 2013) {
  # For each id, check year by year whether x changed or increased
  # relative to the previous year. Modifies the global data.table fullDat by reference.
  for (i in seq_along(ids)) {
    cat("Progress: ", round(i / length(ids), 4) * 100, "% \n")
    for (k in startYear:endYear) {
      x.curr <- fullDat[id_d == ids[i] & year == k, x]      # x in year k
      x.last <- fullDat[id_d == ids[i] & year == k - 1, x]  # x in year k-1
      if (length(x.curr) == 0 || length(x.last) == 0) {
        # id not observed in the current or the previous year:
        # the change cannot be determined, so skip to the next year
        next
      } else if (x.curr != x.last) {  # x has changed
        fullDat[id_d == ids[i] & year == k, changeDummy := 1]                   # dummy for change
        fullDat[id_d == ids[i] & year == k, changeAbsolute := x.curr - x.last]  # absolute change
        if (x.curr > x.last) {
          fullDat[id_d == ids[i] & year == k, increase := 1]   # dummy for increase
        }
        if (x.curr == 1 && x.last == 0) {
          fullDat[id_d == ids[i] & year == k, zeroToOne := 1]  # dummy for an increase in x from 0 to 1
        }
      }
    }
  }
}
Thanks in advance.
Answer 0 (score: 1)
I think you can do each of these in a single line of data.table:
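# Assumes rows are sorted by year within each id (see setkey below).
# The first row of each id has no previous value, so it is set to FALSE.
fullDat[, increase := c(FALSE, x[-1] > x[1:(.N - 1)]), by = id_d]
fullDat[, zeroToOne := c(FALSE, x[-1] == 1 & x[1:(.N - 1)] == 0), by = id_d]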
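To see what the indexing in these lines does, here is a small illustration on a plain vector. The values are made up and stand in for the x values of a single id, sorted by year; `n` plays the role of `.N`.

```r
x <- c(0, 1, 1, 3)   # hypothetical x values for one id, sorted by year
n <- length(x)       # stands in for .N inside data.table

curr <- x[-1]          # values from the second row on: 1 1 3
prev <- x[1:(n - 1)]   # values up to the second-to-last row: 0 1 1

# Compare each value with the one before it; the first row gets FALSE
increase <- c(FALSE, curr > prev)
increase
# [1] FALSE  TRUE FALSE  TRUE
```

Note that this compares each row with the previous row, not with the previous calendar year, so a gap in the years for an id is not detected.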
Some sample data:
library(data.table)
fullDat <- data.table(
  id_d = rep(letters[1:3], each = 10),
  year = rep(1:10, 3),
  x    = sample(10, 30, replace = TRUE)
)
setkey(fullDat, 'id_d', 'year')  # sorts by id, then year
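Since the question mentions that not every id is observed in every year, the row-to-row comparison may compare years that are not adjacent. Below is a minimal sketch of one possible way to handle that using data.table's `shift()`, which is a swapped-in technique and not part of the answer above. The toy data and the helper columns `x.last`, `year.last` and `consecutive` are made up for illustration, and the sketch assumes x itself has no missing values. Unlike the original function, it writes 0 rather than NA where there is no change, and it does not restrict the years to startYear:endYear.

```r
library(data.table)

# Toy panel (made up for illustration); id "b" is not observed in 2005
fullDat <- data.table(
  id_d = c("a", "a", "a", "b", "b", "b"),
  year = c(2003L, 2004L, 2005L, 2003L, 2004L, 2006L),
  x    = c(0, 1, 1, 2, 1, 5)
)
setkey(fullDat, id_d, year)  # sort by id, then year

# Previous observed x and year within each id (NA for the first row of an id)
fullDat[, `:=`(x.last = shift(x), year.last = shift(year)), by = id_d]

# A change is only defined if the previous row is exactly one year earlier
fullDat[, consecutive := !is.na(year.last) & year - year.last == 1L]

fullDat[, `:=`(
  changeDummy    = as.integer(consecutive & x != x.last),
  changeAbsolute = ifelse(consecutive & x != x.last, x - x.last, NA_real_),
  increase       = as.integer(consecutive & x > x.last),
  zeroToOne      = as.integer(consecutive & x == 1 & x.last == 0)
)]

# Drop the helper columns again
fullDat[, c("x.last", "year.last", "consecutive") := NULL]
```

In this toy data the 2006 row of id "b" gets no change flag even though x went from 1 to 5, because 2005 is missing, which matches how the loop in the question skips years without a preceding observation.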