Question

I am looking for an output that includes all the cases where my (large) data.table has missing values between available observations.

library(data.table)
DT <- data.table(country=c(rep("DE",10),rep("AT",10)), time=rep(2001:2010,2), value=rnorm(20))
DT[country=="DE" & time %in% c(2001,2005,2006), value := NA]
DT[country=="AT" & time %in% c(2003,2008,2009,2010), value := NA]

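I want to write a function that lets me create a data.table containing only 2005 and 2006 for DE and 2003 for AT, i.e. only the NAs that lie between observed values. Building on this, I'm almost there; for a single country it looks like this: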

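test <- DT[country=="DE"]
# first and last year with a non-missing value
range <- range(test[!is.na(value),time])
sequence <- seq(range[1],range[2])
# years inside that span that have no observed value
sequence[!sequence %in% test[!is.na(value),time]]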

Now I'd like to turn this into a function and apply it per country using data.table's by option. Here is my non-working example:

#function to find datagaps (NA's) in a data.table (you still have to apply by group):
#x is the name of the column which specifies your frequency (such as year or date)
#y is the name of the column which has the NA's you're looking for
#data is a data.table

findgaps <- function(x,y,data){
range <- range(data[!is.na(y),x])
sequence <- seq(range[1],range[2]) 
return(sequence[!sequence %in% data[!is.na(y),x]])
}
DT[findgaps(time,year,DT),.(country,time,value),by=country)]

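My best guess is that the function doesn't return anything data.table can use for subsetting in the filter, right? Should the function's output somehow be a logical vector like F, F, F, T, F, F, F, ...? Any help would be appreciated.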

Edit: the desired output could look like this:

output <- data.table(country=c("DE","DE","AT"), time=c(2005,2006,2003), value=c(NA,NA,NA))

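In the end I want to do something with these rows, such as interpolating them. So any method of specifically addressing these rows in DT would be fine with me.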

Answer 1

Perhaps something like this:

DT[, { r = rleid(is.na(value))
       idx = r > r[1] & r < tail(r, 1) & is.na(value)
       .(time = time[idx], value = NA)
     }
   , by = country]
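
To spell out why this works: rleid() numbers consecutive runs of identical values, so r labels each stretch of NA / non-NA within a country, and a missing value is an interior gap only if its run is neither the first nor the last one. Below is a self-contained sketch of that idea on the question's example data. The seed, the gap and value_filled column names, and the use of value[idx] instead of a literal NA (so both result columns always have the same length, even for a country with no interior gaps) are my own assumptions. The approx() step at the end is just one possible way to interpolate, not something the code above prescribes.

```r
library(data.table)

set.seed(1)  # arbitrary seed, only so the example is reproducible
DT <- data.table(country = rep(c("DE", "AT"), each = 10),
                 time    = rep(2001:2010, 2),
                 value   = rnorm(20))
DT[country == "DE" & time %in% c(2001, 2005, 2006), value := NA]
DT[country == "AT" & time %in% c(2003, 2008, 2009, 2010), value := NA]

# Extract only the NAs that lie between observed values, per country
gaps <- DT[, {
  r   <- rleid(is.na(value))                       # run id of each NA / non-NA stretch
  idx <- r > r[1] & r < tail(r, 1) & is.na(value)  # NA, but not in the first or last run
  .(time = time[idx], value = value[idx])
}, by = country]
gaps  # three rows: DE 2005, DE 2006, AT 2003 (value is NA in each)

# Alternative: flag the gap rows inside DT itself (hypothetical column name `gap`)
DT[, gap := {
  r <- rleid(is.na(value))
  r > r[1] & r < tail(r, 1) & is.na(value)
}, by = country]

# One possible follow-up: linear interpolation per country with stats::approx().
# With its default rule = 1, years outside the observed range stay NA,
# so leading and trailing NAs are left untouched.
DT[, value_filled := approx(time, value, xout = time)$y, by = country]
DT[gap == TRUE]
```

The flagged version may be more convenient if you want to keep working on DT directly, since DT[gap == TRUE] addresses exactly the rows you asked about.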

R: Address data gaps in (yearly) data.table
