data.table is making a copy of the table in R

Asked: 2013-07-31 07:25:00

Tags: r data.table lapply

I am doing this:

myfun <- function(inputvar_vec){
# inputvar_vec is input vector
# do something
# result = output vector
return(result)
}

DT[, result := lapply(.SD, myfun), by = byvar, .SDcols = inputvar]

I get the following warning:

Warning message:
In `[.data.table`(df1, , `:=`(prop, lapply(.SD, propEventInLastK)),  :
Invalid .internal.selfref detected and fixed by taking a copy of the whole table,
so that := can add this new column by reference. At an earlier point, this
data.table has been copied by R (or been created manually using structure()
or similar). (and then some more stuff) ....

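My guess is that a copy is being made because I am stacking the result vectors (after the by operation)?

Can someone suggest a way to get rid of this warning? I have done this with the apply functions before and thought the same approach should extend here.

My other question is: can you pass a chunk of rows from the data frame (subset by using the by statement) and then call the function myfun to operate on it?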

Adding an example, as requested:

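library(data.table)
library(fts)   # provides fts() and moving.sum()

# generate data
N = 10000
default=NA
value = 1
# note: K (the window length used in testfun) is not defined in the post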
df = data.table(id = sample(1:5000, N, replace=TRUE),
                trial = sample(c(0,1,2), N, replace=TRUE),
                ts = sample(1:200, N, replace=TRUE))

#set keys
setkeyv(df, c("id", "ts"))

df[["trial"]] = as.numeric(df[["trial"]]==value)

testfun <- function(x){
  L=length(x)
  x = x[L:1]
  x = fts(data=x)
  y = rep(default, L)
  if(L>=K){
    y1 = as.numeric(moving.sum(x,K))
    y = c(y1, rep(default,L-length(y1)))
  } 
  return(y[L:1]/K)
}

df[, prop:= lapply(.SD, testfun), by = id, .SDcols = "trial"]

I still get the same warning message:

Warning message:
In `[.data.table`(df, , `:=`(prop, lapply(.SD, testfun)), by = id,  :
  Invalid .internal.selfref detected and fixed by taking a copy of the whole table, so that := can add this new column by reference. At an earlier point, this data.table has been copied by R (or been created manually using structure() or similar). Avoid key<-, names<- and attr<- which in R currently (and oddly) may copy the whole data.table. Use set* syntax instead to avoid copying: setkey(), setnames() and setattr(). Also, list(DT1,DT2) will copy the entire DT1 and DT2 (R's list() copies named objects), use reflist() instead if needed (to be implemented). If this message doesn't help, please report to datatable-help so the root cause can be fixed.

1 Answer:

Answer 0: (score: 2)

The problem is this line:

df[["trial"]] = as.numeric(df[["trial"]]==value)

This is not the data.table way of assigning a column.

The data.table way is to use :=

 df[, trial := as.numeric(trial == value)]

which should avoid the problem.
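A minimal sketch of the point above, on a small made-up table (the table DT, its values and the extra column flag are illustrative assumptions, not the asker's data). It uses data.table's address() to check that := leaves the table at the same memory address, which suggests no copy of the table was taken:

```r
library(data.table)

# Illustrative data only; not the data from the question
DT <- data.table(id = c(1, 1, 2, 2), trial = c(0, 1, 2, 1))
value <- 1

before <- address(DT)                       # memory address before the update
DT[, trial := as.numeric(trial == value)]   # update the column by reference
after <- address(DT)

same_object <- identical(before, after)     # the table itself was not copied

# Adding a further column by reference afterwards
DT[, flag := trial > 0]
```

Because the table is never replaced by a copy in this sketch, a later := that adds a column has a valid table to work on.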

To understand why a copy is made (and why the internal self-reference can therefore become invalid), see "Understanding exactly when a data.table is a reference to (vs a copy of) another data.table".
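As a rough illustration of reference versus copy (the tables DT, DT_ref and DT_copy and their columns are made-up names for this sketch): plain assignment with <- only binds a second name to the same data.table, while copy() creates an independent one.

```r
library(data.table)

DT <- data.table(a = 1:3)

DT_ref <- DT              # second name for the same table, no copy
DT_ref[, b := a * 2L]     # := through DT_ref also changes DT

DT_copy <- copy(DT)       # explicit, independent copy
DT_copy[, c := 0L]        # does not touch DT

names(DT)                 # DT gained b, but not c
```

This is only meant to show when two names share one table; it does not reproduce the warning from the question.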

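It is important to realize that there is no data.table method for [[<-, so [[<-.data.frame is called. That copies the whole object and does none of the careful things a data.table method (for example [<-.data.table) does, such as returning a valid data.table.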
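On the second question in the post (passing a chunk of rows per group to a function): inside j, a column name refers to that group's values only, and .SD is a data.table holding that group's rows, so a function can be called on either. A hedged sketch, where running_total is a hypothetical stand-in for myfun/testfun and the data are made up:

```r
library(data.table)

# Illustrative data only
DT <- data.table(id = c(1, 1, 1, 2, 2), trial = c(1, 0, 1, 1, 1))

# Hypothetical stand-in for myfun: takes a vector, returns a vector of the same length
running_total <- function(x) cumsum(x)

# One column per group: the function gets the group's values, no lapply needed
DT[, prop := running_total(trial), by = id]

# Whole chunk of rows per group: .SD is a data.table with that group's rows
group_summary <- DT[, .(n_rows = nrow(.SD), total = sum(.SD$trial)), by = id]
```

For a single input column, calling the function on the column directly is usually simpler than lapply(.SD, ...) with .SDcols, though both forms assign by reference.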