Locking the contents of a data.table

Question

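Is it possible to make a data.table static (i.e. non-updatable)? The lockBinding() function prevents the variable from being reassigned, but the columns of the data.table can still be edited. For example: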

> dt = data.table( x = 1:5 )
> lockBinding( "dt", env = environment() )
> dt = 1
Error: cannot change value of locked binding for 'dt'
> dt[ , x := 1 ]
> dt[ , x ]
[1] 1 1 1 1 1

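I guess this has to do with the way data.tables are passed by reference, but being able to lock the contents of a data.table as well would be useful. (I often have shared reference tables that I don't want to update by accident.)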

Answer 1

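This is a bit tricky. One way is to hijack the `[` function so that `:=` cannot be used on the object. If we want to bind a data.table, we can add a class to it, like so: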

boundDT <- function(dt){
  class(dt) <- c("bound.data.table", class(dt))
  dt
}

Result:

library(data.table)
dt = data.table( x = 1:5 )
bound <- boundDT(dt)
class(bound)
[1] "bound.data.table" "data.table"       "data.frame"

If we then create a new indexing function for the bound.data.table class, we can do our thing:

`[.bound.data.table` <- function(dt, ...){
  if(any(unlist(sapply(match.call()[-(1:2)], function(x) if(length(x) > 1)as.character(x[1]) == ":=")))){
    stop("Can't use `:=` on this object.")
  }
  class(dt) <- class(dt)[-1]
  dt[...]
}

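This checks whether `:=` is used in the call and throws an error if it is. Otherwise, it strips the bound class from an internal copy of the data.table and calls the regular `[` function.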

bound[, x := 1]
 Error in `[.bound.data.table`(bound, , `:=`(x, 1)) : 
  Can't use `:=` on this object. 
bound[, x]
[1] 1 2 3 4 5

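It's ugly, but it seems to work.

One caveat:

When `:=` is used in a join, the check does not help if the bound table is not the base table. In that case `[` is dispatched on the other table, so the guard never runs, and the update still shows up in `bound` (presumably because `bound` shares its column vectors with `dt`):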

dt = data.table( x = 1:5 , y = 5:1)
bound <- boundDT(dt)
dt[bound, y := 1, on = .(x = x)]
bound
   x y
1: 1 1
2: 2 1
3: 3 1
4: 4 1
5: 5 1

However:

bound[dt, y := 1, on = .(x = x)]
 Error in `[.bound.data.table`(bound, dt, `:=`(y, 1), on = .(x = x)) : 
  Can't use `:=` on this object.

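Preventing the use of `set*`

With most of the `:=` problems out of the way, we can focus on preventing `set*` functions from being used on the object.

Whenever the bound data.table is used, we can inspect the call stack before handing over the data.table to see whether any `set*` function appears in it.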

bindDT <- function(dt){
  bound <- boundDT(dt)
  function(){
    calls <- sys.calls()
    forbidden <- c("set", "set2key", "set2keyv", "setattr", "setcolorder", "setdiff", "setDT", 
                   "setDTthreads", "setequal", "setindex", "setindexv", "setkey", "setkeyv", 
                   "setnames", "setNumericRounding", "setorder", "setorderv")
    matches <- unlist(lapply(calls, function(x) as.character(x)[1] %in% forbidden))
    if(any(matches)){
      stop(paste0("Can't use function ", calls[[which(matches)[1]]][1], " on bound data.table."))
    }

    bound
  }
}

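This function binds the data.table as before, but instead of returning it, it returns a function. When that function is called, it checks the call stack for `set*` functions and throws an error if it finds one. I took this list from the data.table help pages, so it should be complete.

You can use an active binding, via pryr, to avoid having to call the data.table as a function every time you use it: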

library(data.table)
library(pryr)

dt = data.table( x = 1:5 , y = 5:1)
bound %<a-% (bindDT(dt))()

setkey(bound, x)
Error in (bindDT(dt))() : Can't use function setkey on bound data.table.
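
The pryr step can probably also be done in base R with `makeActiveBinding()`. The following is only a simplified sketch of the same idea, not the code from this answer: `dt0`, `guard`, and the shortened `forbidden` list are illustrative names, and the bound class from `boundDT()` is left out to keep it short.

```r
library(data.table)

dt0 <- data.table(x = 1:5, y = 5:1)

# Hypothetical accessor: inspects the call stack on every read of `bound`.
guard <- function() {
  # Shortened list for illustration; the answer above uses a longer one.
  forbidden <- c("set", "setkey", "setkeyv", "setnames", "setcolorder", "setattr")
  # First element of each call on the stack, as a string.
  fns <- unlist(lapply(sys.calls(), function(cl) as.character(cl)[1]))
  hit <- fns[fns %in% forbidden]
  if (length(hit) > 0) {
    stop("Can't use function ", hit[1], " on bound data.table.")
  }
  dt0
}

# `bound` now runs guard() each time it is read.
makeActiveBinding("bound", guard, environment())

res <- try(setkey(bound, x), silent = TRUE)  # expected to be refused by guard()
bound[, x]                                   # plain reads still go through
```

Note that the check compares the literal function name in each call, so a namespace-qualified call such as `data.table::setkey(bound, x)` would most likely slip past it, since its first element deparses to `data.table::setkey`.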

Answer 2

You can put the data.table inside a function:

library("data.table")
dt = function() data.table( x = 1:5 )
dt() = 1 ### error
dt()[ , x := 1 ]
dt()[ , x ]
# > dt()[ , x ]
# [1] 1 2 3 4 5

Here is a variant that does not regenerate the data.table every time:

library("data.table")
dt0 = data.table( x = 1:5 )
dt <- function() copy(dt0)
dt() = 1
dt()[ , x := 1 ]
dt()[ , x ]

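I'm not happy with my solution: the data.table is copied every time you want to use it, and every time you want the unchanged data.table you have to go through a call to the function dt().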
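
The copy-on-access idea can also be wrapped in a closure so that the original table is not reachable from the global environment. This is a minimal sketch under that assumption; `make_readonly`, `snapshot`, `ref`, and `tmp` are hypothetical names that do not appear in the answer.

```r
library(data.table)

# Hypothetical helper: keeps a private snapshot and hands out copies only.
make_readonly <- function(dt) {
  snapshot <- copy(dt)        # detached from the caller's table
  function() copy(snapshot)   # every call returns a fresh copy
}

ref <- make_readonly(data.table(x = 1:5))

tmp <- ref()
tmp[, x := 1L]   # modifies only the copy held in tmp

ref()[, x]       # the snapshot is untouched
```

The cost is the same as in the variant above: one full copy per access, which may matter for large reference tables.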

Answer 3

attr(dt, ".data.table.locked") = TRUE

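will lock most data.table operations. Some operations (setnames, for one) can still sneak past it, though. Note, however, that this is an undocumented internal feature of data.table that exists for other purposes. As a result, the error messages you get when you try to change a locked table will look odd and confusing, and there is no guarantee about how it will behave in later versions of the package.

If you decide to go this route, you should also use lockBinding to guard against base R style operations.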
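
Combining the two might look like the sketch below. It relies on the undocumented attribute described above, so treat it as an assumption about current data.table behaviour rather than a supported approach; `ref`, `by_ref`, and `reassign` are illustrative names.

```r
library(data.table)

ref <- data.table(x = 1:5)

# Undocumented internal flag; behaviour may change between versions.
attr(ref, ".data.table.locked") <- TRUE

# Guard the name itself against base R reassignment.
lockBinding("ref", environment())

# Both kinds of update are attempted inside try() so the script keeps running.
by_ref   <- try(ref[, x := 1L], silent = TRUE)  # update by reference
reassign <- try(ref <- 1, silent = TRUE)        # plain reassignment

ref$x
```

To make changes again later, `unlockBinding("ref", environment())` releases the binding; the attribute has to be removed separately.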
