data.table and checkUsage do not get along

Date: 2013-04-23 14:38:50

Tags: r data.table

data.table is a wonderful package which, alas, triggers unwarranted warnings from checkUsage (the code comes from here and here):

> library(data.table)
> library(codetools)   # provides checkUsage()
> library(compiler)
> compiler::enableJIT(3)
> dt <- data.table(a = c(rep(3, 5), rep(4, 5)), b=1:10, c=11:20, d=21:30, key="a")
> my.func <- function (dt) {
  dt.out <- dt[, lapply(.SD, sum), by = a]
  dt.out[, count := dt[, .N, by=a]$N]
  dt.out
}
> checkUsage(my.func)
<anonymous>: no visible binding for global variable ‘.SD’ (:2)
<anonymous>: no visible binding for global variable ‘a’ (:2)
<anonymous>: no visible binding for global variable ‘count’ (:3)
<anonymous>: no visible binding for global variable ‘.N’ (:3)
<anonymous>: no visible binding for global variable ‘a’ (:3)
> my.func(dt)
Note: no visible binding for global variable '.SD' 
Note: no visible binding for global variable 'a' 
Note: no visible binding for global variable 'count' 
Note: no visible binding for global variable '.N' 
Note: no visible binding for global variable 'a' 
   a  b  c   d count
1: 3 15 65 115     5
2: 4 40 90 140     5

Replacing by=a with by="a" avoids the warnings about a, but how do I deal with the other 3 warnings?

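This matters to me because these warnings clutter the screen and obscure legitimate warnings. Since the warnings are issued when my.func is called (with the JIT compiler enabled), and not only by checkUsage, I am inclined to call this a bug.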

2 answers:

Answer 0: (score: 4)

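Update: This is now resolved in v1.8.11. From NEWS:

    .SD, .N, .I, .GRP and .BY are now exported (as NULL). So that NOTEs aren't produced for them by R CMD check or codetools::checkUsage via compiler::enableJIT(). utils::globalVariables() was considered, but exporting chosen. Thanks to Sam Steingold for raising, #2723.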

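To resolve the notes for the column-name symbols count and a, both can be wrapped in quotes (even on the LHS of :=). In a fresh R session (the notes are only emitted the first time), the following now produces no notes.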

$ R
R version 3.0.1 (2013-05-16) -- "Good Sport"
Copyright (C) 2013 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
> require(data.table)
Loading required package: data.table
data.table 1.8.11  For help type: help("data.table")
> library(codetools)
> library(compiler)
> compiler::enableJIT(3)
[1] 0
> dt <- data.table(a=c(rep(3,5),rep(4,5)), b=1:10, c=11:20, d=21:30, key="a")
> my.func <- function (dt) {
  dt.out <- dt[, lapply(.SD, sum), by = "a"]
  dt.out[, "count" := dt[, .N, by="a"]$N]
  dt.out
}
> my.func(dt)
   a  b  c   d count
1: 3 15 65 115     5
2: 4 40 90 140     5
> checkUsage(my.func)
> 
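
As an alternative to quoting the column names, the utils::globalVariables() route mentioned in the NEWS item can be sketched as follows. This is only a minimal illustration, not part of the original answer: the function is intended for package code, where it tells R CMD check to treat the listed names as defined globals, and the names "count" and "a" are simply the column names from the question.

```r
# Minimal sketch, assuming the code lives in a package's R/ directory.
# globalVariables() has been available in utils since R 2.15.1.
if (getRversion() >= "2.15.1") {
  # Register the data.table column names used unquoted inside `[`,
  # so the check tools regard them as defined globally.
  utils::globalVariables(c("count", "a"))
}

# Calling it without arguments returns the names registered so far.
registered <- utils::globalVariables()
print(registered)
```

The registration is a plain list of names, so it cannot tell a genuine typo from a column called "a" or "count"; quoting the names, as shown above, keeps the check more precise.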

Answer 1: (score: 2)

For now, the only way appears to be

my.func <- function (dt) {
  .SD <- .N <- count <- a <- NULL  # avoid inappropriate warnings
  dt.out <- dt[, lapply(.SD, sum), by = a]
  dt.out[, count := dt[, .N, by=a]$N]
  dt.out
}

That is, locally bind the variables that are reported as unbound global variables.

Thanks to @GSee for the link.
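
Why the local binding helps can be seen without data.table at all. The sketch below is only an illustration: it uses base R's subset() as a stand-in for data.table's `[`, because both evaluate column names inside the data rather than in the calling function, and the function names flagged and quiet are hypothetical. codetools::findGlobals() lists the symbols that the static analysis treats as globals.

```r
library(codetools)

# `x` is a column of df, but static analysis cannot know that,
# so it classifies `x` as a global variable.
flagged <- function(df) {
  subset(df, x > 1)
}

# Same function with a dummy local binding. subset() still finds the
# column `x` in df first, so the result is unchanged.
quiet <- function(df) {
  x <- NULL  # avoid the inappropriate note
  subset(df, x > 1)
}

globals_flagged <- findGlobals(flagged, merge = FALSE)$variables
globals_quiet   <- findGlobals(quiet, merge = FALSE)$variables

print(globals_flagged)  # contains "x"
print(globals_quiet)    # "x" is now a local, so it is not listed

result <- quiet(data.frame(x = 1:3))
print(result)
```

The dummy assignment changes only what the code analysis sees. The cost is that a misspelled column name bound this way would no longer be reported either.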