Question

I'm trying to add some error handling to my R code.

Pseudocode below:

foo = function(X,Y) {
...

return(ret.df);
}

DT = DT[,ret.df := foo(X,Y), by=key(DT)];

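The goal is to check whether foo raises an error for certain combinations of X and Y. If it does, I want to skip that combination of records in the final result data frame. I tried the following, without much luck: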

    DT = DT[ ,  try(ret.df = : foo(X,y)); 
    if(not (class(ref.df) %in% "try-error') ) {
        return(ret.df);
    }, by = key(DT) ];

I could always write a wrapper around foo to do the error checking, but I'm looking for a way to write this directly inside the data.table call. Is that possible?

Thanks in advance for your help!

Answer 1

Here is a dummy function and some data:

library(data.table)

foo = function(X,Y) {
    if (any(Y==2)) stop("Y contains 2!")
    X*Y
}
DT = data.table(a=1:3, b=1:6)
DT
   a b
1: 1 1
2: 2 2
3: 3 3
4: 1 4
5: 2 5
6: 3 6

Step by step:

> DT[, c := foo(a,b), by=a ]
Error in foo(a, b) : Y contains 2!

OK, that's by construction: foo is written to fail for that group. Good.

As an aside, notice that column c was added, despite the error:

> DT
   a b  c
1: 1 1  1
2: 2 2 NA
3: 3 3 NA
4: 1 4  4
5: 2 5 NA
6: 3 6 NA

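Only the first, successful group was filled in; it stopped at the second group. This is by design. At some point in the future we could add transactions to data.table internally, as in SQL, so that any changes are rolled back if an error occurs. Anyway, it's just something to be aware of.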
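Until something like that exists, one way to get all-or-nothing behaviour by hand is to run the grouped update on a copy and keep the copy only if every group succeeded. This is a minimal sketch, assuming you can afford one full copy of the table; `update_all_or_nothing` is a hypothetical helper name, and `foo` is the dummy function from above.

```r
library(data.table)

foo = function(X, Y) {
    if (any(Y == 2)) stop("Y contains 2!")
    X * Y
}

# Hypothetical helper: apply the grouped := update to a deep copy of dt and
# return that copy only if no group raised an error; otherwise return dt as-is.
update_all_or_nothing = function(dt) {
    tmp = copy(dt)  # copy() so the partial update never touches dt itself
    res = try(tmp[, c := foo(a, b), by = a], silent = TRUE)
    if (inherits(res, "try-error")) dt else tmp
}

DT = data.table(a = rep(1:3, 2), b = 1:6)
DT = update_all_or_nothing(DT)
"c" %in% names(DT)   # [1] FALSE: group a == 2 failed, so DT is unchanged
```

The cost of this design is the extra copy, which may matter for large tables; the benefit is that a failing group can never leave a half-filled column behind.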

To handle the error, you can wrap the j expression in {}.

First attempt:

> DT[, c := {
    if (inherits(try(ans<-foo(a,b)),"try-error"))
        NA
    else
        ans
}, by=a]
Error in foo(a, b) : Y contains 2!
Error in `[.data.table`(DT, , `:=`(c, { : 
  Type of RHS ('logical') must match LHS ('integer'). To check and coerce would
  impact performance too much for the fastest cases. Either change the type of
  the target column, or coerce the RHS of := yourself (e.g. by using 1L instead
  of 1)
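The two types named in that message can be checked directly in base R. A bare `NA` is a logical constant, while the successful groups return `X*Y`, which is integer when both inputs are integer; R provides a typed missing-value constant for each atomic type.

```r
# A bare NA is logical; each atomic type has its own typed NA constant.
typeof(NA)             # "logical"
typeof(NA_integer_)    # "integer"
typeof(NA_real_)       # "double"
typeof(NA_character_)  # "character"

# integer * integer stays integer, which is why column c ends up integer.
typeof(1:3 * 4:6)      # "integer"
```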

The error tells us what to do. Let's coerce the RHS from logical to integer, matching the LHS, by using NA_integer_ instead of NA.

> DT[, c:= {
    if (inherits(try(ans<-foo(a,b)),"try-error"))
        NA_integer_
    else
        ans
}, by=a]
Error in foo(a, b) : Y contains 2!

Better: the long error has gone. But why is foo's error still shown? Let's look at DT, just to check.

> DT
   a b  c
1: 1 1  1
2: 2 2 NA
3: 3 3  9
4: 1 4  4
5: 2 5 NA
6: 3 6 18

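Oh, it has already worked. Group 3 ran, and the values 9 and 18 appear in rows 3 and 6. The message was only printed by try() as it caught the error. Looking at ?try reveals the silent argument.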
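Outside data.table, the effect of silent = TRUE can be seen directly: try() returns the value on success, and an object of class "try-error" on failure. silent = TRUE only suppresses printing; the error message is still kept on the returned object.

```r
foo = function(X, Y) {
    if (any(Y == 2)) stop("Y contains 2!")
    X * Y
}

# On success, try() simply returns the value.
ok = try(foo(1L, 3L), silent = TRUE)
print(ok)                                        # [1] 3

# On failure, it returns a "try-error" object instead of stopping,
# and the original condition is attached as an attribute.
bad = try(foo(1L, 2L), silent = TRUE)
print(inherits(bad, "try-error"))                # [1] TRUE
print(conditionMessage(attr(bad, "condition")))  # [1] "Y contains 2!"
```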

> DT[, c:= {
    if (inherits(try(ans<-foo(a,b),silent=TRUE),"try-error"))
        NA_integer_
    else
        ans
}, by=a]
> # no errors
> DT
   a b  c
1: 1 1  1
2: 2 2 NA
3: 3 3  9
4: 1 4  4
5: 2 5 NA
6: 3 6 18
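If, as the question asks, you would rather skip the failing groups entirely than fill them with NA, one option is to build a new table instead of adding a column by reference, and return NULL from j on error: data.table omits groups whose j evaluates to NULL. This sketch swaps try() for base R's tryCatch(), which lets you say directly what to return on error; it assumes the same foo and data as above.

```r
library(data.table)

foo = function(X, Y) {
    if (any(Y == 2)) stop("Y contains 2!")
    X * Y
}
DT = data.table(a = rep(1:3, 2), b = 1:6)

# j returns list(c = ...) on success and NULL on error;
# the NULL group (a == 2) is dropped from the result.
res = DT[, tryCatch(list(c = foo(a, b)), error = function(e) NULL), by = a]
res
```

Here res has only the rows for a == 1 and a == 3. Note that the first non-NULL group still determines the column types, so the successful groups must return consistent types.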

Answer 2

There is a function in plyr that you may find useful: failwith(). It wraps up exactly what Matt did, but in a concise and reusable way.

library(data.table)
library(plyr)

foo = function(X,Y) {
    if (any(Y==2)) stop("Y contains 2!")
    X*Y
}
DT = data.table(a=1:3, b=1:6)
DT

DT[, c := failwith(NA_integer_, foo)(a,b), by=a ]

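failwith takes two main arguments: the value to return on error, and the function f to modify. It returns a new version of f that returns the default value instead of throwing an error.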

The definition of failwith is very simple:

failwith <- function(default = NULL, f, quiet = FALSE) {
  f <- match.fun(f)
  function(...) {
    try_default(f(...), default, quiet = quiet)
  }
}
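If you would rather not load plyr just for this, the same idea can be sketched in base R with tryCatch(). fail_with is a hypothetical name, chosen to avoid clashing with plyr's failwith; unlike plyr's version, this sketch has no quiet argument and never prints the error.

```r
# Hypothetical base-R analogue of plyr's failwith(): wrap f so that any
# error makes the wrapper return `default` instead of stopping.
fail_with = function(default = NULL, f) {
    f = match.fun(f)
    function(...) tryCatch(f(...), error = function(e) default)
}

foo = function(X, Y) {
    if (any(Y == 2)) stop("Y contains 2!")
    X * Y
}

safe_foo = fail_with(NA_integer_, foo)
safe_foo(1L, 4L)   # [1] 4
safe_foo(2L, 2L)   # [1] NA
```

In the data.table call it is used the same way: DT[, c := fail_with(NA_integer_, foo)(a, b), by = a].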

data.table and error handling using try statements
