How to change wrapper functions that convert the output of dplyr mutate_all etc. to data.table, which raise an error

Date: 2019-06-04 11:10:55

Tags: r dplyr data.table

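I am new to R and struggle to find solutions that avoid dplyr, even though I know data.table generally performs better. To avoid having to convert every data.frame back to a data.table after using mutate_all etc., I found a workaround in another post (note: I do not have enough reputation points to comment and ask there directly). However, it raises the following error:

"evaluation nested too deeply: infinite recursion / options(expressions=)?"

I think this happens because I pass functions inside mutate_all, mutate_at, etc., and I do not know enough about functions to change the wrapper myself. Any ideas on how to adapt the wrapper function?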

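I do several such transformations with mutate_all and mutate_at, each with different funs. We usually import the data from Excel or CSV.

The wrapper functions I use come from the solution to a similar question by user BenjaminWolfe. I think the main difference is that they do not use funs in mutate_if: mutate_if, summarize_at etc coerce data.table to data.frame

This seemed like a nice workaround, since I only need to include these wrapper functions in my code.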

Below are example data and one of the code blocks:

First, the approach without wrapper functions

library(dplyr)
library(data.table)


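# Example of data to clean without wrapper functions
# rep_len() recycles the three number strings to 10 rows explicitly; recent
# data.table versions refuse to recycle 3 values into 10 rows on their own
DT = data.table(date=as.character(c(43131:43140)),numbers=rep_len(c("1000000000","1000000001","1000000002"), 10))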

# Define the columns which contain dates or numbers in the data
DateNumberColumns <- c("date", "numbers")
DateColumns <- c("date")


# Change data types where they should be numbers and dates
DT <- DT %>%
  mutate_at(vars(DateNumberColumns),
            funs(as.numeric)) %>%
  mutate_at(vars(DateColumns),
            # due to an error in Excel's dates, the origin that gives the correct dates is as below
            funs(as.Date(., origin = "1899-12-30")))
is.data.table(DT)

And here is the approach with the wrapper function

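# Example of data to clean with wrapper functions
# take out # to clean environment:
# rm(list=ls())
# data table example
# rep_len() recycles the three number strings to 10 rows explicitly; recent
# data.table versions refuse to recycle 3 values into 10 rows on their own
DT = data.table(date=as.character(c(43131:43140)),numbers=rep_len(c("1000000000","1000000001","1000000002"), 10))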

# Define the columns which contain dates or numbers in the data
DateNumberColumns <- c("date", "numbers")
DateColumns <- c("date")

# wrapper function from https://stackoverflow.com/questions/56145140/mutate-if-summarize-at-etc-coerce-data-table-to-data-frame
mutate_at <- function(.tbl, ...) {
  if ("data.table" %in% class(.tbl)) {
    .tbl %>% mutate_at(...) %>% as.data.table()
  } else {
    .tbl %>% mutate_at(...)
  }
}

DT <- DT %>%
  mutate_at(vars(DateNumberColumns),
            funs(as.numeric)) %>%
  mutate_at(vars(DateColumns),
            # due to an error in Excel's dates, the origin that gives the correct dates is as below
            funs(as.Date(., origin = "1899-12-30")))
is.data.table(DT)

I expect the output to be:

          date numbers
 1: 2018-01-31   1e+09
 2: 2018-02-01   1e+09
 3: 2018-02-02   1e+09
 4: 2018-02-03   1e+09
 5: 2018-02-04   1e+09
 6: 2018-02-05   1e+09
 7: 2018-02-06   1e+09
 8: 2018-02-07   1e+09
 9: 2018-02-08   1e+09
10: 2018-02-09   1e+09

is.data.table(DT)
[1] TRUE

But the actual output is:

> DT <- DT %>%
+   mutate_at(vars(DateNumberColumns),
+             funs(as.numeric)) %>%
+   mutate_at(vars(DateColumns),
+             # due to an error in Excel's dates, the origin that gives the correct dates is as below
+             funs(as.Date(., origin = "1899-12-30")))
Error: evaluation nested too deeply: infinite recursion / options(expressions=)?
> DT
     date    numbers
 1: 43131 1000000000
 2: 43132 1000000001
 3: 43133 1000000002
 4: 43134 1000000000
 5: 43135 1000000001
 6: 43136 1000000002
 7: 43137 1000000000
 8: 43138 1000000001
 9: 43139 1000000002
10: 43140 1000000000
> is.data.table(DT)
[1] TRUE

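1 answer:

Answer 0 (score: 0)

If you are interested in alternatives, I recently released the table.express package, which can help in cases like this when you want to use dplyr-like syntax. Your example can be done like this: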

library(data.table)
library(table.express)

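# rep_len() recycles the three number strings to 10 rows explicitly; recent
# data.table versions refuse to recycle 3 values into 10 rows on their own
DT = data.table(date=as.character(c(43131:43140)),numbers=rep_len(c("1000000000","1000000001","1000000002"), 10))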

# Define the columns which contain dates or numbers in the data
DateNumberColumns <- c("date", "numbers")
DateColumns <- c("date")

DT <- DT %>%
  start_expr %>%
  mutate_sd(as.numeric, .SDcols = DateNumberColumns) %>%
  mutate_sd(as.Date, origin = "1899-12-30", .SDcols = DateColumns) %>%
  end_expr %T>%
  print
          date numbers
 1: 2018-01-31   1e+09
 2: 2018-02-01   1e+09
 3: 2018-02-02   1e+09
 4: 2018-02-03   1e+09
 5: 2018-02-04   1e+09
 6: 2018-02-05   1e+09
 7: 2018-02-06   1e+09
 8: 2018-02-07   1e+09
 9: 2018-02-08   1e+09
10: 2018-02-09   1e+09
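
As for the error itself, a likely cause is that the wrapper in the question is itself named mutate_at, so the mutate_at call inside its body resolves to the wrapper again rather than to dplyr's function, and it keeps calling itself until R gives up. The sketch below is one possible way to adapt it, not a tested drop-in: it assumes you want to keep the wrapper name and only qualifies the inner call with dplyr::. The column selection and the lambda are my own simplification of the funs() calls in the question.

```r
library(dplyr)
library(data.table)

# Assumption: the wrapper keeps the name mutate_at, as in the question.
# Qualifying the inner call with dplyr:: sends it to dplyr's function
# instead of back into this wrapper, which avoids the infinite recursion.
mutate_at <- function(.tbl, ...) {
  out <- dplyr::mutate_at(.tbl, ...)
  # Convert back only when the input was a data.table
  if (is.data.table(.tbl)) as.data.table(out) else out
}

# Same example data as in the question, recycled explicitly to 10 rows
DT <- data.table(
  date    = as.character(43131:43140),
  numbers = rep_len(c("1000000000", "1000000001", "1000000002"), 10)
)

DT <- DT %>%
  # Character columns to numeric
  mutate_at(c("date", "numbers"), as.numeric) %>%
  # Excel serial numbers to dates, using the origin from the question
  mutate_at("date", ~ as.Date(.x, origin = "1899-12-30"))

is.data.table(DT)
```

The same pattern should carry over to the other scoped verbs (mutate_all, mutate_if, summarise_at): define one wrapper per verb and call the dplyr:: version inside it. Note that recent dplyr versions mark these scoped verbs as superseded in favour of across(), so this is mainly useful for existing code.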