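I'm new to R and am trying to move away from dplyr, since I understand data.table usually performs better. I want to avoid having to convert every data.frame back to a data.table after using mutate_all and similar functions, and I found a workaround in another post (note: I don't have enough reputation to comment and ask there directly). However, it throws the following error:
"evaluation nested too deeply: infinite recursion / options(expressions=)?"
I think this happens because I pass functions via funs() inside mutate_all, mutate_at, etc., and I don't know enough about functions to try changing the wrapper myself. Any ideas on how to adapt the wrapper function?
I have several transformations like this using mutate_all and mutate_at with different funs. We normally import the data from Excel or CSV.
The wrapper function I use comes from the solution to a similar question, posted by user BenjaminWolfe. I think the main difference is that they don't use funs inside mutate_if: mutate_if, summarize_at etc coerce data.table to data.frame
It looked like a good workaround, because I would only need to include these wrapper functions in my code.
Below are a data example and one of the code chunks.
First, the version without the wrapper function: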
library(dplyr)
library(data.table)
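# Example of data to clean without wrapper functions
# rep_len() recycles the three values to 10 rows explicitly (recent data.table versions reject uneven recycling)
DT <- data.table(date = as.character(43131:43140), numbers = rep_len(c("1000000000", "1000000001", "1000000002"), 10))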
# Define the columns which contain dates or numbers in the data
DateNumberColumns <- c("date", "numbers")
DateColumns <- c("date")
# Change data types where they should be numbers and dates
DT <- DT %>%
  mutate_at(vars(DateNumberColumns),
            funs(as.numeric)) %>%
  mutate_at(vars(DateColumns),
            # due to an error in Excel's dates, the origin that gives the correct dates is as below
            funs(as.Date(., origin = "1899-12-30")))
is.data.table(DT)
And here is the version with the wrapper function:
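# Example of data to clean with wrapper functions
# uncomment the next line to clear the environment:
# rm(list=ls())
# data table example
# rep_len() recycles the three values to 10 rows explicitly (recent data.table versions reject uneven recycling)
DT <- data.table(date = as.character(43131:43140), numbers = rep_len(c("1000000000", "1000000001", "1000000002"), 10))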
# Define the columns which contain dates or numbers in the data
DateNumberColumns <- c("date", "numbers")
DateColumns <- c("date")
# wrapper function from https://stackoverflow.com/questions/56145140/mutate-if-summarize-at-etc-coerce-data-table-to-data-frame
mutate_at <- function(.tbl, ...) {
  if ("data.table" %in% class(.tbl)) {
    .tbl %>% mutate_at(...) %>% as.data.table()
  } else {
    .tbl %>% mutate_at(...)
  }
}
DT <- DT %>%
  mutate_at(vars(DateNumberColumns),
            funs(as.numeric)) %>%
  mutate_at(vars(DateColumns),
            # due to an error in Excel's dates, the origin that gives the correct dates is as below
            funs(as.Date(., origin = "1899-12-30")))
is.data.table(DT)
I expected the output to be:
          date numbers
 1: 2018-01-31   1e+09
 2: 2018-02-01   1e+09
 3: 2018-02-02   1e+09
 4: 2018-02-03   1e+09
 5: 2018-02-04   1e+09
 6: 2018-02-05   1e+09
 7: 2018-02-06   1e+09
 8: 2018-02-07   1e+09
 9: 2018-02-08   1e+09
10: 2018-02-09   1e+09
is.data.table(DT)
[1] TRUE
But the actual output is:
> DT <- DT %>%
+ mutate_at(vars(DateNumberColumns),
+ funs(as.numeric)) %>%
+ mutate_at(vars(DateColumns),
+ # due to an error in Excel's dates, the origin that gives the correct dates is as below
+ funs(as.Date(., origin = "1899-12-30")))
Error: evaluation nested too deeply: infinite recursion / options(expressions=)?
> DT
     date    numbers
 1: 43131 1000000000
 2: 43132 1000000001
 3: 43133 1000000002
 4: 43134 1000000000
 5: 43135 1000000001
 6: 43136 1000000002
 7: 43137 1000000000
 8: 43138 1000000001
 9: 43139 1000000002
10: 43140 1000000000
> is.data.table(DT)
[1] TRUE
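A likely cause of the recursion error: the wrapper is itself named mutate_at, so the mutate_at call inside it resolves to the wrapper rather than to dplyr's function, and the wrapper keeps calling itself until R hits the nesting limit. The use of funs() is probably unrelated. Below is a minimal sketch of an adapted wrapper, assuming the only change needed is to qualify the inner call as dplyr::mutate_at. Two further changes are assumptions made for this sketch and are not part of the original wrapper: rep_len() is used so the example data builds on recent data.table versions, and plain functions are passed instead of funs().

```r
library(dplyr)
library(data.table)

# Same data as in the question; rep_len() makes the recycling explicit
DT <- data.table(
  date    = as.character(43131:43140),
  numbers = rep_len(c("1000000000", "1000000001", "1000000002"), 10)
)

# The wrapper masks dplyr's mutate_at in the global environment, so the
# inner call has to name the dplyr version explicitly. Without `dplyr::`
# the wrapper would call itself again and again.
mutate_at <- function(.tbl, ...) {
  out <- dplyr::mutate_at(.tbl, ...)
  # Convert back only when the input was a data.table
  if (is.data.table(.tbl)) as.data.table(out) else out
}

DT <- DT %>%
  mutate_at(c("date", "numbers"), as.numeric) %>%
  # Excel serial dates: origin "1899-12-30" gives the correct calendar dates
  mutate_at("date", as.Date, origin = "1899-12-30")

is.data.table(DT)
```

The same pattern should carry over to wrappers for mutate_all, mutate_if and the summarise variants: each one needs the dplyr:: prefix on its inner call.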
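Answer 0 (score: 0)
If you are interested in alternatives, I recently released the table.express package, which can help in cases like this when you want dplyr-like syntax.
Your example could be written like this: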
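library(data.table)
library(table.express)
library(magrittr) # provides the %T>% tee pipe used below
# rep_len() recycles the three values to 10 rows explicitly (recent data.table versions reject uneven recycling)
DT <- data.table(date = as.character(43131:43140), numbers = rep_len(c("1000000000", "1000000001", "1000000002"), 10))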
# Define the columns which contain dates or numbers in the data
DateNumberColumns <- c("date", "numbers")
DateColumns <- c("date")
DT <- DT %>%
  start_expr %>%
  mutate_sd(as.numeric, .SDcols = DateNumberColumns) %>%
  mutate_sd(as.Date, origin = "1899-12-30", .SDcols = DateColumns) %>%
  end_expr %T>%
  print
          date numbers
 1: 2018-01-31   1e+09
 2: 2018-02-01   1e+09
 3: 2018-02-02   1e+09
 4: 2018-02-03   1e+09
 5: 2018-02-04   1e+09
 6: 2018-02-05   1e+09
 7: 2018-02-06   1e+09
 8: 2018-02-07   1e+09
 9: 2018-02-08   1e+09
10: 2018-02-09   1e+09
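For comparison, the same conversion can also be written in plain data.table syntax, which never leaves the data.table class and so needs no wrapper. This is a minimal sketch, assuming the column names from the question and using rep_len() (an addition here) so the example data builds on recent data.table versions:

```r
library(data.table)

DT <- data.table(
  date    = as.character(43131:43140),
  numbers = rep_len(c("1000000000", "1000000001", "1000000002"), 10)
)

DateNumberColumns <- c("date", "numbers")
DateColumns <- c("date")

# Update by reference: (cols) := lapply(.SD, f), with .SDcols picking the columns
DT[, (DateNumberColumns) := lapply(.SD, as.numeric), .SDcols = DateNumberColumns]

# Extra arguments after the function are forwarded by lapply() to as.Date()
DT[, (DateColumns) := lapply(.SD, as.Date, origin = "1899-12-30"), .SDcols = DateColumns]

is.data.table(DT)
```

Because := modifies DT in place, there is no need to reassign the result to DT.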