Using data.table inside a function

Time: 2018-02-09 05:56:53

Tags: r data.table

I find the R package data.table very useful at the interactive console, but things get trickier when I use it inside a function. This works:

library(data.table)
flights <- fread("https://github.com/arunsrinivasan/flights/wiki/NYCflights14/flights14.csv")

flights[origin == "JFK" & month == 6L,
        .(m_arr = mean(arr_delay), m_dep = mean(dep_delay))]

But this fails, because x and y hold the column names as strings rather than the columns themselves:

x="arr_delay" # x and y are passed from arguments of a function
y="dep_delay"
flights[origin == "JFK" & month == 6L,
        .(m_arr = mean(x), m_dep = mean(y))]

Is there a workaround?

1 Answer:

Answer 0 (score: 3)

One option is to name the columns in .SDcols and then take the mean of each column in .SD

setnames(flights[origin == "JFK" & month == 6L,
    lapply(.SD, mean), .SDcols = c(x, y)], c('m_arr', 'm_dep'))[]
#     m_arr    m_dep
#1: 5.839349 9.807884

This can be wrapped in a function

f1 <- function(dat, col1, col2) {
  setnames(dat[origin == "JFK" & month == 6L,
               lapply(.SD, mean), .SDcols = c(col1, col2)],
           c('m_arr', 'm_dep'))[]
}

f1(flights, x, y)
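To check the wrapper without downloading the flights file, here is a minimal sketch on a small made-up table. The toy data, its values, and the name mean_cols are assumptions for illustration and do not come from the original answer.

```r
library(data.table)

# Made-up stand-in for the flights table (values are illustrative only)
toy <- data.table(
  origin    = c("JFK", "JFK", "LGA", "JFK"),
  month     = c(6L, 6L, 6L, 7L),
  arr_delay = c(10, 20, 5, 100),
  dep_delay = c(4, 8, 1, 50)
)

# Hypothetical wrapper: same idea as f1, column names arrive as strings
mean_cols <- function(dat, col1, col2) {
  # .SD holds only the columns listed in .SDcols, for the rows kept by the filter
  res <- dat[origin == "JFK" & month == 6L,
             lapply(.SD, mean), .SDcols = c(col1, col2)]
  setnames(res, c("m_arr", "m_dep"))  # res is a new table, so dat keeps its names
  res
}

out <- mean_cols(toy, "arr_delay", "dep_delay")
print(out)
```

Only the two JFK rows from June survive the filter, so the means are taken over (10, 20) and (4, 8).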

If we don't want to do that, get is an option for looking up the column values by name

flights[origin == "JFK" & month == 6L,
    .(m_arr = mean(get(x)), m_dep = mean(get(y)))]
#     m_arr    m_dep
#1: 5.839349 9.807884
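To see why the plain mean(x) in the question fails while get(x) works, here is a small sketch on made-up data (the toy table is an assumption, not part of the original thread).

```r
library(data.table)

# Made-up two-row table (illustrative values only)
toy <- data.table(arr_delay = c(10, 20), dep_delay = c(4, 8))
x <- "arr_delay"

# mean(x) receives the string "arr_delay" itself, not the column,
# so it returns NA and warns that the argument is not numeric
bad <- suppressWarnings(toy[, .(m_arr = mean(x))])

# get(x) looks up the object whose name is stored in x; inside j that is the column
good <- toy[, .(m_arr = mean(get(x)))]

print(bad)
print(good)
```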

Another option is eval(as.name(...)), which turns the string into a symbol and evaluates it as a column

f2 <- function(dat, col1, col2) {

  dat[origin == "JFK" & month == 6L,
    .(m_arr = mean(eval(as.name(col1))), m_dep = mean(eval(as.name(col2))))]

}
f2(flights, x, y)
#     m_arr    m_dep
#1: 5.839349 9.807884
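As a rough illustration of what as.name contributes here, the sketch below uses made-up data (an assumption, not from the original): as.name converts the string into a symbol, and evaluating that symbol inside j yields the column.

```r
library(data.table)

# Made-up two-row table (illustrative values only)
toy <- data.table(arr_delay = c(10, 20), dep_delay = c(4, 8))
col <- "dep_delay"

# as.name() turns the string into a symbol, as if dep_delay had been typed unquoted
sym <- as.name(col)
print(is.name(sym))

# Same pattern as f2: evaluate the symbol inside j to reach the column
res <- toy[, .(m_dep = mean(eval(as.name(col))))]
print(res)
```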

An option using the tidyverse would be

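library(dplyr)  # provides %>%, filter, summarise_at and rename

f3 <- function(dat, col1, col2) {
  dat %>%
    filter(origin == "JFK", month == 6L) %>%
    # summarise_at accepts a character vector of column names
    summarise_at(c(col1, col2), mean) %>%
    rename(m_arr := !! rlang::sym(col1),
           m_dep := !! rlang::sym(col2))
}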

f3(flights, x, y)
#     m_arr    m_dep
#1 5.839349 9.807884