Avoiding wrapping function arguments in "quote()" when using data.table inside a function

Time: 2014-05-25 16:46:25

Tags: r data.table

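I have a function, create.summary, that, when passed a column name, summarizes the values of that column by year and month. Note the use of eval() in the data.table's j expression.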

create.summary <- function(full.panel.df, outcome.name){
    df.apps <- data.table(full.panel.df)[, list(
                                        Y = mean(eval(outcome.name)),
                                        se = sd(eval(outcome.name))/sqrt(.N)
                                        ),
                                by = list(month, year, trt)]
    return(df.apps)
}

For this to work, I have to call the function with a quoted column name, like so:  create.summary(df, quote(hourly_earnings))

But this is a pain and will confuse my users. I would rather let users call the function with the column name as a string: create.summary(df, "hourly_earnings")

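I'm guessing that some combination of deparse, eval, substitute, and so on could make this work, but I can't figure it out; so far I have just been trying things more or less at random.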

3 answers:

Answer 0: (score: 3)

Try using get instead of eval:

create.summary <- function(full.panel.df, outcome.name){
    df.apps <- data.table(full.panel.df)[, list(
                                        Y = mean(get(outcome.name)),
                                        se = sd(get(outcome.name))/sqrt(.N)
                                        ),
                                by = list(month, year, trt)]
    return(df.apps)
}

Here is a reproducible example:

library(data.table)

foo <- function(x, n) {
  data.table(x)[, list(Y=mean(get(n)),
                       se=sd(get(n))/sqrt(.N)),
                by=list(cyl, am)]
}

foo(mtcars, "wt")
#    cyl am        Y         se
# 1:   6  1 2.755000 0.07399324
# 2:   4  1 2.042250 0.14472656
# 3:   6  0 3.388750 0.05810820
# 4:   8  0 4.104083 0.22179111
# 5:   4  0 2.935000 0.23528352
# 6:   8  1 3.370000 0.20000000
foo(mtcars, "hp")
#    cyl am         Y        se
# 1:   6  1 131.66667 21.666667
# 2:   4  1  81.87500  8.009899
# 3:   6  0 115.25000  4.589390
# 4:   8  0 194.16667  9.630156
# 5:   4  0  84.66667 11.348030
# 6:   8  1 299.50000 35.500000
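
One caveat with get() is that, when the string does not name a column, the lookup continues into the enclosing environments and may silently pick up an unrelated object. A minimal sketch of a guard, assuming a hypothetical wrapper named foo_checked (not part of the original answer) that validates the name before using it:

```r
library(data.table)

# Hypothetical variant of foo() that checks the column name before get().
foo_checked <- function(x, n) {
  dt <- as.data.table(x)
  # Fail early unless n is a single string naming a column of x.
  stopifnot(is.character(n), length(n) == 1L, n %in% names(dt))
  dt[, list(Y  = mean(get(n)),
            se = sd(get(n)) / sqrt(.N)),
     by = list(cyl, am)]
}

res <- foo_checked(mtcars, "wt")                        # same groups as foo(mtcars, "wt")
bad <- try(foo_checked(mtcars, "nope"), silent = TRUE)  # stops with an error
```

The check is deliberately strict; whether to error or fall back to something else is a design choice left to the caller.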

Answer 1: (score: 1)

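For my own sake (and hopefully that of others), I have lined up my answer and the answers from @GSee and @BodieG according to the calling style each one supports. At least I found the comparison useful.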

For: create.summary(df, hourly_earnings)

  • Change eval to evalq, which is definitely the simplest change in this case.

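    From the help file:

    The evalq form is equivalent to eval(quote(expr), ...). eval evaluates its first argument in the current scope before passing it to the evaluator: evalq avoids this.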

Your function becomes:

create.summary <- function(full.panel.df, outcome.name){
    df.apps <- data.table(full.panel.df)[, list(
                                        Y = mean(evalq(outcome.name)),
                                        se = sd(evalq(outcome.name))/sqrt(.N)
                                        ),
                                by = list(month, year, trt)]
    return(df.apps)
}
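
The difference described in the help file can be seen in base R alone. A small sketch, where e is an illustrative name that does not appear in the original answer:

```r
# Base R only; `e` is a made-up name used for illustration.
e <- quote(1 + 2)   # e holds an unevaluated call

r1 <- eval(e)    # e is evaluated first (yielding the call), then the call is evaluated
r2 <- evalq(e)   # only the symbol e is evaluated, so the call itself comes back
```

Here r1 is the number 3, while r2 is still the unevaluated call 1 + 2.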
  • Use substitute() and get().

Your function becomes:

create.summary <- function(full.panel.df, outcome.name){
  out.name.quoted <- as.character(substitute(outcome.name))
  df.apps <- data.table(full.panel.df)[, list(
    Y = mean(get(out.name.quoted)),
    se = sd(get(out.name.quoted))/sqrt(.N)
    ),
    by = list(month, year, trt)
  ]
  df.apps
}
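
The key step in this version is as.character(substitute(outcome.name)), which turns the unevaluated argument into a string. A small base R sketch (arg_name and wrapper are hypothetical helpers, used only for illustration) suggests that this form accepts a bare name as well as a string, but captures the wrong name when the call is forwarded through another function:

```r
# Hypothetical helper: return the argument as written by the caller, as a string.
arg_name <- function(x) as.character(substitute(x))

bare   <- arg_name(hourly_earnings)    # bare column name
quoted <- arg_name("hourly_earnings")  # string literal

# Caveat: substitute() sees the expression at the call site, not its value.
wrapper   <- function(col) arg_name(col)
forwarded <- wrapper("hourly_earnings")  # captures "col", not "hourly_earnings"
```

The forwarding caveat is worth keeping in mind if create.summary is itself called from other functions.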

For: create.summary(df, "hourly_earnings")

  • get() searches for an object with that name; it is safer than parse(text=).

Your function becomes:

create.summary <- function(full.panel.df, outcome.name){
    df.apps <- data.table(full.panel.df)[, list(
                                        Y = mean(get(outcome.name)),
                                        se = sd(get(outcome.name))/sqrt(.N)
                                        ),
                                by = list(month, year, trt)]
    return(df.apps)
}
  • parse(text=), which is very useful for synthesizing expressions or reading them from a file.

Your function becomes:

create.summary <- function(full.panel.df, outcome.name){
    df.apps <- data.table(full.panel.df)[, list(
                                        Y = mean(eval(parse(text=outcome.name))),
                                        se = sd(eval(parse(text=outcome.name)))/sqrt(.N)
                                        ),
                                by = list(month, year, trt)]
    return(df.apps)
}
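
The safety difference between get() and parse(text=) can be illustrated without data.table. In this base R sketch (the variable x and its value are made up for the example), get() only looks up an object by name, whereas eval(parse(text=)) runs whatever expression the string contains:

```r
x <- 10  # illustrative value

by_get   <- get("x")                     # looks up the object named "x"
by_parse <- eval(parse(text = "x"))      # same result for a plain name

# parse(text=) accepts any expression, so the string is executed as code.
as_code  <- eval(parse(text = "x * 2"))

# get() treats the whole string as a name and fails, since no such object exists.
failed   <- try(get("x * 2"), silent = TRUE)
```

With user-supplied strings, the second behavior means arbitrary code can be executed, which is the usual reason to prefer get() when only a column name is expected.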

Answer 2: (score: 1)

Another approach, using substitute and get:

create.summary <- function(full.panel.df, outcome.name){
  out.name.quoted <- as.character(substitute(outcome.name))
  df.apps <- data.table(full.panel.df)[, list(
    Y = mean(get(out.name.quoted)),
    se = sd(get(out.name.quoted))/sqrt(.N)
    ),
    by = list(month, year, trt)
  ]
  df.apps
}

Usage:

create.summary(df, a)

Some data:

df <- data.frame(month=month.abb, year=rep(2000:2005, each=24), trt=c("one", "two"), a=runif(6 * 12), b=runif(6 * 12))