Using dplyr functions inside a function: problems with NSE / SE

Time: 2016-04-13 17:23:17

Tags: r dplyr

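I'm working in R with the dplyr package, and I need a function for a task I repeat often: I bin the observations along an X variable, then plot the mean of a Y variable within each bin.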

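Below is a reproducible example of (A) my failed attempt at this function, followed by (B) a working example of the desired output for a single X and Y.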

library(plyr)
library(dplyr)
library(ggplot2)

df = data.frame(x = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
                y = c(1, 1, 1, 2, 2, 2, 0, 0, 0))

# (A) function that doesn't work correctly
bin_and_plot <- function(data, x, y) {
  data.binned = data %>%
    mutate_(cut = cut(x, breaks = 3)) %>%
    group_by_(cut) %>%
    summarise_(n = ~n(),
               mean = ~mean(y))
  qplot(data = data.binned, x = cut, y = mean)
}

bin_and_plot(df, ~x, ~y)


# (B) working example of desired output
df.binned = df %>%
  mutate(cut = cut(x, breaks = 3)) %>%
  group_by(cut) %>%
  summarise(n = n(),
            mean = mean(y))
qplot(data = df.binned, x = cut, y = mean)

I've read dozens of other questions about similar problems, and I've gone through these references on NSE / SE...

https://cran.r-project.org/web/packages/dplyr/vignettes/nse.html
http://adv-r.had.co.nz/Computing-on-the-language.html

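...but although it's clear to me that I have evaluation problems, I haven't been able to resolve them. At the moment it breaks at cut(). I can get around that error, but there are several more layers of problems beyond it. I haven't been able to troubleshoot them successfully, probably because I've now written several problems in at once.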

Any help is much appreciated.

2 answers:

Answer 0: (score: 0)

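I used the following code. It seems you need to remove the ~ symbols from the formulas and the trailing underscores from the verb names (mutate_, group_by_, summarise_).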

By adding x = as.character() when defining the function, you can pass the name of the column you want to use as a string.

df = data.frame(x = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
                y = c(1, 1, 1, 2, 2, 2, 0, 0, 0))

# create a new column to test the function
df$test = rnorm(9,5,2)


# revised function: x and y are column names passed as strings
bin_and_plot <- function(data, x = as.character(), y = as.character()) {
  data$x = data[,x]
  data$y = data[,y]
  data.binned = data %>%
    mutate(cut = cut(x, breaks = 3)) %>%
    group_by(cut) %>%
    summarise(n = n(),
              mean = mean(y))
  qplot(data = data.binned, x = cut, y = mean)
}

bin_and_plot(df,"x","y")

bin_and_plot(df,"test","y")
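
For readers on a more recent dplyr than the one used in this thread, the same string-based interface can probably be sketched without copying columns into data$x and data$y, by using the .data pronoun. This assumes dplyr 0.7 or later is installed. bin_means is a hypothetical helper name, not part of the answer above, and it only returns the binned summary, leaving the plotting step out.

```r
library(dplyr)

df <- data.frame(x = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
                 y = c(1, 1, 1, 2, 2, 2, 0, 0, 0))

# Hypothetical helper (not from the thread): x and y are column names
# given as strings. Assumes dplyr >= 0.7 for the .data pronoun.
bin_means <- function(data, x, y, breaks = 3) {
  data %>%
    # .data[[x]] looks up the column whose name is stored in x
    mutate(bin = cut(.data[[x]], breaks = breaks)) %>%
    group_by(bin) %>%
    summarise(n = n(), mean = mean(.data[[y]]))
}

bin_means(df, "x", "y")
```

With the example data this gives three bins of three observations each, with means 1, 2 and 0, which matches the working example (B) in the question.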

Answer 1: (score: 0)

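This is where I often start using interp from the lazyeval package. I believe there are some examples of this in the vignette you linked to. Also, group_by_ isn't needed in this particular example.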

library(lazyeval)
bin_and_plot <- function(data, x, y) {
    data.binned = data %>%
        mutate_(cut = interp(~cut(var, breaks = 3), var = as.name(x))) %>%
        group_by(cut) %>%
        summarise_(n = ~n(),
                 mean = interp(~mean(var2), var2 = as.name(y)))
    qplot(data = data.binned, x = cut, y = mean)
}

bin_and_plot(df, "x", "y")
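
To see what interp contributes in the function above, it may help to look at the formula it builds on its own, outside of mutate_. The snippet below is a small sketch that assumes lazyeval is installed; the variable names f and g are only for illustration.

```r
library(lazyeval)

# The column name arrives as a string, as in bin_and_plot(df, "x", "y")
x <- "x"

# as.name() turns the string into a symbol; interp() substitutes it for `var`
f <- interp(~cut(var, breaks = 3), var = as.name(x))
f

# Without as.name() the string itself is substituted, so cut() would be
# handed the literal "x" instead of the column named x
g <- interp(~cut(var, breaks = 3), var = x)
g
```

The first formula refers to the column x by name, which is what mutate_ can then evaluate inside the data frame. The second shows why the as.name() call is there.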