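I'm working in R with the dplyr package and need a function for a recurring task: I bin observations along an X variable and then plot the mean of a Y variable within each bin.
Below is a reproducible example of (A) my failed attempt at this function, followed by (B) a working example of the desired output for a single X and Y.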
library(plyr)
library(dplyr)
library(ggplot2)
df = data.frame(x = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
                y = c(1, 1, 1, 2, 2, 2, 0, 0, 0))
# (A) function that doesn't work correctly
bin_and_plot <- function(data, x, y) {
  data.binned = data %>%
    mutate_(cut = cut(x, breaks = 3)) %>%
    group_by_(cut) %>%
    summarise_(n = ~n(),
               mean = ~mean(y))
  qplot(data = data.binned, x = cut, y = mean)
}
bin_and_plot(df, ~x, ~y)
# (B) working example of desired output
df.binned = df %>%
  mutate(cut = cut(x, breaks = 3)) %>%
  group_by(cut) %>%
  summarise(n = n(),
            mean = mean(y))
qplot(data = df.binned, x = cut, y = mean)
I have read dozens of similar questions and gone through these references on NSE / SE...
https://cran.r-project.org/web/packages/dplyr/vignettes/nse.html
http://adv-r.had.co.nz/Computing-on-the-language.html
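...but although it is clear to me that I have an evaluation problem, I still can't resolve it. At the moment the function breaks at cut(). I can get around that error, but there are several more layers of problems beyond it. I haven't been able to troubleshoot them successfully, probably because I have by now written several concurrent problems into the function.
Any help is greatly appreciated.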
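Answer 0 (score: 0)
I use the following code. It seems you need to remove the ~ symbols from the formulas and the trailing underscores from the verb names (mutate_, group_by_, summarise_).
The function takes the names of the columns to use as strings; the x = as.character() and y = as.character() defaults in the definition only signal that a character value is expected. Inside the function, the chosen columns are copied into x and y so that the ordinary dplyr verbs can refer to them by those fixed names.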
df = data.frame(x = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
                y = c(1, 1, 1, 2, 2, 2, 0, 0, 0))
# create a new column to test the function
df$test = rnorm(9,5,2)
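# revised function: x and y are column names passed as strings
bin_and_plot <- function(data, x = as.character(), y = as.character()) {
  # extract both columns before overwriting anything, so that a call such as
  # bin_and_plot(df, "test", "x") still reads the original x column;
  # [[ ]] returns a plain vector for data frames and tibbles alike
  x_values = data[[x]]
  y_values = data[[y]]
  data$x = x_values
  data$y = y_values
  data.binned = data %>%
    mutate(cut = cut(x, breaks = 3)) %>%
    group_by(cut) %>%
    summarise(n = n(),
              mean = mean(y))
  qplot(data = data.binned, x = cut, y = mean)
}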
bin_and_plot(df,"x","y")
bin_and_plot(df,"test","y")
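The same string-based interface can also be written without copying columns. The following is only a sketch, assuming dplyr 0.7 or later, where the .data pronoun is available inside the verbs; bin_means, x_col, y_col and the output column name bin are hypothetical names chosen for this illustration, not part of the answer above.

```r
library(dplyr)

# Hypothetical helper: bin the column named by `x_col` into `breaks`
# equal-width intervals and average the column named by `y_col` per bin.
bin_means <- function(data, x_col, y_col, breaks = 3) {
  data %>%
    # .data[[name]] looks the column up by its string name
    mutate(bin = cut(.data[[x_col]], breaks = breaks)) %>%
    group_by(bin) %>%
    summarise(n = n(), mean = mean(.data[[y_col]]))
}

df <- data.frame(x = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
                 y = c(1, 1, 1, 2, 2, 2, 0, 0, 0))
binned <- bin_means(df, "x", "y")
```

The result has one row per bin and can be passed to the same qplot() call as before, with x = bin in place of x = cut. Because the input data frame is not modified, a column that happens to be called x or y is never overwritten.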
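Answer 1 (score: 0)
This is where I usually reach for interp from the lazyeval package. I believe there are some examples of this in the vignette you linked to. Also, group_by_ is not needed in this particular example, because the grouping column is always called cut.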
library(lazyeval)
bin_and_plot <- function(data, x, y) {
  data.binned = data %>%
    mutate_(cut = interp(~cut(var, breaks = 3), var = as.name(x))) %>%
    group_by(cut) %>%
    summarise_(n = ~n(),
               mean = interp(~mean(var2), var2 = as.name(y)))
  qplot(data = data.binned, x = cut, y = mean)
}
bin_and_plot(df, "x", "y")
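The underscored verbs and interp have since been deprecated in dplyr in favour of tidy evaluation. As a rough sketch of how the same function might look there, assuming a dplyr release that supports the {{ }} (embrace) operator (dplyr 1.0 or later, with rlang 0.4 or later): bin_and_plot_tidy is a hypothetical name, the columns are passed unquoted instead of as strings, and ggplot() stands in for qplot().

```r
library(dplyr)
library(ggplot2)

# Hypothetical tidy-eval variant: `x` and `y` are bare column names.
bin_and_plot_tidy <- function(data, x, y, breaks = 3) {
  data_binned <- data %>%
    # {{ x }} forwards the caller's column into the dplyr verb
    mutate(bin = cut({{ x }}, breaks = breaks)) %>%
    group_by(bin) %>%
    summarise(n = n(), mean = mean({{ y }}))
  # one point per bin, at the bin's mean of y
  ggplot(data_binned, aes(x = bin, y = mean)) +
    geom_point()
}

df <- data.frame(x = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
                 y = c(1, 1, 1, 2, 2, 2, 0, 0, 0))
p <- bin_and_plot_tidy(df, x, y)
```

Printing p draws the plot. If the column names have to arrive as strings, as in the interp version above, the .data[[name]] form is the closer equivalent.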