I am working on a custom function whose goal is to run a function (`..f`) for all combinations of the grouping variables (`grouping.vars`) in a given data frame, and then tidy the results into a data frame using the `broom` package.
Here is the custom function I wrote. Note that `...` is passed on to `..f`, while additional arguments for the `broom::tidy` method are supplied via the `tidy.args` list.
# setup
set.seed(123)
library(tidyverse)
options(pillar.sigfig = 8)

# custom function
grouped_tidy <- function(data,
                         grouping.vars,
                         ..f,
                         ...,
                         tidy.args = list()) {
  # check how many variables were entered for grouping variable vector
  grouping.vars <-
    as.list(rlang::quo_squash(rlang::enquo(grouping.vars)))
  grouping.vars <-
    if (length(grouping.vars) == 1) {
      grouping.vars
    } else {
      grouping.vars[-1]
    }

  # quote all arguments to `..f`
  dots <- rlang::enquos(...)

  # running the grouped analysis
  df_results <- data %>%
    dplyr::group_by(.data = ., !!!grouping.vars, .drop = TRUE) %>%
    dplyr::group_map(
      .tbl = .,
      .f = ~ broom::tidy(
        x = rlang::exec(.fn = ..f, !!!dots, data = .x),
        unlist(tidy.args)
      )
    )

  # return the final dataframe with results
  return(df_results)
}
As the examples below show, although this function works, I doubt that the `tidy.args` list is being evaluated properly: whichever `conf.level` I choose, I always get the same results to four decimal places.
# using the function to get 95% CI
grouped_tidy(
  data = ggplot2::diamonds,
  grouping.vars = c(cut),
  ..f = stats::lm,
  formula = price ~ carat - 1,
  tidy.args = list(conf.int = TRUE, conf.level = 0.95)
)
#> # A tibble: 5 x 8
#> # Groups: cut [5]
#> cut term estimate std.error statistic p.value conf.low conf.high
#> <ord> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Fair carat 4510.7919 42.614474 105.85117 0 4427.2062 4594.3776
#> 2 Good carat 5260.8494 27.036670 194.58200 0 5207.8454 5313.8534
#> 3 Very Good carat 5672.5054 18.675939 303.73334 0 5635.8976 5709.1132
#> 4 Premium carat 5807.1392 16.836474 344.91422 0 5774.1374 5840.1410
#> 5 Ideal carat 5819.4837 15.178657 383.39911 0 5789.7324 5849.2350
# using the function to get 99% CI
grouped_tidy(
  data = ggplot2::diamonds,
  grouping.vars = c(cut),
  ..f = stats::lm,
  formula = price ~ carat - 1,
  tidy.args = list(conf.int = TRUE, conf.level = 0.99)
)
#> # A tibble: 5 x 8
#> # Groups: cut [5]
#> cut term estimate std.error statistic p.value conf.low conf.high
#> <ord> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Fair carat 4510.7919 42.614474 105.85117 0 4427.2062 4594.3776
#> 2 Good carat 5260.8494 27.036670 194.58200 0 5207.8454 5313.8534
#> 3 Very Good carat 5672.5054 18.675939 303.73334 0 5635.8976 5709.1132
#> 4 Premium carat 5807.1392 16.836474 344.91422 0 5774.1374 5840.1410
#> 5 Ideal carat 5819.4837 15.178657 383.39911 0 5789.7324 5849.2350
Any ideas on how I can change the function so that `broom::tidy` evaluates the list of arguments correctly?
Answer 0 (score: 2)
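The likely culprit in the question's code is `unlist(tidy.args)`: it collapses the list into a single named vector, which is then passed as one positional argument rather than as separate named arguments, so `conf.level` presumably never reaches the tidier and its default stays in effect. A minimal base R sketch of the difference follows. `show_args()` is a hypothetical stand-in, not part of broom; its argument order and defaults are an assumption chosen to resemble a typical tidier.

```r
# Hypothetical stand-in for a tidier with two optional arguments
# (an assumption for illustration, not the broom implementation).
show_args <- function(x, conf.int = FALSE, conf.level = 0.95) {
  list(conf.int = conf.int, conf.level = conf.level)
}

tidy.args <- list(conf.int = TRUE, conf.level = 0.99)

# unlist() yields ONE named numeric vector; it is matched positionally
# to `conf.int`, so `conf.level` keeps its default.
bad <- show_args("model", unlist(tidy.args))
bad$conf.level        # 0.95 (the default)
length(bad$conf.int)  # 2 (the whole vector landed in conf.int)

# do.call() spreads the list elements as separate named arguments.
good <- do.call(show_args, c(list(x = "model"), tidy.args))
good$conf.level       # 0.99
```

Splicing the list, whether with `do.call()` or with `rlang::exec()` and `!!!` as in the rewrite below, passes each element as its own named argument.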
set.seed(123)
library(tidyverse)
options(pillar.sigfig = 8)

grouped_tidy <- function(data,
                         grouping.vars,
                         ..f,
                         ...,
                         tidy.args = list()) {
  # functions passed to group_map must accept
  # .x and .y arguments, where .x is the data
  tidy_group <- function(.x, .y) {
    # presumes ..f won't explode if called with these args
    model <- ..f(..., data = .x)

    # mild variation on do.call to call function with
    # list of arguments
    rlang::exec(broom::tidy, model, !!!tidy.args)
  }

  data %>%
    group_by(!!!grouping.vars, .drop = TRUE) %>%
    group_map(tidy_group) %>%
    ungroup() # don't get bitten by groups downstream
}

grouped_tidy(
  data = ggplot2::diamonds,
  # wrap grouping columns in vars() like in scoped dplyr verbs
  grouping.vars = vars(cut),
  ..f = stats::lm,
  formula = price ~ carat - 1,
  tidy.args = list(conf.int = TRUE, conf.level = 0.95)
)
#> # A tibble: 5 x 8
#> cut term estimate std.error statistic p.value conf.low conf.high
#> <ord> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Fair carat 4510.7919 42.614474 105.85117 0 4427.2062 4594.3776
#> 2 Good carat 5260.8494 27.036670 194.58200 0 5207.8454 5313.8534
#> 3 Very Good carat 5672.5054 18.675939 303.73334 0 5635.8976 5709.1132
#> 4 Premium carat 5807.1392 16.836474 344.91422 0 5774.1374 5840.1410
#> 5 Ideal carat 5819.4837 15.178657 383.39911 0 5789.7324 5849.2350
Created on 2019-02-23 by the reprex package (v0.2.1)
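One caveat for readers on a newer dplyr: the code above was run in February 2019. From dplyr 0.8.1 onward, `group_map()` returns a list and `group_modify()` is the variant that returns a grouped data frame, so the pipeline above would likely need that swap. A sketch under that assumption, where `grouped_tidy2` is a hypothetical name used only to avoid overwriting the function above:

```r
library(dplyr)

# Assumes dplyr >= 0.8.1 plus the broom, rlang and ggplot2 packages.
# `grouped_tidy2` is a hypothetical name for this variant.
grouped_tidy2 <- function(data,
                          grouping.vars,
                          ..f,
                          ...,
                          tidy.args = list()) {
  tidy_group <- function(.x, .y) {
    # fit the model on the rows of one group
    model <- ..f(..., data = .x)
    # splice the list so each element becomes a named argument
    rlang::exec(broom::tidy, model, !!!tidy.args)
  }

  data %>%
    group_by(!!!grouping.vars, .drop = TRUE) %>%
    group_modify(tidy_group) %>% # returns a grouped data frame
    ungroup()
}

# 99% intervals, with the grouping column wrapped in vars() as above
res_99 <- grouped_tidy2(
  data = ggplot2::diamonds,
  grouping.vars = vars(cut),
  ..f = stats::lm,
  formula = price ~ carat - 1,
  tidy.args = list(conf.int = TRUE, conf.level = 0.99)
)
```

With the arguments spliced, the 99% intervals in `res_99` should come out wider than the 95% ones shown above.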