我正在编写一个函数来计算需要使用dplyr和tidyr进行NSE评估的计数表的优势比。很明显,这是我第一次进入NSE世界。
例如,使用数据框' foo':
# A tibble: 4 x 3
strata group select
<chr> <chr> <chr>
1 Manager A_Group Chosen
2 Worker A_Group Chosen
3 Manager B_Group Not_Chosen
4 Worker B_Group Chosen
5 ...
我首先要做的是:foo2 <- foo %>% count(strata, group, select)
# A tibble: 8 x 4
strata group select n
<chr> <chr> <chr> <int>
1 Manager A_Group Chosen 1
2 Manager A_Group Not_Chosen 9
3 Manager B_Group Chosen 1
4 Manager B_Group Not_Chosen 3
5 ...
接下来,我使用tidyr的unite和spread进行广泛格式化,并根据组的值命名新列并选择列:
foo2 %>% unite(cat, c(group, select)) %>%
spread(cat, n, fill = 0)
# A tibble: 2 x 5
strata A_Group_Chosen A_Group_Not_Chosen B_Group_Chosen B_Group_Not_Chosen
* <chr> <dbl> <dbl> <dbl> <dbl>
1 Manager 1 9 1 3
2 Worker 1 11 1 3
最后,我计算了一个新列,或者作为
... %>% mutate(OR = (A_Group_Chosen * B_Group_Not_Chosen) /
(A_Group_Not_Chosen * B_Group_Chosen))
要将此代码放在函数中,我使用enquo和!!处理原始列,但是为了计算新列,OR,我需要新创建的列(通过组的值的串联命名并选择列)。问题是如何取消引用&#39; OR计算的名称?
我的当前草稿在unite / spread之后保存中间结果,将名称放入向量中,并使用$`!!&#39;()运算符。这感觉非常糟糕。一个更好的方法?
我的功能:
OR_tab <- function(dat, strat, grp, decision ){
strat <- enquo(strat)
grp <- enquo(grp)
decision <- enquo(decision)
tab <- dat %>% count(!!strat, !!grp, !!decision) %>% unite(cat, c(!!grp, !!decision)) %>%
spread(cat, n, fill = 0)
nm <- names(tab)[2:5]
tab %>% mutate(OR = (tab$`!!`(nm[1]) * tab$`!!`(nm[4])) / (tab$`!!`(nm[2]) * (tab$`!!`(nm[3])))) %>%
print(n = Inf)
}
OR_tab(foo, strata, group, select)
我的原始数据框,&#39; foo&#39; :
> dput(foo2)
structure(list(strata = c("Manager", "Worker", "Manager", "Manager",
"Worker", "Manager", "Manager", "Manager", "Worker", "Worker",
"Worker", "Worker", "Worker", "Worker", "Manager", "Worker",
"Worker", "Manager", "Manager", "Manager", "Worker", "Worker",
"Manager", "Manager", "Manager", "Manager", "Worker", "Worker",
"Worker", "Worker"), group = c("A_Group", "A_Group", "A_Group",
"A_Group", "B_Group", "A_Group", "B_Group", "A_Group", "A_Group",
"A_Group", "A_Group", "A_Group", "B_Group", "B_Group", "A_Group",
"A_Group", "A_Group", "A_Group", "A_Group", "B_Group", "A_Group",
"A_Group", "B_Group", "B_Group", "A_Group", "A_Group", "B_Group",
"A_Group", "A_Group", "A_Group"), select = c("Chosen", "Chosen",
"Not_Chosen", "Not_Chosen", "Not_Chosen", "Not_Chosen", "Not_Chosen",
"Not_Chosen", "Not_Chosen", "Not_Chosen", "Not_Chosen", "Not_Chosen",
"Not_Chosen", "Not_Chosen", "Not_Chosen", "Not_Chosen", "Not_Chosen",
"Not_Chosen", "Not_Chosen", "Not_Chosen", "Not_Chosen", "Not_Chosen",
"Not_Chosen", "Chosen", "Not_Chosen", "Not_Chosen", "Chosen",
"Not_Chosen", "Not_Chosen", "Not_Chosen")), .Names = c("strata",
"group", "select"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-30L))
答案 0 :(得分:0)
通过对长数据进行优势比计算(如@MrFlick所建议),您可以避免在spread
之后取消引用:
library(tidyverse)
OR_tab2 <- function(dat, strat, grp, decision ){
strat <- enquo(strat)
grp <- enquo(grp)
decision <- enquo(decision)
dat %>%
count(!!strat, !!grp, !!decision) %>%
group_by(!!strat) %>%
mutate(OR = (n[1]*n[4])/(n[2]*n[3])) %>%
unite(cat, c(!!grp, !!decision)) %>%
spread(cat, n)
}
OR_tab2(foo2, strata, group, select)
strata OR A_Group_Chosen A_Group_Not_Chosen B_Group_Chosen B_Group_Not_Chosen <chr> <dbl> <int> <int> <int> <int> 1 Manager 0.333 1 9 1 3 2 Worker 0.273 1 11 1 3
与原始代码一样,这适用于任何数据框,group
和select
参数每个只有两个级别,但分子或分母中哪些级别对将取决于每列中级别的顺序。例如,请注意,对于下面的重新编码数据框df2
,优势比相对于原始数据框df
的优先级被反转。
df2 = foo2 %>%
mutate(experimental_groups = recode(group,
"A_Group"="Control",
"B_Group"="Treatment"),
flavor = recode(select,
"Chosen"="Vanilla",
"Not_Chosen"="Chocolate"))
OR_tab2(df2, strata, experimental_groups, flavor)
strata OR Control_Chocolate Control_Vanilla Treatment_Chocolate Treatment_Vanilla <chr> <dbl> <int> <int> <int> <int> 1 Manager 3.00 9 1 3 1 2 Worker 3.67 11 1 3 1
OR_tab(df2, strata, experimental_groups, flavor)
strata Control_Chocolate Control_Vanilla Treatment_Chocolate Treatment_Vanilla OR <chr> <dbl> <dbl> <dbl> <dbl> <dbl> 1 Manager 9. 1. 3. 1. 3.00 2 Worker 11. 1. 3. 1. 3.67