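I am trying to create a data frame whose fields are two grouping variables plus the mean, lower confidence limit, and upper confidence limit of a large number of measures, as described here: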
How to pass more complex functions to summarise_if or mutate_if?
Here is a representative sample of the dataset I am working with (the real dataset has more outcomes and more rows):
var1 var2 outcome1 outcome2 outcome3 outcome4 outcome5
1 0000999 214 0 0.0000000 0.0 10 82
2 0000999 214 0 0.0000000 0.0 11 88
3 0000999 214 0 0.0000000 0.0 10 90
4 0000999 214 0 0.0000000 0.0 5 45
5 0001382 214 13 0.7647059 1.5 12 36
6 0001382 214 0 0.0000000 0.0 7 46
7 0001382 214 8 1.0000000 1.5 7 51
8 0001382 214 0 0.0000000 0.0 0 2
9 0001382 214 16 1.0000000 1.5 15 55
10 0001950 214 7 0.8750000 1.5 6 43
11 0001950 214 0 0.0000000 0.0 8 59
12 0001950 214 0 0.0000000 0.0 3 105
13 0001950 214 0 0.0000000 0.0 1 65
14 0001957 214 0 0.0000000 0.0 3 30
15 0001957 214 0 0.0000000 0.0 8 57
16 0001957 214 5 0.7142857 1.5 4 78
17 0001957 214 0 0.0000000 0.0 3 36
18 0010610 214 0 0.0000000 0.0 1 8
19 0021726 215 0 0.0000000 0.0 8 67
20 0021726 215 0 0.0000000 0.0 15 87
21 0021726 215 0 0.0000000 0.0 8 79
22 0021726 215 14 0.7368421 3.0 12 106
23 0021726 215 0 0.0000000 0.0 0 11
24 0022908 215 0 0.0000000 0.0 1 41
25 0022908 215 0 0.0000000 0.0 0 0
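The code I am using is:
library(dplyr)
library(gmodels)  # assumption: ci() is gmodels::ci, which returns Estimate, CI lower, CI upper, Std. Error

# lower confidence limit of a numeric vector
lci <- function(data) {
  as.numeric(ci(data)[2])
}
# upper confidence limit of a numeric vector
uci <- function(data) {
  as.numeric(ci(data)[3])
}
data_agg <- data %>%
  group_by(var1, var2) %>%
  summarise_if(is.numeric, funs(mean, lci, uci)) %>%
  select(var1, var2, sort(current_vars())) # sorts into lci, mean, uci for each outcome var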
When printed, this gives:
# A tibble: 7 x 17
# Groups: var1 [7]
var1 var2 outcome1_lci outcome1_mean outcome1_uci outcome2_lci outcome2_mean outcome2_uci outcome3_lci outcome3_mean
<chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 0000~ 214 0 0 0 0 0 0 0 0
2 0001~ 214 -1.71 7.4 16.5 -0.0851 0.553 1.19 -0.120 0.9
3 0001~ 214 -3.82 1.75 7.32 -0.477 0.219 0.915 -0.818 0.375
4 0001~ 214 -2.73 1.25 5.23 -0.390 0.179 0.747 -0.818 0.375
5 0010~ 214 NaN 0 NaN NaN 0 NaN NaN 0
6 0021~ 215 -4.97 2.8 10.6 -0.262 0.147 0.557 -1.07 0.6
7 0022~ 215 0 0 0 0 0 0 0 0
# ... with 7 more variables: outcome3_uci <dbl>, outcome4_lci <dbl>, outcome4_mean <dbl>, outcome4_uci <dbl>,
# outcome5_lci <dbl>, outcome5_mean <dbl>, outcome5_uci <dbl>
However, the lower CI is frequently below zero, which is physically impossible for these data. So I tried adding a conditional mutate to reset it to zero in those cases:
data_agg <- data %>%
  group_by(var1, var2) %>%
  summarise_if(is.numeric, funs(mean, lci, uci)) %>%
  mutate_at(vars(contains("lci")), case_when(.<0 ~ 0, TRUE ~ .)) %>%
  select(var1, var2, sort(current_vars())) # sorts into lci, mean, uci for each outcome var
This returns:
Error: `TRUE ~ (.)` must be length 119 or one, not 17
Can anyone with more experience using the scoped functions tell me what I am doing wrong, and what I should do instead?
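
A possible reading of the error, offered as a sketch rather than a confirmed diagnosis: in the `mutate_at()` call the `case_when()` expression is not wrapped in a function or formula, so it is evaluated immediately and `.` refers to the whole piped tibble (7 rows by 17 columns). That would make `. < 0` a 7 x 17 logical matrix of length 119, while the bare `.` on the right-hand side has length 17, which seems to match the message. Prefixing the expression with `~` turns it into a lambda that is applied to each selected column. The example below uses a small made-up tibble (`agg`, with illustrative values only) in place of the real summarised data, and swaps `case_when()` for `pmax()`, which is simply a shorter way to floor values at zero.

```r
library(dplyr)

# Toy stand-in for the summarised data; the values are illustrative only
agg <- tibble(
  var1          = c("0001382", "0001950", "0010610"),
  outcome1_lci  = c(-1.71, -3.82, NaN),
  outcome1_mean = c(7.4, 1.75, 0),
  outcome1_uci  = c(16.5, 7.32, NaN)
)

# Lambda form: inside `~ ...`, `.` is each selected column in turn
fixed <- agg %>%
  mutate_at(vars(contains("lci")), ~ pmax(., 0))

# Equivalent with across(), available in dplyr >= 1.0
fixed_across <- agg %>%
  mutate(across(contains("lci"), ~ pmax(.x, 0)))
```

With this form, negative lower limits become 0, while the mean and upper-limit columns are untouched and the `NaN` from the single-row group is left as missing. Keeping the original condition should also work if it is written as `~ case_when(. < 0 ~ 0, TRUE ~ .)`.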