mutate_at with a case_when statement

Asked: 2019-12-11 15:52:41

Tags: r dplyr conditional-statements

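I am trying to create a data frame whose columns are the two grouping variables plus the mean, lower confidence limit, and upper confidence limit for a large number of measures, as described in: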

How to pass more complex functions to summarise_if or mutate_if?

Here is a representative sample of the dataset I am working with (the real dataset has more outcomes and many more rows):

      var1 var2 outcome1  outcome2 outcome3 outcome4 outcome5
1  0000999  214        0 0.0000000      0.0       10       82
2  0000999  214        0 0.0000000      0.0       11       88
3  0000999  214        0 0.0000000      0.0       10       90
4  0000999  214        0 0.0000000      0.0        5       45
5  0001382  214       13 0.7647059      1.5       12       36
6  0001382  214        0 0.0000000      0.0        7       46
7  0001382  214        8 1.0000000      1.5        7       51
8  0001382  214        0 0.0000000      0.0        0        2
9  0001382  214       16 1.0000000      1.5       15       55
10 0001950  214        7 0.8750000      1.5        6       43
11 0001950  214        0 0.0000000      0.0        8       59
12 0001950  214        0 0.0000000      0.0        3      105
13 0001950  214        0 0.0000000      0.0        1       65
14 0001957  214        0 0.0000000      0.0        3       30
15 0001957  214        0 0.0000000      0.0        8       57
16 0001957  214        5 0.7142857      1.5        4       78
17 0001957  214        0 0.0000000      0.0        3       36
18 0010610  214        0 0.0000000      0.0        1        8
19 0021726  215        0 0.0000000      0.0        8       67
20 0021726  215        0 0.0000000      0.0       15       87
21 0021726  215        0 0.0000000      0.0        8       79
22 0021726  215       14 0.7368421      3.0       12      106
23 0021726  215        0 0.0000000      0.0        0       11
24 0022908  215        0 0.0000000      0.0        1       41
25 0022908  215        0 0.0000000      0.0        0        0

The code I am using is:

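library(dplyr)
library(gmodels)  # assumed source of ci(); the original post does not name the package

# Lower confidence bound (second element returned by ci())
lci <- function(data) {
  as.numeric(ci(data)[2])
}

# Upper confidence bound (third element returned by ci())
uci <- function(data) {
  as.numeric(ci(data)[3])
}

data_agg <- data %>%
  group_by(var1, var2) %>%
  summarise_if(is.numeric, funs(mean, lci, uci)) %>%
  select(var1, var2, sort(current_vars())) # sorts into lci, mean, uci for each outcome var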

which, when printed, gives:

# A tibble: 7 x 17
# Groups:   var1 [7]
  var1   var2 outcome1_lci outcome1_mean outcome1_uci outcome2_lci outcome2_mean outcome2_uci outcome3_lci outcome3_mean
  <chr> <int>        <dbl>         <dbl>        <dbl>        <dbl>         <dbl>        <dbl>        <dbl>         <dbl>
1 0000~   214         0             0            0          0              0            0            0             0    
2 0001~   214        -1.71          7.4         16.5       -0.0851         0.553        1.19        -0.120         0.9  
3 0001~   214        -3.82          1.75         7.32      -0.477          0.219        0.915       -0.818         0.375
4 0001~   214        -2.73          1.25         5.23      -0.390          0.179        0.747       -0.818         0.375
5 0010~   214       NaN             0          NaN        NaN              0          NaN          NaN             0    
6 0021~   215        -4.97          2.8         10.6       -0.262          0.147        0.557       -1.07          0.6  
7 0022~   215         0             0            0          0              0            0            0             0    
# ... with 7 more variables: outcome3_uci <dbl>, outcome4_lci <dbl>, outcome4_mean <dbl>, outcome4_uci <dbl>,
#   outcome5_lci <dbl>, outcome5_mean <dbl>, outcome5_uci <dbl>
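
For reference, the interval columns above can be reproduced with base R alone. The following is a minimal sketch, assuming that `ci()` returns a t-based 95% interval around the mean (as `gmodels::ci()` does for numeric vectors; the post itself does not say which package is used). The helper bodies, the `dat` subset, and the use of `across()` (dplyr 1.0 or later) in place of `summarise_if()`/`funs()` are illustrative choices, not the asker's code.

```r
library(dplyr)

# Assumed stand-ins for the question's lci()/uci(): t-based 95% bounds
lci <- function(x) mean(x) - qt(0.975, length(x) - 1) * sd(x) / sqrt(length(x))
uci <- function(x) mean(x) + qt(0.975, length(x) - 1) * sd(x) / sqrt(length(x))

# Subset of the listing above (rows 5 to 13), outcome1 only
dat <- tibble(
  var1     = c(rep("0001382", 5), rep("0001950", 4)),
  var2     = 214L,
  outcome1 = c(13, 0, 8, 0, 16, 7, 0, 0, 0)
)

# Listing the functions as lci, mean, uci yields columns already in that order
agg <- dat %>%
  group_by(var1, var2) %>%
  summarise(
    across(where(is.numeric), list(lci = lci, mean = mean, uci = uci)),
    .groups = "drop"
  )

print(agg)
```

For these two groups the bounds agree with the printed table to the shown precision (about -1.71 and 16.5 for the first group, -3.82 and 7.32 for the second), which supports the t-interval assumption.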

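However, the lower confidence limit is often below zero, which is physically impossible for these data. So I tried adding a conditional mutate to reset it to zero in those cases.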

data_agg <- data %>%
  group_by(var1, var2) %>%
  summarise_if(is.numeric, funs(mean, lci, uci)) %>%
  mutate_at(vars(contains("lci")), case_when(. < 0 ~ 0, TRUE ~ .)) %>%
  select(var1, var2, sort(current_vars())) # sorts into lci, mean, uci for each outcome var

which returns:

Error: `TRUE ~ (.)` must be length 119 or one, not 17

Can anyone with more experience using the scoped functions tell me what I am doing wrong, and what I should do instead?
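
The error message is consistent with `case_when()` being evaluated before `mutate_at()` ever applies it. Without a leading `~`, the `.` is the pipe's placeholder for the whole summarised table, so `. < 0` has 7 x 17 = 119 elements while `TRUE ~ .` has length 17 (one per column). A minimal sketch of one likely fix wraps the call in a one-sided formula so that `.` refers to each selected column in turn. The small `agg` table is a hypothetical stand-in built from the printed values above, not the real data.

```r
library(dplyr)

# Hypothetical stand-in for the summarised table (values copied from the printout)
agg <- tibble(
  var1 = c("0000999", "0001382", "0001950", "0001957",
           "0010610", "0021726", "0022908"),
  var2 = c(214L, 214L, 214L, 214L, 214L, 215L, 215L),
  outcome1_lci  = c(0, -1.71, -3.82, -2.73, NaN, -4.97, 0),
  outcome1_mean = c(0, 7.4, 1.75, 1.25, 0, 2.8, 0),
  outcome1_uci  = c(0, 16.5, 7.32, 5.23, NaN, 10.6, 0)
)

# The leading ~ turns case_when() into a function applied column by column,
# so `.` is one lci column rather than the whole table
fixed <- agg %>%
  mutate_at(vars(contains("lci")), ~ case_when(. < 0 ~ 0, TRUE ~ .))

# Negative lower bounds become 0; the NaN row is left as NaN
print(fixed$outcome1_lci)
```

In the original pipeline the same change would be a single added `~` in the `mutate_at()` line. Since the operation is a simple clamp, `~ pmax(., 0)` may be a shorter alternative to `case_when()`.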

0 answers:

No answers yet.