Compute quantiles for each group in a data frame and assign NA?

Time: 2020-05-17 12:25:11

Tags: r

I made up this example to explain my problem:

 df= structure(list(group = structure(c(1L, 1L, 2L, 2L, 10L, 10L
   ), .Label = c("Eve", "ba", "De", "De","Mi", "C", "O", "W", 
"as", "ras", "Cro", "ics"), class = "factor"), ds = c(8, 8, 
 1, 4, 4, 6), em = c(1, 3, 8,2, 7, 3)), row.names = c(74567L, 
74568L, 74570L, 74576L, 74577L, 74578L), class = "data.frame")

For each group, I need to set every value of em and ds to NA when it falls outside these bounds:

 > quantile 90 = NA

 < quantile 10 = NA

1 Answer:

Answer 0: (score: 0)

Here is a way to do this for each group and each numeric variable using dplyr and ifelse.

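With only a few samples per group, the whole concept of a quantile is hard to interpret, so the result depends heavily on how the quantile is defined. The type argument of quantile() specifies which definition is used. R defaults to type = 7.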
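
To make the effect of the definition concrete, here is a minimal base R sketch comparing two of the definitions on a two-value sample. The vector `x` and the names `q7` and `q1` are illustrative only; the values are taken from the em column of the "Eve" group in the question.

```r
# Illustrative two-value sample; these are the em values of the "Eve" group.
x <- c(1, 3)

# type = 7 (R's default) interpolates linearly between the order statistics.
q7 <- quantile(x, probs = c(0.1, 0.9), type = 7)

# type = 1 inverts the empirical CDF, so it only returns observed values.
q1 <- quantile(x, probs = c(0.1, 0.9), type = 1)

print(q7)  # 10% = 1.2, 90% = 2.8: both observations fall outside this range
print(q1)  # 10% = 1,   90% = 3:   no observation falls strictly outside
```

With the default definition both values of a two-element group lie outside the interpolated bounds, which is why small groups can end up entirely NA.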

library(dplyr)

# Within each group, set values outside the 10%-90% quantile range to NA
df %>% 
   group_by(group) %>% 
   mutate(ds = ifelse(ds > quantile(ds, .9) | ds < quantile(ds, .1), NA, ds),
          em = ifelse(em > quantile(em, .9) | em < quantile(em, .1), NA, em))
#> # A tibble: 6 x 3
#> # Groups:   group [3]
#>   group    ds em   
#>   <fct> <dbl> <lgl>
#> 1 Eve       8 NA   
#> 2 Eve       8 NA   
#> 3 ba       NA NA   
#> 4 ba       NA NA   
#> 5 ras      NA NA   
#> 6 ras      NA NA   

However, you can change the result by choosing a different definition:

# Same filter, but using the type = 1 quantile definition
df %>% 
   group_by(group) %>% 
   mutate(ds = ifelse(ds > quantile(ds, .9, type = 1) | 
                      ds < quantile(ds, .1, type = 1), NA, ds),
          em = ifelse(em > quantile(em, .9, type = 1) |
                      em < quantile(em, .1, type = 1), NA, em))
#> # A tibble: 6 x 3
#> # Groups:   group [3]
#>   group    ds    em
#>   <fct> <dbl> <dbl>
#> 1 Eve       8     1
#> 2 Eve       8     3
#> 3 ba        1     8
#> 4 ba        4     2
#> 5 ras       4     7
#> 6 ras       6     3

Created on 2020-05-17 by the reprex package (v0.3.0)
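
If more columns need the same treatment, the repeated ifelse calls could be factored into a small helper. The following is only a sketch in base R, not part of the answer above: the function name `trim_to_na`, its parameters, and the stand-in data frame `dat` are hypothetical, and the grouping uses `ave()` rather than dplyr.

```r
# Hypothetical helper (not part of the answer above): set values outside the
# [lower, upper] quantile range of x to NA, keeping the vector's type.
trim_to_na <- function(x, lower = 0.1, upper = 0.9, type = 7) {
  q <- quantile(x, probs = c(lower, upper), type = type, na.rm = TRUE)
  # which() drops NA comparisons, so existing NAs are left untouched
  x[which(x < q[1] | x > q[2])] <- NA
  x
}

# Small stand-in for the question's data, with plain character groups
dat <- data.frame(
  group = c("Eve", "Eve", "ba", "ba", "ras", "ras"),
  ds = c(8, 8, 1, 4, 4, 6),
  em = c(1, 3, 8, 2, 7, 3)
)

# ave() applies the helper within each group and returns values in row order
trimmed <- dat
for (col in c("ds", "em")) {
  trimmed[[col]] <- ave(dat[[col]], dat$group, FUN = trim_to_na)
}
print(trimmed)
```

In a dplyr pipeline the same helper could presumably be passed to `mutate(across(c(ds, em), trim_to_na))` after `group_by(group)`, assuming dplyr 1.0.0 or later. Because the helper assigns NA into the existing vector, the columns stay numeric even when every value in a group is removed.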