Mutating a column with unique conditions based on categories and quantiles

Asked: 2018-12-30 12:49:37

Tags: r dplyr mutate

set.seed(2)
example <- tibble(Score = round(rnorm(n = 12, 100, 20), digits = 0))

   Score
   <dbl>
 1    82
 2   104
 3   132
 4    77
 5    98
 6   103
 7   114
 8    95
 9   140
10    97
11   108
12   120

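What I would like to do is mutate a new variable, new, that classifies the smallest number as alpha, the second-smallest number as beta, the values from the third-smallest up to the second quantile as above median, and the last two quantiles as below median.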

I have considered chaining several mutate calls to achieve this, but I wonder whether anyone can suggest a more elegant solution?

Expected output

   Score new         
   <dbl> <chr>       
 1    77 alpha       
 2    82 beta        
 3    95 above median
 4    97 above median
 5    98 above median
 6   103 above median
 7   104 below median
 8   108 below median
 9   114 below median
10   120 below median
11   132 below median
12   140 below median

2 answers:

Answer 0 (score: 1)

Here is a very naive implementation using dplyr's case_when:

library(dplyr)
library(tibble)

set.seed(2)
example <- tibble(Score = round(rnorm(n = 12, 100, 20), digits = 0))

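# Returns the second-smallest value of x.
# Only one occurrence of the minimum is dropped, so if the minimum is tied
# this returns the minimum again.
second_min <- function(x) {
  min(x[-which.min(x)])
}

# case_when checks the conditions in order and uses the first one that matches.
example %>%
  mutate(category = case_when(
    Score == min(Score)        ~ "alpha",
    Score == second_min(Score) ~ "beta",
    Score < median(Score)      ~ "below_median",
    Score >= median(Score)     ~ "above_median"
  ))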

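Note that ties are not broken: every value equal to the minimum is classified as "alpha", and every value equal to the second-smallest value is classified as "beta".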
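To make the tie behaviour concrete, here is a minimal sketch that ranks the distinct values with dplyr's dense_rank() instead of comparing against min(). The tibble `tied` and its values are hypothetical, made up only to show a duplicated minimum. This is an alternative formulation, not the code from the answer above.

```r
library(dplyr)
library(tibble)

# Hypothetical data with a tied minimum (77 appears twice); illustration only.
tied <- tibble(Score = c(77, 77, 82, 90, 95, 104, 120, 130))

tied_labelled <- tied %>%
  mutate(category = case_when(
    dense_rank(Score) == 1 ~ "alpha",        # every copy of the smallest value
    dense_rank(Score) == 2 ~ "beta",         # every copy of the next distinct value
    Score < median(Score)  ~ "below_median", # median of these scores is 92.5
    TRUE                   ~ "above_median"
  ))

print(tied_labelled)
```

Because dense_rank() gives tied values the same rank and leaves no gaps, both 77s become "alpha" and 82 still becomes "beta".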

Answer 1 (score: 0)

This looks like a classic use case for dplyr's case_when, where we can define multiple conditions and assign values to the column accordingly.

library(dplyr)

example %>%
  arrange(Score) %>%
  mutate(new = case_when(row_number() == 1 ~ 'alpha', 
                         row_number() == 2  ~ 'beta', 
                         Score < median(Score) ~ 'below median', 
                         TRUE ~ 'above median'))



#    Score new         
#   <dbl> <chr>       
# 1   77. alpha       
# 2   82. beta        
# 3   95. below median
# 4   97. below median
# 5   98. below median
# 6  103. below median
# 7  104. above median
# 8  108. above median
# 9  114. above median
#10  120. above median
#11  132. above median
#12  140. above median
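If the original row order needs to be kept, one possible variant is to pass the column to row_number(), which ranks the values (ties broken by first occurrence) without sorting the data. This is a sketch under that assumption, not part of the answer above, and the name `unsorted` is made up for illustration. The labels follow the answers' convention, where scores under the median of 103.5 are "below median". The question's expected output shows the two median labels the other way round.

```r
library(dplyr)
library(tibble)

set.seed(2)
example <- tibble(Score = round(rnorm(n = 12, 100, 20), digits = 0))

# row_number(Score) is the rank of each Score, so no arrange() is needed
# and the rows stay in their original order.
unsorted <- example %>%
  mutate(new = case_when(
    row_number(Score) == 1 ~ "alpha",
    row_number(Score) == 2 ~ "beta",
    Score < median(Score)  ~ "below median",
    TRUE                   ~ "above median"
  ))

print(unsorted)
```

Unlike the tie-preserving approach with min(), this always yields exactly one "alpha" and one "beta", matching the behaviour of arrange() followed by row_number().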