library(tibble)
set.seed(2)
example <- tibble(Score = round(rnorm(n = 12, 100, 20), digits = 0))
   Score
   <dbl>
 1    82
 2   104
 3   132
 4    77
 5    98
 6   103
 7   114
 8    95
 9   140
10    97
11   108
12   120
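What I'd like to do is mutate a new variable, new, that classifies the smallest number as alpha, the second smallest number as beta, the third largest number through the second quantile as above median, and the last two quantiles as below median.
I've considered chaining several mutate calls to achieve this, but was wondering whether anyone could offer a more elegant solution?
Expected output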
   Score new
   <dbl> <chr>
 1    77 alpha
 2    82 beta
 3    95 above median
 4    97 above median
 5    98 above median
 6   103 above median
 7   104 below median
 8   108 below median
 9   114 below median
10   120 below median
11   132 below median
12   140 below median
Answer 0 (score: 1)
Here is a very naive implementation using dplyr's case_when:
library(dplyr)
library(tibble)
set.seed(2)
example <- tibble(Score = round(rnorm(n = 12, 100, 20), digits = 0))
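# Returns the second-smallest distinct value of x
second_min <- function(x) {
  # Drop every copy of the minimum, not just the first one, so a tied
  # minimum cannot be returned again as the "second" minimum
  min(x[x > min(x)])
}

# Conditions are checked top to bottom; the first match wins
example %>%
  mutate(category = case_when(
    Score == min(Score) ~ "alpha",
    Score == second_min(Score) ~ "beta",
    Score < median(Score) ~ "below_median",
    Score >= median(Score) ~ "above_median"
  ))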
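Note that all values equal to the minimum will be classified as "alpha", and all values equal to the second-smallest value will be classified as "beta".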
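To make the tie behaviour concrete, here is a minimal sketch that gets the same effect with dplyr's dense_rank, which gives tied values the same rank and leaves no gaps between ranks. The data frame tied and the column score_rank are hypothetical names introduced only for this illustration; they are not part of the question or the answer above.

```r
library(dplyr)

# Hypothetical data with a tied minimum (not taken from the question)
tied <- tibble(Score = c(77, 77, 82, 95, 120))

labelled <- tied %>%
  mutate(
    # dense_rank: 1 = smallest distinct value, tied values share a rank
    score_rank = dense_rank(Score),
    new = case_when(
      score_rank == 1 ~ "alpha",
      score_rank == 2 ~ "beta",
      Score < median(Score) ~ "below median",
      TRUE ~ "above median"
    )
  )

labelled$new
# "alpha" "alpha" "beta" "above median" "above median"
```

Both rows with a score of 77 end up as alpha, and 82 is still treated as the second-smallest value. Whether tied rows should share a label is a design choice that depends on the use case.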
Answer 1 (score: 0)
This looks like a classic use case for dplyr's case_when, where we can define multiple conditions and assign values to the column accordingly.
library(dplyr)
example %>%
  arrange(Score) %>%
  mutate(new = case_when(row_number() == 1 ~ 'alpha',
                         row_number() == 2 ~ 'beta',
                         Score < median(Score) ~ 'below median',
                         TRUE ~ 'above median'))
# Score new
# <dbl> <chr>
# 1 77. alpha
# 2 82. beta
# 3 95. below median
# 4 97. below median
# 5 98. below median
# 6 103. below median
# 7 104. above median
# 8 108. above median
# 9 114. above median
#10 120. above median
#11 132. above median
#12 140. above median
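If the original row order needs to be preserved, one possible variation (a sketch, not part of the answer above) is to rank the scores with dplyr's min_rank instead of sorting and using row_number. The scores are typed in literally from the printed question so the example does not depend on the random seed; result is a name introduced here for illustration.

```r
library(dplyr)

# Scores copied from the question's printed output
example <- tibble(Score = c(82, 104, 132, 77, 98, 103, 114, 95, 140, 97, 108, 120))

result <- example %>%
  mutate(new = case_when(
    min_rank(Score) == 1 ~ "alpha",          # smallest score
    min_rank(Score) == 2 ~ "beta",           # second-smallest score
    Score < median(Score) ~ "below median",
    TRUE ~ "above median"
  ))

result
```

Here 77 (row 4) becomes alpha and 82 (row 1) becomes beta, while the rows stay in their original positions. Note that min_rank leaves gaps after ties, so if two rows shared the minimum, both would be alpha and no row would have rank 2.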