My dataset is as follows:
ID Quantity
1 0.93
2 0.17
3 NA
4 0.44
5 NA
6 0.86
7 0.07
8 0.23
9 1.00
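Now I want to classify all non-zero/non-NA values in the 'Quantity' column into two groups: <= median and > median. 'NA' should be treated as '0'. For the data above, the median of the non-NA values is 0.44, so the final dataset should look like this: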
ID Quantity Quantity_median
1 0.93 >0.44
2 0.17 <=0.44
3 NA 0
4 0.44 <=0.44
5 NA 0
6 0.86 >0.44
7 0.07 <=0.44
8 0.23 <=0.44
9 1.00 >0.44
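For reproducibility, the sample data can be built as a data frame. This is only a minimal sketch: the name `df` is an assumption (the answers below call the data `df` and `df1`), and `med` is a helper name introduced here for illustration.

```r
# Sample data from the question; `df` is an assumed name
df <- data.frame(
  ID = 1:9,
  Quantity = c(0.93, 0.17, NA, 0.44, NA, 0.86, 0.07, 0.23, 1.00)
)

# Median of the non-NA values only (7 values, so the 4th smallest)
med <- median(df$Quantity, na.rm = TRUE)
med
# [1] 0.44
```

Without `na.rm = TRUE`, `median` returns NA for this column, so the argument is needed here.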
Answer 0 (score: 2)
Since there are only three possible levels, you could try something like:
library(dplyr)
df %>%
  mutate(Qmedian = median(Quantity, na.rm = TRUE)) %>%
  mutate(Quantity_median = as.factor(case_when(
    is.na(Quantity)     ~ "0",                      # NA is labelled "0"
    Quantity <= Qmedian ~ paste0("<=", Qmedian),
    Quantity > Qmedian  ~ paste0(">", Qmedian)      # strictly greater, matching the label
  ))) %>%
  select(-Qmedian)
# ID Quantity Quantity_median
# 1 1 0.93 >0.44
# 2 2 0.17 <=0.44
# 3 3 NA 0
# 4 4 0.44 <=0.44
# 5 5 NA 0
# 6 6 0.86 >0.44
# 7 7 0.07 <=0.44
# 8 8 0.23 <=0.44
# 9 9 1.00 >0.44
Answer 1 (score: 2)
We can also use cut:
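m1 <- median(df1$Quantity, na.rm = TRUE)
lbls <- paste0(c("<=", ">"), m1)
# Break at the median itself: (-Inf, m1] and (m1, Inf]. A bare breaks = 2 would
# instead cut the range of the data into two equal-width intervals.
df1$Quantity_median <- with(df1, as.character(cut(Quantity, breaks = c(-Inf, m1, Inf), labels = lbls)))
# NA quantities get the label "0"
df1$Quantity_median[is.na(df1$Quantity_median)] <- "0"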
df1
# ID Quantity Quantity_median
#1 1 0.93 >0.44
#2 2 0.17 <=0.44
#3 3 NA 0
#4 4 0.44 <=0.44
#5 5 NA 0
#6 6 0.86 >0.44
#7 7 0.07 <=0.44
#8 8 0.23 <=0.44
#9 9 1.00 >0.44
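One caveat with cut: when `breaks` is a single number, the range of the data is divided into that many equal-width intervals, which in general is not a split at the median. For the sample data the two happen to agree, because 0.44 lies below the midpoint of the range. The sketch below uses made-up skewed values (not from the question) to show the difference; the names `x`, `equal_width` and `at_median` are hypothetical.

```r
# Hypothetical skewed data, not taken from the question
x <- c(1, 2, 3, 4, 100)
m <- median(x)                       # 3
lbls <- paste0(c("<=", ">"), m)      # "<=3" ">3"

# breaks = 2: two equal-width bins over the range of x (cut point near 50.5)
equal_width <- as.character(cut(x, breaks = 2, labels = lbls))

# Explicit breaks: (-Inf, m] and (m, Inf], i.e. a split at the median
at_median <- as.character(cut(x, breaks = c(-Inf, m, Inf), labels = lbls))

data.frame(x, equal_width, at_median)
```

With equal-width bins the value 4 is labelled "<=3" even though it is above the median; with explicit breaks it is labelled ">3".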