R: classify non-zero data by the median

Time: 2018-05-18 05:15:28

Tags: r

My dataset is as follows:

ID  Quantity
1   0.93
2   0.17
3   NA
4   0.44
5   NA
6   0.86
7   0.07
8   0.23
9   1.00

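Now I want to classify all non-zero/non-NA values in the 'Quantity' column into two groups: <= median and > median. 'NA' should be treated as '0'. For the data above, the median of the non-NA values is 0.44, so the final dataset should look like this: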

ID  Quantity    Quantity_median
1   0.93        >0.44
2   0.17        <=0.44
3   NA          0
4   0.44        <=0.44
5   NA          0
6   0.86        >0.44
7   0.07        <=0.44
8   0.23        <=0.44
9   1.00        >0.44
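
For anyone who wants to reproduce this, here is a minimal sketch of the data as a data frame. The object name `df` and the helper variable `med` are assumptions made for illustration (the answers below use `df` and `df1`). It also confirms that the median of the non-NA values is 0.44.

```r
# Rebuild the example data; NA marks the missing quantities
df <- data.frame(
  ID = 1:9,
  Quantity = c(0.93, 0.17, NA, 0.44, NA, 0.86, 0.07, 0.23, 1.00)
)

# Median of the seven non-NA values
med <- median(df$Quantity, na.rm = TRUE)
med
# [1] 0.44
```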

2 answers:

Answer 0 (score: 2)

Since there are only 3 possible levels, you can try something like:

library(dplyr)

df %>%
  # median of the non-NA values, kept in a temporary column
  mutate(Qmedian = median(Quantity, na.rm = TRUE)) %>%
  mutate(Quantity_median = as.factor(case_when(
    is.na(Quantity)     ~ "0",
    Quantity <= Qmedian ~ paste0("<=", Qmedian),
    Quantity > Qmedian  ~ paste0(">", Qmedian)
  ))) %>%
  select(-Qmedian)

#  ID Quantity Quantity_median
# 1  1     0.93           >0.44
# 2  2     0.17          <=0.44
# 3  3       NA               0
# 4  4     0.44          <=0.44
# 5  5       NA               0
# 6  6     0.86           >0.44
# 7  7     0.07          <=0.44
# 8  8     0.23          <=0.44
# 9  9     1.00           >0.44
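
If dplyr is not available, the same three-way split can probably be done in base R with nested `ifelse` calls. This is only a sketch under the assumption that the data frame is called `df` and is built as shown here. The variable `med` is a name introduced for illustration, and the result is a character column rather than a factor.

```r
# Example data (assumed to match the question)
df <- data.frame(
  ID = 1:9,
  Quantity = c(0.93, 0.17, NA, 0.44, NA, 0.86, 0.07, 0.23, 1.00)
)

# Median of the non-NA values
med <- median(df$Quantity, na.rm = TRUE)

# NA rows get "0"; the rest are labelled relative to the median
df$Quantity_median <- ifelse(
  is.na(df$Quantity),
  "0",
  ifelse(df$Quantity <= med, paste0("<=", med), paste0(">", med))
)
df
```

Wrap the new column in `as.factor()` if a factor is needed, as in the dplyr version.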

Answer 1 (score: 2)

We can also use cut:

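m1 <- median(df1$Quantity, na.rm = TRUE)
lbls <- paste0(c("<=", ">"), m1)
# explicit break points at the median; breaks = 2 would give two equal-width bins instead
df1$Quantity_median <- with(df1, as.character(cut(Quantity, breaks = c(-Inf, m1, Inf), labels = lbls)))
# rows where Quantity is NA are labelled "0"
df1$Quantity_median[is.na(df1$Quantity_median)] <- "0"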
df1
#  ID Quantity Quantity_median
#1  1     0.93           >0.44
#2  2     0.17          <=0.44
#3  3       NA               0
#4  4     0.44          <=0.44
#5  5       NA               0
#6  6     0.86           >0.44
#7  7     0.07          <=0.44
#8  8     0.23          <=0.44
#9  9     1.00           >0.44
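
One thing worth checking with cut: a single number for `breaks` asks for that many equal-width intervals over the range of the data, which lines up with the median only by coincidence. The sketch below compares the two on made-up, skewed data. The vector `x` and the names `equal_width` and `at_median` are hypothetical, not from the question.

```r
# Skewed toy data: the median (0.03) is far from the midpoint of the range
x <- c(0.01, 0.02, 0.03, 0.04, 1.00)
m <- median(x)
lbls <- paste0(c("<=", ">"), m)

# Two equal-width bins over the range of x (split near 0.505)
equal_width <- as.character(cut(x, breaks = 2, labels = lbls))

# Two bins split exactly at the median: (-Inf, m] and (m, Inf]
at_median <- as.character(cut(x, breaks = c(-Inf, m, Inf), labels = lbls))

data.frame(x, equal_width, at_median)
```

Here 0.04 is above the median but lands in the lower equal-width bin, so only the explicit break points give the intended labels.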