Error when splitting column values into the top 20% and bottom 80%

Posted: 2018-10-12 09:57:11

Tags: r

I have a vector, stored as a data frame column:

Vec <- data.frame( Vec = c("70.0600", "8.5100", "5.8600", "399.9800", "9.0600", "78.8200", "71.4600") )

I want to split these values into the "top 20%" and the "bottom 80%", so that the result looks something like this:

 Vec        Dec
 70.0600    Top_20
 .          .
 .          .
 5.8600     Bottom_80

I'm trying something like this:

Vec$Quartile <- quantile(Vec$Vec, probs = c(0.20, 0.80))

But the values come out split exactly 50/50:

 sum( Vec$Quartile>20 )

I'm not sure where I'm going wrong.
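A minimal sketch of what goes wrong, assuming the column really holds strings as in the data frame above: quantile() returns the cut points themselves (one value per entry of probs), not a label for each row, and it needs numeric input. Converting the column first and comparing each value against the 80% cut point gives the intended split. The column name Dec follows the desired output above; x and cuts are illustrative names.

```r
# The question's column holds strings, so convert it to numeric first
Vec <- data.frame(Vec = c("70.0600", "8.5100", "5.8600", "399.9800",
                          "9.0600", "78.8200", "71.4600"))
# as.character() also covers the case where the column is a factor (R < 4.0)
x <- as.numeric(as.character(Vec$Vec))

# quantile() returns one cut point per probability, not one label per row
cuts <- quantile(x, probs = c(0.20, 0.80))
print(cuts)  # two named values: "20%" and "80%"

# Only the 80% cut point is needed: compare every value against it
Vec$Dec <- ifelse(x > cuts[["80%"]], "Top_20", "Bottom_80")
print(Vec)
```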

2 answers:

Answer 0 (score: 4)

Something like this?

library(dplyr)

Vec <- data.frame(Vec = c(70.0600, 8.5100, 5.8600, 399.9800, 9.0600, 78.8200, 71.4600))

Vec %>%
  mutate(up = quantile(Vec, .8),
         part = ifelse(Vec > up, "Top_20", "Bottom_80"))

     Vec     up      part
1  70.06 77.348 Bottom_80
2   8.51 77.348 Bottom_80
3   5.86 77.348 Bottom_80
4 399.98 77.348    Top_20
5   9.06 77.348 Bottom_80
6  78.82 77.348    Top_20
7  71.46 77.348 Bottom_80
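The same split can also be written in base R with cut(), which is a different technique from the ifelse() call above and labels every row in one step. This is a sketch assuming the numeric data from this answer; q80 is an illustrative name. Because cut() uses right-closed intervals by default, a value exactly equal to the 80th percentile lands in Bottom_80, matching the Vec > up rule above.

```r
# Numeric data from the answer above
Vec <- data.frame(Vec = c(70.06, 8.51, 5.86, 399.98, 9.06, 78.82, 71.46))

# 80th percentile (R's default quantile type 7)
q80 <- quantile(Vec$Vec, 0.8)

# Two intervals: (-Inf, q80] -> "Bottom_80", (q80, Inf] -> "Top_20"
Vec$part <- cut(Vec$Vec,
                breaks = c(-Inf, q80, Inf),
                labels = c("Bottom_80", "Top_20"))
print(Vec)
```

cut() returns a factor, so the two labels keep the Bottom_80 / Top_20 level order, which can be convenient for later grouping or plotting.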

Answer 1 (score: 3)

A very simple approach that needs no additional libraries:

Result

   value       dec
1 399.98    Top_20
2  78.82    Top_20
3  70.06 Bottom_80
4   8.51 Bottom_80
5   5.86 Bottom_80
6   9.06 Bottom_80
7  71.46 Bottom_80

Code

Vec <- c(70.0600, 8.5100, 5.8600, 399.9800, 9.0600, 78.8200, 71.4600)

# 80th percentile: values above it form the top 20%
q <- quantile(Vec, .8)

# Stack the two groups (note: the top group comes first, so rows are reordered)
Vec <- rbind(
    data.frame(value = subset(Vec, Vec > q), dec = "Top_20"),
    data.frame(value = subset(Vec, Vec <= q), dec = "Bottom_80"))
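Splitting with rbind() puts the top group first, so the rows no longer follow the input order. If the original order matters, a base R sketch that labels each value in place might look like this (vals and res are illustrative names). It also shows that with only 7 values the split cannot be exactly 20/80: the rule "above the 80th percentile" selects 2 of the 7 values, about 29%.

```r
# Values from the question, as numbers
vals <- c(70.06, 8.51, 5.86, 399.98, 9.06, 78.82, 71.46)

# 80th percentile (R's default quantile type 7)
q <- quantile(vals, 0.8)

# Label each value in place, so the original row order is preserved
res <- data.frame(value = vals,
                  dec = ifelse(vals > q, "Top_20", "Bottom_80"))
print(res)

# Share of rows in each group; with 7 values it cannot be exactly 0.2 / 0.8
print(prop.table(table(res$dec)))
```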