New column based on subgroups and percentage ranges in another column

Time: 2018-09-26 15:26:47

Tags: r dataframe

I have a sample df, shown below:

df_test<- data.frame("Group.Name"=c("Group1","Group2","Group1","Group2","Group2","Group2","Group1"),
                "Sub_group_name"=c("A","A","B","C","D","E","C"),
                "Total%"=c(35,26,10,9,5,11,13),
                check.names = FALSE, stringsAsFactors = FALSE)  # keep the "Total%" name and character columns

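The original df is much larger. A few things to keep in mind about it:

  • There are only two groups, "Group1" and "Group2".
  • Each group has several subgroups; the df above shows only some of them.
  • Within a group, Total% across its subgroups adds up to 100%. It does not in the df above only because it is a sample. So for all subgroups of Group1 (A, B, C, etc.) the total adds up to 100, and likewise for Group2. The subgroups of Group1 and Group2 will be more or less the same.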
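The claim that Total% adds up to 100 within each group could be checked on the full data with a per-group sum. The following is a minimal sketch, assuming the column is stored under the literal name Total% (which needs check.names = FALSE when the data frame is built). It uses only the sample values from the question, so the sums here will not be 100.

```r
# Sample data from the question; check.names = FALSE keeps the "Total%" name as written.
df_test <- data.frame(
  "Group.Name" = c("Group1", "Group2", "Group1", "Group2", "Group2", "Group2", "Group1"),
  "Sub_group_name" = c("A", "A", "B", "C", "D", "E", "C"),
  "Total%" = c(35, 26, 10, 9, 5, 11, 13),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Sum Total% within each Group.Name; on the full data each sum is expected to be 100.
group_totals <- tapply(df_test[["Total%"]], df_test[["Group.Name"]], sum)
print(group_totals)
```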

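Question:

I need to create a column named category that is derived from the Total% range within each Group.Name. The conditions for the new column are:

  • For the highest Total% within each Group.Name, category is the Sub_group_name.

  • For a Total% between 10 and 30 within each Group.Name, category is "New_Group1".

  • For a Total% less than 10 within each Group.Name, category is "New_Group2".

Expected output: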

df_output<- data.frame("Group.Name"=c("Group1","Group2","Group1","Group2","Group2","Group2","Group1"),
                     "Sub_group_name"=c("A","A","B","C","D","E","C"),
                     "Total%"=c(35,26,10,9,5,11,13),
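                     # row 4 (Group2, C) has Total% = 9, which is below 10, so it is "New_Group2"
                     "category"=c("A","A","New_Group1","New_Group2","New_Group2","New_Group1","New_Group1"),
                     check.names = FALSE, stringsAsFactors = FALSE)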

1 answer:

Answer 0: (score: 1)

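We can use cut to create the breaks with the corresponding labels, and then, for the row where Total% is the highest within each Group.Name, replace the label with the corresponding Sub_group_name.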
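How cut treats values that sit exactly on a break decides whether a Total% of 10 counts as "between 10 and 30". The sketch below is an illustration on made-up boundary values, not part of the original answer, and shows the effect of right = FALSE.

```r
# Hypothetical values chosen to sit on and around the breaks.
x <- c(5, 9.99, 10, 29.99, 30, 35)

# right = FALSE makes the intervals closed on the left: [-Inf,10), [10,30), [30,Inf).
bins <- cut(x,
            breaks = c(-Inf, 10, 30, Inf),
            labels = c("New_Group2", "New_Group1", "Other"),
            right = FALSE)

print(data.frame(x = x, bin = as.character(bins)))
```

With the default right = TRUE, the value 10 would fall into (-Inf, 10] and be labelled New_Group2, which would not match the expected category for Group1/B. The pipeline that follows uses the same breaks, applied within each Group.Name.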

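library(dplyr)

# Assumes df_test keeps the literal `Total%` column name (check.names = FALSE)
# and that Sub_group_name is a character column, not a factor.
df_test %>% 
  group_by(Group.Name) %>%
  mutate(
    # Bin Total% into [-Inf,10), [10,30), [30,Inf); right = FALSE puts 10 in New_Group1.
    category = as.character(cut(`Total%`, breaks = c(-Inf, 10, 30, Inf),
                                labels = c("New_Group2", "New_Group1", "Other"),
                                right = FALSE)),
    # The row with the highest Total% in each group takes its Sub_group_name instead.
    category = case_when(`Total%` == max(`Total%`) ~ Sub_group_name,
                         TRUE ~ category)
  )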
# A tibble: 7 x 4
# Groups:   Group.Name [2]
#  Group.Name Sub_group_name `Total%` category  
#  <chr>      <chr>             <dbl> <chr>     
#1 Group1     A                    35 A         
#2 Group2     A                    26 A         
#3 Group1     B                    10 New_Group1
#4 Group2     C                     9 New_Group2
#5 Group2     D                     5 New_Group2
#6 Group2     E                    11 New_Group1
#7 Group1     C                    13 New_Group1
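
For readers without dplyr, the same two steps can be sketched in base R. This is an alternative under stated assumptions, not the answerer's method: it assumes df_test keeps the literal column name Total% (check.names = FALSE) and character columns, and it uses ave() to get the per-group maximum.

```r
# Sample data from the question, with the "Total%" name and character columns preserved.
df_test <- data.frame(
  "Group.Name" = c("Group1", "Group2", "Group1", "Group2", "Group2", "Group2", "Group1"),
  "Sub_group_name" = c("A", "A", "B", "C", "D", "E", "C"),
  "Total%" = c(35, 26, 10, 9, 5, 11, 13),
  check.names = FALSE, stringsAsFactors = FALSE
)

total <- df_test[["Total%"]]

# Step 1: bin every row by its percentage, using left-closed intervals.
category <- as.character(cut(total,
                             breaks = c(-Inf, 10, 30, Inf),
                             labels = c("New_Group2", "New_Group1", "Other"),
                             right = FALSE))

# Step 2: compute the maximum Total% of each group, repeated on every row of that group.
group_max <- ave(total, df_test[["Group.Name"]], FUN = max)

# Step 3: rows holding their group's maximum take the subgroup name instead of the bin label.
is_max <- total == group_max
category[is_max] <- df_test[["Sub_group_name"]][is_max]

df_test$category <- category
print(df_test)
```

As in the dplyr version, if several rows tie for the maximum within a group, all of them receive their own Sub_group_name.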