I have a sample df as shown below:
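# check.names = FALSE keeps the column name "Total%" as written;
# by default data.frame() would rename it to "Total."
df_test <- data.frame("Group.Name" = c("Group1", "Group2", "Group1", "Group2", "Group2", "Group2", "Group1"),
                      "Sub_group_name" = c("A", "A", "B", "C", "D", "E", "C"),
                      "Total%" = c(35, 26, 10, 9, 5, 11, 13),
                      check.names = FALSE)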
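The original df is large. Things to keep in mind about this df:
All sub-groups (A, B, C, etc.) within Group1, and likewise within Group2, sum to 100.
The sub-groups of Group1 and Group2 will be more or less the same.
Ask:
I need to create a column called Category that is derived from Total% at the Group.Name level. The conditions for the new column are:
For each Group.Name, the row with the highest Total% gets its Sub_group_name as the category.
For each Group.Name, rows with Total% between 10 and 30 get the category "New_Group1".
For each Group.Name, rows with Total% less than 10 get the category "New_Group2".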
Expected output:
df_output <- data.frame("Group.Name" = c("Group1", "Group2", "Group1", "Group2", "Group2", "Group2", "Group1"),
                        "Sub_group_name" = c("A", "A", "B", "C", "D", "E", "C"),
                        "Total%" = c(35, 26, 10, 9, 5, 11, 13),
                        "category" = c("A", "A", "New_Group1", "New_Group1", "New_Group2", "New_Group1", "New_Group1"),
                        check.names = FALSE)
Answer 0 (score: 1)
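We can use cut with breaks and the corresponding labels to create the category, and then, for the row where Total% is the highest within each Group.Name, replace it with the corresponding Sub_group_name.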
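As a small illustration of the binning step on its own, the sketch below applies cut to the Total% values from the question, outside any grouping. With right = FALSE the intervals are closed on the left, so a value of exactly 10 falls into [10, 30) and is labelled New_Group1. The vector names pct and bins are illustrative assumptions, not part of the original code.

```r
# Illustrative vector holding the Total% values from the question
pct <- c(35, 26, 10, 9, 5, 11, 13)

# right = FALSE gives the intervals [-Inf, 10), [10, 30), [30, Inf)
bins <- cut(pct,
            breaks = c(-Inf, 10, 30, Inf),
            labels = c("New_Group2", "New_Group1", "Other"),
            right = FALSE)

# cut() returns a factor; convert it so it can later be mixed with sub-group names
as.character(bins)
# "Other" "New_Group1" "New_Group1" "New_Group2" "New_Group2" "New_Group1" "New_Group1"
```

The "Other" label only covers values of 30 or more; in the sample data the only such row is also a group maximum, so it gets overwritten by the sub-group name in the next step.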
library(dplyr)
df_test %>%
  group_by(Group.Name) %>%
  # bin Total% into [-Inf, 10), [10, 30), [30, Inf)
  mutate(category = as.character(cut(`Total%`, breaks = c(-Inf, 10, 30, Inf),
                                     labels = c("New_Group2", "New_Group1", "Other"),
                                     right = FALSE)),
         # within each group, the row with the highest Total% takes its sub-group name
         category = case_when(`Total%` == max(`Total%`) ~ Sub_group_name,
                              TRUE ~ category))
# A tibble: 7 x 4
# Groups:   Group.Name [2]
#   Group.Name Sub_group_name `Total%` category
#   <chr>      <chr>             <dbl> <chr>
# 1 Group1     A                    35 A
# 2 Group2     A                    26 A
# 3 Group1     B                    10 New_Group1
# 4 Group2     C                     9 New_Group2
# 5 Group2     D                     5 New_Group2
# 6 Group2     E                    11 New_Group1
# 7 Group1     C                    13 New_Group1
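If dplyr is not available, roughly the same result can be sketched in base R. This is an alternative under stated assumptions, not the approach used above: it assumes the column is really named Total% (which needs check.names = FALSE, since data.frame() otherwise rewrites the name to Total.), and it uses ave() to get the per-group maximum aligned with each row. The helper names pct, category and is_max are hypothetical.

```r
# Rebuild the sample data; check.names = FALSE preserves the "Total%" name
df_test <- data.frame("Group.Name" = c("Group1", "Group2", "Group1", "Group2",
                                       "Group2", "Group2", "Group1"),
                      "Sub_group_name" = c("A", "A", "B", "C", "D", "E", "C"),
                      "Total%" = c(35, 26, 10, 9, 5, 11, 13),
                      check.names = FALSE,
                      stringsAsFactors = FALSE)

pct <- df_test[["Total%"]]

# Step 1: bin every value, intervals closed on the left
category <- as.character(cut(pct,
                             breaks = c(-Inf, 10, 30, Inf),
                             labels = c("New_Group2", "New_Group1", "Other"),
                             right = FALSE))

# Step 2: flag rows whose value equals the maximum of their own group
is_max <- pct == ave(pct, df_test$Group.Name, FUN = max)

# Step 3: those rows take the sub-group name instead of the bin label
category[is_max] <- df_test$Sub_group_name[is_max]
df_test$category <- category

df_test$category
# "A" "A" "New_Group1" "New_Group2" "New_Group2" "New_Group1" "New_Group1"
```

As in the dplyr version, ties for the group maximum would all receive their sub-group names, and any non-maximum row at 30 or above would be labelled "Other".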