I have seen a similar post: https://stackoverflow.com/questions/6104836/splitting-a-continuous-variable-into-equal-sized-groups. My problem is different, though: the range has to be a string. Here is my data frame:
df
name salary bonus increment(%)
AK 22200 120 2
BK 55000 34 .1
JK 12000 400 3
VK 3400 350 15
DK 5699 NA NA
df = structure(list(name = c("AK", "BK", "JK", "VK", "DK"), salary = c(22200L, 55000L, 12000L, 3400L, 5699L), bonus = c(120L, 34L, 400L, 350L, NA), `increment(%)` = c(2, 0.1, 3, 15, NA)), .Names = c("name", "salary", "bonus", "increment(%)"), row.names = c(NA, -5L), class = "data.frame")
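The salary column needs to be replaced with range labels such as "<10K", "10K-20K", "20K-30K" and ">30K". These values are alphanumeric strings, and cutting by defined intervals does not produce them. The expected result is: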
name salary bonus increment(%)
AK 20K-30K 120 2
BK >30K 34 .1
JK 10K-20K 400 3
VK <10K 350 15
DK <10K NA NA
However, using cut in R with defined intervals did not produce the desired result. Here is the code:
df$salary<-cut(df$salary,breaks = c(0,10000,20000,30000,60000),include.lowest = TRUE)
Output:
name salary bonus increment(%)
1 AK (2e+04,3e+04] 120 2.0
2 BK (3e+04,6e+04] 34 0.1
3 JK (1e+04,2e+04] 400 3.0
4 VK [0,1e+04] 350 15.0
5 DK [0,1e+04] NA NA
Answer 0 (score: 1)
You can use the case_when function from the dplyr package. df2 is the final output.
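library(dplyr)

# Map each salary to its range label; conditions are checked in order
df2 <- df %>%
  mutate(salary = case_when(
    salary < 10000                   ~ "<10K",
    salary >= 10000 & salary < 20000 ~ "10K-20K",
    salary >= 20000 & salary < 30000 ~ "20K-30K",
    salary >= 30000                  ~ ">30K",
    # Fallback for a missing salary: a real NA, not the string "NA"
    TRUE                             ~ NA_character_
  ))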
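For comparison, here is a base R sketch of the same mapping. It keeps the cut call from the question but supplies the labels argument, so cut returns the custom strings instead of interval notation such as (2e+04,3e+04]. The name df3, the open-ended -Inf/Inf breaks and the left-closed bins (right = FALSE, chosen to match the >= comparisons in case_when) are assumptions made for illustration, not part of the original answer.

```r
# Same data frame as in the question, rebuilt so the snippet runs on its own
df <- data.frame(
  name = c("AK", "BK", "JK", "VK", "DK"),
  salary = c(22200L, 55000L, 12000L, 3400L, 5699L),
  bonus = c(120L, 34L, 400L, 350L, NA),
  `increment(%)` = c(2, 0.1, 3, 15, NA),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# df3 is a hypothetical name for the result
df3 <- df

# Assumption: bins are closed on the left, e.g. [10000, 20000) -> "10K-20K"
df3$salary <- as.character(cut(
  df$salary,
  breaks = c(-Inf, 10000, 20000, 30000, Inf),
  labels = c("<10K", "10K-20K", "20K-30K", ">30K"),
  right = FALSE
))

df3
```

cut returns a factor, so as.character is used here to get plain strings. A missing salary stays NA in this version.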