I have a class variable that is the concatenation of marital, gender, and age (e.g. MM32). I want to group the values so that the final output looks like this:
Class ClassGrp
SM20 SM20-25
SM21 SM20-25
SM22 SM20-25
MF20 MF20-25
MF21 MF20-25
SF30 SF26-30
SF31 SF31-35
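I have separate columns for age, gender, and marital, so my initial approach was to bin age with the cut function: cut(data$Class, breaks = 10). However, I can't figure out how to convert the resulting groups into the 20-25 format.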
Edit
Input data:
data <- structure(list(age = c(19L, 20L, 20L, 21L, 21L, 22L), gender = structure(c(2L,
1L, 2L, 1L, 2L, 1L), .Label = c("Female", "Male"), class = "factor"),
marital = structure(c(3L, 3L, 3L, 3L, 3L, 2L), .Label = c("Divorced",
"Married", "Single", "Widowed"), class = "factor"), class = c("SM19",
"SF20", "SM20", "SF21", "SM21", "MF22"), ageGrp = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = c("(18.9,25]", "(25,31]", "(31,37]",
"(37,43]", "(43,49]", "(49,55]", "(55,61]", "(61,67]", "(67,73]",
"(73,79.1]"), class = "factor")), .Names = c("age", "gender",
"marital", "class", "ageGrp"), row.names = c(NA, 6L), class = "data.frame")
Answer 0 (score: 1)
# Read data
x <- read.table(file = "clipboard")
# Show the data I read in
x
# Bin the data as requested
x$ClassGrp <- as.character(x$ageGrp)
x$ClassGrp <- gsub("\\(", "", x$ClassGrp)
x$ClassGrp <- gsub("\\]", "", x$ClassGrp)
x$ClassGrp <- gsub(",", "-", x$ClassGrp)
# fixed = TRUE treats "18.9" literally, so "." is not a regex wildcard
x$ClassGrp <- gsub("18.9", "19", x$ClassGrp, fixed = TRUE)
x$g <- "M"
x$g[x$gender == "Female"] <- "F"
x$m <- "S"
x$m[x$marital == "Married"] <- "M"
for(i in 1:nrow(x)){
x$ClassGrp[i] <- paste(x$g[i],x$m[i],x$ClassGrp[i], collapse="", sep="")
}
x$g <- NULL
x$m <- NULL
# Show results
x
age gender marital class ageGrp ClassGrp
1 19 Male Single SM19 (18.9,25] MS19-25
2 20 Female Single SF20 (18.9,25] FS19-25
3 20 Male Single SM20 (18.9,25] MS19-25
4 21 Female Single SF21 (18.9,25] FS19-25
5 21 Male Single SM21 (18.9,25] MS19-25
6 22 Female Married MF22 (18.9,25] FM19-25
7 22 Female Single SF22 (18.9,25] FS19-25
8 22 Male Married MM22 (18.9,25] MM19-25
9 22 Male Single SM22 (18.9,25] MS19-25
10 23 Female Married MF23 (18.9,25] FM19-25
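The same relabeling can also be done without a loop, because paste0 and substr are vectorized. The following is only a minimal sketch on made-up toy rows (the data frame below is an assumption, not the asker's data). It puts the marital initial before the gender initial, as in the requested output, and relies on the four marital levels having distinct first letters. The ranges keep the bounds that cut computed (e.g. 18.9), so they would still need rounding if whole numbers are wanted.

```r
# Toy rows standing in for the real data (assumed values)
x <- data.frame(
  gender  = c("Male", "Female", "Female"),
  marital = c("Single", "Single", "Married"),
  ageGrp  = c("(18.9,25]", "(18.9,25]", "(25,31]"),
  stringsAsFactors = FALSE
)

# Turn "(18.9,25]" into "18.9-25": drop the brackets, then swap the comma
rng <- gsub("\\(|\\]", "", as.character(x$ageGrp))
rng <- sub(",", "-", rng, fixed = TRUE)

# First letter of marital status, then of gender, then the age range
x$ClassGrp <- paste0(
  substr(as.character(x$marital), 1, 1),
  substr(as.character(x$gender), 1, 1),
  rng
)
x$ClassGrp
```

The as.character calls are there so the same lines also work when gender, marital, and ageGrp are factors, as they are in the question's dput output.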
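Answer 1 (score: 1)
You can define the output bins as a sorted array and check where each value falls (greater than or equal to one bin edge and less than the next one).
I also added a bounds check in case a value falls outside your bins (i.e. it is less than the minimum or greater than or equal to the maximum).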
# Important: they should be ordered!
my.bins <- c(20, 25, 30, 35, 40, 50, 65)
# Transform into bins
to.bin <- function(class) {
gender.marital <- substring(class, 1, 2)
age <- as.numeric(substring(class, 3))
# Check the boundaries
if (age < min(my.bins)) {
return(paste0(gender.marital, "<", min(my.bins)))
} else if (age >= max(my.bins)) {
return(paste0(gender.marital, ">=", max(my.bins)))
}
lower <- which(my.bins > age)[1]
return(paste0(gender.marital, my.bins[lower - 1], "-", my.bins[lower] - 1))
}
data$ClassGrp <- sapply(data$class, to.bin)
data
For your data, the code returns:
age gender marital class ageGrp ClassGrp
1 19 Male Single SM19 (18.9,25] SM<20
2 20 Female Single SF20 (18.9,25] SF20-24
3 20 Male Single SM20 (18.9,25] SM20-24
4 21 Female Single SF21 (18.9,25] SF20-24
5 21 Male Single SM21 (18.9,25] SM20-24
6 22 Female Married MF22 (18.9,25] MF20-24
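As a possible alternative to the per-element function, the same binning can be sketched with cut itself by passing explicit breaks and labels. This is a sketch under assumptions: the age and prefix vectors are made-up toy values, and the bin edges are the ones from my.bins above. With right = FALSE each bin is closed on the left, so 25 falls into 25-29, and ages outside the edges come back as NA and are filled in afterwards.

```r
# Toy ages and marital/gender prefixes (assumed values)
age    <- c(19, 20, 24, 25, 31, 70)
prefix <- c("SM", "SF", "SM", "MF", "SF", "MM")

# Sorted bin edges; each bin is [lower, upper)
my.bins <- c(20, 25, 30, 35, 40, 50, 65)

# Labels such as "20-24", built from consecutive edges
labs <- paste0(head(my.bins, -1), "-", tail(my.bins, -1) - 1)

# right = FALSE closes intervals on the left; out-of-range ages become NA
grp <- as.character(cut(age, breaks = my.bins, labels = labs, right = FALSE))

# Fill in the two out-of-range cases
grp[age < min(my.bins)]  <- paste0("<", min(my.bins))
grp[age >= max(my.bins)] <- paste0(">=", max(my.bins))

ClassGrp <- paste0(prefix, grp)
ClassGrp
```

Because cut works on the whole vector at once, this avoids sapply. With the question's data, age could be data$age and prefix could be substring(data$class, 1, 2).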