Let's say I have a factor variable with many levels, and I'm trying to group them into a few categories.
> levels(dat$years_continuously_insured_order2)
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14" "15" "16" "17" "18"
[19] "19" "20"
> levels(dat$age_of_oldest_driver)
[1] "-16" "1" "15" "16" "17" "18" "19" "20" "21" "22" "23" "24" "25" "26" "27" "28" "29" "30" "31" "32" "33"
[22] "34" "35" "36" "37" "38" "39" "40
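I have a script that runs over these variables and bins them into categories. However, the number of levels can be (and usually is) different each time the script runs. So if I group the variable with code like the one below, it stops working an hour later when the script runs again with different levels. Instead of 15 levels I might now have 25, with different values, but I still need to group them into the same specific categories.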
dat$years_continuously_insured2 <- NA
dat$years_continuously_insured2[dat$years_continuously_insured %in% levels(dat$years_continuously_insured)[1]] <- NA
dat$years_continuously_insured2[dat$years_continuously_insured %in% levels(dat$years_continuously_insured)[2:3]] <- "1 or less"
dat$years_continuously_insured2[dat$years_continuously_insured %in% levels(dat$years_continuously_insured)[4]] <- "2"
dat$years_continuously_insured2[dat$years_continuously_insured %in% levels(dat$years_continuously_insured)[5:7]] <- "3 +"
dat$years_continuously_insured2 <- factor(dat$years_continuously_insured2)
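To see why selecting levels by position is fragile, here is a minimal sketch with two hypothetical runs of the same script (`run1` and `run2` are made-up data, not from the question). Factor levels are sorted as strings, so "10" comes before "2" (the same ordering visible in the `str()` output further down), and the slice `[2:3]` points at different values as soon as the set of observed values changes.

```r
# Hypothetical data: two runs of the same script see different values
run1 <- factor(c("1", "2", "3", "10"))
run2 <- factor(c("0", "1", "2", "3", "10"))

levels(run1)        # "1" "10" "2" "3"   (string order, not numeric order)
levels(run2)        # "0" "1" "10" "2" "3"

# The same positional slice selects different values in each run
levels(run1)[2:3]   # "10" "2"
levels(run2)[2:3]   # "1" "10"
```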
How can I group the variable into segments more elegantly? Is there a better way to do this in R?
Thanks!
Answer 0 (score: 2)
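You can convert the factor levels of the continuously-insured variable to numbers and then cut() the numbers into your categories, which gives you a new factor. The first step is described in the R FAQ (doing it correctly is a two-step process):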
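# Convert factor -> character -> numeric (the two-step R FAQ idiom), then bin.
# right = FALSE gives the intervals [0,2), [2,3) and [3,Inf).
# labels go inside cut() so all three levels survive even if a bin is empty.
dat$years_cont <- cut(as.numeric(as.character(dat$years_continuously_insured_order2)),
                      breaks = c(0, 2, 3, Inf), right = FALSE,
                      labels = c("1 or less", "2", "3 +"))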
#-----------------
> str(dat)
'data.frame': 100 obs. of 2 variables:
$ years_continuously_insured_order2: Factor w/ 20 levels "1","10","11",..: 4 15 19 5 8 4 16 12 12 18 ...
$ years_cont : Factor w/ 3 levels "1 or less","2",..: 3 3 3 3 3 3 3 2 2 3 ...
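A minimal, self-contained sketch of why this approach survives a changing set of levels, assuming two hypothetical runs (`run1`, `run2`) and hypothetical helper names `to_numeric()` and `bin_years()`. Because the bins are defined on the numeric values rather than on level positions, the same `breaks` and `labels` apply no matter how many levels a run happens to have. It also shows why the two-step conversion matters: `as.numeric()` on a factor returns the internal level codes.

```r
# Hypothetical data: two runs with different sets of levels
run1 <- factor(c("1", "2", "3", "10", "2"))
run2 <- factor(c("4", "1", "25", "2", "17", "3"))

# as.numeric() on a factor returns level codes, not values
as.numeric(run1)              # 1 3 4 2 3  (levels are "1" "10" "2" "3")

# Hypothetical helper: the two-step conversion factor -> character -> numeric
to_numeric <- function(f) as.numeric(as.character(f))

# Hypothetical helper: fixed breaks and labels, reused for every run
bin_years <- function(f) {
  cut(to_numeric(f), breaks = c(0, 2, 3, Inf), right = FALSE,
      labels = c("1 or less", "2", "3 +"))
}

b1 <- bin_years(run1)
b2 <- bin_years(run2)
levels(b1)          # "1 or less" "2" "3 +"
as.character(b1)    # "1 or less" "2" "3 +" "3 +" "2"
as.character(b2)    # "3 +" "1 or less" "3 +" "2" "3 +" "3 +"
```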
Answer 1 (score: 0)
If your original column is numeric, treat it as numeric rather than as a factor. A simpler way to do what you're doing is:
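# Bin a numeric vector into "1 or less", "2" or "3+"
bin.value = function(x) {
  ifelse(x <= 1, "1 or less", ifelse(x == 2, "2", "3+"))
}
# as.character() first: as.integer() on a factor returns level codes, not values
dat$years_continuously_insured2 = as.factor(bin.value(as.integer(as.character(dat$years_continuously_insured))))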
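A short usage sketch of `bin.value()` on hypothetical inputs (`v` and `f` are made up for illustration). It contrasts a plain numeric vector with the same values stored as a factor: converting the factor directly with `as.integer()` bins the level codes, while going through `as.character()` first bins the actual values. `NA` inputs stay `NA` because `ifelse()` propagates them.

```r
bin.value <- function(x) {
  ifelse(x <= 1, "1 or less", ifelse(x == 2, "2", "3+"))
}

# Hypothetical inputs: numeric values, and the same values stored as a factor
v <- c(0, 1, 2, 3, 17, NA)
f <- factor(v)      # levels "0" "1" "2" "3" "17"

bin.value(v)                            # correct: binned on the values
bin.value(as.integer(as.character(f)))  # correct: same result as for v
bin.value(as.integer(f))                # wrong: binned on level codes 1..5
```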