Grouping a variable with many levels into categories

Time: 2012-09-10 17:02:30

Tags: r

Let's say I have a factor variable with many levels, and I am trying to group those levels into a few categories.

> levels(dat$years_continuously_insured_order2)
 [1] "1"    "2"    "3"    "4"    "5"    "6"    "7"    "8"    "9"    "10"   "11"   "12"   "13"   "14"   "15"   "16"   "17"   "18"  
[19] "19"   "20" 

> levels(dat$age_of_oldest_driver)
 [1] "-16" "1"   "15"  "16"  "17"  "18"  "19"  "20"  "21"  "22"  "23"  "24"  "25"  "26"  "27"  "28"  "29"  "30"  "31"  "32"  "33" 
[22] "34"  "35"  "36"  "37"  "38"  "39"  "40

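I have a script that runs through these variables and bins them into categories. However, each time the script runs, the number of levels may be (and usually is) different. So if the original code I use to group a variable looks like the code below, and my script runs again an hour later with different levels, that code no longer works. Instead of 15 levels I might now have 25 levels with different values, but I still need to group them into the same specific categories.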

dat$years_continuously_insured2 <- NA
dat$years_continuously_insured2[dat$years_continuously_insured %in% levels(dat$years_continuously_insured)[1]] <- NA
dat$years_continuously_insured2[dat$years_continuously_insured %in% levels(dat$years_continuously_insured)[2:3]] <- "1 or less"
dat$years_continuously_insured2[dat$years_continuously_insured %in% levels(dat$years_continuously_insured)[4]] <- "2"
dat$years_continuously_insured2[dat$years_continuously_insured %in% levels(dat$years_continuously_insured)[5:7]] <- "3 +"
dat$years_continuously_insured2 <- factor(dat$years_continuously_insured2)

How can I find a more elegant way to group a variable into segments? Is there a better way to do this in R?

Thanks!

2 answers:

Answer 0: (score: 2)

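You could convert the factor levels of the continuously-insured variable to numeric, then cut() the values into your categories and turn the result back into a factor. The first step is described in the R-FAQ (doing it correctly is a two-step process: as.character() first, then as.numeric()):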

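 # Two-step conversion: factor -> character -> numeric, then bin by value.
 # Passing labels to cut() keeps all three categories even if one is empty.
 dat$years_cont <- cut(as.numeric(as.character(
                           dat$years_continuously_insured_order2)),
                       breaks = c(0, 2, 3, Inf), right = FALSE,
                       labels = c("1 or less", "2", "3 +"))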
#-----------------
> str(dat)
'data.frame':   100 obs. of  2 variables:
 $ years_continuously_insured_order2: Factor w/ 20 levels "1","10","11",..: 4 15 19 5 8 4 16 12 12 18 ...
 $ years_cont                       : Factor w/ 3 levels "1 or less","2",..: 3 3 3 3 3 3 3 2 2 3 ...
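
To illustrate why the conversion needs two steps, here is a minimal sketch on a toy factor (the names `f` and `binned` are made up for this example and are not part of the original data). Calling as.numeric() directly on a factor returns the internal level codes, which follow the alphabetical ordering of the levels seen in the str() output above, rather than the values themselves:

```r
# Toy factor; its levels sort as character strings: "1", "10", "2", "3".
f <- factor(c("1", "2", "10", "3"))

as.numeric(f)                # level codes: 1 3 2 4 (not the original values)
as.numeric(as.character(f))  # original values: 1 2 10 3

# Bin the recovered values with the same breaks as above.
binned <- cut(as.numeric(as.character(f)),
              breaks = c(0, 2, 3, Inf), right = FALSE,
              labels = c("1 or less", "2", "3 +"))
as.character(binned)         # "1 or less" "2" "3 +" "3 +"
```

Because the bins are defined on values rather than on level positions, the same call should keep working when the set of levels changes between runs.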

Answer 1: (score: 0)

If your original column holds numbers, treat it as numeric rather than as a factor. An easier way to do what you are doing is:

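# Map numeric years to one of three category labels.
bin.value = function(x) {
    ifelse(x <= 1, "1 or less", ifelse(x == 2, "2", "3+"))
}

# as.character() first: as.integer() on a factor returns level codes, not the values.
dat$years_continuously_insured2 = as.factor(bin.value(as.integer(as.character(dat$years_continuously_insured))))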
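
As a rough check that this approach does not depend on how many levels a given run produces, here is a small sketch with two made-up inputs (`run1`, `run2`, and the helper `to_int` are hypothetical names used only for illustration):

```r
bin.value = function(x) {
    ifelse(x <= 1, "1 or less", ifelse(x == 2, "2", "3+"))
}

# Hypothetical helper: recover the integer values stored in a factor's labels.
to_int = function(f) as.integer(as.character(f))

# Two hypothetical script runs that produce different sets of levels.
run1 = factor(c("0", "1", "2", "3"))
run2 = factor(c("1", "2", "5", "12", "20"))

res1 = bin.value(to_int(run1))  # "1 or less" "1 or less" "2" "3+"
res2 = bin.value(to_int(run2))  # "1 or less" "2" "3+" "3+" "3+"
```

Both runs end up with the same three category labels, since the rule looks at each value and never at its position among the levels.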