I want to merge categories in a factor variable to reduce the number of levels

Time: 2019-11-28 09:05:49

Tags: r if-statement

I have a dataset with a job variable. This is its summary (bank.full$job):

       admin.   blue-collar  entrepreneur     housemaid    management 
         5171          9732          1487          1240          9458 
      retired self-employed      services       student    technician 
         2264          1579          4154           938          7597 
   unemployed       unknown 
         1303           288 

This is the cross-table of that variable against the target variable y, as proportions:

                no  yes
  admin.        0.88 0.12
  blue-collar   0.93 0.07
  entrepreneur  0.92 0.08
  housemaid     0.92 0.08
  management    0.87 0.13
  retired       0.83 0.17
  self-employed 0.89 0.11
  services      0.91 0.09
  student       0.72 0.28
  technician    0.90 0.10
  unemployed    0.84 0.16
  unknown       0.89 0.11

Now I want to merge the job categories whose cross-table values are similar. I tried these two approaches.

 bank.full$newjob<-ifelse(c(bank.full$job=='admin.',
                            bank.full$job=='self-employed',
                            bank.full$job=='unknown'),'CAT1',
                   ifelse(c(bank.full$job=='blue-collar',
                            bank.full$job=='entrepreneur'),'CAT2',
                   ifelse(c(bank.full$job=='housemaid',
                            bank.full$job=='services'),'CAT3',
                   ifelse(c(bank.full$job=='management',
                            bank.full$job=='unemployed',
                            bank.full$job=='technician'),'CAT4',
                   ifelse(bank.full$job=='student','student','retired')))))
Error in `$<-.data.frame`(`*tmp*`, newjob, value = c("CAT4", "retired",  : 
  replacement has 135633 rows, data has 45211

Second approach:

bank.full$newjob<-ifelse(bank.full$job=='admin.','CAT1',
                   ifelse(bank.full$job=='self-employed','CAT1',
                   ifelse(bank.full$job=='unknown'),'CAT1',
                   ifelse(bank.full$job=='blue-collar','CAT2',
                   ifelse(bank.full$job=='entrepreneur','CAT2',
                   ifelse(bank.full$job=='housemaid','CAT3',
                   ifelse(bank.full$job=='services','CAT3',
                   ifelse(bank.full$job=='management','CAT4',
                   ifelse(bank.full$job=='unemployed','CAT4',
                   ifelse(bank.full$job=='technician','CAT4',"")))))))))
Error in ifelse(bank.full$job == "self-employed", "CAT1", ifelse(bank.full$job ==  : 
  unused arguments ("CAT1", ifelse(bank.full$job == "blue-collar", "CAT2", ifelse(bank.full$job == 
"entrepreneur", "CAT2", ifelse(bank.full$job == "housemaid", "CAT3", ifelse(bank.full$job == "services", "CAT3", ifelse(bank.full$job == "management", "CAT4", ifelse(bank.full$job == "unemployed", "CAT4",
 ifelse(bank.full$job == "technician", "CAT4", ""))))))))

I can get output up to this point, but as soon as I add all the if conditions I get an error:

bank.full$newjob<-ifelse(bank.full$job=='admin.','CAT1',
+                          ifelse(bank.full$job=='self-employed','CAT1',
+                                 ifelse(bank.full$job=='unknown','CAT1',
+ ifelse(c(bank.full$job=='blue-collar',bank.full$job=='entrepreneur'),'CAT2',""))))
> bank.full$newjob<-as.factor(bank.full$newjob)
> summary(bank.full$newjob)
       CAT1  CAT2 
28441  7038  9732 

2 answers:

Answer 0 (score: 0)

Try this approach:

bank.full$newjob<- 'CAT0'
bank.full$newjob<- ifelse(test= bank.full$job %in% c('admin.','self-employed','unknown'), yes='CAT1',no=bank.full$job)
bank.full$newjob<- ifelse(test= bank.full$job %in% c('blue-collar','entrepreneur'), yes='CAT2',no=bank.full$job)
bank.full$newjob<- ifelse(test= bank.full$job %in% c('management','unemployed','technician'), yes='CAT3',no=bank.full$job)

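This approach will work. I would do something else, though: merge the factor levels (search for it).

Answer 1 (score: 0)

Thanks for your answer @Zahiro Mor. The approach you described did not work well, because I only got the CAT4 level along with the same levels as before. But I tried the combineLevels function you mentioned, and it worked well.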
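
For reference, here is a minimal sketch of the nested ifelse() idea with one test per group. Wrapping several comparisons in c() concatenates them, so three tests on a 45211-row column produce 135633 values, which matches the first error in the question; %in% returns exactly one TRUE/FALSE per row. The small data frame below is a made-up stand-in for bank.full, and converting job with as.character() is an assumption about how the column is stored.

```r
# Toy stand-in for bank.full (hypothetical rows, not the real data)
bank.full <- data.frame(job = factor(c("admin.", "blue-collar", "student",
                                       "services", "unknown", "management",
                                       "retired", "housemaid")))

# Work on plain strings so the fall-through branch returns names, not factor codes
job <- as.character(bank.full$job)

# c() concatenates the tests, so the result is longer than the column
length(c(job == "admin.", job == "unknown"))   # twice nrow(bank.full)

# %in% gives one TRUE/FALSE per row; each `no` branch falls through to the next test
bank.full$newjob <- ifelse(job %in% c("admin.", "self-employed", "unknown"), "CAT1",
                    ifelse(job %in% c("blue-collar", "entrepreneur"), "CAT2",
                    ifelse(job %in% c("housemaid", "services"), "CAT3",
                    ifelse(job %in% c("management", "unemployed", "technician"), "CAT4",
                           job))))   # student and retired keep their own names

bank.full$newjob <- factor(bank.full$newjob)
table(bank.full$newjob)
```

The key point is that every inner ifelse() sits in the `no` slot of the one before it, so earlier assignments are never overwritten by a later call.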

install.packages("rockchalk");library(rockchalk)
levels(bank.full$month)
bank.full$job<-combineLevels(bank.full$job,levs =c('admin.','self-employed','unknown'),newLabel = 'CAT1' )
bank.full$job<-combineLevels(bank.full$job,levs =c('blue-collar','entrepreneur'),newLabel = 'CAT2' )
bank.full$job<-combineLevels(bank.full$job,levs =c('housemaid','services'),newLabel = 'CAT3' )
bank.full$job<-combineLevels(bank.full$job,levs =c('management','unemployed','technician'),newLabel = 'CAT4' )
bank.full$job<-combineLevels(bank.full$job,levs =c('student','retired'),newLabel = 'CAT5' )

This is the output I got after running the last line:

The original levels retired student CAT1 CAT2 CAT3 CAT4 
have been replaced by CAT1 CAT2 CAT3 CAT4 CAT5 

So far, the simplest method I have come across is the replace function.
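
If the replace function mentioned above is read as a base R replacement function such as `levels<-` (an assumption, since no function is named), the same grouping can be sketched without an extra package. The job vector below is a made-up stand-in for bank.full$job.

```r
# Toy stand-in for bank.full$job (hypothetical values, not the real data)
job <- factor(c("admin.", "blue-collar", "student", "services", "unknown",
                "management", "retired", "housemaid"))

newjob <- job

# Assigning a named list to levels(): each name is a new level,
# each value lists the old levels merged into it
levels(newjob) <- list(
  CAT1    = c("admin.", "self-employed", "unknown"),
  CAT2    = c("blue-collar", "entrepreneur"),
  CAT3    = c("housemaid", "services"),
  CAT4    = c("management", "unemployed", "technician"),
  student = "student",
  retired = "retired"
)

table(newjob)
```

Old levels left out of the list turn into NA, which is why student and retired are listed explicitly here.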