I have a dataset in which the variable job (bank.full$job) has the following summary:
       admin.   blue-collar  entrepreneur     housemaid    management
         5171          9732          1487          1240          9458
      retired self-employed      services       student    technician
         2264          1579          4154           938          7597
   unemployed       unknown
         1303           288
This is the cross-tab of job against the target variable y, shown as proportions within each job:
                no   yes
admin.        0.88  0.12
blue-collar   0.93  0.07
entrepreneur  0.92  0.08
housemaid     0.92  0.08
management    0.87  0.13
retired       0.83  0.17
self-employed 0.89  0.11
services      0.91  0.09
student       0.72  0.28
technician    0.90  0.10
unemployed    0.84  0.16
unknown       0.89  0.11
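A table like this can be built in base R with table() followed by prop.table() over rows (margin = 1). This is a minimal sketch, assuming bank.full$y holds the "no"/"yes" outcome; the small data frame below is hypothetical toy data standing in for bank.full.

```r
# Hypothetical toy data standing in for bank.full (job and target y)
toy <- data.frame(
  job = c("student", "student", "retired", "admin.", "admin.", "admin."),
  y   = c("yes", "no", "no", "no", "yes", "no")
)

# Count job x y, then turn each row into proportions (margin = 1 means rows)
tab   <- table(toy$job, toy$y)
props <- round(prop.table(tab, margin = 1), 2)
print(props)
```

On the real data the same call would be round(prop.table(table(bank.full$job, bank.full$y), margin = 1), 2).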
Now I want to merge the job categories whose cross-tab values are similar. I tried these two approaches.
First approach:
bank.full$newjob<-ifelse(c(bank.full$job=='admin.',
bank.full$job=='self-employed',
bank.full$job=='unknown'),'CAT1',
ifelse(c(bank.full$job=='blue-collar',
bank.full$job=='entrepreneur'),'CAT2',
ifelse(c(bank.full$job=='housemaid',
bank.full$job=='services'),'CAT3',
ifelse(c(bank.full$job=='management',
bank.full$job=='unemployed',
bank.full$job=='technician'),'CAT4',
ifelse(bank.full$job=='student','student','retired')))))
Error in `$<-.data.frame`(`*tmp*`, newjob, value = c("CAT4", "retired", :
replacement has 135633 rows, data has 45211
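The error comes from c(): it concatenates the three logical vectors end to end instead of combining them element by element, so the test has 3 * 45211 = 135633 elements and ifelse() returns a vector of that length, which cannot be stored in a 45211-row data frame. An element-wise membership test such as %in% keeps the length equal to the number of rows. A minimal sketch on a hypothetical toy vector:

```r
# Hypothetical toy vector standing in for bank.full$job
job <- c("admin.", "student", "unknown", "retired")

# c() stacks the logical vectors: length 3 * length(job), not length(job)
bad_test <- c(job == "admin.", job == "self-employed", job == "unknown")
length(bad_test)   # 12

# %in% checks each element against the whole set: length(job)
good_test <- job %in% c("admin.", "self-employed", "unknown")
length(good_test)  # 4

newjob <- ifelse(good_test, "CAT1", "other")
print(newjob)      # "CAT1" "other" "CAT1" "other"
```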
Second approach:
bank.full$newjob<-ifelse(bank.full$job=='admin.','CAT1',
ifelse(bank.full$job=='self-employed','CAT1',
ifelse(bank.full$job=='unknown'),'CAT1',
ifelse(bank.full$job=='blue-collar','CAT2',
ifelse(bank.full$job=='entrepreneur','CAT2',
ifelse(bank.full$job=='housemaid','CAT3',
ifelse(bank.full$job=='services','CAT3',
ifelse(bank.full$job=='management','CAT4',
ifelse(bank.full$job=='unemployed','CAT4',
ifelse(bank.full$job=='technician','CAT4',"")))))))))
Error in ifelse(bank.full$job == "self-employed", "CAT1", ifelse(bank.full$job == :
unused arguments ("CAT1", ifelse(bank.full$job == "blue-collar", "CAT2", ifelse(bank.full$job ==
"entrepreneur", "CAT2", ifelse(bank.full$job == "housemaid", "CAT3", ifelse(bank.full$job == "services", "CAT3", ifelse(bank.full$job == "management", "CAT4", ifelse(bank.full$job == "unemployed", "CAT4",
ifelse(bank.full$job == "technician", "CAT4", ""))))))))
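In the second approach the third line closes ifelse(bank.full$job=='unknown') right after the test, so 'CAT1' and the whole remaining chain become extra arguments to the enclosing ifelse(), which only accepts test, yes and no; hence "unused arguments". Moving that parenthesis to the end fixes it. A sketch of the corrected nesting on a hypothetical toy vector (jobs not listed, such as student, still fall through to "" as in the original):

```r
# Hypothetical toy vector standing in for bank.full$job
job <- c("admin.", "unknown", "blue-collar", "services", "technician", "student")

# Every ifelse() gets exactly three arguments: test, yes, no.
# The 'unknown' call is now closed only after its own yes/no arguments.
newjob <- ifelse(job == "admin.", "CAT1",
          ifelse(job == "self-employed", "CAT1",
          ifelse(job == "unknown", "CAT1",
          ifelse(job == "blue-collar", "CAT2",
          ifelse(job == "entrepreneur", "CAT2",
          ifelse(job == "housemaid", "CAT3",
          ifelse(job == "services", "CAT3",
          ifelse(job == "management", "CAT4",
          ifelse(job == "unemployed", "CAT4",
          ifelse(job == "technician", "CAT4", ""))))))))))
print(newjob)
```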
I can get output with the partial version below, but as soon as I add all the conditions I get the errors shown above.
bank.full$newjob<-ifelse(bank.full$job=='admin.','CAT1',
ifelse(bank.full$job=='self-employed','CAT1',
ifelse(bank.full$job=='unknown','CAT1',
ifelse(c(bank.full$job=='blue-collar',bank.full$job=='entrepreneur'),'CAT2',""))))
bank.full$newjob<-as.factor(bank.full$newjob)
summary(bank.full$newjob)
         CAT1   CAT2
28441    7038   9732
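This version runs, but the counts show a silent problem: the unnamed first level is the "" fallback (28441 rows), and CAT2 holds only the 9732 blue-collar rows, with entrepreneur missing. The innermost c(...) again makes a test twice as long as the data, so that ifelse() returns a doubled vector; the enclosing ifelse() recycles or truncates its no argument to the length of its own test, so it only ever sees the first half (the blue-collar comparisons). A toy demonstration on a hypothetical vector:

```r
# Hypothetical toy vector standing in for bank.full$job
job <- c("blue-collar", "entrepreneur", "admin.")

# Inner test: c() of two length-3 logicals gives length 6
inner <- ifelse(c(job == "blue-collar", job == "entrepreneur"), "CAT2", "")
length(inner)  # 6

# The outer test has length 3, so only inner[1:3] (the blue-collar
# comparisons) is used; 'entrepreneur' falls through to ""
outer <- ifelse(job == "admin.", "CAT1", inner)
print(outer)   # "CAT2" "" "CAT1"
```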
Answer 0 (score: 0)
Try this approach:
bank.full$newjob<- 'CAT0'
bank.full$newjob<- ifelse(test= bank.full$job %in% c('admin.','self-employed','unknown'), yes='CAT1',no=bank.full$job)
bank.full$newjob<- ifelse(test= bank.full$job %in% c('blue-collar','entrepreneur'), yes='CAT2',no=bank.full$job)
bank.full$newjob<- ifelse(test= bank.full$job %in% c('management','unemployed','technician'), yes='CAT3',no=bank.full$job)
This approach should work. Personally I would do something else: combine the factor levels (search for it).
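As written, each ifelse() line above passes no=bank.full$job, so every line starts again from the original column and overwrites the previous assignment; only the last grouping survives, which fits the later report that only CAT4 appeared alongside the old levels. Feeding the running result back in as no chains the steps. Also, if job is a factor, ifelse() with a factor no may return its integer codes, so starting from a character copy is safer. A minimal sketch keeping the answer's %in% idea and the question's CAT1 to CAT4 grouping, on a hypothetical toy factor:

```r
# Hypothetical toy factor standing in for bank.full$job
job <- factor(c("admin.", "blue-collar", "management", "student", "housemaid"))

# Work on a character copy so factor codes never leak into the result
newjob <- as.character(job)

# Each step passes the previous result as `no`, so earlier groupings survive
newjob <- ifelse(newjob %in% c("admin.", "self-employed", "unknown"), "CAT1", newjob)
newjob <- ifelse(newjob %in% c("blue-collar", "entrepreneur"), "CAT2", newjob)
newjob <- ifelse(newjob %in% c("housemaid", "services"), "CAT3", newjob)
newjob <- ifelse(newjob %in% c("management", "unemployed", "technician"), "CAT4", newjob)

print(newjob)  # "CAT1" "CAT2" "CAT4" "student" "CAT3"
```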
Answer 1 (score: 0)
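Thanks for your answer, @Zahiro Mor. The approach you suggested didn't work well for me: I only got the CAT4 level plus the same levels as before. But I tried the combineLevels function you mentioned, and it works well: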
install.packages("rockchalk")   # one-time install
library(rockchalk)              # provides combineLevels()
levels(bank.full$job)           # inspect the current job levels
bank.full$job<-combineLevels(bank.full$job,levs =c('admin.','self-employed','unknown'),newLabel = 'CAT1' )
bank.full$job<-combineLevels(bank.full$job,levs =c('blue-collar','entrepreneur'),newLabel = 'CAT2' )
bank.full$job<-combineLevels(bank.full$job,levs =c('housemaid','services'),newLabel = 'CAT3' )
bank.full$job<-combineLevels(bank.full$job,levs =c('management','unemployed','technician'),newLabel = 'CAT4' )
bank.full$job<-combineLevels(bank.full$job,levs =c('student','retired'),newLabel = 'CAT5' )
This is the output I get after running the last line:
The original levels retired student CAT1 CAT2 CAT3 CAT4
have been replaced by CAT1 CAT2 CAT3 CAT4 CAT5
So far, this is the simplest way of replacing the levels that I have come across.
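For comparison, base R can also merge factor levels without an extra package: assigning a named list to levels() maps each new label to the old levels it absorbs (see ?levels). This is a sketch assuming job is a factor, shown on a hypothetical toy factor with the same five groups as the combineLevels() calls above. Any existing level that is not listed becomes NA, so every level must be covered.

```r
# Hypothetical toy factor standing in for bank.full$job
job <- factor(c("admin.", "student", "technician", "retired", "services", "unknown"))

# Named list: new label = old levels it absorbs.
# Levels present in the factor but missing from the list would become NA.
levels(job) <- list(
  CAT1 = c("admin.", "self-employed", "unknown"),
  CAT2 = c("blue-collar", "entrepreneur"),
  CAT3 = c("housemaid", "services"),
  CAT4 = c("management", "unemployed", "technician"),
  CAT5 = c("student", "retired")
)

print(table(job))
```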