Equal-width discretization and categorization of a continuous attribute in a data frame

Time: 2019-02-03 05:46:39

Tags: r discretization

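One attribute in my data frame (aggregatedInocme) is continuous, and I want to create a new attribute with the categories (low, medium, high) based on its values. I split the values into three equal-width ranges, as shown in the code below.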

I wrote some simple code using a for loop: an if statement checks which range each cell's value falls into and assigns the corresponding string to it.

y<-min(data_loanapp$aggregatedInocme)-0
x<-max(data_loanapp$aggregatedInocme)-min(data_loanapp$aggregatedInocme)
c1<-(y+(x/3))
c2<- (y+((2*x)/3))
rr <- c()
 for (val in data_loanapp$aggregatedInocme){
   if(val<= c1) {
      rr[val]<- append(rr[val], 'Low')
     }else if (c1< val<= c2){
      rr[val]<-append(rr[val], "mid")
     }else
      rr[val]<-append(rr[val], "high")
}

rr

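I expected to get an attribute with the values (low, medium, high). Instead I keep getting an attribute that is all NA, along with a warning and an error:

Warning message:
In rr[val] <- append(rr[val], "high") :
  number of items to replace is not a multiple of replacement length

}
Error: unexpected '}' in "}"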
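Both messages can be traced to the loop itself. R does not allow a chained comparison such as `c1 < val <= c2`, so the parser gives up there and the closing `}` is then read on its own. Separately, `rr[val]` indexes the result by the income value rather than by row position, and `append()` returns two elements for a one-element slot, which triggers the warning and leaves NA behind. A minimal sketch of a corrected loop follows; the `income` vector is made-up sample data standing in for `data_loanapp$aggregatedInocme`, and `lo` and `width` are names introduced here for illustration.

```r
# Hypothetical sample data standing in for data_loanapp$aggregatedInocme
income <- c(1442, 3000, 4583, 6588, 30000, 55000, 81000)

lo    <- min(income)
width <- (max(income) - lo) / 3   # equal bin width over the observed range
c1    <- lo + width               # upper edge of the "Low" bin
c2    <- lo + 2 * width           # upper edge of the "mid" bin

rr <- character(length(income))   # one label per row, preallocated
for (i in seq_along(income)) {    # loop over positions, not over values
  val <- income[i]
  if (val <= c1) {
    rr[i] <- "Low"
  } else if (val > c1 && val <= c2) {  # two comparisons joined with &&
    rr[i] <- "mid"
  } else {
    rr[i] <- "high"
  }
}
rr
```

Indexing by position also means duplicate income values each get their own label.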

1 Answer:

Answer 0: (score: 0)

I figured it out:

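# classIntervals() was used only to find the bin boundaries
library(classInt)
classIntervals(data_loanapp$aggregatedInocme, 3)
# a classIntervals object is a list, not a column, so keep it in its own variable
intervals <- classIntervals(data_loanapp$aggregatedInocme, 3, style = 'equal')
# here I defined and created the categories from the break points;
# rightmost.closed = TRUE keeps a value equal to the last break in "high" instead of NA
data_loanapp$Income_Cat <- c("low", "medium", "high")[
  findInterval(data_loanapp$aggregatedInocme, c(1442, 4583, 6588, 81000),
               rightmost.closed = TRUE)]
data_loanapp$Income_Cat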
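
The break points in the answer are typed in by hand, so they have to be looked up again whenever the data changes. As a possible alternative, base R's `cut()` can derive three equal-width bins directly from the observed range. This is only a sketch and not the answerer's method; `income` is made-up sample data standing in for `data_loanapp$aggregatedInocme`.

```r
# Hypothetical sample data standing in for data_loanapp$aggregatedInocme
income <- c(1442, 3000, 4583, 6588, 30000, 55000, 81000)

# breaks = 3 asks cut() to split the range of the data into three
# equal-width intervals; the outer edges are widened slightly so that
# the minimum and maximum both fall inside a bin.
income_cat <- cut(income, breaks = 3, labels = c("low", "medium", "high"))

# The result is a factor with one label per input value
table(income_cat)
```

With a data frame, the same call could be assigned to a new column, for example `data_loanapp$Income_Cat <- cut(data_loanapp$aggregatedInocme, breaks = 3, labels = c("low", "medium", "high"))`. The column is then a factor rather than a character vector.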