R: Creating uneven factor levels for a numeric variable

Time: 2017-06-08 23:22:59

Tags: r quantile

I have a set of values (100,000 entries) ranging from -0.20 to +0.15; they are percentage returns.

A large share of the values lies between -3.5% and +3.5%.

I would like to convert this into a factor as follows:

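  • any return between -0.035 and +0.035 is binned into equal-width bins in increments of 0.005,
  • anything between -0.2 and -0.035 is merged into a single factor level, and
  • anything between 0.035 and 0.15 is merged into a single factor level.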

Any ideas on how to achieve this in R? I did try cut, but it seems to bin only in equal increments.

1 answer:

Answer 0: (score: 1)

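So I generated some data to hold the values (drawn uniformly over -0.2 to 0.15 with runif, rather than concentrated near zero like the real returns) and built the uneven levels: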

 library(data.table)

 set.seed(555)        # make the example reproducible
 N    <- 100000       # number of pseudo-random values to generate
 min1 <- -0.035       # lower edge of the finely binned range
 max1 <-  0.035       # upper edge of the finely binned range
 incr <- 0.005        # bin width inside (min1, max1]

 samp <- runif(N, min = -0.2, max = 0.15)   # simulated returns

 # One wide level for everything at or below min1 (NA elsewhere)
 level1 <- as.factor(ifelse(samp <= min1, paste0("(", min(samp), ",", min1, "]"), NA))
 # One wide level for everything at or above max1 (NA elsewhere)
 level2 <- as.factor(ifelse(samp >= max1, paste0("[", max1, ",", max(samp), ")"), NA))
 # Equal-width levels of size incr in between (NA outside that range)
 level3 <- cut(samp, seq(min1, max1, by = incr))

 dt <- data.table(samp, level1, level2, level3)   # put everything together
 # Each row has one non-NA level: flatten row by row and drop the NAs
 mylevels <- na.omit(unlist(matrix(t(dt[, -1]))))
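
The last line is the least obvious step, so here is a minimal sketch of what it relies on, using a hypothetical three-row table (the name `toy` and its labels are made up for illustration, and base R is used so it runs without data.table). It assumes every row has exactly one non-NA label; if a row had two, the flattened vector would no longer line up with the rows.

```r
# Toy table: each row carries exactly one non-NA label (hypothetical values).
toy <- data.frame(
  level1 = c("low", NA, NA),
  level2 = c(NA, NA, "high"),
  level3 = c(NA, "mid", NA),
  stringsAsFactors = FALSE
)

# t() gives a character matrix whose columns are the original rows;
# as.vector() reads it column by column, i.e. row by row of `toy`.
flat <- as.vector(t(toy))

# Dropping the NAs leaves one label per original row, in row order.
labels <- flat[!is.na(flat)]

# Guard for the one-label-per-row assumption.
stopifnot(length(labels) == nrow(toy))
labels
```
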

To display the result more clearly:

 mylevels <- factor(mylevels, levels = unique(mylevels))   # levels in order of first appearance
 dt2 <- dt[, .(samp, levels = mylevels)]
 dt2
            samp                      levels
     1: -0.07023653 (-0.199996188434307,-0.035]
     2:  0.10889991   [0.035,0.149995080730878)
     3:  0.04246077   [0.035,0.149995080730878)
     4: -0.01193010              (-0.015,-0.01]
     5:  0.02607736                (0.025,0.03]
   ---                                        
 99996: -0.04786692 (-0.199996188434307,-0.035]
 99997: -0.08700210 (-0.199996188434307,-0.035]
 99998:  0.09989973   [0.035,0.149995080730878)
 99999:  0.10095336   [0.035,0.149995080730878)
100000: -0.05555869 (-0.199996188434307,-0.035]
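
As a possible alternative, the same three-part binning can be sketched with a single cut() call, because its breaks argument accepts any increasing vector of cut points, not only equally spaced ones. This is a sketch under the assumption that all values lie within -0.2 and 0.15, as stated in the question; the names `inner`, `breaks` and `binned` are illustrative.

```r
set.seed(555)                                    # reproducible example data
samp <- runif(100000, min = -0.2, max = 0.15)    # simulated returns

# Unequal break points: one wide bin on each side,
# 0.005-wide bins between -0.035 and 0.035.
inner  <- seq(-0.035, 0.035, by = 0.005)
breaks <- c(-0.2, inner, 0.15)

# include.lowest = TRUE makes the first bin closed at -0.2.
binned <- cut(samp, breaks = breaks, include.lowest = TRUE)

nlevels(binned)   # one level per interval: length(breaks) - 1
table(binned)     # counts per bin
```

With this approach a value of exactly 0.035 falls in the last narrow bin rather than in the upper wide one, since cut() closes intervals on the right by default. If the real data might exceed the stated range, replacing the outer break points with -Inf and Inf would avoid NA results.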