Cut a range into equal-width bins with a minimum number of observations per bin

Posted: 2019-06-03 13:34:15

Tags: r

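I want to cut continuous data into equal-width bins. The bin width should be adjusted so that the smallest bin contains a specified number of observations. Is there already a function in R that does this?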

1 Answer:

Answer 0 (score: 1)

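I'm not aware of such a function. I would use a loop that increases the number of bins by one per iteration for as long as every bin still contains enough observations.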

equalBins <- function(values, min_per_bin){
  # a minimum below 1 would let the loop run forever
  stopifnot(min_per_bin >= 1)
  # before doing anything, check whether splitting into the smallest useful number of bins (2) is possible at all
  if(length(values) / 2 < min_per_bin){
    print("Can not cut variable with this min_per_bin")
  } else{
    lo <- min(values)
    hi <- max(values)
    # cut() needs distinct break points, so a constant vector cannot be binned
    if(lo == hi) stop("all values are identical")

    # start with one bin covering the whole range; it always meets the minimum
    bin_number <- 1
    cut_variable <- cut(values, c(lo, hi), include.lowest = TRUE)

    # add one bin per iteration and keep the new cut only while every bin
    # still holds at least min_per_bin observations; stop at the first failure
    repeat{
      # length.out gives exact end points, unlike seq(..., by = width)
      breaks <- seq(lo, hi, length.out = bin_number + 2)
      candidate <- cut(values, breaks, include.lowest = TRUE)
      if(min(table(candidate)) < min_per_bin) break
      cut_variable <- candidate
      bin_number <- bin_number + 1
    }
    return(cut_variable)
  }
}

Examples:

# some vector
vec <- 0:100

# min per bin 25
equalBins(values= vec, min_per_bin= 25)
Levels: [0,25] (25,50] (50,75] (75,100]

# min per bin 33
equalBins(values= vec, min_per_bin= 33)
Levels: [0,33.3] (33.3,66.7] (66.7,100]

# not possible to cut
equalBins(values= vec, min_per_bin= 89)
[1] "Can not cut variable with this min_per_bin"
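A possible variant, offered only as a sketch: since k bins that each hold at least min_per_bin observations need k * min_per_bin <= length(values), the search can be bounded up front instead of looping until a bin comes up short. The function name max_equal_width_bins is hypothetical. Unlike the loop above, it tests every k up to that bound and returns the largest valid one (rather than stopping at the first failure), and it returns 1 instead of printing a message when only a single bin works.

```r
# Hypothetical helper (not from any package): largest number of equal-width
# bins such that every bin holds at least min_per_bin observations.
max_equal_width_bins <- function(values, min_per_bin) {
  stopifnot(min_per_bin >= 1)
  lo <- min(values)
  hi <- max(values)
  # k bins with >= min_per_bin observations each require k * min_per_bin <= n
  k_max <- floor(length(values) / min_per_bin)
  # cut() needs distinct breaks, so constant input (or too few values) gives NA
  if (k_max < 1 || lo == hi) return(NA_integer_)
  ok <- vapply(seq_len(k_max), function(k) {
    breaks <- seq(lo, hi, length.out = k + 1)
    counts <- table(cut(values, breaks, include.lowest = TRUE))
    min(counts) >= min_per_bin
  }, logical(1))
  # k = 1 always passes here, so which(ok) is never empty
  max(which(ok))
}

vec <- 0:100
k <- max_equal_width_bins(vec, 30)
binned <- cut(vec, seq(min(vec), max(vec), length.out = k + 1), include.lowest = TRUE)
```

With min_per_bin = 30 this picks 3 bins, whereas stopping at the first count below 30 in the original while loop would have returned the 4-bin cut.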