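I want to cut continuous data into equal-width bins. The bin width should be chosen so that the smallest number of observations in any bin matches a specified minimum. Is there already a function in R that does this?
Answer 0 (score: 1)
I don't know of such a function. I would use a while loop that increases the number of bins on each iteration for as long as every bin still contains enough observations.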
equalBins <- function(values, min_per_bin) {
  # a minimum below 1 would never stop the loop below
  stopifnot(min_per_bin >= 1)
  # before doing anything, check whether the smallest useful split (2 bins)
  # can hold min_per_bin observations per bin at all
  if (length(values) / 2 < min_per_bin) {
    print("Can not cut variable with this min_per_bin")
  } else {
    # cut the variable from min to max into n equal-width bins;
    # length.out guarantees the last break is exactly max(values),
    # which a floating-point step width can miss
    cut_into <- function(n) {
      cut(values, seq(min(values), max(values), length.out = n + 1),
          include.lowest = TRUE)
    }
    # start with one bin covering the whole range
    bin_number <- 1
    # add one bin per iteration, but only while the finer cut still has
    # at least min_per_bin observations in its smallest bin
    while (min(table(cut_into(bin_number + 1))) >= min_per_bin) {
      bin_number <- bin_number + 1
    }
    # return the last cut that satisfied the minimum
    return(cut_into(bin_number))
  }
}
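To see where a loop like this has to stop, it can help to tabulate the smallest bin count for each candidate number of bins. The following is only a minimal sketch: `min_count` is a hypothetical helper name that is not part of the answer, and it assumes the breaks are built with `seq(..., length.out = n + 1)`.

```r
# Hypothetical helper (not from the answer): size of the smallest bin
# when `values` is cut into n equal-width bins.
min_count <- function(values, n) {
  breaks <- seq(min(values), max(values), length.out = n + 1)
  min(table(cut(values, breaks, include.lowest = TRUE)))
}

vec <- 0:100
# Smallest bin count for 1 to 5 bins
counts <- sapply(1:5, function(n) min_count(vec, n))
print(data.frame(bins = 1:5, smallest_bin = counts))
# smallest_bin: 101, 50, 33, 25, 20
```

For a minimum of 30 observations per bin, for instance, three bins is the finest split that still qualifies here, because four bins already drop the smallest bin to 25.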
Example:
# some vector
vec <- 0:100
# min per bin 25
equalBins(values= vec, min_per_bin= 25)
Levels: [0,25] (25,50] (50,75] (75,100]
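The printed levels do not show how many observations fall into each bin. As a quick check, and assuming the four equal-width bins shown for this example, the split can be rebuilt directly with `cut` and counted with `table`:

```r
vec <- 0:100
# Four equal-width bins need five breaks: 0, 25, 50, 75, 100
bins <- cut(vec, seq(min(vec), max(vec), length.out = 5),
            include.lowest = TRUE)
# Observations per bin
print(table(bins))
# counts: 26, 25, 25, 25
```

The first bin holds one extra observation because `include.lowest = TRUE` closes it on both sides, so it contains 0 as well as 25.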
# min per bin 33
equalBins(values= vec, min_per_bin= 33)
Levels: [0,33.3] (33.3,66.7] (66.7,100]
# not possible to cut
equalBins(values= vec, min_per_bin= 89)
"Can not cut variable with this min_per_bin"