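I am trying to bin a variable in R, and I want to set the width of the bins myself. The variable would be binned based on the first column, and R should assign the bins according to the following proportions: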
bin1 = 0.1
bin2 = 0.4
bin3 = 0.3
bin4 = 0.2
The output would look like this:
var_to_bin binned_var
1 1
2 2
3 2
4 2
5 2
6 3
7 3
8 3
9 4
10 4
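Does anyone know a way to do this? The binning functions I have found let me set bin ranges in terms of the values of var_to_bin, but I want R to set the bins automatically as quantiles of pre-specified sizes.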
Answer 0 (score: 1)
You can do this with findInterval, quantile, and cumsum.
dat$newBin <- findInterval(dat$var_to_bin,
vec=quantile(dat$var_to_bin, probs=cumsum(myProbs)),
rightmost.closed=TRUE) + 1L
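Here, findInterval takes the vector to bin and a vector of cut points. The cut points are built with quantile, which is given the cumulative sum of the desired bin proportions. The last argument, rightmost.closed = TRUE, treats the rightmost interval as closed, so the maximum value (which equals the last cut point) lands in the last bin instead of opening a new one. Adding 1L shifts the result so that the bins are numbered from 1.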
This returns
dat
var_to_bin binned_var newBin
1 1 1 1
2 2 2 2
3 3 2 2
4 4 2 2
5 5 2 2
6 6 3 3
7 7 3 3
8 8 3 3
9 9 4 4
10 10 4 4
data
dat <-
structure(list(var_to_bin = 1:10, binned_var = c(1L, 2L, 2L,
2L, 2L, 3L, 3L, 3L, 4L, 4L)), .Names = c("var_to_bin", "binned_var"
), class = "data.frame", row.names = c(NA, -10L))
myProbs <- c(.1, .4, .3, .2)
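The one-liner in this answer can be unpacked into its three steps, which makes the intermediate cut points visible. This is only a sketch: the names upper_probs, cut_points, and bins are illustrative and do not appear in the original answer.

```r
# Bin widths from the question, as proportions of the data (should sum to 1).
myProbs <- c(0.1, 0.4, 0.3, 0.2)
var_to_bin <- 1:10

# Step 1: cumulative proportions give the upper quantile level of each bin.
upper_probs <- cumsum(myProbs)

# Step 2: convert those levels to cut points on the scale of the data.
cut_points <- quantile(var_to_bin, probs = upper_probs, names = FALSE)

# Step 3: findInterval counts how many cut points are <= each value;
# adding 1L turns that count into a 1-based bin number.
bins <- findInterval(var_to_bin, vec = cut_points, rightmost.closed = TRUE) + 1L

print(cut_points)
print(bins)
print(table(bins) / length(bins))  # realised share of observations per bin
```

With small or heavily tied data the realised shares can only approximate the requested proportions, so the final table is a useful sanity check.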
Answer 1 (score: 0)
You can do this with cut.
var_to_bin = 1:10
as.numeric(cut(var_to_bin, include.lowest=TRUE,
breaks=quantile(var_to_bin, probs=c(0,0.1,0.5,0.8,1))))
[1] 1 2 2 2 2 3 3 3 4 4
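The probs vector above is the question's bin widths (0.1, 0.4, 0.3, 0.2) accumulated, with a leading 0 so that cut also gets the lower edge of the first bin. A minimal sketch of a wrapper that derives it from the widths follows; bin_by_share is a hypothetical helper name, not part of base R or of this answer.

```r
# Hypothetical helper: bin x so that each bin holds roughly the given share
# of the observations. `shares` must be positive and sum to 1.
bin_by_share <- function(x, shares) {
  stopifnot(all(shares > 0), isTRUE(all.equal(sum(shares), 1)))
  probs <- c(0, cumsum(shares))
  probs[length(probs)] <- 1  # guard against floating-point drift in cumsum
  breaks <- quantile(x, probs = probs, names = FALSE)
  # include.lowest = TRUE keeps the minimum of x inside the first bin.
  as.integer(cut(x, breaks = breaks, include.lowest = TRUE))
}

print(bin_by_share(1:10, c(0.1, 0.4, 0.3, 0.2)))
```

One caveat: cut stops with an error when two breaks coincide, which can happen if the data contain many tied values, so this sketch assumes the quantiles are distinct.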
Answer 2 (score: 0)
Check out bin_data() from mltools.
library(mltools)
# Here x is your var_to_bin
# We specify the bin endpoints cumulatively as quantiles.
# The result is an ordered factor whose levels represent the unique bins
# and whose values represent which bin each value of x falls into
# Note that these bins are "left-closed, right open" by default.
bin_data(x = 1:10, bins = c(0, 0.1, 0.5, 0.8, 1), binType = "quantile")
[1] [1, 1.9) [1.9, 5.5) [1.9, 5.5) [1.9, 5.5) [1.9, 5.5) [5.5, 8.2) [5.5, 8.2) [5.5, 8.2) [8.2, 10] [8.2, 10]
Levels: [1, 1.9) < [1.9, 5.5) < [5.5, 8.2) < [8.2, 10]
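The "left-closed, right-open" note matters only when a data value coincides with a cut point. In the question's data the cut points (1.9, 5.5, 8.2) fall between values, so all three answers agree. The base-R sketch below uses a different, made-up vector (1:5 split in half) where the median is itself a data value, and uses cut's right argument to compare the two conventions; it does not call mltools.

```r
x <- 1:5
breaks <- quantile(x, probs = c(0, 0.5, 1), names = FALSE)  # 1, 3, 5

# Right-closed bins (cut's default): the value 3 belongs to the first bin.
right_closed <- as.integer(cut(x, breaks = breaks, include.lowest = TRUE))

# Left-closed bins (right = FALSE): the value 3 starts the second bin.
left_closed <- as.integer(cut(x, breaks = breaks, include.lowest = TRUE,
                              right = FALSE))

print(rbind(x, right_closed, left_closed))
```

Which convention is appropriate depends on how boundary values should be counted, so it is worth checking it whenever the data are discrete or rounded.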