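I want to count the numbers in a vector into custom bins.
Suppose my custom bins are: [-Inf, -1), [-1, 0), [0, 1.5) and [1.5, Inf).
The vector I want to classify is c(.5, 2).
Basically, what I want is a result like this: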
hist(x = c(.5,2), breaks = c(-1000, -1, 0, 1.5, 1000), plot = FALSE)$count
[1] 0 0 1 1
Obviously, this produces an error if the vector contains values outside the outermost breaks:
hist(x = c(.5, 2, 1001), breaks = c(-1000, -1, 0, 1.5, 1000), plot = FALSE)$count
Error in hist.default(x = c(0.5, 2, 1001), breaks = c(-1000, -1, 0, 1.5, :
some 'x' not counted; maybe 'breaks' do not span range of 'x'
Surprisingly, the following code does not work:
hist(x = c(.5,2), breaks = c(-Inf, -1, 0, 1.5, Inf), plot = FALSE)$count
[1] 2 0 0 0
Maybe I could use the findInterval function, but I would rather not: the code would be longer, and there may be empty bins, which I want to know about.
Any ideas?
Answer 0 (score: 4)
How about:
x <- c(.5, 2)
table(cut(x = x,
          breaks = c(-Inf, -1, 0, 1.5, Inf)))
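One detail worth checking: cut() defaults to right = TRUE, which gives intervals of the form (a, b], whereas the question asks for [a, b). The following is a minimal sketch, assuming left-closed bins are what is wanted. The variable names are illustrative only.

```r
# Sketch: count values into left-closed bins [a, b) with infinite outer edges.
x <- c(.5, 2)
breaks <- c(-Inf, -1, 0, 1.5, Inf)

# right = FALSE makes each interval closed on the left and open on the right.
bins <- cut(x, breaks = breaks, right = FALSE)

# table() reports every factor level, so empty bins show up with a count of 0.
counts <- table(bins)
print(counts)

# A value sitting exactly on a break falls into the bin that starts there.
on_edge <- cut(1.5, breaks = breaks, right = FALSE)
print(as.integer(on_edge))  # index of the bin containing 1.5
```

Because cut() returns a factor whose levels cover all bins, table() keeps the empty ones, which addresses the concern about missing empty bins.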
This also works:
maxval <- 1.1*max(abs(x))
hist(x = c(.5,2), breaks = c(-maxval, -1, 0, 1.5, maxval),
plot=FALSE)$counts
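The finite stand-in idea can be sketched a little more defensively. If every value of x happened to lie inside the inner breaks, 1.1*max(abs(x)) could end up smaller than the largest inner break, so the sketch below also includes the inner breaks when computing the bound. The names inner and maxval are illustrative, and the example vector reuses the out-of-range value 1001 from the question.

```r
# Sketch: replace -Inf/Inf with a finite bound that is guaranteed to
# enclose both the data and the inner breaks.
x <- c(.5, 2, 1001)
inner <- c(-1, 0, 1.5)

# Bound derived from the data AND the inner breaks, padded by 10%.
maxval <- 1.1 * max(abs(c(x, inner)))
breaks <- c(-maxval, inner, maxval)

# hist() uses right-closed intervals (a, b] by default.
counts <- hist(x, breaks = breaks, plot = FALSE)$counts
print(counts)
```

Note that this keeps hist()'s default right-closed intervals, so a value exactly equal to a break is counted in the bin that ends there.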
This is the original (quite sensible) attempt:
hist(x = c(.5,2), breaks = c(-Inf, -1, 0, 1.5, Inf),
plot=FALSE)$counts
It goes wrong because hist.default() tries to do something fancy by adding "fuzz" to the breaks, and disaster strikes if median(diff(breaks)) is infinite. In this case:
## ....
diddle <- 1e-07 * stats::median(diff(breaks)) ## diddle -> Inf
fuzz <- if (right)
    c(if (include.lowest) -diddle else diddle,
      rep.int(diddle, length(breaks) - 1))
else
    c(rep.int(-diddle, length(breaks) - 1),
      if (include.lowest) diddle else -diddle)
## fuzz -> {-Inf Inf Inf Inf Inf}
fuzzybreaks <- breaks + fuzz ## -> same as fuzz
h <- diff(fuzzybreaks) ## -> {Inf NaN NaN NaN}
counts <- .Call(C_BinCount, x, fuzzybreaks, right, include.lowest) ## -> { 2 0 0 0 }
## ....
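The arithmetic in the excerpt can be reproduced on its own, without calling hist(). This is only a sketch of the quoted lines, assuming right = TRUE and include.lowest = TRUE (the hist() defaults); the internals of hist.default() may differ between R versions.

```r
# Sketch: replay the fuzz computation from the excerpt with infinite breaks.
breaks <- c(-Inf, -1, 0, 1.5, Inf)

# diff(breaks) is c(Inf, 1, 1.5, Inf); its median averages 1.5 and Inf.
diddle <- 1e-07 * stats::median(diff(breaks))
print(diddle)

# Branch for right = TRUE, include.lowest = TRUE.
fuzz <- c(-diddle, rep.int(diddle, length(breaks) - 1))

# Adding an infinite fuzz overwrites every finite break.
fuzzybreaks <- breaks + fuzz
print(fuzzybreaks)

# Inf - Inf is NaN, so all bin widths after the first are undefined.
h <- diff(fuzzybreaks)
print(h)
```

Once the finite breaks have been swallowed by the infinite fuzz, only the first bin spans anything, which is consistent with the 2 0 0 0 result reported above.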
The documentation for hist doesn't really mention this, apart from the statement: "These are the nominal breaks, not with the boundary fuzz."
This might be worth a note to the r-devel mailing list.
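For completeness, the findInterval route mentioned in the question can also report empty bins if it is paired with tabulate(). This is a sketch rather than a recommendation; the names inner and bin are illustrative.

```r
# Sketch: bin with findInterval(), which needs no infinite or sentinel breaks.
x <- c(.5, 2)
inner <- c(-1, 0, 1.5)

# findInterval() returns 0 for values below inner[1], so shift to 1-based bins.
# Its intervals are left-closed by default: [inner[i], inner[i + 1]).
bin <- findInterval(x, inner) + 1L

# nbins fixes the output length, so empty bins appear as zeros.
counts <- tabulate(bin, nbins = length(inner) + 1L)
print(counts)
```

The left-closed default matches the [a, b) bins described in the question.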