Counting numbers into bins using the hist function

Time: 2014-09-04 19:47:32

Tags: r count histogram

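I want to count the numbers in a vector into custom bins. Say my custom bins are: [-Inf, -1), [-1, 0), [0, 1.5), [1.5, Inf). The vector I want to classify is c(.5, 2).

Basically, what I want is a result like this: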

hist(x = c(.5,2), breaks = c(-1000, -1, 0, 1.5, 1000), plot = FALSE)$count

[1] 0 0 1 1

Obviously, this produces an error if the vector has values outside the breaks:

hist(x = c(.5, 2, 1001), breaks = c(-1000, -1, 0, 1.5, 1000), plot = FALSE)$count

Error in hist.default(x = c(0.5, 2, 1001), breaks = c(-1000, -1, 0, 1.5,  : 
  some 'x' not counted; maybe 'breaks' do not span range of 'x'

Surprisingly, the following code does not work:

hist(x = c(.5,2), breaks = c(-Inf, -1, 0, 1.5, Inf), plot = FALSE)$count

[1] 2 0 0 0

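I could probably use the findInterval function, but I would rather not, because the code would be longer and some bins may turn out empty, which I want to know about.

Any ideas?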

1 Answer:

Answer 0: (score: 4)

How about:

x <- c(.5, 2)
table(cut(x = x,
            breaks = c(-Inf, -1, 0, 1.5, Inf)))
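
One caveat, offered as a sketch rather than as part of the original answer: cut() defaults to right = TRUE, so the bins above are of the form (a, b], whereas the question asked for [a, b). Passing right = FALSE closes the bins on the left instead. The extra value 1.5 below is my own addition (it is not in the question's data) and is only there to make the difference visible; empty bins still show up as zeros either way.

```r
# Assumption: 1.5 is added to the question's data to sit exactly on a break.
x <- c(.5, 1.5, 2)
breaks <- c(-Inf, -1, 0, 1.5, Inf)

# Default: right-closed bins (a, b], so 1.5 lands in (0, 1.5]
counts_right <- table(cut(x, breaks = breaks))

# Left-closed bins [a, b), as in the question, so 1.5 lands in [1.5, Inf)
counts_left <- table(cut(x, breaks = breaks, right = FALSE))

print(counts_right)
print(counts_left)
```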

This also works:

maxval <- 1.1*max(abs(x))
hist(x = c(.5,2), breaks = c(-maxval, -1, 0, 1.5, maxval),
       plot=FALSE)$counts
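
If this is needed more than once, the same trick can be wrapped in a small helper. This is only a sketch: count_in_bins and inner_breaks are hypothetical names of mine, and the helper assumes the data and inner breaks are not all zero (otherwise the outer breaks would collapse onto the inner ones). Like hist() itself, it uses right-closed bins by default.

```r
# Hypothetical helper: replaces the infinite outer breaks with finite ones
# that are guaranteed to lie beyond both the data and the inner breaks.
count_in_bins <- function(x, inner_breaks) {
  maxval <- 1.1 * max(abs(c(x, inner_breaks)))
  hist(x, breaks = c(-maxval, inner_breaks, maxval), plot = FALSE)$counts
}

# The value 1001 made the fixed +/-1000 breaks in the question fail;
# here the outer breaks adapt to the data.
res <- count_in_bins(c(.5, 2, 1001), c(-1, 0, 1.5))
print(res)
```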

Here is the original (very sensible) suggestion:

hist(x = c(.5,2), breaks = c(-Inf, -1, 0, 1.5, Inf),
       plot=FALSE)$counts

The reason it goes wrong is that hist.default() tries to do something fancy to add "fuzz" to the breaks, and disaster strikes if median(diff(breaks)) is infinite, as it is in this case ...

## ....
diddle <- 1e-07 * stats::median(diff(breaks))   ## diddle -> Inf 
fuzz <- if (right) 
    c(if (include.lowest) -diddle else diddle, rep.int(diddle, 
        length(breaks) - 1))
else c(rep.int(-diddle, length(breaks) - 1), if (include.lowest) diddle else -diddle)
## fuzz ->  {-Inf Inf Inf Inf Inf}
fuzzybreaks <- breaks + fuzz  ## -> same as fuzz
h <- diff(fuzzybreaks)        ## -> {Inf NaN NaN NaN}
counts <- .Call(C_BinCount, x, fuzzybreaks, right, include.lowest)  ## -> { 2 0 0 0 }
## ....
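
The arithmetic in those comments can be replayed on its own, without calling hist(). This is a minimal sketch that assumes the defaults right = TRUE and include.lowest = TRUE and follows the excerpt above; the internals of hist.default() may differ in other R versions.

```r
breaks <- c(-Inf, -1, 0, 1.5, Inf)

# diff(breaks) is {Inf, 1, 1.5, Inf}; its median is (1.5 + Inf) / 2 = Inf
diddle <- 1e-07 * stats::median(diff(breaks))

# Branch taken for right = TRUE, include.lowest = TRUE
fuzz <- c(-diddle, rep.int(diddle, length(breaks) - 1))

# Every finite break is swamped by the infinite fuzz
fuzzybreaks <- breaks + fuzz
h <- diff(fuzzybreaks)

print(diddle)
print(fuzzybreaks)
print(h)
```

With all but the first fuzzy break equal to Inf, every finite x ends up in the first bin, which matches the counts 2 0 0 0 seen in the question.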

The documentation for hist does not really explain this, apart from a cryptic note under "breaks":

These are the nominal breaks, not with the boundary fuzz.

This might be worth a note to the r-devel mailing list ...