Question

I have a numeric vector, data. I need to collect the following data, i.e. a histogram, but in a cumulative sense.

a=c()
s=seq(0,1000,10)
for(i in s)
{
    a<-c(a,length(data[data>=i]))
}
plot(s,a)

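How can I vectorize this, and what is this operation called? It is not very good at the moment, because I have to know the range in advance to write s above. Is there an existing function in R that does this?

Thanks.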

Answer 1

Something like this?

set.seed(1)          # for reproducible example
data <- rnorm(100)   # random sample from N(0,1)
par(mfrow=c(1,2))    # set up graphics device for 2 plots

z <- hist(data,ylab="Counts",main="Histogram")
barplot(cumsum(z$counts), names.arg=z$breaks[-1],main="Cuml. Histogram")

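This takes advantage of the fact that hist(...) not only plots the histogram but also returns an object of class histogram. The $breaks element of this object contains the bin boundaries, and $counts contains the number of data points in each bin. The cumsum function computes the cumulative sum. So the plot on the right is just the cumulative sum of the counts plotted against the breaks.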

Another, slightly simpler way is to "hack" the histogram object returned by hist(...) and then use plot(...) on it:

z <- hist(data,ylab="Counts",main="Histogram")
z$counts <- cumsum(z$counts)
plot(z, main="Cuml. Histogram")
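The question counts values at or above each threshold, which is the reverse of the cumulative sum used above. A minimal sketch of that variant, reusing the same histogram object (the name at_least and the axis labels are only illustrative, not part of the original answer):

```r
set.seed(1)                      # same illustrative sample as above
data <- rnorm(100)

z <- hist(data, plot = FALSE)    # compute the bins without drawing anything

# Reverse, accumulate, reverse back: entry i is the number of
# observations that fall in bin i or in any bin above it
at_least <- rev(cumsum(rev(z$counts)))

# Pair each count with the lower edge of its bin
plot(head(z$breaks, -1), at_least, type = "s",
     xlab = "Lower bin edge", ylab = "Count in this bin or above")
```

Note that hist uses right-closed bins by default, so these are per-bin counts; a value exactly equal to an interior bin edge is counted in the bin below it, which can differ slightly from data >= edge.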

Finally, ecdf(...) (the empirical cumulative distribution function) returns a function that can be plotted easily.

plot(ecdf(data))
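If what you want is the count of values at or above each threshold, as in the question, one possible sketch is to apply ecdf to the negated data. The names G, count_at_least and s are only illustrative, and the range of s is taken from the data rather than hard-coded:

```r
set.seed(1)
data <- rnorm(100)

# ecdf() gives the proportion of values <= x. Applying it to -data and
# evaluating at -x therefore gives the proportion of values >= x.
G <- ecdf(-data)
count_at_least <- function(x) round(length(data) * G(-x))

s <- seq(min(data), max(data), length.out = 50)  # range taken from the data
plot(s, count_at_least(s), type = "s",
     xlab = "Threshold", ylab = "Number of values >= threshold")
```

Because count_at_least is an ordinary vectorized function, it can be evaluated at any set of thresholds without a loop.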


Answer 2

I would convert to a factor with as many levels as you want, and then use table and cumsum.

For example:

# Create some fake data:
> tst = sample(1:50,10)
> tst
 [1] 33  7 13 19  1 18 39 15 21 25

# create a vector of factors with all possible levels from "min(tst)" until "max(tst)":
> tst2 = factor(as.character(tst),levels=paste0(min(tst):max(tst)))
> tst2
 [1] 33 7  13 19 1  18 39 15 21 25
39 Levels: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 ... 39

# finally, get in one (vectorized) operation the distribution of values >= levels (for each level):
> cumsum(table(tst2))
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 
 1  1  1  1  1  1  2  2  2  2  2  2  3  3  4  4  4  5  6  6  7  7  7  7  8  8  8  8 
29 30 31 32 33 34 35 36 37 38 39 
 8  8  8  8  9  9  9  9  9  9 10

Does this help?

Edit:

I just realized that this gives you the number of items whose value is less than or equal to a given threshold. You can get what you want by accumulating from the top level down:

> tst3 = rev(cumsum(rev(table(tst2))))
> tst3
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 
10  9  9  9  9  9  9  8  8  8  8  8  8  7  7  6  6  6  5  4  4  3  3  3  3  2  2  2 
29 30 31 32 33 34 35 36 37 38 39 
 2  2  2  2  2  1  1  1  1  1  1

Edit 2:

In fact, it's even simpler:

> sapply(min(tst):max(tst), function(x)sum(tst>=x))
 [1] 10  9  9  9  9  9  9  8  8  8  8  8  8  7  7  6  6  6  5  4  4  3  3  3  3  2
[27]  2  2  2  2  2  2  2  1  1  1  1  1  1
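The sapply call still loops over the thresholds internally. As a rough sketch of a version without an explicit per-threshold function call, outer can build the whole comparison matrix at once (the names thresholds and counts are only illustrative):

```r
tst <- c(33, 7, 13, 19, 1, 18, 39, 15, 21, 25)   # the sample printed above
thresholds <- min(tst):max(tst)                   # range derived from the data

# outer() builds a length(tst) x length(thresholds) logical matrix of
# tst[i] >= thresholds[j]; each column sum is the count for one threshold
counts <- colSums(outer(tst, thresholds, ">="))
names(counts) <- thresholds
```

This allocates a matrix with one cell per (value, threshold) pair, so for large inputs it may be cheaper to sort the data once and count with findInterval instead.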

R: simplifying a cumulative plot, and what it should be called
