Why don't the values show up in the ecdf plot?

Time: 2011-08-24 22:06:54

Tags: debugging r statistics

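I am trying to plot the ccdf of the data given below, but for some reason it doesn't look right. I am cross-checking against a few data points (2523, 313, 224), but they are not visible. Am I doing something wrong?

R script: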

# Y is defined below; `colors` is assumed to be a user-defined vector of colour names
Y.ecdf = ecdf(Y)
curve((length((Y))*(1-Y.ecdf(x))), n = 10000, 
       from = 0, to = 100, xlab = "# of items", 
       ylab = "# instances", col=colors[1], lty=1, lwd=4)

ecdf plot

Y = c( 3, 1, 4, 11, 2, 2, 9, 7, 22, 3, 1, 1, 7, 2, 2, 2, 4, 2, 1, 1, 6, 3, 20,
15, 4, 1, 1, 5, 3, 10, 16, 224, 74, 2, 1, 2, 2, 3, 3, 7, 2, 2, 1, 4, 2, 9,
3, 3, 2, 1, 1, 3, 2, 4, 4, 1, 7, 2, 1, 2, 1, 1, 2, 4, 3, 1, 1, 1, 3, 4, 2,
2, 1, 1, 5, 6, 13, 15, 3, 1, 2, 5, 1, 1, 1, 1, 2, 6, 1, 4, 1, 3, 1, 1, 4,
2, 2, 3, 3, 1, 4, 2, 1, 4, 6, 1, 1, 1, 1, 2, 5, 2, 1, 1, 1, 1, 1, 3, 1, 3,
2, 1, 1, 1, 2, 1, 8, 2, 3, 1, 1, 1, 1, 1, 3, 1, 3, 2, 1, 2, 1, 1, 5, 1, 1,
4, 3, 3, 1, 1, 1, 3, 4, 4, 3, 2, 2, 4, 3, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 3,
2, 3, 9, 3, 4, 2, 1, 1, 1, 3, 22, 5, 13, 1, 1, 1, 1, 1, 4, 1, 1, 31, 1, 1,
2, 1, 1, 1, 3, 4, 4, 8, 6, 6, 7, 2, 1, 2, 2, 5, 1, 2, 6, 6, 1, 3, 1, 5, 2,
1, 5, 3, 1, 2, 2, 1, 2, 1, 2, 2, 1, 2, 1, 1, 4, 1, 3, 2, 1, 4, 1, 212, 2,
7, 7, 10, 2, 4, 2, 1, 1, 1, 2, 3, 2, 1, 16, 6, 2, 10, 2, 1, 1, 15, 1, 3, 8,
1, 1, 3, 1, 1, 2, 1, 1, 4, 2, 3, 1, 1, 1, 1, 5, 9, 4, 1, 1, 2, 5, 1, 4, 9,
6, 19, 1, 1, 1, 2, 10, 6, 9, 5, 11, 6, 8, 1, 1, 1, 1, 1, 313, 3, 1, 3, 1,
2, 2, 2, 3, 4, 5, 1, 1, 3, 1, 1, 5, 4, 2, 5, 1, 20, 4, 1, 2, 1, 1, 1, 2, 5,
4, 2, 3, 1, 3, 1, 2, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 2, 1, 1, 3, 3, 1, 1, 1, 8, 1, 1, 1, 1,
1, 1, 2, 2, 2, 2, 4, 13, 1, 2, 1, 2, 3, 3, 1, 2, 2, 1, 3, 4, 1, 1, 1, 1, 2,
2, 4, 5, 3, 2, 2, 2, 1, 1, 3, 2523, 7, 4, 2, 4, 11, 8, 1, 4, 4, 2, 5, 3, 3,
1, 3, 1, 3, 4, 1, 1, 1, 1, 6, 6, 2, 2, 1, 8, 8, 3, 3, 4, 5, 2, 2, 2, 3, 2,
6, 2, 2, 2, 1, 5, 5, 4, 3, 1, 2, 2, 6, 3, 2, 2, 2, 10, 9, 1, 2, 1, 1, 1, 2,
2, 3, 1, 3, 1, 9, 1, 1, 1, 2, 1, 96, 2, 2, 5, 1, 1, 1, 2, 2, 1, 1, 1, 5, 2,
1, 1, 1, 2, 1, 1, 4, 2, 10, 3, 2, 2, 8, 8, 2, 1, 2, 4, 1, 1, 13, 20, 3, 2,
5, 9, 1, 22, 25, 4, 1, 1, 3, 2, 1, 1, 7, 9, 5, 9, 1, 3, 1, 8, 2, 2, 1, 3,
1, 2, 6, 2, 1, 2, 2, 1, 2, 2, 2, 1, 1, 1, 16, 3, 5, 2)

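1 answer:

Answer 0 (score: 2)

Expanding on our discussion in the comments...

The empirical cumulative distribution function plots X (x-axis) against Pr(X <= x) (y-axis):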

plot(Y.ecdf,do.points = FALSE,
     verticals = TRUE,col = "blue",
     xlab = "x", ylab = "Pr(X < x)")


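If you look closely, you can see where the line steps up as you reach the very large values, but it is hard to make out because so many of your values are less than 10.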
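One way to confirm that those steps really are there, even when the plot hides them, is to query the ECDF directly. This is only a minimal sketch: `Y_demo` is a small hypothetical stand-in vector, not your full `Y`. `knots()` lists the x positions where the step function jumps, and calling the ECDF at the large values shows the cumulative proportion reached at each of them.

```r
# Hypothetical stand-in sample (an assumption, not the full Y from the question)
Y_demo <- c(1, 1, 2, 3, 3, 7, 22, 224, 313, 2523)
Y_demo.ecdf <- ecdf(Y_demo)

# x positions at which the ECDF steps up (the distinct observed values)
jump_points <- knots(Y_demo.ecdf)
print(jump_points)

# Cumulative proportion at the three largest values
outlier_probs <- Y_demo.ecdf(c(224, 313, 2523))
print(outlier_probs)
```

With your real data the same two calls on `Y.ecdf` should show 224, 313 and 2523 among the jump points, each adding only a very small step near the top of the curve.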

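What you have done is invert this function so that you are looking at the opposite tail of the distribution, i.e. Pr(X > x). You have also scaled the probabilities on the y-axis. I'm not sure why, but no matter; given your particular task it may well make sense. So you are doing something like this (but with the y-axis scaled):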
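As for what the scaling does: multiplying Pr(X > x) by the sample size gives the number of observations strictly greater than x. The sketch below checks that against a direct count; `Y_demo` is again a small hypothetical stand-in for your data, and `ccdf` is just an illustrative helper name.

```r
# Hypothetical stand-in sample (an assumption, not the full Y from the question)
Y_demo <- c(1, 1, 2, 3, 3, 7, 22, 224, 313, 2523)
Y_demo.ecdf <- ecdf(Y_demo)

# Complementary ECDF: estimated Pr(X > x)
ccdf <- function(x) 1 - Y_demo.ecdf(x)

# Scaling by the sample size turns the probability into a count
x <- c(0, 3, 100, 313, 2523)
tail_counts <- length(Y_demo) * ccdf(x)

# The same quantity, counted directly
direct_counts <- sapply(x, function(v) sum(Y_demo > v))

print(cbind(x, tail_counts, direct_counts))
```

The two columns agree up to floating-point rounding, so the scaled curve can be read as "how many instances have more than x items".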

curve((1-Y.ecdf(x)), n = 10000, 
       from = 0, to = 2600, ylab = "Pr(X > x)", 
       xlab = "x", col="blue", lty=1, lwd=2)


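But the from and to arguments you originally set plot the function only from 0 to 100. If you want to "zoom in" on the outliers, you can simply change the from and to values to something more relevant.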
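A sketch of what such a zoomed call could look like is given here. The range `from = 200, to = 2600` is my assumption, chosen only to bracket the three largest values, and `Y_demo` is a small hypothetical stand-in for your `Y`.

```r
# Hypothetical stand-in sample (an assumption, not the full Y from the question)
Y_demo <- c(1, 1, 2, 3, 3, 7, 22, 224, 313, 2523)
Y_demo.ecdf <- ecdf(Y_demo)

# Plot Pr(X > x) only over the range that contains the outliers.
# curve() invisibly returns the evaluated x and y values, kept here for inspection.
zoomed <- curve(1 - Y_demo.ecdf(x), n = 10000,
                from = 200, to = 2600, ylab = "Pr(X > x)",
                xlab = "x", col = "blue", lty = 1, lwd = 2)
```

With your own data you would substitute `Y.ecdf` and pick whichever from and to values frame the points you want to cross-check.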
