CDF in Python not displaying correctly

Time: 2017-04-20 16:39:29

Tags: python numpy matplotlib cdf

Good morning,

In Python, I have a dictionary (called packet_size_dist) that contains the following values:

34  =>  0.00909909009099
42  =>  0.02299770023
54  =>  0.578742125787
58  =>  0.211278872113
62  =>  0.00529947005299
66  =>  0.031796820318
70  =>  0.0530946905309
74  =>  0.0876912308769

Note that the values sum to 1.
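Because the printed probabilities are rounded, their floating-point sum is only approximately 1, so a tolerance-based check is more robust than `== 1`. A minimal sketch using the standard library's `math.isclose` (the dictionary is copied from the question):

```python
import math

packet_size_dist = {
    34: 0.00909909009099,
    42: 0.02299770023,
    54: 0.578742125787,
    58: 0.211278872113,
    62: 0.00529947005299,
    66: 0.031796820318,
    70: 0.0530946905309,
    74: 0.0876912308769,
}

total = sum(packet_size_dist.values())
print(total)
# compare with a tolerance instead of ==, since the printed values are rounded
print(math.isclose(total, 1.0))  # default rel_tol=1e-09 absorbs the rounding
```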

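I'm trying to generate a CDF. I managed to produce one, but it doesn't look right, and I'm wondering whether I'm generating it incorrectly. The code in question is: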

import operator
import numpy as np
import matplotlib.pyplot as plt

sorted_p = sorted(packet_size_dist.items(), key=operator.itemgetter(0))
yvals = np.arange(len(sorted_p))/float(len(sorted_p))
plt.plot(sorted_p, yvals)
plt.show()

But the resulting plot looks like this: [image: CDF of Packet Distribution]

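This doesn't seem to match the values in the dictionary at all. Any ideas? I also see a faint green line on the left side of the chart, and I don't know what it is. For example, the chart shows roughly 78% at a packet size of 70, whereas my dictionary says that size occurs only about 5% of the time.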
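Two things seem to be going on in the question's code, and the sketch below checks both (it assumes the non-interactive Agg backend so it can run without a display). First, `sorted_p` is a list of `(size, probability)` tuples, which matplotlib converts to an 8×2 array; `plot(x, y)` with a two-column `x` draws one line per column, so the second line plots the probabilities (all below 0.6) against `yvals`, which would explain the faint line hugging the left edge. Second, `yvals` is just `i/n`, evenly spaced, rather than the running total of the probabilities.

```python
import operator

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, only so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

packet_size_dist = {
    34: 0.00909909009099, 42: 0.02299770023, 54: 0.578742125787,
    58: 0.211278872113, 62: 0.00529947005299, 66: 0.031796820318,
    70: 0.0530946905309, 74: 0.0876912308769,
}

sorted_p = sorted(packet_size_dist.items(), key=operator.itemgetter(0))
yvals = np.arange(len(sorted_p)) / float(len(sorted_p))

# the list of tuples becomes an (8, 2) array: column 0 = sizes, column 1 = probabilities
arr = np.array(sorted_p)
print(arr.shape)  # (8, 2)

# a two-column x yields two Line2D objects, one per column
fig, ax = plt.subplots()
lines = ax.plot(sorted_p, yvals)
print(len(lines))  # 2

# yvals is evenly spaced, so at size 70 (index 6) it is 6/8
print(yvals[6])  # 0.75
# the actual CDF at size 70 is the running sum of the probabilities
cdf = np.cumsum(arr[:, 1])
print(cdf[6])  # about 0.912
```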

2 Answers:

Answer 0 (score: 1)

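This isn't a direct answer to your question. However, I thought I should point out that your data come from a discrete random variable (not a continuous one), so representing them as a series of connected line segments can be somewhat misleading in some contexts. The representation used in the cumulative distribution function article may be a bit of overkill, so I offer the following simplification.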

[image: step plot of the CDF]

An 'x' marks where the plot is truncated. A dot marks the closed (left) end of each half-open interval.

Here's the code. (I didn't think to use np.cumsum.)

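import matplotlib.pyplot as plt
from matplotlib import collections as mc

# probabilities for packet sizes 34, 42, 54, 58, 62, 66, 70, 74
p = [0.00909909009099,0.02299770023,0.578742125787,0.211278872113,0.00529947005299,0.031796820318,0.0530946905309,0.0876912308769]
# cumSums[0] = 0; cumSums[i] = cumulative probability up to the i-th packet size
cumSums = [0] + [sum(p[:i]) for i in range(1, len(p) + 1)]
# 30 and 80 are padding so the first and last steps are visible
counts = [30,34,42,54,58,62,66,70,74,80]

# one horizontal segment per step: the CDF is constant on [counts[i], counts[i+1])
lines = [[(counts[i], cumSums[i]), (counts[i+1], cumSums[i])] for i in range(len(counts) - 1)]

lc = mc.LineCollection(lines, linewidths=2)
fig, ax = plt.subplots()
ax.add_collection(lc)

# 'x' marks the truncated ends; dots mark the closed left end of each step
ax.plot([30, 80], [0, 1], 'bx')
ax.plot(counts[1:-1], cumSums[1:], 'bo')

ax.autoscale()
ax.margins(0.1)

plt.show()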
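For comparison, matplotlib's `Axes.step` with `where="post"` draws the same right-continuous staircase in a single call. This is a swapped-in alternative, not the answerer's code: the padding points 30 and 80 are carried over from the answer, and `np.cumsum` replaces the manual slice sums. The Agg backend line is an assumption for headless runs; drop it and call `plt.show()` to view the plot interactively.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove it to show the plot
import matplotlib.pyplot as plt
import numpy as np

sizes = np.array([34, 42, 54, 58, 62, 66, 70, 74])
p = np.array([0.00909909009099, 0.02299770023, 0.578742125787, 0.211278872113,
              0.00529947005299, 0.031796820318, 0.0530946905309, 0.0876912308769])

cdf = np.cumsum(p)
# pad with 30 on the left (CDF = 0) and 80 on the right (CDF stays at its final value)
x = np.concatenate(([30], sizes, [80]))
y = np.concatenate(([0.0], cdf, [cdf[-1]]))

fig, ax = plt.subplots()
# where="post": y[i] is held from x[i] up to x[i+1], i.e. F(x) = P(X <= x)
(line,) = ax.step(x, y, where="post", linewidth=2)
# dots mark the closed left end of each step
ax.plot(sizes, cdf, "bo")
ax.margins(0.1)
```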

This is closer to the plot you were after. (Corrected this time, I hope.)

[image: CDF drawn as connected line segments]

The code:

import matplotlib.pyplot as plt
from matplotlib import collections as mc

p = [0.00909909009099,0.02299770023,0.578742125787,0.211278872113,0.00529947005299,0.031796820318,0.0530946905309,0.0876912308769]
# cumSums[i] = cumulative probability up to counts[i]
cumSums = [sum(p[:i]) for i in range(1, len(p) + 1)]
counts = [34,42,54,58,62,66,70,74]

# connect consecutive (packet size, cumulative probability) points
lines = [[(counts[i], cumSums[i]), (counts[i+1], cumSums[i+1])] for i in range(len(p) - 1)]

lc = mc.LineCollection(lines, linewidths=2)
fig, ax = plt.subplots()
ax.add_collection(lc)
ax.autoscale()
ax.margins(0.1)

plt.show()
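Since the `LineCollection` here only joins consecutive (size, cumulative probability) points, a single `plot` call should draw the same polyline. A minimal sketch under that reading; `marker="o"` is an added choice to show where the actual data points are, and the Agg backend line is an assumption for headless runs.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove it to show the plot
import matplotlib.pyplot as plt
import numpy as np

counts = [34, 42, 54, 58, 62, 66, 70, 74]
p = [0.00909909009099, 0.02299770023, 0.578742125787, 0.211278872113,
     0.00529947005299, 0.031796820318, 0.0530946905309, 0.0876912308769]
cumSums = np.cumsum(p)

fig, ax = plt.subplots()
# one polyline through all points replaces the list of two-point segments
(line,) = ax.plot(counts, cumSums, marker="o", linewidth=2)
ax.margins(0.1)
```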

Answer 1 (score: 1)

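Using numpy makes everything easier. First, convert the dictionary into a 2-column numpy array. Then sort it by the first column. Finally, compute the cumulative sum of the second column and plot it against the first column.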

import numpy as np
import matplotlib.pyplot as plt

dic = { 34  :  0.00909909009099,
        42  :  0.02299770023,
        54  :  0.578742125787,
        58  :  0.211278872113,
        62  :  0.00529947005299,
        66  :  0.031796820318,
        70  :  0.0530946905309,
        74  :  0.0876912308769 }

# (n, 2) array of [packet size, probability] rows
data = np.array([[k, v] for k, v in dic.items()])  # dic.iteritems() also works in Python 2
# sort the rows by packet size (first column)
data = data[data[:, 0].argsort()]
# running total of the probabilities = CDF at each packet size
cdf = np.cumsum(data[:, 1])

plt.plot(data[:, 0], cdf)
plt.show()
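When reading values off the resulting CDF (which seems related to the question's comparison at size 70), note that F(x) = P(X ≤ x) is the running total, while the probability of one particular size is the height of the jump at that size. A minimal sketch using `np.searchsorted`; the helper name `cdf_at` is hypothetical.

```python
import numpy as np

dic = {34: 0.00909909009099, 42: 0.02299770023, 54: 0.578742125787,
       58: 0.211278872113, 62: 0.00529947005299, 66: 0.031796820318,
       70: 0.0530946905309, 74: 0.0876912308769}

keys = sorted(dic)
sizes = np.array(keys)
cdf = np.cumsum([dic[k] for k in keys])


def cdf_at(x):
    """Hypothetical helper: F(x) = P(X <= x) for the step CDF built from dic."""
    # number of sizes <= x; side="right" counts x itself when it is a packet size
    idx = np.searchsorted(sizes, x, side="right")
    # prepend 0 so idx == 0 (x below the smallest size) maps to F = 0
    return np.concatenate(([0.0], cdf))[idx]


# P(X = 70) is the jump of F at 70
jump_at_70 = cdf_at(70) - cdf_at(69)
print(cdf_at(70))  # about 0.912
print(jump_at_70)  # about 0.053, the dictionary value for 70
```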

[image: resulting CDF plot]