Plotting cumulative percentage by year in matplotlib?

Time: 2015-10-30 01:03:47

Tags: python matplotlib

I have some data that looks like this:

 items_by_year = { 
     2004: 10352,
     2005: 15125,
     2006: 8989 ...
 }

I plot the cumulative percentage by year in matplotlib, like this:

import matplotlib.pyplot as plt

# Get cumulative count. Ugh!
sum_of_items = 0  # must be initialized before the += below
new_dict = {}
for year in items_by_year:
    sum_of_items += items_by_year[year]
    new_dict[year] = items_by_year[year]
    for y in items_by_year:
        if y < year:
            new_dict[year] += items_by_year[y]

# Calculate cumulative percentage.
temp_data = []
for year in new_dict:
    # The keys shown above are plain ints, so use year directly; 100.0 forces float division.
    temp_data.append((year, 100.0 * new_dict[year] / sum_of_items))

# Sort array by year. 
data = sorted(temp_data, key=lambda x: x[0])
x = [date for (date, value) in data]
y = [value for (date, value) in data]

# Draw chart. 
fig = plt.figure()
graph = fig.add_subplot(111)
graph.plot(x, y)
plt.show()

I feel there must be a better way to write this code. Any suggestions would be much appreciated!

3 Answers:

Answer 0: (score: 1)

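The following keeps the looping to a minimum. Binding the sorted list of keys to a name saves time and makes your code clearer. The conditional used by user1866935 is not needed; you have to initialize sum_of_items anyway.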

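import matplotlib.pyplot as plt

cumulative = {}
sum_of_items = 0
years = sorted(items_by_year)  # bind this to plot x values
for year in years:  # reuse the sorted list instead of sorting again
    sum_of_items += items_by_year[year]
    cumulative[year] = sum_of_items  # running total up to and including this year
fig, ax = plt.subplots(1, 1)
# Multiply by 100.0 to plot percentages and to avoid integer division on Python 2.
ax.plot(years, [100.0 * cumulative[year] / sum_of_items for year in years])
plt.show()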
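The running total can also be built with the standard library. The following is a minimal sketch, assuming Python 3 (for itertools.accumulate) and a non-empty dict with a positive total; the function names cumulative_percentages and plot_cumulative are made up for illustration, and the sample dict holds only the three values visible in the question.

```python
from itertools import accumulate


def cumulative_percentages(items_by_year):
    """Return (years, percentages), with years sorted ascending."""
    years = sorted(items_by_year)
    running = list(accumulate(items_by_year[y] for y in years))
    if not running or running[-1] == 0:
        raise ValueError("need at least one year and a positive total")
    total = running[-1]  # the last running total is the grand total
    return years, [100.0 * r / total for r in running]


def plot_cumulative(items_by_year):
    """Plot the cumulative percentage by year (hypothetical helper)."""
    import matplotlib.pyplot as plt  # imported lazily so the math above has no dependencies

    years, pct = cumulative_percentages(items_by_year)
    fig, ax = plt.subplots()
    ax.plot(years, pct, marker="o")
    ax.set_xticks(years)
    ax.set_xlabel("Year")
    ax.set_ylabel("Cumulative share of items (%)")
    plt.show()


# Only the three values shown in the question; the rest of the data is elided there.
sample = {2004: 10352, 2005: 15125, 2006: 8989}
years, pct = cumulative_percentages(sample)
```

Calling plot_cumulative(sample) would draw the chart. Keeping the arithmetic in its own function makes it easy to check without opening a plot window.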

Answer 1: (score: 1)

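A simpler approach is to use the plt.hist() function with its cumulative and normed parameters! normed=True normalizes the counts to fractions of the total, and cumulative=1 accumulates them, which is what you need. (Newer matplotlib releases replace normed with density.) The only catch is that plt.hist() takes an unbinned list, in a form like: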

[2004, 2004, 2004, ..., 2006, 2006]

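So, to get your data into this form, I used the following transformation (if you produced what you posted by reducing raw data of this kind, this step may be unnecessary for you):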

items_by_year = { 
 2004: 10352,
 2005: 15125,
 2006: 8989,
 2007: 1500,
 2008: 10000
}
years = sorted(items_by_year.keys())
to_hist = []
for year in items_by_year:
    to_hist.extend([year]*items_by_year[year])

If you already have data in that form, all you need is:

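import matplotlib.pyplot as plt
# density=True replaces normed=True, which newer matplotlib releases no longer accept.
plt.hist(to_hist, cumulative=1, density=True, bins=years+[max(years)+1])
plt.xticks([i+0.5 for i in years], years)
plt.show()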
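If only the per-year counts are available, expanding them into one list entry per item can be avoided by passing the counts as weights. This is a sketch rather than part of the answer above: it assumes a matplotlib version whose plt.hist() accepts the weights and density parameters, and the helper names hist_inputs and plot_weighted_cumulative are hypothetical.

```python
def hist_inputs(items_by_year):
    """Return (years, weights, bins) for a histogram with one bin per year."""
    years = sorted(items_by_year)
    weights = [items_by_year[y] for y in years]  # each year counts this many times
    bins = years + [max(years) + 1]  # same bin edges as in the answer above
    return years, weights, bins


def plot_weighted_cumulative(items_by_year):
    """Draw the cumulative histogram without building the expanded list."""
    import matplotlib.pyplot as plt  # imported lazily; assumed to be installed

    years, weights, bins = hist_inputs(items_by_year)
    plt.hist(years, weights=weights, bins=bins, cumulative=True, density=True)
    plt.xticks([y + 0.5 for y in years], years)
    plt.show()


items_by_year = {
    2004: 10352,
    2005: 15125,
    2006: 8989,
    2007: 1500,
    2008: 10000,
}
years, weights, bins = hist_inputs(items_by_year)
```

With weights, each year appears once in the input and contributes its count to its bin, so memory use no longer grows with the number of items.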


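One more addition: you can also plot the reverse cumulative distribution (that is, the percentage of events occurring in a given year or later) by simply passing cumulative=-1.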
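To make the reversed accumulation concrete, here is a small pure-Python sketch of the numbers involved. It assumes each year's value includes that year itself, which is my understanding of how the reversed histogram accumulates; the function name reverse_cumulative_percentages is made up for illustration.

```python
def reverse_cumulative_percentages(items_by_year):
    """Map each year to the percentage of items dated that year or later."""
    total = sum(items_by_year.values())
    remaining = total  # items not yet passed when walking forward in time
    result = {}
    for year in sorted(items_by_year):
        result[year] = 100.0 * remaining / total
        remaining -= items_by_year[year]
    return result


items_by_year = {
    2004: 10352,
    2005: 15125,
    2006: 8989,
    2007: 1500,
    2008: 10000,
}
reverse_pct = reverse_cumulative_percentages(items_by_year)
```

The first year always maps to 100 percent, and the values fall as the years advance.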


Answer 2: (score: 0)

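Computing the cumulative sum in a single pass is much faster, because you avoid looping over your dict again and again. You can measure the time with something like this: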

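import timeit

# default_timer() reads a clock; a bare timeit.timeit() would time a no-op statement instead.
start = timeit.default_timer()
sum_of_items = 0
new_dict = {}
first = True
years = sorted(items_by_year.keys())
for year in years:
    sum_of_items += items_by_year[year]
    if first:
        new_dict[year] = items_by_year[year]
        first = False
    else:
        # Assumes consecutive years, so the previous key is year - 1.
        new_dict[year] = items_by_year[year] + new_dict[year - 1]
end = timeit.default_timer()
print(end - start)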

With 43 entries, your code ran in 0.000481843948364 seconds on my machine, while this version ran in 2.59876251221e-05 seconds.
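A single timed run is noisy, so a repeated measurement gives a steadier comparison. The sketch below times the nested-loop approach against a single pass using timeit.timeit with callables. The 43-entry dict is hypothetical, made-up data rather than the asker's, and the function names are chosen for illustration, so the absolute numbers will differ from those quoted above.

```python
import timeit


def cumulative_quadratic(items_by_year):
    """Nested loops, as in the question: every year rescans all years."""
    result = {}
    for year in items_by_year:
        result[year] = items_by_year[year]
        for other in items_by_year:
            if other < year:
                result[year] += items_by_year[other]
    return result


def cumulative_single_pass(items_by_year):
    """One sort plus one pass with a running total."""
    result = {}
    running = 0
    for year in sorted(items_by_year):
        running += items_by_year[year]
        result[year] = running
    return result


# Hypothetical data: 43 consecutive years with made-up counts.
sample = {1973 + i: 1000 + 37 * i for i in range(43)}

# Both approaches must agree before their timings are worth comparing.
assert cumulative_quadratic(sample) == cumulative_single_pass(sample)

repeats = 200
slow = timeit.timeit(lambda: cumulative_quadratic(sample), number=repeats)
fast = timeit.timeit(lambda: cumulative_single_pass(sample), number=repeats)
print("nested loops: %.3g s per call" % (slow / repeats))
print("single pass:  %.3g s per call" % (fast / repeats))
```

Unlike the year - 1 lookup, the running total does not require the years to be consecutive.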

Hope this helps.