I have some data that looks like this:
items_by_year = {
    2004: 10352,
    2005: 15125,
    2006: 8989 ...
}
I plot the cumulative percentage by year in matplotlib, like this:
import matplotlib.pyplot as plt

# Get cumulative count. Ugh!
sum_of_items = 0  # has to be initialized before it is accumulated
new_dict = {}
for year in items_by_year:
    sum_of_items += items_by_year[year]
    new_dict[year] = items_by_year[year]
    for y in items_by_year:
        if y < year:
            new_dict[year] += items_by_year[y]
# Calculate cumulative percentage.
temp_data = []
for year in new_dict:
    # Keys are plain integer years here; float() avoids integer division on Python 2.
    temp_data.append((year, (float(new_dict[year]) / sum_of_items) * 100))
# Sort array by year.
data = sorted(temp_data, key=lambda x: x[0])
x = [date for (date, value) in data]
y = [value for (date, value) in data]
# Draw chart.
fig = plt.figure()
graph = fig.add_subplot(111)
graph.plot(x, y)
plt.show()
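I think there must be a way to make this code better, and any suggestions would be much appreciated!
Answer 0: (score: 1)
The following keeps the looping to a minimum. Binding the sorted list of keys to a name saves time and makes your code clearer. User1866935 doesn't need the conditional; you have to initialize sum_of_items anyway.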
import matplotlib.pyplot as plt

cumulative = {}
sum_of_items = 0
years = sorted(items_by_year)  # bind this to plot x values
for year in years:
    sum_of_items += items_by_year[year]
    cumulative[year] = sum_of_items
fig, ax = plt.subplots(1, 1)
# 100.0 * ... forces float division and gives percentages rather than fractions
ax.plot(years, [100.0 * cumulative[year] / sum_of_items for year in years])
plt.show()
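The same single pass can also be written with itertools.accumulate from the standard library. This is only a minimal sketch of that idea, not part of the answer above: the function name cumulative_percentages is made up for illustration, and the sample dict reuses the three values shown in the question.

```python
from itertools import accumulate


def cumulative_percentages(items_by_year):
    """Return (years, percentages), with years sorted in ascending order."""
    years = sorted(items_by_year)
    # Running totals in year order, computed in one pass.
    running = list(accumulate(items_by_year[y] for y in years))
    total = running[-1] if running else 0
    if total == 0:
        # Nothing to normalize by; report zeros instead of dividing by zero.
        return years, [0.0 for _ in years]
    return years, [100.0 * r / total for r in running]


# The three values shown in the question.
items_by_year = {2004: 10352, 2005: 15125, 2006: 8989}
years, percentages = cumulative_percentages(items_by_year)
```

The two returned lists can be passed straight to ax.plot(years, percentages).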
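Answer 1: (score: 1)
A simpler way is to use the plt.hist() function with its cumulative and normed parameters! normed=True normalizes the counts to a fraction of the total (the percentage), and cumulative=1 gives the cumulative values you need. (normed is the older name; recent Matplotlib releases call this parameter density.) The only catch is that plt.hist() takes the raw data as its argument, in a form like: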
[2004, 2004, 2004, ..., 2006, 2006]
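So, to get your data into this form, I used the following transformation (although if you reduced raw data like this down to what you posted, this step may be irrelevant for you):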
items_by_year = {
    2004: 10352,
    2005: 15125,
    2006: 8989,
    2007: 1500,
    2008: 10000
}
years = sorted(items_by_year.keys())
to_hist = []
for year in items_by_year:
    to_hist.extend([year] * items_by_year[year])
If you already have data like this, then all you need is:
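import matplotlib.pyplot as plt

# density=True is the current spelling of the older normed=True, which was removed in Matplotlib 3.1
plt.hist(to_hist, cumulative=1, density=True, bins=years + [max(years) + 1])
plt.xticks([i + 0.5 for i in years], years)
plt.show()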
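If you only have the per-year counts, it may be possible to skip the expansion into one entry per item by passing the counts through the weights parameter of plt.hist(). The following is a rough sketch under that assumption; hist_inputs and plot_cumulative_hist are made-up helper names, and density=True is used in place of normed=True.

```python
def hist_inputs(items_by_year):
    """Build the values, weights and bin edges for a weighted histogram."""
    years = sorted(items_by_year)
    weights = [items_by_year[y] for y in years]  # one weight per year
    bins = years + [max(years) + 1]              # one bin per year
    return years, weights, bins


def plot_cumulative_hist(items_by_year):
    # Imported lazily so that hist_inputs can be used without Matplotlib.
    import matplotlib.pyplot as plt

    years, weights, bins = hist_inputs(items_by_year)
    # Each year appears once and is counted weights[i] times.
    plt.hist(years, weights=weights, bins=bins, cumulative=True, density=True)
    plt.xticks([y + 0.5 for y in years], years)
    plt.show()
```

Calling plot_cumulative_hist(items_by_year) should draw the same kind of chart as the to_hist version, without building a list with one element per item.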
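One more addition: you can also plot the reverse cumulative distribution (i.e. the percentage of events from a given year onward) simply by passing cumulative=-1.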
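To check what the reversed accumulation should look like, the same numbers can be computed by hand. This is an illustrative sketch only: reverse_cumulative_percentages is a made-up name, and it assumes that each year's value includes that year itself plus everything after it.

```python
def reverse_cumulative_percentages(items_by_year):
    """Percentage of all items that fall in each year or any later year."""
    years = sorted(items_by_year)
    total = sum(items_by_year.values())
    if total == 0:
        return years, [0.0 for _ in years]
    remaining = total
    percentages = []
    for year in years:
        percentages.append(100.0 * remaining / total)
        remaining -= items_by_year[year]  # drop this year before moving on
    return years, percentages


items_by_year = {2004: 10352, 2005: 15125, 2006: 8989, 2007: 1500, 2008: 10000}
years, reverse_pct = reverse_cumulative_percentages(items_by_year)
```

The first value is always 100, because every item falls in the first year or later, and the values fall from there.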
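Answer 2: (score: 0)
Computing the cumulative sum this way is much faster, because you avoid looping over your dict again and again. You can measure the time with something like this: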
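import timeit

# default_timer() reads the clock; timeit.timeit() with no arguments would time a no-op statement instead
start = timeit.default_timer()
sum_of_items = 0
new_dict = {}
first = True
years = sorted(items_by_year.keys())
for year in years:
    sum_of_items += items_by_year[year]
    if first:
        new_dict[year] = items_by_year[year]
        first = False
    else:
        # relies on consecutive years, so that year - 1 is always a key
        new_dict[year] = items_by_year[year] + new_dict[year - 1]
end = timeit.default_timer()
print(end - start)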
For 43 entries, your code ran in 0.000481843948364 seconds on my machine, while this one ran in 2.59876251221e-05 seconds.
Hope this helps.
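A single run is noisy, so one way to compare the two approaches is to let timeit.timeit() call each of them many times. This is a minimal sketch, not a reproduction of the measurement above: the function names and the 43-entry sample dict are made up for illustration, and the printed timings will differ from machine to machine.

```python
import timeit


def cumulative_nested(items_by_year):
    """Quadratic approach: re-scan all years for every year."""
    return {
        year: sum(n for y, n in items_by_year.items() if y <= year)
        for year in items_by_year
    }


def cumulative_single_pass(items_by_year):
    """Linear approach: carry one running total through the sorted years."""
    running = 0
    result = {}
    for year in sorted(items_by_year):
        running += items_by_year[year]
        result[year] = running
    return result


# Hypothetical sample data: 43 consecutive years with made-up counts.
sample = {1970 + i: 100 + 7 * i for i in range(43)}

# Total time in seconds for 1000 calls of each function.
nested_seconds = timeit.timeit(lambda: cumulative_nested(sample), number=1000)
single_seconds = timeit.timeit(lambda: cumulative_single_pass(sample), number=1000)
print("nested: %.6f s, single pass: %.6f s" % (nested_seconds, single_seconds))
```

Both functions return the same dict, so only the running time should differ.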