I wrote the following:
import matplotlib.pyplot as plt

# Mean sale amount per month (rows) and year (columns), drawn as grouped bars
ax = df.pivot_table(index=['month'], columns='year', values='sale_amount_usd',
                    margins=True, fill_value=0).round(2).plot(
    kind='bar', colormap='Blues', figsize=(18, 15))
plt.legend(loc='best')
plt.ylabel('Average Sales Amount in USD')
plt.xlabel('Month')
plt.xticks(rotation=0)
plt.title('Average Sales Amount in USD by Month/Year')
# Label each bar with its height
for p in ax.patches:
    ax.annotate(str(p.get_height()), (p.get_x() * 1.001, p.get_height() * 1.005))
plt.show()
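I would now like to know whether the differences in the means within each month, across years, are significant. In other words, does the increase from $321 in March 2013 to $365 in March 2014 represent a significant increase in average sales?
How would I go about doing this? Is there a way to overlay a marker on the pivot table plot that visually tells me when a difference is significant?
Edited to add sample data: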
event_id event_date week_number week_of_month holiday month day year pub_organization_id clicks sales click_to_sale_conversion_rate sale_amount_usd per_sale_amount_usd per_click_sale_amount pub_commission_usd per_sale_pub_commission_usd per_click_pub_commission_usd
0 3365 1/11/13 2 2 NaN 1. January 11 2013 214 11945 754 0.06 40311.75 53.46 3.37 2418.71 3.21 0.20
1 13793 2/12/13 7 3 NaN 2. February 12 2013 214 11711 1183 0.10 73768.54 62.36 6.30 4426.12 3.74 0.38
2 4626 1/15/13 3 3 NaN 1. January 15 2013 214 11561 1029 0.09 70356.46 68.37 6.09 4221.39 4.10 0.37
3 10917 2/3/13 6 1 NaN 2. February 3 2013 167 11481 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00
4 14653 2/15/13 7 3 NaN 2. February 15 2013 214 11268 795 0.07 37262.56 46.87 3.31 2235.77 2.81 0.20
5 18448 2/27/13 9 5 NaN 2. February 27 2013 214 11205 504 0.04 48773.71 96.77 4.35 2926.43 5.81 0.26
6 11382 2/5/13 6 2 NaN 2. February 5 2013 214 11166 1324 0.12 93322.84 70.49 8.36 5599.38 4.23 0.50
7 14764 2/16/13 7 3 NaN 2. February 16 2013 214 11042 451 0.04 22235.51 49.30 2.01 1334.14 2.96 0.12
8 17080 2/23/13 8 4 NaN 2. February 23 2013 214 10991 248 0.02 14558.85 58.71 1.32 873.53 3.52 0.08
9 21171 3/8/13 10 2 NaN 3. March 8 2013 214 10910 1081 0.10 52005.12 48.11 4.77 3631.28 3.36 0.33
10 16417 2/21/13 8 4 NaN 2. February 21 2013 214 10826 507 0.05 44907.20 88.57 4.15 2694.43 5.31 0.25
11 13399 2/11/13 7 3 NaN 2. February 11 2013 214 10772 1142 0.11 38549.55 33.76 3.58 2312.97 2.03 0.21
12 1532 1/5/13 1 1 NaN 1. January 5 2013 214 10750 610 0.06 29838.49 48.92 2.78 1790.31 2.93 0.17
13 22500 3/13/13 11 3 NaN 3. March 13 2013 214 10743 821 0.08 47310.71 57.63 4.40 3688.83 4.49 0.34
14 5840 1/19/13 3 3 NaN 1. January 19 2013 214 10693 487 0.05 28427.35 58.37 2.66 1705.64 3.50 0.16
15 19566 3/3/13 10 1 NaN 3. March 3 2013 214 10672 412 0.04 15722.29 38.16 1.47 1163.16 2.82 0.11
16 26313 3/25/13 13 5 NaN 3. March 25 2013 214 10629 529 0.05 21946.51 41.49 2.06 1589.84 3.01 0.15
17 19732 3/4/13 10 2 NaN 3. March 4 2013 214 10619 1034 0.10 37257.20 36.03 3.51 2713.71 2.62 0.26
18 18569 2/28/13 9 5 NaN 2. February 28 2013 214 10603 414 0.04 40920.28 98.84 3.86 2455.22 5.93 0.23
19 8704 1/28/13 5 5 NaN 1. January 28 2013 214 10548 738 0.07 29041.87 39.35 2.75 1742.52 2.36 0.17
Answer 0: (score: 0)
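Although it is not conclusive, you could use error bars (via the yerr parameter of DataFrame.plot) to show one standard deviation of uncertainty, and then simply look at whether the intervals overlap. Something like this (untested)...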
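# Mean and standard deviation of sales for each month (rows) and year (columns)
pivot_kwargs = dict(index=['month'], columns='year', values='sale_amount_usd',
                    margins=True, fill_value=0)
means = df.pivot_table(aggfunc='mean', **pivot_kwargs).round(2)
# Align the errors with the means; cells with fewer than two rows have no std
stds = df.pivot_table(aggfunc='std', **pivot_kwargs).reindex_like(means).fillna(0)
# A DataFrame passed as yerr is matched to the plotted columns by name
ax = means.plot(kind='bar', yerr=stds, colormap='Blues', figsize=(18, 15))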
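
If eyeballing the overlap is not enough, one option is to put a number on it. The sketch below is not part of the approach above; it swaps in Welch's t statistic (difference in means divided by its standard error) computed per month for two chosen years. The helper name welch_t_by_month, the |t| > 2 cut-off, and the toy data are all assumptions made for illustration: the toy values are invented so that their means come out at 321 and 365, matching the March figures in the question, and they are not taken from the real data set.

```python
import pandas as pd

def welch_t_by_month(df, year_a, year_b, value="sale_amount_usd"):
    """Welch's t statistic per month for the change in mean from year_a to year_b.

    Hypothetical helper, written for illustration only.
    """
    # Mean, sample variance and row count for every (month, year) group
    stats = df.groupby(["month", "year"])[value].agg(["mean", "var", "count"])
    a = stats.xs(year_a, level="year")
    b = stats.xs(year_b, level="year")
    # Standard error of the difference in means, without assuming equal variances
    se = (a["var"] / a["count"] + b["var"] / b["count"]) ** 0.5
    diff = b["mean"] - a["mean"]
    out = pd.DataFrame({"diff": diff, "t": diff / se})
    # Assumption: |t| > 2 as a rough stand-in for the 5% level; small groups need a larger cut-off
    out["flag"] = out["t"].abs() > 2
    # Months missing in either year (or with a single row) have no t value
    return out.dropna()

# Made-up rows, for illustration only
toy = pd.DataFrame({
    "month": ["3. March"] * 8,
    "year": [2013] * 4 + [2014] * 4,
    "sale_amount_usd": [300.0, 320.0, 330.0, 334.0, 350.0, 360.0, 370.0, 380.0],
})
result = welch_t_by_month(toy, 2013, 2014)
print(result)
```

The flag column is only a rule of thumb. With as few rows per group as in the toy data, a proper p-value is the better guide; scipy.stats.ttest_ind with equal_var=False gives one for a single month at a time. Months that come out flagged could then be marked on the bar chart with ax.annotate, in the same way the bar heights are labelled in the question.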