Question

I wrote the following:

import matplotlib.pyplot as plt

# Mean sale amount per month (rows) and year (columns); margins=True adds an 'All' row and column
ax = df.pivot_table(index=['month'], columns='year', values='sale_amount_usd',
                    margins=True, fill_value=0).round(2).plot(
    kind='bar', colormap='Blues', figsize=(18, 15))
plt.legend(loc='best')
plt.ylabel('Average Sales Amount in USD')
plt.xlabel('Month')
plt.xticks(rotation=0)
plt.title('Average Sales Amount in USD by Month/Year')
# Label each bar with its height
for p in ax.patches:
    ax.annotate(str(p.get_height()), (p.get_x() * 1.001, p.get_height() * 1.005))
plt.show()

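which produces a nice bar chart:

I would now like to know whether the differences in means within each month, between years, are significant. In other words, does the increase from $321 in March 2013 to $365 in March 2014 represent a significant increase in average sales?

How would I do this? Is there a way to overlay a marker on the pivot table that tells me visually when a difference is significant?

Edited to add sample data: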

    event_id    event_date  week_number week_of_month   holiday month   day year    pub_organization_id clicks  sales   click_to_sale_conversion_rate   sale_amount_usd per_sale_amount_usd per_click_sale_amount   pub_commission_usd  per_sale_pub_commission_usd per_click_pub_commission_usd
0   3365    1/11/13 2   2   NaN 1. January  11  2013    214 11945   754 0.06    40311.75    53.46   3.37    2418.71 3.21    0.20
1   13793   2/12/13 7   3   NaN 2. February 12  2013    214 11711   1183    0.10    73768.54    62.36   6.30    4426.12 3.74    0.38
2   4626    1/15/13 3   3   NaN 1. January  15  2013    214 11561   1029    0.09    70356.46    68.37   6.09    4221.39 4.10    0.37
3   10917   2/3/13  6   1   NaN 2. February 3   2013    167 11481   0   0.00    0.00    0.00    0.00    0.00    0.00    0.00
4   14653   2/15/13 7   3   NaN 2. February 15  2013    214 11268   795 0.07    37262.56    46.87   3.31    2235.77 2.81    0.20
5   18448   2/27/13 9   5   NaN 2. February 27  2013    214 11205   504 0.04    48773.71    96.77   4.35    2926.43 5.81    0.26
6   11382   2/5/13  6   2   NaN 2. February 5   2013    214 11166   1324    0.12    93322.84    70.49   8.36    5599.38 4.23    0.50
7   14764   2/16/13 7   3   NaN 2. February 16  2013    214 11042   451 0.04    22235.51    49.30   2.01    1334.14 2.96    0.12
8   17080   2/23/13 8   4   NaN 2. February 23  2013    214 10991   248 0.02    14558.85    58.71   1.32    873.53  3.52    0.08
9   21171   3/8/13  10  2   NaN 3. March    8   2013    214 10910   1081    0.10    52005.12    48.11   4.77    3631.28 3.36    0.33
10  16417   2/21/13 8   4   NaN 2. February 21  2013    214 10826   507 0.05    44907.20    88.57   4.15    2694.43 5.31    0.25
11  13399   2/11/13 7   3   NaN 2. February 11  2013    214 10772   1142    0.11    38549.55    33.76   3.58    2312.97 2.03    0.21
12  1532    1/5/13  1   1   NaN 1. January  5   2013    214 10750   610 0.06    29838.49    48.92   2.78    1790.31 2.93    0.17
13  22500   3/13/13 11  3   NaN 3. March    13  2013    214 10743   821 0.08    47310.71    57.63   4.40    3688.83 4.49    0.34
14  5840    1/19/13 3   3   NaN 1. January  19  2013    214 10693   487 0.05    28427.35    58.37   2.66    1705.64 3.50    0.16
15  19566   3/3/13  10  1   NaN 3. March    3   2013    214 10672   412 0.04    15722.29    38.16   1.47    1163.16 2.82    0.11
16  26313   3/25/13 13  5   NaN 3. March    25  2013    214 10629   529 0.05    21946.51    41.49   2.06    1589.84 3.01    0.15
17  19732   3/4/13  10  2   NaN 3. March    4   2013    214 10619   1034    0.10    37257.20    36.03   3.51    2713.71 2.62    0.26
18  18569   2/28/13 9   5   NaN 2. February 28  2013    214 10603   414 0.04    40920.28    98.84   3.86    2455.22 5.93    0.23
19  8704    1/28/13 5   5   NaN 1. January  28  2013    214 10548   738 0.07    29041.87    39.35   2.75    1742.52 2.36    0.17

Answer 1

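Although not definitive, you could use error bars (via the yerr argument of DataFrame.plot) to show one standard deviation of uncertainty, and then simply look at whether the intervals overlap. Something like this (untested)...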

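# Mean and standard deviation per month (rows) and year (columns); both tables share the same layout
means = df.pivot_table(index=['month'], columns='year',
                       values='sale_amount_usd',
                       margins=True, fill_value=0).round(2)

stds = df.pivot_table(index=['month'], columns='year',
                      values='sale_amount_usd', aggfunc='std',
                      margins=True, fill_value=0)

# yerr accepts a DataFrame whose columns match the plotted columns
ax = means.plot(kind='bar', yerr=stds, colormap=('Blues'), figsize=(18, 15))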
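
The "look at whether the intervals overlap" step can also be done numerically rather than by eye. Below is a minimal sketch of that idea, not a formal significance test: it builds the mean ± 1 standard deviation interval for each (month, year) group and reports whether two years overlap for a given month. The helper names `one_std_intervals` and `intervals_overlap` and the toy numbers are made up for illustration; only the column names `month`, `year` and `sale_amount_usd` come from the question.

```python
import pandas as pd


def one_std_intervals(df, value="sale_amount_usd"):
    """Return mean, std and the mean +/- 1 std bounds per (month, year) group."""
    stats = df.groupby(["month", "year"])[value].agg(["mean", "std"])
    stats["low"] = stats["mean"] - stats["std"]
    stats["high"] = stats["mean"] + stats["std"]
    return stats


def intervals_overlap(stats, month, year_a, year_b):
    """True if the one-std intervals of the two years overlap for that month."""
    a = stats.loc[(month, year_a)]
    b = stats.loc[(month, year_b)]
    return bool(a["low"] <= b["high"] and b["low"] <= a["high"])


# Toy data (hypothetical values, not taken from the question's data set)
toy = pd.DataFrame({
    "month": ["3. March"] * 6,
    "year": [2013, 2013, 2013, 2014, 2014, 2014],
    "sale_amount_usd": [300.0, 320.0, 340.0, 360.0, 365.0, 370.0],
})

stats = one_std_intervals(toy)
overlap = intervals_overlap(stats, "3. March", 2013, 2014)
print(overlap)  # False: [300, 340] and [360, 370] do not overlap
```

Non-overlapping intervals only hint that the two means may differ. As the answer says, this is not definitive, and groups with a single observation have an undefined (NaN) standard deviation, in which case the comparison above returns False and should not be interpreted.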

Python Pandas pivot table - how to tell whether the differences between means in a pivot table are significant?
