Group by multiple columns and by month/year

Time: 2019-11-24 00:44:38

Tags: python python-3.x pandas

I need to build a pivot table in Pandas.

I have the following Pandas DataFrame:

+------------------+---------+-----------------+-------------+-------------------+
| Date             | Product | Status of Order | # of Orders | Total Order Value |
+------------------+---------+-----------------+-------------+-------------------+
| January 1, 2016  | Windows | Cancelled       | 360         | 1000              |
+------------------+---------+-----------------+-------------+-------------------+
| January 2, 2016  | Mac     | Cancelled       | 120         | 2000              |
+------------------+---------+-----------------+-------------+-------------------+
| January 3, 2016  | Mac     | Completed       | 940         | 500               |
+------------------+---------+-----------------+-------------+-------------------+
| ...              | ...     | ...             | ...         | ...               |
+------------------+---------+-----------------+-------------+-------------------+
| February 1, 2016 | Windows | Completed       | 60          | 1300              |
+------------------+---------+-----------------+-------------+-------------------+
| February 1, 2016 | Mac     | Cancelled       | 420         | 2500              |
+------------------+---------+-----------------+-------------+-------------------+
| February 3, 2016 | Windows | Completed       | 610         | 3400              |
+------------------+---------+-----------------+-------------+-------------------+

I would like to pivot it like this:

+---------+-----------------+-----------------------------+------------------------------+
| Product | Status of Order | Sum of #ofOrders in January | Sum of #ofOrders in February |
+---------+-----------------+-----------------------------+------------------------------+
| Windows | Completed       | 0                           | 670                          |
+---------+-----------------+-----------------------------+------------------------------+
|         | Cancelled       | 360                         | 0                            |
+---------+-----------------+-----------------------------+------------------------------+
| Mac     | Completed       | 940                         | 0                            |
+---------+-----------------+-----------------------------+------------------------------+
|         | Cancelled       | 120                         | 420                          |
+---------+-----------------+-----------------------------+------------------------------+

What I tried is df.set_index('Date').resample('M')['# of Orders'].sum(), but what I really want is to group by "Product" and "Status of Order" and then sum "# of Orders" for each month. This usually takes a few clicks and about a minute in an Excel pivot table, but I have been wrestling with Pandas for the past hour...

Below is the code to create the table (random values).
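For readers who want to reproduce the problem, here is a minimal sketch that builds a DataFrame from the sample rows shown in the question's table. It is only an illustration: the elided "..." rows are left out, the variable name df follows the question, and the last date is taken to be February 3, 2016, an assumption consistent with the February total of 670 for Windows/Completed in the desired output.

```python
import pandas as pd

# Sample rows copied from the question's table; the elided "..." rows are omitted.
df = pd.DataFrame({
    "Date": ["January 1, 2016", "January 2, 2016", "January 3, 2016",
             "February 1, 2016", "February 1, 2016", "February 3, 2016"],
    "Product": ["Windows", "Mac", "Mac", "Windows", "Mac", "Windows"],
    "Status of Order": ["Cancelled", "Cancelled", "Completed",
                        "Completed", "Cancelled", "Completed"],
    "# of Orders": [360, 120, 940, 60, 420, 610],
    "Total Order Value": [1000, 2000, 500, 1300, 2500, 3400],
})

# Parse the long-form dates so that month-based grouping can work on them.
df["Date"] = pd.to_datetime(df["Date"], format="%B %d, %Y")
```

Converting "Date" to a real datetime column matters here, because resampling and month-based grouping only work on datetime-like values, not on strings.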

1 answer:

Answer 0: (score: 1)

You can do it in two steps.

Group by month and sum all the orders:

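# Requires a DatetimeIndex (e.g. after df = df.set_index('Date')); if 'Date' is a regular datetime column, use pd.Grouper(key='Date', freq='M') instead
temp_df = df.groupby([pd.Grouper(freq='M'), 'Product', 'Status of Order']).agg({'# of Orders': 'sum'}).reset_index()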

Then pivot the Date and # of Orders columns and merge the result with temp_df:

df = temp_df[['Product', 'Status of Order']].merge(temp_df[['Date', '# of Orders']].pivot(columns='Date', values='# of Orders').fillna(0), left_index=True, right_index=True)

Finally, you can sort it:

df = df.sort_values(['Product'])

Result:

   Product Status of Order  2016-01-31 00:00:00  2016-02-29 00:00:00
0      Mac       Cancelled               2400.0                  0.0
1      Mac       Completed               4410.0                  0.0
3      Mac       Cancelled                  0.0               1600.0
4      Mac       Completed                  0.0               2590.0
2  Windows       Cancelled               6460.0                  0.0
5  Windows       Cancelled                  0.0               4140.0
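Note that the result above still has one row per month for each Product/Status pair, whereas the question asks for a single row per pair. One possible alternative, which is not the answerer's method but a sketch using pandas' pivot_table, is shown below. The sample data mirrors the rows visible in the question (elided rows omitted, last date assumed to be in 2016), and the helper column name Month is introduced here purely for illustration.

```python
import pandas as pd

# Hypothetical sample data mirroring the rows visible in the question.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2016-01-01", "2016-01-02", "2016-01-03",
        "2016-02-01", "2016-02-01", "2016-02-03",
    ]),
    "Product": ["Windows", "Mac", "Mac", "Windows", "Mac", "Windows"],
    "Status of Order": ["Cancelled", "Cancelled", "Completed",
                        "Completed", "Cancelled", "Completed"],
    "# of Orders": [360, 120, 940, 60, 420, 610],
})

# Derive a monthly period (e.g. 2016-01) from each date; "Month" is an
# illustrative column name, not one from the original post.
with_month = df.assign(Month=df["Date"].dt.to_period("M"))

# One row per (Product, Status of Order), one column per month,
# summing "# of Orders"; combinations with no orders are filled with 0.
result = with_month.pivot_table(
    index=["Product", "Status of Order"],
    columns="Month",
    values="# of Orders",
    aggfunc="sum",
    fill_value=0,
)

print(result)
```

With this sample data, the Windows/Completed row sums to 670 for February (60 + 610), matching the layout the question asks for. Calling result.reset_index() would turn the two index levels back into ordinary columns if a flat table is preferred.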