I need to build a pivot table in pandas.
I have the following pandas DataFrame:
+------------------+---------+-----------------+-------------+-------------------+
| Date             | Product | Status of Order | # of Orders | Total Order Value |
+------------------+---------+-----------------+-------------+-------------------+
| January 1, 2016  | Windows | Cancelled       | 360         | 1000              |
+------------------+---------+-----------------+-------------+-------------------+
| January 2, 2016  | Mac     | Cancelled       | 120         | 2000              |
+------------------+---------+-----------------+-------------+-------------------+
| January 3, 2016  | Mac     | Completed       | 940         | 500               |
+------------------+---------+-----------------+-------------+-------------------+
| ...              | ...     | ...             | ...         | ...               |
+------------------+---------+-----------------+-------------+-------------------+
| February 1, 2016 | Windows | Completed       | 60          | 1300              |
+------------------+---------+-----------------+-------------+-------------------+
| February 1, 2016 | Mac     | Cancelled       | 420         | 2500              |
+------------------+---------+-----------------+-------------+-------------------+
| February 3, 2016 | Windows | Completed       | 610         | 3400              |
+------------------+---------+-----------------+-------------+-------------------+
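For anyone who wants to experiment, the visible rows of the table can be loaded roughly like this. This is only a sketch: it contains just the six rows shown (the elided "..." rows are not reproduced), all dates are taken to be in 2016, and the variable names `rows` and `columns` are my own.

```python
import pandas as pd

# Only the rows visible in the table; the elided rows ("...") are not reproduced.
rows = [
    ("January 1, 2016", "Windows", "Cancelled", 360, 1000),
    ("January 2, 2016", "Mac", "Cancelled", 120, 2000),
    ("January 3, 2016", "Mac", "Completed", 940, 500),
    ("February 1, 2016", "Windows", "Completed", 60, 1300),
    ("February 1, 2016", "Mac", "Cancelled", 420, 2500),
    ("February 3, 2016", "Windows", "Completed", 610, 3400),
]
columns = ["Date", "Product", "Status of Order", "# of Orders", "Total Order Value"]
df = pd.DataFrame(rows, columns=columns)

# Parse the textual dates so that month-based grouping is possible later on.
df["Date"] = pd.to_datetime(df["Date"], format="%B %d, %Y")
print(df.dtypes)
```

Parsing the Date column up front matters because resampling and month grouping only work on real datetime values, not on strings.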
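What I have tried is df.set_index('Date').resample('M')["# of Orders"].sum(), but what I really want is to group by "Product" and "Status of Order" and then sum "# of Orders" for each month. This usually takes a few clicks and about a minute in an Excel pivot table, but I have been struggling with pandas for the past hour.
The values in the table above are random. The result I would like to get looks like this: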
+---------+-----------------+-----------------------------+------------------------------+
| Product | Status of Order | Sum of #ofOrders in January | Sum of #ofOrders in February |
+---------+-----------------+-----------------------------+------------------------------+
| Windows | Completed       | 0                           | 670                          |
+---------+-----------------+-----------------------------+------------------------------+
|         | Cancelled       | 360                         | 0                            |
+---------+-----------------+-----------------------------+------------------------------+
| Mac     | Completed       | 940                         | 0                            |
+---------+-----------------+-----------------------------+------------------------------+
|         | Cancelled       | 120                         | 420                          |
+---------+-----------------+-----------------------------+------------------------------+
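Answer 0 (score: 1)
You can do it in two steps.
First, group by month and sum all the orders (this assumes Date is the DatetimeIndex of df, for example after df.set_index('Date'), because pd.Grouper(freq='M') without a key groups on the index):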
temp_df = df.groupby([pd.Grouper(freq='M'), 'Product', 'Status of Order']).agg({'# of Orders': 'sum'}).reset_index()
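The grouping step can be tried on a small frame. This is a sketch under a few assumptions: Date is an ordinary datetime column rather than the index, the data is just the six visible rows from the question, and the month key is built with `dt.to_period('M')` instead of `pd.Grouper(freq='M')`, because the 'M' offset alias is deprecated in newer pandas releases. The names `month` and `monthly` are mine.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime(["2016-01-01", "2016-01-02", "2016-01-03",
                            "2016-02-01", "2016-02-01", "2016-02-03"]),
    "Product": ["Windows", "Mac", "Mac", "Windows", "Mac", "Windows"],
    "Status of Order": ["Cancelled", "Cancelled", "Completed",
                        "Completed", "Cancelled", "Completed"],
    "# of Orders": [360, 120, 940, 60, 420, 610],
})

# Month of each row as a Period (e.g. 2016-01), used as a grouping key.
month = df["Date"].dt.to_period("M").rename("Month")

# One row per (month, product, status) with the summed order count.
monthly = (
    df.groupby([month, "Product", "Status of Order"])["# of Orders"]
      .sum()
      .reset_index()
)
print(monthly)
```

The output is still in long format: each month gets its own rows, which is why a second reshaping step is needed.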
Then pivot the Date and # of Orders columns and merge the result back with temp_df:
df = temp_df[['Product', 'Status of Order']].merge(temp_df[['Date', '# of Orders']].pivot(columns='Date', values='# of Orders').fillna(0), left_index=True, right_index=True)
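It may help to see what this pivot does on a tiny input. When `pivot` is called without an `index` argument it keeps the existing row index, so every row of temp_df stays a separate row and only its own month column is filled. The sketch below uses a hypothetical three-row stand-in for temp_df (the values are taken from the question, not from the answer's random data).

```python
import pandas as pd

# Hypothetical stand-in for temp_df: monthly totals per product and status.
temp_df = pd.DataFrame({
    "Date": pd.to_datetime(["2016-01-31", "2016-01-31", "2016-02-29"]),
    "Product": ["Mac", "Windows", "Mac"],
    "Status of Order": ["Cancelled", "Cancelled", "Cancelled"],
    "# of Orders": [120, 360, 420],
})

# Without index=, pivot keeps the existing row index 0..2,
# so every input row becomes its own output row.
wide = (
    temp_df[["Date", "# of Orders"]]
    .pivot(columns="Date", values="# of Orders")
    .fillna(0)
)

# Re-attach the product and status labels by row index.
result = temp_df[["Product", "Status of Order"]].merge(
    wide, left_index=True, right_index=True
)
print(result)
```

Here "Mac / Cancelled" appears twice, once per month, so the same product and status pair is not collapsed into a single row.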
Finally, you can sort it:
df = df.sort_values(['Product'])
Result:
   Product  Status of Order  2016-01-31 00:00:00  2016-02-29 00:00:00
0      Mac        Cancelled               2400.0                  0.0
1      Mac        Completed               4410.0                  0.0
3      Mac        Cancelled                  0.0               1600.0
4      Mac        Completed                  0.0               2590.0
2  Windows        Cancelled               6460.0                  0.0
5  Windows        Cancelled                  0.0               4140.0
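If one row per product and status is wanted, as in the layout the question asks for, `DataFrame.pivot_table` may be a more direct route than the two steps above. This is an alternative sketch, not the method used in this answer. It assumes Date is a datetime column, uses only the six visible rows from the question, and adds a helper column named `Month` that is my own.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime(["2016-01-01", "2016-01-02", "2016-01-03",
                            "2016-02-01", "2016-02-01", "2016-02-03"]),
    "Product": ["Windows", "Mac", "Mac", "Windows", "Mac", "Windows"],
    "Status of Order": ["Cancelled", "Cancelled", "Completed",
                        "Completed", "Cancelled", "Completed"],
    "# of Orders": [360, 120, 940, 60, 420, 610],
})

# Month label per row, e.g. "2016-01"; this becomes the pivoted column key.
df["Month"] = df["Date"].dt.to_period("M").astype(str)

# Rows: (Product, Status of Order); columns: months; cells: summed order counts.
table = df.pivot_table(
    index=["Product", "Status of Order"],
    columns="Month",
    values="# of Orders",
    aggfunc="sum",
    fill_value=0,
)
print(table)
```

With these six rows the sums should match the desired table in the question, for example 670 for Windows / Completed in February and 120 and 420 for Mac / Cancelled. `fill_value=0` fills the combinations that have no orders in a given month.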