I have a DataFrame that looks like this:
                   |     PACKAGES SHIPPED    |   PACKAGES TRANSFERRED  |
Product & Quantity | Apple-5 pk | Apple-5 pk | Apple-5 pk | Apple-5 pk |
Store Branch I.D.  | 34234324   | 34235555   | 34234324   | 34235555   |
------------------------------------------------------------------------
Period Week
5/14 - 5/20        | 5          | 10         | 20         | 7          |
5/21 - 5/27        | 40         | X          | 1          | Y          |
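This DataFrame has multi-level column headers: "PACKAGES SHIPPED" spans many store branches, each with its own column, and "PACKAGES TRANSFERRED" does the same.
For each period week, what is the most efficient way to sum "PACKAGES SHIPPED" and "PACKAGES TRANSFERRED" for a given "Product & Quantity" value and a given "Store Branch I.D."?
The ideal resulting DataFrame would be: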
                   | Sum Shipped & Transferred | Sum Shipped & Transferred |
Product & Quantity |        Apple-5 pk         |        Apple-10 pk        |
Store Branch I.D.  | 34234324    | 34235555    | 34234324    | 34235555    |
----------------------------------------------------------------------------
Period Week
5/14 - 5/20        | 25          | 17          | 40          | 234         |
5/21 - 5/27        | 41          | X+Y         | 34          | 25          |
Answer 0 (score: 0)
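It may help to think of this as a long-format DataFrame, one row per observation, rather than as the table pictured. Here is one simple way to frame the problem. Of course, if your data really is stored with multi-level columns as pictured, this will not help much.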
In [32]: import pandas as pd

In [33]: df = pd.DataFrame({
    ...:     'Period Week': ['5/14 - 5/20', '5/21 - 5/27', '5/14 - 5/20', '5/21 - 5/27'],
    ...:     'Transaction': ['Shipped', 'Shipped', 'Transferred', 'Transferred'],
    ...:     'Package SKU': ['Apples-5k', 'Apples-10k', 'Apples-5k', 'Apples-10k'],
    ...:     'Quantity': [5, 10, 20, 7],
    ...: })

In [34]: df
Out[34]:
   Period Week  Transaction Package SKU  Quantity
0  5/14 - 5/20      Shipped   Apples-5k         5
1  5/21 - 5/27      Shipped  Apples-10k        10
2  5/14 - 5/20  Transferred   Apples-5k        20
3  5/21 - 5/27  Transferred  Apples-10k         7
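Then, optionally, set a multi-level index on the identifying columns. Note that set_index returns a new DataFrame and leaves df unchanged, so assign the result if you want to keep it:
indexed = df.set_index(['Period Week', 'Transaction', 'Package SKU'])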
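As an aside, once the identifying columns are in the index, a single product can be pulled out with a cross-section. The following is only a minimal sketch built on the toy data from this answer; the names `indexed`, `apples_5k`, and `total_5k` are illustrative and not part of the original post.

```python
import pandas as pd

# Same toy data as in the answer
df = pd.DataFrame({
    'Period Week': ['5/14 - 5/20', '5/21 - 5/27', '5/14 - 5/20', '5/21 - 5/27'],
    'Transaction': ['Shipped', 'Shipped', 'Transferred', 'Transferred'],
    'Package SKU': ['Apples-5k', 'Apples-10k', 'Apples-5k', 'Apples-10k'],
    'Quantity': [5, 10, 20, 7],
})

# set_index returns a new frame; df itself is left unchanged
indexed = df.set_index(['Period Week', 'Transaction', 'Package SKU'])

# Cross-section: keep only the rows whose 'Package SKU' level equals 'Apples-5k'
apples_5k = indexed.xs('Apples-5k', level='Package SKU')

# Shipped (5) plus transferred (20) for that SKU
total_5k = int(apples_5k['Quantity'].sum())
```

The same filter could be written on the flat frame as `df[df['Package SKU'] == 'Apples-5k']`; the indexed form is mainly convenient when several levels are selected repeatedly.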
Finally, group by period week and SKU and sum the quantities across transaction types:
In [35]: df.groupby(['Period Week', 'Package SKU'])['Quantity'].sum()
Out[35]:
Period Week  Package SKU
5/14 - 5/20  Apples-5k      25
5/21 - 5/27  Apples-10k     17
Name: Quantity, dtype: int64
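If the data really is stored with multi-level columns, as in the question, the two top-level blocks can also be added directly. The sketch below is an assumption-laden reconstruction: the level names, the string branch IDs, and the numbers 3 and 4 (standing in for the unknown X and Y) are all hypothetical.

```python
import pandas as pd

# Hypothetical reconstruction of the wide layout from the question
columns = pd.MultiIndex.from_product(
    [['PACKAGES SHIPPED', 'PACKAGES TRANSFERRED'],
     ['Apple-5 pk'],
     ['34234324', '34235555']],
    names=['Transaction', 'Product & Quantity', 'Store Branch I.D.'],
)
wide = pd.DataFrame(
    [[5, 10, 20, 7],
     [40, 3, 1, 4]],  # 3 and 4 are placeholders for X and Y
    index=pd.Index(['5/14 - 5/20', '5/21 - 5/27'], name='Period Week'),
    columns=columns,
)

# Selecting a top-level label drops that level,
# leaving (product, branch) columns in each block
shipped = wide['PACKAGES SHIPPED']
transferred = wide['PACKAGES TRANSFERRED']

# Add matching (product, branch) columns; fill_value=0 treats a
# column that exists in only one block as zero instead of NaN
total = shipped.add(transferred, fill_value=0)

# Alternative that generalizes to more than two blocks:
# group the transposed frame by the lower two column levels
total_alt = (
    wide.T
    .groupby(level=['Product & Quantity', 'Store Branch I.D.'])
    .sum()
    .T
)
```

For the week 5/14 - 5/20 this gives 5 + 20 = 25 for branch 34234324 and 10 + 7 = 17 for branch 34235555, matching the first two cells of the desired result.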