My DataFrame looks like this:
category subcategory contract week1 week2 week3
cat1 sub1 11001 20 20 10
cat1 sub1 11001 0 0 30
cat1 sub2 11002 10 20 0
cat1 sub2 11003 10 20 0
cat2 sub3 11004 10 0 50
cat2 sub3 11005 10 20 0
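For each week, I want to count, per category and subcategory, the number of unique contracts that have a nonzero value in that week: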
category | subcategory | week1 | week2 | week3 |
-----------------------------------------------
cat1 | sub1 | 1 | 1 | 1 |
cat1 | sub2 | 2 | 2 | 0 |
cat2 | sub3 | 2 | 1 | 1 |
I'm trying to set up a toy example for this, but I'm new to pandas, so I'm struggling with that as well.
Answer 0 (score: 3)
First, group by 'category', 'subcategory', and 'contract', take the sum, and test whether each sum is greater than zero:
In [179]: result = df.groupby(['category', 'subcategory', 'contract']).sum() > 0
In [180]: result
Out[180]:
                               week1  week2  week3
category subcategory contract
cat1     sub1        11001      True   True   True
         sub2        11002      True   True  False
                     11003      True   True  False
cat2     sub3        11004      True  False   True
                     11005      True   True  False
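Now group this result by 'category' and 'subcategory' and sum within each group. Because True counts as 1, the sum gives the number of contracts in each group that are True: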
In [181]: result.groupby(level=['category','subcategory']).sum().dropna(axis=0)
Out[181]:
                      week1  week2  week3
category subcategory
cat1     sub1             1      1      1
         sub2             2      2      0
cat2     sub3             2      1      1
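Note that `sum() > 0` identifies nonzero weeks only when all values are non-negative, as they are in the sample data. If negative values were possible (an assumption; nothing in the question says so), positive and negative entries could cancel out. A minimal sketch of a variant that tests for nonzero entries directly, with the sample data rebuilt by hand and the names `weeks`, `active`, and `counts` chosen only for illustration:

```python
import pandas as pd

# Sample data from the question, rebuilt by hand
df = pd.DataFrame({
    "category":    ["cat1", "cat1", "cat1", "cat1", "cat2", "cat2"],
    "subcategory": ["sub1", "sub1", "sub2", "sub2", "sub3", "sub3"],
    "contract":    [11001, 11001, 11002, 11003, 11004, 11005],
    "week1":       [20, 0, 10, 10, 10, 10],
    "week2":       [20, 0, 20, 20, 0, 20],
    "week3":       [10, 30, 0, 0, 50, 0],
})

weeks = ["week1", "week2", "week3"]

# True where a contract has at least one nonzero entry in that week
active = (
    df.set_index(["category", "subcategory", "contract"])[weeks]
      .ne(0)
      .groupby(level=["category", "subcategory", "contract"])
      .any()
)

# Summing booleans counts the True values per (category, subcategory)
counts = active.groupby(level=["category", "subcategory"]).sum()
print(counts)
```

For the sample data this should give the same counts as the two-step approach above.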
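Putting it all together:
import io
import pandas as pd

# Pipe-delimited copy of the sample data from the question
data = '''\
category | subcategory | contract | week1 | week2 | week3
cat1 | sub1 | 11001 | 20 | 20 | 10
cat1 | sub1 | 11001 | 0 | 0 | 30
cat1 | sub2 | 11002 | 10 | 20 | 0
cat1 | sub2 | 11003 | 10 | 20 | 0
cat2 | sub3 | 11004 | 10 | 0 | 50
cat2 | sub3 | 11005 | 10 | 20 | 0'''

# io.StringIO wraps a str (io.BytesIO would need bytes);
# a regex separator requires the Python parsing engine
df = pd.read_table(io.StringIO(data), sep=r'\s*[|]\s*', engine='python')

# Step 1: does each contract have a positive total in each week?
result = df.groupby(['category', 'subcategory', 'contract']).sum() > 0
# Step 2: count the True values per (category, subcategory)
result = result.groupby(level=['category', 'subcategory']).sum().dropna(axis=0)
print(result)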
yields
                      week1  week2  week3
category subcategory
cat1     sub1             1      1      1
         sub2             2      2      0
cat2     sub3             2      1      1
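Since the question asks for unique contracts, the count can also be expressed with `nunique`. This is an alternative sketch, not the method used in the answer above: it reshapes the week columns into long form with `melt`, keeps the nonzero rows, and counts distinct contracts per group. The names `long_df`, `weeks`, and `counts` are illustrative assumptions.

```python
import pandas as pd

# Sample data from the question, rebuilt by hand
df = pd.DataFrame({
    "category":    ["cat1", "cat1", "cat1", "cat1", "cat2", "cat2"],
    "subcategory": ["sub1", "sub1", "sub2", "sub2", "sub3", "sub3"],
    "contract":    [11001, 11001, 11002, 11003, 11004, 11005],
    "week1":       [20, 0, 10, 10, 10, 10],
    "week2":       [20, 0, 20, 20, 0, 20],
    "week3":       [10, 30, 0, 0, 50, 0],
})

weeks = ["week1", "week2", "week3"]

# One row per (category, subcategory, contract, week) observation
long_df = df.melt(
    id_vars=["category", "subcategory", "contract"],
    value_vars=weeks,
    var_name="week",
    value_name="value",
)

counts = (
    long_df[long_df["value"] != 0]               # keep nonzero entries only
    .groupby(["category", "subcategory", "week"])["contract"]
    .nunique()                                   # distinct contracts per group
    .unstack("week", fill_value=0)               # weeks back to columns
    .reindex(columns=weeks, fill_value=0)        # keep weeks that are zero everywhere
)
print(counts)
```

One caveat with this sketch: a (category, subcategory) pair whose values are zero in every week would drop out of the result entirely, whereas the two-step groupby keeps it as a row of zeros.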