pandas dataframe count uniques with respect to another column

Time: 2015-03-17 12:56:27

Tags: python pandas

My dataframe looks like this:

category  subcategory  contract     week1   week2  week3
cat1      sub1          11001       20     20      10 
cat1      sub1          11001       0      0       30  
cat1      sub2          11002       10     20      0  
cat1      sub2          11003       10     20      0  
cat2      sub3          11004       10     0       50 
cat2      sub3          11005       10     20      0 

For each week, I want to count, by category and subcategory, the number of unique contracts that are non-zero in that week.

category | subcategory | week1 | week2 | week3 |
-----------------------------------------------
cat1     | sub1        | 1     | 1     |   1   |
cat1     | sub2        | 2     | 2     |   0   |
cat2     | sub3        | 2     | 1     |   1   | 

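I am trying to set up a toy example for this, but I am new to pandas, so I am struggling with that too.

1 Answer:

Answer 0: (score: 3)

First, group by 'category', 'subcategory', and 'contract', take the sum, and test whether the sum is greater than zero: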

In [179]: result = df.groupby(['category', 'subcategory', 'contract']).sum() > 0

In [180]: result
Out[180]: 
                              week1  week2  week3
category subcategory contract                    
cat1     sub1        11001     True   True   True
         sub2        11002     True   True  False
                     11003     True   True  False
cat2     sub3        11004     True  False   True
                     11005     True   True  False

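Now group this result by 'category' and 'subcategory' and sum within each group. Since True counts as 1, the sum is the number of contracts in each group that are True: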

In [181]: result.groupby(level=['category','subcategory']).sum().dropna(axis=0)
Out[181]: 
                      week1  week2  week3
category subcategory                     
cat1     sub1             1      1      1
         sub2             2      2      0
cat2     sub3             2      1      1

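Putting the two steps together as a complete script:

import io
import pandas as pd

# Toy data from the question, as pipe-delimited text
data = '''\
category | subcategory | contract | week1 | week2 | week3
cat1     | sub1         | 11001 |      20 |    20 |    10
cat1     | sub1         | 11001 |      0  |    0  |    30
cat1     | sub2         | 11002 |      10 |    20 |    0
cat1     | sub2         | 11003 |      10 |    20 |    0
cat2     | sub3         | 11004 |      10 |    0  |    50
cat2     | sub3         | 11005 |      10 |    20 |    0'''

# io.StringIO wraps the str; a regex separator needs the Python parsing engine
df = pd.read_table(io.StringIO(data), sep=r'\s*[|]\s*', engine='python')
# Sum each contract's rows, then flag the weeks whose total is positive
result = df.groupby(['category', 'subcategory', 'contract']).sum() > 0
# Count the True flags per (category, subcategory)
result = result.groupby(level=['category','subcategory']).sum().dropna(axis=0)
print(result)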

yields

                      week1  week2  week3
category subcategory                     
cat1     sub1             1      1      1
         sub2             2      2      0
cat2     sub3             2      1      1
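
As a cross-check, here is an alternative sketch that is not part of the answer above. It reshapes the frame to long format with `melt`, keeps the non-zero entries, and counts distinct contracts with `nunique`. The names `long_df`, `counts`, `week`, and `value` are chosen here for illustration, and the frame is rebuilt from a dict using the values in the question.

```python
import pandas as pd

# Toy frame with the values from the question
df = pd.DataFrame({
    "category":    ["cat1", "cat1", "cat1", "cat1", "cat2", "cat2"],
    "subcategory": ["sub1", "sub1", "sub2", "sub2", "sub3", "sub3"],
    "contract":    [11001, 11001, 11002, 11003, 11004, 11005],
    "week1":       [20, 0, 10, 10, 10, 10],
    "week2":       [20, 0, 20, 20, 0, 20],
    "week3":       [10, 30, 0, 0, 50, 0],
})

# Long format: one row per (category, subcategory, contract, week) value
long_df = df.melt(id_vars=["category", "subcategory", "contract"],
                  var_name="week", value_name="value")

# Keep non-zero entries, count distinct contracts per group and week,
# then pivot the weeks back into columns (missing combinations become 0)
counts = (long_df[long_df["value"] != 0]
          .groupby(["category", "subcategory", "week"])["contract"]
          .nunique()
          .unstack("week", fill_value=0))
print(counts)
```

On this data the sketch gives the same counts as the two-step groupby. The two approaches can differ in edge cases: this one treats a contract as active if any of its rows is non-zero, whereas the answer tests whether the summed value is positive, and a (category, subcategory) pair whose weeks are all zero would be dropped here rather than shown as a row of zeros.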