Python: pivot table with groupby counts per category

Date: 2020-08-13 02:44:03

Tags: python pandas

Suppose I have a file that looks like this:

+---------+---------+-------+
| Product | Quality | Origin|
+---------+---------+-------+
| Apple   | Good    |       |
+---------+---------+-------+
| Apple   | Bad     |       |
+---------+---------+-------+
| Apple   | Bad     |       |
+---------+---------+-------+
| Orange  | Good    |       |
+---------+---------+-------+
| .       |         |       |
+---------+---------+-------+
| .       |         |       |
+---------+---------+-------+
| Grape   | Good    |       |
+---------+---------+-------+

I want to produce a pivoted result with counts:

+---------+---------------+------+-----+
| Product | Total Number  | Good | Bad |
+---------+---------------+------+-----+
| Apple   | 5             | 3    | 2   |
+---------+---------------+------+-----+
| Orange  | 8             | 5    | 3   |
+---------+---------------+------+-----+
| Grape   | 3             | 1    | 2   |
+---------+---------------+------+-----+
| Total   | 16            | 9    | 7   |
+---------+---------------+------+-----+

I am using groupby with count to get the totals:

Total_Product = ProdcutFile.groupby('Product').count()

But how can I make the result table include the Good and Bad counts as well?

1 Answer:

Answer 0 (score: 0)

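Here is one way to do it using assign and pivot_table. The assign call adds a column consisting entirely of 1s; summing that column within each Product/Quality group gives the counts in the final table.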

from io import StringIO
import pandas as pd

data = '''Product  Quality 
Apple    Good    
Apple    Bad     
Apple    Bad     
Orange   Good
Orange   Bad
Grape    Good    
'''

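# A raw string is used for the regex separator so Python does not warn about '\s'.
df = (pd.read_csv(StringIO(data), sep=r'\s+', engine='python')
        .assign(counter = 1)            # helper column of 1s; summing it yields counts
        .pivot_table(index='Product', 
                     columns='Quality', 
                     values='counter', 
                     aggfunc='sum',     # string name; recent pandas warns on the builtin sum
                     fill_value=0,      # show 0 instead of NaN for missing combinations
                     margins=True,      # add the row and column totals
                     margins_name='Totals')
     )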
print(df)

Quality  Bad  Good  Totals
Product                   
Apple      2     1       3
Grape      0     1       1
Orange     1     1       2
Totals     3     3       6
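
As a cross-check on the counts above, here is a minimal sketch of an alternative that skips the helper column of 1s. It swaps in pd.crosstab, which is not what the answer uses, and it rebuilds the same six sample rows as a DataFrame directly; the variable name counts is just for illustration.

```python
import pandas as pd

# Same six sample rows as in the answer's data string.
df = pd.DataFrame({
    "Product": ["Apple", "Apple", "Apple", "Orange", "Orange", "Grape"],
    "Quality": ["Good", "Bad", "Bad", "Good", "Bad", "Good"],
})

# crosstab counts how often each Product/Quality pair occurs,
# so no counter column is needed. margins=True adds the totals.
counts = pd.crosstab(df["Product"], df["Quality"],
                     margins=True, margins_name="Totals")
print(counts)
```

This should give the same numbers as the pivot_table version, which can be a quick way to confirm that the summed counter column really is a count.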

(Renaming and reordering the columns is straightforward and is not shown.)
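
For completeness, a rough sketch of that renaming and reordering step, assuming the layout asked for in the question (Product, Total Number, Good, Bad, with a final Total row). The names table and row_order are illustrative, and the margin is named Total here rather than Totals to match the question's table.

```python
import pandas as pd

# Same six sample rows as in the answer's data string.
df = pd.DataFrame({
    "Product": ["Apple", "Apple", "Apple", "Orange", "Orange", "Grape"],
    "Quality": ["Good", "Bad", "Bad", "Good", "Bad", "Good"],
})

table = (df.assign(counter=1)
           .pivot_table(index="Product", columns="Quality", values="counter",
                        aggfunc="sum", fill_value=0,
                        margins=True, margins_name="Total"))

# Rename only the margin column, then select the columns in the requested order.
table = table.rename(columns={"Total": "Total Number"})[["Total Number", "Good", "Bad"]]

# Keep products in order of first appearance, with the Total row last.
row_order = list(df["Product"].unique()) + ["Total"]
table = table.reindex(row_order)

# Drop the 'Quality' label on the column axis and make Product a regular column.
table.columns.name = None
table = table.reset_index()
print(table)
```

With the small sample data the counts will of course differ from the question's example table; only the layout is meant to match.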