Suppose I have a file that looks like this:
+---------+---------+-------+
| Product | Quality | Origin|
+---------+---------+-------+
| Apple | Good | |
+---------+---------+-------+
| Apple | Bad | |
+---------+---------+-------+
| Apple | Bad | |
+---------+---------+-------+
| Orange | Good | |
+---------+---------+-------+
| . | | |
+---------+---------+-------+
| . | | |
+---------+---------+-------+
| Grape | Good | |
+---------+---------+-------+
I want to pivot it into a result table with counts:
+---------+---------------+------+-----+
| Product | Total Number | Good | Bad |
+---------+---------------+------+-----+
| Apple | 5 | 3 | 2 |
+---------+---------------+------+-----+
| Orange | 8 | 5 | 3 |
+---------+---------------+------+-----+
| Grape | 3 | 1 | 2 |
+---------+---------------+------+-----+
| Total | 16 | 9 | 7 |
+---------+---------------+------+-----+
I am using groupby and count to get the totals:
Total_Product = ProductFile.groupby('Product').count()
But how do I make the result table include the Good and Bad counts as well?
Answer 0 (score: 0)
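Here is one approach using assign and pivot_table. The assign call adds a column of ones; summing that column within each Product/Quality group gives the counts in the final table.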
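from io import StringIO
import pandas as pd

# Sample data standing in for the original file
data = '''Product Quality
Apple Good
Apple Bad
Apple Bad
Orange Good
Orange Bad
Grape Good
'''

# Raw string avoids an invalid-escape warning for the regex separator
df = (pd.read_csv(StringIO(data), sep=r'\s+', engine='python')
        .assign(counter=1)                    # one per row, so sums become counts
        .pivot_table(index='Product',
                     columns='Quality',
                     values='counter',
                     aggfunc='sum',
                     fill_value=0,            # missing combinations count as 0
                     margins=True,            # add row and column totals
                     margins_name='Totals')
     )
print(df)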
Quality  Bad  Good  Totals
Product
Apple      2     1       3
Grape      0     1       1
Orange     1     1       2
Totals     3     3       6
(Renaming the columns and putting them in the desired order is straightforward and is not shown.)
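For completeness, the renaming and reordering step could look something like the sketch below. It rebuilds the same pivot table and then reshapes it toward the layout asked for in the question. The names `pivot`, `row_order`, and `result` are illustrative, and ordering the rows by first appearance in the data is an assumption about what the asker wants.

```python
from io import StringIO
import pandas as pd

data = '''Product Quality
Apple Good
Apple Bad
Apple Bad
Orange Good
Orange Bad
Grape Good
'''

raw = pd.read_csv(StringIO(data), sep=r'\s+')

# Same pivot as in the answer: a column of ones summed per Product/Quality
pivot = (raw.assign(counter=1)
            .pivot_table(index='Product', columns='Quality',
                         values='counter', aggfunc='sum',
                         fill_value=0, margins=True,
                         margins_name='Total'))

# Assumed row order: products as they first appear, then the totals row
row_order = list(raw['Product'].drop_duplicates()) + ['Total']

result = (pivot
          .rename(columns={'Total': 'Total Number'})        # rename totals column
          .loc[row_order, ['Total Number', 'Good', 'Bad']]  # reorder rows and columns
          .rename_axis(columns=None)                        # drop the 'Quality' label
          .reset_index())                                   # make Product a column
print(result)
```

Selecting with `.loc` and explicit lists handles both the row and the column order in one step; if the real data can lack a "Good" or "Bad" value entirely, the column list would need a `reindex(..., fill_value=0)` instead.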