Suppose you have a DataFrame like the following:
    V290            V311
0   GOOD            TOP QUARTER
1   NK-UNASCERTAIN  MIDDLE HALF
2   AVERAGE         TOP QUARTER
3   POOR            NK-UNASCERTAIN
4   POOR            MIDDLE HALF
5   GOOD            MIDDLE HALF
6   POOR            TOP QUARTER
7   AVERAGE         MIDDLE HALF
8   POOR            MIDDLE HALF
9   AVERAGE         MIDDLE HALF
10  POOR            MIDDLE HALF
11  POOR            MIDDLE HALF
12  AVERAGE         MIDDLE HALF
13  AVERAGE         TOP QUARTER
I want to group this data by ['V311'] and see how many GOOD or POOR values fall in each ['V311'] subcategory. I would like something like this:
Top Quarter: GOOD: 12
             POOR: 30
             Average: 15
Middle half: GOOD: 5
             POOR: 19
             Average: 3
and so on...
Answer 0 (score: 3)
You can build the counts with pivot_table and reshape them with unstack, i.e.
df.pivot_table(index='V290',columns='V311',aggfunc='size',fill_value=0).unstack()
V311            V290
MIDDLE HALF     AVERAGE           3
                GOOD              1
                NK-UNASCERTAIN    1
                POOR              4
NK-UNASCERTAIN  AVERAGE           0
                GOOD              0
                NK-UNASCERTAIN    0
                POOR              1
TOP QUARTER     AVERAGE           2
                GOOD              1
                NK-UNASCERTAIN    0
                POOR              1
dtype: int64
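Alternatively (passing fill_value=0 to unstack keeps the counts as integers, whereas fillna(0) would turn them into floats):
df.groupby(['V290','V311']).size().unstack(fill_value=0).unstack()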
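For a reproducible check, the sketch below rebuilds the sample frame from the question and produces the same counts as a two-dimensional table with pd.crosstab. Note that crosstab is a third option that this answer does not mention; the variable names rows and counts are only illustrative.

```python
import pandas as pd

# Sample data copied from the question (14 rows of V290, V311).
rows = [
    ("GOOD", "TOP QUARTER"), ("NK-UNASCERTAIN", "MIDDLE HALF"),
    ("AVERAGE", "TOP QUARTER"), ("POOR", "NK-UNASCERTAIN"),
    ("POOR", "MIDDLE HALF"), ("GOOD", "MIDDLE HALF"),
    ("POOR", "TOP QUARTER"), ("AVERAGE", "MIDDLE HALF"),
    ("POOR", "MIDDLE HALF"), ("AVERAGE", "MIDDLE HALF"),
    ("POOR", "MIDDLE HALF"), ("POOR", "MIDDLE HALF"),
    ("AVERAGE", "MIDDLE HALF"), ("AVERAGE", "TOP QUARTER"),
]
df = pd.DataFrame(rows, columns=["V290", "V311"])

# One row per V311 group, one column per V290 rating.
# Combinations that never occur are filled with 0.
counts = pd.crosstab(df["V311"], df["V290"])
print(counts)
```

Keeping the result as a table, with groups as rows and ratings as columns, is close to the layout the question asks for.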
If you want percentages, divide each column by its total, i.e.
ndf = df.pivot_table(index='V290',columns='V311',aggfunc='size',fill_value=0)
percents = (ndf/ndf.sum()*100).unstack()
V311            V290
MIDDLE HALF     AVERAGE            33.333333
                GOOD               11.111111
                NK-UNASCERTAIN     11.111111
                POOR               44.444444
NK-UNASCERTAIN  AVERAGE             0.000000
                GOOD                0.000000
                NK-UNASCERTAIN      0.000000
                POOR              100.000000
TOP QUARTER     AVERAGE            50.000000
                GOOD               25.000000
                NK-UNASCERTAIN      0.000000
                POOR               25.000000
dtype: float64
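To make the division step explicit, here is a minimal sketch on the question's sample data. It assumes the counts table is built with groupby and unstack rather than pivot_table, and the name totals is introduced here only for illustration.

```python
import pandas as pd

# Sample data copied from the question (14 rows of V290, V311).
rows = [
    ("GOOD", "TOP QUARTER"), ("NK-UNASCERTAIN", "MIDDLE HALF"),
    ("AVERAGE", "TOP QUARTER"), ("POOR", "NK-UNASCERTAIN"),
    ("POOR", "MIDDLE HALF"), ("GOOD", "MIDDLE HALF"),
    ("POOR", "TOP QUARTER"), ("AVERAGE", "MIDDLE HALF"),
    ("POOR", "MIDDLE HALF"), ("AVERAGE", "MIDDLE HALF"),
    ("POOR", "MIDDLE HALF"), ("POOR", "MIDDLE HALF"),
    ("AVERAGE", "MIDDLE HALF"), ("AVERAGE", "TOP QUARTER"),
]
df = pd.DataFrame(rows, columns=["V290", "V311"])

# Counts with V290 ratings as rows and V311 groups as columns.
ndf = df.groupby(["V290", "V311"]).size().unstack(fill_value=0)

# ndf.sum() adds down each column, giving one total per V311 group.
totals = ndf.sum()

# Dividing a DataFrame by a Series aligns the Series index with the
# DataFrame columns, so each column is divided by its own total.
percents = ndf / totals * 100
print(percents.round(2))
```

Because the totals are taken per V311 column, the percentages within each V311 group add up to 100.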
Answer 1 (score: 2)
Use a dict comprehension with groupby and value_counts, converting each result to a dict:
d = {k:v.value_counts().to_dict() for k,v in df.groupby('V311')['V290']}
print (d)
{'NK-UNASCERTAIN': {'POOR': 1},
'MIDDLE HALF': {'POOR': 4, 'NK-UNASCERTAIN': 1, 'AVERAGE': 3, 'GOOD': 1},
'TOP QUARTER': {'POOR': 1, 'AVERAGE': 2, 'GOOD': 1}}
For output as a Series:
s = df.groupby('V311')['V290'].value_counts()
print (s)
V311            V290
MIDDLE HALF     POOR              4
                AVERAGE           3
                GOOD              1
                NK-UNASCERTAIN    1
NK-UNASCERTAIN  POOR              1
TOP QUARTER     AVERAGE           2
                GOOD              1
                POOR              1
Name: V290, dtype: int64
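If the goal is text shaped like the layout in the question, the same groupby and value_counts calls can feed a small formatter. The helper nested_report below is hypothetical, written for this illustration only, and the exact indentation is an assumption about what the asker wants.

```python
import pandas as pd

# Sample data copied from the question (14 rows of V290, V311).
rows = [
    ("GOOD", "TOP QUARTER"), ("NK-UNASCERTAIN", "MIDDLE HALF"),
    ("AVERAGE", "TOP QUARTER"), ("POOR", "NK-UNASCERTAIN"),
    ("POOR", "MIDDLE HALF"), ("GOOD", "MIDDLE HALF"),
    ("POOR", "TOP QUARTER"), ("AVERAGE", "MIDDLE HALF"),
    ("POOR", "MIDDLE HALF"), ("AVERAGE", "MIDDLE HALF"),
    ("POOR", "MIDDLE HALF"), ("POOR", "MIDDLE HALF"),
    ("AVERAGE", "MIDDLE HALF"), ("AVERAGE", "TOP QUARTER"),
]
df = pd.DataFrame(rows, columns=["V290", "V311"])


def nested_report(frame):
    """Return text lines: one header per V311 group, then its V290 counts."""
    lines = []
    # Iterating a grouped column yields (group key, Series of V290 values).
    for group, ratings in frame.groupby("V311")["V290"]:
        lines.append(f"{group}:")
        # value_counts() lists the most frequent rating first.
        for rating, n in ratings.value_counts().items():
            lines.append(f"    {rating}: {n}")
    return lines


report = nested_report(df)
print("\n".join(report))
```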
Edit: if you need relative frequencies:
s = df.groupby('V311')['V290'].value_counts(normalize=True)
print (s)
V311            V290
MIDDLE HALF     POOR              0.444444
                AVERAGE           0.333333
                GOOD              0.111111
                NK-UNASCERTAIN    0.111111
NK-UNASCERTAIN  POOR              1.000000
TOP QUARTER     AVERAGE           0.500000
                GOOD              0.250000
                POOR              0.250000
Name: V290, dtype: float64
EDIT1:
If you also want the missing categories listed (with a count of 0):
s = df.groupby('V311')['V290'].value_counts()
s = s.reindex(pd.MultiIndex.from_product(s.index.levels), fill_value=0)
print (s)
MIDDLE HALF     AVERAGE           3
                GOOD              1
                NK-UNASCERTAIN    1
                POOR              4
NK-UNASCERTAIN  AVERAGE           0
                GOOD              0
                NK-UNASCERTAIN    0
                POOR              1
TOP QUARTER     AVERAGE           2
                GOOD              1
                NK-UNASCERTAIN    0
                POOR              1
Name: V290, dtype: int64
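The printed result above has lost the V311 / V290 header row. A self-contained variant of the same reindex step is sketched below; passing names= explicitly is an addition made here (not part of the answer) so that the level labels are kept regardless of pandas version, and full_index and s_full are illustrative names.

```python
import pandas as pd

# Sample data copied from the question (14 rows of V290, V311).
rows = [
    ("GOOD", "TOP QUARTER"), ("NK-UNASCERTAIN", "MIDDLE HALF"),
    ("AVERAGE", "TOP QUARTER"), ("POOR", "NK-UNASCERTAIN"),
    ("POOR", "MIDDLE HALF"), ("GOOD", "MIDDLE HALF"),
    ("POOR", "TOP QUARTER"), ("AVERAGE", "MIDDLE HALF"),
    ("POOR", "MIDDLE HALF"), ("AVERAGE", "MIDDLE HALF"),
    ("POOR", "MIDDLE HALF"), ("POOR", "MIDDLE HALF"),
    ("AVERAGE", "MIDDLE HALF"), ("AVERAGE", "TOP QUARTER"),
]
df = pd.DataFrame(rows, columns=["V290", "V311"])

# Counts for the combinations that actually occur (8 of the 12 possible).
s = df.groupby("V311")["V290"].value_counts()

# Cartesian product of the two index levels: every (V311, V290) pair.
full_index = pd.MultiIndex.from_product(s.index.levels, names=s.index.names)

# Pairs missing from s get a count of 0 instead of NaN.
s_full = s.reindex(full_index, fill_value=0)
print(s_full)
```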
Answer 2 (score: 2)
Using only pandas:
import pandas as pd
dataframe = pd.DataFrame()
dataframe['V311'] = ['MIDDLE','TOP','MIDDLE','TOP','MIDDLE','TOP','TOP']
print(dataframe['V311'].value_counts())
Output:
TOP 4
MIDDLE 3
Name: V311, dtype: int64