My code is as follows:
df.loc[df['Shape'].isin(Shapes), 'Shape'].value_counts().div(len(df)).to_frame().reset_index()
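This counts the occurrences and then converts them to a fraction, so the value for Triangle is its share of the whole DataFrame. But if I want to add another column and break the result down by it as a group, how would I adjust the code?
The current code gives me the percentage of each shape across the whole df: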
Triangle .20
Square .40
Circle .40
I would like it broken down by color as well, so that the output looks like this:
Triangle Blue .20
Triangle Red .40
Triangle Black .40
Square Blue .40
Square Red .30
Square Purple .30
...
Thanks
Answer 0 (score: 2)
I think you can use GroupBy.size with multiple columns:
import numpy as np
import pandas as pd

# Build reproducible sample data
np.random.seed(2020)
s = ['Triangle','Square','Circle', 'Rectangle']
c = ['Blue','Red','Black', 'Purple']
df = pd.DataFrame({'Shape':np.random.choice(s, size=20),
                   'Colors':np.random.choice(c, size=20)})
#print (df)
Shapes = ['Triangle','Square','Circle']
# Count each (Shape, Colors) pair, then divide by the total number of rows in df
df1 = (df.loc[df['Shape'].isin(Shapes)]
         .groupby(['Shape', 'Colors'])
         .size()
         .div(len(df))
         .reset_index(name='per'))
print(df1)
Shape Colors per
0 Circle Black 0.10
1 Circle Red 0.05
2 Square Blue 0.05
3 Square Red 0.10
4 Triangle Black 0.05
5 Triangle Blue 0.05
6 Triangle Purple 0.10
7 Triangle Red 0.10
Alternatively, use SeriesGroupBy.value_counts; the difference is that the values are sorted within each group (by count, descending):
df1 = (df.loc[df['Shape'].isin(Shapes)]
         .groupby(['Shape'])['Colors']
         .value_counts()
         .div(len(df))
         .reset_index(name='per'))
print(df1)
Shape Colors per
0 Circle Black 0.10
1 Circle Red 0.05
2 Square Red 0.10
3 Square Blue 0.05
4 Triangle Purple 0.10
5 Triangle Red 0.10
6 Triangle Black 0.05
7 Triangle Blue 0.05
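To make the difference between the two approaches concrete, here is a minimal sketch on a small hand-made DataFrame (the data is hypothetical and not taken from the question). It suggests that GroupBy.size and SeriesGroupBy.value_counts yield the same numbers and differ only in row order:

```python
import pandas as pd

# Hypothetical toy data, chosen so that no two colors tie within a shape
df = pd.DataFrame({
    'Shape':  ['Triangle', 'Triangle', 'Triangle', 'Square', 'Square', 'Square'],
    'Colors': ['Blue', 'Red', 'Red', 'Red', 'Blue', 'Blue'],
})

# size(): rows come back ordered by the group keys (Shape, then Colors)
by_size = (df.groupby(['Shape', 'Colors'])
             .size()
             .div(len(df))
             .reset_index(name='per'))

# value_counts(): within each Shape, rows are ordered by count, descending
by_counts = (df.groupby('Shape')['Colors']
               .value_counts()
               .div(len(df))
               .reset_index(name='per'))

print(by_size)
print(by_counts)
```

If the order within each group does not matter, either form should do; value_counts is convenient when the most frequent color per shape should come first.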
If you want percentages within each group (so that each group sums to 1, i.e. 100%), use:
Shapes = ['Triangle','Square','Circle']
df2 = (df.loc[df['Shape'].isin(Shapes)]
         .groupby(['Shape'])['Colors']
         .value_counts(normalize=True)
         .reset_index(name='per'))
print(df2)
Shape Colors per
0 Circle Black 0.666667
1 Circle Red 0.333333
2 Square Red 0.666667
3 Square Blue 0.333333
4 Triangle Purple 0.333333
5 Triangle Red 0.333333
6 Triangle Black 0.166667
7 Triangle Blue 0.166667
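As a quick sanity check on which denominator each variant uses, the following sketch compares the two on a small hand-made DataFrame (hypothetical data, not from the question; the names `overall` and `within` are only illustrative). Dividing by len(df) gives shares of the whole frame, filtered-out shapes included, whereas normalize=True gives shares within each shape:

```python
import pandas as pd

# Hypothetical toy data, small enough to verify by hand (8 rows, 2 of them Rectangle)
df = pd.DataFrame({
    'Shape':  ['Triangle', 'Triangle', 'Triangle', 'Square', 'Square',
               'Circle', 'Rectangle', 'Rectangle'],
    'Colors': ['Blue', 'Red', 'Red', 'Blue', 'Blue', 'Black', 'Red', 'Blue'],
})
Shapes = ['Triangle', 'Square', 'Circle']
filtered = df.loc[df['Shape'].isin(Shapes)]

# Share of the whole DataFrame: the denominator is len(df), rectangles included
overall = (filtered.groupby(['Shape', 'Colors'])
                   .size()
                   .div(len(df))
                   .reset_index(name='per'))

# Share within each shape: the denominator is that shape's own row count
within = (filtered.groupby('Shape')['Colors']
                  .value_counts(normalize=True)
                  .reset_index(name='per'))

# Totals per shape: 'within' should add up to 1 for every shape, 'overall' should not
overall_totals = overall.groupby('Shape')['per'].sum()
within_totals = within.groupby('Shape')['per'].sum()
print(overall_totals)
print(within_totals)
```

The desired output in the question sums to 1 for each shape, so the normalize=True variant appears to be the closer match.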