Suppose I have a DataFrame:
        term  score
0       this      0
1       that      1
2  the other      3
3  something      2
4   anything      1
5  the other      2
6       that      2
7       this      0
8  something      1
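How can I count how many times each value in the term column occurs for each unique value in the score column, producing a result like this?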
        term  score 0  score 1  score 2  score 3
0       this        2        0        0        0
1       that        0        1        1        0
2  the other        0        0        1        1
3  something        0        1        1        0
4   anything        0        1        0        0
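Related questions I have read here include "Python Pandas counting and summing specific conditions" and "COUNTIF in pandas python over multiple columns with multiple conditions", but neither seems to be quite what I am trying to do. The pivot_table mentioned in this question looks like it could be relevant, but I am held back by my lack of experience and by how terse the pandas documentation is. Thanks for any suggestions.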
Answer 0 (score: 6)
Use groupby with size, reshape with unstack, and finally apply add_prefix:
df = df.groupby(['term','score']).size().unstack(fill_value=0).add_prefix('score ')
Or use crosstab:
df = pd.crosstab(df['term'],df['score']).add_prefix('score ')
Or use pivot_table:
df = (df.pivot_table(index='term',columns='score', aggfunc='size', fill_value=0)
.add_prefix('score '))
print(df)
score      score 0  score 1  score 2  score 3
term
anything         0        1        0        0
something        0        1        1        0
that             0        1        1        0
the other        0        0        1        1
this             2        0        0        0
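To make the snippets above easy to reproduce, here is a minimal self-contained sketch that rebuilds the sample DataFrame from the question and runs all three variants side by side. The variable names (by_groupby, by_crosstab, by_pivot, flat) are illustrative only, and the last step is an assumption about the desired layout: it restores the question's row order and turns term back into an ordinary column.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "term": ["this", "that", "the other", "something", "anything",
             "the other", "that", "this", "something"],
    "score": [0, 1, 3, 2, 1, 2, 2, 0, 1],
})

# Variant 1: count (term, score) pairs, then pivot score into columns
by_groupby = (df.groupby(["term", "score"]).size()
                .unstack(fill_value=0)
                .add_prefix("score "))

# Variant 2: crosstab builds the same frequency table directly
by_crosstab = pd.crosstab(df["term"], df["score"]).add_prefix("score ")

# Variant 3: pivot_table with the 'size' aggregation
by_pivot = (df.pivot_table(index="term", columns="score",
                           aggfunc="size", fill_value=0)
              .add_prefix("score "))

# Assumed final step: keep terms in order of first appearance,
# drop the "score" label on the column axis, and make term a column
order = pd.Index(df["term"].unique(), name="term")
flat = by_groupby.reindex(order).rename_axis(None, axis=1).reset_index()
print(flat)
```

All three variants sort the terms alphabetically in the index, which is why the reindex step is needed if the original order matters.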
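Answer 1 (score: 6)
You can also combine get_dummies and set_index, then sum over the index level. This was originally written as .sum(level=0), but the level parameter of sum was deprecated in pandas 1.3 and removed in 2.0; groupby(level=0, sort=False).sum() is the equivalent and keeps the terms in order of first appearance:
(pd.get_dummies(df.set_index('term'), columns=['score'], prefix_sep=' ')
.groupby(level=0, sort=False).sum()
.reset_index())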
Output:
        term  score 0  score 1  score 2  score 3
0       this        2        0        0        0
1       that        0        1        1        0
2  the other        0        0        1        1
3  something        0        1        1        0
4   anything        0        1        0        0
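A self-contained sketch of this one-hot approach follows, rebuilding the sample data from the question. It aggregates with groupby(level=0, sort=False).sum(), which runs on current pandas releases and keeps terms in order of first appearance. The dtype=int argument is an assumption added here so the indicator columns are integers regardless of the pandas version; the names dummies and counts are illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "term": ["this", "that", "the other", "something", "anything",
             "the other", "that", "this", "something"],
    "score": [0, 1, 3, 2, 1, 2, 2, 0, 1],
})

# One indicator column per distinct score, named "score 0", "score 1", ...
dummies = pd.get_dummies(df.set_index("term"), columns=["score"],
                         prefix_sep=" ", dtype=int)

# Sum the indicators per term; sort=False preserves first-appearance order
counts = dummies.groupby(level=0, sort=False).sum().reset_index()
print(counts)
```

Because every row contributes exactly one 1 across the indicator columns, the per-term sums are the same counts that groupby with size or crosstab would give.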