Question

Suppose I have a DataFrame:

    term      score
0   this          0
1   that          1
2   the other     3
3   something     2
4   anything      1
5   the other     2
6   that          2
7   this          0
8   something     1

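How can I count the occurrences of each value in the term column, broken down by the unique values in the score column? The result should look like this: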

    term      score 0     score 1     score 2     score 3
0   this            2           0           0           0
1   that            0           1           1           0
2   the other       0           0           1           1
3   something       0           1           1           0
4   anything        0           1           0           0

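Related questions I have read here include "Python Pandas counting and summing specific conditions" and "COUNTIF in pandas python over multiple columns with multiple conditions", but neither seems to be what I am trying to do. The pivot_table mentioned in this question looks promising, but I am held back by my lack of experience and the brevity of the pandas documentation. Thanks for any suggestions.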

Answer 1

Use groupby with size to count each (term, score) pair, reshape with unstack, and finally add_prefix to label the columns:

df = df.groupby(['term','score']).size().unstack(fill_value=0).add_prefix('score ')

Or use crosstab:

df = pd.crosstab(df['term'],df['score']).add_prefix('score ')

Or pivot_table:

df = (df.pivot_table(index='term',columns='score', aggfunc='size', fill_value=0)
        .add_prefix('score '))

print(df)
score      score 0  score 1  score 2  score 3
term                                         
anything         0        1        0        0
something        0        1        1        0
that             0        1        1        0
the other        0        0        1        1
this             2        0        0        0
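To make the comparison concrete, here is a minimal, self-contained sketch that rebuilds the sample data from the question and runs all three variants side by side. The names by_groupby, by_crosstab, by_pivot, and result are only illustrative, and the last step (restoring the question's row order and flat term column) is an assumption about the desired layout rather than part of the original answer.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "term": ["this", "that", "the other", "something", "anything",
             "the other", "that", "this", "something"],
    "score": [0, 1, 3, 2, 1, 2, 2, 0, 1],
})

# Variant 1: count each (term, score) pair, then pivot score into columns.
by_groupby = (df.groupby(["term", "score"])
                .size()
                .unstack(fill_value=0)
                .add_prefix("score "))

# Variant 2: crosstab builds the same frequency table directly.
by_crosstab = pd.crosstab(df["term"], df["score"]).add_prefix("score ")

# Variant 3: pivot_table with the 'size' aggregation.
by_pivot = (df.pivot_table(index="term", columns="score",
                           aggfunc="size", fill_value=0)
              .add_prefix("score "))

# Optional: restore the order in which terms first appear and turn
# the index back into a regular 'term' column, as in the question.
first_seen = pd.Index(df["term"].unique(), name="term")
result = (by_crosstab.reindex(first_seen)
                     .rename_axis(None, axis=1)
                     .reset_index())
print(result)
```

All three variants sort the terms alphabetically in the index, which is why the reindex step is needed if the original order matters.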

Answer 2

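You can also combine get_dummies and set_index with a sum over the index level. Older pandas versions wrote this as .sum(level=0); that argument was removed in pandas 2.0, so the equivalent groupby(level=0) is used here, with sort=False to keep the terms in their original order:

(pd.get_dummies(df.set_index('term'), columns=['score'], prefix_sep=' ')
   .groupby(level=0, sort=False)
   .sum()
   .reset_index())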

Output:

        term  score 0  score 1  score 2  score 3
0       this        2        0        0        0
1       that        0        1        1        0
2  the other        0        0        1        1
3  something        0        1        1        0
4   anything        0        1        0        0
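For anyone who wants to run this approach end to end, here is a small sketch with the sample data from the question. It assumes a recent pandas release, so the level-wise sum is written as groupby(level=0, sort=False).sum(), and dtype=int is passed because newer versions of get_dummies return boolean columns by default. The names dummies and counts are only illustrative.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "term": ["this", "that", "the other", "something", "anything",
             "the other", "that", "this", "something"],
    "score": [0, 1, 3, 2, 1, 2, 2, 0, 1],
})

# One indicator column per distinct score, named 'score 0', 'score 1', ...
dummies = pd.get_dummies(df.set_index("term"), columns=["score"],
                         prefix_sep=" ", dtype=int)

# Sum the indicators per term; sort=False keeps first-appearance order.
counts = dummies.groupby(level=0, sort=False).sum().reset_index()
print(counts)
```

Compared with crosstab, this route keeps the terms in the order they first appear without an extra reindex step.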
