Group a pandas DataFrame by element

Time: 2020-04-17 07:13:02

Tags: python python-3.x pandas dataframe pandas-groupby

I start with a DataFrame of the following form:

scope   provider1    provider2   provider3
------------------------------------------
h1       A             AA          AAA
c12      B             AA          BBB
hn3      A             BB          AAA
hs34     C             CC          BBB
623x     B             DD          CCC
m23      A             AA          BBB

where A, AA and AAA are three completely different labels.

I want to count how many times each label occurs in each provider column and get this DataFrame:

label    provider   value_count
-------------------------------
A        provider1    3
B        provider1    2
C        provider1    1
AA       provider2    3
BB       provider2    1
CC       provider2    1
DD       provider2    1
AAA      provider3    2
BBB      provider3    3
CCC      provider3    1

How can I do this?
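The answers below assume a DataFrame named df that holds the sample data. The question does not show how the frame was created, so this is only a minimal sketch that rebuilds it by hand, with the values copied from the table above:

```python
import pandas as pd

# Sample data copied from the table in the question (construction is an assumption)
df = pd.DataFrame({
    'scope': ['h1', 'c12', 'hn3', 'hs34', '623x', 'm23'],
    'provider1': ['A', 'B', 'A', 'C', 'B', 'A'],
    'provider2': ['AA', 'AA', 'BB', 'CC', 'DD', 'AA'],
    'provider3': ['AAA', 'BBB', 'AAA', 'BBB', 'CCC', 'BBB'],
})
print(df)
```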

2 answers:

Answer 0 (score: 2)

Use DataFrame.melt, then aggregate with GroupBy.size:

df = (df.melt('scope', value_name='label', var_name='provider')
        .groupby(['provider','label'])
        .size()
        .reset_index(name='value_count')
        )
print (df)
    provider label  value_count
0  provider1     A            3
1  provider1     B            2
2  provider1     C            1
3  provider2    AA            3
4  provider2    BB            1
5  provider2    CC            1
6  provider2    DD            1
7  provider3   AAA            2
8  provider3   BBB            3
9  provider3   CCC            1
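If you are on pandas 1.1 or later, DataFrame.value_counts can stand in for the groupby/size step after melting. A minimal sketch under that version assumption; value_counts sorts by count by default, so sort_index() is used here to get back the provider/label order from the question:

```python
import pandas as pd

# Sample data from the question, rebuilt by hand
df = pd.DataFrame({
    'scope': ['h1', 'c12', 'hn3', 'hs34', '623x', 'm23'],
    'provider1': ['A', 'B', 'A', 'C', 'B', 'A'],
    'provider2': ['AA', 'AA', 'BB', 'CC', 'DD', 'AA'],
    'provider3': ['AAA', 'BBB', 'AAA', 'BBB', 'CCC', 'BBB'],
})

# Long format: one row per (scope, provider, label)
long = df.melt('scope', var_name='provider', value_name='label')

# Count each unique (provider, label) pair, then order by provider and label
out = (long.value_counts(['provider', 'label'])
           .sort_index()
           .reset_index(name='value_count'))
print(out)
```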

Alternative with DataFrame.set_index and DataFrame.stack:

df = (df.set_index('scope')
        .stack()
        .rename_axis(['scope','provider'])
        .reset_index(name='label')
        .groupby(['provider','label'])
        .size()
        .reset_index(name='value_count')
)
print (df)
    provider label  value_count
0  provider1     A            3
1  provider1     B            2
2  provider1     C            1
3  provider2    AA            3
4  provider2    BB            1
5  provider2    CC            1
6  provider2    DD            1
7  provider3   AAA            2
8  provider3   BBB            3
9  provider3   CCC            1
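Another option, sketched here as a variant rather than taken from either answer, skips reshaping the whole frame: count each provider column with Series.value_counts and concatenate the results. The dict keys passed to pd.concat become the outer index level, which rename_axis then labels as provider:

```python
import pandas as pd

# Sample data from the question, rebuilt by hand
df = pd.DataFrame({
    'scope': ['h1', 'c12', 'hn3', 'hs34', '623x', 'm23'],
    'provider1': ['A', 'B', 'A', 'C', 'B', 'A'],
    'provider2': ['AA', 'AA', 'BB', 'CC', 'DD', 'AA'],
    'provider3': ['AAA', 'BBB', 'AAA', 'BBB', 'CCC', 'BBB'],
})

# One value_counts Series per provider column; dict keys form the outer index level
counts = pd.concat(
    {col: df[col].value_counts() for col in df.columns.drop('scope')}
).rename_axis(['provider', 'label'])

# Sort by (provider, label) and turn the MultiIndex back into columns
out = counts.sort_index().reset_index(name='value_count')
print(out)
```

This avoids building the intermediate long frame, at the cost of one value_counts call per column.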

Answer 1 (score: 1)

You can use melt followed by groupby:

(df.melt(id_vars='scope', value_name='label', var_name='provider')
   .groupby(['provider', 'label']).size().reset_index())

    provider label  0
0  provider1     A  3
1  provider1     B  2
2  provider1     C  1
3  provider2    AA  3
4  provider2    BB  1
5  provider2    CC  1
6  provider2    DD  1
7  provider3   AAA  2
8  provider3   BBB  3
9  provider3   CCC  1
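To see what groupby actually receives, it can help to inspect the melted frame directly. A small sketch with the sample data rebuilt by hand: melt stacks the provider columns one after another, so the six provider1 rows come first, then the provider2 rows, and so on, giving 18 rows in total:

```python
import pandas as pd

# Sample data from the question, rebuilt by hand
df = pd.DataFrame({
    'scope': ['h1', 'c12', 'hn3', 'hs34', '623x', 'm23'],
    'provider1': ['A', 'B', 'A', 'C', 'B', 'A'],
    'provider2': ['AA', 'AA', 'BB', 'CC', 'DD', 'AA'],
    'provider3': ['AAA', 'BBB', 'AAA', 'BBB', 'CCC', 'BBB'],
})

# 'scope' stays as an identifier; each provider column becomes (provider, label) rows
long = df.melt(id_vars='scope', var_name='provider', value_name='label')
print(long.head(7))  # last provider1 row plus the first provider2 row
```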