I start with a DataFrame of the following form:
scope  provider1  provider2  provider3
---------------------------------------
h1     A          AA         AAA
c12    B          AA         BBB
hn3    A          BB         AAA
hs34   C          CC         BBB
623x   B          DD         CCC
m23    A          AA         BBB
where A, AA and AAA are three completely different labels.
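To reproduce the examples below, the sample table can be built as a DataFrame like this. The values are transcribed by hand from the table above; the construction itself is not part of the original question.

```python
import pandas as pd

# Sample data from the question: one row per scope, one column per provider
df = pd.DataFrame({
    'scope': ['h1', 'c12', 'hn3', 'hs34', '623x', 'm23'],
    'provider1': ['A', 'B', 'A', 'C', 'B', 'A'],
    'provider2': ['AA', 'AA', 'BB', 'CC', 'DD', 'AA'],
    'provider3': ['AAA', 'BBB', 'AAA', 'BBB', 'CCC', 'BBB'],
})
print(df)
```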
I want to count how many times each label occurs per provider and get this DataFrame:
label  provider   value_count
-----------------------------
A      provider1  3
B      provider1  2
C      provider1  1
AA     provider2  3
BB     provider2  1
CC     provider2  1
DD     provider2  1
AAA    provider3  2
BBB    provider3  3
CCC    provider3  1
How can I do this?
Answer 0 (score: 2)
Use DataFrame.melt to reshape the table from wide to long, then count the rows with GroupBy.size:
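# reshape wide -> long: one row per (scope, provider, label)
df = (df.melt('scope', value_name='label', var_name='provider')
        # count the rows in each (provider, label) group
        .groupby(['provider', 'label'])
        .size()
        .reset_index(name='value_count'))
print(df)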
    provider  label  value_count
0  provider1      A            3
1  provider1      B            2
2  provider1      C            1
3  provider2     AA            3
4  provider2     BB            1
5  provider2     CC            1
6  provider2     DD            1
7  provider3    AAA            2
8  provider3    BBB            3
9  provider3    CCC            1
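The snippet above assumes `df` already exists. A self-contained sketch of the same melt + size approach, with the sample data typed in by hand and the columns reordered to the label, provider, value_count layout asked for in the question (the reordering step is an addition, not part of the original answer):

```python
import pandas as pd

df = pd.DataFrame({
    'scope': ['h1', 'c12', 'hn3', 'hs34', '623x', 'm23'],
    'provider1': ['A', 'B', 'A', 'C', 'B', 'A'],
    'provider2': ['AA', 'AA', 'BB', 'CC', 'DD', 'AA'],
    'provider3': ['AAA', 'BBB', 'AAA', 'BBB', 'CCC', 'BBB'],
})

# wide -> long, then count each (provider, label) pair
out = (df.melt('scope', value_name='label', var_name='provider')
         .groupby(['provider', 'label'])
         .size()
         .reset_index(name='value_count'))

# put the columns in the order shown in the question
out = out[['label', 'provider', 'value_count']]
print(out)
```

Because `groupby` sorts its keys by default, rows come out ordered by provider and then by label.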
Alternatively, reshape with DataFrame.set_index and DataFrame.stack:
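# move scope into the index, then stack the provider columns into rows
df = (df.set_index('scope')
        .stack()
        # name both index levels so reset_index produces named columns
        .rename_axis(['scope', 'provider'])
        .reset_index(name='label')
        # count the rows in each (provider, label) group
        .groupby(['provider', 'label'])
        .size()
        .reset_index(name='value_count'))
print(df)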
    provider  label  value_count
0  provider1      A            3
1  provider1      B            2
2  provider1      C            1
3  provider2     AA            3
4  provider2     BB            1
5  provider2     CC            1
6  provider2     DD            1
7  provider3    AAA            2
8  provider3    BBB            3
9  provider3    CCC            1
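On pandas 1.1 or later, `DataFrame.value_counts` can replace the `groupby(...).size()` step. This is a swapped-in variant, not taken from either answer; `value_counts` sorts by frequency by default, so `sort_index()` is used here to restore provider/label order:

```python
import pandas as pd

df = pd.DataFrame({
    'scope': ['h1', 'c12', 'hn3', 'hs34', '623x', 'm23'],
    'provider1': ['A', 'B', 'A', 'C', 'B', 'A'],
    'provider2': ['AA', 'AA', 'BB', 'CC', 'DD', 'AA'],
    'provider3': ['AAA', 'BBB', 'AAA', 'BBB', 'CCC', 'BBB'],
})

out = (df.melt('scope', value_name='label', var_name='provider')
         # count each distinct (provider, label) combination
         .value_counts(['provider', 'label'])
         # value_counts orders by count; sort by the index instead
         .sort_index()
         .reset_index(name='value_count'))
print(out)
```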
Answer 1 (score: 1)
# melt names the new columns 'provider' and 'label', so group by those names
(df.melt(id_vars='scope', value_name='label', var_name='provider')
   .groupby(['provider', 'label']).size().reset_index())
    provider  label  0
0  provider1      A  3
1  provider1      B  2
2  provider1      C  1
3  provider2     AA  3
4  provider2     BB  1
5  provider2     CC  1
6  provider2     DD  1
7  provider3    AAA  2
8  provider3    BBB  3
9  provider3    CCC  1
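As a cross-check that does not depend on reshaping, the same counts can be computed with `collections.Counter` by walking each provider column directly. This is a plain-Python sketch added for illustration, not one of the posted answers; it only uses pandas to hold the sample data and the final table:

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({
    'scope': ['h1', 'c12', 'hn3', 'hs34', '623x', 'm23'],
    'provider1': ['A', 'B', 'A', 'C', 'B', 'A'],
    'provider2': ['AA', 'AA', 'BB', 'CC', 'DD', 'AA'],
    'provider3': ['AAA', 'BBB', 'AAA', 'BBB', 'CCC', 'BBB'],
})

# count (provider, label) pairs column by column, skipping the scope column
counts = Counter(
    (provider, label)
    for provider in df.columns.drop('scope')
    for label in df[provider]
)

# sort by (provider, label) and lay out as label, provider, value_count
out = pd.DataFrame(
    [(label, provider, n) for (provider, label), n in sorted(counts.items())],
    columns=['label', 'provider', 'value_count'],
)
print(out)
```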