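I have a pandas DataFrame with 2 columns. report_tags holds comma-separated words, and t_f is a yes/no flag (1 or 0). I want to split the comma-separated words apart and group them by t_f, then aggregate each tag/t_f group into a column named count.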
df
report_tags t_f
0 bec,eac,fbi,ic3,scam 1
1 dlink,router,wifi 0
2 adobe 0
3 bec, fbi 1
4 bec, fbi, scam 0
Desired output:
df2
tag t_f count
0 bec 1 2
1 eac 1 1
2 fbi 1 2
3 ic3 1 1
4 scam 1 1
5 dlink 0 1
6 router 0 1
7 wifi 0 1
8 adobe 0 1
9 bec 0 1
10 fbi 0 1
11 scam 0 1
Answer (score: 1)
Use str.split + explode:
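import pandas as pd

# df is the frame from the question
# sort=False: keep t_f groups in order of first appearance and don't sort counts by frequency
k = dict(sort=False)
(df.set_index('t_f')['report_tags']              # Series of tag strings indexed by the flag
   .str.split(r',\s*').explode()                 # split on commas (plus any spaces), one tag per row
   .groupby(level=0, **k).value_counts(**k)      # count each tag within each t_f group
   .rename('count').reset_index())               # name the counts 'count' and flatten to a DataFrame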
t_f report_tags count
0 1 bec 2
1 1 eac 1
2 1 fbi 2
3 1 ic3 1
4 1 scam 1
5 0 adobe 1
6 0 bec 1
7 0 dlink 1
8 0 fbi 1
9 0 router 1
10 0 scam 1
11 0 wifi 1
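The output above names the tag column report_tags and puts t_f first. If the result should match df2 exactly (columns tag, t_f, count), a variant along these lines might work. It is a sketch, not the answer's method: it swaps value_counts for DataFrame.explode on a new tag column followed by groupby(...).size(). The sample data is copied from the question, and the column name tag comes from the desired output. With sort=False the (tag, t_f) pairs should come out in order of first appearance, which for this sample lines up with df2.

```python
import pandas as pd

# Sample data from the question (rows 3 and 4 have a space after each comma)
df = pd.DataFrame({
    "report_tags": ["bec,eac,fbi,ic3,scam", "dlink,router,wifi", "adobe",
                    "bec, fbi", "bec, fbi, scam"],
    "t_f": [1, 0, 0, 1, 0],
})

df2 = (
    df.assign(tag=df["report_tags"].str.split(r",\s*"))  # list of tags per row
      .explode("tag")                                     # one row per tag
      .groupby(["tag", "t_f"], sort=False)                # (tag, t_f) pairs, first-appearance order
      .size()                                             # number of rows in each pair
      .reset_index(name="count")                          # columns: tag, t_f, count
)
print(df2)
```

Counting rows with size() after the explode gives the same numbers as value_counts, because each exploded row is exactly one occurrence of a tag under its flag.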