I have a DataFrame that looks like this:
Category Shuffled Name Sequence Length
0 pgm 0 protein1 IAAI 4
1 pgm 0 protein2 PGGP 4
2 pgm 0 protein3 KIIK 4
3 pgm 0 protein4 PGGP 4
4 btn 0 protein1 ABBA 4
5 btn 0 protein2 IAAI 4
6 btn 0 protein3 ABBA 4
7 btn 0 protein4 PGGP 4
8 pgm 1 protein1 IAAI 4
9 pgm 1 protein2 PGGP 4
10 pgm 1 protein3 KIIK 4
11 pgm 1 protein4 PGGP 4
12 btn 1 protein1 ABBA 4
13 btn 1 protein2 IAAI 4
14 btn 1 protein3 ABBA 4
15 btn 1 protein4 PGGP 4
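I want to count how many times each Sequence occurs within each Category / Shuffled group and add that count as a new column. The resulting data should look like this: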
Category Shuffled Name Sequence Length Sequence_count
0 pgm 0 protein1 IAAI 4 1
1 pgm 0 protein2 PGGP 4 2
2 pgm 0 protein3 KIIK 4 1
3 pgm 0 protein4 PGGP 4 2
4 btn 0 protein1 ABBA 4 2
5 btn 0 protein2 IAAI 4 1
6 btn 0 protein3 ABBA 4 2
7 btn 0 protein4 PGGP 4 1
8 pgm 1 protein1 IAAI 4 1
9 pgm 1 protein2 PGGP 4 2
10 pgm 1 protein3 KIIK 4 1
11 pgm 1 protein4 PGGP 4 2
12 btn 1 protein1 ABBA 4 2
13 btn 1 protein2 IAAI 4 1
14 btn 1 protein3 ABBA 4 2
15 btn 1 protein4 PGGP 4 1
So far, the approach I have tried that works is
counts = df.groupby(['Category', 'Shuffled'])['Sequence'].value_counts()
which gives me
Category  Shuffled  Sequence
pgm       0         PGGP        2
                    IAAI        1
                    KIIK        1
          1         PGGP        2
                    IAAI        1
                    KIIK        1
btn       0         ABBA        2
                    IAAI        1
                    PGGP        1
          1         ABBA        2
                    IAAI        1
                    PGGP        1
These are the values I want, but how do I get them into the corresponding rows of the original DataFrame?
Answer 0 (score: 1)
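You can group by all three columns and use transform, which returns one value per original row, so the result can be assigned directly as a new column: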
df['Sequence_count'] = df.groupby(['Category', 'Shuffled','Sequence'])['Sequence'].transform('count')
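For anyone who wants to try this, here is a minimal self-contained sketch. It rebuilds the question's frame (the construction code is an assumption, since the question only shows the printed data) and also shows a merge-based alternative that is not part of the answer above; the names `base`, `keys`, `sizes`, `merged`, and `Sequence_count_merge` are made up for illustration.

```python
import pandas as pd

# Rebuild the question's data: the same eight rows appear once per Shuffled value.
base = pd.DataFrame({
    "Category": ["pgm"] * 4 + ["btn"] * 4,
    "Name": ["protein1", "protein2", "protein3", "protein4"] * 2,
    "Sequence": ["IAAI", "PGGP", "KIIK", "PGGP", "ABBA", "IAAI", "ABBA", "PGGP"],
})
df = pd.concat([base.assign(Shuffled=s) for s in (0, 1)], ignore_index=True)
df["Length"] = df["Sequence"].str.len()

keys = ["Category", "Shuffled", "Sequence"]

# transform returns a Series aligned with df's index, one value per row.
df["Sequence_count"] = df.groupby(keys)["Sequence"].transform("count")

# Alternative (illustrative): compute group sizes, then merge them back on the keys.
sizes = df.groupby(keys).size().reset_index(name="Sequence_count_merge")
merged = df.merge(sizes, on=keys, how="left")

print(df[["Category", "Shuffled", "Name", "Sequence", "Sequence_count"]])
```

Both routes should give the same counts on this data. The transform version is shorter and keeps the original index, while the merge version may be handy if the counts table is needed on its own.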