Pandas groupby, pivot or stack? Turning groups in a single column into multiple columns

Time: 2016-11-02 14:17:07

Tags: python-2.7 pandas

My data looks like this:

2     PresentationID   12954
5          Attendees      65
6          Downloads       0
7          Questions       0
8              Likes      11
9             Tweets       0
10             Polls       0
73    PresentationID   12953
76         Attendees      64
77         Downloads      31
78         Questions       0
79             Likes      11
80            Tweets       0
81             Polls       0
143   PresentationID   12951
146        Attendees      64
147        Downloads      28
148        Questions       2
149            Likes       2
150           Tweets       0
151            Polls       0

I need to get it into this format:

   PresentationID  Attendees  Downloads  Questions  Likes  Tweets  Polls   
0           12954         65          0          0     11       0      0   
1           12953         64         31          0     11       0      0   
2           12892        204          0          0     14       0      0  

I have tried several combinations of groupby, pivot and stack, but none of them worked. Any suggestions are much appreciated. Thanks.

1 answer:

Answer 0 (score: 5)

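You can use cumcount together with pivot. groupby('B').cumcount() numbers each occurrence of a label within its group (0 for the first PresentationID, 1 for the second, and so on), which gives pivot a unique row index to work with. Here the three columns are named A, B and C: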

print (df)
      A               B      C
0     2  PresentationID  12954
1     5       Attendees     65
2     6       Downloads      0
3     7       Questions      0
4     8           Likes     11
5     9          Tweets      0
6    10           Polls      0
7    73  PresentationID  12953
8    76       Attendees     64
9    77       Downloads     31
10   78       Questions      0
11   79           Likes     11
12   80          Tweets      0
13   81           Polls      0
14  143  PresentationID  12951
15  146       Attendees     64
16  147       Downloads     28
17  148       Questions      2
18  149           Likes      2
19  150          Tweets      0
20  151           Polls      0

df['G'] = df.groupby('B').cumcount()
df = df.pivot(index='G', columns='B', values='C')
print (df)
B  Attendees  Downloads  Likes  Polls  PresentationID  Questions  Tweets
G                                                                       
0         65          0     11      0           12954          0       0
1         64         31     11      0           12953          0       0
2         64         28      2      0           12951          2       0
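
For anyone who wants to run the two steps without the original file, here is a minimal self-contained sketch. It rebuilds the sample data by hand (column A is left out because it is not used), and the names labels, values and wide are only illustrative. The last step, which restores the column order from the question, is an addition and not part of the answer above.

```python
import pandas as pd

# Sample data from the question; column names B and C follow the answer.
labels = ['PresentationID', 'Attendees', 'Downloads', 'Questions',
          'Likes', 'Tweets', 'Polls']
values = [12954, 65, 0, 0, 11, 0, 0,
          12953, 64, 31, 0, 11, 0, 0,
          12951, 64, 28, 2, 2, 0, 0]
df = pd.DataFrame({'B': labels * 3, 'C': values})

# Number each occurrence of a label: 0 in the first block, 1 in the second, ...
df['G'] = df.groupby('B').cumcount()

# One row per block, one column per label.
wide = df.pivot(index='G', columns='B', values='C')

# pivot sorts the columns alphabetically; restore the order of first appearance.
wide = wide[df['B'].unique()]

# Drop the axis names so the frame prints like the requested format.
wide.columns.name = None
wide.index.name = None
print(wide)
```

Reordering with df['B'].unique() assumes every block lists its labels in the same order, as the sample data does.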
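
Alternatively, build the same table in one step from the original df, without the helper column G. This call form, which passes the index, columns and values directly, comes from older pandas releases; newer releases of pd.pivot take a DataFrame as the first argument, so there the two-step version above is the safer choice:

df = pd.pivot(index=df.groupby('B').cumcount(), columns=df.B, values=df.C)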
print (df)
B  Attendees  Downloads  Likes  Polls  PresentationID  Questions  Tweets
0         65          0     11      0           12954          0       0
1         64         31     11      0           12953          0       0
2         64         28      2      0           12951          2       0
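
Since the question also asks about stack, the same reshaping can be sketched with set_index and unstack, again without a helper column. This is a swapped-in alternative rather than the answerer's method, and the names occurrence and wide are only illustrative.

```python
import pandas as pd

# Sample data from the question; column names B and C follow the answer.
labels = ['PresentationID', 'Attendees', 'Downloads', 'Questions',
          'Likes', 'Tweets', 'Polls']
values = [12954, 65, 0, 0, 11, 0, 0,
          12953, 64, 31, 0, 11, 0, 0,
          12951, 64, 28, 2, 2, 0, 0]
df = pd.DataFrame({'B': labels * 3, 'C': values})

# Occurrence number of each label within its group, kept as a separate Series.
occurrence = df.groupby('B').cumcount()

# Index by (occurrence, label), then move the label level into the columns.
wide = df.set_index([occurrence, 'B'])['C'].unstack()
print(wide)
```

As with pivot, the resulting columns come out in alphabetical order, so a final column selection is still needed if PresentationID should come first.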