Pandas: sum several columns per group while keeping another column

Time: 2017-11-09 13:05:20

Tags: python pandas dataframe

I have a DataFrame that looks like this.

 msno      date  num_25  num_50  num_75  num_985  num_100  num_unq  \
0   rxIP2f2aN0rYNp+toI0Obt/N/FYQX8hcO1fTmmy2h34=  20150513       0       0       0        0        1        1   
1   rxIP2f2aN0rYNp+toI0Obt/N/FYQX8hcO1fTmmy2h34=  20150709       9       1       0        0        7       11   
2   yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8=  20150105       3       3       0        0       68       36   
3   yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8=  20150306       1       0       1        1       97       27   
4   yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8=  20150501       3       0       0        0       38       38   
5   yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8=  20150702       4       0       1        1       33       10   
6   yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8=  20150830       3       1       0        0        4        7   
7   yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8=  20151107       1       0       0        0        4        5   
8   yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8=  20160110       2       0       1        0       11        6   
9   yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8=  20160316       9       3       4        1       67       50   
10  yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8=  20160510       5       3       2        1       67       66   
11  yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8=  20160804       1       4       5        0       36       43   
12  yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8=  20160926       7       1       0        1       38       20   
13  yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8=  20161115       0       1       4        1       38       40   
14  yxiEWwE9VR5utpUecLxVdQ5B7NysUPfrNtGINaM2zA8=  20170106       0       0       0        1       39       38   
15  PNxIsSLWOJDCm7pNPFzRO/6Mmg2WeZA2nf6hw6t1x3g=  20151201       3       3       2        0        8       11   
16  PNxIsSLWOJDCm7pNPFzRO/6Mmg2WeZA2nf6hw6t1x3g=  20160628       0       0       1        1        1        3   
17  PNxIsSLWOJDCm7pNPFzRO/6Mmg2WeZA2nf6hw6t1x3g=  20170106       2       1       0        0       35       34   
18  KXF9c/T66LZIzFq+xS64icWMhDQE6miCZAtdXRjZHX8=  20150803       0       0       0        0       16       11   
19  KXF9c/T66LZIzFq+xS64icWMhDQE6miCZAtdXRjZHX8=  20160527       4       3       0        2        2       11   
20  KXF9c/T66LZIzFq+xS64icWMhDQE6miCZAtdXRjZHX8=  20160808      14       3       4        1       15       31   

How can I sum the columns 'num_25', 'num_50', 'num_75', 'num_985', 'num_100', 'num_unq', 'total_secs' to get their totals, leaving a single row per unique msno?

For example, after grouping all rows that share the same msno, the result should look like the following, with the date column dropped.

msno        num_25  num_50  num_75  num_985  num_100  num_unq  \
0    rxIP2f2aN0rYNp+toI0Obt/N/FYQX8hcO1fTmmy2h34=        9       1       0        0        8        12

I tried this, but msno is still repeated and the date column is still there.

df_user_logs_v2.groupby(['msno', 'date'])['num_25', 'num_50', 'num_75', 'num_985', 'num_100', 'num_unq', 'total_secs'].sum()

1 answer:

Answer 0 (score: 0)

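Use drop + groupby + sum. Grouping by both msno and date makes every (msno, date) pair its own group, so drop date first and group by msno only: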

df = df_user_logs_v2.drop('date', axis=1).groupby('msno', as_index=False).sum()
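
To check the result, here is a minimal sketch on a cut-down copy of the data. The shortened msno strings and the reduced set of columns are assumptions made for brevity; the numbers are taken from the first two rows and rows 15 to 17 of the table in the question. It also shows an alternative that selects the columns to sum with a list, which should give the same result.

```python
import pandas as pd

# Cut-down sample: msno values are shortened (hypothetical) and only three
# of the numeric columns from the question are kept.
df_user_logs_v2 = pd.DataFrame({
    "msno": ["rxIP2f2a", "rxIP2f2a", "PNxIsSLW", "PNxIsSLW", "PNxIsSLW"],
    "date": [20150513, 20150709, 20151201, 20160628, 20170106],
    "num_25": [0, 9, 3, 0, 2],
    "num_100": [1, 7, 8, 1, 35],
    "num_unq": [1, 11, 11, 3, 34],
})

# Drop date, then sum every remaining column once per msno.
# as_index=False keeps msno as a regular column instead of the index.
df = df_user_logs_v2.drop("date", axis=1).groupby("msno", as_index=False).sum()

# Alternative: keep date in the frame but name the columns to sum explicitly.
# Note the list inside the brackets (double brackets overall).
cols = ["num_25", "num_100", "num_unq"]
df_alt = df_user_logs_v2.groupby("msno", as_index=False)[cols].sum()

print(df)
```

The explicit column list is the safer choice if the frame has other columns that should not be summed. The selection must be a list, as in `[cols]` above; the tuple-style selection used in the question's attempt is rejected by recent pandas versions.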