How to derive a new column from one column against another using value_counts

Asked: 2018-07-30 21:14:34

Tags: python pandas data-science

I have a data frame with many columns.

df2



   TargetDescription                               Output_media_duration
0   VMN 4.0 16x9 25 - 1920x1080, 1280x720, 960x540...    NaN
1   VMN 4.0 16x9 25 - 1920x1080, 1280x720, 960x540...    NaN
2   XDCAM HD NTSC 1920x1080 MXF 8CA                      661.120000
3   VMN 4.0 16x9 29.97 - 1920x1080, 1280x720, 960x...   285.647686
4   VMN 4.0 16x9 29.97 - 1920x1080, 1280x720, 960x...   402.697303
5   VMN 4.0 16x9 29.97 - 1920x1080, 1280x720, 960x...   269.597070
6   VMN 4.0 16x9 29.97 - 1920x1080, 1280x720, 960x...   307.059607
7   Caption QC HD MOV 2CA                               2516.096917
8   QT Proxy 640x360 2997 12CA                          NaN
9   XDCAM HD NTSC 1920x1080 MXF 8CA                     1414.785215
10  Caption QC HD MOV 2CA                               1295.027067
11  QT Proxy 640x360 2398 4CA                           2524.980792
12  Caption QC HD MOV 2CA                               120.820700
13  Caption QC HD MOV 2CA                               2516.096917

Now I want to get a new data frame with one row per TargetDescription and the total duration for that format, like this:

TargetDescription                                                     format_duration
1   VMN 4.0 16x9 25 - 1920x1080, 1280x720, 960x540...                       NaN
2   XDCAM HD NTSC 1920x1080 MXF 8CA                                         661.120000
3   VMN 4.0 16x9 29.97 - 1920x1080, 1280x720, 960x...                       1656.561906 
4   Caption QC HD MOV 2CA                                                   2516.096917
5   QT Proxy 640x360 2997 12CA                                              NaN
6   Caption QC HD MOV 2CA                                                   2636.917

How can I achieve this in pandas?

1 answer:

Answer 0 (score: 1)

df2.groupby('TargetDescription')['Output_media_duration'].sum().reset_index(name='format_duration')
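
One detail worth checking against the desired output: by default, `sum()` skips NaN values and returns 0.0 for a group whose values are all NaN, whereas the question shows NaN for such groups. The sketch below is one possible way to get that, assuming a small subset of the rows from the question (descriptions copied as displayed, truncation included) and using `min_count=1` so that an all-NaN group stays NaN. The `sort=False` argument is an optional choice here to keep groups in order of first appearance.

```python
import numpy as np
import pandas as pd

# A few rows from the question, with descriptions copied as displayed
# (the trailing "..." is the display truncation, kept verbatim).
df2 = pd.DataFrame({
    "TargetDescription": [
        "VMN 4.0 16x9 25 - 1920x1080, 1280x720, 960x540...",
        "VMN 4.0 16x9 25 - 1920x1080, 1280x720, 960x540...",
        "XDCAM HD NTSC 1920x1080 MXF 8CA",
        "Caption QC HD MOV 2CA",
        "Caption QC HD MOV 2CA",
    ],
    "Output_media_duration": [np.nan, np.nan, 661.12, 2516.096917, 120.8207],
})

result = (
    df2.groupby("TargetDescription", sort=False)["Output_media_duration"]
       .sum(min_count=1)  # require at least one non-NaN value, else NaN
       .reset_index(name="format_duration")
)
print(result)
```

With the default `min_count=0`, the first group would show 0.0 instead of NaN; which behaviour is preferable depends on whether a missing duration should count as zero.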