Pandas: creating percentages from category totals

Time: 2017-11-09 15:22:06

Tags: pandas multi-index pandas-groupby

I have a dataset of website traffic covering about 2,000 sites over one month, broken down by the device type the traffic came from:

In [12]: df.sample(10)
Out[12]:
         date                  device  nb_uniq_visitors  site_id
11 2017-10-31                     Tv               0.0   3331.0
6  2017-10-22            Car browser               0.0    503.0
7  2017-10-22                 Camera               0.0   3259.0
7  2017-10-08            Car browser               0.0    630.0
3  2017-10-23                 Camera               0.0    118.0
0  2017-10-12                Desktop               1.0   4769.0
11 2017-10-31                     Tv               0.0    361.0
5  2017-10-12                Phablet               0.0   2999.0
9  2017-10-17  Portable media player               0.0   1725.0
0  2017-10-13                Desktop            2410.0   1004.0
4  2017-10-13                    all             900.0   1271.0

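Note that the all category in the device column is the sum across all devices, so it can serve as the denominator for the percentage calculation.

I want to see the percentage of each device type for each site. I imagine the output looking something like this (I calculated the example below by hand):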
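The claim that all equals the sum of the other device categories can be checked directly. The following is a minimal sketch, assuming the per-device sums for site 74.0 that appear in the groupby output further down in this question; the variable names site_74 and device_sum are illustrative and not part of the original data.

```python
import pandas as pd

# Per-device sums for site 74.0, copied from the groupby output in the question.
site_74 = pd.Series({
    "Camera": 0.0, "Car browser": 0.0, "Console": 1.0, "Desktop": 10534.0,
    "Feature phone": 0.0, "Phablet": 178.0, "Portable media player": 4.0,
    "Smart display": 0.0, "Smartphone": 6955.0, "Tablet": 1022.0,
    "Tv": 1.0, "Unknown": 62.0, "all": 18757.0,
}, name="nb_uniq_visitors")

# Add up every device except the 'all' row and compare with the reported total.
device_sum = site_74.drop("all").sum()
print(device_sum, site_74["all"])
```

For this site the two numbers agree, which is what makes the all row usable as a denominator. If they disagreed for some site, the shares computed against all would not add up to 1.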

                               nb_uniq_visitors
site_id device
74.0    Camera                         0.00
        Car browser                    0.00
        Console                        0.00
        Desktop                        0.56
        Feature phone                  0.00
        Phablet                        0.01
        Portable media player          0.00
        Smart display                  0.00
        Smartphone                     0.37
        Tablet                         0.05
        Tv                             0.00
        Unknown                        0.00
        all                            1.00
96.0    Camera                         0.00
        Car browser                    0.00
        Console                        0.00
        Desktop                        0.64
        Feature phone                  0.00
        Phablet                        0.01
        Portable media player          0.00
        Smart display                  0.00
        Smartphone                     0.29
        Tablet                         0.06
        Tv                             0.00
        Unknown                        0.01
        all                            1.00

I grouped by site_id and device using groupby:

In [23]: sl = df.groupby(['site_id', 'device']).sum()

In [24]: sl.head(25)
Out[24]:
                               nb_uniq_visitors
site_id device
74.0    Camera                              0.0
        Car browser                         0.0
        Console                             1.0
        Desktop                         10534.0
        Feature phone                       0.0
        Phablet                           178.0
        Portable media player               4.0
        Smart display                       0.0
        Smartphone                       6955.0
        Tablet                           1022.0
        Tv                                  1.0
        Unknown                            62.0
        all                             18757.0
96.0    Camera                              0.0
        Car browser                         2.0
        Console                             6.0
        Desktop                        118157.0
        Feature phone                       0.0
        Phablet                          1061.0
        Portable media player              73.0
        Smart display                       0.0
        Smartphone                      53292.0
        Tablet                          11060.0
        Tv                                  2.0
        Unknown                          1717.0
        all                            185370.0

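How can I convert the values above from summed totals into percentages? Or is there a better approach altogether?

1 Answer:

Answer 0: (score: 2)

Use DataFrame.div to divide by the all rows selected with DataFrame.xs: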

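# Sum the numeric columns for each (site_id, device) pair
sl = df.groupby(['site_id', 'device']).sum()
# Divide each row by the 'all' total of the same site_id
a = sl.div(sl.xs('all', level=1))
print(a)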
                               nb_uniq_visitors
site_id device                                 
74.0    Camera                         0.000000
        Car browser                    0.000000
        Console                        0.000053
        Desktop                        0.561604
        Feature phone                  0.000000
        Phablet                        0.009490
        Portable media player          0.000213
        Smart display                  0.000000
        Smartphone                     0.370795
        Tablet                         0.054486
        Tv                             0.000053
        Unknown                        0.003305
        all                            1.000000
96.0    Camera                         0.000000
        Car browser                    0.000011
        Console                        0.000032
        Desktop                        0.637412
        Feature phone                  0.000000
        Phablet                        0.005724
        Portable media player          0.000394
        Smart display                  0.000000
        Smartphone                     0.287490
        Tablet                         0.059664
        Tv                             0.000011
        Unknown                        0.009263
        all                            1.000000

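Details: sl.xs('all', level=1) returns one row per site_id holding that site's all total, and div aligns it against the site_id level of sl:

print(sl.xs('all', level=1))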
         nb_uniq_visitors
site_id                  
74.0              18757.0
96.0             185370.0
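To try this without the original data, here is a minimal runnable sketch. It assumes a small hand-built frame containing only a few of the summed values for sites 74.0 and 96.0 from the question (the other devices are left out, and the all totals are kept as reported). The names totals and shares are illustrative. Rounding to two decimals is added here to match the layout requested in the question; it is not part of the answer's code.

```python
import pandas as pd

# Hand-built subset of the summed values shown in the question (illustrative only).
df = pd.DataFrame({
    "site_id": [74.0, 74.0, 74.0, 74.0, 96.0, 96.0, 96.0, 96.0],
    "device": ["Desktop", "Smartphone", "Tablet", "all"] * 2,
    "nb_uniq_visitors": [10534.0, 6955.0, 1022.0, 18757.0,
                         118157.0, 53292.0, 11060.0, 185370.0],
})

# Sum visitors per (site_id, device); selecting the column explicitly
# keeps any non-numeric columns out of the aggregation.
sl = df.groupby(["site_id", "device"])[["nb_uniq_visitors"]].sum()

# One row per site holding its 'all' total, indexed by site_id.
totals = sl.xs("all", level="device")

# Divide each row by the total of its own site, then round for display.
shares = sl.div(totals).round(2)
print(shares)
```

Selecting nb_uniq_visitors before calling sum is a defensive choice in this sketch: the original frame also has a date column, and how sum treats such columns has differed between pandas versions.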