I have a website traffic dataset covering about 2000 sites over one month, broken down by the device type the traffic came from:
In [12]: df.sample(10)
Out[12]:
date device nb_uniq_visitors site_id
11 2017-10-31 Tv 0.0 3331.0
6 2017-10-22 Car browser 0.0 503.0
7 2017-10-22 Camera 0.0 3259.0
7 2017-10-08 Car browser 0.0 630.0
3 2017-10-23 Camera 0.0 118.0
0 2017-10-12 Desktop 1.0 4769.0
11 2017-10-31 Tv 0.0 361.0
5 2017-10-12 Phablet 0.0 2999.0
9 2017-10-17 Portable media player 0.0 1725.0
0 2017-10-13 Desktop 2410.0 1004.0
4 2017-10-13 all 900.0 1271.0
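Note that the 'all' category in the device column is the total across all devices, so it can serve as the denominator when calculating percentages.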
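The claim that 'all' equals the sum of the other device rows can be checked before relying on it as a denominator. The following is only a minimal sketch on made-up rows shaped like the sample above; the values and the variable names (parts, totals, consistent) are assumptions for illustration, not part of the real dataset.

```python
import pandas as pd

# Toy rows shaped like the question's data (values are invented for illustration).
df = pd.DataFrame({
    "date": ["2017-10-12"] * 3 + ["2017-10-13"] * 3,
    "device": ["Desktop", "Smartphone", "all"] * 2,
    "nb_uniq_visitors": [6.0, 4.0, 10.0, 7.0, 1.0, 8.0],
    "site_id": [74.0] * 6,
})

is_all = df["device"] == "all"

# Sum of the individual device rows per site and day.
parts = df[~is_all].groupby(["site_id", "date"])["nb_uniq_visitors"].sum()

# The reported "all" value per site and day.
totals = df[is_all].set_index(["site_id", "date"])["nb_uniq_visitors"]

# True for every (site, day) where "all" matches the sum of the other rows.
consistent = parts.eq(totals)
print(consistent.all())
```

If the check fails for some sites, summing the individual device rows may be a safer denominator than the 'all' row.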
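I want to see each device type's share of traffic per site. I imagine the output looking something like this (I calculated the example below by hand):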
nb_uniq_visitors
site_id device
74.0 Camera 0.00
Car browser 0.00
Console 0.00
Desktop 0.56
Feature phone 0.00
Phablet 0.01
Portable media player 0.00
Smart display 0.00
Smartphone 0.37
Tablet 0.05
Tv 0.00
Unknown 0.00
all 1.00
96.0 Camera 0.00
Car browser 0.00
Console 0.00
Desktop 0.64
Feature phone 0.00
Phablet 0.01
Portable media player 0.00
Smart display 0.00
Smartphone 0.29
Tablet 0.06
Tv 0.00
Unknown 0.01
all 1.00
I used groupby to group by site_id and device:
In [23]: sl = df.groupby(['site_id', 'device']).sum()
In [24]: sl.head(25)
Out[24]:
nb_uniq_visitors
site_id device
74.0 Camera 0.0
Car browser 0.0
Console 1.0
Desktop 10534.0
Feature phone 0.0
Phablet 178.0
Portable media player 4.0
Smart display 0.0
Smartphone 6955.0
Tablet 1022.0
Tv 1.0
Unknown 62.0
all 18757.0
96.0 Camera 0.0
Car browser 2.0
Console 6.0
Desktop 118157.0
Feature phone 0.0
Phablet 1061.0
Portable media player 73.0
Smart display 0.0
Smartphone 53292.0
Tablet 11060.0
Tv 2.0
Unknown 1717.0
all 185370.0
How can I convert the values above from totals to percentages? Or is there a better approach altogether?
Answer 0 (score: 2)
Use DataFrame.div to divide by the 'all' rows, selected with DataFrame.xs:
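# Total visitors per (site_id, device) pair
sl = df.groupby(['site_id', 'device']).sum()
# Divide every row by the 'all' total of the same site_id
a = sl.div(sl.xs('all', level=1))
print(a)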
nb_uniq_visitors
site_id device
74.0 Camera 0.000000
Car browser 0.000000
Console 0.000053
Desktop 0.561604
Feature phone 0.000000
Phablet 0.009490
Portable media player 0.000213
Smart display 0.000000
Smartphone 0.370795
Tablet 0.054486
Tv 0.000053
Unknown 0.003305
all 1.000000
96.0 Camera 0.000000
Car browser 0.000011
Console 0.000032
Desktop 0.637412
Feature phone 0.000000
Phablet 0.005724
Portable media player 0.000394
Smart display 0.000000
Smartphone 0.287490
Tablet 0.059664
Tv 0.000011
Unknown 0.009263
all 1.000000
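The hand-calculated example in the question shows two decimals, which can be reproduced by rounding the result. This is a small sketch that rebuilds a few of the site 74.0 totals from the groupby output above; the name shares is an assumption, and level="site_id" is passed only to make the alignment explicit.

```python
import pandas as pd

# Per-site totals for site 74.0, copied from the groupby output in the question
# (only a few device rows are included to keep the sketch short).
index = pd.MultiIndex.from_tuples(
    [(74.0, "Desktop"), (74.0, "Phablet"), (74.0, "Smartphone"),
     (74.0, "Tablet"), (74.0, "all")],
    names=["site_id", "device"],
)
sl = pd.DataFrame(
    {"nb_uniq_visitors": [10534.0, 178.0, 6955.0, 1022.0, 18757.0]},
    index=index,
)

# Divide each row by its site's "all" total, then round to two decimals.
# level="site_id" states explicitly which index level the totals align on.
shares = sl.div(sl.xs("all", level="device"), level="site_id").round(2)
print(shares)
```

Rounding is best left to the final display step, since the rounded shares no longer need to add up to exactly 1.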
Details:
print(sl.xs('all', level=1))
nb_uniq_visitors
site_id
74.0 18757.0
96.0 185370.0
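For reference, the whole approach can be sketched end to end on toy data. The rows below are invented, and the names totals and shares are assumptions. Selecting nb_uniq_visitors before calling sum is a precaution: depending on the pandas version and the dtype of the date column, summing the full frame may drop that column, aggregate it in an unwanted way, or raise an error.

```python
import pandas as pd

# Toy daily rows shaped like the question's data; the numbers are invented.
rows = [
    ("2017-10-12", "Desktop", 6.0, 74.0),
    ("2017-10-12", "Smartphone", 4.0, 74.0),
    ("2017-10-12", "all", 10.0, 74.0),
    ("2017-10-13", "Desktop", 9.0, 74.0),
    ("2017-10-13", "Smartphone", 1.0, 74.0),
    ("2017-10-13", "all", 10.0, 74.0),
    ("2017-10-12", "Desktop", 1.0, 96.0),
    ("2017-10-12", "Smartphone", 3.0, 96.0),
    ("2017-10-12", "all", 4.0, 96.0),
]
df = pd.DataFrame(rows, columns=["date", "device", "nb_uniq_visitors", "site_id"])
df["date"] = pd.to_datetime(df["date"])

# Select the numeric column before summing so the date column is not aggregated.
sl = df.groupby(["site_id", "device"])[["nb_uniq_visitors"]].sum()

# Per-site denominators: the rows whose device is "all".
totals = sl.xs("all", level="device")

# Broadcast the division across the device rows of each site.
shares = sl.div(totals, level="site_id")
print(shares)
```

Passing level="site_id" is not required by the answer's version; it only spells out which index level the per-site totals are matched against.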