Finding the number of clicks per unit time

Time: 2017-06-03 09:59:31

Tags: python-2.7 pandas sklearn-pandas

   Ad-Slot Id  Click Time               Click IP
0  208878      2017-03-23 18:30:00.059  2405:204:c:3868:f27d:db2c:e2a9:c90c
1  195915      2017-03-23 18:30:00.107  2405:204:4183:6939:d3c2:bf40:ed47:3a6d
2  129192      2017-03-23 18:30:00.309  2405:204:900a:5700:cd84:2eec:449e:6af6
3  195987      2017-03-23 18:30:00.311  27.62.33.203
4  209078      2017-03-23 18:30:00.523  182.65.23.82
5  206706      2017-03-23 18:30:00.637  2405:205:1308:f499:b1d1:931a:2266:a738
6  210917      2017-03-23 18:30:01.136  42.106.17.94
7  236944      2017-03-23 18:30:01.226  171.61.19.146
8  195980      2017-03-23 18:30:01.331  2405:204:4088:1b4d::17ac:38ac

I have the data excerpt above and need to find the number of clicks per unit time (1 minute, 5 minutes, 1 hour) for each publisher (Ad-Slot Id).

1 answer:

Answer 0 (score: 0)

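Note: I modified your data snippet so that only a few distinct Ad-Slot Ids occur in each second, purely for testing. My output will therefore differ from yours.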

Data excerpt:

   Ad-Slot_Id  Click Time               Click_IP
0  208878      2017-03-23 18:30:00.059  2405:204:c:3868:f27d:db2c:e2a9:c90c
1  236944      2017-03-23 18:30:00.107  2405:204:4183:6939:d3c2:bf40:ed47:3a6d
2  129192      2017-03-23 18:30:00.309  2405:204:900a:5700:cd84:2eec:449e:6af6
3  129192      2017-03-23 18:30:00.311  27.62.33.203
4  236944      2017-03-23 18:30:00.523  182.65.23.82
5  206706      2017-03-23 18:30:00.637  2405:205:1308:f499:b1d1:931a:2266:a738
6  129192      2017-03-23 18:30:01.136  42.106.17.94
7  236944      2017-03-23 18:30:01.226  171.61.19.146
8  129192      2017-03-23 18:30:01.331  2405:204:4088:1b4d::17ac:38ac

I group the DataFrame by Ad-Slot Id, then resample each group per second, per minute, or whatever interval you want, and count the entries in each bin:

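import pandas as pd

# Read the excerpt from the clipboard; the "Click Time" header is split into "Click" (date) and "Time"
df = pd.read_clipboard()
# Combine date and time into a DatetimeIndex so the frame can be resampled
df.index = pd.to_datetime(df['Click'] + ' ' + df['Time'])
resampletime = 's'  # use '1min', '5min' or '1h' for the other units
for theid, thedf in df.groupby('Ad-Slot_Id'):
    print(theid)
    # Count the clicks for this Ad-Slot Id in each time bin
    print(thedf.resample(resampletime)['Ad-Slot_Id'].count())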
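The same group-then-bin idea can also be written without an explicit loop. What follows is a minimal sketch rather than a definitive implementation: it assumes a reasonably recent pandas, inlines the modified excerpt instead of reading the clipboard, and swaps the per-group `resample` for `pd.Grouper`. The column names (`ad_slot_id`, `click_time`, `click_ip`) and the helpers `load_clicks` and `clicks_per_unit` are hypothetical names chosen for this illustration.

```python
import io

import pandas as pd

# Modified excerpt from this answer, inlined so the sketch does not need the clipboard.
RAW = """\
208878 2017-03-23 18:30:00.059 2405:204:c:3868:f27d:db2c:e2a9:c90c
236944 2017-03-23 18:30:00.107 2405:204:4183:6939:d3c2:bf40:ed47:3a6d
129192 2017-03-23 18:30:00.309 2405:204:900a:5700:cd84:2eec:449e:6af6
129192 2017-03-23 18:30:00.311 27.62.33.203
236944 2017-03-23 18:30:00.523 182.65.23.82
206706 2017-03-23 18:30:00.637 2405:205:1308:f499:b1d1:931a:2266:a738
129192 2017-03-23 18:30:01.136 42.106.17.94
236944 2017-03-23 18:30:01.226 171.61.19.146
129192 2017-03-23 18:30:01.331 2405:204:4088:1b4d::17ac:38ac
"""


def load_clicks(raw=RAW):
    """Parse the whitespace-separated excerpt (hypothetical column names)."""
    df = pd.read_csv(
        io.StringIO(raw),
        sep=r"\s+",
        header=None,
        names=["ad_slot_id", "date", "time", "click_ip"],
    )
    # Date and time arrive as two text columns; join them into one timestamp.
    df["click_time"] = pd.to_datetime(df["date"] + " " + df["time"])
    return df[["ad_slot_id", "click_time", "click_ip"]]


def clicks_per_unit(df, freq):
    """Count clicks per ad slot and per time bin of width `freq`."""
    keys = ["ad_slot_id", pd.Grouper(key="click_time", freq=freq)]
    return df.groupby(keys).size().rename("clicks")


clicks = load_clicks()
per_second = clicks_per_unit(clicks, "1s")
per_minute = clicks_per_unit(clicks, "1min")
per_5min = clicks_per_unit(clicks, "5min")
per_hour = clicks_per_unit(clicks, "1h")
print(per_minute)
```

Each result is a Series indexed by ad slot and bin start time, so a single lookup such as `per_minute.loc[129192]` gives the per-minute counts for one ad slot. Changing the unit only requires a different `freq` string.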

This should help get you started.