Ad-Slot Id Click Time Click IP
0 208878 2017-03-23 18:30:00.059 2405:204:c:3868:f27d:db2c:e2a9:c90c
1 195915 2017-03-23 18:30:00.107 2405:204:4183:6939:d3c2:bf40:ed47:3a6d
2 129192 2017-03-23 18:30:00.309 2405:204:900a:5700:cd84:2eec:449e:6af6
3 195987 2017-03-23 18:30:00.311 27.62.33.203
4 209078 2017-03-23 18:30:00.523 182.65.23.82
5 206706 2017-03-23 18:30:00.637 2405:205:1308:f499:b1d1:931a:2266:a738
6 210917 2017-03-23 18:30:01.136 42.106.17.94
7 236944 2017-03-23 18:30:01.226 171.61.19.146
8 195980 2017-03-23 18:30:01.331 2405:204:4088:1b4d::17ac:38ac
I have the data excerpt above and need to find the number of clicks per unit of time (1 minute, 5 minutes, 1 hour) for each publisher (Ad-Slot Id).
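Answer 0: (score: 0)
Note: I modified your data snippet so that only a few Ad-Slot Ids appear within each second, purely for testing. My output will therefore differ from yours.
Data excerpt: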
Ad-Slot_Id Click Time Click_IP
0 208878 2017-03-23 18:30:00.059 2405:204:c:3868:f27d:db2c:e2a9:c90c
1 236944 2017-03-23 18:30:00.107 2405:204:4183:6939:d3c2:bf40:ed47:3a6d
2 129192 2017-03-23 18:30:00.309 2405:204:900a:5700:cd84:2eec:449e:6af6
3 129192 2017-03-23 18:30:00.311 27.62.33.203
4 236944 2017-03-23 18:30:00.523 182.65.23.82
5 206706 2017-03-23 18:30:00.637 2405:205:1308:f499:b1d1:931a:2266:a738
6 129192 2017-03-23 18:30:01.136 42.106.17.94
7 236944 2017-03-23 18:30:01.226 171.61.19.146
8 129192 2017-03-23 18:30:01.331 2405:204:4088:1b4d::17ac:38ac
I group the DataFrame by Ad-Slot Id, then resample each group per second, per minute, or whatever interval you want, and count the rows:
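import pandas as pd

df = pd.read_clipboard()
# Combine the date ('Click') and time ('Time') columns into a DatetimeIndex.
df.index = pd.to_datetime(df['Click'] + ' ' + df['Time'])
resampletime = 's'  # e.g. 'min' or '5min' for longer windows
for theid, thedf in df.groupby('Ad-Slot_Id'):
    print(theid)
    # Count the rows that fall into each time bin for this ad slot.
    print(thedf.resample(resampletime)['Ad-Slot_Id'].count())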
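For the three windows named in the question (1 minute, 5 minutes, 1 hour), the same idea can be sketched without an explicit loop over groups. This is only a sketch under assumptions: the data is inlined instead of read from the clipboard, the row-number column is dropped, and the helper name clicks_per_window and the timestamp column are made up for illustration. It uses pd.Grouper to bin by time inside a single groupby.

```python
import io
import pandas as pd

# Inlined copy of the modified excerpt (row-number column dropped for simplicity).
raw = """Ad-Slot_Id Click Time Click_IP
208878 2017-03-23 18:30:00.059 2405:204:c:3868:f27d:db2c:e2a9:c90c
236944 2017-03-23 18:30:00.107 2405:204:4183:6939:d3c2:bf40:ed47:3a6d
129192 2017-03-23 18:30:00.309 2405:204:900a:5700:cd84:2eec:449e:6af6
129192 2017-03-23 18:30:00.311 27.62.33.203
236944 2017-03-23 18:30:00.523 182.65.23.82
206706 2017-03-23 18:30:00.637 2405:205:1308:f499:b1d1:931a:2266:a738
129192 2017-03-23 18:30:01.136 42.106.17.94
236944 2017-03-23 18:30:01.226 171.61.19.146
129192 2017-03-23 18:30:01.331 2405:204:4088:1b4d::17ac:38ac
"""

df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
# 'Click' holds the date and 'Time' the time of day; merge them into one column.
df["timestamp"] = pd.to_datetime(df["Click"] + " " + df["Time"])


def clicks_per_window(frame, freq):
    """Count rows per (Ad-Slot_Id, time window) pair. Hypothetical helper."""
    return (
        frame.groupby(["Ad-Slot_Id", pd.Grouper(key="timestamp", freq=freq)])
        .size()
        .rename("clicks")
    )


# '60min' stands in for one hour; the hourly alias spelling varies across pandas versions.
for freq in ("1min", "5min", "60min"):
    print(freq)
    print(clicks_per_window(df, freq))
```

Each result is a Series indexed by (Ad-Slot_Id, window start), so calling .unstack(0) on it would give one column per ad slot if a wide table is more convenient.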
This should help you get started.