I have a data frame that looks like this:
date timestamp transfer ID IP Address Username Encryption File Bytes Speed DateTimeStamp
1 20160525 08:22:06.838 F798256B 10.199.194.38:57708 wei2dt - "" 264 "1.62 seconds (1.30 kilobits/sec)" 20160525 08:22:06.838
2 20160525 08:28:26.920 F798256C 10.19.105.15:57708 wei2dt - "isi_audit_log.dmp-sv.tmp" 69 "0.29 seconds (1.93 kilobits/sec)" 20160525 08:28:26.920
3 20160525 08:28:26.923 F798256D 10.19.105.15:57708 wei2dt - "isi_audit_log.dmp-sv.met" 0 "Unable to stat isi_audit_log.dmp-sv.met: No such file or directory" 20160525 08:28:26.923
4 20160525 08:28:26.933 F798256E 10.19.105.15:57708 wei2dt - "CG0009 1364_GT_report.txt" 34 "0.01 seconds (34.0 kilobits/sec)" 20160525 08:28:26.933
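I want to count the number of users (Username) online at a given time. Basically, I want to check every five minutes how many users are active. I need to use the DateTimeStamp column to create my intervals, and then use those intervals as the condition for counting the distinct users in each period. I tried doing this with a while loop, but it didn't work. Any suggestions on how I should go about this?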
Answer 0: (score: 1)
Using dplyr:
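library(dplyr)
# DateTimeStamp must be a date-time (POSIXct) column for breaks = "5 min" to work
df %>% mutate(timeInt = cut(DateTimeStamp, breaks = "5 min")) %>%
  group_by(timeInt) %>% summarise(numberUniqueUsers = n_distinct(Username))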
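
If DateTimeStamp is still stored as text like "20160525 08:22:06.838", it probably needs to be parsed first, since cut() only accepts breaks = "5 min" for date-time objects. Below is a minimal sketch of the whole pipeline under that assumption. The sample data frame is made up to mirror the question's rows, the username "user_b" is hypothetical, and the "UTC" time zone is a guess, because the question does not say which zone the log uses.

```r
library(dplyr)

# Hypothetical sample mirroring the question's rows; "user_b" is invented
# so that one interval contains more than one distinct user.
df <- data.frame(
  Username = c("wei2dt", "wei2dt", "user_b", "user_b"),
  DateTimeStamp = c("20160525 08:22:06.838", "20160525 08:28:26.920",
                    "20160525 08:28:26.923", "20160525 08:28:26.933"),
  stringsAsFactors = FALSE
)

# Parse the text into POSIXct; %OS reads seconds including the fractional part.
# The time zone is an assumption, adjust it to match the log source.
df$DateTimeStamp <- as.POSIXct(df$DateTimeStamp,
                               format = "%Y%m%d %H:%M:%OS", tz = "UTC")

# Bin each row into a 5-minute interval, then count distinct users per interval.
result <- df %>%
  mutate(timeInt = cut(DateTimeStamp, breaks = "5 min")) %>%
  group_by(timeInt) %>%
  summarise(numberUniqueUsers = n_distinct(Username))

print(result)
```

Note that cut() starts the first interval at the minute of the earliest timestamp (08:22:00 here), not at a round clock mark such as 08:20:00, and intervals with no rows do not appear in the summary with the default grouping settings.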