Question

This one is hard for me. I have 3 months of data (up to 1M rows), and my data.frame has 2 columns:

Date_Time                Number
12/1/2015 12:00:01 AM    92222222
12/1/2015 12:00:29 AM    32211111
12/1/2015 12:00:41 AM    22333333
12/1/2015 12:00:43 AM    12222222
.....                    .....
12/1/2015 9:00:02 AM     92222222
12/2/2015 12:00:02 AM    32211111

How can I count the occurrences (frequency) of each value in the column "Number" within a 24-hour time frame?

Expected result for the example above:

92222222 Freq: 2
32211111 Freq: 2
22333333 Freq: 1
12222222 Freq: 1

EDIT
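A 24-hour time frame means a rolling 24-hour interval, not midnight to midnight. For example, if someone calls at 5 PM today and calls again at 3 PM the next day, that should count as 2.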

EDIT 2: To be clearer, the purpose of this analysis is to find the number of repeat calls to a call center within a 24-hour window.

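For example, a customer calls from contact number 01101111 on 1/1/2016 at 1:32:01 PM and then calls again on 1/1/2016 at 1:59:43 PM. The last call is on the next day, 1/2/2016, at 12:21:02 PM. The frequency for 01101111 is counted as "3", because the number called 3 times within less than 24 hours.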
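Under the EDIT 2 definition, a number's frequency is the count of its calls that fall within 24 hours of its first call. A minimal base-R check of the example above, assuming month/day/year timestamps on a 12-hour clock (as in the sample data) and that UTC is an acceptable time zone; `calls_0110` and `freq_0110` are illustrative names:

```r
# Calls from 01101111 in the EDIT 2 example (month/day/year, 12-hour clock)
calls_0110 <- as.POSIXct(c("1/1/2016 1:32:01 PM",
                           "1/1/2016 1:59:43 PM",
                           "1/2/2016 12:21:02 PM"),
                         format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# Hours elapsed since the first call: 0, about 0.46, about 22.82
elapsed <- as.numeric(difftime(calls_0110, min(calls_0110), units = "hours"))

# Calls that fall within less than 24 hours of the first call
freq_0110 <- sum(elapsed < 24)
freq_0110
```

All three calls land inside the window, so the count is 3.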

Answer 1

Based on your comment, for any number the period starts at that number's earliest call. Here is the commented code:

library(lubridate)                                                              
library(dplyr)          

calls <- structure(list(Date_Time = structure(1:6, .Label = c("12/1/2015 12:00:01 AM", 
"12/1/2015 12:00:29 AM", "12/1/2015 12:00:41 AM", "12/1/2015 12:00:43 AM", 
"12/1/2015 9:00:02 AM", "12/2/2015 12:00:02 AM"), class = "factor"), 
    Number = structure(c(4L, 3L, 2L, 1L, 4L, 3L), .Label = c("12222222", 
    "22333333", "32211111", "92222222"), class = "factor")), .Names = c("Date_Time", 
"Number"), row.names = c(NA, -6L), class = "data.frame")


count_freq <- function(timestamps){
    # Given all the occurrences of calls from one number, find the
    # earliest one and count how many occur within 24 hours of it
    dtime <- sort(mdy_hms(as.character(timestamps)))  # factor -> character, then parse
    start_time <- dtime[1]
    end_time <- start_time + hours(24)
    sum(dtime >= start_time & dtime <= end_time)
}


out <- group_by(calls, Number) %>% 
       summarise(freq = count_freq(Date_Time))
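If lubridate and dplyr are not available, the same earliest-call logic as `count_freq` can be sketched in base R. This assumes the `%m/%d/%Y %I:%M:%S %p` timestamp layout of the sample and that UTC is acceptable; `count_freq_base` is a hypothetical name, and Number is kept as character so leading zeros (such as in 01101111) survive:

```r
# Sample data from the question
calls <- data.frame(
  Date_Time = c("12/1/2015 12:00:01 AM", "12/1/2015 12:00:29 AM",
                "12/1/2015 12:00:41 AM", "12/1/2015 12:00:43 AM",
                "12/1/2015 9:00:02 AM",  "12/2/2015 12:00:02 AM"),
  Number = c("92222222", "32211111", "22333333", "12222222",
             "92222222", "32211111"),
  stringsAsFactors = FALSE
)

# Parse the 12-hour timestamps (UTC avoids daylight-saving gaps)
calls$time <- as.POSIXct(calls$Date_Time, format = "%m/%d/%Y %I:%M:%S %p",
                         tz = "UTC")

# Anchor the window at one number's earliest call and count calls inside it
count_freq_base <- function(t) {
  start <- min(t)
  sum(t >= start & t <= start + 24 * 60 * 60)
}

# split() keeps the POSIXct class for each number's group of timestamps
freq <- vapply(split(calls$time, calls$Number), count_freq_base, integer(1))
freq
```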

Answer 2

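Here is another approach that outputs, for each row, how often that number calls within the 24 hours starting at that row's timestamp, but it is probably slower than tfc's answer.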

library(dplyr)

# Read both columns as character and strip the padding around each field
df <- read.table(header = TRUE, sep = ",", strip.white = TRUE,
                 colClasses = "character", text = "Date_Time,  Number
               12/1/2015 12:00:01 AM,    92222222
               12/1/2015 12:00:29 AM,    32211111
               12/1/2015 12:00:41 AM,    22333333
               12/1/2015 12:00:43 AM,    12222222
               12/1/2015 9:00:02 AM,     92222222
               12/2/2015 12:00:02 AM,    32211111")

df$Date_Time <- as.POSIXct(df$Date_Time, format = "%m/%d/%Y %I:%M:%S %p")

# For one row, count calls from the same number in the 24 hours starting at that row
ncount <- function(x) {
  target <- x[2]
  starttime <- as.POSIXct(x[1], format = "%Y-%m-%d %H:%M:%S")
  endtime <- starttime + 24*60*60  # 1 day later
  nrow(filter(df, Number == target & Date_Time >= starttime & Date_Time <= endtime))
}

# apply() passes each row as a character vector (Date_Time formatted as text)
df$freq <- apply(df, 1, ncount)
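Because `ncount` filters the whole data frame once per row, the `apply` approach is quadratic in the number of rows, which matters at 1M rows. A sketch of the same per-row count that sorts each number's call times once and uses base R's `findInterval` to find the edges of each 24-hour window. It assumes the `%m/%d/%Y %I:%M:%S %p` timestamp layout and UTC; `window_counts` and `freq_fast` are hypothetical names:

```r
# Hypothetical helper: for each call time s[i] (in seconds), count the calls
# from the same number in [s[i], s[i] + window], the inclusive bounds ncount uses
window_counts <- function(s, window = 24 * 60 * 60) {
  o <- order(s)
  s_sorted <- s[o]
  # calls at or before s + window, minus calls strictly before s
  counts <- findInterval(s_sorted + window, s_sorted) -
    findInterval(s_sorted, s_sorted, left.open = TRUE)
  out <- integer(length(s))
  out[o] <- counts  # restore the original row order
  out
}

df <- data.frame(
  Date_Time = c("12/1/2015 12:00:01 AM", "12/1/2015 12:00:29 AM",
                "12/1/2015 12:00:41 AM", "12/1/2015 12:00:43 AM",
                "12/1/2015 9:00:02 AM",  "12/2/2015 12:00:02 AM"),
  Number = c("92222222", "32211111", "22333333", "12222222",
             "92222222", "32211111"),
  stringsAsFactors = FALSE
)
df$Date_Time <- as.POSIXct(df$Date_Time, format = "%m/%d/%Y %I:%M:%S %p",
                           tz = "UTC")

# ave() applies window_counts separately to each number's timestamps
df$freq_fast <- ave(as.numeric(df$Date_Time), df$Number, FUN = window_counts)
df
```

Subtracting the count of calls strictly before `s[i]` (via `left.open = TRUE`) keeps ties at the same timestamp inside the window, matching the `>=` test in `ncount`. The sort makes this O(n log n) per number rather than a full scan of `df` for every row.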
