How to count the number of events within specific time periods

Time: 2016-02-20 00:20:56

Tags: r datetime plyr seq

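I am trying to count the number of events in df2 (each row is an event) that fall within the time periods defined in df1. I can do this for each whole period of roughly 5 minutes, but I would like to break the periods into smaller chunks (1 minute) and run the same calculation.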

df1<- structure(list(Location = 1:10, Lattitude = c(57.140532, 57.140527, 
57.13959, 57.13974, 57.14059, 57.14058, 57.1398, 57.13989, 57.14158, 
57.14386), t_in = structure(c(1455626730, 1455627326, 1455628122, 
1455628644, 1455629174, 1455629708, 1455630230, 1455630765, 1455631396, 
1455631931), class = c("POSIXct", "POSIXt"), tzone = ""), t_out = structure(c(1455627047, 
1455627615, 1455628462, 1455628933, 1455629486, 1455630015, 1455630552, 
1455631070, 1455631719, 1455632242), class = c("POSIXct", "POSIXt"
), tzone = "")), .Names = c("Location", "Lattitude", "t_in", 
"t_out"), class = "data.frame", row.names = c(NA, -10L))

df2<- structure(list(date.time = structure(c(1455630964, 1455630976, 
1455630987, 1455630998, 1455631009, 1455631021, 1455631032, 1455631043, 
1455631054, 1455631066, 1455631077, 1455631088, 1455631099, 1455631111, 
1455631423, 1455631446, 1455631479, 1455631502, 1455631569, 1455631772
), class = c("POSIXct", "POSIXt"), tzone = ""), code = structure(c(2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L), .Label = c("1003", "32221"), class = "factor"), 
rec_id = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("301976", 
"301978", "301985", "301988"), class = "factor"), Lattitude = c("57.14066", 
"57.14066", "57.14066", "57.14066", "57.14066", "57.14066", 
"57.14066", "57.14066", "57.14066", "57.14066", "57.14066", 
"57.14066", "57.14066", "57.14066", "57.141869", "57.141869", 
"57.141869", "57.141869", "57.141869", "57.141869"), Longitude = c("2.075702", 
"2.075702", "2.075702", "2.075702", "2.075702", "2.075702", 
"2.075702", "2.075702", "2.075702", "2.075702", "2.075702", 
"2.075702", "2.075702", "2.075702", "2.081576", "2.081576", 
"2.081576", "2.081576", "2.081576", "2.081576"), Location = list(
    8, 8, 8, 8, 8, 8, 8, 8, 8, 8, NA, NA, NA, NA, 9, 9, 9, 
    9, 9, NA)), .Names = c("date.time", "code", "rec_id", 
"Lattitude", "Longitude", "Location"), row.names = 94:113, class = "data.frame")

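The function below returns the Location from df1 if date.time in df2 lies between df1$t_in and df1$t_out. This may look roundabout, but the code works for the subsequent calculations.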

ids <- as.numeric(df1$Location)
f <- function(x){
  a <- ids[ (df1$t_in < x) & (x < df1$t_out) ]
  if (length(a) == 0) NA else a
}   

df2$Location <- lapply(df2$date.time, f)

The above returns a list, so it needs to be converted to numeric. It is a bit clunky, but I could not find a way around it.

df2$Location<- paste(df2$Location)
df2$Location<- as.numeric(df2$Location)
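One possible way to avoid the list and the paste/as.numeric round trip is to let vapply build a numeric vector directly. The sketch below uses small made-up data; the names periods, events and find_location are hypothetical stand-ins for df1, df2 and f, and it assumes the periods do not overlap (only the first match is kept).

```r
# Toy data (hypothetical values, not the data set from the question)
periods <- data.frame(
  Location = 1:2,
  t_in  = as.POSIXct(c("2016-02-16 14:00:00", "2016-02-16 14:10:00"), tz = "UTC"),
  t_out = as.POSIXct(c("2016-02-16 14:05:00", "2016-02-16 14:15:00"), tz = "UTC")
)
events <- data.frame(
  date.time = as.POSIXct(c("2016-02-16 14:01:30", "2016-02-16 14:07:00",
                           "2016-02-16 14:12:10"), tz = "UTC")
)

# Return the Location whose period strictly contains time x, or NA if none.
# Assumes periods do not overlap, so only the first match is used.
find_location <- function(x) {
  hit <- periods$Location[periods$t_in < x & x < periods$t_out]
  if (length(hit) == 0) NA_real_ else as.numeric(hit[1])
}

# Iterate over row indices so each element keeps its POSIXct class;
# vapply guarantees a plain numeric vector instead of a list.
events$Location <- vapply(
  seq_along(events$date.time),
  function(i) find_location(events$date.time[i]),
  numeric(1)
)
```

With this shape the Location column is already numeric, so the NA filter that follows can be applied to it as is.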

Then remove the NAs, since they fall outside the time periods defined in df1 and are therefore irrelevant.

df2<-df2[!is.na(df2$Location),]

Then count the number of events (i.e. rows) for each rec_id and Location

library (plyr)
df3 <- ddply(df2, c("rec_id","Location"), function(df){data.frame (detections=nrow(df))})
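For comparison, the same kind of count can be sketched in base R with aggregate, without plyr. The data frame det below is a made-up stand-in for the filtered df2, and the helper column detections is introduced only for this illustration.

```r
# Toy stand-in for the filtered events (hypothetical values)
det <- data.frame(
  rec_id   = c("301976", "301976", "301978"),
  Location = c(9, 9, 8)
)

# Give every row a weight of 1 and sum the weights per group
det$detections <- 1L
counts <- aggregate(detections ~ rec_id + Location, data = det, FUN = sum)
```

The result has one row per rec_id and Location combination that actually occurs; combinations with no events are not listed.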

  rec_id Location detections
1 301976        9          5
2 301978        8         10

...perfect!

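However, I would like to do this over shorter periods, one minute each to be exact. The periods should start at each location's t_in (df1) and run until its t_out (df1). I could do much of this in Excel, but surely it can be automated in R (it is a large data set).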

So ultimately I want to count the number of events (nrow) for each location in every 1-minute period between t_in and t_out in df1.

For example (a visual illustration only, not real data):

  rec_id Location  minute(or period) detections
 301976        9             1           1
 301976        9             2           2
 301976        9             3           0
 301976        9             4           0
 301976        9             5           2
 301978        8             1           4
 301978        8             2           3
 301978        8             3           1
 301978        8             4           0
 301978        8             5           2

I can create the intervals for the first location, but I am not sure how to take it further.

seq(from = head(df1$t_in,1), to = head(df1$t_out,1) , by = "mins")

1 answer:

Answer 0 (score: 1)

I think the following should produce a new df1 data frame containing the periods split into sequences; you can then apply your steps above using the new df1.

The steps could probably be combined, but I just wanted to make sure this actually does what you need.

First we expand the time intervals in the original data frame and generate a list of the expanded periods. Each row of df1 becomes an element of the list.

# lapply always returns a list, even if every sequence has the same length
res1 <- lapply(seq_len(nrow(df1)), function(i) {
                 seq(from = df1$t_in[i], to = df1$t_out[i], by = "mins")})

Then we convert the list of sequences into data frames (two columns)

res2 <- lapply(res1, function(x) { 
                 data.frame(t_in = x[1:(length(x)-1)], t_out = x[2:length(x)]) })

Finally we merge everything together

df1v2 <- Reduce(function(...) merge(..., all = TRUE), res2)

Then (adapting your code)

ids <- seq_len(nrow(df1v2))
f <- function(x){
  a <- ids[ (df1v2$t_in < x) & (x < df1v2$t_out) ]
  if (length(a) == 0) NA else a
}   

df2$Location <- lapply(df2$date.time, f)

yields

              date.time  code rec_id Lattitude Longitude Location
94  2016-02-16 14:56:04 32221 301978  57.14066  2.075702       37
95  2016-02-16 14:56:16 32221 301978  57.14066  2.075702       37
96  2016-02-16 14:56:27 32221 301978  57.14066  2.075702       37
97  2016-02-16 14:56:38 32221 301978  57.14066  2.075702       37
98  2016-02-16 14:56:49 32221 301978  57.14066  2.075702       38
99  2016-02-16 14:57:01 32221 301978  57.14066  2.075702       38
100 2016-02-16 14:57:12 32221 301978  57.14066  2.075702       38
101 2016-02-16 14:57:23 32221 301978  57.14066  2.075702       38
102 2016-02-16 14:57:34 32221 301978  57.14066  2.075702       38
103 2016-02-16 14:57:46 32221 301978  57.14066  2.075702       NA
104 2016-02-16 14:57:57 32221 301978  57.14066  2.075702       NA
105 2016-02-16 14:58:08 32221 301978  57.14066  2.075702       NA
106 2016-02-16 14:58:19 32221 301978  57.14066  2.075702       NA
107 2016-02-16 14:58:31 32221 301978  57.14066  2.075702       NA
108 2016-02-16 15:03:43 32221 301976 57.141869  2.081576       39
109 2016-02-16 15:04:06 32221 301976 57.141869  2.081576       39
110 2016-02-16 15:04:39 32221 301976 57.141869  2.081576       40
111 2016-02-16 15:05:02 32221 301976 57.141869  2.081576       40
112 2016-02-16 15:06:09 32221 301976 57.141869  2.081576       41
113 2016-02-16 15:09:32 32221 301976 57.141869  2.081576       NA

I am not sure the boundary check is right (modify it as needed), but this looks like what you are after. How important is a speed-up?
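If the goal is the per-minute table from the question, including minutes with zero events, one possible alternative is to compute each event's minute index relative to its period's t_in and join the counts onto a full grid of minutes. This is only a sketch on made-up data: periods, events, grid and out are hypothetical names, rec_id is left out for brevity (it could be added as a further grouping column), and each event is assumed to already carry its Location.

```r
# Toy data (hypothetical values, not the data set from the question)
periods <- data.frame(
  Location = c(8, 9),
  t_in  = as.POSIXct(c("2016-02-16 14:00:00", "2016-02-16 14:10:00"), tz = "UTC"),
  t_out = as.POSIXct(c("2016-02-16 14:03:00", "2016-02-16 14:12:00"), tz = "UTC")
)
events <- data.frame(
  Location  = c(8, 8, 8, 9),
  date.time = as.POSIXct(c("2016-02-16 14:00:10", "2016-02-16 14:00:50",
                           "2016-02-16 14:02:30", "2016-02-16 14:11:05"), tz = "UTC")
)

# Attach each event's period start, then derive a 1-based minute index.
# Minute 1 covers [t_in, t_in + 60s), so an event exactly on a boundary
# is counted in the later minute.
ev <- merge(events, periods, by = "Location")
secs <- as.numeric(difftime(ev$date.time, ev$t_in, units = "secs"))
ev$minute <- as.integer(floor(secs / 60)) + 1L

# Full grid of minutes per Location, so that empty minutes appear with 0
n_min <- ceiling(as.numeric(difftime(periods$t_out, periods$t_in, units = "secs")) / 60)
grid <- do.call(rbind, lapply(seq_len(nrow(periods)), function(i) {
  data.frame(Location = periods$Location[i], minute = seq_len(n_min[i]))
}))

# Count events per Location and minute, then left-join onto the grid
ev$detections <- 1L
counts <- aggregate(detections ~ Location + minute, data = ev, FUN = sum)
out <- merge(grid, counts, by = c("Location", "minute"), all.x = TRUE)
out$detections[is.na(out$detections)] <- 0L
out <- out[order(out$Location, out$minute), ]
```

Because the minute index is computed arithmetically, no per-event search over the split intervals is needed, which may matter on a large data set. Events that fall outside the grid (for example exactly at t_out) are dropped by the left join.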