How to select only the rows that are a specific time difference apart?

Time: 2019-05-16 17:39:56

Tags: r datetime

Here is my data:

> str(heard2)
'data.frame':   616 obs. of  3 variables:
 $ DateTime   : POSIXct, format: "2017-07-26 22:28:10" "2017-07-26 22:31:18" "2017-07-26 22:32:18" ...
 $ Transmitter: int  30759 30759 30759 30759 30759 30759 30759 30759 30759 30759 ...
 $ Station    : Factor w/ 35 levels "TRA1-69","TRA2-69",..: 21 21 21 21 21 22 21 22 21 22 ...
> dput(heard2[c(37:47),])
structure(list(DateTime = structure(c(1501109904, 1501109950, 
1501109953, 1501110005, 1501110008, 1501110053, 1501110056, 1501110105, 
1501110108, 1501110166, 1501110169), class = c("POSIXct", "POSIXt"
), tzone = "GMT"), Transmitter = c(30759L, 30759L, 30759L, 30759L, 
30759L, 30759L, 30759L, 30759L, 30759L, 30759L, 30759L), Station = structure(c(21L, 
22L, 21L, 22L, 21L, 22L, 21L, 22L, 21L, 22L, 21L), .Label = c("TRA1-69", 
"TRA2-69", "TRA3-69", "TRA4-69", "TRA5-69", "TRA6-69", "TRA7-69", 
"TRA8-69", "TRB1-69", "TRB2-69", "TRB3-69", "TRB4-69", "TRB5-69", 
"TRB6-69", "TRC1-69", "TRC2-69", "TRC3-69", "TRC4-69", "TRC5-69", 
"TRC6-69", "TRD1-69", "TRD2-69", "TRE1-69", "TRE2-69", "TRE3-69", 
"TRE4-69", "TRE5-69", "TRF1-69", "TRF2-69", "TRF3-69", "TRF4-69", 
"TRG1-69", "TRG2-69", "TRG3-69", "TRG4-69"), class = "factor")), row.names = 45:55, class = "data.frame")

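The DateTime column is the time at which a transmitter was detected at a given station. Most of these detections are 30-60 seconds apart, or more. How can I select only the rows whose times are 8 seconds (or less) apart?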

4 answers:

Answer 0 (score: 1)

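Edit: The original question is unclear about whether you need the earlier record, the later record, or both. This version returns both records.

Create one variable that looks forward (lead) and one that looks backward (lag). Then use filter to choose whether you want the first record of each pair (dist_lead <= 8), the second (dist_lag <= 8), or both (filter(dist_lead <= 8 | dist_lag <= 8)).

library(dplyr)
heard2 %>% 
  arrange(Transmitter, DateTime) %>% 
  mutate(
    # gap (secs) to the next / previous detection of the same transmitter
    dist_lead = ifelse(lead(Transmitter) == Transmitter,
                       difftime(lead(DateTime), DateTime, units = "secs"), NA),
    dist_lag  = ifelse(lag(Transmitter) == Transmitter,
                       difftime(DateTime, lag(DateTime), units = "secs"), NA)
  ) %>% 
  filter(dist_lead <= 8 | dist_lag <= 8)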
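If you would rather not depend on dplyr, the same lead/lag idea can be sketched in base R. This is a minimal sketch, assuming the column names from the question; `within_gap()` and its `max_gap` argument are hypothetical names, and the 11-row sample is rebuilt from the question's `dput()` with `Station` stored as plain strings instead of a factor.

```r
# Rebuild the 11-row sample from the question's dput() (times are GMT).
heard2 <- data.frame(
  DateTime = as.POSIXct(c(1501109904, 1501109950, 1501109953, 1501110005,
                          1501110008, 1501110053, 1501110056, 1501110105,
                          1501110108, 1501110166, 1501110169),
                        origin = "1970-01-01", tz = "GMT"),
  Transmitter = 30759L,
  Station = rep(c("TRD1-69", "TRD2-69"), length.out = 11)
)

# Hypothetical helper: keep rows whose gap to the previous OR the next
# detection of the same transmitter is at most `max_gap` seconds.
within_gap <- function(df, max_gap = 8) {
  if (nrow(df) < 2) return(df[0, ])              # no pairs possible
  df <- df[order(df$Transmitter, df$DateTime), ]
  n <- nrow(df)
  secs <- as.numeric(df$DateTime)                # seconds since the epoch
  gap_prev <- c(NA, diff(secs))                  # like difftime(x, lag(x))
  same_tx <- c(FALSE, df$Transmitter[-1] == df$Transmitter[-n])
  gap_prev[!same_tx] <- NA                       # never pair across transmitters
  gap_next <- c(gap_prev[-1], NA)                # like difftime(lead(x), x)
  keep <- (!is.na(gap_prev) & gap_prev <= max_gap) |
          (!is.na(gap_next) & gap_next <= max_gap)
  df[keep, ]
}

close_rows <- within_gap(heard2)
nrow(close_rows)   # 10: every row except the first (22:58:24 GMT)
```

Because the gaps come from `as.numeric()` on the `POSIXct` values, they are always in seconds, so `max_gap = 8` means 8 seconds no matter how far apart the other detections are.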

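Answer 1 (score: 1)

Here is one way to do it with dplyr. Each row with timediff <= 8 is paired with the row above it.

library(dplyr)

heard2 %>% 
  arrange(DateTime) %>% 
  mutate(
    # gap to the previous row, explicitly in seconds
    timediff = as.numeric(difftime(DateTime, lag(DateTime), units = "secs"))
  ) %>% 
  filter(timediff <= 8 | lead(timediff) <= 8)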

              DateTime Transmitter Station timediff
1  2017-07-26 22:59:10       30759 TRD2-69       46
2  2017-07-26 22:59:13       30759 TRD1-69        3
3  2017-07-26 23:00:05       30759 TRD2-69       52
4  2017-07-26 23:00:08       30759 TRD1-69        3
5  2017-07-26 23:00:53       30759 TRD2-69       45
6  2017-07-26 23:00:56       30759 TRD1-69        3
7  2017-07-26 23:01:45       30759 TRD2-69       49
8  2017-07-26 23:01:48       30759 TRD1-69        3
9  2017-07-26 23:02:46       30759 TRD2-69       58
10 2017-07-26 23:02:49       30759 TRD1-69        3
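Whichever way the gap is computed, it helps to pin its unit to seconds. `diff()` on a `POSIXct` vector returns a `difftime` whose unit R chooses from the smallest gap, so if no two detections happen to be under a minute apart the values come out in minutes and a test such as `timediff <= 8` would mean 8 minutes. A minimal base-R illustration with made-up timestamps:

```r
# Made-up detection times: every gap is at least one minute.
det <- as.POSIXct("2017-07-26 22:59:10", tz = "GMT") + c(0, 120, 300)

d <- diff(det)                   # difftime with automatically chosen units
units(d)                         # "mins", because the smallest gap is >= 60 s
as.numeric(d)                    # 2 3   (minutes, not seconds)
as.numeric(d, units = "secs")    # 120 180

# Asking difftime() for seconds avoids the surprise regardless of the data.
gaps <- as.numeric(difftime(det[-1], det[-length(det)], units = "secs"))
gaps                             # 120 180
```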

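Answer 2 (score: 0)

One way is to cross join the table with itself and then filter on the time difference. Note that this also returns each record paired with itself, and every matching pair twice (once in each order).

    library(dplyr)
    heard2$tmp <- 1
    dplyr::full_join(heard2, heard2, by = 'tmp') %>% 
      # compute the gap explicitly in seconds before comparing
      filter(abs(as.numeric(difftime(DateTime.x, DateTime.y, units = "secs"))) <= 8) %>% 
      select(-tmp)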
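Joining on the transmitter instead of a constant key avoids pairing detections from different tags and keeps the intermediate table smaller. A minimal base-R sketch using `merge()`, assuming a hypothetical `id` column that is used to drop self-pairs and mirrored duplicates; the five timestamps are made up for illustration:

```r
# Made-up detections for one transmitter (seconds after 23:00:00 GMT).
t0 <- as.POSIXct("2017-07-26 23:00:00", tz = "GMT")
det <- data.frame(DateTime = t0 + c(0, 3, 50, 53, 120),
                  Transmitter = 30759L,
                  Station = c("TRD2-69", "TRD1-69", "TRD2-69", "TRD1-69", "TRD2-69"))
det$id <- seq_len(nrow(det))                  # hypothetical row id

# Self-join on Transmitter: every pair of detections of the same tag.
pairs <- merge(det, det, by = "Transmitter")  # other columns get .x / .y
gap <- abs(as.numeric(difftime(pairs$DateTime.y, pairs$DateTime.x,
                               units = "secs")))

# id.x < id.y drops self-pairs and keeps each pair only once.
close_pairs <- pairs[pairs$id.x < pairs$id.y & gap <= 8,
                     c("DateTime.x", "Station.x", "DateTime.y", "Station.y")]
nrow(close_pairs)   # 2: (0 s, 3 s) and (50 s, 53 s)
```

The intermediate join still grows quadratically with the number of detections per transmitter, which is harmless for 616 rows but worth keeping in mind for larger receiver logs.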

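Answer 3 (score: 0)

If I understand correctly and you only need to flag the rows that are within 8 seconds of each other, there is a simple solution:

library(tidyverse)

# put every detection into a fixed 9-second bin (integer bin number)
mutate(heard2, Grp = cut(DateTime, breaks = '9 sec', labels = FALSE)) %>%
  # keep only the bins that contain more than one detection
  semi_join(count(., Grp) %>% filter(n > 1), by = "Grp")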

#               DateTime Transmitter Station Grp
# 1  2017-07-26 22:59:10       30759 TRD2-69   6
# 2  2017-07-26 22:59:13       30759 TRD1-69   6
# 3  2017-07-26 23:00:05       30759 TRD2-69  12
# 4  2017-07-26 23:00:08       30759 TRD1-69  12
# 5  2017-07-26 23:00:53       30759 TRD2-69  17
# 6  2017-07-26 23:00:56       30759 TRD1-69  17
# 7  2017-07-26 23:01:45       30759 TRD2-69  23
# 8  2017-07-26 23:01:48       30759 TRD1-69  23
# 9  2017-07-26 23:02:46       30759 TRD2-69  30
# 10 2017-07-26 23:02:49       30759 TRD1-69  30
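One caveat with the fixed-bin approach: for a `"9 sec"` break, `cut()` lays its bins starting at the earliest timestamp, so two detections only a few seconds apart can fall into neighbouring bins and both be dropped. A small base-R check with made-up times:

```r
# Made-up detection times: the last two are only 3 seconds apart.
t0 <- as.POSIXct("2017-07-26 23:00:00", tz = "GMT")
x <- t0 + c(0, 16, 19)

# 9-second bins start at min(x): [0, 9), [9, 18), [18, 27) ...
bins <- cut(x, breaks = "9 sec", labels = FALSE)
bins       # 1 2 3: no bin holds two detections

# A direct gap test between consecutive rows still finds the close pair.
gaps <- diff(as.numeric(x))
gaps       # 16 3
```

For the sample in the question the bins happen to line up with the pairs, which is why the output above matches the gap-based answers.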