Here is my data:
> str(heard2)
'data.frame': 616 obs. of 3 variables:
$ DateTime : POSIXct, format: "2017-07-26 22:28:10" "2017-07-26 22:31:18" "2017-07-26 22:32:18" ...
$ Transmitter: int 30759 30759 30759 30759 30759 30759 30759 30759 30759 30759 ...
$ Station : Factor w/ 35 levels "TRA1-69","TRA2-69",..: 21 21 21 21 21 22 21 22 21 22 ...
> dput(heard2[c(37:47),])
structure(list(DateTime = structure(c(1501109904, 1501109950,
1501109953, 1501110005, 1501110008, 1501110053, 1501110056, 1501110105,
1501110108, 1501110166, 1501110169), class = c("POSIXct", "POSIXt"
), tzone = "GMT"), Transmitter = c(30759L, 30759L, 30759L, 30759L,
30759L, 30759L, 30759L, 30759L, 30759L, 30759L, 30759L), Station = structure(c(21L,
22L, 21L, 22L, 21L, 22L, 21L, 22L, 21L, 22L, 21L), .Label = c("TRA1-69",
"TRA2-69", "TRA3-69", "TRA4-69", "TRA5-69", "TRA6-69", "TRA7-69",
"TRA8-69", "TRB1-69", "TRB2-69", "TRB3-69", "TRB4-69", "TRB5-69",
"TRB6-69", "TRC1-69", "TRC2-69", "TRC3-69", "TRC4-69", "TRC5-69",
"TRC6-69", "TRD1-69", "TRD2-69", "TRE1-69", "TRE2-69", "TRE3-69",
"TRE4-69", "TRE5-69", "TRF1-69", "TRF2-69", "TRF3-69", "TRF4-69",
"TRG1-69", "TRG2-69", "TRG3-69", "TRG4-69"), class = "factor")), row.names = 45:55, class = "data.frame")
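The DateTime column holds the time at which a transmitter was detected at a particular station. Most of these detections are 30 to 60 seconds apart or more. How can I select the rows whose times are only 8 seconds (or less) apart?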
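Answer 0 (score: 1)
Edit: the original request did not make clear whether you need the earlier record, the later record, or both. This version returns both records of each pair.
Create one variable that looks forward (lead) and another that looks back (lag). Then use filter to choose whether you want the first record of each pair (filter(dist_lead <= 8)), the second (filter(dist_lag <= 8)), or both (filter(dist_lead <= 8 | dist_lag <= 8)).
library(dplyr)

heard2 %>%
  arrange(Transmitter, DateTime) %>%  # lead() and lag() depend on row order
  mutate(dist_lead = ifelse(lead(Transmitter) == Transmitter,
                            difftime(lead(DateTime), DateTime, units = "secs"), NA),
         dist_lag = ifelse(lag(Transmitter) == Transmitter,
                           difftime(DateTime, lag(DateTime), units = "secs"), NA)) %>%
  filter(dist_lead <= 8 | dist_lag <= 8)  # keep both rows of each close pair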
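For comparison, the same lead/lag idea can be sketched in base R without dplyr. This is a minimal sketch, not the answerer's code: the helper name keep_close_pairs and its max_gap argument are hypothetical, and the sample rows are rebuilt from the timestamps in the dput() output above, with Station stored as plain character instead of a factor.

```r
# Hypothetical helper (not part of any package): keep every row that lies within
# `max_gap` seconds of the previous or next detection of the same transmitter.
keep_close_pairs <- function(df, max_gap = 8) {
  df <- df[order(df$Transmitter, df$DateTime), , drop = FALSE]
  n <- nrow(df)
  if (n < 2) return(df[0, , drop = FALSE])
  secs <- as.numeric(df$DateTime)  # seconds since the epoch
  # TRUE where row i and row i + 1 belong to the same transmitter and are close
  pair_ok <- df$Transmitter[-1] == df$Transmitter[-n] & diff(secs) <= max_gap
  close_next <- c(pair_ok, FALSE)  # plays the role of dist_lead <= max_gap
  close_prev <- c(FALSE, pair_ok)  # plays the role of dist_lag <= max_gap
  df[close_next | close_prev, , drop = FALSE]
}

# Sample rows rebuilt from the dput() output in the question
heard_sample <- data.frame(
  DateTime = as.POSIXct(c(1501109904, 1501109950, 1501109953, 1501110005,
                          1501110008, 1501110053, 1501110056, 1501110105,
                          1501110108, 1501110166, 1501110169),
                        origin = "1970-01-01", tz = "GMT"),
  Transmitter = 30759L,
  Station = rep(c("TRD1-69", "TRD2-69"), length.out = 11)
)

res <- keep_close_pairs(heard_sample)
res
```

Using close_next alone or close_prev alone instead of their union keeps only the first or only the second record of each pair, mirroring the choice between dist_lead and dist_lag above.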
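Answer 1 (score: 1)
Here is one way to do it with dplyr. Each row with timediff <= 8 is paired with the row directly above it, and the filter keeps both rows of the pair.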
library(dplyr)

heard2 %>%
  arrange(DateTime) %>%
  mutate(
    # seconds since the previous detection (NA for the first row)
    timediff = c(NA_real_, diff(as.numeric(DateTime)))
  ) %>%
  filter(timediff <= 8 | lead(timediff) <= 8)
DateTime Transmitter Station timediff
1 2017-07-26 22:59:10 30759 TRD2-69 46
2 2017-07-26 22:59:13 30759 TRD1-69 3
3 2017-07-26 23:00:05 30759 TRD2-69 52
4 2017-07-26 23:00:08 30759 TRD1-69 3
5 2017-07-26 23:00:53 30759 TRD2-69 45
6 2017-07-26 23:00:56 30759 TRD1-69 3
7 2017-07-26 23:01:45 30759 TRD2-69 49
8 2017-07-26 23:01:48 30759 TRD1-69 3
9 2017-07-26 23:02:46 30759 TRD2-69 58
10 2017-07-26 23:02:49 30759 TRD1-69 3
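A caveat when the gaps are computed as c(NA_real_, diff(DateTime)): on POSIXct input, diff() returns a difftime whose units are chosen automatically (minutes when every gap is at least 60 seconds), and c() then drops the class, so a comparison such as timediff <= 8 can silently be made in minutes. The sketch below uses three made-up times, not data from the question, to show the effect and the explicit-seconds alternative.

```r
# Made-up detection times, 2 and 3 minutes apart
times <- as.POSIXct(c("2017-07-26 22:00:00", "2017-07-26 22:02:00",
                      "2017-07-26 22:05:00"), tz = "GMT")
d <- diff(times)
units(d)                     # "mins": every gap is at least 60 seconds

gaps_auto <- c(NA_real_, d)  # c() drops the difftime class, leaving NA 2 3
gaps_secs <- c(NA_real_, as.numeric(d, units = "secs"))  # NA 120 180

gaps_auto <= 8               # NA TRUE TRUE: 2- and 3-minute gaps wrongly pass
gaps_secs <= 8               # NA FALSE FALSE
```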
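Answer 2 (score: 0)
One approach is to cross join the table with itself and then filter on the time difference. Note that this also returns each record paired with itself, and each close pair twice (once in each order).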
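library(dplyr)

heard2$tmp <- 1  # constant key, so the join pairs every row with every row (itself included)
dplyr::full_join(heard2, heard2, by = "tmp") %>%
  filter(abs(difftime(DateTime.x, DateTime.y, units = "secs")) <= 8) %>%
  select(-tmp)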
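The cross-join idea can also be sketched in base R with outer(), which builds the matrix of all pairwise time differences directly. This is an illustrative sketch on the 11 sample timestamps from the question's dput() output; keeping only the upper triangle drops the self-pairs and mirrored duplicates that the join returns. The n-by-n matrix grows quadratically, which is manageable for 616 rows but not for very large data.

```r
# Sample timestamps from the question's dput() output (GMT)
times <- as.POSIXct(c(1501109904, 1501109950, 1501109953, 1501110005,
                      1501110008, 1501110053, 1501110056, 1501110105,
                      1501110108, 1501110166, 1501110169),
                    origin = "1970-01-01", tz = "GMT")

secs <- as.numeric(times)
gap <- abs(outer(secs, secs, "-"))     # all pairwise gaps in seconds
is_close <- gap <= 8 & upper.tri(gap)  # i < j only: no self-pairs or mirror images
pairs <- which(is_close, arr.ind = TRUE)  # one row per close pair: (row, col) indices

data.frame(first = times[pairs[, "row"]], second = times[pairs[, "col"]])
```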
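Answer 3 (score: 0)
If I understand correctly and you only need the rows that lie within 8 seconds of each other, there is a simple solution: put the times into fixed 9-second bins and keep every bin that holds more than one detection. Because the bins are fixed, two detections only a few seconds apart can still fall into adjacent bins and be missed.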
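library(tidyverse)
heard2 %>%
  # number fixed 9-second bins, starting at the earliest DateTime
  mutate(Grp = cut(DateTime, breaks = "9 sec", labels = FALSE)) %>%
  # keep only rows whose bin contains more than one detection
  semi_join(count(., Grp) %>% filter(n > 1), by = "Grp")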
# DateTime Transmitter Station Grp
# 1 2017-07-26 22:59:10 30759 TRD2-69 6
# 2 2017-07-26 22:59:13 30759 TRD1-69 6
# 3 2017-07-26 23:00:05 30759 TRD2-69 12
# 4 2017-07-26 23:00:08 30759 TRD1-69 12
# 5 2017-07-26 23:00:53 30759 TRD2-69 17
# 6 2017-07-26 23:00:56 30759 TRD1-69 17
# 7 2017-07-26 23:01:45 30759 TRD2-69 23
# 8 2017-07-26 23:01:48 30759 TRD1-69 23
# 9 2017-07-26 23:02:46 30759 TRD2-69 30
# 10 2017-07-26 23:02:49 30759 TRD1-69 30
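The bin-boundary caveat can be seen with three made-up detection times (hypothetical, not from the question's data). The last two are only 3 seconds apart but land in different 9-second bins, so a count-per-bin filter drops them, while a gap-based check like the ones in the other answers keeps them.

```r
t0 <- as.POSIXct("2017-07-26 22:00:00", tz = "GMT")
times <- t0 + c(0, 16, 19)       # made-up detections; the last two are 3 s apart

bins <- cut(times, breaks = "9 sec", labels = FALSE)
bins                             # 1 2 3: bins start at min(times) and are 9 s wide
any(tabulate(bins) > 1)          # FALSE: no bin holds two detections

gaps <- diff(as.numeric(times))  # 16 3
any(gaps <= 8)                   # TRUE: the gap check finds the close pair
```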