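I have the following data. I want to find runs of rows whose timestamps fall on consecutive seconds and keep only the row with the largest Value2 in each run. For example, at 04:25 on April 13 there are three rows on consecutive seconds (04:25:18, 04:25:19, 04:25:20). From those I want to keep only the row with the largest Value2, which is 04:25:18.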
time Value1 Value2
2018-04-13 01:19:04 0.09760860 68.41634
2018-04-13 01:20:10 0.24585245 32.94790
2018-04-13 01:21:16 0.24487727 28.99412
2018-04-13 01:54:06 0.22994130 37.63333
2018-04-13 03:27:17 0.11139787 83.40588
2018-04-13 03:36:20 0.04642794 102.15588
2018-04-13 03:37:39 0.04144001 109.93137
2018-04-13 03:38:17 0.03933649 106.77124
2018-04-13 04:04:15 0.27627418 42.60554
2018-04-13 04:13:24 0.11536228 65.87941
2018-04-13 04:13:25 0.14011963 66.10706
2018-04-13 04:13:46 0.09159499 70.96471
2018-04-13 04:24:27 0.03760945 120.97294
2018-04-13 04:24:39 0.02905284 116.59853
2018-04-13 04:24:41 0.02751022 116.32059
2018-04-13 04:24:42 0.03271061 116.60840
2018-04-13 04:24:43 0.02836884 116.32471
2018-04-13 04:25:09 0.02983106 117.32745
2018-04-13 04:25:18 0.03332321 118.45747
2018-04-13 04:25:19 0.03218042 117.61882
2018-04-13 04:25:20 0.02625636 118.06667
The expected output is:
time Value1 Value2
2018-04-13 01:19:04 0.09760860 68.41634
2018-04-13 01:20:10 0.24585245 32.94790
2018-04-13 01:21:16 0.24487727 28.99412
2018-04-13 01:54:06 0.22994130 37.63333
2018-04-13 03:27:17 0.11139787 83.40588
2018-04-13 03:36:20 0.04642794 102.15588
2018-04-13 03:37:39 0.04144001 109.93137
2018-04-13 03:38:17 0.03933649 106.77124
2018-04-13 04:04:15 0.27627418 42.60554
2018-04-13 04:13:25 0.14011963 66.10706
2018-04-13 04:13:46 0.09159499 70.96471
2018-04-13 04:24:27 0.03760945 120.97294
2018-04-13 04:24:39 0.02905284 116.59853
2018-04-13 04:24:42 0.03271061 116.60840
2018-04-13 04:25:09 0.02983106 117.32745
2018-04-13 04:25:18 0.03332321 118.45747
I have been trying to use an RLE object to find the consecutive seconds and then take the maximum within each run, but with little success.
Answer 0 (score: 1)
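Here is one approach with dplyr: start a new group whenever the gap between a row and the previous row is not exactly 1 second, then select the row with the max Value2 from each group. Rows that are not part of a consecutive run end up in a group of their own, so they are kept unchanged.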
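library(dplyr)

df %>%
  # time is stored as a factor in the data below; convert it to date-time
  mutate(time = as.POSIXct(time)) %>%
  # start a new group whenever the gap to the previous row is not 1 second
  group_by(group = cumsum(time - lag(time, default = first(time)) != 1)) %>%
  # keep the row with the largest Value2 in each group
  slice(which.max(Value2)) %>%
  ungroup() %>%
  select(-group)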
# A tibble: 16 x 3
# time Value1 Value2
# <dttm> <dbl> <dbl>
# 1 2018-04-13 01:19:04 0.0976 68.4
# 2 2018-04-13 01:20:10 0.246 32.9
# 3 2018-04-13 01:21:16 0.245 29.0
# 4 2018-04-13 01:54:06 0.230 37.6
# 5 2018-04-13 03:27:17 0.111 83.4
# 6 2018-04-13 03:36:20 0.0464 102.
# 7 2018-04-13 03:37:39 0.0414 110.
# 8 2018-04-13 03:38:17 0.0393 107.
# 9 2018-04-13 04:04:15 0.276 42.6
#10 2018-04-13 04:13:25 0.140 66.1
#11 2018-04-13 04:13:46 0.0916 71.0
#12 2018-04-13 04:24:27 0.0376 121.
#13 2018-04-13 04:24:39 0.0291 117.
#14 2018-04-13 04:24:42 0.0327 117.
#15 2018-04-13 04:25:09 0.0298 117.
#16 2018-04-13 04:25:18 0.0333 118.
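To see how the grouping key behaves, here is a minimal base R sketch using eight timestamps taken from the question. The names `times`, `gap`, and `group` are illustrative and not part of the answer above, and the `tz = "UTC"` setting is an assumption made only to keep the example independent of the local time zone.

```r
# Eight timestamps from the question, covering two runs of consecutive seconds
times <- as.POSIXct(c("2018-04-13 04:24:39", "2018-04-13 04:24:41",
                      "2018-04-13 04:24:42", "2018-04-13 04:24:43",
                      "2018-04-13 04:25:09", "2018-04-13 04:25:18",
                      "2018-04-13 04:25:19", "2018-04-13 04:25:20"),
                    tz = "UTC")

# Gap in seconds to the previous row; the first row has no predecessor, so use 0
gap <- c(0, as.numeric(diff(times), units = "secs"))

# A gap other than 1 second starts a new group; cumsum turns the flags into ids
group <- cumsum(gap != 1)

print(data.frame(time = format(times, "%H:%M:%S"), gap = gap, group = group))
```

Rows that are exactly one second after their predecessor inherit the predecessor's group id, so 04:24:41 to 04:24:43 share one id and 04:25:18 to 04:25:20 share another, while every other row gets an id of its own.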
Data
df <- structure(list(time = structure(1:21, .Label = c("2018-04-13 01:19:04",
"2018-04-13 01:20:10", "2018-04-13 01:21:16", "2018-04-13 01:54:06",
"2018-04-13 03:27:17", "2018-04-13 03:36:20", "2018-04-13 03:37:39",
"2018-04-13 03:38:17", "2018-04-13 04:04:15", "2018-04-13 04:13:24",
"2018-04-13 04:13:25", "2018-04-13 04:13:46", "2018-04-13 04:24:27",
"2018-04-13 04:24:39", "2018-04-13 04:24:41", "2018-04-13 04:24:42",
"2018-04-13 04:24:43", "2018-04-13 04:25:09", "2018-04-13 04:25:18",
"2018-04-13 04:25:19", "2018-04-13 04:25:20"), class = "factor"),
Value1 = c(0.0976086, 0.24585245, 0.24487727, 0.2299413,
0.11139787, 0.04642794, 0.04144001, 0.03933649, 0.27627418,
0.11536228, 0.14011963, 0.09159499, 0.03760945, 0.02905284,
0.02751022, 0.03271061, 0.02836884, 0.02983106, 0.03332321,
0.03218042, 0.02625636), Value2 = c(68.41634, 32.9479, 28.99412,
37.63333, 83.40588, 102.15588, 109.93137, 106.77124, 42.60554,
65.87941, 66.10706, 70.96471, 120.97294, 116.59853, 116.32059,
116.6084, 116.32471, 117.32745, 118.45747, 117.61882, 118.06667
)), class = "data.frame", row.names = c(NA, -21L))
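If dplyr is not available, the same selection can probably be sketched in base R as well. This is only a sketch under stated assumptions: `df_small` is a hypothetical six-row subset of the data above, `tz = "UTC"` is chosen for reproducibility, and the helper names `gap`, `group`, `idx`, and `result` are illustrative. Since the question mentions RLE, the sketch also uses `rle()` to report the length of each run.

```r
# Hypothetical six-row subset of the question's data (time and Value2 only)
df_small <- data.frame(
  time = c("2018-04-13 04:13:24", "2018-04-13 04:13:25", "2018-04-13 04:13:46",
           "2018-04-13 04:25:18", "2018-04-13 04:25:19", "2018-04-13 04:25:20"),
  Value2 = c(65.87941, 66.10706, 70.96471, 118.45747, 117.61882, 118.06667),
  stringsAsFactors = FALSE
)
df_small$time <- as.POSIXct(df_small$time, tz = "UTC")

# Group id: a new group starts whenever the gap to the previous row is not 1 second
gap <- c(0, as.numeric(diff(df_small$time), units = "secs"))
group <- cumsum(gap != 1)

# Length of each run of consecutive seconds
run_lengths <- rle(group)$lengths

# For each group, pick the row index with the largest Value2
# (which.max keeps the first row on ties, like the dplyr version)
idx <- unname(unlist(lapply(
  split(seq_len(nrow(df_small)), group),
  function(i) i[which.max(df_small$Value2[i])]
)))

result <- df_small[idx, ]
print(result)
```

On this subset the rows kept are 04:13:25, 04:13:46, and 04:25:18, which agrees with the expected output in the question for those timestamps.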