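I have a data frame in R called info that contains several dates in a Date column. They are formatted as "%Y-%m-%d" and sorted. I want to keep only the rows whose dates are less than 6 days apart from a neighbouring date and delete the "outliers". Does anyone know how to do this?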
The data frame looks like this:
> info
          Date   ens seps
3   1951-01-08 mem01    2
4   1951-01-12 mem01    4
37  1959-12-08 mem01    4
42  1959-12-30 mem01    3
43  1960-01-01 mem01    2
47  1961-01-03 mem01    2
49  1961-01-18 mem01    2
50  1961-01-20 mem01    2
62  1964-11-29 mem01    4
93  1971-02-12 mem01    2
99  1972-02-15 mem01    2
100 1972-02-18 mem01    3
102 1972-02-21 mem01    2
119 1981-10-16 mem01    3
121 1981-10-19 mem01    2
131 1984-12-24 mem01    2
134 1987-01-02 mem01    2
Answer 0 (score: 0)
If I understand the question correctly, you could try:
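library(dplyr)

df %>%
  arrange(Date) %>%
  # days since the previous row (NA for the first row)
  mutate(date_diff = as.numeric(Date - lag(Date))) %>%
  # keep a row if it is less than 6 days from its previous or next row
  filter(date_diff < 6 | lead(date_diff) < 6) %>%
  select(-date_diff)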
The output is:
         Date   ens seps
1  1951-01-08 mem01    2
2  1951-01-12 mem01    4
3  1959-12-30 mem01    3
4  1960-01-01 mem01    2
5  1961-01-18 mem01    2
6  1961-01-20 mem01    2
7  1972-02-15 mem01    2
8  1972-02-18 mem01    3
9  1972-02-21 mem01    2
10 1981-10-16 mem01    3
11 1981-10-19 mem01    2
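A note on the edge rows: the first row has no previous date, so its date_diff is NA, and the last row has no following date, so its lead(date_diff) is NA. The small sketch below (illustrative values taken from the sample data, with the hypothetical names first_row and last_row, not part of the answer's code) shows how R's logical operators resolve these cases. dplyr's filter() drops rows where the condition is NA, which is why the first row survives and the last one does not.

```r
# First row: no previous date (NA), but the next gap is 4 days.
first_row <- NA < 6 | 4 < 6      # NA | TRUE  evaluates to TRUE

# Last row: 739 days after the previous date, and no following date (NA).
last_row <- 739 < 6 | NA < 6     # FALSE | NA evaluates to NA

print(first_row)
print(last_row)
```

So a row is kept as soon as one of its two neighbours is close enough, and an NA on the other side does no harm.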
Sample data:
df <- structure(list(Date = structure(c(-6933, -6929, -3677, -3655,
-3653, -3285, -3270, -3268, -1859, 407, 775, 778, 781, 4306,
4309, 5471, 6210), class = "Date"), ens = c("mem01", "mem01",
"mem01", "mem01", "mem01", "mem01", "mem01", "mem01", "mem01",
"mem01", "mem01", "mem01", "mem01", "mem01", "mem01", "mem01",
"mem01"), seps = c(2L, 4L, 4L, 3L, 2L, 2L, 2L, 2L, 4L, 2L, 2L,
3L, 2L, 3L, 2L, 2L, 2L)), .Names = c("Date", "ens", "seps"), row.names = c("3",
"4", "37", "42", "43", "47", "49", "50", "62", "93", "99", "100",
"102", "119", "121", "131", "134"), class = "data.frame")
Answer 1 (score: 0)
One possibility using base R is the following.
# TRUE for the first row, then TRUE wherever the gap to the previous row is under 6 days
inx <- c(TRUE, diff(info$Date) < 6)
new_info <- info[inx, ]
new_info
#          Date   ens seps
#3   1951-01-08 mem01    2
#4   1951-01-12 mem01    4
#43  1960-01-01 mem01    2
#50  1961-01-20 mem01    2
#100 1972-02-18 mem01    3
#102 1972-02-21 mem01    2
#121 1981-10-19 mem01    2
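Note that the snippet above keeps only the later row of each close pair (plus the first row unconditionally), so rows 42, 49, 99 and 119 are missing compared with the dplyr answer. If the goal is to keep both sides of every close pair, a minimal base R sketch might look like this. The names info_demo, gaps, close_to_prev, close_to_next, keep and kept are illustrative assumptions, and the data frame is rebuilt from the dates in the question so the example runs on its own.

```r
# Dates copied from the question; info_demo is a hypothetical stand-in for info.
info_demo <- data.frame(
  Date = as.Date(c("1951-01-08", "1951-01-12", "1959-12-08", "1959-12-30",
                   "1960-01-01", "1961-01-03", "1961-01-18", "1961-01-20",
                   "1964-11-29", "1971-02-12", "1972-02-15", "1972-02-18",
                   "1972-02-21", "1981-10-16", "1981-10-19", "1984-12-24",
                   "1987-01-02"))
)

# Gap in days between each row and the one before it (length n - 1).
gaps <- as.numeric(diff(info_demo$Date))

# The first row has no previous row and the last row has no next row,
# so pad with FALSE on the matching side.
close_to_prev <- c(FALSE, gaps < 6)
close_to_next <- c(gaps < 6, FALSE)

# Keep a row if it is less than 6 days from either neighbour.
keep <- close_to_prev | close_to_next
kept <- info_demo[keep, , drop = FALSE]
print(kept)
```

On this data the sketch selects the same 11 dates as the dplyr answer. Padding with FALSE instead of NA avoids having to handle missing values when subsetting.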