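It seems easy, but after a long time of searching and trying I still haven't managed it:
I have a list of time series; here is a short reproducible example: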
a <- seq(as.Date("1970-01-01"), as.Date("1970-01-05"), "days")
b <- seq(as.Date("1985-10-01"), as.Date("1985-10-05"), "days")
c <- seq(as.Date("2014-03-01"), as.Date("2014-03-05"), "days")
d <- c(a, b, c)
df1 <- data.frame(d)
colnames(df1) <- c("date")
e <- seq(as.Date("1975-01-01"), as.Date("1975-01-05"), "days")
f <- seq(as.Date("1990-10-01"), as.Date("1990-10-05"), "days")
g <- c(e, f)
df2 <- data.frame(g)
colnames(df2) <- c("date")
ll <- list(df1, df2)
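Now I want to subset each data.frame in the list so that only the first and last date of every run of consecutive days remains: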
> llsubset
[[1]]
date
1 1970-01-01
2 1970-01-05
3 1985-10-01
4 1985-10-05
5 2014-03-01
6 2014-03-05
[[2]]
date
1 1975-01-01
2 1975-01-05
3 1990-10-01
4 1990-10-05
I have tried it with rollapply, but it doesn't work and isn't worth showing. Maybe you can help me? Thanks!
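Answer 0 (score: 3)
Determine which points differ from the previous one by more than 1 day, and from that construct a logical vector that is TRUE at both ends of each sequence and FALSE elsewhere. Subset by it. No packages are used.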
lapply(ll, subset, { dif <- diff(date) > 1; c(TRUE, dif) | c(dif, TRUE) } )
which gives:
[[1]]
date
1 1970-01-01
5 1970-01-05
6 1985-10-01
10 1985-10-05
11 2014-03-01
15 2014-03-05
[[2]]
date
1 1975-01-01
5 1975-01-05
6 1990-10-01
10 1990-10-05
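A short trace of how the logical index in this answer is built, using only the first series from the question so the snippet runs on its own. The names `dates`, `gap`, `starts`, `ends` and `keep` are illustrative and not part of the answer:

```r
# First example series from the question, rebuilt so the snippet is self-contained
dates <- c(seq(as.Date("1970-01-01"), as.Date("1970-01-05"), "days"),
           seq(as.Date("1985-10-01"), as.Date("1985-10-05"), "days"),
           seq(as.Date("2014-03-01"), as.Date("2014-03-05"), "days"))

# TRUE where the gap to the previous date exceeds one day (length n - 1)
gap <- as.numeric(diff(dates)) > 1

# A date starts a run if the gap before it is large (the first date always does);
# it ends a run if the gap after it is large (the last date always does)
starts <- c(TRUE, gap)
ends   <- c(gap, TRUE)
keep   <- starts | ends

which(keep)  # 1 5 6 10 11 15
dates[keep]
```

Padding with TRUE on opposite sides is what lines the two length n - 1 comparisons up with the n dates.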
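Answer 1 (score: 1)
Maybe something like this? Use cumsum and diff to create a grouping variable, then subset the dates (assuming you want the earliest and latest date of each consecutive period, and that date is already sorted in ascending order):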
library(dplyr)
lapply(ll, function(df) {
  df %>%
    group_by(grp = cumsum(c(TRUE, diff(date) != 1))) %>%  # new group whenever the step is not exactly 1 day
    slice(c(1, n())) %>%                                  # first and last row of each group
    ungroup() %>%
    select(date)
})
#[[1]]
# A tibble: 6 × 1
# date
# <date>
#1 1970-01-01
#2 1970-01-05
#3 1985-10-01
#4 1985-10-05
#5 2014-03-01
#6 2014-03-05
#[[2]]
# A tibble: 4 × 1
# date
# <date>
#1 1975-01-01
#2 1975-01-05
#3 1990-10-01
#4 1990-10-05
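The grouping key is the heart of this answer. A minimal base-R sketch (no dplyr, first series only) shows the run ids that `cumsum(c(TRUE, diff(date) != 1))` produces, then picks the first and last position of each run as a stand-in for `slice(c(1, n()))`; `new_run`, `run_id` and `idx` are illustrative names:

```r
# First example series from the question, rebuilt so the snippet runs on its own
dates <- c(seq(as.Date("1970-01-01"), as.Date("1970-01-05"), "days"),
           seq(as.Date("1985-10-01"), as.Date("1985-10-05"), "days"),
           seq(as.Date("2014-03-01"), as.Date("2014-03-05"), "days"))

# TRUE wherever a date does not follow its predecessor by exactly one day,
# i.e. wherever a new run starts (the first date always starts a run)
new_run <- c(TRUE, as.numeric(diff(dates)) != 1)

# Running count of run starts gives every date its run id
run_id <- cumsum(new_run)
run_id  # 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3

# First and last position of each run; unique() keeps a
# single-day run from being listed twice
idx <- unlist(lapply(split(seq_along(dates), run_id),
                     function(i) unique(i[c(1, length(i))])),
              use.names = FALSE)
dates[idx]
```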
Answer 2 (score: 0)
There may be a package that does exactly this, but I don't know of one.
Using diff() on the dates shows which dates are only one day apart:
diff(df1$date)
Time differences in days
[1] 1 1 1 1 5748 1 1 1 1 10374 1
[12] 1 1 1
We can use that.
end_finder <- function(x) {
  # find the gap between dates.
  # mark dates where the diff > 1,
  # also mark the entry prior to that one;
  # this will be the end of the previous run.
  # also include the first and last element.
  diff_dates <- c(100, diff(x$date))  # 100 is a sentinel so the first date counts as a run start
  diff_idx <- which(diff_dates > 1)
  diff_idx <- c(diff_idx - 1, diff_idx)
  # remove any elements < 1
  diff_idx <- diff_idx[diff_idx >= 1]
  # include the first element
  diff_idx <- c(1, diff_idx)
  # include the last element
  diff_idx <- c(diff_idx, length(x$date))
  # remove duplicates and sort for easier reading
  diff_idx <- sort(unique(diff_idx))
  x$date[diff_idx]
}
Now run it.
> lapply(ll, end_finder)
[[1]]
[1] "1970-01-01" "1970-01-05" "1985-10-01" "1985-10-05" "2014-03-01"
[6] "2014-03-05"
[[2]]
[1] "1975-01-01" "1975-01-05" "1990-10-01" "1990-10-05"
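To make the index arithmetic in end_finder concrete, here is a short trace of the same steps on the first series only, with the intermediate values as comments. `dates`, `run_starts`, `run_ends` and `idx` are illustrative names, not the answer's:

```r
# First example series from the question, rebuilt so the snippet runs on its own
dates <- c(seq(as.Date("1970-01-01"), as.Date("1970-01-05"), "days"),
           seq(as.Date("1985-10-01"), as.Date("1985-10-05"), "days"),
           seq(as.Date("2014-03-01"), as.Date("2014-03-05"), "days"))

# Gap to the previous date; the leading 100 is a sentinel so date 1 counts as a start
diff_dates <- c(100, as.numeric(diff(dates)))

run_starts <- which(diff_dates > 1)    # 1 6 11
run_ends   <- run_starts - 1           # 0 5 10: the date before each start closes the previous run
run_ends   <- run_ends[run_ends >= 1]  # drop the 0 produced by the first start

# Add the first and last positions, then de-duplicate and sort
idx <- sort(unique(c(1, run_starts, run_ends, length(dates))))
idx  # 1 5 6 10 11 15
dates[idx]
```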
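Answer 3 (score: 0)
Another solution using dplyr: first compute the year of each date, then find the minimum and maximum date for each year, using the year function from lubridate and the melt function from reshape2: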
library(dplyr)
library(lubridate)
library(reshape2)
ll <- list(df1, df2)
fn_endPoint_Years <- function(DF) {
  DF %>%
    mutate(Year = year(date)) %>%                            # calendar year of each date
    group_by(Year) %>%
    summarise(minDate = min(date), maxDate = max(date)) %>%  # one row per year
    melt(id.vars = "Year", value.name = "date") %>%          # minDate/maxDate become rows
    arrange(date) %>%
    select(date)
}
lapply(ll, fn_endPoint_Years)
# [[1]]
# date
# 1 1970-01-01
# 2 1970-01-05
# 3 1985-10-01
# 4 1985-10-05
# 5 2014-03-01
# 6 2014-03-05
# [[2]]
# date
# 1 1975-01-01
# 2 1975-01-05
# 3 1990-10-01
# 4 1990-10-05
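Because this answer groups by calendar year, it implicitly assumes that each year contains at most one run of consecutive dates. A minimal base-R sketch on hypothetical data (two separate runs in 1990) shows the difference: the year-based grouping, here a base-R stand-in for the group_by(Year) + min/max step, merges both runs, while the diff()-based logic of the other answers keeps them apart. The dates and names are illustrative only:

```r
# Hypothetical series: two separate runs inside the same year
dates <- c(seq(as.Date("1990-01-01"), as.Date("1990-01-03"), "days"),
           seq(as.Date("1990-06-01"), as.Date("1990-06-03"), "days"))

# Year-based end points: min and max date per calendar year
by_year <- do.call(c, lapply(unname(split(dates, format(dates, "%Y"))), range))

# Run-based end points: TRUE at both ends of each run of consecutive days
gap    <- as.numeric(diff(dates)) > 1
by_run <- dates[c(TRUE, gap) | c(gap, TRUE)]

by_year  # 1990-01-01 1990-06-03
by_run   # 1990-01-01 1990-01-03 1990-06-01 1990-06-03
```

For the data in the question every run falls in its own year, so both approaches agree there.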