Here is my time-interval data:
dates <- seq(from=as.Date("1980-01-01"), to=as.Date("1980-01-31"), by = 'day')
my_data <-
  data.frame(from_date = c(dates[1], dates[15], dates[20], dates[30]),
             to_date = c(dates[14], dates[19], dates[22], dates[31]),
             id = c(1, 1, 2, 3))
A few of the observations are continuous: they end the day before the next one starts.
my_data$is_continued <- c(TRUE, TRUE, FALSE, FALSE)
my_data
   from_date    to_date id is_continued
1 1980-01-01 1980-01-14  1         TRUE
2 1980-01-15 1980-01-19  1         TRUE
3 1980-01-20 1980-01-22  2        FALSE
4 1980-01-30 1980-01-31  3        FALSE
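The `is_continued` flag above is typed in by hand. As a minimal sketch, it could instead be derived from the dates, assuming the rows are sorted by `id` and `from_date` and that continuity only counts within the same `id`. The names `follows_prev`, `followed_by_next` and `derived_flag` are illustrative and not part of the original data.

```r
dates <- seq(from = as.Date("1980-01-01"), to = as.Date("1980-01-31"), by = "day")
my_data <- data.frame(
  from_date = c(dates[1], dates[15], dates[20], dates[30]),
  to_date   = c(dates[14], dates[19], dates[22], dates[31]),
  id        = c(1, 1, 2, 3)
)

n <- nrow(my_data)

# TRUE when a row has the same id as the previous row and starts
# exactly one day after that row ends (assumption: rows are sorted).
follows_prev <- c(FALSE,
                  my_data$id[-1] == my_data$id[-n] &
                    as.numeric(my_data$from_date[-1] - my_data$to_date[-n]) == 1)

# TRUE when the next row picks up where this row ends.
followed_by_next <- c(follows_prev[-1], FALSE)

# A row belongs to a continuous run if it is linked to either neighbour.
derived_flag <- follows_prev | followed_by_next
```

For the four rows above, `derived_flag` agrees with the hand-written `is_continued` column.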
Now I want to simplify my table. Wherever an `id` has two (or more) continuous observations, I want a single observation instead.
That is, I want this result:
desired_result <-
  data.frame(from_date = c(dates[1], dates[20], dates[30]),
             to_date = c(dates[19], dates[22], dates[31]),
             id = c(1, 2, 3))
What is the best way to do this?
Answer 0 (score: 0)
With a loop:
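library(dplyr)

groupContinous <- function(df) {
  # Sort by id, then start date, so each row follows its predecessor within the same id
  df <- df[order(df$id, df$from_date), ]
  n <- nrow(df)
  # TRUE when a row has the same id as the previous row and starts the day after it ends
  df$continuing <- c(FALSE,
                     df$id[-1] == df$id[-n] &
                       as.numeric(df$from_date[-1] - df$to_date[-n]) == 1)
  # Open a new group whenever a row does not continue the previous one
  membership <- 1
  for (i in seq_len(n)[-1]) {
    if (df$continuing[i]) {
      membership <- c(membership, max(membership))
    } else {
      membership <- c(membership, max(membership) + 1)
    }
  }
  df$membership <- membership
  # Collapse each group to its earliest start and latest end
  df_simp <-
    df %>%
    dplyr::group_by(id, membership) %>%
    dplyr::summarize(from_date = min(from_date),
                     to_date = max(to_date),
                     .groups = "drop")
  df_simp$membership <- NULL
  return(as.data.frame(df_simp))
}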
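The same grouping can also be done without a loop or dplyr. The following is a minimal base R sketch, not the answer's own method. It assumes at least one row, that intervals within an `id` do not overlap, and that continuity only counts within the same `id`. The function name `merge_runs` and the helper names `starts_new`, `first` and `last` are made up for illustration.

```r
merge_runs <- function(df) {
  # Sort so that each row can be compared with its predecessor in the same id.
  df <- df[order(df$id, df$from_date), ]
  n <- nrow(df)

  # A row opens a new run unless it has the same id as the previous row
  # and starts exactly one day after that row ends.
  starts_new <- c(TRUE,
                  df$id[-1] != df$id[-n] |
                    as.numeric(df$from_date[-1] - df$to_date[-n]) != 1)

  first <- which(starts_new)      # index of the first row of each run
  last  <- c(first[-1] - 1, n)    # index of the last row of each run

  data.frame(from_date = df$from_date[first],
             to_date   = df$to_date[last],
             id        = df$id[first])
}

# Usage with the data from the question
dates <- seq(from = as.Date("1980-01-01"), to = as.Date("1980-01-31"), by = "day")
my_data <- data.frame(
  from_date = c(dates[1], dates[15], dates[20], dates[30]),
  to_date   = c(dates[14], dates[19], dates[22], dates[31]),
  id        = c(1, 1, 2, 3)
)
result <- merge_runs(my_data)
```

Because each run is a chain of non-overlapping intervals, its first row carries the earliest `from_date` and its last row the latest `to_date`, so indexing is enough and no per-group `min`/`max` is needed. If intervals can overlap, that shortcut no longer holds.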