Suppose I have the following dataset:
library(dplyr)
name <- c("b", "a", "a", "b","b","a", "b", "c", "c", "c", "c", "a")
class <- c(0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1)
date <- c("10-06-2018", "11-06-2018", "12-06-2018", "13-06-2018", "14-06-2018", "15-06-2018", "16-06-2018","17-06-2018", "18-06-2018", "19-06-2018", "20-06-2018", "21-06-2018")
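# the date strings use dashes, so the format must be "%d-%m-%Y" ("%d/%m/%Y" would give NA)
dates <- as.Date(date, "%d-%m-%Y")
# store the parsed Date column in the data frame rather than the raw strings
df <- data.frame(name, class, dates)
# sort by name, then by date within each name
df <- df %>% arrange(name, dates)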
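I want to filter the dataset so that, for each name group, I keep the row with the earliest date where class is 0 and the row with the earliest date where class is 1 that comes after that class 0 date. For the data above, the result would be: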
df.new <- df[c(2,3,5,6,9,11), ]
Answer 0 (score: 0)
There may be a more concise solution, but here is one way to do it:
# assumes df has a Date column named dates (the parsed dates, not the character vector date)
# split into two dataframes
# find the min dates for class == 0
df0 <- df %>%
filter(class == 0) %>%
group_by(name) %>%
summarise(dates0 = min(dates))
# find min date of class == 1 that is coming after class == 0
# and join the two dataframes
df1 <- df %>%
filter(class == 1) %>%
select(-class) %>%
left_join(df0, by = 'name')
# keep only the relevant dates
df1 <- df1 %>%
filter(dates > dates0) %>%
group_by(name) %>%
summarise(dates = min(dates)) %>%
mutate(class = 1)
# combine the two dataframes into one with the correct dates
# (stored as df.new, the name used in the question, so df is not overwritten)
df.new <- df0 %>%
mutate(class = 0) %>%
rename(dates = dates0) %>%
bind_rows(df1) %>%
arrange(name, dates)
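
For comparison, the same selection can probably be written as a single grouped pipeline. This is only a sketch, not the answer's method. It rebuilds the question's toy data so that it runs on its own, and the names `input`, `result` and `first0` are hypothetical, introduced here for illustration. It assumes dplyr 1.0.0 or later for `slice_min()`.

```r
library(dplyr)

# Toy data from the question, rebuilt so the sketch is self-contained
name  <- c("b", "a", "a", "b", "b", "a", "b", "c", "c", "c", "c", "a")
class <- c(0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1)
dates <- as.Date(sprintf("%d-06-2018", 10:21), format = "%d-%m-%Y")
input <- data.frame(name, class, dates)   # `input` is a hypothetical name

result <- input %>%
  group_by(name) %>%
  # earliest class 0 date per name (NA when a name has no class 0 row)
  mutate(first0 = if (any(class == 0)) min(dates[class == 0]) else as.Date(NA)) %>%
  # keep the first class 0 row and every class 1 row that comes after it
  filter((class == 0 & dates == first0) | (class == 1 & dates > first0)) %>%
  # within each name and class, keep only the earliest remaining row
  group_by(name, class) %>%
  slice_min(dates, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(name, class, dates) %>%
  arrange(name, dates)

print(result)
```

Names without any class 0 row drop out entirely, because `first0` is `NA` for them and `filter()` discards rows whose condition is `NA`. The two-data-frame approach above treats such names the same way.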