I have a long-format table with three variables: an id, a date, and a factor variable.
dates <- (seq.Date(from = as.Date(c("2015-02-01")),
to = as.Date(c("2016-01-01")),
by = "month") - 1)
data <- data.frame("date" = rep(dates, 2),
"id" = rep(c(1, 2), each = 12),
"grade" = c(rep("Z", 4), rep("T", 3), rep("R", 5),
rep("T", 2), rep("R", 3), rep("T", 7)))
I would like to get a table like this:
id  start.date  fin.date    grade
1   2015-01-31  2015-04-30  Z
1   2015-05-31  2015-07-31  T
1   2015-08-31  2015-12-31  R
2   2015-01-31  2015-02-28  T
2   2015-03-31  2015-05-31  R
2   2015-06-30  2015-12-31  T
I tried the following code using the dplyr package as well as base R functions, but neither attempt produced the result I want.
1st attempt
data %>% group_by(id, grade) %>%
summarize(Min_val = min(date), Max_val = max(date))
2nd attempt
first <- with(data, by(data, list(id, grade), head, n=1))
last <- with(data, by(data, list(id, grade), tail, n=1))
highestd <- do.call("rbind", as.list(first))
lowestd <- do.call("rbind", as.list(last))
data.f <- cbind(highestd[, c("id", "date")], lowestd[, c("date", "grade")])
colnames(data.f) <- c("id", "start.date", "fin.date", "grade")
data.f <- data.f[order(data.f$id, data.f$start.date),]
data.f
Answer 0 (score: 1)
One dplyr possibility could be:
library(dplyr)
data %>%
group_by(id, grade, rleid = with(rle(grade), rep(seq_along(lengths), lengths))) %>%
summarise(start_date = min(date),
fin_date = max(date)) %>%
arrange(rleid) %>%
ungroup() %>%
select(-rleid)
     id grade start_date fin_date
  <dbl> <chr> <date>     <date>
1     1 Z     2015-01-31 2015-04-30
2     1 T     2015-05-31 2015-07-31
3     1 R     2015-08-31 2015-12-31
4     2 T     2015-01-31 2015-02-28
5     2 R     2015-03-31 2015-05-31
6     2 T     2015-06-30 2015-12-31
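It simply creates a run-length-type group ID on the "grade" column, so that each unbroken run of the same grade gets its own group.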
The same using rleid() from data.table:
data %>%
group_by(id, grade, rleid = data.table::rleid(grade)) %>%
summarise(start_date = min(date),
fin_date = max(date)) %>%
arrange(rleid) %>%
ungroup() %>%
select(-rleid)
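To see what the run-length group ID looks like on its own, here is a minimal base R sketch. The toy vector `toy_grade` and the name `run_id` are made up for illustration and are not part of the question's data; the `rle()`/`rep()` construction is the same one used in the first pipeline above.

```r
# Hypothetical toy vector (not the question's data): "Z" appears in two separate runs
toy_grade <- c("Z", "Z", "T", "T", "T", "Z")

# rle() finds runs of identical consecutive values and their lengths
runs <- rle(toy_grade)

# Number the runs 1, 2, 3, ... and repeat each number once per element of its run
run_id <- rep(seq_along(runs$lengths), runs$lengths)

run_id
# [1] 1 1 2 2 2 3
```

Because the second run of "Z" gets its own ID (3 rather than 1), grouping by this ID keeps separate spells of the same grade apart, which grouping by id and grade alone does not.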