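I have date ranges grouped by two variables (id and type), currently stored in a data frame called data. My goal is to expand each date range so that I get one row for every day in the range, carrying the same id and type.
Here is a snippet that reproduces an example of the data frame: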
data <- structure(list(id = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2), type = c("a",
"a", "b", "c", "b", "a", "c", "d", "e", "f"), from = structure(c(1235199600,
1235545200, 1235545200, 1235631600, 1235631600, 1242712800, 1242712800,
1243058400, 1243058400, 1243231200), class = c("POSIXct", "POSIXt"
), tzone = ""), to = structure(c(1235372400, 1235545200, 1235631600,
1235890800, 1236236400, 1242712800, 1243058400, 1243231200, 1243144800,
1243576800), class = c("POSIXct", "POSIXt"), tzone = "")), .Names = c("id",
"type", "from", "to"), row.names = c(700L, 753L, 2941L, 2178L,
2959L, 679L, 2185L, 12L, 802L, 1796L), class = "data.frame")
Here is a visual representation of the data set:
id type from       to
1  a    2009-02-21 2009-02-23
1  a    2009-02-25 2009-02-25
1  b    2009-02-25 2009-02-26
1  c    2009-02-26 2009-03-01
1  b    2009-02-26 2009-03-05
2  a    2009-05-19 2009-05-19
2  c    2009-05-19 2009-05-23
2  d    2009-05-23 2009-05-25
2  e    2009-05-23 2009-05-24
2  f    2009-05-25 2009-05-29
Here is a visual representation of the expected result:
id type date
1  a    2009-02-21
1  a    2009-02-22
1  a    2009-02-23
1  a    2009-02-25
1  b    2009-02-25
1  b    2009-02-26
1  c    2009-02-26
1  c    2009-02-27
1  c    2009-02-28
1  c    2009-03-01
...
2  f    2009-05-25
2  f    2009-05-26
2  f    2009-05-27
2  f    2009-05-28
2  f    2009-05-29
I found several similar posts (link and link) that gave me a starting point. I tried a plyr solution:
data2 <- adply(data, 1, summarise, date = seq(data$from, data$to))[c('id', 'type')]
However, this results in an error:
Error: 'from' must be of length 1
I also tried a data.table solution:
data[, list(date = seq(from, to)), by = c('id', 'type')]
However, this gave me a different error:
Error in `[.data.frame`(data, , list(date = seq(from, to)), by = c("id", :
unused argument (by = c("id", "type"))
Any ideas on how to resolve these errors (or on a different approach) would be greatly appreciated.
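Answer 0: (score: 7)
1) Here is a three-line answer using by from base R. First we convert the dates to "Date" class, giving data2. Then we apply f, which does the actual work for each row, and finally we rbind the resulting rows together: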
data2 <- transform(data, from = as.Date(from), to = as.Date(to))
f <- function(x) with(x, data.frame(id, type, date = seq(from, to, by = "day")))
do.call("rbind", by(data2, 1:nrow(data2), f))
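To make the three steps in (1) easier to check by hand, here is a minimal sketch on a small made-up data frame. The names toy, expand_row and expanded, and the two sample ranges, are assumptions for illustration only and are not part of the question's data:

```r
# Toy data (hypothetical, not the question's data set): two ranges already stored as Date
toy <- data.frame(
  id   = c(1, 2),
  type = c("a", "b"),
  from = as.Date(c("2009-02-21", "2009-05-19")),
  to   = as.Date(c("2009-02-23", "2009-05-19")),
  stringsAsFactors = FALSE
)

# Expand one row into one row per day; id and type are recycled to the sequence length
expand_row <- function(x) {
  with(x, data.frame(id, type, date = seq(from, to, by = "day")))
}

# Split by row index, expand each piece, then stack the pieces
expanded <- do.call("rbind", by(toy, seq_len(nrow(toy)), expand_row))
rownames(expanded) <- NULL

# Sanity check: the output should have one row per day covered by the ranges
stopifnot(nrow(expanded) == sum(as.integer(toy$to - toy$from) + 1L))
print(expanded)
```

A range whose from and to fall on the same day still produces one row, because seq() then returns a single date.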
2) data.table: using the same data2 with data.table, we do this:
library(data.table)
dt <- data.table(data2)
dt[, list(id, type, date = seq(from, to, by = "day")), by = 1:nrow(dt)]
2a) data.table: or alternatively, where dt is from (2) and f is from (1):
dt[, f(.SD), by = 1:nrow(dt)]
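For what it is worth, the "unused argument (by = ...)" error in the question is consistent with data being a plain data.frame, whose [ method has no by argument. Converting to a data.table first, as in (2), is what makes by available. Below is a minimal sketch on made-up data, assuming the data.table package is installed. The names toy and expanded and the row helper column are my own additions for illustration:

```r
library(data.table)

# Toy data (hypothetical): two ranges that share the same id and type
toy <- data.frame(
  id   = c(1, 1),
  type = c("a", "a"),
  from = as.Date(c("2009-02-21", "2009-02-25")),
  to   = as.Date(c("2009-02-23", "2009-02-25")),
  stringsAsFactors = FALSE
)

# `by` is only understood once the object is a data.table
dt <- as.data.table(toy)

# Group by a row index rather than by id and type: both rows share id = 1 and
# type = "a", and seq() needs a single from and a single to per call
expanded <- dt[, list(id, type, date = seq(from, to, by = "day")),
               by = list(row = seq_len(nrow(dt)))]
print(expanded)
```

Grouping by row index also sidesteps the case where one id/type pair has several ranges, which appears to be why grouping by c('id', 'type') alone is not enough here.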
3) dplyr: with dplyr this gives a warning but otherwise works, where data2 and f are from (1):
library(dplyr)
data2 %>% rowwise() %>% do(f(.))
Update: some improvements.
Answer 1: (score: 0)
Here is one way to do this kind of transformation using base functions:
do.call(rbind,Map(function(id,type,from,to) {
dts <- seq(from=from, to=to, by="1 day")
dur <- length(dts)
data.frame(
id=rep(id, dur),
type=rep(type,dur),
date=dts
)
}, data$id, data$type, data$from, data$to))
The first chunk of the output is:
id type date
1 1 a 2009-02-21 02:00:00
2 1 a 2009-02-22 02:00:00
3 1 a 2009-02-23 02:00:00
4 1 a 2009-02-25 02:00:00
5 1 b 2009-02-25 02:00:00
6 1 b 2009-02-26 02:00:00
7 1 c 2009-02-26 02:00:00
8 1 c 2009-02-27 02:00:00
9 1 c 2009-02-28 02:00:00
10 1 c 2009-03-01 02:00:00
11 1 b 2009-02-26 02:00:00
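The output above carries a time of day (02:00:00) because from and to are POSIXct values. If plain dates are wanted, as in the question's expected result, one option is to convert to Date before expanding. The sketch below uses made-up data and builds the timestamps in UTC so that it is reproducible. The names toy and expanded and the choice of tz = "UTC" are assumptions; with real data, the time zone passed to as.Date decides which calendar day each timestamp falls on:

```r
# Toy POSIXct ranges (hypothetical values), created in UTC for reproducibility
toy <- data.frame(
  id   = c(1, 1),
  type = c("a", "b"),
  from = as.POSIXct(c("2009-02-21 02:00:00", "2009-02-25 02:00:00"), tz = "UTC"),
  to   = as.POSIXct(c("2009-02-23 02:00:00", "2009-02-26 02:00:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

expanded <- do.call(rbind, Map(function(id, type, from, to) {
  # seq() on Date objects steps in whole days and carries no time of day;
  # id and type are recycled to the length of the date sequence
  data.frame(id = id, type = type, date = seq(from, to, by = "day"))
}, toy$id, toy$type, as.Date(toy$from, tz = "UTC"), as.Date(toy$to, tz = "UTC")))

print(expanded)
```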