I am trying to create a dataset based on the difference between a start date and an end date. For example:
Name Start_Date End_Date
Alice 1-1-2017 3-1-2017
John 4-3-2017 5-3-2017
Peter 12-3-2017 12-3-2017
So the final dataset should cover the start date, the end date, and every date in between. It should end up looking like this:
Name Date
Alice 1-1-2017
Alice 2-1-2017
Alice 3-1-2017
John 4-3-2017
John 5-3-2017
Peter 12-3-2017
Any help would be much appreciated. Thanks!
Answer 0 (score: 0)
We can use Map to get the sequences, and melt the resulting list to get a data.frame:
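# Convert both date columns from d-m-Y strings to Date
df1[-1] <- lapply(df1[-1], as.Date, format = "%d-%m-%Y")
# One daily sequence per row, named after the person
lst <- setNames(Map(function(x, y) seq(x, y, by = "1 day"),
                    df1$Start_Date, df1$End_Date), df1$Name)
library(reshape2)
# melt() returns the columns value and L1; [2:1] puts the name first
melt(lst)[2:1]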
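For readers who prefer not to load reshape2, the same Map idea can be sketched in base R alone. This is a minimal sketch rather than the answer's own code: the data frame below is rebuilt from the question's example, and the object name `out` is only illustrative.

```r
# Example data from the question, with dates as d-m-Y strings
df1 <- data.frame(
  Name = c("Alice", "John", "Peter"),
  Start_Date = c("1-1-2017", "4-3-2017", "12-3-2017"),
  End_Date = c("3-1-2017", "5-3-2017", "12-3-2017"),
  stringsAsFactors = FALSE
)
df1[-1] <- lapply(df1[-1], as.Date, format = "%d-%m-%Y")

# One daily sequence per row, kept as a list of Date vectors
lst <- Map(function(x, y) seq(x, y, by = "1 day"),
           df1$Start_Date, df1$End_Date)

# Repeat each name once per generated date, then flatten the list;
# do.call(c, ...) keeps the Date class, which unlist() would drop
out <- data.frame(
  Name = rep(df1$Name, lengths(lst)),
  Date = do.call(c, lst),
  stringsAsFactors = FALSE
)
out
```

Here `lengths(lst)` gives the number of dates generated per row, so each name is repeated exactly as often as its sequence is long.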
Data:
df1 <- structure(list(Name = c("Alice", "John", "Peter"), Start_Date = structure(c(17167,
    17229, 17237), class = "Date"), End_Date = structure(c(17169,
    17230, 17237), class = "Date")), .Names = c("Name", "Start_Date",
    "End_Date"), row.names = c(NA, -3L), class = "data.frame")
Answer 1 (score: 0)
This uses the expandRows function from the splitstackshape package:
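library(dplyr)
library(splitstackshape)

# Assumes Start_Date and End_Date are already of class Date
df = df %>%
  # + 1 so the end date is included and single-day rows are not dropped
  mutate(days_between = as.numeric(End_Date - Start_Date) + 1,
         id = row_number()) %>%
  expandRows("days_between") %>%
  group_by(id) %>%
  mutate(Date = seq(first(Start_Date),
                    first(End_Date),
                    by = 1)) %>%
  ungroup()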
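The row-expansion idea can also be checked without any packages. The following is a base R sketch under the assumption that the dates are already of class Date; the names `n` and `expanded` are illustrative and not part of the answer. Each row has to be repeated End_Date - Start_Date + 1 times for the end date itself to appear in the result.

```r
# Example data from the question, already converted to Date
df <- data.frame(
  Name = c("Alice", "John", "Peter"),
  Start_Date = as.Date(c("2017-01-01", "2017-03-04", "2017-03-12")),
  End_Date = as.Date(c("2017-01-03", "2017-03-05", "2017-03-12")),
  stringsAsFactors = FALSE
)

# Number of days covered by each row, counting both endpoints
n <- as.integer(df$End_Date - df$Start_Date) + 1L

# Repeat each row n times, then offset the start date by 0, 1, 2, ...
expanded <- df[rep(seq_len(nrow(df)), n), ]
expanded$Date <- expanded$Start_Date + sequence(n) - 1L
rownames(expanded) <- NULL

expanded[c("Name", "Date")]
```

`sequence(n)` restarts the counter for every row, which is what lets a single vectorised addition produce all the dates.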
Answer 2 (score: 0)
Using a foreach loop:
library(data.table)
library(foreach)
library(lubridate)
setDT(df)
names = df[, unique(Name)]
l = foreach(i = seq_along(names)) %do% {
  # make a date sequence per name
  # (dmy() expects the dates as d-m-Y character strings, one row per name)
  d = df[Name == names[i], ]
  s = seq(from = dmy(d$Start_Date), to = dmy(d$End_Date), by = "days")
  # bind the results in a data.table
  dx = data.table(name = rep(names[i], length(s)))
  dx = cbind(dx, date = s)
}
# stack the per-name tables into one
rbindlist(l)
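The same loop-and-bind pattern can be sketched in base R, in case data.table, foreach, and lubridate are not available. This is not the answer's code: `lapply` stands in for `foreach`, `as.Date` with an explicit format stands in for `dmy`, and `do.call(rbind, ...)` stands in for `rbindlist`. It assumes the date columns are d-m-Y character strings, as in the question.

```r
# Example data from the question, with dates as d-m-Y strings
df <- data.frame(
  Name = c("Alice", "John", "Peter"),
  Start_Date = c("1-1-2017", "4-3-2017", "12-3-2017"),
  End_Date = c("3-1-2017", "5-3-2017", "12-3-2017"),
  stringsAsFactors = FALSE
)

# Build one small data frame per input row
pieces <- lapply(seq_len(nrow(df)), function(i) {
  s <- seq(from = as.Date(df$Start_Date[i], format = "%d-%m-%Y"),
           to   = as.Date(df$End_Date[i], format = "%d-%m-%Y"),
           by   = "day")
  data.frame(name = df$Name[i], date = s, stringsAsFactors = FALSE)
})

# Stack the pieces; rbind keeps the Date class of the date column
result <- do.call(rbind, pieces)
result
```

Iterating over rows rather than unique names means a name that appears in several rows would simply contribute several date ranges.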