I have a data frame "DF" like this:
Flight.Start Flight.End Device Partner Creative Days.in.Flight
2015-08-31 2015-10-04 Standard MSN Video 35
What I need to do is "blow it out" (expand it) like this:
Flight.Start Flight.End Date Device Partner Creative Days.in.Flight
2015-08-31 2015-10-04 2015-08-31 Standard MSN Video 35
2015-08-31 2015-10-04 2015-09-01 Standard MSN Video 35
2015-08-31 2015-10-04 2015-09-02 Standard MSN Video 35
2015-08-31 2015-10-04 2015-09-03 Standard MSN Video 35
2015-08-31 2015-10-04 2015-09-04 Standard MSN Video 35
2015-08-31 2015-10-04 2015-09-05 Standard MSN Video 35
2015-08-31 2015-10-04 2015-09-06 Standard MSN Video 35
2015-08-31 2015-10-04 2015-09-07 Standard MSN Video 35
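etc. ... until Date reaches 2015-10-04, then move on to the next row and repeat.
Basically, each row would be replicated Days.in.Flight - 1 times (because the row that already exists takes up one of the days), and a new column "Date" would be filled in with the relevant dates for that flight. So if a row has start and end dates of 9/1 and 9/5, 4 duplicate rows would be appended to the existing row, a new column (Date) would be created, and its values would be filled in with the sequence of dates running from that row's Flight.Start to its Flight.End.
All date values are formatted as Date, Days.in.Flight is numeric, and the rest are factors.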
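To make the question reproducible, here is a minimal sketch of the sample row as an R data frame. It assumes Flight.End is 2015-10-04 (as in the desired output) and uses factor() for the Device, Partner and Creative columns; the names `span` and `dates` are illustrative only. It also checks that Days.in.Flight is the inclusive day count, which is why "Days.in.Flight - 1 extra rows" lines up with the dates from Flight.Start to Flight.End.

```r
# Hypothetical reproduction of the single sample row; Flight.End is assumed
# to be 2015-10-04, matching the desired output
DF <- data.frame(Flight.Start   = as.Date("2015-08-31"),
                 Flight.End     = as.Date("2015-10-04"),
                 Device         = factor("Standard"),
                 Partner        = factor("MSN"),
                 Creative       = factor("Video"),
                 Days.in.Flight = 35)

# Days.in.Flight should equal the inclusive number of days in the window
span <- as.numeric(DF$Flight.End - DF$Flight.Start) + 1
stopifnot(span == DF$Days.in.Flight)

# Target values for the new Date column: one date per day of the flight
dates <- seq(DF$Flight.Start[1], DF$Flight.End[1], by = "day")
length(dates)   # 35
```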
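EDIT
Re: the duplicate-question flag:
To clarify, this is not the same as the question it was marked a duplicate of. My question isn't really about how to replicate rows based on Days.in.Flight (I already know how to do that!), but rather how I can then add a column to that output data frame and fill it in sequentially with the dates within the corresponding flight period. Thanks for the heads-up, though...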
Answer 0 (score: 7)
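Here is one approach with splitstackshape and dplyr. Using expandRows() from the splitstackshape package, you can expand the data frame as you described. Then you want to use mutate() to add a sequence of dates. What I did was group the data by the combination of Flight.Start and Flight.End and use seq() to create a date sequence for each group. first() takes the first element of Flight.Start and Flight.End within each group. This way you can create the sequence you want. I hope this helps.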
Data and code
mydf <- data.frame(Flight.Start = as.Date(c("2015-09-01", "2015-09-10")),
                   Flight.End = as.Date(c("2015-09-03", "2015-09-15")),
                   Device = "Standard",
                   Creative = "Video",
                   Days.in.Flight = c(3, 6),
                   stringsAsFactors = FALSE)
# Flight.Start Flight.End Device Creative Days.in.Flight
#1 2015-09-01 2015-09-03 Standard Video 3
#2 2015-09-10 2015-09-15 Standard Video 6
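library(splitstackshape)
library(dplyr)
# Replicate each row Days.in.Flight times (drop = FALSE keeps the count column),
# then fill Date with the day-by-day sequence within each flight window
expandRows(mydf, "Days.in.Flight", drop = FALSE) %>%
  group_by(Flight.Start, Flight.End) %>%
  mutate(Date = seq(first(Flight.Start),
                    first(Flight.End),
                    by = 1))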
# Flight.Start Flight.End Device Creative Days.in.Flight Date
# (date) (date) (chr) (chr) (dbl) (date)
#1 2015-09-01 2015-09-03 Standard Video 3 2015-09-01
#2 2015-09-01 2015-09-03 Standard Video 3 2015-09-02
#3 2015-09-01 2015-09-03 Standard Video 3 2015-09-03
#4 2015-09-10 2015-09-15 Standard Video 6 2015-09-10
#5 2015-09-10 2015-09-15 Standard Video 6 2015-09-11
#6 2015-09-10 2015-09-15 Standard Video 6 2015-09-12
#7 2015-09-10 2015-09-15 Standard Video 6 2015-09-13
#8 2015-09-10 2015-09-15 Standard Video 6 2015-09-14
#9 2015-09-10 2015-09-15 Standard Video 6 2015-09-15
Answer 1 (score: 5)
Or, using data.table, we convert the 'data.frame' to a 'data.table' (setDT(mydf)), replicate the sequence of rows by 'Days.in.Flight', subset the dataset with that index (.SD[rep(...)]), and, grouped by 'Flight.Start' and 'Flight.End', create the 'Date' column.
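library(data.table)
# Replicate each row Days.in.Flight times, then, within each
# Flight.Start/Flight.End group, fill Date with the daily sequence
setDT(mydf)[, .SD[rep(1:.N, Days.in.Flight)]][,
  Date := seq(Flight.Start, Flight.End, by = '1 day'),
  by = .(Flight.Start, Flight.End)][]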
Answer 2 (score: 1)
Here is a base R approach:
mydf <- data.frame(Flight.Start = as.Date(c("2015-09-01", "2015-09-10")),
                   Flight.End = as.Date(c("2015-09-03", "2015-09-15")),
                   Device = "Standard",
                   Creative = "Video",
                   Days.in.Flight = c(3, 6),
                   stringsAsFactors = FALSE)
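# Repeat each row Days.in.Flight times (row names become 1, 1.1, 1.2, ...)
expanded <- mydf[rep(row.names(mydf), mydf$Days.in.Flight), ]
# sequence() gives 1..n for each flight; subtract 1 to get day offsets from Flight.Start
data.frame(expanded, Date = expanded$Flight.Start + (sequence(mydf$Days.in.Flight) - 1))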
> data.frame(expanded,Date=expanded$Flight.Start+(sequence(mydf$Days.in.Flight)-1))
Flight.Start Flight.End Device Creative Days.in.Flight Date
1 2015-09-01 2015-09-03 Standard Video 3 2015-09-01
1.1 2015-09-01 2015-09-03 Standard Video 3 2015-09-02
1.2 2015-09-01 2015-09-03 Standard Video 3 2015-09-03
2 2015-09-10 2015-09-15 Standard Video 6 2015-09-10
2.1 2015-09-10 2015-09-15 Standard Video 6 2015-09-11
2.2 2015-09-10 2015-09-15 Standard Video 6 2015-09-12
2.3 2015-09-10 2015-09-15 Standard Video 6 2015-09-13
2.4 2015-09-10 2015-09-15 Standard Video 6 2015-09-14
2.5 2015-09-10 2015-09-15 Standard Video 6 2015-09-15
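A possible variation on the base R approach, sketched under the assumption that you would rather derive each row's day count from its own dates than trust Days.in.Flight. The helper name `expand_flights` is hypothetical. Because it keys on the original row index instead of grouping by Flight.Start and Flight.End, it should also cope with two rows that share the same flight window, a case where grouping by the dates would put both rows into a single group.

```r
# Hypothetical helper: one output row per day of each flight
expand_flights <- function(df) {
  # Inclusive day count computed from the dates themselves
  n_days <- as.integer(df$Flight.End - df$Flight.Start) + 1L
  # Original row index for every output row
  idx <- rep(seq_len(nrow(df)), n_days)
  out <- df[idx, , drop = FALSE]
  # sequence(n_days) restarts at 1 for each flight, so this adds 0, 1, 2, ... days
  out$Date <- out$Flight.Start + (sequence(n_days) - 1L)
  rownames(out) <- NULL
  out
}

mydf <- data.frame(Flight.Start = as.Date(c("2015-09-01", "2015-09-10")),
                   Flight.End = as.Date(c("2015-09-03", "2015-09-15")),
                   Device = "Standard",
                   Creative = "Video",
                   Days.in.Flight = c(3, 6),
                   stringsAsFactors = FALSE)
res <- expand_flights(mydf)
res
```

With the two-row mydf above, this gives the same nine rows as the output shown, with the row names reset to 1 through 9.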