I have the following data frame:
mydf <- data.frame(Date.Start = as.Date(c("2015-09-01", "2015-09-10")),
                   Date.End = as.Date(c("2017-09-10", "2020-09-15")),
                   Number.of.Years = c(3, 6),
                   stringsAsFactors = FALSE)
# Date.Start Date.End Number.of.Years
#1 2015-09-01 2017-09-10 3
#2 2015-09-10 2020-09-15 6
I am trying to "explode" the data frame so that it has one row per year:
# Date.Start Date.End Number.of.Years Year
#1 2015-09-01 2017-09-10 3 2015
#1 2015-09-01 2017-09-10 3 2016
#1 2015-09-01 2017-09-10 3 2017
#2 2017-09-10 2020-09-15 6 2015
#2 2017-09-10 2020-09-15 6 2016
#2 2017-09-10 2020-09-15 6 2017
#2 2017-09-10 2020-09-15 6 2018
#2 2017-09-10 2020-09-15 6 2019
#2 2017-09-10 2020-09-15 6 2020
So I tried the following:
library(splitstackshape)
library(dplyr)
library(lubridate)
expandRows(mydf, "Number.of.Years", drop = FALSE) %>%
  group_by(Date.Start, Date.End) %>%
  mutate(Date = seq(year(first(Date.Start)),
                    year(first(Date.End)),
                    by = 1))
But I get the following error:
Error in mutate_impl(.data, dots) :
Column `Date` must be length 6 (the group size) or one, not 4
What is wrong with the code above?
If I change it to days instead (taken from another Stack Overflow post), it runs fine:
mydf <- data.frame(Date.Start = as.Date(c("2015-09-01", "2015-09-10")),
                   Date.End = as.Date(c("2015-09-03", "2015-09-15")),
                   Number.of.Days = c(3, 6),
                   stringsAsFactors = FALSE)
library(splitstackshape)
library(dplyr)
library(lubridate)
expandRows(mydf, "Number.of.Days", drop = FALSE) %>%
  group_by(Date.Start, Date.End) %>%
  mutate(Date = seq(first(Date.Start),
                    first(Date.End),
                    by = 1))
# A tibble: 9 x 4
# Groups: Date.Start, Date.End [2]
# Date.Start Date.End Number.of.Days Date
# <date> <date> <dbl> <date>
#1 2015-09-01 2015-09-03 3 2015-09-01
#2 2015-09-01 2015-09-03 3 2015-09-02
#3 2015-09-01 2015-09-03 3 2015-09-03
#4 2015-09-10 2015-09-15 6 2015-09-10
#5 2015-09-10 2015-09-15 6 2015-09-11
#6 2015-09-10 2015-09-15 6 2015-09-12
#7 2015-09-10 2015-09-15 6 2015-09-13
#8 2015-09-10 2015-09-15 6 2015-09-14
#9 2015-09-10 2015-09-15 6 2015-09-15
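Answer 0 (score: 0)
Your first attempt fails because the sequence of years built from Date.Start and Date.End does not have the same length as the group it is assigned to: you want to fill 6 rows, but the error message reports a sequence of only 4 values, and mutate() accepts only a result of length 1 or of the group size. If the years really have to differ like this, we can use the group size n() to create the sequence instead, i.e.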
library(tidyverse)
library(splitstackshape)
expandRows(mydf, "Number.of.Years", drop = FALSE) %>%
  group_by(grp = cumsum(!duplicated(paste0(Date.Start, Date.End)))) %>%
  mutate(Date = seq(first(Date.Start), (first(Date.Start) + n() - 1), by = 1))
which gives,
# A tibble: 9 x 5
# Groups: grp [2]
#   Date.Start Date.End   Number.of.Years   grp Date
#   <date>     <date>               <dbl> <int> <date>
# 1 2015-09-01 2015-09-03               3     1 2015-09-01
# 2 2015-09-01 2015-09-03               3     1 2015-09-02
# 3 2015-09-01 2015-09-03               3     1 2015-09-03
# 4 2017-09-10 2020-09-15               6     2 2017-09-10
# 5 2017-09-10 2020-09-15               6     2 2017-09-11
# 6 2017-09-10 2020-09-15               6     2 2017-09-12
# 7 2017-09-10 2020-09-15               6     2 2017-09-13
# 8 2017-09-10 2020-09-15               6     2 2017-09-14
# 9 2017-09-10 2020-09-15               6     2 2017-09-15
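The code in this answer produces consecutive dates. If calendar years are wanted instead, the same idea (count the rows within each group rather than building the sequence from Date.End) can be sketched in base R as follows. This is only an illustration under assumptions: the second start date is taken as 2017-09-10, as in the printed output and not as in the question's data.frame() call, and the names expanded and offset are made up for this example.

```r
# Assumed data: second Date.Start set to 2017-09-10 to match the printed output.
mydf <- data.frame(
  Date.Start = as.Date(c("2015-09-01", "2017-09-10")),
  Date.End = as.Date(c("2017-09-10", "2020-09-15")),
  Number.of.Years = c(3, 6)
)

# Repeat each row Number.of.Years times (the role expandRows plays above).
expanded <- mydf[rep(seq_len(nrow(mydf)), mydf$Number.of.Years), ]
rownames(expanded) <- NULL

# Within each original row, count 0, 1, 2, ... and add that to the start year.
offset <- sequence(mydf$Number.of.Years) - 1L
expanded$Year <- as.integer(format(expanded$Date.Start, "%Y")) + offset

print(expanded)
```

Because the year is derived from the row count, the length always matches the group size and the mutate() error cannot occur. The trade-off is that the years can run past Date.End whenever Number.of.Years is larger than the actual span, which is the case for the second row here.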
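Answer 1 (score: 0)
I solved this myself. The problem turned out to be a data quality issue that I was not aware of.
So if you use group_by, you have to make sure there are no duplicated rows that share the same features but have a different date.start and/or date.end.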
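One possible way to surface this kind of problem before grouping is sketched below in base R. It is not the check the author actually ran. It assumes the second start date is 2017-09-10 (as in the printed output), and the names year_of, span, mismatch and dup_keys are hypothetical helpers introduced for this example.

```r
# Assumed data: second Date.Start set to 2017-09-10 to match the printed output.
mydf <- data.frame(
  Date.Start = as.Date(c("2015-09-01", "2017-09-10")),
  Date.End = as.Date(c("2017-09-10", "2020-09-15")),
  Number.of.Years = c(3, 6)
)

# Extract the calendar year of a Date vector as an integer.
year_of <- function(d) as.integer(format(d, "%Y"))

# Number of calendar years each row actually covers, both ends included.
span <- year_of(mydf$Date.End) - year_of(mydf$Date.Start) + 1L

# Rows whose declared Number.of.Years disagrees with the covered span.
mismatch <- mydf[span != mydf$Number.of.Years, ]

# Rows that repeat an earlier Date.Start/Date.End pair and would be
# merged into the same group by group_by(Date.Start, Date.End).
dup_keys <- mydf[duplicated(mydf[c("Date.Start", "Date.End")]), ]

print(mismatch)
print(dup_keys)
```

Any row that shows up in mismatch would trigger the length error from the question, and any row in dup_keys would inflate a group beyond the size that a sequence built from its dates can fill.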