Error in mutate_impl(.data, dots): Column ... must be length ... (the group size) or one, not ...

Time: 2019-03-01 14:31:05

Tags: r group-by dplyr

I have the following data frame:

mydf <- data.frame(Date.Start = as.Date(c("2015-09-01", "2015-09-10")),
                   Date.End = as.Date(c("2017-09-10", "2020-09-15")),
                   Number.of.Years = c(3, 6),
                   stringsAsFactors = FALSE)

#  Date.Start   Date.End Number.of.Years
#1 2015-09-01 2017-09-10               3
#2 2015-09-10 2020-09-15               6

I am trying to expand the data frame so that there is one row per year:

#  Date.Start   Date.End Number.of.Years  Year
#1 2015-09-01 2017-09-10               3  2015
#1 2015-09-01 2017-09-10               3  2016
#1 2015-09-01 2017-09-10               3  2017
#2 2017-09-10 2020-09-15               6  2015
#2 2017-09-10 2020-09-15               6  2016
#2 2017-09-10 2020-09-15               6  2017
#2 2017-09-10 2020-09-15               6  2018
#2 2017-09-10 2020-09-15               6  2019
#2 2017-09-10 2020-09-15               6  2020

So I tried the following:

library(splitstackshape)
library(dplyr)
library(lubridate)

expandRows(mydf, "Number.of.Years", drop = FALSE) %>%
  group_by(Date.Start, Date.End) %>%
  mutate(Date = seq(year(first(Date.Start)),
                    year(first(Date.End)),
                    by = 1))

But I get the following error:

Error in mutate_impl(.data, dots) : 
  Column `Date` must be length 6 (the group size) or one, not 4

What is wrong with the code above?

If I change it to days (adapted from another Stack Overflow post), it runs fine:

mydf <- data.frame(Date.Start = as.Date(c("2015-09-01", "2015-09-10")),
                   Date.End = as.Date(c("2015-09-03", "2015-09-15")),
                   Number.of.Days = c(3, 6),
                   stringsAsFactors = FALSE)

library(splitstackshape)
library(dplyr)
library(lubridate)

expandRows(mydf, "Number.of.Days", drop = FALSE) %>%
  group_by(Date.Start, Date.End) %>%
  mutate(Date = seq(first(Date.Start),
                    first(Date.End),
                    by = 1))

# A tibble: 9 x 4
# Groups:   Date.Start, Date.End [2]
#  Date.Start Date.End   Number.of.Days Date      
#  <date>     <date>              <dbl> <date>    
#1 2015-09-01 2015-09-03              3 2015-09-01
#2 2015-09-01 2015-09-03              3 2015-09-02
#3 2015-09-01 2015-09-03              3 2015-09-03
#4 2015-09-10 2015-09-15              6 2015-09-10
#5 2015-09-10 2015-09-15              6 2015-09-11
#6 2015-09-10 2015-09-15              6 2015-09-12
#7 2015-09-10 2015-09-15              6 2015-09-13
#8 2015-09-10 2015-09-15              6 2015-09-14
#9 2015-09-10 2015-09-15              6 2015-09-15

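2 Answers:

Answer 0: (score: 0)

Your first attempt fails because mutate() needs exactly one value per row of the group (or a single value), and the sequence built from the start and end years does not have that length: according to the error message, the group has 6 rows to fill but the year sequence only has 4 elements. Instead, we can use the group size n() to create the sequence, i.e.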

library(tidyverse)
library(splitstackshape)

expandRows(mydf, "Number.of.Years", drop = FALSE) %>% 
   group_by(grp = cumsum(!duplicated(paste0(Date.Start, Date.End)))) %>% 
   mutate(Date = seq(first(Date.Start), (first(Date.Start)+n()-1), by = 1))

which gives,

# A tibble: 9 x 5
# Groups:   grp [2]
  Date.Start Date.End   Number.of.Years   grp Date      
  <date>     <date>               <dbl> <int> <date>    
1 2015-09-01 2015-09-03               3     1 2015-09-01
2 2015-09-01 2015-09-03               3     1 2015-09-02
3 2015-09-01 2015-09-03               3     1 2015-09-03
4 2017-09-10 2020-09-15               6     2 2017-09-10
5 2017-09-10 2020-09-15               6     2 2017-09-11
6 2017-09-10 2020-09-15               6     2 2017-09-12
7 2017-09-10 2020-09-15               6     2 2017-09-13
8 2017-09-10 2020-09-15               6     2 2017-09-14
9 2017-09-10 2020-09-15               6     2 2017-09-15
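
To see where the mismatch comes from, it can help to compare, row by row, the length of the year sequence with the number of copies that expandRows() creates. The following is a minimal base R sketch, not the asker's actual data: it assumes the second row starts on 2017-09-10 (as in the expected-output table in the question), and the names `check` and `expanded` are made up for illustration. It also shows a year-based variant of the n() idea above, written with rep() and sequence() instead of dplyr.

```r
# Assumed input: second Date.Start taken from the question's expected output.
mydf <- data.frame(Date.Start = as.Date(c("2015-09-01", "2017-09-10")),
                   Date.End = as.Date(c("2017-09-10", "2020-09-15")),
                   Number.of.Years = c(3, 6),
                   stringsAsFactors = FALSE)

# Calendar years of the start and end dates.
start_year <- as.integer(format(mydf$Date.Start, "%Y"))
end_year <- as.integer(format(mydf$Date.End, "%Y"))

# Length of seq(start_year, end_year) versus the requested number of copies.
check <- data.frame(seq_length = end_year - start_year + 1,
                    group_size = mydf$Number.of.Years)
check$matches <- check$seq_length == check$group_size
print(check)

# Year-based variant: repeat each row Number.of.Years times and count
# upwards from the start year, so the lengths agree by construction.
times <- mydf$Number.of.Years
expanded <- mydf[rep(seq_len(nrow(mydf)), times), ]
expanded$Year <- rep(start_year, times) + sequence(times) - 1
rownames(expanded) <- NULL
print(expanded)
```

Under these assumptions the second row has a 4-element year sequence but 6 copies, which is the 6-versus-4 mismatch reported in the error. Because the count rather than Date.End drives the variant, the years for that row run past the end date; if the end date should win, Number.of.Years would have to be corrected first.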

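Answer 1: (score: 0)

I solved this myself. The problem turned out to be a data quality issue that I was not aware of.

So if you use group_by, you have to make sure there are no duplicate rows that have the same features but a different date.start and/or date.end.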
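
One possible reading of this, which the answer itself does not spell out: when several input rows end up in the same group, the group size no longer equals the length of the sequence built for a single row. A minimal base R sketch of a check for repeated grouping keys follows; the example data and the names `keys`, `dup_keys` and `row_id` are hypothetical and only illustrate the idea.

```r
# Hypothetical input in which rows 1 and 3 share the same start and end dates.
mydf <- data.frame(Date.Start = as.Date(c("2015-09-01", "2015-09-10", "2015-09-01")),
                   Date.End = as.Date(c("2017-09-10", "2020-09-15", "2017-09-10")),
                   Number.of.Years = c(3, 6, 3),
                   stringsAsFactors = FALSE)

# The columns that would be passed to group_by().
keys <- mydf[c("Date.Start", "Date.End")]

# Flag every row whose key combination occurs more than once.
dup_keys <- duplicated(keys) | duplicated(keys, fromLast = TRUE)
print(mydf[dup_keys, ])

# Adding a per-row identifier before expanding gives each original row
# its own group, even when the dates repeat.
mydf$row_id <- seq_len(nrow(mydf))
print(mydf)
```

Grouping by `row_id` instead of the date columns would then keep each group aligned with exactly one original row.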