Question

My data frame looks like this:

country   day     value

AE        1        23
AE        2        30
AE        3        21
AE        4        3
BD        1        2
BD        2        23
...       ..       ..
BD        22       23

I want to add a date column to each group in my data frame, running from 2020-08-01 to 2020-08-21. Here is my attempt:

values = seq(from = as.Date("2020-08-01"), to = as.Date("2020-08-21"), by = 'day')
df<- df %>% group_by(country) %>% mutate(date=values)

But it does not give me the correct result.

This is the result I want:

country   day     value   date

AE        1        23      2020-08-01
AE        2        30      2020-08-02
AE        3        21      2020-08-03
AE        4        3       2020-08-04
BD        1        2       2020-08-01
BD        2        23      2020-08-02
...       ..       ..
BD        21       23      2020-08-21

Please let me know how to fix this. Here is the error:

Error: Problem with `mutate()` input `date`.
x Input `date` can't be recycled to size 23.
ℹ Input `date` is `seq(...)`.
ℹ Input `date` must be size 23 or 1, not 23.
ℹ The error occured in group 22: country = "CU".
Run `rlang::last_error()` to see where the error occurred.

Answer 1

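The problem is that values was created without any grouping: it has a fixed length of 21, while mutate() needs a vector whose length matches each group's size (or is 1). Instead, we can group_by country and build the date sequence with seq inside each group, setting length.out to the group size: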

library(dplyr)
df %>%
    group_by(country) %>%
    mutate(date=seq(from = as.Date("2020-08-01"), length.out = n(), 
          by = 'day'))

In a large dataset, different countries may have different numbers of rows, so it is safer to use length.out than the to argument.
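To make the point about unequal group sizes concrete, here is a minimal sketch. The data frame toy and its values are made up for illustration (they are not the asker's data); AE has 4 rows and BD has 2.

```r
library(dplyr)

# Hypothetical toy data with groups of different sizes
toy <- data.frame(
  country = c("AE", "AE", "AE", "AE", "BD", "BD"),
  day     = c(1, 2, 3, 4, 1, 2),
  value   = c(23, 30, 21, 3, 2, 23)
)

result <- toy %>%
  group_by(country) %>%
  # n() is the row count of the current group, so each sequence
  # is exactly as long as the group it is assigned to
  mutate(date = seq(from = as.Date("2020-08-01"), length.out = n(), by = "day")) %>%
  ungroup()

result
```

Each country restarts at 2020-08-01 and runs for as many days as it has rows, so no recycling error should occur regardless of group size.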

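If every country has the same number of rows, and that number equals the length of values, we do not need group_by at all; we can simply repeat values with rep (this assumes the rows are already ordered by country and day):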

df %>%
    # without grouping, n() is the total row count, so values is repeated once per country
    mutate(date = rep(values, length.out = n()))
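A small sketch of the no-grouping variant, under the stated assumption that every country has exactly length(values) rows and the rows are sorted by country. The data frame toy_equal and the shortened three-day range are invented for illustration.

```r
library(dplyr)

# Shortened date range so the example stays small
values <- seq(from = as.Date("2020-08-01"), to = as.Date("2020-08-03"), by = "day")

# Hypothetical data: two countries, three rows each, sorted by country
toy_equal <- data.frame(
  country = rep(c("AE", "BD"), each = 3),
  day     = rep(1:3, times = 2),
  value   = c(23, 30, 21, 2, 23, 5)
)

result_equal <- toy_equal %>%
  # n() is the total number of rows here because the data is not grouped
  mutate(date = rep(values, length.out = n()))

result_equal
```

If the group sizes differ or the rows are not sorted, the dates would line up with the wrong rows, which is why the grouped seq approach is the safer default.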
