Adding missing dates to time series data

Posted: 2020-06-06 07:25:39

Tags: r missing-data

I have random dates between 2008 and 2020, each with a corresponding value:

Date                    Val
September 16, 2012       32
September 19, 2014       33
January 05, 2008         26
June 07, 2017            02
December 15, 2019        03
May 28, 2020             18

I want to fill in every missing date between January 1, 2008 and March 31, 2020 and give each of those dates the value 1.

I have looked at a few posts such as Post1 and Post2, but I couldn't work out a solution from them. I'm a beginner in R.

I'm looking for output like this:

 Date                    Val
 January 01, 2008        1
 January 02, 2008        1
 January 03, 2008        1
 January 04, 2008        1
 January 05, 2008       26
 ........

2 answers:

Answer 0 (score: 1)

Use tidyr::complete:

Code (df here is the question's data, with Date stored as text such as "September 16, 2012"):

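library(dplyr)

df %>%
  # parse the text dates, e.g. "January 05, 2008"
  mutate(Date = as.Date(Date, "%B %d, %Y")) %>%
  # add a row for every missing day in the range, with Val = 1
  tidyr::complete(Date = seq(as.Date('2008-01-01'), as.Date('2020-03-31'), 
                           by = 'day'), fill = list(Val = 1)) %>%
  # convert the dates back to the original text format
  mutate(Date = format(Date, "%B %d, %Y"))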


# A tibble: 4,475 x 2
#   Date               Val
#   <chr>            <dbl>
# 1 January 01, 2008     1
# 2 January 02, 2008     1
# 3 January 03, 2008     1
# 4 January 04, 2008     1
# 5 January 05, 2008    26
# 6 January 06, 2008     1
# 7 January 07, 2008     1
# 8 January 08, 2008     1
# 9 January 09, 2008     1
#10 January 10, 2008     1
# … with 4,465 more rows
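The tibble above has 4,475 rows, one more than the 4,474 days from 2008-01-01 to 2020-03-31: complete() only adds rows, so the May 28, 2020 observation, which falls after the end date, is kept. If only the requested range should remain, a filter can be added. A minimal sketch, assuming the question's six rows typed in as text dates; the Sys.setlocale() call is an added assumption so that %B parses English month names on any machine:

```r
library(dplyr)
library(tidyr)

# Assumption: use the C locale so "%B" matches English month names
invisible(Sys.setlocale("LC_TIME", "C"))

# The question's data, with dates stored as text
df <- tibble(
  Date = c("September 16, 2012", "September 19, 2014", "January 05, 2008",
           "June 07, 2017", "December 15, 2019", "May 28, 2020"),
  Val  = c(32, 33, 26, 2, 3, 18)
)

start <- as.Date("2008-01-01")
end   <- as.Date("2020-03-31")

out <- df %>%
  mutate(Date = as.Date(Date, "%B %d, %Y")) %>%
  # add a row for every missing day, filling Val with 1
  complete(Date = seq(start, end, by = "day"), fill = list(Val = 1)) %>%
  # drop observations outside the requested range (here: May 28, 2020)
  filter(Date >= start, Date <= end) %>%
  # sort while Date is still a Date, then convert back to text
  arrange(Date) %>%
  mutate(Date = format(Date, "%B %d, %Y"))

out
```

With the filter in place the result has exactly one row per day in the range.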

Answer 1 (score: 0)

We can create a data frame covering the required date range, left-join the data onto it, and then replace all NAs with 1:

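library(tidyverse)
# days_seq (one row per day) and df are created in the Data section below
days_seq %>% 
  left_join(df) %>%   # keep every day; days with no observation get NA
  mutate(Val = if_else(is.na(Val), as.integer(1), Val))   # replace NA with 1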

Joining, by = "Date"
# A tibble: 4,474 x 2
   Date         Val
   <date>     <int>
 1 2008-01-01     1
 2 2008-01-02     1
 3 2008-01-03     1
 4 2008-01-04     1
 5 2008-01-05    33
 6 2008-01-06     1
 7 2008-01-07     1
 8 2008-01-08     1
 9 2008-01-09     1
10 2008-01-10     1
# ... with 4,464 more rows

Data

days_seq <- tibble(Date = seq(as.Date("2008/01/01"), as.Date("2020/03/31"), "days"))

df <- tibble::tribble(
                   ~Date, ~Val,
        "2012/09/16",  32L,
        "2012/09/19",  33L,
        "2008/01/05",  33L
        ) 
df$Date <- as.Date(df$Date)
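For readers who prefer not to load the tidyverse, the same calendar-then-left-join idea can be sketched in base R with merge(..., all.x = TRUE). This is an illustrative sketch, not part of either answer: it assumes the question's six rows, entered as ISO dates to avoid locale-dependent month parsing.

```r
# Base-R sketch: build the full daily calendar, left-join the observed
# values onto it, and fill the gaps with 1.
# Assumption: the question's rows, typed in as ISO dates.
df <- data.frame(
  Date = as.Date(c("2012-09-16", "2014-09-19", "2008-01-05",
                   "2017-06-07", "2019-12-15", "2020-05-28")),
  Val  = c(32, 33, 26, 2, 3, 18)
)

# One row per day in the requested range
all_days <- data.frame(
  Date = seq(as.Date("2008-01-01"), as.Date("2020-03-31"), by = "day")
)

# all.x = TRUE keeps every calendar day (a left join); days without an
# observation get NA, and rows of df outside the range are dropped
filled <- merge(all_days, df, by = "Date", all.x = TRUE)
filled$Val[is.na(filled$Val)] <- 1

# Optional text column in the question's "January 01, 2008" style
# (month names follow the current locale)
filled$Date_label <- format(filled$Date, "%B %d, %Y")

head(filled)
```

merge() sorts by the key column by default, so filled comes out in date order; as with left_join(), the May 28, 2020 row is dropped because it has no match in the calendar.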