I have random dates between 2008 and 2020 and their corresponding values:
Date Val
September 16, 2012 32
September 19, 2014 33
January 05, 2008 26
June 07, 2017 02
December 15, 2019 03
May 28, 2020 18
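I want to fill in the missing dates between January 1, 2008 and March 31, 2020, and set the value for each of those dates to 1.
I looked at several posts, such as Post1 and Post2, but I could not solve the problem based on them. I am a beginner in R.
I am looking for data like this: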
Date Val
January 01, 2008 1
January 02, 2008 1
January 03, 2008 1
January 04, 2008 1
January 05, 2008 26
........
Answer 0: (score: 1)
Using tidyr::complete:
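library(dplyr)
df %>%
  # parse "January 05, 2008"-style strings into Date objects
  mutate(Date = as.Date(Date, "%B %d, %Y")) %>%
  # add every missing day in the range, filling Val with 1
  tidyr::complete(Date = seq(as.Date('2008-01-01'), as.Date('2020-03-31'),
                             by = 'day'), fill = list(Val = 1)) %>%
  # convert the dates back to the original text format
  mutate(Date = format(Date, "%B %d, %Y"))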
# A tibble: 4,475 x 2
# Date Val
# <chr> <dbl>
# 1 January 01, 2008 1
# 2 January 02, 2008 1
# 3 January 03, 2008 1
# 4 January 04, 2008 1
# 5 January 05, 2008 26
# 6 January 06, 2008 1
# 7 January 07, 2008 1
# 8 January 08, 2008 1
# 9 January 09, 2008 1
#10 January 10, 2008 1
# … with 4,465 more rows
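The answer above does not show how df was built, so here is a minimal, self-contained sketch of the same tidyr::complete approach. It assumes the six rows from the question, stores Val as numeric, and forces the "C" time locale because "%B" month-name parsing is locale-dependent; the object name filled is only illustrative. Note that May 28, 2020 lies outside the requested range, and complete() keeps rows that are already in the data, which is why the result has 4,475 rows rather than the 4,474 days in the range.

```r
library(dplyr)
library(tidyr)

# Assumption: English month names; "%B" depends on the LC_TIME locale.
invisible(Sys.setlocale("LC_TIME", "C"))

# Sample rows copied from the question (Val stored as numeric).
df <- tibble::tibble(
  Date = c("September 16, 2012", "September 19, 2014", "January 05, 2008",
           "June 07, 2017", "December 15, 2019", "May 28, 2020"),
  Val = c(32, 33, 26, 2, 3, 18)
)

filled <- df %>%
  # turn the text dates into Date objects
  mutate(Date = as.Date(Date, format = "%B %d, %Y")) %>%
  # add every day in the range that is not yet present, with Val = 1
  complete(
    Date = seq(as.Date("2008-01-01"), as.Date("2020-03-31"), by = "day"),
    fill = list(Val = 1)
  )

nrow(filled)         # number of rows after filling
range(filled$Date)   # earliest and latest date in the result
```

If rows outside the range should be dropped, a filter(Date <= as.Date("2020-03-31")) step after complete() would be one way to do it.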
Answer 1: (score: 0)
We can create a data frame containing the desired date range, left-join the original data frame onto it, and replace all NAs
with 1:
library(tidyverse)
days_seq %>%
  left_join(df) %>%   # joins on the shared Date column
  mutate(Val = if_else(is.na(Val), as.integer(1), Val))
Joining, by = "Date"
# A tibble: 4,474 x 2
Date Val
<date> <int>
1 2008-01-01 1
2 2008-01-02 1
3 2008-01-03 1
4 2008-01-04 1
5 2008-01-05 33
6 2008-01-06 1
7 2008-01-07 1
8 2008-01-08 1
9 2008-01-09 1
10 2008-01-10 1
# ... with 4,464 more rows
Data
# every day in the requested range
days_seq <- tibble(Date = seq(as.Date("2008/01/01"), as.Date("2020/03/31"), "days"))
df <- tibble::tribble(
  ~Date,        ~Val,
  "2012/09/16", 32L,
  "2012/09/19", 33L,
  "2008/01/05", 33L
)
# convert the text dates to Date so they match days_seq$Date
df$Date <- as.Date(df$Date)
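For convenience, the join approach can also be written as a single script that runs from top to bottom. This is only a sketch using the same three sample rows as this answer; the name joined is illustrative, and coalesce() is swapped in for the if_else(is.na(...)) step as an equivalent way to replace NAs. Unlike tidyr::complete, a left join onto days_seq drops any dates in df that fall outside the range.

```r
library(dplyr)

# Every day in the requested range.
days_seq <- tibble::tibble(
  Date = seq(as.Date("2008-01-01"), as.Date("2020-03-31"), by = "day")
)

# Same sample rows as in this answer, with Date already parsed.
df <- tibble::tibble(
  Date = as.Date(c("2012-09-16", "2012-09-19", "2008-01-05")),
  Val  = c(32L, 33L, 33L)
)

joined <- days_seq %>%
  # keep all days; Val is NA where df has no matching date
  left_join(df, by = "Date") %>%
  # replace the NAs with 1 (integer, to match the type of Val)
  mutate(Val = coalesce(Val, 1L))

nrow(joined)
```

Naming the join column with by = "Date" also suppresses the "Joining, by" message shown in the output above.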