Question

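I'm currently working with the New York Times coronavirus dataset for US counties.

It is organized by date, and on any given date only counties with at least 1 case are listed. So the first date (1/21) has just one row, for the first county with a single case.

For example: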

    date        county     state       fips   cases  deaths
1   2020-01-21  Snohomish  Washington  53061  1      0       #Snohomish data starts 1/21
2   2020-01-22  Snohomish  Washington  53061  1      0
3   2020-01-23  Snohomish  Washington  53061  1      0
4   2020-01-24  Cook       Illinois    17031  1      0       #Cook data starts 1/24
8   2020-01-25  Snohomish  Washington  53061  1      0
7   2020-01-25  Cook       Illinois    17031  1      0
6   2020-01-25  Orange     California  6059   1      0       #Orange data starts 1/25

......

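How can I fill in the missing dates for each county?

Here, for example, I want to add rows for Cook and Orange counties on the earlier dates, with 0 and 0 for cases and deaths, while keeping the state, fips and other information. I would do it by hand, but there are now thousands of counties.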

Answer 1

You can use complete to add the missing dates, and fill to carry over state and the other columns.

library(dplyr)
library(tidyr)

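df %>%
  # the sample data stores date as a factor, so convert it to Date first
  mutate(date = as.Date(date)) %>%
  # add a row for every county/date combination, with 0 cases and deaths
  complete(county, date, fill = list(cases = 0, deaths = 0)) %>%
  # fill state and fips within each county so values never leak across counties
  group_by(county) %>%
  fill(state, fips, .direction = "updown") %>%
  ungroup()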


#  county    date       state       fips cases deaths
#   <fct>     <date>     <fct>      <int> <dbl>  <dbl>
# 1 Cook      2020-01-21 Illinois   17031     0      0
# 2 Cook      2020-01-22 Illinois   17031     0      0
# 3 Cook      2020-01-23 Illinois   17031     0      0
# 4 Cook      2020-01-24 Illinois   17031     1      0
# 5 Cook      2020-01-25 Illinois   17031     1      0
# 6 Orange    2020-01-21 California  6059     0      0
# 7 Orange    2020-01-22 California  6059     0      0
# 8 Orange    2020-01-23 California  6059     0      0
# 9 Orange    2020-01-24 California  6059     0      0
#10 Orange    2020-01-25 California  6059     1      0
#11 Snohomish 2020-01-21 Washington 53061     1      0
#12 Snohomish 2020-01-22 Washington 53061     1      0
#13 Snohomish 2020-01-23 Washington 53061     1      0
#14 Snohomish 2020-01-24 Washington 53061     0      0
#15 Snohomish 2020-01-25 Washington 53061     1      0
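
As a possible variation on the approach above, tidyr's nesting() can keep county, state and fips together inside complete(), so no separate fill step is needed. The sketch below rebuilds the sample data with a plain data.frame() call (the name filled is only illustrative) and also spells out the date range with seq(), on the assumption that you want a row for every calendar day even if some day is missing for all counties.

```r
library(dplyr)
library(tidyr)

# Sample data equivalent to the question's excerpt
df <- data.frame(
  date   = as.Date(c("2020-01-21", "2020-01-22", "2020-01-23", "2020-01-24",
                     "2020-01-25", "2020-01-25", "2020-01-25")),
  county = c("Snohomish", "Snohomish", "Snohomish", "Cook",
             "Snohomish", "Cook", "Orange"),
  state  = c("Washington", "Washington", "Washington", "Illinois",
             "Washington", "Illinois", "California"),
  fips   = c(53061L, 53061L, 53061L, 17031L, 53061L, 17031L, 6059L),
  cases  = 1L,
  deaths = 0L
)

filled <- df %>%
  complete(
    # every day between the first and last date in the data
    date = seq(min(date), max(date), by = "day"),
    # only county/state/fips combinations that actually occur
    nesting(county, state, fips),
    # counts for the newly created rows
    fill = list(cases = 0, deaths = 0)
  )
```

Because state and fips travel with county inside nesting(), the new rows already carry the right values and nothing can be filled from a neighbouring county.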

Data

df <- structure(list(date = structure(c(1L, 2L, 3L, 4L, 5L, 5L, 5L), 
.Label = c("2020-01-21", "2020-01-22", "2020-01-23", "2020-01-24", "2020-01-25"),
 class = "factor"),county = structure(c(3L, 3L, 3L, 1L, 3L, 1L, 2L), 
.Label = c("Cook","Orange", "Snohomish"), class = "factor"), 
state = structure(c(3L,3L, 3L, 2L, 3L, 2L, 1L),
 .Label = c("California", "Illinois","Washington"), class = "factor"), 
fips = c(53061L, 53061L, 53061L, 17031L, 53061L, 17031L, 6059L), 
cases = c(1L, 1L, 1L, 1L, 1L, 1L, 1L), deaths = c(0L, 0L, 0L, 0L, 0L, 0L, 0L
)), class = "data.frame", row.names = c(NA, -7L))
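
Note that this sample data has no Snohomish row for 2020-01-24, so filling with 0 puts a zero between two days with 1 case. If the counts are cumulative, as they are in the NYT dataset, you may prefer to carry the last known totals forward and use 0 only for dates before a county's first report. A minimal sketch under that assumption (the name carried is only illustrative):

```r
library(dplyr)
library(tidyr)

# Sample data equivalent to the structure() call above, with date as Date
df <- data.frame(
  date   = as.Date(c("2020-01-21", "2020-01-22", "2020-01-23", "2020-01-24",
                     "2020-01-25", "2020-01-25", "2020-01-25")),
  county = c("Snohomish", "Snohomish", "Snohomish", "Cook",
             "Snohomish", "Cook", "Orange"),
  state  = c("Washington", "Washington", "Washington", "Illinois",
             "Washington", "Illinois", "California"),
  fips   = c(53061L, 53061L, 53061L, 17031L, 53061L, 17031L, 6059L),
  cases  = 1L,
  deaths = 0L
)

carried <- df %>%
  # add the missing county/date rows; their counts start out as NA
  complete(date, nesting(county, state, fips)) %>%
  group_by(county) %>%
  arrange(date, .by_group = TRUE) %>%
  # carry the last reported totals forward within each county
  fill(cases, deaths, .direction = "down") %>%
  ungroup() %>%
  # anything still NA lies before the county's first report
  mutate(across(c(cases, deaths), ~ replace_na(.x, 0L)))
```

With this version Snohomish keeps 1 case on 2020-01-24, while Cook and Orange still get 0 for the days before they first appear.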

Add missing rows for past dates in R (e.g. the NYT coronavirus case dataset for US counties)

1 answer: