Backfill NAs for missing dates across subjects

Time: 2016-07-16 03:23:27

Tags: r date

I can't seem to figure this out and am hoping someone can help me.

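I have a dataset in which a number of subjects were tested over several days. However, some subjects were tested on some days but not on others. Is there a way to insert the missing days for each subject, i.e. the days on which they were not tested, with just "NA" for the variables of interest? That way, every date would appear for every subject.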

Here is an example dataset with 4 test days, 1/1/2016 to 1/4/2016. You can see that some subjects are missing dates within that period.

Subject <- c("Pat", "Pat", "Pat", "Pat", "Bob", "Bob", "Bob", "Bob", "Jeff", "Jeff", "Tom", "Tom", "Tom", "Tom", "Art", "Art", "Art", "Karl", "Karl", "Hal", "Hal", "Hal", "Hal")
# One random value per row; no seed is set, so the numbers change on every run
variable.1 <- rnorm(n = length(Subject), mean = 10, sd = 5)
variable.2 <- rnorm(n = length(Subject), mean = 20, sd = 5)
Date <- c("1/1/2016", "1/2/2016", "1/3/2016", "1/4/2016","1/1/2016", "1/2/2016", "1/3/2016", "1/4/2016", "1/1/2016", "1/3/2016", "1/1/2016", "1/2/2016", "1/3/2016", "1/4/2016", "1/2/2016", "1/2/2016", "1/3/2016", "1/2/2016", "1/4/2016", "1/1/2016", "1/2/2016", "1/3/2016", "1/4/2016" )

d <- data.frame(Subject, Date, variable.1, variable.2)
# Parse the month/day/year strings into Date objects
d$Date <- as.Date(d$Date, "%m/%d/%Y")

d

   Subject       Date variable.1 variable.2
1      Pat 2016-01-01   8.341378   22.51838
2      Pat 2016-01-02  13.654822   19.50904
3      Pat 2016-01-03  14.078425   28.36888
4      Pat 2016-01-04  10.023648   24.18750
5      Bob 2016-01-01  11.409657   30.06393
6      Bob 2016-01-02   6.169438   21.85819
7      Bob 2016-01-03  12.388085   14.60456
8      Bob 2016-01-04  15.311546   20.31606
9     Jeff 2016-01-01  16.502111   30.14965
10    Jeff 2016-01-03   9.941720   22.56740
11     Tom 2016-01-01   9.594301   24.72596
12     Tom 2016-01-02  17.798279   14.81699
13     Tom 2016-01-03   6.097222   24.92846
14     Tom 2016-01-04   8.434669   20.47638
15     Art 2016-01-02   1.687036   37.17307
16     Art 2016-01-02   5.855712   19.91173
17     Art 2016-01-03   8.295704   18.69689
18    Karl 2016-01-02   4.747927   21.72881
19    Karl 2016-01-04   0.676263   27.17804
20     Hal 2016-01-01   7.685603   23.51874
21     Hal 2016-01-02  16.965498   15.08288
22     Hal 2016-01-03   7.018053   20.09474
23     Hal 2016-01-04  11.111013   22.21986
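To see exactly which subject/date pairs are missing before filling anything in, one option in base R is to cross-tabulate the two columns with table(): a cell count of 0 is an untested day. This is a minimal sketch that rebuilds only the Subject and Date columns of the example above; the names counts, pairs and untested are illustrative, not part of the original question.

```r
# Rebuild only the Subject/Date structure of the example (values not needed)
Subject <- c("Pat", "Pat", "Pat", "Pat", "Bob", "Bob", "Bob", "Bob",
             "Jeff", "Jeff", "Tom", "Tom", "Tom", "Tom", "Art", "Art", "Art",
             "Karl", "Karl", "Hal", "Hal", "Hal", "Hal")
Date <- as.Date(c("1/1/2016", "1/2/2016", "1/3/2016", "1/4/2016",
                  "1/1/2016", "1/2/2016", "1/3/2016", "1/4/2016",
                  "1/1/2016", "1/3/2016",
                  "1/1/2016", "1/2/2016", "1/3/2016", "1/4/2016",
                  "1/2/2016", "1/2/2016", "1/3/2016",
                  "1/2/2016", "1/4/2016",
                  "1/1/2016", "1/2/2016", "1/3/2016", "1/4/2016"), "%m/%d/%Y")
d <- data.frame(Subject, Date, stringsAsFactors = FALSE)

# Count rows for each Subject x Date pair; 0 means that day was not tested
counts <- table(Subject = d$Subject, Date = d$Date)
counts

# Keep only the zero cells as a data frame of untested pairs
pairs <- as.data.frame(counts, stringsAsFactors = FALSE)
untested <- pairs[pairs$Freq == 0, c("Subject", "Date")]
untested
```

On this data it lists six pairs: Art, Jeff and Karl each miss two days. The table also shows a count of 2 for Art on 2016-01-02, a duplicated test day in the example data, which the join-based answers keep as two rows.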

1 Answer:

Answer 0 (score: 1)

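We can use expand.grid to create all combinations of "Subject" and "Date", and then do a left_join with the original dataset, so that every combination without a test gets NA: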

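library(dplyr)
# stringsAsFactors = FALSE avoids turning character Subjects into a factor before the join
expand.grid(Subject = unique(d$Subject), Date = unique(d$Date),
            stringsAsFactors = FALSE) %>%
                 left_join(d, by = c("Subject", "Date")) %>%
                 arrange(Subject, Date)
# The numbers below differ from the question's table because rnorm() was re-run without a seed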
#   Subject       Date  variable.1 variable.2
#1      Art 2016-01-01          NA         NA
#2      Art 2016-01-02  9.65145589   28.44836
#3      Art 2016-01-02 12.58161500   16.06862
#4      Art 2016-01-03  0.02990953   19.62926
#5      Art 2016-01-04          NA         NA
#6      Bob 2016-01-01  7.82691227   19.08990
#7      Bob 2016-01-02  8.88546512   27.16044
#8      Bob 2016-01-03 12.26231157   19.81463
#9      Bob 2016-01-04 12.60452244   20.30380
#10     Hal 2016-01-01  2.66644221   17.86939
#11     Hal 2016-01-02 11.45246295   23.04896
#12     Hal 2016-01-03  4.94271258   22.06501
#13     Hal 2016-01-04  0.92676435   11.43378
#14    Jeff 2016-01-01  9.19183973   22.99084
#15    Jeff 2016-01-02          NA         NA
#16    Jeff 2016-01-03 12.56990234   18.69434
#17    Jeff 2016-01-04          NA         NA
#18    Karl 2016-01-01          NA         NA
#19    Karl 2016-01-02  9.80615533   14.65699
#20    Karl 2016-01-03          NA         NA
#21    Karl 2016-01-04 11.04105033   16.88379
#22     Pat 2016-01-01  5.50443769   14.81744
#23     Pat 2016-01-02 15.96919707   15.67234
#24     Pat 2016-01-03  5.52737822   15.48899
#25     Pat 2016-01-04  5.70531242   25.04813
#26     Tom 2016-01-01  0.09573680   32.44053
#27     Tom 2016-01-02 14.82955222   21.76676
#28     Tom 2016-01-03 13.17820753   11.44786
#29     Tom 2016-01-04 15.23101038   26.10275
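For reference, the same expand-then-left-join idea can be written in base R without dplyr, since merge() with all.x = TRUE behaves as a left join. This sketch also builds the date axis with seq() over the whole window instead of unique(d$Date), on the assumption that a day on which no subject was tested should still appear; unique() would silently drop such a day. The set.seed() call and the names all_days, grid and full are additions for illustration.

```r
# Rebuild the question's example data (seed added only for reproducibility)
set.seed(1)
Subject <- c("Pat", "Pat", "Pat", "Pat", "Bob", "Bob", "Bob", "Bob",
             "Jeff", "Jeff", "Tom", "Tom", "Tom", "Tom", "Art", "Art", "Art",
             "Karl", "Karl", "Hal", "Hal", "Hal", "Hal")
Date <- as.Date(c("1/1/2016", "1/2/2016", "1/3/2016", "1/4/2016",
                  "1/1/2016", "1/2/2016", "1/3/2016", "1/4/2016",
                  "1/1/2016", "1/3/2016",
                  "1/1/2016", "1/2/2016", "1/3/2016", "1/4/2016",
                  "1/2/2016", "1/2/2016", "1/3/2016",
                  "1/2/2016", "1/4/2016",
                  "1/1/2016", "1/2/2016", "1/3/2016", "1/4/2016"), "%m/%d/%Y")
d <- data.frame(Subject, Date,
                variable.1 = rnorm(length(Subject), mean = 10, sd = 5),
                variable.2 = rnorm(length(Subject), mean = 20, sd = 5),
                stringsAsFactors = FALSE)

# Every calendar day from the first to the last test date
all_days <- seq(min(d$Date), max(d$Date), by = "day")

# Every subject crossed with every day
grid <- expand.grid(Subject = unique(d$Subject), Date = all_days,
                    stringsAsFactors = FALSE)

# all.x = TRUE keeps grid rows with no match in d, filling the variables with NA
full <- merge(grid, d, by = c("Subject", "Date"), all.x = TRUE)
full <- full[order(full$Subject, full$Date), ]
rownames(full) <- NULL
full
```

With 7 subjects and 4 days the grid has 28 rows; the result has 29 because Art has two measurements on 2016-01-02, matching the 29 rows of the dplyr output above.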

With tidyr this becomes more compact:

library(tidyr)
# Adds a row for every Subject x Date combination found in d, with NA in the other columns
complete(d, Subject, Date)
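complete(d, Subject, Date) only crosses the Subject and Date values that actually occur in d. If the study window should be fixed regardless of the data, complete() also accepts a named vector for a column, so the calendar can be passed explicitly. A sketch, assuming tidyr is installed and that 2016-01-01 to 2016-01-04 is the full test window; all_days is an illustrative name and set.seed() is added only for reproducibility.

```r
library(tidyr)

# Rebuild the question's example data (seed added only for reproducibility)
set.seed(1)
Subject <- c("Pat", "Pat", "Pat", "Pat", "Bob", "Bob", "Bob", "Bob",
             "Jeff", "Jeff", "Tom", "Tom", "Tom", "Tom", "Art", "Art", "Art",
             "Karl", "Karl", "Hal", "Hal", "Hal", "Hal")
Date <- as.Date(c("1/1/2016", "1/2/2016", "1/3/2016", "1/4/2016",
                  "1/1/2016", "1/2/2016", "1/3/2016", "1/4/2016",
                  "1/1/2016", "1/3/2016",
                  "1/1/2016", "1/2/2016", "1/3/2016", "1/4/2016",
                  "1/2/2016", "1/2/2016", "1/3/2016",
                  "1/2/2016", "1/4/2016",
                  "1/1/2016", "1/2/2016", "1/3/2016", "1/4/2016"), "%m/%d/%Y")
d <- data.frame(Subject, Date,
                variable.1 = rnorm(length(Subject), mean = 10, sd = 5),
                variable.2 = rnorm(length(Subject), mean = 20, sd = 5),
                stringsAsFactors = FALSE)

# Assumed study window, written out so a day with no tests at all is not skipped
all_days <- seq(as.Date("2016-01-01"), as.Date("2016-01-04"), by = "day")

# Every Subject x all_days pair; pairs absent from d get NA in the variables
full <- complete(d, Subject, Date = all_days)
full
```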