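I can't seem to solve this problem and am hoping someone can help.
I have a data set in which a number of subjects were tested over several days. However, some subjects were tested on certain days and not on others. Is there a way to insert the missing days for each subject, with "NA" for the variables of interest on the days they were not tested? That way, every subject would have a row for every date.
Here is an example data set with 4 test days, 1/1/2016 to 1/4/2016. You can see that some subjects are missing dates within that period.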
Subject <- c("Pat", "Pat", "Pat", "Pat", "Bob", "Bob", "Bob", "Bob", "Jeff", "Jeff", "Tom", "Tom", "Tom", "Tom", "Art", "Art", "Art", "Karl", "Karl", "Hal", "Hal", "Hal", "Hal")
variable.1 <- rnorm(n = length(Subject), mean = 10, sd = 5)
variable.2 <- rnorm(n = length(Subject), mean = 20, sd = 5)
Date <- c("1/1/2016", "1/2/2016", "1/3/2016", "1/4/2016","1/1/2016", "1/2/2016", "1/3/2016", "1/4/2016", "1/1/2016", "1/3/2016", "1/1/2016", "1/2/2016", "1/3/2016", "1/4/2016", "1/2/2016", "1/2/2016", "1/3/2016", "1/2/2016", "1/4/2016", "1/1/2016", "1/2/2016", "1/3/2016", "1/4/2016" )
d <- data.frame(Subject, Date, variable.1, variable.2)
d$Date <- as.Date(d$Date, "%m/%d/%Y")
d
Subject Date variable.1 variable.2
1 Pat 2016-01-01 8.341378 22.51838
2 Pat 2016-01-02 13.654822 19.50904
3 Pat 2016-01-03 14.078425 28.36888
4 Pat 2016-01-04 10.023648 24.18750
5 Bob 2016-01-01 11.409657 30.06393
6 Bob 2016-01-02 6.169438 21.85819
7 Bob 2016-01-03 12.388085 14.60456
8 Bob 2016-01-04 15.311546 20.31606
9 Jeff 2016-01-01 16.502111 30.14965
10 Jeff 2016-01-03 9.941720 22.56740
11 Tom 2016-01-01 9.594301 24.72596
12 Tom 2016-01-02 17.798279 14.81699
13 Tom 2016-01-03 6.097222 24.92846
14 Tom 2016-01-04 8.434669 20.47638
15 Art 2016-01-02 1.687036 37.17307
16 Art 2016-01-02 5.855712 19.91173
17 Art 2016-01-03 8.295704 18.69689
18 Karl 2016-01-02 4.747927 21.72881
19 Karl 2016-01-04 0.676263 27.17804
20 Hal 2016-01-01 7.685603 23.51874
21 Hal 2016-01-02 16.965498 15.08288
22 Hal 2016-01-03 7.018053 20.09474
23 Hal 2016-01-04 11.111013 22.21986
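Answer 0 (score: 1)
We can use expand.grid to create every combination of 'Subject' and 'Date', then left_join the original data set onto it. Combinations with no matching row in the original data get NA.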
library(dplyr)
# Build every Subject/Date pair, then attach the observed rows
expand.grid(Subject = unique(d$Subject), Date = unique(d$Date)) %>%
  left_join(d, by = c("Subject", "Date")) %>%
  arrange(Subject, Date)
# Subject Date variable.1 variable.2
#1 Art 2016-01-01 NA NA
#2 Art 2016-01-02 9.65145589 28.44836
#3 Art 2016-01-02 12.58161500 16.06862
#4 Art 2016-01-03 0.02990953 19.62926
#5 Art 2016-01-04 NA NA
#6 Bob 2016-01-01 7.82691227 19.08990
#7 Bob 2016-01-02 8.88546512 27.16044
#8 Bob 2016-01-03 12.26231157 19.81463
#9 Bob 2016-01-04 12.60452244 20.30380
#10 Hal 2016-01-01 2.66644221 17.86939
#11 Hal 2016-01-02 11.45246295 23.04896
#12 Hal 2016-01-03 4.94271258 22.06501
#13 Hal 2016-01-04 0.92676435 11.43378
#14 Jeff 2016-01-01 9.19183973 22.99084
#15 Jeff 2016-01-02 NA NA
#16 Jeff 2016-01-03 12.56990234 18.69434
#17 Jeff 2016-01-04 NA NA
#18 Karl 2016-01-01 NA NA
#19 Karl 2016-01-02 9.80615533 14.65699
#20 Karl 2016-01-03 NA NA
#21 Karl 2016-01-04 11.04105033 16.88379
#22 Pat 2016-01-01 5.50443769 14.81744
#23 Pat 2016-01-02 15.96919707 15.67234
#24 Pat 2016-01-03 5.52737822 15.48899
#25 Pat 2016-01-04 5.70531242 25.04813
#26 Tom 2016-01-01 0.09573680 32.44053
#27 Tom 2016-01-02 14.82955222 21.76676
#28 Tom 2016-01-03 13.17820753 11.44786
#29 Tom 2016-01-04 15.23101038 26.10275
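If adding dplyr is not an option, the same idea can be sketched in base R with merge() in place of left_join. This is only an illustration of the technique, not part of the original answer: the toy data frame and its values are hypothetical, chosen so the result is easy to check by eye.

```r
# Hypothetical toy data: Jeff has no row for 2016-01-02
toy <- data.frame(
  Subject = c("Pat", "Pat", "Jeff"),
  Date = as.Date(c("2016-01-01", "2016-01-02", "2016-01-01")),
  variable.1 = c(1.5, 2.5, 3.5),
  stringsAsFactors = FALSE
)

# Every Subject/Date pair that should appear in the result
grid <- expand.grid(
  Subject = unique(toy$Subject),
  Date = unique(toy$Date),
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps every row of the grid; pairs absent from toy get NA
full <- merge(grid, toy, by = c("Subject", "Date"), all.x = TRUE)
full <- full[order(full$Subject, full$Date), ]
full
```

Here full has four rows (two subjects by two dates), and the Jeff / 2016-01-02 row carries NA in variable.1.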
With tidyr, this becomes more compact:
library(tidyr)
complete(d, Subject, Date)
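One caveat: both approaches only fill in dates that at least one subject was tested on. If a day is missing for everyone, it will not appear. As a minimal sketch, assuming you want every calendar day between the first and last test date, an explicit date sequence can be passed to complete(). The toy data frame sparse and its values are hypothetical.

```r
library(tidyr)

# Hypothetical toy data: nobody was tested on 2016-01-02
sparse <- data.frame(
  Subject = c("Pat", "Pat", "Jeff"),
  Date = as.Date(c("2016-01-01", "2016-01-03", "2016-01-03")),
  variable.1 = c(1.5, 2.5, 3.5)
)

# Expand Date to a full daily sequence, then cross it with every Subject
filled <- complete(
  sparse,
  Subject,
  Date = seq(min(Date), max(Date), by = "day")
)
filled
```

In this sketch filled has six rows (two subjects by three days), including a 2016-01-02 row for each subject with NA in variable.1.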