Add rows to a data.frame to complete a sequence

Time: 2017-07-11 19:03:01

Tags: r data.table dplyr

Given the data below, the first rows of which look like

head(dat, 9)
      IndID BinID Freq
1 BHS_034_A     7   20
2 BHS_034_A     8   27
3 BHS_034_A     9   67
4 BHS_034_A    10  212
5 BHS_037_A     5    1
6 BHS_037_A     7   12
7 BHS_037_A     8   65
8 BHS_037_A     9  122
9 BHS_037_A    10  301

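I want to fill in the missing BinID values so that every individual (IndID) has the full BinID sequence from 1 to 10. Freq should be 0 in the rows added for the new BinID values.

The solution needs to work for many individuals; only a few are included here.

This question is similar to another post, but here I am also trying to put 0 into the column being filled.

Data: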

dat <- structure(list(IndID = c("BHS_034_A", "BHS_034_A", "BHS_034_A", 
"BHS_034_A", "BHS_037_A", "BHS_037_A", "BHS_037_A", "BHS_037_A", 
"BHS_037_A", "BHS_068_A", "BHS_068_A", "BHS_068_A", "BHS_068_A", 
"BHS_068_A", "BHS_068_A", "BHS_068_A", "BHS_070_A", "BHS_070_A", 
"BHS_070_A", "BHS_071_A", "BHS_071_A", "BHS_071_A", "BHS_071_A", 
"BHS_071_A", "BHS_071_A", "BHS_071_A", "BHS_071_A", "BHS_071_A"
), BinID = c(7L, 8L, 9L, 10L, 5L, 7L, 8L, 9L, 10L, 3L, 4L, 5L, 
7L, 8L, 9L, 10L, 8L, 9L, 10L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 
10L), Freq = c(20L, 27L, 67L, 212L, 1L, 12L, 65L, 122L, 301L, 
2L, 1L, 1L, 4L, 14L, 104L, 454L, 7L, 90L, 470L, 6L, 11L, 11L, 
7L, 18L, 19L, 15L, 31L, 344L)), .Names = c("IndID", "BinID", 
"Freq"), row.names = c(NA, 28L), class = "data.frame")

2 answers:

Answer 0: (score: 2)

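tidyr provides the complete function, which adds a row for every combination that is missing from the dataset. Without its fill argument, Freq is NA (not 0) in the added rows: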

tidyr::complete(dat, IndID, BinID = 1:10)
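To get the 0 the question asks for, `complete` accepts a `fill` argument, a named list giving the value to use in place of NA for each column. Here is a minimal sketch on a small toy data frame (the name `toy` and its three rows are made up for illustration, copied from the first rows of the question's data):

```r
library(tidyr)

# Toy data in the same shape as `dat` (hypothetical subset, for illustration only)
toy <- data.frame(
  IndID = c("BHS_034_A", "BHS_034_A", "BHS_037_A"),
  BinID = c(7L, 8L, 5L),
  Freq  = c(20L, 27L, 1L),
  stringsAsFactors = FALSE
)

# Expand to every IndID x BinID (1..10) pair; rows that were missing get Freq = 0
filled <- complete(toy, IndID, BinID = 1:10, fill = list(Freq = 0L))

nrow(filled)      # 2 individuals x 10 bins = 20 rows
sum(filled$Freq)  # the added zero rows leave the total unchanged: 48
```

Using `0L` rather than `0` keeps Freq an integer column. The same call should work on the full `dat` by replacing `toy` with `dat`.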

Answer 1: (score: 1)

Using data.table:

library(data.table)
setDT(dat)[CJ(BinID = 1:10, IndID = IndID, unique = TRUE), on = .(IndID, BinID)]

Or with dplyr and tidyr:

library(dplyr)
library(tidyr)
dat %>% 
  group_by(IndID) %>% 
  expand(BinID = 1:10) %>% 
  left_join(dat, by = c("IndID", "BinID"))

The data.table version gives (Freq is NA in the added rows):

        IndID BinID Freq
 1: BHS_034_A     1   NA
 2: BHS_037_A     1   NA
 3: BHS_068_A     1   NA
 4: BHS_070_A     1   NA
 5: BHS_071_A     1   NA
 6: BHS_034_A     2   NA
 7: BHS_037_A     2   NA
 8: BHS_068_A     2   NA
 9: BHS_070_A     2   NA
10: BHS_071_A     2    6
11: BHS_034_A     3   NA
12: BHS_037_A     3   NA
13: BHS_068_A     3    2
14: BHS_070_A     3   NA
15: BHS_071_A     3   11
16: BHS_034_A     4   NA
17: BHS_037_A     4   NA
18: BHS_068_A     4    1
19: BHS_070_A     4   NA
20: BHS_071_A     4   11
21: BHS_034_A     5   NA
22: BHS_037_A     5    1
23: BHS_068_A     5    1
24: BHS_070_A     5   NA
25: BHS_071_A     5    7
26: BHS_034_A     6   NA
27: BHS_037_A     6   NA
28: BHS_068_A     6   NA
29: BHS_070_A     6   NA
30: BHS_071_A     6   18
31: BHS_034_A     7   20
32: BHS_037_A     7   12
33: BHS_068_A     7    4
34: BHS_070_A     7   NA
35: BHS_071_A     7   19
36: BHS_034_A     8   27
37: BHS_037_A     8   65
38: BHS_068_A     8   14
39: BHS_070_A     8    7
40: BHS_071_A     8   15
41: BHS_034_A     9   67
42: BHS_037_A     9  122
43: BHS_068_A     9  104
44: BHS_070_A     9   90
45: BHS_071_A     9   31
46: BHS_034_A    10  212
47: BHS_037_A    10  301
48: BHS_068_A    10  454
49: BHS_070_A    10  470
50: BHS_071_A    10  344
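The output above still has NA where the question asks for 0. One way to finish the job, sketched here on a toy table (`toy`, `res_dt` and `res_df` are illustrative names, not part of the original answer), is to replace the NA after the join: by reference with `:=` in data.table, or with `coalesce` in dplyr.

```r
library(data.table)
library(dplyr)
library(tidyr)

# Toy data in the same shape as `dat` (hypothetical subset, for illustration only)
toy <- data.table(
  IndID = c("BHS_034_A", "BHS_034_A", "BHS_037_A"),
  BinID = c(7L, 8L, 5L),
  Freq  = c(20L, 27L, 1L)
)

# data.table: same cross join as in the answer, stored in a new object
res_dt <- toy[CJ(BinID = 1:10, IndID = IndID, unique = TRUE), on = .(IndID, BinID)]
# Overwrite the NA introduced by the join with 0, modifying res_dt in place
res_dt[is.na(Freq), Freq := 0L]

# dplyr/tidyr: same pipeline as in the answer, then replace NA with 0
toy_df <- as.data.frame(toy)
res_df <- toy_df %>%
  group_by(IndID) %>%
  expand(BinID = 1:10) %>%
  left_join(toy_df, by = c("IndID", "BinID")) %>%
  ungroup() %>%
  mutate(Freq = coalesce(Freq, 0L))
```

Both results contain 10 rows per individual with no NA in Freq; they differ only in row order and class (data.table versus tibble).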