left_join within each group and fill in the missing group values

Asked: 2018-02-21 14:17:01

Tags: r dplyr

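I would like to know how to perform a kind of left_join within each group (which, as far as I understand, is not directly possible in dplyr), and then replace the missing values in the group column with the value of the group the row belongs to.

Here is what I mean. Starting from the following:

  ref  group   vol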
    B group1     3
    C group1     4
    D group1     3
    A group2     6
    C group2     5

I want to add the missing letters (A through F) to each group, so it would look like this:

  ref  group   vol
    A NA        NA
    B group1     3
    C group1     4
    D group1     3
    E NA        NA
    F NA        NA
    A group2     6
    B NA        NA 
    C group2     5
    D NA        NA
    E NA        NA
    F NA        NA

And then (or at the same time) replace the NA in the group column with the group that the row belongs to:

  ref  group    vol
    A group1     NA
    B group1     3
    C group1     4
    D group1     3
    E group1     NA
    F group1     NA
    A group2     6
    B group2     NA 
    C group2     5
    D group2     NA
    E group2     NA
    F group2     NA

Here is the initial data:

db <- structure(list(ref = c("B", "C", "D", "A", "C"), 
  group = c("group1", "group1", "group1", "group2", "group2"), 
  vol = c(3, 4, 3, 6, 5)), class = "data.frame", 
  .Names = c("ref", "group", "vol"), row.names = c(NA, -5L))

And the reference table:

vars_to_add <- structure(list(ref = c("A", "B", "C", "D", "E", "F")),
   class = "data.frame", .Names = "ref", row.names = c(NA, -6L))

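I could write a function that filters each group, does a left_join, replaces the group value, and then binds all the groups back into one data frame, but maybe there is a smarter way... I could also define the groups in vars_to_add, but that does not scale to more groups...

Thanks

1 answer:

Answer 0 (score: 0)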
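For reference, the manual approach described above could be sketched roughly as follows. This is only an illustration under the assumption that every group has at least one row in db; the names pieces and manual are made up for the example.

```r
library(dplyr)

# Same data as in the question, rebuilt so the sketch is self-contained.
db <- data.frame(
  ref   = c("B", "C", "D", "A", "C"),
  group = c("group1", "group1", "group1", "group2", "group2"),
  vol   = c(3, 4, 3, 6, 5),
  stringsAsFactors = FALSE
)
vars_to_add <- data.frame(ref = c("A", "B", "C", "D", "E", "F"),
                          stringsAsFactors = FALSE)

# Split by group, join each piece onto the reference letters,
# then overwrite the NA group labels with that piece's group name.
pieces <- lapply(split(db, db$group), function(d) {
  out <- left_join(vars_to_add, d, by = "ref")
  out$group <- d$group[1]
  out
})

# Stack the per-group results back into one data frame.
manual <- bind_rows(pieces)
```

It works, but it needs one join per group, which is the part I would like to avoid.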

We can use complete from tidyr:

library(dplyr)
library(tidyr)
db %>%
  complete(ref = vars_to_add$ref, group) %>%
  arrange(group)
# A tibble: 12 x 3
#   ref   group    vol
#   <chr> <chr>  <dbl>
# 1 A     group1    NA
# 2 B     group1     3
# 3 C     group1     4
# 4 D     group1     3
# 5 E     group1    NA
# 6 F     group1    NA
# 7 A     group2     6
# 8 B     group2    NA
# 9 C     group2     5
#10 D     group2    NA
#11 E     group2    NA
#12 F     group2    NA
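
If an explicit join is preferred, the same result can probably be reached by building every (group, ref) combination first and then doing a single left_join. The following is a minimal sketch, not part of the original answer; the names grid, joined and via_complete are hypothetical, and it assumes the db and vars_to_add objects from the question.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, rebuilt so the sketch is self-contained.
db <- data.frame(
  ref   = c("B", "C", "D", "A", "C"),
  group = c("group1", "group1", "group1", "group2", "group2"),
  vol   = c(3, 4, 3, 6, 5),
  stringsAsFactors = FALSE
)
vars_to_add <- data.frame(ref = c("A", "B", "C", "D", "E", "F"),
                          stringsAsFactors = FALSE)

# Every (group, ref) pair, so the group column is never NA.
grid <- crossing(group = unique(db$group), ref = vars_to_add$ref)

# One ordinary left_join against the data; missing pairs get vol = NA.
joined <- left_join(grid, db, by = c("group", "ref"))

# Cross-check against the complete() call, sorted the same way.
via_complete <- db %>%
  complete(ref = vars_to_add$ref, group) %>%
  arrange(group, ref)
```

Both routes build the full grid of groups and letters before matching the data, so they scale to more groups without a separate join per group.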