Filling missing categorical values with dplyr group_by

Time: 2017-10-05 13:49:50

Tags: r group-by dplyr missing-data

My data frame is incomplete, and I want to fill in the missing values so that they match the rest of their group.

library(dplyr)

incomplete_table <- 
    tibble(id = c(1,1,2,2,3,3,3),
           value = c("a",NA,"b","b","c","d", NA))

# # A tibble: 7 x 2
#      id value
#   <dbl> <chr>
# 1     1     a
# 2     1  <NA>
# 3     2     b
# 4     2     b
# 5     3     c
# 6     3     d
# 7     3  <NA>

With numeric values I could use something like this:

complete_table <- incomplete_table %>% 
    group_by(id) %>% 
    mutate(value = max(value))
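A quick check of why this pattern does not carry over directly (a sketch reusing the example data, not part of the original question): in R, max() also accepts character vectors and compares them lexicographically, so the same pipeline runs on value. However, it needs na.rm = TRUE, because any NA in a group otherwise makes the result NA, and it overwrites every row in the group, so the "c" in group 3 becomes "d".

```r
library(dplyr)

incomplete_table <- tibble(id = c(1, 1, 2, 2, 3, 3, 3),
                           value = c("a", NA, "b", "b", "c", "d", NA))

# max() works on character vectors (lexicographic order);
# na.rm = TRUE is required, otherwise a group containing NA yields NA
max_filled <- incomplete_table %>%
  group_by(id) %>%
  mutate(value = max(value, na.rm = TRUE)) %>%
  ungroup()

# group 3 collapses to "d" in every row, including the original "c"
max_filled$value
```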

How can I fill categorical values in a similar way with dplyr? This is the result I want:

# # A tibble: 7 x 2
#      id value
#   <dbl> <chr>
# 1     1     a
# 2     1     a
# 3     2     b
# 4     2     b
# 5     3     c
# 6     3     d
# 7     3  <NA>

1 answer:

Answer 0 (score: 2)

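You can use coalesce to fill the NAs with the unique value of the column when all the non-missing values in the group are the same (n_distinct == 1); otherwise this leaves the column as it is:

incomplete_table %>% 
    group_by(id) %>% 
    # fill NAs only when the group has exactly one distinct non-NA value
    mutate(value = coalesce(value, if (n_distinct(na.omit(value)) == 1) na.omit(value)[1] else NA_character_))

# # A tibble: 7 x 2
# # Groups:   id [3]
#      id value
#   <dbl> <chr>
# 1     1     a
# 2     1     a
# 3     2     b
# 4     2     b
# 5     3     c
# 6     3     d
# 7     3  <NA>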
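The same rule can be packaged as a small helper, which may read more easily than an inline if/else. This is a sketch, not part of the original answer: fill_if_unique is a hypothetical name, and it swaps coalesce() and n_distinct() for base unique() and replace().

```r
library(dplyr)

incomplete_table <- tibble(id = c(1, 1, 2, 2, 3, 3, 3),
                           value = c("a", NA, "b", "b", "c", "d", NA))

# Hypothetical helper: replace NAs with the group's single non-NA value,
# or return x unchanged when the group has zero or several distinct values
fill_if_unique <- function(x) {
  known <- unique(x[!is.na(x)])
  if (length(known) == 1) replace(x, is.na(x), known) else x
}

complete_table <- incomplete_table %>%
  group_by(id) %>%
  mutate(value = fill_if_unique(value)) %>%
  ungroup()

complete_table$value
```

Because the helper only sees the vector for the current group, mutate() applies it group by group, so group 3 (with both "c" and "d") keeps its NA.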
