排除整齐数据集中具有NA的组

时间:2018-08-27 11:17:09

标签: r dplyr tibble

我有一个整洁的tibble,其中一个值列由4个ID列标识。

 > MWA
# A tibble: 16 x 5
# Groups:   Dir [2]
      VP   Con   Dir   Seg time_seg
   <int> <int> <int> <int>    <int>
 1    10     2     1     1     1810
 2    10     2     1     2      260
 3    10     2     1     3      540
 4    10     2     1     4     1470
 5    10     2     1     5      460
 6    10     2     1     6      690
 7    10     2     1     7      760
 8    10     2     1     8       NA
 9    10     2     2     1      320
10    10     2     2     2     1110
11    10     2     2     3      450
12    10     2     2     4      600
13    10     2     2     5     1680
14    10     2     2     6      730
15    10     2     2     7      850
16    10     2     2     8      840

要复制的dput

> dput(MWA)
structure(list(VP = c(10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L), Con = c(2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), Dir = c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), 
    Seg = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 
    6L, 7L, 8L), time_seg = c(1810L, 260L, 540L, 1470L, 460L, 
    690L, 760L, NA, 320L, 1110L, 450L, 600L, 1680L, 730L, 850L, 
    840L)), row.names = c(NA, -16L), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"), vars = "Dir", drop = TRUE, indices = list(
    0:7, 8:15), group_sizes = c(8L, 8L), biggest_group_size = 8L, labels = structure(list(
    Dir = 1:2), row.names = c(NA, -2L), class = "data.frame", vars = "Dir", drop = TRUE))

它们来自更大的数据集,其中已按VPCon和最后Dir进行分组。

如您所见,在第10小标题行中有一个NA

现在我要排除整个Dir组(因此第1行到第8行),因为这种情况是使用dplyr会丢失该值。

filteris.nacomplete.cases一起使用只会删除带有NA的行,而不是完整的组(在此数据集中是一个“个案”)。

3 个答案:

答案 0 :(得分:2)

您可以先检查特定列中是否缺少任何值,然后排除整个组。

library(dplyr)

MWA %>% 
  group_by(VP, Con, Dir) %>% 
  mutate(any_na = any(is.na(time_seg))) %>% 
  filter(!any_na)

# A tibble: 8 x 6
# Groups:   VP, Con, Dir [1]
#     VP   Con   Dir   Seg time_seg any_na
#   <int> <int> <int> <int>    <int> <lgl> 
# 1    10     2     2     1      320 FALSE 
# 2    10     2     2     2     1110 FALSE 
# 3    10     2     2     3      450 FALSE 
# 4    10     2     2     4      600 FALSE 
# 5    10     2     2     5     1680 FALSE 
# 6    10     2     2     6      730 FALSE 
# 7    10     2     2     7      850 FALSE 
# 8    10     2     2     8      840 FALSE 

答案 1 :(得分:2)

anyNA中有base R

library(dplyr)
MWA %>%
    group_by(Dir) %>%
    filter(!anyNA(time_seg))
# A tibble: 8 x 5
# Groups:   Dir [1]
#     VP   Con   Dir   Seg time_seg
#  <int> <int> <int> <int>    <int>
#1    10     2     2     1      320
#2    10     2     2     2     1110
#3    10     2     2     3      450
#4    10     2     2     4      600
#5    10     2     2     5     1680
#6    10     2     2     6      730
#7    10     2     2     7      850
#8    10     2     2     8      840

答案 2 :(得分:1)

使用all()将评估整个组,因此您可以跳过mutate步骤。

MWA %>% 
  group_by(Dir) %>% 
  filter(all(!is.na(time_seg)))

# A tibble: 8 x 5
# Groups:   Dir [1]
     VP   Con   Dir   Seg time_seg
  <int> <int> <int> <int>    <int>
1    10     2     2     1      320
2    10     2     2     2     1110
3    10     2     2     3      450
4    10     2     2     4      600
5    10     2     2     5     1680
6    10     2     2     6      730
7    10     2     2     7      850
8    10     2     2     8      840