Select and delete NAs within a group based on a condition

Time: 2017-04-16 00:46:55

Tags: r dplyr

The following data frame:

   id participate grade year
1   1          NA     4 1982
2   1           1     4 1982
3   1           4     4 1982
4   4          NA    NA 1987
5   5          NA    NA 1986
6   5          NA     1 1986
7   5          NA     1 1986
8   7          NA     2 1984
9   7           4     2 1984
10  7           1     2 1984
11  9          NA     1 1987
12  9           1     1 1987
13 10          NA    NA 1984
14 10          NA     2 1984
15 10           4     2 1984
16 11          NA     4 1985
17 11           1     4 1985
18 13          NA     3 1985
19 13           1     3 1985
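A minimal sketch for reproducing the example in R, assuming the table is copied exactly as printed above. read.table() can parse it directly. "NA" is read as a missing value by default. Because the header line has one field fewer than the data lines, the first column (the printed row numbers) becomes the row names.

```r
# Parse the printed table. The header has 4 fields and the data lines have 5,
# so read.table() treats the first column as row names.
dat <- read.table(header = TRUE, text = "
   id participate grade year
1   1          NA     4 1982
2   1           1     4 1982
3   1           4     4 1982
4   4          NA    NA 1987
5   5          NA    NA 1986
6   5          NA     1 1986
7   5          NA     1 1986
8   7          NA     2 1984
9   7           4     2 1984
10  7           1     2 1984
11  9          NA     1 1987
12  9           1     1 1987
13 10          NA    NA 1984
14 10          NA     2 1984
15 10           4     2 1984
16 11          NA     4 1985
17 11           1     4 1985
18 13          NA     3 1985
19 13           1     3 1985
")

# Check the column types and the missing values
str(dat)
```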

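My goal is to identify and delete, within each group (id), the rows where participate is NA, but only when participate is filled in for other rows of that group.

That means, in this case: delete row 1 (id = 1). For id = 4 I would not delete anything, because the group contains no further information. The same applies to id = 5. Rows 8, 11, 13, 14, 16 and 18 should be deleted as well.

This is the desired output: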
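The rule can be checked on a single group's participate values. A row is kept if its own value is present, or if every value in the group is missing. In this small sketch, keep_row() is a hypothetical helper name used only for illustration:

```r
# Hypothetical helper: returns TRUE for the rows to keep within one group.
# Non-missing rows are always kept; missing rows are kept only when
# the whole group is missing.
keep_row <- function(x) !is.na(x) | all(is.na(x))

keep_row(c(NA, 1L, 4L))    # id = 1:  FALSE TRUE TRUE
keep_row(NA_integer_)      # id = 4:  TRUE
keep_row(c(NA, NA, NA))    # id = 5:  TRUE TRUE TRUE
keep_row(c(NA, NA, 4L))    # id = 10: FALSE FALSE TRUE
```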

      id participate grade  year
1      1           1     4  1982
2      1           4     4  1982
3      4          NA    NA  1987
4      5          NA    NA  1986
5      5          NA     1  1986
6      5          NA     1  1986
7      7           4     2  1984
8      7           1     2  1984
9      9           1     1  1987
10    10           4     2  1984
11    11           1     4  1985
12    13           1     3  1985

1 answer:

Answer 0 (score: 1)

# Load package
library(tidyverse)

# Create example dataset
dat <- tibble(id = c(1L, 1L, 1L, 4L, 5L,
                     5L, 5L, 7L, 7L, 7L,
                     9L, 9L, 10L, 10L, 10L,
                     11L, 11L, 13L, 13L),
              participate = c(NA, 1L, 4L, NA, NA,
                              NA, NA, NA, 4L, 1L,
                              NA, 1L, NA, NA, 4L,
                              NA, 1L, NA, 1L),
              grade = c(4L, 4L, 4L, NA, NA,
                        1L, 1L, 2L, 2L, 2L,
                        1L, 1L, NA, 2L, 2L,
                        4L, 4L, 3L, 3L),
              year = c(1982, 1982, 1982, 1987, 1986,
                       1986, 1986, 1984, 1984, 1984,
                       1987, 1987, 1984, 1984, 1984,
                       1985, 1985, 1985, 1985))

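# Filter the data: within each id, keep rows where participate is present,
# or keep the whole group when participate is missing in every row
dat2 <- dat %>%
  group_by(id) %>%
  filter(!is.na(participate) | all(is.na(participate)))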

# See the result
dat2

Source: local data frame [12 x 4]
Groups: id [8]

      id participate grade  year
   <int>       <int> <int> <dbl>
1      1           1     4  1982
2      1           4     4  1982
3      4          NA    NA  1987
4      5          NA    NA  1986
5      5          NA     1  1986
6      5          NA     1  1986
7      7           4     2  1984
8      7           1     2  1984
9      9           1     1  1987
10    10           4     2  1984
11    11           1     4  1985
12    13           1     3  1985
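For comparison, here is a sketch of the same rule in base R, without dplyr. ave() computes, for every row, whether its id group contains at least one non-missing participate value. A row is kept when its own value is present or when its group has no values at all. Since all(is.na(x)) equals !any(!is.na(x)), this is the same condition as the filter() call above.

```r
# Same example data as above, as a plain data.frame
dat <- data.frame(
  id = c(1L, 1L, 1L, 4L, 5L, 5L, 5L, 7L, 7L, 7L,
         9L, 9L, 10L, 10L, 10L, 11L, 11L, 13L, 13L),
  participate = c(NA, 1L, 4L, NA, NA, NA, NA, NA, 4L, 1L,
                  NA, 1L, NA, NA, 4L, NA, 1L, NA, 1L),
  grade = c(4L, 4L, 4L, NA, NA, 1L, 1L, 2L, 2L, 2L,
            1L, 1L, NA, 2L, 2L, 4L, 4L, 3L, 3L),
  year = c(1982, 1982, 1982, 1987, 1986, 1986, 1986, 1984, 1984, 1984,
           1987, 1987, 1984, 1984, 1984, 1985, 1985, 1985, 1985)
)

# For each row: TRUE if its id group has at least one non-missing participate
group_has_value <- ave(!is.na(dat$participate), dat$id, FUN = any)

# Keep rows with a value, plus every row of groups that have no values at all
keep <- !is.na(dat$participate) | !group_has_value
dat2 <- dat[keep, ]
rownames(dat2) <- NULL  # renumber rows 1..n like the desired output

dat2
```

Unlike the dplyr version, the result is an ordinary ungrouped data.frame.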