Question

I have the following vectors, combined into a data frame:

col1<-c("one", NA,"three",NA,"four","five")
col2<-c("fish", "cat","dog",NA,"deer","fox")
(df<-as.data.frame(cbind(col1,col2), stringsAsFactors = F))
   col1 col2
1   one fish
2  <NA>  cat
3 three  dog
4  <NA> <NA>
5  four deer
6  five  fox

I want to remove the first row in which every value is NA, together with all rows after it. My expected result:

   col1 col2
1   one fish
2  <NA>  cat
3 three  dog

Answer 1

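Here is a base R option. It finds the indices of all rows that contain one or more NA values, takes the second-lowest of those indices, and subsets the original data frame to include all rows up to, but not including, that row. Note that the second-lowest index coincides with the first all-NA row only when exactly one partially missing row precedes it, as in the example data.

na_index <- which(rowSums(is.na(df)) > 0)                # rows with one or more NA
keep_index <- min(na_index[na_index != min(na_index)])   # second-lowest NA index
df[1:(keep_index-1), ]                                   # subset data frame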

   col1 col2
1   one fish
2  <NA>  cat
3 three  dog
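
Since the snippet above looks at rows with any NA, one possible variant is to test for rows where every value is NA instead. The following is a minimal sketch on made-up data (df2 is not from the question), and it assumes that at least one all-NA row exists:

```r
# Made-up data: two partially missing rows (2 and 3) come before the all-NA row (5).
df2 <- data.frame(
  col1 = c("one", NA, "three", "four", NA, "five"),
  col2 = c("fish", "cat", NA, "deer", NA, "fox"),
  stringsAsFactors = FALSE
)

all_na_index <- which(rowSums(is.na(df2)) == ncol(df2))  # rows where every value is NA
first_all_na <- all_na_index[1]                          # first such row (assumed to exist)
result <- df2[seq_len(first_all_na - 1), ]               # rows before the first all-NA row
result
```

With this data, rows 1 to 4 are kept, even though rows 2 and 3 each contain an NA.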

Answer 2

An option with rowSums and cumsum:

df[cumsum(rowSums(is.na(df)) == ncol(df)) == 0, ]

#   col1 col2
#1   one fish
#2  <NA>  cat
#3 three  dog

To understand this one-liner, we can break it down step by step:

rowSums(is.na(df))
#[1] 0 1 0 2 0 0

rowSums(is.na(df)) == ncol(df)
#[1] FALSE FALSE FALSE  TRUE FALSE FALSE

cumsum(rowSums(is.na(df)) == ncol(df))
#[1] 0 0 0 1 1 1

Now keep only the rows where this cumulative sum is 0, that is, the rows before the first all-NA row.
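
The steps above could be wrapped in a small helper. This is only a sketch: the function name keep_before_all_na is hypothetical, and drop = FALSE is added so that a single-column data frame stays a data frame.

```r
# Hypothetical helper: keep the rows that come before the first all-NA row.
keep_before_all_na <- function(data) {
  all_na <- rowSums(is.na(data)) == ncol(data)  # TRUE where every value is NA
  data[cumsum(all_na) == 0, , drop = FALSE]     # count stays 0 until the first such row
}

# Example data from the question.
df <- data.frame(
  col1 = c("one", NA, "three", NA, "four", "five"),
  col2 = c("fish", "cat", "dog", NA, "deer", "fox"),
  stringsAsFactors = FALSE
)

trimmed <- keep_before_all_na(df)         # rows 1 to 3
no_gap  <- keep_before_all_na(df[-4, ])   # no all-NA row left, so nothing is dropped
```

One property of the cumsum approach is that it needs no special case when the data contains no all-NA row: the cumulative sum stays at 0 and every row is kept.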

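Another alternative uses which.max, which returns the index of the first TRUE value. This assumes that the data contains at least one all-NA row and that it is not the first row: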

df[1:(which.max(rowSums(is.na(df)) == ncol(df)) - 1), ]

#   col1 col2
#1   one fish
#2  <NA>  cat
#3 three  dog
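
When no row is entirely NA, which.max on an all-FALSE vector returns 1, so 1:(1 - 1) evaluates to c(1, 0) and only the first row comes back. A guarded version might look like the sketch below; the function name trim_at_all_na is hypothetical.

```r
# Hypothetical helper: which.max variant with a guard for "no all-NA row".
trim_at_all_na <- function(data) {
  all_na <- rowSums(is.na(data)) == ncol(data)  # TRUE where every value is NA
  if (!any(all_na)) {
    return(data)                                # nothing to trim
  }
  # seq_len(0) is empty, so an all-NA first row yields zero rows instead of c(1, 0)
  data[seq_len(which.max(all_na) - 1), , drop = FALSE]
}

# Example data from the question.
df <- data.frame(
  col1 = c("one", NA, "three", NA, "four", "five"),
  col2 = c("fish", "cat", "dog", NA, "deer", "fox"),
  stringsAsFactors = FALSE
)

trimmed   <- trim_at_all_na(df)          # rows 1 to 3
unchanged <- trim_at_all_na(df[-4, ])    # no all-NA row: all five rows kept
```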

Answer 3

A base R solution could be:

df[1:nrow(df) < min(which(rowSums(is.na(df[, 1:length(df)])) == length(df))), ]

   col1 col2
1   one fish
2  <NA>  cat
3 three  dog

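First, it identifies the lowest row number at which the number of missing values equals the number of variables. Then it subsets the data, keeping only the rows whose row number is below that one.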
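
Those two steps can be written out with intermediate variables. This is a sketch of the same logic, not a different method; the variable names are illustrative only.

```r
# Example data from the question.
df <- data.frame(
  col1 = c("one", NA, "three", NA, "four", "five"),
  col2 = c("fish", "cat", "dog", NA, "deer", "fox"),
  stringsAsFactors = FALSE
)

n_missing    <- rowSums(is.na(df))                    # missing values per row
first_all_na <- min(which(n_missing == length(df)))   # length(df) is the number of columns
keep         <- seq_len(nrow(df)) < first_all_na      # TRUE for rows before that row
result       <- df[keep, ]
result
```

If no row is entirely NA, min() is called on an empty vector and returns Inf with a warning, so every row is kept.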

Or the same with dplyr:

library(dplyr)

df %>%
 filter(row_number() < min(which(rowSums(is.na(.[, 1:length(.)])) == length(.))))
