Question

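I want to produce output that shows my df sorted by the number of NAs in each row (as in the df_rows_sorted_by_NAs column below), but keeps the original row names/numbers (the df column). The combination would look like the third column below: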

# df_rows_sorted_by_NAs    df                  desired_output
# Row   1 :  38            Row  442  :  37     Row  3112 :  38 
# Row   2 :  38            Row  3112 :  38     Row  3113 :  38
# Row   3 :  37            Row  3113 :  38     Row  442  :  37
# Row  18 :  30            Row  1128 :  30     Row  1128 :  30

I get the first output with:

# Sort df by num of NAs
df_rows_sorted_by_NAs <- df[order(rowSums(is.na(df)), decreasing = TRUE), drop = FALSE, ]

# View obs with >=30 NAs
for (row_name in row.names(df_rows_sorted_by_NAs)) {
  if (rowSums(is.na(df_rows_sorted_by_NAs[row_name,])) >= 30) {
    cat("Row ", row_name, ": ", 
        rowSums(is.na(df_rows_sorted_by_NAs[row_name,])), "\n")
  }
}

I get the second output with:

for (row_name in row.names(df)) {
  if (rowSums(is.na(df[row_name,])) >= 30) {
    cat("Row ", row_name, ": ", rowSums(is.na(df[row_name,])), "\n")
  }
}

I tried drop = FALSE with order, but got the same result. Any suggestions on how to preserve the row names when creating the new df?

Answer 1

The tidyverse packages are well suited to tasks like this:

library(tidyverse)

Example data frame:

df <- tribble(
  ~Length, ~Width, ~Mass, ~Date,
  10.3, 3.1, 0.021, "2018-11-28",
  NA, 3.3, NA, "2018-11-29",
  10.5, NA, 0.025, "2018-11-30"
)

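With the dplyr package, you can use row_number() and rowSums to create an ID column and a "number of NAs" column. Of course, if you already have a row ID column, you can drop ID = row_number() from the mutate: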

df %>%
  mutate(ID = row_number(), noNAs = rowSums(is.na(.)))

...which gives...

# A tibble: 3 x 6
  Length Width   Mass Date          ID noNAs
   <dbl> <dbl>  <dbl> <chr>      <int> <dbl>
1   10.3   3.1  0.021 2018-11-28     1     0
2   NA     3.3 NA     2018-11-29     2     2
3   10.5  NA    0.025 2018-11-30     3     1

...then add a select for ID and noNAs, and arrange by noNAs (descending):

df <- df %>%
  mutate(ID = row_number(), noNAs = rowSums(is.na(.)))%>%
  select(ID, noNAs) %>%
  arrange(desc(noNAs))

...which gives...

# A tibble: 3 x 2
     ID noNAs
  <int> <dbl>
1     2     2
2     3     1
3     1     0

Finally, to keep only the rows with more than 30 NAs (use >= 30 instead to match the threshold in the question):

df %>% filter(noNAs > 30)
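
Note that row_number() numbers the rows by their current position, and tibbles do not retain row names, so the ID column above only matches the original row names when they are the default 1..n. If the data frame carries its own row names, one option is to copy them into a column before sorting. The following is a minimal sketch under that assumption; df_named, row_name, and na_summary are illustrative names, and the data is made up rather than taken from the question:

```r
library(dplyr)
library(tibble)

# Hypothetical data frame with non-default row names (not the asker's data)
df_named <- data.frame(
  Length = c(10.3, NA, 10.5),
  Width  = c(3.1, 3.3, NA),
  Mass   = c(0.021, NA, 0.025),
  row.names = c("442", "3112", "3113")
)

na_summary <- df_named %>%
  # Copy the row names into a regular column so later verbs cannot lose them
  rownames_to_column(var = "row_name") %>%
  # row_name itself never contains NA, so counting across all columns is safe
  mutate(noNAs = rowSums(is.na(.))) %>%
  select(row_name, noNAs) %>%
  arrange(desc(noNAs))

print(na_summary)
```

Here row_name holds the original labels ("3112", "3113", "442" after sorting), so they survive the arrange step regardless of how dplyr treats row names.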

Answer 2

This seems to work for me:

a <- c(1, 2, 3)
b <- c(1, NA, 3)
c <- c(NA, NA, 3)
d <- c(1, NA, NA)
e <- c(NA, 2, 3)
df <- data.frame(a, b, c, d, e)
df

df <- df[order(rowSums(is.na(df)), decreasing = TRUE),]
df

The first df prints

  a  b  c  d  e
1 1  1 NA  1 NA
2 2 NA NA NA  2
3 3  3  3 NA  3

and, after sorting,

  a  b  c  d  e
2 2 NA NA NA  2
1 1  1 NA  1 NA
3 3  3  3 NA  3

Then df[rowSums(is.na(df)) > 1, ] gives

  a  b  c  d  e
2 2 NA NA NA  2
1 1  1 NA  1 NA

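The real question is how to put "Row: " in front:

# Count NAs once, then filter names and counts together so the lengths match
na_counts <- rowSums(is.na(df))
keep <- na_counts > 1
paste0("Row ", row.names(df)[keep], ": ", na_counts[keep])

This gives you a character vector that you can print vertically, but that is a separate problem from getting the sort done.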
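
For completeness, the sort, the filter, and the vertical printing can be combined without a loop. This is only a sketch in base R: df_demo and its row names are made-up illustrative data, and the threshold of 2 stands in for the 30 used in the question.

```r
# Hypothetical example data with non-sequential row names (not the asker's data)
df_demo <- data.frame(
  a = c(1, NA, 3, NA),
  b = c(1, NA, 3, 4),
  c = c(NA, NA, 3, NA),
  row.names = c("442", "3112", "3113", "1128")
)

# Count NAs per row, then work out the descending order once
na_counts <- rowSums(is.na(df_demo))
ord <- order(na_counts, decreasing = TRUE)

# Reorder the row names and the counts with the same index so they stay paired
sorted_names  <- row.names(df_demo)[ord]
sorted_counts <- na_counts[ord]

# Keep rows at or above the (assumed) threshold and build one string per row
keep <- sorted_counts >= 2
report <- paste0("Row ", sorted_names[keep], " : ", sorted_counts[keep])

# writeLines prints one element per line
writeLines(report)
```

Indexing both the names and the counts with the same ord and keep vectors is what keeps each label attached to its own count.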

R: preserving row names when using order()