Question

I have several data.frames similar to dat, where A is numeric and B is a factor:

A,B
1,null
2,null
3,null

I want to remove every column that contains only "null". I have tried many solutions, including:

dat[, !apply(dat == "null", 2, all)]

Error in `[.data.frame`(newdat, , !apply(dat == "null", 2, all)) : 
  undefined columns selected

dat %>% mutate_if(is.factor, as.null)

Error in mutate_impl(.data, dots) : 
  Column `B` is of unsupported type NULL

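Other solutions produce similar errors (most often "undefined columns selected"). I would like to do this without referring to columns by name or number. Thanks!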

Answer 1

If the values in the column are the string null (lowercase), one possible solution is:

df[, colSums(df == "null") != nrow(df), drop = FALSE]

With the OP's data:

dat[,apply(dat, 2, function(x)any(!as.character(x)=="null")), drop = FALSE]
# B
# 1 6.455973
# 2 6.455973
# 3 6.455973
# 4 6.455973
# 5 6.455973

Data

dat <- structure(list(A = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "null", class = "factor"), B = c(6.45597297196173, 6.45597297196173, 6.45597297196173, 6.45597297196173, 6.45597297196173)), .Names = c("A", "B" ), row.names = c(NA, 5L), class = "data.frame")
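
Both expressions above compare values with ==, which returns NA wherever a value is missing; an NA in the column index may be one way the "undefined columns selected" error arises. Below is a minimal sketch of an NA-tolerant variant. The names drop_null_cols and token are hypothetical (they do not come from the answers), and the sketch assumes that a column made up entirely of NA should be kept.

```r
# Hypothetical helper: drop every column whose non-missing values
# are all equal to the string "null".
drop_null_cols <- function(df, token = "null") {
  only_token <- vapply(
    df,
    function(col) {
      vals <- as.character(col[!is.na(col)])  # ignore missing values
      length(vals) > 0 && all(vals == token)  # all-NA columns are kept
    },
    logical(1)
  )
  # drop = FALSE keeps the result a data.frame even with one column left
  df[, !only_token, drop = FALSE]
}

# Example shaped like the question's data, with one NA added to B
dat <- data.frame(A = 1:3, B = factor(c("null", NA, "null")))
res <- drop_null_cols(dat)
```

Because vapply iterates over columns one by one, no column has to be named or numbered by the caller.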

Answer 2

Here is another solution, using dplyr:

dat <- structure(list(A = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "null", class = "factor"), 
                      B = c(6.45597297196173, 6.45597297196173, 6.45597297196173, 6.45597297196173, 6.45597297196173)), 
                 .Names = c("A", "B" ), row.names = c(NA, 5L), class = "data.frame") 

library(dplyr)

dat %>%
  summarise_all(function(x) sum(x[!is.na(x)] == "null") == length(x[!is.na(x)])) %>% # check if number of nulls is equal to number of rows after removing NAs
  select_if(function(x) x == FALSE) %>%       # select columns that don't have only nulls
  names() -> vars_to_keep                     # keep column names

dat %>% select(vars_to_keep)                  # select columns captured above

#   B
# 1 6.455973
# 2 6.455973
# 3 6.455973
# 4 6.455973
# 5 6.455973
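
The same selection can probably be written in one step with the where() selection helper, assuming dplyr 1.0.0 or later is installed. This is only a sketch of an alternative, not the answerer's method; is_only_null is a hypothetical name. As in the code above, NAs are ignored, so a column consisting entirely of NA is also dropped.

```r
library(dplyr)

dat <- data.frame(
  A = factor(rep("null", 5)),
  B = rep(6.45597297196173, 5)
)

# Hypothetical predicate: TRUE when every non-NA value is the string "null"
is_only_null <- function(x) all(as.character(x[!is.na(x)]) == "null")

# Keep the columns for which the predicate is FALSE
kept <- dat %>% select(where(function(x) !is_only_null(x)))
```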

Answer 3

Not as elegant, but still readable:

> dat[dat=="null"]<-NA_character_
> dat <- dat[,colSums(is.na(dat))<nrow(dat)]
> dat
[1] 6.455973 6.455973 6.455973 6.455973 6.455973
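
Note that the result printed above is a plain numeric vector: when a single column remains, `[` on a data.frame simplifies by default. A small sketch of the same steps with drop = FALSE, which should keep the result a data.frame (the name kept is only illustrative):

```r
dat <- data.frame(
  A = factor(rep("null", 5)),
  B = rep(6.45597297196173, 5)
)

# Turn every "null" cell into NA, then keep columns that are not entirely NA
dat[dat == "null"] <- NA_character_
kept <- dat[, colSums(is.na(dat)) < nrow(dat), drop = FALSE]
```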

Removing columns containing "null" from a data.frame
