Remove columns containing only "null" from a data.frame

Time: 2018-04-05 18:12:04

Tags: r dataframe dplyr

I have several data.frames similar to dat, where A is numeric and B is a factor:

A,B
1,null
2,null
3,null

I want to remove every column that contains only "null". I have tried many solutions, including:

dat[, !apply(dat == "null", 2, all)]

Error in `[.data.frame`(newdat, , !apply(dat == "null", 2, all)) : 
  undefined columns selected

dat %>% mutate_if(is.factor, as.null)

Error in mutate_impl(.data, dots) : 
  Column `B` is of unsupported type NULL

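Other solutions produce similar errors (mostly the "undefined columns selected" error). I would like to do this without referring to columns by name or number. Thanks!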

3 Answers:

Answer 0: (score: 0)

If the values in the column are the lowercase string "null", one possible solution is:

df[,colSums(df=="null")!=nrow(df)]
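As a rough sketch of how that one-liner behaves on data shaped like the question's example (the sample values, the `keep` and `result` names, and the added `na.rm = TRUE` and `drop = FALSE` arguments are assumptions for illustration, not part of the original answer):

```r
# Example data mirroring the question: A is numeric, B is a factor of "null"
dat <- data.frame(A = c(1, 2, 3), B = factor(c("null", "null", "null")))

# Logical matrix marking the cells equal to the string "null"
is_null <- dat == "null"

# Keep a column unless every one of its rows is "null";
# na.rm = TRUE stops a missing value from turning a count into NA
keep <- colSums(is_null, na.rm = TRUE) != nrow(dat)

# drop = FALSE keeps the result a data.frame even if only one column is left
result <- dat[, keep, drop = FALSE]
print(result)
```

Without `drop = FALSE`, a single surviving column would be returned as a plain vector rather than a data.frame.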

Using the OP's data:

dat[,apply(dat, 2, function(x)any(!as.character(x)=="null")), drop = FALSE]
# B
# 1 6.455973
# 2 6.455973
# 3 6.455973
# 4 6.455973
# 5 6.455973

Data

dat <- structure(list(A = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "null", class = "factor"), 
        B = c(6.45597297196173, 6.45597297196173, 6.45597297196173, 6.45597297196173, 6.45597297196173)), 
        .Names = c("A", "B" ), row.names = c(NA, 5L), class = "data.frame") 

Answer 1: (score: 0)

Here is another solution, using dplyr:

dat <- structure(list(A = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "null", class = "factor"), 
                      B = c(6.45597297196173, 6.45597297196173, 6.45597297196173, 6.45597297196173, 6.45597297196173)), 
                 .Names = c("A", "B" ), row.names = c(NA, 5L), class = "data.frame") 

library(dplyr)

dat %>%
  summarise_all(function(x) sum(x[!is.na(x)] == "null") == length(x[!is.na(x)])) %>% # check if number of nulls is equal to number of rows after removing NAs
  select_if(function(x) x == FALSE) %>%       # select columns that don't have only nulls
  names() -> vars_to_keep                     # keep column names

dat %>% select(vars_to_keep)                  # select columns captured above

#   B
# 1 6.455973
# 2 6.455973
# 3 6.455973
# 4 6.455973
# 5 6.455973
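The same check can probably be written in one step with `select(where(...))`, assuming dplyr 1.0.0 or later is installed. This is only a sketch; the helper name `only_null` is hypothetical and not part of the original answer:

```r
library(dplyr)

# Same shape as the answer's data: A is a factor of "null", B is numeric
dat <- data.frame(A = factor(rep("null", 5)), B = rep(6.45597297196173, 5))

# Hypothetical helper: TRUE when a column has at least one non-NA value
# and every non-NA value is the string "null"
only_null <- function(x) {
  vals <- as.character(x[!is.na(x)])
  length(vals) > 0 && all(vals == "null")
}

# Keep the columns for which the helper returns FALSE
result <- dat %>% select(where(function(x) !only_null(x)))
print(result)
```

This avoids the intermediate `vars_to_keep` vector, at the cost of requiring a newer dplyr than the one available when the answer was written.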

Answer 2: (score: 0)

Not as elegant, but still readable:

> dat[dat=="null"]<-NA_character_
> dat <- dat[,colSums(is.na(dat))<nrow(dat)]
> dat
[1] 6.455973 6.455973 6.455973 6.455973 6.455973
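Note that the output above is a plain vector, because only one column survived the subsetting. If a data.frame is wanted and the original "null" values should stay untouched, a base R sketch along these lines may work; the helper name `only_null` is hypothetical and the data is reconstructed from the earlier answers:

```r
# Same shape as the data used in the answers
dat <- data.frame(A = factor(rep("null", 5)), B = rep(6.45597297196173, 5))

# Hypothetical helper: TRUE when a column has at least one non-NA value
# and every non-NA value is the string "null"
only_null <- function(col) {
  vals <- as.character(col[!is.na(col)])
  length(vals) > 0 && all(vals == "null")
}

# Filter() keeps the columns for which the predicate is TRUE and,
# applied to a data.frame, returns a data.frame
result <- Filter(function(col) !only_null(col), dat)
print(result)
```

Because `Filter()` works on the columns as a list, no column names or positions need to be written out.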