I have several data.frames similar to dat, where A is numeric and B is a factor:
A,B
1,null
2,null
3,null
I want to remove every column that contains only "null". I have tried many solutions, including:
dat[, !apply(dat == "null", 2, all)]
Error in `[.data.frame`(newdat, , !apply(dat == "null", 2, all)) :
undefined columns selected
dat %>% mutate_if(is.factor, as.null)
Error in mutate_impl(.data, dots) :
Column `B` is of unsupported type NULL
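Other solutions produce similar errors (mostly "undefined columns selected"). I would like to do this without referring to the columns by name or number. Thanks!
Answer 0: (score: 0)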
If the values in the column are null (lowercase), then one possible solution is:
df[, colSums(df == "null") != nrow(df), drop = FALSE]
With the OP's data:
dat[,apply(dat, 2, function(x)any(!as.character(x)=="null")), drop = FALSE]
# B
# 1 6.455973
# 2 6.455973
# 3 6.455973
# 4 6.455973
# 5 6.455973
Data:
dat <- structure(list(A = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "null", class = "factor"),
B = c(6.45597297196173, 6.45597297196173, 6.45597297196173, 6.45597297196173, 6.45597297196173)),
.Names = c("A", "B" ), row.names = c(NA, 5L), class = "data.frame")
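One possible reason the attempts in the question fail with "undefined columns selected" is missing values: all() returns NA when a column holds only "null" and NA, and an NA in a logical column index is rejected by the data.frame subsetting method. This is a guess about the real data, not something the question confirms. The sketch below uses a hypothetical data frame dat_na and a hypothetical helper vector only_null to illustrate an NA-tolerant variant; note that a column consisting entirely of NA is also treated as "only null" here.

```r
# Hypothetical data (not from the question): A mixes "null" with NA.
dat_na <- data.frame(
  A = factor(c("null", NA, "null")),
  B = c(1.5, 2.5, NA)
)

# The approach from the question: all() yields NA for column A,
# so the column index contains NA and the subsetting fails.
attempt <- try(dat_na[, !apply(dat_na == "null", 2, all)], silent = TRUE)
failed <- inherits(attempt, "try-error")

# NA-tolerant check: TRUE for columns whose non-missing values are all "null".
only_null <- vapply(
  dat_na,
  function(x) all(as.character(x) == "null", na.rm = TRUE),
  logical(1)
)

# drop = FALSE keeps a data.frame even when a single column remains.
cleaned <- dat_na[, !only_null, drop = FALSE]
```

vapply() iterates over the columns directly, so no column has to be referred to by name or number.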
Answer 1: (score: 0)
Here is another dplyr solution:
dat <- structure(list(A = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "null", class = "factor"),
B = c(6.45597297196173, 6.45597297196173, 6.45597297196173, 6.45597297196173, 6.45597297196173)),
.Names = c("A", "B" ), row.names = c(NA, 5L), class = "data.frame")
library(dplyr)
dat %>%
summarise_all(function(x) sum(x[!is.na(x)] == "null") == length(x[!is.na(x)])) %>% # check if number of nulls is equal to number of rows after removing NAs
select_if(function(x) x == FALSE) %>% # select columns that don't have only nulls
names() -> vars_to_keep # keep column names
dat %>% select(all_of(vars_to_keep)) # select columns captured above; all_of() marks vars_to_keep as a character vector of names
# B
# 1 6.455973
# 2 6.455973
# 3 6.455973
# 4 6.455973
# 5 6.455973
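The same idea can probably be written more compactly with newer dplyr. The sketch below assumes dplyr 1.0.0 or later, where select() accepts a where() predicate; it is an alternative phrasing, not the code from the answer above, and the data frame is rebuilt in simplified form so that the block stands alone.

```r
library(dplyr)

# Simplified reconstruction of the example data used in the answers.
dat <- data.frame(
  A = factor(rep("null", 5)),
  B = rep(6.45597297196173, 5)
)

# Keep a column if at least one non-missing value differs from "null".
result <- dat %>%
  select(where(function(x) any(as.character(x) != "null", na.rm = TRUE)))
```

Because select() always returns a data frame, the result stays a data.frame even when only one column survives.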
Answer 2: (score: 0)
Not as elegant, but still readable:
> dat[dat == "null"] <- NA_character_
> dat <- dat[, colSums(is.na(dat)) < nrow(dat), drop = FALSE]
> dat
         B
1 6.455973
2 6.455973
3 6.455973
4 6.455973
5 6.455973